diff --git a/.gitattributes b/.gitattributes index c7d9f3332a950355d5a77d85000f05e6f45435ea..6636f7b144effac61f806581a4bc14e7152b6ac8 100644 --- a/.gitattributes +++ b/.gitattributes @@ -32,3 +32,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text *.zip filter=lfs diff=lfs merge=lfs -text *.zst filter=lfs diff=lfs merge=lfs -text *tfevents* filter=lfs diff=lfs merge=lfs -text +model_final.pdparams filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/demo/car.jpg filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/demo/P0072__1.0__0___0.png filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/demo/P0861__1.0__1154___824.png filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/docs/images/picodet_android_demo1.jpg filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/docs/images/picodet_android_demo2.jpg filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/docs/images/picodet_android_demo3.jpg filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/docs/images/picodet_map.png filter=lfs diff=lfs merge=lfs -text +PaddleDetection-release-2.6/docs/images/tinypose_demo.png filter=lfs diff=lfs merge=lfs -text diff --git a/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/1_bug-report.yml b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/1_bug-report.yml new file mode 100644 index 0000000000000000000000000000000000000000..e2afdaa5ee2f2275ca567500bd2b640680e35b73 --- /dev/null +++ b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/1_bug-report.yml @@ -0,0 +1,106 @@ +name: 🐛 报BUG Bug Report +description: 报告一个可复现的Bug以帮助我们修复PaddleDetection。 Report a bug to help us reproduce and fix it. +labels: [type/bug-report, status/new-issue] + +body: +- type: markdown + attributes: + value: | + Thank you for submitting a PaddleDetection Bug Report! + +- type: checkboxes + attributes: + label: 问题确认 Search before asking + description: > + (必选项) 在向PaddleDetection报bug之前,请先查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)是否报过同样的bug。 + + (Required) Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues). + + options: + - label: > + 我已经查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues),没有发现相似的bug。I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issues) and found no similar bug report. + required: true + +- type: dropdown + attributes: + label: Bug组件 Bug Component + description: | + (可选项) 请选择在哪部分代码发现这个bug。(Optional) Please select the part of PaddleDetection where you found the bug. + multiple: true + options: + - "Training" + - "Validation" + - "Inference" + - "Export" + - "Deploy" + - "Installation" + - "DataProcess" + - "Other" + validations: + required: false + +- type: textarea + id: code + attributes: + label: Bug描述 Describe the Bug + description: | + 请清晰而简洁地描述这个bug,并附上bug复现步骤、报错信息或截图、代码改动说明或最小可复现代码。如果代码太长,请将可执行代码放到[AIStudio](https://aistudio.baidu.com/aistudio/index)中并将项目设置为公开(或者放到github gist上),并在项目中描述清楚bug复现步骤,在issue中描述期望结果与实际结果。 + + 如果你报告的是一个报错信息,请将完整回溯的报错贴在这里,并使用 ` ```三引号块``` `展示错误信息。 + + + placeholder: | + 请清晰简洁的描述这个bug。 A clear and concise description of what the bug is. + + ```python + 代码改动说明,或最小可复现代码。 Code change description, or sample code to reproduce the problem. + ``` + + ```shell + 带有完整回溯信息的报错日志或截图。 The error log or screenshot you got, with the full traceback. 
+ ``` + validations: + required: true + +- type: textarea + attributes: + label: 复现环境 Environment + description: 请具体说明复现bug的环境信息。Please specify the environment information for reproducing the bug. + placeholder: | + - OS: Linux/Windows + - PaddlePaddle: 2.2.2 + - PaddleDetection: release/2.4 + - Python: 3.8.0 + - CUDA: 10.2 + - CUDNN: 7.6 + - GCC: 8.2.0 + validations: + required: true + +- type: checkboxes + attributes: + label: Bug描述确认 Bug description confirmation + description: > + (必选项) 请确认是否提供了详细的Bug描述和环境信息,确认问题是否可以复现。 + + (Required) Please confirm whether the bug description and environment information are provided, and whether the problem can be reproduced. + + options: + - label: > + 我确认已经提供了Bug复现步骤、代码改动说明、以及环境信息,确认问题是可以复现的。I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced. + required: true + +- type: checkboxes + attributes: + label: 是否愿意提交PR? Are you willing to submit a PR? + description: > + (可选项) 如果你对修复bug有自己的想法,十分鼓励提交[Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls),共同提升PaddleDetection。 + + (Optional) We encourage you to submit a [Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls) (PR) to help improve PaddleDetection for everyone, especially if you have a good understanding of how to implement a fix or feature. + options: + - label: 我愿意提交PR!I'd like to help by submitting a PR! + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉!Thanks for your contribution 🎉! diff --git a/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/2_feature-request.yml b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/2_feature-request.yml new file mode 100644 index 0000000000000000000000000000000000000000..dcf9ec4462886c7064315f0fc6ac167dd6c6dbf5 --- /dev/null +++ b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/2_feature-request.yml @@ -0,0 +1,50 @@ +name: 🚀 新需求 Feature Request +description: 提交一个你对PaddleDetection的新需求。 Submit a request for a new Paddle feature. +labels: [type/feature-request, status/new-issue] + +body: +- type: markdown + attributes: + value: > + #### 你可以在这里提出你对PaddleDetection的新需求,包括但不限于:功能或模型缺失、功能不全或无法使用、精度/性能不符合预期等。 + + #### You could submit a request for a new feature here, including but not limited to: new features or models, incomplete or unusable features, accuracy/performance not as expected, etc. + +- type: checkboxes + attributes: + label: 问题确认 Search before asking + description: > + 在向PaddleDetection提新需求之前,请先查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)是否报过同样的需求。 + + Before submitting a feature request, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues). + + options: + - label: > + 我已经查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues),没有类似需求。I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issues) and found no similar feature requests. + required: true + +- type: textarea + id: description + attributes: + label: 需求描述 Feature Description + description: | + 请尽可能包含任务目标、需求场景、功能描述等信息,全面的信息有利于我们准确评估你的需求。 + Please include as much information as possible, such as mission objectives, requirement scenarios, functional descriptions, etc. Comprehensive information will help us accurately assess your feature request. + value: "1. 任务目标(请描述你正在做的项目是什么,如模型、论文、项目是什么?); 2. 需求场景(请描述你的项目中为什么需要用此功能); 3. 
功能描述(请简单描述或设计这个功能)" + validations: + required: true + +- type: checkboxes + attributes: + label: 是否愿意提交PR Are you willing to submit a PR? + description: > + (可选)如果你对新feature有自己的想法,十分鼓励提交[Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls),共同提升PaddleDetection + + (Optional) We encourage you to submit a [Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls) (PR) to help improve PaddleDetection for everyone, especially if you have a good understanding of how to implement a fix or feature. + options: + - label: Yes I'd like to help by submitting a PR! + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉!Thanks for your contribution 🎉! diff --git a/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/3_documentation-issue.yml b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/3_documentation-issue.yml new file mode 100644 index 0000000000000000000000000000000000000000..4ea08cd5f4b99003d2323e1578bd0456a9dcf848 --- /dev/null +++ b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/3_documentation-issue.yml @@ -0,0 +1,38 @@ +name: 📚 文档 Documentation Issue +description: 反馈一个官网文档错误。 Report an issue related to https://github.com/PaddlePaddle/PaddleDetection. +labels: [type/docs, status/new-issue] + +body: +- type: markdown + attributes: + value: > + #### 请确认反馈的问题来自PaddlePaddle官网文档:https://github.com/PaddlePaddle/PaddleDetection 。 + + #### Before submitting a Documentation Issue, Please make sure that issue is related to https://github.com/PaddlePaddle/PaddleDetection. + +- type: textarea + id: link + attributes: + label: 文档链接&描述 Document Links & Description + description: | + 请说明有问题的文档链接以及该文档存在的问题。 + Please fill in the link to the document and describe the question. + validations: + required: true + + +- type: textarea + id: error + attributes: + label: 请提出你的建议 Please give your suggestion + description: | + 请告诉我们,你希望如何改进这个文档。或者你可以提个PR修复这个问题。 + Please tell us how you would like to improve this document. Or you can submit a PR to fix this problem. + + validations: + required: false + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉!Thanks for your contribution 🎉! diff --git a/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/4_ask-a-question.yml b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/4_ask-a-question.yml new file mode 100644 index 0000000000000000000000000000000000000000..af237f516eb333d4c5f33bba4b7dc9c0dec2e30f --- /dev/null +++ b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/4_ask-a-question.yml @@ -0,0 +1,37 @@ +name: 🙋🏼‍♀️🙋🏻‍♂️提问 Ask a Question +description: 提出一个使用/咨询问题。 Ask a usage or consultation question. 
+labels: [type/question, status/new-issue] + +body: +- type: checkboxes + attributes: + label: 问题确认 Search before asking + description: > + #### 你可以在这里提出一个使用/咨询问题,提问之前请确保: + + - 1)已经百度/谷歌搜索过你的问题,但是没有找到解答; + + - 2)已经在官网查询过[教程文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/GETTING_STARTED_cn.md)与[FAQ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/docs/tutorials/FAQ),但是没有找到解答; + + - 3)已经在[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)中搜索过,没有找到同类issue或issue未被解答。 + + + #### You could ask a usage or consultation question here, before your start, please make sure: + + - 1) You have searched your question on Baidu/Google, but found no answer; + + - 2) You have checked the [tutorials](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/GETTING_STARTED.md), but found no answer; + + - 3) You have searched [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues), but found no similar issue or the issue has not been answered. + + options: + - label: > + 我已经搜索过问题,但是没有找到解答。I have searched the question and found no related answer. + required: true + +- type: textarea + id: question + attributes: + label: 请提出你的问题 Please ask your question + validations: + required: true diff --git a/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/5_others.yml b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/5_others.yml new file mode 100644 index 0000000000000000000000000000000000000000..ec2f08ae16098cd8987f3b6bc726d9a28696833a --- /dev/null +++ b/PaddleDetection-release-2.6/.github/ISSUE_TEMPLATE/5_others.yml @@ -0,0 +1,23 @@ +name: 🧩 其他 Others +description: 提出其他问题。 Report any other non-support related issues. +labels: [type/others, status/new-issue] + +body: +- type: markdown + attributes: + value: > + #### 你可以在这里提出任何前面几类模板不适用的问题,包括但不限于:优化性建议、框架使用体验反馈、版本兼容性问题、报错信息不清楚等。 + + #### You can report any issues that are not applicable to the previous types of templates, including but not limited to: enhancement suggestions, feedback on the use of the framework, version compatibility issues, unclear error information, etc. + +- type: textarea + id: others + attributes: + label: 问题描述 Please describe your issue + validations: + required: true + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉! Thanks for your contribution 🎉! 
diff --git a/PaddleDetection-release-2.6/.gitignore b/PaddleDetection-release-2.6/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..4b6a6e8246385c676e00a412f6030ec4100d090f --- /dev/null +++ b/PaddleDetection-release-2.6/.gitignore @@ -0,0 +1,88 @@ +# Virtualenv +/.venv/ +/venv/ + +# Byte-compiled / optimized / DLL files +__pycache__/ +.ipynb_checkpoints/ +*.py[cod] + +# C extensions +*.so + +# json file +*.json + +# log file +*.log + +# Distribution / packaging +/bin/ +*build/ +/develop-eggs/ +*dist/ +/eggs/ +/lib/ +/lib64/ +/output/ +/inference_model/ +/output_inference/ +/parts/ +/sdist/ +/var/ +*.egg-info/ +/.installed.cfg +/*.egg +/.eggs + +# AUTHORS and ChangeLog will be generated while packaging +/AUTHORS +/ChangeLog + +# BCloud / BuildSubmitter +/build_submitter.* +/logger_client_log + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +.tox/ +.coverage +.cache +.pytest_cache +nosetests.xml +coverage.xml + +# Translations +*.mo + +# Sphinx documentation +/docs/_build/ + +*.tar +*.pyc + +.idea/ + +dataset/coco/annotations +dataset/coco/train2017 +dataset/coco/val2017 +dataset/voc/VOCdevkit +dataset/fruit/fruit-detection/ +dataset/voc/test.txt +dataset/voc/trainval.txt +dataset/wider_face/WIDER_test +dataset/wider_face/WIDER_train +dataset/wider_face/WIDER_val +dataset/wider_face/wider_face_split + +ppdet/version.py + +# NPU meta folder +kernel_meta/ + +# MAC +*.DS_Store + diff --git a/PaddleDetection-release-2.6/.pre-commit-config.yaml b/PaddleDetection-release-2.6/.pre-commit-config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..099148ac4ed123b68803486f7d30d157005b617d --- /dev/null +++ b/PaddleDetection-release-2.6/.pre-commit-config.yaml @@ -0,0 +1,44 @@ +- repo: https://github.com/PaddlePaddle/mirrors-yapf.git + sha: 0d79c0c469bab64f7229c9aca2b1186ef47f0e37 + hooks: + - id: yapf + files: \.py$ +- repo: https://github.com/pre-commit/pre-commit-hooks + sha: a11d9314b22d8f8c7556443875b731ef05965464 + hooks: + - id: check-merge-conflict + - id: check-symlinks + - id: detect-private-key + files: (?!.*paddle)^.*$ + - id: end-of-file-fixer + files: \.(md|yml)$ + - id: trailing-whitespace + files: \.(md|yml)$ +- repo: https://github.com/Lucas-C/pre-commit-hooks + sha: v1.0.1 + hooks: + - id: forbid-crlf + files: \.(md|yml)$ + - id: remove-crlf + files: \.(md|yml)$ + - id: forbid-tabs + files: \.(md|yml)$ + - id: remove-tabs + files: \.(md|yml)$ +- repo: local + hooks: + - id: clang-format-with-version-check + name: clang-format + description: Format files with ClangFormat. + entry: bash ./.travis/codestyle/clang_format.hook -i + language: system + files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto)$ + +- repo: local + hooks: + - id: cpplint-cpp-source + name: cpplint + description: Check C++ code style using cpplint.py. 
+ entry: bash ./.travis/codestyle/cpplint_pre_commit.hook + language: system + files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx)$ diff --git a/PaddleDetection-release-2.6/.style.yapf b/PaddleDetection-release-2.6/.style.yapf new file mode 100644 index 0000000000000000000000000000000000000000..4741fb4f3bbc6681088cf9e960321e7b857a93a8 --- /dev/null +++ b/PaddleDetection-release-2.6/.style.yapf @@ -0,0 +1,3 @@ +[style] +based_on_style = pep8 +column_limit = 80 diff --git a/PaddleDetection-release-2.6/.travis.yml b/PaddleDetection-release-2.6/.travis.yml new file mode 100644 index 0000000000000000000000000000000000000000..b8eff51456d9723695cd037543e73f921ad4d009 --- /dev/null +++ b/PaddleDetection-release-2.6/.travis.yml @@ -0,0 +1,35 @@ +language: cpp +cache: ccache +sudo: required +dist: trusty +services: + - docker +os: + - linux +env: + - JOB=PRE_COMMIT + +addons: + apt: + packages: + - git + - python + - python-pip + - python2.7-dev + ssh_known_hosts: 13.229.163.131 +before_install: + - sudo pip install -U virtualenv pre-commit pip -i https://pypi.tuna.tsinghua.edu.cn/simple + - docker pull paddlepaddle/paddle:latest + - git pull https://github.com/PaddlePaddle/PaddleDetection develop + +script: + - exit_code=0 + - .travis/precommit.sh || exit_code=$(( exit_code | $? )) + # - docker run -i --rm -v "$PWD:/py_unittest" paddlepaddle/paddle:latest /bin/bash -c + # 'cd /py_unittest; sh .travis/unittest.sh' || exit_code=$(( exit_code | $? )) + - if [ $exit_code -eq 0 ]; then true; else exit 1; fi; + +notifications: + email: + on_success: change + on_failure: always diff --git a/PaddleDetection-release-2.6/.travis/codestyle/clang_format.hook b/PaddleDetection-release-2.6/.travis/codestyle/clang_format.hook new file mode 100644 index 0000000000000000000000000000000000000000..1c4aa5b164a9871a227e753c9dce57827eabd748 --- /dev/null +++ b/PaddleDetection-release-2.6/.travis/codestyle/clang_format.hook @@ -0,0 +1,4 @@ +#!/bin/bash +set -e + +clang-format $@ diff --git a/PaddleDetection-release-2.6/.travis/codestyle/cpplint_pre_commit.hook b/PaddleDetection-release-2.6/.travis/codestyle/cpplint_pre_commit.hook new file mode 100644 index 0000000000000000000000000000000000000000..c90bf29ecb794bde52df7468d7626211397b0391 --- /dev/null +++ b/PaddleDetection-release-2.6/.travis/codestyle/cpplint_pre_commit.hook @@ -0,0 +1,27 @@ +#!/bin/bash + +TOTAL_ERRORS=0 +if [[ ! $TRAVIS_BRANCH ]]; then + # install cpplint on local machine. + if [[ ! $(which cpplint) ]]; then + pip install cpplint + fi + # diff files on local machine. + files=$(git diff --cached --name-status | awk '$1 != "D" {print $2}') +else + # diff files between PR and latest commit on Travis CI. 
+ branch_ref=$(git rev-parse "$TRAVIS_BRANCH") + head_ref=$(git rev-parse HEAD) + files=$(git diff --name-status $branch_ref $head_ref | awk '$1 != "D" {print $2}') +fi +# The trick to remove deleted files: https://stackoverflow.com/a/2413151 +for file in $files; do + if [[ $file =~ ^(patches/.*) ]]; then + continue; + else + cpplint --filter=-readability/fn_size,-build/include_what_you_use,-build/c++11 $file; + TOTAL_ERRORS=$(expr $TOTAL_ERRORS + $?); + fi +done + +exit $TOTAL_ERRORS diff --git a/PaddleDetection-release-2.6/.travis/precommit.sh b/PaddleDetection-release-2.6/.travis/precommit.sh new file mode 100644 index 0000000000000000000000000000000000000000..bcbfb2bb530ca6fecd1ac4c9e049c292a61e5e64 --- /dev/null +++ b/PaddleDetection-release-2.6/.travis/precommit.sh @@ -0,0 +1,21 @@ +#!/bin/bash +function abort(){ + echo "Your commit not fit PaddlePaddle code style" 1>&2 + echo "Please use pre-commit scripts to auto-format your code" 1>&2 + exit 1 +} + +trap 'abort' 0 +set -e +cd `dirname $0` +cd .. +export PATH=/usr/bin:$PATH +pre-commit install + +if ! pre-commit run -a ; then + ls -lh + git diff --exit-code + exit 1 +fi + +trap : 0 diff --git a/PaddleDetection-release-2.6/.travis/requirements.txt b/PaddleDetection-release-2.6/.travis/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..27a340d8f53b1adf94d99b887c525984d53dfd4c --- /dev/null +++ b/PaddleDetection-release-2.6/.travis/requirements.txt @@ -0,0 +1,8 @@ +# add python requirements for unittests here, note install pycocotools +# directly is not supported in travis ci, it is installed by compiling +# from source files in unittest.sh +tqdm +cython +shapely +llvmlite==0.33 +numba==0.50 diff --git a/PaddleDetection-release-2.6/.travis/unittest.sh b/PaddleDetection-release-2.6/.travis/unittest.sh new file mode 100644 index 0000000000000000000000000000000000000000..e71833134fe62db0b3eddfe2806951e1880624d9 --- /dev/null +++ b/PaddleDetection-release-2.6/.travis/unittest.sh @@ -0,0 +1,47 @@ +#!/bin/bash + +abort(){ + echo "Run unittest failed" 1>&2 + echo "Please check your code" 1>&2 + echo " 1. you can run unit tests by 'bash .travis/unittest.sh' locally" 1>&2 + echo " 2. you can add python requirements in .travis/requirements.txt if you use new requirements in unit tests" 1>&2 + exit 1 +} + +unittest(){ + if [ $? != 0 ]; then + exit 1 + fi + find "./ppdet" -name 'tests' -type d -print0 | \ + xargs -0 -I{} -n1 bash -c \ + 'python -m unittest discover -v -s {}' +} + +trap 'abort' 0 +set -e + +# install travis python dependencies exclude pycocotools +if [ -f ".travis/requirements.txt" ]; then + pip install -r .travis/requirements.txt +fi + +# install pycocotools +if [ `pip list | grep pycocotools | wc -l` -eq 0 ]; then + # install git if needed + if [ -n `which git` ]; then + apt-get update + apt-get install -y git + fi; + git clone https://github.com/cocodataset/cocoapi.git + cd cocoapi/PythonAPI + make install + python setup.py install --user + cd ../.. + rm -rf cocoapi +fi + +export PYTHONPATH=`pwd`:$PYTHONPATH + +unittest . + +trap : 0 diff --git a/PaddleDetection-release-2.6/LICENSE b/PaddleDetection-release-2.6/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64 --- /dev/null +++ b/PaddleDetection-release-2.6/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. 
+ + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/PaddleDetection-release-2.6/README.md b/PaddleDetection-release-2.6/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleDetection-release-2.6/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleDetection-release-2.6/README_cn.md b/PaddleDetection-release-2.6/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..15b0896cbb74e7712a52b77e0c0dada6bde248b0 --- /dev/null +++ b/PaddleDetection-release-2.6/README_cn.md @@ -0,0 +1,800 @@ +简体中文 | [English](README_en.md) + +
+
+## 🌈简介
+
+PaddleDetection是一个基于PaddlePaddle的目标检测端到端开发套件,在提供丰富的模型组件和测试基准的同时,注重端到端的产业落地应用,通过打造产业级特色模型|工具、建设产业应用范例等手段,帮助开发者实现数据准备、模型选型、模型训练、模型部署的全流程打通,快速进行落地应用。
+
+主要模型效果示例如下(点击标题可快速跳转):
+
+| [**通用目标检测**](#pp-yoloe-高精度目标检测模型) | [**小目标检测**](#pp-yoloe-sod-高精度小目标检测模型) | [**旋转框检测**](#pp-yoloe-r-高性能旋转框检测模型) | [**3D目标物检测**](https://github.com/PaddlePaddle/Paddle3D) |
+| :---: | :---: | :---: | :---: |
+| [**人脸检测**](#模型库) | [**2D关键点检测**](#️pp-tinypose-人体骨骼关键点识别) | [**多目标追踪**](#pp-tracking-实时多目标跟踪系统) | [**实例分割**](#模型库) |
+| [**车辆分析——车牌识别**](#️pp-vehicle-实时车辆分析工具) | [**车辆分析——车流统计**](#️pp-vehicle-实时车辆分析工具) | [**车辆分析——违章检测**](#️pp-vehicle-实时车辆分析工具) | [**车辆分析——属性分析**](#️pp-vehicle-实时车辆分析工具) |
+| [**行人分析——闯入分析**](#pp-human-实时行人分析工具) | [**行人分析——行为分析**](#pp-human-实时行人分析工具) | [**行人分析——属性分析**](#pp-human-实时行人分析工具) | [**行人分析——人流统计**](#pp-human-实时行人分析工具) |
+
+同时,PaddleDetection提供了模型的在线体验功能,用户可以选择自己的数据进行在线推理。
+
+`说明`:考虑到服务器负载压力,在线推理均为CPU推理,完整的模型开发实例以及产业部署实践代码示例请前往[🎗️产业特色模型|产业工具](#️产业特色模型产业工具-1)。
+
+`传送门`:[模型在线体验](https://www.paddlepaddle.org.cn/models)
+ +## ✨主要特性 + +#### 🧩模块化设计 +PaddleDetection将检测模型解耦成不同的模块组件,通过自定义模块组件组合,用户可以便捷高效地完成检测模型的搭建。`传送门`:[🧩模块组件](#模块组件)。 + +#### 📱丰富的模型库 +PaddleDetection支持大量的最新主流的算法基准以及预训练模型,涵盖2D/3D目标检测、实例分割、人脸检测、关键点检测、多目标跟踪、半监督学习等方向。`传送门`:[📱模型库](#模型库)、[⚖️模型性能对比](#️模型性能对比)。 + +#### 🎗️产业特色模型|产业工具 +PaddleDetection打造产业级特色模型以及分析工具:PP-YOLOE+、PP-PicoDet、PP-TinyPose、PP-HumanV2、PP-Vehicle等,针对通用、高频垂类应用场景提供深度优化解决方案以及高度集成的分析工具,降低开发者的试错、选择成本,针对业务场景快速应用落地。`传送门`:[🎗️产业特色模型|产业工具](#️产业特色模型产业工具-1)。 + +#### 💡🏆产业级部署实践 +PaddleDetection整理工业、农业、林业、交通、医疗、金融、能源电力等AI应用范例,打通数据标注-模型训练-模型调优-预测部署全流程,持续降低目标检测技术产业落地门槛。`传送门`:[💡产业实践范例](#产业实践范例)、[🏆企业应用案例](#企业应用案例)。 + +
+ +## 📣最新进展 + +PaddleDetection 2.6版本发布! [点击查看版本更新介绍](https://github.com/PaddlePaddle/PaddleDetection/releases/tag/v2.6.0) + +## 👫开源社区 + +- **📑项目合作:** 如果您是企业开发者且有明确的目标检测垂类应用需求,请扫描如下二维码入群,并联系`群管理员AI`后可免费与官方团队展开不同层次的合作。 +- **🏅️社区贡献:** PaddleDetection非常欢迎你加入到飞桨社区的开源建设中,参与贡献方式可以参考[开源项目开发指南](docs/contribution/README.md)。 +- **💻直播教程:** PaddleDetection会定期在飞桨直播间([B站:飞桨PaddlePaddle](https://space.bilibili.com/476867757)、[微信: 飞桨PaddlePaddle](https://mp.weixin.qq.com/s/6ji89VKqoXDY6SSGkxS8NQ)),针对发新内容、以及产业范例、使用教程等进行直播分享。 +- **🎁加入社区:** **微信扫描二维码并填写问卷之后,可以及时获取如下信息,包括:** + - 社区最新文章、直播课等活动预告 + - 往期直播录播&PPT + - 30+行人车辆等垂类高性能预训练模型 + - 七大任务开源数据集下载链接汇总 + - 40+前沿检测领域顶会算法 + - 15+从零上手目标检测理论与实践视频课程 + - 10+工业安防交通全流程项目实操(含源码) + +
+<div align="center">
+  <!-- 图片:PaddleDetection官方交流群二维码 -->
+</div>
+ +- **🎈社区近期活动** + + - **👀YOLO系列专题** + - `文章传送门`:[YOLOv8来啦!YOLO内卷期模型怎么选?9+款AI硬件如何快速部署?深度解析](https://mp.weixin.qq.com/s/rPwprZeHEpmGOe5wxrmO5g) + - `代码传送门`:[PaddleYOLO全系列](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/feature_models/PaddleYOLO_MODEL.md) + +
+ + - **🎯少目标迁移学习专题** + - `文章传送门`:[囿于数据少?泛化性差?PaddleDetection少样本迁移学习助你一键突围!](https://mp.weixin.qq.com/s/dFEQoxSzVCOaWVZPb3N7WA) + + - **⚽️2022卡塔尔世界杯专题** + - `文章传送门`:[世界杯决赛号角吹响!趁周末来搭一套足球3D+AI量化分析系统吧!](https://mp.weixin.qq.com/s/koJxjWDPBOlqgI-98UsfKQ) + +
+ + - **🔍旋转框小目标检测专题** + - `文章传送门`:[Yes, PP-YOLOE!80.73mAP、38.5mAP,旋转框、小目标检测能力双SOTA!](https://mp.weixin.qq.com/s/6ji89VKqoXDY6SSGkxS8NQ) + +
+ + - **🎊YOLO Vision世界学术交流大会** + - **PaddleDetection**受邀参与首个以**YOLO为主题**的**YOLO-VISION**世界大会,与全球AI领先开发者学习交流。 + - `活动链接传送门`:[YOLO-VISION](https://ultralytics.com/yolo-vision) + +
+
+- **🏅️社区贡献**
+  - `活动链接传送门`:[Yes, PP-YOLOE! 基于PP-YOLOE的算法开发](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)
+
+## 🍱安装
+
+参考[安装说明](docs/tutorials/INSTALL_cn.md)进行安装,也可参考本节末尾的最小命令示意。
+
+## 🔥教程
+
+**深度学习入门教程**
+
+- [零基础入门深度学习](https://www.paddlepaddle.org.cn/tutorials/projectdetail/4676538)
+- [零基础入门目标检测](https://aistudio.baidu.com/aistudio/education/group/info/1617)
+
+**快速开始**
+
+- [快速体验](docs/tutorials/QUICK_STARTED_cn.md)
+- [示例:30分钟快速开发交通标志检测模型](docs/tutorials/GETTING_STARTED_cn.md)
+
+**数据准备**
+
+- [数据准备](docs/tutorials/data/README.md)
+- [数据处理模块](docs/advanced_tutorials/READER.md)
+
+**配置文件说明**
+
+- [RCNN参数说明](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md)
+- [PP-YOLO参数说明](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md)
+
+**模型开发**
+
+- [新增检测模型](docs/advanced_tutorials/MODEL_TECHNICAL.md)
+- 二次开发
+  - [目标检测](docs/advanced_tutorials/customization/detection.md)
+  - [关键点检测](docs/advanced_tutorials/customization/keypoint_detection.md)
+  - [多目标跟踪](docs/advanced_tutorials/customization/pphuman_mot.md)
+  - [行为识别](docs/advanced_tutorials/customization/action_recognotion/)
+  - [属性识别](docs/advanced_tutorials/customization/pphuman_attribute.md)
+
+**部署推理**
+
+- [模型导出教程](deploy/EXPORT_MODEL.md)
+- [模型压缩](https://github.com/PaddlePaddle/PaddleSlim)
+  - [剪裁/量化/蒸馏教程](configs/slim)
+- [Paddle Inference部署](deploy/README.md)
+  - [Python端推理部署](deploy/python)
+  - [C++端推理部署](deploy/cpp)
+- [Paddle Lite部署](deploy/lite)
+- [Paddle Serving部署](deploy/serving)
+- [ONNX模型导出](deploy/EXPORT_ONNX_MODEL.md)
+- [推理benchmark](deploy/BENCHMARK_INFER.md)
+
+## 🔑FAQ
+
+- [FAQ/常见问题汇总](docs/tutorials/FAQ)
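+
+下面给出一个最小的安装与单图推理命令示意(仅为示意,假设本地已正确安装PaddlePaddle;配置与权重取自下文PP-YOLOE+章节的表格,demo图片为仓库自带示例,具体以安装说明和各模块文档为准):
+
+```shell
+# 获取代码并安装依赖(示意)
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+cd PaddleDetection
+pip install -r requirements.txt
+python setup.py install
+
+# 用PP-YOLOE+的COCO预训练权重对仓库自带demo图片推理一次,验证安装是否成功
+python tools/infer.py \
+    -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml \
+    -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams \
+    --infer_img=demo/000000014439.jpg
+```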
+## 🧩模块组件
+
+- Backbones
+- Necks
+- Loss
+- Common(含 Post-processing、Training 等)
+- Data Augmentation
    + +## 📱模型库 + + + + + + + + + + + + + + + + +
    + 2D Detection + + Multi Object Tracking + + KeyPoint Detection + + Others +
    + + + + + + + +
  • Instance Segmentation
  • + +
  • Face Detection
  • + +
  • Semi-Supervised Detection
  • + +
  • 3D Detection
  • + +
  • Vehicle Analysis Toolbox
  • + +
  • Human Analysis Toolbox
  • + +
  • Sport Analysis Toolbox
  • +
    + +## ⚖️模型性能对比 + +#### 🖥️服务器端模型性能对比 + +各模型结构和骨干网络的代表模型在COCO数据集上精度mAP和单卡Tesla V100上预测速度(FPS)对比图。 + +
    + +
    + +
    + 测试说明(点击展开) + +- ViT为ViT-Cascade-Faster-RCNN模型,COCO数据集mAP高达55.7% +- Cascade-Faster-RCNN为Cascade-Faster-RCNN-ResNet50vd-DCN,PaddleDetection将其优化到COCO数据mAP为47.8%时推理速度为20FPS +- PP-YOLOE是对PP-YOLO v2模型的进一步优化,L版本在COCO数据集mAP为51.6%,Tesla V100预测速度78.1FPS +- PP-YOLOE+是对PPOLOE模型的进一步优化,L版本在COCO数据集mAP为53.3%,Tesla V100预测速度78.1FPS +- YOLOX和YOLOv5均为基于PaddleDetection复现算法,YOLOv5代码在[PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO)中,参照[PaddleYOLO_MODEL](docs/feature_models/PaddleYOLO_MODEL.md) +- 图中模型均可在[📱模型库](#模型库)中获取 +
    + +#### ⌚️移动端模型性能对比 + +各移动端模型在COCO数据集上精度mAP和高通骁龙865处理器上预测速度(FPS)对比图。 + +
    + +
    + + +
    + 测试说明(点击展开) + +- 测试数据均使用高通骁龙865(4xA77+4xA55)处理器,batch size为1, 开启4线程测试,测试使用NCNN预测库,测试脚本见[MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark) +- PP-PicoDet及PP-YOLO-Tiny为PaddleDetection自研模型,可在[📱模型库](#模型库)中获取,其余模型PaddleDetection暂未提供 +
    + +## 🎗️产业特色模型|产业工具 + +产业特色模型|产业工具是PaddleDetection针对产业高频应用场景打造的兼顾精度和速度的模型以及工具箱,注重从数据处理-模型训练-模型调优-模型部署的端到端打通,且提供了实际生产环境中的实践范例代码,帮助拥有类似需求的开发者高效的完成产品开发落地应用。 + +该系列模型|工具均已PP前缀命名,具体介绍、预训练模型以及产业实践范例代码如下。 + +### 💎PP-YOLOE 高精度目标检测模型 + +
    + 简介(点击展开) + +PP-YOLOE是基于PP-YOLOv2的卓越的单阶段Anchor-free模型,超越了多种流行的YOLO模型。PP-YOLOE避免了使用诸如Deformable Convolution或者Matrix NMS之类的特殊算子,以使其能轻松地部署在多种多样的硬件上。其使用大规模数据集obj365预训练模型进行预训练,可以在不同场景数据集上快速调优收敛。 + +`传送门`:[PP-YOLOE说明](configs/ppyoloe/README_cn.md)。 + +`传送门`:[arXiv论文](https://arxiv.org/abs/2203.16250)。 + +
    + +
    + 预训练模型(点击展开) + +| 模型名称 | COCO精度(mAP) | V100 TensorRT FP16速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 | +| :---------- | :-------------: | :-------------------------: | :----------: | :-----------------------------------------------------: | :-------------------------------------------------------------------------------------: | +| PP-YOLOE+_l | 53.3 | 149.2 | 服务器 | [链接](configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | + +`传送门`:[全部预训练模型](configs/ppyoloe/README_cn.md)。 +
    + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| ---- | ----------------- | --------------------------------------------------------------------------------------------- | ------------------------------------------------------------- | --------------------------------------------------- | +| 农业 | 农作物检测 | 用于葡萄栽培中基于图像的监测和现场机器人技术,提供了来自5种不同葡萄品种的实地实例 | [PP-YOLOE+ 下游任务](./configs/ppyoloe/application/README.md) | [下载链接](./configs/ppyoloe/application/README.md) | +| 通用 | 低光场景检测 | 低光数据集使用ExDark,包括从极低光环境到暮光环境等10种不同光照条件下的图片。 | [PP-YOLOE+ 下游任务](./configs/ppyoloe/application/README.md) | [下载链接](./configs/ppyoloe/application/README.md) | +| 工业 | PCB电路板瑕疵检测 | 工业数据集使用PKU-Market-PCB,该数据集用于印刷电路板(PCB)的瑕疵检测,提供了6种常见的PCB缺陷 | [PP-YOLOE+ 下游任务](./configs/ppyoloe/application/README.md) | [下载链接](./configs/ppyoloe/application/README.md) | +
    + +### 💎PP-YOLOE-R 高性能旋转框检测模型 + +
    + 简介(点击展开) + +PP-YOLOE-R是一个高效的单阶段Anchor-free旋转框检测模型,基于PP-YOLOE+引入了一系列改进策略来提升检测精度。根据不同的硬件对精度和速度的要求,PP-YOLOE-R包含s/m/l/x四个尺寸的模型。在DOTA 1.0数据集上,PP-YOLOE-R-l和PP-YOLOE-R-x在单尺度训练和测试的情况下分别达到了78.14mAP和78.28 mAP,这在单尺度评估下超越了几乎所有的旋转框检测模型。通过多尺度训练和测试,PP-YOLOE-R-l和PP-YOLOE-R-x的检测精度进一步提升至80.02mAP和80.73 mAP,超越了所有的Anchor-free方法并且和最先进的Anchor-based的两阶段模型精度几乎相当。在保持高精度的同时,PP-YOLOE-R避免使用特殊的算子,例如Deformable Convolution或Rotated RoI Align,使其能轻松地部署在多种多样的硬件上。 + +`传送门`:[PP-YOLOE-R说明](configs/rotate/ppyoloe_r)。 + +`传送门`:[arXiv论文](https://arxiv.org/abs/2211.02386)。 + +
    + +
    + 预训练模型(点击展开) + +| 模型 | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Params (M) | FLOPs (G) | 学习率策略 | 角度表示 | 数据增广 | GPU数目 | 每GPU图片数目 | 模型下载 | 配置文件 | +| :----------: | :------: | :---: | :-----------------: | :------------------------: | :--------: | :-------: | :--------: | :------: | :------: | :-----: | :-----------: | :---------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 53.29 | 281.65 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) | + +`传送门`:[全部预训练模型](configs/rotate/ppyoloe_r)。 +
    + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| ---- | ---------- | --------------------------------------------------------------------- | --------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | +| 通用 | 旋转框检测 | 手把手教你上手PP-YOLOE-R旋转框检测,10分钟将脊柱数据集精度训练至95mAP | [基于PP-YOLOE-R的旋转框检测](https://aistudio.baidu.com/aistudio/projectdetail/5058293) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/5058293) | +
    + +### 💎PP-YOLOE-SOD 高精度小目标检测模型 + +
    + 简介(点击展开) + +PP-YOLOE-SOD(Small Object Detection)是PaddleDetection团队针对小目标检测提出的检测方案,在VisDrone-DET数据集上单模型精度达到38.5mAP,达到了SOTA性能。其分别基于切图拼图流程优化的小目标检测方案以及基于原图模型算法优化的小目标检测方案。同时提供了数据集自动分析脚本,只需输入数据集标注文件,便可得到数据集统计结果,辅助判断数据集是否是小目标数据集以及是否需要采用切图策略,同时给出网络超参数参考值。 + +`传送门`:[PP-YOLOE-SOD 小目标检测模型](configs/smalldet)。 + +
    + +
    + 预训练模型(点击展开) +- VisDrone数据集预训练模型 + +| 模型 | COCOAPI mAPval
    0.5:0.95 | COCOAPI mAPval
    0.5 | COCOAPI mAPtest_dev
    0.5:0.95 | COCOAPI mAPtest_dev
    0.5 | MatlabAPI mAPtest_dev
    0.5:0.95 | MatlabAPI mAPtest_dev
    0.5 | 下载 | 配置文件 | +| :------------------ | :-----------------------------: | :------------------------: | :----------------------------------: | :-----------------------------: | :------------------------------------: | :-------------------------------: | :---------------------------------------------------------------------------------------------: | :----------------------------------------------------------: | +| **PP-YOLOE+_SOD-l** | **31.9** | **52.1** | **25.6** | **43.5** | **30.25** | **51.18** | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams) | [配置文件](visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml) | + +`传送门`:[全部预训练模型](configs/smalldet)。 +
    + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| ---- | ---------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | +| 通用 | 小目标检测 | 基于PP-YOLOE-SOD的无人机航拍图像检测案例全流程实操。 | [基于PP-YOLOE-SOD的无人机航拍图像检测](https://aistudio.baidu.com/aistudio/projectdetail/5036782) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/5036782) | +
    + +### 💫PP-PicoDet 超轻量实时目标检测模型 + +
    + 简介(点击展开) + +全新的轻量级系列模型PP-PicoDet,在移动端具有卓越的性能,成为全新SOTA轻量级模型。 + +`传送门`:[PP-PicoDet说明](configs/picodet/README.md)。 + +`传送门`:[arXiv论文](https://arxiv.org/abs/2111.00902)。 + +
    + +
    + 预训练模型(点击展开) + +| 模型名称 | COCO精度(mAP) | 骁龙865 四线程速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 | +| :-------- | :-------------: | :---------------------: | :------------: | :--------------------------------------------------: | :----------------------------------------------------------------------------------: | +| PicoDet-L | 36.1 | 39.7 | 移动端、嵌入式 | [链接](configs/picodet/picodet_l_320_coco_lcnet.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | + +`传送门`:[全部预训练模型](configs/picodet/README.md)。 +
    + + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| -------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | +| 智慧城市 | 道路垃圾检测 | 通过在市政环卫车辆上安装摄像头对路面垃圾检测并分析,实现对路面遗撒的垃圾进行监控,记录并通知环卫人员清理,大大提升了环卫人效。 | [基于PP-PicoDet的路面垃圾检测](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0) | +
    + +### 📡PP-Tracking 实时多目标跟踪系统 + +
    + 简介(点击展开) + +PaddleDetection团队提供了实时多目标跟踪系统PP-Tracking,是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。 PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。 + +`传送门`:[PP-Tracking说明](configs/mot/README.md)。 + +
    + +
    + 预训练模型(点击展开) + +| 模型名称 | 模型简介 | 精度 | 速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 | +| :-------- | :----------------------------------: | :--------------------: | :-------: | :--------------------: | :--------------------------------------------------------: | :------------------------------------------------------------------------------------------------: | +| ByteTrack | SDE多目标跟踪算法 仅包含检测模型 | MOT-17 test: 78.4 | - | 服务器、移动端、嵌入式 | [链接](configs/mot/bytetrack/bytetrack_yolox.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | +| FairMOT | JDE多目标跟踪算法 多任务联合学习方法 | MOT-16 test: 75.0 | - | 服务器、移动端、嵌入式 | [链接](configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | +| OC-SORT | SDE多目标跟踪算法 仅包含检测模型 | MOT-17 half val: 75.5 | - | 服务器、移动端、嵌入式 | [链接](configs/mot/ocsort/ocsort_yolox.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_mot_ch.pdparams) | +
    + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| ---- | ---------- | -------------------------- | ---------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | +| 通用 | 多目标跟踪 | 快速上手单镜头、多镜头跟踪 | [PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/3022582) | +
    + +### ⛷️PP-TinyPose 人体骨骼关键点识别 + +
    + 简介(点击展开) + +PaddleDetection 中的关键点检测部分紧跟最先进的算法,包括 Top-Down 和 Bottom-Up 两种方法,可以满足用户的不同需求。同时,PaddleDetection 提供针对移动端设备优化的自研实时关键点检测模型 PP-TinyPose。 + +`传送门`:[PP-TinyPose说明](configs/keypoint/tiny_pose)。 + +
    + +
    + 预训练模型(点击展开) + +| 模型名称 | 模型简介 | COCO精度(AP) | 速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 | +| :---------: | :----------------------------------: | :------------: | :-----------------------: | :------------: | :-----------------------------------------------------: | :--------------------------------------------------------------------------------------: | +| PP-TinyPose | 轻量级关键点算法
    输入尺寸256x192 | 68.8 | 骁龙865 四线程: 158.7 FPS | 移动端、嵌入式 | [链接](configs/keypoint/tiny_pose/tinypose_256x192.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | + +`传送门`:[全部预训练模型](configs/keypoint/README.md)。 +
    + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| ---- | ---- | ---------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | +| 运动 | 健身 | 提供从模型选型、数据准备、模型训练优化,到后处理逻辑和模型部署的全流程可复用方案,有效解决了复杂健身动作的高效识别,打造AI虚拟健身教练! | [基于PP-TinyPose增强版的智能健身动作识别](https://aistudio.baidu.com/aistudio/projectdetail/4385813) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4385813) | +
    + +### 🏃🏻PP-Human 实时行人分析工具 + +
    + 简介(点击展开) + +PaddleDetection深入探索核心行业的高频场景,提供了行人开箱即用分析工具,支持图片/单镜头视频/多镜头视频/在线视频流多种输入方式,广泛应用于智慧交通、智慧城市、工业巡检等领域。支持服务器端部署及TensorRT加速,T4服务器上可达到实时。 +PP-Human支持四大产业级功能:五大异常行为识别、26种人体属性分析、实时人流计数、跨镜头(ReID)跟踪。 + +`传送门`:[PP-Human行人分析工具使用指南](deploy/pipeline/README.md)。 + +
    + +
    + 预训练模型(点击展开) + +| 任务 | T4 TensorRT FP16: 速度(FPS) | 推荐部署硬件 | 模型下载 | 模型体积 | +| :----------------: | :---------------------------: | :----------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------: | +| 行人检测(高精度) | 39.8 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 行人跟踪(高精度) | 31.4 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 属性识别(高精度) | 单人 117.6 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | 目标检测:182M
    属性识别:86M | +| 摔倒识别 | 单人 100 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪:182M
    关键点检测:101M
    基于关键点行为识别:21.8M | +| 闯入识别 | 31.4 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 打架识别 | 50.8 | 服务器 | [视频分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| 抽烟识别 | 340.1 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测:182M
    基于人体id的目标检测:27M | +| 打电话识别 | 166.7 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测:182M
    基于人体id的图像分类:45M | + +`传送门`:[完整预训练模型](deploy/pipeline/README.md)。 +
    + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- | +| 智能安防 | 摔倒检测 | 飞桨行人分析PP-Human中提供的摔倒识别算法,采用了关键点+时空图卷积网络的技术,对摔倒姿势无限制、背景环境无要求。 | [基于PP-Human v2的摔倒检测](https://aistudio.baidu.com/aistudio/projectdetail/4606001) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4606001) | +| 智能安防 | 打架识别 | 本项目基于PaddleVideo视频开发套件训练打架识别模型,然后将训练好的模型集成到PaddleDetection的PP-Human中,助力行人行为分析。 | [基于PP-Human的打架识别](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1) | +| 智能安防 | 摔倒检测 | 基于PP-Human完成来客分析整体流程。使用PP-Human完成来客分析中非常常见的场景: 1. 来客属性识别(单镜和跨境可视化);2. 来客行为识别(摔倒识别)。 | [基于PP-Human的来客分析案例教程](https://aistudio.baidu.com/aistudio/projectdetail/4537344) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4537344) | +
    + +### 🏎️PP-Vehicle 实时车辆分析工具 + +
    + 简介(点击展开) + +PaddleDetection深入探索核心行业的高频场景,提供了车辆开箱即用分析工具,支持图片/单镜头视频/多镜头视频/在线视频流多种输入方式,广泛应用于智慧交通、智慧城市、工业巡检等领域。支持服务器端部署及TensorRT加速,T4服务器上可达到实时。 +PP-Vehicle囊括四大交通场景核心功能:车牌识别、属性识别、车流量统计、违章检测。 + +`传送门`:[PP-Vehicle车辆分析工具指南](deploy/pipeline/README.md)。 + +
    + +
    + 预训练模型(点击展开) + +| 任务 | T4 TensorRT FP16: 速度(FPS) | 推荐部署硬件 | 模型方案 | 模型体积 | +| :----------------: | :-------------------------: | :----------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------: | +| 车辆检测(高精度) | 38.9 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| 车辆跟踪(高精度) | 25 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| 车牌识别 | 213.7 | 服务器 | [车牌检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz)
    [车牌识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | 车牌检测:3.9M
    车牌字符识别: 12M | +| 车辆属性 | 136.8 | 服务器 | [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M | + +`传送门`:[完整预训练模型](deploy/pipeline/README.md)。 +
    + +
    + 产业应用代码示例(点击展开) + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| -------- | ---------------- | ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | +| 智慧交通 | 交通监控车辆分析 | 本项目基于PP-Vehicle演示智慧交通中最刚需的车流量监控、车辆违停检测以及车辆结构化(车牌、车型、颜色)分析三大场景。 | [基于PP-Vehicle的交通监控分析系统](https://aistudio.baidu.com/aistudio/projectdetail/4512254) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4512254) | +
    + +## 💡产业实践范例 + +产业实践范例是PaddleDetection针对高频目标检测应用场景,提供的端到端开发示例,帮助开发者打通数据标注-模型训练-模型调优-预测部署全流程。 +针对每个范例我们都通过[AI-Studio](https://ai.baidu.com/ai-doc/AISTUDIO/Tk39ty6ho)提供了项目代码以及说明,用户可以同步运行体验。 + +`传送门`:[产业实践范例完整列表](industrial_tutorial/README.md) + +- [基于PP-YOLOE-R的旋转框检测](https://aistudio.baidu.com/aistudio/projectdetail/5058293) +- [基于PP-YOLOE-SOD的无人机航拍图像检测](https://aistudio.baidu.com/aistudio/projectdetail/5036782) +- [基于PP-Vehicle的交通监控分析系统](https://aistudio.baidu.com/aistudio/projectdetail/4512254) +- [基于PP-Human v2的摔倒检测](https://aistudio.baidu.com/aistudio/projectdetail/4606001) +- [基于PP-TinyPose增强版的智能健身动作识别](https://aistudio.baidu.com/aistudio/projectdetail/4385813) +- [基于PP-Human的打架识别](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1) +- [基于Faster-RCNN的瓷砖表面瑕疵检测](https://aistudio.baidu.com/aistudio/projectdetail/2571419) +- [基于PaddleDetection的PCB瑕疵检测](https://aistudio.baidu.com/aistudio/projectdetail/2367089) +- [基于FairMOT实现人流量统计](https://aistudio.baidu.com/aistudio/projectdetail/2421822) +- [基于YOLOv3实现跌倒检测](https://aistudio.baidu.com/aistudio/projectdetail/2500639) +- [基于PP-PicoDetv2 的路面垃圾检测](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0) +- [基于人体关键点检测的合规检测](https://aistudio.baidu.com/aistudio/projectdetail/4061642?contributionType=1) +- [基于PP-Human的来客分析案例教程](https://aistudio.baidu.com/aistudio/projectdetail/4537344) +- 持续更新中... + +## 🏆企业应用案例 + +企业应用案例是企业在实生产环境下落地应用PaddleDetection的方案思路,相比产业实践范例其更多强调整体方案设计思路,可供开发者在项目方案设计中做参考。 + +`传送门`:[企业应用案例完整列表](https://www.paddlepaddle.org.cn/customercase) + +- [中国南方电网——变电站智慧巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2330) +- [国铁电气——轨道在线智能巡检系统](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2280) +- [京东物流——园区车辆行为识别](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2611) +- [中兴克拉—厂区传统仪表统计监测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2618) +- [宁德时代—动力电池高精度质量检测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2609) +- [中国科学院空天信息创新研究院——高尔夫球场遥感监测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2483) +- [御航智能——基于边缘的无人机智能巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2481) +- [普宙无人机——高精度森林巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2121) +- [领邦智能——红外无感测温监控](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2615) +- [北京地铁——口罩检测](https://mp.weixin.qq.com/s/znrqaJmtA7CcjG0yQESWig) +- [音智达——工厂人员违规行为检测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2288) +- [华夏天信——输煤皮带机器人智能巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2331) +- [优恩物联网——社区住户分类支持广告精准投放](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2485) +- [螳螂慧视——室内3D点云场景物体分割与检测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2599) +- 持续更新中... + +## 📝许可证书 + +本项目的发布受[Apache 2.0 license](LICENSE)许可认证。 + + +## 📌引用 + +``` +@misc{ppdet2019, +title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.}, +author={PaddlePaddle Authors}, +howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}}, +year={2019} +} +``` diff --git a/PaddleDetection-release-2.6/README_en.md b/PaddleDetection-release-2.6/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..c45b100e1e2e7fef7e979a10f16ad52d786351ae --- /dev/null +++ b/PaddleDetection-release-2.6/README_en.md @@ -0,0 +1,541 @@ +[简体中文](README_cn.md) | English + +
**A Highly Efficient Development Toolkit for Object Detection based on [PaddlePaddle](https://github.com/paddlepaddle/paddle)**
## Product Update

- 🔥 **2022.11.15: SOTA rotated object detector and small object detector based on PP-YOLOE**
  - Rotated object detector [PP-YOLOE-R](configs/rotate/ppyoloe_r)
    - SOTA anchor-free rotated object detection models with high accuracy and efficiency
    - A series of models, named s/m/l/x, for cloud and edge devices
    - Avoids special operators, so deployment with TensorRT stays straightforward
  - Small object detector [PP-YOLOE-SOD](configs/smalldet)
    - End-to-end detection pipeline based on sliced images
    - SOTA model on VisDrone based on original images

- 2022.8.26: PaddleDetection releases the [release/2.5 version](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5)

  - 🗳 Model features:

    - Release [PP-YOLOE+](configs/ppyoloe): accuracy improved by up to 2.4% mAP (reaching 54.9% mAP), 3.75x faster training convergence, up to 2.3x faster end-to-end inference, and better generalization on multiple downstream tasks
    - Release the [PicoDet-NPU](configs/picodet) model, which supports fully quantized deployment; add a [PicoDet](configs/picodet) layout analysis model
    - Release [PP-TinyPose Plus](./configs/keypoint/tiny_pose/). With a 9.1% AP accuracy improvement in physical exercise, dance, and similar scenarios, PP-TinyPose Plus supports unconventional movements such as turning to one side, lying down, jumping, and high lifts

  - 🔮 Functions in different scenarios:

    - Release the pedestrian analysis tool [PP-Human v2](./deploy/pipeline). It introduces four new behavior recognition tasks: fighting, telephoning, smoking, and trespassing. The underlying algorithms are optimized across the three core capabilities of pedestrian detection, tracking, and attribute recognition. It provides end-to-end development and model optimization strategies for beginners and supports online video streaming input.
    - First release of [PP-Vehicle](./deploy/pipeline), with four major functions: license plate recognition, vehicle attribute analysis (color, model), traffic flow statistics, and violation detection. It accepts pictures, online video streams, and video files as input, and comes with a comprehensive set of customization tutorials.

  - 💡 Cutting-edge algorithms:

    - Release [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO), which covers classic and latest models of the [YOLO family](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/docs/MODEL_ZOO_en.md): YOLOv3, PP-YOLOE (a real-time high-precision object detection model developed by Baidu PaddlePaddle), and cutting-edge detection algorithms such as YOLOv4, YOLOv5, YOLOX, YOLOv6, YOLOv7, and YOLOv8
    - Add a high-precision detection model based on the [ViT](configs/vitdet) backbone network, reaching 55.7% mAP on COCO; add the multi-object tracking model [OC-SORT](configs/mot/ocsort); add the [ConvNeXt](configs/convnext) backbone network

  - 📋 Industrial applications: add [Smart Fitness](https://aistudio.baidu.com/aistudio/projectdetail/4385813), [Fighting recognition](https://aistudio.baidu.com/aistudio/projectdetail/4086987?channelType=0&channel=0), and [Visitor Analysis](https://aistudio.baidu.com/aistudio/projectdetail/4230123?channelType=0&channel=0)

- 2022.3.24: PaddleDetection releases the [release/2.4 version](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4)
  - Release the high-performance SOTA object detection model [PP-YOLOE](configs/ppyoloe).
    It integrates cloud and edge devices and provides S/M/L/X versions. In particular, the L version reaches 51.4% mAP on the COCO test 2017 dataset with an inference speed of 78.1 FPS on a single Tesla V100. It supports mixed-precision training and trains 33% faster than PP-YOLOv2. Its full range of model sizes fits different hardware compute budgets, from server GPUs and edge-device GPUs to other AI accelerator cards on servers.
  - Release the ultra-lightweight SOTA object detection model [PP-PicoDet Plus](configs/picodet), with a 2% improvement in accuracy and a 63% improvement in CPU inference speed. Add the PicoDet-XS model with only 0.7M parameters, along with model sparsification and quantization functions for acceleration. No dedicated post-processing module is required on any hardware, which simplifies deployment.
  - Release the real-time pedestrian analysis tool [PP-Human](deploy/pphuman), with four major functions: pedestrian tracking, visitor flow statistics, human attribute recognition, and falling detection. Falling detection is optimized on real-life data and accurately recognizes various falling postures, adapting to different backgrounds, lighting conditions, and camera angles.
  - Add the [YOLOX](configs/yolox) object detection model with nano/tiny/S/M/L/X variants; the X version reaches 51.8% mAP on COCO val2017.

- [More releases](https://github.com/PaddlePaddle/PaddleDetection/releases)

## Brief Introduction

**PaddleDetection** is an end-to-end object detection development kit based on PaddlePaddle. Providing **over 30 model algorithms** and **over 300 pre-trained models**, it covers object detection, instance segmentation, keypoint detection, and multi-object tracking. In particular, PaddleDetection offers **high-performance and lightweight** industrial SOTA models for **server and mobile** deployment, along with champion solutions and cutting-edge algorithms. PaddleDetection provides various data augmentation methods, configurable network components, loss functions, and other advanced optimization and deployment schemes. Beyond covering the whole workflow of data processing, model development, training, compression, and deployment, it also provides rich cases and tutorials to accelerate the industrial application of these algorithms.
## Features

- **Rich model library**: PaddleDetection provides over 250 pre-trained models covering **object detection, instance segmentation, face detection, and multi-object tracking**, including a variety of **global competition champion** schemes.
- **Simple to use**: The modular design decouples each network component, making it easy for developers to build and try various detection models and optimization strategies, and to obtain high-performance, customized algorithms quickly.
- **End-to-end coverage**: PaddleDetection gets through the full pipeline from data augmentation, model construction, training, and compression to deployment, and supports multi-architecture, multi-device deployment for **cloud and edge** devices.
- **High performance**: Built on PaddlePaddle's high-performance core, it has clear advantages in training speed and memory usage, and supports FP16 training and multi-machine training.
## Exchanges

- If you have any questions or suggestions, please give us your valuable input via [GitHub Issues](https://github.com/PaddlePaddle/PaddleDetection/issues)

  Welcome to join the PaddleDetection user groups on WeChat (scan the QR code, add the assistant, and reply "D")
## Kit Structure

**Architectures**

- Object Detection: Faster RCNN, FPN, Cascade-RCNN, PSS-Det, RetinaNet, YOLOv3, YOLOF, YOLOX, YOLOv5, YOLOv6, YOLOv7, YOLOv8, RTMDet, PP-YOLO, PP-YOLO-Tiny, PP-PicoDet, PP-YOLOv2, PP-YOLOE, PP-YOLOE+, PP-YOLOE-SOD, PP-YOLOE-R, SSD, CenterNet, FCOS, FCOSR, TTFNet, TOOD, GFL, GFLv2, DETR, Deformable DETR, Swin Transformer, Sparse RCNN
- Instance Segmentation: Mask RCNN, Cascade Mask RCNN, SOLOv2
- Face Detection: BlazeFace
- Multi-Object Tracking: JDE, FairMOT, DeepSORT, ByteTrack, OC-SORT, BoT-SORT, CenterTrack
- KeyPoint Detection: HRNet, HigherHRNet, Lite-HRNet, PP-TinyPose

**Backbones**

- ResNet(&vd), Res2Net(&vd), CSPResNet, SENet, HRNet, Lite-HRNet, DarkNet, CSPDarkNet, MobileNetv1/v3, ShuffleNet, GhostNet, BlazeNet, DLA, HardNet, LCNet, ESNet, Swin-Transformer, ConvNeXt, Vision Transformer

**Components**

- Common: Sync-BN, Group Norm, DCNv2, EMA
- KeyPoint: DarkPose
- FPN: BiFPN, CSP-PAN, Custom-PAN, ES-PAN, HRFPN
- Loss: Smooth-L1, GIoU/DIoU/CIoU, IoUAware, Focal Loss, CT Focal Loss, VariFocal Loss
- Post-processing: SoftNMS, MatrixNMS
- Speed: FP16 training, Multi-machine training

**Data Augmentation**

- Resize, Lighting, Flipping, Expand, Crop, Color Distort, Random Erasing, Mixup, AugmentHSV, Mosaic, Cutmix, Grid Mask, Auto Augment, Random Perspective
## Model Performance

**Performance comparison of cloud models**

Comparison between COCO mAP and FPS on Tesla V100 for representative models of each architecture and backbone.

**Clarification:**

- `ViT` stands for `ViT-Cascade-Faster-RCNN`, which has the highest mAP on COCO at 55.7%
- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which PaddleDetection has optimized to reach 20 FPS inference speed at 47.8% COCO mAP
- `PP-YOLOE` is an optimized `PP-YOLO v2`. It reaches 51.4% mAP on COCO with an inference speed of 78.1 FPS on Tesla V100
- `PP-YOLOE+` is an optimized `PP-YOLOE`. It reaches 53.3% mAP on COCO with an inference speed of 78.1 FPS on Tesla V100
- The models in the figure are available in the [model library](#模型库)

**Performance comparison on mobile devices**

Comparison between COCO mAP and FPS on the Qualcomm Snapdragon 865 processor for models on mobile devices.

**Clarification:**

- Tests were conducted on a Qualcomm Snapdragon 865 (4xA77 + 4xA55) with batch_size=1, 4 threads, and the NCNN inference library; for the test script, see [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark)
- [PP-PicoDet](configs/picodet) and [PP-YOLO-Tiny](configs/ppyolo) are self-developed models of PaddleDetection; other models are not tested yet

## Model libraries
### 1. General detection

#### PP-YOLOE series (recommended for cloud GPUs such as NVIDIA V100/T4 and edge devices such as the Jetson series)

| Model | COCO Accuracy(mAP) | V100 TensorRT FP16 Speed(FPS) | Configuration | Download |
|:-----------|:------------------:|:-----------------------------:|:-------------------------------------------------------:|:----------------------------------------------------------------------------------------:|
| PP-YOLOE+_s | 43.9 | 333.3 | [Link](configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) |
| PP-YOLOE+_m | 50.0 | 208.3 | [Link](configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) |
| PP-YOLOE+_l | 53.3 | 149.2 | [Link](configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) |
| PP-YOLOE+_x | 54.9 | 95.2 | [Link](configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) |

#### PP-PicoDet series (recommended for mobile chips and x86 CPU devices, such as ARM CPUs (RK3399, Raspberry Pi) and NPUs (BITMAIN))

| Model | COCO Accuracy(mAP) | Snapdragon 865 four-thread speed (ms) | Configuration | Download |
|:-----------|:------------------:|:-------------------------------------:|:-----------------------------------------------------:|:---------------------------------------------------------------------------------------:|
| PicoDet-XS | 23.5 | 7.81 | [Link](configs/picodet/picodet_xs_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) |
| PicoDet-S | 29.1 | 9.56 | [Link](configs/picodet/picodet_s_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) |
| PicoDet-M | 34.4 | 17.68 | [Link](configs/picodet/picodet_m_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) |
| PicoDet-L | 36.1 | 25.21 | [Link](configs/picodet/picodet_l_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) |

#### [Frontier detection algorithms](docs/feature_models/PaddleYOLO_MODEL.md)

| Model | COCO Accuracy(mAP) | V100 TensorRT FP16 Speed(FPS) | Configuration | Download |
|:--------|:------------------:|:-----------------------------:|:--------------:|:--------:|
| [YOLOX-l](configs/yolox) | 50.1 | 107.5 | [Link](configs/yolox/yolox_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) |
| [YOLOv5-l](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5) | 48.6 | 136.0 | [Link](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5/yolov5_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) |
| [YOLOv7-l](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7) | 51.0 | 135.0 | [Link](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7/yolov7_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) |

#### Other general purpose models: [doc](docs/MODEL_ZOO_en.md)
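The configuration/download pairs above plug directly into the common command-line tools. As a quick sanity check, here is a minimal sketch that predicts one image with the released PP-YOLOE+_l weights and then exports the model for Paddle Inference deployment; paths follow the repository defaults (for example, the bundled demo image and the default `output_inference/` export directory), so adjust them to your setup:

```
# Visualize a prediction with the released PP-YOLOE+_l weights (fetched from the URL).
python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams \
    --infer_img=demo/000000014439.jpg

# Export an inference model (written under output_inference/ by default), then deploy it.
python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams
python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco \
    --image_file=demo/000000014439.jpg --device=GPU
```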
    + 2. Instance segmentation + +| Model | Introduction | Recommended Scenarios | COCO Accuracy(mAP) | Configuration | Download | +|:----------------- |:-------------------------------------------------------- |:--------------------------------------------- |:--------------------------------:|:-----------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:| +| Mask RCNN | Two-stage instance segmentation algorithm |
    Edge-Cloud end
    | box AP: 41.4
    mask AP: 37.5 | [Link](configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams) | +| Cascade Mask RCNN | Two-stage instance segmentation algorithm |
    Edge-Cloud end
    | box AP: 45.7
    mask AP: 39.7 | [Link](configs/mask_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | +| SOLOv2 | Lightweight single-stage instance segmentation algorithm |
    Edge-Cloud end
    | mask AP: 38.0 | [Link](configs/solov2/solov2_r50_fpn_3x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_3x_coco.pdparams) | + +
    + +
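The same pattern holds for instance segmentation. For example, a sketch of reproducing the Mask RCNN numbers above, assuming COCO has been prepared under `dataset/coco` as described in the data preparation docs:

```
# Evaluate box AP / mask AP of the released Mask RCNN weights on COCO val2017.
python tools/eval.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams
```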
    + 3. Keypoint detection + +| Model | Introduction | Recommended scenarios | COCO Accuracy(AP) | Speed | Configuration | Download | +|:-------------------- |:--------------------------------------------------------------------------------------------- |:--------------------------------------------- |:-----------------:|:---------------------------------:|:---------------------------------------------------------:|:-------------------------------------------------------------------------------------------:| +| HRNet-w32 + DarkPose |
    Top-down Keypoint detection algorithm
    Input size: 384x288
    |
    Edge-Cloud end
    | 78.3 | T4 TensorRT FP16 2.96ms | [Link](configs/keypoint/hrnet/dark_hrnet_w32_384x288.yml) | [Download](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | +| HRNet-w32 + DarkPose | Top-down Keypoint detection algorithm
    Input size: 256x192 | Edge-Cloud end | 78.0 | T4 TensorRT FP16 1.75ms | [Link](configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) | [Download](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | +| PP-TinyPose | Light-weight keypoint algorithm
    Input size: 256x192 | Mobile | 68.8 | Snapdragon 865 four-thread 6.30ms | [Link](configs/keypoint/tiny_pose/tinypose_256x192.yml) | [Download](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | +| PP-TinyPose | Light-weight keypoint algorithm
    Input size: 128x96 | Mobile | 58.1 | Snapdragon 865 four-thread 2.37ms | [Link](configs/keypoint/tiny_pose/tinypose_128x96.yml) | [Download](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | + +#### Other keypoint detection models [doc](configs/keypoint) + +
    + +
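Top-down keypoint models such as HRNet and PP-TinyPose expect person boxes first, so standalone deployment pairs a detector with the keypoint model. A sketch using the joint deploy script in `deploy/python`; it assumes both models were already exported with `tools/export_model.py`, and the model directory names below are placeholders:

```
# Run a person detector and a keypoint model jointly on one image.
python deploy/python/det_keypoint_unite_infer.py \
    --det_model_dir=output_inference/picodet_s_320_coco_lcnet \
    --keypoint_model_dir=output_inference/tinypose_256x192 \
    --image_file=demo/000000014439.jpg --device=GPU
```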
    + 4. Multi-object tracking PP-Tracking + +| Model | Introduction | Recommended scenarios | Accuracy | Configuration | Download | +|:--------- |:------------------------------------------------------------- |:--------------------- |:----------------------:|:-----------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:| +| ByteTrack | SDE Multi-object tracking algorithm with detection model only | Edge-Cloud end | MOT-17 half val: 77.3 | [Link](configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) | +| FairMOT | JDE multi-object tracking algorithm multi-task learning | Edge-Cloud end | MOT-16 test: 75.0 | [Link](configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | +| OC-SORT | SDE multi-object tracking algorithm with detection model only | Edge-Cloud end | MOT-16 half val: 75.5 | [Link](configs/mot/ocsort/ocsort_yolox.yml) | - | + +#### Other multi-object tracking models [docs](configs/mot) + +
    + +
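The tracking configs above work with the MOT-specific tools. For instance, a sketch of evaluating the released FairMOT weights, assuming the MOT-16 dataset layout described in the MOT docs:

```
# Evaluate FairMOT on MOT-16; use tools/infer_mot.py for video inference instead.
python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
```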
    + 5. Industrial real-time pedestrain analysis tool-PP Human + +| Task | End-to-End Speed(ms) | Model | Size | +|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:| +| Pedestrian detection (high precision) | 25.1ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian detection (lightweight) | 16.2ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Pedestrian tracking (high precision) | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian tracking (lightweight) | 21.0ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Attribute recognition (high precision) | Single person8.5ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection:182M
    Attribute recognition:86M | +| Attribute recognition (lightweight) | Single person 7.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection:182M
    Attribute recognition:86M | +| Falling detection | Single person 10ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [Behavior detection based on key points](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking:182M
    Keypoint detection:101M
    Behavior detection based on key points: 21.8M | +| Intrusion detection | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Fighting detection | 19.7ms | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| Smoking detection | Single person 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Object detection based on Human Id](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | Object detection:182M
    Object detection based on Human ID: 27M | +| Phoning detection | Single person ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Image classification based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | Object detection:182M
    Image classification based on Human ID:45M | + +Please refer to [docs](deploy/pipeline/README_en.md) for details. + +
    + +
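All PP-Human capabilities are driven by a single pipeline entry point plus a config file that toggles each function. A sketch of running pedestrian attribute recognition on a local video; the video path is a placeholder, and the flag names follow `deploy/pipeline` but may differ slightly across versions:

```
# Enable attribute recognition and run the PP-Human pipeline on a video.
python deploy/pipeline/pipeline.py \
    --config deploy/pipeline/config/infer_cfg_pphuman.yml \
    -o ATTR.enable=True \
    --video_file=test_video.mp4 --device=gpu
```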
    + 6. Industrial real-time vehicle analysis tool-PP Vehicle + +| Task | End-to-End Speed(ms) | Model | Size | +|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:| +| Vehicle detection (high precision) | 25.7ms | [object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| Vehicle detection (lightweight) | 13.2ms | [object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M | +| Vehicle tracking (high precision) | 40ms | [multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| Vehicle tracking (lightweight) | 25ms | [multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Plate Recognition | 4.68ms | [plate detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz)
    [plate recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | Plate detection:3.9M
    Plate recognition:12M | +| Vehicle attribute | 7.31ms | [attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M | + +Please refer to [docs](deploy/pipeline/README_en.md) for details. + +
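PP-Vehicle shares the same pipeline entry point, with its own config toggling plate recognition, attributes, and the other functions. A sketch under the same assumptions as the PP-Human example above (placeholder video path, version-dependent flag names):

```
# Enable plate recognition and vehicle attribute analysis on a traffic video.
python deploy/pipeline/pipeline.py \
    --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
    -o VEHICLE_PLATE.enable=True VEHICLE_ATTR.enable=True \
    --video_file=traffic.mp4 --device=gpu
```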
## Document tutorials

### Introductory tutorials

- [Installation](docs/tutorials/INSTALL_cn.md)
- [Quick start](docs/tutorials/QUICK_STARTED_cn.md)
- [Data preparation](docs/tutorials/data/README.md)
- [Getting started with PaddleDetection](docs/tutorials/GETTING_STARTED_cn.md)
- [FAQ](docs/tutorials/FAQ)

### Advanced tutorials

- Configuration
  - [RCNN configuration](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md)
  - [PP-YOLO configuration](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md)

- Compression based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
  - [Pruning/Quantization/Distillation tutorial](configs/slim)

- [Inference deployment](deploy/README.md)
  - [Export model for inference](deploy/EXPORT_MODEL.md)
  - [Paddle Inference deployment](deploy/README.md)
    - [Inference deployment with Python](deploy/python)
    - [Inference deployment with C++](deploy/cpp)
  - [Paddle-Lite deployment](deploy/lite)
  - [Paddle Serving deployment](deploy/serving)
  - [ONNX model export](deploy/EXPORT_ONNX_MODEL.md)
  - [Inference benchmark](deploy/BENCHMARK_INFER.md)

- Advanced development
  - [Data processing module](docs/advanced_tutorials/READER.md)
  - [New object detection models](docs/advanced_tutorials/MODEL_TECHNICAL.md)
  - Customization
    - [Object detection](docs/advanced_tutorials/customization/detection.md)
    - [Keypoint detection](docs/advanced_tutorials/customization/keypoint_detection.md)
    - [Multiple object tracking](docs/advanced_tutorials/customization/pphuman_mot.md)
    - [Action recognition](docs/advanced_tutorials/customization/action_recognotion/)
    - [Attribute recognition](docs/advanced_tutorials/customization/pphuman_attribute.md)

### Courses

- **[Theoretical foundation] [Object detection 7-day camp](https://aistudio.baidu.com/aistudio/education/group/info/1617):** an overview of object detection tasks, details of the RCNN and YOLO series of detection algorithms, PP-YOLO optimization strategies and case sharing, and an introduction to and practice with anchor-free algorithms

- **[Industrial application] [AI Fast Track: industrial object detection technology and application](https://aistudio.baidu.com/aistudio/education/group/info/23670):** advanced object detection algorithms, the real-time pedestrian analysis system PP-Human, and a breakdown and practice of industrial object detection applications

- **[Industrial features] 2022.3.26 [Smart City Industry Seven-Day Class](https://aistudio.baidu.com/aistudio/education/group/info/25620):** urban planning, urban governance, smart government services, traffic management, and community governance

- **[Academic exchange] 2022.9.27 [YOLO Vision Event](https://www.youtube.com/playlist?list=PL1FZnkj4ad1NHVC7CMc3pjSQ-JRK-Ev6O):** as the first YOLO-themed event, PaddleDetection was invited to exchange with computer vision experts from around the world
+ +### [Industrial tutorial examples](./industrial_tutorial/README.md) + +- [Rotated object detection based on PP-YOLOE-R](https://aistudio.baidu.com/aistudio/projectdetail/5058293) + +- [Aerial image detection based on PP-YOLOE-SOD](https://aistudio.baidu.com/aistudio/projectdetail/5036782) + +- [Fall down recognition based on PP-Human v2](https://aistudio.baidu.com/aistudio/projectdetail/4606001) + +- [Intelligent fitness recognition based on PP-TinyPose Plus](https://aistudio.baidu.com/aistudio/projectdetail/4385813) + +- [Road litter detection based on PP-PicoDet Plus](https://aistudio.baidu.com/aistudio/projectdetail/3561097) + +- [Visitor flow statistics based on FairMOT](https://aistudio.baidu.com/aistudio/projectdetail/2421822) + +- [Guest analysis based on PP-Human](https://aistudio.baidu.com/aistudio/projectdetail/4537344) + +- [More examples](./industrial_tutorial/README.md) + +## Applications + +- [Fitness app on android mobile](https://github.com/zhiboniu/pose_demo_android) +- [PP-Tracking GUI Visualization Interface](https://github.com/yangyudong2020/PP-Tracking_GUi) + +## Recommended third-party tutorials + +- [Deployment of PaddleDetection for Windows I ](https://zhuanlan.zhihu.com/p/268657833) +- [Deployment of PaddleDetection for Windows II](https://zhuanlan.zhihu.com/p/280206376) +- [Deployment of PaddleDetection on Jestson Nano](https://zhuanlan.zhihu.com/p/319371293) +- [How to deploy YOLOv3 model on Raspberry Pi for Helmet detection](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/yolov3_for_raspi.md) +- [Use SSD-MobileNetv1 for a project -- From dataset to deployment on Raspberry Pi](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/ssd_mobilenet_v1_for_raspi.md) + +## Version updates + +Please refer to the[ Release note ](https://github.com/PaddlePaddle/Paddle/wiki/PaddlePaddle-2.3.0-Release-Note-EN)for more details about the updates + +## License + +PaddlePaddle is provided under the [Apache 2.0 license](LICENSE) + +## Contribute your code + +We appreciate your contributions and your feedback! 
- Thank [Mandroide](https://github.com/Mandroide) for code cleanup
- Thank [FL77N](https://github.com/FL77N/) for the `Sparse-RCNN` model
- Thank [Chen-Song](https://github.com/Chen-Song) for the `Swin Faster-RCNN` model
- Thank [yangyudong](https://github.com/yangyudong2020) and [hchhtc123](https://github.com/hchhtc123) for developing the PP-Tracking GUI interface
- Thank Shigure19 for developing the PP-TinyPose fitness APP
- Thank [manangoel99](https://github.com/manangoel99) for the Wandb visualization method

## Quote

```
@misc{ppdet2019,
title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
year={2019}
}
```
diff --git "a/PaddleDetection-release-2.6/activity/\347\233\264\346\222\255\347\255\224\347\226\221\347\254\254\344\270\200\346\234\237.md" "b/PaddleDetection-release-2.6/activity/\347\233\264\346\222\255\347\255\224\347\226\221\347\254\254\344\270\200\346\234\237.md"
new file mode 100644
index 0000000000000000000000000000000000000000..393bf18f7e64bb7360c4bba43d7b9b48662dd8e0
--- /dev/null
+++ "b/PaddleDetection-release-2.6/activity/\347\233\264\346\222\255\347\255\224\347\226\221\347\254\254\344\270\200\346\234\237.md"
@@ -0,0 +1,125 @@
# 直播答疑第一期

### 答疑全程回放可以通过链接下载观看:https://pan.baidu.com/s/168ouju4MxN5XJEb-GU1iAw 提取码: 92mw

## PaddleDetection框架/API问题

#### Q1. warmup能详细讲解下吗?
A1. warmup是在训练初期学习率从0调整至预设学习率的过程,设置可以参考[源码](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/ppdet/optimizer.py#L156),可以设置step数或epoch数

#### Q2. 如果类别不匹配,也能用pretrain weights吗?
A2. 可以,类别不匹配时,模型会自动不加载shape不匹配的权重,通常和类别数相关的权重位于head层

#### Q3. 请问nms_eta怎么用?源码上没有写得很清楚,API文档也没有细说
A3. 针对密集的场景,nms_eta会在每轮动态地调整nms阈值,避免过滤掉两个重叠程度很高但属于不同物体的检测框,具体可以参考[源码](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/detection/multiclass_nms_op.cc#L139),默认为1,通常无需设置

#### Q4. 请问anchor_cluster.py中的--size是模型的input size,还是实际使用图片的size?
A4. 是实际推理时的图片尺寸,一般可以参考TestReader中image_shape的设置。

#### Q5. 请问为什么预测的坐标会出现负值?
A5. 模型算法中是有可能出现负值的情况。首先需要判断模型预测效果是否符合预期:如果正常,可以考虑在后处理中增加clip操作,限制输出box在图像内;如果不正常,说明模型训练效果欠佳,需要进一步排查问题或调优

#### Q6. PaddleDetection人脸检测blazeface模型,一键式预测时load_params没有参数文件,从哪里下载?
A6. blazeface的模型可以在[模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/face_detection#%E6%A8%A1%E5%9E%8B%E5%BA%93)中下载到,如果想部署需要参考[步骤](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/EXPORT_MODEL.md)导出模型

## PP-YOLOE问题
#### Q1. 训练PP-YOLOE的时候,loss越训练越高,是数据集的问题吗?
A1. 可以从以下几个方面排查

1. 数据:首先确认数据集没问题,包括标注、类别等
2. 超参数:base_lr根据batch_size调整,遵守线性原则;warmup_iters根据总的epoch数进行调整
3. 预训练参数:可以加载官方提供的在coco数据集上的预训练参数
4. 网络结构方面:分析box的分布情况,适当调整dfl的参数

#### Q2. 检测模型选型问题:PicoDet、PP-YOLO系列如何选型
A2. PicoDet是针对移动端设备设计的模型,面向arm、x86等低算力设备;PP-YOLO是针对服务器端设计的模型,面向英伟达N卡、百度昆仑卡等。手机端、无gpu桌面端,优先PicoDet;有高算力设备(如N卡),优先PP-YOLO系列;对延时不敏感、更注重高精度的场景,优先PP-YOLO系列

#### Q3. ConvBNLayer中BN层的参数都不会使用L2Decay;PP-YOLOE-s的其它部分都会按照配置文件的设置使用0.0005的L2Decay。是这样吗
A3. PP-YOLOE的backbone和neck部分使用了ConvBNLayer,其中BN层不会使用L2Decay,其他部分使用全局设置的0.0005的L2Decay

#### Q4. PP-YOLOE的Conv的bias也不使用decay吗?
A4. PP-YOLOE的backbone和neck部分的Conv没有bias参数,head部分的Conv bias使用全局decay

#### Q5. 在测速时,为什么要用PaddleInference而不是直接加载模型测时间呢
A5. PaddleInference会对paddle导出的预测模型的前向算子做融合,从而实现速度优化,并且实际部署过程也是使用PaddleInference实现

#### Q6. PP-YOLOE系列在部署的时候,前后处理是不是一样的?
A6.
PP-YOLO系列模型在部署时的前处理都是 decode-resize-normalize-permute的流程,后处理方面PP-YOLOv2使用了Matrix NMS,PP-YOLOE使用的是普通的NMS算法 + +#### Q7. 针对小目标和类别不平衡的数据集,PP-YOLOE有什么调整策略吗 +A7 针对小目标数据集,可以适当增大ppyoloe的输入尺寸,然后在模型中增加注意力机制,目前基于PP-YOLOE的小目标检测正在开发中;针对类别不平衡问题,可以从数据采样的角度处理,目前PP-YOLOE还没有专门针对类别不平衡问题的优化 + +## PP-Human问题 +#### Q1. 请问pphuman用导出的模型18个点(不是官方17个点)去预测时,报错是为什么 +A1. 这个问题是关键点模型输出点的数量与行为识别模型不一致导致的。如果希望用18点模型预测,除了关键点用18点模型以外,还需要自建18点的动作识别模型。 + +#### Q2. 为什么官方导出模型设置的window_size是50 +A2. 导出模型的设置与训练和预测的输入数据长度是一致的;我们主要采用的数据集是ntu、企业提供的实际数据等等。在训练这个模型的时候,我们对这些数据中摔倒的片段做了统计分析,基本上每个动作片段持续的帧数大约是40~80左右。综合考虑到实际使用的延迟以及预测效果,我们选择了50这个量级,在我们的这部分数据上既能完整描述一个完整动作,又不会使得延迟过大。 + +总的来说,这个window_size的数值最好还是根据实际动作以及设备的情况进行选择。例如在某种设备上,50帧的长度根本不足以包含一个完整的动作,那么这个数值就需要扩大;又或者某些动作持续时间很短,50帧的长度包含了太多不相关的其他动作,容易造成误识别,那么这个数值可以适当缩小。 + + +#### Q3. PP-Human中如何替换检测、跟踪、关键点模型 +A3. 我们使用的模型都是PaddleDetection中模型进行导出得到的。理论上PP-Human所使用的模型都是可以直接替换的,但是需要注意是流程和前后处理一样的模型。 + +#### Q4. PP-Human中的数据标注问题(检测、跟踪、关键点、行为、属性)标注工具推荐和标注步骤 +A4. 标注工具:检测 labelme, labelImg, cvat; 跟踪darklabel,cvat;关键点 labelme,cvat。检测标注可以使用tools/x2coco.py转换成coco格式 + +#### Q5. PP-Human中如何更改label(属性和动作识别) +A5. 在PPHuman中,动作识别被定义为基于骨骼点序列的分类问题,目前我们已经开源的摔倒动作识别是一个二分类问题;属性方面我们当前还暂时没有开放训练,正在建设中 + +#### Q6. PP-Human的哪些功能支持单人、哪些支持多人 +A6. PP-Human的功能实现基于一套流程:检测->跟踪->具体功能。当前我们的具体功能模型每次处理的是单人的,即属性、动作等都是属于图像中每一个具体人的。但是基于这套流程下来,图像中的每一个人都得到了处理的。所以单人、多人实际都是一样支持的。 + +#### Q7. PP-Human对视频流预测的支持及服务化部署 +A7. 目前正在建设当中,下个版本会支持这部分功能 + +#### Q8. 在使用pphuman训练自己的数据集时,训练完进行测试时,可视化的标签如何更改,没有更改的情况下还是falling + +A8. 可视化的函数位于https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/visualize.py#L368,这里在可视化的时候将 action_text替换为期望的类别即可。 + +#### Q9. 关键点检测可以实现一个连贯动作的检测吗,比如健身规范 +A9. 基于关键点是可以实现的。这里可以有不同思路去做: + +1. 如果是期望判定动作规范的程度,且这个动作可以很好的描述。那么可以在关键点模型获得的坐标的基础上,人工增加逻辑判断即可。这里我们提供一个安卓的健身APP示例:https://github.com/zhiboniu/pose_demo_android ,其中实现判定各项动作的逻辑可以参考https://github.com/zhiboniu/pose_demo_android/blob/release/1.0/app/src/main/cpp/pose_action.cc 。 + +2. 当一个动作较难用逻辑去描述的时候,可能参考现有摔倒检测的案例,训练一个识别健身动作的模型,但对收集数据的要求会比较高。 + + +#### Q10. 有遮挡的生产环境中梯子,可以用关键点检测判断人员上下梯动作是否合规 +A10. 这个问题需要视遮挡的程度而定,如果遮挡过于严重时关键点检测模型的效果会大打折扣,从而导致行为的判断失准。此外,由于基于关键点的方案抹去了外观信息,如果只是从人物本身的动作上去做判断,那么在遮挡不严重的场景下是可以的。反之,如果梯子这个物体是判断动作是否合规的必要元素,那么这个方案其实不一定是最佳选择。 + +#### Q11. 关键点做的行为识别并不是时序上的动作识别吗 +A11. 是时序的动作识别。这里是将一定时间范围内的每一帧关键点坐标组成一个时序的关键点序列,再通过行为识别模型去预测这个序列所属的行为类别。 + + +## 检测算法问题 +#### Q1. 大图片小目标 最终推理的图片也是大图片 怎么预处理呀 +A1. 小目标问题常见的处理方式是切图以及增大网络输入尺寸,如果使用基于anchor的检测算法,可以通过对目标物体大小聚类生成anchor,参考[脚本](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/tools/anchor_cluster.py); 目前基于PP-YOLOE的小目标检测正在开发中 + +#### Q2. 想问下大的目标对象怎么检测,比如发票 +A2. 如果使用基于anchor的检测算法,可以通过对目标物体大小聚类生成anchor,参考[脚本](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/tools/anchor_cluster.py);另外可以增强深层特征提升大物体检测效果 + +#### Q3. 在做预测时发现预测框特别多,有的框的置信度甚至低于0.1,请问如果将这种框过滤掉?也就是训练模型时就把这些极低置信度的预测结果过滤掉,避免在推理部署时,做不必要的计算,从而影响推理速度。 +A3. 后处理部分有两个过滤,1)是提取置信度最高的Top 100个框做nms。2)是根据设定阈值threshold进行过滤。如果你可以确认图片上目标相对比较少<10个,可以调整Top 100这个值到50或者更低,这样可以加速nms部分的计算。其次调整threshold这个影响最终检测的准确度和召回率的效果。 + +#### Q4. 正负样本的比例一般怎么设计 +A4. 在PaddleDetection中,支持负样本训练,TrainDataset下设置allow_empty: true即可,通过数据集测试,负样本比例在0.3时对模型提升效果最明显。 + +## 压缩部署问题 +#### Q1. PaddleDetection训练的模型导出inference model后,在做推理部署的时候,前后处理相关代码如何编写,有什么参考教程吗? +A1. 
目前PaddleDetection下的网络模型大部分都能够支持c++ inference,不同的处理方式针对不同功能,例如:PP-YOLOE速度测试不包含后处理,PicoDet为支持不同的第三方推理引擎会设置是否导出nms + +object_detector.cc是针对所有检测模型的流程,其中前处理大部分都是decode-resize-normalize-permute 部分网络会加入padding的操作;大部分模型的后处理操作都放在模型里面了,picodet有单独提供nms的后处理代码 + +检测模型的输入统一为image,im_shape,scale_factor ,如果模型中没有使用im_shape,输出个数会减少,但是整套预处理流程不需要额外开发 + +#### Q2. 针对TensorRT的加速问题,fp16在v100确实可以,但是耗时好像有点偏差,我在1080ti上,单张图片跑1000次,耗时50s,还是float32的,可是在v100上,float16耗时97 +A2. 目前PPYOLOE等模型的速度都有在V100上使用TensorRT FP16测试,关于速度测试有以下几个方面可以排查: + +1. 速度测试时是否正确设置warmup,以避免过长的启动时间影响速度测试准确度 +2. 在开启TensorRT时,生成engine文件的过程耗时较长,可以在https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L745 中将use_static设置为True + + +#### Q3. PaddleDetection已经支持了在线量化一些模型,比如想训练其他的一个新模型,是不是可以轻松用起来qat?如果不能,为什么只能支持很有限的模型,而qat其他模型总会出各种各样的问题,原因是什么? +A3. 目前PaddleDetection模型很多,只能针对部分模型开源了QAT的config,其他模型也是支持QAT的,只是配置文件没有覆盖到,如果量化报错,通常是配置问题。检测模型一般建议跳过head最后一个conv。如果想要跳过某些层量化,可以设置skip_quant,参考[代码](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/ppdet/modeling/heads/yolo_head.py#L97) diff --git a/PaddleDetection-release-2.6/benchmark/README.md b/PaddleDetection-release-2.6/benchmark/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1c8bb2bf084a77226b069167ffc19ec9723e437d --- /dev/null +++ b/PaddleDetection-release-2.6/benchmark/README.md @@ -0,0 +1,47 @@ +# 通用检测benchmark测试脚本说明 + +``` +├── benchmark +│ ├── analysis_log.py +│ ├── prepare.sh +│ ├── README.md +│ ├── run_all.sh +│ ├── run_benchmark.sh +``` + +## 脚本说明 + +### prepare.sh +相关数据准备脚本,完成数据、模型的自动下载 +### run_all.sh +主要运行脚本,可完成所有相关模型的测试方案 +### run_benchmark.sh +单模型运行脚本,可完成指定模型的测试方案 + +## Docker 运行环境 +* docker image: registry.baidubce.com/paddlepaddle/paddle:2.1.2-gpu-cuda10.2-cudnn7 +* paddle = 2.1.2 +* python = 3.7 + +## 运行benchmark测试 + +### 运行所有模型 +``` +git clone https://github.com/PaddlePaddle/PaddleDetection.git +cd PaddleDetection +bash benchmark/run_all.sh +``` + +### 运行指定模型 +* Usage:bash run_benchmark.sh ${run_mode} ${batch_size} ${fp_item} ${max_epoch} ${model_name} +* model_name: faster_rcnn, fcos, deformable_detr, gfl, hrnet, higherhrnet, solov2, jde, fairmot +``` +git clone https://github.com/PaddlePaddle/PaddleDetection.git +cd PaddleDetection +bash benchmark/prepare.sh + +# 单卡 +CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark.sh sp 2 fp32 1 faster_rcnn +# 多卡 +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark.sh mp 2 fp32 1 faster_rcnn +``` diff --git a/PaddleDetection-release-2.6/benchmark/configs/faster_rcnn_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/benchmark/configs/faster_rcnn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..02f138559c8e866ce1d1c9e7dde40720df9cf400 --- /dev/null +++ b/PaddleDetection-release-2.6/benchmark/configs/faster_rcnn_r50_fpn_1x_coco.yml @@ -0,0 +1,48 @@ +_BASE_: [ + '../../configs/datasets/coco_detection.yml', + '../../configs/runtime.yml', + '../../configs/faster_rcnn/_base_/optimizer_1x.yml', + '../../configs/faster_rcnn/_base_/faster_rcnn_r50_fpn.yml', +] +weights: output/faster_rcnn_r50_fpn_1x_coco/model_final + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + 
sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/benchmark/prepare.sh b/PaddleDetection-release-2.6/benchmark/prepare.sh new file mode 100644 index 0000000000000000000000000000000000000000..0133f2b6847e6e2baa445d499bf7cf9a2d77743b --- /dev/null +++ b/PaddleDetection-release-2.6/benchmark/prepare.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash + +pip install -U pip Cython +pip install -r requirements.txt + +mv ./dataset/coco/download_coco.py . && rm -rf ./dataset/coco/* && mv ./download_coco.py ./dataset/coco/ +# prepare lite train data +wget -nc -P ./dataset/coco/ https://paddledet.bj.bcebos.com/data/coco_benchmark.tar +cd ./dataset/coco/ && tar -xvf coco_benchmark.tar && mv -u coco_benchmark/* . +rm -rf coco_benchmark/ + +cd ../../ +rm -rf ./dataset/mot/* +# prepare mot mini train data +wget -nc -P ./dataset/mot/ https://paddledet.bj.bcebos.com/data/mot_benchmark.tar +cd ./dataset/mot/ && tar -xvf mot_benchmark.tar && mv -u mot_benchmark/* . +rm -rf mot_benchmark/ diff --git a/PaddleDetection-release-2.6/benchmark/run_all.sh b/PaddleDetection-release-2.6/benchmark/run_all.sh new file mode 100644 index 0000000000000000000000000000000000000000..cffeb09421cd14f45fc05e51b8922daab815ab67 --- /dev/null +++ b/PaddleDetection-release-2.6/benchmark/run_all.sh @@ -0,0 +1,47 @@ +# Use docker: paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7 paddle=2.1.2 python3.7 +# +# Usage: +# git clone https://github.com/PaddlePaddle/PaddleDetection.git +# cd PaddleDetection +# bash benchmark/run_all.sh +log_path=${LOG_PATH_INDEX_DIR:-$(pwd)} # benchmark系统指定该参数,不需要跑profile时,log_path指向存speed的目录 + +# run prepare.sh +bash benchmark/prepare.sh + +model_name_list=(faster_rcnn fcos deformable_detr gfl hrnet higherhrnet solov2 jde fairmot) +fp_item_list=(fp32) +max_epoch=2 + +for model_item in ${model_name_list[@]}; do + for fp_item in ${fp_item_list[@]}; do + case ${model_item} in + faster_rcnn) bs_list=(1 8) ;; + fcos) bs_list=(2) ;; + deformable_detr) bs_list=(2) ;; + gfl) bs_list=(2) ;; + hrnet) bs_list=(64) ;; + higherhrnet) bs_list=(20) ;; + solov2) bs_list=(2) ;; + jde) bs_list=(4) ;; + fairmot) bs_list=(6) ;; + *) echo "wrong model_name"; exit 1; + esac + for bs_item in ${bs_list[@]} + do + run_mode=sp + log_name=detection_${model_item}_bs${bs_item}_${fp_item} # 如:clas_MobileNetv1_mp_bs32_fp32_8 + echo "index is speed, 1gpus, begin, ${log_name}" + CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark.sh ${run_mode} ${bs_item} \ + ${fp_item} ${max_epoch} ${model_item} | tee ${log_path}/${log_name}_speed_1gpus 2>&1 + sleep 60 + + run_mode=mp + log_name=detection_${model_item}_bs${bs_item}_${fp_item} # 如:clas_MobileNetv1_mp_bs32_fp32_8 + echo "index is speed, 8gpus, run_mode is multi_process, begin, ${log_name}" + CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark.sh ${run_mode} \ + ${bs_item} ${fp_item} ${max_epoch} ${model_item}| tee ${log_path}/${log_name}_speed_8gpus8p 2>&1 + 
sleep 60 + done + done +done diff --git a/PaddleDetection-release-2.6/benchmark/run_benchmark.sh b/PaddleDetection-release-2.6/benchmark/run_benchmark.sh new file mode 100644 index 0000000000000000000000000000000000000000..908bfe59fe0e88d73783bb1328017b561be24bec --- /dev/null +++ b/PaddleDetection-release-2.6/benchmark/run_benchmark.sh @@ -0,0 +1,92 @@ +#!/usr/bin/env bash +set -xe +# Usage:CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark.sh ${run_mode} ${batch_size} ${fp_item} ${max_epoch} ${model_name} +python="python3.7" +# Parameter description +function _set_params(){ + run_mode=${1:-"sp"} # sp|mp + batch_size=${2:-"2"} + fp_item=${3:-"fp32"} # fp32|fp16 + max_epoch=${4:-"1"} + model_item=${5:-"model_item"} + run_log_path=${TRAIN_LOG_DIR:-$(pwd)} +# 添加日志解析需要的参数 + base_batch_size=${batch_size} + mission_name="目标检测" + direction_id="0" + ips_unit="images/s" + skip_steps=10 # 解析日志,有些模型前几个step耗时长,需要跳过 (必填) + keyword="ips:" # 解析日志,筛选出数据所在行的关键字 (必填) + index="1" + model_name=${model_item}_bs${batch_size}_${fp_item} + + device=${CUDA_VISIBLE_DEVICES//,/ } + arr=(${device}) + num_gpu_devices=${#arr[*]} + log_file=${run_log_path}/${model_item}_${run_mode}_bs${batch_size}_${fp_item}_${num_gpu_devices} +} +function _train(){ + echo "Train on ${num_gpu_devices} GPUs" + echo "current CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES, gpus=$num_gpu_devices, batch_size=$batch_size" + + # set runtime params + set_optimizer_lr_sp=" " + set_optimizer_lr_mp=" " + # parse model_item + case ${model_item} in + faster_rcnn) model_yml="benchmark/configs/faster_rcnn_r50_fpn_1x_coco.yml" + set_optimizer_lr_sp="LearningRate.base_lr=0.001" ;; + fcos) model_yml="configs/fcos/fcos_r50_fpn_1x_coco.yml" + set_optimizer_lr_sp="LearningRate.base_lr=0.001" ;; + deformable_detr) model_yml="configs/deformable_detr/deformable_detr_r50_1x_coco.yml" ;; + gfl) model_yml="configs/gfl/gfl_r50_fpn_1x_coco.yml" + set_optimizer_lr_sp="LearningRate.base_lr=0.001" ;; + hrnet) model_yml="configs/keypoint/hrnet/hrnet_w32_256x192.yml" ;; + higherhrnet) model_yml="configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml" ;; + solov2) model_yml="configs/solov2/solov2_r50_fpn_1x_coco.yml" ;; + jde) model_yml="configs/mot/jde/jde_darknet53_30e_1088x608.yml" ;; + fairmot) model_yml="configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml" ;; + *) echo "Undefined model_item"; exit 1; + esac + + set_batch_size="TrainReader.batch_size=${batch_size}" + set_max_epoch="epoch=${max_epoch}" + set_log_iter="log_iter=1" + if [ ${fp_item} = "fp16" ]; then + set_fp_item="--fp16" + else + set_fp_item=" " + fi + + case ${run_mode} in + sp) train_cmd="${python} -u tools/train.py -c ${model_yml} ${set_fp_item} \ + -o ${set_batch_size} ${set_max_epoch} ${set_log_iter} ${set_optimizer_lr_sp}" ;; + mp) rm -rf mylog + train_cmd="${python} -m paddle.distributed.launch --log_dir=./mylog \ + --gpus=${CUDA_VISIBLE_DEVICES} tools/train.py -c ${model_yml} ${set_fp_item} \ + -o ${set_batch_size} ${set_max_epoch} ${set_log_iter} ${set_optimizer_lr_mp}" + log_parse_file="mylog/workerlog.0" ;; + *) echo "choose run_mode(sp or mp)"; exit 1; + esac + + timeout 15m ${train_cmd} > ${log_file} 2>&1 + if [ $? 
-ne 0 ];then + echo -e "${train_cmd}, FAIL" + export job_fail_flag=1 + else + echo -e "${train_cmd}, SUCCESS" + export job_fail_flag=0 + fi + kill -9 `ps -ef|grep 'python'|awk '{print $2}'` + + if [ $run_mode = "mp" -a -d mylog ]; then + rm ${log_file} + cp mylog/workerlog.0 ${log_file} + fi +} + +source ${BENCHMARK_ROOT}/scripts/run_model.sh # 在该脚本中会对符合benchmark规范的log使用analysis.py 脚本进行性能数据解析;该脚本在联调时可从benchmark repo中下载https://github.com/PaddlePaddle/benchmark/blob/master/scripts/run_model.sh;如果不联调只想要产出训练log可以注掉本行,提交时需打开 +_set_params $@ +# _train # 如果只想产出训练log,不解析,可取消注释 +_run # 该函数在run_model.sh中,执行时会调用_train; 如果不联调只想要产出训练log可以注掉本行,提交时需打开 + diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/README.md b/PaddleDetection-release-2.6/configs/cascade_rcnn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8084fac2e999cea4db11bca2f9bf12b56b0a44d9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/README.md @@ -0,0 +1,28 @@ +# Cascade R-CNN: High Quality Object Detection and Instance Segmentation + +## Model Zoo + +| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | Mask AP | 下载 | 配置文件 | +| :------------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: | +| ResNet50-FPN | Cascade Faster | 1 | 1x | ---- | 41.1 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml) | +| ResNet50-FPN | Cascade Mask | 1 | 1x | ---- | 41.8 | 36.3 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | + + +## Citations +``` +@article{Cai_2019, + title={Cascade R-CNN: High Quality Object Detection and Instance Segmentation}, + ISSN={1939-3539}, + url={http://dx.doi.org/10.1109/tpami.2019.2956516}, + DOI={10.1109/tpami.2019.2956516}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + publisher={Institute of Electrical and Electronics Engineers (IEEE)}, + author={Cai, Zhaowei and Vasconcelos, 
Nuno}, + year={2019}, + pages={1–1} +} +``` diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_fpn_reader.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_fpn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..9b9abccd63e499bfa9402f3038425470e4a6e953 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_fpn_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_mask_fpn_reader.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_mask_fpn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..9b9abccd63e499bfa9402f3038425470e4a6e953 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_mask_fpn_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_mask_rcnn_r50_fpn.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_mask_rcnn_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..ea2937babd488b1e874f75494093d942366315e5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_mask_rcnn_r50_fpn.yml @@ -0,0 +1,97 @@ 
+architecture: CascadeRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 + + +MaskHead: + head: MaskFeat + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + mask_assigner: MaskAssigner + share_bbox_feat: False + +MaskFeat: + num_convs: 4 + out_channel: 256 + +MaskAssigner: + mask_resolution: 28 + +MaskPostProcess: + binary_thresh: 0.5 diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_rcnn_r50_fpn.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_rcnn_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..c5afe774347209812ed759e31fb03e5aff677d96 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/cascade_rcnn_r50_fpn.yml @@ -0,0 +1,75 @@ +architecture: CascadeRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + 
keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..63f898e9c52556bfa0fbbe9c369900c09ab3f94c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b2c7e536d5ccaedf3ef25e7e36624b664897cfef --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/cascade_mask_rcnn_r50_fpn.yml', + '_base_/cascade_mask_fpn_reader.yml', +] +weights: output/cascade_mask_rcnn_r50_fpn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0ab507caa9548e9118aeafb32f5c7394409601c8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml @@ -0,0 +1,18 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/cascade_mask_rcnn_r50_fpn.yml', + '_base_/cascade_mask_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..736ba2e7430717781364343312716d5b3f2ef4aa --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/cascade_mask_rcnn_r50_fpn.yml', + '_base_/cascade_mask_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml 
b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b2cc7993b9885bb5feafbff53bcc82ae3049a148 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/cascade_rcnn_r50_fpn.yml', + '_base_/cascade_fpn_reader.yml', +] +weights: output/cascade_rcnn_r50_fpn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..905adbd61a5b2b5213d737d5ad2a49df650c8425 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml @@ -0,0 +1,18 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/cascade_rcnn_r50_fpn.yml', + '_base_/cascade_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/cascade_rcnn_r50_vd_fpn_ssld_1x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] diff --git a/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a6272145d03bb273c90ccf8d950c8b88f9b3e13b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/cascade_rcnn_r50_fpn.yml', + '_base_/cascade_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/centernet/README.md b/PaddleDetection-release-2.6/configs/centernet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6dd52cd32608d6e76f08e50bbda8f3c2f4190418 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/README.md @@ -0,0 +1,37 @@ +English | [简体中文](README_cn.md) + +# CenterNet (CenterNet: Objects as Points) + +## Table of Contents +- [Introduction](#Introduction) +- [Model Zoo](#Model_Zoo) +- [Citations](#Citations) + +## Introduction + +[CenterNet](http://arxiv.org/abs/1904.07850) is an anchor-free detector, which models an object as a single point -- the center point of its bounding box. The detector uses keypoint estimation to find center points and regresses to all other object properties. The center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors.
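To make the point-based formulation concrete, here is a minimal NumPy sketch of the decoding step: confident local maxima in the (sigmoid-activated) center heatmap are kept as detections, and the width/height regressed at each peak becomes a box. This is an illustration only, not PaddleDetection's `CenterNetPostProcess`; the function name and thresholds below are made up for the example.

```python
import numpy as np

def decode_centers(heatmap, wh, score_thresh=0.3, window=3):
    """Toy CenterNet-style decoding: heatmap peaks -> boxes.

    heatmap: (H, W) center scores for one class, after sigmoid
    wh:      (H, W, 2) regressed box (width, height) per location
    returns: list of (x1, y1, x2, y2, score)
    """
    H, W = heatmap.shape
    r = window // 2
    boxes = []
    for y in range(H):
        for x in range(W):
            s = heatmap[y, x]
            # keep only confident local maxima; this 3x3 window check is a
            # cheap stand-in for the max-pool NMS used by CenterNet
            patch = heatmap[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            if s < score_thresh or s < patch.max():
                continue
            w, h = wh[y, x]
            boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2, float(s)))
    return boxes

hm = np.zeros((8, 8)); hm[4, 4] = 0.9       # one strong center response
wh = np.full((8, 8, 2), 2.0)
print(decode_centers(hm, wh))               # one 2x2 box centered at (4, 4)
```

The real post-process additionally applies a learned center-offset correction and a per-class top-k; both are omitted here for brevity.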
+ +## Model Zoo + +### CenterNet Results on COCO-val 2017 + +| backbone | input shape | mAP | FPS | download | config | +| :--------------| :------- | :----: | :------: | :----: |:-----: | +| DLA-34(paper) | 512x512 | 37.4 | - | - | - | +| DLA-34 | 512x512 | 37.6 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_dla34_140e_coco.pdparams) | [config](./centernet_dla34_140e_coco.yml) | +| ResNet50 + DLAUp | 512x512 | 38.9 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_r50_140e_coco.pdparams) | [config](./centernet_r50_140e_coco.yml) | +| MobileNetV1 + DLAUp | 512x512 | 28.2 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv1_140e_coco.pdparams) | [config](./centernet_mbv1_140e_coco.yml) | +| MobileNetV3_small + DLAUp | 512x512 | 17.0 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_small_140e_coco.pdparams) | [config](./centernet_mbv3_small_140e_coco.yml) | +| MobileNetV3_large + DLAUp | 512x512 | 27.1 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_large_140e_coco.pdparams) | [config](./centernet_mbv3_large_140e_coco.yml) | +| ShuffleNetV2 + DLAUp | 512x512 | 23.8 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_shufflenetv2_140e_coco.pdparams) | [config](./centernet_shufflenetv2_140e_coco.yml) | + + +## Citations +``` +@article{zhou2019objects, + title={Objects as points}, + author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp}, + journal={arXiv preprint arXiv:1904.07850}, + year={2019} +} +``` diff --git a/PaddleDetection-release-2.6/configs/centernet/README_cn.md b/PaddleDetection-release-2.6/configs/centernet/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..d78cd40e3d3d8751422d3d6bde078ff21d08223d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/README_cn.md @@ -0,0 +1,36 @@ +简体中文 | [English](README.md) + +# CenterNet (CenterNet: Objects as Points) + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [引用](#引用) + +## 简介 + +[CenterNet](http://arxiv.org/abs/1904.07850)是anchor-free检测器,将物体表示为其目标框的中心点。CenterNet使用关键点检测的方式定位中心点并回归物体的其他属性。CenterNet是以中心点为基础的检测方法,是端到端可训练的,并且相较于基于anchor的检测器更加高效。 + +## 模型库 + +### CenterNet在COCO-val 2017上结果 + +| 骨干网络 | 输入尺寸 | mAP | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :------: | :----: |:-----: | +| DLA-34(paper) | 512x512 | 37.4 | - | - | - | +| DLA-34 | 512x512 | 37.6 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/centernet_dla34_140e_coco.pdparams) | [配置文件](./centernet_dla34_140e_coco.yml) | +| ResNet50 + DLAUp | 512x512 | 38.9 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/centernet_r50_140e_coco.pdparams) | [配置文件](./centernet_r50_140e_coco.yml) | +| MobileNetV1 + DLAUp | 512x512 | 28.2 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv1_140e_coco.pdparams) | [配置文件](./centernet_mbv1_140e_coco.yml) | +| MobileNetV3_small + DLAUp | 512x512 | 17.0 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_small_140e_coco.pdparams) | [配置文件](./centernet_mbv3_small_140e_coco.yml) | +| MobileNetV3_large + DLAUp | 512x512 | 27.1 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_large_140e_coco.pdparams) | [配置文件](./centernet_mbv3_large_140e_coco.yml) | +| ShuffleNetV2 + DLAUp | 512x512 | 23.8 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/centernet_shufflenetv2_140e_coco.pdparams) | [配置文件](./centernet_shufflenetv2_140e_coco.yml) | + +## 引用 +``` +@article{zhou2019objects, + title={Objects as points}, +
author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp}, + journal={arXiv preprint arXiv:1904.07850}, + year={2019} +} +``` diff --git a/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_dla34.yml b/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_dla34.yml new file mode 100644 index 0000000000000000000000000000000000000000..f8fb86912ef1ef48bdcde6363b5b228966fefd09 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_dla34.yml @@ -0,0 +1,22 @@ +architecture: CenterNet +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/DLA34_pretrain.pdparams + +CenterNet: + backbone: DLA + neck: CenterNetDLAFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +DLA: + depth: 34 + +CenterNetDLAFPN: + down_ratio: 4 + +CenterNetHead: + head_planes: 256 + regress_ltrb: False + +CenterNetPostProcess: + max_per_img: 100 + regress_ltrb: False diff --git a/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_r50.yml b/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_r50.yml new file mode 100644 index 0000000000000000000000000000000000000000..f2dc2ee2897d190066250113c9dbf01a7b92e130 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_r50.yml @@ -0,0 +1,34 @@ +architecture: CenterNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +CenterNet: + backbone: ResNet + neck: CenterNetDLAFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [0, 1, 2, 3] + freeze_at: -1 + norm_decay: 0. + dcn_v2_stages: [3] + + +CenterNetDLAFPN: + first_level: 0 + last_level: 4 + down_ratio: 4 + dcn_v2: False + +CenterNetHead: + head_planes: 256 + regress_ltrb: False + +CenterNetPostProcess: + max_per_img: 100 + regress_ltrb: False diff --git a/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_reader.yml b/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..81af4ab840502da6e738ac667dd0883041ba8992 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/_base_/centernet_reader.yml @@ -0,0 +1,35 @@ +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 512, 512] + sample_transforms: + - Decode: {} + - FlipWarpAffine: {keep_res: False, input_h: 512, input_w: 512, use_random: True} + - CenterRandColor: {} + - Lighting: {eigval: [0.2141788, 0.01817699, 0.00341571], eigvec: [[-0.58752847, -0.69563484, 0.41340352], [-0.5832747, 0.00994535, -0.81221408], [-0.56089297, 0.71832671, 0.41158938]]} + - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: False} + - Permute: {} + - Gt2CenterNetTarget: {down_ratio: 4, max_objs: 128} + batch_size: 16 + shuffle: True + drop_last: True + use_shared_memory: True + +EvalReader: + sample_transforms: + - Decode: {} + - WarpAffine: {keep_res: True, input_h: 512, input_w: 512} + - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834]} + - Permute: {} + batch_size: 1 + + +TestReader: + inputs_def: + image_shape: [3, 512, 512] + sample_transforms: + - Decode: {} + - WarpAffine: {keep_res: True, input_h: 512, input_w: 512} + - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: True} + - Permute: {} + 
batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/centernet/_base_/optimizer_140e.yml b/PaddleDetection-release-2.6/configs/centernet/_base_/optimizer_140e.yml new file mode 100644 index 0000000000000000000000000000000000000000..8c014e1ffe9f4971a9f322644bc943880ed57cec --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/_base_/optimizer_140e.yml @@ -0,0 +1,14 @@ +epoch: 140 + +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [90, 120] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL diff --git a/PaddleDetection-release-2.6/configs/centernet/centernet_dla34_140e_coco.yml b/PaddleDetection-release-2.6/configs/centernet/centernet_dla34_140e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..e6a66a9b955e1c53512e3d399a40a1e120f8a0d2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/centernet_dla34_140e_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_140e.yml', + '_base_/centernet_dla34.yml', + '_base_/centernet_reader.yml', +] + +weights: output/centernet_dla34_140e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/centernet/centernet_mbv1_140e_coco.yml b/PaddleDetection-release-2.6/configs/centernet/centernet_mbv1_140e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..48429a1dd9ced07b4e906304a199a8e2193e235d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/centernet_mbv1_140e_coco.yml @@ -0,0 +1,21 @@ +_BASE_: [ + 'centernet_r50_140e_coco.yml' +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV1_pretrained.pdparams +weights: output/centernet_mbv1_140e_coco/model_final + +CenterNet: + backbone: MobileNet + neck: CenterNetDLAFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +MobileNet: + scale: 1. + with_extra_blocks: false + extra_block_filters: [] + feature_maps: [3, 5, 11, 13] + +TrainReader: + batch_size: 32 diff --git a/PaddleDetection-release-2.6/configs/centernet/centernet_mbv3_large_140e_coco.yml b/PaddleDetection-release-2.6/configs/centernet/centernet_mbv3_large_140e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..57830a9b5ab3c4124138a1283f964ae62fa2c00e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/centernet_mbv3_large_140e_coco.yml @@ -0,0 +1,22 @@ +_BASE_: [ + 'centernet_r50_140e_coco.yml' +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams +weights: output/centernet_mbv3_large_140e_coco/model_final + +CenterNet: + backbone: MobileNetV3 + neck: CenterNetDLAFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +MobileNetV3: + model_name: large + scale: 1. 
+ with_extra_blocks: false + extra_block_filters: [] + feature_maps: [4, 7, 13, 16] + +TrainReader: + batch_size: 32 diff --git a/PaddleDetection-release-2.6/configs/centernet/centernet_mbv3_small_140e_coco.yml b/PaddleDetection-release-2.6/configs/centernet/centernet_mbv3_small_140e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..de73f1b2f4023ecb9bfee96436403192b7f6d80f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/centernet_mbv3_small_140e_coco.yml @@ -0,0 +1,28 @@ +_BASE_: [ + 'centernet_r50_140e_coco.yml' +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_small_x1_0_ssld_pretrained.pdparams +weights: output/centernet_mbv3_small_140e_coco/model_final + +CenterNet: + backbone: MobileNetV3 + neck: CenterNetDLAFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +MobileNetV3: + model_name: small + scale: 1. + with_extra_blocks: false + extra_block_filters: [] + feature_maps: [4, 9, 12] + +CenterNetDLAFPN: + first_level: 0 + last_level: 3 + down_ratio: 8 + dcn_v2: False + +TrainReader: + batch_size: 32 diff --git a/PaddleDetection-release-2.6/configs/centernet/centernet_r50_140e_coco.yml b/PaddleDetection-release-2.6/configs/centernet/centernet_r50_140e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a8b1e98e3373507c8572ee77fab868f2b21bed64 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/centernet_r50_140e_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_140e.yml', + '_base_/centernet_r50.yml', + '_base_/centernet_reader.yml', +] + +weights: output/centernet_r50_140e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/centernet/centernet_shufflenetv2_140e_coco.yml b/PaddleDetection-release-2.6/configs/centernet/centernet_shufflenetv2_140e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..9ccdae16400c77c0c7f2775db531e4687379f545 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/centernet/centernet_shufflenetv2_140e_coco.yml @@ -0,0 +1,33 @@ +_BASE_: [ + 'centernet_r50_140e_coco.yml' +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ShuffleNetV2_x1_0_pretrained.pdparams +weights: output/centernet_shufflenetv2_140e_coco/model_final + +CenterNet: + backbone: ShuffleNetV2 + neck: CenterNetDLAFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +ShuffleNetV2: + scale: 1.0 + feature_maps: [5, 13, 17] + act: leaky_relu + +CenterNetDLAFPN: + first_level: 0 + last_level: 3 + down_ratio: 8 + dcn_v2: False + +TrainReader: + batch_size: 32 + +TestReader: + sample_transforms: + - Decode: {} + - WarpAffine: {keep_res: False, input_h: 512, input_w: 512} + - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834]} + - Permute: {} diff --git a/PaddleDetection-release-2.6/configs/convnext/README.md b/PaddleDetection-release-2.6/configs/convnext/README.md new file mode 100644 index 0000000000000000000000000000000000000000..644d66815660427d2a6cdf587c014d8cb877eb15 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/convnext/README.md @@ -0,0 +1,20 @@ +# ConvNeXt (A ConvNet for the 2020s) + +## 模型库 +### ConvNeXt on COCO + +| 骨干网络 | 输入尺寸 | 图片数/GPU | 学习率策略 | mAPval
    0.5:0.95 | mAPval
    0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: | +| PP-YOLOE-ConvNeXt-tiny | 640 | 16 | 36e | 44.6 | 63.3 | 33.04 | 13.87 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_convnext_tiny_36e_coco.pdparams) | [配置文件](./ppyoloe_convnext_tiny_36e_coco.yml) | +| YOLOX-ConvNeXt-s | 640 | 8 | 36e | 44.6 | 65.3 | 36.20 | 27.52 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [配置文件](./yolox_convnext_s_36e_coco.yml) | + + +## Citations +``` +@Article{liu2022convnet, + author = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie}, + title = {A ConvNet for the 2020s}, + journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + year = {2022}, +} +``` diff --git a/PaddleDetection-release-2.6/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml b/PaddleDetection-release-2.6/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..360a368ec0837033ab408db59aa0d4ea5b7972dd --- /dev/null +++ b/PaddleDetection-release-2.6/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml @@ -0,0 +1,55 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +depth_mult: 0.25 +width_mult: 0.50 + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_convnext_tiny_36e_coco/model_final +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/convnext_tiny_22k_224.pdparams + + +YOLOv3: + backbone: ConvNeXt + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +ConvNeXt: + arch: 'tiny' + drop_path_rate: 0.4 + layer_scale_init_value: 1.0 + return_idx: [1, 2, 3] + + +PPYOLOEHead: + static_assigner_epoch: 12 + nms: + nms_top_k: 10000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 + + +TrainReader: + batch_size: 16 + + +epoch: 36 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [36] + use_warmup: false + +OptimizerBuilder: + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0005 diff --git a/PaddleDetection-release-2.6/configs/convnext/yolox_convnext_s_36e_coco.yml b/PaddleDetection-release-2.6/configs/convnext/yolox_convnext_s_36e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b41551dee8a2e2793ac09d474c0e7d2a8868299f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/convnext/yolox_convnext_s_36e_coco.yml @@ -0,0 +1,58 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../yolox/_base_/yolox_cspdarknet.yml', + '../yolox/_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.50 + +log_iter: 100 +snapshot_epoch: 5 +weights: output/yolox_convnext_s_36e_coco/model_final +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/convnext_tiny_22k_224.pdparams + + +YOLOX: + backbone: ConvNeXt + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [15, 25] # multi-scale range [480*480 ~ 800*800] + +ConvNeXt: + arch: 'tiny' + drop_path_rate: 0.4 + layer_scale_init_value: 1.0 + return_idx: [1, 2, 3] + + +TrainReader: + batch_size: 8 + mosaic_epoch: 30 + + +YOLOXHead: + l1_epoch: 30 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 1000 + score_threshold: 0.001 + nms_threshold: 0.65 + 
+ +epoch: 36 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [36] + use_warmup: false + +OptimizerBuilder: + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0005 diff --git a/PaddleDetection-release-2.6/configs/datasets/coco_detection.yml b/PaddleDetection-release-2.6/configs/datasets/coco_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..176ba271c7910531ef1e6f8ed72572cd2a5d4efa --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/coco_detection.yml @@ -0,0 +1,21 @@ +metric: COCO +num_classes: 80 + +TrainDataset: + name: COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + name: COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + allow_empty: true + +TestDataset: + name: ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/datasets/coco_instance.yml b/PaddleDetection-release-2.6/configs/datasets/coco_instance.yml new file mode 100644 index 0000000000000000000000000000000000000000..91c4ab8890e5353becf0deb43d9e0d256a991987 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/coco_instance.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 80 + +TrainDataset: + name: COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_poly', 'is_crowd'] + +EvalDataset: + name: COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + +TestDataset: + name: ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/datasets/dota.yml b/PaddleDetection-release-2.6/configs/datasets/dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..9dda08400aaac1d914b1858dda32ff0f82717b49 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/dota.yml @@ -0,0 +1,21 @@ +metric: RBOX +num_classes: 15 + +TrainDataset: + !COCODataSet + image_dir: trainval1024/images + anno_path: trainval1024/DOTA_trainval1024.json + dataset_dir: dataset/dota/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] + +EvalDataset: + !COCODataSet + image_dir: trainval1024/images + anno_path: trainval1024/DOTA_trainval1024.json + dataset_dir: dataset/dota/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] + +TestDataset: + !ImageFolder + anno_path: test1024/DOTA_test1024.json + dataset_dir: dataset/dota/ diff --git a/PaddleDetection-release-2.6/configs/datasets/dota_ms.yml b/PaddleDetection-release-2.6/configs/datasets/dota_ms.yml new file mode 100644 index 0000000000000000000000000000000000000000..802e8846d7f443a7032cf49a88bfe79328ea41db --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/dota_ms.yml @@ -0,0 +1,21 @@ +metric: RBOX +num_classes: 15 + +TrainDataset: + !COCODataSet + image_dir: trainval1024/images + anno_path: trainval1024/DOTA_trainval1024.json + dataset_dir: dataset/dota_ms/ + data_fields: ['image', 'gt_bbox', 'gt_class', 
'is_crowd', 'gt_poly'] + +EvalDataset: + !COCODataSet + image_dir: trainval1024/images + anno_path: trainval1024/DOTA_trainval1024.json + dataset_dir: dataset/dota_ms/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] + +TestDataset: + !ImageFolder + anno_path: test1024/DOTA_test1024.json + dataset_dir: dataset/dota_ms/ diff --git a/PaddleDetection-release-2.6/configs/datasets/mcmot.yml b/PaddleDetection-release-2.6/configs/datasets/mcmot.yml new file mode 100644 index 0000000000000000000000000000000000000000..5f639a045b0630fcd4fe87fd01ff461be2c5d8a8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/mcmot.yml @@ -0,0 +1,25 @@ +metric: MCMOT +num_classes: 10 +# using VisDrone2019 MOT dataset with 10 classes as default, you can modify it for your needs. + +# for MCMOT training +TrainDataset: + !MCMOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_mcmot.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + label_list: label_list.txt + +# for MCMOT evaluation +# If you want to change the MCMOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_mcmot/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MCMOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/datasets/mot.yml b/PaddleDetection-release-2.6/configs/datasets/mot.yml new file mode 100644 index 0000000000000000000000000000000000000000..7107da4905e88847aba29e66346a5c05bc418462 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/mot.yml @@ -0,0 +1,23 @@ +metric: MOT +num_classes: 1 + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT16/images/train + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/datasets/objects365_detection.yml b/PaddleDetection-release-2.6/configs/datasets/objects365_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..735ebf96dcea828459428016bad764c8461e8ee8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/objects365_detection.yml @@ -0,0 +1,21 @@ +metric: COCO +num_classes: 365 + +TrainDataset: + !COCODataSet + image_dir: train + anno_path: annotations/zhiyuan_objv2_train.json + dataset_dir: dataset/objects365 + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val + anno_path: annotations/zhiyuan_objv2_val.json + dataset_dir: dataset/objects365 + allow_empty: true + +TestDataset: + !ImageFolder + anno_path: annotations/zhiyuan_objv2_val.json + dataset_dir: dataset/objects365/ diff --git a/PaddleDetection-release-2.6/configs/datasets/roadsign_voc.yml b/PaddleDetection-release-2.6/configs/datasets/roadsign_voc.yml new file mode 100644 index 
0000000000000000000000000000000000000000..9a081611aa8dafef5d5c6f1af1476cc038db5702 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/roadsign_voc.yml @@ -0,0 +1,21 @@ +metric: VOC +map_type: integral +num_classes: 4 + +TrainDataset: + name: VOCDataSet + dataset_dir: dataset/roadsign_voc + anno_path: train.txt + label_list: label_list.txt + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + +EvalDataset: + name: VOCDataSet + dataset_dir: dataset/roadsign_voc + anno_path: valid.txt + label_list: label_list.txt + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + +TestDataset: + name: ImageFolder + anno_path: dataset/roadsign_voc/label_list.txt diff --git a/PaddleDetection-release-2.6/configs/datasets/sniper_coco_detection.yml b/PaddleDetection-release-2.6/configs/datasets/sniper_coco_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..b5cff989f5b58e79836e95efa2070c580e5edc44 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/sniper_coco_detection.yml @@ -0,0 +1,47 @@ +metric: SNIPERCOCO +num_classes: 80 + +TrainDataset: + !SniperCOCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + is_trainset: true + image_target_sizes: [2000, 1000] + valid_box_ratio_ranges: [[-1, 0.1],[0.08, -1]] + chip_target_size: 512 + chip_target_stride: 200 + use_neg_chip: false + max_neg_num_per_im: 8 + + +EvalDataset: + !SniperCOCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + is_trainset: false + image_target_sizes: [2000, 1000] + valid_box_ratio_ranges: [[-1, 0.1], [0.08, -1]] + chip_target_size: 512 + chip_target_stride: 200 + max_per_img: -1 + nms_thresh: 0.5 + +TestDataset: + !SniperCOCODataSet + image_dir: val2017 + dataset_dir: dataset/coco + is_trainset: false + image_target_sizes: [2000, 1000] + valid_box_ratio_ranges: [[-1, 0.1],[0.08, -1]] + chip_target_size: 500 + chip_target_stride: 200 + max_per_img: -1 + nms_thresh: 0.5 + + diff --git a/PaddleDetection-release-2.6/configs/datasets/sniper_visdrone_detection.yml b/PaddleDetection-release-2.6/configs/datasets/sniper_visdrone_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..f6c12a9516b71026e3451e10b128aec1fbf96160 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/sniper_visdrone_detection.yml @@ -0,0 +1,47 @@ +metric: SNIPERCOCO +num_classes: 9 + +TrainDataset: + !SniperCOCODataSet + image_dir: train + anno_path: annotations/train.json + dataset_dir: dataset/VisDrone2019_coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + is_trainset: true + image_target_sizes: [8145, 2742] + valid_box_ratio_ranges: [[-1, 0.03142857142857144], [0.02333211853008726, -1]] + chip_target_size: 1536 + chip_target_stride: 1184 + use_neg_chip: false + max_neg_num_per_im: 8 + + +EvalDataset: + !SniperCOCODataSet + image_dir: val + anno_path: annotations/val.json + dataset_dir: dataset/VisDrone2019_coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + is_trainset: false + image_target_sizes: [8145, 2742] + valid_box_ratio_ranges: [[-1, 0.03142857142857144], [0.02333211853008726, -1]] + chip_target_size: 1536 + chip_target_stride: 1184 + max_per_img: -1 + nms_thresh: 0.5 + +TestDataset: + !SniperCOCODataSet + 
image_dir: val + dataset_dir: dataset/VisDrone2019_coco + is_trainset: false + image_target_sizes: [8145, 2742] + valid_box_ratio_ranges: [[-1, 0.03142857142857144], [0.02333211853008726, -1]] + chip_target_size: 1536 + chip_target_stride: 1184 + max_per_img: -1 + nms_thresh: 0.5 + + diff --git a/PaddleDetection-release-2.6/configs/datasets/spine_coco.yml b/PaddleDetection-release-2.6/configs/datasets/spine_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..2339c26db1fcd55a52c8cc7b7dc2623964b7c97a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/spine_coco.yml @@ -0,0 +1,21 @@ +metric: RBOX +num_classes: 9 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/spine_coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/valid.json + dataset_dir: dataset/spine_coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] + +TestDataset: + !ImageFolder + anno_path: annotations/valid.json + dataset_dir: dataset/spine_coco diff --git a/PaddleDetection-release-2.6/configs/datasets/visdrone_detection.yml b/PaddleDetection-release-2.6/configs/datasets/visdrone_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..37feb6e2618ff9d83ce2842a9e581dcfd31efc78 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/visdrone_detection.yml @@ -0,0 +1,22 @@ +metric: COCO +num_classes: 10 + +TrainDataset: + !COCODataSet + image_dir: VisDrone2019-DET-train + anno_path: train.json + dataset_dir: dataset/visdrone + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: VisDrone2019-DET-val + anno_path: val.json + # image_dir: test_dev + # anno_path: test_dev.json + dataset_dir: dataset/visdrone + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/visdrone diff --git a/PaddleDetection-release-2.6/configs/datasets/voc.yml b/PaddleDetection-release-2.6/configs/datasets/voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..72182bed9d17ca076c94a1872613ce7ad29d36d9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/voc.yml @@ -0,0 +1,21 @@ +metric: VOC +map_type: 11point +num_classes: 20 + +TrainDataset: + name: VOCDataSet + dataset_dir: dataset/voc + anno_path: trainval.txt + label_list: label_list.txt + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + +EvalDataset: + name: VOCDataSet + dataset_dir: dataset/voc + anno_path: test.txt + label_list: label_list.txt + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + +TestDataset: + name: ImageFolder + anno_path: dataset/voc/label_list.txt diff --git a/PaddleDetection-release-2.6/configs/datasets/wider_face.yml b/PaddleDetection-release-2.6/configs/datasets/wider_face.yml new file mode 100644 index 0000000000000000000000000000000000000000..cc01378d728af95ce7072001aefea08ba80631e2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/datasets/wider_face.yml @@ -0,0 +1,20 @@ +metric: WiderFace +num_classes: 1 + +TrainDataset: + !WIDERFaceDataSet + dataset_dir: dataset/wider_face + anno_path: wider_face_split/wider_face_train_bbx_gt.txt + image_dir: WIDER_train/images + data_fields: ['image', 'gt_bbox', 'gt_class'] + +EvalDataset: + !WIDERFaceDataSet + dataset_dir: dataset/wider_face + anno_path: wider_face_split/wider_face_val_bbx_gt.txt + image_dir: WIDER_val/images + data_fields: ['image'] + 
+TestDataset: + !ImageFolder + use_default_label: true diff --git a/PaddleDetection-release-2.6/configs/dcn/README.md b/PaddleDetection-release-2.6/configs/dcn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..90502dcea6f3ae3a452fa3f48d2005801900064f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/README.md @@ -0,0 +1,37 @@ +### Deformable ConvNets v2 + +| 骨架网络 | 网络类型 | 卷积 | 每张GPU图片个数 | 学习率策略 |推理时间(fps)| Box AP | Mask AP | 下载 | 配置文件 | +| :------------------- | :------------- | :-----: |:--------: | :-----: | :-----------: |:----: | :-----: | :----------------------------------------------------------: | :----: | +| ResNet50-FPN | Faster | c3-c5 | 1 | 1x | - | 42.1 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_dcn_r50_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/faster_rcnn_dcn_r50_fpn_1x_coco.yml) | +| ResNet50-vd-FPN | Faster | c3-c5 | 1 | 1x | - | 42.7 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_dcn_r50_vd_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_1x_coco.yml) | +| ResNet50-vd-FPN | Faster | c3-c5 | 1 | 2x | - | 43.7 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_dcn_r50_vd_fpn_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_2x_coco.yml) | +| ResNet101-vd-FPN | Faster | c3-c5 | 1 | 1x | - | 45.1 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_dcn_r101_vd_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/faster_rcnn_dcn_r101_vd_fpn_1x_coco.yml) | +| ResNeXt101-vd-FPN | Faster | c3-c5 | 1 | 1x | - | 46.5 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.pdparams) |[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml) | +| ResNet50-FPN | Mask | c3-c5 | 1 | 1x | - | 42.7 | 38.4 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_dcn_r50_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/mask_rcnn_dcn_r50_fpn_1x_coco.yml) | +| ResNet50-vd-FPN | Mask | c3-c5 | 1 | 2x | - | 44.6 | 39.8 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_dcn_r50_vd_fpn_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/mask_rcnn_dcn_r50_vd_fpn_2x_coco.yml) | +| ResNet101-vd-FPN | Mask | c3-c5 | 1 | 1x | - | 45.6 | 40.6 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_dcn_r101_vd_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x_coco.yml) | +| ResNeXt101-vd-FPN | Mask | c3-c5 | 1 | 1x | - | 47.3 | 42.0 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml) | +| ResNet50-FPN | Cascade Faster | c3-c5 | 1 | 1x | - | 42.1 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_dcn_r50_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/cascade_rcnn_dcn_r50_fpn_1x_coco.yml) | +| ResNeXt101-vd-FPN | Cascade Faster | c3-c5 | 1 | 1x | - | 48.8 | - | 
[下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml) | + + +**注意事项:** + +- Deformable卷积网络v2(dcn_v2)参考自论文[Deformable ConvNets v2](https://arxiv.org/abs/1811.11168). +- `c3-c5`意思是在resnet模块的3到5阶段增加`dcn`. + +## Citations +``` +@inproceedings{dai2017deformable, + title={Deformable Convolutional Networks}, + author={Dai, Jifeng and Qi, Haozhi and Xiong, Yuwen and Li, Yi and Zhang, Guodong and Hu, Han and Wei, Yichen}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + year={2017} +} +@article{zhu2018deformable, + title={Deformable ConvNets v2: More Deformable, Better Results}, + author={Zhu, Xizhou and Hu, Han and Lin, Stephen and Dai, Jifeng}, + journal={arXiv preprint arXiv:1811.11168}, + year={2018} +} +``` diff --git a/PaddleDetection-release-2.6/configs/dcn/cascade_rcnn_dcn_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/cascade_rcnn_dcn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..9f2738f3a85175b7cde3e1dac962177cb5852912 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/cascade_rcnn_dcn_r50_fpn_1x_coco.yml @@ -0,0 +1,16 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../cascade_rcnn/_base_/optimizer_1x.yml', + '../cascade_rcnn/_base_/cascade_rcnn_r50_fpn.yml', + '../cascade_rcnn/_base_/cascade_fpn_reader.yml', +] +weights: output/cascade_rcnn_dcn_r50_fpn_1x_coco/model_final + +ResNet: + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..4180919edcbf139f4c61109c084e1cd289caba0e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml @@ -0,0 +1,16 @@ +_BASE_: [ + 'cascade_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt101_vd_64x4d_pretrained.pdparams +weights: output/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco/model_final + +ResNet: + depth: 101 + groups: 64 + base_width: 4 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r101_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r101_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..274c1710bb11612798c8368e3edd048b4fddad97 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r101_vd_fpn_1x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'faster_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams +weights: output/faster_rcnn_dcn_r101_vd_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_fpn_1x_coco.yml new file mode 100644 index 
0000000000000000000000000000000000000000..1cd02ac1f2cedcc7892497752a4d4779dc635718 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_fpn_1x_coco.yml @@ -0,0 +1,16 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../faster_rcnn/_base_/optimizer_1x.yml', + '../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml', + '../faster_rcnn/_base_/faster_fpn_reader.yml', +] +weights: output/faster_rcnn_dcn_r50_fpn_1x_coco/model_final + +ResNet: + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..735edbbd1e8160467761eb1e79406ac4ed89de9b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_1x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'faster_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/faster_rcnn_dcn_r50_vd_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..685d9671068e76139da5212e3059403626843ccc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_2x_coco.yml @@ -0,0 +1,26 @@ +_BASE_: [ + 'faster_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/faster_rcnn_dcn_r50_vd_fpn_2x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..68fef482bed4eeaa09faf27c9babece0c57adaed --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml @@ -0,0 +1,17 @@ +_BASE_: [ + 'faster_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt101_vd_64x4d_pretrained.pdparams +weights: output/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco/model_final + +ResNet: + # for ResNeXt: groups, base_width, base_channels + depth: 101 + groups: 64 + base_width: 4 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..930bd89875e360374aed7a970989878cc63c34c0 --- /dev/null +++
b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'mask_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams +weights: output/mask_rcnn_dcn_r101_vd_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b14a1ed1dd2ed3bf8522ebe810be8b0d4d0f80a7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r50_fpn_1x_coco.yml @@ -0,0 +1,16 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '../mask_rcnn/_base_/optimizer_1x.yml', + '../mask_rcnn/_base_/mask_rcnn_r50_fpn.yml', + '../mask_rcnn/_base_/mask_fpn_reader.yml', +] +weights: output/mask_rcnn_dcn_r50_fpn_1x_coco/model_final + +ResNet: + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r50_vd_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r50_vd_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..d36b5f56f0c790b9eb6aa7e9f7778057a66fc1be --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_r50_vd_fpn_2x_coco.yml @@ -0,0 +1,26 @@ +_BASE_: [ + 'mask_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/mask_rcnn_dcn_r50_vd_fpn_2x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..8e7857c5916ccd0da2177fee64dc662971e8922f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dcn/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco.yml @@ -0,0 +1,17 @@ +_BASE_: [ + 'mask_rcnn_dcn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt101_vd_64x4d_pretrained.pdparams +weights: output/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x_coco/model_final + +ResNet: + # for ResNeXt: groups, base_width, base_channels + depth: 101 + variant: d + groups: 64 + base_width: 4 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] diff --git a/PaddleDetection-release-2.6/configs/deformable_detr/README.md b/PaddleDetection-release-2.6/configs/deformable_detr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..09118ec5b547e754facf22d0ed5202231dc85174 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/deformable_detr/README.md @@ -0,0 +1,36 @@ +# Deformable DETR + +## Introduction + + +Deformable DETR is an object detection model based on DETR. We reproduce the model from the paper.
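The key change relative to DETR is deformable attention: each query attends to a small set of learned sampling points around a reference location instead of to every pixel, which is what makes higher-resolution, multi-scale features affordable. Below is a minimal single-level, single-head NumPy sketch of that sampling step; it is a simplification for illustration (the real operator is the multi-scale `DeformableTransformer` configured further down, and all names in the snippet are invented for the example).

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat (H, W, C) at fractional coords (x, y)."""
    H, W, _ = feat.shape
    x = float(np.clip(x, 0, W - 1)); y = float(np.clip(y, 0, H - 1))
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot

def deformable_attention(feat, ref_xy, offsets, weights):
    """One query, one head, one feature level.

    feat:    (H, W, C) value feature map
    ref_xy:  (2,) reference point in pixel coordinates
    offsets: (K, 2) learned sampling offsets around the reference
    weights: (K,) attention weights (already softmax-normalized)
    """
    samples = np.stack([bilinear_sample(feat, ref_xy[0] + dx, ref_xy[1] + dy)
                        for dx, dy in offsets])
    return (weights[:, None] * samples).sum(axis=0)   # (C,) output

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16, 8))
offsets = rng.standard_normal((4, 2))                 # K = 4 points per query
w = np.exp(rng.standard_normal(4)); w /= w.sum()
print(deformable_attention(feat, np.array([7.5, 7.5]), offsets, w).shape)  # (8,)
```

In the full model this runs per head and per feature level (`num_feature_levels: 4`, `num_encoder_points: 4` in the config below), with the offsets and weights predicted by linear layers from the query itself.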
+ + +## Model Zoo + +| Backbone | Model | Images/GPU | Inf time (fps) | Box AP | Config | Download | +|:------:|:--------:|:--------:|:--------------:|:------:|:------:|:--------:| +| R-50 | Deformable DETR | 2 | --- | 44.5 | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/deformable_detr/deformable_detr_r50_1x_coco.yml) | [model](https://paddledet.bj.bcebos.com/models/deformable_detr_r50_1x_coco.pdparams) | + +**Notes:** + +- Deformable DETR is trained on the COCO train2017 dataset and evaluated on val2017; the reported metric is `mAP(IoU=0.5:0.95)`. +- Deformable DETR is trained for 50 epochs on 8 GPUs. + +GPU multi-card training +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml --fleet +``` + +## Citations +``` +@inproceedings{ +zhu2021deformable, +title={Deformable DETR: Deformable Transformers for End-to-End Object Detection}, +author={Xizhou Zhu and Weijie Su and Lewei Lu and Bin Li and Xiaogang Wang and Jifeng Dai}, +booktitle={International Conference on Learning Representations}, +year={2021}, +url={https://openreview.net/forum?id=gZ9hCDWe6ke} +} +``` diff --git a/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_detr_r50.yml b/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_detr_r50.yml new file mode 100644 index 0000000000000000000000000000000000000000..641129a6e519dd234d1a418d702f31bd97e6365a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_detr_r50.yml @@ -0,0 +1,48 @@ +architecture: DETR +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vb_normal_pretrained.pdparams +hidden_dim: 256 +use_focal_loss: True + + +DETR: + backbone: ResNet + transformer: DeformableTransformer + detr_head: DeformableDETRHead + post_process: DETRBBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [1, 2, 3] + lr_mult_list: [0.0, 0.1, 0.1, 0.1] + num_stages: 4 + + +DeformableTransformer: + num_queries: 300 + position_embed_type: sine + nhead: 8 + num_encoder_layers: 6 + num_decoder_layers: 6 + dim_feedforward: 1024 + dropout: 0.1 + activation: relu + num_feature_levels: 4 + num_encoder_points: 4 + num_decoder_points: 4 + + +DeformableDETRHead: + num_mlp_layers: 3 + + +DETRLoss: + loss_coeff: {class: 2, bbox: 5, giou: 2, mask: 1, dice: 1} + aux_loss: True + + +HungarianMatcher: + matcher_coeff: {class: 2, bbox: 5, giou: 2} diff --git a/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_detr_reader.yml b/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_detr_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..c15a0f3b6390fb7627f46f040fbd5054398b0e6b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_detr_reader.yml @@ -0,0 +1,48 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {prob: 0.5} + - RandomSelect: { transforms1: [ RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ], + transforms2: [ + RandomShortSideResize: { short_side_sizes: [ 400, 500, 600 ] }, + RandomSizeCrop: { min_size: 384, max_size: 600 }, + RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ] + } + - NormalizeImage: {is_scale: true, mean:
[0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - NormalizeBox: {} + - BboxXYXY2XYWH: {} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 2 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_optimizer_1x.yml b/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..c068f4de493fabb52fac94d3d55c8b2b04efd850 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/deformable_detr/_base_/deformable_optimizer_1x.yml @@ -0,0 +1,16 @@ +epoch: 50 + +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [40] + use_warmup: false + +OptimizerBuilder: + clip_grad_by_norm: 0.1 + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0001 diff --git a/PaddleDetection-release-2.6/configs/deformable_detr/deformable_detr_r50_1x_coco.yml b/PaddleDetection-release-2.6/configs/deformable_detr/deformable_detr_r50_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..4ca749106d418ad7e188a3ebba33fc0cb2860279 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/deformable_detr/deformable_detr_r50_1x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/deformable_optimizer_1x.yml', + '_base_/deformable_detr_r50.yml', + '_base_/deformable_detr_reader.yml', +] +weights: output/deformable_detr_r50_1x_coco/model_final +find_unused_parameters: True diff --git a/PaddleDetection-release-2.6/configs/detr/README.md b/PaddleDetection-release-2.6/configs/detr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8f661212a950e858161acdf44efa00d4c343f209 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/detr/README.md @@ -0,0 +1,39 @@ +# DETR + +## Introduction + + +DETR is an object detection model based on the Transformer. We reproduce the model from the paper. + + +## Model Zoo + +| Backbone | Model | Images/GPU | Inf time (fps) | Box AP | Config | Download | +|:------:|:--------:|:--------:|:--------------:|:------:|:------:|:--------:| +| R-50 | DETR | 4 | --- | 42.3 | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/detr/detr_r50_1x_coco.yml) | [model](https://paddledet.bj.bcebos.com/models/detr_r50_1x_coco.pdparams) | + +**Notes:** + +- DETR is trained on the COCO train2017 dataset and evaluated on val2017; the reported metric is `mAP(IoU=0.5:0.95)`. +- DETR is trained for 500 epochs on 8 GPUs.
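Before the loss is computed, DETR matches its fixed set of predictions one-to-one to the ground-truth boxes with the Hungarian algorithm over a weighted cost; the `HungarianMatcher` entry with `matcher_coeff: {class: 1, bbox: 5, giou: 2}` in `_base_/detr_r50.yml` below holds these weights. The toy sketch that follows shows the matching step with only the class-probability and L1 box costs (the GIoU term is dropped for brevity) and relies on SciPy's solver; it is illustrative, not the repo's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes,
                    w_class=1.0, w_bbox=5.0):
    """Toy DETR-style bipartite matching (GIoU cost omitted).

    pred_probs: (N, num_classes) class probabilities per query
    pred_boxes: (N, 4) predicted boxes, normalized cxcywh
    gt_labels:  (M,) ground-truth class ids
    gt_boxes:   (M, 4) ground-truth boxes, normalized cxcywh
    returns matched (query_idx, gt_idx) index arrays
    """
    # classification cost: negative probability of each GT's class
    cost_class = -pred_probs[:, gt_labels]                          # (N, M)
    # box cost: pairwise L1 distance between all prediction/GT pairs
    cost_bbox = np.abs(pred_boxes[:, None] - gt_boxes[None]).sum(-1)
    return linear_sum_assignment(w_class * cost_class + w_bbox * cost_bbox)

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(80), size=100)     # 100 queries, 80 classes
boxes = rng.random((100, 4))
q_idx, g_idx = hungarian_match(probs, boxes,
                               gt_labels=np.array([3, 17]),
                               gt_boxes=rng.random((2, 4)))
print(q_idx, g_idx)    # the two queries assigned to the two GT objects
```

Queries left unmatched are trained against the no-object class, which is why this set-based loss needs no NMS at inference time.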
+ +GPU multi-card training +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/detr/detr_r50_1x_coco.yml --fleet +``` + +## Citations +``` +@inproceedings{detr, + author = {Nicolas Carion and + Francisco Massa and + Gabriel Synnaeve and + Nicolas Usunier and + Alexander Kirillov and + Sergey Zagoruyko}, + title = {End-to-End Object Detection with Transformers}, + booktitle = {ECCV}, + year = {2020} +} +``` diff --git a/PaddleDetection-release-2.6/configs/detr/_base_/detr_r50.yml b/PaddleDetection-release-2.6/configs/detr/_base_/detr_r50.yml new file mode 100644 index 0000000000000000000000000000000000000000..5006f11937c9a7c2566913a08144fbb6ee3d0efa --- /dev/null +++ b/PaddleDetection-release-2.6/configs/detr/_base_/detr_r50.yml @@ -0,0 +1,44 @@ +architecture: DETR +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vb_normal_pretrained.pdparams +hidden_dim: 256 + + +DETR: + backbone: ResNet + transformer: DETRTransformer + detr_head: DETRHead + post_process: DETRBBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [3] + lr_mult_list: [0.0, 0.1, 0.1, 0.1] + num_stages: 4 + + +DETRTransformer: + num_queries: 100 + position_embed_type: sine + nhead: 8 + num_encoder_layers: 6 + num_decoder_layers: 6 + dim_feedforward: 2048 + dropout: 0.1 + activation: relu + + +DETRHead: + num_mlp_layers: 3 + + +DETRLoss: + loss_coeff: {class: 1, bbox: 5, giou: 2, no_object: 0.1, mask: 1, dice: 1} + aux_loss: True + + +HungarianMatcher: + matcher_coeff: {class: 1, bbox: 5, giou: 2} diff --git a/PaddleDetection-release-2.6/configs/detr/_base_/detr_reader.yml b/PaddleDetection-release-2.6/configs/detr/_base_/detr_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..997ef724afcebc3ba648ea3f09858b9950dd0550 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/detr/_base_/detr_reader.yml @@ -0,0 +1,48 @@ +worker_num: 0 +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {prob: 0.5} + - RandomSelect: { transforms1: [ RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ], + transforms2: [ + RandomShortSideResize: { short_side_sizes: [ 400, 500, 600 ] }, + RandomSizeCrop: { min_size: 384, max_size: 600 }, + RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ] + } + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - NormalizeBox: {} + - BboxXYXY2XYWH: {} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 2 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 1 + shuffle: false + drop_last: 
false
diff --git a/PaddleDetection-release-2.6/configs/detr/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/detr/_base_/optimizer_1x.yml
new file mode 100644
index 0000000000000000000000000000000000000000..13528c5eba5bc81092c7af62e289a4d887c6f15f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/detr/_base_/optimizer_1x.yml
@@ -0,0 +1,16 @@
+epoch: 500
+
+LearningRate:
+  base_lr: 0.0001
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [400]
+    use_warmup: false
+
+OptimizerBuilder:
+  clip_grad_by_norm: 0.1
+  regularizer: false
+  optimizer:
+    type: AdamW
+    weight_decay: 0.0001
diff --git a/PaddleDetection-release-2.6/configs/detr/detr_r50_1x_coco.yml b/PaddleDetection-release-2.6/configs/detr/detr_r50_1x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..d8838fac01b446168338ba27fcf4c2ae1722f0ef
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/detr/detr_r50_1x_coco.yml
@@ -0,0 +1,9 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '_base_/optimizer_1x.yml',
+  '_base_/detr_r50.yml',
+  '_base_/detr_reader.yml',
+]
+weights: output/detr_r50_1x_coco/model_final
+find_unused_parameters: True
diff --git a/PaddleDetection-release-2.6/configs/dino/README.md b/PaddleDetection-release-2.6/configs/dino/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e7d666f8b0f6e4bf28f37bbea5dfdcf64b68ce97
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/dino/README.md
@@ -0,0 +1,39 @@
+# DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
+
+## Introduction
+
+
+[DINO](https://arxiv.org/abs/2203.03605) is a DETR-based object detection model. We reproduce the model from the paper.
+
+
+## Model Zoo
+
+| Backbone | Model | Epochs | Box AP | Config | Download |
+|:------:|:---------------:|:------:|:------:|:---------------------------------------:|:--------------------------------------------------------------------------------:|
+| R-50 | dino_r50_4scale | 12 | 49.1 | [config](./dino_r50_4scale_1x_coco.yml) | [model](https://paddledet.bj.bcebos.com/models/dino_r50_4scale_1x_coco.pdparams) |
+| R-50 | dino_r50_4scale | 24 | 50.5 | [config](./dino_r50_4scale_2x_coco.yml) | [model](https://paddledet.bj.bcebos.com/models/dino_r50_4scale_2x_coco.pdparams) |
+
+**Notes:**
+
+- DINO is trained on the COCO train2017 dataset and evaluated on val2017; the reported metric is `mAP(IoU=0.5:0.95)`.
+- DINO is trained on 4 GPUs.
+
+GPU multi-card training
+```bash
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/dino/dino_r50_4scale_1x_coco.yml --fleet --eval
+```
+
+## Custom Operator
+- For the multi-scale deformable attention custom operator, see [here](../../ppdet/modeling/transformers/ext_op).
+
+## Citations
+```
+@misc{zhang2022dino,
+  title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
+  author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M.
Ni and Heung-Yeung Shum}, + year={2022}, + eprint={2203.03605}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/PaddleDetection-release-2.6/configs/dino/_base_/dino_r50.yml b/PaddleDetection-release-2.6/configs/dino/_base_/dino_r50.yml new file mode 100644 index 0000000000000000000000000000000000000000..0b151bd48960fbfbba90962018525d53bd5a8865 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dino/_base_/dino_r50.yml @@ -0,0 +1,49 @@ +architecture: DETR +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams +hidden_dim: 256 +use_focal_loss: True + + +DETR: + backbone: ResNet + transformer: DINOTransformer + detr_head: DINOHead + post_process: DETRBBoxPostProcess + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [1, 2, 3] + lr_mult_list: [0.0, 0.1, 0.1, 0.1] + num_stages: 4 + +DINOTransformer: + num_queries: 900 + position_embed_type: sine + num_levels: 4 + nhead: 8 + num_encoder_layers: 6 + num_decoder_layers: 6 + dim_feedforward: 2048 + dropout: 0.0 + activation: relu + pe_temperature: 20 + pe_offset: 0.0 + num_denoising: 100 + label_noise_ratio: 0.5 + box_noise_scale: 1.0 + learnt_init_query: True + +DINOHead: + loss: + name: DINOLoss + loss_coeff: {class: 1, bbox: 5, giou: 2} + aux_loss: True + matcher: + name: HungarianMatcher + matcher_coeff: {class: 2, bbox: 5, giou: 2} + +DETRBBoxPostProcess: + num_top_queries: 300 diff --git a/PaddleDetection-release-2.6/configs/dino/_base_/dino_reader.yml b/PaddleDetection-release-2.6/configs/dino/_base_/dino_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..c62a8054cf3593d75f51a85562dfd816ce1c3463 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dino/_base_/dino_reader.yml @@ -0,0 +1,48 @@ +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {prob: 0.5} + - RandomSelect: { transforms1: [ RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ], + transforms2: [ + RandomShortSideResize: { short_side_sizes: [ 400, 500, 600 ] }, + RandomSizeCrop: { min_size: 384, max_size: 600 }, + RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ] + } + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - NormalizeBox: {} + - BboxXYXY2XYWH: {} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadMaskBatch: {pad_to_stride: -1, return_pad_mask: true} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/dino/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/dino/_base_/optimizer_1x.yml new file mode 100644 index 
0000000000000000000000000000000000000000..63b3a9ed27949559e77696af0b026c49118a1a5c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dino/_base_/optimizer_1x.yml @@ -0,0 +1,16 @@ +epoch: 12 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [11] + use_warmup: false + +OptimizerBuilder: + clip_grad_by_norm: 0.1 + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0001 diff --git a/PaddleDetection-release-2.6/configs/dino/_base_/optimizer_2x.yml b/PaddleDetection-release-2.6/configs/dino/_base_/optimizer_2x.yml new file mode 100644 index 0000000000000000000000000000000000000000..d009dfd2e0f8e6432b1dfd8888e15876b5cb8f3b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dino/_base_/optimizer_2x.yml @@ -0,0 +1,16 @@ +epoch: 24 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [20] + use_warmup: false + +OptimizerBuilder: + clip_grad_by_norm: 0.1 + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0001 diff --git a/PaddleDetection-release-2.6/configs/dino/dino_r50_4scale_1x_coco.yml b/PaddleDetection-release-2.6/configs/dino/dino_r50_4scale_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..c3f471e14e1554844b93235663e0f2ad4a611bfe --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dino/dino_r50_4scale_1x_coco.yml @@ -0,0 +1,11 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/dino_r50.yml', + '_base_/dino_reader.yml', +] + +weights: output/dino_r50_4scale_1x_coco/model_final +find_unused_parameters: True +log_iter: 100 diff --git a/PaddleDetection-release-2.6/configs/dino/dino_r50_4scale_2x_coco.yml b/PaddleDetection-release-2.6/configs/dino/dino_r50_4scale_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..e8588f55e03aef6e6287bd7653ea3973f5bfedb1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/dino/dino_r50_4scale_2x_coco.yml @@ -0,0 +1,11 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_2x.yml', + '_base_/dino_r50.yml', + '_base_/dino_reader.yml', +] + +weights: output/dino_r50_4scale_2x_coco/model_final +find_unused_parameters: True +log_iter: 100 diff --git a/PaddleDetection-release-2.6/configs/face_detection/README.md b/PaddleDetection-release-2.6/configs/face_detection/README.md new file mode 100644 index 0000000000000000000000000000000000000000..badfa494d049d38f4563293c4adea560deea900c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/face_detection/README.md @@ -0,0 +1,176 @@ +# 人脸检测模型 + +## 简介 +`face_detection`中提供高效、高速的人脸检测解决方案,包括最先进的模型和经典模型。 + +![](../../docs/images/12_Group_Group_12_Group_Group_12_935.jpg) + +## 模型库 + +#### WIDER-FACE数据集上的mAP + +| 网络结构 | 输入尺寸 | 图片个数/GPU | 学习率策略 | Easy/Medium/Hard Set | 预测时延(SD855)| 模型大小(MB) | 下载 | 配置文件 | +|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:| +| BlazeFace | 640 | 8 | 1000e | 0.885 / 0.855 / 0.731 | - | 0.472 |[下载链接](https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection/blazeface_1000e.yml) | +| BlazeFace-FPN-SSH | 640 | 8 | 1000e | 0.907 / 0.883 / 0.793 | - | 0.479 |[下载链接](https://paddledet.bj.bcebos.com/models/blazeface_fpn_ssh_1000e.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection/blazeface_fpn_ssh_1000e.yml) | + 
+**注意:** +- 我们使用多尺度评估策略得到`Easy/Medium/Hard Set`里的mAP。具体细节请参考[在WIDER-FACE数据集上评估](#在WIDER-FACE数据集上评估)。 + +## 快速开始 + +### 数据准备 +我们使用[WIDER-FACE数据集](http://shuoyang1213.me/WIDERFACE/)进行训练和模型测试,官方网站提供了详细的数据介绍。 +- WIDER-Face数据源: +使用如下目录结构加载`wider_face`类型的数据集: + + ``` + dataset/wider_face/ + ├── wider_face_split + │ ├── wider_face_train_bbx_gt.txt + │ ├── wider_face_val_bbx_gt.txt + ├── WIDER_train + │ ├── images + │ │ ├── 0--Parade + │ │ │ ├── 0_Parade_marchingband_1_100.jpg + │ │ │ ├── 0_Parade_marchingband_1_381.jpg + │ │ │ │ ... + │ │ ├── 10--People_Marching + │ │ │ ... + ├── WIDER_val + │ ├── images + │ │ ├── 0--Parade + │ │ │ ├── 0_Parade_marchingband_1_1004.jpg + │ │ │ ├── 0_Parade_marchingband_1_1045.jpg + │ │ │ │ ... + │ │ ├── 10--People_Marching + │ │ │ ... + ``` + +- 手动下载数据集: +要下载WIDER-FACE数据集,请运行以下命令: +``` +cd dataset/wider_face && ./download_wider_face.sh +``` + +### 参数配置 +基础模型的配置可以参考`configs/face_detection/_base_/blazeface.yml`; +改进模型增加FPN和SSH的neck结构,配置文件可以参考`configs/face_detection/_base_/blazeface_fpn.yml`,可以根据需求配置FPN和SSH,具体如下: +```yaml +BlazeNet: + blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]] + double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96], + [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]] + act: hard_swish #配置backbone中BlazeBlock的激活函数,基础模型为relu,增加FPN和SSH时需使用hard_swish + +BlazeNeck: + neck_type : fpn_ssh #可选only_fpn、only_ssh和fpn_ssh + in_channel: [96,96] +``` + + + +### 训练与评估 +训练流程与评估流程方法与其他算法一致,请参考[GETTING_STARTED_cn.md](../../docs/tutorials/GETTING_STARTED_cn.md)。 +**注意:** 人脸检测模型目前不支持边训练边评估。 + +#### 在WIDER-FACE数据集上评估 +- 步骤一:评估并生成结果文件: +```shell +python -u tools/eval.py -c configs/face_detection/blazeface_1000e.yml \ + -o weights=output/blazeface_1000e/model_final \ + multi_scale=True +``` +设置`multi_scale=True`进行多尺度评估,评估完成后,将在`output/pred`中生成txt格式的测试结果。 + +- 步骤二:下载官方评估脚本和Ground Truth文件: +``` +wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip +unzip eval_tools.zip && rm -f eval_tools.zip +``` + +- 步骤三:开始评估 + +方法一:python评估: +``` +git clone https://github.com/wondervictor/WiderFace-Evaluation.git +cd WiderFace-Evaluation +# 编译 +python3 setup.py build_ext --inplace +# 开始评估 +python3 evaluation.py -p /path/to/PaddleDetection/output/pred -g /path/to/eval_tools/ground_truth +``` + +方法二:MatLab评估: +``` +# 在`eval_tools/wider_eval.m`中修改保存结果路径和绘制曲线的名称: +pred_dir = './pred'; +legend_name = 'Paddle-BlazeFace'; + +`wider_eval.m` 是评估模块的主要执行程序。运行命令如下: +matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;" +``` + +### Python脚本预测 +为了支持二次开发,这里提供通过Python脚本使用Paddle Detection whl包来进行预测的示例。 +```python +import cv2 +import paddle +import numpy as np +from ppdet.core.workspace import load_config +from ppdet.engine import Trainer +from ppdet.metrics import get_infer_results +from ppdet.data.transform.operators import NormalizeImage, Permute + + +if __name__ == '__main__': + # 准备基础的参数 + config_path = 'PaddleDetection/configs/face_detection/blazeface_1000e.yml' + cfg = load_config(config_path) + weight_path = 'PaddleDetection/output/blazeface_1000e.pdparams' + infer_img_path = 'PaddleDetection/demo/hrnet_demo.jpg' + cfg.weights = weight_path + bbox_thre = 0.8 + paddle.set_device('gpu') + # 创建所需的类 + trainer = Trainer(cfg, mode='test') + trainer.load_weights(cfg.weights) + trainer.model.eval() + normaler = NormalizeImage(mean=[123, 117, 104], std=[127.502231, 127.502231, 127.502231], is_scale=False) + permuter = Permute() + # 进行图片读取 + im = cv2.imread(infer_img_path) + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + # 准备数据字典 + 
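+    # build the input dict by hand: normalize/permute the image, then add the metadata fields (im_id, im_shape, scale_factor, curr_iter) that the reader would normally provide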
data_dict = {'image': im}
+    data_dict = normaler(data_dict)
+    data_dict = permuter(data_dict)
+    h, w, c = im.shape
+    data_dict['im_id'] = paddle.Tensor(np.array([[0]]))
+    data_dict['im_shape'] = paddle.Tensor(np.array([[h, w]], dtype=np.float32))
+    data_dict['scale_factor'] = paddle.Tensor(np.array([[1., 1.]], dtype=np.float32))
+    data_dict['image'] = paddle.Tensor(data_dict['image'].reshape((1, c, h, w)))
+    data_dict['curr_iter'] = paddle.Tensor(np.array([0]))
+    # 进行预测
+    outs = trainer.model(data_dict)
+    # 对预测的数据进行后处理得到最终的bbox信息
+    for key in ['im_shape', 'scale_factor', 'im_id']:
+        outs[key] = data_dict[key]
+    for key, value in outs.items():
+        outs[key] = value.numpy()
+    clsid2catid, catid2name = {0: 'face'}, {0: 0}
+    batch_res = get_infer_results(outs, clsid2catid)
+    bbox = [sub_dict for sub_dict in batch_res['bbox'] if sub_dict['score'] > bbox_thre]
+    print(bbox)
+```
+
+## Citations
+
+```
+@article{bazarevsky2019blazeface,
+  title={BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs},
+  author={Valentin Bazarevsky and Yury Kartynnik and Andrey Vakunov and Karthik Raveendran and Matthias Grundmann},
+  year={2019},
+  eprint={1907.05047},
+  archivePrefix={arXiv},
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/face_detection/README_en.md b/PaddleDetection-release-2.6/configs/face_detection/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..96bb53280ef442cbf5c0f12ee5e0cdef3bb57c33
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/face_detection/README_en.md
@@ -0,0 +1,176 @@
+# Face Detection Model
+
+## Introduction
+`face_detection` provides efficient, high-speed face detection solutions, including state-of-the-art models and classic models.
+
+![](../../docs/images/12_Group_Group_12_Group_Group_12_935.jpg)
+
+## Model Library
+
+#### mAP on the WIDER-FACE dataset
+
+| Network structure | size | images/GPUs | Learning rate strategy | Easy/Medium/Hard Set | Prediction delay(SD855)| Model size(MB) | Download | Configuration File |
+|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
+| BlazeFace | 640 | 8 | 1000e | 0.885 / 0.855 / 0.731 | - | 0.472 |[link](https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection/blazeface_1000e.yml) |
+| BlazeFace-FPN-SSH | 640 | 8 | 1000e | 0.907 / 0.883 / 0.793 | - | 0.479 |[link](https://paddledet.bj.bcebos.com/models/blazeface_fpn_ssh_1000e.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection/blazeface_fpn_ssh_1000e.yml) |
+
+**Attention:**
+- We use a multi-scale evaluation strategy to get the mAP in `Easy/Medium/Hard Set`. Please refer to the [evaluation on the WIDER FACE dataset](#Evaluated-on-the-WIDER-FACE-Dataset) for details.
+
+## Quick Start
+
+### Data preparation
+We use the [WIDER-FACE dataset](http://shuoyang1213.me/WIDERFACE/) for training and model testing; the official website provides a detailed introduction to the data.
+- WIDER-Face data source:
+Load a dataset of type `wider_face` using the following directory structure:
+  ```
+  dataset/wider_face/
+  ├── wider_face_split
+  │   ├── wider_face_train_bbx_gt.txt
+  │   ├── wider_face_val_bbx_gt.txt
+  ├── WIDER_train
+  │   ├── images
+  │   │   ├── 0--Parade
+  │   │   │   ├── 0_Parade_marchingband_1_100.jpg
+  │   │   │   ├── 0_Parade_marchingband_1_381.jpg
+  │   │   │   │   ...
+  │   │   ├── 10--People_Marching
+  │   │   │   ...
+  ├── WIDER_val
+  │   ├── images
+  │   │   ├── 0--Parade
+  │   │   │   ├── 0_Parade_marchingband_1_1004.jpg
+  │   │   │   ├── 0_Parade_marchingband_1_1045.jpg
+  │   │   │   │   ...
+  │   │   ├── 10--People_Marching
+  │   │   │   ...
+  ```
+
+- Manually download the dataset:
+To download the WIDER-FACE dataset, run the following command:
+```
+cd dataset/wider_face && ./download_wider_face.sh
+```
+
+### Parameter configuration
+The configuration of the base model can be found in `configs/face_detection/_base_/blazeface.yml`;
+the improved model adds FPN and SSH neck structures, and its configuration file is `configs/face_detection/_base_/blazeface_fpn.yml`. FPN and SSH can be configured as required:
+```yaml
+BlazeNet:
+  blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]]
+  double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96],
+                         [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]]
+  act: hard_swish #activation of BlazeBlock in the backbone; the base model uses relu, while hard_swish is needed when adding FPN and SSH
+
+BlazeNeck:
+  neck_type : fpn_ssh #options: only_fpn, only_ssh and fpn_ssh
+  in_channel: [96,96]
+```
+
+
+
+### Training and Evaluation
+The training and evaluation workflow is the same as for other algorithms; please refer to [GETTING_STARTED_cn.md](../../docs/tutorials/GETTING_STARTED_cn.md).
+**Attention:** Face detection models currently do not support evaluation during training.
+
+#### Evaluated on the WIDER-FACE Dataset
+- Step 1: Evaluate and generate a result file:
+```shell
+python -u tools/eval.py -c configs/face_detection/blazeface_1000e.yml \
+       -o weights=output/blazeface_1000e/model_final \
+       multi_scale=True
+```
+Set `multi_scale=True` for multi-scale evaluation. After evaluation, test results in TXT format will be generated in `output/pred`.
+
+- Step 2: Download the official evaluation script and Ground Truth file:
+```
+wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
+unzip eval_tools.zip && rm -f eval_tools.zip
+```
+
+- Step 3: Start the evaluation
+
+Method 1: Python evaluation:
+```
+git clone https://github.com/wondervictor/WiderFace-Evaluation.git
+cd WiderFace-Evaluation
+# compile
+python3 setup.py build_ext --inplace
+# run the evaluation
+python3 evaluation.py -p /path/to/PaddleDetection/output/pred -g /path/to/eval_tools/ground_truth
+```
+
+Method 2: MATLAB evaluation:
+```
+# In `eval_tools/wider_eval.m`, modify the result save path and the legend name of the plotted curve:
+pred_dir = './pred';
+legend_name = 'Paddle-BlazeFace';
+
+`wider_eval.m` is the main implementation of the evaluation module. Run the following command:
+matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
+```
+
+### Use from Python Code
+To support secondary development, here is an example of using the PaddleDetection whl package to run prediction from Python code.
+```python
+import cv2
+import paddle
+import numpy as np
+from ppdet.core.workspace import load_config
+from ppdet.engine import Trainer
+from ppdet.metrics import get_infer_results
+from ppdet.data.transform.operators import NormalizeImage, Permute
+
+
+if __name__ == '__main__':
+    # prepare the basic parameters
+    config_path = 'PaddleDetection/configs/face_detection/blazeface_1000e.yml'
+    cfg = load_config(config_path)
+    weight_path = 'PaddleDetection/output/blazeface_1000e.pdparams'
+    infer_img_path = 'PaddleDetection/demo/hrnet_demo.jpg'
+    cfg.weights = weight_path
+    bbox_thre = 0.8
+    paddle.set_device('gpu')
+    # create the required objects
+    trainer = Trainer(cfg, mode='test')
+    trainer.load_weights(cfg.weights)
+    trainer.model.eval()
+    normaler = NormalizeImage(mean=[123, 117, 104], std=[127.502231, 127.502231, 127.502231], is_scale=False)
+    permuter = Permute()
+    # read the image file
+    im = cv2.imread(infer_img_path)
+    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
+    # prepare the data dict
+    data_dict = {'image': im}
+    data_dict = normaler(data_dict)
+    data_dict = permuter(data_dict)
+    h, w, c = im.shape
+    data_dict['im_id'] = paddle.Tensor(np.array([[0]]))
+    data_dict['im_shape'] = paddle.Tensor(np.array([[h, w]], dtype=np.float32))
+    data_dict['scale_factor'] = paddle.Tensor(np.array([[1., 1.]], dtype=np.float32))
+    data_dict['image'] = paddle.Tensor(data_dict['image'].reshape((1, c, h, w)))
+    data_dict['curr_iter'] = paddle.Tensor(np.array([0]))
+    # run the prediction
+    outs = trainer.model(data_dict)
+    # post-process the outputs to get the final bbox info
+    for key in ['im_shape', 'scale_factor', 'im_id']:
+        outs[key] = data_dict[key]
+    for key, value in outs.items():
+        outs[key] = value.numpy()
+    clsid2catid, catid2name = {0: 'face'}, {0: 0}
+    batch_res = get_infer_results(outs, clsid2catid)
+    bbox = [sub_dict for sub_dict in batch_res['bbox'] if sub_dict['score'] > bbox_thre]
+    print(bbox)
+```
+
+
+## Citations
+
+```
+@article{bazarevsky2019blazeface,
+  title={BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs},
+  author={Valentin Bazarevsky and Yury Kartynnik and Andrey Vakunov and Karthik Raveendran and Matthias Grundmann},
+  year={2019},
+  eprint={1907.05047},
+  archivePrefix={arXiv},
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/face_detection/_base_/blazeface.yml b/PaddleDetection-release-2.6/configs/face_detection/_base_/blazeface.yml
new file mode 100644
index 0000000000000000000000000000000000000000..de54100fe63c1d0dd004c5c1797b6a6587106993
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/face_detection/_base_/blazeface.yml
@@ -0,0 +1,45 @@
+architecture: BlazeFace
+
+BlazeFace:
+  backbone: BlazeNet
+  neck: BlazeNeck
+  blaze_head: FaceHead
+  post_process: BBoxPostProcess
+
+BlazeNet:
+  blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]]
+  double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96],
+                         [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]]
+  act: relu
+
+BlazeNeck:
+  neck_type : None
+  in_channel: [96,96]
+
+FaceHead:
+  in_channels: [96,96]
+  anchor_generator: AnchorGeneratorSSD
+  loss: SSDLoss
+
+SSDLoss:
+  overlap_threshold: 0.35
+
+AnchorGeneratorSSD:
+  steps: [8., 16.]
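+  # anchors on two feature maps: stride 8 (sizes 16/24) and stride 16 (sizes 32-128)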
+ aspect_ratios: [[1.], [1.]] + min_sizes: [[16.,24.], [32., 48., 64., 80., 96., 128.]] + max_sizes: [[], []] + offset: 0.5 + flip: False + min_max_aspect_ratios_order: false + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 750 + score_threshold: 0.01 + nms_threshold: 0.3 + nms_top_k: 5000 + nms_eta: 1.0 diff --git a/PaddleDetection-release-2.6/configs/face_detection/_base_/blazeface_fpn.yml b/PaddleDetection-release-2.6/configs/face_detection/_base_/blazeface_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..6572a99d301eda65a65c485e133cc00497a2eee2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/face_detection/_base_/blazeface_fpn.yml @@ -0,0 +1,45 @@ +architecture: BlazeFace + +BlazeFace: + backbone: BlazeNet + neck: BlazeNeck + blaze_head: FaceHead + post_process: BBoxPostProcess + +BlazeNet: + blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]] + double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96], + [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]] + act: hard_swish + +BlazeNeck: + neck_type : fpn_ssh + in_channel: [96,96] + +FaceHead: + in_channels: [48, 48] + anchor_generator: AnchorGeneratorSSD + loss: SSDLoss + +SSDLoss: + overlap_threshold: 0.35 + +AnchorGeneratorSSD: + steps: [8., 16.] + aspect_ratios: [[1.], [1.]] + min_sizes: [[16.,24.], [32., 48., 64., 80., 96., 128.]] + max_sizes: [[], []] + offset: 0.5 + flip: False + min_max_aspect_ratios_order: false + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 750 + score_threshold: 0.01 + nms_threshold: 0.3 + nms_top_k: 5000 + nms_eta: 1.0 diff --git a/PaddleDetection-release-2.6/configs/face_detection/_base_/face_reader.yml b/PaddleDetection-release-2.6/configs/face_detection/_base_/face_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..5a25e8aa0f1acdd1b3b235a8c1a3923eb2af4ba6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/face_detection/_base_/face_reader.yml @@ -0,0 +1,44 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 90 + sample_transforms: + - Decode: {} + - RandomDistort: {brightness: [0.5, 1.125, 0.875], random_apply: False} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomFlip: {} + - CropWithDataAchorSampling: { + anchor_sampler: [[1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0]], + batch_sampler: [ + [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + ], + target_size: 640} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 1} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 90} + batch_transforms: + - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false} + - Permute: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false} + - Permute: {} + batch_size: 1 + + +TestReader: + sample_transforms: + - Decode: {} + - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/face_detection/_base_/optimizer_1000e.yml 
b/PaddleDetection-release-2.6/configs/face_detection/_base_/optimizer_1000e.yml new file mode 100644 index 0000000000000000000000000000000000000000..d67da4c6786e9418029b3a336c7e8e7d80e2d0bf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/face_detection/_base_/optimizer_1000e.yml @@ -0,0 +1,21 @@ +epoch: 1000 + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 333 + - 800 + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.0 + type: RMSProp + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/face_detection/blazeface_1000e.yml b/PaddleDetection-release-2.6/configs/face_detection/blazeface_1000e.yml new file mode 100644 index 0000000000000000000000000000000000000000..58fc908f81fc53c3ea3b39714826f2de5ea0fcea --- /dev/null +++ b/PaddleDetection-release-2.6/configs/face_detection/blazeface_1000e.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/wider_face.yml', + '../runtime.yml', + '_base_/optimizer_1000e.yml', + '_base_/blazeface.yml', + '_base_/face_reader.yml', +] +weights: output/blazeface_1000e/model_final +multi_scale_eval: True diff --git a/PaddleDetection-release-2.6/configs/face_detection/blazeface_fpn_ssh_1000e.yml b/PaddleDetection-release-2.6/configs/face_detection/blazeface_fpn_ssh_1000e.yml new file mode 100644 index 0000000000000000000000000000000000000000..21dbd26443856710a5674f8e93e1cc0075836a38 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/face_detection/blazeface_fpn_ssh_1000e.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/wider_face.yml', + '../runtime.yml', + '_base_/optimizer_1000e.yml', + '_base_/blazeface_fpn.yml', + '_base_/face_reader.yml', +] +weights: output/blazeface_fpn_ssh_1000e/model_final +multi_scale_eval: True diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/README.md b/PaddleDetection-release-2.6/configs/faster_rcnn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..da495599ce180b80ce019ff1828ae63c1140a7ff --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/README.md @@ -0,0 +1,38 @@ +# Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks + +## Model Zoo + +| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 | +| :------------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| ResNet50 | Faster | 1 | 1x | ---- | 36.7 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_1x_coco.pdparams) | [配置文件](./faster_rcnn_r50_1x_coco.yml) | +| ResNet50-vd | Faster | 1 | 1x | ---- | 37.6 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_1x_coco.pdparams) | [配置文件](./faster_rcnn_r50_vd_1x_coco.yml) | +| ResNet101 | Faster | 1 | 1x | ---- | 39.0 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_1x_coco.pdparams) | [配置文件](./faster_rcnn_r101_1x_coco.yml) | +| ResNet34-FPN | Faster | 1 | 1x | ---- | 37.8 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_1x_coco.pdparams) | [配置文件](./faster_rcnn_r34_fpn_1x_coco.yml) | +| ResNet34-FPN-MultiScaleTest | Faster | 1 | 1x | ---- | 38.2 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_multiscaletest_1x_coco.pdparams) | [配置文件](./faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml) | +| ResNet34-vd-FPN | Faster | 1 | 1x | ---- | 38.5 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_vd_fpn_1x_coco.pdparams) 
| [配置文件](./faster_rcnn_r34_vd_fpn_1x_coco.yml) | +| ResNet50-FPN | Faster | 1 | 1x | ---- | 38.4 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams) | [配置文件](./faster_rcnn_r50_fpn_1x_coco.yml) | +| ResNet50-FPN | Faster | 1 | 2x | ---- | 40.0 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_2x_coco.pdparams) | [配置文件](./faster_rcnn_r50_fpn_2x_coco.yml) | +| ResNet50-vd-FPN | Faster | 1 | 1x | ---- | 39.5 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_1x_coco.pdparams) | [配置文件](./faster_rcnn_r50_vd_fpn_1x_coco.yml) | +| ResNet50-vd-FPN | Faster | 1 | 2x | ---- | 40.8 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_2x_coco.pdparams) | [配置文件](./faster_rcnn_r50_vd_fpn_2x_coco.yml) | +| ResNet101-FPN | Faster | 1 | 2x | ---- | 41.4 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_fpn_2x_coco.pdparams) | [配置文件](./faster_rcnn_r101_fpn_2x_coco.yml) | +| ResNet101-vd-FPN | Faster | 1 | 1x | ---- | 42.0 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_1x_coco.pdparams) | [配置文件](./faster_rcnn_r101_vd_fpn_1x_coco.yml) | +| ResNet101-vd-FPN | Faster | 1 | 2x | ---- | 43.0 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_2x_coco.pdparams) | [配置文件](./faster_rcnn_r101_vd_fpn_2x_coco.yml) | +| ResNeXt101-vd-FPN | Faster | 1 | 1x | ---- | 43.4 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_x101_vd_64x4d_fpn_1x_coco.pdparams) | [配置文件](./faster_rcnn_x101_vd_64x4d_fpn_1x_coco.yml) | +| ResNeXt101-vd-FPN | Faster | 1 | 2x | ---- | 44.0 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_x101_vd_64x4d_fpn_2x_coco.pdparams) | [配置文件](./faster_rcnn_x101_vd_64x4d_fpn_2x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 1x | ---- | 41.4 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](./faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 2x | ---- | 42.3 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](./faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | +| Swin-Tiny-FPN | Faster | 2 | 1x | ---- | 42.6 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_swin_tiny_fpn_1x_coco.pdparams) | [配置文件](./faster_rcnn_swin_tiny_fpn_1x_coco.yml) | +| Swin-Tiny-FPN | Faster | 2 | 2x | ---- | 44.8 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_swin_tiny_fpn_2x_coco.pdparams) | [配置文件](./faster_rcnn_swin_tiny_fpn_2x_coco.yml) | +| Swin-Tiny-FPN | Faster | 2 | 3x | ---- | 45.3 | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_swin_tiny_fpn_3x_coco.pdparams) | [配置文件](./faster_rcnn_swin_tiny_fpn_3x_coco.yml) | + +## Citations +``` +@article{Ren_2017, + title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + publisher={Institute of Electrical and Electronics Engineers (IEEE)}, + author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian}, + year={2017}, + month={Jun}, +} +``` diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_fpn_reader.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_fpn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..9b9abccd63e499bfa9402f3038425470e4a6e953 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_fpn_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 
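+# TrainReader applies multi-scale resize and random flip; PadBatch pads each batch to a stride-32 multiple for FPN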
+TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_r50.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_r50.yml new file mode 100644 index 0000000000000000000000000000000000000000..fd29f5ea1a1df9e2599d3efcff344c5d3363945e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_r50.yml @@ -0,0 +1,66 @@ +architecture: FasterRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +FasterRCNN: + backbone: ResNet + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [2] + num_stages: 3 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [32, 64, 128, 256, 512] + strides: [16] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 12000 + post_nms_top_n: 2000 + topk_after_collect: False + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 6000 + post_nms_top_n: 1000 + + +BBoxHead: + head: Res5Head + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + with_pool: true + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_r50_fpn.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..38ee81def0cb528f3f67e8ed616b9589bd72de9e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_r50_fpn.yml @@ -0,0 +1,73 @@ +architecture: FasterRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +FPN: 
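+  # every FPN level outputs 256 channels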
+ out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_swin_reader.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_swin_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..e1165cd0a03fd07f41eaea2701526639010cc7e9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_swin_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResizeCrop: {resizes: [400, 500, 600], cropsizes: [[384, 600], ], prob: 0.5} + - RandomResize: {target_size: [[480, 1333], [512, 1333], [544, 1333], [576, 1333], [608, 1333], [640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 2} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: true + drop_last: true + collate_batch: false + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + inputs_def: + image_shape: [-1, 3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: 640, keep_ratio: True} + - Pad: {size: 640} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_swin_tiny_fpn.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_swin_tiny_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..6208600e324a2be5ae2a16f799d58d315dbe1692 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_rcnn_swin_tiny_fpn.yml @@ -0,0 +1,72 @@ +architecture: FasterRCNN + +FasterRCNN: + backbone: SwinTransformer + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + bbox_post_process: BBoxPostProcess + +SwinTransformer: + embed_dim: 96 + depths: [2, 2, 6, 2] + num_heads: [3, 6, 12, 24] + window_size: 7 + ape: false + drop_path_rate: 0.1 + patch_norm: true + out_indices: [0,1,2,3] + pretrained: https://paddledet.bj.bcebos.com/models/pretrained/swin_tiny_patch4_window7_224.pdparams + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + 
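+    # one anchor size per pyramid level (strides 4 to 64), three aspect ratios each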
aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_reader.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..e1c1bb6bc262e86ea69ae78919064aa2b6834311 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/faster_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..4caaa63bda15917137a9ac22b736ae83c3d04856 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/optimizer_swin_1x.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/optimizer_swin_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..5c1c6679940834f8ff3bb985bb44f6dc2f281428 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/_base_/optimizer_swin_1x.yml @@ -0,0 +1,22 @@ +epoch: 12 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + 
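+    # 10x LR decay at epochs 8 and 11, after a 1000-step linear warmup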
- !LinearWarmup + start_factor: 0.1 + steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 1.0 + optimizer: + type: AdamW + weight_decay: 0.05 + + param_groups: + - + params: ['absolute_pos_embed', 'relative_position_bias_table', 'norm'] + weight_decay: 0. diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..8876426fb6d6d4f5b89c39e050f1331520d02656 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_1x_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'faster_rcnn_r50_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_pretrained.pdparams +weights: output/faster_rcnn_r101_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + norm_type: bn + freeze_at: 0 + return_idx: [2] + num_stages: 3 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a2e5ee527b60e95b121959492dba1855337467c9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_fpn_1x_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_pretrained.pdparams +weights: output/faster_rcnn_r101_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0a07dec75890977c9d717cce4a704dad59cec237 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_fpn_2x_coco.yml @@ -0,0 +1,25 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_pretrained.pdparams +weights: output/faster_rcnn_r101_fpn_2x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..32e308b86ef9937601d29c9026dafd7650d86080 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_vd_fpn_1x_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams +weights: output/faster_rcnn_r101_vd_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_vd_fpn_2x_coco.yml 
b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_vd_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..65b8226b9ec8f493fcdb6e82e5f6f9bba903cecf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r101_vd_fpn_2x_coco.yml @@ -0,0 +1,25 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams +weights: output/faster_rcnn_r101_vd_fpn_2x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f1083528578c8ea681f4a550c6726fad31214d16 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_fpn_1x_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams +weights: output/faster_rcnn_r34_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 34 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..559d5f1fe9fdcbf42189383a69f9d1a056792cda --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml @@ -0,0 +1,22 @@ +_BASE_: [ + 'faster_rcnn_r34_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams +weights: output/faster_rcnn_r34_fpn_multiscaletest_1x_coco/model_final + +EvalReader: + sample_transforms: + - Decode: {} +# - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900], use_flip: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + +TestReader: + sample_transforms: + - Decode: {} +# - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900], use_flip: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..5cf576b6384ec25bb92fd40705cac8b6196ca793 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r34_vd_fpn_1x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_vd_pretrained.pdparams +weights: output/faster_rcnn_r34_vd_fpn_1x_coco/model_final + 
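+# override the backbone from the base config: ResNet-34, vd variant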
+ResNet: + # index 0 stands for res2 + depth: 34 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a49bde88a8e90cfd55264262d3475b18954d1bd4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/faster_rcnn_r50.yml', + '_base_/faster_reader.yml', +] +weights: output/faster_rcnn_r50_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..e7b4518957b46bb6310cc65820cb2afd75aaa8bf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/faster_rcnn_r50_fpn.yml', + '_base_/faster_fpn_reader.yml', +] +weights: output/faster_rcnn_r50_fpn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..7edaadc30250ca5dec06f2db69650291027d4fd3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] +weights: output/faster_rcnn_r50_fpn_2x_coco/model_final + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..ac0e720499ad204ad3a09785ac30ac6e6b1ef21c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_1x_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'faster_rcnn_r50_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/faster_rcnn_r50_vd_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [2] + num_stages: 3 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..6bf9d7101ee9a89b2c480e04bb3279d608c2f9e3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_1x_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/faster_rcnn_r50_vd_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git 
a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..7fc3a883574a6694b5379b21a38be7de354ee6df --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml @@ -0,0 +1,25 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/faster_rcnn_r50_vd_fpn_2x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..d71b82d8301ffd26c86d68245be890fd99e4dec0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/faster_rcnn_r50_fpn.yml', + '_base_/faster_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/faster_rcnn_r50_vd_fpn_ssld_1x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +epoch: 12 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0562354e7a3c64bf1dd96a21108868dbca70d46e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/faster_rcnn_r50_fpn.yml', + '_base_/faster_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..7bb783b6aae6e76cf88eeed750087573aa6a0060 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + 
'../runtime.yml', + '_base_/optimizer_swin_1x.yml', + '_base_/faster_rcnn_swin_tiny_fpn.yml', + '_base_/faster_rcnn_swin_reader.yml', +] +weights: output/faster_rcnn_swin_tiny_fpn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..5848c4943b4a40a5b306fb87d9aae7508f56a8c7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_2x_coco.yml @@ -0,0 +1,22 @@ +_BASE_: [ + 'faster_rcnn_swin_tiny_fpn_1x_coco.yml', +] +weights: output/faster_rcnn_swin_tiny_fpn_2x_coco/model_final + +epoch: 24 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 1.0 + optimizer: + type: AdamW + weight_decay: 0.05 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_3x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_3x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a1b68cf4703886be497d8efa6aea4b9c5d256797 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_3x_coco.yml @@ -0,0 +1,22 @@ +_BASE_: [ + 'faster_rcnn_swin_tiny_fpn_1x_coco.yml', +] +weights: output/faster_rcnn_swin_tiny_fpn_3x_coco/model_final + +epoch: 36 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [24, 33] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 1.0 + optimizer: + type: AdamW + weight_decay: 0.05 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_x101_vd_64x4d_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_x101_vd_64x4d_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..317d3741e38e5e4a4720add59e1f0792bf8c4a82 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_x101_vd_64x4d_fpn_1x_coco.yml @@ -0,0 +1,17 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt101_vd_64x4d_pretrained.pdparams +weights: output/faster_rcnn_x101_vd_64x4d_fpn_1x_coco/model_final + +ResNet: + # for ResNeXt: groups, base_width, base_channels + depth: 101 + groups: 64 + base_width: 4 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_x101_vd_64x4d_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_x101_vd_64x4d_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..939878f247b2552d6e9e4364f5c9e6443c71de31 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/faster_rcnn/faster_rcnn_x101_vd_64x4d_fpn_2x_coco.yml @@ -0,0 +1,28 @@ +_BASE_: [ + 'faster_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt101_vd_64x4d_pretrained.pdparams +weights: output/faster_rcnn_x101_vd_64x4d_fpn_2x_coco/model_final + +ResNet: + # for ResNeXt: groups, base_width, base_channels + depth: 101 + groups: 64 + base_width: 4 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 24 +LearningRate: + base_lr: 0.01 + 
schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/fcos/README.md b/PaddleDetection-release-2.6/configs/fcos/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7b3a58004de7f43264365182949a0532e8d91897 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/README.md @@ -0,0 +1,37 @@ +# FCOS (Fully Convolutional One-Stage Object Detection) + +## Model Zoo on COCO + +| Backbone | Model | Images/GPU | LR schedule | Inference time (fps) | Box AP | Download | Config | +| :------------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| ResNet50-FPN | FCOS | 2 | 1x | ---- | 39.6 | [download](https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_1x_coco.pdparams) | [config](./fcos_r50_fpn_1x_coco.yml) | +| ResNet50-FPN | FCOS + iou | 2 | 1x | ---- | 40.0 | [download](https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_iou_1x_coco.pdparams) | [config](./fcos_r50_fpn_iou_1x_coco.yml) | +| ResNet50-FPN | FCOS + DCN | 2 | 1x | ---- | 44.3 | [download](https://paddledet.bj.bcebos.com/models/fcos_dcn_r50_fpn_1x_coco.pdparams) | [config](./fcos_dcn_r50_fpn_1x_coco.yml) | +| ResNet50-FPN | FCOS + multiscale_train | 2 | 2x | ---- | 41.8 | [download](https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_multiscale_2x_coco.pdparams) | [config](./fcos_r50_fpn_multiscale_2x_coco.yml) | +| ResNet50-FPN | FCOS + multiscale_train + iou | 2 | 2x | ---- | 42.6 | [download](https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_iou_multiscale_2x_coco.pdparams) | [config](./fcos_r50_fpn_iou_multiscale_2x_coco.yml) | + +**Notes:** + - `+ iou` means that, compared with the original FCOS, `iou` rather than `centerness` is used when computing the loss. + - The FCOS-based semi-supervised detection method `DenseTeacher` can be used as described in [DenseTeacher](../semi_det/denseteacher); combining it with unlabeled data can further improve detection performance. + - PaddleDetection uses `R50-vb` pretraining by default. Using an `R50-vd` pretrained model combined with [SSLD](../../../docs/feature_models/SSLD_PRETRAINED_MODEL.md) can further improve detection accuracy significantly; the backbone part of the config also needs to be changed accordingly, e.g.: + ```yaml + pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams + ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [1, 2, 3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + ``` + +## Citations +``` +@inproceedings{tian2019fcos, + title = {{FCOS}: Fully Convolutional One-Stage Object Detection}, + author = {Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong}, + booktitle = {Proc. Int. Conf.
Computer Vision (ICCV)}, + year = {2019} +} +``` diff --git a/PaddleDetection-release-2.6/configs/fcos/_base_/fcos_r50_fpn.yml b/PaddleDetection-release-2.6/configs/fcos/_base_/fcos_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..6ace6b51f17fcd9fc2015a549039b8919f312012 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/_base_/fcos_r50_fpn.yml @@ -0,0 +1,48 @@ +architecture: FCOS +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +FCOS: + backbone: ResNet + neck: FPN + fcos_head: FCOSHead + +ResNet: + depth: 50 + variant: 'b' + norm_type: bn + freeze_at: 0 # res2 + return_idx: [1, 2, 3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: True + use_c5: False + +FCOSHead: + fcos_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: False + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + norm_reg_targets: True + centerness_on_reg: True + num_shift: 0.5 + fcos_loss: + name: FCOSLoss + loss_alpha: 0.25 + loss_gamma: 2.0 + iou_loss_type: "giou" + reg_weights: 1.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/fcos/_base_/fcos_reader.yml b/PaddleDetection-release-2.6/configs/fcos/_base_/fcos_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..8f0016125eb666ca5526ec5edd3373cf081adf6e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/_base_/fcos_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - RandomFlip: {} + batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 128} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 32, 64, 128] + norm_reg_targets: True + batch_size: 2 + shuffle: True + drop_last: True + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 128} + batch_size: 1 + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 128} + batch_size: 1 + fuse_normalize: True diff --git a/PaddleDetection-release-2.6/configs/fcos/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/fcos/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..d28b0947b9fb6567a70f11acfe6663dac89b0771 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/fcos/fcos_dcn_r50_fpn_1x_coco.yml 
b/PaddleDetection-release-2.6/configs/fcos/fcos_dcn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..93ac2c0eb0dd7860259bf38549d3aa176b00cdc8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/fcos_dcn_r50_fpn_1x_coco.yml @@ -0,0 +1,16 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/fcos_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/fcos_reader.yml', +] + +weights: output/fcos_dcn_r50_fpn_1x_coco/model_final + +ResNet: + dcn_v2_stages: [1, 2, 3] + +FCOSHead: + fcos_feat: + use_dcn: True diff --git a/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0b47d454fffa5056453356faf9e073a4a9d4ec60 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_1x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/fcos_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/fcos_reader.yml', +] + +weights: output/fcos_r50_fpn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_iou_1x_coco.yml b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_iou_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..18c33cf8e221af72ca3edb1ad355572ee456a3ae --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_iou_1x_coco.yml @@ -0,0 +1,79 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/fcos_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/fcos_reader.yml', +] + +weights: output/fcos_r50_fpn_iou_1x_coco/model_final + + +TrainReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - RandomFlip: {} + batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 32, 64, 128] + norm_reg_targets: True + batch_size: 2 + shuffle: True + drop_last: True + use_shared_memory: True + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + fuse_normalize: True + + +FCOSHead: + fcos_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: False + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + norm_reg_targets: True + centerness_on_reg: True + fcos_loss: + name: FCOSLoss + loss_alpha: 0.25 + loss_gamma: 2.0 + iou_loss_type: "giou" + reg_weights: 1.0 + quality: "iou" # default 'centerness' + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml 
b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..d53ea17f57b6c78076668489ca5246122bfe8edb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml @@ -0,0 +1,91 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/fcos_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/fcos_reader.yml', +] + +weights: output/fcos_r50_fpn_iou_multiscale_2x_coco_010/model_final + +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - RandomFlip: {} + batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 32, 64, 128] + norm_reg_targets: True + batch_size: 2 + shuffle: True + drop_last: True + use_shared_memory: True + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + fuse_normalize: True + + +epoch: 24 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + + +FCOSHead: + fcos_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: False + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + norm_reg_targets: True + centerness_on_reg: True + fcos_loss: + name: FCOSLoss + loss_alpha: 0.25 + loss_gamma: 2.0 + iou_loss_type: "giou" + reg_weights: 1.0 + quality: "iou" # default 'centerness' + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_multiscale_2x_coco.yml b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_multiscale_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0afdbbc5be62468ae073258badb0ee2773948e3c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/fcos/fcos_r50_fpn_multiscale_2x_coco.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/fcos_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/fcos_reader.yml', +] + +weights: output/fcos_r50_fpn_multiscale_2x_coco/model_final + +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - RandomFlip: {} + batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 128} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 
32, 64, 128] + norm_reg_targets: True + batch_size: 2 + shuffle: True + drop_last: True + use_shared_memory: True + +epoch: 24 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 diff --git a/PaddleDetection-release-2.6/configs/few-shot/README.md b/PaddleDetection-release-2.6/configs/few-shot/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e000a90d4355d5daf738232c67b6a49f739c7495 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/README.md @@ -0,0 +1,76 @@ +# Co-tuning for Transfer Learning & Supervised Contrastive Learning + +## Data preparation +We take the [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example of how to prepare custom data. +The [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data on Kaggle contains 877 images in 4 classes: crosswalk, speedlimit, stop, trafficlight. +It can be downloaded from Kaggle, or from this [download link](https://fsdet-dataset.bj.bcebos.com/roadsign_coco.tar.gz). +For training, simply select the same number of samples for each class from the original dataset (e.g., 10 shots means ten training samples per class); a minimal dataset config is sketched below.
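Such a few-shot split plugs into a config through its dataset block. The sketch below mirrors the `TrainDataset` block of `faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml`, added later in this diff, with the 10-shot annotation file as the only non-default piece:

```yaml
# Minimal sketch: train on the 10-shot annotation file prepared above.
TrainDataset:
  !COCODataSet
    image_dir: images
    anno_path: annotations/train_shots10.json
    dataset_dir: dataset/roadsign_coco
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
```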
+The industrial dataset is PKU-Market-PCB, which is used for defect detection on printed circuit boards (PCB) and covers 6 common PCB defect types ([download link](https://fsdet-dataset.bj.bcebos.com/pcb.tar.gz)). + + +## Model Zoo +| Backbone | Model | Images/GPU | Shots per class | Box AP | Download | Config | +| :------------------- | :------------- | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| ResNet50-vd | Faster | 1 | 10 | 60.1 | [download](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_vd_fpn_1x_coco.pdparams) | [config](./faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml) | +| PPYOLOE_crn_s | PPYOLOE | 1 | 30 | 17.8 | [download](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_s_80e_contrast_pcb.pdparams) | [config](./ppyoloe_plus_crn_s_80e_contrast_pcb.yml) | + +## Compare-cotuning +| Backbone | Model | Images/GPU | Shots per class | Cotuning | Box AP | +| :------------------- | :------------- | :-----: | :-----: | :------------: | :-----: | +| ResNet50-vd | Faster | 1 | 10 | False | 56.7 | +| ResNet50-vd | Faster | 1 | 10 | True | 60.1 | + +## Compare-contrast +| Backbone | Model | Images/GPU | Shots per class | Contrast | Box AP | +| :------------------- | :------------- | :-----: | :-----: | :------------: | :-----: | +| PPYOLOE_crn_s | PPYOLOE | 1 | 30 | False | 15.4 | +| PPYOLOE_crn_s | PPYOLOE | 1 | 30 | True | 17.8 | + +## Training & Evaluation & Inference +### 1. Training + +``` +# -c specifies which config file to use +# --eval evaluates during training; the checkpoint with the best validation result is saved + +python tools/train.py -c configs/few-shot/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml --eval +``` +### 2. Evaluation +``` +# -c specifies which config file to use +# -o sets global variables of the config file (overriding the settings in it) + +python tools/eval.py -c configs/few-shot/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml \ + -o weights=output/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign/best_model +``` + + +### 3. Inference +``` +# -c specifies which config file to use +# --infer_img specifies the path of the image to predict + +python tools/infer.py -c configs/few-shot/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml \ + --infer_img=demo/road554.png +``` + +## Citations +``` +@article{you2020co, + title={Co-tuning for transfer learning}, + author={You, Kaichao and Kou, Zhi and Long, Mingsheng and Wang, Jianmin}, + journal={Advances in Neural Information Processing Systems}, + volume={33}, + pages={17236--17246}, + year={2020} +} + +@article{khosla2020supervised, + title={Supervised contrastive learning}, + author={Khosla, Prannay and Teterwak, Piotr and Wang, Chen and Sarna, Aaron and Tian, Yonglong and Isola, Phillip and Maschinot, Aaron and Liu, Ce and Krishnan, Dilip}, + journal={Advances in Neural Information Processing Systems}, + volume={33}, + pages={18661--18673}, + year={2020} +} +``` \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_fpn_reader.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_fpn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..9b9abccd63e499bfa9402f3038425470e4a6e953 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_fpn_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: +
sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_rcnn_r50.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_rcnn_r50.yml new file mode 100644 index 0000000000000000000000000000000000000000..fd29f5ea1a1df9e2599d3efcff344c5d3363945e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_rcnn_r50.yml @@ -0,0 +1,66 @@ +architecture: FasterRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +FasterRCNN: + backbone: ResNet + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [2] + num_stages: 3 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [32, 64, 128, 256, 512] + strides: [16] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 12000 + post_nms_top_n: 2000 + topk_after_collect: False + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 6000 + post_nms_top_n: 1000 + + +BBoxHead: + head: Res5Head + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + with_pool: true + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_rcnn_r50_fpn.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_rcnn_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..38ee81def0cb528f3f67e8ed616b9589bd72de9e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_rcnn_r50_fpn.yml @@ -0,0 +1,73 @@ +architecture: FasterRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 
1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_reader.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..e1c1bb6bc262e86ea69ae78919064aa2b6834311 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/faster_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..4caaa63bda15917137a9ac22b736ae83c3d04856 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/optimizer_80e.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/optimizer_80e.yml new file mode 100644 index 0000000000000000000000000000000000000000..7a8773df15aa103f3194f56634604d84a2a084eb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/optimizer_80e.yml @@ -0,0 +1,18 @@ +epoch: 80 + +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+ epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/ppyoloe_plus_crn.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/ppyoloe_plus_crn.yml new file mode 100644 index 0000000000000000000000000000000000000000..a83f35008f4797311689ed952abef15df0c0eea7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/ppyoloe_plus_crn.yml @@ -0,0 +1,49 @@ +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +use_cot: False +ema_decay: 0.9998 +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 30 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/few-shot/_base_/ppyoloe_plus_reader.yml b/PaddleDetection-release-2.6/configs/few-shot/_base_/ppyoloe_plus_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..cd9cdeff8b9d46e41a4e6fb518339168dfd4b154 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/_base_/ppyoloe_plus_reader.yml @@ -0,0 +1,40 @@ +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/few-shot/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml b/PaddleDetection-release-2.6/configs/few-shot/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml new file mode 100644 index 0000000000000000000000000000000000000000..75fd9e3d0ccaa56fd77d8851711b8a44720df566 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign.yml @@ -0,0 +1,67 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + 
'../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/faster_rcnn_r50_fpn.yml', + '_base_/faster_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_1x_coco.pdparams +weights: output/faster_rcnn_r50_vd_fpn_1x_coco_cotuning_roadsign/model_final + +snapshot_epoch: 5 + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +use_cot: True +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + cot_classes: 80 + loss_cot: + name: COTLoss + cot_lambda: 1 + cot_scale: 1 + +num_classes: 4 +metric: COCO +map_type: integral + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train_shots10.json + dataset_dir: dataset/roadsign_coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/roadsign_valid.json + dataset_dir: dataset/roadsign_coco + +TestDataset: + !ImageFolder + anno_path: annotations/roadsign_valid.json + dataset_dir: dataset/roadsign_coco \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/few-shot/ppyoloe_plus_crn_s_80e_contrast_pcb.yml b/PaddleDetection-release-2.6/configs/few-shot/ppyoloe_plus_crn_s_80e_contrast_pcb.yml new file mode 100644 index 0000000000000000000000000000000000000000..05320089fcb1f19a690edd030f9b57b502909a38 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/few-shot/ppyoloe_plus_crn_s_80e_contrast_pcb.yml @@ -0,0 +1,81 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_80e.yml', + './_base_/ppyoloe_plus_crn.yml', + './_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_plus_crn_s_80e_contrast_pcb/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +epoch: 80 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+ epochs: 5 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEContrastHead + post_process: ~ + +PPYOLOEContrastHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5, contrast: 0.2} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + contrast_loss: + name: SupContrast + temperature: 100 + sample_num: 2048 + thresh: 0.75 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 + +num_classes: 6 +metric: COCO +map_type: integral + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: pcb_cocoanno/train_shots30.json + dataset_dir: dataset/pcb + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: pcb_cocoanno/val.json + dataset_dir: dataset/pcb + +TestDataset: + !ImageFolder + anno_path: pcb_cocoanno/val.json + dataset_dir: dataset/pcb \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/gfl/README.md b/PaddleDetection-release-2.6/configs/gfl/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b2b79cbac443e2e519660e7ab99b3d29695d5aa8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/README.md @@ -0,0 +1,40 @@ +# Generalized Focal Loss Model (GFL) + +## Introduction + +We reproduce the object detection results reported in the papers [Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection](https://arxiv.org/abs/2006.04388) and [Generalized Focal Loss V2](https://arxiv.org/pdf/2011.12885.pdf). We also use a better-performing pre-trained model and the ResNet-vd structure to improve mAP.
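As a concrete illustration of the ResNet-vd swap mentioned above, the backbone override looks like the following; this is a minimal sketch taken from the `gfl_r101vd_fpn_mstrain_2x_coco.yml` config added later in this diff, not an additional released model:

```yaml
# Sketch: replace the default ResNet50-vb backbone with ResNet101-vd.
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams

ResNet:
  depth: 101
  variant: d        # 'd' selects the ResNet-vd structure
  norm_type: bn
  freeze_at: 0
  return_idx: [1,2,3]
  num_stages: 4
```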
+ +## Model Zoo + +| Backbone | Model | batch-size/GPU | lr schedule | FPS | Box AP | download | log | config | +| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :------: | :------: | :-----: | +| ResNet50 | GFL | 2 | 1x | ---- | 41.0 | [model](https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_1x_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_gfl_r50_fpn_1x_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl/gfl_r50_fpn_1x_coco.yml) | +| ResNet50 | GFL + [CWD](../slim/README.md) | 2 | 2x | ---- | 44.0 | [model](https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_2x_coco_cwd.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_gfl_r50_fpn_2x_coco_cwd.log) | [config1](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl/gfl_r50_fpn_1x_coco.yml), [config2](../slim/distill/gfl_r101vd_fpn_coco_distill_cwd.yml) | +| ResNet101-vd | GFL | 2 | 2x | ---- | 46.8 | [model](https://paddledet.bj.bcebos.com/models/gfl_r101vd_fpn_mstrain_2x_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_gfl_r101vd_fpn_mstrain_2x_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl/gfl_r101vd_fpn_mstrain_2x_coco.yml) | +| ResNet34-vd | GFL | 2 | 1x | ---- | 40.8 | [model](https://paddledet.bj.bcebos.com/models/gfl_r34vd_1x_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_gfl_r34vd_1x_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl/gfl_r34vd_1x_coco.yml) | +| ResNet18-vd | GFL | 2 | 1x | ---- | 36.6 | [model](https://paddledet.bj.bcebos.com/models/gfl_r18vd_1x_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_gfl_r18vd_1x_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl/gfl_r18vd_1x_coco.yml) | +| ResNet18-vd | GFL + [LD](../slim/README.md) | 2 | 1x | ---- | 38.2 | [model](https://bj.bcebos.com/v1/paddledet/models/gfl_slim_ld_r18vd_1x_coco.pdparams) | [log](https://bj.bcebos.com/v1/paddledet/logs/train_gfl_slim_ld_r18vd_1x_coco.log) | [config1](./gfl_slim_ld_r18vd_1x_coco.yml), [config2](../slim/distill/gfl_ld_distill.yml) | +| ResNet50 | GFLv2 | 2 | 1x | ---- | 41.2 | [model](https://paddledet.bj.bcebos.com/models/gflv2_r50_fpn_1x_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_gflv2_r50_fpn_1x_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl/gflv2_r50_fpn_1x_coco.yml) | + + +**Notes:** + +- GFL is trained on the COCO train2017 dataset with 8 GPUs and evaluated on val2017; Box AP is reported as `mAP(IoU=0.5:0.95)`.
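The schedules above assume 8 GPUs; when reproducing on fewer GPUs, the usual convention is to scale the base learning rate linearly with the total batch size (an assumption here, not something the released configs spell out). A minimal single-GPU sketch of the 1x schedule from `_base_/optimizer_1x.yml` below:

```yaml
# Sketch: 1-GPU variant of the 8-GPU 1x schedule, base_lr scaled by 1/8.
epoch: 12

LearningRate:
  base_lr: 0.00125   # 0.01 / 8, assuming the linear LR scaling rule
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [8, 11]
  - !LinearWarmup
    start_factor: 0.001
    steps: 500
```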
+ +## Citations +``` +@article{li2020generalized, + title={Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection}, + author={Li, Xiang and Wang, Wenhai and Wu, Lijun and Chen, Shuo and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian}, + journal={arXiv preprint arXiv:2006.04388}, + year={2020} +} + +@article{li2020gflv2, + title={Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection}, + author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian}, + journal={arXiv preprint arXiv:2011.12885}, + year={2020} +} + +``` diff --git a/PaddleDetection-release-2.6/configs/gfl/_base_/gfl_r50_fpn.yml b/PaddleDetection-release-2.6/configs/gfl/_base_/gfl_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..488bec61ee11f93200edb352dc0088f83f48bba3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/_base_/gfl_r50_fpn.yml @@ -0,0 +1,51 @@ +architecture: GFL +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +GFL: + backbone: ResNet + neck: FPN + head: GFLHead + +ResNet: + depth: 50 + variant: b + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: true + use_c5: false + +GFLHead: + conv_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: false + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + reg_max: 16 + loss_class: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/gfl/_base_/gfl_reader.yml b/PaddleDetection-release-2.6/configs/gfl/_base_/gfl_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..e36395c569f32b69f040e20ccd38aff350fbf91e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/_base_/gfl_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {prob: 0.5} + - Resize: {target_size: [800, 1333], keep_ratio: true, interp: 1} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2GFLTarget: + downsample_ratios: [8, 16, 32, 64, 128] + grid_cell_scale: 8 + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: True + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false diff --git a/PaddleDetection-release-2.6/configs/gfl/_base_/gflv2_r50_fpn.yml b/PaddleDetection-release-2.6/configs/gfl/_base_/gflv2_r50_fpn.yml new file mode 100644 index 
0000000000000000000000000000000000000000..e9708d86a140ef03bf67d6abc18bbcf00dd3baa4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/_base_/gflv2_r50_fpn.yml @@ -0,0 +1,56 @@ +architecture: GFL +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +GFL: + backbone: ResNet + neck: FPN + head: GFLHead + +ResNet: + depth: 50 + variant: b + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: true + use_c5: false + +GFLHead: + conv_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: false + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + reg_max: 16 + dgqp_module: + name: DGQP + reg_topk: 4 + reg_channels: 64 + add_mean: True + loss_class: + name: QualityFocalLoss + use_sigmoid: False + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/gfl/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/gfl/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..39c54ac805031619debf9b31119afa86b3ead857 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/gfl/gfl_r101vd_fpn_mstrain_2x_coco.yml b/PaddleDetection-release-2.6/configs/gfl/gfl_r101vd_fpn_mstrain_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..04e6804b7180b8df07062f5e48dfd90f38fc45c2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/gfl_r101vd_fpn_mstrain_2x_coco.yml @@ -0,0 +1,46 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/gfl_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/gfl_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams +weights: output/gfl_r101vd_fpn_mstrain_2x_coco/model_final +find_unused_parameters: True +use_ema: true +ema_decay: 0.9998 + +ResNet: + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + +epoch: 24 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[480, 1333], [512, 1333], [544, 1333], [576, 1333], [608, 1333], [640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2GFLTarget: + downsample_ratios: [8, 16, 32, 64, 128] + grid_cell_scale: 8 diff --git a/PaddleDetection-release-2.6/configs/gfl/gfl_r18vd_1x_coco.yml b/PaddleDetection-release-2.6/configs/gfl/gfl_r18vd_1x_coco.yml new file mode 100644 index 
0000000000000000000000000000000000000000..a38c86eee8d5a0669aa6a09f2e66ff08311450e3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/gfl_r18vd_1x_coco.yml @@ -0,0 +1,19 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/gfl_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/gfl_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet18_vd_pretrained.pdparams +weights: output/gfl_r18vd_1x_coco/model_final +find_unused_parameters: True + +ResNet: + depth: 18 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/gfl/gfl_r34vd_1x_coco.yml b/PaddleDetection-release-2.6/configs/gfl/gfl_r34vd_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..1f15085556b0fab9cf5693afe406510f5d55684a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/gfl_r34vd_1x_coco.yml @@ -0,0 +1,19 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/gfl_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/gfl_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_vd_pretrained.pdparams +weights: output/gfl_r34vd_1x_coco/model_final +find_unused_parameters: True + +ResNet: + depth: 34 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/gfl/gfl_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/gfl/gfl_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..2e17b23d8b34b6094731c9f3115cc3325ce697bc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/gfl_r50_fpn_1x_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/gfl_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/gfl_reader.yml', +] + +weights: output/gfl_r50_fpn_1x_coco/model_final +find_unused_parameters: True diff --git a/PaddleDetection-release-2.6/configs/gfl/gfl_slim_ld_r18vd_1x_coco.yml b/PaddleDetection-release-2.6/configs/gfl/gfl_slim_ld_r18vd_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..4417b701d33c31be672a6e2a66ce8b19882d39fa --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/gfl_slim_ld_r18vd_1x_coco.yml @@ -0,0 +1,73 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/gfl_reader.yml', +] + +weights: output/gfl_r18vd_1x_coco/model_final +find_unused_parameters: True + +architecture: GFL +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet18_vd_pretrained.pdparams + +GFL: + backbone: ResNet + neck: FPN + head: LDGFLHead + +ResNet: + depth: 18 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: true + use_c5: false + +LDGFLHead: # new head + conv_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: false + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + reg_max: 16 + loss_class: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + loss_ld: + name: KnowledgeDistillationKLDivLoss + loss_weight: 0.25 + T: 10 + loss_ld_vlr: + name: KnowledgeDistillationKLDivLoss + loss_weight: 0.25 + T: 10 + 
loss_kd: + name: KnowledgeDistillationKLDivLoss + loss_weight: 10 + T: 2 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/gfl/gflv2_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/gfl/gflv2_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b0a3d410b19696c27f87841966fd9b51ad1088eb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gfl/gflv2_r50_fpn_1x_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/gflv2_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/gfl_reader.yml', +] + +weights: output/gflv2_r50_fpn_1x_coco/model_final +find_unused_parameters: True diff --git a/PaddleDetection-release-2.6/configs/gn/README.md b/PaddleDetection-release-2.6/configs/gn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..dc3b88db8324c3a2a9f897f8f02024e382ad6ff6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gn/README.md @@ -0,0 +1,23 @@ +# Group Normalization + +## Model Zoo + +| Backbone | Model | Images/GPU | LR schedule | Inference time (fps) | Box AP | Mask AP | Download | Config | +| :------------- | :------------- | :-----------: | :------: | :--------: |:-----: | :-----: | :----: | :----: | +| ResNet50-FPN | Faster | 1 | 2x | - | 41.9 | - | [download](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_gn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gn/faster_rcnn_r50_fpn_gn_2x_coco.yml) | +| ResNet50-FPN | Mask | 1 | 2x | - | 42.3 | 38.4 | [download](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_gn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gn/mask_rcnn_r50_fpn_gn_2x_coco.yml) | +| ResNet50-FPN | Cascade Faster | 1 | 2x | - | 44.6 | - | [download](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_fpn_gn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gn/cascade_rcnn_r50_fpn_gn_2x_coco.yml) | +| ResNet50-FPN | Cascade Mask | 1 | 2x | - | 45.0 | 39.3 | [download](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_fpn_gn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gn/cascade_mask_rcnn_r50_fpn_gn_2x_coco.yml) | + + +**Note:** The Faster R-CNN baseline uses only a `2fc` head, whereas here a [`4conv1fc` head](https://arxiv.org/abs/1803.08494) is used (with GN between the 4 conv layers) and the FPN also uses GN; for Mask R-CNN, GN is additionally used between the 4 conv layers of the mask head. + +## Citations +``` +@inproceedings{wu2018group, + title={Group Normalization}, + author={Wu, Yuxin and He, Kaiming}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + year={2018} +} +``` diff --git a/PaddleDetection-release-2.6/configs/gn/cascade_mask_rcnn_r50_fpn_gn_2x_coco.yml b/PaddleDetection-release-2.6/configs/gn/cascade_mask_rcnn_r50_fpn_gn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..e2c750dfbe481eb6875fff6df0febba69d0ab947 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gn/cascade_mask_rcnn_r50_fpn_gn_2x_coco.yml @@ -0,0 +1,61 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '../cascade_rcnn/_base_/optimizer_1x.yml', + '../cascade_rcnn/_base_/cascade_mask_rcnn_r50_fpn.yml', + '../cascade_rcnn/_base_/cascade_mask_fpn_reader.yml', +] +weights: output/cascade_mask_rcnn_r50_fpn_gn_2x_coco/model_final + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead +
bbox_head: CascadeHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + +FPN: + out_channel: 256 + norm_type: gn + +CascadeHead: + head: CascadeXConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +CascadeXConvNormHead: + num_convs: 4 + out_channel: 1024 + norm_type: gn + +MaskHead: + head: MaskFeat + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + mask_assigner: MaskAssigner + share_bbox_feat: False + +MaskFeat: + num_convs: 4 + out_channel: 256 + norm_type: gn + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/gn/cascade_rcnn_r50_fpn_gn_2x_coco.yml b/PaddleDetection-release-2.6/configs/gn/cascade_rcnn_r50_fpn_gn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..2706790ed77301739e9d1374e9292f16a0c1c090 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gn/cascade_rcnn_r50_fpn_gn_2x_coco.yml @@ -0,0 +1,37 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../cascade_rcnn/_base_/optimizer_1x.yml', + '../cascade_rcnn/_base_/cascade_rcnn_r50_fpn.yml', + '../cascade_rcnn/_base_/cascade_fpn_reader.yml', +] +weights: output/cascade_rcnn_r50_fpn_gn_2x_coco/model_final + +FPN: + out_channel: 256 + norm_type: gn + +CascadeHead: + head: CascadeXConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +CascadeXConvNormHead: + num_convs: 4 + out_channel: 1024 + norm_type: gn + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/gn/faster_rcnn_r50_fpn_gn_2x_coco.yml b/PaddleDetection-release-2.6/configs/gn/faster_rcnn_r50_fpn_gn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..200a98b4b9fb615c17b7bd42f88b3bb1b2474370 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gn/faster_rcnn_r50_fpn_gn_2x_coco.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../faster_rcnn/_base_/optimizer_1x.yml', + '../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml', + '../faster_rcnn/_base_/faster_fpn_reader.yml', +] +weights: output/faster_rcnn_r50_fpn_gn_2x_coco/model_final + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + +FPN: + out_channel: 256 + norm_type: gn + +BBoxHead: + head: XConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +XConvNormHead: + num_convs: 4 + out_channel: 1024 + norm_type: gn + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/gn/mask_rcnn_r50_fpn_gn_2x_coco.yml b/PaddleDetection-release-2.6/configs/gn/mask_rcnn_r50_fpn_gn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..70beaf5851df945745c904dc9932928d9cedac01 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/gn/mask_rcnn_r50_fpn_gn_2x_coco.yml @@ -0,0 +1,61 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + 
'../mask_rcnn/_base_/optimizer_1x.yml', + '../mask_rcnn/_base_/mask_rcnn_r50_fpn.yml', + '../mask_rcnn/_base_/mask_fpn_reader.yml', +] +weights: output/mask_rcnn_r50_fpn_gn_2x_coco/model_final + +MaskRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + +FPN: + out_channel: 256 + norm_type: gn + +BBoxHead: + head: XConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +XConvNormHead: + num_convs: 4 + out_channel: 1024 + norm_type: gn + +MaskHead: + head: MaskFeat + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + mask_assigner: MaskAssigner + share_bbox_feat: False + +MaskFeat: + num_convs: 4 + out_channel: 256 + norm_type: gn + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/hrnet/README.md b/PaddleDetection-release-2.6/configs/hrnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f96ee347779fbebc31a44829be8e65765d3c089d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/hrnet/README.md @@ -0,0 +1,34 @@ +# High-resolution networks (HRNets) for object detection + +## Introduction + +- Deep High-Resolution Representation Learning for Human Pose Estimation: [https://arxiv.org/abs/1902.09212](https://arxiv.org/abs/1902.09212) + +``` +@inproceedings{SunXLW19, + title={Deep High-Resolution Representation Learning for Human Pose Estimation}, + author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang}, + booktitle={CVPR}, + year={2019} +} +``` + +- High-Resolution Representations for Labeling Pixels and Regions: [https://arxiv.org/abs/1904.04514](https://arxiv.org/abs/1904.04514) + +``` +@article{SunZJCXLMWLW19, + title={High-Resolution Representations for Labeling Pixels and Regions}, + author={Ke Sun and Yang Zhao and Borui Jiang and Tianheng Cheng and Bin Xiao + and Dong Liu and Yadong Mu and Xinggang Wang and Wenyu Liu and Jingdong Wang}, + journal = {CoRR}, + volume = {abs/1904.04514}, + year={2019} +} +``` + +## Model Zoo + +| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | +| :---------------------- | :------------- | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | +| HRNetV2p_W18 | Faster | 1 | 1x | - | 36.8 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_hrnetv2p_w18_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x_coco.yml) | +| HRNetV2p_W18 | Faster | 1 | 2x | - | 39.0 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_hrnetv2p_w18_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x_coco.yml) | diff --git a/PaddleDetection-release-2.6/configs/hrnet/_base_/faster_rcnn_hrnetv2p_w18.yml b/PaddleDetection-release-2.6/configs/hrnet/_base_/faster_rcnn_hrnetv2p_w18.yml new file mode 100644 index 0000000000000000000000000000000000000000..6c556f306fdc2ea5bd320376236143984f4cba6a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/hrnet/_base_/faster_rcnn_hrnetv2p_w18.yml @@ -0,0 +1,68 @@ +architecture: FasterRCNN +pretrain_weights: 
https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
+
+FasterRCNN:
+  backbone: HRNet
+  neck: HRFPN
+  rpn_head: RPNHead
+  bbox_head: BBoxHead
+  # post process
+  bbox_post_process: BBoxPostProcess
+
+HRNet:
+  width: 18
+  freeze_at: 0
+  return_idx: [0, 1, 2, 3]
+
+HRFPN:
+  out_channel: 256
+  share_conv: false
+
+RPNHead:
+  anchor_generator:
+    aspect_ratios: [0.5, 1.0, 2.0]
+    anchor_sizes: [[32], [64], [128], [256], [512]]
+    strides: [4, 8, 16, 32, 64]
+  rpn_target_assign:
+    batch_size_per_im: 256
+    fg_fraction: 0.5
+    negative_overlap: 0.3
+    positive_overlap: 0.7
+    use_random: True
+  train_proposal:
+    min_size: 0.0
+    nms_thresh: 0.7
+    pre_nms_top_n: 2000
+    post_nms_top_n: 2000
+    topk_after_collect: True
+  test_proposal:
+    min_size: 0.0
+    nms_thresh: 0.7
+    pre_nms_top_n: 1000
+    post_nms_top_n: 1000
+
+BBoxHead:
+  head: TwoFCHead
+  roi_extractor:
+    resolution: 7
+    sampling_ratio: 0
+    aligned: True
+  bbox_assigner: BBoxAssigner
+
+BBoxAssigner:
+  batch_size_per_im: 512
+  bg_thresh: 0.5
+  fg_thresh: 0.5
+  fg_fraction: 0.25
+  use_random: True
+
+TwoFCHead:
+  out_channel: 1024
+
+BBoxPostProcess:
+  decode: RCNNBox
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 100
+    score_threshold: 0.05
+    nms_threshold: 0.5
diff --git a/PaddleDetection-release-2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x_coco.yml b/PaddleDetection-release-2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6ff05964c41e05b2d7aaee9bf6ef330cee2337c0
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x_coco.yml
@@ -0,0 +1,23 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  './_base_/faster_rcnn_hrnetv2p_w18.yml',
+  '../faster_rcnn/_base_/optimizer_1x.yml',
+  '../faster_rcnn/_base_/faster_fpn_reader.yml',
+  '../runtime.yml',
+]
+
+weights: output/faster_rcnn_hrnetv2p_w18_1x_coco/model_final
+epoch: 12
+
+LearningRate:
+  base_lr: 0.02
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [8, 11]
+  - !LinearWarmup
+    start_factor: 0.1
+    steps: 1000
+
+TrainReader:
+  batch_size: 2
diff --git a/PaddleDetection-release-2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x_coco.yml b/PaddleDetection-release-2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..73d9dc8850f67956c724bb033726b177aa703d37
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x_coco.yml
@@ -0,0 +1,23 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  './_base_/faster_rcnn_hrnetv2p_w18.yml',
+  '../faster_rcnn/_base_/optimizer_1x.yml',
+  '../faster_rcnn/_base_/faster_fpn_reader.yml',
+  '../runtime.yml',
+]
+
+weights: output/faster_rcnn_hrnetv2p_w18_2x_coco/model_final
+epoch: 24
+
+LearningRate:
+  base_lr: 0.02
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [16, 22]
+  - !LinearWarmup
+    start_factor: 0.1
+    steps: 1000
+
+TrainReader:
+  batch_size: 2
diff --git a/PaddleDetection-release-2.6/configs/keypoint/KeypointBenchmark.md b/PaddleDetection-release-2.6/configs/keypoint/KeypointBenchmark.md
new file mode 100644
index 0000000000000000000000000000000000000000..c7e5bd6ac090ea332c794013f7855faa0fa438c9
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/keypoint/KeypointBenchmark.md
@@ -0,0 +1,50 @@
+# Keypoint Inference Benchmark
+
+## Benchmark on Server
+We tested benchmarks in different runtime environments. See the table below for details.
+
+| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
+| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
+| LiteHRNet-18-256x192 | 88.8 ms | 40.7 ms | 4.4 ms | 2.0 ms | 1.8 ms |
+| LiteHRNet-18-384x288 | 188.0 ms | 79.3 ms | 4.8 ms | 3.6 ms | 3.2 ms |
+| LiteHRNet-30-256x192 | 148.4 ms | 69.0 ms | 7.1 ms | 3.1 ms | 2.8 ms |
+| LiteHRNet-30-384x288 | 309.8 ms | 133.5 ms | 8.2 ms | 6.0 ms | 5.3 ms |
+| PP-TinyPose-128x96 | 25.2 ms | 14.1 ms | 2.7 ms | 0.9 ms | 0.8 ms |
+| PP-TinyPose-256x192 | 82.4 ms | 36.1 ms | 3.0 ms | 1.5 ms | 1.1 ms |
+
+**Notes:**
+- The tests above are based on Python deployment.
+- The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA 10.1 / cuDNN 7 / Python 3.7 / TensorRT 6.
+- The test is based on deploy/python/det_keypoint_unite_infer.py with image demo/000000014439.jpg, and the input batch size for the keypoint model is set to 8.
+- The time only includes inference time.
+
+
+| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
+| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
+| DARK_HRNet_w32-256x192 | 363.93 ms | 97.38 ms | 4.13 ms | 3.74 ms | 1.75 ms |
+| DARK_HRNet_w32-384x288 | 823.71 ms | 218.55 ms | 9.44 ms | 8.91 ms | 2.96 ms |
+| HRNet_w32-256x192 | 363.67 ms | 97.64 ms | 4.11 ms | 3.71 ms | 1.72 ms |
+| HRNet_w32-256x256_mpii | 485.56 ms | 131.48 ms | 4.81 ms | 4.26 ms | 2.00 ms |
+| HRNet_w32-384x288 | 822.73 ms | 215.48 ms | 9.40 ms | 8.81 ms | 2.97 ms |
+| PP-TinyPose-128x96 | 24.06 ms | 13.05 ms | 2.43 ms | 0.75 ms | 0.72 ms |
+| PP-TinyPose-256x192 | 82.73 ms | 36.25 ms | 2.57 ms | 1.38 ms | 1.15 ms |
+
+
+**Notes:**
+- The tests above are based on C++ deployment.
+- The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA 10.1 / cuDNN 7 / Python 3.7 / TensorRT 6.
+- The test is based on deploy/python/det_keypoint_unite_infer.py with image demo/000000014439.jpg, and the input batch size for the keypoint model is set to 8.
+- The time only includes inference time.
+
+## Benchmark on Mobile
+We tested benchmarks on Kirin and Qualcomm Snapdragon devices. See the table below for details.
+
+| Model | Kirin 980 (1-thread) | Kirin 980 (4-threads) | Qualcomm Snapdragon 845 (1-thread) | Qualcomm Snapdragon 845 (4-threads) | Qualcomm Snapdragon 660 (1-thread) | Qualcomm Snapdragon 660 (4-threads) |
+| :------------------------ | :---: | :---: | :---: | :---: | :---: | :---: |
+| PicoDet-s-192x192 (det) | 14.85 ms | 5.45 ms | 17.50 ms | 7.56 ms | 80.08 ms | 27.36 ms |
+| PicoDet-s-320x320 (det) | 38.09 ms | 12.00 ms | 45.26 ms | 17.07 ms | 232.81 ms | 58.68 ms |
+| PP-TinyPose-128x96 (pose) | 12.03 ms | 5.09 ms | 13.14 ms | 6.73 ms | 71.87 ms | 20.04 ms |
+
+**Notes:**
+- The tests above are based on Paddle Lite deployment, version v2.10-rc.
+- The time only includes inference time.
diff --git a/PaddleDetection-release-2.6/configs/keypoint/README.md b/PaddleDetection-release-2.6/configs/keypoint/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..f81b2fabbd4a27c5bb7a56fca7abce34660af556
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/keypoint/README.md
@@ -0,0 +1,287 @@
+简体中文 | [English](README_en.md)
+
+# 关键点检测系列模型
+
+## 目录
+
+- [简介](#简介)
+- [模型推荐](#模型推荐)
+- [模型库](#模型库)
+- [快速开始](#快速开始)
+  - [环境安装](#1环境安装)
+  - [数据准备](#2数据准备)
+  - [训练与测试](#3训练与测试)
+    - [单卡训练](#单卡训练)
+    - [多卡训练](#多卡训练)
+    - [模型评估](#模型评估)
+    - [模型预测](#模型预测)
+    - [模型部署](#模型部署)
+      - [Top-Down模型联合部署](#top-down模型联合部署)
+      - [Bottom-Up模型独立部署](#bottom-up模型独立部署)
+      - [与多目标跟踪联合部署](#与多目标跟踪模型fairmot联合部署)
+    - [完整部署教程及Demo](#完整部署教程及Demo)
+- [自定义数据训练](#自定义数据训练)
+- [BenchMark](#benchmark)
+
+## 简介
+
+PaddleDetection 中的关键点检测部分紧跟最先进的算法,包括 Top-Down 和 Bottom-Up 两种方法,可以满足用户的不同需求。Top-Down 先检测对象,再在每个检测框内估计关键点,准确率更高,但速度会随着对象数量的增加而变慢;Bottom-Up 则先检测关键点,再将这些点分组或连接,组成多个人体姿态实例,其速度固定,不会随对象数量增加而变慢,但精度相对较低。
+
+同时,PaddleDetection 提供针对移动端设备优化的自研实时关键点检测模型 [PP-TinyPose](./tiny_pose/README.md)。
+
+## 模型推荐
+
+### 移动端模型推荐
+
+| 检测模型 | 关键点模型 | 输入尺寸 | COCO数据集精度 | 平均推理耗时 (FP16) | 参数量 (M) | Flops (G) | 模型权重 | Paddle-Lite部署模型(FP16) |
+| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_128x96.yml) | 检测:192x192<br>关键点:128x96 | 检测mAP:29.0<br>关键点AP:58.1 | 检测耗时:2.37ms<br>关键点耗时:3.27ms | 检测:1.18<br>关键点:1.36 | 检测:0.35<br>关键点:0.08 | [检测](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams)<br>[关键点](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [检测](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16.nb)<br>[关键点](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16.nb) |
+| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_256x192.yml) | 检测:320x320<br>关键点:256x192 | 检测mAP:38.5<br>关键点AP:68.8 | 检测耗时:6.30ms<br>关键点耗时:8.33ms | 检测:1.18<br>关键点:1.36 | 检测:0.97<br>关键点:0.32 | [检测](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams)<br>[关键点](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [检测](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16.nb)<br>[关键点](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16.nb) |
+
+*详细的PP-TinyPose使用说明请参考[文档](./tiny_pose/README.md)。
+
+### 服务端模型推荐
+
+| 检测模型 | 关键点模型 | 输入尺寸 | COCO数据集精度 | 参数量 (M) | Flops (G) | 模型权重 |
+| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
+| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_384x288.yml) | 检测:640x640<br>关键点:384x288 | 检测mAP:49.5<br>关键点AP:77.8 | 检测:54.6<br>关键点:28.6 | 检测:115.8<br>关键点:17.3 | [检测](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)<br>[关键点](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) |
+| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_256x192.yml) | 检测:640x640<br>关键点:256x192 | 检测mAP:49.5<br>关键点AP:76.9 | 检测:54.6<br>关键点:28.6 | 检测:115.8<br>关键点:7.68 | [检测](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)<br>[关键点](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) |
+
+
+## 模型库
+
+COCO数据集
+
+| 模型 | 方案 | 输入尺寸 | AP(coco val) | 模型下载 | 配置文件 |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| PETR_Res50 | One-Stage | 512 | 65.5 | [petr_res50.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/petr_resnet50_16x2_coco.pdparams) | [config](./petr/petr_resnet50_16x2_coco.yml) |
+| HigherHRNet-w32 | Bottom-Up | 512 | 67.1 | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml) |
+| HigherHRNet-w32 | Bottom-Up | 640 | 68.3 | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml) |
+| HigherHRNet-w32+SWAHR | Bottom-Up | 512 | 68.9 | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) |
+| HRNet-w32 | Top-Down | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | [config](./hrnet/hrnet_w32_256x192.yml) |
+| HRNet-w32 | Top-Down | 384x288 | 77.8 | [hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | [config](./hrnet/hrnet_w32_384x288.yml) |
+| HRNet-w32+DarkPose | Top-Down | 256x192 | 78.0 | [dark_hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | [config](./hrnet/dark_hrnet_w32_256x192.yml) |
+| HRNet-w32+DarkPose | Top-Down | 384x288 | 78.3 | [dark_hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | [config](./hrnet/dark_hrnet_w32_384x288.yml) |
+| WiderNaiveHRNet-18 | Top-Down | 256x192 | 67.6(+DARK 68.4) | [wider_naive_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/wider_naive_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml) |
+| LiteHRNet-18 | Top-Down | 256x192 | 66.5 | [lite_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_256x192_coco.yml) |
+| LiteHRNet-18 | Top-Down | 384x288 | 69.7 | [lite_hrnet_18_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_384x288_coco.yml) |
+| LiteHRNet-30 | Top-Down | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_256x192_coco.yml) |
+| LiteHRNet-30 | Top-Down | 384x288 | 72.5 | [lite_hrnet_30_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_384x288_coco.yml) |
+
+备注:Top-Down模型的AP测试结果基于GroundTruth标注框。
+
+MPII数据集
+
+| 模型 | 方案 | 输入尺寸 | PCKh(Mean) | PCKh(Mean@0.1) | 模型下载 | 配置文件 |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
+| HRNet-w32 | Top-Down | 256x256 | 90.6 | 38.5 | [hrnet_w32_256x256_mpii.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x256_mpii.pdparams) | [config](./hrnet/hrnet_w32_256x256_mpii.yml) |
+
+场景模型
+
+| 模型 | 方案 | 输入尺寸 | 精度 | 预测速度 | 模型权重 | 部署模型 | 说明 |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
+| HRNet-w32 + DarkPose | Top-Down | 256x192 | AP: 87.1(业务数据集) | 单人2.9ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | 针对摔倒场景特别优化,该模型应用于[PP-Human](../../deploy/pipeline/README.md) |
+
+我们同时推出了基于LiteHRNet(Top-Down)、针对移动端设备优化的实时关键点检测模型[PP-TinyPose](./tiny_pose/README.md),欢迎体验。
    关键点:384x288 | 检测mAP:49.5
    关键点AP:77.8 | 检测:54.6
    关键点:28.6 | 检测:115.8
    关键点:17.3 | [检测](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)
    [关键点](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | +| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_256x192.yml) | 检测:640x640
    关键点:256x192 | 检测mAP:49.5
    关键点AP:76.9 | 检测:54.6
    关键点:28.6 | 检测:115.8
    关键点:7.68 | [检测](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)
    [关键点](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | + + +## 模型库 + +COCO数据集 + +| 模型 | 方案 |输入尺寸 | AP(coco val) | 模型下载 | 配置文件 | +| :---------------- | -------- | :----------: | :----------------------------------------------------------: | ----------------------------------------------------| ------- | +| PETR_Res50 |One-Stage| 512 | 65.5 | [petr_res50.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/petr_resnet50_16x2_coco.pdparams) | [config](./petr/petr_resnet50_16x2_coco.yml) | +| HigherHRNet-w32 |Bottom-Up| 512 | 67.1 | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml) | +| HigherHRNet-w32 | Bottom-Up| 640 | 68.3 | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml) | +| HigherHRNet-w32+SWAHR |Bottom-Up| 512 | 68.9 | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) | +| HRNet-w32 | Top-Down| 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | [config](./hrnet/hrnet_w32_256x192.yml) | +| HRNet-w32 |Top-Down| 384x288 | 77.8 | [hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | [config](./hrnet/hrnet_w32_384x288.yml) | +| HRNet-w32+DarkPose |Top-Down| 256x192 | 78.0 | [dark_hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | [config](./hrnet/dark_hrnet_w32_256x192.yml) | +| HRNet-w32+DarkPose |Top-Down| 384x288 | 78.3 | [dark_hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | [config](./hrnet/dark_hrnet_w32_384x288.yml) | +| WiderNaiveHRNet-18 | Top-Down|256x192 | 67.6(+DARK 68.4) | [wider_naive_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/wider_naive_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml) | +| LiteHRNet-18 |Top-Down| 256x192 | 66.5 | [lite_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_256x192_coco.yml) | +| LiteHRNet-18 |Top-Down| 384x288 | 69.7 | [lite_hrnet_18_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_384x288_coco.yml) | +| LiteHRNet-30 | Top-Down|256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_256x192_coco.yml) | +| LiteHRNet-30 |Top-Down| 384x288 | 72.5 | [lite_hrnet_30_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_384x288_coco.yml) | + +备注: Top-Down模型测试AP结果基于GroundTruth标注框 + +MPII数据集 +| 模型 | 方案| 输入尺寸 | PCKh(Mean) | PCKh(Mean@0.1) | 模型下载 | 配置文件 | +| :---- | ---|----- | :--------: | :------------: | :----------------------------------------------------------: | -------------------------------------------- | +| HRNet-w32 | Top-Down|256x256 | 90.6 | 38.5 | 
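+
+下面给出一段极简的Python示意代码(仅用于说明原理,并非仓库的真实实现;其中`detector`与`keypoint_model`为假设的可调用对象),展示Top-Down联合推理"先检测、再逐框估计关键点"的基本流程:
+
+```python
+# 示意代码:Top-Down 联合推理的最小流程示例
+# 假设 detector(image) 返回 [x1, y1, x2, y2, score] 列表,
+# keypoint_model(crop) 对单人裁剪图返回 [K, 3] 的 (x, y, conf) 数组
+import numpy as np
+
+def topdown_pipeline(image, detector, keypoint_model, det_threshold=0.5):
+    results = []
+    for x1, y1, x2, y2, score in detector(image):
+        if score < det_threshold:
+            continue  # 过滤低置信度检测框
+        crop = image[int(y1):int(y2), int(x1):int(x2)]
+        keypoints = np.asarray(keypoint_model(crop), dtype=np.float32)
+        keypoints[:, 0] += x1  # 将关键点坐标映射回原图
+        keypoints[:, 1] += y1
+        results.append({"bbox": [x1, y1, x2, y2], "keypoints": keypoints})
+    return results
+```
+
+这也解释了简介中提到的现象:Top-Down方案的耗时与检测到的目标数量成正比。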
+
+##### Bottom-Up模型独立部署
+
+```shell
+#导出模型
+python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams
+
+#部署推理
+python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5
+```
+
+##### 与多目标跟踪模型FairMOT联合部署
+
+```shell
+#导出FairMOT跟踪模型
+python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+
+#用导出的跟踪和关键点模型Python联合预测
+python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
+```
+
+**注意:** 跟踪模型导出教程请参考[文档](../mot/README.md)。
+
+### 完整部署教程及Demo
+
+我们提供了PaddleInference(服务器端)、PaddleLite(移动端)、第三方部署(MNN、OpenVINO)支持。部署代码无需依赖训练代码,deploy文件夹下的相应目录中提供了独立完整的部署代码,详见[部署文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/README.md)。
+
+## 自定义数据训练
+
+我们以[tinypose_256x192](./tiny_pose/README.md)为例,说明使用自定义数据时需要修改的内容:
+
+#### 1、配置文件[tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml)
+
+基本的修改内容及其含义如下:
+
+```
+num_joints: &num_joints 17    #自定义数据的关键点数量
+train_height: &train_height 256    #训练图片尺寸-高度h
+train_width: &train_width 192    #训练图片尺寸-宽度w
+hmsize: &hmsize [48, 64]    #对应训练尺寸的输出尺寸,这里是输入[w,h]的1/4
+flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]    #关键点定义中左右对称的关键点,用于flip增强。若没有对称结构,在TrainReader的RandomFlipHalfBodyTransform一栏中flip_pairs后面加一行"flip: False"(注意缩进对齐)
+num_joints_half_body: 8    #半身关键点数量,用于半身增强
+prob_half_body: 0.3    #半身增强实现概率,若不需要则修改为0
+upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    #上半身对应关键点id,用于半身增强中获取上半身对应的关键点
+```
+
+上述是自定义数据时所需要的修改部分,完整的配置及含义说明可参考文件:[关键点配置文件说明](../../docs/tutorials/KeyPointConfigGuide_cn.md)。
+
+#### 2、其他代码修改(影响测试、可视化)
+
+- keypoint_utils.py中的`sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0`,表示每个关键点的确定范围方差,请根据实际关键点的可信区域设置:区域精确的(如眼睛)一般取0.25-0.5,区域范围大的(如肩膀)一般取0.5-1.0,若不确定建议取0.75(sigmas在评估中的作用可参考下方示意代码)。
+- visualizer.py中draw_pose函数中的EDGES,表示可视化时关键点之间的连接线关系。
+- pycocotools工具中的sigmas,同keypoint_utils.py中的设置,用于coco指标评估时的计算。
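+
+下面是一段简化的Python示意代码(仅为说明原理,并非pycocotools的完整实现),展示sigmas在OKS(Object Keypoint Similarity)计算中的作用:sigmas越大,对应关键点允许的偏差范围越大。
+
+```python
+# 示意代码:用 sigmas 计算单个目标的 OKS(简化版)
+import numpy as np
+
+sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
+                   .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0
+
+def oks(gt_kpts, pred_kpts, area):
+    """gt_kpts/pred_kpts: [K, 3] 数组,第3列为可见性;area: 目标面积"""
+    variances = (sigmas * 2) ** 2
+    d2 = np.sum((gt_kpts[:, :2] - pred_kpts[:, :2]) ** 2, axis=1)
+    e = d2 / variances / (area + np.spacing(1)) / 2
+    visible = gt_kpts[:, 2] > 0
+    return float(np.mean(np.exp(-e[visible]))) if visible.any() else 0.0
+```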
+
+#### 3、数据准备注意
+
+- 训练数据请按coco数据格式处理,需要包括关键点[Nx3]、检测框[N]标注。
+- 请注意area>0,area=0的数据在训练时会被过滤掉。此外,由于COCO的评估机制,area较小的数据在评估时也会被过滤掉,我们建议在自定义数据时取`area = bbox_w * bbox_h`。
+
+如有遗漏,欢迎反馈。
+
+
+## 关键点稳定策略(仅适用于视频数据)
+
+使用关键点算法处理视频数据时,由于预测针对单帧图像进行,在视频结果上往往会有抖动的现象,在一些依靠精细化坐标的应用场景(例如健身计数、基于关键点的虚拟渲染等)中容易造成误检或体验不佳的问题。针对这个问题,PaddleDetection关键点视频推理中加入了[OneEuro滤波器](http://www.lifl.fr/~casiez/publications/CHI2012-casiez.pdf)和EMA两种关键点稳定方式,将当前帧与历史帧的关键点坐标结合计算,使输出的点坐标更加稳定平滑。该功能在Python及C++推理中均可一键开启使用。
+
+```bash
+# 使用Python推理
+python deploy/python/det_keypoint_unite_infer.py \
+  --det_model_dir output_inference/picodet_s_320 \
+  --keypoint_model_dir output_inference/tinypose_256x192 \
+  --video_file test_video.mp4 --device gpu --smooth True
+
+# 使用CPP推理
+./deploy/cpp/build/main --det_model_dir output_inference/picodet_s_320 \
+  --keypoint_model_dir output_inference/tinypose_256x192 \
+  --video_file test_video.mp4 --device gpu --smooth True
+```
+
+效果如下:
+
+![](https://user-images.githubusercontent.com/15810355/181733125-3710bacc-2080-47e4-b397-3621a2f0caae.gif)
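+
+为便于理解,下面给出EMA平滑思路的最小Python示意代码(仅为原理示意,并非仓库的真实实现;实际使用通过上面的`--smooth True`开启,且还提供OneEuro滤波方式):
+
+```python
+# 示意代码:用指数滑动平均(EMA)平滑逐帧关键点坐标
+import numpy as np
+
+class KeypointEMASmoother:
+    def __init__(self, alpha=0.6):
+        self.alpha = alpha  # 越大越信任当前帧,越小输出越平滑
+        self.prev = None    # 上一帧平滑后的关键点 [K, 2]
+
+    def smooth(self, kpts):
+        kpts = np.asarray(kpts, dtype=np.float32)
+        if self.prev is None:
+            self.prev = kpts
+        else:
+            self.prev = self.alpha * kpts + (1 - self.alpha) * self.prev
+        return self.prev
+```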
+
+## BenchMark
+
+我们给出了不同运行环境下的测试结果,供您在选用模型时参考。详细数据请见[Keypoint Inference Benchmark](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/keypoint/KeypointBenchmark.md)。
+
+## 引用
+
+```
+@inproceedings{cheng2020bottom,
+  title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
+  author={Bowen Cheng and Bin Xiao and Jingdong Wang and Honghui Shi and Thomas S. Huang and Lei Zhang},
+  booktitle={CVPR},
+  year={2020}
+}
+
+@inproceedings{SunXLW19,
+  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
+  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
+  booktitle={CVPR},
+  year={2019}
+}
+
+@article{wang2019deep,
+  title={Deep High-Resolution Representation Learning for Visual Recognition},
+  author={Wang, Jingdong and Sun, Ke and Cheng, Tianheng and Jiang, Borui and Deng, Chaorui and Zhao, Yang and Liu, Dong and Mu, Yadong and Tan, Mingkui and Wang, Xinggang and Liu, Wenyu and Xiao, Bin},
+  journal={TPAMI},
+  year={2019}
+}
+
+@InProceedings{Zhang_2020_CVPR,
+  author = {Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
+  title = {Distribution-Aware Coordinate Representation for Human Pose Estimation},
+  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  month = {June},
+  year = {2020}
+}
+
+@inproceedings{Yulitehrnet21,
+  title={Lite-HRNet: A Lightweight High-Resolution Network},
+  author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
+  booktitle={CVPR},
+  year={2021}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/keypoint/README_en.md b/PaddleDetection-release-2.6/configs/keypoint/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..64ffc39d61c63a3893b079e255facaec3620aeb6
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/keypoint/README_en.md
@@ -0,0 +1,269 @@
+[简体中文](README.md) | English
+
+# KeyPoint Detection Models
+
+## Content
+
+- [Introduction](#introduction)
+- [Model Recommendation](#model-recommendation)
+- [Model Zoo](#model-zoo)
+- [Getting Start](#getting-start)
+  - [Environmental Installation](#1environmental-installation)
+  - [Dataset Preparation](#2dataset-preparation)
+  - [Training and Testing](#3training-and-testing)
+    - [Training on single GPU](#training-on-single-gpu)
+    - [Training on multiple GPU](#training-on-multiple-gpu)
+    - [Evaluation](#evaluation)
+    - [Inference](#inference)
+    - [Deploy Inference](#deploy-inference)
+      - [Deployment for Top-Down models](#deployment-for-top-down-models)
+      - [Deployment for Bottom-Up models](#deployment-for-bottom-up-models)
+      - [Joint Inference with Multi-Object Tracking Model FairMOT](#joint-inference-with-multi-object-tracking-model-fairmot)
+    - [Complete Deploy Instruction and Demo](#complete-deploy-instruction-and-demo)
+- [Train with custom data](#train-with-custom-data)
+- [BenchMark](#benchmark)
+
+## Introduction
+
+The keypoint detection part of PaddleDetection follows the state of the art closely, covering both Top-Down and Bottom-Up methods to satisfy different needs. Top-Down detects the objects first and then estimates keypoints within each box; it is more accurate, but slows down as the number of objects increases. Bottom-Up, in contrast, detects keypoints first and then groups or connects them to form pose instances; its speed is fixed and does not degrade as the number of objects grows, but it is less accurate.
+
+At the same time, PaddleDetection provides a self-developed real-time keypoint detection model [PP-TinyPose](./tiny_pose/README_en.md) optimized for mobile devices.
+
+## Model Recommendation
+
+### Mobile
+
+| Detection Model | Keypoint Model | Input Size | Accuracy on COCO | Average Inference Time (FP16) | Params (M) | Flops (G) | Model Weights | Paddle-Lite Inference Model (FP16) |
+| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_128x96.yml) | Detection: 192x192<br>Keypoint: 128x96 | Detection mAP: 29.0<br>Keypoint AP: 58.1 | Detection: 2.37ms<br>Keypoint: 3.27ms | Detection: 1.18<br>Keypoint: 1.36 | Detection: 0.35<br>Keypoint: 0.08 | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams)<br>[Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16.nb)<br>[Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16.nb) |
+| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_256x192.yml) | Detection: 320x320<br>Keypoint: 256x192 | Detection mAP: 38.5<br>Keypoint AP: 68.8 | Detection: 6.30ms<br>Keypoint: 8.33ms | Detection: 1.18<br>Keypoint: 1.36 | Detection: 0.97<br>Keypoint: 0.32 | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams)<br>[Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16.nb)<br>[Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16.nb) |
+
+*For detailed usage of PP-TinyPose, please refer to this [document](./tiny_pose/README.md).
+
+### Server
+
+| Detection Model | Keypoint Model | Input Size | Accuracy on COCO | Params (M) | Flops (G) | Model Weights |
+| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
+| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_384x288.yml) | Detection: 640x640<br>Keypoint: 384x288 | Detection mAP: 49.5<br>Keypoint AP: 77.8 | Detection: 54.6<br>Keypoint: 28.6 | Detection: 115.8<br>Keypoint: 17.3 | [Detection](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)<br>[Keypoint](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) |
+| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_256x192.yml) | Detection: 640x640<br>Keypoint: 256x192 | Detection mAP: 49.5<br>Keypoint AP: 76.9 | Detection: 54.6<br>Keypoint: 28.6 | Detection: 115.8<br>Keypoint: 7.68 | [Detection](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)<br>[Keypoint](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) |
+
+
+## Model Zoo
+
+COCO Dataset
+
+| Model | Strategy | Input Size | AP (coco val) | Model Download | Config File |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| PETR_Res50 | One-Stage | 512 | 65.5 | [petr_res50.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/petr_resnet50_16x2_coco.pdparams) | [config](./petr/petr_resnet50_16x2_coco.yml) |
+| HigherHRNet-w32 | Bottom-Up | 512 | 67.1 | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml) |
+| HigherHRNet-w32 | Bottom-Up | 640 | 68.3 | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml) |
+| HigherHRNet-w32+SWAHR | Bottom-Up | 512 | 68.9 | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) |
+| HRNet-w32 | Top-Down | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | [config](./hrnet/hrnet_w32_256x192.yml) |
+| HRNet-w32 | Top-Down | 384x288 | 77.8 | [hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | [config](./hrnet/hrnet_w32_384x288.yml) |
+| HRNet-w32+DarkPose | Top-Down | 256x192 | 78.0 | [dark_hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | [config](./hrnet/dark_hrnet_w32_256x192.yml) |
+| HRNet-w32+DarkPose | Top-Down | 384x288 | 78.3 | [dark_hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | [config](./hrnet/dark_hrnet_w32_384x288.yml) |
+| WiderNaiveHRNet-18 | Top-Down | 256x192 | 67.6(+DARK 68.4) | [wider_naive_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/wider_naive_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml) |
+| LiteHRNet-18 | Top-Down | 256x192 | 66.5 | [lite_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_256x192_coco.yml) |
+| LiteHRNet-18 | Top-Down | 384x288 | 69.7 | [lite_hrnet_18_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_384x288_coco.yml) |
+| LiteHRNet-30 | Top-Down | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_256x192_coco.yml) |
+| LiteHRNet-30 | Top-Down | 384x288 | 72.5 | [lite_hrnet_30_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_384x288_coco.yml) |
+
+Note: The AP results of Top-Down models are based on GroundTruth bounding boxes.
+
+MPII Dataset
+
+| Model | Input Size | PCKh(Mean) | PCKh(Mean@0.1) | Model Download | Config File |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| HRNet-w32 | 256x256 | 90.6 | 38.5 | [hrnet_w32_256x256_mpii.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x256_mpii.pdparams) | [config](./hrnet/hrnet_w32_256x256_mpii.yml) |
+
+
+Model for Scenes
+
+| Model | Strategy | Input Size | Precision | Inference Speed | Model Weights | Model Inference and Deployment | Description |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
+| HRNet-w32 + DarkPose | Top-Down | 256x192 | AP: 87.1 (on internal dataset) | 2.9 ms per person | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | Especially optimized for fall scenarios; the model is applied in [PP-Human](../../deploy/pipeline/README.md) |
+
+
+We also release [PP-TinyPose](./tiny_pose/README_en.md), a real-time keypoint detection model optimized for mobile devices. Welcome to try it out.
+
+## Getting Start
+
+### 1.Environmental Installation
+
+Please refer to the [PaddleDetection Installation Guide](../../docs/tutorials/INSTALL.md) to install PaddlePaddle and PaddleDetection correctly.
+
+### 2.Dataset Preparation
+
+Currently, the keypoint detection models support the [COCO](https://cocodataset.org/#keypoints-2017) and [MPII](http://human-pose.mpi-inf.mpg.de/#overview) datasets. Please refer to [Keypoint Dataset Preparation](../../docs/tutorials/data/PrepareKeypointDataSet_en.md) to prepare them.
+
+About the description of config files, please refer to the [Keypoint Config Guide](../../docs/tutorials/KeyPointConfigGuide_en.md).
+
+- Note that, when testing with detected bounding boxes in the Top-Down method, you should first get `bbox.json` from a detection model. You can download the detection results for COCO val2017 [(a detector with human AP of 56.4 on COCO val2017)](https://paddledet.bj.bcebos.com/data/bbox.json) directly, put it at the root path (`PaddleDetection/`), and set `use_gt_bbox: False` in the config file.
+
+### 3.Training and Testing
+
+#### Training on single GPU
+
+```shell
+#COCO DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml
+
+#MPII DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml
+```
+
+#### Training on multiple GPU
+
+```shell
+#COCO DataSet
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml
+
+#MPII DataSet
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml
+```
+
+#### Evaluation
+
+```shell
+#COCO DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml
+
+#MPII DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml
+
+#If you only need the prediction results, you can set --save_prediction_only. The results will be saved at output/keypoints_results.json by default.
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --save_prediction_only
+```
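+
+As a minimal sketch (not part of the repository), the prediction file saved by --save_prediction_only can be inspected like this; it assumes the standard COCO keypoint result format (image_id / category_id / keypoints / score), where keypoints is a flat list with 3 values per joint:
+
+```python
+# Sketch: load and inspect output/keypoints_results.json
+import json
+
+with open("output/keypoints_results.json") as f:
+    results = json.load(f)
+
+first = results[0]
+kpts = first["keypoints"]  # flat list: x1, y1, s1, x2, y2, s2, ...
+joints = [kpts[i:i + 3] for i in range(0, len(kpts), 3)]
+print(first["image_id"], first["score"], len(joints), "joints")
+```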
+
+#### Inference
+
+Note: Top-Down models only support inference on a cropped image containing a single person. If you want to run inference on an image with several people, please use the joint inference of detection and keypoint models, or choose a Bottom-Up model.
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=./output/higherhrnet_hrnet_w32_512/model_final.pdparams --infer_dir=../images/ --draw_threshold=0.5 --save_txt=True
+```
+
+#### Deploy Inference
+
+##### Deployment for Top-Down models
+
+```shell
+#Export Detection Model
+python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams
+
+#Export Keypoint Model
+python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
+
+#Joint deployment of detector and keypoint model, which is only for Top-Down models; the model dirs match the exported models above
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolov2_r50vd_dcn_365e_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
+```
+
+##### Deployment for Bottom-Up models
+
+```shell
+#Export model
+python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams
+
+#Standalone keypoint deployment, which is only for Bottom-Up models
+python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5
+```
+
+##### Joint Inference with Multi-Object Tracking Model FairMOT
+
+```shell
+#Export FairMOT model
+python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+
+#Joint inference with the multi-object tracking model FairMOT
+python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
+```
+
+**Note:** To export a MOT model, please refer to [Here](../../configs/mot/README_en.md).
+
+### Complete Deploy Instruction and Demo
+
+We provide standalone deployment via Paddle Inference (server GPU), Paddle Lite (mobile, ARM), and third-party engines (MNN, OpenVINO), all independent of the training code. The corresponding folders under deploy contain complete deployment code. For details, please see the [deploy docs](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/README_en.md).
+
+## Train with custom data
+
+We take [tinypose_256x192](./tiny_pose/README_en.md) as an example to show how to train with custom data.
+
+#### 1. For configs [tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml)
+
+You may need to modify these for your job:
+
+```
+num_joints: &num_joints 17    #the number of joints in your job
+train_height: &train_height 256    #the height of model input
+train_width: &train_width 192    #the width of model input
+hmsize: &hmsize [48, 64]    #the shape of model output, usually 1/4 of [w, h]
+flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]    #the correspondence between left and right keypoint ids, used for the flip transform; if you don't need it, add a line "flip: False" behind flip_pairs in RandomFlipHalfBodyTransform of TrainReader
+num_joints_half_body: 8    #the number of joints of the half body, used for the half-body transform
+prob_half_body: 0.3    #the probability of the half-body transform, set to 0 if you don't need it
+upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    #the joint ids of the upper body, used to get the upper joints in the half-body transform
+```
+
+For more configs, please refer to the [KeyPointConfigGuide](../../docs/tutorials/KeyPointConfigGuide_en.md).
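+
+To illustrate what `flip_perm` encodes, here is a minimal Python sketch (an illustration only, not the actual reader implementation): when an image is mirrored horizontally, the x coordinates are mirrored, and the paired left/right joints must swap indices.
+
+```python
+# Sketch: apply a horizontal flip to keypoints using flip_perm pairs
+import numpy as np
+
+flip_perm = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10],
+             [11, 12], [13, 14], [15, 16]]  # pairs from the config above
+
+def flip_keypoints(kpts, image_width):
+    """kpts: [K, 3] array of (x, y, visibility)."""
+    kpts = np.asarray(kpts, dtype=np.float32).copy()
+    kpts[:, 0] = image_width - 1 - kpts[:, 0]      # mirror x coordinates
+    for left, right in flip_perm:
+        kpts[[left, right]] = kpts[[right, left]]  # swap paired joints
+    return kpts
+```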
+
+#### 2. Others (used for test and visualization)
+
+- In keypoint_utils.py, `sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0` indicates the variance of each joint location. Set it according to how precisely each keypoint can be localized: 0.25-0.5 for precise regions such as the eyes, 0.5-1.0 for large regions such as the shoulders, and 0.75 if you are not sure.
+- In visualizer.py, `EDGES` in the draw_pose function indicates the lines drawn between joints for visualization.
+- In the installed pycocotools, set `sigmas` to the same values as in keypoint_utils.py; they are used for COCO evaluation.
+
+#### 3. Note for data preparation
+
+- The data should have the same format as COCO data, with keypoints (Nx3) and bounding boxes (N) annotated.
+- Please make sure `area > 0` in the annotation files, otherwise the data will be skipped during training. Moreover, due to the evaluation mechanism of COCO, data with a small area may also be filtered out during evaluation. We recommend setting `area = bbox_w * bbox_h` when customizing your dataset.
+
+
+## BenchMark
+
+We provide benchmarks in different runtime environments for your reference when choosing models. See the [Keypoint Inference Benchmark](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/keypoint/KeypointBenchmark.md) for details.
+
+## Reference
+
+```
+@inproceedings{cheng2020bottom,
+  title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
+  author={Bowen Cheng and Bin Xiao and Jingdong Wang and Honghui Shi and Thomas S.
Huang and Lei Zhang}, + booktitle={CVPR}, + year={2020} +} + +@inproceedings{SunXLW19, + title={Deep High-Resolution Representation Learning for Human Pose Estimation}, + author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang}, + booktitle={CVPR}, + year={2019} +} + +@article{wang2019deep, + title={Deep High-Resolution Representation Learning for Visual Recognition}, + author={Wang, Jingdong and Sun, Ke and Cheng, Tianheng and Jiang, Borui and Deng, Chaorui and Zhao, Yang and Liu, Dong and Mu, Yadong and Tan, Mingkui and Wang, Xinggang and Liu, Wenyu and Xiao, Bin}, + journal={TPAMI}, + year={2019} +} + +@InProceedings{Zhang_2020_CVPR, + author = {Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce}, + title = {Distribution-Aware Coordinate Representation for Human Pose Estimation}, + booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + month = {June}, + year = {2020} +} + +@inproceedings{Yulitehrnet21, + title={Lite-HRNet: A Lightweight High-Resolution Network}, + author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong}, + booktitle={CVPR}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml b/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml new file mode 100644 index 0000000000000000000000000000000000000000..5dedfb32bb1bad8d130306db8ce55143055049c9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml @@ -0,0 +1,139 @@ +use_gpu: true +log_iter: 10 +save_dir: output +snapshot_epoch: 10 +weights: output/higherhrnet_hrnet_w32_512/model_final +epoch: 300 +num_joints: &num_joints 17 +flip_perm: &flip_perm [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] +input_size: &input_size 512 +hm_size: &hm_size 128 +hm_size_2x: &hm_size_2x 256 +max_people: &max_people 30 +metric: COCO +IouType: keypoints +num_classes: 1 + + +#####model +architecture: HigherHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +HigherHRNet: + backbone: HRNet + hrhrnet_head: HrHRNetHead + post_process: HrHRNetPostProcess + flip_perm: *flip_perm + eval_flip: true + +HRNet: + width: &width 32 + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +HrHRNetHead: + num_joints: *num_joints + width: *width + loss: HrHRNetLoss + swahr: false + +HrHRNetLoss: + num_joints: *num_joints + swahr: false + + +#####optimizer +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + milestones: [200, 260] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: None + +#####data +TrainDataset: + !KeypointBottomUpCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + return_bbox: False + return_area: False + return_class: False + +EvalDataset: + !KeypointBottomUpCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + test_mode: true + return_bbox: False + return_area: False + return_class: False + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 8 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomAffine: + 
max_degree: 30 + scale: [0.75, 1.5] + max_shift: 0.2 + trainsize: [*input_size, *input_size] + hmsize: [*hm_size, *hm_size_2x] + - KeyPointFlip: + flip_prob: 0.5 + flip_permutation: *flip_perm + hmsize: [*hm_size, *hm_size_2x] + - ToHeatmaps: + num_joints: *num_joints + hmsize: [*hm_size, *hm_size_2x] + sigma: 2 + - TagGenerate: + num_joints: *num_joints + max_people: *max_people + - NormalizePermute: + mean: *global_mean + std: *global_std + batch_size: 20 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - EvalAffine: + size: *input_size + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + +TestReader: + sample_transforms: + - Decode: {} + - EvalAffine: + size: *input_size + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml b/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml new file mode 100644 index 0000000000000000000000000000000000000000..7b0f7560a0c192dda869c2546c2c7c3dd7baa79d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml @@ -0,0 +1,140 @@ +use_gpu: true +log_iter: 10 +save_dir: output +snapshot_epoch: 10 +weights: output/higherhrnet_hrnet_w32_512_swahr/model_final +epoch: 300 +num_joints: &num_joints 17 +flip_perm: &flip_perm [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] +input_size: &input_size 512 +hm_size: &hm_size 128 +hm_size_2x: &hm_size_2x 256 +max_people: &max_people 30 +metric: COCO +IouType: keypoints +num_classes: 1 + + +#####model +architecture: HigherHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +HigherHRNet: + backbone: HRNet + hrhrnet_head: HrHRNetHead + post_process: HrHRNetPostProcess + flip_perm: *flip_perm + eval_flip: true + +HRNet: + width: &width 32 + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +HrHRNetHead: + num_joints: *num_joints + width: *width + loss: HrHRNetLoss + swahr: true + +HrHRNetLoss: + num_joints: *num_joints + swahr: true + + +#####optimizer +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + milestones: [200, 260] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: None + + +#####data +TrainDataset: + !KeypointBottomUpCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + return_bbox: False + return_area: False + return_class: False + +EvalDataset: + !KeypointBottomUpCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + test_mode: true + return_bbox: False + return_area: False + return_class: False + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 8 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomAffine: + max_degree: 30 + scale: [0.75, 1.5] + max_shift: 0.2 + trainsize: [*input_size, *input_size] + hmsize: [*hm_size, *hm_size_2x] + - KeyPointFlip: + flip_prob: 0.5 + flip_permutation: *flip_perm + hmsize: [*hm_size, *hm_size_2x] + - ToHeatmaps: + num_joints: *num_joints + 
hmsize: [*hm_size, *hm_size_2x] + sigma: 2 + - TagGenerate: + num_joints: *num_joints + max_people: *max_people + - NormalizePermute: + mean: *global_mean + std: *global_std + batch_size: 16 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - EvalAffine: + size: *input_size + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + +TestReader: + sample_transforms: + - Decode: {} + - EvalAffine: + size: *input_size + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_640.yml b/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..edd66e55d5295abc23466a355dad18714afa6e15 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_640.yml @@ -0,0 +1,139 @@ +use_gpu: true +log_iter: 10 +save_dir: output +snapshot_epoch: 10 +weights: output/higherhrnet_hrnet_w32_640/model_final +epoch: 300 +num_joints: &num_joints 17 +flip_perm: &flip_perm [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] +input_size: &input_size 640 +hm_size: &hm_size 160 +hm_size_2x: &hm_size_2x 320 +max_people: &max_people 30 +metric: COCO +IouType: keypoints +num_classes: 1 + + +#####model +architecture: HigherHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +HigherHRNet: + backbone: HRNet + hrhrnet_head: HrHRNetHead + post_process: HrHRNetPostProcess + flip_perm: *flip_perm + eval_flip: true + +HRNet: + width: &width 32 + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +HrHRNetHead: + num_joints: *num_joints + width: *width + loss: HrHRNetLoss + swahr: false + +HrHRNetLoss: + num_joints: *num_joints + swahr: false + + +#####optimizer +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + milestones: [200, 260] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: None + +#####data +TrainDataset: + !KeypointBottomUpCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + return_bbox: False + return_area: False + return_class: False + +EvalDataset: + !KeypointBottomUpCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + test_mode: true + return_bbox: False + return_area: False + return_class: False + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 8 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomAffine: + max_degree: 30 + scale: [0.75, 1.5] + max_shift: 0.2 + trainsize: [*input_size, *input_size] + hmsize: [*hm_size, *hm_size_2x] + - KeyPointFlip: + flip_prob: 0.5 + flip_permutation: *flip_perm + hmsize: [*hm_size, *hm_size_2x] + - ToHeatmaps: + num_joints: *num_joints + hmsize: [*hm_size, *hm_size_2x] + sigma: 2 + - TagGenerate: + num_joints: *num_joints + max_people: *max_people + - NormalizePermute: + mean: *global_mean + std: *global_std + batch_size: 20 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - 
EvalAffine: + size: *input_size + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + +TestReader: + sample_transforms: + - Decode: {} + - EvalAffine: + size: *input_size + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml b/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml new file mode 100644 index 0000000000000000000000000000000000000000..a759c121a1e891e510f802cfbf53962c98a368be --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml @@ -0,0 +1,141 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_256x192/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +TopDownHRNet: + backbone: HRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 32 + loss: KeyPointMSELoss + +HRNet: + width: *width + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + + +#####optimizer +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + bbox_file: bbox.json + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.5 + rot: 40 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown_DARK: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - 
NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w32_384x288.yml b/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w32_384x288.yml new file mode 100644 index 0000000000000000000000000000000000000000..6eaa0ec0ba17e25cd3787e082963c3c863388eeb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w32_384x288.yml @@ -0,0 +1,145 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_384x288/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 384 +train_width: &train_width 288 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [72, 96] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +TopDownHRNet: + backbone: HRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 32 + loss: KeyPointMSELoss + flip: true + +HRNet: + width: *width + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + + +#####optimizer +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + bbox_file: bbox.json + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.5 + rot: 40 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown_DARK: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 32 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown_DARK: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git 
a/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w48_256x192.yml b/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w48_256x192.yml new file mode 100644 index 0000000000000000000000000000000000000000..1417e03d22e96aea50833eab7d2a522f192ebfee --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/hrnet/dark_hrnet_w48_256x192.yml @@ -0,0 +1,141 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w48_256x192/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W48_C_pretrained.pdparams + +TopDownHRNet: + backbone: HRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 48 + loss: KeyPointMSELoss + +HRNet: + width: *width + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + + +#####optimizer +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + bbox_file: bbox.json + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.5 + rot: 40 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown_DARK: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_256x192.yml b/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_256x192.yml new file mode 100644 index 
0000000000000000000000000000000000000000..d80d97264082aa1de2af906042a656252f0cd356 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_256x192.yml @@ -0,0 +1,142 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_256x192/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +TopDownHRNet: + backbone: HRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 32 + loss: KeyPointMSELoss + +HRNet: + width: *width + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + + +#####optimizer +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + bbox_file: bbox.json + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.5 + rot: 40 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + fuse_normalize: false #whether to fuse normalize layer into model while export model diff --git a/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml b/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml new file mode 100644 index 0000000000000000000000000000000000000000..09e860989dea37e479a5a3217994dd948c405627 --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml @@ -0,0 +1,132 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_256x256_mpii/model_final +epoch: 210 +num_joints: &num_joints 16 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownMPIIEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 256 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [64, 64] +flip_perm: &flip_perm [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]] + +#####model +architecture: TopDownHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +TopDownHRNet: + backbone: HRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 32 + loss: KeyPointMSELoss + +HRNet: + width: *width + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + + +#####optimizer +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownMPIIDataset + image_dir: images + anno_path: annotations/mpii_train.json + dataset_dir: dataset/mpii + num_joints: *num_joints + + +EvalDataset: + !KeypointTopDownMPIIDataset + image_dir: images + anno_path: annotations/mpii_val.json + dataset_dir: dataset/mpii + num_joints: *num_joints + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 4 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.5 + rot: 40 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [7, 8, 9, 10, 11, 12, 13, 14, 15] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_384x288.yml b/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_384x288.yml new file mode 100644 index 0000000000000000000000000000000000000000..15425059a0f4723d5a13b36aec4962a4dc586d4d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/hrnet/hrnet_w32_384x288.yml @@ -0,0 +1,142 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_384x288/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 384 +train_width: &train_width 288 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [72, 96] +flip_perm: 
&flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +TopDownHRNet: + backbone: HRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 32 + loss: KeyPointMSELoss + flip: true + +HRNet: + width: *width + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + + +#####optimizer +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + bbox_file: bbox.json + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.5 + rot: 40 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_18_256x192_coco.yml b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_18_256x192_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..2664082465aa666ff7be789c008cd0873bde4021 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_18_256x192_coco.yml @@ -0,0 +1,140 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/lite_hrnet_18_256x192_coco/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet + +TopDownHRNet: + backbone: LiteHRNet + post_process: 
HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 40 + loss: KeyPointMSELoss + use_dark: false + +LiteHRNet: + network_type: lite_18 + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + loss_scale: 1.0 + +#####optimizer +LearningRate: + base_lr: 0.002 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.25 + rot: 30 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_18_384x288_coco.yml b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_18_384x288_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f2bddfcca023d4a25a58e36565dfa25b84778957 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_18_384x288_coco.yml @@ -0,0 +1,140 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/lite_hrnet_18_384x288_coco/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 384 +train_width: &train_width 288 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [72, 96] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet + +TopDownHRNet: + backbone: LiteHRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 40 + loss: KeyPointMSELoss + use_dark: false + +LiteHRNet: + network_type: lite_18 + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + loss_scale: 
1.0 + +#####optimizer +LearningRate: + base_lr: 0.002 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.25 + rot: 30 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 3 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 32 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_30_256x192_coco.yml b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_30_256x192_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..118ba360448e59491c2df3b0bc0a23f29e205827 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_30_256x192_coco.yml @@ -0,0 +1,140 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/lite_hrnet_30_256x192_coco/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet + +TopDownHRNet: + backbone: LiteHRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 40 + loss: KeyPointMSELoss + use_dark: false + +LiteHRNet: + network_type: lite_30 + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + loss_scale: 1.0 + +#####optimizer +LearningRate: + base_lr: 0.002 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data 
+TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 4 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.25 + rot: 30 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_30_384x288_coco.yml b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_30_384x288_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..97f3aa8e3e671a05828bc20416cfb17bd538c8fc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/lite_hrnet_30_384x288_coco.yml @@ -0,0 +1,140 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/lite_hrnet_30_384x288_coco/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 384 +train_width: &train_width 288 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [72, 96] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet + +TopDownHRNet: + backbone: LiteHRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 40 + loss: KeyPointMSELoss + use_dark: false + +LiteHRNet: + network_type: lite_30 + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + loss_scale: 1.0 + +#####optimizer +LearningRate: + base_lr: 0.002 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + 
!KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.25 + rot: 30 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 3 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 32 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a80d08c1fc98e49b8197c1fea1144cdf1efe34ac --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml @@ -0,0 +1,140 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/wider_naive_hrnet_18_256x192_coco/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet + +TopDownHRNet: + backbone: LiteHRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 40 + loss: KeyPointMSELoss + use_dark: false + +LiteHRNet: + network_type: wider_naive + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + loss_scale: 1.0 + +#####optimizer +LearningRate: + base_lr: 0.002 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 + + 
+TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.25 + rot: 30 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - TopDownAffine: + trainsize: *trainsize + - ToHeatmapsTopDown: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/petr/petr_resnet50_16x2_coco.yml b/PaddleDetection-release-2.6/configs/keypoint/petr/petr_resnet50_16x2_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a97eff63ab19882e827f780fba024afafa49abca --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/petr/petr_resnet50_16x2_coco.yml @@ -0,0 +1,254 @@ +use_gpu: true +log_iter: 50 +save_dir: output +snapshot_epoch: 1 +weights: output/petr_resnet50_16x2_coco/model_final +epoch: 100 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: COCO +num_classes: 1 +trainsize: &trainsize 512 +flip_perm: &flip_perm [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] +find_unused_parameters: False + +#####model +architecture: PETR +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/PETR_pretrained.pdparams + +PETR: + backbone: + name: ResNet + depth: 50 + variant: b + norm_type: bn + freeze_norm: True + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + lr_mult_list: [0.1, 0.1, 0.1, 0.1] + neck: + name: ChannelMapper + in_channels: [512, 1024, 2048] + kernel_size: 1 + out_channels: 256 + norm_type: "gn" + norm_groups: 32 + act: None + num_outs: 4 + bbox_head: + name: PETRHead + num_query: 300 + num_classes: 1 # only person + in_channels: 2048 + sync_cls_avg_factor: true + with_kpt_refine: true + transformer: + name: PETRTransformer + as_two_stage: true + encoder: + name: TransformerEncoder + encoder_layer: + name: TransformerEncoderLayer + d_model: 256 + attn: + name: MSDeformableAttention + embed_dim: 256 + num_heads: 8 + num_levels: 4 + num_points: 4 + dim_feedforward: 1024 + dropout: 0.1 + num_layers: 6 + decoder: + name: PETR_TransformerDecoder + num_layers: 3 + return_intermediate: true + decoder_layer: + name: PETR_TransformerDecoderLayer + d_model: 256 + dim_feedforward: 1024 + dropout: 0.1 + self_attn: + name: MultiHeadAttention + embed_dim: 256 + num_heads: 8 + dropout: 0.1 + cross_attn: + name: MultiScaleDeformablePoseAttention + embed_dims: 256 + num_heads: 8 + num_levels: 4 + num_points: 17 + hm_encoder: + name: TransformerEncoder + encoder_layer: + name: TransformerEncoderLayer + d_model: 256 + attn: + name: MSDeformableAttention + embed_dim: 256 + num_heads: 8 + num_levels: 1 + num_points: 4 + dim_feedforward: 1024 + dropout: 0.1 + 
num_layers: 1 + refine_decoder: + name: PETR_DeformableDetrTransformerDecoder + num_layers: 2 + return_intermediate: true + decoder_layer: + name: PETR_TransformerDecoderLayer + d_model: 256 + dim_feedforward: 1024 + dropout: 0.1 + self_attn: + name: MultiHeadAttention + embed_dim: 256 + num_heads: 8 + dropout: 0.1 + cross_attn: + name: MSDeformableAttention + embed_dim: 256 + num_levels: 4 + positional_encoding: + name: PositionEmbedding + num_pos_feats: 128 + normalize: true + offset: -0.5 + loss_cls: + name: Weighted_FocalLoss + use_sigmoid: true + gamma: 2.0 + alpha: 0.25 + loss_weight: 2.0 + reduction: "mean" + loss_kpt: + name: L1Loss + loss_weight: 70.0 + loss_kpt_rpn: + name: L1Loss + loss_weight: 70.0 + loss_oks: + name: OKSLoss + loss_weight: 2.0 + loss_hm: + name: CenterFocalLoss + loss_weight: 4.0 + loss_kpt_refine: + name: L1Loss + loss_weight: 80.0 + loss_oks_refine: + name: OKSLoss + loss_weight: 3.0 + assigner: + name: PoseHungarianAssigner + cls_cost: + name: FocalLossCost + weight: 2.0 + kpt_cost: + name: KptL1Cost + weight: 70.0 + oks_cost: + name: OksCost + weight: 7.0 + +#####optimizer +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + milestones: [80] + gamma: 0.1 + use_warmup: false + # - !LinearWarmup + # start_factor: 0.001 + # steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 0.1 + optimizer: + type: AdamW + regularizer: + factor: 0.0001 + type: L2 + + +#####data +TrainDataset: + !KeypointBottomUpCocoDataset + image_dir: train2017 + anno_path: annotations/person_keypoints_train2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + return_mask: false + +EvalDataset: + !KeypointBottomUpCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + test_mode: true + return_mask: false + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - Decode: {} + - PhotoMetricDistortion: + brightness_delta: 32 + contrast_range: [0.5, 1.5] + saturation_range: [0.5, 1.5] + hue_delta: 18 + - KeyPointFlip: + flip_prob: 0.5 + flip_permutation: *flip_perm + - RandomAffine: + max_degree: 30 + scale: [1.0, 1.0] + max_shift: 0. 
+ trainsize: -1 + - RandomSelect: { transforms1: [ RandomShortSideRangeResize: { scales: [[400, 1400], [1400, 1400]]} ], + transforms2: [ + RandomShortSideResize: { short_side_sizes: [ 400, 500, 600 ] }, + RandomSizeCrop: { min_size: 384, max_size: 600}, + RandomShortSideRangeResize: { scales: [[400, 1400], [1400, 1400]]} ]} + batch_transforms: + - NormalizeImage: {mean: *global_mean, std: *global_std, is_scale: True} + - PadGT: {pad_img: True, minimum_gtnum: 1} + - Permute: {} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - PETR_Resize: {img_scale: [[800, 1333]], keep_ratio: True} + # - MultiscaleTestResize: {origin_target_size: [[800, 1333]], use_flip: false} + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + +TestReader: + sample_transforms: + - Decode: {} + - EvalAffine: {size: 800} + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/README.md b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d9181c507ecdf4c0c025eed7d776fff0db2e756a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/README.md @@ -0,0 +1,281 @@ +简体中文 | [English](README_en.md) + +# PP-TinyPose + +
+图片来源:COCO2017开源数据集
+
+## 最新动态
+- **2022.8.01:发布PP-TinyPose升级版。在健身、舞蹈等场景的业务数据集端到端AP提升9.1**
+  - 新增体育场景真实数据,复杂动作识别效果显著提升,覆盖侧身、卧躺、跳跃、高抬腿等非常规动作
+  - 检测模型升级为[PP-PicoDet增强版](../../../configs/picodet/README.md),在COCO数据集上精度提升3.1%
+  - 关键点稳定性增强。新增滤波稳定方式,视频预测结果更加稳定平滑
+
+  ![](https://user-images.githubusercontent.com/15810355/181733705-d0f84232-c6a2-43dd-be70-4a3a246b8fbc.gif)
+
+## 简介
+PP-TinyPose是PaddleDetection针对移动端设备优化的实时关键点检测模型,可流畅地在移动端设备上执行多人姿态估计任务。借助PaddleDetection自研的优秀轻量级检测模型[PicoDet](../../picodet/README.md),我们同时提供了特色的轻量级垂类行人检测模型。PP-TinyPose的运行环境有以下依赖要求:
+- [PaddlePaddle](https://github.com/PaddlePaddle/Paddle)>=2.2
+
+如希望在移动端部署,则还需要:
+- [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)>=2.11
+
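PP-TinyPose采用“先检测行人、再逐框估计关键点”的Top-Down方案。下面是编者补充的示意代码,仅用于说明两阶段的数据流(`det_model`、`kpt_model`均为假设的可调用对象,并非仓库API;实际请使用下文的`deploy/python/det_keypoint_unite_infer.py`):

```python
# 示意:Top-Down 多人关键点检测的数据流(det_model/kpt_model 为假设对象,仅供理解)
import numpy as np

def detect_and_estimate(image, det_model, kpt_model, det_thresh=0.5):
    """先检测行人,再对每个检测框单独估计关键点。"""
    boxes = det_model(image)  # 假设返回 [[x1, y1, x2, y2, score], ...]
    results = []
    for x1, y1, x2, y2, score in boxes:
        if score < det_thresh:          # 过滤低置信度检测框
            continue
        crop = image[int(y1):int(y2), int(x1):int(x2)]  # 裁剪单人区域
        heatmaps = kpt_model(crop)      # 假设返回 (17, H, W) 的关键点热力图
        kpts = []
        for hm in heatmaps:             # 对每个关键点取热力图峰值
            hy, hx = np.unravel_index(hm.argmax(), hm.shape)
            # 将热力图坐标按比例映射回原图坐标
            kpts.append((x1 + hx * crop.shape[1] / hm.shape[1],
                         y1 + hy * crop.shape[0] / hm.shape[0]))
        results.append(kpts)
    return results
```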
+
+## 部署案例
+
+- [Android Fitness Demo](https://github.com/zhiboniu/pose_demo_android) 基于PP-TinyPose,高效实现健身校准与计数功能。
    + +- 欢迎扫码快速体验 +
    + + +## 模型库 + +### Pipeline性能 +| 单人模型配置 | AP (业务数据集) | AP (COCO Val单人)| 单人耗时 (FP32) | 单人耗时 (FP16) | +| :---------------------------------- | :------: | :------: | :---: | :---: | +| PicoDet-S-Lcnet-Pedestrian-192\*192 + PP-TinyPose-128\*96 | 77.1 (+9.1) | 52.3 (+0.5) | 12.90 ms| 9.61 ms | + +| 多人模型配置 | AP (业务数据集) | AP (COCO Val多人)| 6人耗时 (FP32) | 6人耗时 (FP16)| +| :------------------------ | :-------: | :-------: | :---: | :---: | +| PicoDet-S-Lcnet-Pedestrian-320\*320 + PP-TinyPose-128\*96 | 78.0 (+7.7) | 50.1 (-0.2) | 47.63 ms| 34.62 ms | + +**说明** +- 关键点检测模型的精度指标是基于对应行人检测模型检测得到的检测框。 +- 精度测试中去除了flip操作,且检测置信度阈值要求0.5。 +- 速度测试环境为qualcomm snapdragon 865,采用arm8下4线程推理。 +- Pipeline速度包含模型的预处理、推理及后处理部分。 +- 精度值的增量对比自历史版本中对应模型组合, 详情请见**历史版本-Pipeline性能**。 +- 精度测试中,为了公平比较,多人数据去除了6人以上(不含6人)的图像。 + +### 关键点检测模型 +| 模型 | 输入尺寸 | AP (业务数据集) | AP (COCO Val) | 参数量 | FLOPS |单人推理耗时 (FP32) | 单人推理耗时(FP16) | 配置文件 | 模型权重 | 预测部署模型 | Paddle-Lite部署模型(FP32) | Paddle-Lite部署模型(FP16) | +| :---------- | :------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------------: | :-----------------: | :------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| PP-TinyPose | 128*96 | 84.3 | 58.4 | 1.32 M | 81.56 M | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96.zip) | [Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96_fp32.nb) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96_fp16.nb) | +| PP-TinyPose | 256*192 | 91.0 | 68.3 | 1.32 M | 326.24M |14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192.zip) | [Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192_fp32.nb) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192_fp16.nb) | + + +### 行人检测模型 +| 模型 | 输入尺寸 | mAP (COCO Val-Person) | 参数量 | FLOPS | 平均推理耗时 (FP32) | 平均推理耗时 (FP16) | 配置文件 | 模型权重 | 预测部署模型 | Paddle-Lite部署模型(FP32) | Paddle-Lite部署模型(FP16) | +| :------------------- | :------: | :------------: | :------------: | :------------: | :-----------------: | :-----------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| PicoDet-S-Lcnet-Pedestrian | 192*192 | 31.7 | 1.16 M | 170.03 M | 5.24ms | 3.66ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian.zip) | 
[Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian_fp32.nb) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian_fp16.nb) | +| PicoDet-S-Lcnet-Pedestrian | 320*320 | 41.6 | 1.16 M | 472.07 M | 13.87ms | 8.94ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian.zip) | [Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian_fp32.nb) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian_fp16.nb) | + +**说明** +- 关键点检测模型与行人检测模型均使用`COCO train2017`, `AI Challenger trainset`以及采集的多姿态场景数据集作为训练集。关键点检测模型使用多姿态场景数据集作为测试集,行人检测模型采用`COCO instances val2017`作为测试集。 +- 关键点检测模型的精度指标所依赖的检测框为ground truth标注得到。 +- 关键点检测模型与行人检测模型均在4卡环境下训练,若实际训练环境需要改变GPU数量或batch size, 须参考[FAQ](../../../docs/tutorials/FAQ/README.md)对应调整学习率。 +- 推理速度测试环境为 Qualcomm Snapdragon 865,采用arm8下4线程推理得到。 + +## 历史版本 + +
    +2021版本 + + +### Pipeline性能 +| 单人模型配置 | AP (COCO Val 单人) | 单人耗时 (FP32) | 单人耗时 (FP16) | +| :------------------------ | :------: | :---: | :---: | +| PicoDet-S-Pedestrian-192\*192 + PP-TinyPose-128\*96 | 51.8 | 11.72 ms| 8.18 ms | +| 其他优秀开源模型-192\*192 | 22.3 | 12.0 ms| - | + +| 多人模型配置 | AP (COCO Val 多人) | 6人耗时 (FP32) | 6人耗时 (FP16)| +| :------------------------ | :-------: | :---: | :---: | +| PicoDet-S-Pedestrian-320\*320 + PP-TinyPose-128\*96 | 50.3 | 44.0 ms| 32.57 ms | +| 其他优秀开源模型-256\*256 | 39.4 | 51.0 ms| - | + +**说明** +- 关键点检测模型的精度指标是基于对应行人检测模型检测得到的检测框。 +- 精度测试中去除了flip操作,且检测置信度阈值要求0.5。 +- 精度测试中,为了公平比较,多人数据去除了6人以上(不含6人)的图像。 +- 速度测试环境为qualcomm snapdragon 865,采用arm8下4线程、FP32推理得到。 +- Pipeline速度包含模型的预处理、推理及后处理部分。 +- 其他优秀开源模型的测试及部署方案,请参考[这里](https://github.com/zhiboniu/MoveNet-PaddleLite)。 +- 更多环境下的性能测试结果,请参考[Keypoint Inference Benchmark](../KeypointBenchmark.md)。 + + +### 关键点检测模型 +| 模型 | 输入尺寸 | AP (COCO Val) | 单人推理耗时 (FP32) | 单人推理耗时(FP16) | 配置文件 | 模型权重 | 预测部署模型 | Paddle-Lite部署模型(FP32) | Paddle-Lite部署模型(FP16) | +| :---------- | :------: | :-----------: | :-----------------: | :-----------------: | :------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| PP-TinyPose | 128*96 | 58.1 | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_lite.tar) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16_lite.tar) | +| PP-TinyPose | 256*192 | 68.8 | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_lite.tar) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16_lite.tar) | + +### 行人检测模型 +| 模型 | 输入尺寸 | mAP (COCO Val-Person) | 平均推理耗时 (FP32) | 平均推理耗时 (FP16) | 配置文件 | 模型权重 | 预测部署模型 | Paddle-Lite部署模型(FP32) | Paddle-Lite部署模型(FP16) | +| :------------------- | :------: | :------------: | :-----------------: | :-----------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| PicoDet-S-Pedestrian | 192*192 | 29.0 | 4.30ms | 2.37ms | [Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_lite.tar) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16_lite.tar) | +| PicoDet-S-Pedestrian | 320*320 | 38.5 | 10.26ms | 6.30ms | 
[Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams) | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite部署模型](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_lite.tar) | [Lite部署模型(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16_lite.tar) | + + +**说明** +- 关键点检测模型与行人检测模型均使用`COCO train2017`和`AI Challenger trainset`作为训练集。关键点检测模型使用`COCO person keypoints val2017`作为测试集,行人检测模型采用`COCO instances val2017`作为测试集。 +- 关键点检测模型的精度指标所依赖的检测框为ground truth标注得到。 +- 关键点检测模型与行人检测模型均在4卡环境下训练,若实际训练环境需要改变GPU数量或batch size, 须参考[FAQ](../../../docs/tutorials/FAQ/README.md)对应调整学习率。 +- 推理速度测试环境为 Qualcomm Snapdragon 865,采用arm8下4线程推理得到。 + + +
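上文“说明”多次提到:改变GPU数量或batch size时,须参考FAQ对应调整学习率。常用做法是线性缩放规则,即学习率随总batch size等比例缩放。以下为编者补充的示意(基数取自本目录lite_hrnet配置中的`base_lr: 0.002`与`batch_size: 64`,卡数等均为假设,具体请以FAQ为准):

```python
# 线性缩放规则示意:学习率随总 batch size 等比例缩放(数值为假设示例)
base_lr = 0.002            # 来自上文 lite_hrnet 配置的 base_lr
base_total_bs = 64 * 4     # 假设:原配置为 4 卡、每卡 batch_size=64
new_total_bs = 32 * 2      # 假设:实际环境为 2 卡、每卡 batch_size=32
new_lr = base_lr * new_total_bs / base_total_bs
print(new_lr)              # 0.0005
```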
+
+## 模型训练
+关键点检测模型与行人检测模型的训练集在`COCO`以外还扩充了[AI Challenger](https://arxiv.org/abs/1711.06475)数据集,各数据集关键点定义如下:
+```
+COCO keypoint Description:
+    0: "Nose",
+    1: "Left Eye",
+    2: "Right Eye",
+    3: "Left Ear",
+    4: "Right Ear",
+    5: "Left Shoulder",
+    6: "Right Shoulder",
+    7: "Left Elbow",
+    8: "Right Elbow",
+    9: "Left Wrist",
+    10: "Right Wrist",
+    11: "Left Hip",
+    12: "Right Hip",
+    13: "Left Knee",
+    14: "Right Knee",
+    15: "Left Ankle",
+    16: "Right Ankle"
+
+AI Challenger Description:
+    0: "Right Shoulder",
+    1: "Right Elbow",
+    2: "Right Wrist",
+    3: "Left Shoulder",
+    4: "Left Elbow",
+    5: "Left Wrist",
+    6: "Right Hip",
+    7: "Right Knee",
+    8: "Right Ankle",
+    9: "Left Hip",
+    10: "Left Knee",
+    11: "Left Ankle",
+    12: "Head top",
+    13: "Neck"
+```
+
+由于两个数据集的关键点标注形式不同,我们将两个数据集的标注进行了对齐,仍然沿用COCO的标注形式,您可以下载[训练的参考列表](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json)并放在`dataset/`下使用。对齐两个数据集标注文件的主要处理如下:
+- `AI Challenger`关键点标注顺序调整至与COCO一致,统一是否标注/可见的标志位;
+- 舍弃了`AI Challenger`中特有的点位;将`AI Challenger`数据中`COCO`特有点位标记为未标注;
+- 重新排列了`image_id`与`annotation id`。
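下面是编者补充的示意片段,演示上述“关键点顺序调整”一步的含义。索引映射由前文两套关键点定义推断得出,仅作示例;实际对齐请以官方转换脚本与上面的参考列表为准:

```python
# 示意:将 AI Challenger 的 14 点标注重排为 COCO 的 17 点顺序(映射为编者推断)
import numpy as np

# COCO 索引 -> AI Challenger 索引;COCO 的鼻/眼/耳(0~4)在 AIC 中不存在
AIC_TO_COCO = {5: 3, 6: 0, 7: 4, 8: 1, 9: 5, 10: 2,
               11: 9, 12: 6, 13: 10, 14: 7, 15: 11, 16: 8}

def aic_to_coco_keypoints(aic_kpts):
    """aic_kpts: (14, 3) 数组,每行为 (x, y, v);返回 (17, 3) 的 COCO 形式。"""
    coco_kpts = np.zeros((17, 3), dtype=aic_kpts.dtype)  # 未标注点保持 v=0
    for coco_idx, aic_idx in AIC_TO_COCO.items():
        coco_kpts[coco_idx] = aic_kpts[aic_idx]
    return coco_kpts  # AIC 特有的头顶(12)、脖子(13)两点被舍弃
```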
+利用转换为`COCO`形式的合并数据标注,执行模型训练:
+```bash
+# 关键点检测模型
+python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml
+
+# 行人检测模型
+python3 -m paddle.distributed.launch tools/train.py -c configs/picodet/application/pedestrian_detection/picodet_s_320_pedestrian.yml
+```
+
+## 部署流程
+### 实现部署预测
+1. 通过以下命令将训练得到的模型导出:
+```bash
+python3 tools/export_model.py -c configs/picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml --output_dir=output_inference -o weights=output/picodet_s_192_pedestrian/model_final
+
+python3 tools/export_model.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml --output_dir=output_inference -o weights=output/tinypose_128x96/model_final
+```
+导出后的模型如下:
+```
+picodet_s_192_pedestrian
+├── infer_cfg.yml
+├── model.pdiparams
+├── model.pdiparams.info
+└── model.pdmodel
+```
+您也可以直接下载模型库中提供的对应`预测部署模型`,分别获取行人检测模型和关键点检测模型的预测部署模型,解压即可。
+
+2. 执行Python联合部署预测
+```bash
+# 预测一张图片
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --image_file={your image file} --device=GPU
+
+# 预测多张图片
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --image_dir={dir of image file} --device=GPU
+
+# 预测一个视频
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --video_file={your video file} --device=GPU
+```
+
+3. 执行C++联合部署预测
+- 请先按照[C++端预测部署](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/deploy/cpp),根据您的实际环境准备对应的`paddle_inference`库及相关依赖。
+- 我们提供了[一键编译脚本](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/deploy/cpp/scripts/build.sh),您可以通过该脚本填写相关环境变量的位置,编译上述代码后得到可执行文件。该过程中请保证`WITH_KEYPOINT=ON`。
+- 编译完成后,即可执行部署预测,例如:
+```bash
+# 预测一张图片
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --image_file={your image file} --device=GPU
+
+# 预测多张图片
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --image_dir={dir of image file} --device=GPU
+
+# 预测一个视频
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --video_file={your video file} --device=GPU
+```
+
+### 实现移动端部署
+#### 直接使用我们提供的模型进行部署
+1. 下载模型库中提供的`Paddle-Lite部署模型`,分别获取行人检测模型和关键点检测模型的`.nb`格式文件。
+2. 准备Paddle-Lite运行环境,可直接通过[PaddleLite预编译库下载](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html)获取预编译库,无需自行编译。如需要采用FP16推理,则需要下载FP16的预编译库。
+3. 编译模型运行代码,详细步骤见[Paddle-Lite端侧部署](../../../deploy/lite/README.md)。
+
+#### 将训练的模型实现端侧部署
+如果您希望将自己训练的模型应用于部署,可以参考以下步骤:
+1. 将训练的模型导出
+```bash
+python3 tools/export_model.py -c configs/picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml --output_dir=output_inference -o weights=output/picodet_s_192_pedestrian/model_final TestReader.fuse_normalize=true
+
+python3 tools/export_model.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml --output_dir=output_inference -o weights=output/tinypose_128x96/model_final TestReader.fuse_normalize=true
+```
+2. 转换为Lite模型(依赖[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite))
+
+- 安装Paddle-Lite:
+```bash
+pip install paddlelite
+```
+- 执行以下步骤,以得到对应后缀为`.nb`的Paddle-Lite模型用于端侧部署:
+```
+# 1. 转换行人检测模型
+# FP32
+paddle_lite_opt --model_dir=inference_model/picodet_s_192_pedestrian --valid_targets=arm --optimize_out=picodet_s_192_pedestrian_fp32
+# FP16
+paddle_lite_opt --model_dir=inference_model/picodet_s_192_pedestrian --valid_targets=arm --optimize_out=picodet_s_192_pedestrian_fp16 --enable_fp16=true
+
+# 2. 转换关键点检测模型
+# FP32
+paddle_lite_opt --model_dir=inference_model/tinypose_128x96 --valid_targets=arm --optimize_out=tinypose_128x96_fp32
+# FP16
+paddle_lite_opt --model_dir=inference_model/tinypose_128x96 --valid_targets=arm --optimize_out=tinypose_128x96_fp16 --enable_fp16=true
+```
+
+3. 编译模型运行代码,详细步骤见[Paddle-Lite端侧部署](../../../deploy/lite/README.md)。
+
+我们已提供包含数据预处理、模型推理及模型后处理的[全流程示例代码](../../../deploy/lite/),可根据实际需求进行修改。
+
+**注意**
+- 在导出模型时增加`TestReader.fuse_normalize=true`参数,可以将对图像的Normalize操作合并在模型中执行,从而实现加速。
+- FP16推理可实现更快的模型推理速度。若希望部署FP16模型,除模型转换步骤外,还需要编译支持FP16的Paddle-Lite预测库,详见[Paddle Lite 使用 ARM CPU 预测部署](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/arm_cpu.html)。
+
+## 关键点稳定策略(仅支持视频推理)
+请参考[关键点稳定策略](../README.md#关键点稳定策略仅适用于视频数据)。
+
+## 优化策略
+TinyPose采用了以下策略来平衡模型的速度和精度表现:
+- 轻量级的姿态估计任务骨干网络,[wider naive Lite-HRNet](https://arxiv.org/abs/2104.06403)。
+- 更小的输入尺寸,以提升整体推理速度。
+- 加入Distribution-Aware coordinate Representation of Keypoints ([DARK](https://arxiv.org/abs/1910.06278)),以提升低分辨率热力图下模型的精度表现(解码示意见下)。
+- Unbiased Data Processing ([UDP](https://arxiv.org/abs/1911.07524)),使用无偏数据编解码提升模型精度。
+- Augmentation by Information Dropping ([AID](https://arxiv.org/abs/2008.07139v2)),通过添加信息丢失的数据增强,提升模型对关键点的定位能力。
+- FP16推理,实现更快的模型推理速度。
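上面提到的DARK,核心是在热力图argmax位置附近对log热力图做泰勒展开,得到亚像素级的峰值修正。以下为编者补充的一维近似示意(非仓库实现,省略了原方法中的高斯核调制与完整二维Hessian求解):

```python
# 示意:DARK 风格的亚像素峰值修正(一维近似,仅供理解)
import numpy as np

def dark_refine(heatmap, eps=1e-10):
    """heatmap: (H, W) 单个关键点的热力图;返回修正后的 (x, y) 浮点坐标。"""
    h, w = heatmap.shape
    iy, ix = np.unravel_index(heatmap.argmax(), heatmap.shape)
    logh = np.log(np.maximum(heatmap, eps))
    x, y = float(ix), float(iy)
    if 1 <= ix < w - 1:
        dx = 0.5 * (logh[iy, ix + 1] - logh[iy, ix - 1])              # 一阶导
        dxx = logh[iy, ix + 1] - 2 * logh[iy, ix] + logh[iy, ix - 1]  # 二阶导
        if dxx < 0:               # 峰值处二阶导应为负
            x = ix - dx / dxx     # 牛顿法一步:x* = x - f'/f''
    if 1 <= iy < h - 1:
        dy = 0.5 * (logh[iy + 1, ix] - logh[iy - 1, ix])
        dyy = logh[iy + 1, ix] - 2 * logh[iy, ix] + logh[iy - 1, ix]
        if dyy < 0:
            y = iy - dy / dyy
    return x, y
```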
diff --git a/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/README_en.md b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..6bf5d9b70877dac66c82dfdc821ec5dd4d1fe6f4
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/README_en.md
@@ -0,0 +1,224 @@
+[简体中文](README.md) | English
+
+# PP-TinyPose
+
+Image Source: COCO2017
+
+## Introduction
+PP-TinyPose is a real-time keypoint detection model optimized by PaddleDetection for mobile devices, which can smoothly run multi-person pose estimation tasks on mobile devices. With the excellent self-developed lightweight detection model [PicoDet](../../picodet/README.md), we also provide a lightweight pedestrian detection model. PP-TinyPose has the following dependency requirements:
+- [PaddlePaddle](https://github.com/PaddlePaddle/Paddle)>=2.2
+
+If you want to deploy it on mobile devices, you also need:
+- [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)>=2.10
+
    + +## Deployment Case + +- [Android Fitness Demo](https://github.com/zhiboniu/pose_demo_android) based on PP-TinyPose, which efficiently implements fitness calibration and counting. + +
    + +- Welcome to scan the QR code for quick experience. +
    + + +## Model Zoo +### Keypoint Detection Model +| Model | Input Size | AP (COCO Val) | Inference Time for Single Person (FP32)| Inference Time for Single Person(FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16)| +| :------------------------ | :-------: | :------: | :------: |:---: | :---: | :---: | :---: | :---: | :---: | +| PP-TinyPose | 128*96 | 58.1 | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16_lite.tar) | +| PP-TinyPose | 256*192 | 68.8 | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16_lite.tar) | + +### Pedestrian Detection Model +| Model | Input Size | mAP (COCO Val) | Average Inference Time (FP32)| Average Inference Time (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16)| +| :------------------------ | :-------: | :------: | :------: | :---: | :---: | :---: | :---: | :---: | :---: | +| PicoDet-S-Pedestrian | 192*192 | 29.0 | 4.30ms | 2.37ms | [Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16_lite.tar) | +| PicoDet-S-Pedestrian | 320*320 | 38.5 | 10.26ms | 6.30ms | [Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16_lite.tar) | + + +**Tips** +- The keypoint detection model and pedestrian detection model are both trained on `COCO train2017` and `AI Challenger trainset`. The keypoint detection model is evaluated on `COCO person keypoints val2017`, and the pedestrian detection model is evaluated on `COCO instances val2017`. +- The AP results of keypoint detection models are based on bounding boxes in GroundTruth. +- Both keypoint detection model and pedestrian detection model are trained in a 4-GPU environment. In practice, if number of GPUs or batch size need to be changed according to the training environment, you should refer to [FAQ](../../../docs/tutorials/FAQ/README.md) to adjust the learning rate. 
+
+### Pipeline Performance
+| Model for Single-Pose | AP (COCO Val Single-Person) | Time for Single Person (FP32) | Time for Single Person (FP16) |
+| :------------------------ | :------: | :---: | :---: |
+| PicoDet-S-Pedestrian-192\*192 + PP-TinyPose-128\*96 | 51.8 | 11.72 ms | 8.18 ms |
+| Other open-source model-192\*192 | 22.3 | 12.0 ms | - |
+
+| Model for Multi-Pose | AP (COCO Val Multi-Person) | Time for Six Persons (FP32) | Time for Six Persons (FP16) |
+| :------------------------ | :-------: | :---: | :---: |
+| PicoDet-S-Pedestrian-320\*320 + PP-TinyPose-128\*96 | 50.3 | 44.0 ms | 32.57 ms |
+| Other open-source model-256\*256 | 39.4 | 51.0 ms | - |
+
+**Tips**
+- The AP results of the keypoint detection models are based on bounding boxes detected by the corresponding detection model.
+- In accuracy evaluation, no flip test is used, and the score threshold for bounding boxes is set to 0.5.
+- For fairness, images with more than 6 people are removed from the multi-person test.
+- The inference time is tested on a Qualcomm Snapdragon 865, with 4 threads on ARMv8, FP32.
+- The pipeline time includes preprocessing, inference and postprocessing.
+- For the deployment and testing of the other open-source model, please refer to [this repo](https://github.com/zhiboniu/MoveNet-PaddleLite).
+- For performance data in other runtime environments, please refer to the [Keypoint Inference Benchmark](../KeypointBenchmark.md).
+
+## Model Training
+In addition to `COCO`, the trainset for both the keypoint detection model and the pedestrian detection model also includes [AI Challenger](https://arxiv.org/abs/1711.06475). The keypoints of each dataset are defined as follows:
+```
+COCO keypoint Description:
+    0: "Nose",
+    1: "Left Eye",
+    2: "Right Eye",
+    3: "Left Ear",
+    4: "Right Ear",
+    5: "Left Shoulder",
+    6: "Right Shoulder",
+    7: "Left Elbow",
+    8: "Right Elbow",
+    9: "Left Wrist",
+    10: "Right Wrist",
+    11: "Left Hip",
+    12: "Right Hip",
+    13: "Left Knee",
+    14: "Right Knee",
+    15: "Left Ankle",
+    16: "Right Ankle"
+
+AI Challenger Description:
+    0: "Right Shoulder",
+    1: "Right Elbow",
+    2: "Right Wrist",
+    3: "Left Shoulder",
+    4: "Left Elbow",
+    5: "Left Wrist",
+    6: "Right Hip",
+    7: "Right Knee",
+    8: "Right Ankle",
+    9: "Left Hip",
+    10: "Left Knee",
+    11: "Left Ankle",
+    12: "Head top",
+    13: "Neck"
+```
+
+Since the annotation formats of the two datasets differ, we aligned their annotations to the `COCO` format. You can download the [Training List](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json) and put it at `dataset/`. To align the two datasets, we mainly did the following work (a sketch follows this list):
+- Align the indexes of the `AI Challenger` keypoints to be consistent with `COCO`, and unify the flags indicating whether a keypoint is labeled/visible.
+- Discard the keypoints unique to `AI Challenger`; for keypoints present in `COCO` but not in `AI Challenger`, mark them as not labeled.
+- Rearrange `image_id` and `annotation id`.
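+
+As a minimal sketch of the alignment above (not the official conversion script; the helper name and the visibility remapping are our assumptions), the index reordering could look like:
+```python
+# Reorder AI Challenger (AIC) keypoints into COCO-17 order, drop the two
+# AIC-only points (head top, neck), and mark COCO-only points (nose, eyes,
+# ears) as "not labeled". Indexes follow the two definitions listed above.
+AIC_TO_COCO = {0: 6, 1: 8, 2: 10, 3: 5, 4: 7, 5: 9,
+               6: 12, 7: 14, 8: 16, 9: 11, 10: 13, 11: 15}
+VIS_AIC_TO_COCO = {1: 2, 2: 1, 3: 0}  # assumed flag unification: visible/occluded/unlabeled
+
+def aic_to_coco_keypoints(aic_kpts):
+    """aic_kpts: flat list of 14 * [x, y, v] values in AI Challenger order."""
+    coco = [0.0] * (17 * 3)  # x = y = v = 0 means "not labeled"
+    for aic_idx, coco_idx in AIC_TO_COCO.items():
+        x, y, v = aic_kpts[aic_idx * 3: aic_idx * 3 + 3]
+        coco[coco_idx * 3: coco_idx * 3 + 3] = [x, y, VIS_AIC_TO_COCO.get(int(v), 0)]
+    return coco
+```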
+
+Training with the merged annotation file converted to `COCO` format:
+```bash
+# keypoint detection model
+python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml
+
+# pedestrian detection model
+python3 -m paddle.distributed.launch tools/train.py -c configs/picodet/application/pedestrian_detection/picodet_s_320_pedestrian.yml
+```
+
+## Model Deployment
+### Deploy Inference
+1. Export the trained model through the following commands (a quick load-check sketch follows this list):
+```bash
+python3 tools/export_model.py -c configs/picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml --output_dir=output_inference -o weights=output/picodet_s_192_pedestrian/model_final
+
+python3 tools/export_model.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml --output_dir=output_inference -o weights=output/tinypose_128x96/model_final
+```
+The exported model directory looks like:
+```
+picodet_s_192_pedestrian
+├── infer_cfg.yml
+├── model.pdiparams
+├── model.pdiparams.info
+└── model.pdmodel
+```
+You can also download the `Deployment Model` files of the pedestrian detection model and the keypoint detection model directly from the `Model Zoo`, then unzip them.
+
+2. Joint Python inference with the detection and keypoint models
+```bash
+# inference for one image
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --image_file={your image file} --device=GPU
+
+# inference for several images
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --image_dir={dir of image files} --device=GPU
+
+# inference for a video
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --video_file={your video file} --device=GPU
+```
+
+3. Joint C++ inference with the detection and keypoint models
+- First, please refer to [C++ Deploy Inference](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/deploy/cpp) and prepare the corresponding `paddle_inference` library and related dependencies for your environment.
+- We provide a [Compile Script](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/deploy/cpp/scripts/build.sh). You can fill in the relevant environment variables in this script and execute it to compile the code, which produces an executable file. Please ensure `WITH_KEYPOINT=ON` during this process.
+- After compilation, you can run inference like:
+```bash
+# inference for one image
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --image_file={your image file} --device=GPU
+
+# inference for several images
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --image_dir={dir of image files} --device=GPU
+
+# inference for a video
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --video_file={your video file} --device=GPU
+```
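+
+Before wiring the exported directories into the joint pipelines above, you can sanity-check that they load with the Paddle Inference Python API (a sketch; the paths assume the export commands in step 1):
+```python
+# Quick load check for an exported deployment model (sketch).
+from paddle.inference import Config, create_predictor
+
+config = Config("output_inference/tinypose_128x96/model.pdmodel",
+                "output_inference/tinypose_128x96/model.pdiparams")
+config.disable_gpu()  # or config.enable_use_gpu(200, 0) to check on GPU
+predictor = create_predictor(config)
+print(predictor.get_input_names())  # e.g. ['image'] if the export succeeded
+```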
+
+### Deployment on Mobile Devices
+#### Deploy directly with the models we provide
+1. Download the `Lite Model` files of the pedestrian detection model and the keypoint detection model (in `.nb` format) directly from the `Model Zoo`.
+2. Prepare the Paddle-Lite environment. You can obtain precompiled libraries from [Paddle-Lite Precompiled Libraries](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html). If FP16 is needed, you should download the [Precompiled Libraries for FP16](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip).
+3. Compile the code to run the models. Details can be found in [Paddle-Lite Deployment on Mobile Devices](../../../deploy/lite/README.md).
+
+#### Deploying self-trained models on Mobile Devices
+If you want to deploy self-trained models, you can follow these steps:
+1. Export the trained model
+```bash
+python3 tools/export_model.py -c configs/picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml --output_dir=output_inference -o weights=output/picodet_s_192_pedestrian/model_final TestReader.fuse_normalize=true
+
+python3 tools/export_model.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml --output_dir=output_inference -o weights=output/tinypose_128x96/model_final TestReader.fuse_normalize=true
+```
+2. Convert to a Lite model (relies on [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)); a scripted alternative is sketched at the end of this document.
+
+- Install Paddle-Lite:
+```bash
+pip install paddlelite
+```
+- Run the following commands to obtain the `.nb` format Paddle-Lite models:
+```bash
+# 1. Convert the pedestrian detection model
+# FP32
+paddle_lite_opt --model_dir=inference_model/picodet_s_192_pedestrian --valid_targets=arm --optimize_out=picodet_s_192_pedestrian_fp32
+# FP16
+paddle_lite_opt --model_dir=inference_model/picodet_s_192_pedestrian --valid_targets=arm --optimize_out=picodet_s_192_pedestrian_fp16 --enable_fp16=true
+
+# 2. Convert the keypoint detection model
+# FP32
+paddle_lite_opt --model_dir=inference_model/tinypose_128x96 --valid_targets=arm --optimize_out=tinypose_128x96_fp32
+# FP16
+paddle_lite_opt --model_dir=inference_model/tinypose_128x96 --valid_targets=arm --optimize_out=tinypose_128x96_fp16 --enable_fp16=true
+```
+
+3. Compile the code to run the models. Details can be found in [Paddle-Lite Deployment on Mobile Devices](../../../deploy/lite/README.md).
+
+We provide [Example Code](../../../deploy/lite/) covering data preprocessing, inference and postprocessing. You can modify it according to your actual needs.
+
+**Note:**
+- Add `TestReader.fuse_normalize=true` when exporting the model. The Normalize operation on the image is then executed inside the model, which speeds up inference.
+- FP16 gives faster inference. If you want to deploy the FP16 model, in addition to the model conversion step, you also need to compile a Paddle-Lite prediction library that supports FP16. Details are in [Paddle-Lite Deployment on ARM CPU](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/arm_cpu.html).
+
+## Optimization Strategies
+TinyPose adopts the following strategies to balance model speed and accuracy:
+- A lightweight pose-estimation backbone, [wider naive Lite-HRNet](https://arxiv.org/abs/2104.06403).
+- A smaller input size.
+- Distribution-Aware coordinate Representation of Keypoints ([DARK](https://arxiv.org/abs/1910.06278)), which improves accuracy with low-resolution heatmaps.
+- Unbiased Data Processing ([UDP](https://arxiv.org/abs/1911.07524)).
+- Augmentation by Information Dropping ([AID](https://arxiv.org/abs/2008.07139v2)).
+- FP16 inference.
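+
+Returning to step 2 of the self-trained deployment flow above: the `.nb` conversion can also be scripted through the `Opt` interface of the pip-installed `paddlelite` wheel. A sketch follows; treat the method names as assumptions that mirror the `paddle_lite_opt` CLI flags (FP32 only; use the CLI for the `--enable_fp16` variant):
+```python
+# Scripted equivalent of the FP32 paddle_lite_opt command above (sketch).
+from paddlelite.lite import Opt
+
+opt = Opt()
+opt.set_model_dir("inference_model/tinypose_128x96")  # exported Paddle model
+opt.set_valid_places("arm")                           # same as --valid_targets=arm
+opt.set_optimize_out("tinypose_128x96_fp32")          # prefix of the output .nb file
+opt.run()
+```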
diff --git a/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/tinypose_128x96.yml b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/tinypose_128x96.yml new file mode 100644 index 0000000000000000000000000000000000000000..e213c299020cae7954ad1fb9214d3e53156e2ee5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/tinypose_128x96.yml @@ -0,0 +1,147 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/tinypose_128x96/model_final +epoch: 420 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 128 +train_width: &train_width 96 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [24, 32] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet + +TopDownHRNet: + backbone: LiteHRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 40 + loss: KeyPointMSELoss + use_dark: true + +LiteHRNet: + network_type: wider_naive + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + loss_scale: 1.0 + +#####optimizer +LearningRate: + base_lr: 0.008 + schedulers: + - !PiecewiseDecay + milestones: [380, 410] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.5 + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.25 + rot: 30 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - AugmentationbyInformantionDropping: + prob_cutout: 0.5 + offset_factor: 0.05 + num_patch: 1 + trainsize: *trainsize + - TopDownAffine: + trainsize: *trainsize + use_udp: true + - ToHeatmapsTopDown_DARK: + hmsize: *hmsize + sigma: 1 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 512 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + use_udp: true + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + fuse_normalize: false diff --git a/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/tinypose_256x192.yml b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/tinypose_256x192.yml new 
file mode 100644 index 0000000000000000000000000000000000000000..9de2a635f4c8105335916c1a9200b189cc17f016 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/keypoint/tiny_pose/tinypose_256x192.yml @@ -0,0 +1,147 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/tinypose_256x192/model_final +epoch: 420 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet + +TopDownHRNet: + backbone: LiteHRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 40 + loss: KeyPointMSELoss + use_dark: true + +LiteHRNet: + network_type: wider_naive + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + loss_scale: 1.0 + +#####optimizer +LearningRate: + base_lr: 0.002 + schedulers: + - !PiecewiseDecay + milestones: [380, 410] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: val2017 + anno_path: annotations/person_keypoints_val2017.json + dataset_dir: dataset/coco + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.5 + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 2 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - RandomFlipHalfBodyTransform: + scale: 0.25 + rot: 30 + num_joints_half_body: 8 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + flip_pairs: *flip_perm + - AugmentationbyInformantionDropping: + prob_cutout: 0.5 + offset_factor: 0.05 + num_patch: 1 + trainsize: *trainsize + - TopDownAffine: + trainsize: *trainsize + use_udp: true + - ToHeatmapsTopDown_DARK: + hmsize: *hmsize + sigma: 2 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 128 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - TopDownAffine: + trainsize: *trainsize + use_udp: true + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + fuse_normalize: false diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/README.md b/PaddleDetection-release-2.6/configs/mask_rcnn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7e7276127554b80b0552f4051702f3ad74a0cfbf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/README.md @@ -0,0 +1,31 @@ +# Mask 
R-CNN + +## Model Zoo + +| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | Mask AP | 下载 | 配置文件 | +| :------------------- | :------------| :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: | +| ResNet50 | Mask | 1 | 1x | ---- | 37.4 | 32.8 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml) | +| ResNet50 | Mask | 1 | 2x | ---- | 39.7 | 34.5 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_2x_coco.yml) | +| ResNet50-FPN | Mask | 1 | 1x | ---- | 39.2 | 35.6 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | +| ResNet50-FPN | Mask | 1 | 2x | ---- | 40.5 | 36.7 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_2x_coco.yml) | +| ResNet50-vd-FPN | Mask | 1 | 1x | ---- | 40.3 | 36.4 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_1x_coco.yml) | +| ResNet50-vd-FPN | Mask | 1 | 2x | ---- | 41.4 | 37.5 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml) | +| ResNet101-FPN | Mask | 1 | 1x | ---- | 40.6 | 36.6 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r101_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r101_fpn_1x_coco.yml) | +| ResNet101-vd-FPN | Mask | 1 | 1x | ---- | 42.4 | 38.1 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r101_vd_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r101_vd_fpn_1x_coco.yml) | +| ResNeXt101-vd-FPN | Mask | 1 | 1x | ---- | 44.0 | 39.5 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_x101_vd_64x4d_fpn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_1x_coco.yml) | +| ResNeXt101-vd-FPN | Mask | 1 | 2x | ---- | 44.6 | 39.8 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_x101_vd_64x4d_fpn_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_2x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 1x | ---- | 42.0 | 38.2 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 2x | ---- | 42.7 | 38.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | + + +## Citations +``` 
+@article{He_2017, + title={Mask R-CNN}, + journal={2017 IEEE International Conference on Computer Vision (ICCV)}, + publisher={IEEE}, + author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross}, + year={2017}, + month={Oct} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_fpn_reader.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_fpn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..6d95dc6a7cb2fe8c49a0fba79f9b6b71232d4c20 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_fpn_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_rcnn_r50.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_rcnn_r50.yml new file mode 100644 index 0000000000000000000000000000000000000000..04dab63701171ada046b60e687422e06f8043c26 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_rcnn_r50.yml @@ -0,0 +1,87 @@ +architecture: MaskRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +MaskRCNN: + backbone: ResNet + rpn_head: RPNHead + bbox_head: BBoxHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [2] + num_stages: 3 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [32, 64, 128, 256, 512] + strides: [16] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 12000 + post_nms_top_n: 2000 + topk_after_collect: False + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 6000 + post_nms_top_n: 1000 + + +BBoxHead: + head: Res5Head + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + with_pool: true + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 + +MaskHead: + head: MaskFeat + roi_extractor: + resolution: 14 + sampling_ratio: 
0 + aligned: True + mask_assigner: MaskAssigner + share_bbox_feat: true + +MaskFeat: + num_convs: 0 + out_channel: 256 + +MaskAssigner: + mask_resolution: 14 + +MaskPostProcess: + binary_thresh: 0.5 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_rcnn_r50_fpn.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_rcnn_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..dd7587669661a9e24431a167835ef89527f5e0c8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_rcnn_r50_fpn.yml @@ -0,0 +1,91 @@ +architecture: MaskRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +MaskRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 + +MaskHead: + head: MaskFeat + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + mask_assigner: MaskAssigner + share_bbox_feat: False + +MaskFeat: + num_convs: 4 + out_channel: 256 + +MaskAssigner: + mask_resolution: 28 + +MaskPostProcess: + binary_thresh: 0.5 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_reader.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..7001af7ac980eeb8ca688a8e39cca9dfcf950129 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/mask_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], 
keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..63f898e9c52556bfa0fbbe9c369900c09ab3f94c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r101_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r101_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..aae703c194db64158587baf86d3e6aca60bd8923 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r101_fpn_1x_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + 'mask_rcnn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_pretrained.pdparams +weights: output/mask_rcnn_r101_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r101_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r101_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..58d7a7884d7886c39544ee56bf445590122d0acc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r101_vd_fpn_1x_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'mask_rcnn_r50_fpn_1x_coco.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams +weights: output/mask_rcnn_r101_vd_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..01f4721cb8b139fad640b8fbf884d6df76023f13 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/mask_rcnn_r50.yml', + '_base_/mask_reader.yml', +] +weights: output/mask_rcnn_r50_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_2x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f1e6b669bd7a941608c785593314c6e7feff0b59 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_2x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'mask_rcnn_r50_1x_coco.yml', +] +weights: output/mask_rcnn_r50_2x_coco/model_final + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 
0.3333333333333333 + steps: 500 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..95e48c2b43d8ec8ca6dc95f2a6a45cf8359bcc49 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/mask_rcnn_r50_fpn.yml', + '_base_/mask_fpn_reader.yml', +] +weights: output/mask_rcnn_r50_fpn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f687fd69b1e0521a825da658f2ad14a33ef4b581 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_2x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'mask_rcnn_r50_fpn_1x_coco.yml', +] +weights: output/mask_rcnn_r50_fpn_2x_coco/model_final + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..d5387417b4bfac35a71b3edf8f062a751dcae3b3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_1x_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'mask_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/mask_rcnn_r50_vd_fpn_1x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f85f0299cc2358ca08c548fd3c68eefd108f3d1f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml @@ -0,0 +1,26 @@ +_BASE_: [ + 'mask_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_pretrained.pdparams +weights: output/mask_rcnn_r50_vd_fpn_2x_coco/model_final + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..c5718a8d277d442081d91e89787be16c90b5e01a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/mask_rcnn_r50_fpn.yml', + 
'_base_/mask_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/mask_rcnn_r50_vd_fpn_ssld_1x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +epoch: 12 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..65b31e6f18d9795db1758b651eccef5969b1f74c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/mask_rcnn_r50_fpn.yml', + '_base_/mask_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/mask_rcnn_r50_vd_fpn_ssld_2x_coco/model_final + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..238750294f9783a59ca6ff9f8bdcb4799865f5fe --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_1x_coco.yml @@ -0,0 +1,28 @@ +_BASE_: [ + 'mask_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt101_vd_64x4d_pretrained.pdparams +weights: output/mask_rcnn_x101_vd_64x4d_fpn_1x_coco/model_final + +ResNet: + # for ResNeXt: groups, base_width, base_channels + depth: 101 + variant: d + groups: 64 + base_width: 4 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 12 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..6a0d0f789972b8f11fc04475b69726d42f150746 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_2x_coco.yml @@ -0,0 +1,28 @@ +_BASE_: [ + 'mask_rcnn_r50_fpn_1x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt101_vd_64x4d_pretrained.pdparams +weights: output/mask_rcnn_x101_vd_64x4d_fpn_2x_coco/model_final + +ResNet: + # for ResNeXt: groups, base_width, base_channels + depth: 101 + variant: d + groups: 64 + base_width: 4 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + 
start_factor: 0.1
+    steps: 1000
diff --git a/PaddleDetection-release-2.6/configs/mot/DataDownload.md b/PaddleDetection-release-2.6/configs/mot/DataDownload.md
new file mode 100644
index 0000000000000000000000000000000000000000..a8c9207f8c9d1119b86acf2fdddea5da81e2aa3c
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/DataDownload.md
@@ -0,0 +1,39 @@
+# 多目标跟踪数据集下载汇总
+## 目录
+- [行人跟踪](#行人跟踪)
+- [车辆跟踪](#车辆跟踪)
+- [人头跟踪](#人头跟踪)
+- [多类别跟踪](#多类别跟踪)
+
+## 行人跟踪
+
+| 数据集 | 下载链接 | 备注 |
+| :-------------| :-------------| :----: |
+| MOT17 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip) | - |
+| MOT16 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip) | - |
+| Caltech | [download](https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip) | - |
+| Cityscapes | [download](https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip) | - |
+| CUHKSYSU | [download](https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip) | - |
+| PRW | [download](https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip) | - |
+| ETHZ | [download](https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip) | - |
+
+
+## 车辆跟踪
+
+| 数据集 | 下载链接 | 备注 |
+| :-------------| :-------------| :----: |
+| AICity21 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/aic21mtmct_vehicle.zip) | - |
+
+
+## 人头跟踪
+
+| 数据集 | 下载链接 | 备注 |
+| :-------------| :-------------| :----: |
+| HT21 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip) | - |
+
+
+## 多类别跟踪
+
+| 数据集 | 下载链接 | 备注 |
+| :-------------| :-------------| :----: |
+| VisDrone-MOT | [download](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot.zip) | - |
diff --git a/PaddleDetection-release-2.6/configs/mot/README.md b/PaddleDetection-release-2.6/configs/mot/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..73bf75fdfc31dfaf4344c8a7a5954e2e35c5baad
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/README.md
@@ -0,0 +1,301 @@
+简体中文 | [English](README_en.md)
+
+# 多目标跟踪 (Multi-Object Tracking)
+
+## 内容
+- [简介](#简介)
+- [安装依赖](#安装依赖)
+- [模型库和选型](#模型库和选型)
+- [MOT数据集准备](#MOT数据集准备)
+  - [SDE数据集](#SDE数据集)
+  - [JDE数据集](#JDE数据集)
+  - [用户自定义数据集准备](#用户自定义数据集准备)
+- [引用](#引用)
+
+
+## 简介
+多目标跟踪(Multi-Object Tracking, MOT)是对给定视频或图片序列,定位出多个感兴趣的目标,并在连续帧之间维持个体的ID信息和记录其轨迹。
+当前主流的做法是Tracking By Detecting方式,算法主要由两部分组成:Detection + Embedding。Detection部分即针对视频,检测出每一帧中的潜在目标。Embedding部分则将检出的目标分配和更新到已有的对应轨迹上(即ReID重识别任务),进行物体间的长时序关联。根据这两部分实现的不同,又可以划分为**SDE**系列和**JDE**系列算法。
+- SDE(Separate Detection and Embedding)这类算法完全分离Detection和Embedding两个环节,最具代表性的是**DeepSORT**算法。这样的设计可以使系统无差别地适配各类检测器,可以针对两个部分分别调优,但由于流程上是串联的,速度慢、耗时较长。也有算法如**ByteTrack**算法为了降低耗时,不使用Embedding特征来计算外观相似度,前提是检测器的精度足够高。
+- JDE(Joint Detection and Embedding)这类算法完全是在一个共享神经网络中同时学习Detection和Embedding,使用一个多任务学习的思路设置损失函数。代表性的算法有**JDE**和**FairMOT**。这样的设计兼顾精度和速度,可以实现高精度的实时多目标跟踪。
+
+PaddleDetection中提供了SDE和JDE两个系列的多种算法实现:
+- SDE
+  - [ByteTrack](./bytetrack)
+  - [OC-SORT](./ocsort)
+  - [BoT-SORT](./botsort)
+  - [DeepSORT](./deepsort)
+  - [CenterTrack](./centertrack)
+- JDE
+  - [JDE](./jde)
+  - [FairMOT](./fairmot)
+  - [MCFairMOT](./mcfairmot)
+
+**注意:**
+  - 以上算法原论文均为单类别的多目标跟踪,PaddleDetection团队同时也支持了[ByteTrack](./bytetrack)和FairMOT([MCFairMOT](./mcfairmot))的多类别的多目标跟踪;
+  - [DeepSORT](./deepsort)、[JDE](./jde)、[OC-SORT](./ocsort)、[BoT-SORT](./botsort)和[CenterTrack](./centertrack)均只支持单类别的多目标跟踪;
+  - [DeepSORT](./deepsort)需要额外添加ReID权重一起执行,[ByteTrack](./bytetrack)可加可不加ReID权重,默认不加;
+
+
+### 实时多目标跟踪系统 PP-Tracking
+PaddleDetection团队提供了实时多目标跟踪系统[PP-Tracking](../../deploy/pptracking),是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。 +PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。 +PP-Tracking单镜头跟踪采用的方案是[FairMOT](./fairmot),跨镜头跟踪采用的方案是[DeepSORT](./deepsort)。 + +
+
+视频来源:VisDrone和BDD100K公开数据集
+
+
+#### AI Studio公开项目案例
+教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。
+
+#### Python端预测部署
+教程请参考[PP-Tracking Python部署文档](../../deploy/pptracking/python/README.md)。
+
+#### C++端预测部署
+教程请参考[PP-Tracking C++部署文档](../../deploy/pptracking/cpp/README.md)。
+
+#### GUI可视化界面预测部署
+教程请参考[PP-Tracking可视化界面使用文档](https://github.com/yangyudong2020/PP-Tracking_GUi)。
+
+
+### 实时行人分析工具 PP-Human
+PaddleDetection团队提供了实时行人分析工具[PP-Human](../../deploy/pipeline),是基于PaddlePaddle深度学习框架的业界首个开源的产业级实时行人分析工具,具有模型丰富、应用广泛和部署高效三大优势。
+PP-Human支持图片/单镜头视频/多镜头视频多种输入方式,功能覆盖多目标跟踪、属性识别、行为分析及人流量计数与轨迹记录,能够广泛应用于智慧交通、智慧社区、工业巡检等领域。支持服务器端部署及TensorRT加速,T4服务器上可达到实时。
+PP-Human跟踪采用的方案是[ByteTrack](./bytetrack)。
+
+![](https://user-images.githubusercontent.com/48054808/173030254-ecf282bd-2cfe-43d5-b598-8fed29e22020.gif)
+
+#### AI Studio公开项目案例
+PP-Human实时行人分析全流程实战教程[链接](https://aistudio.baidu.com/aistudio/projectdetail/3842982)。
+
+PP-Human赋能社区智能精细化管理教程[链接](https://aistudio.baidu.com/aistudio/projectdetail/3679564)。
+
+
+
+## 安装依赖
+一键安装MOT相关的依赖:
+```
+pip install -r requirements.txt
+# 或手动pip安装MOT相关的库
+pip install lap motmetrics sklearn
+```
+**注意:**
+  - 预测需确保已安装[ffmpeg](https://ffmpeg.org/ffmpeg.html),Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
+
+
+
+## 模型库和选型
+- 基础模型
+  - [ByteTrack](bytetrack/README_cn.md)
+  - [OC-SORT](ocsort/README_cn.md)
+  - [BoT-SORT](botsort/README_cn.md)
+  - [DeepSORT](deepsort/README_cn.md)
+  - [JDE](jde/README_cn.md)
+  - [FairMOT](fairmot/README_cn.md)
+  - [CenterTrack](centertrack/README_cn.md)
+- 特色垂类模型
+  - [行人跟踪](pedestrian/README_cn.md)
+  - [人头跟踪](headtracking21/README_cn.md)
+  - [车辆跟踪](vehicle/README_cn.md)
+- 多类别跟踪
+  - [多类别跟踪](mcfairmot/README_cn.md)
+- 跨镜头跟踪
+  - [跨镜头跟踪](mtmct/README_cn.md)
+
+### 模型选型总结
+
+关于模型选型,PaddleDetection团队提供的总结建议如下:
+
+| MOT方式 | 经典算法 | 算法流程 | 数据集要求 | 其他特点 |
+| :--------------| :--------------| :------- | :----: | :----: |
+| SDE系列 | DeepSORT,ByteTrack,OC-SORT,BoT-SORT,CenterTrack | 分离式,两个独立模型权重先检测后ReID,也可不加ReID | 检测和ReID数据相对独立,不加ReID时即纯检测数据集 | 检测和ReID可分别调优,鲁棒性较高,AI竞赛常用 |
+| JDE系列 | FairMOT,JDE | 联合式,一个模型权重端到端同时检测和ReID | 必须同时具有检测和ReID标注 | 检测和ReID联合训练,不易调优,泛化性不强 |
+
+**注意:**
+  - 由于数据标注的成本较大,建议选型前优先考虑**数据集要求**,如果数据集只有检测框标注而没有ReID标注,是无法使用JDE系列算法训练的,更推荐使用SDE系列;
+  - SDE系列算法在检测器精度足够高时,也可以不使用ReID权重进行物体间的长时序关联,可以参照[ByteTrack](bytetrack);
+  - 耗时速度和模型权重参数量计算量有一定关系,耗时从理论上看`不使用ReID的SDE系列 < JDE系列 < 使用ReID的SDE系列`;
+
+
+
+## MOT数据集准备
+PaddleDetection团队提供了众多公开数据集或整理后数据集的下载链接,参考[数据集下载汇总](DataDownload.md),用户可以自行下载使用。
+
+根据模型选型总结,MOT数据集可以分为两类:一类是纯检测框标注的数据集,仅SDE系列可以使用;另一类是同时有检测和ReID标注的数据集,SDE系列和JDE系列都可以使用。
+
+### SDE数据集
+SDE数据集是纯检测标注的数据集,用户自定义数据集可以参照[DET数据准备文档](../../docs/tutorials/data/PrepareDetDataSet.md)准备。
+
+以MOT17数据集为例,下载并解压放在`PaddleDetection/dataset/mot`目录下:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
+```
+并修改数据集部分的配置文件如下:
+```
+num_classes: 1
+
+TrainDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/train_half.json
+    image_dir: images/train
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+    image_dir: images/train
+
+TestDataset:
+  !ImageFolder
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+```
+
+数据集目录为:
+```
+dataset/mot
+  |——————MOT17
+    |——————annotations
+    |——————images
+```
+
+### JDE数据集
+JDE数据集是同时有检测和ReID标注的数据集,首先按照以下命令下载`image_lists.zip`并解压放在`PaddleDetection/dataset/mot`目录下:
+```
+wget 
https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip +``` + +然后按照以下命令可以快速下载各个公开数据集,也解压放在`PaddleDetection/dataset/mot`目录下: +``` +# MIX数据,同JDE,FairMOT论文使用的数据集 +wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip +wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip +wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip +wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip +wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip +wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip +wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip +``` +数据集目录为: +``` +dataset/mot + |——————image_lists + |——————caltech.all + |——————citypersons.train + |——————cuhksysu.train + |——————eth.train + |——————mot16.train + |——————mot17.train + |——————prw.train + |——————Caltech + |——————Cityscapes + |——————CUHKSYSU + |——————ETHZ + |——————MOT16 + |——————MOT17 + |——————PRW +``` + +#### JDE数据集的格式 +这几个相关数据集都遵循以下结构: +``` +MOT17 + |——————images + | └——————train + | └——————test + └——————labels_with_ids + └——————train +``` +所有数据集的标注是以统一数据格式提供的。各个数据集中每张图片都有相应的标注文本。给定一个图像路径,可以通过将字符串`images`替换为`labels_with_ids`并将`.jpg`替换为`.txt`来生成标注文本路径。在标注文本中,每行都描述一个边界框,格式如下: +``` +[class] [identity] [x_center] [y_center] [width] [height] +``` + - `class`为类别id,支持单类别和多类别,从`0`开始计,单类别即为`0`。 + - `identity`是从`1`到`num_identities`的整数(`num_identities`是数据集中所有视频或图片序列的不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 + - `[x_center] [y_center] [width] [height]`是中心点坐标和宽高,注意他们的值是由图片的宽度/高度标准化的,因此它们是从0到1的浮点数。 + + +**注意:** + - MIX数据集是[JDE](https://github.com/Zhongdao/Towards-Realtime-MOT)和[FairMOT](https://github.com/ifzhang/FairMOT)原论文使用的数据集,包括**Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17和MOT16**。使用前6者作为联合数据集参与训练,MOT16作为评测数据集。如果您想使用这些数据集,请**遵循他们的License**。 + - MIX数据集以及其子数据集都是单类别的行人跟踪数据集,可认为相比于行人检测数据集多了id号的标注。 + - 更多场景的垂类模型例如车辆行人人头跟踪等,垂类数据集也需要处理成与MIX数据集相同的格式,参照[数据集下载汇总](DataDownload.md)、[车辆跟踪](vehicle/README_cn.md)、[人头跟踪](headtracking21/README_cn.md)以及更通用的[行人跟踪](pedestrian/README_cn.md)。 + - 用户自定义数据集可参照[MOT数据集准备教程](../../docs/tutorials/PrepareMOTDataSet_cn.md)去准备。 + + +### 用户自定义数据集准备 +用户自定义数据集准备请参考[MOT数据集准备教程](../../docs/tutorials/PrepareMOTDataSet_cn.md)去准备。 + +## 引用 +``` +@inproceedings{Wojke2017simple, + title={Simple Online and Realtime Tracking with a Deep Association Metric}, + author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich}, + booktitle={2017 IEEE International Conference on Image Processing (ICIP)}, + year={2017}, + pages={3645--3649}, + organization={IEEE}, + doi={10.1109/ICIP.2017.8296962} +} + +@inproceedings{Wojke2018deep, + title={Deep Cosine Metric Learning for Person Re-identification}, + author={Wojke, Nicolai and Bewley, Alex}, + booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)}, + year={2018}, + pages={748--756}, + organization={IEEE}, + doi={10.1109/WACV.2018.00087} +} + +@article{wang2019towards, + title={Towards Real-Time Multi-Object Tracking}, + author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin}, + journal={arXiv preprint arXiv:1909.12605}, + year={2019} +} + +@article{zhang2020fair, + title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking}, + author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu}, + journal={arXiv preprint arXiv:2004.01888}, + year={2020} +} + +@article{zhang2021bytetrack, + title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box}, + author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and 
Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
+  journal={arXiv preprint arXiv:2110.06864},
+  year={2021}
+}
+
+@article{cao2022observation,
+  title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
+  author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
+  journal={arXiv preprint arXiv:2203.14360},
+  year={2022}
+}
+
+@article{aharon2022bot,
+  title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
+  author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
+  journal={arXiv preprint arXiv:2206.14651},
+  year={2022}
+}
+
+@article{zhou2020tracking,
+  title={Tracking Objects as Points},
+  author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
+  journal={ECCV},
+  year={2020}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/README_en.md b/PaddleDetection-release-2.6/configs/mot/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..3ae5444eb885591bab53bae7754dd35563a29964
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/README_en.md
@@ -0,0 +1,217 @@
+English | [简体中文](README.md)
+
+# MOT (Multi-Object Tracking)
+
+## Table of Contents
+- [Introduction](#Introduction)
+- [Installation](#Installation)
+- [Model Zoo](#Model_Zoo)
+- [Dataset Preparation](#Dataset_Preparation)
+- [Citations](#Citations)
+
+## Introduction
+The current mainstream 'tracking-by-detection' multi-object tracking (MOT) algorithms are mainly composed of two parts: detection and embedding. Detection finds the potential targets in each frame of the video. Embedding assigns and updates the detected targets to the corresponding tracks (the ReID task). According to how these two parts are implemented, MOT algorithms can be divided into the **SDE** series and the **JDE** series.
+
+- **SDE** (Separate Detection and Embedding) algorithms completely separate detection and embedding. The most representative one is the **DeepSORT** algorithm. This design lets the system fit any kind of detector without modification, and the two parts can be improved separately. However, because the two stages run in series, the speed is slow, and latency is a great challenge when building a real-time MOT system.
+- **JDE** (Joint Detection and Embedding) algorithms learn detection and embedding simultaneously in a shared neural network, setting the loss function with a multi-task learning approach. The representative algorithms are **JDE** and **FairMOT**. This design can achieve high-precision real-time MOT performance.
+
+PaddleDetection implements three MOT algorithms of these two series: [DeepSORT](https://arxiv.org/abs/1812.00442) of the SDE series, and [JDE](https://arxiv.org/abs/1909.12605) and [FairMOT](https://arxiv.org/abs/2004.01888) of the JDE series.
+
+### PP-Tracking real-time MOT system
+In addition, PaddleDetection also provides the [PP-Tracking](../../deploy/pptracking/README.md) real-time multi-object tracking system.
+PP-Tracking is the first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework. It features rich models, wide applicability and efficient deployment.
+
+PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT).
+Aiming at the difficulties and pain points of real business scenarios, PP-Tracking provides various MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment supports both API calls and a GUI visual interface, the deployment languages include Python and C++, and the supported deployment platforms include Linux, NVIDIA Jetson, etc.
+
+### AI Studio public project tutorial
+PP-Tracking provides an AI Studio public project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
+
+### Python prediction and deployment
+PP-Tracking supports Python prediction and deployment. Please refer to this [doc](../../deploy/pptracking/python/README.md).
+
+### C++ prediction and deployment
+PP-Tracking supports C++ prediction and deployment. Please refer to this [doc](../../deploy/pptracking/cpp/README.md).
+
+### GUI prediction and deployment
+PP-Tracking supports GUI prediction and deployment. Please refer to this [doc](https://github.com/yangyudong2020/PP-Tracking_GUi).
+
+
+Video source: VisDrone and BDD100K datasets
+
+
+
+## Installation
+Install all the related dependencies for MOT:
+```
+pip install lap motmetrics sklearn
+# or
+pip install -r requirements.txt
+```
+**Notes:**
+- Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On the Linux (Ubuntu) platform you can install it directly with the following command: `apt-get update && apt-get install -y ffmpeg`.
+
+
+## Model Zoo
+- Base models
+  - [ByteTrack](bytetrack/README.md)
+  - [OC-SORT](ocsort/README.md)
+  - [BoT-SORT](botsort/README.md)
+  - [DeepSORT](deepsort/README.md)
+  - [JDE](jde/README.md)
+  - [FairMOT](fairmot/README.md)
+  - [CenterTrack](centertrack/README.md)
+- Feature models
+  - [Pedestrian](pedestrian/README.md)
+  - [Head](headtracking21/README.md)
+  - [Vehicle](vehicle/README.md)
+- Multi-Class Tracking
+  - [MCFairMOT](mcfairmot/README.md)
+- Multi-Target Multi-Camera Tracking
+  - [MTMCT](mtmct/README.md)
+
+
+## Dataset Preparation
+### MOT Dataset
+PaddleDetection implements [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT), and uses the same training data, named 'MIX', as they do, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The former six are used as the mixed dataset for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
+
+**Notes:**
+- Multi-Object Tracking (MOT) datasets are always used for single-category tracking. DeepSORT, JDE and FairMOT are single-category MOT models. The 'MIX' dataset and its sub-datasets are also single-category pedestrian tracking datasets; they can be considered detection datasets with additional identity ground truth.
+- In order to train feature models for more scenes, more datasets have also been processed into the same format as the MIX dataset. The PaddleDetection team also provides feature datasets and models for [vehicle tracking](vehicle/README.md), [head tracking](headtracking21/README.md) and the more general [pedestrian tracking](pedestrian/README.md). User-defined datasets can also be prepared by referring to the data preparation [doc](../../docs/tutorials/data/PrepareMOTDataSet.md).
+- The multi-class MOT model is [MCFairMOT](mcfairmot/README.md), and the multi-class dataset is an integrated version of the VisDrone dataset; please refer to the doc of [MCFairMOT](mcfairmot/README.md).
+- The Multi-Target Multi-Camera Tracking (MTMCT) model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) Multi-Camera Vehicle Tracking dataset; the dataset and model are described in the doc of [MTMCT](mtmct/README.md).
+
+### Dataset Directory
+First, download `image_lists.zip` using the following command, and unzip it into `PaddleDetection/dataset/mot`:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip
+```
+
+Then, download the MIX datasets using the following commands, and unzip them into `PaddleDetection/dataset/mot`:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip
+```
+
+The final directory is:
+```
+dataset/mot
+  |——————image_lists
+            |——————caltech.10k.val
+            |——————caltech.all
+            |——————caltech.train
+            |——————caltech.val
+            |——————citypersons.train
+            |——————citypersons.val
+            |——————cuhksysu.train
+            |——————cuhksysu.val
+            |——————eth.train
+            |——————mot16.train
+            |——————mot17.train
+            |——————prw.train
+            |——————prw.val
+  |——————Caltech
+  |——————Cityscapes
+  |——————CUHKSYSU
+  |——————ETHZ
+  |——————MOT16
+  |——————MOT17
+  |——————PRW
+```
+
+### Data Format
+These several related datasets share the following structure:
+```
+MOT17
+   |——————images
+   |        └——————train
+   |        └——————test
+   └——————labels_with_ids
+            └——————train
+```
+Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
+
+In the annotation text, each line describes a bounding box and has the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+**Notes:**
+- `class` is the class id. Both single class and multi-class are supported; ids start from `0`, and for a single class the id is always `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of different object instances in all videos or image sequences of the dataset), or `-1` if this box has no identity annotation.
+- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height. Note that they are normalized by the width/height of the image, so they are floating-point numbers ranging from 0 to 1; see the parsing sketch after these notes.
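+
+A minimal parsing sketch (not part of the repo; the helper name and the sequence directory below are made up for illustration) that turns one annotation line back into a pixel-space box:
+```python
+# Read one labels_with_ids line and denormalize the center/size box format.
+def parse_jde_label(line, img_w, img_h):
+    cls, identity, xc, yc, w, h = line.split()
+    xc, yc, w, h = float(xc), float(yc), float(w), float(h)  # all in [0, 1]
+    x1, y1 = (xc - w / 2) * img_w, (yc - h / 2) * img_h
+    x2, y2 = (xc + w / 2) * img_w, (yc + h / 2) * img_h
+    return int(cls), int(identity), (x1, y1, x2, y2)
+
+# The label path follows from the image path by the substitutions described above.
+img_path = "MOT17/images/train/MOT17-02/000001.jpg"
+label_path = img_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")
+
+print(parse_jde_label("0 1 0.5 0.5 0.25 0.5", 1920, 1080))
+# -> (0, 1, (720.0, 270.0, 1200.0, 810.0))
+```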
+
+
+## Citations
+```
+@inproceedings{Wojke2017simple,
+  title={Simple Online and Realtime Tracking with a Deep Association Metric},
+  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
+  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
+  year={2017},
+  pages={3645--3649},
+  organization={IEEE},
+  doi={10.1109/ICIP.2017.8296962}
+}
+
+@inproceedings{Wojke2018deep,
+  title={Deep Cosine Metric Learning for Person Re-identification},
+  author={Wojke, Nicolai and Bewley, Alex},
+  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
+  year={2018},
+  pages={748--756},
+  organization={IEEE},
+  doi={10.1109/WACV.2018.00087}
+}
+
+@article{wang2019towards,
+  title={Towards Real-Time Multi-Object Tracking},
+  author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
+  journal={arXiv preprint arXiv:1909.12605},
+  year={2019}
+}
+
+@article{zhang2020fair,
+  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
+  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
+  journal={arXiv preprint arXiv:2004.01888},
+  year={2020}
+}
+
+@article{zhang2021bytetrack,
+  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
+  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
+  journal={arXiv preprint arXiv:2110.06864},
+  year={2021}
+}
+
+@article{cao2022observation,
+  title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
+  author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
+  journal={arXiv preprint arXiv:2203.14360},
+  year={2022}
+}
+
+@article{aharon2022bot,
+  title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
+  author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
+  journal={arXiv preprint arXiv:2206.14651},
+  year={2022}
+}
+
+@article{zhou2020tracking,
+  title={Tracking Objects as Points},
+  author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
+  journal={ECCV},
+  year={2020}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/botsort/README.md b/PaddleDetection-release-2.6/configs/mot/botsort/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..5cf981ca51fc682859c8a3d80f3a34dad36e54a0
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/botsort/README.md
@@ -0,0 +1,89 @@
+English | [简体中文](README_cn.md)
+
+# BOT_SORT (BoT-SORT: Robust Associations Multi-Pedestrian Tracking)
+
+## Contents
+- [Introduction](#introduction)
+- [Model Zoo](#model-zoo)
+- [Quick Start](#quick-start)
+- [Citation](#citation)
+
+## Introduction
+[BOT_SORT](https://arxiv.org/pdf/2206.14651v2.pdf) (BoT-SORT: Robust Associations Multi-Pedestrian Tracking). The configurations of common detectors are provided here for reference. Since different training datasets, input scales, numbers of training epochs, NMS threshold settings, etc. will lead to differences in model accuracy and performance, please adapt the configuration to your own needs.
+
+## Model Zoo
+
+### BOT_SORT on MOT-17 half Val Set
+
+| Dataset | Detector | Input Size | Detector mAP | MOTA | IDF1 | Config |
+| :-------- | :----- | :----: | :------: | :----: |:-----: |:----: |
+| MOT-17 half train | PP-YOLOE-l | 640x640 | 52.7 | 55.5 | 64.2 |[config](./botsort_ppyoloe.yml) |
+
+
+**Attention:**
+  - The model weight download link is the ```det_weights``` entry in the config file; running the evaluation command will download it automatically.
+  - **MOT17-half train** is a dataset composed of the images and labels of the first half frames of each video in the MOT17 train sequences (7 in total). To verify accuracy, you can evaluate on the **MOT17-half val** set, which is composed of the second half frames of each video. It can be downloaded from [this link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip) and unzipped into `dataset/mot/`.
+
+  - For BOT_SORT, training means training a separate detector on the MOT dataset, and inference means assembling the tracker to evaluate MOT metrics; the separate detection model can also be evaluated on detection metrics.
+  - For BOT_SORT export and deployment, the detection model is exported separately and then run with the assembled tracker; refer to [PP-Tracking](../../../deploy/pptracking/python).
+  - BOT_SORT is the main tracking scheme of pipeline analysis projects such as PP-Human and PP-Vehicle. For details, please refer to [Pipeline](../../../deploy/pipeline) and [MOT](../../../deploy/pipeline/docs/tutorials/pphuman_mot.md).
+
+
+## Quick Start
+
+### 1. Training
+Start training and evaluation with the following commands:
+```bash
+# single GPU
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
+
+# multiple GPUs
+python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
+```
+
+### 2. Evaluation
+#### 2.1 Detection
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml
+```
+
+**Attention:**
+  - Use ```tools/eval.py``` to evaluate detection and ```tools/eval_mot.py``` to evaluate MOT.
+
+#### 2.2 MOT
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/botsort/botsort_ppyoloe.yml --scaled=True
+```
+**Attention:**
+  - `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image. Set it to False if the detection model is JDE YOLOv3, and to True if a general detection model is used; the default is False.
+  - MOT results are saved in `{output_dir}/mot_results/`. Each video sequence corresponds to one txt file there, and each line of a txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`. `{output_dir}` can be set with `--output_dir`.
+
+### 3. Export the detection model
+
+```bash
+python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --output_dir=output_inference -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+```
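+
+As noted in section 2.2, each txt under `{output_dir}/mot_results/` stores one line per detection in the form `frame,id,x1,y1,w,h,score,-1,-1,-1`. Below is a minimal sketch for loading such a file back into per-identity tracks (the function name and example path are illustrative, not part of PaddleDetection):
+```python
+from collections import defaultdict
+
+
+def load_mot_results(txt_path: str):
+    """Group `frame,id,x1,y1,w,h,score,-1,-1,-1` rows by track id."""
+    tracks = defaultdict(list)
+    with open(txt_path) as f:
+        for line in f:
+            if not line.strip():
+                continue
+            frame, tid, x1, y1, w, h, score = line.strip().split(",")[:7]
+            tracks[int(float(tid))].append({
+                "frame": int(float(frame)),
+                # top-left x/y plus width/height, in original-image pixels
+                "bbox": (float(x1), float(y1), float(w), float(h)),
+                "score": float(score),
+            })
+    return tracks
+
+
+# Hypothetical usage:
+# tracks = load_mot_results("output/mot_results/MOT17-02-SDP.txt")
+# print(len(tracks), "identities found")
+```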
+### 4. Use the exported model to predict
+
+```bash
+# download demo video
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+
+CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5
+```
+**Attention:**
+  - To use BOT_SORT, you must first set the tracker type in `tracker_config.yml` to `type: BOTSORTTracker`.
+  - The tracking model predicts on videos; predicting a single image is not supported. By default the video with visualized tracking results is saved. You can add `--save_mot_txts` (save one txt per video), `--save_mot_txt_per_img` (save one txt per image) or `--save_images` (save the visualized tracking result images).
+  - Each line of the tracking result txt file has the format `frame,id,x1,y1,w,h,score,-1,-1,-1`.
+
+
+## Citation
+```
+@article{aharon2022bot,
+  title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
+  author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
+  journal={arXiv preprint arXiv:2206.14651},
+  year={2022}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/botsort/README_cn.md b/PaddleDetection-release-2.6/configs/mot/botsort/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..5c92653db2fda7c182ab887608f33755a01b7c66
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/botsort/README_cn.md
@@ -0,0 +1,89 @@
+简体中文 | [English](README.md)
+
+# BOT_SORT (BoT-SORT: Robust Associations Multi-Pedestrian Tracking)
+
+## 内容
+- [简介](#简介)
+- [模型库](#模型库)
+- [快速开始](#快速开始)
+- [引用](#引用)
+
+## 简介
+[BOT_SORT](https://arxiv.org/pdf/2206.14651v2.pdf)(BoT-SORT: Robust Associations Multi-Pedestrian Tracking)。此处提供了常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。
+
+## 模型库
+
+### BOT_SORT在MOT-17 half Val Set上结果
+
+| 检测训练数据集 | 检测器 | 输入尺度 | 检测mAP | MOTA | IDF1 | 配置文件 |
+| :-------- | :----- | :----: | :------: | :----: |:-----: |:----: |
+| MOT-17 half train | PP-YOLOE-l | 640x640 | 52.7 | 55.5 | 64.2 |[配置文件](./botsort_ppyoloe.yml) |
+
+
+**注意:**
+  - 模型权重下载链接在配置文件中的```det_weights```,运行验证的命令即可自动下载。
+  - **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载,并解压放在`dataset/mot/`文件夹下。
+
+  - BOT_SORT的训练是单独的检测器训练MOT数据集,推理是组装跟踪器去评估MOT指标,单独的检测模型也可以评估检测指标。
+  - BOT_SORT的导出部署,是单独导出检测模型,再组装跟踪器运行的,参照[PP-Tracking](../../../deploy/pptracking/python)。
+  - BOT_SORT是PP-Human和PP-Vehicle等Pipeline分析项目跟踪方向的主要方案,具体使用参照[Pipeline](../../../deploy/pipeline)和[MOT](../../../deploy/pipeline/docs/tutorials/pphuman_mot.md)。
+
+
+## 快速开始
+
+### 1. 训练
+通过如下命令一键式启动训练和评估
+```bash
+#单卡训练
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
+
+#多卡训练
+python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
+```
+
+### 2. 
评估 +#### 2.1 评估检测效果 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml +``` + +**注意:** + - 评估检测使用的是```tools/eval.py```, 评估跟踪使用的是```tools/eval_mot.py```。 + +#### 2.2 评估跟踪效果 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/botsort/botsort_ppyoloe.yml --scaled=True +``` +**注意:** + - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE YOLOv3则为False,如果使用通用检测模型则为True, 默认值是False。 + - 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。 + +### 3. 导出预测模型 + +```bash +python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --output_dir=output_inference -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +``` + +### 4. 用导出的模型基于Python去预测 + +```bash +# 下载demo视频 +wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4 + +CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 +``` +**注意:** + - 运行前需要手动修改`tracker_config.yml`的跟踪器类型为`type: BOTSORTTracker`。 + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 + + +## 引用 +``` +@article{aharon2022bot, + title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking}, + author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion}, + journal={arXiv preprint arXiv:2206.14651}, + year={2022} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mot/botsort/botsort_ppyoloe.yml b/PaddleDetection-release-2.6/configs/mot/botsort/botsort_ppyoloe.yml new file mode 100644 index 0000000000000000000000000000000000000000..5df704dbdda40742654696aa21fd6e872beda855 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/botsort/botsort_ppyoloe.yml @@ -0,0 +1,75 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
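+# It reuses the ByteTrack PP-YOLOE detector and dataset bases via _BASE_, but swaps the
+# tracker to BOTSORTTracker (see the BOTSORTTracker section below), so the assembly runs BoT-SORT.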
+_BASE_: [ + '../bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml', + '../bytetrack/_base_/mot17.yml', + '../bytetrack/_base_/ppyoloe_mot_reader_640x640.yml' +] +weights: output/botsort_ppyoloe/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode, set 'COCO' can be training mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams +ByteTrack: + detector: YOLOv3 # PPYOLOe version + reid: None + tracker: BOTSORTTracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +reid_weights: None + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.1 # 0.01 in original detector + nms_threshold: 0.4 # 0.6 in original detector + + +BOTSORTTracker: + track_high_thresh: 0.3 + track_low_thresh: 0.2 + new_track_thresh: 0.4 + match_thresh: 0.7 + track_buffer: 30 + min_box_area: 0 + camera_motion: False + cmc_method: 'sparseOptFlow' # only camera_motion is True, + # sparseOptFlow | files (Vidstab GMC) | orb | ecc + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/README.md b/PaddleDetection-release-2.6/configs/mot/bytetrack/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/README_cn.md b/PaddleDetection-release-2.6/configs/mot/bytetrack/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..3e896ec0447fc9179d79828a1724f9dd968d1255 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/README_cn.md @@ -0,0 +1,195 @@ +简体中文 | [English](README.md) + +# ByteTrack (ByteTrack: Multi-Object Tracking by Associating Every Detection Box) + +## 内容 +- [简介](#简介) +- [模型库](#模型库) + - [行人跟踪](#行人跟踪) + - [人头跟踪](#人头跟踪) +- [多类别适配](#多类别适配) +- [快速开始](#快速开始) +- [引用](#引用) + + +## 简介 +[ByteTrack](https://arxiv.org/abs/2110.06864)(ByteTrack: Multi-Object Tracking by Associating Every Detection Box) 通过关联每个检测框来跟踪,而不仅是关联高分的检测框。对于低分数检测框会利用它们与轨迹片段的相似性来恢复真实对象并过滤掉背景检测框。此处提供了几个常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。 + + +## 模型库 + +### 行人跟踪 + +#### 基于不同检测器的ByteTrack在 MOT-17 half Val Set 上的结果 + +| 检测训练数据集 | 检测器 | 输入尺度 | ReID | 检测mAP(0.5:0.95) | MOTA | IDF1 | FPS | 配置文件 | +| :-------- | :----- | :----: | :----:|:------: | :----: |:-----: |:----:|:----: | +| MOT-17 half train | YOLOv3 | 608x608 | - | 42.7 | 49.5 | 54.8 | - |[配置文件](./bytetrack_yolov3.yml) | +| MOT-17 half 
train | PP-YOLOE-l | 640x640 | - | 52.9 | 50.4 | 59.7 | - |[配置文件](./bytetrack_ppyoloe.yml) | +| MOT-17 half train | PP-YOLOE-l | 640x640 |PPLCNet| 52.9 | 51.7 | 58.8 | - |[配置文件](./bytetrack_ppyoloe_pplcnet.yml) | +| **mix_mot_ch** | YOLOX-x | 800x1440| - | 61.9 | 77.3 | 71.6 | - |[配置文件](./bytetrack_yolox.yml) | +| **mix_det** | YOLOX-x | 800x1440| - | 65.4 | 84.5 | 77.4 | - |[配置文件](./bytetrack_yolox.yml) | + +**注意:** + - 检测任务相关配置和文档请查看[detector](detector/)。 + - 模型权重下载链接在配置文件中的```det_weights```和```reid_weights```,运行```tools/eval_mot.py```评估的命令即可自动下载,```reid_weights```若为None则表示不需要使用。 + - **ByteTrack默认不使用ReID权重**,如需使用ReID权重,可以参考 [bytetrack_ppyoloe_pplcnet.yml](./bytetrack_ppyoloe_pplcnet.yml),如需**更换ReID权重,可改动其中的`reid_weights: `为自己的权重路径**。 + - **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载,并解压放在`dataset/mot/`文件夹下。 + - **mix_mot_ch**数据集,是MOT17、CrowdHuman组成的联合数据集,**mix_det**数据集是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。为了验证精度可以都用**MOT17-half val**数据集去评估。 + + +#### YOLOX-x ByteTrack(mix_det)在 MOT-16/MOT-17 上的结果 + +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pp-yoloe-an-evolved-version-of-yolo/multi-object-tracking-on-mot16)](https://paperswithcode.com/sota/multi-object-tracking-on-mot16?p=pp-yoloe-an-evolved-version-of-yolo) + +| 网络 | 测试集 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :---------: | :-------: | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| ByteTrack-x| MOT-17 Train | 84.4 | 72.8 | 837 | 5653 | 10985 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | +| ByteTrack-x| **MOT-17 Test** | **78.4** | 69.7 | 4974 | 37551 | 79524 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | +| ByteTrack-x| MOT-16 Train | 83.5 | 72.7 | 800 | 6973 | 10419 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | +| ByteTrack-x| **MOT-16 Test** | **77.7** | 70.1 | 1570 | 15695 | 23304 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | + + +**注意:** + - **mix_det**数据集是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。 + - MOT-17 Train 和 MOT-16 Train 的指标均为本地评估该数据后的指标,由于Train集包括在了训练集中,此MOTA指标不代表模型的检测跟踪能力,只是因为MOT-17和MOT-16无验证集而它们的Train集有ground truth,是为了方便验证精度。 + - MOT-17 Test 和 MOT-16 Test 的指标均为交到 [MOTChallenge](https://motchallenge.net)官网评测后的指标,因为MOT-17和MOT-16的Test集未开放ground truth,此MOTA指标可以代表模型的检测跟踪能力。 + - ByteTrack的训练是单独的检测器训练MOT数据集,推理是组装跟踪器去评估MOT指标,单独的检测模型也可以评估检测指标。 + - ByteTrack的导出部署,是单独导出检测模型,再组装跟踪器运行的,参照[PP-Tracking](../../../deploy/pptracking/python/README.md)。 + + +### 人头跟踪 + +#### YOLOX-x ByteTrack 在 HT-21 Test Set上的结果 + +| 模型 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: | +| ByteTrack-x | 1440x800 | 64.1 | 63.4 | 4191 | 185162 | 210240 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](./bytetrack_yolox_ht21.yml) | + +#### YOLOX-x ByteTrack 在 HT-21 Test 
Set上的结果 + +| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: | +| ByteTrack-x | 1440x800 | 72.6 | 61.8 | 5163 | 71235 | 154139 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](./bytetrack_yolox_ht21.yml) | + +**注意:** + - 更多人头跟踪模型可以参考[headtracking21](../headtracking21)。 + + +## 多类别适配 + +多类别ByteTrack,可以参考 [bytetrack_ppyoloe_ppvehicle9cls.yml](./bytetrack_ppyoloe_ppvehicle9cls.yml),表示使用 [PP-Vehicle](../../ppvehicle/) 中的PPVehicle9cls数据集训好的模型权重去做多类别车辆跟踪。由于没有跟踪的ground truth标签无法做评估,故只做跟踪预测,只需修改`TestMOTDataset`确保路径存在,且其中的`anno_path`表示指定在一个`label_list.txt`中记录具体类别,需要自己手写,一行表示一个种类,注意路径`anno_path`如果写错或找不到则将默认使用COCO数据集80类的类别。 + +如需**更换检测器权重,可改动其中的`det_weights: `为自己的权重路径**,并注意**数据集路径、`label_list.txt`和类别数**做出相应更改。 + +预测多类别车辆跟踪: +```bash +# 下载demo视频 +wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/bdd100k_demo.mp4 + +# 使用PPYOLOE 多类别车辆检测模型 +CUDA_VISIBLE_DEVICES=1 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_ppvehicle9cls.yml --video_file=bdd100k_demo.mp4 --scaled=True --save_videos +``` + +**注意:** + - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 + - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。 + - `--save_videos`表示保存可视化视频,同时会保存可视化的图片在`{output_dir}/mot_outputs/`中,`{output_dir}`可通过`--output_dir`设置,默认文件夹名为`output`。 + + +## 快速开始 + +### 1. 训练 +通过如下命令一键式启动训练和评估 +```bash +python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp +# 或者 +python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml --eval --amp +``` + +**注意:** + - ` --eval`是边训练边验证精度;`--amp`是混合精度训练避免溢出,推荐使用paddlepaddle2.2.2版本。 + +### 2. 评估 +#### 2.1 评估检测效果 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +# 或者 +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams +``` + +**注意:** + - 评估检测使用的是```tools/eval.py```, 评估跟踪使用的是```tools/eval_mot.py```。 + +#### 2.2 评估跟踪效果 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolov3.yml --scaled=True +# 或者 +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --scaled=True +# 或者 +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml --scaled=True +# 或者 +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --scaled=True +``` +**注意:** + - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE YOLOv3则为False,如果使用通用检测模型则为True, 默认值是False。 + - 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置,默认文件夹名为`output`。 + +### 3. 
预测 + +使用单个GPU通过如下命令预测一个视频,并保存为视频 + +```bash +# 下载demo视频 +wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4 + +# 使用PPYOLOe行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos +# 或者使用YOLOX行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos +``` + +**注意:** + - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 + - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。 + - `--save_videos`表示保存可视化视频,同时会保存可视化的图片在`{output_dir}/mot_outputs/`中,`{output_dir}`可通过`--output_dir`设置,默认文件夹名为`output`。 + + +### 4. 导出预测模型 + +Step 1:导出检测模型 +```bash +# 导出PPYOLOe行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +# 或者导出YOLOX行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams +``` + +Step 2:导出ReID模型(可选步骤,默认不需要) +```bash +# 导出PPLCNet ReID模型 +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams +``` + +### 5. 用导出的模型基于Python去预测 + +```bash +python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts +# 或者 +python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/yolox_x_24e_800x1440_mix_det/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts +``` + +**注意:** + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 + + +## 引用 +``` +@article{zhang2021bytetrack, + title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box}, + author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang}, + journal={arXiv preprint arXiv:2110.06864}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/ht21.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/ht21.yml new file mode 100644 index 0000000000000000000000000000000000000000..8500af3165e1173cc442396ace1af54f09ab810a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/ht21.yml @@ -0,0 +1,34 @@ +metric: COCO +num_classes: 1 + +# Detection Dataset for training +TrainDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/train.json + dataset_dir: dataset/mot/HT21 + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/HT21 + +TestDataset: + !ImageFolder + dataset_dir: dataset/mot/HT21 + anno_path: annotations/val_half.json + + +# MOTDataset for MOT evaluation and 
inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: HT21/images/test + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mix_det.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mix_det.yml new file mode 100644 index 0000000000000000000000000000000000000000..fbe19bdaa29246919189d5d93a3ea01e3734b52c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mix_det.yml @@ -0,0 +1,34 @@ +metric: COCO +num_classes: 1 + +# Detection Dataset for training +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/mot/mix_det + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/MOT17 + +TestDataset: + !ImageFolder + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/MOT17 + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mix_mot_ch.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mix_mot_ch.yml new file mode 100644 index 0000000000000000000000000000000000000000..a19f149301a1d993c552a12e60144f63990d6f4d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mix_mot_ch.yml @@ -0,0 +1,34 @@ +metric: COCO +num_classes: 1 + +# Detection Dataset for training +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/mot/mix_mot_ch + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/MOT17 + +TestDataset: + !ImageFolder + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/MOT17 + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mot17.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mot17.yml new file mode 100644 index 0000000000000000000000000000000000000000..faf47f622d1c2847a9686dfa8d7e48a49c05436c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/mot17.yml @@ -0,0 +1,34 @@ +metric: COCO +num_classes: 1 + +# Detection Dataset for training +TrainDataset: + !COCODataSet + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/train_half.json + image_dir: images/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json + image_dir: images/train + +TestDataset: + !ImageFolder + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json + + +# MOTDataset for MOT 
evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml new file mode 100644 index 0000000000000000000000000000000000000000..ef6342fd0e9249acf386b7795cb538b73a26f108 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml @@ -0,0 +1,60 @@ +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT +EvalMOTReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestMOTReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/yolov3_mot_reader_608x608.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/yolov3_mot_reader_608x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..535a977033ebc346af5cc4625986233618a26917 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/yolov3_mot_reader_608x608.yml @@ -0,0 +1,66 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - 
BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 8 + shuffle: true + drop_last: true + mixup_epoch: 250 + use_shared_memory: true + +EvalReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT +EvalMOTReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestMOTReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml new file mode 100644 index 0000000000000000000000000000000000000000..48d4144221f6fa353af90ce3781a21329a566751 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml @@ -0,0 +1,67 @@ + +input_height: &input_height 800 +input_width: &input_width 1440 +input_size: &input_size [*input_height, *input_width] + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 1.0 + input_dim: *input_size + degrees: [-10, 10] + scale: [0.1, 2.0] + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: True + mixup_prob: 1.0 + mixup_scale: [0.5, 1.5] + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: *input_size} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 6 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 20 + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 800, 1440] + sample_transforms: + - Decode: {} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 + + +# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT +EvalMOTReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 + +TestMOTReader: + inputs_def: + image_shape: [3, 800, 1440] + sample_transforms: + - Decode: 
{} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe.yml new file mode 100644 index 0000000000000000000000000000000000000000..5e7ffe07f0f758c641596e90ee0da4c31085fd85 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe.yml @@ -0,0 +1,59 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + 'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml', + '_base_/mot17.yml', + '_base_/ppyoloe_mot_reader_640x640.yml' +] +weights: output/bytetrack_ppyoloe/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode, set 'COCO' can be training mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams +ByteTrack: + detector: YOLOv3 # PPYOLOe version + reid: None + tracker: JDETracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +reid_weights: None + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.1 # 0.01 in original detector + nms_threshold: 0.4 # 0.6 in original detector + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.2 + low_conf_thres: 0.1 + min_box_area: 100 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..60f81165d5b324943a997dbc26fbe56f249f2ef6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml @@ -0,0 +1,59 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
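+# Unlike bytetrack_ppyoloe.yml, this assembly attaches a PPLCNet ReID model:
+# see `reid: PPLCNetEmbedding` and `reid_weights` below.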
+_BASE_: [ + 'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml', + '_base_/mot17.yml', + '_base_/ppyoloe_mot_reader_640x640.yml' +] +weights: output/bytetrack_ppyoloe_pplcnet/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams +ByteTrack: + detector: YOLOv3 # PPYOLOe version + reid: PPLCNetEmbedding # use reid + tracker: JDETracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +reid_weights: https://bj.bcebos.com/v1/paddledet/models/mot/deepsort_pplcnet.pdparams + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.1 # 0.01 in original detector + nms_threshold: 0.4 # 0.6 in original detector + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.2 + low_conf_thres: 0.1 + min_box_area: 100 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe_ppvehicle9cls.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe_ppvehicle9cls.yml new file mode 100644 index 0000000000000000000000000000000000000000..f847a34d1e0c7af1ccaa6be33036f06a6473a7a4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_ppyoloe_ppvehicle9cls.yml @@ -0,0 +1,49 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
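+# Multi-class vehicle tracking variant: metric is MCMOT with num_classes 9,
+# and a hand-written label_list.txt is required (see TestMOTDataset below).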
+_BASE_: [ + 'bytetrack_ppyoloe.yml', + '_base_/ppyoloe_mot_reader_640x640.yml' +] +weights: output/bytetrack_ppyoloe_ppvehicle9cls/model_final + +metric: MCMOT # multi-class, `MOT` for single class +num_classes: 9 +# pedestrian(1), rider(2), car(3), truck(4), bus(5), van(6), motorcycle(7), bicycle(8), others(9) +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + anno_path: dataset/mot/label_list.txt # absolute path + +### write in label_list.txt each line: +# pedestrian +# rider +# car +# truck +# bus +# van +# motorcycle +# bicycle +# others +### + +det_weights: https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle9cls.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +PPYOLOEHead: + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.1 # 0.01 in original detector + nms_threshold: 0.4 # 0.6 in original detector + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.2 + low_conf_thres: 0.1 + min_box_area: 0 + vertical_ratio: 0 # only use 1.6 in MOT17 pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolov3.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolov3.yml new file mode 100644 index 0000000000000000000000000000000000000000..0ce35ae7831d36ead906a60ccbd5632f1b147b2e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolov3.yml @@ -0,0 +1,50 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + 'detector/yolov3_darknet53_40e_608x608_mot17half.yml', + '_base_/mot17.yml', + '_base_/yolov3_mot_reader_608x608.yml' +] +weights: output/bytetrack_yolov3/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams +ByteTrack: + detector: YOLOv3 # General YOLOv3 version + reid: None + tracker: JDETracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolov3_darknet53_40e_608x608_mot17half.pdparams +reid_weights: None + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.2 + low_conf_thres: 0.1 + min_box_area: 100 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolox.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolox.yml new file mode 100644 index 0000000000000000000000000000000000000000..2e195c56d00cfc696e93fee4e9f709f123b5dcec --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolox.yml @@ -0,0 +1,68 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
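+# YOLOX-x assembly trained on the mix_det joint dataset; note the higher
+# JDETracker conf_thres (0.6) than in the PP-YOLOE assembly (0.2).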
+_BASE_: [ + 'detector/yolox_x_24e_800x1440_mix_det.yml', + '_base_/mix_det.yml', + '_base_/yolox_mot_reader_800x1440.yml' +] +weights: output/bytetrack_yolox/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +ByteTrack: + detector: YOLOX + reid: None + tracker: JDETracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams +reid_weights: None + +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%. + # For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. + + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.6 + low_conf_thres: 0.2 + min_box_area: 100 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolox_ht21.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolox_ht21.yml new file mode 100644 index 0000000000000000000000000000000000000000..ea21a87c5ed1ec8297155c80b8e7136e1941c636 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/bytetrack_yolox_ht21.yml @@ -0,0 +1,68 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + 'detector/yolox_x_24e_800x1440_ht21.yml', + '_base_/ht21.yml', + '_base_/yolox_mot_reader_800x1440.yml' +] +weights: output/bytetrack_yolox_ht21/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +ByteTrack: + detector: YOLOX + reid: None + tracker: JDETracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_ht21.pdparams +reid_weights: None + +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 30000 + keep_top_k: 1000 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%. 
+ # For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. + + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.7 + low_conf_thres: 0.1 + min_box_area: 0 + vertical_ratio: 0 # 1.6 for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/README.md b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/README_cn.md b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..6de434cb4c7c5b80084b926bfc5dd70cbf7e196e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/README_cn.md @@ -0,0 +1,39 @@ +简体中文 | [English](README.md) + +# ByteTrack的检测器 + +## 简介 +[ByteTrack](https://arxiv.org/abs/2110.06864)(ByteTrack: Multi-Object Tracking by Associating Every Detection Box) 通过关联每个检测框来跟踪,而不仅是关联高分的检测框。此处提供了几个常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。 + +## 模型库 + +### 在MOT17-half val数据集上的检测结果 +| 骨架网络 | 网络类型 | 输入尺度 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 | +| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: | +| DarkNet-53 | YOLOv3 | 608X608 | 40e | ---- | 42.7 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [配置文件](./yolov3_darknet53_40e_608x608_mot17half.yml) | +| CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [配置文件](./ppyoloe_crn_l_36e_640x640_mot17half.yml) | +| CSPDarkNet | YOLOX-x(mix_mot_ch) | 800x1440 | 24e | ---- | 61.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_mot_ch.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_mot_ch.yml) | +| CSPDarkNet | YOLOX-x(mix_det) | 800x1440 | 24e | ---- | 65.4 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_det.yml) | + +**注意:** + - 以上模型除YOLOX外采用**MOT17-half train**数据集训练,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载。 + - **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip)下载,并解压放在`dataset/mot/MOT17/images/`文件夹下。 + - YOLOX-x(mix_mot_ch)采用**mix_mot_ch**数据集,是MOT17、CrowdHuman组成的联合数据集;YOLOX-x(mix_det)采用**mix_det**数据集,是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。为了验证精度可以都用**MOT17-half val**数据集去评估。 + - 行人跟踪请使用行人检测器结合行人ReID模型。车辆跟踪请使用车辆检测器结合车辆ReID模型。 + - 用于ByteTrack跟踪时,这些模型的NMS阈值等后处理设置会与纯检测任务的设置不同。 + + +## 快速开始 + +通过如下命令一键式启动评估、评估和导出 +```bash +job_name=ppyoloe_crn_l_36e_640x640_mot17half +config=configs/mot/bytetrack/detector/${job_name}.yml +log_dir=log_dir/${job_name} +# 1. training +python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp +# 2. 
evaluation +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=output/${job_name}/model_final.pdparams +# 3. export +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=output/${job_name}/model_final.pdparams +``` diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml new file mode 100644 index 0000000000000000000000000000000000000000..6c770e9bf85e953a30df43faf57c401518b7f6ad --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml @@ -0,0 +1,83 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + '../../../ppyoloe/ppyoloe_crn_l_300e_coco.yml', + '../_base_/mot17.yml', +] +weights: output/ppyoloe_crn_l_36e_640x640_mot17half/model_final +log_iter: 20 +snapshot_epoch: 2 + + +# schedule configuration for fine-tuning +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0.001 + epochs: 1 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +TrainReader: + batch_size: 8 + + +# detector configuration +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolov3_darknet53_40e_608x608_mot17half.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolov3_darknet53_40e_608x608_mot17half.yml new file mode 100644 index 0000000000000000000000000000000000000000..9b9df0f390133da5aaa4c4802245dce8d8d10229 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolov3_darknet53_40e_608x608_mot17half.yml @@ -0,0 +1,77 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
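+# Fine-tunes the YOLOv3-DarkNet53 detector on MOT17-half train; it is then
+# assembled with the tracker in bytetrack_yolov3.yml for MOT eval/infer.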
+_BASE_: [ + '../../../yolov3/yolov3_darknet53_270e_coco.yml', + '../_base_/mot17.yml', +] +weights: output/yolov3_darknet53_40e_608x608_mot17half/model_final +log_iter: 20 +snapshot_epoch: 2 + +# schedule configuration for fine-tuning +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 32 + - 36 + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 100 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +TrainReader: + batch_size: 8 + mixup_epoch: 35 + +# detector configuration +architecture: YOLOv3 +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +DarkNet: + depth: 53 + return_idx: [2, 3, 4] + +# use default config +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml new file mode 100644 index 0000000000000000000000000000000000000000..bd102a48d1013b9e6399411562b47e1e85e2c2ec --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml @@ -0,0 +1,80 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
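+# Fine-tunes the YOLOX-x detector on the HT-21 head-tracking data; it is then
+# assembled with the tracker in bytetrack_yolox_ht21.yml for MOT eval/infer.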
+_BASE_: [ + '../../../yolox/yolox_x_300e_coco.yml', + '../_base_/ht21.yml', +] +weights: output/yolox_x_24e_800x1440_ht21/model_final +log_iter: 20 +snapshot_epoch: 2 + +# schedule configuration for fine-tuning +epoch: 24 +LearningRate: + base_lr: 0.0005 # fintune + schedulers: + - !CosineDecay + max_epochs: 24 + min_lr_ratio: 0.05 + last_plateau_epochs: 4 + - !ExpWarmup + epochs: 1 + +OptimizerBuilder: + optimizer: + type: Momentum + momentum: 0.9 + use_nesterov: True + regularizer: + factor: 0.0005 + type: L2 + + +TrainReader: + batch_size: 4 + mosaic_epoch: 20 + +# detector configuration +architecture: YOLOX +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +norm_type: sync_bn +use_ema: True +ema_decay: 0.9999 +ema_decay_type: "exponential" +act: silu +find_unused_parameters: True +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 32] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%. + # For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml new file mode 100644 index 0000000000000000000000000000000000000000..2585e5a47ac0589f7d673803a5172b42f3b902bc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml @@ -0,0 +1,80 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
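+# Fine-tunes the YOLOX-x detector on the mix_det joint dataset
+# (MOT17 + CrowdHuman + Cityscapes + ETHZ); assembled with the tracker in bytetrack_yolox.yml.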
+_BASE_: [
+  '../../../yolox/yolox_x_300e_coco.yml',
+  '../_base_/mix_det.yml',
+]
+weights: output/yolox_x_24e_800x1440_mix_det/model_final
+log_iter: 20
+snapshot_epoch: 2
+
+# schedule configuration for fine-tuning
+epoch: 24
+LearningRate:
+  base_lr: 0.00075 # fine-tune
+  schedulers:
+    - !CosineDecay
+      max_epochs: 24
+      min_lr_ratio: 0.05
+      last_plateau_epochs: 4
+    - !ExpWarmup
+      epochs: 1
+
+OptimizerBuilder:
+  optimizer:
+    type: Momentum
+    momentum: 0.9
+    use_nesterov: True
+  regularizer:
+    factor: 0.0005
+    type: L2
+
+
+TrainReader:
+  batch_size: 6
+  mosaic_epoch: 20
+
+# detector configuration
+architecture: YOLOX
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
+norm_type: sync_bn
+use_ema: True
+ema_decay: 0.9999
+ema_decay_type: "exponential"
+act: silu
+find_unused_parameters: True
+depth_mult: 1.33
+width_mult: 1.25
+
+YOLOX:
+  backbone: CSPDarkNet
+  neck: YOLOCSPPAN
+  head: YOLOXHead
+  input_size: [800, 1440]
+  size_stride: 32
+  size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
+
+CSPDarkNet:
+  arch: "X"
+  return_idx: [2, 3, 4]
+  depthwise: False
+
+YOLOCSPPAN:
+  depthwise: False
+
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+YOLOXHead:
+  l1_epoch: 20
+  depthwise: False
+  loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
+  assigner:
+    name: SimOTAAssigner
+    candidate_topk: 10
+    use_vfl: False
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.7
+  # 'nms_top_k' 1000 and 'keep_top_k' 100 are already set here for speed while keeping high mAP; compared with larger values, mAP drops only about 0.1%.
+  # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but mAP will drop a lot.
diff --git a/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_mot_ch.yml b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_mot_ch.yml
new file mode 100644
index 0000000000000000000000000000000000000000..ae0fba92e76f267a89ff88702811fe4fc332a6ad
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_mot_ch.yml
@@ -0,0 +1,80 @@
+# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
+_BASE_: [
+  '../../../yolox/yolox_x_300e_coco.yml',
+  '../_base_/mix_mot_ch.yml',
+]
+weights: output/yolox_x_24e_800x1440_mix_mot_ch/model_final
+log_iter: 20
+snapshot_epoch: 2
+
+# schedule configuration for fine-tuning
+epoch: 24
+LearningRate:
+  base_lr: 0.00075 # fine-tune
+  schedulers:
+    - !CosineDecay
+      max_epochs: 24
+      min_lr_ratio: 0.05
+      last_plateau_epochs: 4
+    - !ExpWarmup
+      epochs: 1
+
+OptimizerBuilder:
+  optimizer:
+    type: Momentum
+    momentum: 0.9
+    use_nesterov: True
+  regularizer:
+    factor: 0.0005
+    type: L2
+
+
+TrainReader:
+  batch_size: 6
+  mosaic_epoch: 20
+
+# detector configuration
+architecture: YOLOX
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
+norm_type: sync_bn
+use_ema: True
+ema_decay: 0.9999
+ema_decay_type: "exponential"
+act: silu
+find_unused_parameters: True
+depth_mult: 1.33
+width_mult: 1.25
+
+YOLOX:
+  backbone: CSPDarkNet
+  neck: YOLOCSPPAN
+  head: YOLOXHead
+  input_size: [800, 1440]
+  size_stride: 32
+  size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
+
+CSPDarkNet:
+  arch: "X"
+  return_idx: [2, 3, 4]
+  depthwise: False
+
+YOLOCSPPAN:
+  depthwise: False
+
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+YOLOXHead:
+  l1_epoch: 20
+  depthwise: False
+  loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
+  assigner:
+    name: SimOTAAssigner
+    candidate_topk: 10
+    use_vfl: False
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.7
+  # 'nms_top_k' 1000 and 'keep_top_k' 100 are already set here for speed while keeping high mAP; compared with larger values, mAP drops only about 0.1%.
+  # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but mAP will drop a lot.
diff --git a/PaddleDetection-release-2.6/configs/mot/centertrack/README.md b/PaddleDetection-release-2.6/configs/mot/centertrack/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/centertrack/README.md
@@ -0,0 +1 @@
+README_cn.md
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/configs/mot/centertrack/README_cn.md b/PaddleDetection-release-2.6/configs/mot/centertrack/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..a91a844402ac3ddbcad27b44938fb35438c44e49
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/centertrack/README_cn.md
@@ -0,0 +1,156 @@
+Simplified Chinese | [English](README.md)
+
+# CenterTrack (Tracking Objects as Points)
+
+## Contents
+- [Model Zoo](#model-zoo)
+- [Quick Start](#quick-start)
+- [Citations](#citations)
+
+## Model Zoo
+
+### MOT17
+
+| Training Dataset | Input Size | Total batch_size | val MOTA | test MOTA | FPS | Config | Download |
+| :---------------: | :-------: | :------------: | :----------------: | :---------: | :-------: | :----: | :-----: |
+| MOT17-half train | 544x960 | 32 | 69.2(MOT17-half) | - | - |[config](./centertrack_dla34_70e_mot17half.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/centertrack_dla34_70e_mot17half.pdparams) |
+| MOT17 train | 544x960 | 32 | 87.9(MOT17-train) | 70.5(MOT17-test) | - |[config](./centertrack_dla34_70e_mot17.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/centertrack_dla34_70e_mot17.pdparams) |
+| MOT17 train(paper) | 544x960 | 32 | - | 67.8(MOT17-test) | - | - | - |
+
+
+**Notes:**
+  - By default, CenterTrack is trained on 2 GPUs with a total batch_size of 32. If you change the number of GPUs or the per-GPU batch_size, it is best to keep the total batch_size at 32.
+  - **val MOTA** may fluctuate by about 1.0 MOTA; it is best to train with the default 2 GPUs and total batch_size of 32.
+  - **MOT17-half train** consists of the images and annotations of the **first half of the frames** of each of the 7 MOT17 train sequences, while **MOT17-half val**, composed of the second half of the frames of each video, is used as the validation set to obtain **val MOTA**. The dataset can be downloaded from [this link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip); unzip it into the `dataset/mot/` folder.
+  - **MOT17 train** consists of the images and annotations of all frames of the 7 MOT17 train sequences. Since the MOT17 dataset is limited, **MOT17 train** is also used for evaluation to obtain **val MOTA**, while **test MOTA** is the result returned by the [MOT Challenge website](https://motchallenge.net) after submission.
+
+
+## Quick Start
+
+### 1. Training
+Start training and evaluation with the following commands:
+```bash
+# single-GPU training (not recommended)
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --amp
+# multi-GPU training
+python -m paddle.distributed.launch --log_dir=centertrack_dla34_70e_mot17half/ --gpus 0,1 tools/train.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --amp
+```
+**Notes:**
+  - `--eval` does not yet support validating tracking MOTA while training. If you want `--eval` to validate detection mAP while training, you need to **comment out `mot_metric: True` and `metric: MOT` in the config file**;
+  - `--amp` enables mixed-precision training to avoid running out of GPU memory;
+  - By default, CenterTrack is trained on 2 GPUs with a total batch_size of 32; if you change the number of GPUs or the per-GPU batch_size, it is best to keep the total batch_size at 32;
+
+
+### 2. Evaluation
+
+#### 2.1 Evaluate detection
+
+Note that you first need to **comment out `mot_metric: True` and `metric: MOT` in the config file**:
+```python
+### for detection eval.py/infer.py
+mot_metric: False
+metric: COCO
+
+### for MOT eval_mot.py/infer_mot.py
+#mot_metric: True # uncommented by default; tracking evaluation requires True, overriding the earlier mot_metric: False
+#metric: MOT # uncommented by default; tracking evaluation requires MOT, overriding the earlier metric: COCO
+```
+
+Then run:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
+```
+
+**Notes:**
+  - Detection is evaluated with ```tools/eval.py```, while tracking is evaluated with ```tools/eval_mot.py```.
+
+#### 2.2 Evaluate tracking
+
+Note that you first need to make sure **`mot_metric: True` and `metric: MOT` are set in the config file**;
+
+Then run:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
+```
+**Notes:**
+  - Detection is evaluated with ```tools/eval.py```, while tracking is evaluated with ```tools/eval_mot.py```.
+  - Tracking results are saved in `{output_dir}/mot_results/`, with one txt per video sequence. Each line of each txt is `frame,id,x1,y1,w,h,score,-1,-1,-1` (see the reading sketch below). `{output_dir}` can be set with `--output_dir` and defaults to `output`.
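+
+The saved txts use plain comma-separated text. As a minimal reading sketch (assuming only the line format quoted in the note above; the example path is illustrative):
+```python
+from collections import defaultdict
+
+def load_mot_results(txt_path):
+    """Read one result txt; line format: frame,id,x1,y1,w,h,score,-1,-1,-1"""
+    tracks = defaultdict(list)  # track id -> [(frame, x1, y1, w, h, score), ...]
+    with open(txt_path) as f:
+        for line in f:
+            fields = line.strip().split(',')
+            if len(fields) < 7:
+                continue  # skip empty or malformed lines
+            frame, tid = int(float(fields[0])), int(float(fields[1]))
+            x1, y1, w, h, score = map(float, fields[2:7])
+            tracks[tid].append((frame, x1, y1, w, h, score))
+    return tracks
+
+tracks = load_mot_results('output/mot_results/MOT17-02-SDP.txt')  # illustrative path
+print(len(tracks), 'tracks loaded')
+```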
+
+### 3. Inference
+
+#### 3.1 Inference for detection
+Note that you first need to **comment out `mot_metric: True` and `metric: MOT` in the config file**:
+```python
+### for detection eval.py/infer.py
+mot_metric: False
+metric: COCO
+
+### for MOT eval_mot.py/infer_mot.py
+#mot_metric: True # uncommented by default; tracking evaluation requires True, overriding the earlier mot_metric: False
+#metric: MOT # uncommented by default; tracking evaluation requires MOT, overriding the earlier metric: COCO
+```
+
+Then run:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
+```
+
+**Notes:**
+  - Detection inference uses ```tools/infer.py```, while tracking inference uses ```tools/infer_mot.py```.
+
+
+#### 3.2 Inference for tracking
+
+Note that you first need to make sure **`mot_metric: True` and `metric: MOT` are set in the config file**;
+
+Then run:
+```bash
+# download the demo video
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+# predict on a video
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --video_file=mot17_demo.mp4 --draw_threshold=0.5 --save_videos -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
+# or predict on an image folder
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --image_dir=mot17_demo/ --draw_threshold=0.5 --save_videos -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
+```
+
+**Notes:**
+  - Please make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On Linux (Ubuntu) it can be installed directly with: `apt-get update && apt-get install -y ffmpeg`.
+  - `--save_videos` saves the visualized video; visualized images are also saved in `{output_dir}/mot_outputs/`. `{output_dir}` can be set with `--output_dir` and defaults to `output`.
+
+
+### 4. Export the inference model
+
+Note that you first need to make sure **`mot_metric: True` and `metric: MOT` are set in the config file**;
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/centertrack_dla34_70e_mot17half.pdparams
+```
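+
+If the export succeeds, an inference model directory is written under `output_inference/`. The sketch below is a quick sanity check; the expected file names follow the usual PaddleDetection export layout and are an assumption here, so adjust them if your export differs:
+```python
+# Sketch: verify the exported inference model directory is complete.
+import os
+
+export_dir = 'output_inference/centertrack_dla34_70e_mot17half'
+expected = ['infer_cfg.yml', 'model.pdmodel', 'model.pdiparams']  # assumed layout
+missing = [f for f in expected if not os.path.exists(os.path.join(export_dir, f))]
+print('export OK' if not missing else f'missing: {missing}')
+```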
+
+### 5. Inference in Python with the exported model
+
+Note that you should first set `type: CenterTracker` in `deploy/python/tracker_config.yml`.
+
+```bash
+# predict on a video
+# wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+python deploy/python/mot_centertrack_infer.py --model_dir=output_inference/centertrack_dla34_70e_mot17half/ --tracker_config=deploy/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_images=True --save_mot_txts
+# predict on an image folder
+python deploy/python/mot_centertrack_infer.py --model_dir=output_inference/centertrack_dla34_70e_mot17half/ --tracker_config=deploy/python/tracker_config.yml --image_dir=mot17_demo/ --device=GPU --save_images=True --save_mot_txts
+```
+
+**Notes:**
+  - The tracking model predicts on videos; single-image prediction is not supported. By default the visualized tracking video is saved. Add `--save_mot_txts` (one txt per video) or `--save_mot_txt_per_img` (one txt per image) to save tracking result txt files, or `--save_images` to save visualized images.
+  - Each line of the tracking result txt is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
+
+
+## Citations
+```
+@article{zhou2020tracking,
+  title={Tracking Objects as Points},
+  author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
+  journal={ECCV},
+  year={2020}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/centertrack_dla34.yml b/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/centertrack_dla34.yml
new file mode 100644
index 0000000000000000000000000000000000000000..159165bd159ff7f5ee310b546b5a137fbf470259
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/centertrack_dla34.yml
@@ -0,0 +1,57 @@
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/crowdhuman_centertrack.pdparams
+architecture: CenterTrack
+for_mot: True
+mot_metric: True
+
+### model
+CenterTrack:
+  detector: CenterNet
+  plugin_head: CenterTrackHead
+  tracker: CenterTracker
+
+
+### CenterTrack.detector
+CenterNet:
+  backbone: DLA
+  neck: CenterNetDLAFPN
+  head: CenterNetHead
+  post_process: CenterNetPostProcess
+  for_mot: True # Note
+
+DLA:
+  depth: 34
+  pre_img: True # Note
+  pre_hm: True # Note
+
+CenterNetDLAFPN:
+  down_ratio: 4
+  last_level: 5
+  out_channel: 0
+  dcn_v2: True
+
+CenterNetHead:
+  head_planes: 256
+  prior_bias: -4.6 # Note
+  regress_ltrb: False
+  size_loss: 'L1'
+  loss_weight: {'heatmap': 1.0, 'size': 0.1, 'offset': 1.0}
+
+CenterNetPostProcess:
+  max_per_img: 100 # top-K
+  regress_ltrb: False
+
+
+### CenterTrack.plugin_head
+CenterTrackHead:
+  head_planes: 256
+  task: tracking
+  loss_weight: {'tracking': 1.0, 'ltrb_amodal': 0.1}
+  add_ltrb_amodal: True
+
+
+### CenterTrack.tracker
+CenterTracker:
+  min_box_area: -1
+  vertical_ratio: -1
+  track_thresh: 0.4
+  pre_thresh: 0.5
diff --git a/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/centertrack_reader.yml b/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/centertrack_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7a5bf6fda60242be1628635bd97eac4d0a85bb2b
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/centertrack_reader.yml
@@ -0,0 +1,75 @@
+input_h: &input_h 544
+input_w: &input_w 960
+input_size: &input_size [*input_h, *input_w]
+pre_img_epoch: &pre_img_epoch 70 # Add previous image as input
+
+worker_num: 4
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - FlipWarpAffine:
+        keep_res: False
+        input_h: *input_h
+        input_w: *input_w
+        not_rand_crop: False
+        flip: 0.5
+        is_scale: True
+        use_random: True
+        add_pre_img: True
+    - CenterRandColor: {saturation: 0.4, contrast: 0.4, brightness: 0.4}
+    - Lighting: {alphastd: 0.1, eigval: [0.2141788, 0.01817699, 0.00341571], eigvec: [[-0.58752847, -0.69563484, 0.41340352], [-0.5832747, 0.00994535, -0.81221408], [-0.56089297, 0.71832671, 0.41158938]]}
+    - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383, 0.27408165, 0.27809834], is_scale: False}
+    - Permute: {}
+    - Gt2CenterTrackTarget:
+        down_ratio: 4
+        max_objs: 256
+        hm_disturb: 0.05
+        lost_disturb: 0.4
+        fp_disturb: 0.1
+        pre_hm: True
+        add_tracking: True
+        add_ltrb_amodal: True
+  batch_size: 16 # total 32 for 2 GPUs
+  shuffle: True
+  drop_last: True
+  collate_batch: True
+  use_shared_memory: True
+  pre_img_epoch: *pre_img_epoch
+
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - WarpAffine: {keep_res: True, input_h: *input_h, input_w: *input_w}
+    - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383, 0.27408165, 0.27809834], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+TestReader:
+  sample_transforms:
+    - Decode: {}
+    - WarpAffine: {keep_res: True, input_h: *input_h, input_w: *input_w}
+    - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383, 0.27408165, 0.27809834], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+  fuse_normalize: True
+
+
+EvalMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - WarpAffine: {keep_res: False, input_h: *input_h, input_w: *input_w}
+    - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383, 0.27408165, 0.27809834], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+TestMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - WarpAffine: {keep_res: False, input_h: *input_h, input_w: *input_w}
+    - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383, 0.27408165, 0.27809834], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+  fuse_normalize: True
diff --git a/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/optimizer_70e.yml b/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/optimizer_70e.yml
new file mode 100644
index 0000000000000000000000000000000000000000..a336290f2cecb9597b8c5fe351f132eef3235e4c
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/centertrack/_base_/optimizer_70e.yml
@@ -0,0 +1,14 @@
+epoch: 70
+
+LearningRate:
+  base_lr: 0.000125
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [60]
+    use_warmup: False
+
+OptimizerBuilder:
+  optimizer:
+    type: Adam
+  regularizer: NULL
diff --git a/PaddleDetection-release-2.6/configs/mot/centertrack/centertrack_dla34_70e_mot17.yml b/PaddleDetection-release-2.6/configs/mot/centertrack/centertrack_dla34_70e_mot17.yml
new file mode 100644
index 0000000000000000000000000000000000000000..2888a01747a078af34a92dfae014358f61bc668d
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/centertrack/centertrack_dla34_70e_mot17.yml
@@ -0,0 +1,66 @@
+_BASE_: [
+  '_base_/optimizer_70e.yml',
+  '_base_/centertrack_dla34.yml',
+  '_base_/centertrack_reader.yml',
+  '../../runtime.yml',
+]
+log_iter: 20
+snapshot_epoch: 5
+weights: output/centertrack_dla34_70e_mot17/model_final
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/crowdhuman_centertrack.pdparams
+
+
+### for Detection eval.py/infer.py
+# mot_metric: False
+# metric: COCO
+
+### for MOT eval_mot.py/infer_mot.py
+mot_metric: True
+metric: MOT
+
+
+worker_num: 4
+TrainReader:
+  batch_size: 16 # total 32 for 2 GPUs
+
+EvalReader:
+  batch_size: 1
+
+EvalMOTReader:
+  batch_size: 1
+
+
+# COCO style dataset for training
+num_classes: 1
+TrainDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/train.json
+    
image_dir: images/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_track_id'] + # add 'gt_track_id', the boxes annotations of json file should have 'gt_track_id' + +EvalDataset: + !COCODataSet + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json + image_dir: images/train + +TestDataset: + !ImageFolder + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot/MOT17 + data_root: images/train # set 'images/test' for MOTChallenge test + keep_ori_im: True # set True if save visualization images or video, or used in SDE MOT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot/MOT17 + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml b/PaddleDetection-release-2.6/configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml new file mode 100644 index 0000000000000000000000000000000000000000..a15dfdc70d7b073659796ea39e92485d56ccd654 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml @@ -0,0 +1,66 @@ +_BASE_: [ + '_base_/optimizer_70e.yml', + '_base_/centertrack_dla34.yml', + '_base_/centertrack_reader.yml', + '../../runtime.yml', +] +log_iter: 20 +snapshot_epoch: 5 +weights: output/centertrack_dla34_70e_mot17half/model_final +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/crowdhuman_centertrack.pdparams + + +### for Detection eval.py/infer.py +# mot_metric: False +# metric: COCO + +### for MOT eval_mot.py/infer_mot.py +mot_metric: True +metric: MOT + + +worker_num: 4 +TrainReader: + batch_size: 16 # total 32 for 2 GPUs + +EvalReader: + batch_size: 1 + +EvalMOTReader: + batch_size: 1 + + +# COCO style dataset for training +num_classes: 1 +TrainDataset: + !COCODataSet + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/train_half.json + image_dir: images/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_track_id'] + # add 'gt_track_id', the boxes annotations of json file should have 'gt_track_id' + +EvalDataset: + !COCODataSet + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json + image_dir: images/train + +TestDataset: + !ImageFolder + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot/MOT17 + data_root: images/half + keep_ori_im: True # set True if save visualization images or video, or used in SDE MOT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot/MOT17 + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/README.md b/PaddleDetection-release-2.6/configs/mot/deepsort/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/README_cn.md b/PaddleDetection-release-2.6/configs/mot/deepsort/README_cn.md new file mode 100644 index 
0000000000000000000000000000000000000000..08bee2e1e4c173c426c608562a4bcd4334bcc5e7
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/README_cn.md
@@ -0,0 +1,232 @@
+Simplified Chinese | [English](README.md)
+
+# DeepSORT (Deep Cosine Metric Learning for Person Re-identification)
+
+## Contents
+- [Introduction](#introduction)
+- [Model Zoo](#model-zoo)
+- [Quick Start](#quick-start)
+- [Adapting Other Detectors](#adapting-other-detectors)
+- [Citations](#citations)
+
+## Introduction
+[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm with a CNN model that extracts appearance features from the person regions cropped out by the detector. Integrating this deep appearance information, it assigns and updates detected targets to their existing tracks, which amounts to a ReID (re-identification) task. The detection boxes DeepSORT needs can be generated by any detector; tracking prediction then only requires reading in the saved detection results and the video frames. For the ReID model, the `PCB+Pyramid ResNet101` and `PPLCNet` models provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) are used here.
+
+## Model Zoo
+
+### DeepSORT results on the MOT-16 Training Set
+
+| Backbone | Input Size | MOTA | IDF1 | IDS | FP | FN | FPS | Detection Results or Model | ReID Model | Config |
+| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----:| :-----: | :-----: |
+| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [detection results](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./reid/deepsort_pcb_pyramid_r101.yml) |
+| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [detection model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
+| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [detection results](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./reid/deepsort_pplcnet.yml) |
+| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [detection model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort_jde_yolov3_pplcnet.yml) |
+
+### DeepSORT results on the MOT-16 Test Set
+
+| Backbone | Input Size | MOTA | IDF1 | IDS | FP | FN | FPS | Detection Results or Model | ReID Model | Config |
+| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |:-----: |
+| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [detection results](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) | [ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./reid/deepsort_pcb_pyramid_r101.yml) |
+| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [detection model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
+| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [detection results](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./reid/deepsort_pplcnet.yml) |
+| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [detection model](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID model](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[config](./deepsort_jde_yolov3_pplcnet.yml) |
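+
+For reference, MOTA in these tables is the standard CLEAR-MOT metric, MOTA = 1 - (FN + FP + IDS) / GT. The sketch below re-derives the first MOT-16 Training Set row; the ground-truth box count of MOT-16 train (about 110407) is an assumption based on published MOTChallenge statistics, not a number from this page:
+```python
+# Worked example: MOTA = 1 - (FN + FP + IDS) / GT for the first row above.
+fn, fp, ids = 21644, 8054, 998  # FN, FP, IDS from the ResNet-101 row
+gt = 110407                     # assumed annotated-box count of MOT-16 train
+mota = 1.0 - (fn + fp + ids) / gt
+print(f'MOTA = {mota:.1%}')     # -> 72.2%, matching the table
+```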
+
+
+### DeepSORT results on the MOT-17 half Val Set
+
+| Detector Training Dataset | Detector | ReID | Detection mAP | MOTA | IDF1 | FPS | Config |
+| :-------- | :----- | :----: |:------: | :----: |:-----: |:----:|:----: |
+| MIX | JDE YOLOv3 | PCB Pyramid | - | 66.9 | 62.7 | - |[config](./deepsort_jde_yolov3_pcb_pyramid.yml) |
+| MIX | JDE YOLOv3 | PPLCNet | - | 66.3 | 62.1 | - |[config](./deepsort_jde_yolov3_pplcnet.yml) |
+| MOT-17 half train | YOLOv3 | PPLCNet | 42.7 | 50.2 | 52.4 | - |[config](./deepsort_yolov3_pplcnet.yml) |
+| MOT-17 half train | PPYOLOv2 | PPLCNet | 46.8 | 51.8 | 55.8 | - |[config](./deepsort_ppyolov2_pplcnet.yml) |
+| MOT-17 half train | PPYOLOe | PPLCNet | 52.7 | 56.7 | 60.5 | - |[config](./deepsort_ppyoloe_pplcnet.yml) |
+| MOT-17 half train | PPYOLOe | ResNet-50 | 52.7 | 56.7 | 64.6 | - |[config](./deepsort_ppyoloe_resnet.yml) |
+
+**Notes:**
+The download links for the model weights are the ```det_weights``` and ```reid_weights``` entries in the config files; running the evaluation command downloads them automatically.
+DeepSORT keeps the detector and the ReID model separate: the detector alone is trained on the MOT dataset, and the assembled DeepSORT is only used for evaluation. Two evaluation modes are currently supported.
+- **Method 1**: load a detection results file together with the ReID model. Before evaluating with the DeepSORT model, first obtain detection results with a detection model, then prepare the result files like this:
+```
+det_results_dir
+   |——————MOT16-02.txt
+   |——————MOT16-04.txt
+   |——————MOT16-05.txt
+   |——————MOT16-09.txt
+   |——————MOT16-10.txt
+   |——————MOT16-11.txt
+   |——————MOT16-13.txt
+```
+For the MOT16 dataset, you can download det_results_dir.zip, a set of already matched detection box results provided by PaddleDetection, and unzip it:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip
+```
+Using a stronger detection model yields better results. Each txt holds the detection results for all images of one video; each line describes one bounding box in the following format:
+```
+[frame_id],[x0],[y0],[w],[h],[score],[class_id]
+```
+- `frame_id` is the index of the image frame
+- `x0,y0` are the x and y coordinates of the top-left corner of the box
+- `w,h` are the width and height of the box in pixels
+- `score` is the confidence score of the box
+- `class_id` is the class of the box; it is `0` if there is only one class
+
+If you generate such files with your own detector, follow this format exactly; a minimal writer sketch follows below.
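+
+A minimal writer sketch for this format (the function, box values and file names are illustrative, not part of PaddleDetection):
+```python
+# Sketch: dump per-frame detections as [frame_id],[x0],[y0],[w],[h],[score],[class_id]
+import os
+
+def write_det_results(txt_path, dets_per_frame):
+    """dets_per_frame: {frame_id: [(x0, y0, w, h, score, class_id), ...]}"""
+    os.makedirs(os.path.dirname(txt_path), exist_ok=True)
+    with open(txt_path, 'w') as f:
+        for frame_id in sorted(dets_per_frame):
+            for x0, y0, w, h, score, class_id in dets_per_frame[frame_id]:
+                f.write(f'{frame_id},{x0:.2f},{y0:.2f},{w:.2f},{h:.2f},{score:.4f},{int(class_id)}\n')
+
+# dummy two-frame example for one sequence
+write_det_results('det_results_dir/MOT16-02.txt',
+                  {1: [(10.0, 20.0, 50.0, 120.0, 0.98, 0)],
+                   2: [(12.0, 21.0, 50.0, 119.0, 0.97, 0)]})
+```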
+
+- **Method 2**: load the detection model and the ReID model at the same time. Here the JDE version of YOLOv3 is chosen; see `configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml` for the details. To load another general detection model instead, adapt `configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml`.
+
+## Quick Start
+
+### 1. Evaluation
+
+#### 1.1 Evaluate detection
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+```
+
+**Notes:**
+  - Detection is evaluated with ```tools/eval.py```, while tracking is evaluated with ```tools/eval_mot.py```.
+
+#### 1.2 Evaluate tracking
+**Method 1**: load the detection results file and the ReID model to get tracking results
+```bash
+# Download the MOT16 detection results file provided by PaddleDetection and unzip it; to generate one with another detector yourself, follow the format used in these files
+wget https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip
+
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml --det_results_dir det_results_dir
+# or
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml --det_results_dir det_results_dir
+```
+
+**Method 2**: load the pedestrian detection model and the ReID model to get tracking results
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml
+# or
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pplcnet.yml
+# or
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml --scaled=True
+# or
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_ppyoloe_resnet.yml --scaled=True
+```
+**Notes:**
+  - The JDE YOLOv3 pedestrian detection model is trained on the same MOT dataset as JDE and FairMOT, hence its higher MOTA; other general detection models such as PPYOLOv2 are trained only on the MOT17 half dataset.
+  - The biggest difference between the JDE YOLOv3 model and general detection models such as YOLOv3 and PPYOLOv2 is the JDEBBoxPostProcess post-processing: its output coordinates are not scaled back to the original image, whereas general detection models output coordinates already scaled back to the original image.
+  - `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: False when using the JDE YOLOv3 detection model, True when using a general detection model; the default is False. The sketch after these notes makes the mapping concrete.
+  - Tracking results are saved in `{output_dir}/mot_results/`, with one txt per video sequence; each line of each txt is `frame,id,x1,y1,w,h,score,-1,-1,-1`. `{output_dir}` can be set with `--output_dir`.
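+
+To make the `--scaled` flag concrete, the sketch below shows, generically (it is not the actual PaddleDetection implementation), how a box predicted in letterboxed network-input coordinates is mapped back to the original image; `--scaled=True` means this mapping has already been applied by the model:
+```python
+# Sketch: map a (x1, y1, w, h) box from keep-ratio letterboxed input space
+# back to original-image coordinates.
+def unletterbox(box, in_w, in_h, orig_w, orig_h):
+    scale = min(in_w / orig_w, in_h / orig_h)
+    pad_w = (in_w - orig_w * scale) / 2  # horizontal padding added by letterboxing
+    pad_h = (in_h - orig_h * scale) / 2  # vertical padding
+    x1, y1, w, h = box
+    return ((x1 - pad_w) / scale, (y1 - pad_h) / scale, w / scale, h / scale)
+
+# e.g. a 1920x1080 frame letterboxed into a 1088x608 network input
+print(unletterbox((544.0, 304.0, 100.0, 200.0), 1088, 608, 1920, 1080))
+```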
+
+### 2. Inference
+
+Predict a video on a single GPU and save the result as a video with the following commands:
+
+```bash
+# download the demo video
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+
+# load the JDE YOLOv3 pedestrian detection model and the PCB Pyramid ReID model, and save the result as a video
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml --video_file=mot17_demo.mp4 --save_videos
+
+# or load the PPYOLOE pedestrian detection model and the PPLCNet ReID model, and save the result as a video
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
+```
+
+**Notes:**
+  - Please make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On Linux (Ubuntu) it can be installed directly with: `apt-get update && apt-get install -y ffmpeg`.
+  - `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: False when using the JDE YOLOv3 detection model, True when using a general detection model.
+
+
+### 3. Export the inference models
+
+Step 1: export the detection model
+```bash
+# export the JDE YOLOv3 pedestrian detection model
+CUDA_VISIBLE_DEVICES=0 python3.7 tools/export_model.py -c configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
+
+# or export the PPYOLOE pedestrian detection model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+```
+
+Step 2: export the ReID model
+```bash
+# export the PCB Pyramid ReID model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
+
+# or export the PPLCNet ReID model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
+
+# or export the ResNet ReID model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_resnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_resnet.pdparams
+```
+
+### 4. Inference in Python with the exported models
+
+```bash
+# use the exported PPYOLOE pedestrian detection model and PPLCNet ReID model
+python3.7 deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts --threshold=0.5
+```
+**Notes:**
+  - Before running, change the tracker in `deploy/pptracking/python/tracker_config.yml` to `DeepSORTTracker`.
+  - The tracking model predicts on videos; single-image prediction is not supported. By default the visualized tracking video is saved. Add `--save_mot_txts` (one txt per video) to save tracking result txt files, or `--save_images` to save visualized images.
+  - Each line of the tracking result txt is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
+
+
+## Adapting Other Detectors
+
+### 1. Layout of the config files
+- `detector/xxx.yml` is a pure detection config, e.g. `detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml`, supporting the whole detection workflow (train/eval/infer/export/deploy). DeepSORT tracking eval/infer is unrelated to this pure-detection yml, but export needs it to export the detection model separately: the DeepSORT export is split into detector and reid parts, and users can define and assemble detector+reid into a complete DeepSORT tracking system themselves.
+- For the detector configs under `detector/`, convert your dataset to COCO format. Since the ground-truth IDs are not involved, any detection model can be configured here, as long as its output contains the class, coordinates and score of each result box.
+- The files under `reid/`, e.g. `reid/deepsort_pplcnet.yml`, configure the ReID model and the tracker. The ReID models `deepsort_pcb_pyramid_r101.yml` and `deepsort_pplcnet.yml` are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) and were trained on the Market1501 (751 identities) person ReID dataset; the training details are yet to be published by PaddleClas.
+- `deepsort_xxx_yyy.yml` is a complete DeepSORT tracking config, e.g. `deepsort_ppyolov2_pplcnet.yml`, whose detector part `xxx` comes from `detector/` and whose reid and tracker part `yyy` comes from `reid/`.
+- DeepSORT tracking eval/infer works in two ways: Method 1 uses only `reid/deepsort_yyy.yml`, loading a detection results file and the `yyy` ReID model; Method 2 uses `deepsort_xxx_yyy.yml`, loading the `xxx` detection model and the `yyy` ReID model. DeepSORT tracking deploy, however, must use `deepsort_xxx_yyy.yml`.
+- Detector eval/infer/deploy uses only `detector/xxx.yml`. The ReID model is normally not used on its own; if it is, a detection results file must be loaded first and only `reid/deepsort_yyy.yml` is used.
+
+
+### 2. Adaptation steps
+1. Convert your dataset to COCO format and train it following a general detection model config; referring to the configs in the `detector/` folder, create `detector/xxx.yml`. Faster R-CNN, YOLOv3, PPYOLOv2, JDE YOLOv3, PicoDet and other models are already supported.
+
+2. Create `deepsort_xxx_yyy.yml`, where the `DeepSORT.detector` config is the one from `detector/xxx.yml`, and `EvalMOTDataset` and `det_weights` can be set as needed. `yyy` is a `reid/deepsort_yyy.yml` such as `reid/deepsort_pplcnet.yml`.
+
+### 3. Usage steps
+#### 1. Evaluate with the detection model and the ReID model:
+```
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --scaled=True
+```
+#### 2. Run inference with the detection model and the ReID model:
+```
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
+```
+#### 3. Export the detection model and the ReID model:
+```bash
+# export the detection model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/xxx.yml
+# export the ReID model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_yyy.yml
+```
+#### 4. Deploy with the exported detection and ReID models:
+```
+python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/xxx/ --reid_model_dir=output_inference/deepsort_yyy/ --video_file=mot17_demo.mp4 --device=GPU --scaled=True --save_mot_txts
+```
+**Notes:**
+  - `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: False when using the JDE YOLOv3 detection model, True when using a general detection model.
+
+
+## Citations
+```
+@inproceedings{Wojke2017simple,
+  title={Simple Online and Realtime Tracking with a Deep Association Metric},
+  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
+  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
+  year={2017},
+  pages={3645--3649},
+  organization={IEEE},
+  doi={10.1109/ICIP.2017.8296962}
+}
+
+@inproceedings{Wojke2018deep,
+  title={Deep Cosine Metric Learning for Person Re-identification},
+  author={Wojke, Nicolai and Bewley, Alex},
+  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
+  year={2018},
+  pages={748--756},
+  organization={IEEE},
+  doi={10.1109/WACV.2018.00087}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/_base_/deepsort_reader_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/_base_/deepsort_reader_1088x608.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6ab950aa94e0ea203ea6184d7e3910164ef85993
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/_base_/deepsort_reader_1088x608.yml
@@ -0,0 +1,22 @@
+# DeepSORT itself does not need to be trained on the MOT dataset; it is only used for evaluation.
+# Only the detector (e.g. YOLOv3) needs to be trained on the MOT dataset, using bboxes only.
+# Ground-truth IDs are not needed for training.
+
+EvalMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [608, 1088]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+TestMOTReader:
+  inputs_def:
+    image_shape: [3, 608, 1088]
+  sample_transforms:
+    - Decode: {}
+    - LetterBoxResize: {target_size: [608, 1088]}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+    - Permute: {}
+  batch_size: 1
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/_base_/mot17.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/_base_/mot17.yml
new file mode 100644
index 0000000000000000000000000000000000000000..faf47f622d1c2847a9686dfa8d7e48a49c05436c
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/_base_/mot17.yml
@@ -0,0 +1,34 @@
+metric: COCO
+num_classes: 1
+
+# Detection Dataset for training
+TrainDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/train_half.json
+    image_dir: images/train
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+    image_dir: images/train
+
+TestDataset:
+  !ImageFolder
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+
+
+# MOTDataset for MOT evaluation and inference
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT17/images/half
+    keep_ori_im: True # set as True in DeepSORT and ByteTrack
+
+TestMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    keep_ori_im: True # set True if save visualization images or video
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml
new file mode 100644
index 0000000000000000000000000000000000000000..066e0ec08d1d99c6764f6d8ff3768e57f2a01563
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml
@@ -0,0 +1,71 @@
+_BASE_: [
+  'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
+  '_base_/mot17.yml',
+  '_base_/deepsort_reader_1088x608.yml',
+]
+metric: MOT
+num_classes: 1
+
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT16/images/train
+    keep_ori_im: True # set as True in DeepSORT
+
+det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
+reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
+
+
+# DeepSORT configuration
+architecture: DeepSORT
+pretrain_weights: None
+
+DeepSORT:
+  detector: YOLOv3 # JDE version YOLOv3
+  reid: PCBPyramid
+  tracker: DeepSORTTracker
+
+
+# reid and tracker configuration
+# see 'configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml'
+PCBPyramid:
+  model_name: "ResNet101"
+  num_conv_out_channels: 128
+  num_classes: 751
+
+DeepSORTTracker:
+  input_size: [64, 192]
+  min_box_area: 0
+  vertical_ratio: -1
+  budget: 100
+  max_age: 70
+  n_init: 3
+  metric_type: cosine
+  matching_threshold: 0.2
+  max_iou_distance: 0.9
+  motion: KalmanFilter
+
+
+# detector configuration: JDE version YOLOv3
+# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
+# The most obvious difference from the general YOLOv3 is JDEBBoxPostProcess: the output bbox coordinates are not scaled back to the original image.
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: JDEBBoxPostProcess
+
+# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
+JDEBBoxPostProcess:
+  decode:
+    name: JDEBox
+    conf_thresh: 0.3
+    downsample_ratio: 32
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.5
+    nms_top_k: 2000
+    normalized: true
+  return_idx: false
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_jde_yolov3_pplcnet.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_jde_yolov3_pplcnet.yml
new file mode 100644
index 0000000000000000000000000000000000000000..a412c9e5afb38e44715f0c7fa6c95b679fe2aa33
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_jde_yolov3_pplcnet.yml
@@ -0,0 +1,70 @@
+_BASE_: [
+  'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
+  '_base_/mot17.yml',
+  '_base_/deepsort_reader_1088x608.yml',
+]
+metric: MOT
+num_classes: 1
+
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT16/images/train
+    keep_ori_im: True # set as True in DeepSORT
+
+det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
+reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
+
+
+# DeepSORT configuration
+architecture: DeepSORT
+pretrain_weights: None
+
+DeepSORT:
+  detector: YOLOv3 # JDE version YOLOv3
+  reid: PPLCNetEmbedding
+  tracker: DeepSORTTracker
+
+
+# reid and tracker configuration
+# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
+PPLCNetEmbedding:
+  input_ch: 1280
+  output_ch: 512
+
+DeepSORTTracker:
+  input_size: [64, 192]
+  min_box_area: 0
+  vertical_ratio: -1
+  budget: 100
+  max_age: 70
+  n_init: 3
+  metric_type: cosine
+  matching_threshold: 0.2
+  max_iou_distance: 0.9
+  motion: KalmanFilter
+
+
+# detector configuration: JDE version YOLOv3
+# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
+# The most obvious difference from the general YOLOv3 is JDEBBoxPostProcess: the output bbox coordinates are not scaled back to the original image.
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: JDEBBoxPostProcess
+
+# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
+JDEBBoxPostProcess:
+  decode:
+    name: JDEBox
+    conf_thresh: 0.3
+    downsample_ratio: 32
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.5
+    nms_top_k: 2000
+    normalized: true
+  return_idx: false
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml
new file mode 100644
index 0000000000000000000000000000000000000000..0af80a7d899f02ac4b66c5191b2616ed1db1aa8e
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml
@@ -0,0 +1,109 @@
+_BASE_: [
+  'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
+  '_base_/mot17.yml',
+  '_base_/deepsort_reader_1088x608.yml',
+]
+metric: MOT
+num_classes: 1
+
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT17/images/half
+    keep_ori_im: True # set as True in DeepSORT
+
+det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
+
+# reader
+EvalMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+TestMOTReader:
+  inputs_def:
+    image_shape: [3, 640, 640]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+# DeepSORT configuration
+architecture: DeepSORT
+pretrain_weights: None
+
+DeepSORT:
+  detector: YOLOv3 # PPYOLOe version
+  reid: PPLCNetEmbedding
+  tracker: DeepSORTTracker
+
+
+# reid and tracker configuration
+# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
+PPLCNetEmbedding:
+  input_ch: 1280
+  output_ch: 512
+
+DeepSORTTracker:
+  input_size: [64, 192]
+  min_box_area: 0
+  vertical_ratio: -1
+  budget: 100
+  max_age: 70
+  n_init: 3
+  metric_type: cosine
+  matching_threshold: 0.2
+  max_iou_distance: 0.9
+  motion: KalmanFilter
+
+
+# detector configuration: PPYOLOe version
+# see 'configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml'
+YOLOv3:
+  backbone: CSPResNet
+  neck: CustomCSPPAN
+  yolo_head: PPYOLOEHead
+  post_process: ~
+
+CSPResNet:
+  layers: [3, 6, 6, 3]
+  channels: [64, 128, 256, 512, 1024]
+  return_idx: [1, 2, 3]
+  use_large_stem: True
+
+CustomCSPPAN:
+  out_channels: [768, 384, 192]
+  stage_num: 1
+  block_num: 3
+  act: 'swish'
+  spp: true
+
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+PPYOLOEHead:
+  fpn_strides: [32, 16, 8]
+  grid_cell_scale: 5.0
+  grid_cell_offset: 0.5
+  static_assigner_epoch: -1 # 100
+  use_varifocal_loss: True
+  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.4 # 0.01 in original detector
+    nms_threshold: 0.6
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyoloe_resnet.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyoloe_resnet.yml
new file mode 100644
index 0000000000000000000000000000000000000000..d9692304b055040bb22c49a2f90e05e4e7ba53eb
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyoloe_resnet.yml
@@ -0,0 +1,108 @@
+_BASE_: [
+  'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
+  '_base_/mot17.yml',
+  '_base_/deepsort_reader_1088x608.yml',
+]
+metric: MOT
+num_classes: 1
+
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT17/images/half
+    keep_ori_im: True # set as True in DeepSORT
+
+det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_resnet.pdparams
+
+# reader
+EvalMOTReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+TestMOTReader:
+  inputs_def:
+    image_shape: [3, 640, 640]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+
+# DeepSORT configuration
+architecture: DeepSORT
+pretrain_weights: None
+
+DeepSORT:
+  detector: YOLOv3 # PPYOLOe version
+  reid: ResNetEmbedding
+  tracker: DeepSORTTracker
+
+
+# reid and tracker configuration
+# see 'configs/mot/deepsort/reid/deepsort_resnet.yml'
+ResNetEmbedding:
+  model_name: "ResNet50"
+
+DeepSORTTracker:
+  input_size: [64, 192]
+  min_box_area: 0
+  vertical_ratio: -1
+  budget: 100
+  max_age: 70
+  n_init: 3
+  metric_type: cosine
+  matching_threshold: 0.2
+  max_iou_distance: 0.9
+  motion: KalmanFilter
+
+
+# detector configuration: PPYOLOe version
+# see 'configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml'
+YOLOv3:
+  backbone: CSPResNet
+  neck: CustomCSPPAN
+  yolo_head: PPYOLOEHead
+  post_process: ~
+
+CSPResNet:
+  layers: [3, 6, 6, 3]
+  channels: [64, 128, 256, 512, 1024]
+  return_idx: [1, 2, 3]
+  use_large_stem: True
+
+CustomCSPPAN:
+  out_channels: [768, 384, 192]
+  stage_num: 1
+  block_num: 3
+  act: 'swish'
+  spp: true
+
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+PPYOLOEHead:
+  fpn_strides: [32, 16, 8]
+  grid_cell_scale: 5.0
+  grid_cell_offset: 0.5
+  static_assigner_epoch: -1 # 100
+  use_varifocal_loss: True
+  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.4 # 0.01 in original detector
+    nms_threshold: 0.6
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml
new file mode 100644
index 0000000000000000000000000000000000000000..8cd393e457b69588140b1c00b5e20ddb69932f5d
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml
@@ -0,0 +1,98 @@
+_BASE_: [
+  'detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml',
+  '_base_/mot17.yml',
+  '_base_/deepsort_reader_1088x608.yml',
+]
+metric: MOT
+num_classes: 1
+
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT17/images/half
+    
keep_ori_im: True # set as True in DeepSORT + +det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams +reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams + +# reader +EvalMOTReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestMOTReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +# DeepSORT configuration +architecture: DeepSORT +pretrain_weights: None + +DeepSORT: + detector: YOLOv3 # PPYOLOv2 version + reid: PPLCNetEmbedding + tracker: DeepSORTTracker + + +# reid and tracker configuration +# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml' +PPLCNetEmbedding: + input_ch: 1280 + output_ch: 512 + +DeepSORTTracker: + input_size: [64, 192] + min_box_area: 0 + vertical_ratio: -1 + budget: 100 + max_age: 70 + n_init: 3 + metric_type: cosine + matching_threshold: 0.2 + max_iou_distance: 0.9 + motion: KalmanFilter + + +# detector configuration: PPYOLOv2 version +# see 'configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml' +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.25 # 0.01 in original detector + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.4 # 0.01 in original detector + post_threshold: 0.4 # 0.01 in original detector + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_yolov3_pplcnet.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_yolov3_pplcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..adb0aa9a07daea757a9d67119a71a4ee8e1d9e68 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/deepsort_yolov3_pplcnet.yml @@ -0,0 +1,87 @@ +_BASE_: [ + 'detector/yolov3_darknet53_40e_608x608_mot17half.yml', + '_base_/mot17.yml', + '_base_/deepsort_reader_1088x608.yml', +] +metric: MOT +num_classes: 1 + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT + +det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams +reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams + +# reader +EvalMOTReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestMOTReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: 
True}
+    - Permute: {}
+  batch_size: 1
+
+
+# DeepSORT configuration
+architecture: DeepSORT
+pretrain_weights: None
+
+DeepSORT:
+  detector: YOLOv3 # General YOLOv3 version
+  reid: PPLCNetEmbedding
+  tracker: DeepSORTTracker
+
+
+# reid and tracker configuration
+# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
+PPLCNetEmbedding:
+  input_ch: 1280
+  output_ch: 512
+
+DeepSORTTracker:
+  input_size: [64, 192]
+  min_box_area: 0
+  vertical_ratio: -1
+  budget: 100
+  max_age: 70
+  n_init: 3
+  metric_type: cosine
+  matching_threshold: 0.2
+  max_iou_distance: 0.9
+  motion: KalmanFilter
+
+
+# detector configuration: General YOLOv3 version
+# see 'configs/mot/deepsort/detector/yolov3_darknet53_40e_608x608_mot17half.yml'
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: BBoxPostProcess
+
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+BBoxPostProcess:
+  decode:
+    name: YOLOBox
+    conf_thresh: 0.005
+    downsample_ratio: 32
+    clip_bbox: true
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 100
+    score_threshold: 0.3 # 0.01 in original detector
+    nms_threshold: 0.45
+    nms_top_k: 1000
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/detector/README.md b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4cfe273fee2add8bf970f503fa0a7fd363435f22
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/README.md
@@ -0,0 +1,34 @@
+English | [简体中文](README_cn.md)
+
+# Detector for DeepSORT
+
+## Introduction
+[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common detectors are provided here as a reference. Note that the training dataset, backbone, input size, number of training epochs and NMS threshold all affect model accuracy and performance; please adapt them to your needs.
+
+## Model Zoo
+### Results on MOT17-half dataset
+| Backbone | Model | input size | lr schedule | FPS | Box AP | download | config |
+| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :----------: | :-----: |
+| DarkNet-53 | YOLOv3 | 608x608 | 40e | ---- | 42.7 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [config](./yolov3_darknet53_40e_608x608_mot17half.yml) |
+| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [config](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
+| CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [config](./ppyoloe_crn_l_36e_640x640_mot17half.yml) |
+
+**Notes:**
+  - The above models are trained with the **MOT17-half train** set, which can be downloaded from this [link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip).
+  - The **MOT17-half train** set is composed of the pictures and labels of the first half of the frames of each video in the MOT17 train dataset (7 sequences in total). The **MOT17-half val** set, composed of the second half of the frames of each video, is used for evaluation. They can be downloaded from this [link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip); download and unzip it into the `dataset/mot/MOT17/images/` folder. The frame-wise half split is illustrated in the sketch after these notes.
+  - YOLOv3 is trained with the same pedestrian dataset as `configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml`, which has not been released yet.
+  - For pedestrian tracking, please use a pedestrian detector combined with a pedestrian ReID model. For vehicle tracking, please use a vehicle detector combined with a vehicle ReID model.
+  - High-quality detected boxes are required for DeepSORT tracking, so the post-processing settings such as the NMS threshold of these models differ from those in pure detection tasks.
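+
+The "half" split mentioned in the notes is simply a frame-wise split of each training sequence, as in this sketch (the frame count is illustrative):
+```python
+# Illustrative: MOT17-half splits each train sequence frame-wise.
+frames = list(range(1, 601))           # e.g. a 600-frame sequence
+half = len(frames) // 2
+train_half = frames[:half]             # first half  -> MOT17-half train
+val_half = frames[half:]               # second half -> MOT17-half val
+print(len(train_half), len(val_half))  # 300 300
+```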
+
+## Quick Start
+
+Start the training and evaluation with the following commands:
+```bash
+job_name=ppyoloe_crn_l_36e_640x640_mot17half
+config=configs/mot/deepsort/detector/${job_name}.yml
+log_dir=log_dir/${job_name}
+# 1. training
+python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp --fleet
+# 2. evaluation
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/${job_name}.pdparams
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/detector/README_cn.md b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..050cc8ade944190bef0931909d34724a5b99cb54
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/README_cn.md
@@ -0,0 +1,36 @@
+Simplified Chinese | [English](README.md)
+
+# Detector for DeepSORT
+
+## Introduction
+[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series; the configs of several common detectors are provided here as a reference. Since the training dataset, input size, number of training epochs, NMS threshold setting, etc. all cause differences in model accuracy and performance, please adapt them to your needs.
+
+## Model Zoo
+
+### Detection results on the MOT17-half val set
+| Backbone | Model | Input Size | lr schedule | Inference Time (fps) | Box AP | Download | Config |
+| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: |
+| DarkNet-53 | YOLOv3 | 608x608 | 40e | ---- | 42.7 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [config](./yolov3_darknet53_40e_608x608_mot17half.yml) |
+| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [config](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
+| CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [config](./ppyoloe_crn_l_36e_640x640_mot17half.yml) |
+
+**Notes:**
+  - All of the above models can be trained with the **MOT17-half train** set, which can be downloaded from [this link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip).
+  - **MOT17-half train** consists of the images and annotations of the first half of the frames of each of the 7 MOT17 train sequences; for validating accuracy, the **MOT17-half val** set, composed of the second half of the frames of each video, can be used for evaluation. It can be downloaded from [this link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip); unzip it into the `dataset/mot/MOT17/images/` folder.
+  - YOLOv3 is trained on the same pedestrian dataset as `configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml`, which has not been released yet.
+  - For pedestrian tracking, please use a pedestrian detector combined with a pedestrian ReID model; for vehicle tracking, please use a vehicle detector combined with a vehicle ReID model.
+  - High-quality detected boxes are required for DeepSORT tracking, so the post-processing settings such as the NMS threshold of these models differ from those in pure detection tasks.
+
+
+## Quick Start
+
+Start training and evaluation with the following commands:
+```bash
+job_name=ppyoloe_crn_l_36e_640x640_mot17half
+config=configs/mot/deepsort/detector/${job_name}.yml
+log_dir=log_dir/${job_name}
+# 1. training
+python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
evaluation +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/${job_name}.pdparams +``` diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml new file mode 100644 index 0000000000000000000000000000000000000000..d895b0ae51ae0fda9266657bd183604af2e213cc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml @@ -0,0 +1,83 @@ +_BASE_: [ + '../../../datasets/mot.yml', + '../../../runtime.yml', + '../../jde/_base_/optimizer_30e.yml', + '../../jde/_base_/jde_reader_1088x608.yml', +] +weights: output/jde_yolov3_darknet53_30e_1088x608_mix/model_final + +metric: MOTDet +num_classes: 1 +EvalReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [608, 1088]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [608, 1088]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 + +EvalDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.half'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +TestDataset: + !ImageFolder + anno_path: None + + +# detector configuration +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams + +# JDE version for MOT dataset +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: JDEBBoxPostProcess + +DarkNet: + depth: 53 + return_idx: [2, 3, 4] + freeze_norm: True + +YOLOv3FPN: + freeze_norm: True + +YOLOv3Head: + anchors: [[128,384], [180,540], [256,640], [512,640], + [32,96], [45,135], [64,192], [90,271], + [8,24], [11,34], [16,48], [23,68]] + anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]] + loss: JDEDetectionLoss + +JDEDetectionLoss: + for_mot: False + +JDEBBoxPostProcess: + decode: + name: JDEBox + conf_thresh: 0.3 + downsample_ratio: 32 + nms: + name: MultiClassNMS + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.5 + nms_top_k: 2000 + normalized: true + return_idx: false diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml new file mode 100644 index 0000000000000000000000000000000000000000..a0501222c9f35d657826fb525e54bd7f4f663ae4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml @@ -0,0 +1,82 @@ +_BASE_: [ + '../../../ppyoloe/ppyoloe_crn_l_300e_coco.yml', + '../_base_/mot17.yml', +] +weights: output/ppyoloe_crn_l_36e_640x640_mot17half/model_final +log_iter: 20 +snapshot_epoch: 2 + + +# schedule configuration for fine-tuning +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0.001 + epochs: 1 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +TrainReader: + batch_size: 8 + + +# detector configuration +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +pretrain_weights: 
https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml new file mode 100644 index 0000000000000000000000000000000000000000..cc55e46e722d9c93f76cc96278a6faa0cf29d3ef --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml @@ -0,0 +1,75 @@ +_BASE_: [ + '../../../ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml', + '../_base_/mot17.yml', +] +weights: output/ppyolov2_r50vd_dcn_365e_640x640_mot17half/model_final +log_iter: 20 +snapshot_epoch: 2 + + +# detector configuration +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
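+
+# The blocks below override the _BASE_ PP-YOLOv2 COCO config for MOT17-half
+# fine-tuning: PPYOLOPAN is the PAN-style neck with DropBlock and SPP enabled,
+# the YOLOv3Head adds an IoU-aware branch, and post-processing uses MatrixNMS
+# as in the base PP-YOLOv2 model.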
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/detector/yolov3_darknet53_40e_608x608_mot17half.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/yolov3_darknet53_40e_608x608_mot17half.yml new file mode 100644 index 0000000000000000000000000000000000000000..9ab55f977b9c76e794ce1e6eb83172b459ba4d27 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/detector/yolov3_darknet53_40e_608x608_mot17half.yml @@ -0,0 +1,76 @@ +_BASE_: [ + '../../../yolov3/yolov3_darknet53_270e_coco.yml', + '../_base_/mot17.yml', +] +weights: output/yolov3_darknet53_40e_608x608_mot17half/model_final +log_iter: 20 +snapshot_epoch: 2 + +# schedule configuration for fine-tuning +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 32 + - 36 + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 100 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +TrainReader: + batch_size: 8 + mixup_epoch: 35 + +# detector configuration +architecture: YOLOv3 +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +DarkNet: + depth: 53 + return_idx: [2, 3, 4] + +# use default config +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/reid/README.md b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/README.md new file mode 100644 index 0000000000000000000000000000000000000000..57a81b4fa5e059f3bd6c0d1e001d3fa19818f8b6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/README.md @@ -0,0 +1,26 @@ +English | [简体中文](README_cn.md) + +# ReID of DeepSORT + +## Introduction +[DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) is composed of detector and ReID model in series. Several common ReID models are provided here for the configs of DeepSORT as a reference. 
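
The division of labor is: the detector proposes boxes, the ReID model turns each box crop into an embedding, and the tracker associates detections to existing tracks by appearance. As a rough illustration of that last step, here is a minimal NumPy/SciPy sketch of a cosine-metric assignment. It is not PaddleDetection's `DeepSORTTracker` implementation and the helper names are ours; the 512-dim features and the 0.2 gate mirror `PPLCNetEmbedding`'s `output_ch` and `DeepSORTTracker`'s `matching_threshold` in the configs of this directory.

```python
# Minimal sketch of the appearance-matching step in a DeepSORT-style tracker.
# Illustrative only -- not PaddleDetection's DeepSORTTracker code.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_distance(track_feats, det_feats):
    """Pairwise cosine distance between L2-normalized embedding rows."""
    a = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    b = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return 1.0 - a @ b.T  # shape: [num_tracks, num_detections]

def associate(track_feats, det_feats, matching_threshold=0.2):
    """Hungarian assignment on the cosine cost; gated pairs are rejected."""
    cost = cosine_distance(track_feats, det_feats)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= matching_threshold]

# Toy usage: 3 tracks vs 4 detections; the first two detections are slightly
# perturbed copies of tracks 0 and 1, so only they should pass the 0.2 gate.
rng = np.random.default_rng(0)
tracks = rng.normal(size=(3, 512))
dets = np.vstack([tracks[:2] + 0.05 * rng.normal(size=(2, 512)),
                  rng.normal(size=(2, 512))])
print(associate(tracks, dets))  # -> [(0, 0), (1, 1)]
```

A higher `matching_threshold` accepts looser appearance matches; the configs below keep it tight (0.2) because the detector already supplies high-confidence boxes.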
+
+## Model Zoo
+
+### Results on Market1501 pedestrian ReID dataset
+
+| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
+| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
+| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [config](./deepsort_pcb_pyramid_r101.yml) |
+| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [config](./deepsort_pplcnet.yml) |
+
+### Results on VERI-Wild vehicle ReID dataset
+
+| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
+| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
+| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [config](./deepsort_pplcnet_vehicle.yml) |
+
+**Notes:**
+ - The ReID models are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas); the specific training process and code will be published by PaddleClas.
+ - For pedestrian tracking, please use the **Market1501** pedestrian ReID model in combination with a pedestrian detector.
+ - For vehicle tracking, please use the **VERI-Wild** vehicle ReID model in combination with a vehicle detector.
diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/reid/README_cn.md b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..d5930b652c456486b491c6ecec5b3739ac028f8b
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/README_cn.md
@@ -0,0 +1,26 @@
+简体中文 | [English](README.md)
+
+# DeepSORT的ReID模型
+
+## 简介
+[DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) 由检测器和ReID模型串联组合而成,此处提供了几个常用ReID模型的配置作为DeepSORT使用的参考。
+
+## 模型库
+
+### 在Market1501行人重识别数据集上的结果
+
+| 骨架网络 | 网络类型 | Params | FPS | mAP | Top1 | Top5 | 下载链接 | 配置文件 |
+| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
+| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [配置文件](./deepsort_pcb_pyramid_r101.yml) |
+| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [配置文件](./deepsort_pplcnet.yml) |
+
+### 在VERI-Wild车辆重识别数据集上的结果
+
+| 骨架网络 | 网络类型 | Params | FPS | mAP | Top1 | Top5 | 下载链接 | 配置文件 |
+| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
+| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [配置文件](./deepsort_pplcnet_vehicle.yml) |
+
+**注意:**
+ - ReID模型由[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)提供,具体训练流程和代码待PaddleClas公布。
+ - 行人跟踪请用**Market1501**行人重识别数据集训练的ReID模型结合行人检测器去使用。 + - 车辆跟踪请用**VERI-Wild**车辆重识别数据集训练的ReID模型结合车辆检测器去使用。 diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml new file mode 100644 index 0000000000000000000000000000000000000000..cbca94755fa97d1cdc8de9a55a39e7063de0417c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml @@ -0,0 +1,45 @@ +# This config represents a ReID only configuration of DeepSORT, it has two uses. +# One is used for loading the detection results and ReID model to get tracking results; +# Another is used for exporting the ReID model to deploy infer. + +_BASE_: [ + '../../../datasets/mot.yml', + '../../../runtime.yml', + '../_base_/deepsort_reader_1088x608.yml', +] + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT16/images/train + keep_ori_im: True # set as True in DeepSORT + +det_weights: None +reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams + + +# A ReID only configuration of DeepSORT, detector should be None. +architecture: DeepSORT +pretrain_weights: None + +DeepSORT: + detector: None + reid: PCBPyramid + tracker: DeepSORTTracker + +PCBPyramid: + model_name: "ResNet101" + num_conv_out_channels: 128 + num_classes: 751 # default 751 classes in Market-1501 dataset. + +DeepSORTTracker: + input_size: [64, 192] + min_box_area: 0 # 0 means no need to filter out too small boxes + vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian + budget: 100 + max_age: 70 + n_init: 3 + metric_type: cosine + matching_threshold: 0.2 + max_iou_distance: 0.9 + motion: KalmanFilter diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pplcnet.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pplcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..d50da28b2cadf80d42184d37b4428f564c2033ac --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pplcnet.yml @@ -0,0 +1,44 @@ +# This config represents a ReID only configuration of DeepSORT, it has two uses. +# One is used for loading the detection results and ReID model to get tracking results; +# Another is used for exporting the ReID model to deploy infer. + +_BASE_: [ + '../../../datasets/mot.yml', + '../../../runtime.yml', + '../_base_/deepsort_reader_1088x608.yml', +] + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT16/images/train + keep_ori_im: True # set as True in DeepSORT + +det_weights: None +reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet.pdparams + + +# A ReID only configuration of DeepSORT, detector should be None. 
+architecture: DeepSORT +pretrain_weights: None + +DeepSORT: + detector: None + reid: PPLCNetEmbedding + tracker: DeepSORTTracker + +PPLCNetEmbedding: + input_ch: 1280 + output_ch: 512 + +DeepSORTTracker: + input_size: [64, 192] + min_box_area: 0 # filter out too small boxes + vertical_ratio: -1 # filter out bboxes, usually set 1.6 for pedestrian + budget: 100 + max_age: 70 + n_init: 3 + metric_type: cosine + matching_threshold: 0.2 + max_iou_distance: 0.9 + motion: KalmanFilter diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pplcnet_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pplcnet_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..6e07042d837eb3f6be29f6eef7cfb35275433fa3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_pplcnet_vehicle.yml @@ -0,0 +1,44 @@ +# This config represents a ReID only configuration of DeepSORT, it has two uses. +# One is used for loading the detection results and ReID model to get tracking results; +# Another is used for exporting the ReID model to deploy infer. + +_BASE_: [ + '../../../datasets/mot.yml', + '../../../runtime.yml', + '../_base_/deepsort_reader_1088x608.yml', +] + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: kitti_vehicle/images/train + keep_ori_im: True # set as True in DeepSORT + +det_weights: None +reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet_vehicle.pdparams + + +# A ReID only configuration of DeepSORT, detector should be None. +architecture: DeepSORT +pretrain_weights: None + +DeepSORT: + detector: None + reid: PPLCNetEmbedding + tracker: DeepSORTTracker + +PPLCNetEmbedding: + input_ch: 1280 + output_ch: 512 + +DeepSORTTracker: + input_size: [64, 192] + min_box_area: 0 # 0 means no need to filter out too small boxes + vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian + budget: 100 + max_age: 70 + n_init: 3 + metric_type: cosine + matching_threshold: 0.2 + max_iou_distance: 0.9 + motion: KalmanFilter diff --git a/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_resnet.yml b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_resnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..a9460586b6485b055d59efb7fe204f044edb2e21 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/deepsort/reid/deepsort_resnet.yml @@ -0,0 +1,43 @@ +# This config represents a ReID only configuration of DeepSORT, it has two uses. +# One is used for loading the detection results and ReID model to get tracking results; +# Another is used for exporting the ReID model to deploy infer. + +_BASE_: [ + '../../../datasets/mot.yml', + '../../../runtime.yml', + '../_base_/deepsort_reader_1088x608.yml', +] + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT16/images/train + keep_ori_im: True # set as True in DeepSORT + +det_weights: None +reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_resnet.pdparams + + +# A ReID only configuration of DeepSORT, detector should be None. 
+architecture: DeepSORT
+pretrain_weights: None
+
+DeepSORT:
+  detector: None
+  reid: ResNetEmbedding
+  tracker: DeepSORTTracker
+
+ResNetEmbedding:
+  model_name: "ResNet50"
+
+DeepSORTTracker:
+  input_size: [64, 192]
+  min_box_area: 0 # 0 means no need to filter out too small boxes
+  vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian
+  budget: 100 # max number of appearance features kept per track
+  max_age: 70 # drop a track after this many consecutive frames without a match
+  n_init: 3 # consecutive matches needed before a new track is confirmed
+  metric_type: cosine
+  matching_threshold: 0.2 # cosine-distance gate for appearance matching
+  max_iou_distance: 0.9
+  motion: KalmanFilter
diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/README.md b/PaddleDetection-release-2.6/configs/mot/fairmot/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..adb20bb28120e2b03c55020e5f0ba25d4a7bfa57
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/fairmot/README.md
@@ -0,0 +1,208 @@
+English | [简体中文](README_cn.md)
+
+# FairMOT (FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
+
+## Table of Contents
+- [Introduction](#Introduction)
+- [Model Zoo](#Model_Zoo)
+- [Getting Started](#Getting_Started)
+- [Citations](#Citations)
+
+## Introduction
+
+[FairMOT](https://arxiv.org/abs/2004.01888) is based on the anchor-free detector CenterNet, which overcomes the anchor and feature misalignment problems of anchor-based detection frameworks. The fusion of deep and shallow features enables the detection and ReID tasks to each obtain the features they need, and low-dimensional ReID features are used. FairMOT is a simple baseline composed of two homogeneous branches that predict the pixel-level target score and the ReID features, respectively. It achieves fairness between the two tasks and reaches a high level of real-time MOT performance.
+
+### PP-Tracking real-time MOT system
+In addition, PaddleDetection also provides the [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system.
+PP-Tracking is the first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework. It features rich models, wide applicability and efficient deployment.
+
+PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Aiming at the difficulties and pain points of real business scenarios, PP-Tracking provides various MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment supports both API calls and a GUI visual interface, the deployment languages are Python and C++, and the supported platforms include Linux and NVIDIA Jetson.
+
+### AI Studio public project tutorial
+PP-Tracking provides an AI Studio public project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
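
To make the "two homogeneous branches" idea concrete before the model zoo: both branches share the same stride-4 feature map, so every heatmap peak directly indexes its ReID embedding at the same pixel. The following is a minimal illustrative sketch (ours, not FairMOT's code); the 128-dim embedding and stride 4 mirror `ch_emb` in `FairMOTEmbeddingHead` and `down_ratio` in the configs below, and real FairMOT additionally suppresses non-maximum heatmap responses.

```python
# Illustrative sketch of FairMOT's per-pixel readout (not the actual code).
# The detection branch yields a score heatmap, the ReID branch an embedding
# map, both at 1/4 of the input resolution (down_ratio: 4 in the configs).
import numpy as np

def topk_peaks(heatmap, k):
    """Return (row, col, score) for the k highest heatmap responses."""
    flat = heatmap.ravel()
    idx = np.argsort(flat)[::-1][:k]
    rows, cols = np.unravel_index(idx, heatmap.shape)
    return list(zip(rows, cols, flat[idx]))

H, W = 152, 272                          # a 608x1088 input at stride 4
heatmap = np.random.rand(H, W)           # pixel-level target scores
embeddings = np.random.rand(128, H, W)   # low-dimensional ReID features (ch_emb: 128)

for r, c, score in topk_peaks(heatmap, k=3):
    reid_feat = embeddings[:, r, c]      # the same location serves both tasks
    print(f"peak at ({r}, {c}), score={score:.3f}, reid feature dim={reid_feat.size}")
```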
+
+
+## Model Zoo
+
+### FairMOT Results on MOT-16 Training Set
+
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
+| DLA-34 | 1088x608 | 83.2 | 83.1 | 499 | 3861 | 14223 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) |
+| DLA-34 | 864x480 | 80.8 | 81.1 | 561 | 3643 | 16967 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [config](./fairmot_dla34_30e_864x480.yml) |
+| DLA-34 | 576x320 | 74.0 | 76.1 | 640 | 4989 | 23034 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [config](./fairmot_dla34_30e_576x320.yml) |
+
+
+### FairMOT Results on MOT-16 Test Set
+
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
+| DLA-34 | 1088x608 | 75.0 | 74.7 | 919 | 7934 | 36747 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) |
+| DLA-34 | 864x480 | 73.0 | 72.6 | 977 | 7578 | 40601 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [config](./fairmot_dla34_30e_864x480.yml) |
+| DLA-34 | 576x320 | 69.9 | 70.2 | 1044 | 8869 | 44898 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [config](./fairmot_dla34_30e_576x320.yml) |
+
+**Notes:**
+ - FairMOT DLA-34 was trained on 2 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
+
+
+### FairMOT enhance model
+### Results on MOT-16 Test Set
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
+| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |
+
+### Results on MOT-17 Test Set
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
+| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |
+
+**Notes:**
+ - The FairMOT enhance models were trained on 8 GPUs, with the CrowdHuman dataset added to the training set.
+ - FairMOT enhance DLA-34 used a batch size of 16 per GPU and was trained for 60 epochs.
+ - FairMOT enhance HarDNet-85 used a batch size of 10 per GPU and was trained for 30 epochs.
+
+### FairMOT light model
+### Results on MOT-16 Test Set
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| HRNetV2-W18 | 1088x608 | 71.7 | 66.6 | 1340 | 8642 | 41592 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |
+
+### Results on MOT-17 Test Set
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| HRNetV2-W18 | 1088x608 | 70.7 | 65.7 | 4281 | 22485 | 138468 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |
+| HRNetV2-W18 | 864x480 | 70.3 | 65.8 | 4056 | 18927 | 144486 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml) |
+| HRNetV2-W18 | 576x320 | 65.3 | 64.8 | 4137 | 28860 | 163017 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) |
+
+**Notes:**
+ - FairMOT HRNetV2-W18 was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs. Only the ImageNet pretrained model is used, the optimizer is Momentum, and the CrowdHuman dataset was added to the training set.
+
+### FairMOT + BYTETracker
+
+### Results on MOT-17 Half Set
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) |
+| DLA-34 + BYTETracker| 1088x608 | 70.3 | 73.2 | 234 | 2176 | 13598 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | [config](./fairmot_dla34_30e_1088x608_bytetracker.yml) |
+
+**Notes:**
+ - The FairMOT model here is for an ablation study: the training set consists of the 5 MIX datasets (Caltech, CUHKSYSU, PRW, Cityscapes, ETHZ) plus the first half of MOT17 Train, the pretrained weights are the CenterNet COCO model, and evaluation is performed on the second half of MOT17 Train.
+ - To adapt BYTETracker to other FairMOT models in PaddleDetection, modify the tracker section of the corresponding config like this:
+ ```
+ JDETracker:
+   use_byte: True
+   match_thres: 0.8
+   conf_thres: 0.4
+   low_conf_thres: 0.2
+ ```
+
+### FairMOT transfer learning model
+
+### Results on GMOT-40 airplane subset
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
+| DLA-34 | 1088x608 | 96.6 | 94.7 | 19 | 300 | 466 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_airplane.pdparams) | [config](./fairmot_dla34_30e_1088x608_airplane.yml) |
+
+**Note:**
+ - The dataset for this model is a subset of the airplane category extracted from the GMOT-40 dataset. The download link provided by the PaddleDetection team is ```wget https://bj.bcebos.com/v1/paddledet/data/mot/airplane.zip```; unzip it into ```dataset/mot```, and then copy ```airplane.train``` to ```dataset/mot/image_lists```.
+ - The FairMOT model here uses the trained pedestrian FairMOT model as pretrained weights. The training set is the complete airplane set, 4 video sequences in total, which is also used for evaluation.
+ - When applying it to track other objects, you should modify ```min_box_area``` and ```vertical_ratio``` of the tracker in the corresponding config file, like this:
+ ```
+ JDETracker:
+   conf_thres: 0.4
+   tracked_thresh: 0.4
+   metric_type: cosine
+   min_box_area: 0 # 200 for pedestrian
+   vertical_ratio: 0 # 1.6 for pedestrian
+ ```
+
+
+## Getting Started
+
+### 1. Training
+
+Train FairMOT on 2 GPUs with the following command:
+
+```bash
+python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
+```
+
+
+### 2. Evaluation
+
+Evaluate the tracking performance of FairMOT on the val dataset on a single GPU with the following commands:
+
+```bash
+# use weights released in the PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams

+# use a checkpoint saved during training
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
+```
+**Notes:**
+ - The default evaluation dataset is the MOT-16 Train Set. If you want to change the evaluation dataset, please refer to the following snippet and modify `configs/datasets/mot.yml`:
+ ```
+ EvalMOTDataset:
+   !MOTImageFolder
+     dataset_dir: dataset/mot
+     data_root: MOT17/images/train
+     keep_ori_im: False # set True if save visualization images or video
+ ```
+ - Tracking results are saved in `{output_dir}/mot_results/`, one txt file per sequence. Each line of a txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`; `{output_dir}` can be set with `--output_dir`.

+### 3. Inference
+
+Run inference on a video on a single GPU with the following command:
+
+```bash
+# inference on a video, saving the result as a video
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
+```
+**Notes:**
+ - Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On the Linux (Ubuntu) platform you can install it directly with the following command: `apt-get update && apt-get install -y ffmpeg`.
+
+
+### 4. Export model
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+```
+
+### 5. Using the exported model for Python inference
+
+```bash
+python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
+```
+**Notes:**
+ - The tracking model predicts on videos and does not support prediction on a single image. The visualization video of the tracking results is saved by default.
You can add `--save_mot_txts` to save the txt result files, or `--save_images` to save the visualization images.
+ - Each line of the tracking results txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
+
+
+### 6. Using the exported MOT and keypoint models for joint Python inference
+
+```bash
+python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
+```
+**Notes:**
+ - For the keypoint model export tutorial, see `configs/keypoint/README.md`.
+
+
+## Citations
+```
+@article{zhang2020fair,
+  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
+  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
+  journal={arXiv preprint arXiv:2004.01888},
+  year={2020}
+}
+@article{shao2018crowdhuman,
+  title={CrowdHuman: A Benchmark for Detecting Human in a Crowd},
+  author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian},
+  journal={arXiv preprint arXiv:1805.00123},
+  year={2018}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/README_cn.md b/PaddleDetection-release-2.6/configs/mot/fairmot/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..dd5a27874e6c7439222ca9f8648099ca25bf9863
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/fairmot/README_cn.md
@@ -0,0 +1,202 @@
+简体中文 | [English](README.md)
+
+# FairMOT (FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
+
+## 内容
+- [简介](#简介)
+- [模型库](#模型库)
+- [快速开始](#快速开始)
+- [引用](#引用)
+
+## 简介
+
+[FairMOT](https://arxiv.org/abs/2004.01888)以Anchor Free的CenterNet检测器为基础,克服了Anchor-Based的检测框架中anchor和特征不对齐问题,深浅层特征融合使得检测和ReID任务各自获得所需要的特征,并且使用低维度ReID特征,提出了一种由两个同质分支组成的简单baseline来预测像素级目标得分和ReID特征,实现了两个任务之间的公平性,并获得了更高水平的实时多目标跟踪精度。
+
+### PP-Tracking 实时多目标跟踪系统
+此外,PaddleDetection还提供了[PP-Tracking](../../../deploy/pptracking/README.md)实时多目标跟踪系统。PP-Tracking是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。
+PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。
+
+### AI Studio公开项目案例
+PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。
+
+## 模型库
+
+### FairMOT在MOT-16 Training Set上结果
+
+| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
+| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
+| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
+| DLA-34 | 1088x608 | 83.2 | 83.1 | 499 | 3861 | 14223 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) |
+| DLA-34 | 864x480 | 80.8 | 81.1 | 561 | 3643 | 16967 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [配置文件](./fairmot_dla34_30e_864x480.yml) |
+| DLA-34 | 576x320 | 74.0 | 76.1 | 640 | 4989 | 23034 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [配置文件](./fairmot_dla34_30e_576x320.yml) |
+
+### FairMOT在MOT-16 Test Set上结果
+
+| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
+| 
DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - | +| DLA-34 | 1088x608 | 75.0 | 74.7 | 919 | 7934 | 36747 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) | +| DLA-34 | 864x480 | 73.0 | 72.6 | 977 | 7578 | 40601 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [配置文件](./fairmot_dla34_30e_864x480.yml) | +| DLA-34 | 576x320 | 69.9 | 70.2 | 1044 | 8869 | 44898 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [配置文件](./fairmot_dla34_30e_576x320.yml) | + +**注意:** + - FairMOT DLA-34均使用2个GPU进行训练,每个GPU上batch size为6,训练30个epoch。 + + +### FairMOT enhance模型 +### 在MOT-16 Test Set上结果 +| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [配置文件](./fairmot_enhance_dla34_60e_1088x608.yml) | +| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [配置文件](./fairmot_enhance_hardnet85_30e_1088x608.yml) | + +### 在MOT-17 Test Set上结果 +| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [配置文件](./fairmot_enhance_dla34_60e_1088x608.yml) | +| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [配置文件](./fairmot_enhance_hardnet85_30e_1088x608.yml) | + +**注意:** + - FairMOT enhance模型均使用8个GPU进行训练,训练集中加入了crowdhuman数据集一起参与训练。 + - FairMOT enhance DLA-34 每个GPU上batch size为16,训练60个epoch。 + - FairMOT enhance HarDNet-85 每个GPU上batch size为10,训练30个epoch。 + +### FairMOT轻量级模型 +### 在MOT-16 Test Set上结果 +| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| HRNetV2-W18 | 1088x608 | 71.7 | 66.6 | 1340 | 8642 | 41592 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) | + +### 在MOT-17 Test Set上结果 +| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| HRNetV2-W18 | 1088x608 | 70.7 | 65.7 | 4281 | 22485 | 138468 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) | +| HRNetV2-W18 | 864x480 | 70.3 | 65.8 | 4056 | 18927 | 144486 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml) | +| HRNetV2-W18 | 576x320 | 65.3 | 64.8 | 4137 | 28860 | 163017 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) | + +**注意:** + - FairMOT 
HRNetV2-W18均使用8个GPU进行训练,每个GPU上batch size为4,训练30个epoch,使用的ImageNet预训练,优化器策略采用的是Momentum,并且训练集中加入了crowdhuman数据集一起参与训练。 + +### FairMOT + BYTETracker + +### 在MOT-17 Half上结果 +| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) | +| DLA-34 + BYTETracker| 1088x608 | 70.3 | 73.2 | 234 | 2176 | 13598 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_bytetracker.yml) | + + +**注意:** + - FairMOT模型此处是ablation study的配置,使用的训练集是原先MIX的5个数据集(Caltech,CUHKSYSU,PRW,Cityscapes,ETHZ)加上MOT17 Train的前一半,且使用是预训练权重是CenterNet的COCO预训练权重,验证是在MOT17 Train的后一半上测的。 + - BYTETracker应用到PaddleDetection的其他FairMOT模型,只需要更改对应的config文件里的tracker部分为如下所示: + ``` + JDETracker: + use_byte: True + match_thres: 0.8 + conf_thres: 0.4 + low_conf_thres: 0.2 + ``` + +### FairMOT迁移学习模型 + +### 在GMOT-40的airplane子集上的结果 +| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| DLA-34 | 1088x608 | 96.6 | 94.7 | 19 | 300 | 466 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_airplane.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_airplane.yml) | + +**注意:** + - 此模型数据集是GMOT-40的airplane类别抽离出来的子集,PaddleDetection团队整理后的下载链接为: ```wget https://bj.bcebos.com/v1/paddledet/data/mot/airplane.zip```,下载解压存放于 ```dataset/mot```目录下,并将其中的```airplane.train```复制存放于```dataset/mot/image_lists```。 + - FairMOT模型此处训练是采用行人FairMOT训好的模型作为预训练权重,使用的训练集是airplane全集共4个视频序列,验证也是在全集上测的。 + - 应用到其他物体的跟踪,需要更改对应的config文件里的tracker部分的```min_box_area```和```vertical_ratio```,如下所示: + ``` +JDETracker: + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + min_box_area: 0 # 200 for pedestrian + vertical_ratio: 0 # 1.6 for pedestrian + ``` + +## 快速开始 + +### 1. 训练 + +使用2个GPU通过如下命令一键式启动训练 + +```bash +python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml +``` + +### 2. 评估 + +使用单张GPU通过如下命令一键式启动评估 + +```bash +# 使用PaddleDetection发布的权重 +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams + +# 使用训练保存的checkpoint +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams +``` +**注意:** + - 默认评估的是MOT-16 Train Set数据集, 如需换评估数据集可参照以下代码修改`configs/datasets/mot.yml`: + ``` + EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/train + keep_ori_im: False # set True if save visualization images or video + ``` + - 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。 + +### 3. 
预测 + +使用单个GPU通过如下命令预测一个视频,并保存为视频 + +```bash +# 预测一个视频 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos +``` + +**注意:** + - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 + +### 4. 导出预测模型 + +```bash +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams +``` + +### 5. 用导出的模型基于Python去预测 + +```bash +python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts +``` +**注意:** + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 + +### 6. 用导出的跟踪和关键点模型Python联合预测 + +```bash +python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU +``` +**注意:** + - 关键点模型导出教程请参考`configs/keypoint/README.md`。 + + +## 引用 +``` +@article{zhang2020fair, + title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking}, + author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu}, + journal={arXiv preprint arXiv:2004.01888}, + year={2020} +} +@article{shao2018crowdhuman, + title={CrowdHuman: A Benchmark for Detecting Human in a Crowd}, + author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian}, + journal={arXiv preprint arXiv:1805.00123}, + year={2018} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_dla34.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_dla34.yml new file mode 100644 index 0000000000000000000000000000000000000000..9388ab6692be242f5532c696393944b71b232821 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_dla34.yml @@ -0,0 +1,46 @@ +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/fairmot_dla34_crowdhuman_pretrained.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +CenterNet: + backbone: DLA + neck: CenterNetDLAFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +CenterNetDLAFPN: + down_ratio: 4 + last_level: 5 + out_channel: 0 + dcn_v2: True + with_sge: False + +CenterNetHead: + head_planes: 256 + prior_bias: -2.19 + regress_ltrb: True + size_loss: 'L1' + loss_weight: {'heatmap': 1.0, 'size': 0.1, 'offset': 1.0, 'iou': 0.0} + add_iou: False + +FairMOTEmbeddingHead: + ch_head: 256 + ch_emb: 128 + +CenterNetPostProcess: + max_per_img: 500 + down_ratio: 4 + regress_ltrb: True + +JDETracker: + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + min_box_area: 200 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_hardnet85.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_hardnet85.yml new file mode 100644 index 0000000000000000000000000000000000000000..0924d5fcfa656d2ea6d753930ff8a0ddc7324eaa --- 
/dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_hardnet85.yml @@ -0,0 +1,43 @@ +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/centernet_hardnet85_coco.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +CenterNet: + backbone: HarDNet + neck: CenterNetHarDNetFPN + head: CenterNetHead + post_process: CenterNetPostProcess + +HarDNet: + depth_wise: False + return_idx: [1,3,8,13] + arch: 85 + +CenterNetHarDNetFPN: + num_layers: 85 + down_ratio: 4 + last_level: 4 + out_channel: 0 + +CenterNetHead: + head_planes: 128 + +FairMOTEmbeddingHead: + ch_head: 512 + +CenterNetPostProcess: + max_per_img: 500 + regress_ltrb: True + +JDETracker: + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + min_box_area: 200 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_hrnetv2_w18_dlafpn.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_hrnetv2_w18_dlafpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..36f761c6f134d48fc54a19854af2a2f37899ad4b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_hrnetv2_w18_dlafpn.yml @@ -0,0 +1,38 @@ +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +CenterNet: + backbone: HRNet + head: CenterNetHead + post_process: CenterNetPostProcess + neck: CenterNetDLAFPN + +HRNet: + width: 18 + freeze_at: 0 + return_idx: [0, 1, 2, 3] + upsample: False + +CenterNetDLAFPN: + down_ratio: 4 + last_level: 3 + out_channel: 0 + first_level: 0 + dcn_v2: False + +CenterNetPostProcess: + max_per_img: 500 + +JDETracker: + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + min_box_area: 200 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_1088x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..6c6f8f51636bbe9b374ce76c6583e512758e1120 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_1088x608.yml @@ -0,0 +1,41 @@ +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [608, 1088]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 6 + shuffle: True + drop_last: True + use_shared_memory: True + +EvalMOTReader: + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [608, 1088]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 + + +TestMOTReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [608, 1088]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_576x320.yml 
b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_576x320.yml new file mode 100644 index 0000000000000000000000000000000000000000..a24cd3630141ce2d97785646b78ff3defb59c279 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_576x320.yml @@ -0,0 +1,41 @@ +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 320, 576] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [320, 576]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 6 + shuffle: True + drop_last: True + use_shared_memory: True + +EvalMOTReader: + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [320, 576]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - Permute: {} + batch_size: 1 + + +TestMOTReader: + inputs_def: + image_shape: [3, 320, 576] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [320, 576]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_864x480.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_864x480.yml new file mode 100644 index 0000000000000000000000000000000000000000..92020e9718e30544f18096b59d2d7aed9db33c67 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/fairmot_reader_864x480.yml @@ -0,0 +1,41 @@ +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 480, 864] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [480, 864]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 6 + shuffle: True + drop_last: True + use_shared_memory: True + +EvalMOTReader: + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [480, 864]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - Permute: {} + batch_size: 1 + + +TestMOTReader: + inputs_def: + image_shape: [3, 480, 864] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [480, 864]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/optimizer_30e.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/optimizer_30e.yml new file mode 100644 index 0000000000000000000000000000000000000000..6e7ec0dc45e9180cf0e632bd19d0de66d619ec7d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/optimizer_30e.yml @@ -0,0 +1,14 @@ +epoch: 30 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [20,] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/optimizer_30e_momentum.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/optimizer_30e_momentum.yml new file mode 100644 index 0000000000000000000000000000000000000000..987a9af72ef9e69c5354d53d3c2c74919fea5365 --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/mot/fairmot/_base_/optimizer_30e_momentum.yml @@ -0,0 +1,19 @@ +epoch: 30 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [15, 22] + use_warmup: True + - !ExpWarmup + steps: 1000 + power: 4 + +OptimizerBuilder: + optimizer: + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..3ef2b55c4e15ac8f3e4a484947c3d11fb8fb9d02 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e.yml', + '_base_/fairmot_dla34.yml', + '_base_/fairmot_reader_1088x608.yml', +] + +weights: output/fairmot_dla34_30e_1088x608/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608_airplane.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608_airplane.yml new file mode 100644 index 0000000000000000000000000000000000000000..441947c95cadc3ec2554d110ee04f95752b974b9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608_airplane.yml @@ -0,0 +1,33 @@ +_BASE_: [ + 'fairmot_dla34_30e_1088x608.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams +weights: output/fairmot_dla34_30e_1088x608_airplane/model_final + +JDETracker: + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + min_box_area: 0 + vertical_ratio: 0 + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['airplane.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: airplane/images/train + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608_bytetracker.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608_bytetracker.yml new file mode 100644 index 0000000000000000000000000000000000000000..a0ad44a0f9a6ef12d3904f1d78ede896f917a90b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_1088x608_bytetracker.yml @@ -0,0 +1,31 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e.yml', + '_base_/fairmot_dla34.yml', + '_base_/fairmot_reader_1088x608.yml', +] +weights: output/fairmot_dla34_30e_1088x608_bytetracker/model_final + +# for ablation study, MIX + MOT17-half +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.half', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: False # set True if save visualization images 
or video, or used in DeepSORT + +JDETracker: + use_byte: True + match_thres: 0.8 + conf_thres: 0.4 + low_conf_thres: 0.2 + min_box_area: 200 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_576x320.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_576x320.yml new file mode 100644 index 0000000000000000000000000000000000000000..e2f9ca5fde20c3601b5c4eb009e26a3359f6b5e5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_576x320.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e.yml', + '_base_/fairmot_dla34.yml', + '_base_/fairmot_reader_576x320.yml', +] + +weights: output/fairmot_dla34_30e_576x320/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_864x480.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_864x480.yml new file mode 100644 index 0000000000000000000000000000000000000000..8bc152d040a11b0e809b63a3ad0f4a3e2da492b6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_dla34_30e_864x480.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e.yml', + '_base_/fairmot_dla34.yml', + '_base_/fairmot_reader_864x480.yml', +] + +weights: output/fairmot_dla34_30e_864x480/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_enhance_dla34_60e_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_enhance_dla34_60e_1088x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..c404468e3ecdbf1d6faf7ec0fd47c83a356b18a4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_enhance_dla34_60e_1088x608.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e.yml', + '_base_/fairmot_dla34.yml', + '_base_/fairmot_reader_1088x608.yml', +] +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +# add crowdhuman +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [608, 1088]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 16 + shuffle: True + drop_last: True + use_shared_memory: True + +epoch: 60 +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [40,] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL + +weights: output/fairmot_enhance_dla34_60e_1088x608/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_enhance_hardnet85_30e_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_enhance_hardnet85_30e_1088x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..5cf598c836a3174d513f809c058618263673e069 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_enhance_hardnet85_30e_1088x608.yml @@ -0,0 +1,56 @@ 
+_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e.yml', + '_base_/fairmot_hardnet85.yml', + '_base_/fairmot_reader_1088x608.yml', +] +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +# add crowdhuman +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [608, 1088]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 10 + shuffle: True + drop_last: True + use_shared_memory: True + +epoch: 30 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [20,] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL + +weights: output/fairmot_enhance_hardnet85_30e_1088x608/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..bd0645fdfe149fb1e00472142b1fcc70224c8641 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml @@ -0,0 +1,43 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e_momentum.yml', + '_base_/fairmot_hrnetv2_w18_dlafpn.yml', + '_base_/fairmot_reader_1088x608.yml', +] + +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +# add crowdhuman +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [608, 1088]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 4 + shuffle: True + drop_last: True + use_shared_memory: True + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_1088x608/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml new file mode 100644 index 0000000000000000000000000000000000000000..bc35d346e048fa285514e8a4908af60a4f1937c0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml @@ -0,0 +1,43 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e_momentum.yml', + '_base_/fairmot_hrnetv2_w18_dlafpn.yml', + '_base_/fairmot_reader_576x320.yml', +] + +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +# add crowdhuman +TrainDataset: + 
!MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 320, 576] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [320, 576]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 4 + shuffle: True + drop_last: True + use_shared_memory: True + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml new file mode 100644 index 0000000000000000000000000000000000000000..061734a48bba1641d1a37b183980622506bf6cb1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml @@ -0,0 +1,43 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e_momentum.yml', + '_base_/fairmot_hrnetv2_w18_dlafpn.yml', + '_base_/fairmot_reader_864x480.yml', +] + +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +# add crowdhuman +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +worker_num: 4 +TrainReader: + inputs_def: + image_shape: [3, 480, 864] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [480, 864]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2FairMOTTarget: {} + batch_size: 4 + shuffle: True + drop_last: True + use_shared_memory: True + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_864x480/model_final diff --git a/PaddleDetection-release-2.6/configs/mot/headtracking21/README.md b/PaddleDetection-release-2.6/configs/mot/headtracking21/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/headtracking21/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/mot/headtracking21/README_cn.md b/PaddleDetection-release-2.6/configs/mot/headtracking21/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..b7f9274ee1e1e59b9330988edca4d247c39788ac --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/headtracking21/README_cn.md @@ -0,0 +1,95 @@ +[English](README.md) | 简体中文 +# 特色垂类跟踪模型 + +## 人头跟踪(Head Tracking) + +现有行人跟踪器对高人群密度场景表现不佳,人头跟踪更适用于密集场景的跟踪。 +[HT-21](https://motchallenge.net/data/Head_Tracking_21)是一个高人群密度拥挤场景的人头跟踪数据集,场景包括不同的光线和环境条件下的拥挤的室内和室外场景,所有序列的帧速率都是25fps。 +
+## 模型库
+### FairMOT 和 ByteTrack 在 HT-21 Training Set上的结果
+| 模型 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
+| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
+| FairMOT DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
+| ByteTrack-x | 1440x800 | 64.1 | 63.4 | 4191 | 185162 | 210240 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](../bytetrack/bytetrack_yolox_ht21.yml) |
+
+### FairMOT 和 ByteTrack 在 HT-21 Test Set上的结果
+| 模型 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
+| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
+| FairMOT DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
+| ByteTrack-x | 1440x800 | 72.6 | 61.8 | 5163 | 71235 | 154139 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](../bytetrack/bytetrack_yolox_ht21.yml) |
+
+**注意:**
+ - FairMOT DLA-34使用2个GPU进行训练,每个GPU上batch size为6,训练30个epoch。
+ - ByteTrack使用YOLOX-x做检测器,使用8个GPU进行训练,每个GPU上batch size为8,训练30个epoch,具体细节参照[bytetrack](../bytetrack/)。
+ - 此处提供PaddleDetection团队整理后的[下载链接](https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip),下载后需解压放到`dataset/mot/`目录下,HT-21 Test集的结果需要提交到[官网](https://motchallenge.net)评测。
+
+
+## 快速开始
+
+### 1. 训练
+使用2个GPU通过如下命令一键式启动训练
+```bash
+python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608_headtracking21/ --gpus 0,1 tools/train.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml
+```
+
+### 2. 评估
+使用单张GPU通过如下命令一键式启动评估
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=output/fairmot_dla34_30e_1088x608_headtracking21/model_final.pdparams
+```
+
+### 3. 预测
+使用单个GPU通过如下命令预测一个视频,并保存为视频
+```bash
+# 预测一个视频
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams --video_file={your video name}.mp4 --save_videos
+```
+**注意:**
+ - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
+
+### 4. 导出预测模型
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams
+```
+
+### 5. 用导出的模型基于Python去预测
+```bash
+python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608_headtracking21 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
+```
+**注意:**
+ - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。
+ - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
+
+## 引用
+```
+@article{zhang2020fair,
+  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
+  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
+  journal={arXiv preprint arXiv:2004.01888},
+  year={2020}
+}
+
+@InProceedings{Sundararaman_2021_CVPR,
+  author    = {Sundararaman, Ramana and De Almeida Braga, Cedric and Marchand, Eric and Pettre, Julien},
+  title     = {Tracking Pedestrian Heads in Dense Crowd},
+  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  month     = {June},
+  year      = {2021},
+  pages     = {3865-3875}
+}
+
+@article{zhang2021bytetrack,
+  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
+  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
+  journal={arXiv preprint arXiv:2110.06864},
+  year={2021}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml b/PaddleDetection-release-2.6/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml
new file mode 100644
index 0000000000000000000000000000000000000000..8bfbc7ca8b7b76f4b0dbab42999cd6e15f392aaa
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml
@@ -0,0 +1,26 @@
+_BASE_: [
+  '../fairmot/fairmot_dla34_30e_1088x608.yml'
+]
+
+weights: output/fairmot_dla34_30e_1088x608_headtracking21/model_final
+
+# for MOT training
+TrainDataset:
+  !MOTDataSet
+    dataset_dir: dataset/mot
+    image_lists: ['ht21.train']
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
+
+# for MOT evaluation
+# If you want to change the MOT evaluation dataset, please modify 'data_root'
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: HT21/images/test
+    keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
+
+# for MOT video inference
+TestMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    keep_ori_im: True # set True if save visualization images or video
diff --git a/PaddleDetection-release-2.6/configs/mot/jde/README.md b/PaddleDetection-release-2.6/configs/mot/jde/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..6eab26e4fb12d862ff953b977a9393bef70df04f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/jde/README.md
@@ -0,0 +1,119 @@
+English | [简体中文](README_cn.md)
+
+# JDE (Towards Real-Time Multi-Object Tracking)
+
+## Table of Contents
+- [Introduction](#Introduction)
+- [Model Zoo](#Model_Zoo)
+- [Getting Started](#Getting_Started)
+- [Citations](#Citations)
+
+## Introduction
+
+- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the appearance embedding task simultaneously in a shared neural network, and outputs the detection results together with their corresponding embeddings. The original JDE paper builds on the anchor-based detector YOLOv3, adding a new ReID branch to learn embeddings. The training process is constructed as a multi-task learning problem, taking both accuracy and speed into account.
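+
+To make the shared detection-plus-embedding idea concrete, the following is a minimal, framework-agnostic sketch of the association step: per-detection embeddings are matched to existing tracks by cosine distance with Hungarian assignment. This only illustrates the JDE-style appearance-matching idea and is not PaddleDetection's `JDETracker` implementation; the function name and the 0.4 distance threshold are assumptions made for the example.
+
+```python
+import numpy as np
+from scipy.optimize import linear_sum_assignment
+
+def associate_by_embedding(track_embs, det_embs, max_cos_dist=0.4):
+    """Match (N,D) track embeddings to (M,D) detection embeddings (both non-empty)."""
+    # L2-normalize rows so the dot product becomes cosine similarity
+    track_embs = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
+    det_embs = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
+    cost = 1.0 - track_embs @ det_embs.T            # cosine distance matrix
+    rows, cols = linear_sum_assignment(cost)        # Hungarian assignment
+    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cos_dist]
+    unmatched_tracks = set(range(len(track_embs))) - {r for r, _ in matches}
+    unmatched_dets = set(range(len(det_embs))) - {c for _, c in matches}
+    return matches, unmatched_tracks, unmatched_dets
+```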
+
+### PP-Tracking real-time MOT system
+In addition, PaddleDetection also provides the [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system.
+PP-Tracking is the first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework, offering rich models, wide applicability and efficient deployment.
+
+PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Aiming at the difficulties and pain points of real business scenarios, PP-Tracking provides various MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment supports both an API and a GUI, the deployment languages are Python and C++, and the supported platforms include Linux and NVIDIA Jetson.
+
+### AI Studio public project tutorial
+PP-Tracking provides an AI Studio public project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
+
+## Model Zoo
+
+### JDE Results on MOT-16 Training Set
+
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
+| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
+| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
+| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
+
+### JDE Results on MOT-16 Test Set
+
+| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
+| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
+| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
+| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
+| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
+| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
+| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
+
+**Notes:**
+ - JDE was trained for 30 epochs on 8 GPUs with a mini-batch size of 4 on each GPU.
+
+## Getting Started
+
+### 1. Training
+
+Training JDE on 8 GPUs with the following command
+
+```bash
+python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
+```
+
+### 2. Evaluation
+
+Evaluating the tracking performance of JDE on the val dataset on a single GPU with the following commands:
+
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
+
+# use saved checkpoint in training
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
+```
+**Notes:**
+ - The default evaluation dataset is the MOT-16 Train Set. If you want to change the evaluation dataset, please refer to the following code and modify `configs/datasets/mot.yml`:
+  ```
+  EvalMOTDataset:
+    !MOTImageFolder
+      dataset_dir: dataset/mot
+      data_root: MOT17/images/train
+      keep_ori_im: False # set True if save visualization images or video
+  ```
+ - Tracking results will be saved in `{output_dir}/mot_results/`, and every sequence has one txt file; each line of the txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`, and you can set `{output_dir}` by `--output_dir`.
+
+### 3. Inference
+
+Run inference on a video on a single GPU with the following command:
+
+```bash
+# inference on video and save a video
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
+```
+**Notes:**
+ - Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on the Linux (Ubuntu) platform you can install it directly with the following command: `apt-get update && apt-get install -y ffmpeg`.
+
+
+### 4. Export model
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
+```
+
+### 5. Using the exported model for Python inference
+
+```bash
+python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
+```
+**Notes:**
+ - The tracking model is used to predict videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.
+ - Each line of the tracking results txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
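+
+Since every line follows the MOT Challenge text convention above, the saved results are easy to post-process. The helper below is a small illustrative sketch rather than part of the PaddleDetection toolkit, and `load_mot_results` is a hypothetical name:
+
+```python
+from collections import defaultdict
+
+def load_mot_results(txt_path):
+    """Group MOT-format lines `frame,id,x1,y1,w,h,score,...` by frame index."""
+    per_frame = defaultdict(list)
+    with open(txt_path) as f:
+        for line in f:
+            vals = line.strip().split(',')
+            frame, track_id = int(vals[0]), int(vals[1])  # frame and id are integers
+            x1, y1, w, h, score = map(float, vals[2:7])
+            per_frame[frame].append((track_id, x1, y1, w, h, score))
+    return per_frame
+```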
+
+## Citations
+```
+@article{wang2019towards,
+  title={Towards Real-Time Multi-Object Tracking},
+  author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
+  journal={arXiv preprint arXiv:1909.12605},
+  year={2019}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/jde/README_cn.md b/PaddleDetection-release-2.6/configs/mot/jde/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..9fd0375f6e2cfebffe226a837ac99023130f0fc2
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/jde/README_cn.md
@@ -0,0 +1,118 @@
+简体中文 | [English](README.md)
+
+# JDE (Towards Real-Time Multi-Object Tracking)
+
+## 内容
+- [简介](#简介)
+- [模型库](#模型库)
+- [快速开始](#快速开始)
+- [引用](#引用)
+
+## 简介
+
+[JDE](https://arxiv.org/abs/1909.12605)(Joint Detection and Embedding)是在一个单一的共享神经网络中同时学习目标检测任务和embedding任务,并同时输出检测结果和对应的外观embedding匹配的算法。JDE原论文是基于Anchor Base的YOLOv3检测器新增加一个ReID分支学习embedding,训练过程被构建为一个多任务联合学习问题,兼顾精度和速度。
+
+### PP-Tracking 实时多目标跟踪系统
+此外,PaddleDetection还提供了[PP-Tracking](../../../deploy/pptracking/README.md)实时多目标跟踪系统。PP-Tracking是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。
+PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。
+
+### AI Studio公开项目案例
+PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。
+
+## 模型库
+
+### JDE在MOT-16 Training Set上结果
+
+| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
+| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
+| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
+| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
+| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
+
+
+### JDE在MOT-16 Test Set上结果
+
+| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
+| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
+| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
+| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
+| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
+| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
+| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
+
+**注意:**
+ - JDE使用8个GPU进行训练,每个GPU上batch size为4,训练了30个epoch。
+
+## 快速开始
+
+### 1. 训练
+
+使用8个GPU通过如下命令一键式启动训练
+
+```bash
+python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
+```
+
+### 2. 评估
+
+使用单张GPU通过如下命令一键式启动评估
+
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
+```
+**注意:**
+ - 默认评估的是MOT-16 Train Set数据集,如需换评估数据集可参照以下代码修改`configs/datasets/mot.yml`:
+  ```
+  EvalMOTDataset:
+    !MOTImageFolder
+      dataset_dir: dataset/mot
+      data_root: MOT17/images/train
+      keep_ori_im: False # set True if save visualization images or video
+  ```
+ - 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`,此外`{output_dir}`可通过`--output_dir`设置。
+
+### 3. 
预测 + +使用单个GPU通过如下命令预测一个视频,并保存为视频 + +```bash +# 预测一个视频 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos +``` + +**注意:** + - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 + +### 4. 导出预测模型 + +```bash +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams +``` + +### 5. 用导出的模型基于Python去预测 + +```bash +python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts +``` +**注意:** + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 + + +## 引用 +``` +@article{wang2019towards, + title={Towards Real-Time Multi-Object Tracking}, + author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin}, + journal={arXiv preprint arXiv:1909.12605}, + year={2019} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_darknet53.yml b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_darknet53.yml new file mode 100644 index 0000000000000000000000000000000000000000..f5370fc6affa10f33af04185c48d61d5a2f06d98 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_darknet53.yml @@ -0,0 +1,56 @@ +architecture: JDE +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams +find_unused_parameters: True + +JDE: + detector: YOLOv3 + reid: JDEEmbeddingHead + tracker: JDETracker + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: JDEBBoxPostProcess + for_mot: True + +DarkNet: + depth: 53 + return_idx: [2, 3, 4] + freeze_norm: True + +YOLOv3FPN: + freeze_norm: True + +YOLOv3Head: + anchors: [[128,384], [180,540], [256,640], [512,640], + [32,96], [45,135], [64,192], [90,271], + [8,24], [11,34], [16,48], [23,68]] + anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]] + loss: JDEDetectionLoss + +JDEBBoxPostProcess: + decode: + name: JDEBox + conf_thresh: 0.3 + downsample_ratio: 32 + nms: + name: MultiClassNMS + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.5 + nms_top_k: 2000 + normalized: true + +JDEEmbeddingHead: + anchor_levels: 3 + anchor_scales: 4 + embedding_dim: 512 + emb_loss: JDEEmbeddingLoss + jde_loss: JDELoss + +JDETracker: + det_thresh: 0.3 + track_buffer: 30 + min_box_area: 200 + vertical_ratio: 1.6 # for pedestrian diff --git a/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_1088x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..e34e2a366217758d8d4bedaefacff027e7ef857c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_1088x608.yml @@ -0,0 +1,48 @@ +worker_num: 8 +TrainReader: + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [608, 1088]} + - MOTRandomAffine: {} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - 
Gt2JDETargetThres: + anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]] + anchors: [[[128,384], [180,540], [256,640], [512,640]], + [[32,96], [45,135], [64,192], [90,271]], + [[8,24], [11,34], [16,48], [23,68]]] + downsample_ratios: [32, 16, 8] + ide_thresh: 0.5 + fg_thresh: 0.5 + bg_thresh: 0.4 + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + + +EvalMOTReader: + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [608, 1088]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 + + +TestMOTReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [608, 1088]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_576x320.yml b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_576x320.yml new file mode 100644 index 0000000000000000000000000000000000000000..d1205ada7e8b0d793b6966178b76fea5351f17a5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_576x320.yml @@ -0,0 +1,48 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [320, 576]} + - MOTRandomAffine: {} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2JDETargetThres: + anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]] + anchors: [[[85,255], [120,360], [170,420], [340,420]], + [[21,64], [30,90], [43,128], [60,180]], + [[6,16], [8,23], [11,32], [16,45]]] + downsample_ratios: [32, 16, 8] + ide_thresh: 0.5 + fg_thresh: 0.5 + bg_thresh: 0.4 + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + + +EvalMOTReader: + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [320, 576]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 + + +TestMOTReader: + inputs_def: + image_shape: [3, 320, 576] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [320, 576]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_864x480.yml b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_864x480.yml new file mode 100644 index 0000000000000000000000000000000000000000..439eced58e59fe893c6f5ea591c9cc7ec7daf240 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/_base_/jde_reader_864x480.yml @@ -0,0 +1,48 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [480, 864]} + - MOTRandomAffine: {} + - RandomFlip: {} + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - RGBReverse: {} + - Permute: {} + batch_transforms: + - Gt2JDETargetThres: + anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]] + anchors: [[[102,305], [143, 429], [203,508], [407,508]], + [[25,76], [36,107], [51,152], [71,215]], + [[6,19], [9,27], [13,38], [18,54]]] + downsample_ratios: [32, 16, 8] + ide_thresh: 0.5 + fg_thresh: 0.5 + bg_thresh: 0.4 + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + + 
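+# NOTE: unlike the TrainReader above, the eval/test readers below apply no
+# augmentation: frames are only letterbox-resized to 480x864, normalized and
+# permuted, and batch_size stays 1 (MOT evaluation consumes one frame at a time).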
+EvalMOTReader: + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [480, 864]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 + + +TestMOTReader: + inputs_def: + image_shape: [3, 480, 864] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: [480, 864]} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/mot/jde/_base_/optimizer_30e.yml b/PaddleDetection-release-2.6/configs/mot/jde/_base_/optimizer_30e.yml new file mode 100644 index 0000000000000000000000000000000000000000..f90439a5c52573ccfa73c208dc82289b7de9ed31 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/_base_/optimizer_30e.yml @@ -0,0 +1,20 @@ +epoch: 30 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [15, 22] + use_warmup: True + - !ExpWarmup + steps: 1000 + power: 4 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/mot/jde/_base_/optimizer_60e.yml b/PaddleDetection-release-2.6/configs/mot/jde/_base_/optimizer_60e.yml new file mode 100644 index 0000000000000000000000000000000000000000..64b81300ded7b5b43ddab2edaaf0c24da547fe89 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/_base_/optimizer_60e.yml @@ -0,0 +1,20 @@ +epoch: 60 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [30, 44] + use_warmup: True + - !ExpWarmup + steps: 1000 + power: 4 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_1088x608.yml b/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_1088x608.yml new file mode 100644 index 0000000000000000000000000000000000000000..9aa2eaa96e50e1d5cd4267e36793fad2967514ad --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_1088x608.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../../datasets/mot.yml', + '../../runtime.yml', + '_base_/optimizer_30e.yml', + '_base_/jde_darknet53.yml', + '_base_/jde_reader_1088x608.yml', +] +weights: output/jde_darknet53_30e_1088x608/model_final + +JDE: + detector: YOLOv3 + reid: JDEEmbeddingHead + tracker: JDETracker + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: JDEBBoxPostProcess + for_mot: True + +YOLOv3Head: + anchors: [[128,384], [180,540], [256,640], [512,640], + [32,96], [45,135], [64,192], [90,271], + [8,24], [11,34], [16,48], [23,68]] + anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]] + loss: JDEDetectionLoss + +JDETracker: + det_thresh: 0.3 + track_buffer: 30 + min_box_area: 200 + motion: KalmanFilter + +JDEBBoxPostProcess: + decode: + name: JDEBox + conf_thresh: 0.5 + downsample_ratio: 32 + nms: + name: MultiClassNMS + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.4 + nms_top_k: 2000 + normalized: true + return_index: true diff --git a/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_576x320.yml b/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_576x320.yml new file mode 100644 index 0000000000000000000000000000000000000000..7cee1aafc867d8352d7297dcc6d3bbee59e463f0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_576x320.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../../datasets/mot.yml', + 
'../../runtime.yml',
+  '_base_/optimizer_30e.yml',
+  '_base_/jde_darknet53.yml',
+  '_base_/jde_reader_576x320.yml',
+]
+weights: output/jde_darknet53_30e_576x320/model_final
+
+JDE:
+  detector: YOLOv3
+  reid: JDEEmbeddingHead
+  tracker: JDETracker
+
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: JDEBBoxPostProcess
+  for_mot: True
+
+YOLOv3Head:
+  anchors: [[85,255], [120,360], [170,420], [340,420],
+            [21,64], [30,90], [43,128], [60,180],
+            [6,16], [8,23], [11,32], [16,45]]
+  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+  loss: JDEDetectionLoss
+
+JDETracker:
+  det_thresh: 0.3
+  track_buffer: 30
+  min_box_area: 200
+  motion: KalmanFilter
+
+JDEBBoxPostProcess:
+  decode:
+    name: JDEBox
+    conf_thresh: 0.5
+    downsample_ratio: 32
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.4
+    nms_top_k: 2000
+    normalized: true
+  return_index: true
diff --git a/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_864x480.yml b/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_864x480.yml
new file mode 100644
index 0000000000000000000000000000000000000000..96ed2232c3b9d2a5ab2b702792fb61f8dcdffc9a
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/jde/jde_darknet53_30e_864x480.yml
@@ -0,0 +1,47 @@
+_BASE_: [
+  '../../datasets/mot.yml',
+  '../../runtime.yml',
+  '_base_/optimizer_30e.yml',
+  '_base_/jde_darknet53.yml',
+  '_base_/jde_reader_864x480.yml',
+]
+weights: output/jde_darknet53_30e_864x480/model_final
+
+JDE:
+  detector: YOLOv3
+  reid: JDEEmbeddingHead
+  tracker: JDETracker
+
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: JDEBBoxPostProcess
+  for_mot: True
+
+YOLOv3Head:
+  anchors: [[102,305], [143, 429], [203,508], [407,508],
+            [25,76], [36,107], [51,152], [71,215],
+            [6,19], [9,27], [13,38], [18,54]]
+  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+  loss: JDEDetectionLoss
+
+JDETracker:
+  det_thresh: 0.3
+  track_buffer: 30
+  min_box_area: 200
+  motion: KalmanFilter
+
+JDEBBoxPostProcess:
+  decode:
+    name: JDEBox
+    conf_thresh: 0.5
+    downsample_ratio: 32
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.4
+    nms_top_k: 2000
+    normalized: true
+  return_index: true
diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/README.md b/PaddleDetection-release-2.6/configs/mot/mcfairmot/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d77a252afc3543226a16b770e0796e4e073d14ce
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/README.md
@@ -0,0 +1,140 @@
+English | [简体中文](README_cn.md)
+
+# MCFairMOT (Multi-class FairMOT)
+
+## Table of Contents
+- [Introduction](#Introduction)
+- [Model Zoo](#Model_Zoo)
+- [Getting Started](#Getting_Started)
+- [Citations](#Citations)
+
+## Introduction
+
+MCFairMOT is the Multi-class extended version of [FairMOT](https://arxiv.org/abs/2004.01888).
+
+### PP-Tracking real-time MOT system
+In addition, PaddleDetection also provides the [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system.
+PP-Tracking is the first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework, offering rich models, wide applicability and efficient deployment.
+
+PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Aiming at the difficulties and pain points of real business scenarios, PP-Tracking provides various MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment supports both an API and a GUI, the deployment languages are Python and C++, and the supported platforms include Linux and NVIDIA Jetson.
+
+### AI Studio public project tutorial
+PP-Tracking provides an AI Studio public project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
+
+## Model Zoo
+### MCFairMOT Results on VisDrone2019 Val Set
+| backbone | input shape | MOTA | IDF1 | IDS | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
+| DLA-34 | 1088x608 | 24.3 | 41.6 | 2314 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone.yml) |
+| HRNetV2-W18 | 1088x608 | 20.4 | 39.9 | 2603 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) |
+| HRNetV2-W18 | 864x480 | 18.2 | 38.7 | 2416 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml) |
+| HRNetV2-W18 | 576x320 | 12.0 | 33.8 | 2178 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml) |
+
+**Notes:**
+ - MOTA is the average MOTA over the 10 categories in the VisDrone2019 MOT dataset, and its value is also equal to the average MOTA of all the evaluated video sequences. Here we provide the download [link](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot.zip) of the dataset.
+ - MCFairMOT was trained for 30 epochs on 4 GPUs. The batch size is 6 on each GPU for MCFairMOT DLA-34, and 8 for MCFairMOT HRNetV2-W18.
+
+### MCFairMOT Results on VisDrone Vehicle Val Set
+| backbone | input shape | MOTA | IDF1 | IDS | FPS | download | config |
+| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
+| DLA-34 | 1088x608 | 37.7 | 56.8 | 199 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml) |
+| HRNetV2-W18 | 1088x608 | 35.6 | 56.3 | 190 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.yml) |
+
+**Notes:**
+ - MOTA is the average MOTA over the 4 categories in the VisDrone Vehicle dataset; this dataset is extracted from the VisDrone2019 MOT dataset, and here we provide the download [link](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot_vehicle.zip).
+ - The tracker used in the MCFairMOT model here is ByteTracker.
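+
+The ByteTracker configs in this folder expose `conf_thres`, `low_conf_thres` and `match_thres` (see `mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml`). The sketch below is a simplified, self-contained illustration of the two-stage BYTE association these thresholds drive: high-score boxes are matched to tracks first, and low-score boxes are then used to rescue still-unmatched tracks. It is not the `JDETracker` code; boxes are assumed to be NumPy `[x1, y1, x2, y2]` arrays, and the default threshold values simply mirror the config.
+
+```python
+import numpy as np
+from scipy.optimize import linear_sum_assignment
+
+def iou_matrix(a, b):
+    """Pairwise IoU between (N,4) and (M,4) arrays of [x1,y1,x2,y2] boxes."""
+    ious = np.zeros((len(a), len(b)))
+    for i, t in enumerate(a):
+        for j, d in enumerate(b):
+            ix1, iy1 = max(t[0], d[0]), max(t[1], d[1])
+            ix2, iy2 = min(t[2], d[2]), min(t[3], d[3])
+            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
+            union = ((t[2] - t[0]) * (t[3] - t[1])
+                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
+            ious[i, j] = inter / union if union > 0 else 0.0
+    return ious
+
+def match(track_boxes, det_boxes, match_thres):
+    """Hungarian matching on IoU distance; keep pairs with cost < match_thres."""
+    if len(track_boxes) == 0 or len(det_boxes) == 0:
+        return [], list(range(len(track_boxes)))
+    cost = 1.0 - iou_matrix(track_boxes, det_boxes)
+    rows, cols = linear_sum_assignment(cost)
+    pairs = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < match_thres]
+    matched = {r for r, _ in pairs}
+    return pairs, [i for i in range(len(track_boxes)) if i not in matched]
+
+def byte_associate(track_boxes, det_boxes, det_scores,
+                   conf_thres=0.4, low_conf_thres=0.2, match_thres=0.8):
+    """Two-stage BYTE association; returned indices refer to the filtered subsets."""
+    high = det_scores >= conf_thres
+    low = (det_scores >= low_conf_thres) & ~high
+    # stage 1: high-confidence detections vs. all tracks
+    first, unmatched = match(track_boxes, det_boxes[high], match_thres)
+    # stage 2: low-confidence detections rescue the leftover tracks
+    second, still_unmatched = match(track_boxes[unmatched], det_boxes[low],
+                                    match_thres)
+    return first, second, still_unmatched
+```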
+
+### MCFairMOT offline quantization results on VisDrone Vehicle val-set
+| Model | Compression Strategy | Prediction Delay (T4) | Prediction Delay (V100) | Model Configuration File | Compression Algorithm Configuration File |
+| :--------------| :------- | :------: | :----: | :----: | :----: |
+| DLA-34 | baseline | 41.3 | 21.9 |[Configuration File](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)| - |
+| DLA-34 | offline quantization | 37.8 | 21.2 |[Configuration File](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[Configuration File](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/slim/post_quant/mcfairmot_ptq.yml)|
+
+
+## Getting Started
+
+### 1. Training
+Training MCFairMOT on 4 GPUs with the following command
+```bash
+python -m paddle.distributed.launch --log_dir=./mcfairmot_dla34_30e_1088x608_visdrone/ --gpus 0,1,2,3 tools/train.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml
+```
+
+### 2. Evaluation
+Evaluating the tracking performance of MCFairMOT on the val dataset on a single GPU with the following commands:
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams
+
+# use saved checkpoint in training
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=output/mcfairmot_dla34_30e_1088x608_visdrone/model_final.pdparams
+```
+**Notes:**
+ - The default evaluation dataset is the VisDrone2019 MOT val-set. If you want to change the evaluation dataset, please refer to the following code and modify `configs/datasets/mcmot.yml`:
+  ```
+  EvalMOTDataset:
+    !MOTImageFolder
+      dataset_dir: dataset/mot
+      data_root: your_dataset/images/val
+      keep_ori_im: False # set True if save visualization images or video
+  ```
+ - Tracking results will be saved in `{output_dir}/mot_results/`, and every sequence has one txt file; each line of the txt file is `frame,id,x1,y1,w,h,score,cls_id,-1,-1`, and you can set `{output_dir}` by `--output_dir`.
+
+### 3. Inference
+Run inference on a video on a single GPU with the following command:
+```bash
+# inference on video and save a video
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams --video_file={your video name}.mp4 --save_videos
+```
+**Notes:**
+ - Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on the Linux (Ubuntu) platform you can install it directly with the following command: `apt-get update && apt-get install -y ffmpeg`.
+
+
+### 4. Export model
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams
+```
+
+### 5. Using the exported model for Python inference
+```bash
+python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/mcfairmot_dla34_30e_1088x608_visdrone --video_file={your video name}.mp4 --device=GPU --save_mot_txts
+```
+**Notes:**
+ - The tracking model is used to predict videos and does not support single-image prediction. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.
+ - Each line of the tracking results txt file is `frame,id,x1,y1,w,h,score,cls_id,-1,-1`.
+
+### 6. Offline quantization
+
+The offline quantization model is calibrated on the VisDrone Vehicle val-set; run it as follows:
+```bash
+CUDA_VISIBLE_DEVICES=0 python3.7 tools/post_quant.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml --slim_config=configs/slim/post_quant/mcfairmot_ptq.yml
+```
+**Notes:**
+ - Offline quantization uses the VisDrone Vehicle val-set and a 4-class vehicle tracking model by default.
+
+## Citations
+```
+@article{zhang2020fair,
+  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
+  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
+  journal={arXiv preprint arXiv:2004.01888},
+  year={2020}
+}
+
+@ARTICLE{9573394,
+  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  title={Detection and Tracking Meet Drones Challenge},
+  year={2021},
+  volume={},
+  number={},
+  pages={1-1},
+  doi={10.1109/TPAMI.2021.3119563}
+}
+
+@article{zhang2021bytetrack,
+  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
+  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
+  journal={arXiv preprint arXiv:2110.06864},
+  year={2021}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/README_cn.md b/PaddleDetection-release-2.6/configs/mot/mcfairmot/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..e8cd6bbc6fc2f953303932f918f7d47864fd4182
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/README_cn.md
@@ -0,0 +1,137 @@
+简体中文 | [English](README.md)
+
+# MCFairMOT (Multi-class FairMOT)
+
+## 内容
+- [简介](#简介)
+- [模型库](#模型库)
+- [快速开始](#快速开始)
+- [引用](#引用)
+
+## 简介
+
+MCFairMOT是[FairMOT](https://arxiv.org/abs/2004.01888)的多类别扩展版本。
+
+### PP-Tracking 实时多目标跟踪系统
+此外,PaddleDetection还提供了[PP-Tracking](../../../deploy/pptracking/README.md)实时多目标跟踪系统。PP-Tracking是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。
+PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。
+
+### AI Studio公开项目案例
+PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。
+
+## 模型库
+
+### MCFairMOT 在VisDrone2019 MOT val-set上结果
+| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FPS | 下载链接 | 配置文件 |
+| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
+| DLA-34 | 1088x608 | 24.3 | 41.6 | 2314 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams) | [配置文件](./mcfairmot_dla34_30e_1088x608_visdrone.yml) |
+| HRNetV2-W18 | 1088x608 | 20.4 | 39.9 | 2603 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [配置文件](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) |
+| HRNetV2-W18 | 864x480 | 18.2 | 38.7 | 2416 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.pdparams) | [配置文件](./mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml) |
+| HRNetV2-W18 | 576x320 | 12.0 | 33.8 | 2178 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.pdparams) | [配置文件](./mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml) |
+
+**注意:**
+ - MOTA是VisDrone2019 MOT数据集10类目标的平均MOTA,其值也等于所有评估的视频序列的平均MOTA,此处提供数据集[下载链接](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot.zip)。
+ - MCFairMOT模型均使用4个GPU进行训练,训练30个epoch。DLA-34骨干网络的每个GPU上batch size为6,HRNetV2-W18骨干网络的每个GPU上batch size为8。
+
+### MCFairMOT 在VisDrone Vehicle val-set上结果
+| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FPS | 下载链接 | 配置文件 |
+| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
+| DLA-34 | 1088x608 | 37.7 | 56.8 | 199 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [配置文件](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml) |
+| HRNetV2-W18 | 1088x608 | 35.6 | 56.3 | 190 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [配置文件](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.yml) |
+
+**注意:**
+ - MOTA是VisDrone Vehicle数据集4类车辆目标的平均MOTA,该数据集是VisDrone数据集中抽出4类车辆类别组成的,此处提供数据集[下载链接](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot_vehicle.zip)。
+ - MCFairMOT模型此处使用的跟踪器是ByteTracker。
+
+### MCFairMOT 在VisDrone Vehicle val-set上离线量化结果
+| 骨干网络 | 压缩策略 | 预测时延(T4) | 预测时延(V100) | 配置文件 | 压缩算法配置文件 |
+| :--------------| :------- | :------: | :----: | :----: | :----: |
+| DLA-34 | baseline | 41.3 | 21.9 |[配置文件](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)| - |
+| DLA-34 | 离线量化 | 37.8 | 21.2 |[配置文件](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/slim/post_quant/mcfairmot_ptq.yml)|
+
+## 快速开始
+
+### 1. 训练
+使用4个GPU通过如下命令一键式启动训练
+```bash
+python -m paddle.distributed.launch --log_dir=./mcfairmot_dla34_30e_1088x608_visdrone/ --gpus 0,1,2,3 tools/train.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml
+```
+
+### 2. 评估
+使用单张GPU通过如下命令一键式启动评估
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=output/mcfairmot_dla34_30e_1088x608_visdrone/model_final.pdparams
+```
+**注意:**
+ - 默认评估的是VisDrone2019 MOT val-set数据集,如需换评估数据集可参照以下代码修改`configs/datasets/mcmot.yml`:
+  ```
+  EvalMOTDataset:
+    !MOTImageFolder
+      dataset_dir: dataset/mot
+      data_root: your_dataset/images/val
+      keep_ori_im: False # set True if save visualization images or video
+  ```
+ - 多类别跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,cls_id,-1,-1`,此外`{output_dir}`可通过`--output_dir`设置。
+
+### 3. 
预测 +使用单个GPU通过如下命令预测一个视频,并保存为视频 +```bash +# 预测一个视频 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams --video_file={your video name}.mp4 --save_videos +``` +**注意:** + - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 + +### 4. 导出预测模型 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams +``` + +### 5. 用导出的模型基于Python去预测 +```bash +python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/mcfairmot_dla34_30e_1088x608_visdrone --video_file={your video name}.mp4 --device=GPU --save_mot_txts +``` +**注意:** + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 多类别跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,cls_id,-1,-1`。 + +### 6. 离线量化 + +使用 VisDrone Vehicle val-set 对离线量化模型进行校准,运行方式: +```bash +CUDA_VISIBLE_DEVICES=0 python3.7 tools/post_quant.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml --slim_config=configs/slim/post_quant/mcfairmot_ptq.yml +``` +**注意:** + - 离线量化默认使用的是VisDrone Vehicle val-set数据集以及4类车辆跟踪模型。 + +## 引用 +``` +@article{zhang2020fair, + title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking}, + author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu}, + journal={arXiv preprint arXiv:2004.01888}, + year={2020} +} + +@ARTICLE{9573394, + author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, + year={2021}, + volume={}, + number={}, + pages={1-1}, + doi={10.1109/TPAMI.2021.3119563} +} + +@article{zhang2021bytetrack, + title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box}, + author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang}, + journal={arXiv preprint arXiv:2110.06864}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..287255fdaf032d2979083d460ef49335409e0b9f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml @@ -0,0 +1,42 @@ +_BASE_: [ + '../fairmot/fairmot_dla34_30e_1088x608.yml', + '../../datasets/mcmot.yml' +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/fairmot_dla34_crowdhuman_pretrained.pdparams + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker # multi-class tracker + +CenterNetHead: + regress_ltrb: False + +CenterNetPostProcess: + regress_ltrb: False + max_per_img: 200 + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + +weights: output/mcfairmot_dla34_30e_1088x608_visdrone/model_final + +epoch: 30 +LearningRate: 
+ base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [10, 20] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml new file mode 100644 index 0000000000000000000000000000000000000000..99452f5dc55115a4267c9bb4ad4608009a54a16e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml @@ -0,0 +1,68 @@ +_BASE_: [ + '../fairmot/fairmot_dla34_30e_1088x608.yml', + '../../datasets/mcmot.yml' +] +metric: MCMOT +num_classes: 4 + +# for MCMOT training +TrainDataset: + !MCMOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_mcmot_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + label_list: label_list.txt + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_mcmot_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt + +# for MCMOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt + + +pretrain_weights: https://paddledet.bj.bcebos.com/models/centernet_dla34_140e_coco.pdparams + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker # multi-class tracker + +CenterNetHead: + regress_ltrb: False + +CenterNetPostProcess: + regress_ltrb: False + max_per_img: 200 + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + use_byte: True + match_thres: 0.8 + conf_thres: 0.4 + low_conf_thres: 0.2 + +weights: output/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker/model_final + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [10, 20] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..cbbb9fa4af630a489fbbfc3f27468c5d801ffd74 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml', + '../../datasets/mcmot.yml' +] + +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker # multi-class tracker + +CenterNetHead: + regress_ltrb: False + +CenterNetPostProcess: + regress_ltrb: False + max_per_img: 200 + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + +weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone/model_final + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - 
!PiecewiseDecay + gamma: 0.1 + milestones: [10, 20] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL + +TrainReader: + batch_size: 8 diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.yml b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.yml new file mode 100644 index 0000000000000000000000000000000000000000..a1c1de91dc3860b38a1f641cc719cbcfab92d7f3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.yml @@ -0,0 +1,78 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml', + '../../datasets/mcmot.yml' +] +metric: MCMOT +num_classes: 4 + +# for MCMOT training +TrainDataset: + !MCMOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_mcmot_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + label_list: label_list.txt + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_mcmot_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt + +# for MCMOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt + + +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker # multi-class tracker + +CenterNetHead: + regress_ltrb: False + +CenterNetPostProcess: + regress_ltrb: False + max_per_img: 200 + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + use_byte: True + match_thres: 0.8 + conf_thres: 0.4 + low_conf_thres: 0.2 + +weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker/model_final + +epoch: 30 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [15, 22] + use_warmup: True + - !ExpWarmup + steps: 1000 + power: 4 + +OptimizerBuilder: + optimizer: + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +TrainReader: + batch_size: 8 diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.yml b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.yml new file mode 100644 index 0000000000000000000000000000000000000000..da1170ac53e8b15046dbf21150945e9dd9af0d1a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.yml @@ -0,0 +1,64 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml', + '../../datasets/mcmot.yml' +] + +metric: MCMOT +num_classes: 11 +weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot/model_final + +# for MCMOT training +TrainDataset: + !MCMOTDataSet + dataset_dir: dataset/mot + image_lists: ['bdd100k_mcmot.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + label_list: label_list.txt + +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: bdd100k_mcmot/images/val + keep_ori_im: False + +# model 
config +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker # multi-class tracker + +CenterNetHead: + regress_ltrb: False + +CenterNetPostProcess: + regress_ltrb: False + max_per_img: 200 + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [10, 20] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL + +TrainReader: + batch_size: 8 diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..e0fe18381b7490bfb8f263e30599541cd26861b1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml', + '../../datasets/mcmot.yml' +] + +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker # multi-class tracker + +CenterNetHead: + regress_ltrb: False + +CenterNetPostProcess: + regress_ltrb: False + max_per_img: 200 + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + +weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone/model_final + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [10, 20] + use_warmup: False + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: NULL + +TrainReader: + batch_size: 8 diff --git a/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..02d918ddeb0010e6488095bb19658439b7aeebc6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml', + '../../datasets/mcmot.yml' +] + +architecture: FairMOT +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams +for_mot: True + +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker # multi-class tracker + +CenterNetHead: + regress_ltrb: False + +CenterNetPostProcess: + regress_ltrb: False + max_per_img: 200 + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine + +weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone/model_final + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [10, 20] + use_warmup: False + +OptimizerBuilder: + 
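+  # Note (added comment): most FairMOT/MCFairMOT configs in this folder pair Adam with
+  # regularizer: NULL (no weight decay); with base_lr 0.0005, the PiecewiseDecay above
+  # scales the LR by gamma=0.1 at epochs 10 and 20 (5e-4 -> 5e-5 -> 5e-6).
+  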
optimizer:
+    type: Adam
+    regularizer: NULL
+
+TrainReader:
+  batch_size: 8
diff --git a/PaddleDetection-release-2.6/configs/mot/mtmct/README.md b/PaddleDetection-release-2.6/configs/mot/mtmct/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/mtmct/README.md
@@ -0,0 +1 @@
+README_cn.md
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/configs/mot/mtmct/README_cn.md b/PaddleDetection-release-2.6/configs/mot/mtmct/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..f4b5d12ee6fc1ca6afbb86243535fb5435539c52
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/mtmct/README_cn.md
@@ -0,0 +1,137 @@
+简体中文 | [English](README.md)
+
+# MTMCT (Multi-Target Multi-Camera Tracking)
+
+## 内容
+- [简介](#简介)
+- [模型库](#模型库)
+- [快速开始](#快速开始)
+- [引用](#引用)
+
+## 简介
+MTMCT (Multi-Target Multi-Camera Tracking) 跨镜头多目标跟踪是对某一场景下不同摄像头拍摄的视频进行多目标跟踪,是跟踪领域一个非常重要的研究课题,在安防监控、自动驾驶、智慧城市等行业起着重要作用。MTMCT预测的是同一场景下不同摄像头拍摄的视频,其效果受场景先验知识以及相机数量、角度、拓扑结构等信息的影响较大。PaddleDetection此处提供的是去除场景和相机相关优化方法后的一个基础版本的MTMCT算法实现,如果要继续提高效果,需要专门针对该场景和相机信息设计后处理算法。此处选用DeepSORT方案做MTMCT,为了达到实时性,选用了PaddleDetection自研的[PP-YOLOv2](../../ppyolo/)和轻量级网络[PP-PicoDet](../../picodet/)作为检测器,选用PaddleClas自研的轻量级网络PP-LCNet作为ReID模型。
+
+MTMCT是[PP-Tracking](../../../deploy/pptracking)项目中一个非常重要的方向,[PP-Tracking](../../../deploy/pptracking/README.md)是基于PaddlePaddle深度学习框架的业界首个开源实时跟踪系统。针对实际业务的难点痛点,PP-Tracking内置行人车辆跟踪、跨镜头跟踪、多类别跟踪、小目标跟踪及流量计数等能力与产业应用,同时提供可视化开发界面。模型集成目标检测、轻量级ReID、多目标跟踪等算法,进一步提升PP-Tracking在服务器端的部署性能。同时支持Python、C++部署,适配Linux、NVIDIA Jetson等多个平台环境。具体可前往该目录使用。
+
+### AI Studio公开项目案例
+PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。
+
+## 模型库
+### DeepSORT在 AIC21 MTMCT(CityFlow) 车辆跨镜跟踪数据集Test集上的结果
+
+| 检测器 | 输入尺度 | ReID | 场景 | Tricks | IDF1 | IDP | IDR | Precision | Recall | FPS | 检测器下载链接 | ReID下载链接 |
+| :--------- | :--------- | :------- | :----- | :------ |:----- |:------- |:----- |:--------- |:-------- |:----- |:------ | :------ |
+| PP-PicoDet | 640x640 | PP-LCNet | S06 | - | 0.3617 | 0.4417 | 0.3062 | 0.6266 | 0.4343 | - |[Detector](https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_640_aic21mtmct_vehicle.tar) |[ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar) |
+| PPYOLOv2 | 640x640 | PP-LCNet | S06 | - | 0.4450 | 0.4611 | 0.4300 | 0.6385 | 0.5954 | - |[Detector](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle.tar) |[ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar) |
+
+**注意:**
+ - S06是AIC21 MTMCT数据集Test集的场景名称,S06场景下有'c041,c042,c043,c044,c045,c046'共6个摄像头的视频。
+ - 由于在部署过程中只需要前向参数,此处提供的是已经导出的模型,解压后可看到包括`infer_cfg.yml`、`model.pdiparams`、`model.pdiparams.info`和`model.pdmodel`四个文件。
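+
+以下是一段检查导出模型目录完整性的示意代码(新增示例,非官方工具;文件清单以上一条注意事项为准,目录名为解压tar后得到的文件夹,仅为示例):
+
+```python
+# 示意代码:检查解压后的导出模型目录是否包含上述四个文件
+import os
+
+def check_exported_model(model_dir):
+    expected = ['infer_cfg.yml', 'model.pdiparams', 'model.pdiparams.info', 'model.pdmodel']
+    missing = [name for name in expected
+               if not os.path.exists(os.path.join(model_dir, name))]
+    if missing:
+        raise FileNotFoundError(f'{model_dir} 缺少文件: {missing}')
+    print(f'{model_dir}: 模型文件齐全')
+
+check_exported_model('picodet_l_640_aic21mtmct_vehicle')
+```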
+
+## 数据集准备
+车辆跨镜头跟踪选用的是[AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow)车辆跨镜跟踪数据集,此处提供PaddleDetection团队整理过后的数据集的下载链接:`wget https://paddledet.bj.bcebos.com/data/mot/aic21mtmct_vehicle.zip`,测试使用的是其中的S06文件夹目录,此外还提供AIC21 MTMCT数据集中S01场景抽出来的极小的一个demo测试数据集:`wget https://paddledet.bj.bcebos.com/data/mot/demo/mtmct-demo.tar`
+
+数据集的处理如下所示:
+```
+# AIC21 MTMCT原始数据集的目录如下所示:
+|——————AIC21_Track3_MTMC_Tracking
+  |——————cam_framenum (Number of frames below each camera)
+  |——————cam_loc (Positional relationship between cameras)
+  |——————cam_timestamp (Time difference between cameras)
+  |——————eval (evaluation function and ground_truth.txt)
+  |——————test (S06 dataset)
+  |——————train (S01,S03,S04 dataset)
+  |——————validation (S02,S05 dataset)
+  |——————DataLicenseAgreement_AICityChallenge_2021.pdf
+  |——————list_cam.txt (List of all camera paths)
+  |——————ReadMe.txt (Dataset description)
+|——————gen_aicity_mtmct_data.py (Camera videos extraction script)
+```
+需要处理成如下格式:
+```
+aic21mtmct_vehicle/
+├── S01
+    ├── gt
+    │   ├── gt.txt
+    ├── images
+        ├── c001
+        │   ├── img1
+        │   │   ├── 0000001.jpg
+        │   │   ...
+        │   ├── roi.jpg
+        ├── c002
+        ...
+        ├── c006
+├── S02
+...
+├── S05
+├── S06
+    ├── images
+        ├── c041
+            ├── img1
+                ├── 0000001.jpg
+                ...
+        ├── c042
+        ...
+        ├── c046
+    ├── zone (only for test-set S06 when use camera tricks for testing)
+        ├── c041.png
+        ...
+        ├── c046.png
+```
+
+#### 生成S01场景的验证集数据
+```bash
+python gen_aicity_mtmct_data.py ./AIC21_Track3_MTMC_Tracking/train/S01
+```
+
+**注意:**
+ - AIC21 MTMCT数据集共有6个场景共计46个摄像头的数据,其中S01、S03和S04为训练集,S02和S05为验证集,S06是测试集,S06场景下有'c041,c042,c043,c044,c045,c046'共6个摄像头的视频。
+
+## 快速开始
+
+### 1. 导出模型
+Step 1:下载导出的检测模型
+```bash
+wget https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_640_aic21mtmct_vehicle.tar
+tar -xvf picodet_l_640_aic21mtmct_vehicle.tar
+```
+Step 2:下载导出的ReID模型
+```bash
+wget https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar
+tar -xvf deepsort_pplcnet_vehicle.tar
+```
+**注意:**
+ - PP-PicoDet是轻量级检测模型,其训练请参考[configs/picodet](../../picodet/README.md),并注意修改种类数和数据集路径。
+ - PP-LCNet是轻量级ReID模型,其训练请参考[PaddleClas](https://github.com/PaddlePaddle/PaddleClas),是在VERI-Wild车辆重识别数据集训练得到的权重,建议直接使用无需重训。
+
+
+### 2. 用导出的模型基于Python去预测
+```bash
+# 下载demo测试视频
+wget https://paddledet.bj.bcebos.com/data/mot/demo/mtmct-demo.tar
+tar -xvf mtmct-demo.tar
+
+# 用导出的PicoDet车辆检测模型和PPLCNet车辆ReID模型去基于Python预测
+python deploy/pptracking/python/mot_sde_infer.py --model_dir=picodet_l_640_aic21mtmct_vehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --mtmct_dir=mtmct-demo --mtmct_cfg=mtmct_cfg --device=GPU --scaled=True --save_mot_txts --save_images
+```
+**注意:**
+ - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt),或`--save_images`表示保存跟踪结果可视化图片。
+ - `--scaled`表示模型输出结果的坐标是否已经缩放回原图,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。
+ - `--mtmct_dir`是MTMCT预测的某个场景的文件夹名字,里面包含该场景不同摄像头拍摄视频的图片文件夹,其数量至少为两个。
+ - `--mtmct_cfg`是MTMCT预测的某个场景的配置文件,里面包含一些trick操作的开关和该场景摄像头相关设置的文件路径,用户可以自行更改相关路径以及设置某些操作是否启用。
+ - MTMCT跨镜头跟踪输出结果为视频和txt形式。每个图片文件夹各生成一个可视化的跨镜头跟踪结果,与单镜头跟踪的结果是不同的,单镜头跟踪的结果在几个视频文件夹间是独立无关的。MTMCT的结果txt只有一个,比单镜头跟踪结果txt多了第一列镜头id号,跨镜头跟踪结果txt文件每行信息是`camera_id,frame,id,x1,y1,w,h,-1,-1`。
+ - MTMCT是[PP-Tracking](../../../deploy/pptracking)项目中的一个非常重要的方向,具体可前往该目录使用。
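+
+下面给出一段解析MTMCT结果txt的示意代码(新增示例,仅供参考;`mtmct_result.txt`为假设的文件名,字段顺序以上文说明为准):
+
+```python
+# 示意代码:解析MTMCT结果txt(每行: camera_id,frame,id,x1,y1,w,h,-1,-1),按全局id汇总出现过的镜头
+from collections import defaultdict
+
+def load_mtmct_result(txt_path):
+    tracks = defaultdict(list)  # global_id -> [(camera_id, frame, x1, y1, w, h), ...]
+    with open(txt_path) as f:
+        for line in f:
+            fields = line.strip().split(',')
+            if len(fields) < 7:
+                continue
+            cam_id, frame, gid = (int(float(v)) for v in fields[:3])
+            x1, y1, w, h = (float(v) for v in fields[3:7])
+            tracks[gid].append((cam_id, frame, x1, y1, w, h))
+    return tracks
+
+for gid, boxes in load_mtmct_result('mtmct_result.txt').items():
+    cams = sorted({box[0] for box in boxes})
+    print(f'id={gid}: 出现于镜头 {cams}, 共 {len(boxes)} 个框')
+```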
+
+## 引用
+```
+@InProceedings{Tang19CityFlow,
+author = {Zheng Tang and Milind Naphade and Ming-Yu Liu and Xiaodong Yang and Stan Birchfield and Shuo Wang and Ratnesh Kumar and David Anastasiu and Jenq-Neng Hwang},
+title = {CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification},
+booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+month = {June},
+year = {2019},
+pages = {8797-8806}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/mtmct/gen_aicity_mtmct_data.py b/PaddleDetection-release-2.6/configs/mot/mtmct/gen_aicity_mtmct_data.py
new file mode 100644
index 0000000000000000000000000000000000000000..00bc952f64bce5565cb537fd8c123c10f33e253a
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/mtmct/gen_aicity_mtmct_data.py
@@ -0,0 +1,62 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import cv2
+import glob
+
+
+def video2frames(sourceVdo, dstDir):
+    videoData = cv2.VideoCapture(sourceVdo)
+    count = 0
+    while (videoData.isOpened()):
+        count += 1
+        ret, frame = videoData.read()
+        if ret:
+            cv2.imwrite(f"{dstDir}/{count:07d}.jpg", frame)
+            if count % 20 == 0:
+                print(f"{dstDir}/{count:07d}.jpg")
+        else:
+            break
+    videoData.release()
+
+
+def transSeq(seqs_path, new_root):
+    sonCameras = glob.glob(seqs_path + "/*")
+    sonCameras.sort()
+    for vdoList in sonCameras:
+        Seq = vdoList.split('/')[-2]
+        Camera = vdoList.split('/')[-1]
+        os.system(f"mkdir -p {new_root}/{Seq}/images/{Camera}/img1")
+
+        roi_path = vdoList + '/roi.jpg'
+        new_roi_path = f"{new_root}/{Seq}/images/{Camera}"
+        os.system(f"cp {roi_path} {new_roi_path}")
+
+        video2frames(f"{vdoList}/vdo.avi",
+                     f"{new_root}/{Seq}/images/{Camera}/img1")
+
+
+if __name__ == "__main__":
+    seq_path = sys.argv[1]
+    new_root = 'aic21mtmct_vehicle'
+
+    seq_name = seq_path.split('/')[-1]
+    data_path = seq_path.split('/')[-3]
+    os.system(f"mkdir -p {new_root}/{seq_name}/gt")
+    os.system(f"cp {data_path}/eval/ground*.txt {new_root}/{seq_name}/gt")
+
+    # extract video frames
+    transSeq(seq_path, new_root)
diff --git a/PaddleDetection-release-2.6/configs/mot/ocsort/README.md b/PaddleDetection-release-2.6/configs/mot/ocsort/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/ocsort/README.md
@@ -0,0 +1 @@
+README_cn.md
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/configs/mot/ocsort/README_cn.md b/PaddleDetection-release-2.6/configs/mot/ocsort/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..b673382dad91b012af786ae1d91e7126bdb4a201
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/ocsort/README_cn.md
@@ -0,0 +1,101 @@
+简体中文 | [English](README.md)
+
+# OC_SORT (Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking)
+
+## 内容
+- [简介](#简介)
+- [模型库](#模型库)
+- [快速开始](#快速开始)
+- [引用](#引用)
+
+## 简介
+[OC_SORT](https://arxiv.org/abs/2203.14360)(Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking)是一种以观测为中心、对SORT进行改进的多目标跟踪算法。此处提供了几个常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。
+
+## 模型库
+
+### OC_SORT在MOT-17 half Val Set上结果
+
+| 检测训练数据集 | 检测器 | 输入尺度 | ReID | 检测mAP | MOTA | IDF1 | FPS | 配置文件 |
+| :-------- | :----- | :----: | :----:|:------: | :----: |:-----: |:----:|:----: |
+| MOT-17 half train | PP-YOLOE-l | 640x640 | - | 52.9 | 50.1 | 62.6 | - |[配置文件](./ocsort_ppyoloe.yml) |
+| **mix_mot_ch** | YOLOX-x | 800x1440| - | 61.9 | 75.5 | 77.0 | - |[配置文件](./ocsort_yolox.yml) |
+
+**注意:**
+ - 模型权重下载链接在配置文件中的```det_weights```和```reid_weights```,运行验证的命令即可自动下载,OC_SORT默认不需要```reid_weights```权重。
+ - **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载,并解压放在`dataset/mot/`文件夹下。
+ - **mix_mot_ch**数据集,是MOT17、CrowdHuman组成的联合数据集,**mix_det**是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。为了验证精度可以都用**MOT17-half val**数据集去评估。
+ - OC_SORT的训练是单独训练检测器(使用MOT数据集),推理时再组装跟踪器去评估MOT指标,单独的检测模型也可以评估检测指标。
+ - OC_SORT的导出部署,是单独导出检测模型,再组装跟踪器运行的,参照[PP-Tracking](../../../deploy/pptracking/python)。
+ - OC_SORT是PP-Human和PP-Vehicle等Pipeline分析项目跟踪方向的主要方案,具体使用参照[Pipeline](../../../deploy/pipeline)和[MOT](../../../deploy/pipeline/docs/tutorials/pphuman_mot.md)。
+
+
+## 快速开始
+
+### 1. 训练
+通过如下命令一键式启动训练和评估
+```bash
+python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
+```
+
+### 2. 评估
+#### 2.1 评估检测效果
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml
+```
+
+**注意:**
+ - 评估检测使用的是```tools/eval.py```,评估跟踪使用的是```tools/eval_mot.py```。
+
+#### 2.2 评估跟踪效果
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/ocsort/ocsort_ppyoloe.yml --scaled=True
+# 或者
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/ocsort/ocsort_yolox.yml --scaled=True
+```
+**注意:**
+ - `--scaled`表示模型输出结果的坐标是否已经缩放回原图,如果使用的检测模型是JDE YOLOv3则为False,如果使用通用检测模型则为True,默认值是False。
+ - 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`,此外`{output_dir}`可通过`--output_dir`设置。
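+
+下面是一段读取上述跟踪结果txt并按帧聚合的示意代码(新增示例,仅供参考;txt路径为假设值,字段顺序以上文说明为准):
+
+```python
+# 示意代码:按帧聚合单镜头跟踪结果(每行: frame,id,x1,y1,w,h,score,-1,-1,-1)
+from collections import defaultdict
+
+def load_mot_result(txt_path, score_thresh=0.0):
+    frames = defaultdict(list)  # frame_id -> [(track_id, x1, y1, w, h, score), ...]
+    with open(txt_path) as f:
+        for line in f:
+            fields = line.strip().split(',')
+            if len(fields) < 7:
+                continue
+            frame, tid = int(float(fields[0])), int(float(fields[1]))
+            x1, y1, w, h, score = (float(v) for v in fields[2:7])
+            if score >= score_thresh:
+                frames[frame].append((tid, x1, y1, w, h, score))
+    return frames
+
+frames = load_mot_result('output/mot_results/MOT17-02-SDP.txt')
+print('frames:', len(frames), 'boxes:', sum(len(v) for v in frames.values()))
+```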
+
+### 3. 预测
+
+使用单个GPU通过如下命令预测一个视频,并保存为视频
+
+```bash
+# 下载demo视频
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/ocsort/ocsort_yolox.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
+```
+
+**注意:**
+ - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html),Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
+ - `--scaled`表示模型输出结果的坐标是否已经缩放回原图,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。
+
+
+### 4. 导出预测模型
+
+Step 1:导出检测模型
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
+```
+
+### 5. 用导出的模型基于Python去预测
+
+```bash
+python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/yolox_x_24e_800x1440_mix_det/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts
+```
+**注意:**
+ - 运行前需要手动修改`tracker_config.yml`的跟踪器类型为`type: OCSORTTracker`。
+ - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。
+ - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
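+
+如果不想手动编辑,也可以用如下示意代码切换跟踪器类型(新增示例:假设已安装PyYAML,且该文件的顶层键为`type`,与上一条注意事项中的写法一致,请以实际文件为准):
+
+```python
+# 示意代码:将部署用 tracker_config.yml 的跟踪器切换为 OCSORTTracker
+import yaml  # 假设已安装 PyYAML
+
+cfg_path = 'deploy/pptracking/python/tracker_config.yml'
+with open(cfg_path) as f:
+    cfg = yaml.safe_load(f)
+cfg['type'] = 'OCSORTTracker'  # 顶层键 'type' 为假设,以实际文件结构为准
+with open(cfg_path, 'w') as f:
+    yaml.safe_dump(cfg, f, default_flow_style=False, sort_keys=False)
+print('tracker type ->', cfg['type'])
+```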
+
+
+## 引用
+```
+@article{cao2022observation,
+  title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
+  author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
+  journal={arXiv preprint arXiv:2203.14360},
+  year={2022}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/mot/ocsort/ocsort_ppyoloe.yml b/PaddleDetection-release-2.6/configs/mot/ocsort/ocsort_ppyoloe.yml
new file mode 100644
index 0000000000000000000000000000000000000000..ed2881a2138cb3986881accb3173f23f9ab815c0
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/ocsort/ocsort_ppyoloe.yml
@@ -0,0 +1,76 @@
+# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
+_BASE_: [
+  '../bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
+  '../bytetrack/_base_/mot17.yml',
+  '../bytetrack/_base_/ppyoloe_mot_reader_640x640.yml'
+]
+weights: output/ocsort_ppyoloe/model_final
+log_iter: 20
+snapshot_epoch: 2
+
+metric: MOT # eval/infer mode; set 'COCO' for training mode
+num_classes: 1
+
+architecture: ByteTrack
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
+ByteTrack:
+  detector: YOLOv3 # PPYOLOe version
+  reid: None
+  tracker: OCSORTTracker
+det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+reid_weights: None
+
+YOLOv3:
+  backbone: CSPResNet
+  neck: CustomCSPPAN
+  yolo_head: PPYOLOEHead
+  post_process: ~
+
+# Tracking requires higher quality boxes, so NMS score_threshold will be higher
+PPYOLOEHead:
+  fpn_strides: [32, 16, 8]
+  grid_cell_scale: 5.0
+  grid_cell_offset: 0.5
+  static_assigner_epoch: -1 # 100
+  use_varifocal_loss: True
+  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.1 # 0.01 in original detector
+    nms_threshold: 0.4 # 0.6 in original detector
+
+
+OCSORTTracker:
+  det_thresh: 0.4 # 0.6 in yolox ocsort
+  max_age: 30
+  min_hits: 3
+  iou_threshold: 0.3
+  delta_t: 3
+  inertia: 0.2
+  vertical_ratio: 0
+  min_box_area: 0
+  use_byte: False
+  use_angle_cost: False
+
+
+# MOTDataset for MOT evaluation and inference
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT17/images/half
+    keep_ori_im: True # set as True in DeepSORT and ByteTrack
+
+TestMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    keep_ori_im: True # set True if save visualization images or video
diff --git a/PaddleDetection-release-2.6/configs/mot/ocsort/ocsort_yolox.yml b/PaddleDetection-release-2.6/configs/mot/ocsort/ocsort_yolox.yml
new file mode 100644
index 0000000000000000000000000000000000000000..4f05e2d04ce1d83c98e54b35d21217915c5ee8f4
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/mot/ocsort/ocsort_yolox.yml
@@ -0,0 +1,83 @@
+# This config is 
an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + '../bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml', + '../bytetrack/_base_/mix_det.yml', + '../bytetrack/_base_/yolox_mot_reader_800x1440.yml' +] +weights: output/ocsort_yolox/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +ByteTrack: + detector: YOLOX + reid: None + tracker: OCSORTTracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_mot_ch.pdparams +reid_weights: None + +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.1 + nms_threshold: 0.7 + # For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%. + # For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. + + +OCSORTTracker: + det_thresh: 0.6 + max_age: 30 + min_hits: 3 + iou_threshold: 0.3 + delta_t: 3 + inertia: 0.2 + vertical_ratio: 1.6 + min_box_area: 100 + use_byte: False + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/README.md b/PaddleDetection-release-2.6/configs/mot/pedestrian/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/README_cn.md b/PaddleDetection-release-2.6/configs/mot/pedestrian/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..768733db537c5f752bbb56198bad196c68b28602 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/README_cn.md @@ -0,0 +1,135 @@ +[English](README.md) | 简体中文 +# 特色垂类跟踪模型 + +## 大规模行人跟踪 (Pedestrian Tracking) + +行人跟踪的主要应用之一是交通监控。 + +[PathTrack](https://www.trace.ethz.ch/publications/2017/pathtrack/index.html)包含720个视频序列,有着超过15000个行人的轨迹。包含了街景、舞蹈、体育运动、采访等各种场景的,大部分是移动摄像头拍摄场景。该数据集只有Pedestrian一类标注作为跟踪任务。 + +[VisDrone](http://aiskyeye.com)是无人机视角拍摄的数据集,是以俯视视角为主。该数据集涵盖不同位置(取自中国数千个相距数千公里的14个不同城市)、不同环境(城市和乡村)、不同物体(行人、车辆、自行车等)和不同密度(稀疏和拥挤的场景)。[VisDrone2019-MOT](https://github.com/VisDrone/VisDrone-Dataset)包含56个视频序列用于训练,7个视频序列用于验证。此处针对VisDrone2019-MOT多目标跟踪数据集进行提取,抽取出类别为pedestrian和people的数据组合成一个大的Pedestrian类别。 + + +## 模型库 + +### FairMOT在各个数据集val-set上Pedestrian类别的结果 + +| 数据集 | 骨干网络 | 输入尺寸 | MOTA | IDF1 | FPS | 下载链接 | 配置文件 | 
+| :-------------| :-------- | :------- | :----: | :----: | :----: | :-----: |:------: |
+| PathTrack | DLA-34 | 1088x608 | 44.9 | 59.3 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_pathtrack.yml) |
+| VisDrone | DLA-34 | 1088x608 | 49.2 | 63.1 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
+| VisDrone | HRNetv2-W18| 1088x608 | 40.5 | 54.7 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.yml) |
+| VisDrone | HRNetv2-W18| 864x480 | 38.6 | 50.9 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml) |
+| VisDrone | HRNetv2-W18| 576x320 | 30.6 | 47.2 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.yml) |
+
+**注意:**
+ - 表中的FairMOT模型均使用4个GPU进行训练,每个GPU上batch size为6,训练30个epoch;骨干网络为DLA-34或HRNetv2-W18,见表中所列。
+
+
+## 数据集准备和处理
+
+### 1、数据集处理代码说明
+代码统一都在tools目录下
+```
+# visdrone
+tools/visdrone/visdrone2mot.py: 生成visdrone_pedestrian数据集
+```
+
+### 2、visdrone_pedestrian数据集处理
+```
+# 复制tools/visdrone/visdrone2mot.py到数据集目录下
+# 生成visdrone_pedestrian MOT格式的数据,抽取类别classes=1,2 (pedestrian, people)
+<<--生成前目录-->>
+├── VisDrone2019-MOT-val
+│   ├── annotations
+│   ├── sequences
+│   ├── visdrone2mot.py
+<<--生成后目录-->>
+├── VisDrone2019-MOT-val
+│   ├── annotations
+│   ├── sequences
+│   ├── visdrone2mot.py
+│   ├── visdrone_pedestrian
+│   │   ├── images
+│   │   │   ├── train
+│   │   │   ├── val
+│   │   ├── labels_with_ids
+│   │   │   ├── train
+│   │   │   ├── val
+# 执行
+python visdrone2mot.py --transMot=True --data_name=visdrone_pedestrian --phase=val
+python visdrone2mot.py --transMot=True --data_name=visdrone_pedestrian --phase=train
+```
+
+## 快速开始
+
+### 1. 训练
+使用2个GPU通过如下命令一键式启动训练
+```bash
+python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608_visdrone_pedestrian/ --gpus 0,1 tools/train.py -c configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml
+```
+
+### 2. 评估
+使用单张GPU通过如下命令一键式启动评估
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml -o weights=output/fairmot_dla34_30e_1088x608_visdrone_pedestrian/model_final.pdparams
+```
+
+### 3. 预测
+使用单个GPU通过如下命令预测一个视频,并保存为视频
+```bash
+# 预测一个视频
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams --video_file={your video name}.mp4 --save_videos
+```
+**注意:**
+ - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html),Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
+
+### 4. 
导出预测模型 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams +``` + +### 5. 用导出的模型基于Python去预测 +```bash +python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608_visdrone_pedestrian --video_file={your video name}.mp4 --device=GPU --save_mot_txts +``` +**注意:** + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 + +## 引用 +``` +@article{zhang2020fair, + title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking}, + author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu}, + journal={arXiv preprint arXiv:2004.01888}, + year={2020} +} + +@INPROCEEDINGS{8237302, +author={S. {Manen} and M. {Gygli} and D. {Dai} and L. V. {Gool}}, +booktitle={2017 IEEE International Conference on Computer Vision (ICCV)}, +title={PathTrack: Fast Trajectory Annotation with Path Supervision}, +year={2017}, +volume={}, +number={}, +pages={290-299}, +doi={10.1109/ICCV.2017.40}, +ISSN={2380-7504}, +month={Oct},} + +@ARTICLE{9573394, + author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, + year={2021}, + volume={}, + number={}, + pages={1-1}, + doi={10.1109/TPAMI.2021.3119563} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml new file mode 100644 index 0000000000000000000000000000000000000000..bc16b074cabefa7f85b7c04705ed0794aedefbc4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_pathtrack.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../fairmot/fairmot_dla34_30e_1088x608.yml' +] + +weights: output/fairmot_dla34_30e_1088x608_pathtrack/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['pathtrack.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: pathtrack/images/test + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..741c1f45374b923e056019be3f53514df4d93e01 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../fairmot/fairmot_dla34_30e_1088x608.yml' +] + +weights: output/fairmot_dla34_30e_1088x608_visdrone_pedestrian/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: 
dataset/mot + image_lists: ['visdrone_pedestrian.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_pedestrian/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.yml b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..aca526dc3d08153dbb529b11e8c62a9b184fb2af --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml' +] + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_pedestrian.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_pedestrian/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.yml b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..1daab8fdc9afb8c50d1d051fe9bda8de925d5d88 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml' +] + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_pedestrian.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_pedestrian/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..8fd0563511577928a8136a9e841cb230aaa3b69b --- 
/dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml' +] + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_pedestrian.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_pedestrian/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/PaddleDetection-release-2.6/configs/mot/pedestrian/tools/visdrone/visdrone2mot.py b/PaddleDetection-release-2.6/configs/mot/pedestrian/tools/visdrone/visdrone2mot.py new file mode 100644 index 0000000000000000000000000000000000000000..0be2f1eb8fcb080738ccb45d01d6c20671381706 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/pedestrian/tools/visdrone/visdrone2mot.py @@ -0,0 +1,299 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
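+
+# NOTE (added summary): this script converts VisDrone2019-MOT annotations into the
+# MOT/JDE training layout used by FairMOT (images/ + labels_with_ids/ + a '.train'
+# image-list file). With the default CLI arguments it extracts categories 1,2
+# (pedestrian, people) and, since isExtractMultiClass is False, merges them into one class.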
+ +import glob +import os +import os.path as osp +import cv2 +import argparse +import numpy as np +import random + +# The object category indicates the type of annotated object, +# (i.e., ignored regions(0), pedestrian(1), people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10),others(11)) + +# Extract single class or multi class +isExtractMultiClass = False +# These sequences are excluded because there are too few pedestrians +exclude_seq = [ + "uav0000117_02622_v", "uav0000182_00000_v", "uav0000268_05773_v", + "uav0000305_00000_v" +] + + +def mkdir_if_missing(d): + if not osp.exists(d): + os.makedirs(d) + + +def genGtFile(seqPath, outPath, classes=[]): + id_idx = 0 + old_idx = -1 + with open(seqPath, 'r') as singleSeqFile: + motLine = [] + allLines = singleSeqFile.readlines() + for line in allLines: + line = line.replace('\n', '') + line = line.split(',') + # exclude occlusion!='2' + if line[-1] != '2' and line[7] in classes: + if old_idx != int(line[1]): + id_idx += 1 + old_idx = int(line[1]) + newLine = line[0:6] + newLine[1] = str(id_idx) + newLine.append('1') + if (len(classes) > 1 and isExtractMultiClass): + class_index = str(classes.index(line[7]) + 1) + newLine.append(class_index) + else: + newLine.append('1') # use permanent class '1' + newLine.append('1') + motLine.append(newLine) + mkdir_if_missing(outPath) + gtFilePath = osp.join(outPath, 'gt.txt') + with open(gtFilePath, 'w') as gtFile: + motLine = list(map(lambda x: str.join(',', x), motLine)) + motLineStr = str.join('\n', motLine) + gtFile.write(motLineStr) + + +def genSeqInfo(img1Path, seqName): + imgPaths = glob.glob(img1Path + '/*.jpg') + seqLength = len(imgPaths) + if seqLength > 0: + image1 = cv2.imread(imgPaths[0]) + imgHeight = image1.shape[0] + imgWidth = image1.shape[1] + else: + imgHeight = 0 + imgWidth = 0 + seqInfoStr = f'''[Sequence]\nname={seqName}\nimDir=img1\nframeRate=30\nseqLength={seqLength}\nimWidth={imgWidth}\nimHeight={imgHeight}\nimExt=.jpg''' + seqInfoPath = img1Path.replace('/img1', '') + with open(seqInfoPath + '/seqinfo.ini', 'w') as seqFile: + seqFile.write(seqInfoStr) + + +def copyImg(img1Path, gtTxtPath, outputFileName): + with open(gtTxtPath, 'r') as gtFile: + allLines = gtFile.readlines() + imgList = [] + for line in allLines: + imgIdx = int(line.split(',')[0]) + if imgIdx not in imgList: + imgList.append(imgIdx) + seqName = gtTxtPath.replace('./{}/'.format(outputFileName), + '').replace('/gt/gt.txt', '') + sourceImgPath = osp.join('./sequences', seqName, + '{:07d}.jpg'.format(imgIdx)) + os.system(f'cp {sourceImgPath} {img1Path}') + + +def genMotLabels(datasetPath, outputFileName, classes=['2']): + mkdir_if_missing(osp.join(datasetPath, outputFileName)) + annotationsPath = osp.join(datasetPath, 'annotations') + annotationsList = glob.glob(osp.join(annotationsPath, '*.txt')) + for annotationPath in annotationsList: + seqName = annotationPath.split('/')[-1].replace('.txt', '') + if seqName in exclude_seq: + continue + mkdir_if_missing(osp.join(datasetPath, outputFileName, seqName, 'gt')) + mkdir_if_missing(osp.join(datasetPath, outputFileName, seqName, 'img1')) + genGtFile(annotationPath, + osp.join(datasetPath, outputFileName, seqName, 'gt'), classes) + img1Path = osp.join(datasetPath, outputFileName, seqName, 'img1') + gtTxtPath = osp.join(datasetPath, outputFileName, seqName, 'gt/gt.txt') + copyImg(img1Path, gtTxtPath, outputFileName) + genSeqInfo(img1Path, seqName) + + +def deleteFileWhichImg1IsEmpty(mot16Path, dataType='train'): + path = 
mot16Path
+    data_images_train = osp.join(path, 'images', f'{dataType}')
+    data_images_train_seqs = glob.glob(data_images_train + '/*')
+    if (len(data_images_train_seqs) == 0):
+        print('dataset is empty!')
+    for data_images_train_seq in data_images_train_seqs:
+        data_images_train_seq_img1 = osp.join(data_images_train_seq, 'img1')
+        if len(glob.glob(data_images_train_seq_img1 + '/*.jpg')) == 0:
+            print(f"os.system(rm -rf {data_images_train_seq})")
+            os.system(f'rm -rf {data_images_train_seq}')
+
+
+def formatMot16Path(dataPath, pathType='train'):
+    train_path = osp.join(dataPath, 'images', pathType)
+    mkdir_if_missing(train_path)
+    os.system(f'mv {dataPath}/* {train_path}')
+
+
+def VisualGt(dataPath, phase='train'):
+    seqList = sorted(glob.glob(osp.join(dataPath, 'images', phase) + '/*'))
+    seqIndex = random.randint(0, len(seqList) - 1)
+    seqPath = seqList[seqIndex]
+    gt_path = osp.join(seqPath, 'gt', 'gt.txt')
+    img_list_path = sorted(glob.glob(osp.join(seqPath, 'img1', '*.jpg')))
+    imgIndex = random.randint(0, len(img_list_path) - 1)
+    img_Path = img_list_path[imgIndex]
+    frame_value = int(img_Path.split('/')[-1].replace('.jpg', ''))
+    gt_value = np.loadtxt(gt_path, dtype=int, delimiter=',')
+    gt_value = gt_value[gt_value[:, 0] == frame_value]
+    get_list = gt_value.tolist()
+    img = cv2.imread(img_Path)
+    colors = [[255, 0, 0], [255, 255, 0], [255, 0, 255], [0, 255, 0],
+              [0, 255, 255], [0, 0, 255]]
+    for seq, _id, pl, pt, w, h, _, bbox_class, _ in get_list:
+        cv2.putText(img,
+                    str(bbox_class), (pl, pt), cv2.FONT_HERSHEY_PLAIN, 2,
+                    colors[bbox_class - 1])
+        cv2.rectangle(
+            img, (pl, pt), (pl + w, pt + h),
+            colors[bbox_class - 1],
+            thickness=2)
+    cv2.imwrite('testGt.jpg', img)
+
+
+def VisualDataset(datasetPath, phase='train', seqName='', frameId=1):
+    trainPath = osp.join(datasetPath, 'labels_with_ids', phase)
+    seq1Paths = osp.join(trainPath, seqName)
+    seq_img1_path = osp.join(seq1Paths, 'img1')
+    label_with_idPath = osp.join(seq_img1_path, '%07d' % frameId) + '.txt'
+    image_path = label_with_idPath.replace('labels_with_ids', 'images').replace(
+        '.txt', '.jpg')
+    seqInfoPath = str.join('/', image_path.split('/')[:-2])
+    seqInfoPath = seqInfoPath + '/seqinfo.ini'
+    seq_info = open(seqInfoPath).read()
+    width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find(
+        '\nimHeight')])
+    height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find(
+        '\nimExt')])
+
+    with open(label_with_idPath, 'r') as label:
+        allLines = label.readlines()
+        images = cv2.imread(image_path)
+        for line in allLines:
+            line = line.split(' ')
+            line = list(map(lambda x: float(x), line))
+            c1, c2, w, h = line[2:6]
+            x1 = c1 - w / 2
+            x2 = c2 - h / 2
+            x3 = c1 + w / 2
+            x4 = c2 + h / 2
+            cv2.rectangle(
+                images, (int(x1 * width), int(x2 * height)),
+                (int(x3 * width), int(x4 * height)), (255, 0, 0),
+                thickness=2)
+        cv2.imwrite('test.jpg', images)
+
+
+def gen_image_list(dataPath, datType):
+    inputPath = f'{dataPath}/images/{datType}'
+    pathList = glob.glob(inputPath + '/*')
+    pathList = sorted(pathList)
+    allImageList = []
+    for pathSingle in pathList:
+        imgList = sorted(glob.glob(osp.join(pathSingle, 'img1', '*.jpg')))
+        for imgPath in imgList:
+            allImageList.append(imgPath)
+    with open(f'{dataPath}.{datType}', 'w') as image_list_file:
+        allImageListStr = str.join('\n', allImageList)
+        image_list_file.write(allImageListStr)
+
+
+def gen_labels_mot(MOT_data, phase='train'):
+    seq_root = './{}/images/{}'.format(MOT_data, phase)
+    label_root = './{}/labels_with_ids/{}'.format(MOT_data, phase)
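+    # NOTE (added comment): create the output label root, then rewrite each sequence's
+    # gt.txt into per-frame JDE label files with center-normalized boxes
+    # (cx/W, cy/H, w/W, h/H) and globally re-assigned track ids.
+    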
mkdir_if_missing(label_root) + seqs = [s for s in os.listdir(seq_root)] + print('seqs => ', seqs) + tid_curr = 0 + tid_last = -1 + for seq in seqs: + seq_info = open(osp.join(seq_root, seq, 'seqinfo.ini')).read() + seq_width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find( + '\nimHeight')]) + seq_height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find( + '\nimExt')]) + + gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt') + gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',') + + seq_label_root = osp.join(label_root, seq, 'img1') + mkdir_if_missing(seq_label_root) + + for fid, tid, x, y, w, h, mark, label, _ in gt: + # if mark == 0 or not label == 1: + # continue + fid = int(fid) + tid = int(tid) + if not tid == tid_last: + tid_curr += 1 + tid_last = tid + x += w / 2 + y += h / 2 + label_fpath = osp.join(seq_label_root, '{:07d}.txt'.format(fid)) + label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format( + tid_curr, x / seq_width, y / seq_height, w / seq_width, + h / seq_height) + with open(label_fpath, 'a') as f: + f.write(label_str) + + +def parse_arguments(): + parser = argparse.ArgumentParser(description='input method') + parser.add_argument("--transMot", type=bool, default=False) + parser.add_argument("--genMot", type=bool, default=False) + parser.add_argument("--formatMotPath", type=bool, default=False) + parser.add_argument("--deleteEmpty", type=bool, default=False) + parser.add_argument("--genLabelsMot", type=bool, default=False) + parser.add_argument("--genImageList", type=bool, default=False) + parser.add_argument("--visualImg", type=bool, default=False) + parser.add_argument("--visualGt", type=bool, default=False) + parser.add_argument("--data_name", type=str, default='visdrone_pedestrian') + parser.add_argument("--phase", type=str, default='train') + parser.add_argument( + "--classes", type=str, default='1,2') # pedestrian and people + return parser.parse_args() + + +if __name__ == "__main__": + args = parse_arguments() + classes = args.classes.split(',') + datasetPath = './' + dataName = args.data_name + phase = args.phase + if args.transMot: + genMotLabels(datasetPath, dataName, classes) + formatMot16Path(dataName, pathType=phase) + mot16Path = f'./{dataName}' + deleteFileWhichImg1IsEmpty(mot16Path, dataType=phase) + gen_labels_mot(dataName, phase=phase) + gen_image_list(dataName, phase) + if args.genMot: + genMotLabels(datasetPath, dataName, classes) + if args.formatMotPath: + formatMot16Path(dataName, pathType=phase) + if args.deleteEmpty: + mot16Path = f'./{dataName}' + deleteFileWhichImg1IsEmpty(mot16Path, dataType=phase) + if args.genLabelsMot: + gen_labels_mot(dataName, phase=phase) + if args.genImageList: + gen_image_list(dataName, phase) + if args.visualGt: + VisualGt(f'./{dataName}', phase) + if args.visualImg: + seqName = 'uav0000137_00458_v' + frameId = 43 + VisualDataset( + f'./{dataName}', phase=phase, seqName=seqName, frameId=frameId) diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/README.md b/PaddleDetection-release-2.6/configs/mot/vehicle/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/README_cn.md b/PaddleDetection-release-2.6/configs/mot/vehicle/README_cn.md new file mode 100644 index 
0000000000000000000000000000000000000000..606c583621431a2036c69f3f626684192422b4bd --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/README_cn.md @@ -0,0 +1,171 @@ +[English](README.md) | 简体中文 +# 特色垂类跟踪模型 + +## 车辆跟踪 (Vehicle Tracking) + +车辆跟踪的主要应用之一是交通监控。在监控场景中,大多是从公共区域的监控摄像头视角拍摄车辆,获取图像后再进行车辆检测和跟踪。 + + +[BDD100K](https://www.bdd100k.com)是伯克利大学AI实验室(BAIR)提出的一个驾驶视频数据集,是以驾驶员视角为主。该数据集不仅分多类别标注,还分晴天、多云等六种天气,住宅区、公路等六种场景,白天、夜晚等三个时间段,以及是否遮挡、是否截断。BDD100K MOT数据集包含1400个视频序列用于训练,200个视频序列用于验证。每个视频序列大约40秒长,每秒5帧,因此每个视频大约有200帧。此处针对BDD100K MOT数据集进行提取,抽取出类别为car, truck, bus, trailer, other vehicle的数据组合成一个Vehicle类别。 + +[KITTI](http://www.cvlibs.net/datasets/kitti)是一个包含市区、乡村和高速公路等场景采集的数据集,每张图像中最多达15辆车和30个行人,还有各种程度的遮挡与截断。[KITTI-Tracking](http://www.cvlibs.net/datasets/kitti/eval_tracking.php)(2D bounding-boxes)数据集一共有50个视频序列,21个为训练集,29个为测试集,目标是估计类别Car和Pedestrian的目标轨迹,此处抽取出类别为Car的数据作为一个Vehicle类别。 + +[VisDrone](http://aiskyeye.com)是无人机视角拍摄的数据集,是以俯视视角为主。该数据集涵盖不同位置(取自中国数千个相距数千公里的14个不同城市)、不同环境(城市和乡村)、不同物体(行人、车辆、自行车等)和不同密度(稀疏和拥挤的场景)。[VisDrone2019-MOT](https://github.com/VisDrone/VisDrone-Dataset)包含56个视频序列用于训练,7个视频序列用于验证。此处针对VisDrone2019-MOT多目标跟踪数据集进行提取,抽取出类别为car、van、truck、bus的数据组合成一个Vehicle类别。 + + +## 模型库 + +### FairMOT在各个数据集val-set上Vehicle类别的结果 + +| 数据集 | 骨干网络 | 输入尺寸 | MOTA | IDF1 | FPS | 下载链接 | 配置文件 | +| :-------------| :-------- | :------- | :----: | :----: | :----: | :-----: |:------: | +| BDD100K | DLA-34 | 1088x608 | 43.5 | 50.0 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml) | +| BDD100K | HRNetv2-W18| 576x320 | 32.6 | 38.7 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.yml) | +| KITTI | DLA-34 | 1088x608 | 82.7 | - | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_kitti_vehicle.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_kitti_vehicle.yml) | +| VisDrone | DLA-34 | 1088x608 | 52.1 | 63.3 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_vehicle.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_visdrone_vehicle.yml) | +| VisDrone | HRNetv2-W18| 1088x608 | 46.0 | 56.8 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle.yml) | +| VisDrone | HRNetv2-W18| 864x480 | 43.7 | 56.1 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle.yml) | +| VisDrone | HRNetv2-W18| 576x320 | 39.8 | 52.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.yml) | + +**注意:** + - FairMOT均使用DLA-34为骨干网络,4个GPU进行训练,每个GPU上batch size为6,训练30个epoch。 + + +## 数据集准备和处理 + +### 1、数据集处理代码说明 +代码统一都在tools目录下 +``` +# bdd100kmot +tools/bdd100kmot/gen_bdd100kmot_vehicle.sh:通过执行bdd100k2mot.py和gen_labels_MOT.py生成bdd100kmot_vehicle 数据集 +tools/bdd100kmot/bdd100k2mot.py:将bdd100k全集转换成mot格式 +tools/bdd100kmot/gen_labels_MOT.py:生成单类别的labels_with_ids文件 +# visdrone +tools/visdrone/visdrone2mot.py:生成visdrone_vehicle +``` + +### 2、bdd100kmot_vehicle数据集处理 +``` +# 复制tools/bdd100kmot里的代码到数据集目录下 +# 
生成bdd100kmot_vehicle MOT格式的数据,抽取类别classes=2,3,4,9,10 (car, truck, bus, trailer, other vehicle) +<<--生成前目录-->> +├── bdd100k +│ ├── images +│ ├── labels +<<--生成后目录-->> +├── bdd100k +│ ├── images +│ ├── labels +│ ├── bdd100kmot_vehicle +│ │ ├── images +│ │ │ ├── train +│ │ │ ├── val +│ │ ├── labels_with_ids +│ │ │ ├── train +│ │ │ ├── val +# 执行 +sh gen_bdd100kmot_vehicle.sh +``` + +### 3、visdrone_vehicle数据集处理 +``` +# 复制tools/visdrone/visdrone2mot.py到数据集目录下 +# 生成visdrone_vehicle MOT格式的数据,抽取类别classes=4,5,6,9 (car, van, truck, bus) +<<--生成前目录-->> +├── VisDrone2019-MOT-val +│ ├── annotations +│ ├── sequences +│ ├── visdrone2mot.py +<<--生成后目录-->> +├── VisDrone2019-MOT-val +│ ├── annotations +│ ├── sequences +│ ├── visdrone2mot.py +│ ├── visdrone_vehicle +│ │ ├── images +│ │ │ ├── train +│ │ │ ├── val +│ │ ├── labels_with_ids +│ │ │ ├── train +│ │ │ ├── val +# 执行 +python visdrone2mot.py --transMot=True --data_name=visdrone_vehicle --phase=val +python visdrone2mot.py --transMot=True --data_name=visdrone_vehicle --phase=train +``` + +## 快速开始 + +### 1. 训练 +使用2个GPU通过如下命令一键式启动训练 +```bash +python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608_bdd100kmot_vehicle/ --gpus 0,1 tools/train.py -c configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml +``` + +### 2. 评估 +使用单张GPU通过如下命令一键式启动评估 +```bash +# 使用PaddleDetection发布的权重 +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.pdparams + +# 使用训练保存的checkpoint +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml -o weights=output/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle/model_final.pdparams +``` + +### 3. 预测 +使用单个GPU通过如下命令预测一个视频,并保存为视频 +```bash +# 预测一个视频 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.pdparams --video_file={your video name}.mp4 --save_videos +``` +**注意:** + - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 + +### 4. 导出预测模型 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.pdparams +``` + +### 5. 
用导出的模型基于Python去预测 +```bash +python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle --video_file={your video name}.mp4 --device=GPU --save_mot_txts +``` +**注意:** + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 + +## 引用 +``` +@article{zhang2020fair, + title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking}, + author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu}, + journal={arXiv preprint arXiv:2004.01888}, + year={2020} +} + +@InProceedings{bdd100k, + author = {Yu, Fisher and Chen, Haofeng and Wang, Xin and Xian, Wenqi and Chen, + Yingying and Liu, Fangchen and Madhavan, Vashisht and Darrell, Trevor}, + title = {BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning}, + booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + month = {June}, + year = {2020} +} + +@INPROCEEDINGS{Geiger2012CVPR, + author = {Andreas Geiger and Philip Lenz and Raquel Urtasun}, + title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite}, + booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)}, + year = {2012} +} + +@ARTICLE{9573394, + author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, + year={2021}, + volume={}, + number={}, + pages={1-1}, + doi={10.1109/TPAMI.2021.3119563} +} +``` diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..1e2d3b332777d16ac5355f4fde9710f42375d5ff --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../fairmot/fairmot_dla34_30e_1088x608.yml' +] + +weights: output/fairmot_dla34_30e_1088x608_bdd100kmot_vehicle/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['bdd100kmot_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: bdd100kmot_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + +# model config +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_kitti_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_kitti_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..09dc886a2043369db509b86344913c66d55465a0 --- 
/dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_kitti_vehicle.yml @@ -0,0 +1,41 @@ +_BASE_: [ + '../fairmot/fairmot_dla34_30e_1088x608.yml' +] + +metric: KITTI +weights: output/fairmot_dla34_30e_1088x608_kitti_vehicle/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['kitti_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: kitti_vehicle/images/train + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + +# model config +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_visdrone_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_visdrone_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..6e7e84cb7439950cd3ecfddc3eab29a98354279a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_dla34_30e_1088x608_visdrone_vehicle.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../fairmot/fairmot_dla34_30e_1088x608.yml' +] + +weights: output/fairmot_dla34_30e_1088x608_visdrone_vehicle/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + +# model config +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..63e79b54213f883156e92c3cc823148e31dc222a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml' +] + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to 
change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + +# model config +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..599536ff6aca317358f3877069c19bd17e954a30 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml' +] + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['bdd100kmot_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: bdd100kmot_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + +# model config +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..7a155f110f24d449bacf725785d935753f36eba3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml' +] + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + +# 
model config +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle.yml b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..8dbbce5578c4585384997886994cd11772573d5a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml' +] + +weights: output/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_vehicle/model_final + +# for MOT training +TrainDataset: + !MOTDataSet + dataset_dir: dataset/mot + image_lists: ['visdrone_vehicle.train'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] + +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: visdrone_vehicle/images/val + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + +# for MOT video inference +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video + +# model config +FairMOT: + detector: CenterNet + reid: FairMOTEmbeddingHead + loss: FairMOTLoss + tracker: JDETracker + +JDETracker: + min_box_area: 0 + vertical_ratio: 0 # no need to filter bboxes according to w/h + conf_thres: 0.4 + tracked_thresh: 0.4 + metric_type: cosine diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/bdd100k2mot.py b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/bdd100k2mot.py new file mode 100644 index 0000000000000000000000000000000000000000..82ead01a39111045daa02112df5629f1c82fad14 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/bdd100k2mot.py @@ -0,0 +1,386 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import glob
+import os
+import os.path as osp
+import cv2
+import random
+import numpy as np
+import argparse
+from tqdm import tqdm  # import the callable class; the bare `tqdm` module is not callable
+import json
+
+
+def mkdir_if_missing(d):
+    if not osp.exists(d):
+        os.makedirs(d)
+
+
+def bdd2mot_tracking(img_dir, label_dir, save_img_dir, save_label_dir):
+    # convert BDD100K box_track_20 json labels into one txt file per image
+    label_jsons = os.listdir(label_dir)
+    for label_json in tqdm(label_jsons):
+        with open(os.path.join(label_dir, label_json)) as f:
+            labels_json = json.load(f)
+            for label_json in labels_json:
+                img_name = label_json['name']
+                video_name = label_json['videoName']
+                labels = label_json['labels']
+                txt_string = ""
+                for label in labels:
+                    category = label['category']
+                    x1 = label['box2d']['x1']
+                    x2 = label['box2d']['x2']
+                    y1 = label['box2d']['y1']
+                    y2 = label['box2d']['y2']
+                    width = x2 - x1
+                    height = y2 - y1
+                    x_center = (x1 + x2) / 2. / args.width
+                    y_center = (y1 + y2) / 2. / args.height
+                    width /= args.width
+                    height /= args.height
+                    identity = int(label['id'])
+                    # [class] [identity] [x_center] [y_center] [width] [height]
+                    txt_string += "{} {} {} {} {} {}\n".format(
+                        attr_id_dict[category], identity, x_center, y_center,
+                        width, height)
+
+                fn_label = os.path.join(save_label_dir, img_name[:-4] + '.txt')
+                source_img = os.path.join(img_dir, video_name, img_name)
+                target_img = os.path.join(save_img_dir, img_name)
+                with open(fn_label, 'w') as f:
+                    f.write(txt_string)
+                os.system('cp {} {}'.format(source_img, target_img))
+
+
+def transBbox(bbox):
+    # bbox --> cx cy w h (normalized) --> x1 y1 w h in BDD100K 1280x720 pixels
+    bbox = list(map(lambda x: float(x), bbox))
+    bbox[0] = (bbox[0] - bbox[2] / 2) * 1280
+    bbox[1] = (bbox[1] - bbox[3] / 2) * 720
+    bbox[2] = bbox[2] * 1280
+    bbox[3] = bbox[3] * 720
+
+    bbox = list(map(lambda x: str(x), bbox))
+    return bbox
+
+
+def genSingleImageMot(inputPath, classes=[]):
+    labelPaths = glob.glob(inputPath + '/*.txt')
+    labelPaths = sorted(labelPaths)
+    allLines = []
+    result = {}
+    for labelPath in labelPaths:
+        frame = str(int(labelPath.split('-')[-1].replace('.txt', '')))
+        with open(labelPath, 'r') as labelPathFile:
+            lines = labelPathFile.readlines()
+            for line in lines:
+                line = line.replace('\n', '')
+                lineArray = line.split(' ')
+                if len(classes) > 0:
+                    if lineArray[0] in classes:
+                        lineArray.append(frame)
+                        allLines.append(lineArray)
+                else:
+                    lineArray.append(frame)
+                    allLines.append(lineArray)
+    resultMap = {}
+    for line in allLines:
+        if line[1] not in resultMap.keys():
+            resultMap[line[1]] = []
+        resultMap[line[1]].append(line)
+    mot_gt = []
+    id_idx = 0
+    for rid in resultMap.keys():
+        id_idx += 1
+        for id_line in resultMap[rid]:
+            mot_line = []
+            mot_line.append(id_line[-1])
+            mot_line.append(str(id_idx))
+            id_line_temp = transBbox(id_line[2:6])
+            mot_line.extend(id_line_temp)
+            mot_line.append('1')  # origin class: id_line[0]
+            mot_line.append('1')  # permanent class => 1
+            mot_line.append('1')
+            mot_gt.append(mot_line)
+
+    result = list(map(lambda line: str.join(',', line), mot_gt))
+    resultStr = str.join('\n', result)
+    return resultStr
+
+
+def writeGt(inputPath, outPath, classes=[]):
+    singleImageResult = genSingleImageMot(inputPath, classes=classes)
+    outPathFile = outPath + '/gt.txt'
+    mkdir_if_missing(outPath)
+    with open(outPathFile, 'w') as gtFile:
+        gtFile.write(singleImageResult)
+
+
+def genSeqInfo(seqInfoPath):
+    name = seqInfoPath.split('/')[-2]
+    img1Path = osp.join(str.join('/', seqInfoPath.split('/')[0:-1]), 'img1')
+    seqLength = len(glob.glob(img1Path + '/*.jpg'))
+    seqInfoStr = f'''[Sequence]\nname={name}\nimDir=img1\nframeRate=30\nseqLength={seqLength}\nimWidth=1280\nimHeight=720\nimExt=.jpg'''
+    with open(seqInfoPath, 'w') as seqFile:
+        seqFile.write(seqInfoStr)
+
+
+def genMotGt(dataDir, classes=[]):
+    seqLists = sorted(glob.glob(dataDir))
+    for seqList in seqLists:
+        inputPath = osp.join(seqList, 'img1')
+        outputPath = seqList.replace('labels_with_ids', 'images')
+        outputPath = osp.join(outputPath, 'gt')
+        mkdir_if_missing(outputPath)
+        print('processing...', outputPath)
+        writeGt(inputPath, outputPath, classes=classes)
+        seqList = seqList.replace('labels_with_ids', 'images')
+        seqInfoPath = osp.join(seqList, 'seqinfo.ini')
+        genSeqInfo(seqInfoPath)
+
+
+def updateSeqInfo(dataDir, phase):
+    seqPath = osp.join(dataDir, 'labels_with_ids', phase)
+    seqList = glob.glob(seqPath + '/*')
+    for seqName in seqList:
+        print('seqName=>', seqName)
+        seqName_img1_dir = osp.join(seqName, 'img1')
+        txtLength = glob.glob(seqName_img1_dir + '/*.txt')
+        name = seqName.split('/')[-1].replace('.jpg', '').replace('.txt', '')
+        seqLength = len(txtLength)
+        seqInfoStr = f'''[Sequence]\nname={name}\nimDir=img1\nframeRate=30\nseqLength={seqLength}\nimWidth=1280\nimHeight=720\nimExt=.jpg'''
+        seqInfoPath = seqName_img1_dir.replace('labels_with_ids', 'images')
+        seqInfoPath = seqInfoPath.replace('/img1', '')
+        seqInfoPath = seqInfoPath + '/seqinfo.ini'
+        with open(seqInfoPath, 'w') as seqFile:
+            seqFile.write(seqInfoStr)
+
+
+def VisualDataset(datasetPath, phase='train', seqName='', frameId=1):
+    trainPath = osp.join(datasetPath, 'labels_with_ids', phase)
+    seq1Paths = osp.join(trainPath, seqName)
+    seq_img1_path = osp.join(seq1Paths, 'img1')
+    label_with_idPath = osp.join(seq_img1_path,
+                                 seqName + '-' + '%07d' % frameId) + '.txt'
+    image_path = label_with_idPath.replace('labels_with_ids', 'images').replace(
+        '.txt', '.jpg')
+
+    seqInfoPath = str.join('/', image_path.split('/')[:-2])
+    seqInfoPath = seqInfoPath + '/seqinfo.ini'
+    seq_info = open(seqInfoPath).read()
+    width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find(
+        '\nimHeight')])
+    height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find(
+        '\nimExt')])
+
+    with open(label_with_idPath, 'r') as label:
+        allLines = label.readlines()
+        images = cv2.imread(image_path)
+        print('image_path => ', image_path)
+        for line in allLines:
+            line = line.split(' ')
+            line = list(map(lambda x: float(x), line))
+            c1, c2, w, h = line[2:6]
+            x1 = c1 - w / 2
+            x2 = c2 - h / 2
+            x3 = c1 + w / 2
+            x4 = c2 + h / 2
+            cv2.rectangle(
+                images, (int(x1 * width), int(x2 * height)),
+                (int(x3 * width), int(x4 * height)), (255, 0, 0),
+                thickness=2)
+        cv2.imwrite('test.jpg', images)
+
+
+def VisualGt(dataPath, phase='train'):
+    seqList = sorted(glob.glob(osp.join(dataPath, 'images', phase) + '/*'))
+    seqIndex = random.randint(0, len(seqList) - 1)
+    seqPath = seqList[seqIndex]
+    gt_path = osp.join(seqPath, 'gt', 'gt.txt')
+    img_list_path = sorted(glob.glob(osp.join(seqPath, 'img1', '*.jpg')))
+    # random.randint is inclusive on both ends, so cap the index at len - 1
+    imgIndex = random.randint(0, len(img_list_path) - 1)
+    img_Path = img_list_path[imgIndex]
+
+    seqNameStr = img_Path.split('/')[-1].replace('.jpg', '').replace('img', '')
+    frame_value = int(seqNameStr.split('-')[-1])  # frame index parsed from the file name
+    print('frame_value => ', frame_value)
+    gt_value = np.loadtxt(gt_path, dtype=float, delimiter=',')
+    gt_value = gt_value[gt_value[:, 0] == frame_value]
+
+    get_list = gt_value.tolist()
+    img = 
cv2.imread(img_Path) + + colors = [[255, 0, 0], [255, 255, 0], [255, 0, 255], [0, 255, 0], + [0, 255, 255], [0, 0, 255]] + for seq, _id, pl, pt, w, h, _, bbox_class, _ in get_list: + pl, pt, w, h = int(pl), int(pt), int(w), int(h) + print('pl,pt,w,h => ', pl, pt, w, h) + cv2.putText(img, + str(bbox_class), (pl, pt), cv2.FONT_HERSHEY_PLAIN, 2, + colors[int(bbox_class - 1)]) + cv2.rectangle( + img, (pl, pt), (pl + w, pt + h), + colors[int(bbox_class - 1)], + thickness=2) + cv2.imwrite('testGt.jpg', img) + print(seqPath, frame_value) + return seqPath.split('/')[-1], frame_value + + +def gen_image_list(dataPath, datType): + inputPath = f'{dataPath}/labels_with_ids/{datType}' + pathList = sorted(glob.glob(inputPath + '/*')) + print(pathList) + allImageList = [] + for pathSingle in pathList: + imgList = sorted(glob.glob(osp.join(pathSingle, 'img1', '*.txt'))) + for imgPath in imgList: + imgPath = imgPath.replace('labels_with_ids', 'images').replace( + '.txt', '.jpg') + allImageList.append(imgPath) + with open(f'{dataPath}.{datType}', 'w') as image_list_file: + allImageListStr = str.join('\n', allImageList) + image_list_file.write(allImageListStr) + + +def formatOrigin(datapath, phase): + label_with_idPath = osp.join(datapath, 'labels_with_ids', phase) + print(label_with_idPath) + for txtList in sorted(glob.glob(label_with_idPath + '/*.txt')): + print(txtList) + seqName = txtList.split('/')[-1] + seqName = str.join('-', seqName.split('-')[0:-1]).replace('.txt', '') + seqPath = osp.join(label_with_idPath, seqName, 'img1') + mkdir_if_missing(seqPath) + os.system(f'mv {txtList} {seqPath}') + + +def copyImg(fromRootPath, toRootPath, phase): + fromPath = osp.join(fromRootPath, 'images', phase) + toPathSeqPath = osp.join(toRootPath, 'labels_with_ids', phase) + seqList = sorted(glob.glob(toPathSeqPath + '/*')) + for seqPath in seqList: + seqName = seqPath.split('/')[-1] + imgTxtList = sorted(glob.glob(osp.join(seqPath, 'img1') + '/*.txt')) + img_toPathSeqPath = osp.join(seqPath, 'img1') + img_toPathSeqPath = img_toPathSeqPath.replace('labels_with_ids', + 'images') + mkdir_if_missing(img_toPathSeqPath) + + for imgTxt in imgTxtList: + imgName = imgTxt.split('/')[-1].replace('.txt', '.jpg') + imgfromPath = osp.join(fromPath, seqName, imgName) + print(f'cp {imgfromPath} {img_toPathSeqPath}') + os.system(f'cp {imgfromPath} {img_toPathSeqPath}') + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description='BDD100K to MOT format') + parser.add_argument("--data_path", default='bdd100k') + parser.add_argument("--phase", default='train') + parser.add_argument("--classes", default='2,3,4,9,10') + + parser.add_argument("--img_dir", default="bdd100k/images/track/") + parser.add_argument("--label_dir", default="bdd100k/labels/box_track_20/") + parser.add_argument("--save_path", default="bdd100kmot_vehicle") + parser.add_argument("--height", default=720) + parser.add_argument("--width", default=1280) + args = parser.parse_args() + + attr_dict = dict() + attr_dict["categories"] = [{ + "supercategory": "none", + "id": 0, + "name": "pedestrian" + }, { + "supercategory": "none", + "id": 1, + "name": "rider" + }, { + "supercategory": "none", + "id": 2, + "name": "car" + }, { + "supercategory": "none", + "id": 3, + "name": "truck" + }, { + "supercategory": "none", + "id": 4, + "name": "bus" + }, { + "supercategory": "none", + "id": 5, + "name": "train" + }, { + "supercategory": "none", + "id": 6, + "name": "motorcycle" + }, { + "supercategory": "none", + "id": 7, + "name": "bicycle" + }, { + 
"supercategory": "none", + "id": 8, + "name": "other person" + }, { + "supercategory": "none", + "id": 9, + "name": "trailer" + }, { + "supercategory": "none", + "id": 10, + "name": "other vehicle" + }] + attr_id_dict = {i['name']: i['id'] for i in attr_dict['categories']} + + # create bdd100kmot_vehicle training set in MOT format + print('Loading and converting training set...') + train_img_dir = os.path.join(args.img_dir, 'train') + train_label_dir = os.path.join(args.label_dir, 'train') + save_img_dir = os.path.join(args.save_path, 'images', 'train') + save_label_dir = os.path.join(args.save_path, 'labels_with_ids', 'train') + if not os.path.exists(save_img_dir): os.makedirs(save_img_dir) + if not os.path.exists(save_label_dir): os.makedirs(save_label_dir) + bdd2mot_tracking(train_img_dir, train_label_dir, save_img_dir, + save_label_dir) + + # create bdd100kmot_vehicle validation set in MOT format + print('Loading and converting validation set...') + val_img_dir = os.path.join(args.img_dir, 'val') + val_label_dir = os.path.join(args.label_dir, 'val') + save_img_dir = os.path.join(args.save_path, 'images', 'val') + save_label_dir = os.path.join(args.save_path, 'labels_with_ids', 'val') + if not os.path.exists(save_img_dir): os.makedirs(save_img_dir) + if not os.path.exists(save_label_dir): os.makedirs(save_label_dir) + bdd2mot_tracking(val_img_dir, val_label_dir, save_img_dir, save_label_dir) + + # gen gt file + dataPath = args.data_path + phase = args.phase + classes = args.classes.split(',') + formatOrigin(osp.join(dataPath, 'bdd100kmot_vehicle'), phase) + dataDir = osp.join( + osp.join(dataPath, 'bdd100kmot_vehicle'), 'labels_with_ids', + phase) + '/*' + genMotGt(dataDir, classes=classes) + copyImg(dataPath, osp.join(dataPath, 'bdd100kmot_vehicle'), phase) + updateSeqInfo(osp.join(dataPath, 'bdd100kmot_vehicle'), phase) + gen_image_list(osp.join(dataPath, 'bdd100kmot_vehicle'), phase) + os.system(f'rm -r {dataPath}/bdd100kmot_vehicle/images/' + phase + '/*.jpg') diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/gen_bdd100kmot_vehicle.sh b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/gen_bdd100kmot_vehicle.sh new file mode 100644 index 0000000000000000000000000000000000000000..b88b25180d9615b5277b1101f321c0d2704c3241 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/gen_bdd100kmot_vehicle.sh @@ -0,0 +1,16 @@ +data_path=bdd100k +img_dir=${data_path}/images/track +label_dir=${data_path}/labels/box_track_20 +save_path=${data_path}/bdd100kmot_vehicle + +phasetrain=train +phaseval=val +classes=2,3,4,9,10 + +# gen mot dataset +python bdd100k2mot.py --data_path=${data_path} --phase=${phasetrain} --classes=${classes} --img_dir=${img_dir} --label_dir=${label_dir} --save_path=${save_path} +python bdd100k2mot.py --data_path=${data_path} --phase=${phaseval} --classes=${classes} --img_dir=${img_dir} --label_dir=${label_dir} --save_path=${save_path} + +# gen new labels_with_ids +python gen_labels_MOT.py --mot_data=${data_path} --phase=${phasetrain} +python gen_labels_MOT.py --mot_data=${data_path} --phase=${phaseval} diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/gen_labels_MOT.py b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/gen_labels_MOT.py new file mode 100644 index 0000000000000000000000000000000000000000..91aa800c38591ce52c146dad9a73aecd7741fed7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/bdd100kmot/gen_labels_MOT.py @@ -0,0 +1,72 
@@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import os.path as osp +import numpy as np +import argparse + + +def mkdirs(d): + if not osp.exists(d): + os.makedirs(d) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description='BDD100K to MOT format') + parser.add_argument( + "--mot_data", default='./bdd100k') + parser.add_argument("--phase", default='train') + args = parser.parse_args() + + MOT_data = args.mot_data + phase = args.phase + seq_root = osp.join(MOT_data, 'bdd100kmot_vehicle', 'images', phase) + label_root = osp.join(MOT_data, 'bdd100kmot_vehicle', 'labels_with_ids', + phase) + mkdirs(label_root) + seqs = [s for s in os.listdir(seq_root)] + tid_curr = 0 + tid_last = -1 + + os.system(f'rm -r {MOT_data}/bdd100kmot_vehicle/labels_with_ids') + for seq in seqs: + print('seq => ', seq) + seq_info = open(osp.join(seq_root, seq, 'seqinfo.ini')).read() + seq_width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find( + '\nimHeight')]) + seq_height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find( + '\nimExt')]) + + gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt') + gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',') + + seq_label_root = osp.join(label_root, seq, 'img1') + mkdirs(seq_label_root) + + for fid, tid, x, y, w, h, mark, label, _ in gt: + fid = int(fid) + tid = int(tid) + if not tid == tid_last: + tid_curr += 1 + tid_last = tid + x += w / 2 + y += h / 2 + label_fpath = osp.join(seq_label_root, + seq + '-' + '{:07d}.txt'.format(fid)) + label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format( + tid_curr, x / seq_width, y / seq_height, w / seq_width, + h / seq_height) + with open(label_fpath, 'a') as f: + f.write(label_str) diff --git a/PaddleDetection-release-2.6/configs/mot/vehicle/tools/visdrone/visdrone2mot.py b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/visdrone/visdrone2mot.py new file mode 100644 index 0000000000000000000000000000000000000000..a2fa200204f5656ce015d371715b0f7c2bf9366d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/mot/vehicle/tools/visdrone/visdrone2mot.py @@ -0,0 +1,295 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import glob +import os +import os.path as osp +import cv2 +import argparse +import numpy as np +import random + +# The object category indicates the type of annotated object, +# (i.e., ignored regions(0), pedestrian(1), people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10),others(11)) + +# Extract single class or multi class +isExtractMultiClass = False +# The sequence is excluded because there are too few vehicles +exclude_seq = ["uav0000086_00000_v"] + + +def mkdir_if_missing(d): + if not osp.exists(d): + os.makedirs(d) + + +def genGtFile(seqPath, outPath, classes=[]): + id_idx = 0 + old_idx = -1 + with open(seqPath, 'r') as singleSeqFile: + motLine = [] + allLines = singleSeqFile.readlines() + for line in allLines: + line = line.replace('\n', '') + line = line.split(',') + # exclude occlusion!='2' + if line[-1] != '2' and line[7] in classes: + if old_idx != int(line[1]): + id_idx += 1 + old_idx = int(line[1]) + newLine = line[0:6] + newLine[1] = str(id_idx) + newLine.append('1') + if (len(classes) > 1 and isExtractMultiClass): + class_index = str(classes.index(line[7]) + 1) + newLine.append(class_index) + else: + newLine.append('1') # use permanent class '1' + newLine.append('1') + motLine.append(newLine) + mkdir_if_missing(outPath) + gtFilePath = osp.join(outPath, 'gt.txt') + with open(gtFilePath, 'w') as gtFile: + motLine = list(map(lambda x: str.join(',', x), motLine)) + motLineStr = str.join('\n', motLine) + gtFile.write(motLineStr) + + +def genSeqInfo(img1Path, seqName): + imgPaths = glob.glob(img1Path + '/*.jpg') + seqLength = len(imgPaths) + if seqLength > 0: + image1 = cv2.imread(imgPaths[0]) + imgHeight = image1.shape[0] + imgWidth = image1.shape[1] + else: + imgHeight = 0 + imgWidth = 0 + seqInfoStr = f'''[Sequence]\nname={seqName}\nimDir=img1\nframeRate=30\nseqLength={seqLength}\nimWidth={imgWidth}\nimHeight={imgHeight}\nimExt=.jpg''' + seqInfoPath = img1Path.replace('/img1', '') + with open(seqInfoPath + '/seqinfo.ini', 'w') as seqFile: + seqFile.write(seqInfoStr) + + +def copyImg(img1Path, gtTxtPath, outputFileName): + with open(gtTxtPath, 'r') as gtFile: + allLines = gtFile.readlines() + imgList = [] + for line in allLines: + imgIdx = int(line.split(',')[0]) + if imgIdx not in imgList: + imgList.append(imgIdx) + seqName = gtTxtPath.replace('./{}/'.format(outputFileName), + '').replace('/gt/gt.txt', '') + sourceImgPath = osp.join('./sequences', seqName, + '{:07d}.jpg'.format(imgIdx)) + os.system(f'cp {sourceImgPath} {img1Path}') + + +def genMotLabels(datasetPath, outputFileName, classes=['2']): + mkdir_if_missing(osp.join(datasetPath, outputFileName)) + annotationsPath = osp.join(datasetPath, 'annotations') + annotationsList = glob.glob(osp.join(annotationsPath, '*.txt')) + for annotationPath in annotationsList: + seqName = annotationPath.split('/')[-1].replace('.txt', '') + if seqName in exclude_seq: + continue + mkdir_if_missing(osp.join(datasetPath, outputFileName, seqName, 'gt')) + mkdir_if_missing(osp.join(datasetPath, outputFileName, seqName, 'img1')) + genGtFile(annotationPath, + osp.join(datasetPath, outputFileName, seqName, 'gt'), classes) + img1Path = osp.join(datasetPath, outputFileName, seqName, 'img1') + gtTxtPath = osp.join(datasetPath, outputFileName, seqName, 'gt/gt.txt') + copyImg(img1Path, gtTxtPath, outputFileName) + genSeqInfo(img1Path, seqName) + + +def deleteFileWhichImg1IsEmpty(mot16Path, dataType='train'): + path = mot16Path + data_images_train = osp.join(path, 'images', f'{dataType}') + 
+    data_images_train_seqs = glob.glob(data_images_train + '/*')
+    if (len(data_images_train_seqs) == 0):
+        print('dataset is empty!')
+    for data_images_train_seq in data_images_train_seqs:
+        data_images_train_seq_img1 = osp.join(data_images_train_seq, 'img1')
+        if len(glob.glob(data_images_train_seq_img1 + '/*.jpg')) == 0:
+            print(f"os.system(rm -rf {data_images_train_seq})")
+            os.system(f'rm -rf {data_images_train_seq}')
+
+
+def formatMot16Path(dataPath, pathType='train'):
+    train_path = osp.join(dataPath, 'images', pathType)
+    mkdir_if_missing(train_path)
+    os.system(f'mv {dataPath}/* {train_path}')
+
+
+def VisualGt(dataPath, phase='train'):
+    seqList = sorted(glob.glob(osp.join(dataPath, 'images', phase) + '/*'))
+    seqIndex = random.randint(0, len(seqList) - 1)
+    seqPath = seqList[seqIndex]
+    gt_path = osp.join(seqPath, 'gt', 'gt.txt')
+    img_list_path = sorted(glob.glob(osp.join(seqPath, 'img1', '*.jpg')))
+    # random.randint is inclusive on both ends, so cap the index at len - 1
+    imgIndex = random.randint(0, len(img_list_path) - 1)
+    img_Path = img_list_path[imgIndex]
+    frame_value = int(img_Path.split('/')[-1].replace('.jpg', ''))
+    gt_value = np.loadtxt(gt_path, dtype=int, delimiter=',')
+    gt_value = gt_value[gt_value[:, 0] == frame_value]
+    get_list = gt_value.tolist()
+    img = cv2.imread(img_Path)
+    colors = [[255, 0, 0], [255, 255, 0], [255, 0, 255], [0, 255, 0],
+              [0, 255, 255], [0, 0, 255]]
+    for seq, _id, pl, pt, w, h, _, bbox_class, _ in get_list:
+        cv2.putText(img,
+                    str(bbox_class), (pl, pt), cv2.FONT_HERSHEY_PLAIN, 2,
+                    colors[bbox_class - 1])
+        cv2.rectangle(
+            img, (pl, pt), (pl + w, pt + h),
+            colors[bbox_class - 1],
+            thickness=2)
+    cv2.imwrite('testGt.jpg', img)
+
+
+def VisualDataset(datasetPath, phase='train', seqName='', frameId=1):
+    trainPath = osp.join(datasetPath, 'labels_with_ids', phase)
+    seq1Paths = osp.join(trainPath, seqName)
+    seq_img1_path = osp.join(seq1Paths, 'img1')
+    label_with_idPath = osp.join(seq_img1_path, '%07d' % frameId) + '.txt'
+    image_path = label_with_idPath.replace('labels_with_ids', 'images').replace(
+        '.txt', '.jpg')
+    seqInfoPath = str.join('/', image_path.split('/')[:-2])
+    seqInfoPath = seqInfoPath + '/seqinfo.ini'
+    seq_info = open(seqInfoPath).read()
+    width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find(
+        '\nimHeight')])
+    height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find(
+        '\nimExt')])
+
+    with open(label_with_idPath, 'r') as label:
+        allLines = label.readlines()
+        images = cv2.imread(image_path)
+        for line in allLines:
+            line = line.split(' ')
+            line = list(map(lambda x: float(x), line))
+            c1, c2, w, h = line[2:6]
+            x1 = c1 - w / 2
+            x2 = c2 - h / 2
+            x3 = c1 + w / 2
+            x4 = c2 + h / 2
+            cv2.rectangle(
+                images, (int(x1 * width), int(x2 * height)),
+                (int(x3 * width), int(x4 * height)), (255, 0, 0),
+                thickness=2)
+        cv2.imwrite('test.jpg', images)
+
+
+def gen_image_list(dataPath, datType):
+    inputPath = f'{dataPath}/images/{datType}'
+    pathList = glob.glob(inputPath + '/*')
+    pathList = sorted(pathList)
+    allImageList = []
+    for pathSingle in pathList:
+        imgList = sorted(glob.glob(osp.join(pathSingle, 'img1', '*.jpg')))
+        for imgPath in imgList:
+            allImageList.append(imgPath)
+    with open(f'{dataPath}.{datType}', 'w') as image_list_file:
+        allImageListStr = str.join('\n', allImageList)
+        image_list_file.write(allImageListStr)
+
+
+def gen_labels_mot(MOT_data, phase='train'):
+    seq_root = './{}/images/{}'.format(MOT_data, phase)
+    label_root = './{}/labels_with_ids/{}'.format(MOT_data, phase)
+    mkdir_if_missing(label_root)
+    seqs = [s for s in os.listdir(seq_root)]
+    print('seqs => ', seqs)
+    tid_curr = 0
+    tid_last = -1
+    for seq in seqs:
+        seq_info = open(osp.join(seq_root, seq, 'seqinfo.ini')).read()
+        seq_width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find(
+            '\nimHeight')])
+        seq_height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find(
+            '\nimExt')])
+
+        gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt')
+        gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',')
+
+        seq_label_root = osp.join(label_root, seq, 'img1')
+        mkdir_if_missing(seq_label_root)
+
+        for fid, tid, x, y, w, h, mark, label, _ in gt:
+            # if mark == 0 or not label == 1:
+            #     continue
+            fid = int(fid)
+            tid = int(tid)
+            if not tid == tid_last:
+                tid_curr += 1
+                tid_last = tid
+            x += w / 2
+            y += h / 2
+            label_fpath = osp.join(seq_label_root, '{:07d}.txt'.format(fid))
+            label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
+                tid_curr, x / seq_width, y / seq_height, w / seq_width,
+                h / seq_height)
+            with open(label_fpath, 'a') as f:
+                f.write(label_str)
+
+
+def parse_arguments():
+    parser = argparse.ArgumentParser(description='input method')
+    # note: argparse's type=bool turns any non-empty string into True
+    # (bool('False') is True), so pass these flags only when enabling a step
+    parser.add_argument("--transMot", type=bool, default=False)
+    parser.add_argument("--genMot", type=bool, default=False)
+    parser.add_argument("--formatMotPath", type=bool, default=False)
+    parser.add_argument("--deleteEmpty", type=bool, default=False)
+    parser.add_argument("--genLabelsMot", type=bool, default=False)
+    parser.add_argument("--genImageList", type=bool, default=False)
+    parser.add_argument("--visualImg", type=bool, default=False)
+    parser.add_argument("--visualGt", type=bool, default=False)
+    parser.add_argument("--data_name", type=str, default='visdrone_vehicle')
+    parser.add_argument("--phase", type=str, default='train')
+    parser.add_argument("--classes", type=str, default='4,5,6,9')
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_arguments()
+    classes = args.classes.split(',')
+    datasetPath = './'
+    dataName = args.data_name
+    phase = args.phase
+    if args.transMot:
+        genMotLabels(datasetPath, dataName, classes)
+        formatMot16Path(dataName, pathType=phase)
+        mot16Path = f'./{dataName}'
+        deleteFileWhichImg1IsEmpty(mot16Path, dataType=phase)
+        gen_labels_mot(dataName, phase=phase)
+        gen_image_list(dataName, phase)
+    if args.genMot:
+        genMotLabels(datasetPath, dataName, classes)
+    if args.formatMotPath:
+        formatMot16Path(dataName, pathType=phase)
+    if args.deleteEmpty:
+        mot16Path = f'./{dataName}'
+        deleteFileWhichImg1IsEmpty(mot16Path, dataType=phase)
+    if args.genLabelsMot:
+        gen_labels_mot(dataName, phase=phase)
+    if args.genImageList:
+        gen_image_list(dataName, phase)
+    if args.visualGt:
+        VisualGt(f'./{dataName}', phase)
+    if args.visualImg:
+        seqName = 'uav0000137_00458_v'
+        frameId = 43
+        VisualDataset(
+            f'./{dataName}', phase=phase, seqName=seqName, frameId=frameId)
diff --git a/PaddleDetection-release-2.6/configs/picodet/FULL_QUANTIZATION.md b/PaddleDetection-release-2.6/configs/picodet/FULL_QUANTIZATION.md
new file mode 100644
index 0000000000000000000000000000000000000000..422ae07fe082849af3305e532677115ef822aabc
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/FULL_QUANTIZATION.md
@@ -0,0 +1,163 @@
+# PP-PicoDet Full Quantization Example
+
+Contents:
+
+- [1. Overview](#1-overview)
+- [2. Benchmark](#2-benchmark)
+- [3. Full Quantization Pipeline](#3-full-quantization-pipeline)
+  - [3.1 Environment](#31-environment)
+  - [3.2 Dataset](#32-dataset)
+  - [3.3 Full-Precision Training](#33-full-precision-training)
+  - [3.4 Exporting the Inference Model](#34-exporting-the-inference-model)
+  - [3.5 Full Quantization](#35-full-quantization)
+- [4. Deployment](#4-deployment)
+- [5. FAQ](#5-faq)
+
+## 1. Overview
+
+This example uses PicoDet to walk through the whole pipeline from model training and full quantization to deployment on NPU hardware.
+
+* The [Benchmark](#2-benchmark) table already provides fully quantized models based on COCO-pretrained weights.
+
+* Verified NPU hardware:
+
+  - Rockchip dev boards: Rockchip RV1109, Rockchip RV1126, Rockchip RK1808
+
+  - Amlogic dev boards: Amlogic A311D, Amlogic S905D3, Amlogic C308X
+
+  - NXP dev boards: NXP i.MX 8M Plus
+
+* Deployment approach for unverified hardware:
+  - "Unverified" means the hardware is not yet covered by Paddle Lite inference deployment. You can export the model with Paddle2ONNX and deploy it with the hardware's own inference engine, provided the hardware supports fully quantized ONNX models.
+
+## 2. Benchmark
+
+### PicoDet-S-NPU
+
+| Model | Strategy | mAP | FP32 | INT8 | Config | Model |
+|:------------- |:-------- |:----:|:----:|:----:|:----:|:----:|
+| PicoDet-S-NPU | Baseline | 30.1 | - | - | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_416_coco_npu.yml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_416_coco_npu.tar) |
+| PicoDet-S-NPU | Quant-aware training | 29.7 | - | - | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/full_quantization/detection/configs/picodet_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_npu_quant.tar) |
+
+- mAP is measured on the COCO val2017 dataset at IoU=0.5:0.95.
+
+## 3. Full Quantization Pipeline
+For a model trained on your own data, follow the steps below.
+
+### 3.1 Environment
+
+- PaddlePaddle >= 2.3 (install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim >= 2.3
+- PaddleDet >= 2.4
+
+Install paddlepaddle:
+
+```shell
+# CPU
+pip install paddlepaddle
+# GPU
+pip install paddlepaddle-gpu
+```
+
+Install paddleslim:
+
+```shell
+pip install paddleslim
+```
+
+Install paddledet:
+
+```shell
+pip install paddledet
+```
+
+### 3.2 Dataset
+
+This example runs full quantization on COCO data by default. Custom data can be prepared following the COCO data standard; for other custom formats, see the [PaddleDetection data preparation docs](../../docs/tutorials/data/PrepareDataSet.md).
+
+Taking the PicoDet-S-NPU model as an example, once your dataset is ready, simply point the `dataset_dir` field of `EvalDataset` in [picodet_reader.yml](./configs/picodet_reader.yml) to your own dataset path.
+
+### 3.3 Full-Precision Training
+
+Full quantization requires a trained full-precision model; skip this step if you already have one.
+
+- Training on a single GPU:
+
+```shell
+# training on single-GPU
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/picodet/picodet_s_416_coco_npu.yml --eval
+```
+
+**Note:** If training runs out of GPU memory, reduce `batch_size` in `TrainReader` and scale `base_lr` in `LearningRate` down by the same ratio. The released configs are all obtained with 4-GPU training; when training on a single GPU, divide `base_lr` by 4.
+
+- Training on multiple GPUs:
+
+```shell
+# training on multi-GPU
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/picodet/picodet_s_416_coco_npu.yml --eval
+```
+
+**Note:** All PicoDet models are trained on 4 GPUs; if the number of training GPUs changes, scale the learning rate `base_lr` linearly.
+
+- Evaluation:
+
+```shell
+python tools/eval.py -c configs/picodet/picodet_s_416_coco_npu.yml \
+          -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_npu.pdparams
+```
+
+### 3.4 Exporting the Inference Model
+
+Export the inference model with the command below; it is the input to full quantization. The exported model is saved under the `output_inference` folder by default and includes the *.pdmodel and *.pdiparams files used for quantization.
+
+* Arguments:
+  - -c: the yaml config used for training in [3.3 Full-Precision Training](#33-full-precision-training).
+  - -o weights: the model weights; this document directly uses the weights trained on COCO.
+
+```shell
+python tools/export_model.py \
+        -c configs/picodet/picodet_s_416_coco_npu.yml \
+        -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_npu.pdparams
+```
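+Before compressing, it can help to sanity-check the exported artifacts; a minimal sketch (the export subdirectory mirroring the config name is an assumption here):
+
+```python
+import os
+
+export_dir = 'output_inference/picodet_s_416_coco_npu'  # hypothetical default export path
+for fname in ('model.pdmodel', 'model.pdiparams'):
+    path = os.path.join(export_dir, fname)
+    print(path, 'OK' if os.path.exists(path) else 'MISSING')
+```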
+
+### 3.5 Full Quantization
+
+- Enter the PaddleSlim auto-compression demo folder:
+
+  ```shell
+  cd deploy/auto_compression/
+  ```
+
+Full quantization is launched through the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to fully quantize the model. Configure the model path, distillation, quantization and training parameters in the config file; once configured, the model can be quantized and distilled. The commands are:
+
+- Quantization training on a single GPU:
+
+  ```
+  export CUDA_VISIBLE_DEVICES=0
+  python run.py --config_path=./configs/picodet_s_qat_dis.yaml --save_dir='./output/'
+  ```
+
+- Quantization training on multiple GPUs:
+
+  ```
+  export CUDA_VISIBLE_DEVICES=0,1,2,3
+  python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
+          --config_path=./configs/picodet_s_qat_dis.yaml --save_dir='./output/'
+  ```
+
+- The final model is saved under the `output` folder by default. After training, test the accuracy of the fully quantized model.
+
+Point the `model_dir` field of the config file to the model you want to test, then use the eval.py script to get its mAP:
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python eval.py --config_path=./configs/picodet_s_qat_dis.yaml
+```
+
+## 4. Deployment
+
+Use PicoDet's [Paddle Lite full-quantization demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/linux/picodet_detection) directly for on-device deployment.
+
+## 5. FAQ
diff --git a/PaddleDetection-release-2.6/configs/picodet/README.md b/PaddleDetection-release-2.6/configs/picodet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..02a22c4c2f35ce99199ecd02ca1e6c5b428c3b8a
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/README.md
@@ -0,0 +1,355 @@
+简体中文 | [English](README_en.md)
+
+# PP-PicoDet
+
+![](../../docs/images/picedet_demo.jpeg)
+
+## Recent Updates
+
+- Released PicoDet-NPU models with support for fully quantized deployment. See the [PicoDet full quantization example](./FULL_QUANTIZATION.md) for details. **(2022.08.10)**
+
+- Released a brand-new series of PP-PicoDet models: **(2022.03.20)**
+  - (1) Introduced the TAL and ETA head and refined structures such as the PAN, improving accuracy by more than 2 points;
+  - (2) Optimized CPU-side inference speed and doubled training speed;
+  - (3) The exported model includes postprocessing inside the network and directly outputs boxes at inference time, so no extra glue code is needed, migration cost is lower, and end-to-end inference is 10%-20% faster.
+
+## Legacy Models
+
+- See the [PicoDet 2021.10 release](./legacy_model/) for details.
+
+## Introduction
+
+PaddleDetection proposes the new lightweight `PP-PicoDet` series, which delivers excellent performance on mobile devices and sets a new SOTA for lightweight models. Technical details can be found in our [arXiv report](https://arxiv.org/abs/2111.00902).
+
+PP-PicoDet has the following features:
+
+- 🌟 Higher mAP: the first to exceed **30** `mAP(0.5:0.95)` within 1M parameters (with 416-pixel input).
+- 🚀 Faster speed: network inference reaches 150 FPS on an ARM CPU.
+- 😊 Deployment friendly: supports Paddle Lite/MNN/NCNN/OpenVINO inference libraries and export to ONNX, with C++/Python/Android demos.
+- 😍 Advanced algorithms: innovations over existing SOTA algorithms, including ESNet, CSP-PAN, SimOTA, and more.
+
+![PicoDet accuracy/latency comparison](../../docs/images/picodet_map.png)
+
+## Baseline
+
+| Model | Input size | mAP(0.5:0.95) | mAP(0.5) | Params (M) | FLOPS (G) | Latency [CPU](#latency) (ms) | Latency [Lite](#latency) (ms) | Weights | Training log | Config | Exported model (w/ postprocess) | Exported model (w/o postprocess) |
+| :-------- | :--------: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
+| PicoDet-XS | 320*320 | 23.5 | 36.1 | 0.70 | 0.67 | 3.9ms | 7.81ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_xs_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-XS | 416*416 | 26.2 | 39.3 | 0.70 | 1.13 | 6.1ms | 12.38ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_xs_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-S | 320*320 | 29.1 | 43.4 | 1.18 | 0.97 | 4.8ms | 9.56ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-S | 416*416 | 32.5 | 47.6 | 1.18 | 1.65 | 6.6ms | 15.20ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-M | 320*320 | 34.4 | 50.0 | 3.46 | 2.57 | 8.2ms | 17.68ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_m_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-M | 416*416 | 37.5 | 53.4 | 3.46 | 4.34 | 12.7ms | 28.39ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_m_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-L | 320*320 | 36.1 | 52.0 | 5.80 | 4.20 | 11.5ms | 25.21ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-L | 416*416 | 39.4 | 55.7 | 5.80 | 7.10 | 20.7ms | 42.23ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-L | 640*640 | 42.6 | 59.2 | 5.80 | 16.81 | 62.5ms | 108.1ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_640_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_640_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet_non_postprocess.tar) |
+
+- Featured models
+
+| Model | Input size | mAP(0.5:0.95) | mAP(0.5) | Params (M) | FLOPS (G) | Latency [CPU](#latency) (ms) | Latency [Lite](#latency) (ms) | Weights | Training log | Config |
+| :-------- | :--------: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
+| PicoDet-S-NPU | 416*416 | 30.1 | 44.2 | - | - | - | - | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_npu.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco_npu.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_416_coco_npu.yml) |
+
+**Notes:**
+
+- Latency testing: all our models are tested on an Intel Core i7-10750H CPU and a Snapdragon 865 (4xA77 + 4xA55) ARM CPU (4 threads, FP16 inference). In the tables above, `CPU` latency is measured with OpenVINO and `Lite` latency with [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite).
+- PicoDet is trained on COCO train2017 and validated on COCO val2017. Training uses 4 GPUs, and all pretrained models in the tables above are trained with the released default configs.
+- Benchmark testing: when benchmarking speed, postprocessing is excluded from the exported network; set `-o export.benchmark=True` or edit [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/runtime.yml#L12) manually.
+
+#### Baselines of other models
+
+| Model | Input size | mAP(0.5:0.95) | mAP(0.5) | Params (M) | FLOPS (G) | Latency [NCNN](#latency) (ms) |
+| :-------- | :--------: | :----: | :----: | :----: | :----: | :----: |
+| YOLOv3-Tiny | 416*416 | 16.6 | 33.1 | 8.86 | 5.62 | 25.42 |
+| YOLOv4-Tiny | 416*416 | 21.7 | 40.2 | 6.06 | 6.96 | 23.69 |
+| PP-YOLO-Tiny | 320*320 | 20.6 | - | 1.08 | 0.58 | 6.75 |
+| PP-YOLO-Tiny | 416*416 | 22.7 | - | 1.08 | 1.02 | 10.48 |
+| Nanodet-M | 320*320 | 20.6 | - | 0.95 | 0.72 | 8.71 |
+| Nanodet-M | 416*416 | 23.5 | - | 0.95 | 1.2 | 13.35 |
+| Nanodet-M 1.5x | 416*416 | 26.8 | - | 2.08 | 2.42 | 15.83 |
+| YOLOX-Nano | 416*416 | 25.8 | - | 0.91 | 1.08 | 19.23 |
+| YOLOX-Tiny | 416*416 | 32.8 | - | 5.06 | 6.45 | 32.77 |
+| YOLOv5n | 640*640 | 28.4 | 46.0 | 1.9 | 4.5 | 40.35 |
+| YOLOv5s | 640*640 | 37.2 | 56.0 | 7.2 | 16.5 | 78.05 |
+
+- The ARM benchmark script comes from [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark).
+
+## Quick Start
+
+**Requirements:**
+
+- PaddlePaddle == 2.2.2
+
+**Installation**
+
+- [Installation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/INSTALL.md)
+- [Data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/data/PrepareDataSet_en.md)
+
+**Training & Evaluation**
+
+- Training on a single GPU:
+
+```shell
+# training on single-GPU
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval
+```
+
+**Note:** If training runs out of GPU memory, reduce `batch_size` in `TrainReader` and scale `base_lr` in `LearningRate` down by the same ratio. The released configs are all obtained with 4-GPU training; when training on a single GPU, divide `base_lr` by 4.
+
+- Training on multiple GPUs:
+
+```shell
+# training on multi-GPU
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval
+```
+
+**Note:** All PicoDet models are trained on 4 GPUs. If the number of training GPUs changes, scale the learning rate `base_lr` linearly, as in the sketch after this section.
+
+- Evaluation:
+
+```shell
+python tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
+              -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+```
+
+- Inference:
+
+```shell
+python tools/infer.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
+              -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+```
+
+For details, please refer to the [Getting Started doc](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/GETTING_STARTED.md).
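+A quick sketch of that linear scaling rule (the 4-GPU `base_lr` value of 0.32 below is a made-up placeholder; read the real value from your config):
+
+```python
+def scaled_base_lr(base_lr_default: float, num_gpus: int, default_gpus: int = 4) -> float:
+    """Linear scaling rule: the learning rate scales with the total batch size."""
+    return base_lr_default * num_gpus / default_gpus
+
+print(scaled_base_lr(0.32, 1))  # -> 0.08, i.e. a quarter of the 4-GPU learning rate
+```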
+
+## Deployment
+
+### Export and Convert the Model
+
+1. Export the model
+
+```shell
+cd PaddleDetection
+python tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
+              -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams \
+              --output_dir=output_inference
+```
+
+- If you do not need postprocessing in the exported model, pass `-o export.benchmark=True` (if `-o` already appears in the command, drop the extra `-o` here) or edit the corresponding field of [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/runtime.yml) manually.
+- If you do not need NMS in the exported model, pass `-o export.nms=False` or edit the corresponding field of [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/runtime.yml) manually. Many ONNX deployment scenarios only support a single input and fixed-shape outputs, so exporting without NMS is recommended when targeting ONNX.
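+If you deploy with Paddle Inference directly (see the deployment support table below), the exported directory can be loaded as sketched here with a dummy input; the `image`/`scale_factor` input names, the 320x320 shape, and the `[N, 6]` output layout are assumptions for a model exported with postprocessing and NMS:
+
+```python
+import numpy as np
+from paddle.inference import Config, create_predictor
+
+model_dir = 'output_inference/picodet_s_320_coco_lcnet'  # from the export step above
+config = Config(model_dir + '/model.pdmodel', model_dir + '/model.pdiparams')
+predictor = create_predictor(config)
+
+# dummy tensors only, to check the exported graph runs end to end
+inputs = {'image': np.zeros((1, 3, 320, 320), np.float32),
+          'scale_factor': np.ones((1, 2), np.float32)}
+for name in predictor.get_input_names():
+    predictor.get_input_handle(name).copy_from_cpu(inputs[name])
+predictor.run()
+boxes = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
+print(boxes.shape)  # expected [N, 6]: class_id, score, x1, y1, x2, y2
+```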
+
+2. Convert the model to Paddle Lite
+
+- Install Paddle Lite >= 2.10:
+
+```shell
+pip install paddlelite
+```
+
+- Convert the model to Paddle Lite format:
+
+```shell
+# FP32
+paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp32
+# FP16
+paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp16 --enable_fp16=true
+```
+
+3. Convert the model to ONNX
+
+- Install [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) >= 0.7 and ONNX > 1.10.1; see the [ONNX export tutorial](../../deploy/EXPORT_ONNX_MODEL.md) for details
+
+```shell
+pip install onnx
+pip install paddle2onnx==0.9.2
+```
+
+- Convert the model:
+
+```shell
+paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \
+              --model_filename model.pdmodel  \
+              --params_filename model.pdiparams \
+              --opset_version 11 \
+              --save_file picodet_s_320_coco.onnx
+```
+
+- Simplify the ONNX model with the `onnx-simplifier` package:
+
+  - Install onnxsim >= 0.4.1:
+  ```shell
+  pip install onnxsim
+  ```
+  - Simplify the ONNX model:
+  ```shell
+  onnxsim picodet_s_320_coco.onnx picodet_s_processed.onnx
+  ```
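+Once converted, the simplified model can be smoke-tested with ONNXRuntime; a minimal sketch with dummy inputs (input names and shapes are read from the graph rather than assumed):
+
+```python
+import numpy as np
+import onnxruntime as ort
+
+sess = ort.InferenceSession('picodet_s_processed.onnx', providers=['CPUExecutionProvider'])
+feeds = {}
+for inp in sess.get_inputs():
+    # resolve symbolic/dynamic dims to 1 so a dummy tensor can be built
+    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
+    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
+outputs = sess.run(None, feeds)
+print([o.shape for o in outputs])
+```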
+
+- Models for deployment
+
+| Model | Input size | ONNX (w/ postprocess) | ONNX (w/o postprocess) | Paddle Lite (fp32) | Paddle Lite (fp16) |
+| :-------- | :--------: | :----: | :----: | :----: | :----: |
+| PicoDet-XS | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet_fp16.tar) |
+| PicoDet-XS | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet_fp16.tar) |
+| PicoDet-S | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet_fp16.tar) |
+| PicoDet-S | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_fp16.tar) |
+| PicoDet-M | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet_fp16.tar) |
+| PicoDet-M | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet_fp16.tar) |
+| PicoDet-L | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet_fp16.tar) |
+| PicoDet-L | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet_fp16.tar) |
+| PicoDet-L | 640*640 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postprocessed.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet_fp16.tar) |
+
+### Deployment support
+
+| Inference library | Python | C++ | Predict with postprocess |
+| :-------- | :--------: | :----: | :----: |
+| OpenVINO | [Python](../../deploy/third_engine/demo_openvino/python) | [C++](../../deploy/third_engine/demo_openvino) (postprocess support in development) | ✔︎ |
+| Paddle Lite | - | [C++](../../deploy/lite) | ✔︎ |
+| Android Demo | - | [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) | ✔︎ |
+| PaddleInference | [Python](../../deploy/python) | [C++](../../deploy/cpp) | ✔︎ |
+| ONNXRuntime | [Python](../../deploy/third_engine/demo_onnxruntime) | Coming soon | ✔︎ |
+| NCNN | Coming soon | [C++](../../deploy/third_engine/demo_ncnn) | ✘ |
+| MNN | Coming soon | [C++](../../deploy/third_engine/demo_mnn) | ✘ |
+
+Android demo visualization:
+
+![Android demo](../../docs/images/picodet_android_demo1.jpg) ![Android demo](../../docs/images/picodet_android_demo2.jpg) ![Android demo](../../docs/images/picodet_android_demo3.jpg)
    + + +## 量化 + +
    +依赖包: + +- PaddlePaddle >= 2.2.2 +- PaddleSlim >= 2.2.2 + +**安装:** + +```shell +pip install paddleslim==2.2.2 +``` + +
    + +
    +量化训练 + +开始量化训练: + +```shell +python tools/train.py -c configs/picodet/picodet_s_416_coco_lcnet.yml \ + --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml --eval +``` + +- 更多细节请参考[slim文档](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim) + +
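+
+- 量化训练完成后,可用同样的 config + slim_config 组合进行评估。以下命令为示意,其中权重路径取决于实际的 `save_dir` 与保存目录名,这里仅为假设:
+
+```shell
+python tools/eval.py -c configs/picodet/picodet_s_416_coco_lcnet.yml \
+          --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml \
+          -o weights=output/picodet_s_416_lcnet_quant/model_final
+```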
+
+- 量化训练Model ZOO:
+
+| 量化模型 | 输入尺寸 | mAPval 0.5:0.95 | Config | Slim Config | Weight | Inference Model(w/ 后处理) | Inference Model(w/o 后处理) | Paddle Lite INT8(w/ 后处理) | Paddle Lite INT8(w/o 后处理) |
+| :-------- | :--------: | :--------------------: | :-------: | :-------: | :----------------: | :----------------: | :----------------: | :----------------: | :----------------: |
+| PicoDet-S | 416*416 | 31.5 | [config](./picodet_s_416_coco_lcnet.yml) | [slim config](../slim/quant/picodet_s_416_lcnet_quant.yml) | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet_quant.pdparams) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant_non_postprocess.tar) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant.nb) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant_non_postprocess.nb) |
+
+## 非结构化剪枝
+
    +教程: + +训练及部署细节请参考[非结构化剪枝文档](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/legacy_model/pruner/README.md)。 + +
    + +## 应用 + +- **行人检测:** `PicoDet-S-Pedestrian`行人检测模型请参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/tiny_pose#%E8%A1%8C%E4%BA%BA%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B) + +- **主体检测:** `PicoDet-L-Mainbody`主体检测模型请参考[主体检测文档](./legacy_model/application/mainbody_detection/README.md) + +## FAQ + +
    +显存爆炸(Out of memory error) + +请减小配置文件中`TrainReader`的`batch_size`。 + +
    + +
    +如何迁移学习 + +请重新设置配置文件中的`pretrain_weights`字段,比如利用COCO上训好的模型在自己的数据上继续训练: +```yaml +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams +``` + +
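+
+也可以不修改配置文件,直接在训练命令行用 `-o` 覆盖该字段(示意命令;数据集相关配置仍需指向你自己的数据):
+
+```shell
+python tools/train.py -c configs/picodet/picodet_l_640_coco_lcnet.yml \
+          -o pretrain_weights=https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams \
+          --eval
+```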
    + +
+`transpose`算子在某些硬件上耗时严重
+
+请使用`PicoDet-LCNet`模型,其`transpose`算子较少。
+
    + + +
+如何计算模型参数量?
+
+可以将以下代码插入:[trainer.py](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/ppdet/engine/trainer.py#L141) 来计算参数量。
+
+```python
+params = sum([
+    p.numel() for n, p in self.model.named_parameters()
+    if all([x not in n for x in ['_mean', '_variance']])
+])  # exclude BatchNorm running statistics
+print('params: ', params)
+```
+
+
+## 引用PP-PicoDet
+如果需要在你的研究中使用PP-PicoDet,请通过以下方式引用我们的技术报告:
+```
+@misc{yu2021pppicodet,
+      title={PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices},
+      author={Guanghua Yu and Qinyao Chang and Wenyu Lv and Chang Xu and Cheng Cui and Wei Ji and Qingqing Dang and Kaipeng Deng and Guanzhong Wang and Yuning Du and Baohua Lai and Qiwen Liu and Xiaoguang Hu and Dianhai Yu and Yanjun Ma},
+      year={2021},
+      eprint={2111.00902},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+
+```
diff --git a/PaddleDetection-release-2.6/configs/picodet/README_en.md b/PaddleDetection-release-2.6/configs/picodet/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..e5c63b7d1b44f0772304f19d00f3581a36f5600f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/README_en.md
@@ -0,0 +1,342 @@
+English | [简体中文](README.md)
+
+# PP-PicoDet
+
+![](../../docs/images/picedet_demo.jpeg)
+
+## News
+
+- Released a new series of PP-PicoDet models: **(2022.03.20)**
+  - (1) Adopted the TAL/ETA head and an optimized PAN, which greatly improves accuracy;
+  - (2) Optimized CPU inference speed, and greatly improved training speed;
+  - (3) The exported model includes post-processing, so inference directly outputs the final result without secondary development, lowering the migration cost.
+
+### Legacy Model
+
+- Please refer to: [PicoDet 2021.10](./legacy_model/)
+
+## Introduction
+
+We developed a series of lightweight models, named `PP-PicoDet`. Because of their excellent performance, they are well suited for deployment on mobile devices or CPUs. For more details, please refer to our [report on arXiv](https://arxiv.org/abs/2111.00902).
+
+- 🌟 Higher mAP: the **first** object detector that surpasses mAP(0.5:0.95) **30+** within 1M parameters when the input size is 416.
+- 🚀 Lower latency: 150 FPS on mobile ARM CPU.
+- 😊 Deploy friendly: supports PaddleLite/MNN/NCNN/OpenVINO and provides C++/Python/Android implementations.
+- 😍 Advanced algorithm: uses advanced algorithms and innovations such as ESNet, CSP-PAN, and SimOTA with VFL.
+
+
    + +
+
+## Benchmark
+
+| Model | Input size | mAPval 0.5:0.95 | mAPval 0.5 | Params (M) | FLOPS (G) | Latency[CPU](#latency) (ms) | Latency[Lite](#latency) (ms) | Weight | Log | Config | Inference Model(w/ postprocess) | Inference Model(w/o postprocess) |
+| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: | :----------------: | :----------------: | :----------------: | :----------------: | :----------------: |
+| PicoDet-XS | 320*320 | 23.5 | 36.1 | 0.70 | 0.67 | 3.9ms | 7.81ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_xs_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-XS | 416*416 | 26.2 | 39.3 | 0.70 | 1.13 | 6.1ms | 12.38ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_xs_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-S | 320*320 | 29.1 | 43.4 | 1.18 | 0.97 | 4.8ms | 9.56ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-S | 416*416 | 32.5 | 47.6 | 1.18 | 1.65 | 6.6ms | 15.20ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-M | 320*320 | 34.4 | 50.0 | 3.46 | 2.57 | 8.2ms | 17.68ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_m_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-M | 416*416 | 37.5 | 53.4 | 3.46 | 4.34 | 12.7ms | 28.39ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_m_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-L | 320*320 | 36.1 | 52.0 | 5.80 | 4.20 | 11.5ms | 25.21ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet_non_postprocess.tar) |
+| PicoDet-L | 416*416 | 39.4 | 55.7 | 5.80 | 7.10 | 20.7ms | 42.23ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet_non_postprocess.tar) |
+| PicoDet-L | 640*640 | 42.6 | 59.2 | 5.80 | 16.81 | 62.5ms | 108.1ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_640_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_640_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet_non_postprocess.tar) |
+
+Table Notes:
+
+- Latency: all models are tested on an `Intel Core i7-10750H` CPU (MKLDNN, 12 threads) and on a `Qualcomm Snapdragon 865 (4xA77+4xA55)` (4 threads, ARMv8, FP16). In the table above, CPU latency is measured with Paddle-Inference, and mobile latency (`Lite`) is measured with [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite).
+- PicoDet is trained on the COCO train2017 dataset and evaluated on COCO val2017. Training uses 4 GPUs, and all checkpoints are trained with the default settings and hyperparameters.
+- Benchmark test: when benchmarking speed, post-processing is not included in the exported model; set `-o export.benchmark=True` or manually modify [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/runtime.yml#L12).
+
+
+#### Benchmark of Other Models
+
+| Model | Input size | mAPval 0.5:0.95 | mAPval 0.5 | Params (M) | FLOPS (G) | Latency[NCNN](#latency) (ms) |
+| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: |
+| YOLOv3-Tiny | 416*416 | 16.6 | 33.1 | 8.86 | 5.62 | 25.42 |
+| YOLOv4-Tiny | 416*416 | 21.7 | 40.2 | 6.06 | 6.96 | 23.69 |
+| PP-YOLO-Tiny | 320*320 | 20.6 | - | 1.08 | 0.58 | 6.75 |
+| PP-YOLO-Tiny | 416*416 | 22.7 | - | 1.08 | 1.02 | 10.48 |
+| Nanodet-M | 320*320 | 20.6 | - | 0.95 | 0.72 | 8.71 |
+| Nanodet-M | 416*416 | 23.5 | - | 0.95 | 1.2 | 13.35 |
+| Nanodet-M 1.5x | 416*416 | 26.8 | - | 2.08 | 2.42 | 15.83 |
+| YOLOX-Nano | 416*416 | 25.8 | - | 0.91 | 1.08 | 19.23 |
+| YOLOX-Tiny | 416*416 | 32.8 | - | 5.06 | 6.45 | 32.77 |
+| YOLOv5n | 640*640 | 28.4 | 46.0 | 1.9 | 4.5 | 40.35 |
+| YOLOv5s | 640*640 | 37.2 | 56.0 | 7.2 | 16.5 | 78.05 |
+
+- Mobile latency is tested with this code: [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark).
+
+## Quick Start
+
    +Requirements: + +- PaddlePaddle >= 2.2.2 + +
    + +
    +Installation + +- [Installation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/INSTALL.md) +- [Prepare dataset](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/data/PrepareDataSet_en.md) + +
    + +
+Training and Evaluation
+
+- Training on a single GPU:
+
+```shell
+# training on single-GPU
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval
+```
+If the GPU runs out of memory during training, reduce `batch_size` in `TrainReader` and reduce `base_lr` in `LearningRate` proportionally. Note that the published configs are all trained with 4 GPUs; if you train on a single GPU, `base_lr` needs to be reduced by a factor of 4 (see the sketch after this section).
+
+- Training on multiple GPUs:
+
+```shell
+# training on multi-GPU
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval
+```
+
+- Evaluation:
+
+```shell
+python tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
+              -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+```
+
+- Inference:
+
+```shell
+python tools/infer.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
+              -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+```
+
+For more details, please refer to the [Quick start guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/GETTING_STARTED.md).
+
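+
+As a concrete instance of the single-GPU learning-rate note above, the sketch below adjusts the `LearningRate` block, assuming you start from the 4-GPU default `base_lr: 0.32` used by the 300-epoch PicoDet optimizer config:
+
+```yaml
+LearningRate:
+  base_lr: 0.08        # 0.32 / 4 when going from 4 GPUs to 1
+  schedulers:
+  - name: CosineDecay
+    max_epochs: 300
+  - name: LinearWarmup
+    start_factor: 0.1
+    steps: 300
+```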
    + + +## Deployment + +### Export and Convert Model + +
+1. Export model
+
+```shell
+cd PaddleDetection
+python tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
+              -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams \
+              --output_dir=output_inference
+```
+
+- If no post-processing is required, specify `-o export.benchmark=True` (if `-o` already appears in the command, append the option to it instead of repeating `-o`; see the example below) or manually modify the corresponding fields in [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/runtime.yml).
+- If no NMS is required, specify `-o export.nms=False` or manually modify the corresponding fields in [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/runtime.yml). Many ONNX deployment scenarios only support a single input and fixed-shape outputs, so it is recommended not to export NMS when exporting to ONNX.
+
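+
+For example, exporting a benchmark model (no post-processing) with pretrained weights combines both options under a single `-o`:
+
+```shell
+python tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
+              -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams \
+                 export.benchmark=True \
+              --output_dir=output_inference
+```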
    + +
    +2. Convert to PaddleLite (click to expand) + +- Install Paddlelite>=2.10: + +```shell +pip install paddlelite +``` + +- Convert model: + +```shell +# FP32 +paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp32 +# FP16 +paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp16 --enable_fp16=true +``` + +
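+
+- Optionally, before converting you can list the ops the model uses and check backend support (a sketch; assumes your Paddle Lite version provides this `paddle_lite_opt` flag):
+
+```shell
+paddle_lite_opt --print_model_ops=true --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm
+```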
    + +
+3. Convert to ONNX (click to expand)
+
+- Install [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) >= 0.7 and ONNX > 1.10.1; for details, please refer to the [Tutorial of Export ONNX Model](../../deploy/EXPORT_ONNX_MODEL.md):
+
+```shell
+pip install onnx
+pip install paddle2onnx==0.9.2
+```
+
+- Convert model:
+
+```shell
+paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \
+            --model_filename model.pdmodel  \
+            --params_filename model.pdiparams \
+            --opset_version 11 \
+            --save_file picodet_s_320_coco.onnx
+```
+
+- Simplify the ONNX model with `onnx-simplifier`:
+
+  - Install onnxsim >= 0.4.1:
+  ```shell
+  pip install onnxsim
+  ```
+  - Simplify the ONNX model:
+  ```shell
+  onnxsim picodet_s_320_coco.onnx picodet_s_processed.onnx
+  ```
+
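+
+- Optionally, verify the simplified model with the `onnx` package installed above; a minimal sketch:
+
+```python
+# Structural sanity check of the simplified model
+import onnx
+
+model = onnx.load("picodet_s_processed.onnx")
+onnx.checker.check_model(model)  # raises if the graph is invalid
+print("opsets:", [(o.domain or "ai.onnx", o.version) for o in model.opset_import])
+```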
+
+- Deploy models
+
+| Model | Input size | ONNX(w/ postprocess) | ONNX(w/o postprocess) | Paddle Lite(fp32) | Paddle Lite(fp16) |
+| :-------- | :--------: | :---------------------: | :---------------------: | :----------------: | :----------------: |
+| PicoDet-XS | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet_fp16.tar) |
+| PicoDet-XS | 416*416 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet_fp16.tar) |
+| PicoDet-S | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet_fp16.tar) |
+| PicoDet-S | 416*416 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_fp16.tar) |
+| PicoDet-M | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet_fp16.tar) |
+| PicoDet-M | 416*416 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet_fp16.tar) |
+| PicoDet-L | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet_fp16.tar) |
+| PicoDet-L | 416*416 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet_fp16.tar) |
+| PicoDet-L | 640*640 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet_fp16.tar) |
+
+
+### Deploy
+
+| Infer Engine | Python | C++ | Predict With Postprocess |
+| :-------- | :--------: | :---------------------: | :----------------: |
+| OpenVINO | [Python](../../deploy/third_engine/demo_openvino/python) | [C++](../../deploy/third_engine/demo_openvino)(postprocess coming soon) | ✔︎ |
+| Paddle Lite | - | [C++](../../deploy/lite) | ✔︎ |
+| Android Demo | - | [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) | ✔︎ |
+| PaddleInference | [Python](../../deploy/python) | [C++](../../deploy/cpp) | ✔︎ |
+| ONNXRuntime | [Python](../../deploy/third_engine/demo_onnxruntime) | Coming soon | ✔︎ |
+| NCNN | Coming soon | [C++](../../deploy/third_engine/demo_ncnn) | ✘ |
+| MNN | Coming soon | [C++](../../deploy/third_engine/demo_mnn) | ✘ |
+
+
+Android demo visualization:
    + +
    + + +## Quantization + +
    +Requirements: + +- PaddlePaddle >= 2.2.2 +- PaddleSlim >= 2.2.2 + +**Install:** + +```shell +pip install paddleslim==2.2.2 +``` + +
    + +
    +Quant aware + +Configure the quant config and start training: + +```shell +python tools/train.py -c configs/picodet/picodet_s_416_coco_lcnet.yml \ + --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml --eval +``` + +- More detail can refer to [slim document](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim) + +
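+
+For orientation, a slim quant config layers a `QAT` section on top of the trained weights. The sketch below is illustrative only; its field values are assumptions based on typical PaddleDetection QAT configs, and the linked `picodet_s_416_lcnet_quant.yml` is authoritative:
+
+```yaml
+pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams
+slim: QAT
+QAT:
+  quant_config: {
+    'activation_preprocess_type': 'PACT',
+    'weight_quantize_type': 'channel_wise_abs_max',
+    'activation_quantize_type': 'moving_average_abs_max',
+    'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8',
+    'quantizable_layer_type': ['Conv2D', 'DepthwiseConv2D']}
+  print_model: True
+```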
+
+- Quant Aware Model ZOO:
+
+| Quant Model | Input size | mAPval 0.5:0.95 | Config | Slim Config | Weight | Inference Model(w/ postprocess) | Inference Model(w/o postprocess) | Paddle Lite INT8(w/ postprocess) | Paddle Lite INT8(w/o postprocess) |
+| :-------- | :--------: | :--------------------: | :-------: | :-------: | :----------------: | :----------------: | :----------------: | :----------------: | :----------------: |
+| PicoDet-S | 416*416 | 31.5 | [config](./picodet_s_416_coco_lcnet.yml) | [slim config](../slim/quant/picodet_s_416_lcnet_quant.yml) | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet_quant.pdparams) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant_non_postprocess.tar) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant.nb) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant_non_postprocess.nb) |
+
+## Unstructured Pruning
+
+Tutorial:
+
+Please refer to this [documentation](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/legacy_model/pruner/README.md) for details such as requirements, training and deployment.
+
+
+## Application
+
+- **Pedestrian detection:** for the `PicoDet-S-Pedestrian` model zoo, please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/tiny_pose#%E8%A1%8C%E4%BA%BA%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B)
+
+- **Mainbody detection:** for the `PicoDet-L-Mainbody` model zoo, please refer to [mainbody detection](./legacy_model/application/mainbody_detection/README.md)
+
+## FAQ
+
    +Out of memory error. + +Please reduce the `batch_size` of `TrainReader` in config. + +
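+
+For instance, halving the default batch size of the 320 reader config (scale `base_lr` down proportionally, as noted in the training section):
+
+```yaml
+TrainReader:
+  batch_size: 32   # default is 64 in picodet_320_reader.yml
+```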
    + +
+How to conduct transfer learning?
+
+Reset `pretrain_weights` in the config to a model trained on COCO, for example:
+```yaml
+pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams
+```
+
    + +
+The `transpose` operator is time-consuming on some hardware.
+
+Please use the `PicoDet-LCNet` model, which has fewer `transpose` operators.
+
    + + +
+How to count model parameters?
+
+You can insert the code below at [this point in trainer.py](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/ppdet/engine/trainer.py#L141) to count learnable parameters.
+
+```python
+params = sum([
+    p.numel() for n, p in self.model.named_parameters()
+    if all([x not in n for x in ['_mean', '_variance']])
+])  # exclude BatchNorm running statistics
+print('params: ', params)
+```
+
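+
+An optional extension (a sketch using the same `named_parameters()` traversal and BatchNorm exclusion as above) prints a per-block breakdown alongside the total:
+
+```python
+from collections import defaultdict
+
+by_block = defaultdict(int)
+for n, p in self.model.named_parameters():
+    if any(x in n for x in ['_mean', '_variance']):
+        continue  # skip BatchNorm running statistics
+    by_block[n.split('.')[0]] += int(p.numel())
+for block, count in sorted(by_block.items()):
+    print(f'{block}: {count}')
+print('total params:', sum(by_block.values()))
+```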
    + +## Cite PP-PicoDet +If you use PicoDet in your research, please cite our work by using the following BibTeX entry: +``` +@misc{yu2021pppicodet, + title={PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices}, + author={Guanghua Yu and Qinyao Chang and Wenyu Lv and Chang Xu and Cheng Cui and Wei Ji and Qingqing Dang and Kaipeng Deng and Guanzhong Wang and Yuning Du and Baohua Lai and Qiwen Liu and Xiaoguang Hu and Dianhai Yu and Yanjun Ma}, + year={2021}, + eprint={2111.00902}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} + +``` diff --git a/PaddleDetection-release-2.6/configs/picodet/_base_/optimizer_300e.yml b/PaddleDetection-release-2.6/configs/picodet/_base_/optimizer_300e.yml new file mode 100644 index 0000000000000000000000000000000000000000..113707a03f6fd63dc075d0426c1a10b15d998140 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/_base_/optimizer_300e.yml @@ -0,0 +1,18 @@ +epoch: 300 + +LearningRate: + base_lr: 0.32 + schedulers: + - name: CosineDecay + max_epochs: 300 + - name: LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_320_reader.yml b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_320_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..7d6500679dba0f06c6238aa8bed4f2fd0ad8bd5b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_320_reader.yml @@ -0,0 +1,42 @@ +worker_num: 6 +eval_height: &eval_height 320 +eval_width: &eval_width 320 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [256, 288, 320, 352, 384], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 64 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_416_reader.yml b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_416_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..ee4ae98865f7eb58994c0a79964d24e41c697373 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_416_reader.yml @@ -0,0 +1,42 @@ +worker_num: 6 +eval_height: &eval_height 416 +eval_width: &eval_width 416 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [352, 384, 416, 448, 480], random_size: True, random_interp: True, keep_ratio: False} + - 
NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 64 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_640_reader.yml b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_640_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..5502026af8b1d0762405db17e655b2b6628dea04 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_640_reader.yml @@ -0,0 +1,42 @@ +worker_num: 6 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 32 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_v2.yml b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_v2.yml new file mode 100644 index 0000000000000000000000000000000000000000..24e92b95cb32e3fd26e819bbe49795c6121cb2b2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/_base_/picodet_v2.yml @@ -0,0 +1,61 @@ +architecture: PicoDet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/LCNet_x1_5_pretrained.pdparams + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 1.5 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 128 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 128 + feat_out: 128 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 128 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + 
use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml b/PaddleDetection-release-2.6/configs/picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..bb3d2e9bc923f0ee41c12fbbb7d1a7b91b97d339 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml @@ -0,0 +1,161 @@ +use_gpu: true +use_xpu: false +log_iter: 20 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +metric: COCO +num_classes: 1 + +architecture: PicoDet +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams +weights: output/picodet_s_192_lcnet_pedestrian/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 0.75 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 96 + fpn_stride: [8, 16, 32, 64] + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 4 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + +LearningRate: + base_lr: 0.32 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +worker_num: 6 +eval_height: &eval_height 192 +eval_width: &eval_width 192 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [128, 160, 192, 224, 256], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 64 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + 
batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml b/PaddleDetection-release-2.6/configs/picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..91402ba5e6cf8edb587566260c1bb7a202d3be61 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml @@ -0,0 +1,160 @@ +use_gpu: true +use_xpu: false +log_iter: 20 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +metric: COCO +num_classes: 1 + +architecture: PicoDet +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams +weights: output/picodet_s_320_lcnet_pedestrian/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 0.75 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 96 + fpn_stride: [8, 16, 32, 64] + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + +LearningRate: + base_lr: 0.32 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +worker_num: 6 +eval_height: &eval_height 320 +eval_width: &eval_width 320 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [256, 288, 320, 352, 384], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 64 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/README.md b/PaddleDetection-release-2.6/configs/picodet/legacy_model/README.md new file mode 100644 index 0000000000000000000000000000000000000000..19236e28258d7b4036e380b874fc5f7943acb4cb --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/picodet/legacy_model/README.md @@ -0,0 +1,60 @@
+# PP-PicoDet Legacy Model-ZOO (2021.10)
+
+| Model | Input size | mAPval 0.5:0.95 | mAPval 0.5 | Params (M) | FLOPS (G) | Latency[NCNN](#latency) (ms) | Latency[Lite](#latency) (ms) | Download | Log | Config |
+| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: | :----------------: | :----------------: | :--------------------------------------- |
+| PicoDet-S | 320*320 | 27.1 | 41.4 | 0.99 | 0.73 | 8.13 | **6.65** | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_320_coco.yml) |
+| PicoDet-S | 416*416 | 30.7 | 45.8 | 0.99 | 1.24 | 12.37 | **9.82** | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_416_coco.yml) |
+| PicoDet-M | 320*320 | 30.9 | 45.7 | 2.15 | 1.48 | 11.27 | **9.61** | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_m_320_coco.yml) |
+| PicoDet-M | 416*416 | 34.8 | 50.5 | 2.15 | 2.50 | 17.39 | **15.88** | [model](https://paddledet.bj.bcebos.com/models/picodet_m_416_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_416_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_m_416_coco.yml) |
+| PicoDet-L | 320*320 | 32.9 | 48.2 | 3.30 | 2.23 | 15.26 | **13.42** | [model](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_320_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_320_coco.yml) |
+| PicoDet-L | 416*416 | 36.6 | 52.5 | 3.30 | 3.76 | 23.36 | **21.85** | [model](https://paddledet.bj.bcebos.com/models/picodet_l_416_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_416_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_416_coco.yml) |
+| PicoDet-L | 640*640 | 40.9 | 57.6 | 3.30 | 8.91 | 54.11 | **50.55** | [model](https://paddledet.bj.bcebos.com/models/picodet_l_640_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_640_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_l_640_coco.yml) |
+
+#### More Configs
+
+| Model | Input size | mAPval 0.5:0.95 | mAPval 0.5 | Params (M) | FLOPS (G) | Latency[NCNN](#latency) (ms) | Latency[Lite](#latency) (ms) | Download | Log | Config |
+| :--------------------------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: | :----------------: | :----------------: | :--------------------------------------- |
+| PicoDet-Shufflenetv2 1x | 416*416 | 30.0 | 44.6 | 1.17 | 1.53 | 15.06 | **10.63** | [model](https://paddledet.bj.bcebos.com/models/picodet_shufflenetv2_1x_416_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_shufflenetv2_1x_416_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/more_config/picodet_shufflenetv2_1x_416_coco.yml) |
+| PicoDet-MobileNetv3-large 1x | 416*416 | 35.6 | 52.0 | 3.55 | 2.80 | 20.71 | **17.88** | [model](https://paddledet.bj.bcebos.com/models/picodet_mobilenetv3_large_1x_416_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_mobilenetv3_large_1x_416_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/more_config/picodet_mobilenetv3_large_1x_416_coco.yml) |
+| PicoDet-LCNet 1.5x | 416*416 | 36.3 | 52.2 | 3.10 | 3.85 | 21.29 | **20.8** | [model](https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_lcnet_1_5x_416_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/more_config/picodet_lcnet_1_5x_416_coco.yml) |
+| PicoDet-LCNet 1.5x | 640*640 | 40.6 | 57.4 | 3.10 | - | - | - | [model](https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_640_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_lcnet_1_5x_640_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/more_config/picodet_lcnet_1_5x_640_coco.yml) |
+| PicoDet-R18 | 640*640 | 40.7 | 57.2 | 11.10 | - | - | - | [model](https://paddledet.bj.bcebos.com/models/picodet_r18_640_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_r18_640_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/more_config/picodet_r18_640_coco.yml) |
+
+Table Notes:
+
+- Latency: all models are tested on a `Qualcomm Snapdragon 865 (4xA77+4xA55)` (4 threads, ARMv8, FP16). In the table above, latency is measured on [NCNN](https://github.com/Tencent/ncnn) and on `Lite`->[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite), using this benchmarking code: [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark).
+- PicoDet is trained on the COCO train2017 dataset and evaluated on COCO val2017.
+- PicoDet used 4 or 8 GPUs for training, and all checkpoints are trained with the default settings and hyperparameters.
+
    + +- Deploy models + +| Model | Input size | ONNX | Paddle Lite(fp32) | Paddle Lite(fp16) | +| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | +| PicoDet-S | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_fp16.tar) | +| PicoDet-S | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_fp16.tar) | +| PicoDet-M | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_fp16.tar) | +| PicoDet-M | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_fp16.tar) | +| PicoDet-L | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_fp16.tar) | +| PicoDet-L | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_fp16.tar) | +| PicoDet-L | 640*640 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_fp16.tar) | +| PicoDet-Shufflenetv2 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_shufflenetv2_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x_fp16.tar) | +| PicoDet-MobileNetv3-large 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_mobilenetv3_large_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x_fp16.tar) | +| PicoDet-LCNet 1.5x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_lcnet_1_5x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x_fp16.tar) | + + + +## Cite PP-PicoDet +``` +@misc{yu2021pppicodet, + title={PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices}, + author={Guanghua Yu and Qinyao Chang and Wenyu Lv and Chang Xu and Cheng Cui and Wei Ji and Qingqing Dang and Kaipeng Deng and Guanzhong Wang and Yuning Du and Baohua Lai and Qiwen Liu and Xiaoguang Hu and Dianhai Yu and Yanjun Ma}, + year={2021}, + eprint={2111.00902}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} + +``` diff 
--git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/optimizer_100e.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/optimizer_100e.yml new file mode 100644 index 0000000000000000000000000000000000000000..c866b39985cbd3dfd80220798d90e6995299f4f2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/optimizer_100e.yml @@ -0,0 +1,18 @@ +epoch: 100 + +LearningRate: + base_lr: 0.4 + schedulers: + - name: CosineDecay + max_epochs: 100 + - name: LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/optimizer_300e.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/optimizer_300e.yml new file mode 100644 index 0000000000000000000000000000000000000000..fa4c9094a23d39fcce343a529cacac9beb74a675 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/optimizer_300e.yml @@ -0,0 +1,18 @@ +epoch: 300 + +LearningRate: + base_lr: 0.4 + schedulers: + - name: CosineDecay + max_epochs: 300 + - name: LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_320_reader.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_320_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..4d3f0cbd8648bf2d8ef44cdbf1d2422865a22c94 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_320_reader.yml @@ -0,0 +1,42 @@ +worker_num: 6 +eval_height: &eval_height 320 +eval_width: &eval_width 320 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [256, 288, 320, 352, 384], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 128 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_416_reader.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_416_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..59433c64534163a454ad7e5a07b71d011119913c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_416_reader.yml @@ -0,0 +1,42 @@ +worker_num: 6 +eval_height: &eval_height 416 +eval_width: &eval_width 416 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - 
Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [352, 384, 416, 448, 480], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 80 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_640_reader.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_640_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..60904fb6ba77c858a50f1e743e637961c38ccd1f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_640_reader.yml @@ -0,0 +1,42 @@ +worker_num: 6 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 56 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_esnet.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_esnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..aa099fca12122282641dc456eeb7f232338d447f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/_base_/picodet_esnet.yml @@ -0,0 +1,55 @@ +architecture: PicoDet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x1_0_pretrained.pdparams + +PicoDet: + backbone: ESNet + neck: CSPPAN + head: PicoHead + +ESNet: + scale: 1.0 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 1.0, 0.625, 0.5, 0.75, 0.625, 0.625, 0.5, 0.625, 1.0, 0.625, 0.75] + +CSPPAN: + out_channels: 128 + use_depthwise: True + num_csp_blocks: 1 + num_features: 4 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 128 + 
feat_out: 128
+    num_convs: 4
+    num_fpn_stride: 4
+    norm_type: bn
+    share_cls_reg: True
+  fpn_stride: [8, 16, 32, 64]
+  feat_in_chan: 128
+  prior_prob: 0.01
+  reg_max: 7
+  cell_offset: 0.5
+  loss_class:
+    name: VarifocalLoss
+    use_sigmoid: True
+    iou_weighted: True
+    loss_weight: 1.0
+  loss_dfl:
+    name: DistributionFocalLoss
+    loss_weight: 0.25
+  loss_bbox:
+    name: GIoULoss
+    loss_weight: 2.0
+  assigner:
+    name: SimOTAAssigner
+    candidate_topk: 10
+    iou_weight: 6
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.025
+    nms_threshold: 0.6
diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/README.md b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2d15c97b213c4ada85f76b6a7f86cbf181398f00
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/README.md
@@ -0,0 +1,56 @@
+# More Applications
+
+
+## 1. Layout Analysis
+
+Layout analysis divides a document image into regions and locates the key areas it contains, such as text, titles, tables, and figures. The figure below shows an example of the task.
+
+
    + +
+
+### 1.1 Dataset
+
+The English document layout analysis model is trained on [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet), a dataset built from English academic papers. It is split into a training set (333,703 annotated images), a validation set (11,245 annotated images), and a test set (11,405 images), and covers 5 categories: Table, Figure, Title, Text, and List. For more options, see the list of [layout analysis datasets](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README.md#32). A quick sanity check of the annotation files is sketched below.
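+Before training, it can be useful to inspect the COCO-format annotation files directly. The snippet below is a minimal sketch; it assumes the annotations are placed under `dataset/publaynet/` as in [picodet_lcnet_x1_0_layout.yml](./picodet_lcnet_x1_0_layout.yml), and the exact path is otherwise hypothetical.
+
+```python
+import json
+from collections import Counter
+
+# Assumed path: PubLayNet annotations laid out as in the config in this directory.
+with open("dataset/publaynet/train.json") as f:
+    coco = json.load(f)
+
+print(len(coco["images"]), "images,", len(coco["annotations"]), "boxes")
+
+# Boxes per category; expect exactly the 5 classes listed above.
+id2name = {c["id"]: c["name"] for c in coco["categories"]}
+print(Counter(id2name[a["category_id"]] for a in coco["annotations"]))
+```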
+
+### 1.2 Model Zoo
+
+The models below are trained with PicoDet on PubLayNet; the second row additionally uses FGD distillation. Pretrained models:
+
+| Model | Input size | mAP<sup>val</sup> 0.5 | Trained model | Inference model | Config |
+| :-------- | :--------: | :----------------: | :---------------: | :---------------: | :---------------: |
+| PicoDet-LCNet_x1_0 | 800*608 | 93.5% | [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams) | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar) | [config](./picodet_lcnet_x1_0_layout.yml) |
+| PicoDet-LCNet_x1_0 + FGD | 800*608 | 94.0% | [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) | [teacher config](./picodet_lcnet_x2_5_layout.yml) / [student config](./picodet_lcnet_x1_0_layout.yml) |
+
+See the [introduction to FGD distillation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/slim/distill/README.md).
+
+### 1.3 Model Inference
+
+For the full layout analysis workflow (data preparation, model training, evaluation, etc.), see [layout analysis](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README.md); only model inference is shown here. First download an inference model from the model zoo above.
+
+```
+mkdir inference_model
+cd inference_model
+# Download and untar the PubLayNet inference model
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
+cd ..
+```
+
+To run inference for the layout analysis task, execute:
+
+```bash
+python3 deploy/python/infer.py \
+    --model_dir=inference_model/picodet_lcnet_x1_0_fgd_layout_infer/ \
+    --image_file=docs/images/layout.jpg \
+    --device=CPU
+```
+
+The visualized layout results are shown in the figure below:
+
+
    + +
    + +## 2 Reference + +[1] Zhong X, Tang J, Yepes A J. Publaynet: largest dataset ever for document layout analysis[C]//2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019: 1015-1022. diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/images/layout_demo.png b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/images/layout_demo.png new file mode 100644 index 0000000000000000000000000000000000000000..da9640e245e34659771353e328bf97da129bd622 Binary files /dev/null and b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/images/layout_demo.png differ diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/images/layout_res.jpg b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/images/layout_res.jpg new file mode 100644 index 0000000000000000000000000000000000000000..93b3a8bef3bfc9f5c80a9505239af05d526b45a7 Binary files /dev/null and b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/images/layout_res.jpg differ diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml new file mode 100644 index 0000000000000000000000000000000000000000..46e7e235f704e75f6b73b5497f694ba726a16143 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml @@ -0,0 +1,88 @@ +_BASE_: [ + '../../../../runtime.yml', + '../../_base_/picodet_esnet.yml', + '../../_base_/optimizer_100e.yml', + '../../_base_/picodet_640_reader.yml', +] + +weights: output/picodet_lcnet_x1_0_layout/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 10 +snapshot_epoch: 1 +epoch: 100 + +PicoDet: + backbone: LCNet + neck: CSPPAN + head: PicoHead + +LCNet: + scale: 1.0 + feature_maps: [3, 4, 5] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: train + anno_path: train.json + dataset_dir: ./dataset/publaynet/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val + anno_path: val.json + dataset_dir: ./dataset/publaynet/ + +TestDataset: + !ImageFolder + anno_path: ./dataset/publaynet/val.json + + +worker_num: 8 +eval_height: &eval_height 800 +eval_width: &eval_width 608 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [[768, 576], [800, 608], [832, 640]], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 24 + shuffle: true + drop_last: true + collate_batch: false + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 608], keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, 800, 608] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 608], keep_ratio: 
False}
+    - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+    - Permute: {}
+  batch_transforms:
+    - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: false
diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml
new file mode 100644
index 0000000000000000000000000000000000000000..cb7f2157dc3fab588c25c569b3d504a7cb58a9ed
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml
@@ -0,0 +1,32 @@
+_BASE_: [
+  '../../_base_/picodet_esnet.yml',
+]
+
+weights: output/picodet_lcnet_x2_5_layout/model_final
+find_unused_parameters: True
+
+PicoDet:
+  backbone: LCNet
+  neck: CSPPAN
+  head: PicoHead
+
+LCNet:
+  scale: 2.5
+  feature_maps: [3, 4, 5]
+
+CSPPAN:
+  spatial_scales: [0.125, 0.0625, 0.03125]
+
+slim: Distill
+slim_method: FGD
+distill_loss: FGDFeatureLoss
+distill_loss_name: ['neck_f_3', 'neck_f_2', 'neck_f_1', 'neck_f_0']
+
+FGDFeatureLoss:
+  student_channels: 128
+  teacher_channels: 128
+  temp: 0.5
+  alpha_fgd: 0.001
+  beta_fgd: 0.0005
+  gamma_fgd: 0.0005
+  lambda_fgd: 0.000005
diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/mainbody_detection/README.md b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/mainbody_detection/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0408587e62a81dbd97ae9128f59497287da26f5f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/mainbody_detection/README.md
@@ -0,0 +1,30 @@
+# More Applications
+
+
+## 1. Mainbody Detection
+
+Mainbody detection is a very widely used detection technique. It locates one or more main subjects in an image and crops the corresponding regions out for recognition, completing the full recognition pipeline. As the step that runs before recognition, it can effectively improve recognition accuracy.
+
+Mainbody detection is used as the first stage of the PP-ShiTu image recognition system in PaddleClas, where the detection model is based on PP-PicoDet. For more on PP-ShiTu, see [PP-ShiTu](https://github.com/PaddlePaddle/PaddleClas).
+
+
+### 1.1 Datasets
+
+The following datasets are the main ones used to train the mainbody detection model for the PP-ShiTu recognition task.
+
+| Dataset | Total images | Images used for mainbody detection | Scenario | Link |
+| :------------: | :-------------: | :-------: | :-------: | :--------: |
+| Objects365 | 1700K | 173k | General | [link](https://www.objects365.org/overview.html) |
+| COCO2017 | 118K | 118k | General | [link](https://cocodataset.org/) |
+| iCartoonFace | 48k | 48k | Cartoon face detection | [link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
+| LogoDet-3k | 155k | 155k | Logo detection | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
+| RPC | 54k | 54k | Product detection | [link](https://rpc-dataset.github.io/) |
+
+For training, all datasets are mixed together. Since this is mainbody detection, every annotated box is relabeled to a single `foreground` category, so the merged dataset contains exactly 1 class (foreground); see [picodet_lcnet_x2_5_640_mainbody.yml](./picodet_lcnet_x2_5_640_mainbody.yml) for the dataset configuration. The relabeling step is sketched below.
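+The relabeling described above is mechanical. Below is a minimal sketch for a single COCO-format file; the input/output paths are hypothetical, and the real training data merges several such files first.
+
+```python
+import json
+
+SRC = "dataset/mainbody/annotations_raw.json"  # hypothetical source file
+DST = "dataset/mainbody/train.json"
+
+with open(SRC) as f:
+    coco = json.load(f)
+
+# Collapse every annotated box onto a single `foreground` category (id 1).
+coco["categories"] = [{"id": 1, "name": "foreground", "supercategory": ""}]
+for ann in coco["annotations"]:
+    ann["category_id"] = 1
+
+with open(DST, "w") as f:
+    json.dump(coco, f)
+```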
+
+### 1.2 Model Zoo
+
+| Model | Input size | mAP<sup>val</sup> 0.5:0.95 | mAP<sup>val</sup> 0.5 | Trained model | Inference model | Log | Config |
+| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :----------------: | :---------------: | :---------------: |
+| PicoDet-LCNet_x2_5 | 640*640 | 41.5 | 62.0 | [trained model](https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody.pdparams) | [inference model](https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody_infer.tar) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_lcnet_x2_5_640_mainbody.log) | [config](./picodet_lcnet_x2_5_640_mainbody.yml) |
diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml
new file mode 100644
index 0000000000000000000000000000000000000000..cc06c1c7a58cd089c698ab6a35175fdbc317540f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml
@@ -0,0 +1,41 @@
+_BASE_: [
+  '../../../../runtime.yml',
+  '../../_base_/picodet_esnet.yml',
+  '../../_base_/optimizer_100e.yml',
+  '../../_base_/picodet_640_reader.yml',
+]
+
+weights: output/picodet_lcnet_x2_5_640_mainbody/model_final
+find_unused_parameters: True
+use_ema: true
+cycle_epoch: 20
+snapshot_epoch: 2
+
+PicoDet:
+  backbone: LCNet
+  neck: CSPPAN
+  head: PicoHead
+
+LCNet:
+  scale: 2.5
+  feature_maps: [3, 4, 5]
+
+metric: COCO
+num_classes: 1
+
+TrainDataset:
+  !COCODataSet
+  image_dir: ./
+  anno_path: train.json
+  dataset_dir: dataset/mainbody/
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+  image_dir: ./
+  anno_path: val.json
+  dataset_dir: dataset/mainbody/
+
+TestDataset:
+  !ImageFolder
+  anno_path: ./dataset/mainbody/val.json
diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml
new file mode 100644
index 0000000000000000000000000000000000000000..47f50425d138d174fce50f5e82e8be06382c4ade
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml
@@ -0,0 +1,148 @@
+use_gpu: true
+log_iter: 20
+save_dir: output
+snapshot_epoch: 1
+print_flops: false
+weights: output/picodet_s_192_pedestrian/model_final
+find_unused_parameters: True
+use_ema: true
+cycle_epoch: 40
+snapshot_epoch: 10
+epoch: 300
+metric: COCO
+num_classes: 1
+# Exporting the model
+export:
+  post_process: False  # Whether post-processing is included in the network when exporting the model.
+  nms: False  # Whether NMS is included in the network when exporting the model.
+  benchmark: False  # Used to test model performance; if set `True`, post-process and NMS will not be exported.
+ +architecture: PicoDet + +PicoDet: + backbone: ESNet + neck: CSPPAN + head: PicoHead + +ESNet: + scale: 0.75 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 0.5, 0.5, 0.625, 0.5, 0.625, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5] + +CSPPAN: + out_channels: 96 + use_depthwise: True + num_csp_blocks: 1 + num_features: 4 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 96 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + loss_class: + name: VarifocalLoss + use_sigmoid: True + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + assigner: + name: SimOTAAssigner + candidate_topk: 10 + iou_weight: 6 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + +LearningRate: + base_lr: 0.4 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json + +worker_num: 8 +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [128, 160, 192, 224, 256], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 128 + shuffle: true + drop_last: true + collate_batch: false + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [192, 192], keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + +TestReader: + inputs_def: + image_shape: [1, 3, 192, 192] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [192, 192], keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + fuse_normalize: true diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..cf78ea61986201aaf58c78e4b8d0f6bdcd361464 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml @@ -0,0 +1,147 @@ +use_gpu: true +log_iter: 20 +save_dir: output +snapshot_epoch: 1 +print_flops: false +weights: output/picodet_s_320_pedestrian/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 +epoch: 300 +metric: COCO +num_classes: 1 +# Exporting the 
model +export: + post_process: False # Whether post-processing is included in the network when export model. + nms: False # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +architecture: PicoDet + +PicoDet: + backbone: ESNet + neck: CSPPAN + head: PicoHead + +ESNet: + scale: 0.75 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 0.5, 0.5, 0.625, 0.5, 0.625, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5] + +CSPPAN: + out_channels: 96 + use_depthwise: True + num_csp_blocks: 1 + num_features: 4 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 96 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + loss_class: + name: VarifocalLoss + use_sigmoid: True + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + assigner: + name: SimOTAAssigner + candidate_topk: 10 + iou_weight: 6 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + +LearningRate: + base_lr: 0.4 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json + +worker_num: 8 +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [256, 288, 320, 352, 384], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 128 + shuffle: true + drop_last: true + collate_batch: false + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [320, 320], keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + +TestReader: + inputs_def: + image_shape: [1, 3, 320, 320] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [320, 320], keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..aaa1a807c70a4a028f2195ae8633e0cce2cd0bde --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml @@ -0,0 +1,23 @@ +_BASE_: [ + '../../../datasets/coco_detection.yml', + 
'../../../runtime.yml', + '../_base_/picodet_esnet.yml', + '../_base_/optimizer_300e.yml', + '../_base_/picodet_416_reader.yml', +] + + +weights: output/picodet_lcnet_1_5x_416_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 + +PicoDet: + backbone: LCNet + neck: CSPPAN + head: PicoHead + +LCNet: + scale: 1.5 + feature_maps: [3, 4, 5] diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_640_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_640_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..7622d749cb267168ccd5d481bbe8bb4fbbc25054 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_640_coco.yml @@ -0,0 +1,49 @@ +_BASE_: [ + '../../../datasets/coco_detection.yml', + '../../../runtime.yml', + '../_base_/picodet_esnet.yml', + '../_base_/optimizer_300e.yml', + '../_base_/picodet_640_reader.yml', +] + + +weights: output/picodet_lcnet_1_5x_640_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 + +PicoDet: + backbone: LCNet + neck: CSPPAN + head: PicoHead + +LCNet: + scale: 1.5 + feature_maps: [3, 4, 5] + +CSPPAN: + out_channels: 160 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 160 + +TrainReader: + batch_size: 24 + +LearningRate: + base_lr: 0.2 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_mobilenetv3_large_1x_416_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_mobilenetv3_large_1x_416_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..375bff97b7677bbe256a8f93b8d10218dc4cc2bf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_mobilenetv3_large_1x_416_coco.yml @@ -0,0 +1,39 @@ +_BASE_: [ + '../../../datasets/coco_detection.yml', + '../../../runtime.yml', + '../_base_/picodet_esnet.yml', + '../_base_/optimizer_300e.yml', + '../_base_/picodet_416_reader.yml', +] + + +weights: output/picodet_mobilenetv3_large_1x_416_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 +epoch: 180 + +PicoDet: + backbone: MobileNetV3 + neck: CSPPAN + head: PicoHead + +MobileNetV3: + model_name: large + scale: 1.0 + with_extra_blocks: false + extra_block_filters: [] + feature_maps: [7, 13, 16] + +TrainReader: + batch_size: 56 + +LearningRate: + base_lr: 0.3 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_r18_640_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_r18_640_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a1f60d4f01cec6ff1629f32538f1f783496a3825 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_r18_640_coco.yml @@ -0,0 +1,39 @@ +_BASE_: [ + '../../../datasets/coco_detection.yml', + '../../../runtime.yml', + '../_base_/picodet_esnet.yml', + '../_base_/optimizer_300e.yml', + '../_base_/picodet_640_reader.yml', +] + + +weights: output/picodet_r18_640_coco/model_final 
+find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 + +PicoDet: + backbone: ResNet + neck: CSPPAN + head: PicoHead + +ResNet: + depth: 18 + variant: d + return_idx: [1, 2, 3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +TrainReader: + batch_size: 56 + +LearningRate: + base_lr: 0.3 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_shufflenetv2_1x_416_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_shufflenetv2_1x_416_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..15f62bcec5cc503c5ab0329dc868e789e87b2fe3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/more_config/picodet_shufflenetv2_1x_416_coco.yml @@ -0,0 +1,37 @@ +_BASE_: [ + '../../../datasets/coco_detection.yml', + '../../../runtime.yml', + '../_base_/picodet_esnet.yml', + '../_base_/optimizer_300e.yml', + '../_base_/picodet_416_reader.yml', +] + +weights: output/picodet_shufflenetv2_1x_416_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 + +PicoDet: + backbone: ShuffleNetV2 + neck: CSPPAN + head: PicoHead + +ShuffleNetV2: + scale: 1.0 + feature_maps: [5, 13, 17] + act: leaky_relu + +CSPPAN: + out_channels: 96 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 96 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_320_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_320_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a41d5823e3bc3872dd90220408fd73ef7bcf8f3d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_320_coco.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x1_25_pretrained.pdparams +weights: output/picodet_l_320_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 +epoch: 250 + +ESNet: + scale: 1.25 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 1.0, 0.625, 0.5, 0.75, 0.625, 0.625, 0.5, 0.625, 1.0, 0.625, 0.75] + +CSPPAN: + out_channels: 160 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 160 + +TrainReader: + batch_size: 56 + +LearningRate: + base_lr: 0.3 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_416_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_416_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..fcee20c3eff8fef97247ca3b4cfb5db95124114b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_416_coco.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_416_reader.yml', +] + +pretrain_weights: 
https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x1_25_pretrained.pdparams +weights: output/picodet_l_416_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 +epoch: 250 + +ESNet: + scale: 1.25 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 1.0, 0.625, 0.5, 0.75, 0.625, 0.625, 0.5, 0.625, 1.0, 0.625, 0.75] + +CSPPAN: + out_channels: 160 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 160 + +TrainReader: + batch_size: 48 + +LearningRate: + base_lr: 0.3 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_640_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_640_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..990db111dfe704f8dda661d09a2e7eb6474aa262 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_l_640_coco.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_640_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x1_25_pretrained.pdparams +weights: output/picodet_l_640_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 +epoch: 250 + +ESNet: + scale: 1.25 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 1.0, 0.625, 0.5, 0.75, 0.625, 0.625, 0.5, 0.625, 1.0, 0.625, 0.75] + +CSPPAN: + out_channels: 160 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 160 + +TrainReader: + batch_size: 32 + +LearningRate: + base_lr: 0.3 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_m_320_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_m_320_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..5b9f1ce7aa8f1b4cf1e9eea42b618c284b92c98f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_m_320_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +weights: output/picodet_m_320_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_m_416_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_m_416_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..8c52f72ead3f523435a091b9ffaade66929e9645 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_m_416_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_416_reader.yml', +] + +weights: output/picodet_m_416_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 diff --git 
a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_320_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_320_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..9945e3db13967ec875e050619cb66cac7827aa3e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_320_coco.yml @@ -0,0 +1,34 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x0_75_pretrained.pdparams +weights: output/picodet_s_320_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 + +ESNet: + scale: 0.75 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 0.5, 0.5, 0.625, 0.5, 0.625, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5] + +CSPPAN: + out_channels: 96 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 96 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_320_voc.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_320_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..0be56616dcede4d60e4a54fb09cc66c45c63ebdf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_320_voc.yml @@ -0,0 +1,37 @@ +_BASE_: [ + '../../datasets/voc.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x0_75_pretrained.pdparams +weights: output/picodet_s_320_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 + +ESNet: + scale: 0.75 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 0.5, 0.5, 0.625, 0.5, 0.625, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5] + +CSPPAN: + out_channels: 96 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 96 + +EvalReader: + collate_batch: false diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_416_coco.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_416_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..3764b6e4f26cc328b5e7a19815f88bbadf6e4013 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/picodet_s_416_coco.yml @@ -0,0 +1,34 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '_base_/picodet_esnet.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_416_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x0_75_pretrained.pdparams +weights: output/picodet_s_416_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 + +ESNet: + scale: 0.75 + feature_maps: [4, 11, 14] + act: hard_swish + channel_ratio: [0.875, 0.5, 0.5, 0.5, 0.625, 0.5, 0.625, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5] + +CSPPAN: + out_channels: 96 + +PicoHead: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + feat_in_chan: 96 diff --git 
a/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/README.md b/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ce51e36862827f16c1aac54531e15128eab79a87
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/README.md
@@ -0,0 +1,135 @@
+# A Tutorial on Applying Unstructured Sparsity to PicoDet
+
+## 1. Introduction
+In model compression, the two common forms of sparsity are structured and unstructured. Structured sparsity prunes convolutions and matrix multiplications along a specific dimension (feature channels, kernels, and so on) and produces a smaller model structure, so existing conv/matmul kernels can be reused and no special inference operators are needed. Unstructured sparsity prunes at the granularity of individual parameters without changing the shapes of the parameter matrices, so it relies more heavily on the inference library's and hardware's ability to accelerate sparse matrix operations. We applied unstructured sparsity to PP-PicoDet (PicoDet below) and obtained a significant inference speed-up on ARM CPUs with a small accuracy loss. This document describes how to train PicoDet with unstructured sparsity; for more background, see [here](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/dygraph/unstructured_pruning).
+
+## 2. Requirements
+```bash
+PaddlePaddle >= 2.1.2
+PaddleSlim develop branch (pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple)
+```
+
+## 3. Data Preparation
+Same as PicoDet.
+
+## 4. Pretrained Model
+For unstructured-sparsity training we require the pretrained model to be an already-converged set of weights, so it must be declared explicitly in the relevant config file.
+
+Config file that declares the pretrained weights: ./configs/picodet/pruner/picodet_m_320_coco_pruner.yml
+Pretrained weight URLs: see the PicoDet docs at ./configs/picodet/README.md
+
+## 5. Customizing the Scope of Sparsification
+For the best inference speed-up, we recommend sparsifying only the 1x1 convolution layers and keeping all other parameters dense. In addition, some layers have a large impact on accuracy (for example, the last few layers of the head and several layers in the se-blocks), and we recommend excluding them as well. Developers can pass in a custom function to conveniently specify which layers should not be sparsified. For example, for the picodet_m_320 model we skip the last 4 convolutions and the convolutions in 6 se-blocks; the custom function is shown below (a quick way to dry-run it follows the listing):
+
+```python
+NORMS_ALL = [ 'BatchNorm', 'GroupNorm', 'LayerNorm', 'SpectralNorm', 'BatchNorm1D',
+    'BatchNorm2D', 'BatchNorm3D', 'InstanceNorm1D', 'InstanceNorm2D',
+    'InstanceNorm3D', 'SyncBatchNorm', 'LocalResponseNorm' ]
+
+def skip_params_self(model):
+    skip_params = set()
+    for _, sub_layer in model.named_sublayers():
+        if type(sub_layer).__name__.split('.')[-1] in NORMS_ALL:
+            skip_params.add(sub_layer.full_name())
+        for param in sub_layer.parameters(include_sublayers=False):
+            cond_is_conv1x1 = len(param.shape) == 4 and param.shape[2] == 1 and param.shape[3] == 1
+            cond_is_head_m = cond_is_conv1x1 and param.shape[0] == 112 and param.shape[1] == 128
+            cond_is_se_block_m = param.name.split('.')[0] in ['conv2d_17', 'conv2d_18', 'conv2d_56', 'conv2d_57', 'conv2d_75', 'conv2d_76']
+            if not cond_is_conv1x1 or cond_is_head_m or cond_is_se_block_m:
+                skip_params.add(param.name)
+    return skip_params
+```
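+Before training, the helper can be dry-run on a constructed model to confirm that only the intended 1x1 convolutions stay prunable. A minimal sketch (`build_picodet_m` is a hypothetical stand-in for however the model is constructed; any paddle.nn.Layer works):
+
+```python
+model = build_picodet_m()  # hypothetical constructor for picodet_m_320
+
+skipped = skip_params_self(model)
+prunable = [p.name for p in model.parameters() if p.name not in skipped]
+print(f"{len(skipped)} entries skipped, {len(prunable)} parameters left prunable")
+```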
+
+## 6. Training
+The core functionality of unstructured sparsity is already embedded in training through API calls, so if you have no finer-grained needs, simply run the command in 6.1 to start training. To help you modify and adapt the code to your own needs, a more detailed walkthrough is given in 6.2.
+
+### 6.1 Direct Use
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3.7 -m paddle.distributed.launch --log_dir=log_test --gpus 0,1,2,3 tools/train.py -c configs/picodet/pruner/picodet_m_320_coco_pruner.yml --slim_config configs/slim/prune/picodet_m_unstructured_prune_75.yml --eval
+```
+
+### 6.2 Details
+- Customizing the scope of sparsification: see Section 5 of this tutorial.
+- How to add the 4 lines of code required for sparse training (the resulting ratio schedule is sketched after the listing):
+
+```python
+# after constructing model and before training
+
+# Pruner Step1: configs
+configs = {
+    'pruning_strategy': 'gmp',
+    'stable_iterations': self.stable_epochs * steps_per_epoch,
+    'pruning_iterations': self.pruning_epochs * steps_per_epoch,
+    'tunning_iterations': self.tunning_epochs * steps_per_epoch,
+    'resume_iteration': 0,
+    'pruning_steps': self.pruning_steps,
+    'initial_ratio': self.initial_ratio,
+}
+
+# Pruner Step2: construct a pruner object
+self.pruner = GMPUnstructuredPruner(
+    model,
+    ratio=self.cfg.ratio,
+    skip_params_func=skip_params_self,  # Only pass in this value when you design your own skip_params function. The following argument (skip_params_type) will then be ignored.
+    skip_params_type=self.cfg.skip_params_type,
+    local_sparsity=True,
+    configs=configs)
+
+# training
+for epoch_id in range(self.start_epoch, self.cfg.epoch):
+    model.train()
+    for step_id, data in enumerate(self.loader):
+        # model forward
+        outputs = model(data)
+        loss = outputs['loss']
+        # model backward
+        loss.backward()
+        self.optimizer.step()
+
+        # Pruner Step3: step during training
+        self.pruner.step()
+
+    # Pruner Step4: save the sparse model
+    self.pruner.update_params()
+    # model-saving API
+```
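+For reference, under the GMP strategy the sparsity ratio stays at zero during the stable phase, climbs from `initial_ratio` to the final `ratio` in `pruning_steps` discrete jumps during the pruning phase, and is then frozen for tuning. The sketch below only illustrates that schedule arithmetic with a linear ramp; it is not PaddleSlim's exact internal formula.
+
+```python
+def gmp_ratio_at(it, stable_iters, pruning_iters, pruning_steps,
+                 initial_ratio, final_ratio):
+    """Illustrative GMP-style schedule: a linear ramp in discrete steps."""
+    if it < stable_iters:                   # dense warm-up, no pruning yet
+        return 0.0
+    if it >= stable_iters + pruning_iters:  # tuning phase, ratio frozen
+        return final_ratio
+    step_len = pruning_iters / pruning_steps
+    step = int((it - stable_iters) / step_len) + 1
+    return initial_ratio + (final_ratio - initial_ratio) * step / pruning_steps
+
+# e.g. a 75% target, halfway through a hypothetical pruning phase:
+print(gmp_ratio_at(50_000, 10_000, 80_000, 100, 0.15, 0.75))  # ~0.456
+```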
+
+## 7. Evaluation and Deployment
+This is essentially the same as in the PicoDet docs, except that when converting to a PaddleLite model you need to add one input flag (sparse_model):
+
+```bash
+paddle_lite_opt --model_dir=inference_model/picodet_m_320_coco --valid_targets=arm --optimize_out=picodet_m_320_coco_fp32_sparse --sparse_model=True
+```
+
+**Note:** sparse inference currently works for PaddleLite FP32 and INT8 models, so do not enable the FP16 switch when running the command above.
+
+## 8. Sparsity Results
+We trained FP32 PicoDet-m models at 75% and 85% sparsity and measured inference latency on a SnapDragon-835 device; the results are in the table below. In particular:
+- For the m model at 75% sparsity, mAP drops by 1.5 while inference speeds up by 34%-58%.
+- The 85%-sparse m model beats the dense s model on single-thread latency, mAP, and model size, while its 4-thread latency is roughly on par.
+
+| Model | Input size | Sparsity | mAP<sup>val</sup> 0.5:0.95 | Size (MB) | Single-thread latency ([Lite](#latency), ms) | Single-thread speed-up | 4-thread latency ([Lite](#latency), ms) | 4-thread speed-up | Download | Log | SlimConfig |
+| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
+| PicoDet-m-1.0 | 320*320 | 0 | 30.9 | 8.9 | 127 | 0 | 43 | 0 | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet/picodet_m_320_coco.yml) |
+| PicoDet-m-1.0 | 320*320 | 75% | 29.4 | 5.6 | **80** | 58% | **32** | 34% | [model](https://paddledet.bj.bcebos.com/models/slim/picodet_m_320__coco_sparse_75.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320__coco_sparse_75.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/slim/prune/picodet_m_unstructured_prune_75.yml) |
+| PicoDet-s-1.0 | 320*320 | 0 | 27.1 | 4.6 | 68 | 0 | 26 | 0 | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet/picodet_s_320_coco.yml) |
+| PicoDet-m-1.0 | 320*320 | 85% | 27.6 | 4.1 | **65** | 96% | **27** | 59% | [model](https://paddledet.bj.bcebos.com/models/slim/picodet_m_320__coco_sparse_85.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320__coco_sparse_85.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/slim/prune/picodet_m_unstructured_prune_85.yml) |
+
+**Notes:**
+- The model sizes above are **deployment model sizes**, i.e., the size of the *.nb file produced by PaddleLite conversion.
+- The speed-up columns are computed as the percentage increase in FPS, i.e.: $(dense\_latency - sparse\_latency) / sparse\_latency$
+- For the sparse training above we added one extra data augmentation to _base_/picodet_320_reader.yml, shown below. Without it, mAP is expected to drop only marginally (<0.1), with no effect on speed or model size.
+```yaml
+worker_num: 6
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomCrop: {}
+  - RandomFlip: {prob: 0.5}
+  - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+  - RandomDistort: {}
+  batch_transforms:
+etc.
+``` diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/optimizer_300e_pruner.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/optimizer_300e_pruner.yml new file mode 100644 index 0000000000000000000000000000000000000000..064d5623372bc8a7122bfc073bd289edc2a0b1b5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/optimizer_300e_pruner.yml @@ -0,0 +1,18 @@ +epoch: 300 + +LearningRate: + base_lr: 0.15 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 1.0 + steps: 34350 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/picodet_m_320_coco_pruner.yml b/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/picodet_m_320_coco_pruner.yml new file mode 100644 index 0000000000000000000000000000000000000000..cf55a882ead0739c5859c7f335cbeb0d20f6415c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/legacy_model/pruner/picodet_m_320_coco_pruner.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../../../datasets/coco_detection.yml', + '../../../runtime.yml', + '../_base_/picodet_esnet.yml', + './optimizer_300e_pruner.yml', + '../_base_/picodet_320_reader.yml', +] + +weights: output/picodet_m_320_coco/model_final +find_unused_parameters: True +use_ema: true +cycle_epoch: 40 +snapshot_epoch: 10 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_l_320_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_l_320_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..c9225ff30f563877c8870dfaefdefb46f50effd7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_l_320_coco_lcnet.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams +weights: output/picodet_l_320_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 250 +snapshot_epoch: 10 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 160 + +LearningRate: + base_lr: 0.12 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +TrainReader: + batch_size: 24 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_l_416_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_l_416_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..f508e21d7ea3ddc518b4618873d78b56625bb93f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_l_416_coco_lcnet.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_416_reader.yml', +] + +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams +weights: output/picodet_l_320_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 250 +snapshot_epoch: 10 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 
160 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 160 + +LearningRate: + base_lr: 0.12 + schedulers: + - name: CosineDecay + max_epochs: 300 + - name: LinearWarmup + start_factor: 0.1 + steps: 300 + +TrainReader: + batch_size: 24 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_l_640_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_l_640_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..2fadd6a25b9ebe1e598cf10cbf01af23eefc14d4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_l_640_coco_lcnet.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_640_reader.yml', +] + +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams +weights: output/picodet_l_320_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 200 +snapshot_epoch: 10 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 160 + +LearningRate: + base_lr: 0.06 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +TrainReader: + batch_size: 12 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_m_320_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_m_320_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..bd188c2188f73400e2423629aa8856137aa5082c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_m_320_coco_lcnet.yml @@ -0,0 +1,25 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +weights: output/picodet_m_320_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +TrainReader: + batch_size: 48 + +LearningRate: + base_lr: 0.24 + schedulers: + - name: CosineDecay + max_epochs: 300 + - name: LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_m_416_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_m_416_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..c224f4e0975f04a7e76c0d80c511b730c02175d4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_m_416_coco_lcnet.yml @@ -0,0 +1,25 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_416_reader.yml', +] + +weights: output/picodet_m_416_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 250 +snapshot_epoch: 10 + +TrainReader: + batch_size: 48 + +LearningRate: + base_lr: 0.24 + schedulers: + - name: CosineDecay + max_epochs: 300 + - name: LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_s_320_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_s_320_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..c9fb52f320239cccf30257fe695e82fb5bb26121 
--- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_s_320_coco_lcnet.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams +weights: output/picodet_s_320_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +LCNet: + scale: 0.75 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 96 + +TrainReader: + batch_size: 64 + +LearningRate: + base_lr: 0.32 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_s_416_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_s_416_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..ed00b479b3729e4202e190e945d2c5ddce0f7f4a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_s_416_coco_lcnet.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_416_reader.yml', +] + +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams +weights: output/picodet_s_416_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +LCNet: + scale: 0.75 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 96 + +TrainReader: + batch_size: 48 + +LearningRate: + base_lr: 0.24 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_s_416_coco_npu.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_s_416_coco_npu.yml new file mode 100644 index 0000000000000000000000000000000000000000..761cfde11334b42c993682d981c4ea28c44da092 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_s_416_coco_npu.yml @@ -0,0 +1,106 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', +] + +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams +weights: output/picodet_s_416_coco/best_model +find_unused_parameters: True +keep_best_weight: True +use_ema: True +epoch: 300 +snapshot_epoch: 10 + +PicoDet: + backbone: LCNet + neck: CSPPAN + head: PicoHeadV2 + +LCNet: + scale: 0.75 + feature_maps: [3, 4, 5] + act: relu6 + +CSPPAN: + out_channels: 96 + use_depthwise: True + num_csp_blocks: 1 + num_features: 4 + act: relu6 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + act: relu6 + feat_in_chan: 96 + act: relu6 + +LearningRate: + base_lr: 0.2 + schedulers: + - !CosineDecay + max_epochs: 300 + min_lr_ratio: 0.08 + last_plateau_epochs: 30 + - !ExpWarmup + epochs: 2 + 
+worker_num: 6 +eval_height: &eval_height 416 +eval_width: &eval_width 416 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 0.6 + input_dim: [640, 640] + degrees: [-10, 10] + scale: [0.1, 2.0] + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: True + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - RandomFlip: {prob: 0.5} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 40 + shuffle: true + drop_last: true + mosaic_epoch: 180 + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_xs_320_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_xs_320_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..3b1b75313daa2b0e1bb72474dbf2e2f1ace5ff52 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_xs_320_coco_lcnet.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_320_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/LCNet_x0_35_pretrained.pdparams +weights: output/picodet_xs_320_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +LCNet: + scale: 0.35 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 96 + +TrainReader: + batch_size: 64 + +LearningRate: + base_lr: 0.32 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/picodet/picodet_xs_416_coco_lcnet.yml b/PaddleDetection-release-2.6/configs/picodet/picodet_xs_416_coco_lcnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..8ca47d23a9c6541e9d02aac74fa43d31b8469ed9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/picodet/picodet_xs_416_coco_lcnet.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/picodet_v2.yml', + '_base_/optimizer_300e.yml', + '_base_/picodet_416_reader.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/LCNet_x0_35_pretrained.pdparams +weights: output/picodet_xs_416_coco/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +LCNet: + scale: 0.35 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + 
feat_in_chan: 96 + +TrainReader: + batch_size: 56 + +LearningRate: + base_lr: 0.28 + schedulers: + - name: CosineDecay + max_epochs: 300 + - name: LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/pose3d/README.md b/PaddleDetection-release-2.6/configs/pose3d/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0b9dec7e9c58740bede2de59a5163fa59528094b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pose3d/README.md @@ -0,0 +1,157 @@ +简体中文 + +
    + +
+
+# 3D Pose Models
+
+## Contents
+
+- [Introduction](#introduction)
+- [Recommended Models](#recommended-models)
+- [Quick Start](#quick-start)
+  - [Installation](#1-installation)
+  - [Data Preparation](#2-data-preparation)
+  - [Training and Testing](#3-training-and-testing)
+    - [Single-GPU Training](#single-gpu-training)
+    - [Multi-GPU Training](#multi-gpu-training)
+    - [Evaluation](#evaluation)
+    - [Inference](#inference)
+  - [Usage Notes](#4-usage-notes)
+
+## Introduction
+
+PaddleDetection provides two 3D pose algorithms (sparse keypoints): Metro3D, a large model for server-side use, and TinyPose3D for mobile. Metro3D is a sparse-keypoint adaptation of [End-to-End Human Pose and Mesh Reconstruction with Transformers](https://arxiv.org/abs/2012.09760), while TinyPose3D modifies TinyPose to output 3D keypoints.
+
+## Recommended Models
+
+| Model | Scenario | Human3.6M accuracy (14 keypoints) | Human3.6M accuracy (17 keypoints) | Download |
+|:--:|:--:|:--:|:--:|:--:|
+| Metro3D | Server-side | 56.014 | 46.619 | [metro3d_24kpts.pdparams](https://bj.bcebos.com/v1/paddledet/models/pose3d/metro3d_24kpts.pdparams) |
+| TinyPose3D | Mobile | 86.381 | 71.223 | [tinypose3d_human36m.pdparams](https://bj.bcebos.com/v1/paddledet/models/pose3d/tinypose3d_human36M.pdparams) |
+
+Notes:
+1. The training data is based on the training data in [MeshTransformer](https://github.com/microsoft/MeshTransformer).
+2. As in MeshTransformer, accuracy is evaluated on 14 keypoints.
+
+## Quick Start
+
+### 1. Installation
+
+Follow the PaddleDetection [installation guide](../../docs/tutorials/INSTALL_cn.md) to install PaddlePaddle and PaddleDetection.
+
+### 2. Data Preparation
+
+Our training data consists of coco, human3.6m, hr-lspet, posetrack3d, and mpii.
+
+2.1 Download links for the training data:
+
+[coco](https://bj.bcebos.com/v1/paddledet/data/coco.tar)
+
+[human3.6m](https://bj.bcebos.com/v1/paddledet/data/pose3d/human3.6m.tar.gz)
+
+[lspet+posetrack+mpii](https://bj.bcebos.com/v1/paddledet/data/pose3d/pose3d_others.tar.gz)
+
+[annotation files](https://bj.bcebos.com/v1/paddledet/data/pose3d/pose3d.tar.gz)
+
+2.2 After downloading, arrange the data under the repo directory as follows (a quick way to verify the layout is sketched after the tree):
+
+```
+${REPO_DIR}
+|-- dataset
+|   |-- traindata
+|   |   |-- coco
+|   |   |-- hr-lspet
+|   |   |-- human3.6m
+|   |   |-- mpii
+|   |   |-- posetrack3d
+|   |   \-- pose3d
+|   |       |-- COCO2014-All-ver01.json
+|   |       |-- COCO2014-Part-ver01.json
+|   |       |-- COCO2014-Val-ver10.json
+|   |       |-- Human3.6m_train.json
+|   |       |-- Human3.6m_valid.json
+|   |       |-- LSPet_train_ver10.json
+|   |       |-- LSPet_test_ver10.json
+|   |       |-- MPII_ver01.json
+|   |       |-- PoseTrack_ver01.json
+|-- ppdet
+|-- deploy
+|-- demo
+|-- README_cn.md
+|-- README_en.md
+|-- ...
+```
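+Before launching training, it can save a failed run to check this layout programmatically. A minimal stdlib-only sketch (the paths simply mirror the tree above; extend the list with the remaining annotation files as needed):
+
+```python
+import os
+
+expected = [
+    "dataset/traindata/human3.6m",
+    "dataset/traindata/pose3d/Human3.6m_train.json",
+    "dataset/traindata/pose3d/Human3.6m_valid.json",
+]
+
+missing = [p for p in expected if not os.path.exists(p)]
+print("dataset layout OK" if not missing else f"missing: {missing}")
+```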
+
+
+### 3、训练与测试
+
+#### 单卡训练
+
+```shell
+#单卡训练
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/pose3d/metro3d_24kpts.yml
+```
+
+#### 多卡训练
+
+```shell
+#多卡训练
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/pose3d/metro3d_24kpts.yml
+```
+
+#### 模型评估
+
+```shell
+#单卡评估
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/pose3d/metro3d_24kpts.yml -o weights=output/metro3d_24kpts/best_model.pdparams
+
+#当只需要保存评估预测的结果时,可以通过设置save_prediction_only参数实现,评估预测结果默认保存在output/keypoints_results.json文件中
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/pose3d/metro3d_24kpts.yml -o weights=output/metro3d_24kpts/best_model.pdparams --save_prediction_only
+
+#多卡评估
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/eval.py -c configs/pose3d/metro3d_24kpts.yml -o weights=output/metro3d_24kpts/best_model.pdparams
+```
+
+#### 模型预测
+
+```shell
+#图片生成3视角图
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c configs/pose3d/metro3d_24kpts.yml -o weights=./output/metro3d_24kpts/best_model.pdparams --infer_img=./demo/hrnet_demo.jpg --draw_threshold=0.5
+```
+
+### 4、使用说明
+
+3D Pose在使用中相比2D Pose有更多的困难,这些困难主要由以下两个原因导致:
+
+- 1)训练数据标注成本高;
+
+- 2)图像在深度信息上的模糊性;
+
+由于原因(1),训练数据往往只能覆盖少量动作,导致模型难以泛化;由于原因(2),图像在预测3D Pose坐标时深度z轴上的误差通常大于x、y方向,容易导致时序间的较大抖动,且数据标注误差越大,该问题表现得越明显。
+
+要解决上述两个问题,就造成了两个矛盾的需求:1)提高泛化性需要更多的标注数据;2)降低预测误差需要高精度的数据标注。而3D Pose本身数据标注的困难导致越高精度的标注成本越高,标注数量则会相应降低。
+
+因此,我们提供的解决方案是:
+
+- 1)使用自动拟合标注方法自动产生大量低精度的数据,训练第一版模型,使其具有较普遍的泛化性。
+
+- 2)标注少量目标动作的高精度数据,基于第一版模型finetune,得到目标动作上的高精度模型,且一定程度上继承第一版模型的泛化性。
+
+我们的训练数据提供了大量的低精度自动生成式的数据,用户可以在此数据训练的基础上,标注自己的高精度目标动作数据进行finetune,即可得到相对稳定较好的模型(finetune示意命令见本节末尾)。
+
+我们在医疗康复高精度数据上的训练效果展示如下 [高清视频](https://user-images.githubusercontent.com/31800336/218949226-22e6ab25-facb-4cc6-8eca-38d4bfd973e5.mp4)
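+
+其中第2)步基于第一版模型的finetune,可参考如下示意命令(以Metro3D为例,pretrain_weights指向上文发布的权重;自标注的高精度数据需先在配置文件中替换TrainDataset的标注路径,命令仅为示意):
+
+```shell
+# 示意:基于发布的Metro3D权重,在自标注的高精度数据上finetune
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/pose3d/metro3d_24kpts.yml \
+       -o pretrain_weights=https://bj.bcebos.com/v1/paddledet/models/pose3d/metro3d_24kpts.pdparams
+```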
    + + + +## 引用 + +``` +@inproceedings{lin2021end-to-end, +author = {Lin, Kevin and Wang, Lijuan and Liu, Zicheng}, +title = {End-to-End Human Pose and Mesh Reconstruction with Transformers}, +booktitle = {CVPR}, +year = {2021}, +} +``` diff --git a/PaddleDetection-release-2.6/configs/pose3d/metro3d_24kpts.yml b/PaddleDetection-release-2.6/configs/pose3d/metro3d_24kpts.yml new file mode 100644 index 0000000000000000000000000000000000000000..b8ea08a230b2b0e25b7d7859c02377dfc149411f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pose3d/metro3d_24kpts.yml @@ -0,0 +1,144 @@ +use_gpu: True +log_iter: 20 +save_dir: output +snapshot_epoch: 3 +weights: output/metro_modified/model_final +epoch: 50 +metric: Pose3DEval +num_classes: 1 +train_height: &train_height 224 +train_width: &train_width 224 +trainsize: &trainsize [*train_width, *train_height] +num_joints: &num_joints 24 + +#####model +architecture: METRO_Body +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +METRO_Body: + backbone: HRNet + trans_encoder: TransEncoder + num_joints: *num_joints + loss: Pose3DLoss + +HRNet: + width: 32 + freeze_at: -1 + freeze_norm: False + norm_momentum: 0.1 + downsample: True + +TransEncoder: + vocab_size: 30522 + num_hidden_layers: 4 + num_attention_heads: 4 + position_embeddings_size: 512 + intermediate_size: 3072 + input_feat_dim: [2048, 512, 128] + hidden_feat_dim: [1024, 256, 128] + attention_probs_dropout_prob: 0.1 + fc_dropout_prob: 0.1 + act_fn: 'gelu' + output_attentions: False + output_hidden_feats: False + +Pose3DLoss: + weight_3d: 1.0 + weight_2d: 0.0 + +#####optimizer +LearningRate: + base_lr: 0.0001 + schedulers: + - !CosineDecay + max_epochs: 52 + - !LinearWarmup + start_factor: 0.01 + steps: 2000 + + +OptimizerBuilder: + clip_grad_by_norm: 0.2 + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !Pose3DDataset + dataset_dir: dataset/traindata/ + image_dirs: ["human3.6m", "posetrack3d", "hr-lspet", "hr-lspet", "mpii/images", "coco/train2017"] + anno_list: ["pose3d/Human3.6m_train.json", "pose3d/PoseTrack_ver01.json", "pose3d/LSPet_train_ver10.json", "pose3d/LSPet_test_ver10.json", "pose3d/MPII_ver01.json", "pose3d/COCO2014-All-ver01.json"] + num_joints: *num_joints + test_mode: False + +EvalDataset: + !Pose3DDataset + dataset_dir: dataset/traindata/ + image_dirs: ["human3.6m"] + anno_list: ["pose3d/Human3.6m_valid.json"] + num_joints: *num_joints + test_mode: True + +TestDataset: + !ImageFolder + anno_path: dataset/traindata/coco/keypoint_imagelist.txt + +worker_num: 4 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - SinglePoseAffine: + trainsize: *trainsize + rotate: [1.0, 30] #[prob, rotate range] + scale: [1.0, 0.25] #[prob, scale range] + - FlipPose: + flip_prob: 0.5 + img_res: *train_width + num_joints: *num_joints + - NoiseJitter: + noise_factor: 0.4 + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: true + +EvalReader: + sample_transforms: + - SinglePoseAffine: + trainsize: *trainsize + rotate: [0., 30] + scale: [0., 0.25] + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 16 + shuffle: false + drop_last: false + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - 
Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + fuse_normalize: false #whether to fuse nomalize layer into model while export model diff --git a/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_human36M.yml b/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_human36M.yml new file mode 100644 index 0000000000000000000000000000000000000000..05c6656d145a7bb4af14bcc0a1781cf54de552b1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_human36M.yml @@ -0,0 +1,122 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 1 +weights: output/tinypose3d_human36M/model_final +epoch: 220 +num_joints: &num_joints 24 +pixel_std: &pixel_std 200 +metric: Pose3DEval +num_classes: 1 +train_height: &train_height 128 +train_width: &train_width 128 +trainsize: &trainsize [*train_width, *train_height] + +#####model +architecture: TinyPose3DHRHeatmapNet +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams + +TinyPose3DHRHeatmapNet: + backbone: LiteHRNet + post_process: HR3DNetPostProcess + num_joints: *num_joints + width: &width 40 + loss: Pose3DLoss + +LiteHRNet: + network_type: wider_naive + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +Pose3DLoss: + weight_3d: 1.0 + weight_2d: 0.0 + +#####optimizer +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + milestones: [17, 21] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.01 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !Pose3DDataset + dataset_dir: dataset/traindata/ + image_dirs: ["human3.6m"] + anno_list: ['pose3d/Human3.6m_train.json'] + num_joints: *num_joints + test_mode: False + +EvalDataset: + !Pose3DDataset + dataset_dir: dataset/traindata/ + image_dirs: ["human3.6m"] + anno_list: ['pose3d/Human3.6m_valid.json'] + num_joints: *num_joints + test_mode: True + +TestDataset: + !ImageFolder + anno_path: dataset/coco/keypoint_imagelist.txt + +worker_num: 4 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - SinglePoseAffine: + trainsize: *trainsize + rotate: [0.5, 30] #[prob, rotate range] + scale: [0.5, 0.25] #[prob, scale range] + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 128 + shuffle: true + drop_last: true + +EvalReader: + sample_transforms: + - SinglePoseAffine: + trainsize: *trainsize + rotate: [0., 30] + scale: [0., 0.25] + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 128 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - TopDownEvalAffine: + trainsize: *trainsize + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + fuse_normalize: false diff --git a/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_medical_multi_frames.yml b/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_medical_multi_frames.yml new file mode 100644 index 0000000000000000000000000000000000000000..aad7a405571b4a6fa89714c00e6e39664483799a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_medical_multi_frames.yml @@ -0,0 +1,138 @@ +use_gpu: true +log_iter: 5 +save_dir: output 
+snapshot_epoch: 1 +weights: output/tinypose_3D_multi_frames/model_final +epoch: 420 +num_joints: &num_joints 24 +pixel_std: &pixel_std 200 +metric: Pose3DEval +num_classes: 1 +train_height: &train_height 128 +train_width: &train_width 96 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [24, 32] +flip_perm: &flip_perm [[1, 2], [4, 5], [7, 8], [10, 11], [13, 14], [16, 17], [18, 19], [20, 21], [22, 23]] + + +#####model +architecture: TinyPose3DHRNet +pretrain_weights: medical_multi_frames_best_model.pdparams + +TinyPose3DHRNet: + backbone: LiteHRNet + post_process: TinyPose3DPostProcess + num_joints: *num_joints + width: &width 40 + loss: KeyPointRegressionMSELoss + +LiteHRNet: + network_type: wider_naive + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointRegressionMSELoss: + reduction: 'mean' + +#####optimizer +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + milestones: [17, 21] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.01 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + +#####data +TrainDataset: + !Keypoint3DMultiFramesDataset + dataset_dir: "data/medical/multi_frames/train" + image_dir: "images" + p3d_dir: "joint_pc/player_0" + json_path: "json_results/player_0/player_0.json" + img_size: *trainsize # w,h + num_frames: 6 + + +EvalDataset: + !Keypoint3DMultiFramesDataset + dataset_dir: "data/medical/multi_frames/val" + image_dir: "images" + p3d_dir: "joint_pc/player_0" + json_path: "json_results/player_0/player_0.json" + img_size: *trainsize # w,h + num_frames: 6 + +TestDataset: + !Keypoint3DMultiFramesDataset + dataset_dir: "data/medical/multi_frames/val" + image_dir: "images" + p3d_dir: "joint_pc/player_0" + json_path: "json_results/player_0/player_0.json" + img_size: *trainsize # w,h + num_frames: 6 + +worker_num: 4 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - CropAndFlipImages: + crop_range: [556, 1366] + - RandomFlipHalfBody3DTransformImages: + scale: 0.25 + rot: 30 + num_joints_half_body: 9 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 3, 6, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] + flip_pairs: *flip_perm + do_occlusion: true + - Resize: {interp: 2, target_size: [*train_height,*train_width], keep_ratio: false} + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - PermuteImages: {} + batch_size: 32 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - CropAndFlipImages: + crop_range: [556, 1366] + - Resize: {interp: 2, target_size: [*train_height,*train_width], keep_ratio: false} + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - PermuteImages: {} + batch_size: 32 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - LetterBoxResize: { target_size: [*train_height,*train_width]} + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + fuse_normalize: false diff --git a/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_multi_frames_heatmap.yml b/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_multi_frames_heatmap.yml new file mode 100644 index 0000000000000000000000000000000000000000..a5893ec9b9135a359a82af7897714b234a79c47c --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/pose3d/tinypose3d_multi_frames_heatmap.yml @@ -0,0 +1,138 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 1 +weights: output/tinypose3d_multi_frames_heatmap/model_final +epoch: 420 +num_joints: &num_joints 24 +pixel_std: &pixel_std 200 +metric: Pose3DEval +num_classes: 1 +train_height: &train_height 128 +train_width: &train_width 128 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [24, 32] +flip_perm: &flip_perm [[1, 2], [4, 5], [7, 8], [10, 11], [13, 14], [16, 17], [18, 19], [20, 21], [22, 23]] + +#####model +architecture: TinyPose3DHRHeatmapNet +pretrain_weights: medical_multi_frames_best_model.pdparams + +TinyPose3DHRHeatmapNet: + backbone: LiteHRNet + post_process: TinyPosePostProcess + num_joints: *num_joints + width: &width 40 + loss: KeyPointRegressionMSELoss + +LiteHRNet: + network_type: wider_naive + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointRegressionMSELoss: + reduction: 'mean' + +#####optimizer +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + milestones: [17, 21] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.01 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + +#####data +TrainDataset: + !Keypoint3DMultiFramesDataset + dataset_dir: "data/medical/multi_frames/train" + image_dir: "images" + p3d_dir: "joint_pc/player_0" + json_path: "json_results/player_0/player_0.json" + img_size: *trainsize # w,h + num_frames: 6 + + +EvalDataset: + !Keypoint3DMultiFramesDataset + dataset_dir: "data/medical/multi_frames/val" + image_dir: "images" + p3d_dir: "joint_pc/player_0" + json_path: "json_results/player_0/player_0.json" + img_size: *trainsize # w,h + num_frames: 6 + +TestDataset: + !Keypoint3DMultiFramesDataset + dataset_dir: "data/medical/multi_frames/val" + image_dir: "images" + p3d_dir: "joint_pc/player_0" + json_path: "json_results/player_0/player_0.json" + img_size: *trainsize # w,h + num_frames: 6 + +worker_num: 4 +global_mean: &global_mean [0.485, 0.456, 0.406] +global_std: &global_std [0.229, 0.224, 0.225] +TrainReader: + sample_transforms: + - CropAndFlipImages: + crop_range: [556, 1366] # 保留train_height/train_width比例的情况下,裁剪原图左右两个的黑色填充 + - RandomFlipHalfBody3DTransformImages: + scale: 0.25 + rot: 30 + num_joints_half_body: 9 + prob_half_body: 0.3 + pixel_std: *pixel_std + trainsize: *trainsize + upper_body_ids: [0, 3, 6, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] + flip_pairs: *flip_perm + do_occlusion: true + - Resize: {interp: 2, target_size: [*train_height,*train_width], keep_ratio: false} + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - PermuteImages: {} + batch_size: 1 #32 + shuffle: true + drop_last: false + +EvalReader: + sample_transforms: + - CropAndFlipImages: + crop_range: [556, 1366] + - Resize: {interp: 2, target_size: [*train_height,*train_width], keep_ratio: false} + #- OriginPointTranslationImages: {} + batch_transforms: + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - PermuteImages: {} + batch_size: 32 + +TestReader: + inputs_def: + image_shape: [3, *train_height, *train_width] + sample_transforms: + - Decode: {} + - LetterBoxResize: { target_size: [*train_height,*train_width]} + - NormalizeImage: + mean: *global_mean + std: *global_std + is_scale: true + - Permute: {} + batch_size: 1 + fuse_normalize: false diff --git a/PaddleDetection-release-2.6/configs/pphuman/README.md 
b/PaddleDetection-release-2.6/configs/pphuman/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a568f120d45549215ad4a4105845b0d50d9f5106 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/README.md @@ -0,0 +1,84 @@ +简体中文 | [English](README.md) + +# PP-YOLOE Human 检测模型 + +PaddleDetection团队提供了针对行人的基于PP-YOLOE的检测模型,用户可以下载模型进行使用。PP-Human中使用模型为业务数据集模型,我们同时提供CrowdHuman训练配置,可以使用开源数据进行训练。 +其中整理后的COCO格式的CrowdHuman数据集[下载链接](https://bj.bcebos.com/v1/paddledet/data/crowdhuman.zip),检测类别仅一类 `pedestrian(1)`,原始数据集[下载链接](http://www.crowdhuman.org/download.html)。 + +相关模型的部署模型均在[PP-Human](../../deploy/pipeline/)项目中使用。 + +| 模型 | 数据集 | mAPval
    0.5:0.95 | mAPval
0.5 | 下载 | 配置文件 |
+|:---------|:-------:|:------:|:------:| :----: | :------:|
+|PP-YOLOE-s| CrowdHuman | 42.5 | 77.9 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_36e_crowdhuman.pdparams) | [配置文件](./ppyoloe_crn_s_36e_crowdhuman.yml) |
+|PP-YOLOE-l| CrowdHuman | 48.0 | 81.9 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_crowdhuman.pdparams) | [配置文件](./ppyoloe_crn_l_36e_crowdhuman.yml) |
+|PP-YOLOE-s| 业务数据集 | 53.2 | - | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_36e_pphuman.pdparams) | [配置文件](./ppyoloe_crn_s_36e_pphuman.yml) |
+|PP-YOLOE-l| 业务数据集 | 57.8 | - | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_pphuman.pdparams) | [配置文件](./ppyoloe_crn_l_36e_pphuman.yml) |
+|PP-YOLOE+_t-aux(320)| 业务数据集 | 45.7 | 81.2 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.pdparams) | [配置文件](./ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.yml) |
+
+
+**注意:**
+- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)** 调整学习率。例如:默认配置为8卡、每卡batch_size=8、lr=0.001,若改为4卡、每卡batch_size=8,则lr应调整为 0.001 * (8 * 4) / (8 * 8) = 0.0005。
+- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)。
+
+# YOLOv3 Human 检测模型
+
+请参考[Human_YOLOv3页面](./pedestrian_yolov3/README_cn.md)
+
+# PP-YOLOE 香烟检测模型
+基于PP-YOLOE模型的香烟检测模型,是实现PP-Human中的基于检测的行为识别方案的一环,如何在PP-Human中使用该模型进行吸烟行为识别,可参考[PP-Human行为识别模块](../../deploy/pipeline/docs/tutorials/pphuman_action.md)。该模型检测类别仅包含香烟一类。由于数据来源限制,目前暂无法直接公开训练数据。该模型使用了小目标数据集VisDrone上的权重(参照[visdrone](../visdrone))作为预训练模型,以提升检测效果。
+
+| 模型 | 数据集 | mAPval
    0.5:0.95 | mAPval
    0.5 | 下载 | 配置文件 | +|:---------|:-------:|:------:|:------:| :----: | :------:| +| PP-YOLOE-s | 香烟业务数据集 | 39.7 | 79.5 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [配置文件](./ppyoloe_crn_s_80e_smoking_visdrone.yml) | + +# PP-HGNet 打电话识别模型 +基于PP-HGNet模型实现了打电话行为识别,详细可参考[PP-Human行为识别模块](../../deploy/pipeline/docs/tutorials/pphuman_action.md)。该模型基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md#3.3)套件进行训练。此处提供预测模型下载: + +| 模型 | 数据集 | Acc | 下载 | 配置文件 | +|:---------|:-------:|:------:| :----: | :------:| +| PP-HGNet | 业务数据集 | 86.85 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | - | + +# HRNet 人体关键点模型 +人体关键点模型与ST-GCN模型一起完成[基于骨骼点的行为识别](../../deploy/pipeline/docs/tutorials/pphuman_action.md)方案。关键点模型采用HRNet模型,关于关键点模型相关详细资料可以查看关键点专栏页面[KeyPoint](../keypoint/README.md)。此处提供训练模型下载链接。 + +| 模型 | 数据集 | APval
    0.5:0.95 | 下载 | 配置文件 | +|:---------|:-------:|:------:| :----: | :------:| +| HRNet | 业务数据集 | 87.1 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) | [配置文件](./hrnet_w32_256x192.yml) | + + +# ST-GCN 骨骼点行为识别模型 +人体关键点模型与[ST-GCN](https://arxiv.org/abs/1801.07455)模型一起完成[基于骨骼点的行为识别](../../deploy/pipeline/docs/tutorials/pphuman_action.md)方案。 +ST-GCN模型基于[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman)完成训练。 +此处提供预测模型下载链接。 + +| 模型 | 数据集 | APval
    0.5:0.95 | 下载 | 配置文件 | +|:---------|:-------:|:------:| :----: | :------:| +| ST-GCN | 业务数据集 | 87.1 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | [配置文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml) | + +# PP-TSM 视频分类模型 +基于`PP-TSM`模型完成了[基于视频分类的行为识别](../../deploy/pipeline/docs/tutorials/pphuman_action.md)方案。 +PP-TSM模型基于[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/FightRecognition)完成训练。 +此处提供预测模型下载链接。 + +| 模型 | 数据集 | Acc | 下载 | 配置文件 | +|:---------|:-------:|:------:| :----: | :------:| +| PP-TSM | 组合开源数据集 | 89.06 |[下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | [配置文件](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/FightRecognition/pptsm_fight_frames_dense.yaml) | + +# PP-HGNet、PP-LCNet 属性识别模型 +基于PP-HGNet、PP-LCNet 模型实现了行人属性识别,详细可参考[PP-Human行为识别模块](../../deploy/pipeline/docs/tutorials/pphuman_attribute.md)。该模型基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-LCNet.md)套件进行训练。此处提供预测模型下载链接. + +| 模型 | 数据集 | mA | 下载 | 配置文件 | +|:---------|:-------:|:------:| :----: | :------:| +| PP-HGNet_small | 业务数据集 | 95.4 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | - | +| PP-LCNet | 业务数据集 | 94.5 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | [配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml) | + + +## 引用 +``` +@article{shao2018crowdhuman, + title={CrowdHuman: A Benchmark for Detecting Human in a Crowd}, + author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian}, + journal={arXiv preprint arXiv:1805.00123}, + year={2018} + } +``` diff --git a/PaddleDetection-release-2.6/configs/pphuman/dark_hrnet_w32_256x192.yml b/PaddleDetection-release-2.6/configs/pphuman/dark_hrnet_w32_256x192.yml new file mode 100644 index 0000000000000000000000000000000000000000..a759c121a1e891e510f802cfbf53962c98a368be --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/dark_hrnet_w32_256x192.yml @@ -0,0 +1,141 @@ +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_256x192/model_final +epoch: 210 +num_joints: &num_joints 17 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] + + +#####model +architecture: TopDownHRNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams + +TopDownHRNet: + backbone: HRNet + post_process: HRNetPostProcess + flip_perm: *flip_perm + num_joints: *num_joints + width: &width 32 + loss: KeyPointMSELoss + +HRNet: + width: *width + freeze_at: -1 + freeze_norm: false + return_idx: [0] + +KeyPointMSELoss: + use_target_weight: true + + +#####optimizer +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + milestones: [170, 200] + gamma: 0.1 + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: Adam + regularizer: + factor: 0.0 + type: L2 + + +#####data +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: train2017 + anno_path: 
annotations/person_keypoints_train2017.json
+    dataset_dir: dataset/coco
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+
+
+EvalDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: val2017
+    anno_path: annotations/person_keypoints_val2017.json
+    dataset_dir: dataset/coco
+    bbox_file: bbox.json
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+    image_thre: 0.0
+
+
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/coco/keypoint_imagelist.txt
+
+worker_num: 2
+global_mean: &global_mean [0.485, 0.456, 0.406]
+global_std: &global_std [0.229, 0.224, 0.225]
+TrainReader:
+  sample_transforms:
+    - RandomFlipHalfBodyTransform:
+        scale: 0.5
+        rot: 40
+        num_joints_half_body: 8
+        prob_half_body: 0.3
+        pixel_std: *pixel_std
+        trainsize: *trainsize
+        upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+        flip_pairs: *flip_perm
+    - TopDownAffine:
+        trainsize: *trainsize
+    - ToHeatmapsTopDown_DARK:
+        hmsize: *hmsize
+        sigma: 2
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 64
+  shuffle: true
+  drop_last: false
+
+EvalReader:
+  sample_transforms:
+    - TopDownAffine:
+        trainsize: *trainsize
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 16
+
+TestReader:
+  inputs_def:
+    image_shape: [3, *train_height, *train_width]
+  sample_transforms:
+    - Decode: {}
+    - TopDownEvalAffine:
+        trainsize: *trainsize
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 1
diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/README.md b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..825eedb6ea50bf3893297e0342367cca6517d0f1
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/README.md
@@ -0,0 +1,50 @@
+English | [简体中文](README_cn.md)
+# PaddleDetection applied for specific scenarios
+
+We provide some models implemented by PaddlePaddle to detect objects in specific scenarios; users can download the models and use them in these scenarios.
+
+| Task                 | Algorithm | Box AP | Download | Configs |
+|:---------------------|:---------:|:------:| :-------------------------------------------------------------------------------------: |:------:|
+| Pedestrian Detection | YOLOv3 | 51.8 | [model](https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams) | [config](./pedestrian_yolov3_darknet.yml) |
+
+## Pedestrian Detection
+
+The main applications of pedestrian detection include intelligent monitoring. In this scenario, photos of pedestrians are taken by surveillance cameras in public areas, and pedestrian detection is then conducted on these photos.
+
+### 1. Network
+
+The network for detecting pedestrians is YOLOv3, the backbone of which is Darknet53.
+
+### 2. Configuration for training
+
+PaddleDetection provides users with a configuration file [yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) to train YOLOv3 on the COCO dataset. Compared with this file, we modify the following parameters to conduct training for pedestrian detection:
+
+* num_classes: 1
+* dataset_dir: dataset/pedestrian
+
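+For reference, training with this configuration uses the standard PaddleDetection entry point. A minimal sketch, assuming the pedestrian dataset has been prepared under `dataset/pedestrian` in COCO format:
+
+```
+# sketch only: multi-GPU training of the pedestrian config, with evaluation during training
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \
+       -c configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml --eval
+```
+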
+### 3. Accuracy
+
+The accuracy of the model trained and evaluated on our private data is as follows:
+
+AP at IoU=.50:.05:.95 is 0.518.
+
+AP at IoU=.50 is 0.792.
+
+### 4. Inference
+
+Users can use the model to run inference:
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python -u tools/infer.py -c configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml \
+                         -o weights=https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams \
+                         --infer_dir configs/pphuman/pedestrian_yolov3/demo \
+                         --draw_threshold 0.3 \
+                         --output_dir configs/pphuman/pedestrian_yolov3/demo/output
+```
+
+Some inference results are visualized below:
+
+![](../../../docs/images/PedestrianDetection_001.png)
+
+![](../../../docs/images/PedestrianDetection_004.png)
diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/README_cn.md b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..50a183f8384686760befb2ce17f7617a4547a97b
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/README_cn.md
@@ -0,0 +1,51 @@
+[English](README.md) | 简体中文
+# 特色垂类检测模型
+
+我们提供了针对不同场景的基于PaddlePaddle的检测模型,用户可以下载模型进行使用。
+
+| 任务                 | 算法 | 精度(Box AP) | 下载 | 配置文件 |
+|:---------------------|:---------:|:------:| :---------------------------------------------------------------------------------: | :------:|
+| 行人检测 | YOLOv3 | 51.8 | [下载链接](https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml) |
+
+## 行人检测(Pedestrian Detection)
+
+行人检测的主要应用有智能监控。在监控场景中,大多是从公共区域的监控摄像头视角拍摄行人,获取图像后再进行行人检测。
+
+### 1. 模型结构
+
+Backbone为Darknet53的YOLOv3。
+
+
+### 2. 训练参数配置
+
+PaddleDetection提供了使用COCO数据集对YOLOv3进行训练的参数配置文件[yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml),与之相比,在进行行人检测的模型训练时,我们对以下参数进行了修改:
+
+* num_classes: 1
+* dataset_dir: dataset/pedestrian
+
+### 3. 精度指标
+
+模型在我们针对监控场景的内部数据上精度指标为:
+
+IOU=.5时的AP为 0.792。
+
+IOU=.5-.95时的AP为 0.518。
+
+### 4. 
预测 + +用户可以使用我们训练好的模型进行行人检测: + +``` +export CUDA_VISIBLE_DEVICES=0 +python -u tools/infer.py -c configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams \ + --infer_dir configs/pphuman/pedestrian_yolov3/demo \ + --draw_threshold 0.3 \ + --output_dir configs/pphuman/pedestrian_yolov3/demo/output +``` + +预测结果示例: + +![](../../../docs/images/PedestrianDetection_001.png) + +![](../../../docs/images/PedestrianDetection_004.png) diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/001.png b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/001.png new file mode 100644 index 0000000000000000000000000000000000000000..63ae9167fd03e8a95756fe5f6195fc8d741b9cfa Binary files /dev/null and b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/001.png differ diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/002.png b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/002.png new file mode 100644 index 0000000000000000000000000000000000000000..0de905cf55e6b02487ee1b8220810df8eaa24c2c Binary files /dev/null and b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/002.png differ diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/003.png b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/003.png new file mode 100644 index 0000000000000000000000000000000000000000..e9026e099df42d4267be07a71401eb5426b47745 Binary files /dev/null and b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/003.png differ diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/004.png b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/004.png new file mode 100644 index 0000000000000000000000000000000000000000..d8118ec3e0ef63bc74e825b5e7638a1886580604 Binary files /dev/null and b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/demo/004.png differ diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/pedestrian.json b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/pedestrian.json new file mode 100644 index 0000000000000000000000000000000000000000..f72fe6dc65209ab3506d18556fb8b83b6ec832a9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/pedestrian.json @@ -0,0 +1,11 @@ +{ + "images": [], + "annotations": [], + "categories": [ + { + "supercategory": "component", + "id": 1, + "name": "pedestrian" + } + ] +} diff --git a/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml new file mode 100644 index 0000000000000000000000000000000000000000..febd30643db87a1856b3776b5c2c6014f1587576 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '../../yolov3/_base_/optimizer_270e.yml', + '../../yolov3/_base_/yolov3_darknet53.yml', + '../../yolov3/_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: https://paddledet.bj.bcebos.com/models/pedestrian_yolov3_darknet.pdparams + +num_classes: 1 + +TrainDataset: + !COCODataSet + dataset_dir: dataset/pedestrian + anno_path: annotations/instances_train2017.json + image_dir: train2017 + data_fields: ['image', 'gt_bbox', 
'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + dataset_dir: dataset/pedestrian + anno_path: annotations/instances_val2017.json + image_dir: val2017 + +TestDataset: + !ImageFolder + anno_path: configs/pphuman/pedestrian_yolov3/pedestrian.json diff --git a/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_l_36e_crowdhuman.yml b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_l_36e_crowdhuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..445fefdc5c1a86c307a5c11b471df1aa95aafe7d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_l_36e_crowdhuman.yml @@ -0,0 +1,55 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_crowdhuman/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 1 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/crowdhuman + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_l_36e_pphuman.yml b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_l_36e_pphuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..c1ac43ede159ad8f6086abc18ca83aac3c2ff4a2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_l_36e_pphuman.yml @@ -0,0 +1,55 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_pphuman/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 1 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/pphuman + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/pphuman + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/pphuman + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_36e_crowdhuman.yml b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_36e_crowdhuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..7be5fe7e72e28c1d1fc9f1d517a95caa796fee76 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_36e_crowdhuman.yml @@ -0,0 +1,55 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_s_36e_crowdhuman/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +num_classes: 1 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/crowdhuman + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_36e_pphuman.yml b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_36e_pphuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..34911e2fe96cf7278f8dde9029f3028d4adf900c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_36e_pphuman.yml @@ -0,0 +1,55 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_s_36e_pphuman/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +num_classes: 1 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/pphuman + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/pphuman + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/pphuman + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..40a731d4dece54b02948c58e9bbaef60d1d6d9ce --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml @@ -0,0 +1,54 @@ +_BASE_: [ + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_80e_smoking_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_80e_visdrone.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +TrainReader: + batch_size: 16 + +LearningRate: + base_lr: 0.01 + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 80 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + +metric: COCO +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_train_cocoformat.json + dataset_dir: dataset/smoking + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking + +TestDataset: + !ImageFolder + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking diff --git a/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.yml b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..9d542fe6b4f20d6950edcc3a775839028ef702fb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.yml @@ -0,0 +1,60 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_plus_crn_tiny_auxhead.yml', + '../ppyoloe/_base_/ppyoloe_plus_reader_320.yml', +] + +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_300e_coco.pdparams # 640*640 COCO mAP 39.7 +depth_mult: 0.33 +width_mult: 0.375 + + +num_classes: 1 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/pphuman + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/pphuman + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/pphuman + + +TrainReader: + batch_size: 8 + + +epoch: 60 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 72 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/README.md b/PaddleDetection-release-2.6/configs/ppvehicle/README.md new file mode 100644 index 0000000000000000000000000000000000000000..de4b783799e96cf132f9e1f7e49c317161f18d20 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/README.md @@ -0,0 +1,81 @@ +简体中文 | [English](README.md) + +## PP-YOLOE Vehicle 检测模型 + +PaddleDetection团队提供了针对自动驾驶场景的基于PP-YOLOE的检测模型,用户可以下载模型进行使用,主要包含5个数据集(BDD100K-DET、BDD100K-MOT、UA-DETRAC、PPVehicle9cls、PPVehicle)。其中前3者为公开数据集,后两者为整合数据集。 +- BDD100K-DET具体类别为10类,包括`pedestrian(1), rider(2), car(3), truck(4), bus(5), train(6), motorcycle(7), bicycle(8), traffic light(9), traffic sign(10)`。 +- BDD100K-MOT具体类别为8类,包括`pedestrian(1), rider(2), car(3), truck(4), bus(5), train(6), motorcycle(7), bicycle(8)`,但数据集比BDD100K-DET更大更多。 +- UA-DETRAC具体类别为4类,包括`car(1), bus(2), van(3), others(4)`。 +- PPVehicle9cls数据集整合了BDD100K-MOT和UA-DETRAC,具体类别为9类,包括`pedestrian(1), rider(2), car(3), truck(4), bus(5), van(6), motorcycle(7), bicycle(8), others(9)`。 +- PPVehicle数据集整合了BDD100K-MOT和UA-DETRAC,是将BDD100K-MOT中的`car, truck, bus, van`和UA-DETRAC中的`car, bus, van`都合并为1类`vehicle(1)`后的数据集。 + +相关模型的部署模型均在[PP-Vehicle](../../deploy/pipeline/)项目中使用。 + +| 模型 | 数据集 | 类别数 | mAPval
0.5:0.95 | 下载链接 | 配置文件 |
+|:---------|:---------------:|:------:|:-----------------------:|:---------:| :-----: |
+|PP-YOLOE-l| BDD100K-DET | 10 | 35.6 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_bdd100kdet.pdparams) | [配置文件](./ppyoloe_crn_l_36e_bdd100kdet.yml) |
+|PP-YOLOE-l| BDD100K-MOT | 8 | 33.7 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_bdd100kmot.pdparams) | [配置文件](./ppyoloe_crn_l_36e_bdd100kmot.yml) |
+|PP-YOLOE-l| UA-DETRAC | 4 | 51.4 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_uadetrac.pdparams) | [配置文件](./ppyoloe_crn_l_36e_uadetrac.yml) |
+|PP-YOLOE-l| PPVehicle9cls | 9 | 40.0 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle9cls.pdparams) | [配置文件](./mot_ppyoloe_l_36e_ppvehicle9cls.yml) |
+|PP-YOLOE-s| PPVehicle9cls | 9 | 35.3 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_s_36e_ppvehicle9cls.pdparams) | [配置文件](./mot_ppyoloe_s_36e_ppvehicle9cls.yml) |
+|PP-YOLOE-l| PPVehicle | 1 | 63.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle.pdparams) | [配置文件](./mot_ppyoloe_l_36e_ppvehicle.yml) |
+|PP-YOLOE-s| PPVehicle | 1 | 61.3 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_s_36e_ppvehicle.pdparams) | [配置文件](./mot_ppyoloe_s_36e_ppvehicle.yml) |
+|PP-YOLOE+_t-aux(320)| PPVehicle | 1 | 53.5 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.pdparams) | [配置文件](./ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.yml) |
+
+
+**注意:**
+- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)** 调整学习率。
+- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)。
+- 如需预测出对应类别,可自行修改和添加对应的label_list.txt文件(一行记录一个对应种类),TestDataset中的anno_path为绝对路径,如:
+```
+TestDataset:
+  !ImageFolder
+    anno_path: label_list.txt # 如不使用dataset_dir,则anno_path即为相对于PaddleDetection主目录的相对路径
+    # dataset_dir: dataset/ppvehicle # 如使用dataset_dir,则dataset_dir/anno_path作为新的anno_path
+```
+label_list.txt里的一行记录一个对应种类,如下所示:
+```
+vehicle
+```
+
+## YOLOv3 Vehicle 检测模型
+
+请参考[Vehicle_YOLOv3页面](./vehicle_yolov3/README_cn.md)
+
+## PP-OCRv3 车牌识别模型
+
+车牌识别采用Paddle自研超轻量级模型PP-OCRv3_det、PP-OCRv3_rec,在[CCPD数据集](https://github.com/detectRecog/CCPD)(CCPD2019+CCPD2020车牌数据集)上进行了fine-tune。模型训练基于[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/applications/%E8%BD%BB%E9%87%8F%E7%BA%A7%E8%BD%A6%E7%89%8C%E8%AF%86%E5%88%AB.md)完成,我们提供了预测模型下载:
+
+| 模型 | 数据集 | 精度 | 下载 | 配置文件 |
+|:---------|:-------:|:------:| :----: | :------:|
+| PP-OCRv3_det | CCPD组合数据集 | hmean:0.979 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz) | [配置文件](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml) |
+| PP-OCRv3_rec | CCPD组合数据集 | acc:0.773 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | [配置文件](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) |
+
+## PP-LCNet 车牌属性模型
+
+车牌属性采用Paddle自研超轻量级模型PP-LCNet,在[VeRi数据集](https://www.v7labs.com/open-datasets/veri-dataset)上进行训练。模型训练基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/PULC/PULC_vehicle_attribute_en.md)完成,我们提供了预测模型下载:
+
+| 模型 | 数据集 | 精度 | 下载 | 配置文件 |
+|:---------|:-------:|:------:| :----: | :------:|
+| PP-LCNet_x1_0 | VeRi数据集 | 90.81 
|[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | [配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml) | + + +## 引用 +``` +@InProceedings{bdd100k, + author = {Yu, Fisher and Chen, Haofeng and Wang, Xin and Xian, Wenqi and Chen, + Yingying and Liu, Fangchen and Madhavan, Vashisht and Darrell, Trevor}, + title = {BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning}, + booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + month = {June}, + year = {2020} +} + +@article{CVIU_UA-DETRAC, + author = {Longyin Wen and Dawei Du and Zhaowei Cai and Zhen Lei and Ming{-}Ching Chang and + Honggang Qi and Jongwoo Lim and Ming{-}Hsuan Yang and Siwei Lyu}, + title = {{UA-DETRAC:} {A} New Benchmark and Protocol for Multi-Object Detection and Tracking}, + journal = {Computer Vision and Image Understanding}, + year = {2020} +} +``` diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle.yml b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..61df2fcc4b55820d6dca9e4f57ecc1fc02484777 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle.yml @@ -0,0 +1,57 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_l_36e_ppvehicle/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle9cls.yml b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle9cls.yml new file mode 100644 index 0000000000000000000000000000000000000000..4cd73b7e244a47fcdb3f64663df8995e1dde3e55 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle9cls.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_l_36e_ppvehicle9cls/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 9 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all_9cls.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle.yml b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..f4f384584c12ae0eff897ebc0fb7f233463ea708 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle.yml @@ -0,0 +1,57 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_s_36e_ppvehicle/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle9cls.yml b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle9cls.yml new file mode 100644 index 0000000000000000000000000000000000000000..653ff1a75822f965bfb0a8134f5fa78a309d52b9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle9cls.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_s_36e_ppvehicle9cls/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +num_classes: 9 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all_9cls.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kdet.yml b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kdet.yml new file mode 100644 index 0000000000000000000000000000000000000000..921d8b33f17a2a6850cf292769bf51b00a7b1d92 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kdet.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_bdd100kdet/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 10 + +TrainDataset: + !COCODataSet + image_dir: images/100k/train + anno_path: labels/det_20/det_train_cocofmt.json + dataset_dir: dataset/bdd100k + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/100k/val + anno_path: labels/det_20/det_val_cocofmt.json + dataset_dir: dataset/bdd100k + +TestDataset: + !ImageFolder + anno_path: labels/det_20/det_val_cocofmt.json + dataset_dir: dataset/bdd100k + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kmot.yml b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kmot.yml new file mode 100644 index 0000000000000000000000000000000000000000..b9d32be10d6cb415f22257bd778aab412420fa8a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kmot.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_bdd100kmot/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 8 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/bdd100k + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/bdd100k + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/bdd100k + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_uadetrac.yml b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_uadetrac.yml new file mode 100644 index 0000000000000000000000000000000000000000..5f3dd59cd9ef9d0c5e2947608f264187f433983c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_crn_l_36e_uadetrac.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_uadetrac/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 4 + +TrainDataset: + !COCODataSet + image_dir: train + anno_path: annotations/train.json + dataset_dir: dataset/uadetrac + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val + anno_path: annotations/test.json + dataset_dir: dataset/uadetrac + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/uadetrac + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+      epochs: 1
+
+PPYOLOEHead:
+  static_assigner_epoch: -1
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.6
diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.yml b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7ed888d7a4e0fe04fe1c68f5e8282457506597bc
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.yml
@@ -0,0 +1,61 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '../ppyoloe/_base_/optimizer_300e.yml',
+  '../ppyoloe/_base_/ppyoloe_plus_crn_tiny_auxhead.yml',
+  '../ppyoloe/_base_/ppyoloe_plus_reader_320.yml',
+]
+
+log_iter: 100
+snapshot_epoch: 4
+weights: output/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle/model_final
+
+pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_300e_coco.pdparams # 640*640 COCO mAP 39.7
+depth_mult: 0.33
+width_mult: 0.375
+
+
+num_classes: 1
+TrainDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: annotations/train_all.json
+    dataset_dir: dataset/ppvehicle
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+    allow_empty: true
+
+EvalDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: annotations/val_all.json
+    dataset_dir: dataset/ppvehicle
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/val_all.json
+    dataset_dir: dataset/ppvehicle
+
+
+TrainReader:
+  batch_size: 8
+
+
+epoch: 60
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+    - !CosineDecay
+      max_epochs: 72
+    - !LinearWarmup
+      start_factor: 0.
+      epochs: 1
+
+
+PPYOLOEHead:
+  static_assigner_epoch: -1
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 300
+    score_threshold: 0.01
+    nms_threshold: 0.7
diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/README.md b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..464472ff164404f2cd601adffcc7163ca34ae894
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/README.md
@@ -0,0 +1,53 @@
+English | [简体中文](README_cn.md)
+# PaddleDetection applied for specific scenarios
+
+We provide models implemented with PaddlePaddle for detecting objects in specific scenarios; users can download them and use them in these scenarios directly.
+
+| Task | Algorithm | Box AP | Download | Configs |
+|:---------------------|:---------:|:------:| :-------------------------------------------------------------------------------------: |:------:|
+| Vehicle Detection |  YOLOv3  |  54.5  | [model](https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams) | [config](./vehicle_yolov3_darknet.yml) |
+
+## Vehicle Detection
+
+One of the major applications of vehicle detection is traffic monitoring. In this scenario, the vehicles to be detected are mostly captured by cameras mounted on top of traffic light columns.
+
+### 1. Network
+
+The network used for vehicle detection is YOLOv3, with Darknet53 as its backbone.
+
+### 2. 
Configuration for training
+
+PaddleDetection provides a configuration file [yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) for training YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for vehicle detection:
+
+* num_classes: 6
+* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
+* nms/nms_top_k: 400
+* nms/score_threshold: 0.005
+* dataset_dir: dataset/vehicle
+
+### 3. Accuracy
+
+The accuracy of the model, trained and evaluated on our private data, is as follows:
+
+AP at IoU=.50:.05:.95 is 0.545.
+
+AP at IoU=.50 is 0.764.
+
+### 4. Inference
+
+Users can run inference with the trained model:
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python -u tools/infer.py -c configs/ppvehicle/vehicle_yolov3/vehicle_yolov3_darknet.yml \
+                         -o weights=https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams \
+                         --infer_dir configs/ppvehicle/vehicle_yolov3/demo \
+                         --draw_threshold 0.2 \
+                         --output_dir configs/ppvehicle/vehicle_yolov3/demo/output
+```
+
+Some inference results are visualized below:
+
+![](../../../docs/images/VehicleDetection_001.jpeg)
+
+![](../../../docs/images/VehicleDetection_005.png)
diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/README_cn.md b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..94657c866e5904c4729047bf9868ad2805015fe8
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/README_cn.md
@@ -0,0 +1,54 @@
+[English](README.md) | 简体中文
+# 特色垂类检测模型
+
+我们提供了针对不同场景的基于PaddlePaddle的检测模型,用户可以下载模型进行使用。
+
+| 任务 | 算法 | 精度(Box AP) | 下载 | 配置文件 |
+|:---------------------|:---------:|:------:| :---------------------------------------------------------------------------------: | :------:|
+| 车辆检测 |  YOLOv3  |  54.5  | [下载链接](https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams) | [配置文件](./vehicle_yolov3_darknet.yml) |
+
+
+## 车辆检测(Vehicle Detection)
+
+车辆检测的主要应用之一是交通监控。在这样的监控场景中,待检测的车辆多为道路红绿灯柱上的摄像头拍摄所得。
+
+### 1. 模型结构
+
+Backbone为Darknet53的YOLOv3。
+
+### 2. 训练参数配置
+
+PaddleDetection提供了使用COCO数据集对YOLOv3进行训练的参数配置文件[yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml),与之相比,在进行车辆检测的模型训练时,我们对以下参数进行了修改:
+
+* num_classes: 6
+* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
+* nms/nms_top_k: 400
+* nms/score_threshold: 0.005
+* dataset_dir: dataset/vehicle
+
+### 3. 精度指标
+
+模型在我们内部数据上的精度指标为:
+
+IOU=.50:.05:.95时的AP为 0.545。
+
+IOU=.5时的AP为 0.764。
+
+### 4. 
预测 + +用户可以使用我们训练好的模型进行车辆检测: + +``` +export CUDA_VISIBLE_DEVICES=0 +python -u tools/infer.py -c configs/ppvehicle/vehicle_yolov3/vehicle_yolov3_darknet.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams \ + --infer_dir configs/ppvehicle/vehicle_yolov3/demo \ + --draw_threshold 0.2 \ + --output_dir configs/ppvehicle/vehicle_yolov3/demo/output +``` + +预测结果示例: + +![](../../../docs/images/VehicleDetection_001.jpeg) + +![](../../../docs/images/VehicleDetection_005.png) diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/001.jpeg b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/001.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..8786db5eb6773931c363358bb39462b33db55369 Binary files /dev/null and b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/001.jpeg differ diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/003.png b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/003.png new file mode 100644 index 0000000000000000000000000000000000000000..c01ab4ce769fb3b1c8863093a35d27da0ab10efd Binary files /dev/null and b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/003.png differ diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/004.png b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/004.png new file mode 100644 index 0000000000000000000000000000000000000000..8907eb8d4d9b82e08ca214509c9fb41ca889db2a Binary files /dev/null and b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/004.png differ diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/005.png b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/005.png new file mode 100644 index 0000000000000000000000000000000000000000..bf17712809c2fe6fa8e7d4f093ec4ac94523537c Binary files /dev/null and b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/demo/005.png differ diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/vehicle.json b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/vehicle.json new file mode 100644 index 0000000000000000000000000000000000000000..5863a9a8c9e0d8b4daeff31e7fe7869e084d3fb4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/vehicle.json @@ -0,0 +1,36 @@ +{ + "images": [], + "annotations": [], + "categories": [ + { + "supercategory": "component", + "id": 1, + "name": "car" + }, + { + "supercategory": "component", + "id": 2, + "name": "truck" + }, + { + "supercategory": "component", + "id": 3, + "name": "bus" + }, + { + "supercategory": "component", + "id": 4, + "name": "motorbike" + }, + { + "supercategory": "component", + "id": 5, + "name": "tricycle" + }, + { + "supercategory": "component", + "id": 6, + "name": "carplate" + } + ] +} diff --git a/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/vehicle_yolov3_darknet.yml b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/vehicle_yolov3_darknet.yml new file mode 100644 index 0000000000000000000000000000000000000000..feb32952b00265cbac4ba0c3f17a72862b12c4e9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppvehicle/vehicle_yolov3/vehicle_yolov3_darknet.yml @@ -0,0 +1,42 @@ +_BASE_: [ + '../../datasets/coco_detection.yml', + '../../runtime.yml', + '../../yolov3/_base_/optimizer_270e.yml', + '../../yolov3/_base_/yolov3_darknet53.yml', + '../../yolov3/_base_/yolov3_reader.yml', +] + 
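+# This config inherits the YOLOv3-Darknet53 COCO training setup from the files in
+# _BASE_ above and, as described in the README, only overrides the vehicle-specific
+# settings: the custom anchors, the NMS parameters (nms_top_k, score_threshold),
+# num_classes: 6, and the dataset paths under dataset/vehicle.
+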
+snapshot_epoch: 5
+weights: https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams
+
+YOLOv3Head:
+  anchors: [[8, 9], [10, 23], [19, 15],
+            [23, 33], [40, 25], [54, 50],
+            [101, 80], [139, 145], [253, 224]]
+
+BBoxPostProcess:
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 100
+    score_threshold: 0.005
+    nms_threshold: 0.45
+    nms_top_k: 400
+
+num_classes: 6
+
+TrainDataset:
+  !COCODataSet
+    dataset_dir: dataset/vehicle
+    anno_path: annotations/instances_train2017.json
+    image_dir: train2017
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    dataset_dir: dataset/vehicle
+    anno_path: annotations/instances_val2017.json
+    image_dir: val2017
+
+TestDataset:
+  !ImageFolder
+    anno_path: configs/ppvehicle/vehicle_yolov3/vehicle.json
diff --git a/PaddleDetection-release-2.6/configs/ppyolo/README.md b/PaddleDetection-release-2.6/configs/ppyolo/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a50b5f9deb4ecc4784d8fe4b65c071940ac14063
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ppyolo/README.md
@@ -0,0 +1,240 @@
+English | [简体中文](README_cn.md)
+
+# PP-YOLO
+
+## Table of Contents
+- [Introduction](#Introduction)
+- [Model Zoo](#Model_Zoo)
+- [Getting Started](#Getting_Started)
+- [Future Work](#Future_Work)
+- [Appendix](#Appendix)
+
+## Introduction
+
+[PP-YOLO](https://arxiv.org/abs/2007.12099) is an optimized model based on YOLOv3 in PaddleDetection, whose performance (mAP on COCO) and inference speed are better than [YOLOv4](https://arxiv.org/abs/2004.10934). PaddlePaddle 2.0.2 (available on pip now) or a [Daily Version](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-develop) is required to run PP-YOLO.
+
+PP-YOLO reaches an mAP (IoU=0.5:0.95) of 45.9% on the COCO test-dev2017 dataset; its FP32 inference speed on a single V100 is 72.9 FPS, and its FP16 inference speed with TensorRT on a single V100 is 155.6 FPS.
+
+PP-YOLO and PP-YOLOv2 improve the performance and speed of YOLOv3 with the following methods:
+
+- Better backbone: ResNet50vd-DCN
+- Larger training batch size: 8 GPUs with a mini-batch size of 24 on each GPU
+- [Drop Block](https://arxiv.org/abs/1810.12890)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
+- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
+- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
+- [CoordConv](https://arxiv.org/abs/1807.03247)
+- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
+- Better ImageNet pretrain weights
+- [PAN](https://arxiv.org/abs/1803.01534)
+- IoU Aware Loss
+- Larger input size
+
+## Model Zoo
+
+### PP-YOLO
+
+| Model | GPU number | images/GPU | backbone | input shape | Box APval | Box APtest | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config |
+|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.2 | 29.5 | 357.1 | 657.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml) |
+
+
+**Notes:**
+
+- PP-YOLO is trained on the COCO train2017 dataset and evaluated on the val2017 & test-dev2017 datasets; Box APtest is the `mAP(IoU=0.5:0.95)` evaluation result.
+- PP-YOLO is trained with 8 GPUs and a mini-batch size of 24 on each GPU. If the GPU number or mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/FAQ).
+- PP-YOLO inference speed is tested on a single Tesla V100 with batch size 1, CUDA 10.2 and CUDNN 7.5.1; TensorRT 5.1.2.2 is used for the TensorRT mode tests.
+- PP-YOLO FP32 inference speed is measured with the inference model exported by `tools/export_model.py` and benchmarked by running `deploy/python/infer.py` with `--run_benchmark`. All results exclude the time cost of data reading and post-processing (NMS), which is the same testing method as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet).
+- TensorRT FP16 inference speed testing additionally excludes the time cost of the bounding-box decoding (`yolo_box`) part compared with the FP32 testing above, i.e. data reading, bounding-box decoding and post-processing (NMS) are all excluded (again the same testing method as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)).
+- If you set `--run_benchmark=True`, you should first install these dependencies: `pip install pynvml psutil GPUtil`.
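+
+As a reference, the FP32 benchmark workflow described in the notes above can be reproduced end to end with the commands below. This is a minimal sketch: it assumes the default export output directory `output_inference/` and the demo image shipped with the repository.
+
+```bash
+# install the dependencies required by --run_benchmark
+pip install pynvml psutil GPUtil
+
+# export the inference model with the released weights
+python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml \
+    -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# benchmark with the Paddle inference library; reported times exclude
+# data reading and post-processing (NMS), as noted above
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py \
+    --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco \
+    --image_file=demo/000000014439_640x640.jpg \
+    --device=GPU --run_benchmark=True
+```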
+
+### PP-YOLO for mobile
+
+| Model | GPU number | images/GPU | Model Size | input shape | Box APval | Box AP50val | Kirin 990 1xCore(FPS) | download | config |
+|:----------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :--------------------: | :--------------------: | :------: | :------: |
+| PP-YOLO_MobileNetV3_large | 4 | 32 | 28MB | 320 | 23.2 | 42.6 | 14.1 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 16MB | 320 | 17.2 | 33.8 | 21.5 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_small_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_small_coco.yml) |
+
+**Notes:**
+
+- PP-YOLO_MobileNetV3 is trained on the COCO train2017 dataset and evaluated on the val2017 dataset. Box APval is the `mAP(IoU=0.5:0.95)` evaluation result, and Box AP50val is the `mAP(IoU=0.5)` evaluation result.
+- PP-YOLO_MobileNetV3 is trained with 4 GPUs and a mini-batch size of 32 on each GPU. If the GPU number or mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/FAQ).
+- PP-YOLO_MobileNetV3 inference speed is tested on Kirin 990 with 1 thread.
+
+### PP-YOLO tiny
+
+| Model | GPU number | images/GPU | Model Size | Post Quant Model Size | input shape | Box APval | Kirin 990 4xCore(FPS) | download | config | post quant model |
+|:----------------------------:|:-------:|:-------------:|:----------:| :-------------------: | :---------: | :------------------: | :-------------------: | :------: | :----: | :--------------: |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 320 | 20.6 | 92.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 416 | 22.7 | 65.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+
+**Notes:**
+
+- PP-YOLO-tiny is trained on the COCO train2017 dataset and evaluated on the val2017 dataset. Box APval is the `mAP(IoU=0.5:0.95)` evaluation result, and Box AP50val is the `mAP(IoU=0.5)` evaluation result.
+- PP-YOLO-tiny is trained with 8 GPUs and a mini-batch size of 32 on each GPU. If the GPU number or mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/FAQ/README.md).
+- PP-YOLO-tiny inference speed is tested on Kirin 990 with 4 threads on an ARMv8 architecture.
+- We also provide a post-quantized PP-YOLO-tiny inference model, which compresses the model size to **1.3MB** with nearly no impact on inference speed or accuracy.
+
+### PP-YOLO on Pascal VOC
+
+PP-YOLO models trained on the Pascal VOC dataset are as follows:
+
+| Model | GPU number | images/GPU | backbone | input shape | Box AP50val | download | config |
+|:------------------:|:----------:|:----------:|:----------:| :----------:| :--------------------: | :------: | :-----: |
+| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+
+## Getting Started
+
+### 1. Training
+
+Train PP-YOLO on 8 GPUs with the following command (all commands are assumed to be run from the PaddleDetection root directory by default):
+
+```bash
+python -m paddle.distributed.launch --log_dir=./ppyolo_dygraph/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml &>ppyolo_dygraph.log 2>&1 &
+```
+
+Optional: run `tools/anchor_cluster.py` to get anchors suitable for your dataset, and modify the anchor settings in the model configuration file and reader configuration file, such as `configs/ppyolo/_base_/ppyolo_tiny.yml` and `configs/ppyolo/_base_/ppyolo_tiny_reader.yml`.
+
+```bash
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -n 9 -s 320 -m v2 -i 1000
+```
+
+### 2. Evaluation
+
+Evaluate PP-YOLO on the COCO val2017 dataset on a single GPU with the following commands:
+
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# use saved checkpoint in training
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+For evaluation on the COCO test-dev2017 dataset, `configs/ppyolo/ppyolo_test.yml` should be used. Please download the COCO test-dev2017 dataset from the [COCO dataset download](https://cocodataset.org/#download) page, decompress it to the paths configured by `EvalReader.dataset` in `configs/ppyolo/ppyolo_test.yml`, and run the evaluation with the following command:
+
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams

+# use saved checkpoint in training
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+Evaluation results will be saved in `bbox.json`; compress it into a `zip` package and upload it to the [COCO dataset evaluation](https://competitions.codalab.org/competitions/20794#participate) server to evaluate.
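+
+For example, the result file can be packaged as follows (a minimal sketch; the archive name `bbox.zip` is only illustrative):
+
+```bash
+# package the prediction results for upload to the evaluation server
+zip bbox.zip bbox.json
+```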
+
+**NOTE 1:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on the COCO test-dev2017 dataset; it cannot be used for training or for evaluation on the COCO val2017 dataset.
+
+**NOTE 2:** Due to the overall upgrade of the dynamic graph framework, the following weights published by PaddleDetection need to be evaluated with the `--bias` flag added, for example:
+
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --bias
+```
+These models are:
+
+1. ppyolo_r50vd_dcn_1x_coco
+
+2. ppyolo_r50vd_dcn_voc
+
+3. ppyolo_r18vd_coco
+
+4. ppyolo_mbv3_large_coco
+
+5. ppyolo_mbv3_small_coco
+
+6. ppyolo_tiny_650e_coco
+
+### 3. Inference
+
+Run inference on images on a single GPU with the following commands; use `--infer_img` to infer a single image and `--infer_dir` to infer all images in a directory.
+
+```bash
+# inference on a single image
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439_640x640.jpg
+
+# inference on all images in the directory
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=demo
+```
+
+### 4. Inference deployment
+
+For inference deployment or benchmarking, the model exported with `tools/export_model.py` should be used; run inference with the Paddle inference library using the following commands:
+
+```bash
+# export model; it will be saved in output/ppyolo by default
+python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# inference with the Paddle Inference library
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --device=GPU
+```
+
+
+## Appendix
+
+The optimization methods and ablation experiments of PP-YOLO compared with YOLOv3 are listed below.
+
+| NO. | Model | Box APval | Box APtest | Params(M) | FLOPs(G) | V100 FP32 FPS |
+| :--: | :--------------------------- | :------------------: |:--------------------: | :-------: | :------: | :-----------: |
+| A | YOLOv3-DarkNet53 | 38.9 | - | 59.13 | 65.52 | 58.2 |
+| B | YOLOv3-ResNet50vd-DCN | 39.1 | - | 43.89 | 44.71 | 79.2 |
+| C | B + LB + EMA + DropBlock | 41.4 | - | 43.89 | 44.71 | 79.2 |
+| D | C + IoU Loss | 41.9 | - | 43.89 | 44.71 | 79.2 |
+| E | D + IoU Aware | 42.5 | - | 43.90 | 44.71 | 74.9 |
+| F | E + Grid Sensitive | 42.8 | - | 43.90 | 44.71 | 74.8 |
+| G | F + Matrix NMS | 43.5 | - | 43.90 | 44.71 | 74.8 |
+| H | G + CoordConv | 44.0 | - | 43.93 | 44.76 | 74.1 |
+| I | H + SPP | 44.3 | 45.2 | 44.93 | 45.12 | 72.9 |
+| J | I + Better ImageNet Pretrain | 44.8 | 45.2 | 44.93 | 45.12 | 72.9 |
+| K | J + 2x Scheduler | 45.3 | 45.9 | 44.93 | 45.12 | 72.9 |
+
+**Notes:**
+
+- Performance and inference speed are measured with an input shape of 608.
+- All models are trained on the COCO train2017 dataset and evaluated on the val2017 & test-dev2017 datasets; `Box AP` is the `mAP(IoU=0.5:0.95)` evaluation result.
+- Inference speed is tested on a single Tesla V100 with batch size 1, following the test method and environment configuration of the benchmark above.
+- [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) with an mAP of 38.9 is the optimized YOLOv3 model in PaddleDetection; see [YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/yolov3/README.md) for details.
+
+## Citation
+
+```
+@article{huang2021pp,
+  title={PP-YOLOv2: A Practical Object Detector},
+  author={Huang, Xin and Wang, Xinxin and Lv, Wenyu and Bai, Xiaying and Long, Xiang and Deng, Kaipeng and Dang, Qingqing and Han, Shumin and Liu, Qiwen and Hu, Xiaoguang and others},
+  journal={arXiv preprint arXiv:2104.10419},
+  year={2021}
+}
+@misc{long2020ppyolo,
+title={PP-YOLO: An Effective and Efficient Implementation of Object Detector},
+author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen},
+year={2020},
+eprint={2007.12099},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+@misc{ppdet2019,
+title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
+author={PaddlePaddle Authors},
+howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
+year={2019}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/ppyolo/README_cn.md b/PaddleDetection-release-2.6/configs/ppyolo/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..5463f96eed11af300714175435a49730914f91cc
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ppyolo/README_cn.md
@@ -0,0 +1,233 @@
+简体中文 | [English](README.md)
+
+# PP-YOLO 模型
+
+## 内容
+- [简介](#简介)
+- [模型库与基线](#模型库与基线)
+- [使用说明](#使用说明)
+- [未来工作](#未来工作)
+- [附录](#附录)
+
+## 简介
+
+[PP-YOLO](https://arxiv.org/abs/2007.12099)是PaddleDetection优化和改进的YOLOv3的模型,其精度(COCO数据集mAP)和推理速度均优于[YOLOv4](https://arxiv.org/abs/2004.10934)模型,要求使用PaddlePaddle 2.0.2(可使用pip安装) 或适当的[develop版本](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-develop)。
+
+PP-YOLO在[COCO](http://cocodataset.org) test-dev2017数据集上精度达到45.9%,在单卡V100上FP32推理速度为72.9 FPS, V100上开启TensorRT下FP16推理速度为155.6 FPS。
+
+PP-YOLO和PP-YOLOv2从如下方面优化和提升YOLOv3模型的精度和速度:
+
+- 更优的骨干网络: ResNet50vd-DCN
+- 更大的训练batch size: 8 GPUs,每GPU batch_size=24,对应调整学习率和迭代轮数
+- [Drop Block](https://arxiv.org/abs/1810.12890)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
+- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
+- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
+- [CoordConv](https://arxiv.org/abs/1807.03247)
+- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
+- 更优的预训练模型
+- [PAN](https://arxiv.org/abs/1803.01534)
+- IoU Aware Loss
+- 更大的输入尺寸
+
+## 模型库
+
+### PP-YOLO模型
+
+| 模型 | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box APval | Box APtest | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 |
+|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.2 | 29.5 | 357.1 | 657.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml) |
+
+**注意:**
+
+- PP-YOLO模型使用COCO数据集中train2017作为训练集,使用val2017和test-dev2017作为测试集,Box APtest为`mAP(IoU=0.5:0.95)`评估结果。
+- PP-YOLO模型训练过程中使用8 GPUs,每GPU batch size为24进行训练,如训练GPU数和batch size不使用上述配置,须参考[FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/FAQ)调整学习率和迭代次数。
+- PP-YOLO模型推理速度测试采用单卡V100,batch size=1进行测试,使用CUDA 10.2, CUDNN 7.5.1,TensorRT推理速度测试使用TensorRT 5.1.2.2。
+- PP-YOLO模型FP32的推理速度测试数据为使用`tools/export_model.py`脚本导出模型后,使用`deploy/python/infer.py`脚本中的`--run_benchmark`参数使用Paddle预测库进行推理速度benchmark测试结果, 且测试的均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致)。
+- TensorRT FP16的速度测试相比于FP32去除了`yolo_box`(bbox解码)部分耗时,即不包含数据预处理,bbox解码和NMS(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致)。
+
+### PP-YOLO 轻量级模型
+
+| 模型 | GPU个数 | 每GPU图片个数 | 模型体积 | 输入尺寸 | Box APval | Box AP50val | Kirin 990 1xCore (FPS) | 模型下载 | 配置文件 |
+|:----------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :--------------------: | :--------------------: | :------: | :------: |
+| PP-YOLO_MobileNetV3_large | 4 | 32 | 28MB | 320 | 23.2 | 42.6 | 14.1 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 16MB | 320 | 17.2 | 33.8 | 21.5 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_small_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_small_coco.yml) |
+
+**注意:**
+
+- PP-YOLO_MobileNetV3 模型使用COCO数据集中train2017作为训练集,使用val2017作为测试集,Box APval为`mAP(IoU=0.5:0.95)`评估结果, Box AP50val为`mAP(IoU=0.5)`评估结果。
+- PP-YOLO_MobileNetV3 模型训练过程中使用4GPU,每GPU batch size为32进行训练,如训练GPU数和batch size不使用上述配置,须参考[FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/FAQ)调整学习率和迭代次数。
+- PP-YOLO_MobileNetV3 模型推理速度测试环境配置为麒麟990芯片单线程。
+
+### PP-YOLO tiny模型
+
+| 模型 | GPU 个数 | 每GPU图片个数 | 模型体积 | 后量化模型体积 | 输入尺寸 | Box APval | Kirin 990 4xCore (FPS) | 模型下载 | 配置文件 | 量化后模型 |
+|:----------------------------:|:----------:|:-------------:| :--------: | :------------: | :----------:| :------------------: | :--------------------: | :------: | :------: | :--------: |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 320 | 20.6 | 92.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [预测模型](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 416 | 22.7 | 65.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [预测模型](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+
+**注意:**
+
+- PP-YOLO-tiny 模型使用COCO数据集中train2017作为训练集,使用val2017作为测试集,Box APval为`mAP(IoU=0.5:0.95)`评估结果, Box AP50val为`mAP(IoU=0.5)`评估结果。
+- PP-YOLO-tiny 模型训练过程中使用8GPU,每GPU batch size为32进行训练,如训练GPU数和batch size不使用上述配置,须参考[FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/FAQ/README.md)调整学习率和迭代次数。
+- PP-YOLO-tiny 模型推理速度测试环境配置为麒麟990芯片4线程,arm8架构。
+- 我们也提供了PP-YOLO-tiny的后量化压缩模型,将模型体积压缩到**1.3M**,对精度和预测速度基本无影响。
+
+### Pascal VOC数据集上的PP-YOLO
+
+PP-YOLO在Pascal VOC数据集上训练的模型如下:
+
+| 模型 | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box AP50val | 模型下载 | 配置文件 |
+|:------------------:|:-------:|:-------------:|:----------:| :----------:| :--------------------: | :------: | :-----: |
+| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+
+## 使用说明
+
+### 1. 训练
+
+使用8GPU通过如下命令一键式启动训练(以下命令均默认在PaddleDetection根目录运行),可通过`--eval`参数开启训练中交替评估。
+
+```bash
+python -m paddle.distributed.launch --log_dir=./ppyolo_dygraph/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml &>ppyolo_dygraph.log 2>&1 &
+```
+
+可选:在训练之前使用`tools/anchor_cluster.py`得到适用于你的数据集的anchor,并注意修改模型配置文件和Reader配置文件中的anchor设置,如`configs/ppyolo/_base_/ppyolo_tiny.yml`和`configs/ppyolo/_base_/ppyolo_tiny_reader.yml`中的anchor设置。
+```bash
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -n 9 -s 320 -m v2 -i 1000
+```
+
+### 2. 
评估
+
+使用单GPU通过如下命令一键式评估模型在COCO val2017数据集上的效果:
+
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+我们提供了`configs/ppyolo/ppyolo_test.yml`用于评估COCO test-dev2017数据集的效果。评估前须先从[COCO数据集下载页](https://cocodataset.org/#download)下载test-dev2017数据集,解压到`configs/ppyolo/ppyolo_test.yml`中`EvalReader.dataset`配置的路径,并使用如下命令进行评估:
+
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+评估结果保存于`bbox.json`中,将其压缩为zip包后通过[COCO数据集评估页](https://competitions.codalab.org/competitions/20794#participate)提交评估。
+
+**注意1:** `configs/ppyolo/ppyolo_test.yml`仅用于评估COCO test-dev数据集,不用于训练和评估COCO val2017数据集。
+
+**注意2:** 由于动态图框架整体升级,以下几个PaddleDetection发布的权重模型评估时需要添加`--bias`字段,例如
+
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --bias
+```
+主要有:
+
+1. ppyolo_r50vd_dcn_1x_coco
+
+2. ppyolo_r50vd_dcn_voc
+
+3. ppyolo_r18vd_coco
+
+4. ppyolo_mbv3_large_coco
+
+5. ppyolo_mbv3_small_coco
+
+6. ppyolo_tiny_650e_coco
+
+### 3. 推理
+
+使用单GPU通过如下命令一键式推理图像,通过`--infer_img`指定图像路径,或通过`--infer_dir`指定目录并推理目录下所有图像:
+
+```bash
+# 推理单张图像
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439_640x640.jpg
+
+# 推理目录下所有图像
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=demo
+```
+
+### 4. 
推理部署 + +PP-YOLO模型部署及推理benchmark需要通过`tools/export_model.py`导出模型后使用Paddle预测库进行部署和推理,可通过如下命令一键式启动。 + +```bash +# 导出模型,默认存储于output/ppyolo目录 +python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams + +# 预测库推理 +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --device=GPU +``` + + +## 附录 + +PP-YOLO模型相对于YOLOv3模型优化项消融实验数据如下表所示。 + +| 序号 | 模型 | Box APval | Box APtest | 参数量(M) | FLOPs(G) | V100 FP32 FPS | +| :--: | :--------------------------- | :------------------: | :-------------------: | :-------: | :------: | :-----------: | +| A | YOLOv3-DarkNet53 | 38.9 | - | 59.13 | 65.52 | 58.2 | +| B | YOLOv3-ResNet50vd-DCN | 39.1 | - | 43.89 | 44.71 | 79.2 | +| C | B + LB + EMA + DropBlock | 41.4 | - | 43.89 | 44.71 | 79.2 | +| D | C + IoU Loss | 41.9 | - | 43.89 | 44.71 | 79.2 | +| E | D + IoU Aware | 42.5 | - | 43.90 | 44.71 | 74.9 | +| F | E + Grid Sensitive | 42.8 | - | 43.90 | 44.71 | 74.8 | +| G | F + Matrix NMS | 43.5 | - | 43.90 | 44.71 | 74.8 | +| H | G + CoordConv | 44.0 | - | 43.93 | 44.76 | 74.1 | +| I | H + SPP | 44.3 | 45.2 | 44.93 | 45.12 | 72.9 | +| J | I + Better ImageNet Pretrain | 44.8 | 45.2 | 44.93 | 45.12 | 72.9 | +| K | J + 2x Scheduler | 45.3 | 45.9 | 44.93 | 45.12 | 72.9 | + +**注意:** + +- 精度与推理速度数据均为使用输入图像尺寸为608的测试结果 +- Box AP为在COCO train2017数据集训练,val2017和test-dev2017数据集上评估`mAP(IoU=0.5:0.95)`数据 +- 推理速度为单卡V100上,batch size=1, 使用上述benchmark测试方法的测试结果,测试环境配置为CUDA 10.2,CUDNN 7.5.1 +- [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml)精度38.9为PaddleDetection优化后的YOLOv3模型,可参见[YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/yolov3/README.md) + +## 引用 + +``` +@article{huang2021pp, + title={PP-YOLOv2: A Practical Object Detector}, + author={Huang, Xin and Wang, Xinxin and Lv, Wenyu and Bai, Xiaying and Long, Xiang and Deng, Kaipeng and Dang, Qingqing and Han, Shumin and Liu, Qiwen and Hu, Xiaoguang and others}, + journal={arXiv preprint arXiv:2104.10419}, + year={2021} +} +@misc{long2020ppyolo, +title={PP-YOLO: An Effective and Efficient Implementation of Object Detector}, +author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen}, +year={2020}, +eprint={2007.12099}, +archivePrefix={arXiv}, +primaryClass={cs.CV} +} +@misc{ppdet2019, +title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.}, +author={PaddlePaddle Authors}, +howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}}, +year={2019} +} +``` diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..fe51b296c72e4c663bf4c611d80a1173ff69f6a9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_1x.yml @@ -0,0 +1,22 @@ +epoch: 405 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 243 + - 324 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_2x.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_2x.yml new file mode 100644 index 0000000000000000000000000000000000000000..c601a18601c7a0d8a79049cb0d1b9a87f41900f4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_2x.yml @@ -0,0 +1,22 @@ +epoch: 811 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 649 + - 730 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_365e.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_365e.yml new file mode 100644 index 0000000000000000000000000000000000000000..d834a4ce0547a77a236964f7dc6ce52c217be2d5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_365e.yml @@ -0,0 +1,21 @@ +epoch: 365 + +LearningRate: + base_lr: 0.005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 243 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_650e.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_650e.yml new file mode 100644 index 0000000000000000000000000000000000000000..79a1f98eacb86cf8ae8ac34ce0c1e601cce78322 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/optimizer_650e.yml @@ -0,0 +1,22 @@ +epoch: 650 + +LearningRate: + base_lr: 0.005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 430 + - 540 + - 610 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_mbv3_large.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_mbv3_large.yml new file mode 100644 index 0000000000000000000000000000000000000000..0faaa9a9a3bb1d94abe183ed385558852d0fbc20 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_mbv3_large.yml @@ -0,0 +1,56 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: MobileNetV3 + neck: PPYOLOFPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +MobileNetV3: + model_name: large + scale: 1. 
+ with_extra_blocks: false + extra_block_filters: [] + feature_maps: [13, 16] + +PPYOLOFPN: + in_channels: [160, 368] + coord_conv: true + conv_block_num: 0 + spp: true + drop_block: true + +YOLOv3Head: + anchors: [[11, 18], [34, 47], [51, 126], + [115, 71], [120, 195], [254, 235]] + anchor_masks: [[3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.5 + downsample: [32, 16] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MultiClassNMS + keep_top_k: 100 + nms_threshold: 0.45 + nms_top_k: 1000 + score_threshold: 0.005 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_mbv3_small.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_mbv3_small.yml new file mode 100644 index 0000000000000000000000000000000000000000..dda938298f2c1b65652405b808c6df14ed049c77 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_mbv3_small.yml @@ -0,0 +1,56 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_small_x1_0_ssld_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: MobileNetV3 + neck: PPYOLOFPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +MobileNetV3: + model_name: small + scale: 1. + with_extra_blocks: false + extra_block_filters: [] + feature_maps: [9, 12] + +PPYOLOFPN: + in_channels: [96, 304] + coord_conv: true + conv_block_num: 0 + spp: true + drop_block: true + +YOLOv3Head: + anchors: [[11, 18], [34, 47], [51, 126], + [115, 71], [120, 195], [254, 235]] + anchor_masks: [[3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.5 + downsample: [32, 16] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MultiClassNMS + keep_top_k: 100 + nms_threshold: 0.45 + nms_top_k: 1000 + score_threshold: 0.005 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_r18vd.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_r18vd.yml new file mode 100644 index 0000000000000000000000000000000000000000..56a34838574f277b4b43dd536449ee39b7c4e0c1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_r18vd.yml @@ -0,0 +1,57 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet18_vd_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: ResNet + neck: PPYOLOFPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 18 + variant: d + return_idx: [2, 3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOFPN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + conv_block_num: 0 + +YOLOv3Head: + anchor_masks: [[3, 4, 5], [0, 1, 2]] + anchors: [[10, 14], [23, 27], [37, 58], + [81, 82], [135, 169], [344, 319]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_r50vd_dcn.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_r50vd_dcn.yml new file mode 100644 index 0000000000000000000000000000000000000000..22cad952379161a58dd298b98c1ab36999dae28d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_r50vd_dcn.yml @@ -0,0 +1,66 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: ResNet + neck: PPYOLOFPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOFPN: + coord_conv: true + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_reader.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..1698539afc0b63bf002831a3a6cd0c63a1828db9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_reader.yml @@ -0,0 +1,42 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 24 + shuffle: true + drop_last: true + mixup_epoch: 25000 + 
use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_tiny.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_tiny.yml new file mode 100644 index 0000000000000000000000000000000000000000..d03e2bb86a494d07b785ede5bf93db7886fe40cc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_tiny.yml @@ -0,0 +1,55 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: MobileNetV3 + neck: PPYOLOTinyFPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +MobileNetV3: + model_name: large + scale: .5 + with_extra_blocks: false + extra_block_filters: [] + feature_maps: [7, 13, 16] + +PPYOLOTinyFPN: + detection_block_channels: [160, 128, 96] + spp: true + drop_block: true + +YOLOv3Head: + anchors: [[10, 15], [24, 36], [72, 42], + [35, 87], [102, 96], [60, 170], + [220, 125], [128, 222], [264, 266]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.5 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MultiClassNMS + keep_top_k: 100 + nms_threshold: 0.45 + nms_top_k: 1000 + score_threshold: 0.005 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_tiny_reader.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_tiny_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..14c8a7f5aab0fca7d9c5dfccce4d8b590c9ab2ef --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolo_tiny_reader.yml @@ -0,0 +1,42 @@ +worker_num: 4 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [192, 224, 256, 288, 320, 352, 384, 416, 448, 480, 512], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 15], [24, 36], [72, 42], [35, 87], [102, 96], [60, 170], [220, 125], [128, 222], [264, 266]], downsample_ratios: [32, 16, 8]} + batch_size: 32 + shuffle: true + drop_last: true + mixup_epoch: 500 + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: 
True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 320, 320] + sample_transforms: + - Decode: {} + - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml new file mode 100644 index 0000000000000000000000000000000000000000..6288adeed8a4b057261f98132456f71b724fc45d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml @@ -0,0 +1,65 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolov2_reader.yml b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolov2_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..f0dfd9f62207676c95988331a2d6ba8a07a0b2b1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/_base_/ppyolov2_reader.yml @@ -0,0 +1,42 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 12 + shuffle: true + drop_last: true + mixup_epoch: 25000 + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + 
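+    # fixed input shape used when exporting the inference model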
sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..01558786e5f75658a023883bca9c6accd3ef23a2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml @@ -0,0 +1,81 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_mbv3_large.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 10 +weights: output/ppyolo_mbv3_large_coco/model_final + +TrainReader: + inputs_def: + num_max_boxes: 90 + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: + target_size: [224, 256, 288, 320, 352, 384, 416, 448, 480, 512] + random_size: True + random_interp: True + keep_ratio: False + - NormalizeBox: {} + - PadBox: {num_max_boxes: 90} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: + anchor_masks: [[3, 4, 5], [0, 1, 2]] + anchors: [[11, 18], [34, 47], [51, 126], [115, 71], [120, 195], [254, 235]] + downsample_ratios: [32, 16] + iou_thresh: 0.25 + num_classes: 80 + batch_size: 32 + mixup_epoch: 200 + shuffle: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 320, 320] + sample_transforms: + - Decode: {} + - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +epoch: 270 + +LearningRate: + base_lr: 0.005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 162 + - 216 + - !LinearWarmup + start_factor: 0. 
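+    # linear warmup: the LR ramps from start_factor * base_lr up to base_lr over the steps below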
+ steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_mbv3_small_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_mbv3_small_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..53554c40ccba90cbb8019b23f2b7a64ce3c35bc7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_mbv3_small_coco.yml @@ -0,0 +1,81 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_mbv3_small.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 10 +weights: output/ppyolo_mbv3_small_coco/model_final + +TrainReader: + inputs_def: + num_max_boxes: 90 + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: + target_size: [224, 256, 288, 320, 352, 384, 416, 448, 480, 512] + random_size: True + random_interp: True + keep_ratio: False + - NormalizeBox: {} + - PadBox: {num_max_boxes: 90} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: + anchor_masks: [[3, 4, 5], [0, 1, 2]] + anchors: [[11, 18], [34, 47], [51, 126], [115, 71], [120, 195], [254, 235]] + downsample_ratios: [32, 16] + iou_thresh: 0.25 + num_classes: 80 + batch_size: 32 + mixup_epoch: 200 + shuffle: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 320, 320] + sample_transforms: + - Decode: {} + - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +epoch: 270 + +LearningRate: + base_lr: 0.005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 162 + - 216 + - !LinearWarmup + start_factor: 0. 
+ steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r18vd_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r18vd_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..311e3f16f7932bf493cddf21bcf05db9e8dd20cc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r18vd_coco.yml @@ -0,0 +1,81 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_r18vd.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 10 +weights: output/ppyolo_r18vd_coco/model_final + +TrainReader: + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: + target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] + random_size: True + random_interp: True + keep_ratio: False + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + is_scale: True + - Permute: {} + - Gt2YoloTarget: + anchor_masks: [[3, 4, 5], [0, 1, 2]] + anchors: [[10, 14], [23, 27], [37, 58], [81, 82], [135, 169], [344, 319]] + downsample_ratios: [32, 16] + + batch_size: 32 + mixup_epoch: 500 + shuffle: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [512, 512], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 512, 512] + sample_transforms: + - Decode: {} + - Resize: {target_size: [512, 512], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +epoch: 270 + +LearningRate: + base_lr: 0.004 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 162 + - 216 + - !LinearWarmup + start_factor: 0. 
+ steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..918f3401e79a34c6859d594603b322e833e263c0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_r50vd_dcn.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 16 +weights: output/ppyolo_r50vd_dcn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_minicoco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_minicoco.yml new file mode 100644 index 0000000000000000000000000000000000000000..87b976b99640dbf66c92ba5b1180a80e696ba195 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_minicoco.yml @@ -0,0 +1,44 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_r50vd_dcn.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 8 +use_ema: true +weights: output/ppyolo_r50vd_dcn_1x_minicoco/model_final + +TrainReader: + batch_size: 12 + +TrainDataset: + !COCODataSet + image_dir: train2017 + # refer to https://github.com/giddyyupp/coco-minitrain + anno_path: annotations/instances_minitrain2017.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +epoch: 192 + +LearningRate: + base_lr: 0.005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 153 + - 173 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..ac6531fe78ae85ec56fdaf6eed17b38dd807b805 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_r50vd_dcn.yml', + './_base_/optimizer_2x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 16 +weights: output/ppyolo_r50vd_dcn_2x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..5349d6b1ed381705218f32daf17bff92a233d89e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml @@ -0,0 +1,42 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + './_base_/ppyolo_r50vd_dcn.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 83 +weights: output/ppyolo_r50vd_dcn_voc/model_final + +TrainReader: + mixup_epoch: 350 + batch_size: 12 + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. 
+EvalReader: + collate_batch: false + +epoch: 583 + +LearningRate: + base_lr: 0.00333 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 466 + - 516 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_test.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_test.yml new file mode 100644 index 0000000000000000000000000000000000000000..7c2ca0b5355c73f964bc950d3ab2d42629c9d82b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_test.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_r50vd_dcn.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 16 + +EvalDataset: + !COCODataSet + image_dir: test2017 + anno_path: annotations/image_info_test-dev2017.json + dataset_dir: dataset/coco diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_tiny_650e_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_tiny_650e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..288a0eba8063864877762dfecf9b22373121fe2a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolo_tiny_650e_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_tiny.yml', + './_base_/optimizer_650e.yml', + './_base_/ppyolo_tiny_reader.yml', +] + +snapshot_epoch: 1 +weights: output/ppyolo_tiny_650e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0f1aee746e4fd58ed060c83213c3306aea57e83e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml @@ -0,0 +1,20 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolov2_r50vd_dcn.yml', + './_base_/optimizer_365e.yml', + './_base_/ppyolov2_reader.yml', +] + +snapshot_epoch: 8 +weights: output/ppyolov2_r101vd_dcn_365e_coco/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_ssld_pretrained.pdparams + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
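+
+# Note: relative to the _BASE_ config ppyolov2_r50vd_dcn.yml, this file mainly
+# swaps in the ResNet101_vd pretrained weights and redefines the ResNet block
+# with depth: 101; every other setting is inherited from the _BASE_ list.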
diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a5e1bc33560f882594156a6deb03798ea5553e7f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolov2_r50vd_dcn.yml', + './_base_/optimizer_365e.yml', + './_base_/ppyolov2_reader.yml', +] + +snapshot_epoch: 8 +weights: output/ppyolov2_r50vd_dcn_365e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r50vd_dcn_voc.yml b/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r50vd_dcn_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..cb4d3451a57d2363850fb697ff3c21ac50e6c648 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyolo/ppyolov2_r50vd_dcn_voc.yml @@ -0,0 +1,42 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + './_base_/ppyolov2_r50vd_dcn.yml', + './_base_/optimizer_365e.yml', + './_base_/ppyolov2_reader.yml', +] + +snapshot_epoch: 83 +weights: output/ppyolov2_r50vd_dcn_voc/model_final + +TrainReader: + mixup_epoch: 350 + batch_size: 12 + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. +EvalReader: + collate_batch: false + +epoch: 583 + +LearningRate: + base_lr: 0.00333 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 466 + - 516 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/README.md b/PaddleDetection-release-2.6/configs/ppyoloe/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1c90e8ad6915e70182e45fe8dff7ed6e7ff7ba5f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/README.md @@ -0,0 +1,317 @@ +English | [简体中文](README_cn.md) + +# PP-YOLOE + +## Latest News +- Release PP-YOLOE+ model: **(2022.08)** + - Pre training model using large-scale data set obj365 + - In the backbone, add the alpha parameter to the block branch + - Optimize the end-to-end inference speed and improve the training convergence speed + +## Legacy model +- Please refer to:[PP-YOLOE 2022.03](./README_legacy.md) for details + +## Table of Contents +- [Introduction](#Introduction) +- [Model Zoo](#Model-Zoo) +- [Getting Start](#Getting-Start) +- [Appendix](#Appendix) + +## Introduction +PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular YOLO models. PP-YOLOE has a series of models, named s/m/l/x, which are configured through width multiplier and depth multiplier. PP-YOLOE avoids using special operators, such as Deformable Convolution or Matrix NMS, to be deployed friendly on various hardware. For more details, please refer to our [report](https://arxiv.org/abs/2203.16250). + +
    + +PP-YOLOE+_l achieves 53.3 mAP on COCO test-dev2017 dataset with 78.1 FPS on Tesla V100. While using TensorRT FP16, PP-YOLOE+_l can be further accelerated to 149.2 FPS. PP-YOLOE+_s/m/x also have excellent accuracy and speed performance, which can be found in [Model Zoo](#Model-Zoo) + +PP-YOLOE is composed of following methods: +- Scalable backbone and neck +- [Task Alignment Learning](https://arxiv.org/abs/2108.07755) +- Efficient Task-aligned head with [DFL](https://arxiv.org/abs/2006.04388) and [VFL](https://arxiv.org/abs/2008.13367) +- [SiLU(Swish) activation function](https://arxiv.org/abs/1710.05941) + +## Model Zoo + +### Model Zoo on COCO + +| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box APval
0.5:0.95 | Box APtest
    0.5:0.95 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | +|:--------------:|:-----:|:-------:|:----------:|:----------:| :-------:|:--------------------------:|:---------------------------:|:---------:|:--------:|:---------------:| :---------------------: |:------------------------------------------------------------------------------------:|:-------------------------------------------:| +| PP-YOLOE+_s | 80 | 8 | 8 | cspresnet-s | 640 | 43.7 | 43.9 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_s_80e_coco.yml) | +| PP-YOLOE+_m | 80 | 8 | 8 | cspresnet-m | 640 | 49.8 | 50.0 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_m_80e_coco.yml) | +| PP-YOLOE+_l | 80 | 8 | 8 | cspresnet-l | 640 | 52.9 | 53.3 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_l_80e_coco.yml) | +| PP-YOLOE+_x | 80 | 8 | 8 | cspresnet-x | 640 | 54.7 | 54.9 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_x_80e_coco.yml) | + + +#### Tiny model + +| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box APval
0.5:0.95 | Box APtest
0.5:0.95 | Params(M) | FLOPs(G) | T4 TensorRT FP16(FPS) | download | config |
+|:--------:|:-----:|:----------:|:----------:|:----------:|:-----------:|:--------------------------:|:---------------------------:|:---------:|:--------:|:---------------------:| :------: |:--------:|
+| PP-YOLOE+_t-aux(640) | 300 | 8 | 8 | cspresnet-t | 640 | 39.7 | 56.4 | 4.85 | 19.15 | 344.8 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_300e_coco.yml) |
+| PP-YOLOE+_t-aux(640)-relu | 300 | 8 | 8 | cspresnet-t | 640 | 36.4 | 53.0 | 3.60 | 12.17 | 476.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_relu_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_relu_300e_coco.yml) |
+| PP-YOLOE+_t-aux(320) | 300 | 8 | 8 | cspresnet-t | 320 | 33.3 | 48.5 | 4.85 | 4.80 | 729.9 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_320_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_320_300e_coco.yml) |
+| PP-YOLOE+_t-aux(320)-relu | 300 | 8 | 8 | cspresnet-t | 320 | 29.5 | 43.7 | 3.60 | 3.04 | 984.8 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco.yml) |
+
+
+### Comprehensive Metrics
+| Model | Epoch | AP0.5:0.95 | AP0.5 | AP0.75 | APsmall | APmedium | APlarge | ARsmall | ARmedium | ARlarge |
+|:------------------------:|:-----:|:---------------:|:----------:|:------------:|:------------:| :-----------: |:------------:|:------------:|:-------------:|:------------:|
+| PP-YOLOE+_s | 80 | 43.7 | 60.6 | 47.9 | 26.5 | 47.5 | 59.0 | 46.7 | 71.4 | 81.7 |
+| PP-YOLOE+_m | 80 | 49.8 | 67.1 | 54.5 | 31.8 | 53.9 | 66.2 | 53.3 | 75.0 | 84.6 |
+| PP-YOLOE+_l | 80 | 52.9 | 70.1 | 57.9 | 35.2 | 57.5 | 69.1 | 56.0 | 77.9 | 86.9 |
+| PP-YOLOE+_x | 80 | 54.7 | 72.0 | 59.9 | 37.9 | 59.3 | 70.4 | 57.0 | 78.7 | 87.2 |
+
+
+### End-to-end Speed
+| Model | AP0.5:0.95 | TRT-FP32(fps) | TRT-FP16(fps) |
+|:-----------:|:---------------:|:-------------:|:-------------:|
+| PP-YOLOE+_s | 43.7 | 44.44 | 47.85 |
+| PP-YOLOE+_m | 49.8 | 39.06 | 43.86 |
+| PP-YOLOE+_l | 52.9 | 34.01 | 42.02 |
+| PP-YOLOE+_x | 54.7 | 26.88 | 36.76 |
+
+**Notes:**
+
+- PP-YOLOE is trained on the COCO train2017 dataset and evaluated on the val2017 & test-dev2017 datasets.
+- The model weights in the Comprehensive Metrics table are **the same as** those in the Model Zoo above, evaluated on **val2017**.
+- PP-YOLOE used 8 GPUs for mixed precision training; if the **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)** (see the worked example below).
+- PP-YOLOE inference speed is tested on a single Tesla V100 with batch size 1, **CUDA 10.2**, **CUDNN 7.6.5**, **TensorRT 6.0.1.8** in TensorRT mode.
+- Refer to [Speed testing](#Speed-testing) to reproduce the speed testing results of PP-YOLOE.
+- If you set `--run_benchmark=True`, first install these dependencies: `pip install pynvml psutil GPUtil`.
+- The end-to-end speed test includes pre-processing + inference + post-processing and NMS time, using an **Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz**, a **single Tesla V100**, **CUDA 11.2**, **CUDNN 8.2.0**, and **TensorRT 8.0.1.6**.
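+
+As a worked example of the learning-rate scaling rule above (the `scaled_lr` helper and the 4-GPU scenario below are illustrative, not part of PaddleDetection):
+
+```python
+# Worked example of the linear LR scaling rule quoted in the notes above.
+def scaled_lr(lr_default: float, bs_default: int, gpus_default: int,
+              bs_new: int, gpus_new: int) -> float:
+    """lr_new = lr_default * (bs_new * gpus_new) / (bs_default * gpus_default)."""
+    return lr_default * (bs_new * gpus_new) / (bs_default * gpus_default)
+
+# PP-YOLOE+ default: 8 GPUs x 8 images/GPU with base_lr 0.001 (optimizer_80e.yml).
+# Keeping 8 images/GPU but training on 4 GPUs halves the global batch size:
+print(scaled_lr(0.001, bs_default=8, gpus_default=8, bs_new=8, gpus_new=4))  # 0.0005
+```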
+ +### Model Zoo on Objects365 +| Model | Epoch | Machine number | GPU number | images/GPU | backbone | input shape | Box AP0.5 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | +|:---------------:|:-----:|:-----------:|:-----------:|:-----------:|:---------:|:----------:|:--------------:|:---------:|:---------:|:-------------:|:-----------------------:| :--------:|:--------:| +| PP-YOLOE+_s | 60 | 3 | 8 | 8 | cspresnet-s | 640 | 18.1 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams) | [config](./objects365/ppyoloe_plus_crn_s_60e_objects365.yml) | +| PP-YOLOE+_m | 60 | 4 | 8 | 8 | cspresnet-m | 640 | 25.0 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams) | [config](./objects365/ppyoloe_plus_crn_m_60e_objects365.yml) | +| PP-YOLOE+_l | 60 | 3 | 8 | 8 | cspresnet-l | 640 | 30.8 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams) | [config](./objects365/ppyoloe_plus_crn_l_60e_objects365.yml) | +| PP-YOLOE+_x | 60 | 4 | 8 | 8 | cspresnet-x | 640 | 32.7 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_x_obj365_pretrained.pdparams) | [config](./objects365/ppyoloe_plus_crn_x_60e_objects365.yml) | + + +**Notes:** +- The Details for multiple machine and multi-gpu training, see [DistributedTraining](../../docs/tutorials/DistributedTraining_en.md) + + +### Model Zoo on VOC + +| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box AP0.5 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | +|:---------------:|:-----:|:-----------:|:-----------:|:---------:|:----------:|:--------------:|:---------:|:---------:|:-------------:|:-----------------------:| :-------: |:--------:| +| PP-YOLOE+_s | 30 | 8 | 8 | cspresnet-s | 640 | 86.7 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_30e_voc.pdparams) | [config](./voc/ppyoloe_plus_crn_s_30e_voc.yml) | +| PP-YOLOE+_l | 30 | 8 | 8 | cspresnet-l | 640 | 89.0 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_30e_voc.pdparams) | [config](./voc/ppyoloe_plus_crn_l_30e_voc.yml) | + + +### Feature Models + +The PaddleDetection team provides configs and weights of various feature detection models based on PP-YOLOE, which users can download for use: + +|Scenarios | Related Datasets | Links| +| :--------: | :---------: | :------: | +|Pedestrian Detection | CrowdHuman | [pphuman](../pphuman) | +|Vehicle Detection | BDD100K, UA-DETRAC | [ppvehicle](../ppvehicle) | +|Small Object Detection | VisDrone、DOTA、xView | [smalldet](../smalldet) | +|Densely Packed Object Detection | SKU110k | [application](./application) | +|Rotated Object Detection | DOTA | [PP-YOLOE-R](../rotate/ppyoloe_r/) | + + +## Getting Start + +### Datasets and Metrics + +PaddleDetection team provides **COCO and VOC dataset** , decompress and place it under `PaddleDetection/dataset/`: + +``` +wget https://bj.bcebos.com/v1/paddledet/data/coco.tar +# tar -xvf coco.tar + +wget https://bj.bcebos.com/v1/paddledet/data/voc.zip +# unzip voc.zip +``` + +**Note:** + - For the format of COCO style dataset, please refer to [format-data](https://cocodataset.org/#format-data) and 
[format-results](https://cocodataset.org/#format-results). + - For the evaluation metric of COCO, please refer to [detection-eval](https://cocodataset.org/#detection-eval), and install [cocoapi](https://github.com/cocodataset/cocoapi) at first. + - For the evaluation metric of VOC, please refer to [VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html). + +### Custom dataset + +1.For the annotation of custom dataset, please refer to [DetAnnoTools](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/tutorials/data/DetAnnoTools_en.md); + +2.For training preparation of custom dataset,please refer to [PrepareDataSet](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/tutorials/data/PrepareDetDataSet_en.md). + + +### Training + +Training PP-YOLOE+ on 8 GPUs with following command + +```bash +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --eval --amp +``` + +**Notes:** +- If you need to evaluate while training, please add `--eval`. +- PP-YOLOE+ supports mixed precision training, please add `--amp`. +- PaddleDetection supports multi-machine distributed training, you can refer to [DistributedTraining tutorial](../../docs/tutorials/DistributedTraining_en.md). + + +### Evaluation + +Evaluating PP-YOLOE+ on COCO val2017 dataset in single GPU with following commands: + +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams +``` + +For evaluation on COCO test-dev2017 dataset, please download COCO test-dev2017 dataset from [COCO dataset download](https://cocodataset.org/#download) and decompress to COCO dataset directory and configure `EvalDataset` like `configs/ppyolo/ppyolo_test.yml`. + +### Inference + +Inference images in single GPU with following commands, use `--infer_img` to inference a single image and `--infer_dir` to inference all images in the directory. + +```bash +# inference single image +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams --infer_img=demo/000000014439_640x640.jpg + +# inference all images in the directory +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams --infer_dir=demo +``` + +### Exporting models + +For deployment on GPU or speed testing, model should be first exported to inference model using `tools/export_model.py`. + +**Exporting PP-YOLOE+ for Paddle Inference without TensorRT**, use following command + +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams +``` + +**Exporting PP-YOLOE+ for Paddle Inference with TensorRT** for better performance, use following command with extra `-o trt=True` setting. + +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True +``` + +If you want to export PP-YOLOE model to **ONNX format**, use following command refer to [PaddleDetection Model Export as ONNX Format Tutorial](../../deploy/EXPORT_ONNX_MODEL_en.md). 
+
+```bash
+# export inference model
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True
+
+# install paddle2onnx
+pip install paddle2onnx
+
+# convert to onnx
+paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_l_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_plus_crn_l_80e_coco.onnx
+
+```
+
+**Note:** the exported ONNX model only supports batch_size=1 for now.
+
+### Speed testing
+
+For a fair comparison, the speeds in [Model Zoo](#Model-Zoo) do not contain the time cost of data reading and post-processing (NMS), which is the same testing method as [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet). Thus, you should export the model with the extra `-o exclude_nms=True` setting.
+
+**Using Paddle Inference without TensorRT** to test speed, run the following command:
+
+```bash
+# export inference model
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams exclude_nms=True
+
+# speed testing with run_benchmark=True
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True
+```
+
+**Using Paddle Inference with TensorRT** to test speed, run the following command:
+
+```bash
+# export inference model with trt=True
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams exclude_nms=True trt=True
+
+# speed testing with run_benchmark=True, run_mode=trt_fp32/trt_fp16
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=trt_fp16 --device=gpu --run_benchmark=True
+
+```
+
+**Using TensorRT Inference with ONNX** to test speed, run the following command:
+
+```bash
+# export inference model with trt=True
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams exclude_nms=True trt=True
+
+# convert to onnx
+paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_s_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_crn_s_80e_coco.onnx
+
+# trt inference using fp16 and batch_size=1
+trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
+
+# trt inference using fp16 and batch_size=32
+trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16
+
+# With the above script on a T4 machine with TensorRT 7.2, the PP-YOLOE+_s model runs at:
+
+# batch_size=1, 2.80ms, 357fps
+# batch_size=32, 67.69ms, 472fps
+
+```
+
+
+### Deployment
+
+PP-YOLOE can be deployed by the following approaches:
+ - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp)
+ - [Paddle-TensorRT](../../deploy/TENSOR_RT.md)
+ - [PaddleServing](https://github.com/PaddlePaddle/Serving)
+ - [PaddleSlim](../slim)
+
+Next, we will introduce how to use Paddle
Inference to deploy PP-YOLOE models in TensorRT FP16 mode.
+
+First, refer to [Paddle Inference Docs](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python), and download and install the packages corresponding to your CUDA, CUDNN and TensorRT versions.
+
+Then, export PP-YOLOE for Paddle Inference **with TensorRT** using the following command.
+
+```bash
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True
+```
+
+Finally, run inference in TensorRT FP16 mode.
+
+```bash
+# inference single image
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16
+
+# inference all images in the directory
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_dir=demo/ --device=gpu --run_mode=trt_fp16
+
+```
+
+**Notes:**
+- TensorRT will perform optimization for the current hardware platform according to the definition of the network, generate an inference engine and serialize it into a file. This inference engine is only applicable to the current hardware platform. If your hardware and software platform has not changed, you can set `use_static=True` in [enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L660). In this way, the generated serialized file will be saved in the `output_inference` folder and loaded the next time TensorRT is executed.
+- PaddleDetection release/2.4 and later versions will support NMS calling TensorRT, which requires PaddlePaddle release/2.3 and later versions.
+
+### Other Datasets
+
+Model | AP | AP50
+---|---|---
+[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) | 22.6 | 37.5
+[YOLOv5](https://github.com/ultralytics/yolov5) | 26.0 | 42.7
+**PP-YOLOE** | **30.5** | **46.4**
+
+**Notes**
+- Here, we use the [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) dataset and detect 9 classes: `person, bicycles, car, van, truck, tricycle, awning-tricycle, bus, motor`.
+- The models above were trained with the official default configs and loaded parameters pretrained on the COCO dataset.
+- *Due to limited time, more verification results will be supplemented in the future. You are also welcome to contribute to PP-YOLOE.*
+
+
+## Appendix
+
+Ablation experiments of PP-YOLOE.
+
+| NO.
| Model | Box APval | Params(M) | FLOPs(G) | V100 FP32 FPS | +| :--: | :---------------------------: | :------------------: | :-------: | :------: | :-----------: | +| A | PP-YOLOv2 | 49.1 | 54.58 | 115.77 | 68.9 | +| B | A + Anchor-free | 48.8 | 54.27 | 114.78 | 69.8 | +| C | B + CSPRepResNet | 49.5 | 47.42 | 101.87 | 85.5 | +| D | C + TAL | 50.4 | 48.32 | 104.75 | 84.0 | +| E | D + ET-Head | 50.9 | 52.20 | 110.07 | 78.1 | diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/README_cn.md b/PaddleDetection-release-2.6/configs/ppyoloe/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..6f0288d126def4891363a5e3d51c76b51073135b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/README_cn.md @@ -0,0 +1,314 @@ +简体中文 | [English](README.md) + +# PP-YOLOE + +## 最新动态 +- 发布PP-YOLOE+模型: **(2022.08)** + - 使用大规模数据集obj365预训练模型 + - 在backbone中block分支中增加alpha参数 + - 优化端到端推理速度,提升训练收敛速度 + +## 历史版本模型 +- 详情请参考:[PP-YOLOE 2022.03版本](./README_legacy.md) + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [使用说明](#使用说明) +- [附录](#附录) + +## 简介 +PP-YOLOE是基于PP-YOLOv2的卓越的单阶段Anchor-free模型,超越了多种流行的YOLO模型。PP-YOLOE有一系列的模型,即s/m/l/x,可以通过width multiplier和depth multiplier配置。PP-YOLOE避免了使用诸如Deformable Convolution或者Matrix NMS之类的特殊算子,以使其能轻松地部署在多种多样的硬件上。更多细节可以参考我们的[report](https://arxiv.org/abs/2203.16250)。 + +
    + +PP-YOLOE+_l在COCO test-dev2017达到了53.3的mAP, 同时其速度在Tesla V100上达到了78.1 FPS。PP-YOLOE+_s/m/x同样具有卓越的精度速度性价比, 其精度速度可以在[模型库](#模型库)中找到。 + +PP-YOLOE由以下方法组成 +- 可扩展的backbone和neck +- [Task Alignment Learning](https://arxiv.org/abs/2108.07755) +- Efficient Task-aligned head with [DFL](https://arxiv.org/abs/2006.04388)和[VFL](https://arxiv.org/abs/2008.13367) +- [SiLU(Swish)激活函数](https://arxiv.org/abs/1710.05941) + +## 模型库 + +### COCO数据集模型库 + +| 模型 | Epoch | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box APval
0.5:0.95 | Box APtest
    0.5:0.95 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 | +|:---------------:|:-----:|:---------:|:--------:|:----------:|:----------:|:--------------------------:|:---------------------------:|:---------:|:--------:|:---------------:| :---------------------: |:------------------------------------------------------------------------------------:|:-------------------------------------------:| +| PP-YOLOE+_s | 80 | 8 | 8 | cspresnet-s | 640 | 43.7 | 43.9 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_s_80e_coco.yml) | +| PP-YOLOE+_m | 80 | 8 | 8 | cspresnet-m | 640 | 49.8 | 50.0 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_m_80e_coco.yml) | +| PP-YOLOE+_l | 80 | 8 | 8 | cspresnet-l | 640 | 52.9 | 53.3 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_l_80e_coco.yml) | +| PP-YOLOE+_x | 80 | 8 | 8 | cspresnet-x | 640 | 54.7 | 54.9 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) | [config](./ppyoloe_plus_crn_x_80e_coco.yml) | + +#### Tiny模型 + +| 模型 | Epoch | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box APval
0.5:0.95 | Box APtest
    0.5:0.95 | Params(M) | FLOPs(G) | T4 TensorRT FP16(FPS) | 模型下载 | 配置文件 | +|:----------:|:-----:|:--------:|:-----------:|:---------:|:--------:|:--------------------------:|:---------------------------:|:---------:|:--------:|:---------------------:| :------: |:--------:| +| PP-YOLOE+_t-aux(640) | 300 | 8 | 8 | cspresnet-t | 640 | 39.7 | 56.4 | 4.85 | 19.15 | 344.8 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_300e_coco.yml) | +| PP-YOLOE+_t-aux(640)-relu | 300 | 8 | 8 | cspresnet-t | 640 | 36.4 | 53.0 | 3.60 | 12.17 | 476.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_relu_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_relu_300e_coco.yml) | +| PP-YOLOE+_t-aux(320) | 300 | 8 | 8 | cspresnet-t | 320 | 33.3 | 48.5 | 4.85 | 4.80 | 729.9 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_320_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_320_300e_coco.yml) | +| PP-YOLOE+_t-aux(320)-relu | 300 | 8 | 8 | cspresnet-t | 320 | 29.5 | 43.7 | 3.60 | 3.04 | 984.8 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco.pdparams) | [config](./ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco.yml) | + + +### 综合指标 +| 模型 | Epoch | AP0.5:0.95 | AP0.5 | AP0.75 | APsmall | APmedium | APlarge | ARsmall | ARmedium | ARlarge | +|:------------------------:|:-----:|:---------------:|:----------:|:-----------:|:------------:|:-------------:|:------------:|:------------:|:-------------:|:------------:| +| PP-YOLOE+_s | 80 | 43.7 | 60.6 | 47.9 | 26.5 | 47.5 | 59.0 | 46.7 | 71.4 | 81.7 | +| PP-YOLOE+_m | 80 | 49.8 | 67.1 | 54.5 | 31.8 | 53.9 | 66.2 | 53.3 | 75.0 | 84.6 | +| PP-YOLOE+_l | 80 | 52.9 | 70.1 | 57.9 | 35.2 | 57.5 | 69.1 | 56.0 | 77.9 | 86.9 | +| PP-YOLOE+_x | 80 | 54.7 | 72.0 | 59.9 | 37.9 | 59.3 | 70.4 | 57.0 | 78.7 | 87.2 | + + +### 端到端速度 +| 模型 | AP0.5:0.95 | TRT-FP32(fps) | TRT-FP16(fps) | +|:------------------------:|:---------------:|:-------------:|:-------------:| +| PP-YOLOE+_s | 43.7 | 44.44 | 47.85 | +| PP-YOLOE+_m | 49.8 | 39.06 | 43.86 | +| PP-YOLOE+_l | 52.9 | 34.01 | 42.02 | +| PP-YOLOE+_x | 54.7 | 26.88 | 36.76 | + +**注意:** + +- PP-YOLOE模型使用COCO数据集中train2017作为训练集,使用val2017和test-dev2017作为测试集。 +- 综合指标的表格与模型库的表格里的模型权重是**同一个权重**,综合指标是使用**val2017**作为验证精度的。 +- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。 +- PP-YOLOE模型推理速度测试采用单卡V100,batch size=1进行测试,使用**CUDA 10.2**, **CUDNN 7.6.5**,TensorRT推理速度测试使用**TensorRT 6.0.1.8**。 +- 参考[速度测试](#速度测试)以复现PP-YOLOE推理速度测试结果。 +- 如果你设置了`--run_benchmark=True`, 你首先需要安装以下依赖`pip install pynvml psutil GPUtil`。 +- 端到端速度测试包含模型前处理 + 模型推理 + 模型后处理及NMS的时间,测试使用**Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz**, **单卡V100**, **CUDA 11.2**, **CUDNN 8.2.0**, **TensorRT 8.0.1.6**。 + +### Objects365数据集模型库 +| 模型 | Epoch | 机器个数 | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box AP0.5 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 | +|:---------------:|:-----:|:-----------:|:-----------:|:-----------:|:---------:|:----------:|:--------------:|:---------:|:---------:|:-------------:|:-----------------------:| :--------:|:--------:| +| PP-YOLOE+_s | 60 | 3 | 8 | 8 | cspresnet-s | 640 | 18.1 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams) | 
[config](./objects365/ppyoloe_plus_crn_s_60e_objects365.yml) | +| PP-YOLOE+_m | 60 | 4 | 8 | 8 | cspresnet-m | 640 | 25.0 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams) | [config](./objects365/ppyoloe_plus_crn_m_60e_objects365.yml) | +| PP-YOLOE+_l | 60 | 3 | 8 | 8 | cspresnet-l | 640 | 30.8 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams) | [config](./objects365/ppyoloe_plus_crn_l_60e_objects365.yml) | +| PP-YOLOE+_x | 60 | 4 | 8 | 8 | cspresnet-x | 640 | 32.7 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_x_obj365_pretrained.pdparams) | [config](./objects365/ppyoloe_plus_crn_x_60e_objects365.yml) | + + +**注意:** +- 多机训练细节见[文档](../../docs/tutorials/DistributedTraining_cn.md) + + +### VOC数据集模型库 +| 模型 | Epoch | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box AP0.5 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 | +|:---------------:|:-----:|:-----------:|:-----------:|:---------:|:----------:|:--------------:|:---------:|:---------:|:-------------:|:-----------------------:| :-------: |:--------:| +| PP-YOLOE+_s | 30 | 8 | 8 | cspresnet-s | 640 | 86.7 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_30e_voc.pdparams) | [config](./voc/ppyoloe_plus_crn_s_30e_voc.yml) | +| PP-YOLOE+_l | 30 | 8 | 8 | cspresnet-l | 640 | 89.0 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_30e_voc.pdparams) | [config](./voc/ppyoloe_plus_crn_l_30e_voc.yml) | + + +### 垂类应用模型 + +PaddleDetection团队提供了基于PP-YOLOE的各种垂类检测模型的配置文件和权重,用户可以下载进行使用: + +| 场景 | 相关数据集 | 链接 | +| :--------: | :---------: | :------: | +| 行人检测 | CrowdHuman | [pphuman](../pphuman) | +| 车辆检测 | BDD100K、UA-DETRAC | [ppvehicle](../ppvehicle) | +| 小目标检测 | VisDrone、DOTA、xView | [smalldet](../smalldet) | +| 密集目标检测 | SKU110k | [application](./application) | +| 旋转框检测 | DOTA | [PP-YOLOE-R](../rotate/ppyoloe_r/) | + + +## 使用说明 + +### 数据集和评价指标 + +下载PaddleDetection团队提供的**COCO和VOC数据**,并解压放置于`PaddleDetection/dataset/`下: + +``` +wget https://bj.bcebos.com/v1/paddledet/data/coco.tar +# tar -xvf coco.tar + +wget https://bj.bcebos.com/v1/paddledet/data/voc.zip +# unzip voc.zip +``` + +**注意:** + - COCO风格格式,请参考 [format-data](https://cocodataset.org/#format-data) 和 [format-results](https://cocodataset.org/#format-results)。 + - COCO风格评测指标,请参考 [detection-eval](https://cocodataset.org/#detection-eval) ,并首先安装 [cocoapi](https://github.com/cocodataset/cocoapi)。 + - VOC风格格式和评测指标,请参考 [VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html)。 + +### 自定义数据集 + +1.自定义数据集的标注制作,请参考 [DetAnnoTools](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/tutorials/data/DetAnnoTools.md); +2.自定义数据集的训练准备,请参考 [PrepareDataSet](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/tutorials/data/PrepareDetDataSet.md). + + +### 训练 + +请执行以下指令训练PP-YOLOE+ + +```bash +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --eval --amp +``` +**注意:** +- 如果需要边训练边评估,请添加`--eval`. +- PP-YOLOE+支持混合精度训练,请添加`--amp`. +- PaddleDetection支持多机训练,可以参考[多机训练教程](../../docs/tutorials/DistributedTraining_cn.md). 
+ +### 评估 + +执行以下命令在单个GPU上评估COCO val2017数据集 + +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams +``` + +在coco test-dev2017上评估,请先从[COCO数据集下载](https://cocodataset.org/#download)下载COCO test-dev2017数据集,然后解压到COCO数据集文件夹并像`configs/ppyolo/ppyolo_test.yml`一样配置`EvalDataset`。 + +### 推理 + +使用以下命令在单张GPU上预测图片,使用`--infer_img`推理单张图片以及使用`--infer_dir`推理文件中的所有图片。 + + +```bash +# 推理单张图片 +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams --infer_img=demo/000000014439_640x640.jpg + +# 推理文件中的所有图片 +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams --infer_dir=demo +``` + +### 模型导出 + +PP-YOLOE+在GPU上部署或者速度测试需要通过`tools/export_model.py`导出模型。 + +当你**使用Paddle Inference但不使用TensorRT**时,运行以下的命令导出模型 + +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams +``` + +当你**使用Paddle Inference且使用TensorRT**时,需要指定`-o trt=True`来导出模型。 + +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True +``` + +如果你想将PP-YOLOE模型导出为**ONNX格式**,参考 +[PaddleDetection模型导出为ONNX格式教程](../../deploy/EXPORT_ONNX_MODEL.md),运行以下命令: + +```bash + +# 导出推理模型 +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True + +# 安装paddle2onnx +pip install paddle2onnx + +# 转换成onnx格式 +paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_l_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_plus_crn_l_80e_coco.onnx +``` + +**注意:** ONNX模型目前只支持batch_size=1 + +### 速度测试 + +为了公平起见,在[模型库](#模型库)中的速度测试结果均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致),需要在导出模型时指定`-o exclude_nms=True`. 
+ +**使用Paddle Inference但不使用TensorRT**进行测速,执行以下命令: + +```bash +# 导出模型 +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams exclude_nms=True + +# 速度测试,使用run_benchmark=True +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True +``` + +**使用Paddle Inference且使用TensorRT**进行测速,执行以下命令: + +```bash +# 导出模型,使用trt=True +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams exclude_nms=True trt=True + +# 速度测试,使用run_benchmark=True, run_mode=trt_fp32/trt_fp16 +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=trt_fp16 --device=gpu --run_benchmark=True + +``` + + +**使用 ONNX 和 TensorRT** 进行测速,执行以下命令: + +```bash +# 导出模型 +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams exclude_nms=True trt=True + +# 转化成ONNX格式 +paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_s_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_crn_s_80e_coco.onnx + +# 测试速度,半精度,batch_size=1 +trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16 + +# 测试速度,半精度,batch_size=32 +trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16 + +# 使用上边的脚本, 在T4 和 TensorRT 7.2的环境下,PPYOLOE-plus-s模型速度如下 +# batch_size=1, 2.80ms, 357fps +# batch_size=32, 67.69ms, 472fps +``` + + + +### 部署 + +PP-YOLOE可以使用以下方式进行部署: + - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp) + - [Paddle-TensorRT](../../deploy/TENSOR_RT.md) + - [PaddleServing](https://github.com/PaddlePaddle/Serving) + - [PaddleSlim模型量化](../slim) + +接下来,我们将介绍PP-YOLOE如何使用Paddle Inference在TensorRT FP16模式下部署 + +首先,参考[Paddle Inference文档](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python),下载并安装与你的CUDA, CUDNN和TensorRT相应的wheel包。 + +然后,运行以下命令导出模型 + +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True +``` + +最后,使用TensorRT FP16进行推理 + +```bash +# 推理单张图片 +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16 + +# 推理文件夹下的所有图片 +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_dir=demo/ --device=gpu --run_mode=trt_fp16 + +``` + +**注意:** +- TensorRT会根据网络的定义,执行针对当前硬件平台的优化,生成推理引擎并序列化为文件。该推理引擎只适用于当前软硬件平台。如果你的软硬件平台没有发生变化,你可以设置[enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L660)的参数`use_static=True`,这样生成的序列化文件将会保存在`output_inference`文件夹下,下次执行TensorRT时将加载保存的序列化文件。 +- PaddleDetection release/2.4及其之后的版本将支持NMS调用TensorRT,需要依赖PaddlePaddle release/2.3及其之后的版本 + +### 泛化性验证 + +模型 | AP | 
AP50 +---|---|--- +[YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) | 22.6 | 37.5 +[YOLOv5](https://github.com/ultralytics/yolov5) | 26.0 | 42.7 +**PP-YOLOE** | **30.5** | **46.4** + +**注意** +- 试验使用[VisDrone](https://github.com/VisDrone/VisDrone-Dataset)数据集, 并且检测其中的9类,包括 `person, bicycles, car, van, truck, tricyle, awning-tricyle, bus, motor`. +- 以上模型训练均采用官方提供的默认参数,并且加载COCO预训练参数 +- *由于人力/时间有限,后续将会持续补充更多验证结果,也欢迎各位开源用户贡献,共同优化PP-YOLOE* + + +## 附录 + +PP-YOLOE消融实验 + +| 序号 | 模型 | Box APval | 参数量(M) | FLOPs(G) | V100 FP32 FPS | +| :--: | :---------------------------: | :-------------------: | :-------: | :------: | :-----------: | +| A | PP-YOLOv2 | 49.1 | 54.58 | 115.77 | 68.9 | +| B | A + Anchor-free | 48.8 | 54.27 | 114.78 | 69.8 | +| C | B + CSPRepResNet | 49.5 | 47.42 | 101.87 | 85.5 | +| D | C + TAL | 50.4 | 48.32 | 104.75 | 84.0 | +| E | D + ET-Head | 50.9 | 52.20 | 110.07 | 78.1 | diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/README_legacy.md b/PaddleDetection-release-2.6/configs/ppyoloe/README_legacy.md new file mode 100644 index 0000000000000000000000000000000000000000..3daab44766fe8a07adf9a93fd30c9cf47aa38fac --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/README_legacy.md @@ -0,0 +1,39 @@ +# PP-YOLOE Legacy Model Zoo (2022.03) + +## Legacy Model Zoo +| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box APval
0.5:0.95 | Box APtest
    0.5:0.95 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | +|:------------------------:|:-------:|:-------:|:--------:|:----------:| :-------:| :------------------: | :-------------------: |:---------:|:--------:|:---------------:| :---------------------: | :------: | :------: | +| PP-YOLOE-s | 400 | 8 | 32 | cspresnet-s | 640 | 43.4 | 43.6 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](./ppyoloe_crn_s_400e_coco.yml) | +| PP-YOLOE-s | 300 | 8 | 32 | cspresnet-s | 640 | 43.0 | 43.2 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](./ppyoloe_crn_s_300e_coco.yml) | +| PP-YOLOE-m | 300 | 8 | 28 | cspresnet-m | 640 | 49.0 | 49.1 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](./ppyoloe_crn_m_300e_coco.yml) | +| PP-YOLOE-l | 300 | 8 | 20 | cspresnet-l | 640 | 51.4 | 51.6 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](./ppyoloe_crn_l_300e_coco.yml) | +| PP-YOLOE-x | 300 | 8 | 16 | cspresnet-x | 640 | 52.3 | 52.4 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](./ppyoloe_crn_x_300e_coco.yml) | + +### Comprehensive Metrics +| Model | Epoch | AP0.5:0.95 | AP0.5 | AP0.75 | APsmall | APmedium | APlarge | ARsmall | ARmedium | ARlarge | download | config | +|:----------------------:|:-----:|:---------------:|:----------:|:-------------:| :------------:| :-----------: | :----------: |:------------:|:-------------:|:------------:| :-----: | :-----: | +| PP-YOLOE-s | 400 | 43.4 | 60.0 | 47.5 | 25.7 | 47.8 | 59.2 | 43.9 | 70.8 | 81.9 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](./ppyoloe_crn_s_400e_coco.yml)| +| PP-YOLOE-s | 300 | 43.0 | 59.6 | 47.2 | 26.0 | 47.4 | 58.7 | 45.1 | 70.6 | 81.4 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](./ppyoloe_crn_s_300e_coco.yml)| +| PP-YOLOE-m | 300 | 49.0 | 65.9 | 53.8 | 30.9 | 53.5 | 65.3 | 50.9 | 74.4 | 84.7 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](./ppyoloe_crn_m_300e_coco.yml)| +| PP-YOLOE-l | 300 | 51.4 | 68.6 | 56.2 | 34.8 | 56.1 | 68.0 | 53.1 | 76.8 | 85.6 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](./ppyoloe_crn_l_300e_coco.yml)| +| PP-YOLOE-x | 300 | 52.3 | 69.5 | 56.8 | 35.1 | 57.0 | 68.6 | 55.5 | 76.9 | 85.7 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](./ppyoloe_crn_x_300e_coco.yml)| + + +**Notes:** + +- PP-YOLOE is trained on COCO train2017 dataset and evaluated on val2017 & test-dev2017 dataset. +- The model weights in the table of Comprehensive Metrics are **the same as** that in the original Model Zoo, and evaluated on **val2017**. +- PP-YOLOE used 8 GPUs for training, if **GPU number** or **mini-batch size** is changed, **learning rate** should be adjusted according to the formula **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)**. +- PP-YOLOE inference speed is tesed on single Tesla V100 with batch size as 1, **CUDA 10.2**, **CUDNN 7.6.5**, **TensorRT 6.0.1.8** in TensorRT mode. + +## Appendix + +Ablation experiments of PP-YOLOE. + +| NO. 
+
+## Appendix
+
+Ablation experiments of PP-YOLOE.
+
+| NO. | Model | Box APval | Params(M) | FLOPs(G) | V100 FP32 FPS |
+| :--: | :---------------------------: | :------------------: | :-------: | :------: | :-----------: |
+| A | PP-YOLOv2 | 49.1 | 54.58 | 115.77 | 68.9 |
+| B | A + Anchor-free | 48.8 | 54.27 | 114.78 | 69.8 |
+| C | B + CSPRepResNet | 49.5 | 47.42 | 101.87 | 85.5 |
+| D | C + TAL | 50.4 | 48.32 | 104.75 | 84.0 |
+| E | D + ET-Head | 50.9 | 52.20 | 110.07 | 78.1 |
diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_300e.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_300e.yml new file mode 100644 index 0000000000000000000000000000000000000000..d07bf4e53ef03571a04bda6353f798eabe24dfcd --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_300e.yml @@ -0,0 +1,18 @@ +epoch: 300 + +LearningRate: + base_lr: 0.01 + schedulers: + - name: CosineDecay + max_epochs: 360 + - name: LinearWarmup + start_factor: 0. + epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_36e_xpu.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_36e_xpu.yml new file mode 100644 index 0000000000000000000000000000000000000000..951938468bd767369a41a8318306d8301e5a62fb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_36e_xpu.yml @@ -0,0 +1,18 @@ +epoch: 36 + +LearningRate: + base_lr: 0.00125 + schedulers: + - name: CosineDecay + max_epochs: 43 + - name: LinearWarmup + start_factor: 0.001 + steps: 2000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_400e.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_400e.yml new file mode 100644 index 0000000000000000000000000000000000000000..0a8a5a6c377d2886e3c8e53b3d8fd03d7fba1146 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_400e.yml @@ -0,0 +1,18 @@ +epoch: 400 + +LearningRate: + base_lr: 0.01 + schedulers: + - name: CosineDecay + max_epochs: 480 + - name: LinearWarmup + start_factor: 0. + epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_60e.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_60e.yml new file mode 100644 index 0000000000000000000000000000000000000000..b261003db3aa56122022234bb0332b4db811ae63 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_60e.yml @@ -0,0 +1,18 @@ +epoch: 60 + +LearningRate: + base_lr: 0.001 + schedulers: + - name: CosineDecay + max_epochs: 72 + - name: LinearWarmup + start_factor: 0. + epochs: 1 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_80e.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_80e.yml new file mode 100644 index 0000000000000000000000000000000000000000..b6ba4ec31a9703c56d2e470b646354cfdfdb7ddc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/optimizer_80e.yml @@ -0,0 +1,18 @@ +epoch: 80 + +LearningRate: + base_lr: 0.001 + schedulers: + - name: CosineDecay + max_epochs: 96 + - name: LinearWarmup + start_factor: 0.
+ epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_crn.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_crn.yml new file mode 100644 index 0000000000000000000000000000000000000000..118db7ee19d423a39ba7310a28dc806479128866 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_crn.yml @@ -0,0 +1,47 @@ +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_crn.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_crn.yml new file mode 100644 index 0000000000000000000000000000000000000000..c8e6191fdd4515b79596f4bd9ecb48731523a83b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_crn.yml @@ -0,0 +1,48 @@ +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 30 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_crn_tiny_auxhead.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_crn_tiny_auxhead.yml new file mode 100644 index 0000000000000000000000000000000000000000..8aea82150dfaef11a9c7e7362642fdd8e5e951d9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_crn_tiny_auxhead.yml @@ -0,0 +1,60 @@ +architecture: PPYOLOEWithAuxHead +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] + +PPYOLOEWithAuxHead: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + aux_head: SimpleConvHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + 
+CustomCSPPAN: + out_channels: [384, 384, 384] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +SimpleConvHead: + feat_in: 288 + feat_out: 288 + num_convs: 1 + fpn_strides: [32, 16, 8] + norm_type: 'gn' + act: 'LeakyReLU' + reg_max: 16 + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + attn_conv: 'repvgg' # + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + is_close_gt: True # + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_reader.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..cd9cdeff8b9d46e41a4e6fb518339168dfd4b154 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_reader.yml @@ -0,0 +1,40 @@ +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_reader_320.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_reader_320.yml new file mode 100644 index 0000000000000000000000000000000000000000..2b7be58daf8e208c4875cff6be9ea48dbf0073e5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_plus_reader_320.yml @@ -0,0 +1,40 @@ +worker_num: 4 +eval_height: &eval_height 320 +eval_width: &eval_width 320 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [224, 256, 288, 320, 352, 384, 416, 448, 480, 512, 544], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], 
std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_reader.yml b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..9f99713e5c106321025842db1f61361a82364e77 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/_base_/ppyoloe_reader.yml @@ -0,0 +1,40 @@ +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/README.md b/PaddleDetection-release-2.6/configs/ppyoloe/application/README.md new file mode 100644 index 0000000000000000000000000000000000000000..41bf34f5bece3831539462535d376f8ad367ee3b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/README.md @@ -0,0 +1,69 @@ +# PP-YOLOE+ 下游任务 + +我们验证了PP-YOLOE+模型强大的泛化能力,在农业、低光、工业等不同场景下游任务检测效果稳定提升! + +农业数据集采用[Embrapa WGISD](https://github.com/thsant/wgisd),该数据集用于葡萄栽培中基于图像的监测和现场机器人技术,提供了来自5种不同葡萄品种的实地实例, +处理后的COCO格式,包含图片训练集242张,测试集58张,5个类别,[Embrapa WGISD COCO格式下载](https://bj.bcebos.com/v1/paddledet/data/wgisd.zip); + +低光数据集使用[ExDark](https://github.com/cs-chan/Exclusively-Dark-Image-Dataset/tree/master/Dataset),该数据集是一个专门在低光照环境下拍摄、针对低光目标检测的数据集,包括从极低光环境到暮光环境等10种不同光照条件下的图片, +处理后的COCO格式,包含图片训练集5891张,测试集1472张,12个类别,[ExDark COCO格式下载](https://bj.bcebos.com/v1/paddledet/data/Exdark.zip); + +工业数据集使用[PKU-Market-PCB](https://robotics.pkusz.edu.cn/resources/dataset/),该数据集用于印刷电路板(PCB)的瑕疵检测,提供了6种常见的PCB缺陷, +处理后的COCO格式,包含图片训练集555张,测试集138张,6个类别,[PKU-Market-PCB COCO格式下载](https://bj.bcebos.com/v1/paddledet/data/PCB_coco.zip)。 + +商超数据集[SKU110k](https://github.com/eg4000/SKU110K_CVPR19)是商品超市场景下的密集目标检测数据集,包含11,762张图片和超过170万个实例。其中包括8,233张用于训练的图像、588张用于验证的图像和2,941张用于测试的图像。
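Each dataset zip above unpacks into the `dataset/` layout that the corresponding dataset configs later in this release expect (e.g. `dataset/wgisd/`). A minimal fetch sketch using only the Python standard library (the archive layout is assumed from the configs, not re-verified here):

```python
import urllib.request
import zipfile

# WGISD as an example; the other dataset zips above follow the same pattern.
url = "https://bj.bcebos.com/v1/paddledet/data/wgisd.zip"
urllib.request.urlretrieve(url, "wgisd.zip")

# Extract under dataset/ so that dataset_dir entries such as
# dataset/wgisd/ resolve as the configs expect.
with zipfile.ZipFile("wgisd.zip") as zf:
    zf.extractall("dataset/")
```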
+
+## 实验结果:
+
+| 模型 | 数据集 | mAPval 0.5:0.95 | 下载链接 | 配置文件 |
+|:---------|:---------------:|:-----------------------:|:---------:| :-----: |
+|PP-YOLOE_m| Embrapa WGISD | 52.7 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_80e_wgisd.pdparams) | [配置文件](./ppyoloe_crn_m_80e_wgisd.yml) |
+|PP-YOLOE+_m (obj365_pretrained)| Embrapa WGISD | 60.8(+8.1) | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_obj365_pretrained_wgisd.pdparams) | [配置文件](./ppyoloe_plus_crn_m_80e_obj365_pretrained_wgisd.yml) |
+|PP-YOLOE+_m (coco_pretrained)| Embrapa WGISD | 59.7(+7.0) | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco_pretrained_wgisd.pdparams) | [配置文件](./ppyoloe_plus_crn_m_80e_coco_pretrained_wgisd.yml) |
+|PP-YOLOE_m| ExDark | 56.4 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_80e_exdark.pdparams) | [配置文件](./ppyoloe_crn_m_80e_exdark.yml) |
+|PP-YOLOE+_m (obj365_pretrained)| ExDark | 57.7(+1.3) | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_obj365_pretrained_exdark.pdparams) | [配置文件](./ppyoloe_plus_crn_m_80e_obj365_pretrained_exdark.yml) |
+|PP-YOLOE+_m (coco_pretrained)| ExDark | 58.1(+1.7) | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco_pretrained_exdark.pdparams) | [配置文件](./ppyoloe_plus_crn_m_80e_coco_pretrained_exdark.yml) |
+|PP-YOLOE_m| PKU-Market-PCB | 50.8 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_80e_pcb.pdparams) | [配置文件](./ppyoloe_crn_m_80e_pcb.yml) |
+|PP-YOLOE+_m (obj365_pretrained)| PKU-Market-PCB | 52.7(+1.9) | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_obj365_pretrained_pcb.pdparams) | [配置文件](./ppyoloe_plus_crn_m_80e_obj365_pretrained_pcb.yml) |
+|PP-YOLOE+_m (coco_pretrained)| PKU-Market-PCB | 52.4(+1.6) | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco_pretrained_pcb.pdparams) | [配置文件](./ppyoloe_plus_crn_m_80e_coco_pretrained_pcb.yml) |
+
+**注意:**
+- PP-YOLOE模型训练过程中使用8 GPUs进行训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。
+- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)。
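The SKU110k numbers below are reported with **maxDets=300** rather than the COCO default of 100, since densely packed shelf images routinely contain more than 100 instances. A minimal sketch of such an evaluation with `pycocotools` (the detection-results file name is hypothetical):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("dataset/SKU110K_fixed/annotations/annotations_val.json")
coco_dt = coco_gt.loadRes("bbox_detections.json")  # hypothetical exported results

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.params.maxDets = [1, 10, 300]  # COCO default is [1, 10, 100]
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```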
+
+
+## SKU110k Model Zoo
+| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box APval 0.5:0.95 (maxDets=300) | Box APtest 0.5:0.95 (maxDets=300) | download | config |
+|:--------------:|:-----:|:-------:|:----------:|:----------:| :-------:|:-------------------------:|:---------------------------:|:---------:|:------:|
+| PP-YOLOE+_s | 80 | 8 | 8 | cspresnet-s | 960 | 57.4 | 58.8 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_s_80e_sku110k.yml) |
+| PP-YOLOE+_m | 80 | 8 | 8 | cspresnet-m | 960 | 58.2 | 59.7 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_m_80e_sku110k.yml) |
+| PP-YOLOE+_l | 80 | 8 | 4 | cspresnet-l | 960 | 58.8 | 60.2 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_l_80e_sku110k.yml) |
+| PP-YOLOE+_x | 80 | 8 | 4 | cspresnet-x | 960 | 59.0 | 60.3 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_sku110k.pdparams) | [config](./ppyoloe_plus_crn_x_80e_sku110k.yml) |
+
+
+**注意:**
+- SKU110k系列模型训练过程中使用8 GPUs进行训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。
+- SKU110k数据集使用**maxDets=300**的mAP值作为评估指标。
+- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)。
+
+
+## 引用
+```
+@inproceedings{goldman2019dense,
+ author = {Eran Goldman and Roei Herzig and Aviv Eisenschtat and Jacob Goldberger and Tal Hassner},
+ title = {Precise Detection in Densely Packed Scenes},
+ booktitle = {Proc. Conf. Comput. Vision Pattern Recognition (CVPR)},
+ year = {2019}
+}
+
+@article{Exdark,
+title={Getting to Know Low-light Images with The Exclusively Dark Dataset},
+author={Loh, Yuen Peng and Chan, Chee Seng},
+journal={Computer Vision and Image Understanding},
+volume={178},
+pages={30-42},
+year={2019},
+doi={https://doi.org/10.1016/j.cviu.2018.10.010}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/exdark_detection.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/exdark_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..07585bc5ddcea460aedaf5797b6720ceab988814 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/exdark_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 12 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: coco_annotations/train.json + dataset_dir: dataset/Exdark/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: coco_annotations/val.json + dataset_dir: dataset/Exdark/ + +TestDataset: + !ImageFolder + anno_path: coco_annotations/val.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/Exdark/ # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/pcb_detection.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/pcb_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..53f5f3744c5aa029ed80b7a5ab911ea831d2f78e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/pcb_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 6 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: pcb_cocoanno/train.json + dataset_dir: dataset/PCB_coco/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path:
pcb_cocoanno/val.json + dataset_dir: dataset/PCB_coco/ + +TestDataset: + !ImageFolder + anno_path: pcb_cocoanno/val.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/PCB_coco/ # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/sku110k.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/sku110k.yml new file mode 100644 index 0000000000000000000000000000000000000000..664ce2f25c354d1fe5e85642e7a6ae348b59a032 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/sku110k.yml @@ -0,0 +1,21 @@ +metric: COCO +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/annotations_train.json + dataset_dir: dataset/SKU110K_fixed + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/annotations_val.json + dataset_dir: dataset/SKU110K_fixed + allow_empty: true + +TestDataset: + !ImageFolder + anno_path: annotations/annotations_test.json + dataset_dir: dataset/SKU110K_fixed diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/wgisd_detection.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/wgisd_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..a2721bbd193c91884c512294eb73978eddd3bb9a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/_base_/wgisd_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: data + anno_path: coco_annotations/new_train_bbox_instances.json + dataset_dir: dataset/wgisd/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: data + anno_path: coco_annotations/new_test_bbox_instances.json + dataset_dir: dataset/wgisd/ + +TestDataset: + !ImageFolder + anno_path: coco_annotations/new_test_bbox_instances.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/wgisd/ # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_exdark.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_exdark.yml new file mode 100644 index 0000000000000000000000000000000000000000..6f9914dce90d39062a01579dac3dc0dc6da56430 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_exdark.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/exdark_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_crn.yml', + '../_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_crn_m_80e_exdark/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_pcb.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_pcb.yml new file mode 100644 index 0000000000000000000000000000000000000000..7e7de1cf8e7461aa8f1d2a1acded2c5babb37c2e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_pcb.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/pcb_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_crn.yml', + '../_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 
+weights: output/ppyoloe_crn_m_80e_pcb/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_wgisd.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_wgisd.yml new file mode 100644 index 0000000000000000000000000000000000000000..c8658b0d6cfec4e3aba3c4ed865e1e02e55a60ca --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_crn_m_80e_wgisd.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/wgisd_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_crn.yml', + '../_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_crn_m_80e_wgisd/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_l_80e_sku110k.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_l_80e_sku110k.yml new file mode 100644 index 0000000000000000000000000000000000000000..858bf5f4a0ff4ad0141df000f13de0b56804b460 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_l_80e_sku110k.yml @@ -0,0 +1,127 @@ +_BASE_: [ + './_base_/sku110k.yml', + '../../runtime.yml' +] + +log_iter: 10 +snapshot_epoch: 20 +weights: output/ppyoloe_plus_crn_s_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +# arch +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + use_alpha: True + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 3000 + keep_top_k: 1000 + score_threshold: 0.01 + nms_threshold: 0.7 + + +# reader +worker_num: 8 +eval_height: &eval_height 960 +eval_width: &eval_width 960 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2} + - RandomDistort: {} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], 
std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +# optimizer +epoch: 80 + +LearningRate: + base_lr: 0.002 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_exdark.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_exdark.yml new file mode 100644 index 0000000000000000000000000000000000000000..66fc8b52d4b7246fccbd12aa1ace9bee65be6229 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_exdark.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/exdark_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_coco_pretrained_exdark/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_m_80e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_pcb.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_pcb.yml new file mode 100644 index 0000000000000000000000000000000000000000..7b0e3abd8fc30e4d5097392b8367f4b11cc14f5d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_pcb.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/pcb_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_coco_pretrained_pcb/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_m_80e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_wgisd.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_wgisd.yml new file mode 100644 index 0000000000000000000000000000000000000000..3e813cb09773beff735131544581b42480bebccf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_coco_pretrained_wgisd.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/wgisd_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_coco_pretrained_wgisd/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_m_80e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_exdark.yml 
b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_exdark.yml new file mode 100644 index 0000000000000000000000000000000000000000..d97f2a115ac8aac0a4f31b629bc7a2a4d5388810 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_exdark.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/exdark_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_obj365_pretrained_exdark/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_pcb.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_pcb.yml new file mode 100644 index 0000000000000000000000000000000000000000..72d5620c22e28c901a69eb358c7f82e067fa4986 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_pcb.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/pcb_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_obj365_pretrained_pcb/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_wgisd.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_wgisd.yml new file mode 100644 index 0000000000000000000000000000000000000000..6cebc6d47d85a95194873ee17885c0691bf40883 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_obj365_pretrained_wgisd.yml @@ -0,0 +1,15 @@ +_BASE_: [ + './_base_/wgisd_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_obj365_pretrained_wgisd/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_sku110k.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_sku110k.yml new file mode 100644 index 0000000000000000000000000000000000000000..cd7a4431cd9eec6a40d04410e4c160557d4e9be1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_m_80e_sku110k.yml @@ -0,0 +1,127 @@ +_BASE_: [ + './_base_/sku110k.yml', + '../../runtime.yml' +] + +log_iter: 10 +snapshot_epoch: 20 +weights: output/ppyoloe_plus_crn_s_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 + + +# arch +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + 
neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + use_alpha: True + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 3000 + keep_top_k: 1000 + score_threshold: 0.01 + nms_threshold: 0.7 + + +# reader +worker_num: 8 +eval_height: &eval_height 960 +eval_width: &eval_width 960 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2} + - RandomDistort: {} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +# optimizer +epoch: 80 + +LearningRate: + base_lr: 0.004 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+ epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_s_80e_sku110k.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_s_80e_sku110k.yml new file mode 100644 index 0000000000000000000000000000000000000000..e196a6845a4be8f06bb623c965924157c5f206e2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_s_80e_sku110k.yml @@ -0,0 +1,127 @@ +_BASE_: [ + './_base_/sku110k.yml', + '../../runtime.yml' +] + +log_iter: 10 +snapshot_epoch: 20 +weights: output/ppyoloe_plus_crn_s_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 + + +# arch +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + use_alpha: True + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 3000 + keep_top_k: 1000 + score_threshold: 0.01 + nms_threshold: 0.7 + + +# reader +worker_num: 8 +eval_height: &eval_height 960 +eval_width: &eval_width 960 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2} + - RandomDistort: {} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +# optimizer +epoch: 80 + +LearningRate: + base_lr: 0.004 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+ epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_x_80e_sku110k.yml b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_x_80e_sku110k.yml new file mode 100644 index 0000000000000000000000000000000000000000..da465662cf1797b8ab70cab3171c99f7627e96da --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/application/ppyoloe_plus_crn_x_80e_sku110k.yml @@ -0,0 +1,127 @@ +_BASE_: [ + './_base_/sku110k.yml', + '../../runtime.yml' +] + +log_iter: 10 +snapshot_epoch: 20 +weights: output/ppyoloe_plus_crn_s_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_x_obj365_pretrained.pdparams +depth_mult: 1.33 +width_mult: 1.25 + + +# arch +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + use_alpha: True + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 3000 + keep_top_k: 1000 + score_threshold: 0.01 + nms_threshold: 0.7 + + +# reader +worker_num: 8 +eval_height: &eval_height 960 +eval_width: &eval_width 960 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [3000, 1800], keep_ratio: True, interp: 2} + - RandomDistort: {} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +# optimizer +epoch: 80 + +LearningRate: + base_lr: 0.002 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+ epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/distill/README.md b/PaddleDetection-release-2.6/configs/ppyoloe/distill/README.md new file mode 100644 index 0000000000000000000000000000000000000000..868d70b88805dca01e63bd56dff7c08c06a2f5cb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/distill/README.md @@ -0,0 +1,46 @@ +# PPYOLOE+ Distillation(PPYOLOE+ 蒸馏) + +PaddleDetection提供了对PPYOLOE+ 进行模型蒸馏的方案,结合了logits蒸馏和feature蒸馏。更多蒸馏方案可以查看[slim/distill](../../slim/distill/)。 + +## 模型库 + +| 模型 | 方案 | 输入尺寸 | epochs | Box mAP | 配置文件 | 下载链接 | +| ----------------- | ----------- | ------ | :----: | :-----------: | :--------------: | :------------: | +| PP-YOLOE+_x | teacher | 640 | 80e | 54.7 | [config](../ppyoloe_plus_crn_x_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_x_80e_coco.pdparams) | +| PP-YOLOE+_l | student | 640 | 80e | 52.9 | [config](../ppyoloe_plus_crn_l_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | +| PP-YOLOE+_l | distill | 640 | 80e | **54.0(+1.1)** | [config](./ppyoloe_plus_crn_l_80e_coco_distill.yml),[slim_config](../../slim/distill/ppyoloe_plus_distill_x_distill_l.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco_distill.pdparams) | +| PP-YOLOE+_l | teacher | 640 | 80e | 52.9 | [config](../ppyoloe_plus_crn_l_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | +| PP-YOLOE+_m | student | 640 | 80e | 49.8 | [config](../ppyoloe_plus_crn_m_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | +| PP-YOLOE+_m | distill | 640 | 80e | **51.0(+1.2)** | [config](./ppyoloe_plus_crn_m_80e_coco_distill.yml),[slim_config](../../slim/distill/ppyoloe_plus_distill_l_distill_m.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_m_80e_coco_distill.pdparams) | + +## 快速开始 + +### 训练 +```shell +# 单卡 +python tools/train.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml --slim_config configs/slim/distill/ppyoloe_plus_distill_x_distill_l.yml +# 多卡 +python -m paddle.distributed.launch --log_dir=ppyoloe_plus_distill_x_distill_l/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml --slim_config configs/slim/distill/ppyoloe_plus_distill_x_distill_l.yml +``` + +- `-c`: 指定模型配置文件,也是student配置文件。 +- `--slim_config`: 指定压缩策略配置文件,也是teacher配置文件。 + +### 评估 +```shell +python tools/eval.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml -o weights=output/ppyoloe_plus_crn_l_80e_coco_distill/model_final.pdparams +``` + +- `-c`: 指定模型配置文件,也是student配置文件。 +- `--slim_config`: 指定压缩策略配置文件,也是teacher配置文件。 +- `-o weights`: 指定压缩算法训好的模型路径。 + +### 测试 +```shell +python tools/infer.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml -o weights=output/ppyoloe_plus_crn_l_80e_coco_distill/model_final.pdparams --infer_img=demo/000000014439_640x640.jpg +``` + +- `-c`: 指定模型配置文件。 +- `--slim_config`: 指定压缩策略配置文件。 +- `-o weights`: 指定压缩算法训好的模型路径。 +- `--infer_img`: 指定测试图像路径。 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml b/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml new file mode 100644 index 
0000000000000000000000000000000000000000..c000a4898012afd0cb832d36a9716130ad68ae48 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml @@ -0,0 +1,39 @@ +_BASE_: [ + '../ppyoloe_plus_crn_l_80e_coco.yml', +] +for_distill: True +architecture: PPYOLOE +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: True + drop_last: True + use_shared_memory: True + collate_batch: True + + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_l_80e_coco_distill/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_m_80e_coco_distill.yml b/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_m_80e_coco_distill.yml new file mode 100644 index 0000000000000000000000000000000000000000..ef2f38510bcedb7ed5ab0859c893322299b7e0d9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_m_80e_coco_distill.yml @@ -0,0 +1,39 @@ +_BASE_: [ + '../ppyoloe_plus_crn_m_80e_coco.yml', +] +for_distill: True +architecture: PPYOLOE +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: True + drop_last: True + use_shared_memory: True + collate_batch: True + + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_coco_distill/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_s_80e_coco_distill.yml b/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_s_80e_coco_distill.yml new file mode 100644 index 0000000000000000000000000000000000000000..95ac5d0caef531ffca8109c348a06dc408410a18 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/distill/ppyoloe_plus_crn_s_80e_coco_distill.yml @@ -0,0 +1,39 @@ +_BASE_: [ + '../ppyoloe_plus_crn_s_80e_coco.yml', +] +for_distill: True +architecture: PPYOLOE +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], 
std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: True + drop_last: True + use_shared_memory: True + collate_batch: True + + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_s_80e_coco_distill/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/objects365/README_cn.md b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..8018d03c62d77514a13f2c45340fe3c23ce6fdec --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/README_cn.md @@ -0,0 +1,15 @@ +# PP-YOLOE + +## 模型库 + +### Objects365数据集模型库 +| 模型 | Epoch | 机器个数 | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box AP0.5 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 | +|:---------------:|:-----:|:-----------:|:-----------:|:-----------:|:---------:|:----------:|:--------------:|:---------:|:---------:|:-------------:|:-----------------------:| :--------:|:--------:| +| PP-YOLOE+_s | 60 | 3 | 8 | 8 | cspresnet-s | 640 | 18.1 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams) | [config](./ppyoloe_plus_crn_s_60e_objects365.yml) | +| PP-YOLOE+_m | 60 | 4 | 8 | 8 | cspresnet-m | 640 | 25.0 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams) | [config](./ppyoloe_plus_crn_m_60e_objects365.yml) | +| PP-YOLOE+_l | 60 | 3 | 8 | 8 | cspresnet-l | 640 | 30.8 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams) | [config](./ppyoloe_plus_crn_l_60e_objects365.yml) | +| PP-YOLOE+_x | 60 | 4 | 8 | 8 | cspresnet-x | 640 | 32.7 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_x_obj365_pretrained.pdparams) | [config](./ppyoloe_plus_crn_x_60e_objects365.yml) | + + +**注意:** +- 多机训练细节见[文档](../../../docs/tutorials/DistributedTraining_cn.md) diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_l_60e_objects365.yml b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_l_60e_objects365.yml new file mode 100644 index 0000000000000000000000000000000000000000..ca283394fe24e23ed1395637ebe120da00fc49b6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_l_60e_objects365.yml @@ -0,0 +1,21 @@ +_BASE_: [ + '../../datasets/objects365_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_60e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_l_60e_objects365/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams + +CSPResNet: + use_alpha: False + +PPYOLOEHead: + static_assigner_epoch: 20 + +depth_mult: 1.0 +width_mult: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_m_60e_objects365.yml b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_m_60e_objects365.yml new file mode 100644 index 0000000000000000000000000000000000000000..0877b5275a95e44e4275cc873d920aabb6a266cb --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_m_60e_objects365.yml @@ -0,0 +1,21 @@ +_BASE_: [ + '../../datasets/objects365_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_60e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_60e_objects365/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams + +CSPResNet: + use_alpha: False + +PPYOLOEHead: + static_assigner_epoch: 20 + +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_s_60e_objects365.yml b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_s_60e_objects365.yml new file mode 100644 index 0000000000000000000000000000000000000000..0023af93f17faf61ab071823f557c649e3155c67 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_s_60e_objects365.yml @@ -0,0 +1,21 @@ +_BASE_: [ + '../../datasets/objects365_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_60e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_s_60e_objects365/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams + +CSPResNet: + use_alpha: False + +PPYOLOEHead: + static_assigner_epoch: 20 + +depth_mult: 0.33 +width_mult: 0.50 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_x_60e_objects365.yml b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_x_60e_objects365.yml new file mode 100644 index 0000000000000000000000000000000000000000..0c5fe97150c40a282783de40f9b8d6f5cd6a1be4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/objects365/ppyoloe_plus_crn_x_60e_objects365.yml @@ -0,0 +1,21 @@ +_BASE_: [ + '../../datasets/objects365_detection.yml', + '../../runtime.yml', + '../_base_/optimizer_60e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_x_60e_objects365/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams + +CSPResNet: + use_alpha: False + +PPYOLOEHead: + static_assigner_epoch: 20 + +depth_mult: 1.33 +width_mult: 1.25 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..ef3422815b4376fdd921516e235ec59af28681f7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_crn.yml', + './_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_l_36e_coco_xpu.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_l_36e_coco_xpu.yml new file mode 100644 index 
0000000000000000000000000000000000000000..21af7774c7260ca7d3db01b64c92c92ef0e2d882 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_l_36e_coco_xpu.yml @@ -0,0 +1,71 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_36e_xpu.yml', + './_base_/ppyoloe_reader.yml', +] + +# note: these are default values (use_gpu = true and use_xpu = false) for CI. +# set use_gpu = false and use_xpu = true for training. +use_gpu: true +use_xpu: false + +log_iter: 100 +snapshot_epoch: 1 +weights: output/ppyoloe_crn_l_36e_coco/model_final +find_unused_parameters: True + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +TrainReader: + batch_size: 8 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 4 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f6c2a4ab3df171904714a9b7edb17fd588f3d5fc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_crn.yml', + './_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_m_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0afba55c4e3339e7f91070e524f9a9e4d37e4cd7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_crn.yml', + './_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..bc60cf6b6cc414b6fc6f20f86b7dd09aa1699d40 --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml @@ -0,0 +1,18 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_400e.yml', + './_base_/ppyoloe_crn.yml', + './_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_400e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +PPYOLOEHead: + static_assigner_epoch: 133 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..fc388e9416b567ca431c30b39280d09a9ebf04ab --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_crn.yml', + './_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_x_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams +depth_mult: 1.33 +width_mult: 1.25 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..626cc2810510a908cb361bf63e4c1ae087adcba7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_80e.yml', + './_base_/ppyoloe_plus_crn.yml', + './_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_l_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..3209bef6b91c03c12b09bb8038adc82b7d1de8e0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_80e.yml', + './_base_/ppyoloe_plus_crn.yml', + './_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_m_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..862f322c48ec6fd7f7bc669f4be8b436746046e7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_80e.yml', + './_base_/ppyoloe_plus_crn.yml', + './_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: 
output/ppyoloe_plus_crn_s_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..5884cc0f7af9a3ed85afaa2cd4b89362b224482b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_300e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_plus_crn_tiny_auxhead.yml', + './_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_plus_crn_t_auxhead_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_t_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.375 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_320_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_320_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..010a4f610c8c1c1a2f62e6c1b49541072fbea578 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_320_300e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_plus_crn_tiny_auxhead.yml', + './_base_/ppyoloe_plus_reader_320.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_plus_crn_t_auxhead_320_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_t_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.375 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_relu_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_relu_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..6822f188685ccab3c7887cb39d14c5e182362f12 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_relu_300e_coco.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_plus_crn_tiny_auxhead.yml', + './_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_plus_crn_t_auxhead_relu_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_t_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.375 + + +CSPResNet: + act: 'relu' + +CustomCSPPAN: + act: 'relu' + +PPYOLOEHead: + act: 'relu' + attn_conv: None diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..ad7642881ae6055340ece761ae97775d314f0b13 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_plus_crn_tiny_auxhead.yml', + './_base_/ppyoloe_plus_reader_320.yml', +] + +log_iter: 100 
+snapshot_epoch: 10 +weights: output/ppyoloe_plus_crn_t_auxhead_relu_320_300e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_t_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.375 + + +CSPResNet: + act: 'relu' + +CustomCSPPAN: + act: 'relu' + +PPYOLOEHead: + act: 'relu' + attn_conv: None diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..cd41814f972cff4b1193c0c7813d22764b1f565d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_80e.yml', + './_base_/ppyoloe_plus_crn.yml', + './_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_x_80e_coco/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_x_obj365_pretrained.pdparams +depth_mult: 1.33 +width_mult: 1.25 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/voc/README_cn.md b/PaddleDetection-release-2.6/configs/ppyoloe/voc/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..8bfc61d9c8c16cfcad8f9e2a89442345608ce757 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/voc/README_cn.md @@ -0,0 +1,9 @@ +# PP-YOLOE + +## 模型库 + +### VOC数据集模型库 +| 模型 | Epoch | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box AP0.5 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 | +|:---------------:|:-----:|:-----------:|:-----------:|:---------:|:----------:|:--------------:|:---------:|:---------:|:-------------:|:-----------------------:| :-------: |:--------:| +| PP-YOLOE+_s | 30 | 8 | 8 | cspresnet-s | 640 | 86.7 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_30e_voc.pdparams) | [config](./ppyoloe_plus_crn_s_30e_voc.yml) | +| PP-YOLOE+_l | 30 | 8 | 8 | cspresnet-l | 640 | 89.0 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_30e_voc.pdparams) | [config](./ppyoloe_plus_crn_l_30e_voc.yml) | diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/voc/ppyoloe_plus_crn_l_30e_voc.yml b/PaddleDetection-release-2.6/configs/ppyoloe/voc/ppyoloe_plus_crn_l_30e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..217e37f274c443f0d905ad162ede72674c6f9092 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/voc/ppyoloe_plus_crn_l_30e_voc.yml @@ -0,0 +1,43 @@ +_BASE_: [ + '../../datasets/voc.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_l_30e_voc/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +TrainReader: + batch_size: 8 # default 8 gpus, total bs = 64 + +EvalReader: + batch_size: 4 + + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/ppyoloe/voc/ppyoloe_plus_crn_s_30e_voc.yml b/PaddleDetection-release-2.6/configs/ppyoloe/voc/ppyoloe_plus_crn_s_30e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..080bcdd808d971bfc214e8e10a34ef26fd6700ca --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ppyoloe/voc/ppyoloe_plus_crn_s_30e_voc.yml @@ -0,0 +1,43 @@ +_BASE_: [ + '../../datasets/voc.yml', + '../../runtime.yml', + '../_base_/optimizer_80e.yml', + '../_base_/ppyoloe_plus_crn.yml', + '../_base_/ppyoloe_plus_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_s_30e_voc/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_s_80e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + + +TrainReader: + batch_size: 8 # default 8 gpus, total bs = 64 + +EvalReader: + batch_size: 4 + + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 1 + + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/PaddleDetection-release-2.6/configs/queryinst/README.md b/PaddleDetection-release-2.6/configs/queryinst/README.md new file mode 100644 index 0000000000000000000000000000000000000000..568135328ba43780a3829977b839169126fe0b10 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/queryinst/README.md @@ -0,0 +1,41 @@ +# QueryInst: Instances as Queries + +## Introduction + +QueryInst is a multi-stage end-to-end system that treats instances of interest as learnable queries, enabling query +based object detectors, e.g., Sparse R-CNN, to have strong instance segmentation performance. The attributes of +instances such as categories, bounding boxes, instance masks, and instance association embeddings are represented by +queries in a unified manner. In QueryInst, a query is shared by both detection and segmentation via dynamic convolutions +and driven by parallelly-supervised multi-stage learning. + +## Model Zoo + +| Backbone | Lr schd | Proposals | MultiScale | RandomCrop | bbox AP | mask AP | Download | Config | +|:------------:|:-------:|:---------:|:----------:|:----------:|:-------:|:-------:|------------------------------------------------------------------------------------------------------|----------------------------------------------------------| +| ResNet50-FPN | 1x | 100 | × | × | 42.1 | 37.8 | [model](https://bj.bcebos.com/v1/paddledet/models/queryinst_r50_fpn_1x_pro100_coco.pdparams) | [config](./queryinst_r50_fpn_1x_pro100_coco.yml) | +| ResNet50-FPN | 3x | 300 | √ | √ | 47.9 | 42.1 | [model](https://bj.bcebos.com/v1/paddledet/models/queryinst_r50_fpn_ms_crop_3x_pro300_coco.pdparams) | [config](./queryinst_r50_fpn_ms_crop_3x_pro300_coco.yml) | + +- COCO val-set evaluation results. +- These configurations are for 4-card training. 
+ +Please modify these parameters as appropriate: + +```yaml +worker_num: 4 +TrainReader: + use_shared_memory: true +find_unused_parameters: true +``` + +## Citations + +``` +@InProceedings{Fang_2021_ICCV, + author = {Fang, Yuxin and Yang, Shusheng and Wang, Xinggang and Li, Yu and Fang, Chen and Shan, Ying and Feng, Bin and Liu, Wenyu}, + title = {Instances As Queries}, + booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, + month = {October}, + year = {2021}, + pages = {6910-6919} +} +``` diff --git a/PaddleDetection-release-2.6/configs/queryinst/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/queryinst/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..a7c0f5cb16311f046adec9e11f7cd0cc4a93e3d9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/queryinst/_base_/optimizer_1x.yml @@ -0,0 +1,17 @@ +epoch: 12 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 0.1 + optimizer: + type: AdamW + weight_decay: 0.0001 diff --git a/PaddleDetection-release-2.6/configs/queryinst/_base_/queryinst_r50_fpn.yml b/PaddleDetection-release-2.6/configs/queryinst/_base_/queryinst_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..05ab1c02f8a02308cfd47d441697c4a548c32f1a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/queryinst/_base_/queryinst_r50_fpn.yml @@ -0,0 +1,74 @@ +num_proposals: &num_proposals 100 +proposal_embedding_dim: &proposal_embedding_dim 256 +bbox_resolution: &bbox_resolution 7 +mask_resolution: &mask_resolution 14 + +architecture: QueryInst +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +QueryInst: + backbone: ResNet + neck: FPN + rpn_head: EmbeddingRPNHead + roi_head: SparseRoIHead + post_process: SparsePostProcess + +ResNet: + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [ 0, 1, 2, 3 ] + num_stages: 4 + lr_mult_list: [ 0.1, 0.1, 0.1, 0.1 ] + +FPN: + out_channel: *proposal_embedding_dim + extra_stage: 0 + +EmbeddingRPNHead: + num_proposals: *num_proposals + +SparseRoIHead: + num_stages: 6 + bbox_roi_extractor: + resolution: *bbox_resolution + sampling_ratio: 2 + aligned: True + mask_roi_extractor: + resolution: *mask_resolution + sampling_ratio: 2 + aligned: True + bbox_head: DIIHead + mask_head: DynamicMaskHead + loss_func: QueryInstLoss + +DIIHead: + feedforward_channels: 2048 + dynamic_feature_channels: 64 + roi_resolution: *bbox_resolution + num_attn_heads: 8 + dropout: 0.0 + num_ffn_fcs: 2 + num_cls_fcs: 1 + num_reg_fcs: 3 + +DynamicMaskHead: + dynamic_feature_channels: 64 + roi_resolution: *mask_resolution + num_convs: 4 + conv_kernel_size: 3 + conv_channels: 256 + upsample_method: 'deconv' + upsample_scale_factor: 2 + +QueryInstLoss: + focal_loss_alpha: 0.25 + focal_loss_gamma: 2.0 + class_weight: 2.0 + l1_weight: 5.0 + giou_weight: 2.0 + mask_weight: 8.0 + +SparsePostProcess: + num_proposals: *num_proposals + binary_thresh: 0.5 diff --git a/PaddleDetection-release-2.6/configs/queryinst/_base_/queryinst_reader.yml b/PaddleDetection-release-2.6/configs/queryinst/_base_/queryinst_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..e867cc27454efaf321e40bd09dc674d4f32c3a8d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/queryinst/_base_/queryinst_reader.yml @@ -0,0 +1,43 @@ +worker_num: 4 + 
+TrainReader: + sample_transforms: + - Decode: {} + - Poly2Mask: {del_poly: True} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2SparseTarget: {} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2SparseTarget: {} + batch_size: 1 + shuffle: false + drop_last: false + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2SparseTarget: {} + batch_size: 1 + shuffle: false diff --git a/PaddleDetection-release-2.6/configs/queryinst/queryinst_r50_fpn_1x_pro100_coco.yml b/PaddleDetection-release-2.6/configs/queryinst/queryinst_r50_fpn_1x_pro100_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..1e61252b71d3373c2fc062207ef2b88d699d8a0b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/queryinst/queryinst_r50_fpn_1x_pro100_coco.yml @@ -0,0 +1,12 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/queryinst_r50_fpn.yml', + '_base_/queryinst_reader.yml', +] + +log_iter: 50 +find_unused_parameters: true + +weights: output/queryinst_r50_fpn_1x_pro100_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/queryinst/queryinst_r50_fpn_ms_crop_3x_pro300_coco.yml b/PaddleDetection-release-2.6/configs/queryinst/queryinst_r50_fpn_ms_crop_3x_pro300_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..7dfa8997e3f71c0941b3b7626ad256333af2161a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/queryinst/queryinst_r50_fpn_ms_crop_3x_pro300_coco.yml @@ -0,0 +1,45 @@ +_BASE_: [ + './queryinst_r50_fpn_1x_pro100_coco.yml', +] + +weights: output/queryinst_r50_fpn_ms_crop_3x_pro300_coco/model_final + +EmbeddingRPNHead: + num_proposals: 300 + +SparsePostProcess: # the post-process component defined in _base_/queryinst_r50_fpn.yml + num_proposals: 300 + +epoch: 36 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [27, 33] + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +TrainReader: + sample_transforms: + - Decode: {} + - Poly2Mask: {del_poly: True} + - RandomFlip: {prob: 0.5} + - RandomSelect: { transforms1: [ RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ], + transforms2: [ + RandomShortSideResize: { short_side_sizes: [ 400, 500, 600 ], max_size: 1333 }, + RandomSizeCrop: { min_size: 384, max_size: 600, keep_empty: true }, + RandomShortSideResize: { short_side_sizes: [ 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800 ], max_size: 1333 } ] + } + - NormalizeImage: { is_scale: true, mean: [ 0.485,0.456,0.406 ], std: [ 0.229, 0.224,0.225 ] } + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2SparseTarget: {} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: true diff --git 
a/PaddleDetection-release-2.6/configs/rcnn_enhance/README.md b/PaddleDetection-release-2.6/configs/rcnn_enhance/README.md new file mode 100644 index 0000000000000000000000000000000000000000..378f4d83c627c847ef5f5c48472710401fec6124 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rcnn_enhance/README.md @@ -0,0 +1,12 @@ +## 服务器端实用目标检测方案 + +### 简介 + +* 近年来,学术界和工业界广泛关注图像中目标检测任务。基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)中SSLD蒸馏方案训练得到的ResNet50_vd预训练模型(ImageNet1k验证集上Top1 Acc为82.39%),结合PaddleDetection中的丰富算子,飞桨提供了一种面向服务器端实用的目标检测方案PSS-DET(Practical Server Side Detection)。基于COCO2017目标检测数据集,V100单卡预测速度为61FPS时,COCO mAP可达41.2%。 + + +### 模型库 + +| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | Mask AP | 下载 | 配置文件 | +| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :-------------: | :-----: | +| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.5 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_enhance_3x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml) | diff --git a/PaddleDetection-release-2.6/configs/rcnn_enhance/README_en.md b/PaddleDetection-release-2.6/configs/rcnn_enhance/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..bf768a294cdb3b51b0d3be57be7952b78ce6c91f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rcnn_enhance/README_en.md @@ -0,0 +1,12 @@ +## Practical Server Side Detection + +### Introduction + +* In recent years, object detection in images has drawn wide attention from both academia and industry. Combining the ResNet50_vd pretrained model trained with the SSLD distillation scheme in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) (82.39% Top-1 accuracy on the ImageNet1k validation set) with the rich operators in PaddleDetection, PaddlePaddle provides PSS-DET (Practical Server Side Detection), a practical object detection solution for server-side deployment. On the COCO2017 object detection dataset, it reaches 41.2% COCO mAP at a single-V100 inference speed of 61 FPS.
+ + +### Model library + +| Backbone | Network type | Number of images per GPU | Learning rate strategy | Inferring time(fps) | Box AP | Mask AP | Download | Configuration File | +| :-------------------- | :----------: | :----------------------: | :--------------------: | :-----------------: | :----: | :-----: | :---------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.5 | - | [link](https://paddledet.bj.bcebos.com/models/faster_rcnn_enhance_3x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml) | diff --git a/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/faster_rcnn_enhance.yml b/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/faster_rcnn_enhance.yml new file mode 100644 index 0000000000000000000000000000000000000000..d47fd2c98ce28ab3e75f56e981a2be70326a8bbd --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/faster_rcnn_enhance.yml @@ -0,0 +1,81 @@ +architecture: FasterRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + variant: d + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + dcn_v2_stages: [1,2,3] + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + in_channels: [256, 512, 1024, 2048] + out_channel: 64 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 500 + post_nms_top_n: 300 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxLibraAssigner + bbox_loss: DIouLoss + +TwoFCHead: + out_channel: 1024 + +BBoxLibraAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +DIouLoss: + loss_weight: 10.0 + use_complete_iou_loss: true + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml b/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..f1a7c998d4e332661491024ca17a1a0d996b589d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml @@ -0,0 +1,42 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - AutoAugment: {autoaug_type: v1} + - RandomResize: {target_size: [[384,1000], [416,1000], [448,1000], [480,1000], [512,1000], [544,1000], [576,1000], [608,1000], [640,1000], [672,1000]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: 
[0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [640, 640], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [640, 640], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/optimizer_3x.yml b/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/optimizer_3x.yml new file mode 100644 index 0000000000000000000000000000000000000000..8bd85fae359c552952bdfc7cec4cbb5ff1198e85 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rcnn_enhance/_base_/optimizer_3x.yml @@ -0,0 +1,19 @@ +epoch: 36 + +LearningRate: + base_lr: 0.02 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [24, 33] + - !LinearWarmup + start_factor: 0. + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml b/PaddleDetection-release-2.6/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a49f245f22dbcaf80cb9a8ca382c35f549858b18 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/faster_rcnn_enhance.yml', + '_base_/faster_rcnn_enhance_reader.yml', +] +weights: output/faster_rcnn_enhance_r50_3x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/res2net/README.md b/PaddleDetection-release-2.6/configs/res2net/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a51654d809ec9b0d11822946a4d9ef620b2e053b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/res2net/README.md @@ -0,0 +1,37 @@ +# Res2Net + +## Introduction + +- Res2Net: A New Multi-scale Backbone Architecture: [https://arxiv.org/abs/1904.01169](https://arxiv.org/abs/1904.01169) + +``` +@article{DBLP:journals/corr/abs-1904-01169, + author = {Shanghua Gao and + Ming{-}Ming Cheng and + Kai Zhao and + Xinyu Zhang and + Ming{-}Hsuan Yang and + Philip H. S. 
Torr}, + title = {Res2Net: {A} New Multi-scale Backbone Architecture}, + journal = {CoRR}, + volume = {abs/1904.01169}, + year = {2019}, + url = {http://arxiv.org/abs/1904.01169}, + archivePrefix = {arXiv}, + eprint = {1904.01169}, + timestamp = {Thu, 25 Apr 2019 10:24:54 +0200}, + biburl = {https://dblp.org/rec/bib/journals/corr/abs-1904-01169}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} +``` + + +## Model Zoo + +| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | +| :---------------------- | :------------- | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | +| Res2Net50-FPN | Faster | 2 | 1x | - | 40.6 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_res2net50_vb_26w_4s_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x_coco.yml) | +| Res2Net50-FPN | Mask | 2 | 2x | - | 42.4 | 38.1 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_res2net50_vb_26w_4s_fpn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x_coco.yml) | +| Res2Net50-vd-FPN | Mask | 2 | 2x | - | 42.6 | 38.1 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_res2net50_vd_26w_4s_fpn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x_coco.yml) | + +Note: all the above models are trained with 8 gpus. diff --git a/PaddleDetection-release-2.6/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..1fbdc9d73ff4cedf8498a7d5fdb76d7b6b454bbb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x_coco.yml @@ -0,0 +1,33 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../faster_rcnn/_base_/optimizer_1x.yml', + '../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml', + '../faster_rcnn/_base_/faster_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Res2Net50_26w_4s_pretrained.pdparams +weights: output/faster_rcnn_res2net50_vb_26w_4s_fpn_1x_coco/model_final + +FasterRCNN: + backbone: Res2Net + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +Res2Net: + # index 0 stands for res2 + depth: 50 + width: 26 + scales: 4 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + variant: b + + +TrainReader: + batch_size: 2 diff --git a/PaddleDetection-release-2.6/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..02970d1f0c659b5dac0dc3c53fa5f2a750272520 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x_coco.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '../mask_rcnn/_base_/optimizer_1x.yml', + '../mask_rcnn/_base_/mask_rcnn_r50_fpn.yml', + '../mask_rcnn/_base_/mask_fpn_reader.yml', +] +pretrain_weights: 
https://paddledet.bj.bcebos.com/models/pretrained/Res2Net50_26w_4s_pretrained.pdparams +weights: output/mask_rcnn_res2net50_vb_26w_4s_fpn_2x_coco/model_final + +MaskRCNN: + backbone: Res2Net + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + + +Res2Net: + # index 0 stands for res2 + depth: 50 + width: 26 + scales: 4 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + variant: b + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 + + +TrainReader: + batch_size: 2 diff --git a/PaddleDetection-release-2.6/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..549e1f79128c6f33f28b468c4c946ac99e495a6f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x_coco.yml @@ -0,0 +1,47 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '../mask_rcnn/_base_/optimizer_1x.yml', + '../mask_rcnn/_base_/mask_rcnn_r50_fpn.yml', + '../mask_rcnn/_base_/mask_fpn_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Res2Net50_vd_26w_4s_pretrained.pdparams +weights: output/mask_rcnn_res2net50_vd_26w_4s_fpn_2x_coco/model_final + +MaskRCNN: + backbone: Res2Net + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + + +Res2Net: + # index 0 stands for res2 + depth: 50 + width: 26 + scales: 4 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + variant: d + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 + + +TrainReader: + batch_size: 2 diff --git a/PaddleDetection-release-2.6/configs/retinanet/README.md b/PaddleDetection-release-2.6/configs/retinanet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1259d47dddf5eb52e1499c7c63ae913d2f806c7f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/README.md @@ -0,0 +1,28 @@ +# RetinaNet (Focal Loss for Dense Object Detection) + +## Model Zoo + +| Backbone | Model | imgs/GPU | lr schedule | FPS | Box AP | download | config | +| ------------ | --------- | -------- | ----------- | --- | ------ | ---------- | ----------- | +| ResNet50-FPN | RetinaNet | 2 | 1x | --- | 37.5 | [model](https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_1x_coco.pdparams) | [config](./retinanet_r50_fpn_1x_coco.yml) | +| ResNet50-FPN | RetinaNet | 2 | 2x | --- | 39.1 | [model](https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_2x_coco.pdparams) | [config](./retinanet_r50_fpn_2x_coco.yml) | +| ResNet101-FPN| RetinaNet | 2 | 2x | --- | 40.6 | [model](https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams) | [config](./retinanet_r101_fpn_2x_coco.yml) | +| ResNet50-FPN | RetinaNet + [FGD](../slim/distill/README.md) | 2 | 2x | --- | 40.8 | [model](https://bj.bcebos.com/v1/paddledet/models/retinanet_r101_distill_r50_2x_coco.pdparams) | [config](./retinanet_r50_fpn_2x_coco.yml)/[slim_config](../slim/distill/retinanet_resnet101_coco_distill.yml) 
| + + +**Notes:** + +- The ResNet50-FPN are trained on COCO train2017 with 8 GPUs. Both ResNet101-FPN and ResNet50-FPN with [FGD](../slim/distill/README.md) are trained on COCO train2017 with 4 GPUs. +- All above models are evaluated on val2017. Box AP=`mAP(IoU=0.5:0.95)`. + + +## Citation + +```latex +@inproceedings{lin2017focal, + title={Focal loss for dense object detection}, + author={Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr}, + booktitle={Proceedings of the IEEE international conference on computer vision}, + year={2017} +} +``` diff --git a/PaddleDetection-release-2.6/configs/retinanet/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/retinanet/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..39c54ac805031619debf9b31119afa86b3ead857 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/retinanet/_base_/optimizer_2x.yml b/PaddleDetection-release-2.6/configs/retinanet/_base_/optimizer_2x.yml new file mode 100644 index 0000000000000000000000000000000000000000..61841433417b9fcc6f29a6c71a72ba23406b55ad --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/_base_/optimizer_2x.yml @@ -0,0 +1,19 @@ +epoch: 24 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/retinanet/_base_/retinanet_r50_fpn.yml b/PaddleDetection-release-2.6/configs/retinanet/_base_/retinanet_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..fb2d767aed5bd383f312ce79e4e39e3710c3cb9c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/_base_/retinanet_r50_fpn.yml @@ -0,0 +1,57 @@ +architecture: RetinaNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +RetinaNet: + backbone: ResNet + neck: FPN + head: RetinaHead + +ResNet: + depth: 50 + variant: b + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: true + use_c5: false + +RetinaHead: + conv_feat: + name: RetinaFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: null + use_dcn: false + anchor_generator: + name: RetinaAnchorGenerator + octave_base_scale: 4 + scales_per_octave: 3 + aspect_ratios: [0.5, 1.0, 2.0] + strides: [8.0, 16.0, 32.0, 64.0, 128.0] + bbox_assigner: + name: MaxIoUAssigner + positive_overlap: 0.5 + negative_overlap: 0.4 + allow_low_quality: true + loss_class: + name: FocalLoss + gamma: 2.0 + alpha: 0.25 + loss_weight: 1.0 + loss_bbox: + name: SmoothL1Loss + beta: 0.0 + loss_weight: 1.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/retinanet/_base_/retinanet_reader.yml b/PaddleDetection-release-2.6/configs/retinanet/_base_/retinanet_reader.yml new file mode 100644 index 
0000000000000000000000000000000000000000..1f686b4d7f06f143106491e9b8fe3957a40927c2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/_base_/retinanet_reader.yml @@ -0,0 +1,36 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1} + - RandomFlip: {} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: True + drop_last: True + collate_batch: False + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/retinanet/retinanet_r101_distill_r50_2x_coco.yml b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r101_distill_r50_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..bb72cda8e99ac6a597ea5fc9b113378f7954bac3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r101_distill_r50_2x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/retinanet_r50_fpn.yml', + '_base_/optimizer_2x.yml', + '_base_/retinanet_reader.yml' +] + +weights: https://paddledet.bj.bcebos.com/models/retinanet_r101_distill_r50_2x_coco.pdparams diff --git a/PaddleDetection-release-2.6/configs/retinanet/retinanet_r101_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r101_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0518dd30fd5597b58e5756a022af187add56e221 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r101_fpn_2x_coco.yml @@ -0,0 +1,18 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/retinanet_r50_fpn.yml', + '_base_/optimizer_2x.yml', + '_base_/retinanet_reader.yml' +] + +weights: output/retinanet_r101_fpn_2x_coco/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_pretrained.pdparams + +ResNet: + depth: 101 + variant: b + norm_type: bn + freeze_at: 0 + return_idx: [1, 2, 3] + num_stages: 4 diff --git a/PaddleDetection-release-2.6/configs/retinanet/retinanet_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r50_fpn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..cb6d342baeb428547d42f417acda02e8c90e39da --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r50_fpn_1x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/retinanet_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/retinanet_reader.yml' +] + +weights: output/retinanet_r50_fpn_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/retinanet/retinanet_r50_fpn_2x_coco.yml b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r50_fpn_2x_coco.yml new file mode 
100644 index 0000000000000000000000000000000000000000..b25a5cfe1acf7fb872dd0bf289b06ceec59925ed --- /dev/null +++ b/PaddleDetection-release-2.6/configs/retinanet/retinanet_r50_fpn_2x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/retinanet_r50_fpn.yml', + '_base_/optimizer_2x.yml', + '_base_/retinanet_reader.yml' +] + +weights: output/retinanet_r50_fpn_2x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/rotate/README.md b/PaddleDetection-release-2.6/configs/rotate/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bbc9fca205895f610ef8097901ab0b1e91533367 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/README.md @@ -0,0 +1,132 @@ +简体中文 | [English](README_en.md) + +# 旋转框检测 + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [数据准备](#数据准备) +- [安装依赖](#安装依赖) + +## 简介 +旋转框常用于检测带有角度信息的矩形框,即矩形框的宽和高不再与图像坐标轴平行。相较于水平矩形框,旋转矩形框一般包括更少的背景信息。旋转框检测常用于遥感等场景中。 + +## 模型库 + +| 模型 | mAP | 学习率策略 | 角度表示 | 数据增广 | GPU数目 | 每GPU图片数目 | 模型下载 | 配置文件 | +|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:| +| [S2ANet](./s2anet/README.md) | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) | +| [FCOSR](./fcosr/README.md) | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) | +| [PP-YOLOE-R-s](./ppyoloe_r/README.md) | 73.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) | +| [PP-YOLOE-R-s](./ppyoloe_r/README.md) | 79.42 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) | +| [PP-YOLOE-R-m](./ppyoloe_r/README.md) | 77.64 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) | +| [PP-YOLOE-R-m](./ppyoloe_r/README.md) | 79.71 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) | +| [PP-YOLOE-R-l](./ppyoloe_r/README.md) | 78.14 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) | +| [PP-YOLOE-R-l](./ppyoloe_r/README.md) | 80.02 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) | +| [PP-YOLOE-R-x](./ppyoloe_r/README.md) | 78.28 | 3x | oc | RR | 4 | 2 | 
[model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) | +| [PP-YOLOE-R-x](./ppyoloe_r/README.md) | 80.73 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) | + +**注意:** + +- 如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)** 调整学习率。 +- 模型库中的模型默认使用单尺度训练单尺度测试。如果数据增广一栏标明MS,意味着使用多尺度训练和多尺度测试。如果数据增广一栏标明RR,意味着使用RandomRotate数据增广进行训练。 + +## 数据准备 +### DOTA数据准备 +DOTA数据集是一个大规模的遥感图像数据集,包含旋转框和水平框的标注。可以从[DOTA数据集官网](https://captain-whu.github.io/DOTA/)下载数据集并解压,解压后的数据集目录结构如下所示: +``` +${DOTA_ROOT} +├── test +│ └── images +├── train +│ ├── images +│ └── labelTxt +└── val + ├── images + └── labelTxt +``` + +对于有标注的数据,每一张图片会对应一个同名的txt文件,文件中每一行为一个旋转框的标注,其格式如下: +``` +x1 y1 x2 y2 x3 y3 x4 y4 class_name difficult +``` + +#### 单尺度切图 +DOTA数据集分辨率较高,因此一般在训练和测试之前对图像进行离线切图,使用单尺度进行切图可以使用以下命令: +``` bash +# 对于有标注的数据进行切图 +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \ + --output_dir ${OUTPUT_DIR}/trainval1024/ \ + --coco_json_file DOTA_trainval1024.json \ + --subsize 1024 \ + --gap 200 \ + --rates 1.0 + +# 对于无标注的数据进行切图需要设置--image_only +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/test/ \ + --output_dir ${OUTPUT_DIR}/test1024/ \ + --coco_json_file DOTA_test1024.json \ + --subsize 1024 \ + --gap 200 \ + --rates 1.0 \ + --image_only + +``` + +#### 多尺度切图 +使用多尺度进行切图可以使用以下命令: +``` bash +# 对于有标注的数据进行切图 +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \ + --output_dir ${OUTPUT_DIR}/trainval/ \ + --coco_json_file DOTA_trainval1024.json \ + --subsize 1024 \ + --gap 500 \ + --rates 0.5 1.0 1.5 + +# 对于无标注的数据进行切图需要设置--image_only +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/test/ \ + --output_dir ${OUTPUT_DIR}/test1024/ \ + --coco_json_file DOTA_test1024.json \ + --subsize 1024 \ + --gap 500 \ + --rates 0.5 1.0 1.5 \ + --image_only +``` + +### 自定义数据集 +旋转框使用标准COCO数据格式,你可以将你的数据集转换成COCO格式以训练模型。COCO标准数据格式的标注信息中包含以下信息: +``` python +'annotations': [ + { + 'id': 2083, 'category_id': 9, 'image_id': 9008, + 'bbox': [x, y, w, h], # 水平框标注 + 'segmentation': [[x1, y1, x2, y2, x3, y3, x4, y4]], # 旋转框标注 + ... + } + ... 
+] +``` +**需要注意的是`bbox`的标注是水平框标注,`segmentation`为旋转框四个点的标注(顺时针或逆时针均可)。在旋转框训练时`bbox`是可以缺省,一般推荐根据旋转框标注`segmentation`生成。** 在PaddleDetection 2.4及之前的版本,`bbox`为旋转框标注[x, y, w, h, angle],`segmentation`缺省,**目前该格式已不再支持,请下载最新数据集或者转换成标准COCO格式**。 + +## 安装依赖 +旋转框检测模型需要依赖外部算子进行训练,评估等。Linux环境下,你可以执行以下命令进行编译安装 +``` +cd ppdet/ext_op +python setup.py install +``` +Windows环境请按照如下步骤安装: + +(1)准备Visual Studio (版本需要>=Visual Studio 2015 update3),这里以VS2017为例; + +(2)点击开始-->Visual Studio 2017-->适用于 VS 2017 的x64本机工具命令提示; + +(3)设置环境变量:`set DISTUTILS_USE_SDK=1` + +(4)进入`PaddleDetection/ppdet/ext_op`目录,通过`python setup.py install`命令进行安装。 + +安装完成后,可以执行`ppdet/ext_op/unittest`下的单测验证外部op是否正确安装 diff --git a/PaddleDetection-release-2.6/configs/rotate/README_en.md b/PaddleDetection-release-2.6/configs/rotate/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..6cdc111cdb984afcefedac58f2687270c7151e7d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/README_en.md @@ -0,0 +1,129 @@ +English | [简体中文](README.md) + +# Rotated Object Detection + +## Table of Contents +- [Introduction](#Introduction) +- [Model Zoo](#Model-Zoo) +- [Data Preparation](#Data-Preparation) +- [Installation](#Installation) + +## Introduction +Rotated object detection is used to detect rectangular bounding boxes with angle information, that is, the long and short sides of the rectangular bounding box are no longer parallel to the image coordinate axes. Oriented bounding boxes generally contain less background information than horizontal bounding boxes. Rotated object detection is often used in remote sensing scenarios. + +## Model Zoo +| Model | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config | +|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:| +| [S2ANet](./s2anet/README_en.md) | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) | +| [FCOSR](./fcosr/README_en.md) | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) | +| [PP-YOLOE-R-s](./ppyoloe_r/README_en.md) | 73.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) | +| [PP-YOLOE-R-s](./ppyoloe_r/README_en.md) | 79.42 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) | +| [PP-YOLOE-R-m](./ppyoloe_r/README_en.md) | 77.64 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) | +| [PP-YOLOE-R-m](./ppyoloe_r/README_en.md) | 79.71 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) | +| 
[PP-YOLOE-R-l](./ppyoloe_r/README_en.md) | 78.14 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) | +| [PP-YOLOE-R-l](./ppyoloe_r/README_en.md) | 80.02 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) | +| [PP-YOLOE-R-x](./ppyoloe_r/README_en.md) | 78.28 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) | +| [PP-YOLOE-R-x](./ppyoloe_r/README_en.md) | 80.73 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) | + +**Notes:** + +- If **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)**; a short worked example is given right after these notes. +- Models in the model zoo are trained and tested with a single scale by default. If `MS` is indicated in the data augmentation column, it means that multi-scale training and multi-scale testing are used. If `RR` is indicated in the data augmentation column, it means that RandomRotate data augmentation is used for training.
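+
+As a quick worked instance of this rule (a minimal sketch; the default of 4 GPUs with 2 images per GPU comes from the table above, while the 0.008 base learning rate below is purely illustrative and not taken from any config):
+
+```python
+# Linear scaling rule from the note above:
+#   lr_new = lr_default * (batch_size_new * GPU_number_new)
+#            / (batch_size_default * GPU_number_default)
+def scaled_lr(lr_default, batch_size_new, gpu_number_new,
+              batch_size_default=2, gpu_number_default=4):
+    return lr_default * (batch_size_new * gpu_number_new) \
+        / (batch_size_default * gpu_number_default)
+
+# Keeping 2 images/GPU but training on 1 GPU instead of the default 4
+# quarters whatever base_lr the chosen config uses:
+print(scaled_lr(0.008, batch_size_new=2, gpu_number_new=1))  # 0.002
+```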
+ +## Data Preparation +### DOTA Dataset Preparation +The DOTA dataset is a large-scale remote sensing image dataset containing annotations of oriented and horizontal bounding boxes. The dataset can be downloaded from the [official website of the DOTA dataset](https://captain-whu.github.io/DOTA/). After the dataset is decompressed, its directory structure is as follows: +``` +${DOTA_ROOT} +├── test +│ └── images +├── train +│ ├── images +│ └── labelTxt +└── val + ├── images + └── labelTxt +``` + +For labeled data, each image corresponds to a txt file with the same name, and each row in the txt file represents a rotated bounding box. The format is as follows: + +``` +x1 y1 x2 y2 x3 y3 x4 y4 class_name difficult +``` + +#### Slicing data with a single scale +The image resolution of the DOTA dataset is relatively high, so we usually slice the images before training and testing. To slice the images with a single scale, you can use the command below: +``` bash +# slicing labeled data +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \ + --output_dir ${OUTPUT_DIR}/trainval1024/ \ + --coco_json_file DOTA_trainval1024.json \ + --subsize 1024 \ + --gap 200 \ + --rates 1.0 +# slicing unlabeled data by setting --image_only +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/test/ \ + --output_dir ${OUTPUT_DIR}/test1024/ \ + --coco_json_file DOTA_test1024.json \ + --subsize 1024 \ + --gap 200 \ + --rates 1.0 \ + --image_only + +``` + +#### Slicing data with multiple scales +To slice the images with multiple scales, you can use the command below: +``` bash +# slicing labeled data +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \ + --output_dir ${OUTPUT_DIR}/trainval/ \ + --coco_json_file DOTA_trainval1024.json \ + --subsize 1024 \ + --gap 500 \ + --rates 0.5 1.0 1.5 +# slicing unlabeled data by setting --image_only +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/test/ \ + --output_dir ${OUTPUT_DIR}/test1024/ \ + --coco_json_file DOTA_test1024.json \ + --subsize 1024 \ + --gap 500 \ + --rates 0.5 1.0 1.5 \ + --image_only +``` + +### Custom Dataset +Rotated object detection uses the standard COCO data format, and you can convert your dataset to COCO format to train the model. The annotations in the standard COCO format contain the following information: +``` python +'annotations': [ + { + 'id': 2083, 'category_id': 9, 'image_id': 9008, + 'bbox': [x, y, w, h], # horizontal bounding box + 'segmentation': [[x1, y1, x2, y2, x3, y3, x4, y4]], # rotated bounding box + ... + } + ... +] +``` +**It should be noted that `bbox` is the horizontal bounding box and `segmentation` contains the four points of the rotated bounding box (clockwise or counterclockwise). The `bbox` can be omitted when training a rotated object detector, and it is recommended to generate it from the `segmentation` annotation (see the sketch at the end of this README)**. In PaddleDetection 2.4 and earlier versions, `bbox` represented the rotated bounding box [x, y, w, h, angle] and `segmentation` was empty. **This format is no longer supported since PaddleDetection 2.5; please download the latest dataset or convert it to the standard COCO format**. + +## Installation +Models for rotated object detection depend on external operators for training, evaluation, etc. In a Linux environment, you can execute the following commands to compile and install them: +``` +cd ppdet/ext_op +python setup.py install +``` +In a Windows environment, perform the following steps to install them: + +(1) Prepare Visual Studio (version >= Visual Studio 2015 Update3); + +(2) Go to Start --> Visual Studio 2017 --> x64 Native Tools Command Prompt for VS 2017; + +(3) Set the environment variable: `set DISTUTILS_USE_SDK=1` + +(4) Enter the `ppdet/ext_op` directory and install with `python setup.py install`. + +After the installation, you can run the unit tests under `ppdet/ext_op/unittest` to verify that the external operators are installed correctly.
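+
+As referenced in the Custom Dataset section above, the sketch below (an illustrative helper, not part of PaddleDetection) shows one way to generate the horizontal `bbox` field from a four-point `segmentation` annotation:
+
+```python
+def rbox_to_hbox(segmentation):
+    """Return the horizontal COCO bbox [x, y, w, h] that encloses a rotated
+    box given as [[x1, y1, x2, y2, x3, y3, x4, y4]]."""
+    xs = segmentation[0][0::2]  # x1, x2, x3, x4
+    ys = segmentation[0][1::2]  # y1, y2, y3, y4
+    x, y = min(xs), min(ys)
+    return [x, y, max(xs) - x, max(ys) - y]
+
+# An axis-aligned 50 x 30 rectangle given by its four corners:
+print(rbox_to_hbox([[10, 10, 60, 10, 60, 40, 10, 40]]))  # [10, 10, 50, 30]
+```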
diff --git a/PaddleDetection-release-2.6/configs/rotate/fcosr/README.md b/PaddleDetection-release-2.6/configs/rotate/fcosr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1d93449d96916fe752df11fe605d35729cddb1f3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/fcosr/README.md @@ -0,0 +1,91 @@ +简体中文 | [English](README_en.md) + +# FCOSR + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [使用说明](#使用说明) +- [预测部署](#预测部署) +- [引用](#引用) + +## 简介 + +[FCOSR](https://arxiv.org/abs/2111.10780)是基于[FCOS](https://arxiv.org/abs/1904.01355)的单阶段Anchor-Free的旋转框检测算法。FCOSR主要聚焦于旋转框的标签匹配策略,提出了椭圆中心采样和模糊样本标签匹配的方法。在loss方面,FCOSR使用了[ProbIoU](https://arxiv.org/abs/2106.06072)避免边界不连续性问题。 + +## 模型库 + +| 模型 | Backbone | mAP | 学习率策略 | 角度表示 | 数据增广 | GPU数目 | 每GPU图片数目 | 模型下载 | 配置文件 | +|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:| +| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) | + +**注意:** + +- 如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。 +- 模型库中的模型默认使用单尺度训练单尺度测试。如果数据增广一栏标明MS,意味着使用多尺度训练和多尺度测试。如果数据增广一栏标明RR,意味着使用RandomRotate数据增广进行训练。 + +## 使用说明 + +参考[数据准备](../README.md#数据准备)准备数据。 + +### 训练 + +GPU单卡训练 +``` bash +CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml +``` + +GPU多卡训练 +``` bash +CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml +``` + +### 预测 + +执行以下命令预测单张图片,图片预测结果会默认保存在`output`文件夹下面 +``` bash +python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5 +``` + +### DOTA数据集评估 + +参考[DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), 评估DOTA数据集需要生成一个包含所有检测结果的zip文件,每一类的检测结果储存在一个txt文件中,txt文件中每行格式为:`image_name score x1 y1 x2 y2 x3 y3 x4 y4`。将生成的zip文件提交到[DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html)的Task1进行评估。你可以执行以下命令得到test数据集的预测结果: +``` bash +python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_fcosr --visualize=False --save_results=True +``` +将预测结果处理成官网评估所需要的格式: +``` bash +python configs/rotate/tools/generate_result.py --pred_txt_dir=output_fcosr/ --output_dir=submit/ --data_type=dota10 + +zip -r submit.zip submit +``` + +## 预测部署 + +部署教程请参考[预测部署](../../../deploy/README.md) + +## 引用 + +``` +@article{li2021fcosr, + title={Fcosr: A simple anchor-free rotated detector for aerial object detection}, + author={Li, Zhonghua and Hou, Biao and Wu, Zitong and Jiao, Licheng and Ren, Bo and Yang, Chen}, + journal={arXiv preprint arXiv:2111.10780}, + year={2021} +} + +@inproceedings{tian2019fcos, + title={Fcos: Fully convolutional one-stage object detection}, + author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong}, + booktitle={Proceedings of the IEEE/CVF international conference on computer vision}, + pages={9627--9636}, + year={2019} +} + +@article{llerena2021gaussian, + title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection}, + author={Llerena, Jeffri M 
and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
+  journal={arXiv preprint arXiv:2106.06072},
+  year={2021}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/rotate/fcosr/README_en.md b/PaddleDetection-release-2.6/configs/rotate/fcosr/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..8d7621b339b3adb5d309c195754fd013f191b7d5
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/rotate/fcosr/README_en.md
@@ -0,0 +1,92 @@
+English | [简体中文](README.md)
+
+# FCOSR
+
+## Content
+- [Introduction](#Introduction)
+- [Model Zoo](#Model-Zoo)
+- [Getting Start](#Getting-Start)
+- [Deployment](#Deployment)
+- [Citations](#Citations)
+
+## Introduction
+
+[FCOSR](https://arxiv.org/abs/2111.10780) is a one-stage anchor-free model based on [FCOS](https://arxiv.org/abs/1904.01355). FCOSR focuses on the label assignment strategy for oriented bounding boxes and proposes an ellipse center sampling method and a fuzzy sample assignment strategy. In terms of loss, FCOSR uses [ProbIoU](https://arxiv.org/abs/2106.06072) to avoid the boundary discontinuity problem.
+
+## Model Zoo
+
+| Model | Backbone | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
+|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
+| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
+
+**Notes:**
+
+- If **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
+- Models in the model zoo are trained and tested at a single scale by default. If `MS` is indicated in the data augmentation column, it means that multi-scale training and multi-scale testing are used. If `RR` is indicated in the data augmentation column, it means that RandomRotate data augmentation is used for training.
+
+## Getting Start
+
+Refer to [Data-Preparation](../README_en.md#Data-Preparation) to prepare data.
+
+### Training
+
+Single GPU Training
+``` bash
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
+```
+
+Multiple GPUs Training
+``` bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
+```
+
+### Inference
+
+Run the following command to infer a single image; the result of inference will be saved in the `output` directory by default.
+
+``` bash
+python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
+```
+
+### Evaluation on DOTA Dataset
+Referring to [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a txt file, each line of which is in the following format:
+`image_name score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html).
You can run the following command to get the inference results of test dataset: +``` bash +python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_fcosr --visualize=False --save_results=True +``` +Process the prediction results into the format required for the official website evaluation: +``` bash +python configs/rotate/tools/generate_result.py --pred_txt_dir=output_fcosr/ --output_dir=submit/ --data_type=dota10 + +zip -r submit.zip submit +``` + +## Deployment + +Please refer to the deployment tutorial[Deployment](../../../deploy/README_en.md) + +## Citations + +``` +@article{li2021fcosr, + title={Fcosr: A simple anchor-free rotated detector for aerial object detection}, + author={Li, Zhonghua and Hou, Biao and Wu, Zitong and Jiao, Licheng and Ren, Bo and Yang, Chen}, + journal={arXiv preprint arXiv:2111.10780}, + year={2021} +} + +@inproceedings{tian2019fcos, + title={Fcos: Fully convolutional one-stage object detection}, + author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong}, + booktitle={Proceedings of the IEEE/CVF international conference on computer vision}, + pages={9627--9636}, + year={2019} +} + +@article{llerena2021gaussian, + title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection}, + author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio}, + journal={arXiv preprint arXiv:2106.06072}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/fcosr_reader.yml b/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/fcosr_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..854436dc1e109289d9d460390a299dfad0d988e0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/fcosr_reader.yml @@ -0,0 +1,46 @@ +worker_num: 4 +image_height: &image_height 1024 +image_width: &image_width 1024 +image_size: &image_size [*image_height, *image_width] + +TrainReader: + sample_transforms: + - Decode: {} + - Poly2Array: {} + - RandomRFlip: {} + - RandomRRotate: {angle_mode: 'value', angle: [0, 90, 180, -90]} + - RandomRRotate: {angle_mode: 'value', angle: [30, 60], rotate_prob: 0.5} + - RResize: {target_size: *image_size, keep_ratio: True, interp: 2} + - Poly2RBox: {filter_threshold: 2, filter_mode: 'edge', rbox_type: 'oc'} + batch_transforms: + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadRGT: {} + - PadBatch: {pad_to_stride: 32} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Poly2Array: {} + - RResize: {target_size: *image_size, keep_ratio: True, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + collate_batch: false + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *image_size, keep_ratio: True, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 diff --git a/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/fcosr_x50.yml b/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/fcosr_x50.yml new file mode 100644 
index 0000000000000000000000000000000000000000..77a4d8a2ff0594aa9f948111092fd6c625d13234 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/fcosr_x50.yml @@ -0,0 +1,44 @@ +architecture: YOLOv3 +snapshot_epoch: 1 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt50_32x4d_pretrained.pdparams + +YOLOv3: + backbone: ResNet + neck: FPN + yolo_head: FCOSRHead + post_process: ~ + +ResNet: + depth: 50 + groups: 32 + base_width: 4 + variant: b + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + extra_stage: 2 + has_extra_convs: true + use_c5: false + relu_before_extra_convs: true + +FCOSRHead: + feat_channels: 256 + fpn_strides: [8, 16, 32, 64, 128] + stacked_convs: 4 + loss_weight: {class: 1.0, probiou: 1.0} + assigner: + name: FCOSRAssigner + factor: 12 + threshold: 0.23 + boundary: [[-1, 64], [64, 128], [128, 256], [256, 512], [512, 100000000.0]] + nms: + name: MultiClassNMS + nms_top_k: 2000 + keep_top_k: -1 + score_threshold: 0.1 + nms_threshold: 0.1 + normalized: False diff --git a/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/optimizer_3x.yml b/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/optimizer_3x.yml new file mode 100644 index 0000000000000000000000000000000000000000..859db126bed27471f6d8dcd02761299395ce9468 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/fcosr/_base_/optimizer_3x.yml @@ -0,0 +1,20 @@ +epoch: 36 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [24, 33] + - !LinearWarmup + start_factor: 0.3333333 + steps: 500 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/rotate/fcosr/fcosr_x50_3x_dota.yml b/PaddleDetection-release-2.6/configs/rotate/fcosr/fcosr_x50_3x_dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..d9554d30896ca5e2a3a5eb03725f1f6bb97a7dfc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/fcosr/fcosr_x50_3x_dota.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../../datasets/dota.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/fcosr_reader.yml', + '_base_/fcosr_x50.yml' +] + +weights: output/fcosr_x50_3x_dota/model_final diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/README.md b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5efb248e5ba37168c74ce156a7a76ace53cec9a0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/README.md @@ -0,0 +1,178 @@ +简体中文 | [English](README_en.md) + +# PP-YOLOE-R + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [使用说明](#使用说明) +- [预测部署](#预测部署) +- [附录](#附录) +- [引用](#引用) + +## 简介 +PP-YOLOE-R是一个高效的单阶段Anchor-free旋转框检测模型。基于PP-YOLOE, PP-YOLOE-R以极少的参数量和计算量为代价,引入了一系列有用的设计来提升检测精度。在DOTA 1.0数据集上,PP-YOLOE-R-l和PP-YOLOE-R-x在单尺度训练和测试的情况下分别达到了78.14和78.27 mAP,这超越了几乎所有的旋转框检测模型。通过多尺度训练和测试,PP-YOLOE-R-l和PP-YOLOE-R-x的检测精度进一步提升至80.02和80.73 mAP。在这种情况下,PP-YOLOE-R-x超越了所有的anchor-free方法并且和最先进的anchor-based的两阶段模型精度几乎相当。此外,PP-YOLOE-R-s和PP-YOLOE-R-m通过多尺度训练和测试可以达到79.42和79.71 mAP。考虑到这两个模型的参数量和计算量,其性能也非常卓越。在保持高精度的同时,PP-YOLOE-R避免使用特殊的算子,例如Deformable Convolution或Rotated RoI Align,以使其能轻松地部署在多种多样的硬件上。在1024x1024的输入分辨率下,PP-YOLOE-R-s/m/l/x在RTX 2080 Ti上使用TensorRT FP16分别能达到69.8/55.1/48.3/37.1 FPS,在Tesla V100上分别能达到114.5/86.8/69.7/50.7 FPS。更多细节可以参考我们的[**技术报告**](https://arxiv.org/abs/2211.02386)。 + +
    + +PP-YOLOE-R相较于PP-YOLOE做了以下几点改动: +- Rotated Task Alignment Learning +- 解耦的角度预测头 +- 使用DFL进行角度预测 +- 可学习的门控单元 +- [ProbIoU损失函数](https://arxiv.org/abs/2106.06072) + +## 模型库 + +| 模型 | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Params (M) | FLOPs (G) | 学习率策略 | 角度表示 | 数据增广 | GPU数目 | 每GPU图片数目 | 模型下载 | 配置文件 | +|:---:|:--------:|:----:|:--------------------:|:------------------------:|:----------:|:---------:|:--------:|:----------:|:-------:|:------:|:-----------:|:--------:|:------:| +| PP-YOLOE-R-s | CRN-s | 73.82 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) | +| PP-YOLOE-R-s | CRN-s | 79.42 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) | +| PP-YOLOE-R-m | CRN-m | 77.64 | 86.8 | 55.1 | 23.96 |127.00 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) | +| PP-YOLOE-R-m | CRN-m | 79.71 | 86.8 | 55.1 | 23.96 |127.00 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) | +| PP-YOLOE-R-l | CRN-l | 78.14 | 69.7 | 48.3 | 53.29 |281.65 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) | +| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 53.29 |281.65 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) | +| PP-YOLOE-R-x | CRN-x | 78.28 | 50.7 | 37.1 | 100.27|529.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) | +| PP-YOLOE-R-x | CRN-x | 80.73 | 50.7 | 37.1 | 100.27|529.82 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) | + +**注意:** + +- 如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。 +- 模型库中的模型默认使用单尺度训练单尺度测试。如果数据增广一栏标明MS,意味着使用多尺度训练和多尺度测试。如果数据增广一栏标明RR,意味着使用RandomRotate数据增广进行训练。 +- CRN表示在PP-YOLOE中提出的CSPRepResNet +- PP-YOLOE-R的参数量和计算量是在重参数化之后计算得到,输入图像的分辨率为1024x1024 +- 速度测试使用TensorRT 8.2.3在DOTA测试集中测试2000张图片计算平均值得到。参考速度测试以复现[速度测试](#速度测试) + +## 使用说明 + +参考[数据准备](../README.md#数据准备)准备数据。 + +### 训练 + +GPU单卡训练 +``` bash +CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml +``` + 
+GPU多卡训练 +``` bash +CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml +``` + +### 预测 + +执行以下命令预测单张图片,图片预测结果会默认保存在`output`文件夹下面 +``` bash +python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5 +``` + +### DOTA数据集评估 + +参考[DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), 评估DOTA数据集需要生成一个包含所有检测结果的zip文件,每一类的检测结果储存在一个txt文件中,txt文件中每行格式为:`image_name score x1 y1 x2 y2 x3 y3 x4 y4`。将生成的zip文件提交到[DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html)的Task1进行评估。你可以执行以下命令得到test数据集的预测结果: +``` bash +python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_ppyoloe_r --visualize=False --save_results=True +``` +将预测结果处理成官网评估所需要的格式: +``` bash +python configs/rotate/tools/generate_result.py --pred_txt_dir=output_ppyoloe_r/ --output_dir=submit/ --data_type=dota10 + +zip -r submit.zip submit +``` + +### 速度测试 +可以使用Paddle模式或者Paddle-TRT模式进行测速。当使用Paddle-TRT模式测速时,需要确保**TensorRT版本大于8.2, PaddlePaddle版本为develop版本**。使用Paddle-TRT进行测速,可以执行以下命令: + +``` bash +# 导出模型 +python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True + +# 速度测试 +CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode trt_fp16 +``` +当只使用Paddle进行测速,可以执行以下命令: +``` bash +# 导出模型 +python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams + +# 速度测试 +CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode paddle +``` + +## 预测部署 + +**使用Paddle**进行部署,执行以下命令: +``` bash +# 导出模型 +python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams + +# 预测图片 +python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=paddle --device=gpu +``` + +**使用Paddle-TRT进行部署**,执行以下命令: +``` +# 导出模型 +python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True + +# 预测图片 +python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=trt_fp16 --device=gpu +``` + +**注意:** +- 使用Paddle-TRT使用确保**PaddlePaddle版本为develop版本且TensorRT版本大于8.2**. 
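For readers who want a feel for what `deploy/python/infer.py` does under the hood, the following is a minimal, self-contained sketch using the Paddle Inference Python API, not the repo's actual deployment script. The normalization values follow `_base_/ppyoloe_r_reader.yml` in this config directory; the input names (`image`, `scale_factor`) and the per-detection output layout are assumptions that should be checked against your exported model:

```python
import cv2
import numpy as np
from paddle.inference import Config, create_predictor

MODEL_DIR = "output_inference/ppyoloe_r_crn_l_3x_dota"

# Build a GPU predictor from the exported inference model
config = Config(MODEL_DIR + "/model.pdmodel", MODEL_DIR + "/model.pdiparams")
config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory pool, device 0
predictor = create_predictor(config)

# Preprocess: resize to 1024x1024 and normalize with the reader's mean/std
img = cv2.cvtColor(cv2.imread("demo/P0072__1.0__0___0.png"), cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
img = cv2.resize(img, (1024, 1024), interpolation=cv2.INTER_LINEAR)
img = (img.astype("float32") / 255.0 - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
img = img.transpose((2, 0, 1))[None].astype("float32")  # NCHW with batch dim

# Exported PaddleDetection models typically take 'image' and 'scale_factor';
# inspect predictor.get_input_names() to confirm for your export.
feeds = {"image": img,
         "scale_factor": np.array([[1024.0 / h, 1024.0 / w]], dtype="float32")}
for name in predictor.get_input_names():
    predictor.get_input_handle(name).copy_from_cpu(feeds[name])

predictor.run()
out = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
# Assumed layout per detection: [class_id, score, x1, y1, x2, y2, x3, y3, x4, y4]
print(out.shape, out[:3])
```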
+ +**使用ONNX Runtime进行部署**,执行以下命令: +``` +# 导出模型 +python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams export_onnx=True + +# 安装paddle2onnx +pip install paddle2onnx + +# 转换成onnx模型 +paddle2onnx --model_dir output_inference/ppyoloe_r_crn_l_3x_dota --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_r_crn_l_3x_dota.onnx + +# 预测图片 +python configs/rotate/tools/onnx_infer.py --infer_cfg output_inference/ppyoloe_r_crn_l_3x_dota/infer_cfg.yml --onnx_file ppyoloe_r_crn_l_3x_dota.onnx --image_file demo/P0072__1.0__0___0.png + +``` + +## 附录 + +PP-YOLOE-R消融实验 + +| 模型 | mAP | 参数量(M) | FLOPs(G) | +| :-: | :-: | :------: | :------: | +| Baseline | 75.61 | 50.65 | 269.09 | +| +Rotated Task Alignment Learning | 77.24 | 50.65 | 269.09 | +| +Decoupled Angle Prediction Head | 77.78 | 52.20 | 272.72 | +| +Angle Prediction with DFL | 78.01 | 53.29 | 281.65 | +| +Learnable Gating Unit for RepVGG | 78.14 | 53.29 | 281.65 | + + +## 引用 + +``` +@article{wang2022pp, + title={PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector}, + author={Wang, Xinxin and Wang, Guanzhong and Dang, Qingqing and Liu, Yi and Hu, Xiaoguang and Yu, Dianhai}, + journal={arXiv preprint arXiv:2211.02386}, + year={2022} +} + +@article{xu2022pp, + title={PP-YOLOE: An evolved version of YOLO}, + author={Xu, Shangliang and Wang, Xinxin and Lv, Wenyu and Chang, Qinyao and Cui, Cheng and Deng, Kaipeng and Wang, Guanzhong and Dang, Qingqing and Wei, Shengyu and Du, Yuning and others}, + journal={arXiv preprint arXiv:2203.16250}, + year={2022} +} + +@article{llerena2021gaussian, + title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection}, + author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio}, + journal={arXiv preprint arXiv:2106.06072}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/README_en.md b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..6a37ed5fbd0904f90eba7f4e52b8f7ddd7c3ac3a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/README_en.md @@ -0,0 +1,180 @@ +English | [简体中文](README.md) + +# PP-YOLOE-R + +## Content +- [Introduction](#Introduction) +- [Model Zoo](#Model-Zoo) +- [Getting Start](#Getting-Start) +- [Deployment](#Deployment) +- [Appendix](#Appendix) +- [Citations](#Citations) + +## Introduction +PP-YOLOE-R is an efficient anchor-free rotated object detector. Based on PP-YOLOE, PP-YOLOE-R introduces a bag of useful tricks to improve detection precision at the expense of marginal parameters and computations.PP-YOLOE-R-l and PP-YOLOE-R-x achieve 78.14 and 78.27 mAP respectively on DOTA 1.0 dataset with single-scale training and testing, which outperform almost all other rotated object detectors. With multi-scale training and testing, the detection precision of PP-YOLOE-R-l and PP-YOLOE-R-x is further improved to 80.02 and 80.73 mAP. In this case, PP-YOLOE-R-x surpasses all anchor-free methods and demonstrates competitive performance to state-of-the-art anchor-based two-stage model. Moreover, PP-YOLOE-R-s and PP-YOLOE-R-m can achieve 79.42 and 79.71 mAP with multi-scale training and testing, which is an excellent result considering the parameters and GLOPS of these two models. 
While maintaining high precision, PP-YOLOE-R avoids special operators such as Deformable Convolution or Rotated RoI Align, so that it can be deployed easily on various hardware. At an input resolution of 1024x1024, PP-YOLOE-R-s/m/l/x can reach 69.8/55.1/48.3/37.1 FPS on RTX 2080 Ti and 114.5/86.8/69.7/50.7 FPS on Tesla V100 GPU with TensorRT and FP16 precision. For more details, please refer to our [**technical report**](https://arxiv.org/abs/2211.02386).
+
    + +Compared with PP-YOLOE, PP-YOLOE-R has made the following changes: +- Rotated Task Alignment Learning +- Decoupled Angle Prediction Head +- Angle Prediction with DFL +- Learnable Gating Unit for RepVGG +- [ProbIoU Loss](https://arxiv.org/abs/2106.06072) + +## Model Zoo +| Model | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Params (M) | FLOPs (G) | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config | +|:-----:|:--------:|:----:|:-------------------:|:--------------------------:|:-----------:|:---------:|:--------:|:-----:|:---:|:----------:|:----------:|:--------:|:------:| +| PP-YOLOE-R-s | CRN-s | 73.82 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) | +| PP-YOLOE-R-s | CRN-s | 79.42 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) | +| PP-YOLOE-R-m | CRN-m | 77.64 | 86.8 | 55.1 | 23.96 |127.00 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) | +| PP-YOLOE-R-m | CRN-m | 79.71 | 86.8 | 55.1 | 23.96 |127.00 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) | +| PP-YOLOE-R-l | CRN-l | 78.14 | 69.7 | 48.3 | 53.29 |281.65 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) | +| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 53.29 |281.65 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) | +| PP-YOLOE-R-x | CRN-x | 78.28 | 50.7 | 37.1 | 100.27|529.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) | +| PP-YOLOE-R-x | CRN-x | 80.73 | 50.7 | 37.1 | 100.27|529.82 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) | + +**Notes:** + +- if **GPU number** or **mini-batch size** is changed, **learning rate** should be adjusted according to the formula **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)**. +- Models in model zoo is trained and tested with single scale by default. If `MS` is indicated in the data augmentation column, it means that multi-scale training and multi-scale testing are used. 
If `RR` is indicated in the data augmentation column, it means that RandomRotate data augmentation is used for training.
+- CRN denotes the CSPRepResNet proposed in PP-YOLOE
+- The parameters and FLOPs of PP-YOLOE-R are calculated after re-parameterization, with an input image resolution of 1024x1024
+- Speed is calculated and averaged by testing 2000 images on the DOTA test dataset. Refer to [Speed testing](#Speed-testing) to reproduce the results.
+
+## Getting Start
+
+Refer to [Data-Preparation](../README_en.md#Data-Preparation) to prepare data.
+
+### Training
+
+Single GPU Training
+``` bash
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
+```
+
+Multiple GPUs Training
+``` bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
+```
+
+### Inference
+
+Run the following command to infer a single image; the result of inference will be saved in the `output` directory by default.
+
+``` bash
+python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
+```
+
+### Evaluation on DOTA Dataset
+Referring to [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a txt file, each line of which is in the following format:
+`image_name score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html). You can run the following command to get the inference results of the test dataset:
+``` bash
+python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_ppyoloe_r --visualize=False --save_results=True
+```
+Process the prediction results into the format required for the official website evaluation:
+``` bash
+python configs/rotate/tools/generate_result.py --pred_txt_dir=output_ppyoloe_r/ --output_dir=submit/ --data_type=dota10
+
+zip -r submit.zip submit
+```
+
+### Speed testing
+
+You can use Paddle mode or Paddle-TRT mode for speed testing. When using Paddle-TRT for speed testing, make sure that **the version of TensorRT is later than 8.2 and the version of PaddlePaddle is the develop version**.
Using Paddle-TRT to test speed, run the following command:
+
+``` bash
+# export inference model
+python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True
+
+# speed testing
+CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode trt_fp16
+```
+Using Paddle to test speed, run the following command:
+``` bash
+# export inference model
+python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams
+
+# speed testing
+CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode paddle
+
+```
+
+## Deployment
+
+**Using Paddle** for deployment, run the following command:
+
+``` bash
+# export inference model
+python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams
+
+# inference single image
+python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=paddle --device=gpu
+```
+
+**Using Paddle-TRT** for deployment, run the following command:
+
+``` bash
+# export inference model
+python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True
+
+# inference single image
+python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=trt_fp16 --device=gpu
+```
+**Notes:**
+- When using Paddle-TRT for deployment, make sure that **the version of TensorRT is later than 8.2 and the version of PaddlePaddle is the develop version**.
+
+**Using ONNX Runtime** for deployment, run the following command:
+
+``` bash
+# export inference model
+python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams export_onnx=True
+
+# install paddle2onnx
+pip install paddle2onnx
+
+# convert to onnx model
+paddle2onnx --model_dir output_inference/ppyoloe_r_crn_l_3x_dota --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_r_crn_l_3x_dota.onnx
+
+# inference single image
+python configs/rotate/tools/onnx_infer.py --infer_cfg output_inference/ppyoloe_r_crn_l_3x_dota/infer_cfg.yml --onnx_file ppyoloe_r_crn_l_3x_dota.onnx --image_file demo/P0072__1.0__0___0.png
+```
+
+## Appendix
+
+Ablation experiments of PP-YOLOE-R
+
+| Model | mAP | Params(M) | FLOPs(G) |
+| :-: | :-: | :------: | :------: |
+| Baseline | 75.61 | 50.65 | 269.09 |
+| +Rotated Task Alignment Learning | 77.24 | 50.65 | 269.09 |
+| +Decoupled Angle Prediction Head | 77.78 | 52.20 | 272.72 |
+| +Angle Prediction with DFL | 78.01 | 53.29 | 281.65 |
+| +Learnable Gating Unit for RepVGG | 78.14 | 53.29 | 281.65 |
+
+## Citations
+
+```
+@article{wang2022pp,
+  title={PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector},
+  author={Wang, Xinxin and Wang, Guanzhong and Dang, Qingqing and Liu, Yi and Hu, Xiaoguang and Yu, Dianhai},
+  journal={arXiv preprint arXiv:2211.02386},
+  year={2022}
+}
+
+@article{xu2022pp, + title={PP-YOLOE: An evolved version of YOLO}, + author={Xu, Shangliang and Wang, Xinxin and Lv, Wenyu and Chang, Qinyao and Cui, Cheng and Deng, Kaipeng and Wang, Guanzhong and Dang, Qingqing and Wei, Shengyu and Du, Yuning and others}, + journal={arXiv preprint arXiv:2203.16250}, + year={2022} +} + +@article{llerena2021gaussian, + title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection}, + author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio}, + journal={arXiv preprint arXiv:2106.06072}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/optimizer_3x.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/optimizer_3x.yml new file mode 100644 index 0000000000000000000000000000000000000000..1cdad4beb093deeef0b6918b88b81fc5964e95ce --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/optimizer_3x.yml @@ -0,0 +1,19 @@ +epoch: 36 + +LearningRate: + base_lr: 0.008 + schedulers: + - !CosineDecay + max_epochs: 44 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_crn.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_crn.yml new file mode 100644 index 0000000000000000000000000000000000000000..ab5bdb50aa731e3af664b68aa52b3c7293d715e8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_crn.yml @@ -0,0 +1,49 @@ +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOERHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + use_alpha: True + +PPYOLOERHead: + fpn_strides: [32, 16, 8] + grid_cell_offset: 0.5 + use_varifocal_loss: true + static_assigner_epoch: -1 + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.05} + static_assigner: + name: FCOSRAssigner + factor: 12 + threshold: 0.23 + boundary: [[512, 10000], [256, 512], [-1, 256]] + assigner: + name: RotatedTaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 2000 + keep_top_k: -1 + score_threshold: 0.1 + nms_threshold: 0.1 + normalized: False diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_reader.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..c429c6ea07c4efaebe97aa62a7029e2951f68dea --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_reader.yml @@ -0,0 +1,46 @@ +worker_num: 4 +image_height: &image_height 1024 +image_width: &image_width 1024 +image_size: &image_size [*image_height, *image_width] + +TrainReader: + sample_transforms: + - Decode: {} + - Poly2Array: {} + - RandomRFlip: {} + - RandomRRotate: {angle_mode: 'value', angle: [0, 90, 180, -90]} + - RandomRRotate: {angle_mode: 'value', angle: [30, 60], rotate_prob: 0.5} + - RResize: {target_size: *image_size, keep_ratio: True, interp: 2} + - Poly2RBox: {filter_threshold: 2, filter_mode: 'edge', rbox_type: 'oc'} + 
batch_transforms: + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadRGT: {} + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Poly2Array: {} + - RResize: {target_size: *image_size, keep_ratio: True, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + collate_batch: false + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *image_size, keep_ratio: True, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..b019d736c19b35423cb536eea0cf0e55036c2af7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_l_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml new file mode 100644 index 0000000000000000000000000000000000000000..a1411a3153dfae89d722d4895039b15370094c45 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota_ms.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_l_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..755cf3f4e5bb93072779cf83344124c6d28cb925 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_m_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml new file mode 100644 index 
0000000000000000000000000000000000000000..d885b459ff61f5ab7b3dcdcf55b80f1d6a3d6a4f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota_ms.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_m_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams +depth_mult: 0.67 +width_mult: 0.75 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..a227f18ac2ddb93e7af79d2452ea7e043cfe3eb0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_s_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml new file mode 100644 index 0000000000000000000000000000000000000000..921a9d571b730d3f57865e51baca6d37080d42a1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota_ms.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_s_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..d81b5ef9861fcef9e044c792894f671886037182 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_x_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams +depth_mult: 1.33 +width_mult: 1.25 diff --git a/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml new file mode 100644 index 0000000000000000000000000000000000000000..d99cdb0787109cdd88054d15967ddf4bfbb2b52f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml @@ -0,0 +1,15 @@ +_BASE_: [ + '../../datasets/dota_ms.yml', + '../../runtime.yml', + '_base_/optimizer_3x.yml', + '_base_/ppyoloe_r_reader.yml', + '_base_/ppyoloe_r_crn.yml' +] + 
+log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_x_3x_dota/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams +depth_mult: 1.33 +width_mult: 1.25 diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/README.md b/PaddleDetection-release-2.6/configs/rotate/s2anet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..270f7cb6884e815f14e06c8186a47ed200941bb3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/README.md @@ -0,0 +1,104 @@ +简体中文 | [English](README_en.md) + +# S2ANet + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [使用说明](#使用说明) +- [预测部署](#预测部署) +- [引用](#引用) + +## 简介 + +[S2ANet](https://arxiv.org/pdf/2008.09397.pdf)是用于检测旋转框的模型. + +## 模型库 + +| 模型 | Conv类型 | mAP | 学习率策略 | 角度表示 | 数据增广 | GPU数目 | 每GPU图片数目 | 模型下载 | 配置文件 | +|:---:|:------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:| +| S2ANet | Conv | 71.45 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) | +| S2ANet | AlignConv | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) | + +**注意:** + +- 如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。 +- 模型库中的模型默认使用单尺度训练单尺度测试。如果数据增广一栏标明MS,意味着使用多尺度训练和多尺度测试。如果数据增广一栏标明RR,意味着使用RandomRotate数据增广进行训练。 +- 这里使用`multiclass_nms`,与原作者使用nms略有不同。 + + +## 使用说明 + +参考[数据准备](../README.md#数据准备)准备数据。 + +### 1. 训练 + +GPU单卡训练 +```bash +export CUDA_VISIBLE_DEVICES=0 +python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml +``` + +GPU多卡训练 +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml +``` + +可以通过`--eval`开启边训练边测试。 + +### 2. 评估 +```bash +python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams + +# 使用提供训练好的模型评估 +python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams +``` + +### 3. 预测 +执行如下命令,会将图像预测结果保存到`output`文件夹下。 +```bash +python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3 +``` +使用提供训练好的模型预测: +```bash +python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3 +``` + +### 4. 
DOTA数据评估
+执行如下命令,会在`output`文件夹下将每个图像的预测结果保存到同名的txt文本中。
+```
+python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
+```
+参考[DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), 评估DOTA数据集需要生成一个包含所有检测结果的zip文件,每一类的检测结果储存在一个txt文件中,txt文件中每行格式为:`image_name score x1 y1 x2 y2 x3 y3 x4 y4`。将生成的zip文件提交到[DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html)的Task1进行评估。你可以执行以下命令生成评估文件
+```
+python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10
+
+zip -r submit.zip submit
+```
+
+## 预测部署
+
+Paddle中`multiclass_nms`算子的输入支持四边形输入,因此部署时可以不需要依赖旋转框IOU计算算子。
+
+部署教程请参考[预测部署](../../../deploy/README.md)
+
+
+## 引用
+```
+@article{han2021align,
+  author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
+  journal={IEEE Transactions on Geoscience and Remote Sensing},
+  title={Align Deep Features for Oriented Object Detection},
+  year={2021},
+  pages={1-11},
+  doi={10.1109/TGRS.2021.3062048}}
+
+@inproceedings{xia2018dota,
+  title={DOTA: A large-scale dataset for object detection in aerial images},
+  author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={3974--3983},
+  year={2018}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/README_en.md b/PaddleDetection-release-2.6/configs/rotate/s2anet/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..aecda9a3f2c46522f186152d146497d9fb41833e
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/README_en.md
@@ -0,0 +1,102 @@
+English | [简体中文](README.md)
+
+# S2ANet
+
+## Content
+- [Introduction](#Introduction)
+- [Model Zoo](#Model-Zoo)
+- [Getting Start](#Getting-Start)
+- [Deployment](#Deployment)
+- [Citations](#Citations)
+
+## Introduction
+
+[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting oriented (rotated) objects.
+
+## Model Zoo
+| Model | Conv Type | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
+|:---:|:------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
+| S2ANet | Conv | 71.45 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) |
+| S2ANet | AlignConv | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
+
+**Notes:**
+- If **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
+- Models in the model zoo are trained and tested at a single scale by default. If `MS` is indicated in the data augmentation column, it means that multi-scale training and multi-scale testing are used. If `RR` is indicated in the data augmentation column, it means that RandomRotate data augmentation is used for training.
+- `multiclass_nms` is used here, which is slightly different from the NMS used by the original authors.
+
+## Getting Start
+
+Refer to [Data-Preparation](../README_en.md#Data-Preparation) to prepare data.
+
+### 1. Train
+
+Single GPU Training
+```bash
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
+```
+
+Multiple GPUs Training
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
+```
+
+You can use `--eval` to enable evaluation during training.
+
+### 2. Evaluation
+```bash
+python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams
+
+# Use the provided trained model to evaluate
+python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
+```
+
+### 3. Prediction
+Executing the following command will save the image prediction results to the `output` folder.
+```bash
+python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
+```
+To predict using the provided trained model:
+```bash
+python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
+```
+
+### 4. DOTA Data Evaluation
+Executing the following command will save the prediction result of each image to a txt file with the same name in the `output` folder.
+```
+python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
+```
+Referring to [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a txt file, each line of which is in the following format:
+`image_name score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html). You can execute the following command to generate the submission file:
+```
+python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10
+
+zip -r submit.zip submit
+```
+
+## Deployment
+
+The `multiclass_nms` operator in Paddle supports quadrilateral inputs, so deployment does not need to rely on a rotated-box IoU operator.
+
+Please refer to the deployment tutorial [Deployment](../../../deploy/README_en.md)
+
+
+## Citations
+```
+@article{han2021align,
+  author={J. {Han} and J. {Ding} and J. {Li} and G. -S.
{Xia}}, + journal={IEEE Transactions on Geoscience and Remote Sensing}, + title={Align Deep Features for Oriented Object Detection}, + year={2021}, + pages={1-11}, + doi={10.1109/TGRS.2021.3062048}} + +@inproceedings{xia2018dota, + title={DOTA: A large-scale dataset for object detection in aerial images}, + author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={3974--3983}, + year={2018} +} +``` diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet.yml b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet.yml new file mode 100644 index 0000000000000000000000000000000000000000..fc8b2e25836b616e102676c39735b4debffcf435 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet.yml @@ -0,0 +1,52 @@ +architecture: S2ANet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/s2anet_r50_fpn_1x_dota/model_final.pdparams + + +# Model Achitecture +S2ANet: + backbone: ResNet + neck: FPN + head: S2ANetHead + +ResNet: + depth: 50 + variant: d + norm_type: bn + return_idx: [1,2,3] + num_stages: 4 + +FPN: + in_channels: [256, 512, 1024] + out_channel: 256 + spatial_scales: [0.25, 0.125, 0.0625] + has_extra_convs: True + extra_stage: 2 + relu_before_extra_convs: False + +S2ANetHead: + anchor_strides: [8, 16, 32, 64, 128] + anchor_scales: [4] + anchor_ratios: [1.0] + anchor_assign: RBoxAssigner + stacked_convs: 2 + feat_in: 256 + feat_out: 256 + align_conv_type: 'AlignConv' # AlignConv Conv + align_conv_size: 3 + use_sigmoid_cls: True + reg_loss_weight: [1.0, 1.0, 1.0, 1.0, 1.1] + cls_loss_weight: [1.1, 1.05] + nms_pre: 2000 + nms: + name: MultiClassNMS + keep_top_k: -1 + score_threshold: 0.05 + nms_threshold: 0.1 + normalized: False + +RBoxAssigner: + pos_iou_thr: 0.5 + neg_iou_thr: 0.4 + min_iou_thr: 0.0 + ignore_iof_thr: -2 diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_optimizer_1x.yml b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..65f794dc34c55f5d597b94eb1b305b28a28707f7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_optimizer_1x.yml @@ -0,0 +1,20 @@ +epoch: 12 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [7, 10] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + clip_grad_by_norm: 35 diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_optimizer_2x.yml b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_optimizer_2x.yml new file mode 100644 index 0000000000000000000000000000000000000000..54e73ce64634ce9a479d07bbde1c3de385d2a7d5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_optimizer_2x.yml @@ -0,0 +1,20 @@ +epoch: 24 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [14, 20] + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + clip_grad_by_norm: 35 diff --git 
a/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_reader.yml b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..7d0fc15e002f8fe0772a7feea241418f9a2ada42 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/_base_/s2anet_reader.yml @@ -0,0 +1,44 @@ +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Poly2Array: {} + - RandomRFlip: {} + - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2} + - Poly2RBox: {rbox_type: 'le135'} + batch_transforms: + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadRGT: {} + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Poly2Array: {} + - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: false + drop_last: false + collate_batch: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1024, 1024], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_1x_spine.yml b/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_1x_spine.yml new file mode 100644 index 0000000000000000000000000000000000000000..550586f45ce293b2edd082d6fe700b97c53c35f3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_1x_spine.yml @@ -0,0 +1,25 @@ +_BASE_: [ + '../../datasets/spine_coco.yml', + '../../runtime.yml', + '_base_/s2anet_optimizer_1x.yml', + '_base_/s2anet.yml', + '_base_/s2anet_reader.yml', +] + +weights: output/s2anet_1x_spine/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams + +# for 4 card +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [7, 10] + - !LinearWarmup + start_factor: 0.3333333333333333 + epochs: 5 + +S2ANetHead: + reg_loss_weight: [1.0, 1.0, 1.0, 1.0, 1.05] + cls_loss_weight: [1.05, 1.0] diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml b/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..1b3e9eb4636dc56e2cb97142e2a9b3f4c16bb84d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../../datasets/dota.yml', + '../../runtime.yml', + '_base_/s2anet_optimizer_2x.yml', + '_base_/s2anet.yml', + '_base_/s2anet_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams + +weights: output/s2anet_alignconv_2x_dota/model_final diff --git a/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_conv_2x_dota.yml b/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_conv_2x_dota.yml new file mode 100644 index 0000000000000000000000000000000000000000..34d136d865b5c4692f69356a6a22835248efe970 --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/rotate/s2anet/s2anet_conv_2x_dota.yml @@ -0,0 +1,19 @@ +_BASE_: [ + '../../datasets/dota.yml', + '../../runtime.yml', + '_base_/s2anet_optimizer_2x.yml', + '_base_/s2anet.yml', + '_base_/s2anet_reader.yml', +] +weights: output/s2anet_conv_1x_dota/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +ResNet: + depth: 50 + variant: b + norm_type: bn + return_idx: [1,2,3] + num_stages: 4 + +S2ANetHead: + align_conv_type: 'Conv' diff --git a/PaddleDetection-release-2.6/configs/rotate/tools/convert.py b/PaddleDetection-release-2.6/configs/rotate/tools/convert.py new file mode 100644 index 0000000000000000000000000000000000000000..cf5bdd01f9ed024f64df10658ff3e5b91efd82ad --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/tools/convert.py @@ -0,0 +1,163 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import json +import cv2 +from tqdm import tqdm +from multiprocessing import Pool + + +def load_dota_info(image_dir, anno_dir, file_name, ext=None): + base_name, extension = os.path.splitext(file_name) + if ext and (extension != ext and extension not in ext): + return None + info = {'image_file': os.path.join(image_dir, file_name), 'annotation': []} + anno_file = os.path.join(anno_dir, base_name + '.txt') + if not os.path.exists(anno_file): + return info + with open(anno_file, 'r') as f: + for line in f: + items = line.strip().split() + if (len(items) < 9): + continue + + anno = { + 'poly': list(map(float, items[:8])), + 'name': items[8], + 'difficult': '0' if len(items) == 9 else items[9], + } + info['annotation'].append(anno) + + return info + + +def load_dota_infos(root_dir, num_process=8, ext=None): + image_dir = os.path.join(root_dir, 'images') + anno_dir = os.path.join(root_dir, 'labelTxt') + data_infos = [] + if num_process > 1: + pool = Pool(num_process) + results = [] + for file_name in os.listdir(image_dir): + results.append( + pool.apply_async(load_dota_info, (image_dir, anno_dir, + file_name, ext))) + + pool.close() + pool.join() + + for result in results: + info = result.get() + if info: + data_infos.append(info) + + else: + for file_name in os.listdir(image_dir): + info = load_dota_info(image_dir, anno_dir, file_name, ext) + if info: + data_infos.append(info) + + return data_infos + + +def process_single_sample(info, image_id, class_names): + image_file = info['image_file'] + single_image = dict() + single_image['file_name'] = os.path.split(image_file)[-1] + single_image['id'] = image_id + image = cv2.imread(image_file) + height, width, _ = image.shape + single_image['width'] = width + single_image['height'] = height + + # process annotation field + single_objs = [] + objects = info['annotation'] + for obj in objects: + poly, 
name, difficult = obj['poly'], obj['name'], obj['difficult'] + if difficult == '2': + continue + + single_obj = dict() + single_obj['category_id'] = class_names.index(name) + 1 + single_obj['segmentation'] = [poly] + single_obj['iscrowd'] = 0 + xmin, ymin, xmax, ymax = min(poly[0::2]), min(poly[1::2]), max(poly[ + 0::2]), max(poly[1::2]) + width, height = xmax - xmin, ymax - ymin + single_obj['bbox'] = [xmin, ymin, width, height] + single_obj['area'] = height * width + single_obj['image_id'] = image_id + single_objs.append(single_obj) + + return (single_image, single_objs) + + +def data_to_coco(infos, output_path, class_names, num_process): + data_dict = dict() + data_dict['categories'] = [] + + for i, name in enumerate(class_names): + data_dict['categories'].append({ + 'id': i + 1, + 'name': name, + 'supercategory': name + }) + + pbar = tqdm(total=len(infos), desc='data to coco') + images, annotations = [], [] + if num_process > 1: + pool = Pool(num_process) + results = [] + for i, info in enumerate(infos): + image_id = i + 1 + results.append( + pool.apply_async( + process_single_sample, (info, image_id, class_names), + callback=lambda x: pbar.update())) + + pool.close() + pool.join() + + for result in results: + single_image, single_anno = result.get() + images.append(single_image) + annotations += single_anno + + else: + for i, info in enumerate(infos): + image_id = i + 1 + single_image, single_anno = process_single_sample(info, image_id, + class_names) + images.append(single_image) + annotations += single_anno + pbar.update() + + pbar.close() + + for i, anno in enumerate(annotations): + anno['id'] = i + 1 + + data_dict['images'] = images + data_dict['annotations'] = annotations + + with open(output_path, 'w') as f: + json.dump(data_dict, f) diff --git a/PaddleDetection-release-2.6/configs/rotate/tools/generate_result.py b/PaddleDetection-release-2.6/configs/rotate/tools/generate_result.py new file mode 100644 index 0000000000000000000000000000000000000000..f8343ee5b368c796ef31b92977653843515bcf2a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/tools/generate_result.py @@ -0,0 +1,266 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
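+#
+# This script merges the per-patch prediction txt files produced on sliced
+# DOTA test images back into the original image coordinate system, applies
+# rotated-box NMS per original image, and writes one txt file per category
+# in the `image_id score x1 y1 x2 y2 x3 y3 x4 y4` format required for
+# DOTA evaluation submission.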
+ +import os +import re +import glob + +import numpy as np +from multiprocessing import Pool +from functools import partial +from shapely.geometry import Polygon +import argparse + +wordname_15 = [ + 'plane', 'baseball-diamond', 'bridge', 'ground-track-field', + 'small-vehicle', 'large-vehicle', 'ship', 'tennis-court', + 'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout', + 'harbor', 'swimming-pool', 'helicopter' +] + +wordname_16 = wordname_15 + ['container-crane'] + +wordname_18 = wordname_16 + ['airport', 'helipad'] + +DATA_CLASSES = { + 'dota10': wordname_15, + 'dota15': wordname_16, + 'dota20': wordname_18 +} + + +def rbox_iou(g, p): + """ + iou of rbox + """ + g = np.array(g) + p = np.array(p) + g = Polygon(g[:8].reshape((4, 2))) + p = Polygon(p[:8].reshape((4, 2))) + g = g.buffer(0) + p = p.buffer(0) + if not g.is_valid or not p.is_valid: + return 0 + inter = Polygon(g).intersection(Polygon(p)).area + union = g.area + p.area - inter + if union == 0: + return 0 + else: + return inter / union + + +def py_cpu_nms_poly_fast(dets, thresh): + """ + Args: + dets: pred results + thresh: nms threshold + + Returns: index of keep + """ + obbs = dets[:, 0:-1] + x1 = np.min(obbs[:, 0::2], axis=1) + y1 = np.min(obbs[:, 1::2], axis=1) + x2 = np.max(obbs[:, 0::2], axis=1) + y2 = np.max(obbs[:, 1::2], axis=1) + scores = dets[:, 8] + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + + polys = [] + for i in range(len(dets)): + tm_polygon = [ + dets[i][0], dets[i][1], dets[i][2], dets[i][3], dets[i][4], + dets[i][5], dets[i][6], dets[i][7] + ] + polys.append(tm_polygon) + polys = np.array(polys) + order = scores.argsort()[::-1] + + keep = [] + while order.size > 0: + ovr = [] + i = order[0] + keep.append(i) + + xx1 = np.maximum(x1[i], x1[order[1:]]) + yy1 = np.maximum(y1[i], y1[order[1:]]) + xx2 = np.minimum(x2[i], x2[order[1:]]) + yy2 = np.minimum(y2[i], y2[order[1:]]) + w = np.maximum(0.0, xx2 - xx1) + h = np.maximum(0.0, yy2 - yy1) + hbb_inter = w * h + hbb_ovr = hbb_inter / (areas[i] + areas[order[1:]] - hbb_inter) + h_inds = np.where(hbb_ovr > 0)[0] + tmp_order = order[h_inds + 1] + for j in range(tmp_order.size): + iou = rbox_iou(polys[i], polys[tmp_order[j]]) + hbb_ovr[h_inds[j]] = iou + + try: + if math.isnan(ovr[0]): + pdb.set_trace() + except: + pass + inds = np.where(hbb_ovr <= thresh)[0] + + order = order[inds + 1] + return keep + + +def poly2origpoly(poly, x, y, rate): + origpoly = [] + for i in range(int(len(poly) / 2)): + tmp_x = float(poly[i * 2] + x) / float(rate) + tmp_y = float(poly[i * 2 + 1] + y) / float(rate) + origpoly.append(tmp_x) + origpoly.append(tmp_y) + return origpoly + + +def nmsbynamedict(nameboxdict, nms, thresh): + """ + Args: + nameboxdict: nameboxdict + nms: nms + thresh: nms threshold + + Returns: nms result as dict + """ + nameboxnmsdict = {x: [] for x in nameboxdict} + for imgname in nameboxdict: + keep = nms(np.array(nameboxdict[imgname]), thresh) + outdets = [] + for index in keep: + outdets.append(nameboxdict[imgname][index]) + nameboxnmsdict[imgname] = outdets + return nameboxnmsdict + + +def merge_single(output_dir, nms, nms_thresh, pred_class_lst): + """ + Args: + output_dir: output_dir + nms: nms + pred_class_lst: pred_class_lst + class_name: class_name + + Returns: + + """ + class_name, pred_bbox_list = pred_class_lst + nameboxdict = {} + for line in pred_bbox_list: + splitline = line.split(' ') + subname = splitline[0] + splitname = subname.split('__') + oriname = splitname[0] + pattern1 = re.compile(r'__\d+___\d+') + x_y = re.findall(pattern1, 
subname) + x_y_2 = re.findall(r'\d+', x_y[0]) + x, y = int(x_y_2[0]), int(x_y_2[1]) + + pattern2 = re.compile(r'__([\d+\.]+)__\d+___') + + rate = re.findall(pattern2, subname)[0] + + confidence = splitline[1] + poly = list(map(float, splitline[2:])) + origpoly = poly2origpoly(poly, x, y, rate) + det = origpoly + det.append(confidence) + det = list(map(float, det)) + if (oriname not in nameboxdict): + nameboxdict[oriname] = [] + nameboxdict[oriname].append(det) + nameboxnmsdict = nmsbynamedict(nameboxdict, nms, nms_thresh) + + # write result + dstname = os.path.join(output_dir, class_name + '.txt') + with open(dstname, 'w') as f_out: + for imgname in nameboxnmsdict: + for det in nameboxnmsdict[imgname]: + confidence = det[-1] + bbox = det[0:-1] + outline = imgname + ' ' + str(confidence) + ' ' + ' '.join( + map(str, bbox)) + f_out.write(outline + '\n') + + +def generate_result(pred_txt_dir, + output_dir='output', + class_names=wordname_15, + nms_thresh=0.1): + """ + pred_txt_dir: dir of pred txt + output_dir: dir of output + class_names: class names of data + """ + pred_txt_list = glob.glob("{}/*.txt".format(pred_txt_dir)) + + # step1: summary pred bbox + pred_classes = {} + for class_name in class_names: + pred_classes[class_name] = [] + + for current_txt in pred_txt_list: + img_id = os.path.split(current_txt)[1] + img_id = img_id.split('.txt')[0] + with open(current_txt) as f: + res = f.readlines() + for item in res: + item = item.split(' ') + pred_class = item[0] + item[0] = img_id + pred_bbox = ' '.join(item) + pred_classes[pred_class].append(pred_bbox) + + pred_classes_lst = [] + for class_name in pred_classes.keys(): + print('class_name: {}, count: {}'.format(class_name, + len(pred_classes[class_name]))) + pred_classes_lst.append((class_name, pred_classes[class_name])) + + # step2: merge + pool = Pool(len(class_names)) + nms = py_cpu_nms_poly_fast + mergesingle_fn = partial(merge_single, output_dir, nms, nms_thresh) + pool.map(mergesingle_fn, pred_classes_lst) + + +def parse_args(): + parser = argparse.ArgumentParser(description='generate test results') + parser.add_argument('--pred_txt_dir', type=str, help='path of pred txt dir') + parser.add_argument( + '--output_dir', type=str, default='output', help='path of output dir') + parser.add_argument( + '--data_type', type=str, default='dota10', help='data type') + parser.add_argument( + '--nms_thresh', + type=float, + default=0.1, + help='nms threshold while merging results') + + return parser.parse_args() + + +if __name__ == '__main__': + args = parse_args() + + output_dir = args.output_dir + if not os.path.exists(output_dir): + os.makedirs(output_dir) + + class_names = DATA_CLASSES[args.data_type] + + generate_result(args.pred_txt_dir, output_dir, class_names) + print('done!') diff --git a/PaddleDetection-release-2.6/configs/rotate/tools/inference_benchmark.py b/PaddleDetection-release-2.6/configs/rotate/tools/inference_benchmark.py new file mode 100644 index 0000000000000000000000000000000000000000..7421e7810c0b93ed4d31f1f22bf175be91a7819b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/tools/inference_benchmark.py @@ -0,0 +1,378 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +import six +import glob +import time +import yaml +import argparse +import cv2 +import numpy as np + +import paddle +import paddle.version as paddle_version +from paddle.inference import Config, create_predictor, PrecisionType, get_trt_runtime_version + +TUNED_TRT_DYNAMIC_MODELS = {'DETR'} + + +def check_version(version='2.2'): + err = "PaddlePaddle version {} or higher is required, " \ + "or a suitable develop version is satisfied as well. \n" \ + "Please make sure the version is good with your code.".format(version) + + version_installed = [ + paddle_version.major, paddle_version.minor, paddle_version.patch, + paddle_version.rc + ] + + if version_installed == ['0', '0', '0', '0']: + return + + if version == 'develop': + raise Exception("PaddlePaddle develop version is required!") + + version_split = version.split('.') + + length = min(len(version_installed), len(version_split)) + for i in six.moves.range(length): + if version_installed[i] > version_split[i]: + return + if version_installed[i] < version_split[i]: + raise Exception(err) + + +def check_trt_version(version='8.2'): + err = "TensorRT version {} or higher is required," \ + "Please make sure the version is good with your code.".format(version) + version_split = list(map(int, version.split('.'))) + version_installed = get_trt_runtime_version() + length = min(len(version_installed), len(version_split)) + for i in six.moves.range(length): + if version_installed[i] > version_split[i]: + return + if version_installed[i] < version_split[i]: + raise Exception(err) + + +# preprocess ops +def decode_image(im_file, im_info): + if isinstance(im_file, str): + with open(im_file, 'rb') as f: + im_read = f.read() + data = np.frombuffer(im_read, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + else: + im = im_file + im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32) + im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32) + return im, im_info + + +class Resize(object): + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + 
target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + return im_scale_y, im_scale_x + + +class Permute(object): + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + im = im.transpose((2, 0, 1)) + return im, im_info + + +class NormalizeImage(object): + def __init__(self, mean, std, is_scale=True, norm_type='mean_std'): + self.mean = mean + self.std = std + self.is_scale = is_scale + self.norm_type = norm_type + + def __call__(self, im, im_info): + im = im.astype(np.float32, copy=False) + if self.is_scale: + scale = 1.0 / 255.0 + im *= scale + + if self.norm_type == 'mean_std': + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im -= mean + im /= std + return im, im_info + + +class PadStride(object): + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +def preprocess(im, preprocess_ops): + # process image by preprocess_ops + im_info = { + 'scale_factor': np.array( + [1., 1.], dtype=np.float32), + 'im_shape': None, + } + im, im_info = decode_image(im, im_info) + for operator in preprocess_ops: + im, im_info = operator(im, im_info) + return im, im_info + + +def parse_args(): + parser = argparse.ArgumentParser() + parser.add_argument( + '--model_dir', type=str, help='directory of inference model') + parser.add_argument( + '--run_mode', type=str, default='paddle', help='running mode') + parser.add_argument('--batch_size', type=int, default=1, help='batch size') + parser.add_argument( + '--image_dir', + type=str, + default='/paddle/data/DOTA_1024_ss/test1024/images', + help='directory of test images') + parser.add_argument( + '--warmup_iter', type=int, default=5, help='num of warmup iters') + parser.add_argument( + '--total_iter', type=int, default=2000, help='num of total iters') + parser.add_argument( + '--log_iter', type=int, default=50, help='num of log interval') + parser.add_argument( + '--tuned_trt_shape_file', + type=str, + default='shape_range_info.pbtxt', + help='dynamic shape range info') + args = parser.parse_args() + return args + + +def init_predictor(FLAGS): + model_dir, run_mode, batch_size = FLAGS.model_dir, FLAGS.run_mode, FLAGS.batch_size + yaml_file = os.path.join(model_dir, 'infer_cfg.yml') + with open(yaml_file) as f: + yml_conf = yaml.safe_load(f) + + config = Config( + os.path.join(model_dir, 'model.pdmodel'), + os.path.join(model_dir, 'model.pdiparams')) + + # initial GPU memory(M), device ID + config.enable_use_gpu(200, 0) + # optimize graph and fuse op + config.switch_ir_optim(True) + + precision_map = { + 'trt_int8': Config.Precision.Int8, + 'trt_fp32': Config.Precision.Float32, + 'trt_fp16': Config.Precision.Half + } + + arch = yml_conf['arch'] + tuned_trt_shape_file = 
os.path.join(model_dir, FLAGS.tuned_trt_shape_file) + + if run_mode in precision_map.keys(): + if arch in TUNED_TRT_DYNAMIC_MODELS and not os.path.exists( + tuned_trt_shape_file): + print( + 'dynamic shape range info is saved in {}. After that, rerun the code'. + format(tuned_trt_shape_file)) + config.collect_shape_range_info(tuned_trt_shape_file) + config.enable_tensorrt_engine( + workspace_size=(1 << 25) * batch_size, + max_batch_size=batch_size, + min_subgraph_size=yml_conf['min_subgraph_size'], + precision_mode=precision_map[run_mode], + use_static=True, + use_calib_mode=False) + + if yml_conf['use_dynamic_shape']: + if arch in TUNED_TRT_DYNAMIC_MODELS and os.path.exists( + tuned_trt_shape_file): + config.enable_tuned_tensorrt_dynamic_shape(tuned_trt_shape_file, + True) + else: + min_input_shape = { + 'image': [batch_size, 3, 640, 640], + 'scale_factor': [batch_size, 2] + } + max_input_shape = { + 'image': [batch_size, 3, 1280, 1280], + 'scale_factor': [batch_size, 2] + } + opt_input_shape = { + 'image': [batch_size, 3, 1024, 1024], + 'scale_factor': [batch_size, 2] + } + config.set_trt_dynamic_shape_info( + min_input_shape, max_input_shape, opt_input_shape) + + # disable print log when predict + config.disable_glog_info() + # enable shared memory + config.enable_memory_optim() + # disable feed, fetch OP, needed by zero_copy_run + config.switch_use_feed_fetch_ops(False) + predictor = create_predictor(config) + return predictor, yml_conf + + +def create_preprocess_ops(yml_conf): + preprocess_ops = [] + for op_info in yml_conf['Preprocess']: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + preprocess_ops.append(eval(op_type)(**new_op_info)) + return preprocess_ops + + +def get_test_images(image_dir): + images = set() + infer_dir = os.path.abspath(image_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + return images + + +def create_inputs(image_files, preprocess_ops): + inputs = dict() + im_list, im_info_list = [], [] + for im_path in image_files: + im, im_info = preprocess(im_path, preprocess_ops) + im_list.append(im) + im_info_list.append(im_info) + + inputs['im_shape'] = np.stack( + [e['im_shape'] for e in im_info_list], axis=0).astype('float32') + inputs['scale_factor'] = np.stack( + [e['scale_factor'] for e in im_info_list], axis=0).astype('float32') + inputs['image'] = np.stack(im_list, axis=0).astype('float32') + return inputs + + +def measure_speed(FLAGS): + predictor, yml_conf = init_predictor(FLAGS) + input_names = predictor.get_input_names() + preprocess_ops = create_preprocess_ops(yml_conf) + + image_files = get_test_images(FLAGS.image_dir) + + batch_size = FLAGS.batch_size + warmup_iter, log_iter, total_iter = FLAGS.warmup_iter, FLAGS.log_iter, FLAGS.total_iter + + total_time = 0 + fps = 0 + for i in range(0, total_iter, batch_size): + # make data ready + inputs = create_inputs(image_files[i:i + batch_size], preprocess_ops) + for name in input_names: + input_tensor = predictor.get_input_handle(name) + input_tensor.copy_from_cpu(inputs[name]) + + paddle.device.cuda.synchronize() + # start running + start_time = time.perf_counter() + predictor.run() + paddle.device.cuda.synchronize() + + if i >= warmup_iter: + total_time += time.perf_counter() - start_time + if (i + 1) % log_iter == 0: + fps = (i + 1 - warmup_iter) / total_time + print( + f'Done image [{i + 1:<3}/ {total_iter}], ' + f'fps: {fps:.1f} img / s, ' + f'times 
per image: {1000 / fps:.1f} ms / img', + flush=True) + + if (i + 1) == total_iter: + fps = (i + 1 - warmup_iter) / total_time + print( + f'Overall fps: {fps:.1f} img / s, ' + f'times per image: {1000 / fps:.1f} ms / img', + flush=True) + break + + +if __name__ == '__main__': + FLAGS = parse_args() + if 'trt' in FLAGS.run_mode: + check_version('develop') + check_trt_version('8.2') + else: + check_version('2.4') + measure_speed(FLAGS) diff --git a/PaddleDetection-release-2.6/configs/rotate/tools/onnx_infer.py b/PaddleDetection-release-2.6/configs/rotate/tools/onnx_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..fa9b06d2ddd525ed32f99891bdd67f3b6650f0be --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/tools/onnx_infer.py @@ -0,0 +1,302 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +import six +import glob +import copy +import yaml +import argparse +import cv2 +import numpy as np +from shapely.geometry import Polygon +from onnxruntime import InferenceSession + + +# preprocess ops +def decode_image(img_path): + with open(img_path, 'rb') as f: + im_read = f.read() + data = np.frombuffer(im_read, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + img_info = { + "im_shape": np.array( + im.shape[:2], dtype=np.float32), + "scale_factor": np.array( + [1., 1.], dtype=np.float32) + } + return im, img_info + + +class Resize(object): + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + return im_scale_y, 
im_scale_x + + +class Permute(object): + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + im = im.transpose((2, 0, 1)) + return im, im_info + + +class NormalizeImage(object): + def __init__(self, mean, std, is_scale=True, norm_type='mean_std'): + self.mean = mean + self.std = std + self.is_scale = is_scale + self.norm_type = norm_type + + def __call__(self, im, im_info): + im = im.astype(np.float32, copy=False) + if self.is_scale: + scale = 1.0 / 255.0 + im *= scale + + if self.norm_type == 'mean_std': + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im -= mean + im /= std + return im, im_info + + +class PadStride(object): + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class Compose: + def __init__(self, transforms): + self.transforms = [] + for op_info in transforms: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + self.transforms.append(eval(op_type)(**new_op_info)) + + def __call__(self, img_path): + img, im_info = decode_image(img_path) + for t in self.transforms: + img, im_info = t(img, im_info) + inputs = copy.deepcopy(im_info) + inputs['image'] = img + return inputs + + +# postprocess +def rbox_iou(g, p): + g = np.array(g) + p = np.array(p) + g = Polygon(g[:8].reshape((4, 2))) + p = Polygon(p[:8].reshape((4, 2))) + g = g.buffer(0) + p = p.buffer(0) + if not g.is_valid or not p.is_valid: + return 0 + inter = Polygon(g).intersection(Polygon(p)).area + union = g.area + p.area - inter + if union == 0: + return 0 + else: + return inter / union + + +def multiclass_nms_rotated(pred_bboxes, + pred_scores, + iou_threshlod=0.1, + score_threshold=0.1): + """ + Args: + pred_bboxes (numpy.ndarray): [B, N, 8] + pred_scores (numpy.ndarray): [B, C, N] + + Return: + bboxes (numpy.ndarray): [N, 10] + bbox_num (numpy.ndarray): [B] + """ + bbox_num = [] + bboxes = [] + for bbox_per_img, score_per_img in zip(pred_bboxes, pred_scores): + num_per_img = 0 + for cls_id, score_per_cls in enumerate(score_per_img): + keep_mask = score_per_cls > score_threshold + bbox = bbox_per_img[keep_mask] + score = score_per_cls[keep_mask] + + idx = score.argsort()[::-1] + bbox = bbox[idx] + score = score[idx] + keep_idx = [] + for i, b in enumerate(bbox): + supressed = False + for gi in keep_idx: + g = bbox[gi] + if rbox_iou(b, g) > iou_threshlod: + supressed = True + break + + if supressed: + continue + + keep_idx.append(i) + + keep_box = bbox[keep_idx] + keep_score = score[keep_idx] + keep_cls_ids = np.ones(len(keep_idx)) * cls_id + bboxes.append( + np.concatenate( + [keep_cls_ids[:, None], keep_score[:, None], keep_box], + axis=-1)) + num_per_img += len(keep_idx) + + bbox_num.append(num_per_img) + + return np.concatenate(bboxes, axis=0), np.array(bbox_num) + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + 
assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +def predict_image(infer_config, predictor, img_list): + # load preprocess transforms + transforms = Compose(infer_config['Preprocess']) + # predict image + for img_path in img_list: + inputs = transforms(img_path) + inputs_name = [var.name for var in predictor.get_inputs()] + inputs = {k: inputs[k][None, ] for k in inputs_name} + + outputs = predictor.run(output_names=None, input_feed=inputs) + + bboxes, bbox_num = multiclass_nms_rotated( + np.array(outputs[0]), np.array(outputs[1])) + print("ONNXRuntime predict: ") + for bbox in bboxes: + if bbox[0] > -1 and bbox[1] > infer_config['draw_threshold']: + print(f"{int(bbox[0])} {bbox[1]} " + f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}" + f"{bbox[6]} {bbox[7]} {bbox[8]} {bbox[9]}") + + +def parse_args(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument("--infer_cfg", type=str, help="infer_cfg.yml") + parser.add_argument( + '--onnx_file', + type=str, + default="model.onnx", + help="onnx model file path") + parser.add_argument("--image_dir", type=str) + parser.add_argument("--image_file", type=str) + return parser.parse_args() + + +if __name__ == '__main__': + FLAGS = parse_args() + # load image list + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + # load predictor + predictor = InferenceSession(FLAGS.onnx_file) + # load infer config + with open(FLAGS.infer_cfg) as f: + infer_config = yaml.safe_load(f) + + predict_image(infer_config, predictor, img_list) diff --git a/PaddleDetection-release-2.6/configs/rotate/tools/prepare_data.py b/PaddleDetection-release-2.6/configs/rotate/tools/prepare_data.py new file mode 100644 index 0000000000000000000000000000000000000000..21488e2c7a5a604dad4a508f2c67ec6bf8cea37a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/tools/prepare_data.py @@ -0,0 +1,128 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
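+#
+# This script slices large images (and, unless --image_only is set, their
+# polygon annotations) into overlapping patches of size --subsize with a
+# sliding step of (subsize - gap), optionally at multiple --rates scales,
+# and can additionally convert the sliced labels into a COCO-style json
+# file via --coco_json_file.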
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import argparse +from convert import load_dota_infos, data_to_coco +from slicebase import SliceBase + +wordname_15 = [ + 'plane', 'baseball-diamond', 'bridge', 'ground-track-field', + 'small-vehicle', 'large-vehicle', 'ship', 'tennis-court', + 'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout', + 'harbor', 'swimming-pool', 'helicopter' +] + +wordname_16 = wordname_15 + ['container-crane'] + +wordname_18 = wordname_16 + ['airport', 'helipad'] + +DATA_CLASSES = { + 'dota10': wordname_15, + 'dota15': wordname_16, + 'dota20': wordname_18 +} + + +def parse_args(): + parser = argparse.ArgumentParser('prepare data for training') + + parser.add_argument( + '--input_dirs', + nargs='+', + type=str, + default=None, + help='input dirs which contain image and labelTxt dir') + + parser.add_argument( + '--output_dir', + type=str, + default=None, + help='output dirs which contain image and labelTxt dir and coco style json file' + ) + + parser.add_argument( + '--coco_json_file', + type=str, + default='', + help='coco json annotation files') + + parser.add_argument('--subsize', type=int, default=1024, help='patch size') + + parser.add_argument('--gap', type=int, default=200, help='step size') + + parser.add_argument( + '--data_type', type=str, default='dota10', help='data type') + + parser.add_argument( + '--rates', + nargs='+', + type=float, + default=[1.], + help='scales for multi-slice training') + + parser.add_argument( + '--nproc', type=int, default=8, help='the processor number') + + parser.add_argument( + '--iof_thr', + type=float, + default=0.5, + help='the minimal iof between a object and a window') + + parser.add_argument( + '--image_only', + action='store_true', + default=False, + help='only processing image') + + args = parser.parse_args() + return args + + +def load_dataset(input_dir, nproc, data_type): + if 'dota' in data_type.lower(): + infos = load_dota_infos(input_dir, nproc) + else: + raise ValueError('only dota dataset is supported now') + + return infos + + +def main(): + args = parse_args() + infos = [] + for input_dir in args.input_dirs: + infos += load_dataset(input_dir, args.nproc, args.data_type) + + slicer = SliceBase( + args.gap, + args.subsize, + args.iof_thr, + num_process=args.nproc, + image_only=args.image_only) + slicer.slice_data(infos, args.rates, args.output_dir) + if args.coco_json_file: + infos = load_dota_infos(args.output_dir, args.nproc) + coco_json_file = os.path.join(args.output_dir, args.coco_json_file) + class_names = DATA_CLASSES[args.data_type] + data_to_coco(infos, coco_json_file, class_names, args.nproc) + + +if __name__ == '__main__': + main() diff --git a/PaddleDetection-release-2.6/configs/rotate/tools/slicebase.py b/PaddleDetection-release-2.6/configs/rotate/tools/slicebase.py new file mode 100644 index 0000000000000000000000000000000000000000..5514b7e27c7de4047eab750fd6e1e811728a5139 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/rotate/tools/slicebase.py @@ -0,0 +1,267 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import math +import copy +from numbers import Number +from multiprocessing import Pool + +import cv2 +import numpy as np +from tqdm import tqdm +import shapely.geometry as shgeo + + +def choose_best_pointorder_fit_another(poly1, poly2): + """ + To make the two polygons best fit with each point + """ + x1, y1, x2, y2, x3, y3, x4, y4 = poly1 + combinate = [ + np.array([x1, y1, x2, y2, x3, y3, x4, y4]), + np.array([x2, y2, x3, y3, x4, y4, x1, y1]), + np.array([x3, y3, x4, y4, x1, y1, x2, y2]), + np.array([x4, y4, x1, y1, x2, y2, x3, y3]) + ] + dst_coordinate = np.array(poly2) + distances = np.array( + [np.sum((coord - dst_coordinate)**2) for coord in combinate]) + sorted = distances.argsort() + return combinate[sorted[0]] + + +def cal_line_length(point1, point2): + return math.sqrt( + math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2)) + + +class SliceBase(object): + def __init__(self, + gap=512, + subsize=1024, + thresh=0.7, + choosebestpoint=True, + ext='.png', + padding=True, + num_process=8, + image_only=False): + self.gap = gap + self.subsize = subsize + self.slide = subsize - gap + self.thresh = thresh + self.choosebestpoint = choosebestpoint + self.ext = ext + self.padding = padding + self.num_process = num_process + self.image_only = image_only + + def get_windows(self, height, width): + windows = [] + left, up = 0, 0 + while (left < width): + if (left + self.subsize >= width): + left = max(width - self.subsize, 0) + up = 0 + while (up < height): + if (up + self.subsize >= height): + up = max(height - self.subsize, 0) + right = min(left + self.subsize, width - 1) + down = min(up + self.subsize, height - 1) + windows.append((left, up, right, down)) + if (up + self.subsize >= height): + break + else: + up = up + self.slide + if (left + self.subsize >= width): + break + else: + left = left + self.slide + + return windows + + def slice_image_single(self, image, windows, output_dir, output_name): + image_dir = os.path.join(output_dir, 'images') + for (left, up, right, down) in windows: + image_name = output_name + str(left) + '___' + str(up) + self.ext + subimg = copy.deepcopy(image[up:up + self.subsize, left:left + + self.subsize]) + h, w, c = subimg.shape + if (self.padding): + outimg = np.zeros((self.subsize, self.subsize, 3)) + outimg[0:h, 0:w, :] = subimg + cv2.imwrite(os.path.join(image_dir, image_name), outimg) + else: + cv2.imwrite(os.path.join(image_dir, image_name), subimg) + + def iof(self, poly1, poly2): + inter_poly = poly1.intersection(poly2) + inter_area = inter_poly.area + poly1_area = poly1.area + half_iou = inter_area / poly1_area + return inter_poly, half_iou + + def translate(self, poly, left, up): + n = len(poly) + out_poly = np.zeros(n) + for i in range(n // 2): + out_poly[i * 2] = int(poly[i * 2] - left) + out_poly[i * 2 + 1] = int(poly[i * 2 + 1] - up) + return out_poly + + def get_poly4_from_poly5(self, poly): + distances = [ + cal_line_length((poly[i 
* 2], poly[i * 2 + 1]), + (poly[(i + 1) * 2], poly[(i + 1) * 2 + 1])) + for i in range(int(len(poly) / 2 - 1)) + ] + distances.append( + cal_line_length((poly[0], poly[1]), (poly[8], poly[9]))) + pos = np.array(distances).argsort()[0] + count = 0 + out_poly = [] + while count < 5: + if (count == pos): + out_poly.append( + (poly[count * 2] + poly[(count * 2 + 2) % 10]) / 2) + out_poly.append( + (poly[(count * 2 + 1) % 10] + poly[(count * 2 + 3) % 10]) / + 2) + count = count + 1 + elif (count == (pos + 1) % 5): + count = count + 1 + continue + + else: + out_poly.append(poly[count * 2]) + out_poly.append(poly[count * 2 + 1]) + count = count + 1 + return out_poly + + def slice_anno_single(self, annos, windows, output_dir, output_name): + anno_dir = os.path.join(output_dir, 'labelTxt') + for (left, up, right, down) in windows: + image_poly = shgeo.Polygon( + [(left, up), (right, up), (right, down), (left, down)]) + anno_file = output_name + str(left) + '___' + str(up) + '.txt' + with open(os.path.join(anno_dir, anno_file), 'w') as f: + for anno in annos: + gt_poly = shgeo.Polygon( + [(anno['poly'][0], anno['poly'][1]), + (anno['poly'][2], anno['poly'][3]), + (anno['poly'][4], anno['poly'][5]), + (anno['poly'][6], anno['poly'][7])]) + if gt_poly.area <= 0: + continue + inter_poly, iof = self.iof(gt_poly, image_poly) + if iof == 1: + final_poly = self.translate(anno['poly'], left, up) + elif iof > 0: + inter_poly = shgeo.polygon.orient(inter_poly, sign=1) + out_poly = list(inter_poly.exterior.coords)[0:-1] + if len(out_poly) < 4 or len(out_poly) > 5: + continue + + final_poly = [] + for p in out_poly: + final_poly.append(p[0]) + final_poly.append(p[1]) + + if len(out_poly) == 5: + final_poly = self.get_poly4_from_poly5(final_poly) + + if self.choosebestpoint: + final_poly = choose_best_pointorder_fit_another( + final_poly, anno['poly']) + + final_poly = self.translate(final_poly, left, up) + final_poly = np.clip(final_poly, 1, self.subsize) + else: + continue + outline = ' '.join(list(map(str, final_poly))) + if iof >= self.thresh: + outline = outline + ' ' + anno['name'] + ' ' + str(anno[ + 'difficult']) + else: + outline = outline + ' ' + anno['name'] + ' ' + '2' + + f.write(outline + '\n') + + def slice_data_single(self, info, rate, output_dir): + file_name = info['image_file'] + base_name = os.path.splitext(os.path.split(file_name)[-1])[0] + base_name = base_name + '__' + str(rate) + '__' + img = cv2.imread(file_name) + if img.shape == (): + return + + if (rate != 1): + resize_img = cv2.resize( + img, None, fx=rate, fy=rate, interpolation=cv2.INTER_CUBIC) + else: + resize_img = img + + height, width, _ = resize_img.shape + windows = self.get_windows(height, width) + self.slice_image_single(resize_img, windows, output_dir, base_name) + if not self.image_only: + annos = info['annotation'] + for anno in annos: + anno['poly'] = list(map(lambda x: rate * x, anno['poly'])) + self.slice_anno_single(annos, windows, output_dir, base_name) + + def check_or_mkdirs(self, path): + if not os.path.exists(path): + os.makedirs(path, exist_ok=True) + + def slice_data(self, infos, rates, output_dir): + """ + Args: + infos (list[dict]): data_infos + rates (float, list): scale rates + output_dir (str): output directory + """ + if isinstance(rates, Number): + rates = [rates, ] + + self.check_or_mkdirs(output_dir) + self.check_or_mkdirs(os.path.join(output_dir, 'images')) + if not self.image_only: + self.check_or_mkdirs(os.path.join(output_dir, 'labelTxt')) + + pbar = tqdm(total=len(rates) * len(infos), 
desc='slicing data') + + if self.num_process <= 1: + for rate in rates: + for info in infos: + self.slice_data_single(info, rate, output_dir) + pbar.update() + else: + pool = Pool(self.num_process) + for rate in rates: + for info in infos: + pool.apply_async( + self.slice_data_single, (info, rate, output_dir), + callback=lambda x: pbar.update()) + + pool.close() + pool.join() + + pbar.close() diff --git a/PaddleDetection-release-2.6/configs/runtime.yml b/PaddleDetection-release-2.6/configs/runtime.yml new file mode 100644 index 0000000000000000000000000000000000000000..a58b171ce774e045f4db2e0894a6781a25e0ec03 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/runtime.yml @@ -0,0 +1,16 @@ +use_gpu: true +use_xpu: false +use_mlu: false +use_npu: false +log_iter: 20 +save_dir: output +snapshot_epoch: 1 +print_flops: false +print_params: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + fuse_conv_bn: False diff --git a/PaddleDetection-release-2.6/configs/semi_det/README.md b/PaddleDetection-release-2.6/configs/semi_det/README.md new file mode 100644 index 0000000000000000000000000000000000000000..996a1decfec0328420654d2d39d930ea2c7fdc0f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/README.md @@ -0,0 +1,417 @@ +简体中文 | [English](README_en.md) + +# Semi-Supervised Detection (Semi DET) 半监督检测 + +## 内容 +- [简介](#简介) +- [模型库](#模型库) + - [Baseline](#Baseline) + - [DenseTeacher](#DenseTeacher) +- [半监督数据集准备](#半监督数据集准备) +- [半监督检测配置](#半监督检测配置) + - [训练集配置](#训练集配置) + - [预训练配置](#预训练配置) + - [全局配置](#全局配置) + - [模型配置](#模型配置) + - [数据增强配置](#数据增强配置) + - [其他配置](#其他配置) +- [使用说明](#使用说明) + - [训练](#训练) + - [评估](#评估) + - [预测](#预测) + - [部署](#部署) +- [引用](#引用) + +## 简介 +半监督目标检测(Semi DET)是**同时使用有标注数据和无标注数据**进行训练的目标检测,既可以极大地节省标注成本,也可以充分利用无标注数据进一步提高检测精度。PaddleDetection团队复现了[DenseTeacher](denseteacher)半监督检测算法,用户可以下载使用。 + +## 模型库 + +### [Baseline](baseline) + +**纯监督数据**模型的训练和模型库,请参照[Baseline](baseline); + + +### [DenseTeacher](denseteacher) + +| 模型 | 监督数据比例 | Sup Baseline | Sup Epochs (Iters) | Sup mAPval
<br>0.5:0.95</sup> | Semi mAP<sup>val<br>
    0.5:0.95 | Semi Epochs (Iters) | 模型下载 | 配置文件 | +| :------------: | :---------: | :---------------------: | :---------------------: |:---------------------------: |:----------------------------: | :------------------: |:--------: |:----------: | +| DenseTeacher-FCOS | 5% | [sup_config](./baseline/fcos_r50_fpn_2x_coco_sup005.yml) | 24 (8712) | 21.3 | **30.6** | 240 (87120) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi005.pdparams) | [config](denseteacher/denseteacher_fcos_r50_fpn_coco_semi005.yml) | +| DenseTeacher-FCOS | 10% | [sup_config](./baseline/fcos_r50_fpn_2x_coco_sup010.yml) | 24 (17424) | 26.3 | **35.1** | 240 (174240) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi010.pdparams) | [config](denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml) | +| DenseTeacher-FCOS(LSJ)| 10% | [sup_config](./baseline/fcos_r50_fpn_2x_coco_sup010.yml) | 24 (17424) | 26.3 | **37.1(LSJ)** | 240 (174240) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi010_lsj.pdparams) | [config](denseteacher/denseteacher_fcos_r50_fpn_coco_semi010_lsj.yml) | +| DenseTeacher-FCOS |100%(full)| [sup_config](./../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.ymll) | 24 (175896) | 42.6 | **44.2** | 24 (175896)| [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_full.pdparams) | [config](denseteacher/denseteacher_fcos_r50_fpn_coco_full.yml) | + + +## 半监督数据集准备 + +半监督目标检测**同时需要有标注数据和无标注数据**,且无标注数据量一般**远多于有标注数据量**。 +对于COCO数据集一般有两种常规设置: + +(1)抽取部分比例的原始训练集`train2017`作为标注数据和无标注数据; + +从`train2017`中按固定百分比(1%、2%、5%、10%等)抽取,由于抽取方法会对半监督训练的结果影响较大,所以采用五折交叉验证来评估。运行数据集划分制作的脚本如下: +```bash +python tools/gen_semi_coco.py +``` +会按照 1%、2%、5%、10% 的监督数据比例来划分`train2017`全集,为了交叉验证每一种划分会随机重复5次,生成的半监督标注文件如下: +- 标注数据集标注:`instances_train2017.{fold}@{percent}.json` +- 无标注数据集标注:`instances_train2017.{fold}@{percent}-unlabeled.json` +其中,`fold` 表示交叉验证,`percent` 表示有标注数据的百分比。 + +注意如果根据`txt_file`生成,需要下载`COCO_supervision.txt`: +```shell +wget https://bj.bcebos.com/v1/paddledet/data/coco/COCO_supervision.txt +``` + +(2)使用全量原始训练集`train2017`作为有标注数据 和 全量原始无标签图片集`unlabeled2017`作为无标注数据; + + +### 下载链接 + +PaddleDetection团队提供了COCO数据集全部的标注文件,请下载并解压存放至对应目录: + +```shell +# 下载COCO全量数据集图片和标注 +# 包括 train2017, val2017, annotations +wget https://bj.bcebos.com/v1/paddledet/data/coco.tar + +# 下载PaddleDetection团队整理的COCO部分比例数据的标注文件 +wget https://bj.bcebos.com/v1/paddledet/data/coco/semi_annotations.zip + +# unlabeled2017是可选,如果不需要训‘full’则无需下载 +# 下载COCO全量 unlabeled 无标注数据集 +wget https://bj.bcebos.com/v1/paddledet/data/coco/unlabeled2017.zip +wget https://bj.bcebos.com/v1/paddledet/data/coco/image_info_unlabeled2017.zip +# 下载转换完的 unlabeled2017 无标注json文件 +wget https://bj.bcebos.com/v1/paddledet/data/coco/instances_unlabeled2017.zip +``` + +如果需要用到COCO全量unlabeled无标注数据集,需要将原版的`image_info_unlabeled2017.json`进行格式转换,运行以下代码: + +
+<details>
+<summary> COCO unlabeled 标注转换代码:</summary>
+
+```python
+import json
+anns_train = json.load(open('annotations/instances_train2017.json', 'r'))
+anns_unlabeled = json.load(open('annotations/image_info_unlabeled2017.json', 'r'))
+unlabeled_json = {
+    'images': anns_unlabeled['images'],
+    'annotations': [],
+    'categories': anns_train['categories'],
+}
+path = 'annotations/instances_unlabeled2017.json'
+with open(path, 'w') as f:
+    json.dump(unlabeled_json, f)
+```
+
+</details>
+
+<details>
+<summary> 解压后的数据集目录如下:</summary>
+
+```
+PaddleDetection
+├── dataset
+│   ├── coco
+│   │   ├── annotations
+│   │   │   ├── instances_train2017.json
+│   │   │   ├── instances_unlabeled2017.json
+│   │   │   ├── instances_val2017.json
+│   │   ├── semi_annotations
+│   │   │   ├── instances_train2017.1@1.json
+│   │   │   ├── instances_train2017.1@1-unlabeled.json
+│   │   │   ├── instances_train2017.1@2.json
+│   │   │   ├── instances_train2017.1@2-unlabeled.json
+│   │   │   ├── instances_train2017.1@5.json
+│   │   │   ├── instances_train2017.1@5-unlabeled.json
+│   │   │   ├── instances_train2017.1@10.json
+│   │   │   ├── instances_train2017.1@10-unlabeled.json
+│   │   ├── train2017
+│   │   ├── unlabeled2017
+│   │   ├── val2017
+```
+
+</details>
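+
+划分标注文件准备好后，可以用下面的脚本做一个简单校验（示意代码，非官方工具，路径请按实际解压目录调整）：同一折划分下的有标注json与无标注json应当互不重叠，且二者合并后恰好覆盖`train2017`全集。
+
+```python
+import json
+
+def check_semi_split(labeled_path, unlabeled_path, full_path):
+    # 分别读取有标注划分、无标注划分以及train2017全量标注
+    labeled_ids = {im['id'] for im in json.load(open(labeled_path))['images']}
+    unlabeled_ids = {im['id'] for im in json.load(open(unlabeled_path))['images']}
+    full_ids = {im['id'] for im in json.load(open(full_path))['images']}
+    # 有标注与无标注图片应互不重叠，且合并后恰好为train2017全集
+    assert not (labeled_ids & unlabeled_ids)
+    assert (labeled_ids | unlabeled_ids) == full_ids
+    print('labeled: {}, unlabeled: {}'.format(len(labeled_ids), len(unlabeled_ids)))
+
+# 以 10% 划分的第1折为例
+check_semi_split(
+    'dataset/coco/semi_annotations/instances_train2017.1@10.json',
+    'dataset/coco/semi_annotations/instances_train2017.1@10-unlabeled.json',
+    'dataset/coco/annotations/instances_train2017.json')
+```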
    + +## 半监督检测配置 + +配置半监督检测,需要基于选用的**基础检测器**的配置文件,如: + +```python +_BASE_: [ + '../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml', + '../_base_/coco_detection_percent_10.yml', +] +log_iter: 50 +snapshot_epoch: 5 +epochs: &epochs 240 +weights: output/denseteacher_fcos_r50_fpn_coco_semi010/model_final +``` +并依次做出如下几点改动: + +### 训练集配置 + +首先可以直接引用已经配置好的半监督训练集,如: + +```python +_BASE_: [ + '../_base_/coco_detection_percent_10.yml', +] +``` + +具体来看,构建半监督数据集,需要同时配置监督数据集`TrainDataset`和无监督数据集`UnsupTrainDataset`的路径,**注意必须选用`SemiCOCODataSet`类而不是`COCODataSet`类**,如以下所示: + +**COCO-train2017部分比例数据集**: + +```python +# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +TrainDataset: + !SemiCOCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +UnsupTrainDataset: + !SemiCOCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10-unlabeled.json + dataset_dir: dataset/coco + data_fields: ['image'] + supervised: False +``` + +或者 **COCO-train2017 full全量数据集**: + +```python +# full labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +TrainDataset: + !SemiCOCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +# full unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +UnsupTrainDataset: + !SemiCOCODataSet + image_dir: unlabeled2017 + anno_path: annotations/instances_unlabeled2017.json + dataset_dir: dataset/coco + data_fields: ['image'] + supervised: False +``` + +验证集`EvalDataset`和测试集`TestDataset`的配置**不需要更改**,且还是采用`COCODataSet`类。 + + +### 预训练配置 + +```python +### pretrain and warmup config, choose one and comment another +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams +semi_start_iters: 5000 +ema_start_iters: 3000 +use_warmup: &use_warmup True +``` + +**注意:** + - `Dense Teacher`原文使用`R50-va-caffe`预训练,PaddleDetection中默认使用`R50-vb`预训练,如果使用`R50-vd`结合[SSLD](../../../docs/feature_models/SSLD_PRETRAINED_MODEL.md)的预训练模型,可进一步显著提升检测精度,同时backbone部分配置也需要做出相应更改,如: + ```python + pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams + ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [1, 2, 3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] +``` + +### 全局配置 + +需要在配置文件中添加如下全局配置,并且注意 DenseTeacher 模型需要使用`use_simple_ema: True`而不是`use_ema: True`: + +```python +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 4.0, distill_loss_box: 1.0, distill_loss_quality: 1.0} + concat_sup_data: True + suppress: linear + ratio: 0.01 + gamma: 2.0 + test_cfg: + inference_on: teacher +``` + +### 模型配置 + +如果没有特殊改动,则直接继承自基础检测器里的模型配置。 +以 `DenseTeacher` 为例,选择 `fcos_r50_fpn_iou_multiscale_2x_coco.yml` 作为**基础检测器**进行半监督训练,**teacher网络的结构和student网络的结构均为基础检测器的结构,且结构相同**。 + +```python +_BASE_: [ + '../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml', +] +``` + +### 数据增强配置 + +构建半监督训练集的Reader,需要在原先`TrainReader`的基础上,新增加`weak_aug`,`strong_aug`,`sup_batch_transforms`和`unsup_batch_transforms`,并且需要注意: +- 如果有`NormalizeImage`,需要单独从`sample_transforms`中抽出来放在`weak_aug`和`strong_aug`中; +- 
+
+### Pretraining configuration
+
+```yaml
+### pretrain and warmup config, choose one and comment another
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
+semi_start_iters: 5000
+ema_start_iters: 3000
+use_warmup: &use_warmup True
+```
+
+**Note:**
+ - The original `Dense Teacher` paper uses `R50-va-caffe` pretraining, while PaddleDetection defaults to `R50-vb`. Using an `R50-vd` model pretrained with [SSLD](../../../docs/feature_models/SSLD_PRETRAINED_MODEL.md) can further improve detection accuracy significantly; the backbone config then needs matching changes, e.g.:
+  ```yaml
+  pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
+  ResNet:
+    depth: 50
+    variant: d
+    norm_type: bn
+    freeze_at: 0
+    return_idx: [1, 2, 3]
+    num_stages: 4
+    lr_mult_list: [0.05, 0.05, 0.1, 0.15]
+  ```
+
+### Global configuration
+
+Add the following global settings to the config file, and note that the DenseTeacher model must use `use_simple_ema: True` rather than `use_ema: True`:
+
+```yaml
+### global config
+use_simple_ema: True
+ema_decay: 0.9996
+ssod_method: DenseTeacher
+DenseTeacher:
+  train_cfg:
+    sup_weight: 1.0
+    unsup_weight: 1.0
+    loss_weight: {distill_loss_cls: 4.0, distill_loss_box: 1.0, distill_loss_quality: 1.0}
+    concat_sup_data: True
+    suppress: linear
+    ratio: 0.01
+    gamma: 2.0
+  test_cfg:
+    inference_on: teacher
+```
+
+### Model configuration
+
+Without special changes, the model config is inherited directly from the base detector.
+Taking `DenseTeacher` as an example, `fcos_r50_fpn_iou_multiscale_2x_coco.yml` is chosen as the **base detector** for semi-supervised training; **the teacher network and the student network both use the base detector's architecture, and the two are identical**.
+
+```yaml
+_BASE_: [
+  '../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml',
+]
+```
+
+### Data augmentation configuration
+
+To build the Reader for the semi-supervised training set, extend the original `TrainReader` with the new fields `weak_aug`, `strong_aug`, `sup_batch_transforms` and `unsup_batch_transforms`, and note:
+- if `NormalizeImage` is present, move it out of `sample_transforms` and into both `weak_aug` and `strong_aug`;
+- `sample_transforms` is the **shared base augmentation** applied to both branches;
+- the full weak augmentation is `sample_transforms + weak_aug`, and the full strong augmentation is `sample_transforms + strong_aug`.
+
+For example, the original fully supervised `TrainReader`:
+
+```yaml
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1}
+  - RandomFlip: {}
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+  batch_transforms:
+  - Permute: {}
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2FCOSTarget:
+      object_sizes_boundary: [64, 128, 256, 512]
+      center_sampling_radius: 1.5
+      downsample_ratios: [8, 16, 32, 64, 128]
+      norm_reg_targets: True
+  batch_size: 2
+  shuffle: True
+  drop_last: True
+```
+
+becomes the semi-supervised `SemiTrainReader`:
+
+```yaml
+### reader config
+SemiTrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1}
+  - RandomFlip: {}
+  weak_aug:
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  strong_aug:
+  - StrongAugImage: {transforms: [
+      RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1},
+      RandomErasingCrop: {},
+      RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]},
+      RandomGrayscale: {prob: 0.2},
+    ]}
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  sup_batch_transforms:
+  - Permute: {}
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2FCOSTarget:
+      object_sizes_boundary: [64, 128, 256, 512]
+      center_sampling_radius: 1.5
+      downsample_ratios: [8, 16, 32, 64, 128]
+      norm_reg_targets: True
+  unsup_batch_transforms:
+  - Permute: {}
+  - PadBatch: {pad_to_stride: 32}
+  sup_batch_size: 2
+  unsup_batch_size: 2
+  shuffle: True
+  drop_last: True
+```
+
+### Other configuration
+
+The epoch count must be chosen so that the total iteration count matches full-data training: if full-data training runs 24 epochs (about 180k iterations), then semi-supervised training on the 10% labeled split needs roughly 240 epochs to reach the same ~180k iterations, since each epoch covers only a tenth of the images. For example:
+
+```yaml
+### other config
+epoch: 240
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [240]
+    use_warmup: True
+  - !LinearWarmup
+    start_factor: 0.001
+    steps: 1000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0001
+    type: L2
+  clip_grad_by_value: 1.0
+```
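+To make the conversion concrete, the arithmetic behind the epoch counts used in this release works out as follows (image counts approximate, total labeled batch_size 16; the iteration totals match the Epochs (Iters) columns in the model zoo tables):
+
+```yaml
+# labeled images / total batch_size = iters per epoch
+# full train2017: ~117k images  -> ~7.3k iters/epoch; 24 epochs  -> 175896 iters
+# 10% split:      ~11.6k images ->   726 iters/epoch; 240 epochs -> 174240 iters
+# 5% split:       ~5.8k images  ->   363 iters/epoch; 240 epochs ->  87120 iters
+#                 (hence the "480 will be better" note in denseteacher_fcos_r50_fpn_coco_semi005.yml)
+```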
+
+
+## Usage
+
+Only training strictly requires the semi-supervised detection config file; evaluation, inference and deployment can also be run with the base detector's config file.
+
+### Training
+
+```bash
+# single-GPU training (not recommended; scale the learning rate linearly with the total batch size)
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml --eval
+
+# multi-GPU training
+python -m paddle.distributed.launch --log_dir=denseteacher_fcos_semi010/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml --eval
+```
+
+### Evaluation
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml -o weights=output/denseteacher_fcos_r50_fpn_coco_semi010/model_final.pdparams
+```
+
+### Inference
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml -o weights=output/denseteacher_fcos_r50_fpn_coco_semi010/model_final.pdparams --infer_img=demo/000000014439.jpg
+```
+
+### Deployment
+
+Deployment works with either the semi-supervised config file or the base detector's config file.
+
+```bash
+# export the model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml -o weights=https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi010.pdparams
+
+# run inference with the exported weights
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/denseteacher_fcos_r50_fpn_coco_semi010 --image_file=demo/000000014439_640x640.jpg --device=GPU
+
+# benchmark deployed speed
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/denseteacher_fcos_r50_fpn_coco_semi010 --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16
+
+# export to ONNX
+paddle2onnx --model_dir output_inference/denseteacher_fcos_r50_fpn_coco_semi010/ --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file denseteacher_fcos_r50_fpn_coco_semi010.onnx
+```
+
+
+## Citation
+
+```
+@article{denseteacher2022,
+  title={Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection},
+  author={Hongyu Zhou and Zheng Ge and Songtao Liu and Weixin Mao and Zeming Li and Haiyan Yu and Jian Sun},
+  journal={arXiv preprint arXiv:2207.02541},
+  year={2022}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_full.yml b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_full.yml
new file mode 100644
index 0000000000000000000000000000000000000000..2805f88c879b8a4b0616fcf587878799fbb42b43
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_full.yml
@@ -0,0 +1,31 @@
+metric: COCO
+num_classes: 80
+
+# full labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
+TrainDataset:
+  !SemiCOCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: dataset/coco
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+# full unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
+UnsupTrainDataset:
+  !SemiCOCODataSet
+    image_dir: unlabeled2017
+    anno_path: annotations/instances_unlabeled2017.json
+    dataset_dir: dataset/coco
+    data_fields: ['image']
+    supervised: False
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: dataset/coco
+    allow_empty: true
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt)
+    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path'
diff --git a/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_1.yml b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_1.yml
new file mode 100644
index 0000000000000000000000000000000000000000..569b8e9dc922b9ba290b96bc734e35840d74f551
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_1.yml
@@ -0,0 +1,31 @@
+metric: COCO
+num_classes: 80
+
+# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
+TrainDataset:
+  !SemiCOCODataSet
+    image_dir: train2017
+    anno_path: semi_annotations/instances_train2017.1@1.json
+    dataset_dir: dataset/coco
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
+UnsupTrainDataset:
+  !SemiCOCODataSet
+    image_dir: train2017
+    anno_path: semi_annotations/instances_train2017.1@1-unlabeled.json
+    dataset_dir: dataset/coco
+    data_fields: ['image']
+    supervised: False
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: dataset/coco
+    allow_empty: true
+
+TestDataset:
+  !ImageFolder
+    anno_path:
annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_10.yml b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_10.yml new file mode 100644 index 0000000000000000000000000000000000000000..58746017866851b72b0a10ca0069de30e3e88440 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_10.yml @@ -0,0 +1,31 @@ +metric: COCO +num_classes: 80 + +# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +TrainDataset: + !SemiCOCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +UnsupTrainDataset: + !SemiCOCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10-unlabeled.json + dataset_dir: dataset/coco + data_fields: ['image'] + supervised: False + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + allow_empty: true + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_5.yml b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_5.yml new file mode 100644 index 0000000000000000000000000000000000000000..01d5fde1b22ef51d5a41ebfb83f10cd38683c7cd --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/_base_/coco_detection_percent_5.yml @@ -0,0 +1,31 @@ +metric: COCO +num_classes: 80 + +# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +TrainDataset: + !SemiCOCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@5.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet` +UnsupTrainDataset: + !SemiCOCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@5-unlabeled.json + dataset_dir: dataset/coco + data_fields: ['image'] + supervised: False + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + allow_empty: true + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/README.md b/PaddleDetection-release-2.6/configs/semi_det/baseline/README.md new file mode 100644 index 0000000000000000000000000000000000000000..457ad7f7cdba66b83b55c1974b6867d9982dff86 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/README.md @@ -0,0 +1,81 @@ +# Supervised Baseline 纯监督模型基线 + +## COCO数据集模型库 + +### [FCOS](../../fcos) + +| 基础模型 | 监督数据比例 | Epochs (Iters) | mAPval
0.5:0.95 | 模型下载 | 配置文件 |
+| :---------------: | :-------------: | :---------------: |:---------------------: |:--------: | :---------: |
+| FCOS ResNet50-FPN | 5% | 24 (8712) | 21.3 | [download](https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_2x_coco_sup005.pdparams) | [config](fcos_r50_fpn_2x_coco_sup005.yml) |
+| FCOS ResNet50-FPN | 10% | 24 (17424) | 26.3 | [download](https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_2x_coco_sup010.pdparams) | [config](fcos_r50_fpn_2x_coco_sup010.yml) |
+| FCOS ResNet50-FPN | full | 24 (175896) | 42.6 | [download](https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_iou_multiscale_2x_coco.pdparams) | [config](../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml) |
+
+**Note:**
+ - The models above are trained with 8 GPUs and a total batch_size of 16 by default, with an initial learning rate of 0.01. If you change the total batch_size, scale the learning rate linearly.
+
+
+### [PP-YOLOE+](../../ppyoloe)
+
+| 基础模型 | 监督数据比例 | Epochs (Iters) | mAPval<br>0.5:0.95 | 模型下载 | 配置文件 |
+| :---------------: | :-------------: | :---------------: | :---------------------: |:--------: | :---------: |
+| PP-YOLOE+_s | 5% | 80 (7200) | 32.8 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco_sup005.pdparams) | [config](ppyoloe_plus_crn_s_80e_coco_sup005.yml) |
+| PP-YOLOE+_s | 10% | 80 (14480) | 35.3 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco_sup010.pdparams) | [config](ppyoloe_plus_crn_s_80e_coco_sup010.yml) |
+| PP-YOLOE+_s | full | 80 (146560) | 43.7 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) | [config](../../ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml) |
+| PP-YOLOE+_l | 5% | 80 (7200) | 42.9 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco_sup005.pdparams) | [config](ppyoloe_plus_crn_l_80e_coco_sup005.yml) |
+| PP-YOLOE+_l | 10% | 80 (14480) | 45.7 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco_sup010.pdparams) | [config](ppyoloe_plus_crn_l_80e_coco_sup010.yml) |
+| PP-YOLOE+_l | full | 80 (146560) | 49.8 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) |
+
+**Note:**
+ - The models above are trained with 8 GPUs and a total batch_size of 64 by default, with an initial learning rate of 0.001. If you change the total batch_size, scale the learning rate linearly.
+
+
+### [Faster R-CNN](../../faster_rcnn)
+
+| 基础模型 | 监督数据比例 | Epochs (Iters) | mAPval<br>0.5:0.95 | 模型下载 | 配置文件 |
+| :---------------: | :-------------: | :---------------: | :---------------------: |:--------: | :---------: |
+| Faster R-CNN ResNet50-FPN | 5% | 24 (8712) | 20.7 | [download](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_2x_coco_sup005.pdparams) | [config](faster_rcnn_r50_fpn_2x_coco_sup005.yml) |
+| Faster R-CNN ResNet50-FPN | 10% | 24 (17424) | 25.6 | [download](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_2x_coco_sup010.pdparams) | [config](faster_rcnn_r50_fpn_2x_coco_sup010.yml) |
+| Faster R-CNN ResNet50-FPN | full | 24 (175896) | 40.0 | [download](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_2x_coco.pdparams) | [config](../../configs/faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml) |
+
+**Note:**
+ - The models above are trained with 8 GPUs and a total batch_size of 16 by default, with an initial learning rate of 0.02. If you change the total batch_size, scale the learning rate linearly.
+
+
+### [RetinaNet](../../retinanet)
+
+| 基础模型 | 监督数据比例 | Epochs (Iters) | mAPval<br>0.5:0.95 | 模型下载 | 配置文件 |
+| :---------------: | :-------------: | :---------------: | :---------------------: |:--------: | :---------: |
+| RetinaNet ResNet50-FPN | 5% | 24 (8712) | 13.9 | [download](https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco_sup005.pdparams) | [config](retinanet_r50_fpn_2x_coco_sup005.yml) |
+| RetinaNet ResNet50-FPN | 10% | 24 (17424) | 23.6 | [download](https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco_sup010.pdparams) | [config](retinanet_r50_fpn_2x_coco_sup010.yml) |
+| RetinaNet ResNet50-FPN | full | 24 (175896) | 39.1 | [download](https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco.pdparams) | [config](../../configs/retinanet/retinanet_r50_fpn_2x_coco.yml) |
+
+**Note:**
+ - The models above are trained with 8 GPUs and a total batch_size of 16 by default, with an initial learning rate of 0.01. If you change the total batch_size, scale the learning rate linearly.
+
+
+### Notes
+ - For the partially labeled COCO sets, see [dataset preparation](../README.md) for download and setup. Each ratio's training set is **a subset sampled from train2017 at the given percentage**, using the fold-1 split by default; `sup010` means training on the 10% labeled subset, `sup005` on the 5% subset, and `full` on all of train2017. The validation set is always the full val2017;
+ - Sampling the labeled subset differently, or using a different `fold` number, can shift accuracy by as much as about 0.5 mAP;
+ - PP-YOLOE+ uses Objects365 pretraining; all other models use ImageNet pretraining;
+ - Scale the learning rate linearly, following the formula: **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
+
+
+## Tutorial
+
+Put the following commands into a script file such as `run.sh` and run everything at once with `sh run.sh`, or run them one by one from the command line:
+
+```bash
+model_type=semi_det/baseline
+job_name=ppyoloe_plus_crn_s_80e_coco_sup010 # can be changed, e.g. fcos_r50_fpn_2x_coco_sup010
+
+config=configs/${model_type}/${job_name}.yml
+log_dir=log_dir/${job_name}
+weights=output/${job_name}/model_final.pdparams
+
+# 1.training
+# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config}
+python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
+
+# 2.eval
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights}
+```
diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/faster_rcnn_r50_fpn_2x_coco_sup005.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/faster_rcnn_r50_fpn_2x_coco_sup005.yml
new file mode 100644
index 0000000000000000000000000000000000000000..d0e4cf7022b643ee45303efef553f0a6ba71c8a5
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/faster_rcnn_r50_fpn_2x_coco_sup005.yml
@@ -0,0 +1,42 @@
+_BASE_: [
+  '../../faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml',
+]
+log_iter: 50
+snapshot_epoch: 2
+weights: output/faster_rcnn_r50_fpn_2x_coco_sup005/model_final
+
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: semi_annotations/instances_train2017.1@5.json
+    dataset_dir: dataset/coco
+    data_fields: ['image', 'gt_bbox', 'gt_class']
+
+
+worker_num: 2
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
+  - RandomFlip: {}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 2
+  shuffle: true
+  drop_last: true
+  collate_batch: false
+
+
+epoch: 24
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [16, 22]
+  - !LinearWarmup
+    start_factor: 0.1
+    epochs: 1
diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/faster_rcnn_r50_fpn_2x_coco_sup010.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/faster_rcnn_r50_fpn_2x_coco_sup010.yml
new file mode 100644
index
0000000000000000000000000000000000000000..80136304b9beaef77c995298824da06d2204d7eb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/faster_rcnn_r50_fpn_2x_coco_sup010.yml @@ -0,0 +1,42 @@ +_BASE_: [ + '../../faster_rcnn/faster_rcnn_r50_fpn_2x_coco.yml', +] +log_iter: 50 +snapshot_epoch: 2 +weights: output/faster_rcnn_r50_fpn_2x_coco_sup010/model_final + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: true + drop_last: true + collate_batch: false + + +epoch: 24 +LearningRate: + base_lr: 0.02 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + epochs: 1 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/fcos_r50_fpn_2x_coco_sup005.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/fcos_r50_fpn_2x_coco_sup005.yml new file mode 100644 index 0000000000000000000000000000000000000000..de9982a8c3a1c17dc69ed17cc7f9f4099cf58285 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/fcos_r50_fpn_2x_coco_sup005.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml', +] +log_iter: 50 +snapshot_epoch: 2 +weights: output/fcos_r50_fpn_2x_coco_sup005/model_final + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@5.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + epochs: 1 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/fcos_r50_fpn_2x_coco_sup010.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/fcos_r50_fpn_2x_coco_sup010.yml new file mode 100644 index 0000000000000000000000000000000000000000..3636ae8bbc9eafcd8edcc6d0f6ad3262b34ebb8f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/fcos_r50_fpn_2x_coco_sup010.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml', +] +log_iter: 50 +snapshot_epoch: 2 +weights: output/fcos_r50_fpn_2x_coco_sup010/model_final + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + epochs: 1 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_l_80e_coco_sup005.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_l_80e_coco_sup005.yml new file mode 100644 index 0000000000000000000000000000000000000000..4dd4a898e4afaed66c0f3bf27b1991316d965999 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_l_80e_coco_sup005.yml @@ -0,0 +1,29 @@ +_BASE_: [ + 
'../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_l_80e_coco_sup005/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@5.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 80 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 5 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_l_80e_coco_sup010.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_l_80e_coco_sup010.yml new file mode 100644 index 0000000000000000000000000000000000000000..647252175cc3aaa9dbb8edb6c5b48b7d4568cd5b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_l_80e_coco_sup010.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_l_80e_coco_sup010/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 80 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 5 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_s_80e_coco_sup005.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_s_80e_coco_sup005.yml new file mode 100644 index 0000000000000000000000000000000000000000..88de96dcc44b7e5c2e7bee42c14ab6358ff308d1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_s_80e_coco_sup005.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_s_80e_coco_sup005/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@5.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 80 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
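+    # start_factor 0. ramps the learning rate up from zero over the warmup epochs below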
+ epochs: 5 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_s_80e_coco_sup010.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_s_80e_coco_sup010.yml new file mode 100644 index 0000000000000000000000000000000000000000..aeb9435a0fee9ab502185f81a6b3710443471c89 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/ppyoloe_plus_crn_s_80e_coco_sup010.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/ppyoloe_plus_crn_s_80e_coco_sup010/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 80 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 5 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/retinanet_r50_fpn_2x_coco_sup005.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/retinanet_r50_fpn_2x_coco_sup005.yml new file mode 100644 index 0000000000000000000000000000000000000000..d901ea26e9c811395b35f447a74664de9eb72c91 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/retinanet_r50_fpn_2x_coco_sup005.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../../retinanet/retinanet_r50_fpn_2x_coco.yml', +] +log_iter: 50 +snapshot_epoch: 2 +weights: output/retinanet_r50_fpn_2x_coco_sup005/model_final + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@5.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + epochs: 1 diff --git a/PaddleDetection-release-2.6/configs/semi_det/baseline/retinanet_r50_fpn_2x_coco_sup010.yml b/PaddleDetection-release-2.6/configs/semi_det/baseline/retinanet_r50_fpn_2x_coco_sup010.yml new file mode 100644 index 0000000000000000000000000000000000000000..5480f3c57549f758f94fea5f5cccfc53142aa663 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/baseline/retinanet_r50_fpn_2x_coco_sup010.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../../retinanet/retinanet_r50_fpn_2x_coco.yml', +] +log_iter: 50 +snapshot_epoch: 2 +weights: output/retinanet_r50_fpn_2x_coco_sup010/model_final + + +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: semi_annotations/instances_train2017.1@10.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class'] + + +epoch: 24 +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + epochs: 1 diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/README.md b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7c629cc7c7c45cc8e23dc7bce6e3074f28abe585 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/README.md @@ -0,0 +1,101 @@ +简体中文 | [English](README_en.md) + +# Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection + +## FCOS模型库 + +| 模型 | 监督数据比例 | Sup Baseline | Sup Epochs 
(Iters) | Sup mAPval<br>0.5:0.95 | Semi mAPval<br>0.5:0.95 | Semi Epochs (Iters) | 模型下载 | 配置文件 |
+| :------------: | :---------: | :---------------------: | :---------------------: |:---------------------------: |:----------------------------: | :------------------: |:--------: |:----------: |
+| DenseTeacher-FCOS | 5% | [sup_config](../baseline/fcos_r50_fpn_2x_coco_sup005.yml) | 24 (8712) | 21.3 | **30.6** | 240 (87120) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi005.pdparams) | [config](./denseteacher_fcos_r50_fpn_coco_semi005.yml) |
+| DenseTeacher-FCOS | 10% | [sup_config](../baseline/fcos_r50_fpn_2x_coco_sup010.yml) | 24 (17424) | 26.3 | **35.1** | 240 (174240) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi010.pdparams) | [config](./denseteacher_fcos_r50_fpn_coco_semi010.yml) |
+| DenseTeacher-FCOS(LSJ)| 10% | [sup_config](../baseline/fcos_r50_fpn_2x_coco_sup010.yml) | 24 (17424) | 26.3 | **37.1(LSJ)** | 240 (174240) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi010_lsj.pdparams) | [config](./denseteacher_fcos_r50_fpn_coco_semi010_lsj.yml) |
+| DenseTeacher-FCOS |100%(full)| [sup_config](../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml) | 24 (175896) | 42.6 | **44.2** | 24 (175896)| [download](https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_full.pdparams) | [config](./denseteacher_fcos_r50_fpn_coco_full.yml) |
+
+
+**Note:**
+ - The models above are trained with 8 GPUs by default, with a total labeled batch_size of 16, a total unlabeled batch_size of 16 as well, and an initial learning rate of 0.01. If you change the total batch_size, scale the learning rate linearly;
+ - **Labeled data ratio** is the percentage of the full COCO train2017 set used as labeled data. The unlabeled COCO set generally uses the same percentage, and its images do not overlap with the labeled images;
+ - `Semi Epochs (Iters)` is the Epochs (Iters) of the **semi-supervised** model. For a **custom dataset**, convert iterations to epochs yourself, and keep the total iteration count close to the COCO settings;
+ - `Sup mAP` is the accuracy of the model trained **only on the labeled data**; see the **base detector's config file** and [baseline](../baseline);
+ - `Semi mAP` is the accuracy of the **semi-supervised** model; the download and config links are for the **semi-supervised** model;
+ - `LSJ` means **large-scale jittering**, i.e. multi-scale training over a wider scale range, which further improves accuracy at the cost of slower training;
+ - For an explanation of the semi-supervised detection config, see the [documentation](../README.md/#semi-supervised-detection-configuration);
+ - The original `Dense Teacher` paper uses `R50-va-caffe` pretraining, while PaddleDetection defaults to `R50-vb`. Using an `R50-vd` model pretrained with [SSLD](../../../docs/feature_models/SSLD_PRETRAINED_MODEL.md) can further improve detection accuracy significantly; the backbone config then needs matching changes, e.g.:
+  ```yaml
+  pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
+  ResNet:
+    depth: 50
+    variant: d
+    norm_type: bn
+    freeze_at: 0
+    return_idx: [1, 2, 3]
+    num_stages: 4
+    lr_mult_list: [0.05, 0.05, 0.1, 0.15]
+  ```
+
+
+## PPYOLOE+ Model Zoo
+
+| 模型 | 监督数据比例 | Sup Baseline | Sup Epochs (Iters) | Sup mAPval<br>0.5:0.95 | Semi mAPval<br>0.5:0.95 | Semi Epochs (Iters) | 模型下载 | 配置文件 |
+| :------------: | :---------: | :---------------------: | :---------------------: |:---------------------------: |:----------------------------: | :------------------: |:--------: |:----------: |
+| DenseTeacher-PPYOLOE+_s | 5% | [sup_config](../baseline/ppyoloe_plus_crn_s_80e_coco_sup005.yml) | 80 (7200) | 32.8 | **34.0** | 200 (36200) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_ppyoloe_plus_crn_s_coco_semi005.pdparams) | [config](./denseteacher_ppyoloe_plus_crn_s_coco_semi005.yml) |
+| DenseTeacher-PPYOLOE+_s | 10% | [sup_config](../baseline/ppyoloe_plus_crn_s_80e_coco_sup010.yml) | 80 (14480) | 35.3 | **37.5** | 200 (36200) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_ppyoloe_plus_crn_s_coco_semi010.pdparams) | [config](./denseteacher_ppyoloe_plus_crn_s_coco_semi010.yml) |
+| DenseTeacher-PPYOLOE+_l | 5% | [sup_config](../baseline/ppyoloe_plus_crn_l_80e_coco_sup005.yml) | 80 (7200) | 42.9 | **45.4** | 200 (36200) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_ppyoloe_plus_crn_l_coco_semi005.pdparams) | [config](./denseteacher_ppyoloe_plus_crn_l_coco_semi005.yml) |
+| DenseTeacher-PPYOLOE+_l | 10% | [sup_config](../baseline/ppyoloe_plus_crn_l_80e_coco_sup010.yml) | 80 (14480) | 45.7 | **47.4** | 200 (36200) | [download](https://paddledet.bj.bcebos.com/models/denseteacher_ppyoloe_plus_crn_l_coco_semi010.pdparams) | [config](./denseteacher_ppyoloe_plus_crn_l_coco_semi010.yml) |
+
+
+## Usage
+
+Only training strictly requires the semi-supervised detection config file; evaluation, inference and deployment can also be run with the base detector's config file.
+
+### Training
+
+```bash
+# single-GPU training (not recommended; scale the learning rate linearly with the total batch size)
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml --eval
+
+# multi-GPU training
+python -m paddle.distributed.launch --log_dir=denseteacher_fcos_semi010/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml --eval
+```
+
+### Evaluation
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml -o weights=output/denseteacher_fcos_r50_fpn_coco_semi010/model_final.pdparams
+```
+
+### Inference
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml -o weights=output/denseteacher_fcos_r50_fpn_coco_semi010/model_final.pdparams --infer_img=demo/000000014439.jpg
+```
+
+### Deployment
+
+Deployment works with either the semi-supervised config file or the base detector's config file.
+
+```bash
+# export the model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml -o weights=https://paddledet.bj.bcebos.com/models/denseteacher_fcos_r50_fpn_coco_semi010.pdparams
+
+# run inference with the exported weights
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/denseteacher_fcos_r50_fpn_coco_semi010 --image_file=demo/000000014439_640x640.jpg --device=GPU
+
+# benchmark deployed speed
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/denseteacher_fcos_r50_fpn_coco_semi010 --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16
+
+# export to ONNX
+paddle2onnx --model_dir output_inference/denseteacher_fcos_r50_fpn_coco_semi010/ --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file denseteacher_fcos_r50_fpn_coco_semi010.onnx
+```
+
+
+## Citation
+
+```
+@article{denseteacher2022,
+  title={Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection},
author={Hongyu Zhou, Zheng Ge, Songtao Liu, Weixin Mao, Zeming Li, Haiyan Yu, Jian Sun}, + journal={arXiv preprint arXiv:2207.02541}, + year={2022} +} +``` diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_full.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_full.yml new file mode 100644 index 0000000000000000000000000000000000000000..1b15b222387dfcd94e2f933c34d6810ddace4f45 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_full.yml @@ -0,0 +1,166 @@ +_BASE_: [ + 'denseteacher_fcos_r50_fpn_coco_semi010.yml', + '../_base_/coco_detection_full.yml', +] +log_iter: 100 +snapshot_epoch: 2 +epochs: &epochs 24 +weights: output/denseteacher_fcos_r50_fpn_coco_full/model_final + + +### pretrain and warmup config, choose one and comment another +# pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/fcos_r50_fpn_iou_multiscale_2x_coco.pdparams # mAP=42.6 +# semi_start_iters: 0 +# ema_start_iters: 0 +# use_warmup: &use_warmup False + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams +semi_start_iters: 5000 +ema_start_iters: 3000 +use_warmup: &use_warmup True + + +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 2.0, distill_loss_box: 1.0, distill_loss_quality: 1.0} + concat_sup_data: True + suppress: linear + ratio: 0.01 + gamma: 2.0 + test_cfg: + inference_on: teacher + + +### reader config +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1} + - RandomFlip: {} + weak_aug: + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + sup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 32, 64, 128] + num_shift: 0.5 + norm_reg_targets: True + unsup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + sup_batch_size: 2 + unsup_batch_size: 2 + shuffle: True + drop_last: True + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + fuse_normalize: True + + +### model config +architecture: FCOS +FCOS: + backbone: ResNet + neck: FPN + fcos_head: FCOSHead + +ResNet: + depth: 50 + variant: 'b' + norm_type: bn + freeze_at: 0 # res2 + 
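+  # return_idx [1, 2, 3] exposes the res3-res5 feature maps as inputs to the FPN below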
return_idx: [1, 2, 3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: True + use_c5: False + +FCOSHead: + fcos_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: False + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + norm_reg_targets: True + centerness_on_reg: True + num_shift: 0.5 + fcos_loss: + name: FCOSLoss + loss_alpha: 0.25 + loss_gamma: 2.0 + iou_loss_type: "giou" + reg_weights: 1.0 + quality: "iou" + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + + +### other config +epoch: *epochs +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [*epochs] + use_warmup: *use_warmup + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + clip_grad_by_value: 1.0 diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi005.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi005.yml new file mode 100644 index 0000000000000000000000000000000000000000..3efa1a04b82351673cd72a68415a7115e9759b38 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi005.yml @@ -0,0 +1,164 @@ +_BASE_: [ + '../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml', + '../_base_/coco_detection_percent_5.yml', +] +log_iter: 20 +snapshot_epoch: 5 +epochs: &epochs 240 # 480 will be better +weights: output/denseteacher_fcos_r50_fpn_coco_semi005/model_final + + +### pretrain and warmup config, choose one and comment another +# pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/fcos_r50_fpn_2x_coco_sup005.pdparams # mAP=21.3 +# semi_start_iters: 0 +# ema_start_iters: 0 +# use_warmup: &use_warmup False + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams +semi_start_iters: 5000 +ema_start_iters: 3000 +use_warmup: &use_warmup True + + +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 4.0, distill_loss_box: 1.0, distill_loss_quality: 1.0} + concat_sup_data: True + suppress: linear + ratio: 0.01 + gamma: 2.0 + test_cfg: + inference_on: teacher + + +### reader config +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1} + - RandomFlip: {} + weak_aug: + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + sup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 32, 64, 128] + norm_reg_targets: True + unsup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + 
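+  # batch sizes below are per GPU; with the default 8 GPUs this gives the 16 labeled + 16 unlabeled total noted in the README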
sup_batch_size: 2 + unsup_batch_size: 2 + shuffle: True + drop_last: True + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + fuse_normalize: True + + +### model config +architecture: FCOS +FCOS: + backbone: ResNet + neck: FPN + fcos_head: FCOSHead + +ResNet: + depth: 50 + variant: 'b' + norm_type: bn + freeze_at: 0 # res2 + return_idx: [1, 2, 3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: True + use_c5: False + +FCOSHead: + fcos_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: False + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + norm_reg_targets: True + centerness_on_reg: True + fcos_loss: + name: FCOSLoss + loss_alpha: 0.25 + loss_gamma: 2.0 + iou_loss_type: "giou" + reg_weights: 1.0 + quality: "iou" + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + + +### other config +epoch: *epochs +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [*epochs] + use_warmup: *use_warmup + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + clip_grad_by_value: 1.0 diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml new file mode 100644 index 0000000000000000000000000000000000000000..76d884ca20fb3cd819c5dc5aed954df4cfad0848 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010.yml @@ -0,0 +1,169 @@ +_BASE_: [ + '../../fcos/fcos_r50_fpn_iou_multiscale_2x_coco.yml', + '../_base_/coco_detection_percent_10.yml', +] +log_iter: 50 +snapshot_epoch: 5 +epochs: &epochs 240 +weights: output/denseteacher_fcos_r50_fpn_coco_semi010/model_final + + +### pretrain and warmup config, choose one and comment another +# pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/fcos_r50_fpn_2x_coco_sup010.pdparams # mAP=26.3 +# semi_start_iters: 0 +# ema_start_iters: 0 +# use_warmup: &use_warmup False + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams +semi_start_iters: 5000 +ema_start_iters: 3000 +use_warmup: &use_warmup True + + +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 4.0, distill_loss_box: 1.0, distill_loss_quality: 1.0} + concat_sup_data: True + suppress: linear + ratio: 0.01 + gamma: 2.0 + test_cfg: + inference_on: teacher + + +### reader config +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 
1333]], keep_ratio: True, interp: 1} + - RandomFlip: {} + weak_aug: + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + sup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 32, 64, 128] + num_shift: 0. # default 0.5 + multiply_strides_reg_targets: False + norm_reg_targets: True + unsup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + sup_batch_size: 2 + unsup_batch_size: 2 + shuffle: True + drop_last: True + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + fuse_normalize: True + + +### model config +architecture: FCOS +FCOS: + backbone: ResNet + neck: FPN + fcos_head: FCOSHead + +ResNet: + depth: 50 + variant: 'b' + norm_type: bn + freeze_at: 0 # res2 + return_idx: [1, 2, 3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: True + use_c5: False + +FCOSHead: + fcos_feat: + name: FCOSFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: "gn" + use_dcn: False + fpn_stride: [8, 16, 32, 64, 128] + prior_prob: 0.01 + norm_reg_targets: True + centerness_on_reg: True + num_shift: 0. 
# default 0.5 + multiply_strides_reg_targets: False + sqrt_score: False + fcos_loss: + name: FCOSLoss + loss_alpha: 0.25 + loss_gamma: 2.0 + iou_loss_type: "giou" + reg_weights: 1.0 + quality: "iou" + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + + +### other config +epoch: *epochs +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [*epochs] + use_warmup: *use_warmup + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + clip_grad_by_value: 1.0 diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010_lsj.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010_lsj.yml new file mode 100644 index 0000000000000000000000000000000000000000..32107c93f86dec016880ec4e6ba53adff21e47a1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_fcos_r50_fpn_coco_semi010_lsj.yml @@ -0,0 +1,44 @@ +_BASE_: [ + 'denseteacher_fcos_r50_fpn_coco_semi010.yml', +] +log_iter: 50 +snapshot_epoch: 5 +epochs: &epochs 240 +weights: output/denseteacher_fcos_r50_fpn_coco_semi010_lsj/model_final + + +### reader config +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + # large-scale jittering + - RandomResize: {target_size: [[400, 1333], [1200, 1333]], keep_ratio: True, interp: 1, random_range: True} + - RandomFlip: {} + weak_aug: + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + sup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + - Gt2FCOSTarget: + object_sizes_boundary: [64, 128, 256, 512] + center_sampling_radius: 1.5 + downsample_ratios: [8, 16, 32, 64, 128] + num_shift: 0. 
# default 0.5 + multiply_strides_reg_targets: False + norm_reg_targets: True + unsup_batch_transforms: + - Permute: {} + - PadBatch: {pad_to_stride: 32} + sup_batch_size: 2 + unsup_batch_size: 2 + shuffle: True + drop_last: True diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_l_coco_semi005.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_l_coco_semi005.yml new file mode 100644 index 0000000000000000000000000000000000000000..920613fd9e092f3c53783f71571e93b2413a388f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_l_coco_semi005.yml @@ -0,0 +1,151 @@ +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml', + '../_base_/coco_detection_percent_5.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/denseteacher_ppyoloe_plus_crn_l_coco_semi005/model_final + +epochs: &epochs 200 +cosine_epochs: &cosine_epochs 240 + + +### pretrain and warmup config, choose one and comment another +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco_sup005.pdparams # mAP=42.9 +semi_start_iters: 0 +ema_start_iters: 0 +use_warmup: &use_warmup False + +# pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams +# semi_start_iters: 5000 +# ema_start_iters: 3000 +# use_warmup: &use_warmup True + + +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 1.0, distill_loss_iou: 2.5, distill_loss_dfl: 0., distill_loss_contrast: 0.1} + contrast_loss: + temperature: 0.2 + alpha: 0.9 + smooth_iter: 100 + concat_sup_data: True + suppress: linear + ratio: 0.01 + test_cfg: + inference_on: teacher + + +### reader config +batch_size: &batch_size 8 +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomFlip: {} + - RandomCrop: {} # unsup will be fake gt_boxes + weak_aug: + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + sup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - Permute: {} + - PadGT: {} + unsup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - Permute: {} + sup_batch_size: *batch_size + unsup_batch_size: *batch_size + shuffle: True + drop_last: True + collate_batch: True + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +### model config +architecture: 
PPYOLOE +norm_type: sync_bn +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +eval_size: ~ # means None, but not str 'None' +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 + + +### other config +epoch: *epochs +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: *cosine_epochs + use_warmup: *use_warmup + - !LinearWarmup + start_factor: 0.001 + epochs: 3 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 # dt-fcos 0.0001 + type: L2 + clip_grad_by_norm: 1.0 # dt-fcos clip_grad_by_value diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_l_coco_semi010.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_l_coco_semi010.yml new file mode 100644 index 0000000000000000000000000000000000000000..253a8c18ca773f9216aad9f32025261c3976ba38 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_l_coco_semi010.yml @@ -0,0 +1,151 @@ +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml', + '../_base_/coco_detection_percent_10.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/denseteacher_ppyoloe_plus_crn_l_coco_semi010/model_final + +epochs: &epochs 200 +cosine_epochs: &cosine_epochs 240 + + +### pretrain and warmup config, choose one and comment another +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco_sup010.pdparams # mAP=45.7 +semi_start_iters: 0 +ema_start_iters: 0 +use_warmup: &use_warmup False + +# pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams +# semi_start_iters: 5000 +# ema_start_iters: 3000 +# use_warmup: &use_warmup True + + +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 1.0, distill_loss_iou: 2.5, distill_loss_dfl: 0., distill_loss_contrast: 0.1} + contrast_loss: + temperature: 0.2 + alpha: 0.9 + smooth_iter: 100 + concat_sup_data: True + suppress: linear + ratio: 0.01 + test_cfg: + inference_on: teacher + + +### reader config +batch_size: &batch_size 8 +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomFlip: {} + - RandomCrop: {} # unsup will be fake gt_boxes + weak_aug: + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + sup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: 
False} + - Permute: {} + - PadGT: {} + unsup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - Permute: {} + sup_batch_size: *batch_size + unsup_batch_size: *batch_size + shuffle: True + drop_last: True + collate_batch: True + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +### model config +architecture: PPYOLOE +norm_type: sync_bn +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +eval_size: ~ # means None, but not str 'None' +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 + + +### other config +epoch: *epochs +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: *cosine_epochs + use_warmup: *use_warmup + - !LinearWarmup + start_factor: 0.001 + epochs: 3 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 # dt-fcos 0.0001 + type: L2 + clip_grad_by_norm: 1.0 # dt-fcos clip_grad_by_value diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_s_coco_semi005.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_s_coco_semi005.yml new file mode 100644 index 0000000000000000000000000000000000000000..d3482e5e9d18e4b7459a4457dd78043fc56fb7db --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_s_coco_semi005.yml @@ -0,0 +1,151 @@ +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml', + '../_base_/coco_detection_percent_5.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/denseteacher_ppyoloe_plus_crn_s_coco_semi005/model_final + +epochs: &epochs 200 +cosine_epochs: &cosine_epochs 240 + + +### pretrain and warmup config, choose one and comment another +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_s_80e_coco_sup005.pdparams # mAP=32.8 +semi_start_iters: 0 +ema_start_iters: 0 +use_warmup: &use_warmup False + +# pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +# semi_start_iters: 5000 +# ema_start_iters: 3000 +# use_warmup: &use_warmup True + + +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 1.0, distill_loss_iou: 2.5, distill_loss_dfl: 0., distill_loss_contrast: 0.1} + contrast_loss: + temperature: 0.2 + alpha: 0.9 + smooth_iter: 100 + concat_sup_data: True + suppress: linear + ratio: 0.01 + test_cfg: + inference_on: teacher + + +### 
reader config +batch_size: &batch_size 8 +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomFlip: {} + - RandomCrop: {} # unsup will be fake gt_boxes + weak_aug: + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + sup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - Permute: {} + - PadGT: {} + unsup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - Permute: {} + sup_batch_size: *batch_size + unsup_batch_size: *batch_size + shuffle: True + drop_last: True + collate_batch: True + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +### model config +architecture: PPYOLOE +norm_type: sync_bn +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +eval_size: ~ # means None, but not str 'None' +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 + + +### other config +epoch: *epochs +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: *cosine_epochs + use_warmup: *use_warmup + - !LinearWarmup + start_factor: 0.001 + epochs: 3 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 # dt-fcos 0.0001 + type: L2 + clip_grad_by_norm: 1.0 # dt-fcos clip_grad_by_value diff --git a/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_s_coco_semi010.yml b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_s_coco_semi010.yml new file mode 100644 index 0000000000000000000000000000000000000000..e8b0aad3aff745ac9b3a62d8e18d470f4fe6698a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/semi_det/denseteacher/denseteacher_ppyoloe_plus_crn_s_coco_semi010.yml @@ -0,0 +1,151 @@ +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml', + '../_base_/coco_detection_percent_10.yml', +] +log_iter: 50 +snapshot_epoch: 5 +weights: output/denseteacher_ppyoloe_plus_crn_s_coco_semi010/model_final + +epochs: &epochs 200 +cosine_epochs: &cosine_epochs 240 + + +### pretrain and warmup config, choose one and comment 
another +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_s_80e_coco_sup010.pdparams # mAP=35.3 +semi_start_iters: 0 +ema_start_iters: 0 +use_warmup: &use_warmup False + +# pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams +# semi_start_iters: 5000 +# ema_start_iters: 3000 +# use_warmup: &use_warmup True + + +### global config +use_simple_ema: True +ema_decay: 0.9996 +ssod_method: DenseTeacher +DenseTeacher: + train_cfg: + sup_weight: 1.0 + unsup_weight: 1.0 + loss_weight: {distill_loss_cls: 1.0, distill_loss_iou: 2.5, distill_loss_dfl: 0., distill_loss_contrast: 0.1} + contrast_loss: + temperature: 0.2 + alpha: 0.9 + smooth_iter: 100 + concat_sup_data: True + suppress: linear + ratio: 0.01 + test_cfg: + inference_on: teacher + + +### reader config +batch_size: &batch_size 8 +worker_num: 2 +SemiTrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomFlip: {} + - RandomCrop: {} # unsup will be fake gt_boxes + weak_aug: + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + strong_aug: + - StrongAugImage: {transforms: [ + RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1}, + RandomErasingCrop: {}, + RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]}, + RandomGrayscale: {prob: 0.2}, + ]} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], is_scale: true, norm_type: none} + sup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - Permute: {} + - PadGT: {} + unsup_batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - Permute: {} + sup_batch_size: *batch_size + unsup_batch_size: *batch_size + shuffle: True + drop_last: True + collate_batch: True + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + + +### model config +architecture: PPYOLOE +norm_type: sync_bn +ema_black_list: ['proj_conv.weight'] +custom_black_list: ['reduce_mean'] +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +eval_size: ~ # means None, but not str 'None' +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 + + +### other config +epoch: *epochs +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: *cosine_epochs + use_warmup: *use_warmup + - !LinearWarmup + start_factor: 0.001 + epochs: 3 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 # dt-fcos 0.0001 + type: L2 + clip_grad_by_norm: 1.0 # dt-fcos 
clip_grad_by_value
diff --git a/PaddleDetection-release-2.6/configs/slim/README.md b/PaddleDetection-release-2.6/configs/slim/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..6d67b37cb27e93ce71745eb442b1b3e1ceeb370e
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/slim/README.md
@@ -0,0 +1,182 @@
+# 模型压缩
+
+在PaddleDetection中提供了基于[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)进行模型压缩的完整教程和benchmark。目前支持的方法:
+
+- [剪裁](prune)
+- [量化](quant)
+- [离线量化](post_quant)
+- [蒸馏](distill)
+- [联合策略](extensions)
+
+推荐您使用剪裁和蒸馏联合训练,或者使用剪裁、量化训练和离线量化,进行检测模型压缩。下面以YOLOv3为例,进行剪裁、蒸馏和量化实验。
+
+## 实验环境
+
+- Python 3.7+
+- PaddlePaddle >= 2.1.0
+- PaddleSlim >= 2.1.0
+- CUDA 10.1+
+- cuDNN >= 7.6.5
+
+**PaddleDetection、PaddlePaddle与PaddleSlim版本关系:**
+| PaddleDetection版本 | PaddlePaddle版本 | PaddleSlim版本 | 备注 |
+| :-----------------: | :--------------: | :------------: | :--: |
+| release/2.3 | >= 2.1 | 2.1 | 离线量化依赖Paddle 2.2及PaddleSlim 2.2 |
+| release/2.1、2.2 | >= 2.1.0 | 2.1 | 量化模型导出依赖最新Paddle develop分支,可在[PaddlePaddle每日版本](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev)中下载安装 |
+| release/2.0 | >= 2.0.1 | 2.0 | 量化依赖Paddle 2.1及PaddleSlim 2.1 |
+
+
+#### 安装PaddleSlim
+- 方法一:直接安装:
+```
+pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+- 方法二:编译安装:
+```
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
+python setup.py install
+```
+
+## 快速开始
+
+### 训练
+
+```shell
+python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml}
+```
+
+- `-c`: 指定模型配置文件。
+- `--slim_config`: 指定压缩策略配置文件。
+
+
+### 评估
+
+```shell
+python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
+```
+
+- `-c`: 指定模型配置文件。
+- `--slim_config`: 指定压缩策略配置文件。
+- `-o weights`: 指定压缩算法训好的模型路径。
+
+### 测试
+
+```shell
+python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \
+    -o weights=output/{SLIM_CONFIG}/model_final \
+    --infer_img={IMAGE_PATH}
+```
+
+- `-c`: 指定模型配置文件。
+- `--slim_config`: 指定压缩策略配置文件。
+- `-o weights`: 指定压缩算法训好的模型路径。
+- `--infer_img`: 指定测试图像路径。
+
+
+## 全链条部署
+
+### 动转静导出模型
+
+```shell
+python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
+```
+
+- `-c`: 指定模型配置文件。
+- `--slim_config`: 指定压缩策略配置文件。
+- `-o weights`: 指定压缩算法训好的模型路径。
+
+### 部署预测
+
+- Paddle-Inference预测:
+  - [Python部署](../../deploy/python/README.md)
+  - [C++部署](../../deploy/cpp/README.md)
+  - [TensorRT预测部署教程](../../deploy/TENSOR_RT.md)
+- 服务器端部署:使用[PaddleServing](../../deploy/serving/README.md)部署。
+- 手机移动端部署:使用[Paddle-Lite](../../deploy/lite/README.md)在手机移动端部署。
+
+## Benchmark
+
+### 剪裁
+
+#### Pascal VOC上benchmark
+
+| 模型 | 压缩策略 | GFLOPs | 模型体积(MB) | 输入尺寸 | 预测时延(SD855) | Box AP | 下载 | 模型配置文件 | 压缩算法配置文件 |
+| :---------: | :-------: | :------------: |:-------------: | :------: | :-------------: | :------: | :------: |:-------------: | :------: |
+| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 332.0ms | 75.1 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | - |
+| YOLOv3-MobileNetV1 | 剪裁-l1_norm(sensitivity) | 15.78(-34.49%) | 66(-29%) | 608 | - | 78.4(+3.3) |
[下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_voc_prune_l1_norm.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/yolov3_prune_l1_norm.yml) | + +#### COCO上benchmark +| 模型 | 压缩策略 | GFLOPs | 模型体积(MB) | 输入尺寸 | 预测时延(SD855) | Box AP | 下载 | 模型配置文件 | 压缩算法配置文件 | +| :---------: | :-------: | :------------: |:-------------: | :------: | :-------------: | :------: | :-----------------------------------------------------: |:-------------: | :------: | +| PP-YOLO-MobileNetV3_large | baseline | -- | 18.5 | 608 | 25.1ms | 23.2 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | - | +| PP-YOLO-MobileNetV3_large | 剪裁-FPGM | -37% | 12.6 | 608 | - | 22.3 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_prune_fpgm.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml) | +| YOLOv3-DarkNet53 | baseline | -- | 238.2 | 608 | - | 39.0 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | - | +| YOLOv3-DarkNet53 | 剪裁-FPGM | -24% | - | 608 | - | 37.6 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_prune_fpgm.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/yolov3_darknet_prune_fpgm.yml) | +| PP-YOLO_R50vd | baseline | -- | 183.3 | 608 | - | 44.8 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLO_R50vd | 剪裁-FPGM | -35% | - | 608 | - | 42.1 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_prune_fpgm.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml) | + +说明: +- 目前剪裁除RCNN系列模型外,其余模型均已支持。 +- SD855预测时延为使用PaddleLite部署,使用arm8架构并使用4线程(4 Threads)推理时延。 + +### 量化 + +#### COCO上benchmark + +| 模型 | 压缩策略 | 输入尺寸 | 模型体积(MB) | 预测时延(V100) | 预测时延(SD855) | Box AP | 下载 | Inference模型下载 | 模型配置文件 | 压缩算法配置文件 | +| ------------------ | ------------ | -------- | :---------: | :---------: |:---------: | :---------: | :----------------------------------------------: | :----------------------------------------------: |:------------------------------------------: | :------------------------------------: | +| PP-YOLOE-l | baseline | 640 | - | 11.2ms(trt_fp32) | 7.7ms(trt_fp16) | -- | 50.9 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | - | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | - | +| PP-YOLOE-l | 普通在线量化 | 640 | - | 6.7ms(trt_int8) | -- | 48.8 | 
[下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyoloe_l_coco_qat.pdparams) | - | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyoloe_l_qat.yml) | +| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_365e_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLOv2_R50vd | PACT在线量化 | 640 | -- | 17.3ms | -- | 48.1 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml) | +| PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_dcn_1x_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLO_R50vd | PACT在线量化 | 608 | 67.3 | 13.8ms | -- | 44.3 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyolo_r50vd_qat_pact.yml) | +| PP-YOLO-MobileNetV3_large | baseline | 320 | 18.5 | 2.7ms | 27.9ms | 23.2 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | - | +| PP-YOLO-MobileNetV3_large | 普通在线量化 | 320 | 5.6 | -- | 25.1ms | 24.3 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyolo_mbv3_large_qat.yml) | +| YOLOv3-MobileNetV1 | baseline | 608 | 94.2 | 8.9ms | 332ms | 29.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_270e_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | +| YOLOv3-MobileNetV1 | 普通在线量化 | 608 | 25.4 | 6.6ms | 248ms | 30.5 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.tar) | 
[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/yolov3_mobilenet_v1_qat.yml) | +| YOLOv3-MobileNetV3 | baseline | 608 | 90.3 | 9.4ms | 367.2ms | 31.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_large_270e_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | - | +| YOLOv3-MobileNetV3 | PACT在线量化 | 608 | 24.4 | 8.0ms | 280.0ms | 31.1 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/yolov3_mobilenet_v3_qat.yml) | +| YOLOv3-DarkNet53 | baseline | 608 | 238.2 | 16.0ms | -- | 39.0 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet53_270e_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | - | +| YOLOv3-DarkNet53 | 普通在线量化 | 608 | 78.8 | 12.4ms | -- | 38.8 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/yolov3_darknet_qat.yml) | +| SSD-MobileNet_v1 | baseline | 300 | 22.5 | 4.4ms | 26.6ms | 73.8 | [下载链接](https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_120e_voc.tar) |[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | - | +| SSD-MobileNet_v1 | 普通在线量化 | 300 | 7.1 | -- | 21.5ms | 72.9 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_voc_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_voc_qat.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ssd_mobilenet_v1_qat.yml) | +| Mask-ResNet50-FPN | baseline | (800, 1333) | 174.1 | 359.5ms | -- | 39.2/35.6 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_coco.tar) |[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | - | +| Mask-ResNet50-FPN | 普通在线量化 | (800, 1333) | -- | -- | -- | 39.7(+0.5)/35.9(+0.3) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.tar) | 
[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml) | + +说明: +- 上述V100预测时延非量化模型均是使用TensorRT-FP32测试,量化模型均使用TensorRT-INT8测试,并且都包含NMS耗时。 +- SD855预测时延为使用PaddleLite部署,使用arm8架构并使用4线程(4 Threads)推理时延。 +- 上述PP-YOLOE模型均在V100,开启TensorRT环境中测速,不包含NMS。(导出模型时指定:-o trt=True exclude_nms=True) + +### 离线量化 +需要准备val集,用来对离线量化模型进行校准,运行方式: +```shell +python tools/post_quant.py -c configs/{MODEL.yml} --slim_config configs/slim/post_quant/{SLIM_CONFIG.yml} +``` +例如: +```shell +python3.7 tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config=configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml +``` + +### 蒸馏 + +#### COCO上benchmark + +| 模型 | 压缩策略 | 输入尺寸 | Box AP | 下载 | 模型配置文件 | 压缩算法配置文件 | +| ------------------ | ------------ | -------- | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | +| YOLOv3-MobileNetV1 | 蒸馏 | 608 | 31.0(+1.6) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml) | + +- 具体蒸馏方法请参考[蒸馏策略文档](distill/README.md) + +### 蒸馏剪裁联合策略 + +#### COCO上benchmark + +| 模型 | 压缩策略 | 输入尺寸 | GFLOPs | 模型体积(MB) | 预测时延(SD855) | Box AP | 下载 | 模型配置文件 | 压缩算法配置文件 | +| ------------------ | ------------ | -------- | :---------: |:---------: |:---------: | :---------: |:----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | +| YOLOv3-MobileNetV1 | 蒸馏+剪裁 | 608 | 7.54(-69.4%) | 30.9(-67.2%) | 166.1ms | 28.4(-1.0) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill_prune.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml) | +| YOLOv3-MobileNetV1 | 剪裁+量化 | 608 | - | - | - | - | - | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml) | diff --git a/PaddleDetection-release-2.6/configs/slim/README_en.md b/PaddleDetection-release-2.6/configs/slim/README_en.md new file mode 100644 index 
0000000000000000000000000000000000000000..1ccdc86e0b58ef6031929006fa17ffc6220e561e
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/slim/README_en.md
@@ -0,0 +1,168 @@
+# Model Compression
+
+In PaddleDetection, a complete tutorial and benchmarks for model compression based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) are provided. Currently supported methods:
+
+- [pruning](prune)
+- [quantization](quant)
+- [post-training quantization](post_quant)
+- [distillation](distill)
+- [joint strategies](extensions)
+
+It is recommended to combine pruning with distillation training, or to use pruning, quantization-aware training and post-training quantization, to compress detection models. The following takes YOLOv3 as an example to carry out pruning, distillation and quantization experiments.
+
+## Experimental Environment
+
+- Python 3.7+
+- PaddlePaddle >= 2.1.0
+- PaddleSlim >= 2.1.0
+- CUDA 10.1+
+- cuDNN >= 7.6.5
+
+**Version dependency between PaddleDetection, PaddlePaddle and PaddleSlim:**
+| PaddleDetection Version | PaddlePaddle Version | PaddleSlim Version | Note |
+| :---------------------: | :------------------: | :----------------: | :--: |
+| release/2.3 | >= 2.1 | 2.1 | Post-training quantization requires Paddle 2.2 and PaddleSlim 2.2 |
+| release/2.1, 2.2 | >= 2.1.0 | 2.1 | Quantized model export relies on the latest Paddle develop branch, available from the [PaddlePaddle daily builds](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev) |
+| release/2.0 | >= 2.0.1 | 2.0 | Quantization depends on Paddle 2.1 and PaddleSlim 2.1 |
+
+
+#### Install PaddleSlim
+- Method 1: Install directly:
+```
+pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+- Method 2: Compile and install:
+```
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
+python setup.py install
+```
+
+## Quick Start
+
+### Train
+
+```shell
+python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml}
+```
+
+- `-c`: Specify the model configuration file.
+- `--slim_config`: Specify the compression strategy configuration file.
+
+
+### Evaluation
+
+```shell
+python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
+```
+
+- `-c`: Specify the model configuration file.
+- `--slim_config`: Specify the compression strategy configuration file.
+- `-o weights`: Specify the path of the model trained with the compression algorithm.
+
+### Test
+
+```shell
+python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \
+    -o weights=output/{SLIM_CONFIG}/model_final \
+    --infer_img={IMAGE_PATH}
+```
+
+- `-c`: Specify the model configuration file.
+- `--slim_config`: Specify the compression strategy configuration file.
+- `-o weights`: Specify the path of the model trained with the compression algorithm.
+- `--infer_img`: Specify the test image path.
+
+
+## Full-Pipeline Deployment
+
+### Export the Model (Dynamic-to-Static)
+
+```shell
+python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
+```
+
+- `-c`: Specify the model configuration file.
+- `--slim_config`: Specify the compression strategy configuration file.
+- `-o weights`: Specify the path of the model trained with the compression algorithm.
+
+### Prediction and Deployment
+
+- Paddle-Inference prediction:
+  - [Python Deployment](../../deploy/python/README.md)
+  - [C++ Deployment](../../deploy/cpp/README.md)
+  - [TensorRT Predictive Deployment Tutorial](../../deploy/TENSOR_RT.md)
+- Server-side deployment: use [PaddleServing](../../deploy/serving/README.md).
+- Mobile deployment: use [Paddle-Lite](../../deploy/lite/README.md) to deploy on mobile devices.
+
+## Benchmark
+
+### Pruning
+
+#### Pascal VOC Benchmark
+
+| Model | Compression Strategy | GFLOPs | Model Size (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 332.0ms | 75.1 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | - |
+| YOLOv3-MobileNetV1 | Pruning-l1_norm (sensitivity) | 15.78(-34.49%) | 66(-29%) | 608 | - | 78.4(+3.3) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_voc_prune_l1_norm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/yolov3_prune_l1_norm.yml) |
+
+#### COCO Benchmark
+| Model | Compression Strategy | GFLOPs | Model Size (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| PP-YOLO-MobileNetV3_large | baseline | -- | 18.5 | 608 | 25.1ms | 23.2 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | - |
+| PP-YOLO-MobileNetV3_large | Pruning-FPGM | -37% | 12.6 | 608 | - | 22.3 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_prune_fpgm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml) |
+| YOLOv3-DarkNet53 | baseline | -- | 238.2 | 608 | - | 39.0 | [link](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | - |
+| YOLOv3-DarkNet53 | Pruning-FPGM | -24% | - | 608 | - | 37.6 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_prune_fpgm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/yolov3_darknet_prune_fpgm.yml) |
+| PP-YOLO_R50vd | baseline | -- | 183.3 | 608 | - | 44.8 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - |
+| PP-YOLO_R50vd | Pruning-FPGM | -35% | - | 608 | - | 42.1 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_prune_fpgm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml) |
+
+Description:
+- Pruning is currently supported for all models except the RCNN series.
+- SD855 latency was measured with Paddle Lite deployment on an ARMv8 architecture using 4 threads.
+
+### Quantization
+
+#### COCO Benchmark
+
+| Model | Compression Strategy | Input Size | Model Size (MB) | Prediction Latency (V100) | Prediction Latency (SD855) | Box AP | Download | Download of Inference Model | Model Configuration File | Compression Algorithm Configuration File |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| PP-YOLOE-l | baseline | 640 | - | 11.2ms(trt_fp32) / 7.7ms(trt_fp16) | -- | 50.9 | [link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | - | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | - |
+| PP-YOLOE-l | Common Online quantitative | 640 | - | 6.7ms(trt_int8) | -- | 48.8 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyoloe_l_coco_qat.pdparams) | - | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyoloe_l_qat.yml) |
+| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | [link](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_365e_coco.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - |
+| PP-YOLOv2_R50vd | PACT Online
quantitative | 640 | -- | 17.3ms | -- | 48.1 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml) | +| PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_dcn_1x_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLO_R50vd | PACT Online quantitative | 608 | 67.3 | 13.8ms | -- | 44.3 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyolo_r50vd_qat_pact.yml) | +| PP-YOLO-MobileNetV3_large | baseline | 320 | 18.5 | 2.7ms | 27.9ms | 23.2 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | - | +| PP-YOLO-MobileNetV3_large | Common Online quantitative | 320 | 5.6 | -- | 25.1ms | 24.3 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ppyolo_mbv3_large_qat.yml) | +| YOLOv3-MobileNetV1 | baseline | 608 | 94.2 | 8.9ms | 332ms | 29.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_270e_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | +| YOLOv3-MobileNetV1 | Common Online quantitative | 608 | 25.4 | 6.6ms | 248ms | 30.5 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/yolov3_mobilenet_v1_qat.yml) | +| YOLOv3-MobileNetV3 | baseline | 608 | 90.3 | 9.4ms | 367.2ms | 31.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_large_270e_coco.tar) | [Configuration File 
](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | - | +| YOLOv3-MobileNetV3 | PACT Online quantitative | 608 | 24.4 | 8.0ms | 280.0ms | 31.1 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/yolov3_mobilenet_v3_qat.yml) | +| YOLOv3-DarkNet53 | baseline | 608 | 238.2 | 16.0ms | -- | 39.0 | [link](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet53_270e_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | - | +| YOLOv3-DarkNet53 | Common Online quantitative | 608 | 78.8 | 12.4ms | -- | 38.8 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/yolov3_darknet_qat.yml) | +| SSD-MobileNet_v1 | baseline | 300 | 22.5 | 4.4ms | 26.6ms | 73.8 | [link](https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_120e_voc.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | - | +| SSD-MobileNet_v1 | Common Online quantitative | 300 | 7.1 | -- | 21.5ms | 72.9 | [link](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_voc_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_voc_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/ssd_mobilenet_v1_qat.yml) | +| Mask-ResNet50-FPN | baseline | (800, 1333) | 174.1 | 359.5ms | -- | 39.2/35.6 | [link](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | - | +| Mask-ResNet50-FPN | Common Online quantitative | (800, 1333) | -- | -- | -- | 39.7(+0.5)/35.9(+0.3) | [link](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml) | + +Description: +- The above V100 prediction delay non-quantified model is tested by 
TensorRT-FP32, and the quantized models with TensorRT-INT8; both timings include NMS.
+- SD855 latency was measured with Paddle Lite deployment on an ARMv8 architecture using 4 threads.
+- The PP-YOLOE models above were benchmarked on V100 with TensorRT enabled and without NMS (export with `-o trt=True exclude_nms=True`).
+
+### Distillation
+
+#### COCO Benchmark
+
+| Model | Compression Strategy | Input Size | Box AP | Download | Model Configuration File | Compression Strategy Configuration File |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - |
+| YOLOv3-MobileNetV1 | Distillation | 608 | 31.0(+1.6) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml) |
+
+- For details of each distillation method, please refer to the [Distillation Policy Document](distill/README.md).
+
+### Combined Distillation and Pruning Strategy
+
+#### COCO Benchmark
+
+| Model | Compression Strategy | Input Size | GFLOPs | Model Size (MB) | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - |
+| YOLOv3-MobileNetV1 | Distillation + Pruning | 608 | 7.54(-69.4%) | 30.9(-67.2%) | 166.1ms | 28.4(-1.0) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill_prune.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml) |
diff --git a/PaddleDetection-release-2.6/configs/slim/distill/README.md b/PaddleDetection-release-2.6/configs/slim/distill/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..97c93fcc42d3f7d233e5e8794144bfeb8c1cd5b0
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/slim/distill/README.md
@@ -0,0 +1,212 @@
+# Distillation(蒸馏)
+
+## 内容
+- [YOLOv3模型蒸馏](#YOLOv3模型蒸馏)
+- [FGD模型蒸馏](#FGD模型蒸馏)
+- [CWD模型蒸馏](#CWD模型蒸馏)
+- [LD模型蒸馏](#LD模型蒸馏)
+- [PPYOLOE模型蒸馏](#PPYOLOE模型蒸馏)
+- [引用](#引用)
+
+## YOLOv3模型蒸馏
+
+以YOLOv3-MobileNetV1为例,使用YOLOv3-ResNet34作为蒸馏训练的teacher网络,对YOLOv3-MobileNetV1结构的student网络进行蒸馏。
+COCO数据集作为目标检测任务的训练目标难度更大,teacher网络会预测出更多的背景bbox;如果直接用teacher的预测输出作为student学习的`soft label`,会有严重的类别不均衡问题。解决这个问题需要引入新的方法,详细背景请参考论文:[Object detection at 200 Frames Per Second](https://arxiv.org/abs/1805.06361)。
+为了确定蒸馏的对象,我们首先需要找到student和teacher网络得到的`x,y,w,h,cls,objectness`等Tensor,用teacher得到的结果指导student训练。具体实现可参考[代码](../../../ppdet/slim/distill_loss.py),下表之后也附有一段极简示意代码。
+
+| 模型 | 方案 | 输入尺寸 | epochs | Box mAP | 配置文件 | 下载链接 |
+| :---------------: | :---------: | :----: | :----: |:-----------: | :--------------: | :------------: |
+| YOLOv3-ResNet34 | teacher | 608 | 270e | 36.2 | [config](../../yolov3/yolov3_r34_270e_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) |
+| YOLOv3-MobileNetV1 | student | 608 | 270e | 29.4 | [config](../../yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) |
+| YOLOv3-MobileNetV1 | distill | 608 | 270e | 31.0(+1.6) | [config](../../yolov3/yolov3_mobilenet_v1_270e_coco.yml),[slim_config](./yolov3_mobilenet_v1_coco_distill.yml) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill.pdparams) |
+
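+下面是按上述思路写的一个极简示意(并非本仓库的实际实现,完整实现见上文链接的`distill_loss.py`;张量形状与加权方式均为假设,仅用于说明"用teacher的objectness抑制背景蒸馏"的做法):
+
+```python
+import paddle.nn.functional as F
+
+def objectness_scaled_distill_loss(s_xywh, s_obj, s_cls, t_xywh, t_obj, t_cls):
+    """示意:student/teacher的raw输出形状假设一致,obj为[..., 1]且可广播。"""
+    gate = F.sigmoid(t_obj)                                   # teacher objectness作为前景权重
+    loss_xywh = (gate * (s_xywh - t_xywh) ** 2).mean()        # 坐标蒸馏,背景处几乎不回传梯度
+    loss_cls = (gate * (F.sigmoid(s_cls) - F.sigmoid(t_cls)) ** 2).mean()  # 分类蒸馏
+    loss_obj = ((F.sigmoid(s_obj) - gate) ** 2).mean()        # objectness直接对齐teacher
+    return loss_xywh + loss_cls + loss_obj
+```
+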
+<details>
+<summary>快速开始</summary>
+
+```shell
+# 单卡训练(不推荐)
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --slim_config configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml
+# 多卡训练
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --slim_config configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml
+# 评估
+python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill.pdparams
+# 预测
+python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill.pdparams --infer_img=demo/000000014439_640x640.jpg
+```
+
+- `-c`: 指定模型配置文件,也是student配置文件。
+- `--slim_config`: 指定压缩策略配置文件,也是teacher配置文件。
+
+</details>
+
+## FGD模型蒸馏
+
+FGD全称为[Focal and Global Knowledge Distillation for Detectors](https://arxiv.org/abs/2111.11837v1),是目标检测任务的一种蒸馏方法,分为`Focal`和`Global`两个部分:`Focal`蒸馏分离图像的前景和背景,让学生模型分别关注教师模型前景和背景部分特征的关键像素;`Global`蒸馏重建不同像素之间的关系并将其从教师迁移到学生,以补偿`Focal`蒸馏中丢失的全局信息。实验结果表明,FGD蒸馏算法在基于anchor和anchor free的方法上均能有效提升模型精度。
+在PaddleDetection中,我们实现了FGD算法,并基于RetinaNet算法进行验证,实验结果如下(表格之后附有`Focal`部分思想的示意代码):
+
+| 模型 | 方案 | 输入尺寸 | epochs | Box mAP | 配置文件 | 下载链接 |
+| ----------------- | ----------- | ------ | :----: | :-----------: | :--------------: | :------------: |
+| RetinaNet-ResNet101 | teacher | 1333x800 | 2x | 40.6 | [config](../../retinanet/retinanet_r101_fpn_2x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams) |
+| RetinaNet-ResNet50 | student | 1333x800 | 2x | 39.1 | [config](../../retinanet/retinanet_r50_fpn_2x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco.pdparams) |
+| RetinaNet-ResNet50 | FGD | 1333x800 | 2x | 40.8(+1.7) | [config](../../retinanet/retinanet_r50_fpn_2x_coco.yml),[slim_config](./retinanet_resnet101_coco_distill.yml) | [download](https://paddledet.bj.bcebos.com/models/retinanet_r101_distill_r50_2x_coco.pdparams) |
+
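+下面是`Focal`蒸馏中"前景/背景分开加权"这一思想的极简示意(并非本仓库实现,未包含注意力掩码与`Global`部分;`fg_mask`的构造方式及`alpha`、`beta`取值均为假设):
+
+```python
+import paddle
+
+def focal_feature_loss(feat_s, feat_t, fg_mask, alpha=1e-3, beta=5e-4):
+    """feat_*: [N, C, H, W]特征图;fg_mask: [N, 1, H, W],GT框内为1、背景为0(假设)。"""
+    diff = (feat_s - feat_t) ** 2
+    fg = (diff * fg_mask).sum() / fg_mask.sum().clip(min=1.0)                    # 前景项
+    bg = (diff * (1.0 - fg_mask)).sum() / (1.0 - fg_mask).sum().clip(min=1.0)    # 背景项
+    return alpha * fg + beta * bg     # 前景与背景分别归一化、分别加权
+```
+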
+<details>
+<summary>快速开始</summary>
+
+```shell
+# 单卡训练(不推荐)
+python tools/train.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --slim_config configs/slim/distill/retinanet_resnet101_coco_distill.yml
+# 多卡训练
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --slim_config configs/slim/distill/retinanet_resnet101_coco_distill.yml
+# 评估
+python tools/eval.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/retinanet_r101_distill_r50_2x_coco.pdparams
+# 预测
+python tools/infer.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/retinanet_r101_distill_r50_2x_coco.pdparams --infer_img=demo/000000014439_640x640.jpg
+```
+
+- `-c`: 指定模型配置文件,也是student配置文件。
+- `--slim_config`: 指定压缩策略配置文件,也是teacher配置文件。
+
+</details>
+
+## CWD模型蒸馏
+
+CWD全称为[Channel-wise Knowledge Distillation for Dense Prediction](https://arxiv.org/pdf/2011.13256.pdf),通过最小化教师网络与学生网络的通道概率图之间的Kullback-Leibler (KL)散度,使蒸馏过程更加关注每个通道中最显著的区域,进而提升检测与图像分割等密集预测任务的精度。在PaddleDetection中,我们实现了CWD算法,并基于GFL和RetinaNet模型进行验证,实验结果如下(表格之后附有通道级KL损失的示意代码):
+
+| 模型 | 方案 | 输入尺寸 | epochs | Box mAP | 配置文件 | 下载链接 |
+| ----------------- | ----------- | ------ | :----: | :-----------: | :--------------: | :------------: |
+| RetinaNet-ResNet101 | teacher | 1333x800 | 2x | 40.6 | [config](../../retinanet/retinanet_r101_fpn_2x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams) |
+| RetinaNet-ResNet50 | student | 1333x800 | 2x | 39.1 | [config](../../retinanet/retinanet_r50_fpn_2x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco.pdparams) |
+| RetinaNet-ResNet50 | CWD | 1333x800 | 2x | 40.5(+1.4) | [config](../../retinanet/retinanet_r50_fpn_2x_coco.yml),[slim_config](./retinanet_resnet101_coco_distill_cwd.yml) | [download](https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco_cwd.pdparams) |
+| GFL_ResNet101-vd | teacher | 1333x800 | 2x | 46.8 | [config](../../gfl/gfl_r101vd_fpn_mstrain_2x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/gfl_r101vd_fpn_mstrain_2x_coco.pdparams) |
+| GFL_ResNet50 | student | 1333x800 | 1x | 41.0 | [config](../../gfl/gfl_r50_fpn_1x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_1x_coco.pdparams) |
+| GFL_ResNet50 | CWD | 1333x800 | 2x | 44.0(+3.0) | [config](../../gfl/gfl_r50_fpn_1x_coco.yml),[slim_config](./gfl_r101vd_fpn_coco_distill_cwd.yml) | [download](https://bj.bcebos.com/v1/paddledet/models/gfl_r50_fpn_2x_coco_cwd.pdparams) |
+
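+通道级KL本身的计算很简单,下面给出一个极简示意(并非本仓库实现;`tau`、`weight`的取值与后文`CWDFeatureLoss`配置一致,特征形状为假设):
+
+```python
+import paddle
+import paddle.nn.functional as F
+
+def cwd_loss(feat_s, feat_t, tau=1.0, weight=5.0):
+    """对每个通道在H*W个空间位置上做softmax得到通道概率图,再计算KL(teacher||student)。"""
+    n, c, h, w = feat_s.shape
+    log_p_s = F.log_softmax(feat_s.reshape([n * c, h * w]) / tau, axis=-1)
+    p_t = F.softmax(feat_t.reshape([n * c, h * w]) / tau, axis=-1)
+    kl = (p_t * (paddle.log(p_t + 1e-8) - log_p_s)).sum(axis=-1)   # 每个通道一个KL值
+    return weight * (tau ** 2) * kl.mean()                          # 温度平方补偿梯度量级
+```
+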
+<details>
+<summary>快速开始</summary>
+
+```shell
+# 单卡训练(不推荐)
+python tools/train.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --slim_config configs/slim/distill/retinanet_resnet101_coco_distill_cwd.yml
+# 多卡训练
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --slim_config configs/slim/distill/retinanet_resnet101_coco_distill_cwd.yml
+# 评估
+python tools/eval.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco_cwd.pdparams
+# 预测
+python tools/infer.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_2x_coco_cwd.pdparams --infer_img=demo/000000014439_640x640.jpg
+
+# 单卡训练(不推荐)
+python tools/train.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml --slim_config configs/slim/distill/gfl_r101vd_fpn_coco_distill_cwd.yml
+# 多卡训练
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml --slim_config configs/slim/distill/gfl_r101vd_fpn_coco_distill_cwd.yml
+# 评估
+python tools/eval.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_2x_coco_cwd.pdparams
+# 预测
+python tools/infer.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_2x_coco_cwd.pdparams --infer_img=demo/000000014439_640x640.jpg
+```
+
+- `-c`: 指定模型配置文件,也是student配置文件。
+- `--slim_config`: 指定压缩策略配置文件,也是teacher配置文件。
+
+</details>
+
+## LD模型蒸馏
+
+LD全称为[Localization Distillation for Dense Object Detection](https://arxiv.org/abs/2102.12252),将回归框表示为概率分布,把分类任务的KD用在定位任务上,并且使用因地制宜、分而治之的策略,在不同的区域分别学习分类知识与定位知识。在PaddleDetection中,我们实现了LD算法,并基于GFL模型进行验证,实验结果如下(表格之后附有定位分布KD的示意代码):
+
+| 模型 | 方案 | 输入尺寸 | epochs | Box mAP | 配置文件 | 下载链接 |
+| ----------------- | ----------- | ------ | :----: | :-----------: | :--------------: | :------------: |
+| GFL_ResNet101-vd | teacher | 1333x800 | 2x | 46.8 | [config](../../gfl/gfl_r101vd_fpn_mstrain_2x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/gfl_r101vd_fpn_mstrain_2x_coco.pdparams) |
+| GFL_ResNet18-vd | student | 1333x800 | 1x | 36.6 | [config](../../gfl/gfl_r18vd_1x_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/gfl_r18vd_1x_coco.pdparams) |
+| GFL_ResNet18-vd | LD | 1333x800 | 1x | 38.2(+1.6) | [config](../../gfl/gfl_slim_ld_r18vd_1x_coco.yml),[slim_config](./gfl_ld_distill.yml) | [download](https://bj.bcebos.com/v1/paddledet/models/gfl_slim_ld_r18vd_1x_coco.pdparams) |
+
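+LD的核心是把分类KD中"软标签+温度"的做法用在DFL的离散定位分布上,极简示意如下(并非本仓库实现;输入假设为已筛选出的正样本边界分布logits,`tau`为假设值):
+
+```python
+import paddle
+import paddle.nn.functional as F
+
+def ld_loss(corners_s, corners_t, tau=10.0):
+    """corners_*: [M, n_bins],每条边界(l/r/t/b)离散化后的分布logits(假设)。"""
+    log_p_s = F.log_softmax(corners_s / tau, axis=-1)
+    p_t = F.softmax(corners_t / tau, axis=-1)
+    kl = (p_t * (paddle.log(p_t + 1e-8) - log_p_s)).sum(axis=-1)   # 每条边界一个KL值
+    return (tau ** 2) * kl.mean()
+```
+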
+<details>
+<summary>快速开始</summary>
+
+```shell
+# 单卡训练(不推荐)
+python tools/train.py -c configs/gfl/gfl_slim_ld_r18vd_1x_coco.yml --slim_config configs/slim/distill/gfl_ld_distill.yml
+# 多卡训练
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/gfl/gfl_slim_ld_r18vd_1x_coco.yml --slim_config configs/slim/distill/gfl_ld_distill.yml
+# 评估
+python tools/eval.py -c configs/gfl/gfl_slim_ld_r18vd_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/gfl_slim_ld_r18vd_1x_coco.pdparams
+# 预测
+python tools/infer.py -c configs/gfl/gfl_slim_ld_r18vd_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/gfl_slim_ld_r18vd_1x_coco.pdparams --infer_img=demo/000000014439_640x640.jpg
+```
+
+- `-c`: 指定模型配置文件,也是student配置文件。
+- `--slim_config`: 指定压缩策略配置文件,也是teacher配置文件。
+
+</details>
+
+## PPYOLOE模型蒸馏
+
+PaddleDetection提供了对PPYOLOE+进行模型蒸馏的方案,结合了logits蒸馏和feature蒸馏(表格之后附有这一组合损失的示意代码)。
+
+| 模型 | 方案 | 输入尺寸 | epochs | Box mAP | 配置文件 | 下载链接 |
+| ----------------- | ----------- | ------ | :----: | :-----------: | :--------------: | :------------: |
+| PP-YOLOE+_x | teacher | 640 | 80e | 54.7 | [config](../../ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_x_80e_coco.pdparams) |
+| PP-YOLOE+_l | student | 640 | 80e | 52.9 | [config](../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco.pdparams) |
+| PP-YOLOE+_l | distill | 640 | 80e | **54.0(+1.1)** | [config](../../ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml),[slim_config](./ppyoloe_plus_distill_x_distill_l.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco_distill.pdparams) |
+| PP-YOLOE+_l | teacher | 640 | 80e | 52.9 | [config](../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco.pdparams) |
+| PP-YOLOE+_m | student | 640 | 80e | 49.8 | [config](../../ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_m_80e_coco.pdparams) |
+| PP-YOLOE+_m | distill | 640 | 80e | **51.0(+1.2)** | [config](../../ppyoloe/distill/ppyoloe_plus_crn_m_80e_coco_distill.yml),[slim_config](./ppyoloe_plus_distill_l_distill_m.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_m_80e_coco_distill.pdparams) |
+
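+"logits蒸馏 + feature蒸馏"组合的极简示意如下(并非本仓库实现,实际损失项与权重以配置文件为准;此处以分类logits的软标签BCE加特征MSE来说明,两个权重均为假设值):
+
+```python
+import paddle.nn.functional as F
+
+def logits_plus_feature_distill(logits_s, logits_t, feat_s, feat_t,
+                                logit_w=1.0, feat_w=1.0):
+    """logits_*: 分类分支输出;feat_*: 对齐后的中间特征(形状假设一致)。"""
+    soft_label = F.sigmoid(logits_t)                               # teacher输出作为软标签
+    loss_logit = F.binary_cross_entropy_with_logits(logits_s, soft_label)
+    loss_feat = F.mse_loss(feat_s, feat_t)                         # 特征蒸馏
+    return logit_w * loss_logit + feat_w * loss_feat
+```
+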
+<summary> Quick Start </summary>
+
+```shell
+# Single-GPU training (not recommended)
+python tools/train.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml --slim_config configs/slim/distill/ppyoloe_plus_distill_x_distill_l.yml
+# Multi-GPU training
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml --slim_config configs/slim/distill/ppyoloe_plus_distill_x_distill_l.yml
+# Evaluation
+python tools/eval.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco_distill.pdparams
+# Inference
+python tools/infer.py -c configs/ppyoloe/distill/ppyoloe_plus_crn_l_80e_coco_distill.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco_distill.pdparams --infer_img=demo/000000014439_640x640.jpg
+```
+
+- `-c`: specifies the model config file, which also serves as the student config.
+- `--slim_config`: specifies the compression-strategy config file, which also serves as the teacher config.
+
+</details>
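+
+Conceptually, the total training objective adds the weighted logits- and feature-distillation terms to the student's ordinary detection loss; the default weights below mirror `loss_weight: {'logits': 4.0, 'feat': 1.0}` from the `DistillPPYOLOELoss` configs in this directory. A hypothetical sketch, not the in-repo implementation:
+
+```python
+def total_distill_loss(det_loss, logits_distill_loss, feat_distill_loss,
+                       logits_w=4.0, feat_w=1.0):
+    """Student detection loss plus weighted logits/feature distillation terms."""
+    return det_loss + logits_w * logits_distill_loss + feat_w * feat_distill_loss
+```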
+
+
+## Citations
+```
+@article{mehta2018object,
+  title={Object detection at 200 Frames Per Second},
+  author={Rakesh Mehta and Cemalettin Ozturk},
+  year={2018},
+  eprint={1805.06361},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
+}
+
+@inproceedings{yang2022focal,
+  title={Focal and global knowledge distillation for detectors},
+  author={Yang, Zhendong and Li, Zhe and Jiang, Xiaohu and Gong, Yuan and Yuan, Zehuan and Zhao, Danpei and Yuan, Chun},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={4643--4652},
+  year={2022}
+}
+
+@inproceedings{zheng2022LD,
+  title={Localization Distillation for Dense Object Detection},
+  author={Zheng, Zhaohui and Ye, Rongguang and Wang, Ping and Ren, Dongwei and Zuo, Wangmeng and Hou, Qibin and Cheng, Mingming},
+  booktitle={CVPR},
+  year={2022}
+}
+
+@inproceedings{shu2021channel,
+  title={Channel-wise knowledge distillation for dense prediction},
+  author={Shu, Changyong and Liu, Yifan and Gao, Jianfei and Yan, Zheng and Shen, Chunhua},
+  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+  pages={5311--5320},
+  year={2021}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/slim/distill/gfl_ld_distill.yml b/PaddleDetection-release-2.6/configs/slim/distill/gfl_ld_distill.yml
new file mode 100644
index 0000000000000000000000000000000000000000..2601e99f319e089d34caf912495c87a8fe0fd98c
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/slim/distill/gfl_ld_distill.yml
@@ -0,0 +1,25 @@
+_BASE_: [
+  '../../gfl/gfl_r18vd_1x_coco.yml',
+]
+
+# teacher pretrain model
+pretrain_weights: https://paddledet.bj.bcebos.com/models/gfl_r101vd_fpn_mstrain_2x_coco.pdparams
+
+slim: Distill
+slim_method: LD
+
+ResNet:
+  depth: 101
+  variant: d
+  norm_type: bn
+  freeze_at: 0
+  return_idx: [1,2,3]
+  num_stages: 4
+
+TrainReader:
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2GFLTarget:
+      downsample_ratios: [8, 16, 32, 64, 128]
+      grid_cell_scale: 8
+      compute_vlr_region: True
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/configs/slim/distill/gfl_r101vd_fpn_coco_distill_cwd.yml b/PaddleDetection-release-2.6/configs/slim/distill/gfl_r101vd_fpn_coco_distill_cwd.yml
new file mode 100644
index 0000000000000000000000000000000000000000..3af5ac17f2c1a4d4f4445944435879ed486301d0
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/slim/distill/gfl_r101vd_fpn_coco_distill_cwd.yml
@@ -0,0 +1,16 @@
+_BASE_: [
+  '../../gfl/gfl_r101vd_fpn_mstrain_2x_coco.yml',
+]
+
+pretrain_weights: https://paddledet.bj.bcebos.com/models/gfl_r101vd_fpn_mstrain_2x_coco.pdparams
+
+slim: Distill
+slim_method: CWD
+distill_loss: CWDFeatureLoss
+distill_loss_name: ['cls_f_4', 'cls_f_3', 'cls_f_2', 'cls_f_1', 'cls_f_0']
+
+CWDFeatureLoss:
+  student_channels: 80
+  teacher_channels: 80
+  tau: 1.0
+  weight: 5.0
diff --git a/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_l_distill_m.yml b/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_l_distill_m.yml
new file mode 100644
index 0000000000000000000000000000000000000000..0a5bfcd29cc352bb67f48a33a28f4531728543e5
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_l_distill_m.yml
@@ -0,0 +1,53 @@
+# teacher and slim config
+_BASE_: [
+  '../../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml',
+]
+depth_mult: 1.0
+width_mult: 1.0
+for_distill: True
+architecture: PPYOLOE
+PPYOLOE:
+  backbone: CSPResNet
+  neck: CustomCSPPAN
+  yolo_head: PPYOLOEHead
+ 
post_process: ~ + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams +find_unused_parameters: True + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: True + drop_last: True + use_shared_memory: True + collate_batch: True + + +slim: Distill +slim_method: PPYOLOEDistill +distill_loss: DistillPPYOLOELoss + +DistillPPYOLOELoss: # L -> M + loss_weight: {'logits': 4.0, 'feat': 1.0} + logits_distill: True + logits_loss_weight: {'class': 1.0, 'iou': 2.5, 'dfl': 0.5} + logits_ld_distill: True + logits_ld_params: {'weight': 20000, 'T': 10} + feat_distill: True + feat_distiller: 'fgd' # ['cwd', 'fgd', 'pkd', 'mgd', 'mimic'] + feat_distill_place: 'neck_feats' + teacher_width_mult: 1.0 # L + student_width_mult: 0.75 # M + feat_out_channels: [768, 384, 192] # The actual channel will multiply width_mult diff --git a/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_m_distill_s.yml b/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_m_distill_s.yml new file mode 100644 index 0000000000000000000000000000000000000000..8ee944e9b91394b7bcfbf89a9610c302d803c9bb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_m_distill_s.yml @@ -0,0 +1,53 @@ +# teacher and slim config +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml', +] +depth_mult: 0.67 +width_mult: 0.75 +for_distill: True +architecture: PPYOLOE +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams +find_unused_parameters: True + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: True + drop_last: True + use_shared_memory: True + collate_batch: True + + +slim: Distill +slim_method: PPYOLOEDistill +distill_loss: DistillPPYOLOELoss + +DistillPPYOLOELoss: # M -> S + loss_weight: {'logits': 4.0, 'feat': 1.0} + logits_distill: True + logits_loss_weight: {'class': 1.0, 'iou': 2.5, 'dfl': 0.5} + logits_ld_distill: True + logits_ld_params: {'weight': 20000, 'T': 10} + feat_distill: True + feat_distiller: 'fgd' # ['cwd', 'fgd', 'pkd', 'mgd', 'mimic'] + feat_distill_place: 'neck_feats' + teacher_width_mult: 0.75 # M + student_width_mult: 0.5 # S + feat_out_channels: [768, 384, 192] # The actual channel will multiply width_mult diff --git a/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_x_distill_l.yml b/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_x_distill_l.yml new file mode 100644 index 0000000000000000000000000000000000000000..55d3c4c9f08ae373e764062c3820f3823710e1d7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/distill/ppyoloe_plus_distill_x_distill_l.yml @@ -0,0 
+1,53 @@ +# teacher and slim config +_BASE_: [ + '../../ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml', +] +depth_mult: 1.33 +width_mult: 1.25 +for_distill: True +architecture: PPYOLOE +PPYOLOE: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams +find_unused_parameters: True + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: True + drop_last: True + use_shared_memory: True + collate_batch: True + + +slim: Distill +slim_method: PPYOLOEDistill +distill_loss: DistillPPYOLOELoss + +DistillPPYOLOELoss: # X -> L + loss_weight: {'logits': 4.0, 'feat': 1.0} + logits_distill: True + logits_loss_weight: {'class': 1.0, 'iou': 2.5, 'dfl': 0.5} + logits_ld_distill: True + logits_ld_params: {'weight': 20000, 'T': 10} + feat_distill: True + feat_distiller: 'fgd' # ['cwd', 'fgd', 'pkd', 'mgd', 'mimic'] + feat_distill_place: 'neck_feats' + teacher_width_mult: 1.25 # X + student_width_mult: 1.0 # L + feat_out_channels: [768, 384, 192] # The actual channel will multiply width_mult diff --git a/PaddleDetection-release-2.6/configs/slim/distill/retinanet_resnet101_coco_distill.yml b/PaddleDetection-release-2.6/configs/slim/distill/retinanet_resnet101_coco_distill.yml new file mode 100644 index 0000000000000000000000000000000000000000..d4793c02063d8159e5277e705a58cc0b423d94ea --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/distill/retinanet_resnet101_coco_distill.yml @@ -0,0 +1,19 @@ +_BASE_: [ + '../../retinanet/retinanet_r101_fpn_2x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams + +slim: Distill +slim_method: FGD +distill_loss: FGDFeatureLoss +distill_loss_name: ['neck_f_4', 'neck_f_3', 'neck_f_2', 'neck_f_1', 'neck_f_0'] + +FGDFeatureLoss: + student_channels: 256 + teacher_channels: 256 + temp: 0.5 + alpha_fgd: 0.001 + beta_fgd: 0.0005 + gamma_fgd: 0.0005 + lambda_fgd: 0.000005 diff --git a/PaddleDetection-release-2.6/configs/slim/distill/retinanet_resnet101_coco_distill_cwd.yml b/PaddleDetection-release-2.6/configs/slim/distill/retinanet_resnet101_coco_distill_cwd.yml new file mode 100644 index 0000000000000000000000000000000000000000..7087b85d040cc32ca366701663b29416c7547d01 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/distill/retinanet_resnet101_coco_distill_cwd.yml @@ -0,0 +1,17 @@ +_BASE_: [ + '../../retinanet/retinanet_r101_fpn_2x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams + + +slim: Distill +slim_method: CWD +distill_loss: CWDFeatureLoss +distill_loss_name: ['cls_f_4', 'cls_f_3', 'cls_f_2', 'cls_f_1', 'cls_f_0'] + +CWDFeatureLoss: + student_channels: 80 + teacher_channels: 80 + tau: 1.0 + weight: 5.0 diff --git a/PaddleDetection-release-2.6/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml b/PaddleDetection-release-2.6/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml new file mode 100644 index 0000000000000000000000000000000000000000..9998dec5620adac38fd8a487f7ad1ec6aeb055dd --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml @@ -0,0 +1,12 @@ +_BASE_: [ + '../../yolov3/yolov3_r34_270e_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams + + +slim: Distill +distill_loss: DistillYOLOv3Loss + +DistillYOLOv3Loss: + weight: 1000 diff --git a/PaddleDetection-release-2.6/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml b/PaddleDetection-release-2.6/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml new file mode 100644 index 0000000000000000000000000000000000000000..f86fac5e9ed0f291c5b3f9b6266ac5755807422c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml @@ -0,0 +1,24 @@ +_BASE_: [ + '../../yolov3/yolov3_r34_270e_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams + +slim: DistillPrune + +distill_loss: DistillYOLOv3Loss + +DistillYOLOv3Loss: + weight: 1000 + +pruner: Pruner + +Pruner: + criterion: l1_norm + pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0', + 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0', + 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0', + 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0', + 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0', + 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0'] + pruned_ratios: [0.5,0.5,0.5,0.5,0.5,0.5,0.7,0.7,0.7,0.7,0.7,0.7,0.8,0.8,0.8,0.8,0.8,0.8] diff --git a/PaddleDetection-release-2.6/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml b/PaddleDetection-release-2.6/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..ff17ea0b4126d934b851df60cda2db2e17fbbae2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml @@ -0,0 +1,19 @@ +# Weights of yolov3_mobilenet_v1_voc +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams +slim: PrunerQAT + +PrunerQAT: + criterion: fpgm + pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0', + 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0', + 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0', + 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0', + 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0', + 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0'] + pruned_ratios: [0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.3,0.3,0.3,0.2,0.1,0.3,0.4,0.4,0.4,0.4,0.3] + print_prune_params: False + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_qat_model: True diff --git a/PaddleDetection-release-2.6/configs/slim/ofa/ofa_picodet_demo.yml b/PaddleDetection-release-2.6/configs/slim/ofa/ofa_picodet_demo.yml new file mode 100644 index 0000000000000000000000000000000000000000..a5ade9e3168bb0e6ecb68fabc47d98d789d4ae7d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/ofa/ofa_picodet_demo.yml @@ -0,0 +1,85 @@ +weights: https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x1_0_pretrained.pdparams +slim: OFA +OFA: + ofa_config: + task: expand_ratio + expand_ratio: [0.5, 1] + + skip_neck: True + skip_head: True + + RunConfig: + # Skip the output layer of each block by layer name + skip_layers: 
['backbone._conv1._conv','backbone.2_1._conv_linear_1._conv', + 'backbone.2_1._conv_linear_2._conv', 'backbone.2_1._conv_dw_mv1._conv', + 'backbone.2_1._conv_pw_mv1._conv', 'backbone.2_2._conv_linear._conv', + 'backbone.2_3._conv_linear._conv', 'backbone.3_1._conv_linear_1._conv', + 'backbone.3_1._conv_linear_2._conv', 'backbone.3_1._conv_dw_mv1._conv', + 'backbone.3_1._conv_pw_mv1._conv', 'backbone.3_2._conv_linear._conv', + 'backbone.3_3._conv_linear._conv', 'backbone.3_4._conv_linear._conv', + 'backbone.3_5._conv_linear._conv', 'backbone.3_6._conv_linear._conv', + 'backbone.3_7._conv_linear._conv', 'backbone.4_1._conv_linear_1._conv', + 'backbone.4_1._conv_linear_2._conv', 'backbone.4_1._conv_dw_mv1._conv', + 'backbone.4_1._conv_pw_mv1._conv', 'backbone.4_2._conv_linear._conv', + 'backbone.4_3._conv_linear._conv'] + + # For block-wise search, make layers in each block in the same search space + same_search_space: [ + ['backbone.2_1._conv_dw_1._conv', 'backbone.2_1._conv_pw_2._conv', + 'backbone.2_1._conv_dw_2._conv', 'backbone.2_1._se.conv1', 'backbone.2_1._se.conv2'], + ['backbone.2_2._conv_pw._conv', 'backbone.2_2._conv_dw._conv', + 'backbone.2_2._se.conv1', 'backbone.2_2._se.conv2'], + ['backbone.2_3._conv_pw._conv', 'backbone.2_3._conv_dw._conv', + 'backbone.2_3._se.conv1', 'backbone.2_3._se.conv2'], + ['backbone.3_1._conv_dw_1._conv', 'backbone.3_1._conv_pw_2._conv', + 'backbone.3_1._conv_dw_2._conv', 'backbone.3_1._se.conv1', 'backbone.3_1._se.conv2'], + ['backbone.3_2._conv_pw._conv', 'backbone.3_2._conv_dw._conv', + 'backbone.3_2._se.conv1', 'backbone.3_2._se.conv2'], + ['backbone.3_3._conv_pw._conv', 'backbone.3_3._conv_dw._conv', + 'backbone.3_3._se.conv1', 'backbone.3_3._se.conv2'], + ['backbone.3_4._conv_pw._conv', 'backbone.3_4._conv_dw._conv', + 'backbone.3_4._se.conv1', 'backbone.3_4._se.conv2'], + ['backbone.3_5._conv_pw._conv', 'backbone.3_5._conv_dw._conv', + 'backbone.3_5._se.conv1', 'backbone.3_5._se.conv2'], + ['backbone.3_6._conv_pw._conv', 'backbone.3_6._conv_dw._conv', + 'backbone.3_6._se.conv1', 'backbone.3_6._se.conv2'], + ['backbone.3_7._conv_pw._conv', 'backbone.3_7._conv_dw._conv', + 'backbone.3_7._se.conv1', 'backbone.3_7._se.conv2'], + ['backbone.4_1._conv_dw_1._conv', 'backbone.4_1._conv_pw_2._conv', + 'backbone.4_1._conv_dw_2._conv', 'backbone.4_1._se.conv1', 'backbone.4_1._se.conv2'], + ['backbone.4_2._conv_pw._conv', 'backbone.4_2._conv_dw._conv', + 'backbone.4_2._se.conv1', 'backbone.4_2._se.conv2'], + ['backbone.4_3._conv_pw._conv', 'backbone.4_3._conv_dw._conv', + 'backbone.4_3._se.conv1', 'backbone.4_3._se.conv2']] + + # demo expand ratio + # Generally, for expand ratio, float in (0, 1] is available. + # But please be careful if the model is complicated. + # For picodet, there are many split and concat, the choice of channel number is important. 
+ ofa_layers: + 'backbone.2_1._conv_dw_1._conv': + 'expand_ratio': [0.5, 1] + 'backbone.2_2._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.2_3._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.3_1._conv_dw_1._conv': + 'expand_ratio': [0.5, 1] + 'backbone.3_2._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.3_3._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.3_4._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.3_5._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.3_6._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.3_7._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.4_1._conv_dw_1._conv': + 'expand_ratio': [0.5, 1] + 'backbone.4_2._conv_pw._conv': + 'expand_ratio': [0.5, 1] + 'backbone.4_3._conv_pw._conv': + 'expand_ratio': [0.5, 1] diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..d715aedffe2dd5e15bdb222a74aa35bc273d2240 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/mcfairmot_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/mcfairmot_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..7ab8e38b9715aa10e5d38a84fa15a033c9ee919f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/mcfairmot_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/picodet_s_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/picodet_s_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..e1cf3ca6ab23accabf91b0d7294c0ab48accf693 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/picodet_s_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..712651fa58f6eca6907d4530caac2c0a2dde3551 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml new file mode 
100644 index 0000000000000000000000000000000000000000..e829d271598b3cf4243bbd724a7955c6544253e9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..dfa793d528a63255fce62c6d1c94a594fee58853 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/tinypose_128x96_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/tinypose_128x96_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..a3bd64761fac679d83bbbdb4011ea3ab327ad3f9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/tinypose_128x96_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/post_quant/yolov3_darknet53_ptq.yml b/PaddleDetection-release-2.6/configs/slim/post_quant/yolov3_darknet53_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..7715b082a171b42dc8efe624be54eac74a003e68 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/post_quant/yolov3_darknet53_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/PaddleDetection-release-2.6/configs/slim/prune/faster_rcnn_r50_fpn_prune_fpgm.yml b/PaddleDetection-release-2.6/configs/slim/prune/faster_rcnn_r50_fpn_prune_fpgm.yml new file mode 100644 index 0000000000000000000000000000000000000000..e86c17f04c5f7510ba95c1b09e51832dc49224bb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/faster_rcnn_r50_fpn_prune_fpgm.yml @@ -0,0 +1,16 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams +slim: Pruner + +Pruner: + criterion: fpgm + pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0', + 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0', + 'conv2d_33.w_0', 'conv2d_34.w_0', 'conv2d_35.w_0', + 'conv2d_36.w_0', 'conv2d_37.w_0', 'conv2d_38.w_0', + 'conv2d_39.w_0', 'conv2d_40.w_0', 'conv2d_41.w_0', + 'conv2d_42.w_0', 'conv2d_43.w_0', 'conv2d_44.w_0', + 'conv2d_45.w_0', 'conv2d_46.w_0', 'conv2d_47.w_0', + 'conv2d_48.w_0', 'conv2d_49.w_0', 'conv2d_50.w_0', + 'conv2d_51.w_0', 'conv2d_52.w_0'] + pruned_ratios: [0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.3,0.3,0.3,0.2,0.1,0.3,0.4,0.4,0.4,0.4,0.3,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4] + print_params: False 
diff --git a/PaddleDetection-release-2.6/configs/slim/prune/picodet_m_unstructured_prune_75.yml b/PaddleDetection-release-2.6/configs/slim/prune/picodet_m_unstructured_prune_75.yml new file mode 100644 index 0000000000000000000000000000000000000000..94345b4e8839d347d0a9ae3eae0337af41f8add3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/picodet_m_unstructured_prune_75.yml @@ -0,0 +1,11 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_m_320_coco.pdparams +slim: UnstructuredPruner + +UnstructuredPruner: + stable_epochs: 0 + pruning_epochs: 150 + tunning_epochs: 150 + pruning_steps: 300 + ratio: 0.75 + initial_ratio: 0.15 + prune_params_type: conv1x1_only diff --git a/PaddleDetection-release-2.6/configs/slim/prune/picodet_m_unstructured_prune_85.yml b/PaddleDetection-release-2.6/configs/slim/prune/picodet_m_unstructured_prune_85.yml new file mode 100644 index 0000000000000000000000000000000000000000..db0af7e1087a63cf9891e9ab142f2c331e1443e0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/picodet_m_unstructured_prune_85.yml @@ -0,0 +1,11 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_m_320_coco.pdparams +slim: UnstructuredPruner + +UnstructuredPruner: + stable_epochs: 0 + pruning_epochs: 150 + tunning_epochs: 150 + pruning_steps: 300 + ratio: 0.85 + initial_ratio: 0.20 + prune_params_type: conv1x1_only diff --git a/PaddleDetection-release-2.6/configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml b/PaddleDetection-release-2.6/configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml new file mode 100644 index 0000000000000000000000000000000000000000..b9cecb3163f978b33123499f241ceb88fd05a688 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml @@ -0,0 +1,9 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams +slim: Pruner + +Pruner: + criterion: fpgm + pruned_params: ['conv2d_62.w_0', 'conv2d_63.w_0', 'conv2d_64.w_0', + 'conv2d_65.w_0', 'conv2d_66.w_0', 'conv2d_67.w_0'] + pruned_ratios: [0.75, 0.75, 0.75, 0.75, 0.75, 0.75] + print_params: True diff --git a/PaddleDetection-release-2.6/configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml b/PaddleDetection-release-2.6/configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml new file mode 100644 index 0000000000000000000000000000000000000000..00dc57ae9759bb32f774a1852b629cdcac2c1b4a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml @@ -0,0 +1,13 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +slim: Pruner + +Pruner: + criterion: fpgm + pruned_params: ['conv2d_56.w_0', 'conv2d_57.w_0', 'conv2d_58.w_0', + 'conv2d_59.w_0', 'conv2d_60.w_0', 'conv2d_61.w_0', + 'conv2d_63.w_0', 'conv2d_64.w_0', 'conv2d_65.w_0', + 'conv2d_66.w_0', 'conv2d_67.w_0', 'conv2d_68.w_0', + 'conv2d_70.w_0', 'conv2d_71.w_0', 'conv2d_72.w_0', + 'conv2d_73.w_0', 'conv2d_74.w_0', 'conv2d_75.w_0'] + pruned_ratios: [0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.875,0.875,0.875,0.875,0.875,0.875] + print_params: False diff --git a/PaddleDetection-release-2.6/configs/slim/prune/yolov3_darknet_prune_fpgm.yml b/PaddleDetection-release-2.6/configs/slim/prune/yolov3_darknet_prune_fpgm.yml new file mode 100644 index 0000000000000000000000000000000000000000..850fefb956431cbb15fc20f58fd868171722ff3c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/yolov3_darknet_prune_fpgm.yml @@ -0,0 +1,13 @@ +pretrain_weights: 
https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +slim: Pruner + +Pruner: + criterion: fpgm + pruned_params: ['conv2d_52.w_0', 'conv2d_53.w_0', 'conv2d_54.w_0', + 'conv2d_55.w_0', 'conv2d_56.w_0', 'conv2d_57.w_0', + 'conv2d_59.w_0', 'conv2d_60.w_0', 'conv2d_61.w_0', + 'conv2d_62.w_0', 'conv2d_63.w_0', 'conv2d_64.w_0', + 'conv2d_66.w_0', 'conv2d_67.w_0', 'conv2d_68.w_0', + 'conv2d_69.w_0', 'conv2d_70.w_0', 'conv2d_71.w_0'] + pruned_ratios: [0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.875,0.875,0.875,0.875,0.875,0.875] + print_params: True diff --git a/PaddleDetection-release-2.6/configs/slim/prune/yolov3_prune_fpgm.yml b/PaddleDetection-release-2.6/configs/slim/prune/yolov3_prune_fpgm.yml new file mode 100644 index 0000000000000000000000000000000000000000..f3745386823a45a970d077d3201baffa3665490b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/yolov3_prune_fpgm.yml @@ -0,0 +1,14 @@ +# Weights of yolov3_mobilenet_v1_voc +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams +slim: Pruner + +Pruner: + criterion: fpgm + pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0', + 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0', + 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0', + 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0', + 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0', + 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0'] + pruned_ratios: [0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.3,0.3,0.3,0.2,0.1,0.3,0.4,0.4,0.4,0.4,0.3] + print_params: False diff --git a/PaddleDetection-release-2.6/configs/slim/prune/yolov3_prune_l1_norm.yml b/PaddleDetection-release-2.6/configs/slim/prune/yolov3_prune_l1_norm.yml new file mode 100644 index 0000000000000000000000000000000000000000..5b4f4667f2285cd73907df12aa1bd0f446a0f5c0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/prune/yolov3_prune_l1_norm.yml @@ -0,0 +1,14 @@ +# Weights of yolov3_mobilenet_v1_voc +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams +slim: Pruner + +Pruner: + criterion: l1_norm + pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0', + 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0', + 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0', + 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0', + 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0', + 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0'] + pruned_ratios: [0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.3,0.3,0.3,0.2,0.1,0.3,0.4,0.4,0.4,0.4,0.3] + print_params: False diff --git a/PaddleDetection-release-2.6/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..7363b4e55245024d5534a805be66301ca8b720fb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml @@ -0,0 +1,22 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + + +epoch: 5 + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [3, 4] + - !LinearWarmup + start_factor: 0.001 + steps: 100 diff 
--git a/PaddleDetection-release-2.6/configs/slim/quant/picodet_s_416_lcnet_quant.yml b/PaddleDetection-release-2.6/configs/slim/quant/picodet_s_416_lcnet_quant.yml new file mode 100644 index 0000000000000000000000000000000000000000..000807ab6b138ca8f28440f97b44809e75a9ac3d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/picodet_s_416_lcnet_quant.yml @@ -0,0 +1,22 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: False + +TrainReader: + batch_size: 48 + +LearningRate: + base_lr: 0.024 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git a/PaddleDetection-release-2.6/configs/slim/quant/picodet_s_quant.yml b/PaddleDetection-release-2.6/configs/slim/quant/picodet_s_quant.yml new file mode 100644 index 0000000000000000000000000000000000000000..099532ffc5c3791644ceda25db8c1f4581762d61 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/picodet_s_quant.yml @@ -0,0 +1,26 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: False + +epoch: 50 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 30 + - 40 + - !LinearWarmup + start_factor: 0. 
+ steps: 100 + +TrainReader: + batch_size: 96 diff --git a/PaddleDetection-release-2.6/configs/slim/quant/ppyolo_mbv3_large_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/ppyolo_mbv3_large_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..2b2ddf90e7b36d60fbbd75f1b7beb6e7ffac9685 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/ppyolo_mbv3_large_qat.yml @@ -0,0 +1,16 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.99, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + +PPYOLOFPN: + in_channels: [160, 368] + coord_conv: true + conv_block_num: 0 + spp: true + drop_block: false diff --git a/PaddleDetection-release-2.6/configs/slim/quant/ppyolo_r50vd_qat_pact.yml b/PaddleDetection-release-2.6/configs/slim/quant/ppyolo_r50vd_qat_pact.yml new file mode 100644 index 0000000000000000000000000000000000000000..fb6d98841040a13221ac8cba3acecc6236a6cb03 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/ppyolo_r50vd_qat_pact.yml @@ -0,0 +1,39 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + +epoch: 50 + +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 30 + - 45 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +PPYOLOFPN: + coord_conv: true + block_size: 3 + keep_prob: 0.9 + spp: true + drop_block: false diff --git a/PaddleDetection-release-2.6/configs/slim/quant/ppyoloe_l_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/ppyoloe_l_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..4c0e94003a6ed0b7dde95ecd1f2361b87c61b4c8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/ppyoloe_l_qat.yml @@ -0,0 +1,26 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + +epoch: 30 +snapshot_epoch: 5 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 10 + - 20 + - !LinearWarmup + start_factor: 0. 
+ steps: 100 + +TrainReader: + batch_size: 8 diff --git a/PaddleDetection-release-2.6/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..d218e1edcbdb3597f43650467b98839c7d5e28c2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml @@ -0,0 +1,33 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + +epoch: 50 +snapshot_epoch: 8 +LearningRate: + base_lr: 0.0005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 30 + - 45 + - !LinearWarmup + start_factor: 0. + steps: 2000 + +TrainReader: + batch_size: 8 + +PPYOLOPAN: + drop_block: false + block_size: 3 + keep_prob: 0.9 + spp: true diff --git a/PaddleDetection-release-2.6/configs/slim/quant/ssd_mobilenet_v1_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/ssd_mobilenet_v1_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..05e068368fced56bdd3298323cf901dbbe29f925 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/ssd_mobilenet_v1_qat.yml @@ -0,0 +1,9 @@ +pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ssd_mobilenet_v1_300_120e_voc.pdparams +slim: QAT + +QAT: + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True diff --git a/PaddleDetection-release-2.6/configs/slim/quant/tinypose_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/tinypose_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..3b85dfe55d226d2514bf11c530abb8df1abf8664 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/tinypose_qat.yml @@ -0,0 +1,26 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: False + +epoch: 50 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 30 + - 40 + - !LinearWarmup + start_factor: 0. 
+ steps: 100 + +TrainReader: + batch_size: 256 diff --git a/PaddleDetection-release-2.6/configs/slim/quant/yolov3_darknet_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/yolov3_darknet_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..281b53418c215751470082794ef4c8d8b0d529e7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/yolov3_darknet_qat.yml @@ -0,0 +1,31 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + +epoch: 50 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 30 + - 45 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/slim/quant/yolov3_mobilenet_v1_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/yolov3_mobilenet_v1_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..d1452082983ced70d1709343cd42017d8a19d361 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/yolov3_mobilenet_v1_qat.yml @@ -0,0 +1,10 @@ +# Weights of yolov3_mobilenet_v1_coco +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True diff --git a/PaddleDetection-release-2.6/configs/slim/quant/yolov3_mobilenet_v3_qat.yml b/PaddleDetection-release-2.6/configs/slim/quant/yolov3_mobilenet_v3_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..8e83f27aa92788a5a1ef1e0caa17ee9cc143bd4c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/slim/quant/yolov3_mobilenet_v3_qat.yml @@ -0,0 +1,24 @@ +# Weights of yolov3_mobilenet_v3_coco +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + +epoch: 50 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 35 + - 45 + - !LinearWarmup + start_factor: 0. 
+    steps: 1000
diff --git a/PaddleDetection-release-2.6/configs/smalldet/DataDownload.md b/PaddleDetection-release-2.6/configs/smalldet/DataDownload.md
new file mode 100644
index 0000000000000000000000000000000000000000..73189056ea15b39e20fec31dbb0968b0ce4730e7
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smalldet/DataDownload.md
@@ -0,0 +1,99 @@
+# Small-Object Dataset Download Summary
+
+## Contents
+- [Dataset Preparation](#dataset-preparation)
+  - [VisDrone-DET](#visdrone-det)
+  - [DOTA (horizontal boxes)](#dota-horizontal-boxes)
+  - [Xview](#xview)
+  - [Custom Datasets](#custom-datasets)
+
+## Dataset Preparation
+
+### VisDrone-DET
+
+VisDrone-DET is a small-object dataset of drone-captured aerial scenes. The curated COCO-format VisDrone-DET dataset is available at this [download link](https://bj.bcebos.com/v1/paddledet/data/smalldet/visdrone.zip), and the sliced COCO-format dataset at this [download link](https://bj.bcebos.com/v1/paddledet/data/smalldet/visdrone_sliced.zip). It covers **10 classes**: `pedestrian(1), people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10)`. The original dataset is available at this [download link](https://github.com/VisDrone/VisDrone-Dataset).
+For usage and download details, see [visdrone](../visdrone).
+
+### DOTA (horizontal boxes)
+
+DOTA is a large public remote-sensing imagery dataset; here the **DOTA-v1.0** horizontal-box dataset is used. The sliced, curated COCO-format DOTA horizontal-box dataset is available at this [download link](https://bj.bcebos.com/v1/paddledet/data/smalldet/dota_sliced.zip). It covers **15 classes**:
+`plane(0), baseball-diamond(1), bridge(2), ground-track-field(3), small-vehicle(4), large-vehicle(5), ship(6), tennis-court(7), basketball-court(8), storage-tank(9), soccer-ball-field(10), roundabout(11), harbor(12), swimming-pool(13), helicopter(14)`.
+The images and original dataset are available at this [download link](https://captain-whu.github.io/DOAI2019/dataset.html).
+
+### Xview
+
+Xview is a large aerial remote-sensing detection dataset with extremely small and extremely numerous objects. The sliced COCO-format dataset is available at this [download link](https://bj.bcebos.com/v1/paddledet/data/smalldet/xview_sliced.zip). It covers **60 classes**, listed as follows:
    + +`Fixed-wing Aircraft(0), +Small Aircraft(1), +Cargo Plane(2), +Helicopter(3), +Passenger Vehicle(4), +Small Car(5), +Bus(6), +Pickup Truck(7), +Utility Truck(8), +Truck(9), +Cargo Truck(10), +Truck w/Box(11), +Truck Tractor(12), +Trailer(13), +Truck w/Flatbed(14), +Truck w/Liquid(15), +Crane Truck(16), +Railway Vehicle(17), +Passenger Car(18), +Cargo Car(19), +Flat Car(20), +Tank car(21), +Locomotive(22), +Maritime Vessel(23), +Motorboat(24), +Sailboat(25), +Tugboat(26), +Barge(27), +Fishing Vessel(28), +Ferry(29), +Yacht(30), +Container Ship(31), +Oil Tanker(32), +Engineering Vehicle(33), +Tower crane(34), +Container Crane(35), +Reach Stacker(36), +Straddle Carrier(37), +Mobile Crane(38), +Dump Truck(39), +Haul Truck(40), +Scraper/Tractor(41), +Front loader/Bulldozer(42), +Excavator(43), +Cement Mixer(44), +Ground Grader(45), +Hut/Tent(46), +Shed(47), +Building(48), +Aircraft Hangar(49), +Damaged Building(50), +Facility(51), +Construction Site(52), +Vehicle Lot(53), +Helipad(54), +Storage Tank(55), +Shipping container lot(56), +Shipping Container(57), +Pylon(58), +Tower(59) +` + +
+
+The original dataset is available at this [download link](https://challenge.xviewdataset.org/).
+
+
+### Custom Datasets
+
+To prepare a custom dataset, see the [detection annotation tools](../../docs/tutorials/data/DetAnnoTools.md) and the [detection dataset preparation tutorial](../../docs/tutorials/data/PrepareDetDataSet.md).
diff --git a/PaddleDetection-release-2.6/configs/smalldet/README.md b/PaddleDetection-release-2.6/configs/smalldet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..db9c6366c210014a4e5c678bb2aca9d4365b47b4
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smalldet/README.md
@@ -0,0 +1,424 @@
+# PP-YOLOE-SOD Small Object Detection Model
+
+(demo images: VisDrone, VisDrone, DOTA, Xview)
+
+## Contents
+- [Introduction](#introduction)
+- [Slicing Instructions](#slicing-instructions)
+  - [Small-Object Dataset Download](#small-object-dataset-download)
+  - [Analyzing the Dataset Distribution](#analyzing-the-dataset-distribution)
+  - [Slicing with SAHI](#slicing-with-sahi)
+- [Model Zoo](#model-zoo)
+  - [VisDrone Models](#visdrone-models)
+  - [COCO Models](#coco-models)
+  - [Sliced Models](#sliced-models)
+  - [Slice-and-Merge Models](#slice-and-merge-models)
+  - [Notes](#notes)
+- [Model Zoo Usage](#model-zoo-usage)
+  - [Training](#training)
+  - [Evaluation](#evaluation)
+  - [Inference](#inference)
+  - [Deployment](#deployment)
+- [Citations](#citations)
+
+
+## Introduction
+The PaddleDetection team provides PP-YOLOE-SOD, a detection model improved from PP-YOLOE for small-object datasets such as VisDrone-DET, DOTA (horizontal boxes), and Xview, together with a slice-and-merge pipeline built on the [SAHI](https://github.com/obss/sahi) (Slicing Aided Hyper Inference) tool.
+
+  - PP-YOLOE-SOD is the PaddleDetection team's in-house small-object detection model. It uses a **vector-based DFL algorithm tied to the dataset distribution** and a **center-prior optimization strategy tailored to small objects**, **adds a Transformer module to the model's Neck (FPN)**, and combines further strategies such as an extra P2 level and large input sizes, reaching very high accuracy on multiple small-object datasets.
+
+  - The slice-and-merge pipeline **works with any detection model**; we recommend **combining PP-YOLOE-SOD with the slice-and-merge pipeline** for best results.
+
+  - For the official AI Studio tutorial, see [基于PP-YOLOE-SOD的无人机航拍图像检测案例全流程实操](https://aistudio.baidu.com/aistudio/projectdetail/5036782), an end-to-end drone-imagery detection walkthrough.
+
+  - Third-party AI Studio tutorials: [PPYOLOE:遥感场景下的小目标检测与部署(切图版)](https://aistudio.baidu.com/aistudio/projectdetail/4493701) and [涨分神器!基于PPYOLOE的切图和拼图解决方案](https://aistudio.baidu.com/aistudio/projectdetail/4438275).
+
+**Note:**
+  - To train, evaluate, and predict **directly on original images or sub-images without the slice-and-merge pipeline**, the PP-YOLOE-SOD model is recommended; for more details and ablations see [COCO Models](#coco-models) and [VisDrone Models](./visdrone).
+  - Before deciding whether to slice and **train** on sub-images, first analyze the dataset following [Analyzing the Dataset Distribution](#analyzing-the-dataset-distribution) in [Slicing Instructions](#slicing-instructions); sliced training is generally recommended when **all objects in the dataset are extremely small**.
+  - For sliced **prediction**: if the model was trained on slices, inference works best with the **same slicing strategy and parameters**. Even without sliced training, sliced inference is still possible: just **append `--slice_infer` and the related sub-image arguments to the ordinary inference command**.
+  - For sliced **evaluation**, first make sure a proper sub-image validation set with correct annotation boxes has been generated, **modify the validation-set (EvalDataset) settings in the config** as described in [Model Zoo Usage - Evaluation](#evaluation), and then **append `--slice_infer` and the related sub-image arguments to the ordinary evaluation command**.
+  - With `--slice_infer`, PaddleDetection by default **automatically merges the sub-image prediction boxes back onto the original image** and returns boxes in original-image coordinates. This also **works with any trained detection model**, whether or not it was trained on slices.
+
+
+## Slicing Instructions
+
+### Small-Object Dataset Download
+For download links to the VisDrone-DET, DOTA horizontal-box, and Xview small-object datasets curated by the PaddleDetection team, see [DataDownload.md](./DataDownload.md).
+
+### Analyzing the Dataset Distribution
+
+For the dataset to be trained (assumed already converted to COCO format; see [COCO dataset preparation](../../docs/tutorials/data/PrepareDetDataSet.md#用户数据转成COCO数据)), first compute the distribution of the **ratio of mean annotation-box width/height to the true image width/height**.
+
+Taking the train split of the DOTA horizontal-box dataset as an example:
+
+```bash
+python tools/box_distribution.py --json_path dataset/DOTA/annotations/train.json --out_img box_distribution.jpg --eval_size 640 --small_stride 8
+```
+  - `--json_path`: path to the COCO-format json annotation file of the dataset to analyze
+  - `--out_img`: output path of the distribution plot
+  - `--eval_size`: inference scale (default 640)
+  - `--small_stride`: smallest model stride (default 8)
+
+The printed statistics look like this:
+```bash
+Suggested reg_range[1] is 13  # recommended value for the DFL algorithm; set it in the head of the PP-YOLOE-SOD config for best results
+Mean of all img_w is 2304.3981547196595  # mean original-image width
+Mean of all img_h is 2180.9354151880766  # mean original-image height
+Median of ratio_w is 0.03799439775910364  # median ratio of box width to image width
+Median of ratio_h is 0.04074914637387802  # median ratio of box height to image height
+all_img with box: 1409  # total images (excluding images without boxes or with empty annotations)
+all_ann: 98905  # total annotation boxes
+Distribution saved as box_distribution.jpg
+```
+
+**Note:**
+- In general, across the images that carry annotation boxes, sliced training is recommended when **the mean original-image width/height exceeds 1500 pixels and, for more than half of the images, the mean ratio of box width/height to image width/height is below 0.04 (read off from the printed medians)**.
+- `Suggested reg_range[1]` is the recommended upper bound of `reg_range` for this dataset under the optimized DFL algorithm, i.e. `reg_max + 1`; set this value in the head of the PP-YOLOE-SOD config.
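+
+For intuition, the reported medians can be reproduced with a few lines of Python; a hypothetical sketch (field names follow the COCO annotation format, but this is not `tools/box_distribution.py` itself):
+
+```python
+import json
+import numpy as np
+
+def box_ratio_medians(json_path):
+    """Median ratio of annotation width/height to image width/height."""
+    coco = json.load(open(json_path))
+    imgs = {im["id"]: im for im in coco["images"]}
+    rw, rh = [], []
+    for ann in coco["annotations"]:
+        im = imgs[ann["image_id"]]
+        w, h = ann["bbox"][2], ann["bbox"][3]  # COCO bbox: [x, y, w, h]
+        rw.append(w / im["width"])
+        rh.append(h / im["height"])
+    return np.median(rw), np.median(rh)
+```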
+
+
+### Slicing with SAHI
+
+For datasets that need slicing, use the [SAHI](https://github.com/obss/sahi) library:
+
+#### Installing SAHI
+
+Install with `pip install sahi`; see [SAHI installation](https://github.com/obss/sahi/blob/main/README.md#installation) for details.
+
+#### Slicing the dataset with SAHI
+
+Taking the train split of the DOTA horizontal-box dataset as an example, the sliced **sub-image folder** and the **sub-image json annotation file** are both saved under the `dota_sliced` folder, named `train_images_500_025` and `train_500_025.json` respectively:
+
+```bash
+python tools/slice_image.py --image_dir dataset/DOTA/train/ --json_path dataset/DOTA/annotations/train.json --output_dir dataset/dota_sliced --slice_size 500 --overlap_ratio 0.25
+```
+  - `--image_dir`: path to the original dataset's image folder
+  - `--json_path`: path to the original dataset's COCO-format json annotation file
+  - `--output_dir`: path where the sub-images and their json annotation file are saved
+  - `--slice_size`: side length of each sub-image after slicing (square sub-images by default)
+  - `--overlap_ratio`: overlap ratio between adjacent sub-images
+
+**Note:**
+- To slice and then **train** on sub-images, slicing must be done **offline**: the sub-images are generated once and saved to disk.
+- To slice for **evaluation or inference**, slicing can be done either **offline** or **online**; PaddleDetection supports slicing and automatically merging the results back onto the original image.
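+
+Equivalently, offline slicing can be driven from Python; a hypothetical sketch assuming SAHI's `slice_coco` API (parameter names per recent sahi releases; `tools/slice_image.py` wraps similar functionality):
+
+```python
+from sahi.slicing import slice_coco
+
+# Slice a COCO-format dataset into 500x500 sub-images with 25% overlap,
+# writing sub-images and a new COCO json under dataset/dota_sliced.
+coco_dict, coco_path = slice_coco(
+    coco_annotation_file_path="dataset/DOTA/annotations/train.json",
+    image_dir="dataset/DOTA/train/",
+    output_coco_annotation_file_name="train_500_025",
+    output_dir="dataset/dota_sliced",
+    slice_height=500,
+    slice_width=500,
+    overlap_height_ratio=0.25,
+    overlap_width_ratio=0.25,
+)
+```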
+
+
+## Model Zoo
+
+### [VisDrone Models](visdrone/)
+
+| Model | COCOAPI mAP<sup>val<br>0.5:0.95</sup> | COCOAPI mAP<sup>val<br>0.5</sup> | COCOAPI mAP<sup>test_dev<br>0.5:0.95</sup> | COCOAPI mAP<sup>test_dev<br>0.5</sup> | MatlabAPI mAP<sup>test_dev<br>0.5:0.95</sup> | MatlabAPI mAP<sup>test_dev<br>0.5</sup> | Download | Config |
+|:---------|:------:|:------:| :----: | :------:| :------: | :------:| :----: | :------:|
+|PP-YOLOE-s| 23.5 | 39.9 | 19.4 | 33.6 | 23.68 | 40.66 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_crn_s_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-s| 24.4 | 41.6 | 20.1 | 34.7 | 24.55 | 42.19 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_p2_alpha_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_crn_s_p2_alpha_80e_visdrone.yml) |
+|**PP-YOLOE+_SOD-s**| **25.1** | **42.8** | **20.7** | **36.2** | **25.16** | **43.86** | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_s_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_plus_sod_crn_s_80e_visdrone.yml) |
+|PP-YOLOE-l| 29.2 | 47.3 | 23.5 | 39.1 | 28.00 | 46.20 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_crn_l_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-l| 30.1 | 48.9 | 24.3 | 40.8 | 28.47 | 48.16 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_p2_alpha_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_crn_l_p2_alpha_80e_visdrone.yml) |
+|**PP-YOLOE+_SOD-l**| **31.9** | **52.1** | **25.6** | **43.5** | **30.25** | **51.18** | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml) |
+|PP-YOLOE-Alpha-largesize-l| 41.9 | 65.0 | 32.3 | 53.0 | 37.13 | 61.15 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_alpha_largesize_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-largesize-l| 41.3 | 64.5 | 32.4 | 53.1 | 37.49 | 51.54 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml) |
+|PP-YOLOE+_largesize-l | 43.3 | 66.7 | 33.5 | 54.7 | 38.24 | 62.76 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_largesize_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_plus_crn_l_largesize_80e_visdrone.yml) |
+|**PP-YOLOE+_SOD-largesize-l** | 42.7 | 65.9 | **33.6** | **55.1** | **38.4** | **63.07** | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.pdparams) | [config](visdrone/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.yml) |
+
+**Note:**
+  - All models above are **trained on original images** and also **evaluated/predicted on original images**; AP figures are measured on the **original-image validation set**.
+  - VisDrone-DET **can be trained on either original images or slices**. Based on the dataset distribution analysis we recommend **training on original images** and using the **SOD** configs directly for training, evaluation, and deployment; sliced training remains an option when GPU compute is limited.
+  - All figures above use the VisDrone-DET train split for training and the val and test_dev splits for validation.
+  - **SOD** means using the **vector-based DFL algorithm** and the **center-prior optimization strategy** for small objects, plus a **Transformer module in the model's Neck**.
+  - **P2** means adding P2-level (1/4 downsampling) features, for four PPYOLOEHead outputs in total.
+  - **Alpha** means adding a learnable weight parameter Alpha to the CSPResNet backbone during training.
+  - **largesize** means **multi-scale training based on a 1600 scale** and **inference at a 1920 scale**, with a correspondingly reduced training batch_size, trading speed for higher accuracy.
+  - The MatlabAPI results use the official evaluation toolkit [VisDrone2018-DET-toolkit](https://github.com/VisDrone/VisDrone2018-DET-toolkit).
+
+<details>
+<summary> Quick Start </summary>
+
+```shell
+# Training
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml --amp --eval
+# Evaluation
+python tools/eval.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams
+# Inference
+python tools/infer.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams --infer_img=demo/visdrone_0000315_01601_d_0000509.jpg --draw_threshold=0.25
+```
+
+</details>
+
+
+### COCO Models
+
+| Model | mAP<sup>val<br>0.5:0.95</sup> | AP<sup>0.5</sup> | AP<sup>0.75</sup> | AP<sup>small</sup> | AP<sup>medium</sup> | AP<sup>large</sup> | AR<sup>small</sup> | AR<sup>medium</sup> | AR<sup>large</sup> | Download | Config |
+|:--------:|:-----------------------:|:----------:|:-----------:|:------------:|:-------------:|:-----------:|:------------:|:-------------:|:------------:|:-------:|:-------:|
+|PP-YOLOE+_l| 52.9 | 70.1 | 57.9 | 35.2 | 57.5 | 69.1 | 56.0 | 77.9 | 86.9 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](../ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) |
+|**PP-YOLOE+_SOD-l**| 53.0 | **70.4** | 57.7 | **37.1** | 57.5 | 69.0 | **56.5** | 77.5 | 86.7 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_coco.pdparams) | [config](./ppyoloe_plus_sod_crn_l_80e_coco.yml) |
+
+**Note:**
+  - All models above are **trained and evaluated/predicted on original images** at a network input size of 640x640, with COCO train2017 as the training set and val2017 as the validation set, trained for 80 epochs on 8 GPUs with a total batch_size of 64.
+  - **SOD** means using the **vector-based DFL algorithm** and the **center-prior optimization strategy** for small objects, plus a **Transformer module in the model's Neck**, improving AP<sup>small</sup> by 1.9.
+
+<details>
+<summary> Quick Start </summary>
+
+```shell
+# Training
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml --amp --eval
+# Evaluation
+python tools/eval.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_coco.pdparams
+# Inference
+python tools/infer.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_coco.pdparams --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.25
+```
+
+</details>
+
+
+### Sliced Models
+
+| Model | Dataset | SLICE_SIZE | OVERLAP_RATIO | Classes | mAP<sup>val<br>0.5:0.95</sup> | AP<sup>val<br>0.5</sup> | Download | Config |
+|:---------|:---------------:|:---------------:|:---------------:|:------:|:-----------------------:|:-------------------:|:---------:| :-----: |
+|PP-YOLOE-P2-l| DOTA | 500 | 0.25 | 15 | 53.9 | 78.6 | [download](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_p2_crn_l_80e_sliced_DOTA_500_025.pdparams) | [config](./ppyoloe_p2_crn_l_80e_sliced_DOTA_500_025.yml) |
+|PP-YOLOE-P2-l| Xview | 400 | 0.25 | 60 | 14.9 | 27.0 | [download](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_p2_crn_l_80e_sliced_xview_400_025.pdparams) | [config](./ppyoloe_p2_crn_l_80e_sliced_xview_400_025.yml) |
+|PP-YOLOE-l| VisDrone-DET| 640 | 0.25 | 10 | 38.5 | 60.2 | [download](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [config](./ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml) |
+
+**Note:**
+  - All models above are **trained on sliced sub-images** and **evaluated/predicted on sliced sub-images**; AP figures are measured on the **sub-image validation set**.
+  - **SLICE_SIZE** is the side length of the sub-images produced by the SAHI tool; **OVERLAP_RATIO** is the overlap ratio between sub-images.
+  - The VisDrone-DET entry shares **the same model weights** as the one in the [Slice-and-Merge Models](#slice-and-merge-models) table, but here AP is measured on the **sliced sub-image validation set**.
+
+<details>
    + 快速开始 + +```shell +# 训练 +python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml --amp --eval +# 子图直接评估 +python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams +# 子图直接预测 +python tools/infer.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --infer_img=demo/visdrone_0000315_01601_d_0000509.jpg --draw_threshold=0.25 +``` + +
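+
+上表的子图数据集可以用SAHI工具离线切图得到。以下为一个调用 `sahi.slicing.slice_coco` 的示意脚本,对应表中 SLICE_SIZE=640、OVERLAP_RATIO=0.25 的设置(参数名以所安装SAHI版本的文档为准,输入输出路径均为假设):
+
+```python
+# 用SAHI对COCO格式数据集做离线切图(示意脚本,路径均为假设)
+from sahi.slicing import slice_coco
+
+slice_coco(
+    coco_annotation_file_path='dataset/visdrone/val.json',     # 原图COCO标注
+    image_dir='dataset/visdrone/VisDrone2019-DET-val/',        # 原图目录
+    output_coco_annotation_file_name='val_640_025',            # 输出标注文件名前缀
+    output_dir='dataset/visdrone_sliced/val_images_640_025/',  # 子图输出目录
+    slice_height=640,
+    slice_width=640,
+    overlap_height_ratio=0.25,
+    overlap_width_ratio=0.25,
+)
+```
+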
    + + +### 拼图模型 + +| 模型 | 数据集 | SLICE_SIZE | OVERLAP_RATIO | 类别数 | mAPval
    0.5:0.95 | APval
0.5 | 下载链接 | 配置文件 |
+|:---------|:---------------:|:---------------:|:---------------:|:------:|:-----------------------:|:-------------------:|:---------:| :-----: |
+|PP-YOLOE-l (原图直接评估)| VisDrone-DET| 640 | 0.25 | 10 | 29.7 | 48.5 | [下载链接](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [配置文件](./ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml) |
+|PP-YOLOE-l (切图拼图评估)| VisDrone-DET| 640 | 0.25 | 10 | 37.3 | 59.5 | [下载链接](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [配置文件](./ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml) |
+
+**注意:**
+  - 上表中的模型均为使用**切图后的子图**训练;评估预测分为两种:**直接使用原图**评估预测,和**使用子图自动拼成原图**评估预测,AP精度均为**原图验证集**上评估的结果。
+  - **SLICE_SIZE**表示使用SAHI工具切图后子图的边长大小,**OVERLAP_RATIO**表示切图的子图之间的重叠率。
+  - VisDrone-DET的模型与[切图模型](#切图模型)表格中的VisDrone-DET是**同一个模型权重**,但此处AP精度是在**原图验证集**上评估的结果,需要提前将`ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml`里`EvalDataset`默认的子图验证集路径修改为以下**原图验证集路径**:
+  ```
+  EvalDataset:
+    !COCODataSet
+      image_dir: VisDrone2019-DET-val
+      anno_path: val.json
+      dataset_dir: dataset/visdrone
+  ```
+
+
+ 快速开始
+
+```shell
+# 训练
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml --amp --eval
+# 原图直接评估,注意需要提前将此yml中 `EvalDataset` 默认的子图验证集路径改为原图验证集路径:
+python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams
+# 切图拼图评估,加上 --slice_infer,注意使用的是带 _slice_infer 后缀的yml配置文件
+python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --slice_infer
+# 切图拼图预测,加上 --slice_infer
+python tools/infer.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --infer_img=demo/visdrone_0000315_01601_d_0000509.jpg --draw_threshold=0.25 --slice_infer
+```
+
+
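+
+拼图评估/预测中 `--match_metric=ios` 与 `iou` 的区别,可以用下面这段独立的Python示意代码理解(仅用于说明概念,并非PaddleDetection源码实现):
+
+```python
+# 交小比(ios)与交并比(iou)对比示意(独立示例,非源码实现)
+def inter_area(a, b):  # a, b: [x1, y1, x2, y2]
+    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
+    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
+    return w * h
+
+def area(a):
+    return max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
+
+def iou(a, b):  # 交集面积 / 并集面积
+    i = inter_area(a, b)
+    return i / (area(a) + area(b) - i + 1e-9)
+
+def ios(a, b):  # 交集面积 / 较小框面积
+    return inter_area(a, b) / (min(area(a), area(b)) + 1e-9)
+
+a, b = [0, 0, 100, 100], [10, 10, 60, 60]  # b完全落在a内
+print(round(iou(a, b), 2))  # 0.25,低于0.6阈值,按iou不会被去重
+print(round(ios(a, b), 2))  # 1.0,按ios会被判为重复框并去掉
+```
+
+子图重叠区内常出现"大框套小框"的重复检测,直观上 `ios` 比 `iou` 更容易把这类框判为重复(实际精度效果因数据集而异)。
+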
+
+
+### 注意事项
+
+- 切图和拼图需要使用[SAHI](https://github.com/obss/sahi)切图工具,请先安装:`pip install sahi`,参考[installation](https://github.com/obss/sahi/blob/main/README.md#installation)。
+- DOTA水平框和Xview数据集均是**切图后训练**,AP指标为**切图后的子图val上的指标**。
+- VisDrone-DET数据集请参照[visdrone](./visdrone),**可使用原图训练,也可使用切图后训练**,上面表格中的指标均是使用VisDrone-DET的val子集做验证,而未使用test_dev子集。
+- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。
+- 常用训练、验证、部署等步骤请参考[ppyoloe](../ppyoloe#getting-start)。
+- 自动切图和拼图的推理预测需添加设置`--slice_infer`,具体见下文[模型库使用说明](#模型库使用说明)中的[预测](#预测)和[部署](#部署)。
+- 自动切图和拼图过程,参照[2.3 子图拼图评估](#评估)。
+
+
+## 模型库使用说明
+
+### 训练
+
+#### 1.1 原图训练
+首先将待训数据集制作成COCO数据集格式,然后按照PaddleDetection模型的常规训练流程训练即可。
+
+执行以下指令使用混合精度训练COCO数据集:
+
+```bash
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml --amp --eval
+```
+
+**注意:**
+- 使用默认配置训练需要设置`--amp`以避免显存溢出,`--eval`表示边训边验证,会自动保存最佳精度的模型权重。
+
+#### 1.2 切图训练
+首先将待训数据集制作成COCO数据集格式,然后使用SAHI切图工具进行**离线切图**,对保存的子图按**常规检测模型的训练流程**训练即可。
+也可直接下载PaddleDetection团队提供的切图后的VisDrone-DET、DOTA水平框、Xview数据集。
+
+执行以下指令使用混合精度训练VisDrone切图数据集:
+
+```bash
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml --amp --eval
+```
+
+
+### 评估
+
+#### 2.1 子图评估
+**默认评估方式是子图评估**,子图数据集的验证集设置为:
+```
+EvalDataset:
+  !COCODataSet
+    image_dir: val_images_640_025
+    anno_path: val_640_025.json
+    dataset_dir: dataset/visdrone_sliced
+```
+按常规检测模型的评估流程,评估提前切好并保存下来的子图上的精度:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams
+```
+
+#### 2.2 原图评估
+修改验证集的标注文件路径为**原图标注文件**:
+```
+EvalDataset:
+  !COCODataSet
+    image_dir: VisDrone2019-DET-val
+    anno_path: val.json
+    dataset_dir: dataset/visdrone
+```
+直接评估原图上的精度:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams
+```
+
+#### 2.3 子图拼图评估
+修改验证集的标注文件路径为**原图标注文件**:
+```
+# very slow, preferably evaluate with fixed weights (xx.pdparams) rather than evaluating during training
+# if you want to eval during training, change SlicedCOCODataSet to COCODataSet and delete sliced_size and overlap_ratio
+EvalDataset:
+  !SlicedCOCODataSet
+    image_dir: VisDrone2019-DET-val
+    anno_path: val.json
+    dataset_dir: dataset/visdrone
+    sliced_size: [640, 640]
+    overlap_ratio: [0.25, 0.25]
+```
+评估过程中会自动对原图进行切图,最后再重组并融合子图结果,来评估原图上的精度:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --slice_infer --combine_method=nms --match_threshold=0.6 --match_metric=ios
+```
+
+**注意:**
+- 设置`--slice_infer`表示切图预测并拼装重组结果,如果不使用则不写,注意需要确保EvalDataset选用的数据集类是SlicedCOCODataSet而不是COCODataSet;
+- 设置`--slice_size`表示切图的子图尺寸大小,设置`--overlap_ratio`表示子图间重叠率,可以自行修改选择合适的子图尺度sliced_size和子图间重叠率overlap_ratio,如:
+```
+EvalDataset:
+  !SlicedCOCODataSet
+    image_dir: VisDrone2019-DET-val
+    anno_path: val.json
+    dataset_dir: dataset/visdrone
+    sliced_size: [480, 480]
+    overlap_ratio: [0.2, 0.2]
+```
+- 设置`--combine_method`表示子图结果重组去重的方式,默认是`nms`;
+- 设置`--match_threshold`表示子图结果重组去重的阈值,默认是0.6;
+- 设置`--match_metric`表示子图结果重组去重的度量标准,默认是`ios`,表示交小比(两个框交集面积除以更小框的面积);也可以选择交并比`iou`(两个框交集面积除以并集面积)。精度效果因数据集而异,但选择`ios`预测速度会更快一点;
+
+
+### 预测
+
+#### 3.1 子图或原图直接预测
+与评估流程基本相同,可以在提前切好并保存下来的子图上预测,也可以对原图预测,如:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --infer_img=demo/visdrone_0000315_01601_d_0000509.jpg --draw_threshold=0.25
+```
+
+#### 3.2 原图自动切图并拼图预测
+也可以对原图进行自动切图并拼图重组来预测原图,如:
+```bash
+# 单张图
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --infer_img=demo/visdrone_0000315_01601_d_0000509.jpg --draw_threshold=0.25 --slice_infer --slice_size 640 640 --overlap_ratio 0.25 0.25 --combine_method=nms --match_threshold=0.6 --match_metric=ios --save_results=True
+# 或图片文件夹
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --infer_dir=demo/ --draw_threshold=0.25 --slice_infer --slice_size 640 640 --overlap_ratio 0.25 0.25 --combine_method=nms --match_threshold=0.6 --match_metric=ios
+```
+- 设置`--slice_infer`表示切图预测并拼装重组结果,如果不使用则不写;
+- 设置`--slice_size`表示切图的子图尺寸大小,设置`--overlap_ratio`表示子图间重叠率;
+- 设置`--combine_method`表示子图结果重组去重的方式,默认是`nms`;
+- 设置`--match_threshold`表示子图结果重组去重的阈值,默认是0.6;
+- 设置`--match_metric`表示子图结果重组去重的度量标准,默认是`ios`,表示交小比(两个框交集面积除以更小框的面积);也可以选择交并比`iou`(两个框交集面积除以并集面积)。精度效果因数据集而异,但选择`ios`预测速度会更快一点;
+- 设置`--save_results`表示保存图片结果为json文件,一般只在单张图预测时使用;
+
+
+### 部署
+
+#### 4.1 导出模型
+```bash
+# export model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams
+```
+
+#### 4.2 使用原图或子图直接推理
+```bash
+# deploy infer
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=GPU --save_images --threshold=0.25
+```
+
+#### 4.3 使用原图自动切图并拼图重组结果来推理
+```bash
+# deploy slice infer
+# 单张图
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=GPU --save_images --threshold=0.25 --slice_infer --slice_size 640 640 --overlap_ratio 0.25 0.25 --combine_method=nms --match_threshold=0.6 --match_metric=ios --save_results=True
+# 或图片文件夹
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image_dir=demo/ --device=GPU --save_images --threshold=0.25 --slice_infer --slice_size 640 640 --overlap_ratio 0.25 0.25 --combine_method=nms --match_threshold=0.6 --match_metric=ios
+```
+- 设置`--slice_infer`表示切图预测并拼装重组结果,如果不使用则不写;
+- 设置`--slice_size`表示切图的子图尺寸大小,设置`--overlap_ratio`表示子图间重叠率;
+- 设置`--combine_method`表示子图结果重组去重的方式,默认是`nms`;
+- 设置`--match_threshold`表示子图结果重组去重的阈值,默认是0.6;
+- 设置`--match_metric`表示子图结果重组去重的度量标准,默认是`ios`,表示交小比(两个框交集面积除以更小框的面积);也可以选择交并比`iou`(两个框交集面积除以并集面积)。精度效果因数据集而异,但选择`ios`预测速度会更快一点;
+- 设置`--save_results`表示保存图片结果为json文件,一般只在单张图预测时使用;
+
+
+## 引用
+```
+@article{akyon2022sahi,
+ 
title={Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection}, + author={Akyon, Fatih Cagatay and Altinuc, Sinan Onur and Temizel, Alptekin}, + journal={2022 IEEE International Conference on Image Processing (ICIP)}, + doi={10.1109/ICIP46576.2022.9897990}, + pages={966-970}, + year={2022} +} + +@inproceedings{xia2018dota, + title={DOTA: A large-scale dataset for object detection in aerial images}, + author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={3974--3983}, + year={2018} +} + +@ARTICLE{9573394, + author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, + year={2021}, + volume={}, + number={}, + pages={1-1}, + doi={10.1109/TPAMI.2021.3119563} +} +``` diff --git a/PaddleDetection-release-2.6/configs/smalldet/_base_/DOTA_sliced_500_025_detection.yml b/PaddleDetection-release-2.6/configs/smalldet/_base_/DOTA_sliced_500_025_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..d0fc0c389f6ed6e0af1bb9e52406cd2c80205c2c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/_base_/DOTA_sliced_500_025_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 15 + +TrainDataset: + !COCODataSet + image_dir: train_images_500_025 + anno_path: train_500_025.json + dataset_dir: dataset/dota_sliced + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val_images_500_025 + anno_path: val_500_025.json + dataset_dir: dataset/dota_sliced + +TestDataset: + !ImageFolder + anno_path: val_500_025.json + dataset_dir: dataset/dota_sliced diff --git a/PaddleDetection-release-2.6/configs/smalldet/_base_/visdrone_sliced_640_025_detection.yml b/PaddleDetection-release-2.6/configs/smalldet/_base_/visdrone_sliced_640_025_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..03848ca17549e159d5dab0886c9f83d461c4fdd7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/_base_/visdrone_sliced_640_025_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 10 + +TrainDataset: + !COCODataSet + image_dir: train_images_640_025 + anno_path: train_640_025.json + dataset_dir: dataset/visdrone_sliced + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val_images_640_025 + anno_path: val_640_025.json + dataset_dir: dataset/visdrone_sliced + +TestDataset: + !ImageFolder + anno_path: val_640_025.json + dataset_dir: dataset/visdrone_sliced diff --git a/PaddleDetection-release-2.6/configs/smalldet/_base_/xview_sliced_400_025_detection.yml b/PaddleDetection-release-2.6/configs/smalldet/_base_/xview_sliced_400_025_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..c80f545bd7e280b7d97f8ff9e7db25e86162bdf5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/_base_/xview_sliced_400_025_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 60 + +TrainDataset: + !COCODataSet + image_dir: train_images_400_025 + anno_path: train_400_025.json + dataset_dir: dataset/xview_sliced + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: 
val_images_400_025 + anno_path: val_400_025.json + dataset_dir: dataset/xview_sliced + +TestDataset: + !ImageFolder + anno_path: val_400_025.json + dataset_dir: dataset/xview_sliced diff --git a/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml new file mode 100644 index 0000000000000000000000000000000000000000..26275899ff3b5125b46b86db402e0543e7780036 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml @@ -0,0 +1,58 @@ +_BASE_: [ + './_base_/visdrone_sliced_640_025_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_80e_sliced_visdrone_640_025/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +TrainReader: + batch_size: 8 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 + + +EvalDataset: + !COCODataSet + image_dir: val_images_640_025 + anno_path: val_640_025.json + dataset_dir: dataset/visdrone_sliced + +# EvalDataset: +# !COCODataSet +# image_dir: VisDrone2019-DET-val +# anno_path: val.json +# dataset_dir: dataset/visdrone diff --git a/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml new file mode 100644 index 0000000000000000000000000000000000000000..91e45ff54c31b1092a2863b015a2d944ba3b678e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml @@ -0,0 +1,15 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml', +] +weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams + + +# very slow, better to use the determined weight (xx. 
pdparams) to evaluate separately, rather than evaling during training +# if you want to eval during training, change SlicedCOCODataSet to COCODataSet, then delete sliced_size and overlap_ratio +EvalDataset: + !SlicedCOCODataSet + image_dir: VisDrone2019-DET-val + anno_path: val.json + dataset_dir: dataset/visdrone + sliced_size: [640, 640] + overlap_ratio: [0.25, 0.25] diff --git a/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_p2_crn_l_80e_sliced_DOTA_500_025.yml b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_p2_crn_l_80e_sliced_DOTA_500_025.yml new file mode 100644 index 0000000000000000000000000000000000000000..4e47f2c88d3a1c230dcc6461cab70eeb68f53419 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_p2_crn_l_80e_sliced_DOTA_500_025.yml @@ -0,0 +1,54 @@ +_BASE_: [ + './_base_/DOTA_sliced_500_025_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_p2_crn_l_80e_sliced_DOTA_500_025/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + + +TrainReader: + batch_size: 4 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + fpn_strides: [32, 16, 8, 4] + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_p2_crn_l_80e_sliced_xview_400_025.yml b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_p2_crn_l_80e_sliced_xview_400_025.yml new file mode 100644 index 0000000000000000000000000000000000000000..e94d799b2727e5097a7b5e90f7b7c1935bed0df8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_p2_crn_l_80e_sliced_xview_400_025.yml @@ -0,0 +1,54 @@ +_BASE_: [ + './_base_/xview_sliced_400_025_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_p2_crn_l_80e_sliced_xview_400_025/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + + +TrainReader: + batch_size: 4 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+      epochs: 1
+
+PPYOLOEHead:
+  fpn_strides: [32, 16, 8, 4]
+  static_assigner_epoch: -1
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 10000
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.6
diff --git a/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..ad4c52eac56fad3303169d02c6f6578abbdcf106
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml
@@ -0,0 +1,31 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '../ppyoloe/_base_/optimizer_80e.yml',
+  '../ppyoloe/_base_/ppyoloe_plus_crn.yml',
+  '../ppyoloe/_base_/ppyoloe_plus_reader.yml',
+]
+log_iter: 100
+snapshot_epoch: 5
+weights: output/ppyoloe_plus_sod_crn_l_80e_coco/model_final
+
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_l_obj365_pretrained.pdparams
+depth_mult: 1.0
+width_mult: 1.0
+
+CustomCSPPAN:
+  num_layers: 4
+  use_trans: True
+
+PPYOLOEHead:
+  reg_range: [-2, 17]
+  static_assigner_epoch: -1
+  assigner:
+    name: TaskAlignedAssigner_CR
+    center_radius: 1
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 300
+    score_threshold: 0.01
+    nms_threshold: 0.7
diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/README.md b/PaddleDetection-release-2.6/configs/smalldet/visdrone/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fbe4ad82224ec6b4b3ee8bb1d7d30fd8c2913791
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/README.md
@@ -0,0 +1,186 @@
+# VisDrone-DET 小目标检测模型
+
+PaddleDetection团队提供了针对VisDrone-DET小目标航拍场景的基于PP-YOLOE的检测模型,用户可以下载模型进行使用。整理后的COCO格式VisDrone-DET数据集[下载链接](https://bj.bcebos.com/v1/paddledet/data/smalldet/visdrone.zip),检测其中的10类,包括 `pedestrian(1), people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10)`,原始数据集[下载链接](https://github.com/VisDrone/VisDrone-Dataset)。其他相关小目标数据集可参照 [DataDownload.md](../DataDownload.md)。
+
+**注意:**
+- VisDrone-DET数据集包括**train集6471张,val集548张,test_dev集1610张**,test-challenge集1580张(未开放检测框标注),前三者均有开放检测框标注。
+- 模型均**只使用train集训练**,在val集和test_dev集上分别验证精度,test_dev集图片数较多,精度参考性较高。
+
+
+## 原图训练,原图评估:
+
+| 模型 | COCOAPI mAPval
    0.5:0.95 | COCOAPI mAPval
    0.5 | COCOAPI mAPtest_dev
    0.5:0.95 | COCOAPI mAPtest_dev
    0.5 | MatlabAPI mAPtest_dev
    0.5:0.95 | MatlabAPI mAPtest_dev
0.5 | 下载 | 配置文件 |
+|:---------|:------:|:------:| :----: | :------:| :------: | :------:| :----: | :------:|
+|PP-YOLOE-s| 23.5 | 39.9 | 19.4 | 33.6 | 23.68 | 40.66 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_80e_visdrone.pdparams) | [配置文件](./ppyoloe_crn_s_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-s| 24.4 | 41.6 | 20.1 | 34.7 | 24.55 | 42.19 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_p2_alpha_80e_visdrone.pdparams) | [配置文件](./ppyoloe_crn_s_p2_alpha_80e_visdrone.yml) |
+|**PP-YOLOE+_SOD-s**| **25.1** | **42.8** | **20.7** | **36.2** | **25.16** | **43.86** | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_s_80e_visdrone.pdparams) | [配置文件](./ppyoloe_plus_sod_crn_s_80e_visdrone.yml) |
+|PP-YOLOE-l| 29.2 | 47.3 | 23.5 | 39.1 | 28.00 | 46.20 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_visdrone.pdparams) | [配置文件](./ppyoloe_crn_l_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-l| 30.1 | 48.9 | 24.3 | 40.8 | 28.47 | 48.16 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_p2_alpha_80e_visdrone.pdparams) | [配置文件](./ppyoloe_crn_l_p2_alpha_80e_visdrone.yml) |
+|**PP-YOLOE+_SOD-l**| **31.9** | **52.1** | **25.6** | **43.5** | **30.25** | **51.18** | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams) | [配置文件](./ppyoloe_plus_sod_crn_l_80e_visdrone.yml) |
+|PP-YOLOE-Alpha-largesize-l| 41.9 | 65.0 | 32.3 | 53.0 | 37.13 | 61.15 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_alpha_largesize_80e_visdrone.pdparams) | [配置文件](./ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-largesize-l| 41.3 | 64.5 | 32.4 | 53.1 | 37.49 | 51.54 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.pdparams) | [配置文件](./ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml) |
+|PP-YOLOE+_largesize-l | 43.3 | 66.7 | 33.5 | 54.7 | 38.24 | 62.76 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_largesize_80e_visdrone.pdparams) | [配置文件](./ppyoloe_plus_crn_l_largesize_80e_visdrone.yml) |
+|**PP-YOLOE+_SOD-largesize-l** | 42.7 | 65.9 | **33.6** | **55.1** | **38.4** | **63.07** | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.pdparams) | [配置文件](./ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.yml) |
+
+**注意:**
+  - 上表中的模型均为**使用原图训练**,也**使用原图评估预测**,AP精度均为**原图验证集**上评估的结果。
+  - VisDrone-DET数据集**可使用原图训练,也可使用切图后训练**。根据数据集统计分布分析,推荐使用**原图训练**,并推荐直接使用带**SOD**的模型配置文件进行训练、评估和预测部署;在显卡算力有限时也可使用切图后训练。
+  - 上表中的模型指标均是使用VisDrone-DET的train子集作为训练集,使用VisDrone-DET的val子集和test_dev子集作为验证集。
+  - **SOD**表示使用**基于向量的DFL算法**和针对小目标的**中心先验优化策略**,并**在模型的Neck结构中加入transformer**。
+  - **P2**表示增加P2层(1/4下采样层)的特征,共输出4个PPYOLOEHead。
+  - **Alpha**表示对CSPResNet骨干网络增加一个可学习的权重参数Alpha参与训练。
+  - **largesize**表示使用**以1600尺度为基础的多尺度训练**和**1920尺度预测**,相应的训练batch_size也减小,以牺牲速度换取高精度。
+  - MatlabAPI测试使用的是官方评测工具[VisDrone2018-DET-toolkit](https://github.com/VisDrone/VisDrone2018-DET-toolkit)。
+
+
    + 快速开始 + +```shell +# 训练 +python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml --amp --eval +# 评估 +python tools/eval.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams +# 预测 +python tools/infer.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams --infer_img=demo/visdrone_0000315_01601_d_0000509.jpg --draw_threshold=0.25 +``` + +
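+
+上表中的MatlabAPI指标需要先把预测结果转成VisDrone-DET的txt提交格式(每行为 `bbox_left,bbox_top,width,height,score,category,truncation,occlusion`),再用官方toolkit评测。以下为一个假设性的转换示意脚本:输入为COCO results格式的 `bbox.json`(文件路径为假设),truncation/occlusion 以 -1 占位,字段顺序请以官方toolkit说明为准:
+
+```python
+# 将COCO results格式的预测json转成VisDrone-DET评测txt(示意脚本,路径均为假设)
+import json
+import os
+from collections import defaultdict
+
+gt = json.load(open('dataset/visdrone/val.json'))  # 原图COCO标注,用于取图片文件名
+img_names = {im['id']: im['file_name'] for im in gt['images']}
+
+preds = json.load(open('bbox.json'))  # [{image_id, category_id, bbox, score}, ...]
+by_img = defaultdict(list)
+for p in preds:
+    by_img[p['image_id']].append(p)
+
+os.makedirs('visdrone_results', exist_ok=True)
+for img_id, dets in by_img.items():
+    txt = os.path.splitext(img_names[img_id])[0] + '.txt'
+    with open(os.path.join('visdrone_results', txt), 'w') as f:
+        for d in dets:
+            x, y, w, h = d['bbox']  # COCO bbox为[x, y, w, h]
+            f.write(f"{x:.1f},{y:.1f},{w:.1f},{h:.1f},{d['score']:.4f},{d['category_id']},-1,-1\n")
+```
+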
    + + +## 子图训练,原图评估和拼图评估: + +| 模型 | 数据集 | SLICE_SIZE | OVERLAP_RATIO | 类别数 | mAPval
    0.5:0.95 | APval
0.5 | 下载链接 | 配置文件 |
+|:---------|:---------------:|:---------------:|:---------------:|:------:|:-----------------------:|:-------------------:|:---------:| :-----: |
+|PP-YOLOE-l(子图直接评估)| VisDrone-DET| 640 | 0.25 | 10 | 38.5(子图val) | 60.2 | [下载链接](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [配置文件](../ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml) |
+|PP-YOLOE-l(原图直接评估)| VisDrone-DET| 640 | 0.25 | 10 | 29.7(原图val) | 48.5 | [下载链接](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [配置文件](../ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml) |
+|PP-YOLOE-l (切图拼图评估)| VisDrone-DET| 640 | 0.25 | 10 | 37.3(原图val) | 59.5 | [下载链接](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [配置文件](../ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml) |
+
+**注意:**
+  - 上表中的模型均为使用**切图后的子图**训练;评估预测分为三种:**直接使用子图**、**直接使用原图**、以及**使用子图自动拼成原图**评估预测,AP精度为表中括号所注验证集上评估的结果。
+  - **SLICE_SIZE**表示使用SAHI工具切图后子图的边长大小,**OVERLAP_RATIO**表示切图的子图之间的重叠率。
+  - VisDrone-DET的模型与[切图模型](../README.md#切图模型)表格中的VisDrone-DET是**同一个模型权重**,但原图评估时AP精度是在**原图验证集**上评估的结果,需要提前将`ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml`里`EvalDataset`默认的子图验证集路径修改为以下**原图验证集路径**:
+  ```
+  EvalDataset:
+    !COCODataSet
+      image_dir: VisDrone2019-DET-val
+      anno_path: val.json
+      dataset_dir: dataset/visdrone
+  ```
+
+
+ 快速开始
+
+```shell
+# 训练
+python -m paddle.distributed.launch --log_dir=logs/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml --amp --eval
+# 子图直接评估
+python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams
+# 原图直接评估,注意需要提前将此yml中 `EvalDataset` 默认的子图验证集路径改为原图验证集路径:
+python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams
+# 切图拼图评估,加上 --slice_infer,注意使用的是带 _slice_infer 后缀的yml配置文件
+python tools/eval.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025_slice_infer.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --slice_infer
+# 切图拼图预测,加上 --slice_infer
+python tools/infer.py -c configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams --infer_img=demo/visdrone_0000315_01601_d_0000509.jpg --draw_threshold=0.25 --slice_infer
+```
+
+
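+
+表中 SLICE_SIZE=640、OVERLAP_RATIO=0.25 意味着相邻子图的步长约为 640×(1−0.25)=480 像素。以下示意脚本可以估算一张原图大致会被切成多少子图(边界处理方式与SAHI实现可能略有出入,仅用于理解这两个参数):
+
+```python
+import math
+
+# 估算切图子图数量:步长 = slice_size * (1 - overlap_ratio)(示意,与SAHI边界处理可能略有差异)
+def num_slices(img_w, img_h, slice_size=640, overlap_ratio=0.25):
+    stride = int(slice_size * (1 - overlap_ratio))  # 640 * 0.75 = 480
+    nx = max(1, math.ceil((img_w - slice_size) / stride) + 1)
+    ny = max(1, math.ceil((img_h - slice_size) / stride) + 1)
+    return nx * ny
+
+print(num_slices(2000, 1500))  # VisDrone常见原图约2000x1500时:4 x 3 = 12 个子图
+```
+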
+
+
+## 注意事项:
+  - PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。
+  - 具体使用教程请参考[ppyoloe](../../ppyoloe#getting-start)。
+  - MatlabAPI测试使用的是官方评测工具[VisDrone2018-DET-toolkit](https://github.com/VisDrone/VisDrone2018-DET-toolkit)。
+
+
+## PP-YOLOE+_SOD 部署模型
+
+| 网络模型 | 输入尺寸 | 导出后的权重(w/ NMS) | 导出后的权重(w/o NMS) | ONNX(w/ NMS) | ONNX(w/o NMS) |
+| :-------- | :--------: | :---------------------: | :---------------------: | :----------------: | :----------------: |
+| PP-YOLOE+_SOD-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_s_80e_visdrone_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_s_80e_visdrone_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_s_80e_visdrone_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_s_80e_visdrone_wo_nms.onnx) |
+| PP-YOLOE+_SOD-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_80e_visdrone_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_80e_visdrone_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_80e_visdrone_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_80e_visdrone_wo_nms.onnx) |
+| PP-YOLOE+_SOD-largesize-l | 1920 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/smalldet/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone_wo_nms.onnx) |
+
+
+## 测速
+
+1.参考[Paddle Inference文档](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python),下载并安装与你的CUDA、CUDNN和TensorRT版本相应的wheel包。
+测速需要设置`--run_benchmark=True`,并安装以下依赖:`pip install pynvml psutil GPUtil`。
+导出ONNX需要安装以下依赖:`pip install paddle2onnx`。
+
+2.运行以下命令导出**带NMS的模型和ONNX**,并使用TensorRT FP16进行推理和测速
+
+### 注意:
+
+- 由于NMS参数设置对速度影响极大,部署测速时可调整`keep_top_k`和`nms_top_k`,在仅降低约0.1 mAP精度的情况下加快预测速度,导出模型的时候也可这样设置:
+  ```
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000 # 10000
+    keep_top_k: 100 # 500
+    score_threshold: 0.01
+    nms_threshold: 0.6
+  ```
+
+```bash
+# 导出带NMS的模型
+python tools/export_model.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.pdparams trt=True
+
+# 导出带NMS的ONNX
+paddle2onnx --model_dir output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.onnx
+
+# 推理单张图片
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=gpu --run_mode=trt_fp16
+
+# 推理文件夹下的所有图片
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_dir=demo/ --device=gpu --run_mode=trt_fp16
+
+# 单张图片普通测速
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=gpu --run_benchmark=True
+
+# 单张图片TensorRT FP16测速
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=gpu --run_benchmark=True --run_mode=trt_fp16
+```
+
+3.运行以下命令导出**不带NMS的模型和ONNX**,并使用TensorRT FP16进行推理和测速,以及**ONNX下FP16测速**
+
+```bash
+# 导出不带NMS的模型
+python tools/export_model.py -c configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.pdparams trt=True exclude_nms=True
+
+# 导出不带NMS的ONNX
+paddle2onnx --model_dir output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.onnx
+
+# 推理单张图片
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=gpu --run_mode=trt_fp16
+
+# 推理文件夹下的所有图片
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_dir=demo/ --device=gpu --run_mode=trt_fp16
+
+# 单张图片普通测速
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=gpu --run_benchmark=True
+
+# 单张图片TensorRT FP16测速
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone --image_file=demo/visdrone_0000315_01601_d_0000509.jpg --device=gpu --run_benchmark=True --run_mode=trt_fp16
+
+# 单张图片ONNX TensorRT FP16测速
+/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x1920x1920 --fp16
+```
+
+**注意:**
+- TensorRT会根据网络的定义,执行针对当前硬件平台的优化,生成推理引擎并序列化为文件。该推理引擎只适用于当前软硬件平台。如果你的软硬件平台没有发生变化,你可以设置[enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/python/infer.py#L857)的参数`use_static=True`,这样生成的序列化文件将会保存在`output_inference`文件夹下,下次执行TensorRT时将加载保存的序列化文件。
+- PaddleDetection release/2.4及其之后的版本将支持NMS调用TensorRT,需要依赖PaddlePaddle release/2.3及其之后的版本。
+
+
+# 引用
+```
+@ARTICLE{9573394,
+  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  title={Detection and Tracking Meet Drones Challenge},
+  year={2021},
+  volume={},
+  number={},
+  pages={1-1},
+  doi={10.1109/TPAMI.2021.3119563}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_80e_visdrone.yml
new file mode 100644
index 0000000000000000000000000000000000000000..2eaa3164309a77ad441e9087ff65822d61ab278b
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_80e_visdrone.yml
@@ -0,0 +1,45 @@
+_BASE_: [
+  '../../datasets/visdrone_detection.yml',
+  '../../runtime.yml',
+  '../../ppyoloe/_base_/optimizer_300e.yml',
+  '../../ppyoloe/_base_/ppyoloe_crn.yml',
+  '../../ppyoloe/_base_/ppyoloe_reader.yml',
+]
+log_iter: 
100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_80e_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + + +TrainReader: + batch_size: 8 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..c16f2116a7015be193d052bc7c9811d2b4133de3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml @@ -0,0 +1,58 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_visdrone.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + + +CSPResNet: + use_alpha: True + + +LearningRate: + base_lr: 0.0025 + + +worker_num: 2 +eval_height: &eval_height 1920 +eval_width: &eval_width 1920 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + fuse_normalize: True diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_p2_alpha_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_p2_alpha_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..f0093beb0fdf66079483f3bec07a94cc5afec617 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_p2_alpha_80e_visdrone.yml @@ -0,0 +1,33 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_visdrone.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_p2_alpha_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + + +TrainReader: + batch_size: 4 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +LearningRate: + base_lr: 0.005 + 
+ +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + +PPYOLOEHead: + fpn_strides: [32, 16, 8, 4] diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..53b4c9f0d66b74e20897e0ae509176d1ab4beceb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml @@ -0,0 +1,65 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_visdrone.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + + +LearningRate: + base_lr: 0.005 + + +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + +PPYOLOEHead: + fpn_strides: [32, 16, 8, 4] + + +worker_num: 2 +eval_height: &eval_height 1920 +eval_width: &eval_width 1920 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920, 1984, 2048], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 1 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + fuse_normalize: True diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_s_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_s_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..d5fefe6a8ae2db471d15575c0fd4bc342141a480 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_s_80e_visdrone.yml @@ -0,0 +1,45 @@ +_BASE_: [ + '../../datasets/visdrone_detection.yml', + '../../runtime.yml', + '../../ppyoloe/_base_/optimizer_300e.yml', + '../../ppyoloe/_base_/ppyoloe_crn.yml', + '../../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_80e_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + + +TrainReader: + batch_size: 8 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_s_p2_alpha_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_s_p2_alpha_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..70e34cd05a872e168f88e7d59858d69305559e29 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_crn_s_p2_alpha_80e_visdrone.yml @@ -0,0 +1,32 @@ +_BASE_: [ + 'ppyoloe_crn_s_80e_visdrone.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_p2_alpha_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams + + +TrainReader: + batch_size: 4 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +LearningRate: + base_lr: 0.005 + +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + +PPYOLOEHead: + fpn_strides: [32, 16, 8, 4] diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_crn_l_largesize_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_crn_l_largesize_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..fddfe46a1afe02ff3955d0b22f41efd511cb7722 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_crn_l_largesize_80e_visdrone.yml @@ -0,0 +1,58 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_visdrone.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_plus_crn_l_largesize_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams + + +CSPResNet: + use_alpha: True + + +LearningRate: + base_lr: 0.0025 + + +worker_num: 2 +eval_height: &eval_height 1920 +eval_width: &eval_width 1920 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + fuse_normalize: True diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..e0c6e41d3dc30ff78cae57596bca86521a008099 --- /dev/null +++ 
b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml @@ -0,0 +1,58 @@ +_BASE_: [ + '../../datasets/visdrone_detection.yml', + '../../runtime.yml', + '../../ppyoloe/_base_/optimizer_80e.yml', + '../../ppyoloe/_base_/ppyoloe_plus_crn.yml', + '../../ppyoloe/_base_/ppyoloe_plus_reader.yml' +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_plus_sod_crn_l_80e_visdrone/model_final + +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_l_80e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +TrainReader: + batch_size: 8 + +EvalReader: + batch_size: 1 + +TestReader: + batch_size: 1 + fuse_normalize: True + + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +CustomCSPPAN: + num_layers: 4 + use_trans: True + +PPYOLOEHead: + reg_range: [-2,8] + static_assigner_epoch: -1 + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner_CR + center_radius: 1 + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..d7d0865132b10fdf872af4bb70526d17333e0d71 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone.yml @@ -0,0 +1,56 @@ +_BASE_: [ + 'ppyoloe_plus_sod_crn_l_80e_visdrone.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_plus_sod_crn_l_largesize_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams + +PPYOLOEHead: + reg_range: [-2,20] + static_assigner_epoch: -1 + +LearningRate: + base_lr: 0.00125 + +worker_num: 2 +eval_height: &eval_height 1920 +eval_width: &eval_width 1920 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 1 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 + fuse_normalize: True diff --git a/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_s_80e_visdrone.yml b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_s_80e_visdrone.yml new file mode 100644 index 
0000000000000000000000000000000000000000..fd444eab6f238e266839caaf91f65d65be159ef8
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smalldet/visdrone/ppyoloe_plus_sod_crn_s_80e_visdrone.yml
@@ -0,0 +1,58 @@
+_BASE_: [
+  '../../datasets/visdrone_detection.yml',
+  '../../runtime.yml',
+  '../../ppyoloe/_base_/optimizer_80e.yml',
+  '../../ppyoloe/_base_/ppyoloe_plus_crn.yml',
+  '../../ppyoloe/_base_/ppyoloe_plus_reader.yml'
+]
+log_iter: 100
+snapshot_epoch: 10
+weights: output/ppyoloe_plus_sod_crn_s_80e_visdrone/model_final
+
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_plus_crn_s_80e_coco.pdparams
+depth_mult: 0.33
+width_mult: 0.50
+
+TrainReader:
+  batch_size: 8
+
+EvalReader:
+  batch_size: 1
+
+TestReader:
+  batch_size: 1
+  fuse_normalize: True
+
+
+epoch: 80
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+    - !CosineDecay
+      max_epochs: 96
+    - !LinearWarmup
+      start_factor: 0.
+      epochs: 1
+
+CustomCSPPAN:
+  num_layers: 4
+  use_trans: True
+
+PPYOLOEHead:
+  reg_range: [-2,8]
+  static_assigner_epoch: -1
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+  assigner:
+    name: TaskAlignedAssigner_CR
+    center_radius: 1
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 10000
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.6
diff --git a/PaddleDetection-release-2.6/configs/smrt/DataAnalysis.md b/PaddleDetection-release-2.6/configs/smrt/DataAnalysis.md
new file mode 100644
index 0000000000000000000000000000000000000000..66da22f43d9ba9494c35ee0fa0285aa45099399f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smrt/DataAnalysis.md
@@ -0,0 +1,68 @@
+# 数据分析功能说明
+
+为了更好地帮助用户进行数据分析,从而推荐更合适的模型,我们推出了**数据分析**功能:用户不需要上传原图,只需要上传标注文件即可进一步分析数据特点。
+
+当前支持格式有:
+* LabelMe标注数据格式
+* 精灵标注数据格式
+* LabelImg标注数据格式
+* VOC数据格式
+* COCO数据格式
+* Seg数据格式
+
+## LabelMe标注数据格式
+
+1. 需要选定包含标注文件的zip格式压缩包。zip格式压缩包中包含一个annotations文件夹,文件夹中的内容为与标注图像相同数量的json文件,每一个json文件除后缀外与对应的图像同名。
+2. 支持检测与分割任务。若提供的标注信息与所选择的任务类型不匹配,则将提示错误。
+3. 对于检测任务,需提供rectangle类型标注信息;对于分割任务,需提供polygon类型标注信息。
+
    + +
    + +## 精灵标注数据格式 + +1. 需要选定包含标注文件的zip格式压缩包。zip格式压缩包中包含一个annotations文件夹,文件夹中的内容为与标注图像相同数量的json文件,每一个json文件除后缀外与对应的图像同名。 +2. 支持检测与分割任务。若提供的标注信息与所选择的任务类型不匹配,则将提示错误。 +3. 对于检测任务,需提供bndbox或polygon类型标注信息;对于分割任务,需提供polygon类型标注信息。 +
    + +
    + +## LabelImg标注数据格式 + +1. 需要选定包含标注文件的zip格式压缩包。zip格式压缩包中包含一个annotations文件夹,文件夹中的内容为与标注图像相同数量的xml文件,每一个xml文件除后缀外与对应的图像同名。 +2. 仅支持检测任务。 +3. 标注文件中必须提供bndbox字段信息;segmentation字段是可选的。 + +
    + +
    + +## VOC数据格式 + +1. 需要选定包含标注文件的zip格式压缩包。zip格式压缩包中包含一个annotations文件夹,文件夹中的内容为与标注图像相同数量的xml文件,每一个xml文件除后缀外与对应的图像同名。 +2. 仅支持检测任务。 +3. 标注文件中必须提供bndbox字段信息;segmentation字段是可选的。 +
    + +
    + +## COCO数据格式 + +1. 需要选定包含标注文件的zip格式压缩包。zip格式压缩包中包含一个annotations文件夹,文件夹中仅存在一个名为annotation.json的文件。 +2. 支持检测与分割任务。若提供的标注信息与所选择的任务类型不匹配,则将提示错误。 +3. 对于检测任务,标注文件中必须包含bbox字段,segmentation字段是可选的;对于分割任务,标注文件中必须包含segmentation字段。 +
    + +
    + + +## Seg数据格式 + +1. 需要选定包含标注文件的zip格式压缩包。zip格式压缩包中包含一个annotations文件夹,文件夹中的内容为与标注图像相同数量的png文件,每一个png文件除后缀外与对应的图像同名。 +2. 仅支持分割任务。 +3. 标注文件需要与原始图像在像素上严格保持一一对应,格式只可为png(后缀为.png或.PNG)。标注文件中的每个像素值为[0,255]区间内从0开始依序递增的整数ID,除255外,标注ID值的增加不能跳跃。在标注文件中,使用255表示需要忽略的像素,使用0表示背景类标注。 + +
    + +
diff --git a/PaddleDetection-release-2.6/configs/smrt/README.md b/PaddleDetection-release-2.6/configs/smrt/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d9ffbcc1275dc5cb55d07bbe88f030defc55ddf5
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smrt/README.md
@@ -0,0 +1,216 @@
+# 飞桨产业模型选型工具PaddleSMRT
+
+## 一、项目介绍
+
+PaddleSMRT (Paddle Sense Model Recommend Tool) 是飞桨结合产业落地经验推出的产业模型选型工具。在项目落地过程中,用户根据自身的实际情况输入需求,即可得到对应的算法模型、部署硬件以及教程文档信息。同时,为了推荐更加精准,工具增加了数据分析功能:用户上传自己的标注文件后,系统可以自动分析数据特点,例如数据分布不均衡、小目标、密集型等,从而提供更加精准的模型以及优化策略,更好地符合场景需求。
+
+飞桨官网使用[链接](https://www.paddlepaddle.org.cn/smrt)
+
+本文档主要介绍PaddleSMRT在检测方向上是如何进行模型选型推荐的,以及推荐模型的使用方法。分割方向模型介绍请参考[文档](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.5/configs/smrt)
+
+## 二、数据介绍
+
+PaddleSMRT结合产业真实场景,通过比较检测算法效果,向用户推荐最适合的模型。目前PaddleSMRT覆盖工业质检、城市安防两大场景,下面介绍PaddleSMRT进行算法对比所使用的数据集。
+
+### 1. 新能源电池质检数据集
+
+数据集为新能源电池组件质检数据集,包含15021张图片、22045个标注框,覆盖45种缺陷类型,例如掉胶、裂纹、划痕等。
+
+新能源电池数据展示图:
+
    + + +
+
+数据集特点为:
+
+1. 类别分布均衡
+2. 属于小目标数据
+3. 非密集型数据
+
+### 2. 铝件质检数据集
+
+数据集为铝件生产过程中的质检数据集,包含11293张图片、43157个标注框,覆盖5种缺陷类型,例如划伤、压伤、起皮等。
+
+铝件质检数据展示图:
+
    + + +
    + + +数据集特点为: + +1. 类别分布不均衡 +2. 属于小目标数据 +3. 非密集型数据 + + +### 3. 人车数据集 + +数据集包含2600张人工标注的两点anchor box标签。标签包括以下人和车的类别共22种: +其中行人包括普通行人、3D 假人、坐着的人、骑车的人;车辆包括两厢车、三厢车、小型客车、小货车、皮卡车、轻卡、厢式货车、牵引车、水泥车、工程车辆、校车、中小型客车、大型单层客车、小型电动车、摩托车、自行车、三轮车以及其它特殊车辆。 + +人车数据展示图: + +
    + + +
    + + +数据集特点为: + +1. 类别分布不均衡 +2. 属于小目标数据 +3. 非密集型数据 + +**说明:** + +数据集特点判断依据如下: + +- 数据分布不均衡:采样1000张图片,不同类别样本个数标准差大于400 +- 小目标数据集:相对大小小于0.1或绝对大小小于32像素的样本个数比例大于30% +- 密集型数据集: + +``` + 密集目标定义:周围目标距离小于自身大小两倍的个数大于2; + + 密集图片定义:密集目标个数占图片目标总数50%以上; + + 密集数据集定义:密集图片个数占总个数30%以上 + +``` + +为了更好的帮助用户选择模型,我们也提供了丰富的数据分析功能,用户只需要上传标注文件(不需要原图)即可了解数据特点分布和模型优化建议 + +
    + +
    + +## 三、推荐模型使用全流程 + +通过模型选型工具会得到对应场景和数据特点的检测模型配置,例如[PP-YOLOE](./ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml) + +该配置文件的使用方法如下 + +### 1. 环境配置 + +首先需要安装PaddlePaddle + +```bash +# CUDA10.2 +pip install paddlepaddle-gpu==2.2.2 -i https://mirror.baidu.com/pypi/simple + +# CPU +pip install paddlepaddle==2.2.2 -i https://mirror.baidu.com/pypi/simple +``` + +然后安装PaddleDetection和相关依赖 + +```bash +# 克隆PaddleDetection仓库 +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# 安装其他依赖 +cd PaddleDetection +pip install -r requirements.txt +``` + +详细安装文档请参考[文档](../../docs/tutorials/INSTALL_cn.md) + +### 2. 数据准备 + +用户需要准备训练数据集,建议标注文件使用COCO数据格式。如果使用lableme或者VOC数据格式,先使用[格式转换脚本](../../tools/x2coco.py)将标注格式转化为COCO,详细数据准备文档请参考[文档](../../docs/tutorials/data/PrepareDataSet.md) + +本文档以新能源电池工业质检子数据集为例展开,数据下载[链接](https://bj.bcebos.com/v1/paddle-smrt/data/battery_mini.zip) + +数据储存格式如下: + +``` +battery_mini +├── annotations +│   ├── test.json +│   └── train.json +└── images + ├── Board_daowen_101.png + ├── Board_daowen_109.png + ├── Board_daowen_117.png + ... +``` + + + +### 3. 模型训练/评估/预测 + +使用经过模型选型工具推荐的模型进行训练,目前所推荐的模型均使用**单卡训练**,可以在训练的过程中进行评估,模型默认保存在`./output`下 + +```bash +python tools/train.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml --eval +``` + +如果训练过程出现中断,可以使用-r命令恢复训练 + +```bash +python tools/train.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml --eval -r output/ppyoloe_crn_m_300e_battery_1024/9.pdparams +``` + +如果期望单独评估模型训练精度,可以使用`tools/eval.py` + +```bash +python tools/eval.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams +``` + +完成训练后,可以使用`tools/infer.py`可视化训练效果 + +```bash +python tools/infer.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams --infer_img=images/Board_diaojiao_1591.png +``` + +更多模型训练参数请参考[文档](../../docs/tutorials/GETTING_STARTED_cn.md) + +### 4. 模型导出部署 + +完成模型训练后,需要将模型部署到1080Ti,2080Ti或其他服务器设备上,使用Paddle Inference完成C++部署 + +首先需要将模型导出为部署时使用的模型和配置文件 + +```bash +python tools/export_model.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams +``` + +接下来可以使用PaddleDetection中的部署代码实现C++部署,详细步骤请参考[文档](../../deploy/cpp/README.md) + +如果期望使用可视化界面的方式进行部署,可以参考下面部分的内容。 + +## 四、部署demo + +为了更方便大家部署,我们也提供了完备的可视化部署Demo,欢迎尝试使用 + +* [Windows Demo下载地址](https://github.com/PaddlePaddle/PaddleX/tree/develop/deploy/cpp/docs/csharp_deploy) + +
    + +
    + +* [Linux Demo下载地址](https://github.com/cjh3020889729/The-PaddleX-QT-Visualize-GUI) + +
    + +
+
+## 五、场景范例
+
+为了方便大家更好地进行产业落地,PaddleSMRT也提供了详细的应用范例,欢迎大家使用。
+
+* 工业视觉
+  * [工业缺陷检测](https://aistudio.baidu.com/aistudio/projectdetail/2598319)
+  * [表计读数](https://aistudio.baidu.com/aistudio/projectdetail/2598327)
+  * [钢筋计数](https://aistudio.baidu.com/aistudio/projectdetail/2404188)
+* 城市
+  * [行人计数](https://aistudio.baidu.com/aistudio/projectdetail/2421822)
+  * [车辆计数](https://aistudio.baidu.com/aistudio/projectdetail/3391734?contributionType=1)
+  * [安全帽检测](https://aistudio.baidu.com/aistudio/projectdetail/3944737?contributionType=1)
diff --git a/PaddleDetection-release-2.6/configs/smrt/images/00362.jpg b/PaddleDetection-release-2.6/configs/smrt/images/00362.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..da4ab37d5cb5501e3c1471b30a3f465dd9b0a88f
Binary files /dev/null and b/PaddleDetection-release-2.6/configs/smrt/images/00362.jpg differ
diff --git a/PaddleDetection-release-2.6/configs/smrt/images/Board_diaojiao_1591.png b/PaddleDetection-release-2.6/configs/smrt/images/Board_diaojiao_1591.png
new file mode 100644
index 0000000000000000000000000000000000000000..0ec35b9450209fba4b9579fcc325e70fc5f63ddd
Binary files /dev/null and b/PaddleDetection-release-2.6/configs/smrt/images/Board_diaojiao_1591.png differ
diff --git a/PaddleDetection-release-2.6/configs/smrt/images/UpCoa_liewen_163.png b/PaddleDetection-release-2.6/configs/smrt/images/UpCoa_liewen_163.png
new file mode 100644
index 0000000000000000000000000000000000000000..294c29b4ed04c81d672cbe72ddaa4ccb3e301f67
Binary files /dev/null and b/PaddleDetection-release-2.6/configs/smrt/images/UpCoa_liewen_163.png differ
diff --git a/PaddleDetection-release-2.6/configs/smrt/images/lvjian1_0.jpg b/PaddleDetection-release-2.6/configs/smrt/images/lvjian1_0.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..dbf0dfaa769f9a72bd7825f8589fffa5aca3ac6e
Binary files /dev/null and b/PaddleDetection-release-2.6/configs/smrt/images/lvjian1_0.jpg differ
diff --git a/PaddleDetection-release-2.6/configs/smrt/images/lvjian1_10.jpg b/PaddleDetection-release-2.6/configs/smrt/images/lvjian1_10.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..25467e8174b27df7bf43b33795eb7ea1af605813
Binary files /dev/null and b/PaddleDetection-release-2.6/configs/smrt/images/lvjian1_10.jpg differ
diff --git a/PaddleDetection-release-2.6/configs/smrt/images/renche_00002.jpg b/PaddleDetection-release-2.6/configs/smrt/images/renche_00002.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9446db44df96cf18ef7871c345a8010cdfec49df
Binary files /dev/null and b/PaddleDetection-release-2.6/configs/smrt/images/renche_00002.jpg differ
diff --git a/PaddleDetection-release-2.6/configs/smrt/images/renche_00204.jpg b/PaddleDetection-release-2.6/configs/smrt/images/renche_00204.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2c46e933b970411eca850195b59f1c477d5d2a5e
Binary files /dev/null and b/PaddleDetection-release-2.6/configs/smrt/images/renche_00204.jpg differ
diff --git a/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_battery.yml b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_battery.yml
new file mode 100644
index 0000000000000000000000000000000000000000..601dd1915ee65fb452340c783be8ca1cab905ce1
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_battery.yml
@@ -0,0 +1,162 @@
+weights: output/picodet_l_1024_coco_lcnet_battery/model_final
+pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams
https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when exporting the model. + nms: True # Whether NMS is included in the network when exporting the model. + benchmark: False # Used to test model performance; if set to `True`, post-processing and NMS will not be exported.
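+# A minimal export sketch for this config (paths assumed from the `weights` value at the
+# top of this file; uses PaddleDetection's standard export entry point):
+#   python tools/export_model.py -c configs/smrt/picodet/picodet_l_1024_coco_lcnet_battery.yml \
+#     -o weights=output/picodet_l_1024_coco_lcnet_battery/model_final
+# Setting `benchmark: True` above exports the graph without post-processing and NMS,
+# which is the variant to time when measuring raw network latency.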
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..734f1bee70ce4c2708f846af4d10e350fa6a329f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_lvjian1.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_1024_coco_lcnet_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output 
+snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when exporting the model. + nms: True # Whether NMS is included in the network when exporting the model. + benchmark: False # Used to test model performance; if set to `True`, post-processing and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_renche.yml b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..cdebd4ba4ae55e40c940b230bd61528a39fe0fcf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_1024_coco_lcnet_renche.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_1024_coco_lcnet_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: 
+ inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when exporting the model. + nms: True # Whether NMS is included in the network when exporting the model. + benchmark: False # Used to test model performance; if set to `True`, post-processing and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_battery.yml b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..8200439dc928fea3d0c091d98acee30117a33be1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_battery.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_640_coco_lcnet_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} 
+ - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when exporting the model. + nms: True # Whether NMS is included in the network when exporting the model. + benchmark: False # Used to test model performance; if set to `True`, post-processing and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..6000902f03363a5763e7c26fd38565f85dcb2388 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_lvjian1.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_640_coco_lcnet_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 50 +LearningRate: + base_lr: 0.006 + 
schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when exporting the model. + nms: True # Whether NMS is included in the network when exporting the model. + benchmark: False # Used to test model performance; if set to `True`, post-processing and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..fc1ce195ea4c5dce2bb60358f0509eb5e01a50c4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_640_coco_lcnet_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + 
anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when exporting the model. + nms: True # Whether NMS is included in the network when exporting the model. + benchmark: False # Used to test model performance; if set to `True`, post-processing and NMS will not be exported.
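+# A minimal train/eval sketch for this config (single GPU; assumes the dataset is laid
+# out under dataset/renche as configured in the TrainDataset/EvalDataset sections above):
+#   python tools/train.py -c configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml --eval
+#   python tools/eval.py -c configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml \
+#     -o weights=output/picodet_l_640_coco_lcnet_renche/model_final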
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..507d0088cbc343e7ca281f8c8f54aa169e135a43 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery.yml @@ -0,0 +1,154 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..cd8b6d49f2d5b3a2de4a72e8bfe2064bcfcfc4a7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery_1024.yml @@ -0,0 +1,154 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + 
anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..0c09049a60423464a8c14666f7c86098564482d8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_1024.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_640.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..f7dc75f975efdbd08ae4f96ce2f54cc55a4569f4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_640.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..96ab192171798f7947ee857b8291152e5933c57a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_1024.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: 
dataset/renche + +TestDataset: + !ImageFolder + anno_path: dataset/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_640.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..ccc2162de1c995fdede25ccfa337d6136d14b3df --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_640.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: dataset/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..e886dd6c10bd03bcb2ef64f1a5f54ba0e923efcc --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery.yml @@ -0,0 +1,154 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..d3b7ac28fc68dfb85c0bd0d67f61b77b844bd034 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery_1024.yml @@ -0,0 +1,154 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + 
anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
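+# The YOLOv3Head below keeps the default COCO anchors, while the lvjian1 configs use
+# anchors re-fitted to that dataset's box statistics. A sketch for regenerating anchors
+# on a custom dataset, assuming PaddleDetection's k-means anchor clustering tool:
+#   python tools/anchor_cluster.py -c configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery_1024.yml \
+#     -n 9 -s 1024 -m v2 -i 1000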
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..6138e875e83a9a91708159b7f99e692c901f4f1b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_1024.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_640.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..5da1090006a2e1ba967c0870ba7311306ec1a164 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_640.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..611ea34a6b6bed018698f7c2fc7f1e6cf6528988 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_1024.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: 
dataset/renche + +TestDataset: + !ImageFolder + anno_path: dataset/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
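As a worked illustration of the LearningRate block above (epoch: 100, base_lr: 0.0002, a single PiecewiseDecay milestone at epoch 80, and a 1000-step LinearWarmup), the sketch below shows how the two schedulers compose. It is an illustration under a hypothetical steps_per_epoch, not PaddleDetection's scheduler code:

    def lr_at(global_step, steps_per_epoch, base_lr=0.0002,
              warmup_steps=1000, start_factor=0.0, milestones=(80,), gamma=0.1):
        # PiecewiseDecay: multiply base_lr by gamma once per milestone epoch passed.
        lr = base_lr
        for m in milestones:
            if global_step >= m * steps_per_epoch:
                lr *= gamma
        # LinearWarmup: ramp linearly from start_factor * lr up to lr.
        if global_step < warmup_steps:
            lr *= start_factor + (1.0 - start_factor) * global_step / warmup_steps
        return lr

    # e.g. with a hypothetical 500 steps/epoch: step 500 -> 1e-4 (mid-warmup),
    # step 10_000 -> 2e-4, step 45_000 (epoch 90, past the milestone) -> 2e-5.

Note that the 20-epoch lvjian1 variants of this config keep the same milestone at epoch 80, which is never reached, so there the learning rate simply stays at base_lr after warmup.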
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_640.yml b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..37fb675f0acc4585f5ded137db46473b57c517c0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_640.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/coco/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/coco/renche + +TestDataset: + !ImageFolder + anno_path: dataset/coco/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..bc58d999cabfdfb8f2252ca0e34c73e118ba70e9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. 
+ epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..027e38e202eaff50e69ac0d3204541d5ae7a08a6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: 
dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..272caf679a296cb4375e3628aa070fd71cec9931 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..38a14259f54dd7a515aa68e5a5f7a79909f5a40b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_lvjian1_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 
'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: images
+    anno_path: val.json
+    dataset_dir: dataset/slice_lvjian1_data/eval
+
+TestDataset:
+  !ImageFolder
+    anno_path: val.json
+    dataset_dir: dataset/slice_lvjian1_data/eval
+
+epoch: 30
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+    - !CosineDecay
+      max_epochs: 36
+    - !LinearWarmup
+      start_factor: 0.
+      epochs: 3
+
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - RandomFlip: {}
+  batch_transforms:
+    - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - PadGT: {}
+  batch_size: 8
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+  collate_batch: true
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 2
+
+TestReader:
+  inputs_def:
+    image_shape: [3, *eval_height, *eval_width]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+use_gpu: true
+use_xpu: false
+log_iter: 100
+save_dir: output
+snapshot_epoch: 1
+print_flops: false
+
+# Exporting the model
+export:
+  post_process: True  # Whether post-processing is included in the network when exporting the model.
+  nms: True  # Whether NMS is included in the network when exporting the model.
+  benchmark: False  # Used to test model performance; if set `True`, post-processing and NMS will not be exported.
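For readers unfamiliar with the reader pipelines above, the following is a simplified stand-in for what Decode -> Resize -> NormalizeImage -> Permute does to a single image. The transform names follow the config; the OpenCV-based implementation and the BGR-to-RGB step are assumptions, not PaddleDetection's own code:

    import cv2
    import numpy as np

    MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def preprocess(path, size=(1024, 1024)):
        img = cv2.imread(path)                      # Decode (OpenCV loads BGR)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # readers work in RGB
        img = cv2.resize(img, size, interpolation=cv2.INTER_CUBIC)  # interp: 2 == cubic
        img = img.astype(np.float32) / 255.0        # is_scale: True
        img = (img - MEAN) / STD                    # NormalizeImage
        return img.transpose(2, 0, 1)               # Permute: HWC -> CHW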
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..80c7bac76453e407d743a4e677257ebd4e2505b3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. 
+ benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..2151ecf711c0f52560f9318085f0fee2de7b8a85 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. 
+ epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..8902f32ec42da89643b85f0743799555c3abc8ec --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + 
base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..f244c1dd13381d360440a1c7705c8f5f81abf576 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + 
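The depth_mult/width_mult pair at the top of these "m" configs shrinks the CSPResNet spec listed further down (layers [3, 6, 6, 3], channels [64, 128, 256, 512, 1024]). A sketch of the intended scaling, assuming simple rounding (PaddleDetection's exact rounding rule may differ):

    def scale_backbone(layers, channels, depth_mult, width_mult):
        # depth_mult thins the block counts, width_mult thins the channel widths
        layers = [max(round(n * depth_mult), 1) for n in layers]
        channels = [max(round(c * width_mult), 1) for c in channels]
        return layers, channels

    # "m" model: depth_mult 0.67, width_mult 0.75
    print(scale_backbone([3, 6, 6, 3], [64, 128, 256, 512, 1024], 0.67, 0.75))
    # -> ([2, 4, 4, 2], [48, 96, 192, 384, 768])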
+TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..7563756955b97f722a9c099dfb8ce57a90b6c6f7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + 
!COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.002 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 16 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..d15e07f8e88cd1f9d592296e71cc587a6e6892ef --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_lvjian1_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + 
dataset_dir: dataset/slice_lvjian1_data/train
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: images
+    anno_path: val.json
+    dataset_dir: dataset/slice_lvjian1_data/eval
+
+TestDataset:
+  !ImageFolder
+    anno_path: val.json
+    dataset_dir: dataset/slice_lvjian1_data/eval
+
+epoch: 30
+LearningRate:
+  base_lr: 0.0015
+  schedulers:
+    - !CosineDecay
+      max_epochs: 36
+    - !LinearWarmup
+      start_factor: 0.
+      epochs: 3
+
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - RandomFlip: {}
+  batch_transforms:
+    - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - PadGT: {}
+  batch_size: 8
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+  collate_batch: true
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 2
+
+TestReader:
+  inputs_def:
+    image_shape: [3, *eval_height, *eval_width]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+use_gpu: true
+use_xpu: false
+log_iter: 100
+save_dir: output
+snapshot_epoch: 2
+print_flops: false
+
+# Exporting the model
+export:
+  post_process: True  # Whether post-processing is included in the network when exporting the model.
+  nms: True  # Whether NMS is included in the network when exporting the model.
+  benchmark: False  # Used to test model performance; if set `True`, post-processing and NMS will not be exported.
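The CosineDecay-plus-LinearWarmup schedule used throughout these PP-YOLOE configs can be summarized as follows; this is a per-epoch sketch of the curve's shape, not the per-iteration scheduler itself:

    import math

    def lr_at_epoch(e, base_lr=0.0015, warmup_epochs=3, max_epochs=36, start_factor=0.0):
        if e < warmup_epochs:
            # LinearWarmup: ramp from start_factor * base_lr up to base_lr
            return base_lr * (start_factor + (1 - start_factor) * e / warmup_epochs)
        # CosineDecay: anneal from base_lr toward 0 over max_epochs
        return base_lr * 0.5 * (1 + math.cos(math.pi * e / max_epochs))

Because epoch: 30 stops short of max_epochs: 36, training ends while the cosine is still above zero (roughly 1e-4 here for base_lr 0.0015) rather than annealing fully to zero, a pattern these configs apply consistently.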
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..a65cbdf540bd9e48800610516e0978d9f51b2c41 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. 
+ benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..0427b81d4f8eeca71f6245a583f0f0a2d99f3569 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: /paddle/dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: /paddle/dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: /paddle/dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. 
+ epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..1ef01cfc633414a9e4f71bbfc656a116c76fc7bf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + 
base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..42d30e00ff940b49b778306fa45562cf87f36396 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: 
dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..b6155305fc4233b1c754dae4f2bb6cc368aa55f8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + 
+EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.002 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 16 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..72a184127f10d32176a90bd0045d20a6d88457fa --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_lvjian1_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json 
+ dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.003 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 16 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
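The `export` block that closes `ppyoloe_crn_s_300e_lvjian_1024.yml` above controls what is baked into the exported inference graph: with `post_process` and `nms` both `True`, the saved model emits final boxes directly. As a usage sketch, assuming a PaddleDetection 2.6 checkout with this config in place and trained weights at the path named by the `weights:` key, export goes through the standard `tools/export_model.py` entry point, and individual config keys can be overridden on the command line via `-o`:

```shell
# Export the 1024x1024 lvjian1 model for deployment; post-processing and
# NMS stay inside the graph because export.post_process and export.nms
# are True in the config above.
python tools/export_model.py \
    -c configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml \
    -o weights=output/ppyoloe_crn_s_300e_lvjian1_1024/model_final.pdparams \
    --output_dir=inference_model

# Benchmark variant: strip post-processing and NMS from the exported
# graph (assumes the nested export.benchmark key is reachable through
# the -o override mechanism, as with other dotted config keys).
python tools/export_model.py \
    -c configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml \
    -o weights=output/ppyoloe_crn_s_300e_lvjian1_1024/model_final.pdparams export.benchmark=True
```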
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..df1939153b2672222fd9f3589da89ac3aa1a5a93 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model.
+ benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..07310a067794e789bd58172381cfecf37a1b3f03 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. 
+ epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..ba94ad254319fa8fa2ca1cb3b982c7f4b5508c5f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + 
base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..961d7823a32e8ee377274f1bf65399ab21b5a321 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: 
dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..7a47aded5e8cea1ded2d916509f54d53157dd7be --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + 
+EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..c1e70d2198f2af380c5cc9ab80704a9861f11c00 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + 
dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
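A detail worth noting in the schedule of `ppyoloe_crn_x_300e_lvjian1_1024.yml` above: `!LinearWarmup` ramps the learning rate from `start_factor: 0.` over the first 3 epochs, then `!CosineDecay` follows a cosine curve sized for `max_epochs: 36`; since training stops at `epoch: 30`, the rate never decays fully to zero, matching the usual PP-YOLOE convention of setting `max_epochs` to 1.2x the training length. Below is a training and evaluation sketch; the GPU ids are an assumption, and since the `weights:` key of this `_1024` file points at `output/ppyoloe_crn_x_300e_lvjian1/` (without the `_1024` suffix), the eval path mirrors that:

```shell
# Distributed training; batch_size: 4 per card on 4 assumed GPUs gives an
# effective batch of 16 for base_lr 0.0005. --eval runs COCO evaluation
# every snapshot_epoch (2) epochs; --amp enables mixed-precision training.
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py \
    -c configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml \
    --eval --amp

# Standalone COCO-metric evaluation of the final checkpoint.
python tools/eval.py \
    -c configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml \
    -o weights=output/ppyoloe_crn_x_300e_lvjian1/model_final.pdparams
```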
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..be3f79044af32b12bba0e5aa13059585fd65d9ab --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. 
+ benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche_1024.yml b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..250251c32504ced54291d2b5449e1ffdafb8b3ea --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. 
+ epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..128328bf3853bff327b47bb1945908c338b3dcb8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + 
- !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [1350,1425,1500,1575,1650], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..c6b4b8ce5c6ef099f9ba3ef9e603ddc4e273e413 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 
'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [1350,1425,1500,1575,1650], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..ef11461339080740eb3ac2414eda709f10b00ddb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml @@ -0,0 
+1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [1350,1425,1500,1575,1650], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml 
b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..20025b07da573fbb7cff5936c50509358b85aa99 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 
0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..6e0352a1952c58a3d168787364f0b2b77fede322 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + 
train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..448b65db663322476f7f0db79fcd5e6a52982720 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + 
return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..7e6b8871b9525a0f6775266298872178cf5b49aa --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output 
+snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..190ed8fa183656127445602792df861b8018e938 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, 
target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..947c6e43bc6ff42f150566b7ef1e9713cd749926 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - 
NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..148a0459e8e8f5aea9b74d1e943852c82f524127 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, 
mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..9362638d3f05b30c2274e199410e7fa509e0eb10 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - 
!LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..bf881d55a0808df85739784270373e1ada4d9f3a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + 
image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..688ea9bfdf6160715343d18c5b9ea83a27b6bc8e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery/model_final +pretrain_weights: 
https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml new file mode 
100644 index 0000000000000000000000000000000000000000..4b7d8e7d85a3cf61aadf2bfb276f1d325a712808 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: 
RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..39eca1f8ee87f21026ee483dc6c69e6f30ac9bf7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + 
roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..7a982c06df9f32675c3de251f96e0b6477ea0943 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + 
negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..39020c77e8ef1d47e9b3df08417f7f4c6a765249 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + 
return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..e27315c3572f3c89f1f98fc250e50a3d23661250 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + 
+architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/sniper/README.md b/PaddleDetection-release-2.6/configs/sniper/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3aadee560ee6d7cd3691075db016a94fec7e0ea3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/README.md @@ -0,0 +1,67 @@ +English | [简体中文](README_cn.md) + +# SNIPER: Efficient Multi-Scale Training + +## Model Zoo + +| Sniper | GPU number | images/GPU | Model | Dataset | Schedulers | Box AP | Download | Config | +| :---------------- | :-------------------: | :------------------: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| w/o | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 23.3 | [Download Link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_visdrone.pdparams) | [config](./faster_rcnn_r50_fpn_1x_visdrone.yml) | +| w/ | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 29.7 | [Download Link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_sniper_visdrone.pdparams) | [config](./faster_rcnn_r50_fpn_1x_sniper_visdrone.yml) | + +### Note +- Here we use the VisDrone dataset and detect 9 object categories, including `person, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor`. +- Deployment is not supported yet because of the crop behavior of the SNIPER dataset. + +## Getting Started +### 1. Training +a. optional: run `tools/sniper_params_stats.py` to get `image_target_sizes`, `valid_box_ratio_ranges`, `chip_target_size` and `chip_target_stride`, then modify these params in configs/datasets/sniper_coco_detection.yml (a sketch of these fields follows the command below). +```bash +python tools/sniper_params_stats.py FasterRCNN annotations/instances_train2017.json +```
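+A minimal sketch of how those fields sit in `configs/datasets/sniper_coco_detection.yml` is shown below. The values are illustrative assumptions only; substitute the numbers printed by the command above.
+```yaml
+# Illustrative sketch: the field values below are assumed placeholders, not tuned settings.
+TrainDataset:
+  !SniperCOCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: dataset/coco
+    image_target_sizes: [2000, 1000]                 # assumed: target long side per training scale
+    valid_box_ratio_ranges: [[-1, 0.1], [0.08, -1]]  # assumed: valid gt-box ratio range per scale
+    chip_target_size: 500                            # assumed: size of each cropped chip
+    chip_target_stride: 200                          # assumed: stride between neighboring chips
+```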
+b. optional: train a detector to get negative proposals. +```bash +python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --save_proposals --proposals_path=./proposals.json &>sniper.log 2>&1 & +``` +c. train the model +```bash +python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --eval &>sniper.log 2>&1 & +``` + +### 2. Evaluation +Evaluate SNIPER on a custom dataset on a single GPU with the following command: +```bash +# use the checkpoint saved during training +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final +``` + +### 3. Inference +Run inference on a single GPU with the following commands; use `--infer_img` to run inference on a single image and `--infer_dir` to run inference on all images in a directory. + +```bash +# inference on a single image +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_img=demo/P0861__1.0__1154___824.png + +# inference on all images in the directory +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_dir=demo +``` + +## Citations +``` +@misc{1805.09300, +Author = {Bharat Singh and Mahyar Najibi and Larry S. Davis}, +Title = {SNIPER: Efficient Multi-Scale Training}, +Year = {2018}, +Eprint = {arXiv:1805.09300}, +} + +@ARTICLE{9573394, + author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, + year={2021}, + volume={}, + number={}, + pages={1-1}, + doi={10.1109/TPAMI.2021.3119563}} +``` diff --git a/PaddleDetection-release-2.6/configs/sniper/README_cn.md b/PaddleDetection-release-2.6/configs/sniper/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..a01a3a928c56518ec93f50d3d6645ea086c33321 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/README_cn.md @@ -0,0 +1,68 @@ +简体中文 | [English](README.md) + +# SNIPER: Efficient Multi-Scale Training + +## 模型库 +| 有无sniper | GPU个数 | 每张GPU图片个数 | 骨架网络 | 数据集 | 学习率策略 | Box AP | 模型下载 | 配置文件 | +| :---------------- | :-------------------: | :------------------: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| w/o sniper | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 23.3 | [下载链接](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_visdrone.pdparams) | [配置文件](./faster_rcnn_r50_fpn_1x_visdrone.yml) | +| w/ sniper | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 29.7 | [下载链接](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_sniper_visdrone.pdparams) | [配置文件](./faster_rcnn_r50_fpn_1x_sniper_visdrone.yml) | + + +### 注意 +- 我们使用的是`VisDrone`数据集,并且检测其中的9类,包括 `person, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor`. +- 暂时不支持导出和预测部署(deploy). + + +## 使用说明 +### 1. 训练 +a. 可选:统计数据集信息,获得数据缩放尺度、有效框范围、chip尺度和步长等参数,修改configs/datasets/sniper_coco_detection.yml中对应参数 +```bash +python tools/sniper_params_stats.py FasterRCNN annotations/instances_train2017.json +``` +b. 
可选:训练检测器,生成负样本 +```bash +python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --save_proposals --proposals_path=./proposals.json &>sniper.log 2>&1 & +``` +c. 训练模型 +```bash +python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --eval &>sniper.log 2>&1 & +``` + +### 2. 评估 +使用单GPU通过如下命令一键式评估模型在自定义数据集上的效果 +```bash +# 使用训练保存的checkpoint +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final +``` + +### 3. 推理 +使用单GPU通过如下命令一键式推理图像,通过`--infer_img`指定图像路径,或通过`--infer_dir`指定目录并推理目录下所有图像 + +```bash +# 推理单张图像 +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_img=demo/P0861__1.0__1154___824.png + +# 推理目录下所有图像 +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_dir=demo +``` + +## Citations +``` +@misc{1805.09300, +Author = {Bharat Singh and Mahyar Najibi and Larry S. Davis}, +Title = {SNIPER: Efficient Multi-Scale Training}, +Year = {2018}, +Eprint = {arXiv:1805.09300}, +} + +@ARTICLE{9573394, + author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, + year={2021}, + volume={}, + number={}, + pages={1-1}, + doi={10.1109/TPAMI.2021.3119563}} +``` diff --git a/PaddleDetection-release-2.6/configs/sniper/_base_/faster_fpn_reader.yml b/PaddleDetection-release-2.6/configs/sniper/_base_/faster_fpn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..363ca4664b9effb1317e6661732f99113b7d1bff --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/_base_/faster_fpn_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - SniperDecodeCrop: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - SniperDecodeCrop: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - SniperDecodeCrop: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/sniper/_base_/faster_reader.yml b/PaddleDetection-release-2.6/configs/sniper/_base_/faster_reader.yml new file mode 100644 
index 0000000000000000000000000000000000000000..5c3b348024e8d48289db85706ddd6454f40c0815 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/_base_/faster_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - SniperDecodeCrop: {} + - RandomResize: {target_size: [[800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: true + + +EvalReader: + sample_transforms: + - SniperDecodeCrop: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - SniperDecodeCrop: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/sniper/_base_/ppyolo_reader.yml b/PaddleDetection-release-2.6/configs/sniper/_base_/ppyolo_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..f88e908c903b256bc08fa209f0f2368e3d58596b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/_base_/ppyolo_reader.yml @@ -0,0 +1,40 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - SniperDecodeCrop: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - SniperDecodeCrop: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - SniperDecodeCrop: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml b/PaddleDetection-release-2.6/configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..08039e98dbd5e3e85812c780ffb0dd9dcc555a07 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml @@ -0,0 +1,9 @@ +_BASE_: [ + 
'../datasets/sniper_visdrone_detection.yml', + '../runtime.yml', + '../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml', + '../faster_rcnn/_base_/optimizer_1x.yml', + '_base_/faster_fpn_reader.yml', +] +weights: output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final +find_unused_parameters: true diff --git a/PaddleDetection-release-2.6/configs/sniper/faster_rcnn_r50_fpn_1x_visdrone.yml b/PaddleDetection-release-2.6/configs/sniper/faster_rcnn_r50_fpn_1x_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..b6a449328e6d5c90c05dd087b5e0074fc292c38a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/faster_rcnn_r50_fpn_1x_visdrone.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../faster_rcnn/_base_/optimizer_1x.yml', + '../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml', + '../faster_rcnn/_base_/faster_fpn_reader.yml', +] +weights: output/faster_rcnn_r50_fpn_1x_visdrone/model_final + + +metric: COCO +num_classes: 9 + +TrainDataset: + !COCODataSet + image_dir: train + anno_path: annotations/train.json + dataset_dir: dataset/VisDrone2019_coco + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val + anno_path: annotations/val.json + dataset_dir: dataset/VisDrone2019_coco + +TestDataset: + !ImageFolder + anno_path: annotations/val.json diff --git a/PaddleDetection-release-2.6/configs/sniper/ppyolo_r50vd_dcn_1x_sniper_visdrone.yml b/PaddleDetection-release-2.6/configs/sniper/ppyolo_r50vd_dcn_1x_sniper_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..c615c7732c51a73c4c4618240a000cf6ce351a80 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/ppyolo_r50vd_dcn_1x_sniper_visdrone.yml @@ -0,0 +1,33 @@ +_BASE_: [ + '../datasets/sniper_visdrone_detection.yml', + '../runtime.yml', + '../ppyolo/_base_/ppyolo_r50vd_dcn.yml', + '../ppyolo/_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 8 +use_ema: true +weights: output/ppyolo_r50vd_dcn_1x_sniper_visdrone/model_final + + + +LearningRate: + base_lr: 0.005 + schedulers: + - !PiecewiseDecay + gamma: 0. + milestones: + - 153 + - 173 + - !LinearWarmup + start_factor: 0.1 + steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/sniper/ppyolo_r50vd_dcn_1x_visdrone.yml b/PaddleDetection-release-2.6/configs/sniper/ppyolo_r50vd_dcn_1x_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..dd1db0e872de109b70840f9799de07e80c9bb950 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sniper/ppyolo_r50vd_dcn_1x_visdrone.yml @@ -0,0 +1,54 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyolo/_base_/ppyolo_r50vd_dcn.yml', + '../ppyolo/_base_/optimizer_1x.yml', + '../ppyolo/_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 8 +use_ema: true +weights: output/ppyolo_r50vd_dcn_1x_visdrone/model_final + +epoch: 192 + +LearningRate: + base_lr: 0.005 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 153 + - 173 + - !LinearWarmup + start_factor: 0. 
+    steps: 4000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
+
+
+metric: COCO
+num_classes: 9
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train
+    anno_path: annotations/train.json
+    dataset_dir: dataset/VisDrone2019_coco
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val
+    anno_path: annotations/val.json
+    dataset_dir: dataset/VisDrone2019_coco
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/val.json
diff --git a/PaddleDetection-release-2.6/configs/solov2/README.md b/PaddleDetection-release-2.6/configs/solov2/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1cc378f847d27fbe0add5d7cd89883b020e6d646
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/solov2/README.md
@@ -0,0 +1,52 @@
+# SOLOv2 for instance segmentation
+
+## Introduction
+
+SOLOv2 (Segmenting Objects by Locations) is a fast instance segmentation framework with strong performance. We reproduce the model from the paper and further optimize SOLOv2 for both accuracy and speed.
+
+**Highlights:**
+
+- Training time: training the `solov2_r50_fpn_1x` model on 8 Tesla V100 GPUs takes only 10 hours.
+
+## Model Zoo
+
+| Detector | Backbone | Multi-scale training | Lr schd | Mask AP<sup>val</sup> | V100 FP32(FPS) | GPU | Download | Configs |
+| :-------: | :---------------------: | :-------------------: | :-----: | :--------------------: | :-------------: | :-----: | :---------: | :------------------------: |
+| YOLACT++ | R50-FPN | False | 800k iter | 34.1 (test-dev) | 33.5 | Xp | - | - |
+| CenterMask | R50-FPN | True | 2x | 36.4 | 13.9 | Xp | - | - |
+| CenterMask | V2-99-FPN | True | 3x | 40.2 | 8.9 | Xp | - | - |
+| PolarMask | R50-FPN | True | 2x | 30.5 | 9.4 | V100 | - | - |
+| BlendMask | R50-FPN | True | 3x | 37.8 | 13.5 | V100 | - | - |
+| SOLOv2 (Paper) | R50-FPN | False | 1x | 34.8 | 18.5 | V100 | - | - |
+| SOLOv2 (Paper) | X101-DCN-FPN | True | 3x | 42.4 | 5.9 | V100 | - | - |
+| SOLOv2 | R50-FPN | False | 1x | 35.5 | 21.9 | V100 | [model](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/solov2/solov2_r50_fpn_1x_coco.yml) |
+| SOLOv2 | R50-FPN | True | 3x | 38.0 | 21.9 | V100 | [model](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_3x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/solov2/solov2_r50_fpn_3x_coco.yml) |
+| SOLOv2 | R101vd-FPN | True | 3x | 42.7 | 12.1 | V100 | [model](https://paddledet.bj.bcebos.com/models/solov2_r101_vd_fpn_3x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/solov2/solov2_r101_vd_fpn_3x_coco.yml) |
+
+**Notes:**
+
+- SOLOv2 is trained on the COCO train2017 dataset and evaluated on val2017; results are `mAP(IoU=0.5:0.95)`.
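+
+For reference, a typical training command for the 1x model, sketched under the assumption of 8 GPUs and the standard `tools/` layout used elsewhere in this repo (adjust `--gpus` to your machine):
+
+```bash
+# multi-GPU training of SOLOv2-R50-FPN 1x; --eval runs evaluation alongside training
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --eval
+```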
+
+## Enhanced model
+| Backbone | Input size | Lr schd | V100 FP32(FPS) | Mask AP<sup>val</sup> | Download | Configs |
+| :---------------------: | :-------------------: | :-----: | :------------: | :-----: | :---------: | :------------------------: |
+| Light-R50-VD-DCN-FPN | 512 | 3x | 38.6 | 39.0 | [model](https://paddledet.bj.bcebos.com/models/solov2_r50_enhance_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/solov2/solov2_r50_enhance_coco.yml) |
+
+**Optimization methods used in the enhanced model:**
+- Better backbone network: ResNet50vd-DCN
+- A better pre-training model for knowledge distillation
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- Synchronized Batch Normalization
+- Multi-scale training
+- More data augmentation methods
+- DropBlock
+
+## Citations
+```
+@article{wang2020solov2,
+  title={SOLOv2: Dynamic, Faster and Stronger},
+  author={Wang, Xinlong and Zhang, Rufeng and Kong, Tao and Li, Lei and Shen, Chunhua},
+  journal={arXiv preprint arXiv:2003.10152},
+  year={2020}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/solov2/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/solov2/_base_/optimizer_1x.yml
new file mode 100644
index 0000000000000000000000000000000000000000..d034482d1e007c4e07fc9b1323b86e04588710bb
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/solov2/_base_/optimizer_1x.yml
@@ -0,0 +1,19 @@
+epoch: 12
+
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [8, 11]
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 1000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0001
+    type: L2
diff --git a/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_light_reader.yml b/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_light_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..901049c13d35251558c9235058cad80d8e5ea1be
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_light_reader.yml
@@ -0,0 +1,47 @@
+worker_num: 2
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - Poly2Mask: {}
+  - RandomDistort: {}
+  - RandomCrop: {}
+  - RandomResize: {interp: 1,
+                   target_size: [[352, 852], [384, 852], [416, 852], [448, 852], [480, 852], [512, 852]],
+                   keep_ratio: True}
+  - RandomFlip: {}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2Solov2Target: {num_grids: [40, 36, 24, 16, 12],
+                      scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]],
+                      coord_sigma: 0.2}
+  batch_size: 2
+  shuffle: true
+  drop_last: true
+
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Resize: {interp: 1, target_size: [512, 852], keep_ratio: True}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: false
+  drop_last: false
+
+
+TestReader:
+  sample_transforms:
+  - Decode: {}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Resize: {interp: 1, target_size: [512, 852], keep_ratio: True}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: false
+  drop_last: false
diff --git a/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_r50_fpn.yml
b/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..93a6892698a879c6ff60e731f617e6d0649072a9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_r50_fpn.yml @@ -0,0 +1,40 @@ +architecture: SOLOv2 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +SOLOv2: + backbone: ResNet + neck: FPN + solov2_head: SOLOv2Head + mask_head: SOLOv2MaskHead + +ResNet: + depth: 50 + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + +SOLOv2Head: + seg_feat_channels: 512 + stacked_convs: 4 + num_grids: [40, 36, 24, 16, 12] + kernel_out_channels: 256 + solov2_loss: SOLOv2Loss + mask_nms: MaskMatrixNMS + +SOLOv2MaskHead: + mid_channels: 128 + out_channels: 256 + start_level: 0 + end_level: 3 + +SOLOv2Loss: + ins_loss_weight: 3.0 + focal_loss_gamma: 2.0 + focal_loss_alpha: 0.25 + +MaskMatrixNMS: + pre_nms_top_n: 500 + post_nms_top_n: 100 diff --git a/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_reader.yml b/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..f21516235f46a1822387d48c73e7026763e7fc4c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/solov2/_base_/solov2_reader.yml @@ -0,0 +1,43 @@ +worker_num: 8 +TrainReader: + sample_transforms: + - Decode: {} + - Poly2Mask: {} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - RandomFlip: {} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2Solov2Target: {num_grids: [40, 36, 24, 16, 12], + scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]], + coord_sigma: 0.2} + batch_size: 2 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/solov2/solov2_r101_vd_fpn_3x_coco.yml b/PaddleDetection-release-2.6/configs/solov2/solov2_r101_vd_fpn_3x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..db29c9ad19edd5396562e1b9e3f8400ae1a3367c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/solov2/solov2_r101_vd_fpn_3x_coco.yml @@ -0,0 +1,66 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/solov2_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/solov2_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_pretrained.pdparams +weights: output/solov2_r101_vd_fpn_3x_coco/model_final +epoch: 36 +use_ema: true +ema_decay: 0.9998 + +ResNet: + depth: 101 + variant: d + freeze_at: 0 + return_idx: [0,1,2,3] + dcn_v2_stages: [1,2,3] + num_stages: 4 + +SOLOv2Head: + seg_feat_channels: 512 + stacked_convs: 4 + num_grids: [40, 36, 24, 16, 12] + 
kernel_out_channels: 256 + solov2_loss: SOLOv2Loss + mask_nms: MaskMatrixNMS + dcn_v2_stages: [0, 1, 2, 3] + +SOLOv2MaskHead: + mid_channels: 128 + out_channels: 256 + start_level: 0 + end_level: 3 + use_dcn_in_tower: True + + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [24, 33] + - !LinearWarmup + start_factor: 0. + steps: 2000 + +TrainReader: + sample_transforms: + - Decode: {} + - Poly2Mask: {} + - RandomResize: {interp: 1, + target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], + keep_ratio: True} + - RandomFlip: {} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + - Gt2Solov2Target: {num_grids: [40, 36, 24, 16, 12], + scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]], + coord_sigma: 0.2} + batch_size: 2 + shuffle: true + drop_last: true diff --git a/PaddleDetection-release-2.6/configs/solov2/solov2_r50_enhance_coco.yml b/PaddleDetection-release-2.6/configs/solov2/solov2_r50_enhance_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..0cadd8783a3c45efc4c20f96fcd3241a0df8c02a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/solov2/solov2_r50_enhance_coco.yml @@ -0,0 +1,50 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + '_base_/solov2_r50_fpn.yml', + '_base_/optimizer_1x.yml', + '_base_/solov2_light_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams +weights: output/solov2_r50_fpn_3x_coco/model_final +epoch: 36 +use_ema: true +ema_decay: 0.9998 + +ResNet: + depth: 50 + variant: d + freeze_at: 0 + freeze_norm: false + norm_type: sync_bn + return_idx: [0,1,2,3] + dcn_v2_stages: [1,2,3] + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + num_stages: 4 + +SOLOv2Head: + seg_feat_channels: 256 + stacked_convs: 3 + num_grids: [40, 36, 24, 16, 12] + kernel_out_channels: 128 + solov2_loss: SOLOv2Loss + mask_nms: MaskMatrixNMS + dcn_v2_stages: [2] + drop_block: True + +SOLOv2MaskHead: + mid_channels: 128 + out_channels: 128 + start_level: 0 + end_level: 3 + use_dcn_in_tower: True + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [24, 33] + - !LinearWarmup + start_factor: 0. 
+    steps: 1000
diff --git a/PaddleDetection-release-2.6/configs/solov2/solov2_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/solov2/solov2_r50_fpn_1x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..e5f548d53a80937be526f9c927fa8f6cdb6e7e9c
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/solov2/solov2_r50_fpn_1x_coco.yml
@@ -0,0 +1,8 @@
+_BASE_: [
+  '../datasets/coco_instance.yml',
+  '../runtime.yml',
+  '_base_/solov2_r50_fpn.yml',
+  '_base_/optimizer_1x.yml',
+  '_base_/solov2_reader.yml',
+]
+weights: output/solov2_r50_fpn_1x_coco/model_final
diff --git a/PaddleDetection-release-2.6/configs/solov2/solov2_r50_fpn_3x_coco.yml b/PaddleDetection-release-2.6/configs/solov2/solov2_r50_fpn_3x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6ffff46bbfd6806036ac602091da747d06eb8bd7
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/solov2/solov2_r50_fpn_3x_coco.yml
@@ -0,0 +1,38 @@
+_BASE_: [
+  '../datasets/coco_instance.yml',
+  '../runtime.yml',
+  '_base_/solov2_r50_fpn.yml',
+  '_base_/optimizer_1x.yml',
+  '_base_/solov2_reader.yml',
+]
+weights: output/solov2_r50_fpn_3x_coco/model_final
+epoch: 36
+
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [24, 33]
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 1000
+
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - Poly2Mask: {}
+  - RandomResize: {interp: 1,
+                   target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]],
+                   keep_ratio: True}
+  - RandomFlip: {}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2Solov2Target: {num_grids: [40, 36, 24, 16, 12],
+                      scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]],
+                      coord_sigma: 0.2}
+  batch_size: 2
+  shuffle: true
+  drop_last: true
diff --git a/PaddleDetection-release-2.6/configs/sparse_rcnn/README.md b/PaddleDetection-release-2.6/configs/sparse_rcnn/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..5443b037f247938fc9a72194fff62c9a27cedc50
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/sparse_rcnn/README.md
@@ -0,0 +1,25 @@
+# Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
+
+
+## Introduction
+Sparse R-CNN is a purely sparse method for object detection in images: instead of relying on dense candidate boxes, it learns a small, fixed set of proposals (see the Model Zoo below) that are refined end to end.
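+
+As with the other model zoo entries in this repo, the released weights can be evaluated directly; a sketch for the 100-proposal model (single GPU, weights URL taken from the Model Zoo below):
+
+```bash
+# single-GPU evaluation of Sparse R-CNN with 100 learnable proposals
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/sparse_rcnn_r50_fpn_3x_pro100_coco.pdparams
+```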
+ + +## Model Zoo + +| Backbone | Proposals | lr schedule | Box AP | download | config | +| :-------------- | :-----: | :------------: | :-----: | :-----: | :-----: | +| ResNet50-FPN | 100 | 3x | 43.0 | [download](https://paddledet.bj.bcebos.com/models/sparse_rcnn_r50_fpn_3x_pro100_coco.pdparams) | [config](./sparse_rcnn_r50_fpn_3x_pro100_coco.yml) | +| ResNet50-FPN | 300 | 3x | 44.6 | [download](https://paddledet.bj.bcebos.com/models/sparse_rcnn_r50_fpn_3x_pro300_coco.pdparams) | [config](./sparse_rcnn_r50_fpn_3x_pro300_coco.yml) | + +## Citations +``` +@misc{sun2021sparse, + title={Sparse R-CNN: End-to-End Object Detection with Learnable Proposals}, + author={Peize Sun and Rufeng Zhang and Yi Jiang and Tao Kong and Chenfeng Xu and Wei Zhan and Masayoshi Tomizuka and Lei Li and Zehuan Yuan and Changhu Wang and Ping Luo}, + year={2021}, + eprint={2011.12450}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/optimizer_3x.yml b/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/optimizer_3x.yml new file mode 100644 index 0000000000000000000000000000000000000000..19e1037130158909632a4d6515f6adf53cf5ad3c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/optimizer_3x.yml @@ -0,0 +1,17 @@ +epoch: 36 + +LearningRate: + base_lr: 0.000025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [28, 34] + - !LinearWarmup + start_factor: 0.01 + steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 1.0 + optimizer: + type: AdamW + weight_decay: 0.0001 diff --git a/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/sparse_rcnn_r50_fpn.yml b/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/sparse_rcnn_r50_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..9f7516fcd8652c866ad660f2f0afc9e36f1a6033 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/sparse_rcnn_r50_fpn.yml @@ -0,0 +1,44 @@ +architecture: SparseRCNN +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +SparseRCNN: + backbone: ResNet + neck: FPN + head: SparseRCNNHead + postprocess: SparsePostProcess + +ResNet: + # index 0 stands for res2 + depth: 50 + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + +SparseRCNNHead: + head_hidden_dim: 256 + head_dim_feedforward: 2048 + nhead: 8 + head_dropout: 0.0 + head_cls: 1 + head_reg: 3 + head_dim_dynamic: 64 + head_num_dynamic: 2 + head_num_heads: 6 + deep_supervision: true + num_proposals: 100 + loss_func: SparseRCNNLoss + +SparseRCNNLoss: + losses: ["labels", "boxes"] + focal_loss_alpha: 0.25 + focal_loss_gamma: 2.0 + class_weight: 2.0 + l1_weight: 5.0 + giou_weight: 2.0 + +SparsePostProcess: + num_proposals: 100 diff --git a/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/sparse_rcnn_reader.yml b/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/sparse_rcnn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..b9544b31c44159b66cd63df23a6d6a79aeb081bd --- /dev/null +++ b/PaddleDetection-release-2.6/configs/sparse_rcnn/_base_/sparse_rcnn_reader.yml @@ -0,0 +1,41 @@ +worker_num: 4 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[480, 1333], [512, 1333], [544, 1333], [576, 1333], [608, 1333], [640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: true, interp: 1} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, 
mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2SparseTarget: {use_padding_shape: True}
+  batch_size: 4
+  shuffle: true
+  drop_last: true
+  collate_batch: false
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2SparseTarget: {use_padding_shape: True}
+  batch_size: 1
+  shuffle: false
+  drop_last: false
+
+TestReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  - Gt2SparseTarget: {use_padding_shape: True}
+  batch_size: 1
+  shuffle: false
diff --git a/PaddleDetection-release-2.6/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml b/PaddleDetection-release-2.6/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6123f149df1fd72c7446eec4e8702eb3df592441
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml
@@ -0,0 +1,10 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '_base_/sparse_rcnn_r50_fpn.yml',
+  '_base_/optimizer_3x.yml',
+  '_base_/sparse_rcnn_reader.yml',
+]
+
+num_classes: 80
+weights: output/sparse_rcnn_r50_fpn_3x_pro100_coco/model_final
diff --git a/PaddleDetection-release-2.6/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro300_coco.yml b/PaddleDetection-release-2.6/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro300_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6cb3187829cd021753d5c699a1389abfd5048764
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro300_coco.yml
@@ -0,0 +1,19 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '_base_/sparse_rcnn_r50_fpn.yml',
+  '_base_/optimizer_3x.yml',
+  '_base_/sparse_rcnn_reader.yml',
+]
+
+num_classes: 80
+weights: output/sparse_rcnn_r50_fpn_3x_pro300_coco/model_final
+
+snapshot_epoch: 1
+
+
+SparseRCNNHead:
+  num_proposals: 300
+
+SparsePostProcess:
+  num_proposals: 300
diff --git a/PaddleDetection-release-2.6/configs/ssd/README.md b/PaddleDetection-release-2.6/configs/ssd/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1b8a82d0960ca579acfd89a90c24740250d2bf59
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/README.md
@@ -0,0 +1,22 @@
+# SSD: Single Shot MultiBox Detector
+
+## Model Zoo
+
+### SSD on Pascal VOC
+
+| Backbone | Model | Images/GPU | Lr schedule | Inf time (fps) | Box AP | Download | Config |
+| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| VGG | SSD | 8 | 240e | ---- | 77.8 | [download](https://paddledet.bj.bcebos.com/models/ssd_vgg16_300_240e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/ssd_vgg16_300_240e_voc.yml) |
+| MobileNet v1 | SSD | 32 | 120e | ---- | 73.8 | [download](https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) |
+
+**Note:** SSD-VGG is trained for 240 epochs with 4 GPUs and a total batch size of 32; SSD-MobileNetv1 is trained for 120 epochs with 2 GPUs and a total batch size of 64.
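+
+For reference, a training command matching the note above (a sketch assuming 4 GPUs and the standard PaddleDetection launcher used throughout this repo):
+
+```bash
+# 4-GPU training of SSD-VGG16 on Pascal VOC, evaluating during training
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ssd/ssd_vgg16_300_240e_voc.yml --eval
+```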
+
+## Citations
+```
+@article{Liu_2016,
+   title={SSD: Single Shot MultiBox Detector},
+   journal={ECCV},
+   author={Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C.},
+   year={2016},
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_120e.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_120e.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6fb65a906245bcf13106d60eea24110f3c62c70b
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_120e.yml
@@ -0,0 +1,17 @@
+epoch: 120
+
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+  - !PiecewiseDecay
+    milestones: [40, 60, 80, 100]
+    gamma: [0.5, 0.5, 0.4, 0.1]
+    use_warmup: false
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.0
+    type: RMSProp
+  regularizer:
+    factor: 0.00005
+    type: L2
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_1700e.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_1700e.yml
new file mode 100644
index 0000000000000000000000000000000000000000..fe5fedc7cd33855ef103325359adda131587fe64
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_1700e.yml
@@ -0,0 +1,18 @@
+epoch: 1700
+
+LearningRate:
+  base_lr: 0.4
+  schedulers:
+  - !CosineDecay
+    max_epochs: 1700
+  - !LinearWarmup
+    start_factor: 0.3333333333333333
+    steps: 2000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_240e.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_240e.yml
new file mode 100644
index 0000000000000000000000000000000000000000..de31eac3d22c97b2b72083a79342b880f4be9b8a
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_240e.yml
@@ -0,0 +1,21 @@
+epoch: 240
+
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones:
+    - 160
+    - 200
+  - !LinearWarmup
+    start_factor: 0.3333333333333333
+    steps: 500
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_70e.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_70e.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7cf56fee5f57e9b885265cf8a266af9acab3af8f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/_base_/optimizer_70e.yml
@@ -0,0 +1,17 @@
+epoch: 70
+
+LearningRate:
+  base_lr: 0.05
+  schedulers:
+  - !PiecewiseDecay
+    milestones: [48, 60]
+    gamma: [0.1, 0.1]
+    use_warmup: false
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_mobilenet_reader.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_mobilenet_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..2b3e1da90b0d48ebb2b8cdd2a28174c495d698e4
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_mobilenet_reader.yml
@@ -0,0 +1,39 @@
+worker_num: 8
+TrainReader:
+  inputs_def:
+    num_max_boxes: 90
+  sample_transforms:
+  - Decode: {}
+  - RandomDistort: {brightness: [0.5, 1.125, 0.875], random_apply: False}
+  - RandomExpand: {fill_value: [127.5, 127.5, 127.5]}
+  - RandomCrop: {allow_no_crop: False}
+  -
RandomFlip: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 90} + batch_transforms: + - NormalizeImage: {mean: [127.5, 127.5, 127.5], std: [127.502231, 127.502231, 127.502231], is_scale: false} + - Permute: {} + batch_size: 32 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeImage: {mean: [127.5, 127.5, 127.5], std: [127.502231, 127.502231, 127.502231], is_scale: false} + - Permute: {} + batch_size: 1 + + +TestReader: + inputs_def: + image_shape: [3, 300, 300] + sample_transforms: + - Decode: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeImage: {mean: [127.5, 127.5, 127.5], std: [127.502231, 127.502231, 127.502231], is_scale: false} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_mobilenet_v1_300.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_mobilenet_v1_300.yml new file mode 100644 index 0000000000000000000000000000000000000000..b8fe6946eeaf43272a7bb5c7e94b7df5b420802e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_mobilenet_v1_300.yml @@ -0,0 +1,41 @@ +architecture: SSD +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ssd_mobilenet_v1_coco_pretrained.pdparams + +SSD: + backbone: MobileNet + ssd_head: SSDHead + post_process: BBoxPostProcess + +MobileNet: + norm_decay: 0. + scale: 1 + conv_learning_rate: 0.1 + extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] + with_extra_blocks: true + feature_maps: [11, 13, 14, 15, 16, 17] + +SSDHead: + kernel_size: 1 + padding: 0 + anchor_generator: + steps: [0, 0, 0, 0, 0, 0] + aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] + min_ratio: 20 + max_ratio: 90 + base_size: 300 + min_sizes: [60.0, 105.0, 150.0, 195.0, 240.0, 285.0] + max_sizes: [[], 150.0, 195.0, 240.0, 285.0, 300.0] + offset: 0.5 + flip: true + min_max_aspect_ratios_order: false + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 200 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 400 + nms_eta: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_r34_300.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_r34_300.yml new file mode 100644 index 0000000000000000000000000000000000000000..5b463b718d245205fe4daaf336fbe92d5725bbdf --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_r34_300.yml @@ -0,0 +1,38 @@ +architecture: SSD +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams + +SSD: + backbone: ResNet + ssd_head: SSDHead + post_process: BBoxPostProcess + r34_backbone: True + +ResNet: + # index 0 stands for res2 + depth: 34 + norm_type: bn + freeze_norm: False + freeze_at: -1 + return_idx: [2] + num_stages: 3 + +SSDHead: + anchor_generator: + steps: [8, 16, 32, 64, 100, 300] + aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]] + min_sizes: [21.0, 45.0, 99.0, 153.0, 207.0, 261.0] + max_sizes: [45.0, 99.0, 153.0, 207.0, 261.0, 315.0] + offset: 0.5 + clip: True + min_max_aspect_ratios_order: True + use_extra_head: True + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 200 + score_threshold: 0.05 + nms_threshold: 0.5 + nms_top_k: 400 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_r34_reader.yml 
b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_r34_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..0888b30a0af972279a1dc026a778ec476deb310c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_r34_reader.yml @@ -0,0 +1,38 @@ +worker_num: 3 +TrainReader: + inputs_def: + num_max_boxes: 90 + sample_transforms: + - Decode: {} + - RandomCrop: {num_attempts: 1} + - RandomFlip: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - RandomDistort: {brightness: [0.875, 1.125, 0.5], random_apply: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 90} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + - Permute: {} + batch_size: 64 + shuffle: true + drop_last: true + use_shared_memory: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + - Permute: {} + batch_size: 1 + + +TestReader: + inputs_def: + image_shape: [3, 300, 300] + sample_transforms: + - Decode: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_reader.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..22f8cc0a3ad2d2f11ffc1c77e3354f34abc431b6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 90 + + sample_transforms: + - Decode: {} + - RandomDistort: {brightness: [0.5, 1.125, 0.875], random_apply: False} + - RandomExpand: {fill_value: [104., 117., 123.]} + - RandomCrop: {allow_no_crop: true} + - RandomFlip: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 90} + + batch_transforms: + - NormalizeImage: {mean: [104., 117., 123.], std: [1., 1., 1.], is_scale: false} + - Permute: {} + + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeImage: {mean: [104., 117., 123.], std: [1., 1., 1.], is_scale: false} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, 300, 300] + sample_transforms: + - Decode: {} + - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1} + - NormalizeImage: {mean: [104., 117., 123.], std: [1., 1., 1.], is_scale: false} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_vgg16_300.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_vgg16_300.yml new file mode 100644 index 0000000000000000000000000000000000000000..8d322d9c1f8646a40d5256180546af63eea8a8fb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssd_vgg16_300.yml @@ -0,0 +1,37 @@ +architecture: SSD +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/VGG16_caffe_pretrained.pdparams + +# Model Architecture +SSD: + # model feat info flow + backbone: VGG + ssd_head: SSDHead + # post process + post_process: BBoxPostProcess + +VGG: + depth: 16 + normalizations: [20., -1, -1, -1, -1, -1] + +SSDHead: + anchor_generator: + steps: [8, 16, 32, 64, 100, 300] + aspect_ratios: 
[[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]]
+    min_ratio: 20
+    max_ratio: 90
+    min_sizes: [30.0, 60.0, 111.0, 162.0, 213.0, 264.0]
+    max_sizes: [60.0, 111.0, 162.0, 213.0, 264.0, 315.0]
+    offset: 0.5
+    flip: true
+    min_max_aspect_ratios_order: true
+
+BBoxPostProcess:
+  decode:
+    name: SSDBox
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 200
+    score_threshold: 0.01
+    nms_threshold: 0.45
+    nms_top_k: 400
+    nms_eta: 1.0
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite300_reader.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite300_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..86b69737cbe9c5d35893da15fbbfe44579c91f9f
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite300_reader.yml
@@ -0,0 +1,39 @@
+worker_num: 8
+TrainReader:
+  inputs_def:
+    num_max_boxes: 90
+  sample_transforms:
+  - Decode: {}
+  - RandomDistort: {brightness: [0.5, 1.125, 0.875], random_apply: False}
+  - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+  - RandomCrop: {allow_no_crop: False}
+  - RandomFlip: {}
+  - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1}
+  - NormalizeBox: {}
+  - PadBox: {num_max_boxes: 90}
+  batch_transforms:
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  - Permute: {}
+  batch_size: 64
+  shuffle: true
+  drop_last: true
+
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1}
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  - Permute: {}
+  batch_size: 1
+
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 300, 300]
+  sample_transforms:
+  - Decode: {}
+  - Resize: {target_size: [300, 300], keep_ratio: False, interp: 1}
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  - Permute: {}
+  batch_size: 1
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite320_reader.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite320_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..57eeadc6ebe751e6657e84359ecfaac8cdb67824
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite320_reader.yml
@@ -0,0 +1,39 @@
+worker_num: 8
+TrainReader:
+  inputs_def:
+    num_max_boxes: 90
+  sample_transforms:
+  - Decode: {}
+  - RandomDistort: {brightness: [0.5, 1.125, 0.875], random_apply: False}
+  - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+  - RandomCrop: {allow_no_crop: False}
+  - RandomFlip: {}
+  - Resize: {target_size: [320, 320], keep_ratio: False, interp: 1}
+  - NormalizeBox: {}
+  - PadBox: {num_max_boxes: 90}
+  batch_transforms:
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  - Permute: {}
+  batch_size: 64
+  shuffle: true
+  drop_last: true
+
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {target_size: [320, 320], keep_ratio: False, interp: 1}
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  - Permute: {}
+  batch_size: 1
+
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 320, 320]
+  sample_transforms:
+  - Decode: {}
+  - Resize: {target_size: [320, 320], keep_ratio: False, interp: 1}
+  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: true}
+  - Permute: {}
+  batch_size: 1
diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_ghostnet_320.yml
b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_ghostnet_320.yml new file mode 100644 index 0000000000000000000000000000000000000000..6a9e13b5a1ca30a0dee10388a1931ffba0d412eb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_ghostnet_320.yml @@ -0,0 +1,42 @@ +architecture: SSD +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/GhostNet_x1_3_ssld_pretrained.pdparams + +SSD: + backbone: GhostNet + ssd_head: SSDHead + post_process: BBoxPostProcess + +GhostNet: + scale: 1.3 + conv_decay: 0.00004 + with_extra_blocks: true + extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] + feature_maps: [13, 18, 19, 20, 21, 22] + lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] + +SSDHead: + use_sepconv: True + conv_decay: 0.00004 + anchor_generator: + steps: [16, 32, 64, 107, 160, 320] + aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] + min_ratio: 20 + max_ratio: 95 + base_size: 320 + min_sizes: [] + max_sizes: [] + offset: 0.5 + flip: true + clip: true + min_max_aspect_ratios_order: false + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 200 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 400 + nms_eta: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v1_300.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v1_300.yml new file mode 100644 index 0000000000000000000000000000000000000000..db811ade9d7b89d2b407d40dec7e313feff11420 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v1_300.yml @@ -0,0 +1,41 @@ +architecture: SSD +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV1_ssld_pretrained.pdparams + +SSD: + backbone: MobileNet + ssd_head: SSDHead + post_process: BBoxPostProcess + +MobileNet: + conv_decay: 0.00004 + scale: 1 + extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] + with_extra_blocks: true + feature_maps: [11, 13, 14, 15, 16, 17] + +SSDHead: + use_sepconv: True + conv_decay: 0.00004 + anchor_generator: + steps: [16, 32, 64, 100, 150, 300] + aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] + min_ratio: 20 + max_ratio: 95 + base_size: 300 + min_sizes: [] + max_sizes: [] + offset: 0.5 + flip: true + clip: true + min_max_aspect_ratios_order: False + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 200 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 400 + nms_eta: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v3_large_320.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v3_large_320.yml new file mode 100644 index 0000000000000000000000000000000000000000..cc6e3284a3ed961009128c6b7c51f6abd901f376 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v3_large_320.yml @@ -0,0 +1,44 @@ +architecture: SSD +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams + +SSD: + backbone: MobileNetV3 + ssd_head: SSDHead + post_process: BBoxPostProcess + +MobileNetV3: + scale: 1.0 + model_name: large + conv_decay: 0.00004 + with_extra_blocks: true + extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] + feature_maps: [14, 17, 18, 19, 20, 21] + lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] + multiplier: 0.5 + +SSDHead: + use_sepconv: True + conv_decay: 0.00004 + anchor_generator: + steps: [16, 32, 64, 107, 160, 320] + 
aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] + min_ratio: 20 + max_ratio: 95 + base_size: 320 + min_sizes: [] + max_sizes: [] + offset: 0.5 + flip: true + clip: true + min_max_aspect_ratios_order: false + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 200 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 400 + nms_eta: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v3_small_320.yml b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v3_small_320.yml new file mode 100644 index 0000000000000000000000000000000000000000..887f95fa291c772d73d7b133f488e8282d315940 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/_base_/ssdlite_mobilenet_v3_small_320.yml @@ -0,0 +1,44 @@ +architecture: SSD +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_small_x1_0_ssld_pretrained.pdparams + +SSD: + backbone: MobileNetV3 + ssd_head: SSDHead + post_process: BBoxPostProcess + +MobileNetV3: + scale: 1.0 + model_name: small + conv_decay: 0.00004 + with_extra_blocks: true + extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] + feature_maps: [10, 13, 14, 15, 16, 17] + lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] + multiplier: 0.5 + +SSDHead: + use_sepconv: True + conv_decay: 0.00004 + anchor_generator: + steps: [16, 32, 64, 107, 160, 320] + aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] + min_ratio: 20 + max_ratio: 95 + base_size: 320 + min_sizes: [] + max_sizes: [] + offset: 0.5 + flip: true + clip: true + min_max_aspect_ratios_order: false + +BBoxPostProcess: + decode: + name: SSDBox + nms: + name: MultiClassNMS + keep_top_k: 200 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 400 + nms_eta: 1.0 diff --git a/PaddleDetection-release-2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml b/PaddleDetection-release-2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..feaec0c43273aecded4dc1d6c63164ceef50c487 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml @@ -0,0 +1,14 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + '_base_/optimizer_120e.yml', + '_base_/ssd_mobilenet_v1_300.yml', + '_base_/ssd_mobilenet_reader.yml', +] +weights: output/ssd_mobilenet_v1_300_120e_voc/model_final + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. 
+EvalReader: + collate_batch: false diff --git a/PaddleDetection-release-2.6/configs/ssd/ssd_r34_70e_coco.yml b/PaddleDetection-release-2.6/configs/ssd/ssd_r34_70e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..3c5af37f06a37da90f398d681bc623dbed09b7c1 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/ssd_r34_70e_coco.yml @@ -0,0 +1,11 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_70e.yml', + '_base_/ssd_r34_300.yml', + '_base_/ssd_r34_reader.yml', +] +weights: output/ssd_r34_70e_coco/model_final + +log_iter: 100 +snapshot_epoch: 5 diff --git a/PaddleDetection-release-2.6/configs/ssd/ssd_vgg16_300_240e_voc.yml b/PaddleDetection-release-2.6/configs/ssd/ssd_vgg16_300_240e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..ff24242a1fb94a8a895b6230684865bb40fff44a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/ssd_vgg16_300_240e_voc.yml @@ -0,0 +1,14 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + '_base_/optimizer_240e.yml', + '_base_/ssd_vgg16_300.yml', + '_base_/ssd_reader.yml', +] +weights: output/ssd_vgg16_300_240e_voc/model_final + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. +EvalReader: + collate_batch: false diff --git a/PaddleDetection-release-2.6/configs/ssd/ssdlite_ghostnet_320_coco.yml b/PaddleDetection-release-2.6/configs/ssd/ssdlite_ghostnet_320_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..c6eb6c11725bb041f12a1234ebe931cb019bd18e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/ssdlite_ghostnet_320_coco.yml @@ -0,0 +1,27 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1700e.yml', + '_base_/ssdlite_ghostnet_320.yml', + '_base_/ssdlite320_reader.yml', +] +weights: output/ssdlite_ghostnet_320_coco/model_final + +epoch: 1700 + +LearningRate: + base_lr: 0.2 + schedulers: + - !CosineDecay + max_epochs: 1700 + - !LinearWarmup + start_factor: 0.33333 + steps: 2000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v1_300_coco.yml b/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v1_300_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..75cb8a8a20dad059f2f638378bbe4ef418bcfd27 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v1_300_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1700e.yml', + '_base_/ssdlite_mobilenet_v1_300.yml', + '_base_/ssdlite300_reader.yml', +] +weights: output/ssdlite_mobilenet_v1_300_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v3_large_320_coco.yml b/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v3_large_320_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..78d561aade1cfcf27ea15b56c1225ae2aebdc4da --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v3_large_320_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1700e.yml', + '_base_/ssdlite_mobilenet_v3_large_320.yml', + '_base_/ssdlite320_reader.yml', +] +weights: output/ssdlite_mobilenet_v3_large_320_coco/model_final diff --git 
a/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v3_small_320_coco.yml b/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v3_small_320_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..fa0ce5346b2d5a08fd7816ae986c7747a410af8b
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ssd/ssdlite_mobilenet_v3_small_320_coco.yml
@@ -0,0 +1,8 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '_base_/optimizer_1700e.yml',
+  '_base_/ssdlite_mobilenet_v3_small_320.yml',
+  '_base_/ssdlite320_reader.yml',
+]
+weights: output/ssdlite_mobilenet_v3_small_320_coco/model_final
diff --git a/PaddleDetection-release-2.6/configs/tood/README.md b/PaddleDetection-release-2.6/configs/tood/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1eccb73dc50ced15bbe2cac9b11f340fba00ea78
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/tood/README.md
@@ -0,0 +1,35 @@
+# TOOD
+
+## Introduction
+
+[TOOD: Task-aligned One-stage Object Detection](https://arxiv.org/abs/2108.07755)
+
+TOOD is a task-aligned one-stage object detector. We reproduce the model from the paper.
+
+
+## Model Zoo
+
+| Backbone | Model | Images/GPU | Inf time (fps) | Box AP | Config | Download |
+|:------:|:--------:|:--------:|:--------------:|:------:|:------:|:--------:|
+| R-50 | TOOD | 4 | --- | 42.5 | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/tood/tood_r50_fpn_1x_coco.yml) | [model](https://paddledet.bj.bcebos.com/models/tood_r50_fpn_1x_coco.pdparams) |
+
+**Notes:**
+
+- TOOD is trained on the COCO train2017 dataset and evaluated on val2017; results are `mAP(IoU=0.5:0.95)`.
+- TOOD is trained for 12 epochs on 8 GPUs.
+
+Multi-GPU training:
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/tood/tood_r50_fpn_1x_coco.yml --fleet
+```
+
+## Citations
+```
+@inproceedings{feng2021tood,
+  title={TOOD: Task-aligned One-stage Object Detection},
+  author={Feng, Chengjian and Zhong, Yujie and Gao, Yu and Scott, Matthew R and Huang, Weilin},
+  booktitle={ICCV},
+  year={2021}
+}
+```
diff --git a/PaddleDetection-release-2.6/configs/tood/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/tood/_base_/optimizer_1x.yml
new file mode 100644
index 0000000000000000000000000000000000000000..39c54ac805031619debf9b31119afa86b3ead857
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/tood/_base_/optimizer_1x.yml
@@ -0,0 +1,19 @@
+epoch: 12
+
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [8, 11]
+  - !LinearWarmup
+    start_factor: 0.001
+    steps: 500
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0001
+    type: L2
diff --git a/PaddleDetection-release-2.6/configs/tood/_base_/tood_r50_fpn.yml b/PaddleDetection-release-2.6/configs/tood/_base_/tood_r50_fpn.yml
new file mode 100644
index 0000000000000000000000000000000000000000..0cb8575b09beb8ba4d0e20d2512bdac5b34ecaf1
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/tood/_base_/tood_r50_fpn.yml
@@ -0,0 +1,42 @@
+architecture: TOOD
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
+
+TOOD:
+  backbone: ResNet
+  neck: FPN
+  head: TOODHead
+
+ResNet:
+  depth: 50
+  variant: b
+  norm_type: bn
+  freeze_at: 0
+  return_idx: [1, 2, 3]
+  num_stages: 4
+
+FPN:
+  out_channel: 256
+  spatial_scales: [0.125, 0.0625, 0.03125]
+  extra_stage: 2
+  has_extra_convs: true
+  use_c5: false
+
+TOODHead:
+  stacked_convs: 6
+  grid_cell_scale: 8
+  static_assigner_epoch: 4
+  loss_weight: { class: 1.0, iou: 2.0 }
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.05
+    nms_threshold: 0.6
diff --git a/PaddleDetection-release-2.6/configs/tood/_base_/tood_reader.yml b/PaddleDetection-release-2.6/configs/tood/_base_/tood_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..2807a2b81b3e19f73791b90d024dcc03c79d3942
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/tood/_base_/tood_reader.yml
@@ -0,0 +1,40 @@
+worker_num: 4
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomFlip: {prob: 0.5}
+  - Resize: {target_size: [800, 1333], keep_ratio: true}
+  - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  - PadGT: {}
+  batch_size: 4
+  shuffle: true
+  drop_last: true
+  collate_batch: true
+  use_shared_memory: true
+
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: false
+
+
+TestReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: false
diff --git a/PaddleDetection-release-2.6/configs/tood/tood_r50_fpn_1x_coco.yml b/PaddleDetection-release-2.6/configs/tood/tood_r50_fpn_1x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..3d05c9884ea12013ea7b599d9c04c81abd709f40
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/tood/tood_r50_fpn_1x_coco.yml
@@ -0,0 +1,11 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '_base_/tood_r50_fpn.yml',
+  '_base_/optimizer_1x.yml',
+  '_base_/tood_reader.yml',
+]
+
+weights: output/tood_r50_fpn_1x_coco/model_final
+find_unused_parameters: True
+log_iter: 100
diff --git a/PaddleDetection-release-2.6/configs/ttfnet/README.md b/PaddleDetection-release-2.6/configs/ttfnet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..63836613b0a8aff5fd2ae0cd4b8d908c3cefdc55
--- /dev/null
+++ b/PaddleDetection-release-2.6/configs/ttfnet/README.md
@@ -0,0 +1,68 @@
+# 1. TTFNet
+
+## Introduction
+
+TTFNet is a training-time-friendly network for real-time object detection. It improves on the slow convergence of CenterNet by proposing a new method that generates training samples with Gaussian kernels, which effectively removes the ambiguity in the anchor-free head. Its simple, lightweight architecture is also easy to extend to other tasks.
+
+**Highlights:**
+
+- Simple structure: only two heads are needed to predict object location and size, and time-consuming post-processing is removed.
+- Short training time: with a DarkNet53 backbone, 2 hours of training on 8 V100 GPUs is enough to reach good accuracy.
+
+## Model Zoo
+
+| Backbone | Model | Images/GPU | Lr schedule | Inf time (fps) | Box AP | Download | Config |
+| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [download](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |
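+
+A minimal evaluation command for this model, sketched with the released weights from the table above (single GPU assumed):
+
+```bash
+# evaluate TTFNet-DarkNet53 with the released weights
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams
+```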
+# 2. PAFNet + +## Introduction + +PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet. Its accuracy reaches the SOTA level among anchor-free detectors, and a lightweight mobile variant, PAFNet-Lite, is also provided. + +The PAFNet series optimizes the TTFNet model in the following aspects: + +- [CutMix](https://arxiv.org/abs/1905.04899) +- Better backbone: ResNet50vd-DCN +- Larger training batch size: 8 GPUs, batch_size=18 per GPU +- Synchronized Batch Normalization +- [Deformable Convolution](https://arxiv.org/abs/1703.06211) +- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp) +- Better pretrained model + + +## Model Zoo + +| Backbone | Model | Images/GPU | LR schedule | Inference time (fps) | Box AP | Download | Config | +| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [model](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/pafnet_10x_coco.yml) | + + + +### PAFNet-Lite + +| Backbone | Model | Images/GPU | LR schedule | Box AP | Kirin 990 latency (ms) | Size (MB) | Download | Config | +| :-------------- | :------------- | :-----: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [model](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) | + +**Note:** Due to the overall upgrade of the dynamic-graph framework, the PAFNet weights released by PaddleDetection must be evaluated with the `--bias` flag, for example: + +```bash +# evaluate with the weights released by PaddleDetection +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias +``` + +## Citations +``` +@article{liu2019training, + title = {Training-Time-Friendly Network for Real-Time Object Detection}, + author = {Zili Liu, Tu Zheng, Guodong Xu, Zheng Yang, Haifeng Liu, Deng Cai}, + journal = {arXiv preprint arXiv:1909.00700}, + year = {2019} +} +``` diff --git a/PaddleDetection-release-2.6/configs/ttfnet/README_en.md b/PaddleDetection-release-2.6/configs/ttfnet/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..f08fcfdfa9276194ca954cf6e9ae0deea1989487 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/README_en.md @@ -0,0 +1,69 @@ +# 1. TTFNet + +## Introduction + +TTFNet is a real-time object detector designed to be friendly to training time. It improves on the slow convergence of CenterNet with a new method that generates training samples using Gaussian kernels, effectively eliminating the ambiguity in the anchor-free head. Its simple, lightweight network structure is also easy to extend to other tasks. + + +**Characteristics:** + +- The structure is simple: only two heads are required to detect object position and size, and time-consuming post-processing is removed. +- The training time is short:
 with a DarkNet53 backbone, only about 2 hours of training on 8 V100 GPUs is needed to reach good accuracy. + +## Model Zoo + +| Backbone | Network type | Images per GPU | LR schedule | Inference time (fps) | Box AP | Download | Configuration File | +| :-------- | :----------- | :----------------------: | :--------------------: | :-----------------: | :----: | :------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [link](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) | + + + + + +# 2. PAFNet + +## Introduction + +PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet. Its accuracy reaches the SOTA level in the anchor-free field, and a mobile lightweight model, PAFNet-Lite, is also produced. + +The PAFNet series optimizes the TTFNet model in the following aspects: + +- [CutMix](https://arxiv.org/abs/1905.04899) +- Better backbone network: ResNet50vd-DCN +- Larger training batch size: 8 GPUs, batch size=18 per GPU +- Synchronized Batch Normalization +- [Deformable Convolution](https://arxiv.org/abs/1703.06211) +- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp) +- Better pretrained model + + +## Model Zoo + +| Backbone | Net type | Images per GPU | LR schedule | Inference time (fps) | Box AP | Download | Configuration File | +| :--------- | :------- | :----------------------: | :--------------------: | :-----------------: | :----: | :---------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | +| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [link](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/pafnet_10x_coco.yml) | + + + +### PAFNet-Lite + +| Backbone | Net type | Images per GPU | LR schedule | Box AP | Kirin 990 latency (ms) | Size (MB) | Download | Configuration File | +| :---------- | :---------- | :----------------------: | :--------------------: | :----: | :-------------------: | :---------: | :---------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [link](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) | + +**Note:** Due to the overall upgrade of the dynamic-graph framework, the PAFNet weights released by PaddleDetection must be evaluated with the `--bias` flag, for example: + +```bash +# evaluate with the weights released by PaddleDetection +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias +``` + +## Citations +```
+@article{liu2019training, + title = {Training-Time-Friendly Network for Real-Time Object Detection}, + author = {Zili Liu, Tu Zheng, Guodong Xu, Zheng Yang, Haifeng Liu, Deng Cai}, + journal = {arXiv preprint arXiv:1909.00700}, + year = {2019} +} +``` diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_10x.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_10x.yml new file mode 100644 index 0000000000000000000000000000000000000000..dd2c29d966650d76b0636b3f889e13efbbe5d95a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_10x.yml @@ -0,0 +1,19 @@ +epoch: 120 + +LearningRate: + base_lr: 0.015 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [80, 110] + - !LinearWarmup + start_factor: 0.2 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0004 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..8457ead9add410c85d75c0427748e6d3d4eb8319 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.015 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.2 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0004 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_20x.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_20x.yml new file mode 100644 index 0000000000000000000000000000000000000000..4dd3492202a3fdf9a612541c0ecd1dc76f1b6519 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/optimizer_20x.yml @@ -0,0 +1,20 @@ +epoch: 240 + +LearningRate: + base_lr: 0.015 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [160, 220] + - !LinearWarmup + start_factor: 0.2 + steps: 1000 + +OptimizerBuilder: + clip_grad_by_norm: 35 + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0004 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet.yml new file mode 100644 index 0000000000000000000000000000000000000000..c3b21c5cce9d59f0f382a4d051d58e8f4ecdc0bb --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet.yml @@ -0,0 +1,40 @@ +architecture: TTFNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +TTFNet: + backbone: ResNet + neck: TTFFPN + ttf_head: TTFHead + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [0, 1, 2, 3] + freeze_at: -1 + norm_decay: 0. + dcn_v2_stages: [1, 2, 3] + +TTFFPN: + planes: [256, 128, 64] + shortcut_num: [3, 2, 1] + +TTFHead: + dcn_head: true + hm_loss: + name: CTFocalLoss + loss_weight: 1. + wh_loss: + name: GIoULoss + loss_weight: 5. 
+ reduction: sum + +BBoxPostProcess: + decode: + name: TTFBox + max_per_img: 100 + score_thresh: 0.01 + down_ratio: 4 diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_lite.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_lite.yml new file mode 100644 index 0000000000000000000000000000000000000000..5ed2fa235b6eb0f35690183a884dabbea43b279e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_lite.yml @@ -0,0 +1,44 @@ +architecture: TTFNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams +norm_type: sync_bn + +TTFNet: + backbone: MobileNetV3 + neck: TTFFPN + ttf_head: TTFHead + post_process: BBoxPostProcess + +MobileNetV3: + scale: 1.0 + model_name: large + feature_maps: [5, 8, 14, 17] + with_extra_blocks: true + lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] + conv_decay: 0.00001 + norm_decay: 0.0 + extra_block_filters: [] + +TTFFPN: + planes: [96, 48, 24] + shortcut_num: [2, 2, 1] + lite_neck: true + fusion_method: concat + +TTFHead: + hm_head_planes: 48 + wh_head_planes: 24 + lite_head: true + hm_loss: + name: CTFocalLoss + loss_weight: 1. + wh_loss: + name: GIoULoss + loss_weight: 5. + reduction: sum + +BBoxPostProcess: + decode: + name: TTFBox + max_per_img: 100 + score_thresh: 0.01 + down_ratio: 4 diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_lite_reader.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_lite_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..024792114961e98f764e572efc548fb0bec7a7e5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_lite_reader.yml @@ -0,0 +1,37 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {brightness: [-32., 32., 0.5], random_apply: False, random_channel: True} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {aspect_ratio: NULL, cover_all_box: True} + - RandomFlip: {} + - GridMask: {upper_iter: 300000} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512], random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false} + - Permute: {} + - Gt2TTFTarget: {down_ratio: 4} + - PadBatch: {pad_to_stride: 32} + batch_size: 12 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False} + - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]} + - Permute: {} + batch_size: 1 + drop_last: false + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False} + - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]} + - Permute: {} + batch_size: 1 + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_reader.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..ccbbdb257b21fca0a9901512d6f2f1962fc5105e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/pafnet_reader.yml @@ -0,0 +1,36 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {brightness: [-32., 32., 0.5], random_apply: false, random_channel: true} + - RandomExpand: {fill_value: [123.675, 
116.28, 103.53]} + - RandomCrop: {aspect_ratio: NULL, cover_all_box: True} + - RandomFlip: {prob: 0.5} + batch_transforms: + - BatchRandomResize: {target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672], keep_ratio: false} + - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false} + - Permute: {} + - Gt2TTFTarget: {down_ratio: 4} + - PadBatch: {pad_to_stride: 32} + batch_size: 18 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False} + - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]} + - Permute: {} + batch_size: 1 + drop_last: false + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False} + - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]} + - Permute: {} + batch_size: 1 + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/ttfnet_darknet53.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/ttfnet_darknet53.yml new file mode 100644 index 0000000000000000000000000000000000000000..05c7dce6503209c76da2c62613e3e2960ce47cc0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/ttfnet_darknet53.yml @@ -0,0 +1,35 @@ +architecture: TTFNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams + +TTFNet: + backbone: DarkNet + neck: TTFFPN + ttf_head: TTFHead + post_process: BBoxPostProcess + +DarkNet: + depth: 53 + freeze_at: 0 + return_idx: [1, 2, 3, 4] + norm_type: bn + norm_decay: 0.0004 + +TTFFPN: + planes: [256, 128, 64] + shortcut_num: [3, 2, 1] + +TTFHead: + hm_loss: + name: CTFocalLoss + loss_weight: 1. + wh_loss: + name: GIoULoss + loss_weight: 5. 
+ reduction: sum + +BBoxPostProcess: + decode: + name: TTFBox + max_per_img: 100 + score_thresh: 0.01 + down_ratio: 4 diff --git a/PaddleDetection-release-2.6/configs/ttfnet/_base_/ttfnet_reader.yml b/PaddleDetection-release-2.6/configs/ttfnet/_base_/ttfnet_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..9c12af727db6e7a199b76ccfda2286bc891a5b22 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/_base_/ttfnet_reader.yml @@ -0,0 +1,33 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {prob: 0.5} + - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False} + - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false} + - Permute: {} + batch_transforms: + - Gt2TTFTarget: {down_ratio: 4} + - PadBatch: {pad_to_stride: 32} + batch_size: 12 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False} + - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]} + - Permute: {} + batch_size: 1 + drop_last: false + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False} + - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]} + - Permute: {} + batch_size: 1 + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/ttfnet/pafnet_10x_coco.yml b/PaddleDetection-release-2.6/configs/ttfnet/pafnet_10x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b14a2bc912cce4cc4b0edca538cc19c3e51f65a5 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/pafnet_10x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_10x.yml', + '_base_/pafnet.yml', + '_base_/pafnet_reader.yml', +] +weights: output/pafnet_10x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml b/PaddleDetection-release-2.6/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..577af1635acc3c7778114db78775bd720727a588 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_20x.yml', + '_base_/pafnet_lite.yml', + '_base_/pafnet_lite_reader.yml', +] +weights: output/pafnet_lite_mobilenet_v3_20x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/ttfnet/ttfnet_darknet53_1x_coco.yml b/PaddleDetection-release-2.6/configs/ttfnet/ttfnet_darknet53_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..59123921f43742fa1bfc9d98ecd50f3bdb0bfaa7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/ttfnet/ttfnet_darknet53_1x_coco.yml @@ -0,0 +1,8 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_1x.yml', + '_base_/ttfnet_darknet53.yml', + '_base_/ttfnet_reader.yml', +] +weights: output/ttfnet_darknet53_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/vitdet/README.md b/PaddleDetection-release-2.6/configs/vitdet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e0037858544b671de34c79f32f43baa9525d9db4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/README.md @@ -0,0
+1,69 @@ +# Vision Transformer Detection + +## Introduction + +- [Context Autoencoder for Self-Supervised Representation Learning](https://arxiv.org/abs/2202.03026) +- [Benchmarking Detection Transfer Learning with Vision Transformers](https://arxiv.org/pdf/2111.11429.pdf) + +Object detection is a central downstream task used to test whether pre-trained network parameters confer benefits, such as improved accuracy or training speed. The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive. + +## Model Zoo + +| Model | Backbone | Pretrained | Scheduler | Images/GPU | Box AP | Mask AP | Config | Download | +|:------:|:--------:|:--------------:|:--------------:|:--------------:|:--------------:|:------:|:------:|:--------:| +| Cascade RCNN | ViT-base | CAE | 1x | 1 | 52.7 | - | [config](./cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.pdparams) | +| Cascade RCNN | ViT-large | CAE | 1x | 1 | 55.7 | - | [config](./cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams) | +| PP-YOLOE | ViT-base | CAE | 36e | 2 | 52.2 | - | [config](./ppyoloe_vit_base_csppan_cae_36e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_vit_base_csppan_cae_36e_coco.pdparams) | +| Mask RCNN | ViT-base | CAE | 1x | 1 | 50.6 | 44.9 | [config](./mask_rcnn_vit_base_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/mask_rcnn_vit_base_hrfpn_cae_1x_coco.pdparams) | +| Mask RCNN | ViT-large | CAE | 1x | 1 | 54.2 | 47.4 | [config](./mask_rcnn_vit_large_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/mask_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams) | + + +**Notes:** +- Models are trained on the COCO train2017 dataset and evaluated on val2017 with `mAP(IoU=0.5:0.95)`. +- The base models are trained on 8x 32G V100 GPUs, the large models on 8x 80G A100 GPUs. +- The `Cascade RCNN` experiments are based on PaddlePaddle 2.2.2. +
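+A minimal multi-GPU training sketch, assuming the standard `tools/train.py` entry point (the pretrained CAE backbone weights are downloaded automatically from the URL in the config): +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/vitdet/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml --eval +``` +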
+## Citations +``` +@article{chen2022context, + title={Context autoencoder for self-supervised representation learning}, + author={Chen, Xiaokang and Ding, Mingyu and Wang, Xiaodi and Xin, Ying and Mo, Shentong and Wang, Yunhao and Han, Shumin and Luo, Ping and Zeng, Gang and Wang, Jingdong}, + journal={arXiv preprint arXiv:2202.03026}, + year={2022} +} + +@article{DBLP:journals/corr/abs-2111-11429, + author = {Yanghao Li and Saining Xie and Xinlei Chen and Piotr Doll{\'{a}}r and Kaiming He and Ross B. Girshick}, + title = {Benchmarking Detection Transfer Learning with Vision Transformers}, + journal = {CoRR}, + volume = {abs/2111.11429}, + year = {2021}, + url = {https://arxiv.org/abs/2111.11429}, + eprinttype = {arXiv}, + eprint = {2111.11429}, + timestamp = {Fri, 26 Nov 2021 13:48:43 +0100}, + biburl = {https://dblp.org/rec/journals/corr/abs-2111-11429.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + +@article{Cai_2019, + title={Cascade R-CNN: High Quality Object Detection and Instance Segmentation}, + ISSN={1939-3539}, + url={http://dx.doi.org/10.1109/tpami.2019.2956516}, + DOI={10.1109/tpami.2019.2956516}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + publisher={Institute of Electrical and Electronics Engineers (IEEE)}, + author={Cai, Zhaowei and Vasconcelos, Nuno}, + year={2019}, + pages={1–1} +} +``` diff --git a/PaddleDetection-release-2.6/configs/vitdet/_base_/faster_rcnn_reader.yml b/PaddleDetection-release-2.6/configs/vitdet/_base_/faster_rcnn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..e1165cd0a03fd07f41eaea2701526639010cc7e9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/_base_/faster_rcnn_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResizeCrop: {resizes: [400, 500, 600], cropsizes: [[384, 600], ], prob: 0.5} + - RandomResize: {target_size: [[480, 1333], [512, 1333], [544, 1333], [576, 1333], [608, 1333], [640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 2} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: true + drop_last: true + collate_batch: false + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + inputs_def: + image_shape: [-1, 3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: 640, keep_ratio: True} + - Pad: {size: 640} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/vitdet/_base_/mask_rcnn_reader.yml b/PaddleDetection-release-2.6/configs/vitdet/_base_/mask_rcnn_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..83fd376b730ed10767508d3541e778d9663f4555 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/_base_/mask_rcnn_reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + # - RandomResizeCrop: {resizes: [400, 500, 600], cropsizes: [[384, 600], ], prob: 0.5} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + use_shared_memory: true + +EvalReader: + sample_transforms: + -
Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/PaddleDetection-release-2.6/configs/vitdet/_base_/optimizer_base_1x.yml b/PaddleDetection-release-2.6/configs/vitdet/_base_/optimizer_base_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..b822b3bf92a6a12facafe4b569a0ebcad3cf1d3b --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/_base_/optimizer_base_1x.yml @@ -0,0 +1,22 @@ +epoch: 12 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [9, 11] + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: AdamWDL + betas: [0.9, 0.999] + layer_decay: 0.75 + weight_decay: 0.02 + num_layers: 12 + filter_bias_and_bn: True + skip_decay_names: ['pos_embed', 'cls_token'] + set_param_lr_func: 'layerwise_lr_decay' diff --git a/PaddleDetection-release-2.6/configs/vitdet/_base_/optimizer_base_36e.yml b/PaddleDetection-release-2.6/configs/vitdet/_base_/optimizer_base_36e.yml new file mode 100644 index 0000000000000000000000000000000000000000..83b8708d046ea2ce57345ba543ed39453389f45d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/_base_/optimizer_base_36e.yml @@ -0,0 +1,20 @@ + +epoch: 36 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !CosineDecay + max_epochs: 36 + min_lr_ratio: 0.1 # 0.1 + - !LinearWarmup + start_factor: 0.001 + epochs: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 0.1 + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0001 diff --git a/PaddleDetection-release-2.6/configs/vitdet/_base_/ppyoloe_reader.yml b/PaddleDetection-release-2.6/configs/vitdet/_base_/ppyoloe_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..a4feaff4a1c1d64556bd787bd36b7ec7c6b08d81 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/_base_/ppyoloe_reader.yml @@ -0,0 +1,40 @@ +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + - PadGT: {} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, 
interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/vitdet/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml b/PaddleDetection-release-2.6/configs/vitdet/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f7808b7cc3ed90be63a984f8552731e3e0e289f7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml @@ -0,0 +1,131 @@ + +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/faster_rcnn_reader.yml', + './_base_/optimizer_base_1x.yml' +] + +weights: output/cascade_rcnn_vit_base_hrfpn_cae_1x_coco/model_final + + +# runtime +log_iter: 100 +snapshot_epoch: 1 +find_unused_parameters: True + +use_gpu: true +norm_type: sync_bn + + +# reader +worker_num: 2 +TrainReader: + batch_size: 1 + + +# model +architecture: CascadeRCNN + +CascadeRCNN: + backbone: VisionTransformer + neck: HRFPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + + +VisionTransformer: + patch_size: 16 + embed_dim: 768 + depth: 12 + num_heads: 12 + mlp_ratio: 4 + qkv_bias: True + drop_rate: 0.0 + drop_path_rate: 0.2 + init_values: 0.1 + final_norm: False + use_rel_pos_bias: False + use_sincos_pos_emb: True + epsilon: 0.000001 # 1e-6 + out_indices: [3, 5, 7, 11] + with_fpn: True + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_base_cae_pretrained.pdparams + +HRFPN: + out_channel: 256 + use_bias: True + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + loss_rpn_bbox: SmoothL1Loss + +SmoothL1Loss: + beta: 0.1111111111111111 + + +CascadeHead: + head: CascadeXConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + bbox_loss: GIoULoss + num_cascade_stages: 3 + reg_class_agnostic: False + stage_loss_weights: [1, 0.5, 0.25] + loss_normalize_pos: True + add_gt_as_proposals: [True, True, True] + + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + + +CascadeXConvNormHead: + norm_type: bn + + +GIoULoss: + loss_weight: 10. 
+ reduction: 'none' + eps: 0.000001 + + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/vitdet/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml b/PaddleDetection-release-2.6/configs/vitdet/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..3bd443ef3e3fc0d2dcb623b60e45c51c629c6d2a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + './cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml' +] + +weights: output/cascade_rcnn_vit_large_hrfpn_cae_1x_coco/model_final + + +depth: &depth 24 +dim: &dim 1024 +use_fused_allreduce_gradients: &use_checkpoint True + +VisionTransformer: + img_size: [800, 1344] + embed_dim: *dim + depth: *depth + num_heads: 16 + drop_path_rate: 0.25 + out_indices: [7, 11, 15, 23] + use_checkpoint: *use_checkpoint + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_large_cae_pretrained.pdparams + +HRFPN: + in_channels: [*dim, *dim, *dim, *dim] + +OptimizerBuilder: + optimizer: + layer_decay: 0.9 + weight_decay: 0.02 + num_layers: *depth diff --git a/PaddleDetection-release-2.6/configs/vitdet/faster_rcnn_vit_base_fpn_cae_1x_coco.yml b/PaddleDetection-release-2.6/configs/vitdet/faster_rcnn_vit_base_fpn_cae_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..8b693f687fd370231e2bdee47a8e7c719c4d63f2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/faster_rcnn_vit_base_fpn_cae_1x_coco.yml @@ -0,0 +1,130 @@ + +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/faster_rcnn_reader.yml', + './_base_/optimizer_base_1x.yml' +] + +weights: output/faster_rcnn_vit_base_fpn_cae_1x_coco/model_final + + +# runtime +log_iter: 100 +snapshot_epoch: 1 +find_unused_parameters: True + +use_gpu: true +norm_type: sync_bn + +OptimizerBuilder: + optimizer: + weight_decay: 0.05 + +# reader +worker_num: 2 +TrainReader: + batch_size: 1 + + +# model +architecture: FasterRCNN + +FasterRCNN: + backbone: VisionTransformer + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + bbox_post_process: BBoxPostProcess + +VisionTransformer: + patch_size: 16 + embed_dim: 768 + depth: 12 + num_heads: 12 + mlp_ratio: 4 + qkv_bias: True + drop_rate: 0.0 + drop_path_rate: 0.2 + init_values: 0.1 + final_norm: False + use_rel_pos_bias: False + use_sincos_pos_emb: True + epsilon: 0.000001 # 1e-6 + out_indices: [3, 5, 7, 11] + with_fpn: True + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_base_cae_pretrained.pdparams + + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + loss_rpn_bbox: SmoothL1Loss + + +SmoothL1Loss: + beta: 0.1111111111111111 + + +BBoxHead: + # head: TwoFCHead + head: XConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + loss_normalize_pos: True + bbox_loss: 
GIoULoss + + +GIoULoss: + loss_weight: 10. + reduction: 'none' + eps: 0.000001 # 1e-6 + + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +# TwoFCHead: +# out_channel: 1024 + +XConvNormHead: + num_convs: 4 + norm_type: bn + + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/PaddleDetection-release-2.6/configs/vitdet/mask_rcnn_vit_base_hrfpn_cae_1x_coco.yml b/PaddleDetection-release-2.6/configs/vitdet/mask_rcnn_vit_base_hrfpn_cae_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..c11ce890d64a8709ba31df0a93d24494f7e3aa65 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/mask_rcnn_vit_base_hrfpn_cae_1x_coco.yml @@ -0,0 +1,135 @@ +_BASE_: [ + '../datasets/coco_instance.yml', + '../runtime.yml', + './_base_/mask_rcnn_reader.yml', + './_base_/optimizer_base_1x.yml' +] + +weights: output/mask_rcnn_vit_base_hrfpn_cae_1x_coco/model_final + + +# runtime +log_iter: 100 +snapshot_epoch: 1 +norm_type: sync_bn +use_fused_allreduce_gradients: &use_checkpoint False + + +architecture: MaskRCNN +MaskRCNN: + backbone: VisionTransformer + neck: HRFPN + rpn_head: RPNHead + bbox_head: BBoxHead + mask_head: MaskHead + # post process + bbox_post_process: BBoxPostProcess + mask_post_process: MaskPostProcess + +VisionTransformer: + patch_size: 16 + embed_dim: 768 + depth: 12 + num_heads: 12 + mlp_ratio: 4 + qkv_bias: True + drop_rate: 0.0 + drop_path_rate: 0.2 + init_values: 0.1 + final_norm: False + use_rel_pos_bias: False + use_sincos_pos_emb: True + epsilon: 0.000001 # 1e-6 + out_indices: [3, 5, 7, 11] + with_fpn: True + use_checkpoint: *use_checkpoint + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_base_cae_pretrained.pdparams + +HRFPN: + out_channel: 256 + use_bias: True + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + loss_rpn_bbox: SmoothL1Loss + +SmoothL1Loss: + beta: 0.1111111111111111 + + +BBoxHead: + head: XConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + loss_normalize_pos: True + bbox_loss: GIoULoss + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + + +XConvNormHead: + num_convs: 4 + norm_type: bn + +GIoULoss: + loss_weight: 10. 
+ reduction: 'none' + eps: 0.000001 + + + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 + +MaskHead: + head: MaskFeat + roi_extractor: + resolution: 14 + sampling_ratio: 0 + aligned: True + mask_assigner: MaskAssigner + share_bbox_feat: False + +MaskFeat: + num_convs: 4 + out_channel: 256 + norm_type: ~ + +MaskAssigner: + mask_resolution: 28 + +MaskPostProcess: + binary_thresh: 0.5 diff --git a/PaddleDetection-release-2.6/configs/vitdet/mask_rcnn_vit_large_hrfpn_cae_1x_coco.yml b/PaddleDetection-release-2.6/configs/vitdet/mask_rcnn_vit_large_hrfpn_cae_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..5884e91e9d146e6ec031e23b6840026e3c39b073 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/mask_rcnn_vit_large_hrfpn_cae_1x_coco.yml @@ -0,0 +1,29 @@ +_BASE_: [ + './mask_rcnn_vit_base_hrfpn_cae_1x_coco.yml' +] + +weights: output/mask_rcnn_vit_large_hrfpn_cae_1x_coco/model_final + + +depth: &depth 24 +dim: &dim 1024 +use_fused_allreduce_gradients: &use_checkpoint True + +VisionTransformer: + img_size: [800, 1344] + embed_dim: *dim + depth: *depth + num_heads: 16 + drop_path_rate: 0.25 + out_indices: [7, 11, 15, 23] + use_checkpoint: *use_checkpoint + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_large_cae_pretrained.pdparams + +HRFPN: + in_channels: [*dim, *dim, *dim, *dim] + +OptimizerBuilder: + optimizer: + layer_decay: 0.9 + weight_decay: 0.02 + num_layers: *depth diff --git a/PaddleDetection-release-2.6/configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml b/PaddleDetection-release-2.6/configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..556f4b49d7a2a1c7c6928b4e917881b175b01384 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml @@ -0,0 +1,78 @@ + +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyoloe_reader.yml', + './_base_/optimizer_base_36e.yml' +] + +weights: output/ppyoloe_vit_base_csppan_cae_36e_coco/model_final + + +snapshot_epoch: 2 +log_iter: 100 + + +use_ema: true +ema_decay: 0.9999 +ema_skip_names: ['yolo_head.proj_conv.weight', 'backbone.pos_embed'] +custom_black_list: ['reduce_mean'] +use_fused_allreduce_gradients: &use_checkpoint False + + +architecture: YOLOv3 +norm_type: sync_bn + +YOLOv3: + backbone: VisionTransformer + neck: YOLOCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +VisionTransformer: + patch_size: 16 + embed_dim: 768 + depth: 12 + num_heads: 12 + mlp_ratio: 4 + qkv_bias: True + drop_rate: 0.0 + drop_path_rate: 0.2 + init_values: 0.1 + final_norm: False + use_rel_pos_bias: False + use_sincos_pos_emb: True + epsilon: 0.000001 # 1e-6 + out_indices: [11, ] + with_fpn: True + num_fpn_levels: 3 + out_with_norm: False + use_checkpoint: *use_checkpoint + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_base_cae_pretrained.pdparams + +YOLOCSPPAN: + in_channels: [768, 768, 768] + act: 'silu' + +PPYOLOEHead: + fpn_strides: [8, 16, 32] + in_channels: [768, 768, 768] + static_assigner_epoch: -1 + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 
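+ +# Usage sketch (comment only; assumes the standard PaddleDetection tools and the released +# weights listed in configs/vitdet/README.md): +#   CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml \ +#     -o weights=https://bj.bcebos.com/v1/paddledet/models/ppyoloe_vit_base_csppan_cae_36e_coco.pdparams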
diff --git a/PaddleDetection-release-2.6/configs/yolof/README.md b/PaddleDetection-release-2.6/configs/yolof/README.md new file mode 100644 index 0000000000000000000000000000000000000000..84c86bf3b7b31dc16bc3574d3233f3ecd1bf6186 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolof/README.md @@ -0,0 +1,22 @@ +# YOLOF (You Only Look One-level Feature) + +## Model Zoo + +| Model | Input size | Images/GPU | Epochs | Inference time (ms) | mAP<sup>val<br>0.5:0.95</sup>
 | Params(M) | FLOPs(G) | Download | Config | +| :--------------------- | :------- | :-------: | :----: | :----------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: | +| YOLOF-R_50_C5 (paper) | 800x1333 | 4 | 12 | - | 37.7 | - | - | - | - | +| YOLOF-R_50_C5 | 800x1333 | 4 | 12 | - | 38.1 | 44.16 | 241.64 | [model](https://paddledet.bj.bcebos.com/models/yolof_r50_c5_1x_coco.pdparams) | [config](./yolof_r50_c5_1x_coco.yml) | + +**Notes:** + - By default, YOLOF is trained with mixed precision on 8 GPUs, with a total batch_size of 32 (see the sketch below). + +
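+A minimal sketch of this default setup, assuming the standard `tools/train.py` entry point with its `--amp` mixed-precision flag: +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolof/yolof_r50_c5_1x_coco.yml --amp --eval +``` + +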
+## Citations +``` +@inproceedings{chen2021you, + title={You Only Look One-level Feature}, + author={Chen, Qiang and Wang, Yingming and Yang, Tong and Zhang, Xiangyu and Cheng, Jian and Sun, Jian}, + booktitle={IEEE Conference on Computer Vision and Pattern Recognition}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/yolof/_base_/optimizer_1x.yml b/PaddleDetection-release-2.6/configs/yolof/_base_/optimizer_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..6951a6c34bbde8bc484cc86d0a440cb2ed909fec --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolof/_base_/optimizer_1x.yml @@ -0,0 +1,19 @@ +epoch: 12 + +LearningRate: + base_lr: 0.06 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.00066 + steps: 1500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/yolof/_base_/yolof_r50_c5.yml b/PaddleDetection-release-2.6/configs/yolof/_base_/yolof_r50_c5.yml new file mode 100644 index 0000000000000000000000000000000000000000..53b2eb972ba4dd8d058c5ae571fd3eafa7fb4b99 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolof/_base_/yolof_r50_c5.yml @@ -0,0 +1,54 @@ +architecture: YOLOF +find_unused_parameters: True + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +YOLOF: + backbone: ResNet + neck: DilatedEncoder + head: YOLOFHead + +ResNet: + depth: 50 + variant: b # resnet-va in paper + freeze_at: 0 # res2 + return_idx: [3] # only res5 feature + lr_mult_list: [0.3333, 0.3333, 0.3333, 0.3333] + +DilatedEncoder: + in_channels: [2048] + out_channels: [512] + block_mid_channels: 128 + num_residual_blocks: 4 + block_dilations: [2, 4, 6, 8] + +YOLOFHead: + conv_feat: + name: YOLOFFeat + feat_in: 512 + feat_out: 512 + num_cls_convs: 2 + num_reg_convs: 4 + norm_type: bn + anchor_generator: + name: AnchorGenerator + anchor_sizes: [[32, 64, 128, 256, 512]] + aspect_ratios: [1.0] + strides: [32] + bbox_assigner: + name: UniformAssigner + pos_ignore_thr: 0.15 + neg_ignore_thr: 0.7 + match_times: 4 + loss_class: + name: FocalLoss + gamma: 2.0 + alpha: 0.25 + loss_bbox: + name: GIoULoss + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.6 diff --git a/PaddleDetection-release-2.6/configs/yolof/_base_/yolof_reader.yml b/PaddleDetection-release-2.6/configs/yolof/_base_/yolof_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..a0b19c85aec11b1c3917a4ab478af2e78e11f1d9 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolof/_base_/yolof_reader.yml @@ -0,0 +1,38 @@ +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - RandomShift: {prob: 0.5, max_shift: 32} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - RandomFlip: {} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 4 + shuffle: True + drop_last: True + collate_batch: False + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + fuse_normalize: True diff --git a/PaddleDetection-release-2.6/configs/yolof/yolof_r50_c5_1x_coco.yml b/PaddleDetection-release-2.6/configs/yolof/yolof_r50_c5_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..2bc476f505acf8e4e715fa1f1c3d8f1edf896d70 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolof/yolof_r50_c5_1x_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_1x.yml', + './_base_/yolof_r50_c5.yml', + './_base_/yolof_reader.yml' +] +log_iter: 50 +snapshot_epoch: 1 +weights: output/yolof_r50_c5_1x_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/yolov3/README.md b/PaddleDetection-release-2.6/configs/yolov3/README.md new file mode 100644 index 0000000000000000000000000000000000000000..16327dd3e3b7e482a42ad1776671e5ead870e062 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/README.md @@ -0,0 +1,85 @@ +# YOLOv3 + +## Model Zoo + +### YOLOv3 on COCO + +| Backbone | Input size | Images/GPU | LR schedule | Inference time (fps) | mAP<sup>val<br>0.5:0.95</sup>
 | Download | Config | +| :------------------- | :------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: | +| DarkNet53(paper) | 608 | 8 | 270e | - | 33.0 | - | - | +| DarkNet53(paper) | 416 | 8 | 270e | - | 31.0 | - | - | +| DarkNet53(paper) | 320 | 8 | 270e | - | 28.2 | - | - | +| DarkNet53 | 608 | 8 | 270e | - | **39.1** | [model](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](./yolov3_darknet53_270e_coco.yml) | +| DarkNet53 | 416 | 8 | 270e | - | 37.7 | [model](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](./yolov3_darknet53_270e_coco.yml) | +| DarkNet53 | 320 | 8 | 270e | - | 34.8 | [model](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](./yolov3_darknet53_270e_coco.yml) | +| ResNet50_vd-DCN | 608 | 8 | 270e | - | **40.6** | [model](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [config](./yolov3_r50vd_dcn_270e_coco.yml) | +| ResNet50_vd-DCN | 416 | 8 | 270e | - | 38.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [config](./yolov3_r50vd_dcn_270e_coco.yml) | +| ResNet50_vd-DCN | 320 | 8 | 270e | - | 35.1 | [model](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [config](./yolov3_r50vd_dcn_270e_coco.yml) | +| ResNet34 | 608 | 8 | 270e | - | 36.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [config](./yolov3_r34_270e_coco.yml) | +| ResNet34 | 416 | 8 | 270e | - | 34.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [config](./yolov3_r34_270e_coco.yml) | +| ResNet34 | 320 | 8 | 270e | - | 31.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [config](./yolov3_r34_270e_coco.yml) | +| MobileNet-V1 | 608 | 8 | 270e | - | 29.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](./yolov3_mobilenet_v1_270e_coco.yml) | +| MobileNet-V1 | 416 | 8 | 270e | - | 29.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](./yolov3_mobilenet_v1_270e_coco.yml) | +| MobileNet-V1 | 320 | 8 | 270e | - | 27.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](./yolov3_mobilenet_v1_270e_coco.yml) | +| MobileNet-V3 | 608 | 8 | 270e | - | 31.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](./yolov3_mobilenet_v3_large_270e_coco.yml) | +| MobileNet-V3 | 416 | 8 | 270e | - | 29.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](./yolov3_mobilenet_v3_large_270e_coco.yml) | +| MobileNet-V3 | 320 | 8 | 270e | - | 27.1 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](./yolov3_mobilenet_v3_large_270e_coco.yml) | +| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 31.0 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](./yolov3_mobilenet_v1_ssld_270e_coco.yml) | +| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 30.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](./yolov3_mobilenet_v1_ssld_270e_coco.yml) | +| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 28.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) |
 [config](./yolov3_mobilenet_v1_ssld_270e_coco.yml) | + +### YOLOv3 on Pascal VOC + +| Backbone | Input size | Images/GPU | LR schedule | Inference time (fps) | mAP(0.50, 11point) | Download | Config | +| :----------- | :--: | :-----: | :-----: |:------------: |:----: | :-------: | :----: | +| DarkNet53 | 608 | 8 | 270e | - | **85.4** (56.1 mAP<sup>0.5:0.95</sup>) | [model](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_voc.pdparams) | [config](./yolov3_darknet53_270e_voc.yml) | +| DarkNet53 | 416 | 8 | 270e | - | 85.2 (57.3 mAP<sup>0.5:0.95</sup>) | [model](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_voc.pdparams) | [config](./yolov3_darknet53_270e_voc.yml) | +| DarkNet53 | 320 | 8 | 270e | - | 84.3 (55.2 mAP<sup>0.5:0.95</sup>) | [model](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_voc.pdparams) | [config](./yolov3_darknet53_270e_voc.yml) | +| MobileNet-V1 | 608 | 8 | 270e | - | 75.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](./yolov3_mobilenet_v1_270e_voc.yml) | +| MobileNet-V1 | 416 | 8 | 270e | - | 76.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](./yolov3_mobilenet_v1_270e_voc.yml) | +| MobileNet-V1 | 320 | 8 | 270e | - | 74.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](./yolov3_mobilenet_v1_270e_voc.yml) | +| MobileNet-V3 | 608 | 8 | 270e | - | 79.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](./yolov3_mobilenet_v3_large_270e_voc.yml) | +| MobileNet-V3 | 416 | 8 | 270e | - | 78.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](./yolov3_mobilenet_v3_large_270e_voc.yml) | +| MobileNet-V3 | 320 | 8 | 270e | - | 76.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](./yolov3_mobilenet_v3_large_270e_voc.yml) | +| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](./yolov3_mobilenet_v1_ssld_270e_voc.yml) | +| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](./yolov3_mobilenet_v1_ssld_270e_voc.yml) | +| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](./yolov3_mobilenet_v1_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](./yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](./yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](./yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | + + +**Notes:** + - By default, YOLOv3 is trained on 8 GPUs with a total batch_size of 64, and the evaluation scale defaults to `608*608`; + - For the `416*416` and `320*320` scales, only the `Resize` parameter of the `EvalReader` needs to be changed to the corresponding value; the model does not need to be retrained, e.g.: + ``` + EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: False, interp: 2} # or [320, 320] + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + ``` + - The VOC dataset can be downloaded from this [link](https://bj.bcebos.com/v1/paddledet/data/voc.zip). The default evaluation metric is mAP(0.50, 11point); to switch to the COCO-style mAP<sup>0.5:0.95</sup> metric, follow [yolov3_darknet53_270e_voc](./yolov3_darknet53_270e_voc.yml), add the following lines, and re-run eval: + ``` + metric: COCO + EvalDataset: + !COCODataSet + image_dir: VOCdevkit/VOC2007/JPEGImages + anno_path: voc_test.json + dataset_dir: dataset/voc + ``` + + +## Citations +``` +@misc{redmon2018yolov3, + title={YOLOv3: An Incremental Improvement}, + author={Joseph Redmon and Ali Farhadi}, + year={2018}, + eprint={1804.02767}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/optimizer_270e.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/optimizer_270e.yml new file mode 100644 index 0000000000000000000000000000000000000000..d92f3df60ca6686d7ada476b7b9f01419f0edb81 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/optimizer_270e.yml @@ -0,0 +1,21 @@ +epoch: 270 + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 216 + - 243 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/optimizer_40e.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/optimizer_40e.yml new file mode 100644 index 0000000000000000000000000000000000000000..7cf676d7119162d55dc0a2566c0590457344cfd3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/optimizer_40e.yml @@ -0,0 +1,21 @@ +epoch: 40 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 32 + - 36 + - !LinearWarmup + start_factor: 0.3333333333333333 + steps: 100 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_darknet53.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_darknet53.yml new file mode 100644 index 0000000000000000000000000000000000000000..1187f6eac9d4e55eeea4d0b6e0c678ad01d724b0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_darknet53.yml @@ -0,0 +1,41 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +DarkNet: + depth: 53 + return_idx: [2, 3, 4] + +# use default config +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v1.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v1.yml new file mode 100644 index 0000000000000000000000000000000000000000..6452b5132b47203379bb3292eb0afc6958d4609f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v1.yml @@ -0,0 +1,43 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV1_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: +
backbone: MobileNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +MobileNet: + scale: 1 + feature_maps: [4, 6, 13] + with_extra_blocks: false + extra_block_filters: [] + +# use default config +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v3_large.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v3_large.yml new file mode 100644 index 0000000000000000000000000000000000000000..94b5dea3ea6b2039ce5aaf8ccab44d651727bd21 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v3_large.yml @@ -0,0 +1,44 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: MobileNetV3 + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +MobileNetV3: + model_name: large + scale: 1. + with_extra_blocks: false + extra_block_filters: [] + feature_maps: [7, 13, 16] + +# use default config +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v3_small.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v3_small.yml new file mode 100644 index 0000000000000000000000000000000000000000..f0f144b916c44da6de45de762310e3470179ac5a --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_mobilenet_v3_small.yml @@ -0,0 +1,44 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_small_x1_0_ssld_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: MobileNetV3 + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +MobileNetV3: + model_name: small + scale: 1. 
+ with_extra_blocks: false + extra_block_filters: [] + feature_maps: [4, 9, 12] + +# use default config +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_r34.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_r34.yml new file mode 100644 index 0000000000000000000000000000000000000000..c2d1489f07ba65240e5b545662b8c1672750b705 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_r34.yml @@ -0,0 +1,41 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: ResNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 34 + return_idx: [1, 2, 3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_r50vd_dcn.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_r50vd_dcn.yml new file mode 100644 index 0000000000000000000000000000000000000000..0d01148b476e5bbc4c9ae96cc7a215258e7d7042 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_r50vd_dcn.yml @@ -0,0 +1,45 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: ResNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_reader.yml b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..5dab6742b120a68ea76599b911567ee753b68253 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/_base_/yolov3_reader.yml @@ -0,0 +1,44 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 8 + shuffle: true + drop_last: true + mixup_epoch: 250 + use_shared_memory: true + +EvalReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..4fbd401d302ea2d9c55a7b51384e36eff790abe2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_270e_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_darknet53.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_darknet53_270e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_270e_voc.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_270e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..92631171af748e5c1b4d5e9219dccb140822123f --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_270e_voc.yml @@ -0,0 +1,28 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_darknet53.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: 
output/yolov3_darknet53_270e_voc/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams + + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. +EvalReader: + collate_batch: false + + +# ### remove comment below and run evaluate again to get 56.1 COCO for mAP(0.5:0.95) +# metric: COCO +# EvalDataset: +# !COCODataSet +# image_dir: VOCdevkit/VOC2007/JPEGImages +# anno_path: voc_test.json +# dataset_dir: dataset/voc +# # wget https://bj.bcebos.com/v1/paddledet/data/voc.zip diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_original_270e_coco.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_original_270e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f245d3b1c7309cb7af0b52fc2fa2747fe5d443a0 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_original_270e_coco.yml @@ -0,0 +1,40 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_darknet53.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_darknet53_270e_coco/model_final + +norm_type: bn + +YOLOv3Loss: + ignore_thresh: 0.5 + downsample: [32, 16, 8] + label_smooth: false + +TrainReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53], ratio: 2.0} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8], iou_thresh: 0.5} + batch_size: 8 + shuffle: true + drop_last: true + mixup_epoch: -1 + use_shared_memory: true diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_original_320e_coco_1p.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_original_320e_coco_1p.yml new file mode 100644 index 0000000000000000000000000000000000000000..fded8bbc3a53b24e7ffad28ba61b80b8960eb0ff --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_darknet53_original_320e_coco_1p.yml @@ -0,0 +1,59 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/yolov3_darknet53.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_darknet53_270e_coco/model_final + +norm_type: bn + +YOLOv3Loss: + ignore_thresh: 0.5 + downsample: [32, 16, 8] + label_smooth: false + +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53], ratio: 2.0} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [416], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - 
Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8], iou_thresh: 0.5} + batch_size: 32 + shuffle: true + drop_last: true + mixup_epoch: -1 + use_shared_memory: true + +epoch: 320 + +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 320 + - !LinearWarmup + start_factor: 0. + epochs: 4 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.016 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b9dd33bdb27a539193ee1c003095f45c58b5e368 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_mobilenet_v1.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_mobilenet_v1_270e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..996757af6be052409b5a71f8d543e5da63cb491d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml @@ -0,0 +1,28 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_mobilenet_v1.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_mobilenet_v1_270e_voc/model_final + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. +EvalReader: + collate_batch: false + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 216 + - 243 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_roadsign.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_roadsign.yml new file mode 100644 index 0000000000000000000000000000000000000000..e897276c655ad97aa57f0ca195bba4db9900b5a8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_roadsign.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/roadsign_voc.yml', + '../runtime.yml', + '_base_/optimizer_40e.yml', + '_base_/yolov3_mobilenet_v1.yml', + '_base_/yolov3_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams +weights: output/yolov3_mobilenet_v1_roadsign/model_final + +YOLOv3Loss: + ignore_thresh: 0.7 + label_smooth: true diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..10cf8166d9e9ab1a63211f873d73a3e8eee4eb91 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml @@ -0,0 +1,11 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_mobilenet_v1.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV1_ssld_pretrained.pdparams +weights: output/yolov3_mobilenet_v1_ssld_270e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..0f9c85fd981113ccbd1e1080000ea76a0cd680a6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_mobilenet_v1.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV1_ssld_pretrained.pdparams +weights: output/yolov3_mobilenet_v1_ssld_270e_voc/model_final + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. +EvalReader: + collate_batch: false + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 216 + - 243 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..d1b8af566e99310cbde30dead1d6ad3b6ff428a4 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_mobilenet_v3_large.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_mobilenet_v3_large_270e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..e246c8bae484833e7e63034318f150c7fbba93d6 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml @@ -0,0 +1,28 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_mobilenet_v3_large.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_mobilenet_v3_large_270e_voc/model_final + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. +EvalReader: + collate_batch: false + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 216 + - 243 + - !LinearWarmup + start_factor: 0. + steps: 1000 diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..13a2583397bfda58ab7e06d9b1621edec47f506e --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml @@ -0,0 +1,29 @@ +_BASE_: [ + '../datasets/voc.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_mobilenet_v3_large.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams +weights: output/yolov3_mobilenet_v3_large_ssld_270e_voc/model_final + +# set collate_batch to false because ground-truth info is needed +# on voc dataset and should not collate data in batch when batch size +# is larger than 1. +EvalReader: + collate_batch: false + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 216 + - 243 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000
diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_r34_270e_coco.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_r34_270e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..8653b06161b9145dbd23e00878d5c056986db5ec --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_r34_270e_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_r34.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_r34_270e_coco/model_final
diff --git a/PaddleDetection-release-2.6/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml b/PaddleDetection-release-2.6/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..a07cbdde1dcfa2caf50ec93ae7f499a7734335ab --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml @@ -0,0 +1,10 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/optimizer_270e.yml', + '_base_/yolov3_r50vd_dcn.yml', + '_base_/yolov3_reader.yml', +] + +snapshot_epoch: 5 +weights: output/yolov3_r50vd_dcn_270e_coco/model_final
diff --git a/PaddleDetection-release-2.6/configs/yolox/README.md b/PaddleDetection-release-2.6/configs/yolox/README.md new file mode 100644 index 0000000000000000000000000000000000000000..057b87f2f8e27adaf76a2e993f1a7d6e395fefd3 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/README.md @@ -0,0 +1,190 @@ +# YOLOX (YOLOX: Exceeding YOLO Series in 2021) + +## 内容 +- [模型库](#模型库) +- [使用教程](#使用教程) +- [速度测试](#速度测试) +- [引用](#citations) + + +## 模型库 +### YOLOX on COCO
+
+| 网络 | 输入尺寸 | 图片数/GPU | 学习率策略 | 模型推理耗时(ms) | mAP<sup>val<br>0.5:0.95</sup> | mAP<sup>val<br>0.5</sup> | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| YOLOX-nano | 416 | 8 | 300e | 2.3 | 26.1 | 42.0 | 0.91 | 1.08 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_nano_300e_coco.pdparams) | [配置文件](./yolox_nano_300e_coco.yml) | +| YOLOX-tiny | 416 | 8 | 300e | 2.8 | 32.9 | 50.4 | 5.06 | 6.45 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_tiny_300e_coco.pdparams) | [配置文件](./yolox_tiny_300e_coco.yml) | +| YOLOX-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 9.0 | 26.8 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams) | [配置文件](./yolox_s_300e_coco.yml) | +| YOLOX-m | 640 | 8 | 300e | 5.8 | 46.9 | 65.7 | 25.3 | 73.8 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_m_300e_coco.pdparams) | [配置文件](./yolox_m_300e_coco.yml) | +| YOLOX-l | 640 | 8 | 300e | 9.3 | 50.1 | 68.8 | 54.2 | 155.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | [配置文件](./yolox_l_300e_coco.yml) | +| YOLOX-x | 640 | 8 | 300e | 16.6 | **51.8** | **70.6** | 99.1 | 281.9 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_x_300e_coco.pdparams) | [配置文件](./yolox_x_300e_coco.yml) | + +
+| 网络 | 输入尺寸 | 图片数/GPU | 学习率策略 | 模型推理耗时(ms) | mAP<sup>val<br>0.5:0.95</sup> | mAP<sup>val<br>0.5</sup> | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| YOLOX-cdn-tiny | 416 | 8 | 300e | 1.9 | 32.4 | 50.2 | 5.03 | 6.33 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_cdn_tiny_300e_coco.pdparams) | [配置文件](./yolox_cdn_tiny_300e_coco.yml) | +| YOLOX-crn-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 7.7 | 24.69 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_crn_s_300e_coco.pdparams) | [配置文件](./yolox_crn_s_300e_coco.yml) | +| YOLOX-ConvNeXt-s | 640 | 8 | 36e | - | **44.6** | **65.3** | 36.2 | 27.52 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [配置文件](../convnext/yolox_convnext_s_36e_coco.yml) | + +
+**注意:** + - YOLOX模型训练使用COCO train2017作为训练集;YOLOX-cdn表示使用与YOLOv5 releases v6.0之后版本相同的主干网络,YOLOX-crn表示使用与PPYOLOE相同的主干网络CSPResNet,YOLOX-ConvNeXt表示使用ConvNeXt作为主干网络; + - YOLOX模型训练过程中默认使用8 GPUs进行混合精度训练,默认每卡batch_size为8,默认lr为0.01,对应8卡总batch_size=64的设置。如果**GPU卡数**或者每卡**batch size**发生了改变,你需要按照公式 **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)** 调整学习率,换算方法可参考本节注意事项下方的示意代码; + - 为保持高mAP的同时提高推理速度,可以将[yolox_cspdarknet.yml](_base_/yolox_cspdarknet.yml)中的`nms_top_k`修改为`1000`,将`keep_top_k`修改为`100`,将`score_threshold`修改为`0.01`,mAP会下降约0.1~0.2%; + - 为快速的demo演示效果,可以将[yolox_cspdarknet.yml](_base_/yolox_cspdarknet.yml)中的`score_threshold`修改为`0.25`,将`nms_threshold`修改为`0.45`,但mAP会下降较多; + - YOLOX模型推理速度测试采用单卡V100,batch size=1进行测试,使用**CUDA 10.2**, **CUDNN 7.6.5**,TensorRT推理速度测试使用**TensorRT 6.0.1.8**; + - 参考[速度测试](#速度测试)以复现YOLOX推理速度测试结果,速度为**tensorRT-FP16**测速后的最快速度,**不包含数据预处理和模型输出后处理(NMS)**的耗时; + - 如果你设置了`--run_benchmark=True`,你首先需要安装以下依赖`pip install pynvml psutil GPUtil`。
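+
+上面的学习率换算公式可以用下面的Python示意代码表达(示意代码,非PaddleDetection自带脚本;函数名与各默认值均为假设的示例取值,实际训练时请直接修改配置文件中的`base_lr`):
+
+```python
+# 按线性缩放规则(linear scaling rule)换算学习率的示意代码(假设性示例)
+def scale_lr(lr_default=0.01, batch_size_default=8, gpu_number_default=8,
+             batch_size_new=8, gpu_number_new=4):
+    """lr_new = lr_default * (bs_new * gpu_new) / (bs_default * gpu_default)"""
+    return (lr_default * (batch_size_new * gpu_number_new)
+            / (batch_size_default * gpu_number_default))
+
+# 例如改为4卡、每卡batch_size=8时:0.01 * (8*4) / (8*8) = 0.005
+print(scale_lr(batch_size_new=8, gpu_number_new=4))  # 0.005
+```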
+## 使用教程 + +### 1.训练 +执行以下指令使用混合精度训练YOLOX +```bash +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolox/yolox_s_300e_coco.yml --amp --eval +``` +**注意:** +- `--amp`表示开启混合精度训练以避免显存溢出,`--eval`表示边训边验证。 + +### 2.评估 +执行以下命令在单个GPU上评估COCO val2017数据集 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams +``` + +### 3.推理 +使用以下命令在单张GPU上预测图片:`--infer_img`用于推理单张图片,`--infer_dir`用于推理文件夹中的所有图片。 +```bash +# 推理单张图片 +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams --infer_img=demo/000000014439_640x640.jpg + +# 推理文件夹中的所有图片 +CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams --infer_dir=demo +``` + +### 4.导出模型 +YOLOX在GPU上推理部署或benchmark测速等需要通过`tools/export_model.py`导出模型。 + +当你**使用Paddle Inference但不使用TensorRT**时,运行以下命令导出模型 + +```bash +python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams +``` + +当你**使用Paddle Inference且使用TensorRT**时,需要指定`-o trt=True`来导出模型。 + +```bash +python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams trt=True +``` + +如果你想将YOLOX模型导出为**ONNX格式**,参考[PaddleDetection模型导出为ONNX格式教程](../../deploy/EXPORT_ONNX_MODEL.md),运行以下命令: + +```bash +# 导出推理模型 +python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams + +# 安装paddle2onnx +pip install paddle2onnx + +# 转换成onnx格式 +paddle2onnx --model_dir output_inference/yolox_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file yolox_s_300e_coco.onnx +``` + +**注意:** ONNX模型目前只支持batch_size=1。
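+
+转换完成后,可以用下面的示意代码快速验证导出的ONNX模型能否正常前向(示意代码,非官方部署脚本;输入名、输入个数与形状以实际导出的模型为准,此处用随机数据代替真实的预处理结果):
+
+```python
+# 用onnxruntime对导出的ONNX模型做一次前向的最小示意(假设性示例)
+import numpy as np
+import onnxruntime as ort
+
+sess = ort.InferenceSession('yolox_s_300e_coco.onnx',
+                            providers=['CPUExecutionProvider'])
+# 导出的检测模型可能有多个输入(如image、scale_factor),逐一构造随机输入
+feed = {}
+for inp in sess.get_inputs():
+    # 动态维度(None或符号)这里一律按1处理,对应batch_size=1
+    shape = [d if isinstance(d, int) and d > 0 else 1 for d in inp.shape]
+    feed[inp.name] = np.random.rand(*shape).astype('float32')
+outputs = sess.run(None, feed)
+print([o.shape for o in outputs])
+```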
+### 5.推理部署 +YOLOX可以使用以下方式进行部署: + - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp) + - [Paddle-TensorRT](../../deploy/TENSOR_RT.md) + - [PaddleServing](https://github.com/PaddlePaddle/Serving) + - [PaddleSlim模型量化](../slim) + +运行以下命令导出模型 + +```bash +python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams trt=True +``` + +**注意:** +- `trt=True`表示**使用Paddle Inference且使用TensorRT**进行测速,速度会更快;默认不加即为False,表示**使用Paddle Inference但不使用TensorRT**进行测速。 +- 如果是使用Paddle Inference在TensorRT FP16模式下部署,需要参考[Paddle Inference文档](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python),下载并安装与你的CUDA, CUDNN和TensorRT相应的wheel包。 + +#### 5.1.Python部署 +`deploy/python/infer.py`使用上述导出后的Paddle Inference模型用于推理和benchmark测速;如果设置了`--run_benchmark=True`,首先需要安装以下依赖`pip install pynvml psutil GPUtil`。 + +```bash +# Python部署推理单张图片 +python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu + +# 推理文件夹下的所有图片 +python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_dir=demo/ --device=gpu +``` + +#### 5.2.C++部署 +`deploy/cpp/build/main`使用上述导出后的Paddle Inference模型用于C++推理部署,首先按照[docs](../../deploy/cpp/docs)编译安装环境。 +```bash +# C++部署推理单张图片 +./deploy/cpp/build/main --model_dir=output_inference/yolox_s_300e_coco/ --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=GPU --threshold=0.5 --output_dir=cpp_infer_output/yolox_s_300e_coco +```
+ +## 速度测试 + +为了公平起见,在[模型库](#模型库)中的速度测试结果均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致),需要在导出模型时指定`-o exclude_nms=True`。测速需设置`--run_benchmark=True`,首先需要安装以下依赖`pip install pynvml psutil GPUtil`(测速的基本计时逻辑可参考本节末尾的示意代码)。 + +**使用Paddle Inference但不使用TensorRT**进行测速,执行以下命令: + +```bash +# 导出模型 +python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams exclude_nms=True + +# 速度测试,使用run_benchmark=True +python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True +``` + +**使用Paddle Inference且使用TensorRT**进行测速,执行以下命令: + +```bash +# 导出模型,使用trt=True +python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams exclude_nms=True trt=True + +# 速度测试,使用run_benchmark=True +python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True + +# tensorRT-FP32测速 +python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True --run_mode=trt_fp32 + +# tensorRT-FP16测速 +python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True --run_mode=trt_fp16 +``` +**注意:** +- 导出模型时指定`-o exclude_nms=True`仅作为测速时用,这样导出的模型其推理部署预测的结果不是最终检出框的结果。 +- [模型库](#模型库)中的速度测试结果为**tensorRT-FP16**测速后的最快速度,为**不包含数据预处理和模型输出后处理(NMS)**的耗时。
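+
+上述benchmark的计时口径(先warmup若干轮、再统计其后多轮的平均耗时)可以用下面的Paddle Inference Python示意代码理解(示意代码,并非`deploy/python/infer.py`本身;这里假设模型只有一个名为image的输入,若导出模型还有scale_factor等输入需一并填充,正式测速请使用上述命令):
+
+```python
+# benchmark计时逻辑的最小示意(假设性示例):warmup 100轮后统计100轮平均耗时
+import time
+import numpy as np
+from paddle.inference import Config, create_predictor
+
+config = Config('output_inference/yolox_s_300e_coco/model.pdmodel',
+                'output_inference/yolox_s_300e_coco/model.pdiparams')
+config.enable_use_gpu(200, 0)  # 初始显存200MB,使用0号GPU
+predictor = create_predictor(config)
+
+image = np.random.rand(1, 3, 640, 640).astype('float32')  # batch_size=1
+warmup, repeats = 100, 100
+start = None
+for i in range(warmup + repeats):
+    if i == warmup:
+        start = time.time()
+    input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
+    input_handle.copy_from_cpu(image)
+    predictor.run()
+    # 与文档口径一致,计时包含把输出拷回CPU的时间
+    outputs = [predictor.get_output_handle(name).copy_to_cpu()
+               for name in predictor.get_output_names()]
+print('avg latency: {:.2f} ms/image'.format((time.time() - start) / repeats * 1000))
+```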
+## FAQ + +<details> +<summary>如何计算模型参数量</summary> + +可以将以下代码插入[trainer.py](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/ppdet/engine/trainer.py#L154)来计算参数量。 + +```python +params = sum([ + p.numel() for n, p in self.model.named_parameters() + if all([x not in n for x in ['_mean', '_variance']]) +]) # exclude BatchNorm running status +print('Params: ', params) +``` + +</details>
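+
+与参数量类似,表格中的FLOPs(G)也可以用PaddlePaddle自带的`paddle.flops`接口粗略统计(示意代码;`self.model`沿用上面FAQ代码在trainer.py中的上下文,输入尺寸需按各模型实际入网尺寸设置,统计口径可能与表格略有出入):
+
+```python
+# 在构建好模型后统计FLOPs的示意(假设性示例)
+import paddle
+
+# input_size为[batch, channel, height, width],此处以640*640入网尺寸为例
+paddle.flops(self.model, input_size=[1, 3, 640, 640], print_detail=False)
+```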
    + + +## Citations +``` + @article{yolox2021, + title={YOLOX: Exceeding YOLO Series in 2021}, + author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian}, + journal={arXiv preprint arXiv:2107.08430}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/configs/yolox/_base_/optimizer_300e.yml b/PaddleDetection-release-2.6/configs/yolox/_base_/optimizer_300e.yml new file mode 100644 index 0000000000000000000000000000000000000000..1853ad61ff3e8f222388a005db9e60640700c996 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/_base_/optimizer_300e.yml @@ -0,0 +1,20 @@ +epoch: 300 + +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 300 + min_lr_ratio: 0.05 + last_plateau_epochs: 15 + - !ExpWarmup + epochs: 5 + +OptimizerBuilder: + optimizer: + type: Momentum + momentum: 0.9 + use_nesterov: True + regularizer: + factor: 0.0005 + type: L2 diff --git a/PaddleDetection-release-2.6/configs/yolox/_base_/yolox_cspdarknet.yml b/PaddleDetection-release-2.6/configs/yolox/_base_/yolox_cspdarknet.yml new file mode 100644 index 0000000000000000000000000000000000000000..24ef370c437e308c3a7e9da973fe3eea439faf17 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/_base_/yolox_cspdarknet.yml @@ -0,0 +1,42 @@ +architecture: YOLOX +norm_type: sync_bn +use_ema: True +ema_decay: 0.9999 +ema_decay_type: "exponential" +act: silu +find_unused_parameters: True + +depth_mult: 1.0 +width_mult: 1.0 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [15, 25] # multi-scale range [480*480 ~ 800*800] + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +YOLOXHead: + l1_epoch: 285 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 1000 + score_threshold: 0.001 + nms_threshold: 0.65 + # For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%. + # For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. 
diff --git a/PaddleDetection-release-2.6/configs/yolox/_base_/yolox_reader.yml b/PaddleDetection-release-2.6/configs/yolox/_base_/yolox_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..a33b847b159a515248c8556a24bb29e779f1def8 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/_base_/yolox_reader.yml @@ -0,0 +1,44 @@ +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 1.0 + input_dim: [640, 640] + degrees: [-10, 10] + scale: [0.1, 2.0] + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: True + mixup_prob: 1.0 + mixup_scale: [0.5, 1.5] + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: 640} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 8 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 285 + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: True, interp: 1} + - Pad: {size: [640, 640], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 4 + + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: True, interp: 1} + - Pad: {size: [640, 640], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_cdn_tiny_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_cdn_tiny_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..81c6c075d3620caa98dce2ebcd3b45bd694cef8d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_cdn_tiny_300e_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'yolox_tiny_300e_coco.yml' +] +depth_mult: 0.33 +width_mult: 0.375 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_cdn_tiny_300e_coco/model_final + +CSPDarkNet: + arch: "P5" # using the same backbone of YOLOv5 releases v6.0 and later version + return_idx: [2, 3, 4] + depthwise: False diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_crn_s_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_crn_s_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..ae463113e5909f76905b70409ae75794a66430d7 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_crn_s_300e_coco.yml @@ -0,0 +1,28 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.50 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_crn_s_300e_coco/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams + + +YOLOX: + backbone: CSPResNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [15, 25] # multi-scale range [480*480 ~ 800*800] + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_l_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_l_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..79cffd5e544b0d2cf4629c6a9f37e75eda4a5a6d --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_l_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + 
'./_base_/yolox_reader.yml' +] +depth_mult: 1.0 +width_mult: 1.0 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_l_300e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_m_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_m_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..4c25d7e2561cf120b60b712e621ad695debdb61c --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_m_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.67 +width_mult: 0.75 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_m_300e_coco/model_final diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_nano_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_nano_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..80b8b5c51fbc200ecce2ff10013b7e9a94300999 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_nano_300e_coco.yml @@ -0,0 +1,81 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.25 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_nano_300e_coco/model_final + + +### model config: +# Note: YOLOX-nano use depthwise conv in backbone, neck and head. +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [10, 20] # multi-scale range [320*320 ~ 640*640] + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: True + +YOLOCSPPAN: + depthwise: True + +YOLOXHead: + depthwise: True + + +### reader config: +# Note: YOLOX-tiny/nano uses 416*416 for evaluation and inference. +# And multi-scale training setting is in model config, TrainReader's operators use 640*640 as default. 
+worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 0.5 # 1.0 in YOLOX-tiny/s/m/l/x + input_dim: [640, 640] + degrees: [-10, 10] + scale: [0.5, 1.5] # [0.1, 2.0] in YOLOX-s/m/l/x + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: False # True in YOLOX-s/m/l/x + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: 640} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 8 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 285 + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 8 + + +TestReader: + inputs_def: + image_shape: [3, 416, 416] + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1
diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_s_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_s_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..9ba6120a93ec1d5c46cc8d8dc88351671ff44349 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_s_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.50 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_s_300e_coco/model_final
diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_tiny_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_tiny_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..c81c172d27982c460bbead78f966158c67de7bc2 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_tiny_300e_coco.yml @@ -0,0 +1,69 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.375 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_tiny_300e_coco/model_final + + +### model config: +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [10, 20] # multi-scale range [320*320 ~ 640*640] + + +### reader config: +# Note: YOLOX-tiny/nano uses 416*416 for evaluation and inference. +# And multi-scale training setting is in model config, TrainReader's operators use 640*640 as default.
+worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 1.0 + input_dim: [640, 640] + degrees: [-10, 10] + scale: [0.5, 1.5] # [0.1, 2.0] in YOLOX-s/m/l/x + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: False # True in YOLOX-s/m/l/x + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: 640} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 8 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 285 + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 8 + + +TestReader: + inputs_def: + image_shape: [3, 416, 416] + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/configs/yolox/yolox_x_300e_coco.yml b/PaddleDetection-release-2.6/configs/yolox/yolox_x_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..fd8e0d2eb0fbc2d8f052e549b71f9995aa325a05 --- /dev/null +++ b/PaddleDetection-release-2.6/configs/yolox/yolox_x_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 1.33 +width_mult: 1.25 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_x_300e_coco/model_final diff --git a/PaddleDetection-release-2.6/dataset/coco/download_coco.py b/PaddleDetection-release-2.6/dataset/coco/download_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..993218fff32b9d78eab43e4a37264b031338f496 --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/coco/download_coco.py @@ -0,0 +1,28 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import sys +import os.path as osp +import logging +# add python path of PaddleDetection to sys.path +parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.utils.download import download_dataset + +logging.basicConfig(level=logging.INFO) + +download_path = osp.split(osp.realpath(sys.argv[0]))[0] +download_dataset(download_path, 'coco') diff --git a/PaddleDetection-release-2.6/dataset/dota/.gitignore b/PaddleDetection-release-2.6/dataset/dota/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/PaddleDetection-release-2.6/dataset/mot/gen_labels_MOT.py b/PaddleDetection-release-2.6/dataset/mot/gen_labels_MOT.py new file mode 100644 index 0000000000000000000000000000000000000000..22995aed3aefa392c245cf45b1206b18b7d7119f --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/mot/gen_labels_MOT.py @@ -0,0 +1,65 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os.path as osp +import os +import numpy as np + +MOT_data = 'MOT16' + +# choose a data in ['MOT15', 'MOT16', 'MOT17', 'MOT20'] +# or your custom data (prepare it following the 'docs/tutorials/PrepareMOTDataSet.md') + + +def mkdirs(d): + if not osp.exists(d): + os.makedirs(d) + + +seq_root = './{}/images/train'.format(MOT_data) +label_root = './{}/labels_with_ids/train'.format(MOT_data) +mkdirs(label_root) +seqs = [s for s in os.listdir(seq_root)] + +tid_curr = 0 +tid_last = -1 +for seq in seqs: + seq_info = open(osp.join(seq_root, seq, 'seqinfo.ini')).read() + seq_width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find( + '\nimHeight')]) + seq_height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find( + '\nimExt')]) + + gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt') + gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',') + + seq_label_root = osp.join(label_root, seq, 'img1') + mkdirs(seq_label_root) + + for fid, tid, x, y, w, h, mark, label, _ in gt: + if mark == 0 or not label == 1: + continue + fid = int(fid) + tid = int(tid) + if not tid == tid_last: + tid_curr += 1 + tid_last = tid + x += w / 2 + y += h / 2 + label_fpath = osp.join(seq_label_root, '{:06d}.txt'.format(fid)) + label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format( + tid_curr, x / seq_width, y / seq_height, w / seq_width, + h / seq_height) + with open(label_fpath, 'a') as f: + f.write(label_str) diff --git a/PaddleDetection-release-2.6/dataset/roadsign_voc/download_roadsign_voc.py b/PaddleDetection-release-2.6/dataset/roadsign_voc/download_roadsign_voc.py new file mode 100644 index 0000000000000000000000000000000000000000..7d8ef2252f3d8b91f9c0c30e6be5ad186a00c18f --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/roadsign_voc/download_roadsign_voc.py @@ -0,0 +1,28 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import os.path as osp +import logging +# add python path of PaddleDetection to sys.path +parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.utils.download import download_dataset + +logging.basicConfig(level=logging.INFO) + +download_path = osp.split(osp.realpath(sys.argv[0]))[0] +download_dataset(download_path, 'roadsign_voc') diff --git a/PaddleDetection-release-2.6/dataset/roadsign_voc/label_list.txt b/PaddleDetection-release-2.6/dataset/roadsign_voc/label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..1be460f457a2fdbec91d3a69377c232ae4a6beb0 --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/roadsign_voc/label_list.txt @@ -0,0 +1,4 @@ +speedlimit +crosswalk +trafficlight +stop \ No newline at end of file diff --git a/PaddleDetection-release-2.6/dataset/spine_coco/download_spine_coco.py b/PaddleDetection-release-2.6/dataset/spine_coco/download_spine_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..2b23dd6387b66a9e42f26b59e5fe6fea7bf81d7d --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/spine_coco/download_spine_coco.py @@ -0,0 +1,28 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import os.path as osp +import logging +# add python path of PaddleDetection to sys.path +parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.utils.download import download_dataset + +logging.basicConfig(level=logging.INFO) + +download_path = osp.split(osp.realpath(sys.argv[0]))[0] +download_dataset(download_path, 'spine_coco') diff --git a/PaddleDetection-release-2.6/dataset/voc/create_list.py b/PaddleDetection-release-2.6/dataset/voc/create_list.py new file mode 100644 index 0000000000000000000000000000000000000000..7696073448d1dc65e1e0e20919048b69658d5ea1 --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/voc/create_list.py @@ -0,0 +1,28 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import os.path as osp +import logging +# add python path of PaddleDetection to sys.path +parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.utils.download import create_voc_list + +logging.basicConfig(level=logging.INFO) + +voc_path = osp.split(osp.realpath(sys.argv[0]))[0] +create_voc_list(voc_path) diff --git a/PaddleDetection-release-2.6/dataset/voc/download_voc.py b/PaddleDetection-release-2.6/dataset/voc/download_voc.py new file mode 100644 index 0000000000000000000000000000000000000000..2375fbf3c17c6424763ea5323f4a470f30eff3df --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/voc/download_voc.py @@ -0,0 +1,28 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import os.path as osp +import logging +# add python path of PaddleDetection to sys.path +parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.utils.download import download_dataset + +logging.basicConfig(level=logging.INFO) + +download_path = osp.split(osp.realpath(sys.argv[0]))[0] +download_dataset(download_path, 'voc') diff --git a/PaddleDetection-release-2.6/dataset/voc/label_list.txt b/PaddleDetection-release-2.6/dataset/voc/label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..9513a5faa0f2f75a3f9aa2470ff541a16dc888da --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/voc/label_list.txt @@ -0,0 +1,2 @@ +fall +nofall \ No newline at end of file diff --git a/PaddleDetection-release-2.6/dataset/wider_face/download_wider_face.sh b/PaddleDetection-release-2.6/dataset/wider_face/download_wider_face.sh new file mode 100644 index 0000000000000000000000000000000000000000..59a2054def3dfa7e27a2ac7ba84b779800a32933 --- /dev/null +++ b/PaddleDetection-release-2.6/dataset/wider_face/download_wider_face.sh @@ -0,0 +1,21 @@ +# All rights `PaddleDetection` reserved +# References: +# @inproceedings{yang2016wider, +# Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou}, +# Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, +# Title = {WIDER FACE: A Face Detection Benchmark}, +# Year = {2016}} + +DIR="$( cd "$(dirname "$0")" ; pwd -P )" +cd "$DIR" + +# Download the data. +echo "Downloading..." 
+wget https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip +wget https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip +wget https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip +# Extract the data. +echo "Extracting..." +unzip -q WIDER_train.zip +unzip -q WIDER_val.zip +unzip -q wider_face_split.zip diff --git a/PaddleDetection-release-2.6/demo/000000014439.jpg b/PaddleDetection-release-2.6/demo/000000014439.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0abbdab06eb5950b93908cc91adfa640e8a3ac78 Binary files /dev/null and b/PaddleDetection-release-2.6/demo/000000014439.jpg differ diff --git a/PaddleDetection-release-2.6/demo/000000014439_640x640.jpg b/PaddleDetection-release-2.6/demo/000000014439_640x640.jpg new file mode 100644 index 0000000000000000000000000000000000000000..58e9d3e228af43c9b55d8d0cb385ce82ebb8b996 Binary files /dev/null and b/PaddleDetection-release-2.6/demo/000000014439_640x640.jpg differ diff --git a/PaddleDetection-release-2.6/demo/000000087038.jpg b/PaddleDetection-release-2.6/demo/000000087038.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9f77f5d5f057b6f92dc096da704ecb8dee99bdf5 Binary files /dev/null and b/PaddleDetection-release-2.6/demo/000000087038.jpg differ diff --git a/PaddleDetection-release-2.6/demo/000000570688.jpg b/PaddleDetection-release-2.6/demo/000000570688.jpg new file mode 100644 index 0000000000000000000000000000000000000000..cb304bd56c4010c08611a30dcca58ea9140cea54 Binary files /dev/null and b/PaddleDetection-release-2.6/demo/000000570688.jpg differ diff --git a/PaddleDetection-release-2.6/demo/39006.jpg b/PaddleDetection-release-2.6/demo/39006.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ce980e366cac812263d5dbe4e660209345997688 Binary files /dev/null and b/PaddleDetection-release-2.6/demo/39006.jpg differ diff --git a/PaddleDetection-release-2.6/demo/P0072__1.0__0___0.png b/PaddleDetection-release-2.6/demo/P0072__1.0__0___0.png new file mode 100644 index 0000000000000000000000000000000000000000..d3e307e7eec4b26b824cb717b619ecf2c88fb7f0 --- /dev/null +++ b/PaddleDetection-release-2.6/demo/P0072__1.0__0___0.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ec9093d3b3ea72d578453b06497e35c4f7e62e0f0d8443d80f4c44af44d6c77 +size 1680618 diff --git a/PaddleDetection-release-2.6/demo/P0861__1.0__1154___824.png b/PaddleDetection-release-2.6/demo/P0861__1.0__1154___824.png new file mode 100644 index 0000000000000000000000000000000000000000..56dac9aa07f82f657bb0bbd51926b2ee67d8e37d --- /dev/null +++ b/PaddleDetection-release-2.6/demo/P0861__1.0__1154___824.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a58b94a1bc149eae55ed852c742d839b6f5902d81a053cb72868c91254adea32 +size 1256871 diff --git a/PaddleDetection-release-2.6/demo/car.jpg b/PaddleDetection-release-2.6/demo/car.jpg new file mode 100644 index 0000000000000000000000000000000000000000..486788dd3445cc1dbca7b7b835fc87f999701664 --- /dev/null +++ b/PaddleDetection-release-2.6/demo/car.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6d75247fcd88918054fdbcd09864e3d303de064d28ca18766b8568a31c0d898 +size 2283212 diff --git a/PaddleDetection-release-2.6/demo/hrnet_demo.jpg b/PaddleDetection-release-2.6/demo/hrnet_demo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4d0c4ac247670f5941f6fe2115d288e7a5604f0d Binary files /dev/null and b/PaddleDetection-release-2.6/demo/hrnet_demo.jpg differ diff --git 
a/PaddleDetection-release-2.6/demo/orange_71.jpg b/PaddleDetection-release-2.6/demo/orange_71.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..da7974a1a1371298f1ca5f4ef9c82bd3824d7ac3
Binary files /dev/null and b/PaddleDetection-release-2.6/demo/orange_71.jpg differ
diff --git a/PaddleDetection-release-2.6/demo/road554.png b/PaddleDetection-release-2.6/demo/road554.png
new file mode 100644
index 0000000000000000000000000000000000000000..7733e57f922b0fee893775da4f698c202804966f
Binary files /dev/null and b/PaddleDetection-release-2.6/demo/road554.png differ
diff --git a/PaddleDetection-release-2.6/demo/visdrone_0000315_01601_d_0000509.jpg b/PaddleDetection-release-2.6/demo/visdrone_0000315_01601_d_0000509.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..cc7a3602c1c015213ca1f7e27b0d006e827ee935
Binary files /dev/null and b/PaddleDetection-release-2.6/demo/visdrone_0000315_01601_d_0000509.jpg differ
diff --git a/PaddleDetection-release-2.6/deploy/BENCHMARK_INFER.md b/PaddleDetection-release-2.6/deploy/BENCHMARK_INFER.md
new file mode 100644
index 0000000000000000000000000000000000000000..988cf30f6c672195d4b3833fe9a186b497a11c2e
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/BENCHMARK_INFER.md
@@ -0,0 +1,60 @@
+# 推理Benchmark
+
+## 一、环境准备
+- 1、测试环境:
+  - CUDA 10.1
+  - CUDNN 7.6
+  - TensorRT-6.0.1
+  - PaddlePaddle v2.0.1
+  - GPU分别为:Tesla V100、GTX 1080Ti和Jetson AGX Xavier
+- 2、测试方式:
+  - 为了方便比较不同模型的推理速度,输入采用同样大小的图片,为 3x640x640,采用 `demo/000000014439_640x640.jpg` 图片。
+  - Batch Size=1
+  - 去掉前100轮warmup时间,测试100轮的平均时间,单位ms/image,包括网络计算时间、数据拷贝至CPU的时间。
+  - 采用Fluid C++预测引擎:包含Fluid C++预测、Fluid-TensorRT预测,下面同时测试了Float32 (FP32) 和Float16 (FP16)的推理速度。
+
+**注意:** TensorRT中固定尺寸和动态尺寸区别请参考文档[TensorRT教程](TENSOR_RT.md)。由于固定尺寸下对两阶段模型支持不完善,所以Faster RCNN模型采用动态尺寸测试。固定尺寸和动态尺寸支持融合的OP不完全一样,因此同一个模型在固定尺寸和动态尺寸下测试的性能可能会有一点差异。
+
+## 二、推理速度
+
+### 1、Linux系统
+#### (1)Tesla V100
+
+| 模型 | backbone | 是否固定尺寸 | 入网尺寸 | paddle_inference | trt_fp32 | trt_fp16 |
+|-------------------------------|--------------|--------|----------|------------------|----------|----------|
+| Faster RCNN FPN | ResNet50 | 否 | 640x640 | 27.99 | 26.15 | 21.92 |
+| Faster RCNN FPN | ResNet50 | 否 | 800x1312 | 32.49 | 25.54 | 21.70 |
+| YOLOv3 | Mobilenet\_v1 | 是 | 608x608 | 9.74 | 8.61 | 6.28 |
+| YOLOv3 | Darknet53 | 是 | 608x608 | 17.84 | 15.43 | 9.86 |
+| PPYOLO | ResNet50 | 是 | 608x608 | 20.77 | 18.40 | 13.53 |
+| SSD | Mobilenet\_v1 | 是 | 300x300 | 5.17 | 4.43 | 4.29 |
+| TTFNet | Darknet53 | 是 | 512x512 | 10.14 | 8.71 | 5.55 |
+| FCOS | ResNet50 | 是 | 640x640 | 35.47 | 35.02 | 34.24 |
+
+
+#### (2)Jetson AGX Xavier
+
+| 模型 | backbone | 是否固定尺寸 | 入网尺寸 | paddle_inference | trt_fp32 | trt_fp16 |
+|-------------------------------|--------------|--------|----------|------------------|----------|----------|
+| Faster RCNN FPN | ResNet50 | 否 | 640x640 | 169.45 | 158.92 | 119.25 |
+| Faster RCNN FPN | ResNet50 | 否 | 800x1312 | 228.07 | 156.39 | 117.03 |
+| YOLOv3 | Mobilenet\_v1 | 是 | 608x608 | 48.76 | 43.83 | 18.41 |
+| YOLOv3 | Darknet53 | 是 | 608x608 | 121.61 | 110.30 | 42.38 |
+| PPYOLO | ResNet50 | 是 | 608x608 | 111.80 | 99.40 | 48.05 |
+| SSD | Mobilenet\_v1 | 是 | 300x300 | 10.52 | 8.84 | 8.77 |
+| TTFNet | Darknet53 | 是 | 512x512 | 73.77 | 64.03 | 31.46 |
+| FCOS | ResNet50 | 是 | 640x640 | 217.11 | 214.38 | 205.78 |
+
+### 2、Windows系统
+#### (1)GTX 1080Ti
+
+| 模型 | backbone | 是否固定尺寸 | 入网尺寸 | paddle_inference | trt_fp32 | trt_fp16 |
+|-------------------------------|--------------|--------|----------|------------------|----------|----------|
+| Faster RCNN FPN | ResNet50 | 否 | 640x640 | 50.74 | 57.17 | 62.08 |
+| Faster RCNN FPN | ResNet50 | 否 | 800x1312 | 50.31 | 57.61 | 62.05 |
+| YOLOv3 | Mobilenet\_v1 | 是 | 608x608 | 14.51 | 11.23 | 11.13 |
+| YOLOv3 | Darknet53 | 是 | 608x608 | 30.26 | 23.92 | 24.02 |
+| PPYOLO | ResNet50 | 是 | 608x608 | 38.06 | 31.40 | 31.94 |
+| SSD | Mobilenet\_v1 | 是 | 300x300 | 16.47 | 13.87 | 13.76 |
+| TTFNet | Darknet53 | 是 | 512x512 | 21.83 | 17.14 | 17.09 |
+| FCOS | ResNet50 | 是 | 640x640 | 71.88 | 69.93 | 69.52 |
diff --git a/PaddleDetection-release-2.6/deploy/BENCHMARK_INFER_en.md b/PaddleDetection-release-2.6/deploy/BENCHMARK_INFER_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b0b92b6cc142bb6b07a703ccadc5a017f8080956
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/BENCHMARK_INFER_en.md
@@ -0,0 +1,61 @@
+# Inference Benchmark
+
+## 一、Prepare the Environment
+- 1、Test Environment:
+  - CUDA 10.1
+  - CUDNN 7.6
+  - TensorRT-6.0.1
+  - PaddlePaddle v2.0.1
+  - GPUs: Tesla V100, GTX 1080 Ti and Jetson AGX Xavier
+- 2、Test Method:
+  - To make the inference speed of different models comparable, all tests use the same input size of 3x640x640, with the image `demo/000000014439_640x640.jpg`.
+  - Batch_size=1
+  - The first 100 warmup rounds are discarded, and the average time over the next 100 rounds is reported in ms/image, including the network computation time and the time to copy data back to the CPU.
+  - The Fluid C++ prediction engine is used, covering both plain Fluid C++ prediction and Fluid-TensorRT prediction; both Float32 (FP32) and Float16 (FP16) inference speeds are tested below.
+
+**Attention:** For the difference between fixed and dynamic shapes in TensorRT, please refer to the [TensorRT tutorial](TENSOR_RT.md). Since fixed-shape support for two-stage models is incomplete, the Faster RCNN models are tested with dynamic shapes. Fixed and dynamic shapes do not support exactly the same set of fused OPs, so the same model may show slightly different performance under the two settings.
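+The timing protocol above can be summarized by the following minimal sketch; `predict` stands for whatever predictor call is being benchmarked and is an assumption for illustration, not part of the benchmark scripts:
+
+```python
+import time
+
+def benchmark(predict, image, warmup=100, repeats=100):
+    """Return average latency in ms/image, excluding warmup rounds."""
+    for _ in range(warmup):       # warmup rounds are discarded from timing
+        predict(image)
+    start = time.time()
+    for _ in range(repeats):      # timed rounds include copying outputs to CPU
+        predict(image)
+    return (time.time() - start) * 1000.0 / repeats
+```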
+
+
+## 二、Inference Speed
+
+### 1、Linux System
+#### (1)Tesla V100
+
+| Model | Backbone | Fixed size | Input size | paddle_inference | trt_fp32 | trt_fp16 |
+| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- |
+| Faster RCNN FPN | ResNet50 | no | 640x640 | 27.99 | 26.15 | 21.92 |
+| Faster RCNN FPN | ResNet50 | no | 800x1312 | 32.49 | 25.54 | 21.70 |
+| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 9.74 | 8.61 | 6.28 |
+| YOLOv3 | Darknet53 | yes | 608x608 | 17.84 | 15.43 | 9.86 |
+| PPYOLO | ResNet50 | yes | 608x608 | 20.77 | 18.40 | 13.53 |
+| SSD | Mobilenet\_v1 | yes | 300x300 | 5.17 | 4.43 | 4.29 |
+| TTFNet | Darknet53 | yes | 512x512 | 10.14 | 8.71 | 5.55 |
+| FCOS | ResNet50 | yes | 640x640 | 35.47 | 35.02 | 34.24 |
+
+
+#### (2)Jetson AGX Xavier
+
+| Model | Backbone | Fixed size | Input size | paddle_inference | trt_fp32 | trt_fp16 |
+| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- |
+| Faster RCNN FPN | ResNet50 | no | 640x640 | 169.45 | 158.92 | 119.25 |
+| Faster RCNN FPN | ResNet50 | no | 800x1312 | 228.07 | 156.39 | 117.03 |
+| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 48.76 | 43.83 | 18.41 |
+| YOLOv3 | Darknet53 | yes | 608x608 | 121.61 | 110.30 | 42.38 |
+| PPYOLO | ResNet50 | yes | 608x608 | 111.80 | 99.40 | 48.05 |
+| SSD | Mobilenet\_v1 | yes | 300x300 | 10.52 | 8.84 | 8.77 |
+| TTFNet | Darknet53 | yes | 512x512 | 73.77 | 64.03 | 31.46 |
+| FCOS | ResNet50 | yes | 640x640 | 217.11 | 214.38 | 205.78 |
+
+### 2、Windows System
+#### (1)GTX 1080Ti
+
+| Model | Backbone | Fixed size | Input size | paddle_inference | trt_fp32 | trt_fp16 |
+| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- |
+| Faster RCNN FPN | ResNet50 | no | 640x640 | 50.74 | 57.17 | 62.08 |
+| Faster RCNN FPN | ResNet50 | no | 800x1312 | 50.31 | 57.61 | 62.05 |
+| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 14.51 | 11.23 | 11.13 |
+| YOLOv3 | Darknet53 | yes | 608x608 | 30.26 | 23.92 | 24.02 |
+| PPYOLO | ResNet50 | yes | 608x608 | 38.06 | 31.40 | 31.94 |
+| SSD | Mobilenet\_v1 | yes | 300x300 | 16.47 | 13.87 | 13.76 |
+| TTFNet | Darknet53 | yes | 512x512 | 21.83 | 17.14 | 17.09 |
+| FCOS | ResNet50 | yes | 640x640 | 71.88 | 69.93 | 69.52 |
diff --git a/PaddleDetection-release-2.6/deploy/EXPORT_MODEL.md b/PaddleDetection-release-2.6/deploy/EXPORT_MODEL.md
new file mode 100644
index 0000000000000000000000000000000000000000..91f34b5860d6384baf773e71a39ffa4ec773dee6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/EXPORT_MODEL.md
@@ -0,0 +1,54 @@
+# PaddleDetection模型导出教程
+
+## 一、模型导出
+本章节介绍如何使用`tools/export_model.py`脚本导出模型。
+### 1、导出模型输入输出说明
+- 输入变量以及输入形状如下:
+
+  | 输入名称 | 输入形状 | 表示含义 |
+  | :---------: | ----------- | ---------- |
+  | image | [None, 3, H, W] | 输入网络的图像,None表示batch维度,如果输入图像大小为变长,则H,W为None |
+  | im_shape | [None, 2] | 图像经过resize后的大小,表示为H,W,None表示batch维度 |
+  | scale_factor | [None, 2] | 输入图像大小与真实图像大小的比值,表示为scale_y, scale_x |
+
+**注意:** 具体预处理方式可参考配置文件中TestReader部分。
+
+
+- PaddleDetection中动转静导出模型输出统一为:
+
+  - bbox,NMS的输出,形状为[N, 6],其中N为预测框的个数,6为[class_id, score, x1, y1, x2, y2]。
+  - bbox\_num,每张图片对应预测框的个数,例如batch_size为2,输出为[N1, N2],表示第一张图包含N1个预测框,第二张图包含N2个预测框,并且预测框的总个数和NMS输出的第一维N相同
+  - mask,如果网络中包含mask,则会输出mask分支
+
+**注意:** 模型动转静导出不支持模型结构中包含numpy相关操作的情况。
+
+
+### 2、启动参数说明
+
+| FLAG | 用途 | 默认值 | 备注 |
+|:--------------:|:--------------:|:------------:|:-----------------------------------------:|
+| -c | 指定配置文件 | None | |
+| --output_dir | 模型保存路径 | `./output_inference` | 模型默认保存在`output_inference/配置文件名/`路径下 |
+
+### 3、使用示例
+
+使用训练得到的模型进行预测,脚本如下:
+
+```bash
+# 导出YOLOv3模型
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
+       -o weights=weights/yolov3_darknet53_270e_coco.pdparams
+```
+
+预测模型会导出到`inference_model/yolov3_darknet53_270e_coco`目录下,分别为`infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`。
+
+
+### 4、设置导出模型的输入大小
+
+使用Fluid-TensorRT进行预测时,由于<=TensorRT 5.1的版本仅支持定长输入,保存模型的`data`层的图片大小需要和实际输入图片大小一致。而Fluid C++预测引擎没有此限制。设置TestReader中的`image_shape`可以修改保存模型中的输入图片大小。示例如下:
+
+```bash
+# 导出YOLOv3模型,输入是3x640x640
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
+       -o weights=weights/yolov3_darknet53_270e_coco.pdparams TestReader.inputs_def.image_shape=[3,640,640]
+```
diff --git a/PaddleDetection-release-2.6/deploy/EXPORT_MODEL_en.md b/PaddleDetection-release-2.6/deploy/EXPORT_MODEL_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d2828edeb8388b4633b7d8489923a059ef96321c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/EXPORT_MODEL_en.md
@@ -0,0 +1,53 @@
+# PaddleDetection Model Export Tutorial
+
+## 一、Model Export
+This section describes how to use the `tools/export_model.py` script to export models.
+### 1、Export model input and output description
+- Input variables and input shapes are as follows:
+
+  | Input Name | Input Shape | Meaning |
+  | :----------: | --------------- | --------------- |
+  | image | [None, 3, H, W] | The image fed to the network. None indicates the batch dimension; if the input image size is variable, H and W are None |
+  | im_shape | [None, 2] | The image size after resize, expressed as H, W; None indicates the batch dimension |
+  | scale_factor | [None, 2] | The ratio of the network input size to the original image size, denoted by scale_y, scale_x |
+
+**Attention:** For details about the preprocessing method, see the TestReader section in the configuration file.
+
+
+- The outputs of models exported via dynamic-to-static conversion in PaddleDetection are unified as follows:
+
+  - bbox, the output of NMS, in the shape of [N, 6], where N is the number of prediction boxes and 6 is [class_id, score, x1, y1, x2, y2].
+  - bbox\_num, the number of prediction boxes per image. For example, with batch_size 2 the output is [N1, N2], indicating that the first image contains N1 prediction boxes and the second contains N2; the total number of prediction boxes equals the first dimension N of the NMS output
+  - mask, if the network contains a mask head, the mask branch is also output
+
+**Attention:** Dynamic-to-static export does not support models whose structure contains numpy operations.
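+As an illustration of the I/O contract above, the following sketch (shapes and values are hypothetical; real preprocessing must follow `infer_cfg.yml`) builds the three input arrays and splits the `bbox`/`bbox_num` outputs back into per-image results:
+
+```python
+import numpy as np
+
+orig_h, orig_w = 480, 640   # original image size (assumed)
+net_h, net_w = 640, 640     # network input size after resize (assumed)
+
+image = np.zeros((1, 3, net_h, net_w), dtype=np.float32)     # normalized CHW image, batch=1
+im_shape = np.array([[net_h, net_w]], dtype=np.float32)      # size after resize: H, W
+scale_factor = np.array([[net_h / orig_h, net_w / orig_w]],  # scale_y, scale_x
+                        dtype=np.float32)
+
+def split_bboxes(bbox, bbox_num):
+    """bbox: [N, 6] rows of [class_id, score, x1, y1, x2, y2]; bbox_num: boxes per image."""
+    out, start = [], 0
+    for n in bbox_num:
+        out.append(bbox[start:start + n])
+        start += n
+    return out
+```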
+
+
+### 2、Start Parameters
+
+| FLAG | USE | DEFAULT | NOTE |
+| :----------: | :-----------------------------: | :------------------: | :-------------------------------------------------------------------: |
+| -c | Specify a configuration file | None | |
+| --output_dir | Model save path | `./output_inference` | By default the model is saved under `output_inference/<config name>/` |
+
+### 3、Example
+
+Use the trained model for inference with the following script:
+
+```bash
+# Export the YOLOv3 model
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
+       -o weights=weights/yolov3_darknet53_270e_coco.pdparams
+```
+The prediction model will be exported to the `inference_model/yolov3_darknet53_270e_coco` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel` respectively.
+
+
+### 4、Set the input size of the exported model
+When using Fluid-TensorRT for prediction, since TensorRT <= 5.1 only supports fixed-length input, the image size of the `data` layer of the saved model needs to be the same as the actual input image size. The Fluid C++ prediction engine does not have this limitation. Setting `image_shape` in TestReader changes the input image size in the saved model. The following is an example:
+
+```bash
+# Export the YOLOv3 model with input 3x640x640
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
+       -o weights=weights/yolov3_darknet53_270e_coco.pdparams TestReader.inputs_def.image_shape=[3,640,640]
+```
diff --git a/PaddleDetection-release-2.6/deploy/EXPORT_ONNX_MODEL.md b/PaddleDetection-release-2.6/deploy/EXPORT_ONNX_MODEL.md
new file mode 100644
index 0000000000000000000000000000000000000000..e1f4027833973a9c37fb9f144e77beeead3acb41
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/EXPORT_ONNX_MODEL.md
@@ -0,0 +1,112 @@
+# PaddleDetection模型导出为ONNX格式教程
+
+PaddleDetection模型支持保存为ONNX格式,目前测试支持的列表如下:
+
+| 模型 | OP版本 | 备注 |
+| :---- | :----- | :--- |
+| YOLOv3 | 11 | 仅支持batch=1推理;模型导出需固定shape |
+| PP-YOLO | 11 | 仅支持batch=1推理;MatrixNMS将被转换为NMS,精度略有变化;模型导出需固定shape |
+| PP-YOLOv2 | 11 | 仅支持batch=1推理;MatrixNMS将被转换为NMS,精度略有变化;模型导出需固定shape |
+| PP-YOLO Tiny | 11 | 仅支持batch=1推理;模型导出需固定shape |
+| PP-YOLOE | 11 | 仅支持batch=1推理;模型导出需固定shape |
+| PP-PicoDet | 11 | 仅支持batch=1推理;模型导出需固定shape |
+| FCOS | 11 | 仅支持batch=1推理 |
+| PAFNet | 11 | - |
+| TTFNet | 11 | - |
+| SSD | 11 | 仅支持batch=1推理 |
+| PP-TinyPose | 11 | - |
+| Faster RCNN | 16 | 仅支持batch=1推理,依赖Paddle2ONNX 0.9.7及以上版本 |
+| Mask RCNN | 16 | 仅支持batch=1推理,依赖Paddle2ONNX 0.9.7及以上版本 |
+| Cascade RCNN | 16 | 仅支持batch=1推理,依赖Paddle2ONNX 0.9.7及以上版本 |
+| Cascade Mask RCNN | 16 | 仅支持batch=1推理,依赖Paddle2ONNX 0.9.7及以上版本 |
+
+保存ONNX的功能由[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)提供,如在转换中有相关问题反馈,可在Paddle2ONNX的Github项目中通过[ISSUE](https://github.com/PaddlePaddle/Paddle2ONNX/issues)与工程师交流。
+
+## 导出教程
+
+### 步骤一、导出PaddlePaddle部署模型
+
+导出步骤参考文档[PaddleDetection部署模型导出教程](./EXPORT_MODEL.md),导出示例如下
+
+- 非RCNN系列模型,以YOLOv3为例
+```
+cd PaddleDetection
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \
+       -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams \
+       TestReader.inputs_def.image_shape=[3,608,608] \
+       --output_dir inference_model
+```
+导出后的模型保存在`inference_model/yolov3_darknet53_270e_coco/`目录中,结构如下
+```
+yolov3_darknet
+  ├── infer_cfg.yml          # 模型配置文件信息
+  ├── model.pdiparams        # 静态图模型参数
+  ├── model.pdiparams.info   # 参数额外信息,一般无需关注
+  └── model.pdmodel          # 静态图模型文件
+```
+> 注意:导出时的参数`TestReader.inputs_def.image_shape`对于YOLO系列模型必须指定,否则无法转换成功
+
+- RCNN系列模型,以Faster RCNN为例
+
+RCNN系列模型导出ONNX模型时,需要去除模型中的控制流,因此需要额外添加`export_onnx=True`字段
+```
+cd PaddleDetection
+python tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+       -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams \
+       export_onnx=True \
+       --output_dir inference_model
+```
+
+导出的模型保存在`inference_model/faster_rcnn_r50_fpn_1x_coco/`目录中,结构如下
+```
+faster_rcnn_r50_fpn_1x_coco
+  ├── infer_cfg.yml          # 模型配置文件信息
+  ├── model.pdiparams        # 静态图模型参数
+  ├── model.pdiparams.info   # 参数额外信息,一般无需关注
+  └── model.pdmodel          # 静态图模型文件
+```
+
+### 步骤二、将部署模型转为ONNX格式
+安装Paddle2ONNX(0.9.7及以上版本)
+```
+pip install paddle2onnx
+```
+使用如下命令转换
+```
+# YOLOv3
+paddle2onnx --model_dir inference_model/yolov3_darknet53_270e_coco \
+            --model_filename model.pdmodel \
+            --params_filename model.pdiparams \
+            --opset_version 11 \
+            --save_file yolov3.onnx
+
+# Faster RCNN
+paddle2onnx --model_dir inference_model/faster_rcnn_r50_fpn_1x_coco \
+            --model_filename model.pdmodel \
+            --params_filename model.pdiparams \
+            --opset_version 16 \
+            --save_file faster_rcnn.onnx
+```
+转换后的模型为当前路径下的`yolov3.onnx`和`faster_rcnn.onnx`
+
+### 步骤三、使用onnxruntime进行推理
+安装onnxruntime
+```
+pip install onnxruntime
+```
+推理代码示例在[deploy/third_engine/onnx](./third_engine/onnx)下
+
+使用如下命令进行推理:
+```
+# YOLOv3
+python deploy/third_engine/onnx/infer.py \
+       --infer_cfg inference_model/yolov3_darknet53_270e_coco/infer_cfg.yml \
+       --onnx_file yolov3.onnx \
+       --image_file demo/000000014439.jpg
+
+# Faster RCNN
+python deploy/third_engine/onnx/infer.py \
+       --infer_cfg inference_model/faster_rcnn_r50_fpn_1x_coco/infer_cfg.yml \
+       --onnx_file faster_rcnn.onnx \
+       --image_file demo/000000014439.jpg
+```
diff --git a/PaddleDetection-release-2.6/deploy/EXPORT_ONNX_MODEL_en.md b/PaddleDetection-release-2.6/deploy/EXPORT_ONNX_MODEL_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..750959062dc20cc68600bbd89e9264468c11e4d6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/EXPORT_ONNX_MODEL_en.md
@@ -0,0 +1,110 @@
+# PaddleDetection Model Export as ONNX Format Tutorial
+
+PaddleDetection models can be saved in ONNX format; the currently tested and supported models are listed below:
+
+| Model | OP Version | NOTE |
+| :---- | :----- | :--- |
+| YOLOv3 | 11 | Only batch=1 inference is supported. Model export needs a fixed shape |
+| PP-YOLO | 11 | Only batch=1 inference is supported. MatrixNMS will be converted to NMS with slightly different precision. Model export needs a fixed shape |
+| PP-YOLOv2 | 11 | Only batch=1 inference is supported. MatrixNMS will be converted to NMS with slightly different precision. Model export needs a fixed shape |
+| PP-YOLO Tiny | 11 | Only batch=1 inference is supported. Model export needs a fixed shape |
+| PP-YOLOE | 11 | Only batch=1 inference is supported. Model export needs a fixed shape |
+| PP-PicoDet | 11 | Only batch=1 inference is supported. Model export needs a fixed shape |
+| FCOS | 11 | Only batch=1 inference is supported |
+| PAFNet | 11 | - |
+| TTFNet | 11 | - |
+| SSD | 11 | Only batch=1 inference is supported |
+| PP-TinyPose | 11 | - |
+| Faster RCNN | 16 | Only batch=1 inference is supported; requires paddle2onnx>=0.9.7 |
+| Mask RCNN | 16 | Only batch=1 inference is supported; requires paddle2onnx>=0.9.7 |
+| Cascade RCNN | 16 | Only batch=1 inference is supported; requires paddle2onnx>=0.9.7 |
+| Cascade Mask RCNN | 16 | Only batch=1 inference is supported; requires paddle2onnx>=0.9.7 |
+
+
+The ability to save ONNX models is provided by [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX). If you run into problems during conversion, you can communicate with the engineers in Paddle2ONNX's Github project via [ISSUE](https://github.com/PaddlePaddle/Paddle2ONNX/issues).
+
+## Export Tutorial
+
+### Step 1. Export the Paddle deployment model
+For the export procedure, refer to the document [Tutorial on PaddleDetection deployment model export](./EXPORT_MODEL_en.md), for example:
+
+- Models other than the RCNN series, taking YOLOv3 as an example
+```
+cd PaddleDetection
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \
+       -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams \
+       TestReader.inputs_def.image_shape=[3,608,608] \
+       --output_dir inference_model
+```
+The exported model is saved in `inference_model/yolov3_darknet53_270e_coco/`, with the following structure
+```
+yolov3_darknet
+  ├── infer_cfg.yml          # Model configuration file information
+  ├── model.pdiparams        # Static graph model parameters
+  ├── model.pdiparams.info   # Extra parameter information, usually not needed
+  └── model.pdmodel          # Static graph model file
+```
+> Note the parameter `TestReader.inputs_def.image_shape`: for YOLO series models it must be specified at export time, otherwise the conversion fails
+
+- RCNN series models, taking Faster RCNN as an example
+
+The conditional blocks need to be removed from RCNN series models when exporting an ONNX model, so add `export_onnx=True` on the command line
+```
+cd PaddleDetection
+python tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+       -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams \
+       export_onnx=True \
+       --output_dir inference_model
+```
+The exported model is saved in `inference_model/faster_rcnn_r50_fpn_1x_coco/`, with the following structure
+```
+faster_rcnn_r50_fpn_1x_coco
+  ├── infer_cfg.yml          # Model configuration file information
+  ├── model.pdiparams        # Static graph model parameters
+  ├── model.pdiparams.info   # Extra parameter information, usually not needed
+  └── model.pdmodel          # Static graph model file
+```
+
+### Step 2. Convert the deployment model to ONNX format
+Install Paddle2ONNX (version 0.9.7 or higher)
+```
+pip install paddle2onnx
+```
+Use the following commands to convert
+```
+# YOLOv3
+paddle2onnx --model_dir inference_model/yolov3_darknet53_270e_coco \
+            --model_filename model.pdmodel \
+            --params_filename model.pdiparams \
+            --opset_version 11 \
+            --save_file yolov3.onnx
+
+# Faster RCNN
+paddle2onnx --model_dir inference_model/faster_rcnn_r50_fpn_1x_coco \
+            --model_filename model.pdmodel \
+            --params_filename model.pdiparams \
+            --opset_version 16 \
+            --save_file faster_rcnn.onnx
+```
+The converted models, `yolov3.onnx` and `faster_rcnn.onnx`, are saved in the current directory.
+
+### Step 3. Run inference with onnxruntime
+Install onnxruntime
+```
+pip install onnxruntime
+```
+Inference code examples are in [deploy/third_engine/onnx](./third_engine/onnx)
+
+Use the following commands for inference:
+```
+# YOLOv3
+python deploy/third_engine/onnx/infer.py \
+       --infer_cfg inference_model/yolov3_darknet53_270e_coco/infer_cfg.yml \
+       --onnx_file yolov3.onnx \
+       --image_file demo/000000014439.jpg

+# Faster RCNN
+python deploy/third_engine/onnx/infer.py \
+       --infer_cfg inference_model/faster_rcnn_r50_fpn_1x_coco/infer_cfg.yml \
+       --onnx_file faster_rcnn.onnx \
+       --image_file demo/000000014439.jpg
+```
diff --git a/PaddleDetection-release-2.6/deploy/README.md b/PaddleDetection-release-2.6/deploy/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ac1ba72f61c760d04376a510af55ed6bd4ac75b7
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/README.md
@@ -0,0 +1,84 @@
+# PaddleDetection 预测部署
+
+PaddleDetection提供了Paddle Inference、Paddle Serving、Paddle-Lite多种部署形式,支持服务端、移动端、嵌入式等多种平台,提供了完善的Python和C++部署方案。
+
+## PaddleDetection支持的部署形式说明
+|形式|语言|教程|设备/平台|
+|-|-|-|-|
+|Paddle Inference|Python|已完善|Linux(ARM\X86)、Windows|
+|Paddle Inference|C++|已完善|Linux(ARM\X86)、Windows|
+|Paddle Serving|Python|已完善|Linux(ARM\X86)、Windows|
+|Paddle-Lite|C++|已完善|Android、iOS、FPGA、RK...|
+
+
+## 1.Paddle Inference部署
+
+### 1.1 导出模型
+
+使用`tools/export_model.py`脚本导出模型以及部署时使用的配置文件,配置文件名字为`infer_cfg.yml`。模型导出脚本如下:
+```bash
+# 导出YOLOv3模型
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams
+```
+预测模型会导出到`output_inference/yolov3_mobilenet_v1_roadsign`目录下,分别为`infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`。
+模型导出具体请参考文档[PaddleDetection模型导出教程](EXPORT_MODEL.md)。
+
+### 1.2 使用PaddleInference进行预测
+* Python部署 支持`CPU`、`GPU`和`XPU`环境,支持Windows、Linux系统,支持NV Jetson嵌入式设备上部署。参考文档[python部署](python/README.md)
+* C++部署 支持`CPU`、`GPU`和`XPU`环境,支持Windows、Linux系统,支持NV Jetson嵌入式设备上部署。参考文档[C++部署](cpp/README.md)
+* PaddleDetection支持TensorRT加速,相关文档请参考[TensorRT预测部署教程](TENSOR_RT.md)
+
+**注意:** Paddle预测库版本需要>=2.1,batch_size>1仅支持YOLOv3和PP-YOLO。
+
+## 2.PaddleServing部署
+### 2.1 导出模型
+
+如果需要导出`PaddleServing`格式的模型,需要设置`export_serving_model=True`:
+```buildoutcfg
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams --export_serving_model=True
+```
+预测模型会导出到`output_inference/yolov3_mobilenet_v1_roadsign`目录下,分别为`infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`, `serving_client/`文件夹, `serving_server/`文件夹。
+
+模型导出具体请参考文档[PaddleDetection模型导出教程](EXPORT_MODEL.md)。
+
+### 2.2 使用PaddleServing进行预测
+* [安装PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation)
+* [使用PaddleServing](./serving/README.md)
+
+
+## 3.PaddleLite部署
+- [使用PaddleLite部署PaddleDetection模型](./lite/README.md)
+- 详细案例请参考[Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo)部署。更多内容,请参考[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)
+
+
+## 4.第三方部署(MNN、NCNN、Openvino)
+- 第三方部署提供PicoDet、TinyPose案例,其他模型请参考案例自行修改
+- TinyPose部署推荐工具:Intel CPU端推荐使用Openvino,GPU端推荐使用PaddleInference,ARM/Android端推荐使用PaddleLite或者MNN
+
+| Third_Engine | MNN | NCNN | OPENVINO |
+| ------------ | ---- | ----- | ---------- |
+| PicoDet | [PicoDet_MNN](./third_engine/demo_mnn/README.md) | [PicoDet_NCNN](./third_engine/demo_ncnn/README.md) | [PicoDet_OPENVINO](./third_engine/demo_openvino/README.md) |
+| TinyPose | [TinyPose_MNN](./third_engine/demo_mnn_kpts/README.md) | - | [TinyPose_OPENVINO](./third_engine/demo_openvino_kpts/README.md) |
+
+
+## 5.Benchmark测试
+- 使用导出的模型,运行Benchmark批量测试脚本:
+```shell
+sh deploy/benchmark/benchmark.sh {model_dir} {model_name}
+```
+**注意:** 如果是量化模型,请使用`deploy/benchmark/benchmark_quant.sh`脚本。
+- 将测试结果log导出至Excel中:
+```
+python deploy/benchmark/log_parser_excel.py --log_path=./output_pipeline --output_name=benchmark_excel.xlsx
+```
+
+## 6.常见问题QA
+- 1、`Paddle 1.8.4`训练的模型,可以用`Paddle 2.0`部署吗?
+  Paddle 2.0兼容Paddle 1.8.4,因此是可以的。但是部分模型(如SOLOv2)使用到了Paddle 2.0中新增OP,这类模型不可以。
+
+- 2、Windows编译时,预测库是VS2015编译的,选择VS2017或VS2019会有问题吗?
+  关于VS兼容性问题请参考:[C++ Visual Studio 2015、2017和2019之间的二进制兼容性](https://docs.microsoft.com/zh-cn/cpp/porting/binary-compat-2015-2017?view=msvc-160)
+
+- 3、cuDNN 8.0.4连续预测会发生内存泄漏吗?
+  经QA测试,发现cuDNN 8系列连续预测时都有内存泄漏问题,且cuDNN 8性能差于cuDNN 7,推荐使用CUDA + cuDNN 7.6.4的方式进行部署。
diff --git a/PaddleDetection-release-2.6/deploy/README_en.md b/PaddleDetection-release-2.6/deploy/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..f587b56b99e7a6b7c7ed31c5ae6307ade6e18126
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/README_en.md
@@ -0,0 +1,83 @@
+# PaddleDetection Prediction Deployment
+
+PaddleDetection provides multiple deployment forms via Paddle Inference, Paddle Serving and Paddle-Lite, supports multiple platforms such as server, mobile and embedded, and offers a complete Python and C++ deployment solution.
+
+## Deployment forms supported by PaddleDetection
+| Form | Language | Tutorial | Device/Platform |
+| ---------------- | -------- | ----------- | ------------------------- |
+| Paddle Inference | Python | Complete | Linux(ARM\X86)、Windows |
+| Paddle Inference | C++ | Complete | Linux(ARM\X86)、Windows |
+| Paddle Serving | Python | Complete | Linux(ARM\X86)、Windows |
+| Paddle-Lite | C++ | Complete | Android、iOS、FPGA、RK... |
+
+
+## 1.Paddle Inference Deployment
+
+### 1.1 Export the model
+
+Use the `tools/export_model.py` script to export the model together with the configuration file used during deployment. The configuration file name is `infer_cfg.yml`. The model export script is as follows
+
+```bash
+# Export the YOLOv3 model
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams
+```
+The prediction model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`. For details on model export, please refer to the documentation [Tutorial on PaddleDetection model export](./EXPORT_MODEL_en.md).
+
+### 1.2 Use Paddle Inference to Make Predictions
+* Python deployment supports `CPU`, `GPU` and `XPU` environments, Windows and Linux systems, and NV Jetson embedded devices; a minimal sketch follows below. Reference documentation [Python deployment](python/README.md)
+* C++ deployment supports `CPU`, `GPU` and `XPU` environments, Windows and Linux systems, and NV Jetson embedded devices. Reference documentation [C++ deployment](cpp/README.md)
+* PaddleDetection supports TensorRT acceleration. Please refer to the documentation for the [TensorRT Predictive Deployment Tutorial](TENSOR_RT.md)
+
+**Attention:** The Paddle prediction library version must be >=2.1, and batch_size>1 only supports YOLOv3 and PP-YOLO.
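+As referenced in 1.2, the following minimal sketch shows the Paddle Inference Python API end to end; the model path and placeholder inputs are assumptions for illustration, and for real deployment `deploy/python/infer.py` should be used instead:
+
+```python
+import numpy as np
+from paddle.inference import Config, create_predictor
+
+# Hypothetical model exported as in 1.1; replace with your own directory
+config = Config("output_inference/yolov3_mobilenet_v1_roadsign/model.pdmodel",
+                "output_inference/yolov3_mobilenet_v1_roadsign/model.pdiparams")
+config.enable_use_gpu(100, 0)   # 100 MB initial GPU memory on GPU 0; omit on CPU
+predictor = create_predictor(config)
+
+# Placeholder tensors; real preprocessing must follow infer_cfg.yml
+feeds = {
+    "image": np.zeros((1, 3, 608, 608), dtype=np.float32),
+    "im_shape": np.array([[608, 608]], dtype=np.float32),
+    "scale_factor": np.array([[1.0, 1.0]], dtype=np.float32),
+}
+for name in predictor.get_input_names():
+    if name in feeds:
+        predictor.get_input_handle(name).copy_from_cpu(feeds[name])
+
+predictor.run()
+bbox = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
+```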
+
+## 2.PaddleServing Deployment
+### 2.1 Export the model
+
+If you want to export the model in `PaddleServing` format, set `export_serving_model=True`:
+```buildoutcfg
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams --export_serving_model=True
+```
+The prediction model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`, plus the `serving_client/` and `serving_server/` folders.
+
+For details on model export, please refer to the documentation [Tutorial on PaddleDetection model export](./EXPORT_MODEL_en.md).
+
+### 2.2 Make predictions using Paddle Serving
+* [Install PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation)
+* [Use PaddleServing](./serving/README.md)
+
+
+## 3. PaddleLite Deployment
+- [Deploy the PaddleDetection model using PaddleLite](./lite/README.md)
+- For a detailed example, please refer to the [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo) deployment. For more information, please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)
+
+
+## 4.Third-Engine deploy(MNN、NCNN、Openvino)
+- The Third-Engine deployment takes PicoDet and TinyPose as examples; other models can be adapted in the same way
+- Suggestion for TinyPose: for Intel CPU Openvino is recommended, for Nvidia GPU PaddleInference is recommended, and for ARM/Android PaddleLite or MNN is recommended.
+
+| Third_Engine | MNN | NCNN | OPENVINO |
+| ------------ | ------------------------------------------------------ | -------------------------------------------------- | ------------------------------------------------------------ |
+| PicoDet | [PicoDet_MNN](./third_engine/demo_mnn/README.md) | [PicoDet_NCNN](./third_engine/demo_ncnn/README.md) | [PicoDet_OPENVINO](./third_engine/demo_openvino/README.md) |
+| TinyPose | [TinyPose_MNN](./third_engine/demo_mnn_kpts/README.md) | - | [TinyPose_OPENVINO](./third_engine/demo_openvino_kpts/README.md) |
+
+
+## 5. Benchmark Test
+- Using the exported model, run the Benchmark batch test script:
+```shell
+sh deploy/benchmark/benchmark.sh {model_dir} {model_name}
+```
+**Attention:** If it is a quantized model, please use the `deploy/benchmark/benchmark_quant.sh` script.
+- Export the test result log to Excel:
+```
+python deploy/benchmark/log_parser_excel.py --log_path=./output_pipeline --output_name=benchmark_excel.xlsx
+```
+
+## 6. FAQ
+- 1、Can models trained with `Paddle 1.8.4` be deployed with `Paddle 2.0`?
+  Paddle 2.0 is compatible with Paddle 1.8.4, so it is ok. However, some models (such as SOLOv2) use OPs newly added in Paddle 2.0 and cannot be deployed this way.
+
+- 2、When compiling for Windows, the prediction library is compiled with VS2015; will it be a problem to choose VS2017 or VS2019?
+  For compatibility issues with VS, please refer to: [C++ Visual Studio 2015, 2017 and 2019 binary compatibility](https://docs.microsoft.com/zh-cn/cpp/porting/binary-compat-2015-2017?view=msvc-160)
+
+- 3、Does cuDNN 8.0.4 leak memory during continuous prediction?
+  QA tests show that the cuDNN 8 series has memory leakage problems in continuous prediction, and cuDNN 8 performance is worse than cuDNN 7. CUDA + cuDNN 7.6.4 is recommended for deployment.
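+As a companion to the [TensorRT Predictive Deployment Tutorial](TENSOR_RT.md) that follows, here is a minimal sketch of switching TensorRT on through the Paddle Inference Python API; the path, precision mode and sizes are illustrative assumptions, not fixed recommendations:
+
+```python
+from paddle.inference import Config, PrecisionType, create_predictor
+
+# Hypothetical exported model; replace with your own output_inference directory
+config = Config("output_inference/yolov3_mobilenet_v1_roadsign/model.pdmodel",
+                "output_inference/yolov3_mobilenet_v1_roadsign/model.pdiparams")
+config.enable_use_gpu(100, 0)            # 100 MB initial GPU memory on GPU 0
+config.enable_tensorrt_engine(
+    workspace_size=1 << 30,              # TensorRT workspace size in bytes
+    max_batch_size=1,
+    min_subgraph_size=3,                 # skip TensorRT for smaller subgraphs
+    precision_mode=PrecisionType.Half,   # trt_fp16; PrecisionType.Float32 for trt_fp32
+    use_static=False,
+    use_calib_mode=False)
+predictor = create_predictor(config)     # subsequent run() calls go through TensorRT
+```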
diff --git a/PaddleDetection-release-2.6/deploy/TENSOR_RT.md b/PaddleDetection-release-2.6/deploy/TENSOR_RT.md
new file mode 100644
index 0000000000000000000000000000000000000000..b1dd29789540746cce5f7ea3ce0a783e2178438d
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/TENSOR_RT.md
@@ -0,0 +1,98 @@
+# TensorRT预测部署教程
+TensorRT是NVIDIA提出的用于统一模型部署的加速库,可以应用于V100、JETSON Xavier等硬件,它可以极大提高预测速度。Paddle TensorRT教程请参考文档[使用Paddle-TensorRT库预测](https://www.paddlepaddle.org.cn/inference/optimize/paddle_trt.html)
+
+## 1. 安装PaddleInference预测库
+- Python安装包,请从[这里](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#python) 下载带有TensorRT的安装包进行安装
+
+- CPP预测库,请从[这里](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html) 下载带有TensorRT编译的预测库
+
+- 如果Python和CPP官网没有提供已编译好的安装包或预测库,请参考[源码安装](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html) 自行编译
+
+**注意:**
+- 您的机器上TensorRT的版本需要跟您使用的预测库中TensorRT版本保持一致。
+- PaddleDetection中部署预测要求TensorRT版本 > 6.0。
+
+## 2. 导出模型
+模型导出具体请参考文档[PaddleDetection模型导出教程](./EXPORT_MODEL.md)。
+
+## 3. 开启TensorRT加速
+### 3.1 配置TensorRT
+在使用Paddle预测库构建预测器配置config时,打开TensorRT引擎即可:
+
+```
+config->EnableUseGpu(100, 0); // 初始化100M显存,使用GPU ID为0
+config->GpuDeviceId(); // 返回正在使用的GPU ID
+// 开启TensorRT预测,可提升GPU预测性能,需要使用带TensorRT的预测库
+config->EnableTensorRtEngine(1 << 20 /*workspace_size*/,
+                             batch_size /*max_batch_size*/,
+                             3 /*min_subgraph_size*/,
+                             AnalysisConfig::Precision::kFloat32 /*precision*/,
+                             false /*use_static*/,
+                             false /*use_calib_mode*/);
+
+```
+**注意:** `--run_benchmark`如果设置为True,则需要安装依赖`pip install pynvml psutil GPUtil`。
+
+### 3.2 TensorRT固定尺寸预测
+
+例如在模型Reader配置文件中设置:
+```yaml
+TestReader:
+  inputs_def:
+    image_shape: [3,608,608]
+  ...
+```
+或者在导出模型时设置`-o TestReader.inputs_def.image_shape=[3,608,608]`,模型将会进行固定尺寸预测,具体请参考[PaddleDetection模型导出教程](./EXPORT_MODEL.md)。
+
+可以通过[visualdl](https://www.paddlepaddle.org.cn/paddle/visualdl/demo/graph) 打开`model.pdmodel`文件,查看输入的第一个Tensor尺寸是否是固定的,如果不指定,尺寸会用`?`表示,如下图所示:
+![img](../docs/images/input_shape.png)
+
+
+注意:由于TensorRT不支持在batch维度进行slice操作,Faster RCNN和Mask RCNN不能使用固定尺寸输入预测,所以不能设置`TestReader.inputs_def.image_shape`字段。
+
+以`YOLOv3`为例,使用固定尺寸输入预测:
+```
+python python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco/ --image_file=./demo/000000014439.jpg --device=GPU --run_mode=trt_fp32 --run_benchmark=True
+```
+
+### 3.3 TensorRT动态尺寸预测
+
+TensorRT版本>=6时,使用TensorRT预测可以支持动态尺寸输入。如果模型Reader配置文件中没有设置例如`TestReader.inputs_def.image_shape=[3,608,608]`的字段,或者设置了`image_shape=[3,-1,-1]`,导出模型将以动态尺寸进行预测。一般RCNN系列模型使用动态尺寸预测。
+Paddle预测库关于动态尺寸输入请查看[Paddle CPP预测](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/native_infer.html) 的`SetTRTDynamicShapeInfo`函数说明。
+
+`python/infer.py`设置动态尺寸输入参数说明:
+
+- trt_min_shape 用于设定TensorRT的输入图像height、width中的最小尺寸,默认值:1
+
+- trt_max_shape 用于设定TensorRT的输入图像height、width中的最大尺寸,默认值:1280
+
+- trt_opt_shape 用于设定TensorRT的输入图像height、width中的最优尺寸,默认值:640
+
+**注意:`TensorRT`中动态尺寸设置是4维的,这里只设置输入图像的尺寸。**
+
+以`Faster RCNN`为例,使用动态尺寸输入预测:
+```
+python python/infer.py --model_dir=./output_inference/faster_rcnn_r50_fpn_1x_coco/ --image_file=./demo/000000014439.jpg --device=GPU --run_mode=trt_fp16 --run_benchmark=True --trt_max_shape=1280 --trt_min_shape=800 --trt_opt_shape=960
+```
+
+## 4、常见问题QA
+**Q:** 提示没有`tensorrt_op`
+**A:** 请检查是否使用带有TensorRT的Paddle Python包或预测库。
+
+**Q:** 提示`op out of memory`
+**A:** 检查GPU是否正被其他进程占用,请尝试使用空闲GPU。
+
+**Q:** 提示`some trt inputs dynamic shape info not set`
+**A:** 这是由于`TensorRT`会把网络划分成多个子图,我们只设置了输入数据的动态尺寸,划分出的其他子图的输入并未设置动态尺寸。有两个解决方法:
+
+- 方法一:通过增大`min_subgraph_size`,跳过对这些子图的优化。根据提示,设置min_subgraph_size大于并未设置动态尺寸输入的子图中OP个数即可。
+`min_subgraph_size`的意思是,在加载TensorRT引擎的时候,大于`min_subgraph_size`的OP才会被优化,并且这些OP是连续的且是TensorRT可以优化的。
+
+- 方法二:找到子图的这些输入,按照上面方式也设置子图的输入动态尺寸。
+
+**Q:** 如何打开日志
+**A:** 预测库默认是打开日志的,只要注释掉`config.disable_glog_info()`就可以打开日志
+
+**Q:** 开启TensorRT,预测时提示Slice on batch axis is not supported in TensorRT
    +**A:** 请尝试使用动态尺寸输入 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/README.md b/PaddleDetection-release-2.6/deploy/auto_compression/README.md new file mode 100644 index 0000000000000000000000000000000000000000..26e7b808f976867ef734ed2c6e01cdfa0d730883 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/README.md @@ -0,0 +1,155 @@ +# 自动化压缩 + +目录: +- [1.简介](#1简介) +- [2.Benchmark](#2Benchmark) +- [3.开始自动压缩](#自动压缩流程) + - [3.1 环境准备](#31-准备环境) + - [3.2 准备数据集](#32-准备数据集) + - [3.3 准备预测模型](#33-准备预测模型) + - [3.4 测试模型精度](#34-测试模型精度) + - [3.5 自动压缩并产出模型](#35-自动压缩并产出模型) +- [4.预测部署](#4预测部署) + +## 1. 简介 +本示例使用PaddleDetection中Inference部署模型进行自动化压缩,使用的自动化压缩策略为量化蒸馏。 + + +## 2.Benchmark + +### PP-YOLOE+ + +| 模型 | Base mAP | 离线量化mAP | ACT量化mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | 配置文件 | 量化模型 | +| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: | +| PP-YOLOE+_s | 43.7 | - | 42.9 | - | - | - | [config](./configs/ppyoloe_plus_s_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_s_qat_dis.tar) | +| PP-YOLOE+_m | 49.8 | - | 49.3 | - | - | - | [config](./configs/ppyoloe_plus_m_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_m_qat_dis.tar) | +| PP-YOLOE+_l | 52.9 | - | 52.6 | - | - | - | [config](./configs/ppyoloe_plus_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_l_qat_dis.tar) | +| PP-YOLOE+_x | 54.7 | - | 54.4 | - | - | - | [config](./configs/ppyoloe_plus_x_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_x_qat_dis.tar) | + +- mAP的指标均在COCO val2017数据集中评测得到,IoU=0.5:0.95。 + +### YOLOv8 + +| 模型 | Base mAP | 离线量化mAP | ACT量化mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | 配置文件 | 量化模型 | +| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: | +| YOLOv8-s | 44.9 | 43.9 | 44.3 | 9.27ms | 4.65ms | **3.78ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/yolov8_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms_quant.tar) | + +**注意:** +- 表格中YOLOv8模型均为带NMS的模型,可直接在TRT中部署,如果需要对齐测试标准,需要测试不带NMS的模型。 +- mAP的指标均在COCO val2017数据集中评测得到,IoU=0.5:0.95。 +- 表格中的性能在Tesla T4的GPU环境下测试,并且开启TensorRT,batch_size=1。 + +### PP-YOLOE + +| 模型 | Base mAP | 离线量化mAP | ACT量化mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | 配置文件 | 量化模型 | +| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: | +| PP-YOLOE-l | 50.9 | - | 50.6 | 11.2ms | 7.7ms | **6.7ms** | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco_quant.tar) | + +- mAP的指标均在COCO val2017数据集中评测得到,IoU=0.5:0.95。 +- PP-YOLOE-l模型在Tesla V100的GPU环境下测试,并且开启TensorRT,batch_size=1,包含NMS,测试脚本是[benchmark demo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy/python)。 + +### PP-PicoDet + +| 模型 | 策略 | mAP | FP32 | FP16 | INT8 | 配置文件 | 模型 | +| :-------- |:-------- |:--------: | :----------------: | :----------------: | :---------------: | 
:----------------------: | :---------------------: | +| PicoDet-S-NPU | Baseline | 30.1 | - | - | - | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet/picodet_s_416_coco_npu.yml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_416_coco_npu.tar) | +| PicoDet-S-NPU | 量化训练 | 29.7 | - | - | - | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/full_quantization/detection/configs/picodet_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_npu_quant.tar) | + +- mAP的指标均在COCO val2017数据集中评测得到,IoU=0.5:0.95。 + +## 3. 自动压缩流程 + +#### 3.1 准备环境 +- PaddlePaddle >= 2.4 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.1 +- PaddleDet >= 2.5 +- opencv-python + +安装paddlepaddle: +```shell +# CPU +pip install paddlepaddle +# GPU +pip install paddlepaddle-gpu +``` + +安装paddleslim: +```shell +pip install paddleslim +``` + +安装paddledet: +```shell +pip install paddledet +``` + +**注意:** YOLOv8模型的自动化压缩需要依赖安装最新[Develop Paddle](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html)和[Develop PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim#%E5%AE%89%E8%A3%85)版本。 + +#### 3.2 准备数据集 + +本案例默认以COCO数据进行自动压缩实验,如果自定义COCO数据,或者其他格式数据,请参考[数据准备文档](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/docs/tutorials/data/PrepareDataSet.md) 来准备数据。 + +如果数据集为非COCO格式数据,请修改[configs](./configs)中reader配置文件中的Dataset字段。 + +以PP-YOLOE模型为例,如果已经准备好数据集,请直接修改[./configs/yolo_reader.yml]中`EvalDataset`的`dataset_dir`字段为自己数据集路径即可。 + +#### 3.3 准备预测模型 + +预测模型的格式为:`model.pdmodel` 和 `model.pdiparams`两个,带`pdmodel`的是模型文件,带`pdiparams`后缀的是权重文件。 + + +根据[PaddleDetection文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/GETTING_STARTED_cn.md#8-%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA) 导出Inference模型,具体可参考下方PP-YOLOE模型的导出示例: +- 下载代码 +``` +git clone https://github.com/PaddlePaddle/PaddleDetection.git +``` +- 导出预测模型 + +PPYOLOE-l模型,包含NMS:如快速体验,可直接下载[PP-YOLOE-l导出模型](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco.tar) +```shell +python tools/export_model.py \ + -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams \ + trt=True \ +``` + +YOLOv8-s模型,包含NMS,具体可参考[YOLOv8模型文档](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8), 然后执行: +```shell +python tools/export_model.py \ + -c configs/yolov8/yolov8_s_500e_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/yolov8_s_500e_coco.pdparams \ + trt=True +``` + +如快速体验,可直接下载[YOLOv8-s导出模型](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms.tar) + +#### 3.4 自动压缩并产出模型 + +蒸馏量化自动压缩示例通过run.py脚本启动,会使用接口```paddleslim.auto_compression.AutoCompression```对模型进行自动压缩。配置config文件中模型路径、蒸馏、量化、和训练等部分的参数,配置完成后便可对模型进行量化和蒸馏。具体运行命令为: + +- 单卡训练: +``` +export CUDA_VISIBLE_DEVICES=0 +python run.py --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/' +``` + +- 多卡训练: +``` +CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \ + --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/' +``` + +#### 3.5 测试模型精度 + +使用eval.py脚本得到模型的mAP: +``` +export CUDA_VISIBLE_DEVICES=0 +python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml +``` + +**注意**: +- 要测试的模型路径可以在配置文件中`model_dir`字段下进行修改。 + +## 4.预测部署 + +- 
可以参考[PaddleDetection部署教程](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy),GPU上量化模型开启TensorRT并设置trt_int8模式进行部署。 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/picodet_reader.yml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/picodet_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..952a978ae32723e5a98bc63989e473d04e480c7c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/picodet_reader.yml @@ -0,0 +1,32 @@ +metric: COCO +num_classes: 80 + + +# Datset configuration +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco/ + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco/ + +worker_num: 6 +eval_height: &eval_height 416 +eval_width: &eval_width 416 +eval_size: &eval_size [*eval_height, *eval_width] + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/picodet_s_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/picodet_s_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a5012be15a1e6791b27a9053417709ed96830bb0 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/picodet_s_qat_dis.yaml @@ -0,0 +1,34 @@ +Global: + reader_config: ./configs/picodet_reader.yml + include_nms: True + Evaluation: True + model_dir: ./picodet_s_416_coco_npu/ + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: l2 + +QuantAware: + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + weight_bits: 8 + activation_bits: 8 + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 8000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00001 + T_max: 8000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + + diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..df346d2b00ec24b351f2d62974a13e33293f431b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml @@ -0,0 +1,32 @@ + +Global: + reader_config: configs/ppyoloe_reader.yml + include_nms: True + Evaluation: True + model_dir: ./ppyoloe_crn_l_300e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 5000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_l_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_l_qat_dis.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..fd03aed09d9a1ed3a67eec3283ef227224e941fb --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_l_qat_dis.yaml @@ -0,0 +1,32 @@ + +Global: + reader_config: configs/ppyoloe_plus_reader.yml + include_nms: True + Evaluation: True + model_dir: ./ppyoloe_plus_crn_l_80e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 5000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_m_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_m_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4d31332f5e3745604e03c50ad2f9db62376c1373 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_m_qat_dis.yaml @@ -0,0 +1,32 @@ + +Global: + reader_config: configs/ppyoloe_plus_reader.yml + include_nms: True + Evaluation: True + model_dir: ./ppyoloe_plus_crn_m_80e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 5000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_reader.yml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..5f3795f29be025e6836a7c88b51dd79ecb04a9f4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_reader.yml @@ -0,0 +1,26 @@ +metric: COCO +num_classes: 80 + +# Datset configuration +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco/ + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco/ + +worker_num: 0 + +# preprocess reader in test +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 4 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_s_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_s_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..41bfde1e47855cdd1c543d13292d387781b8c0d6 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_s_qat_dis.yaml @@ -0,0 +1,32 @@ + +Global: + reader_config: configs/ppyoloe_plus_reader.yml + include_nms: True + Evaluation: True + model_dir: ./ppyoloe_plus_crn_s_80e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + 
use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 5000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_x_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_x_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ac62e7ca2d22bae19ffcf99f8265a05ea7e1331c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_plus_x_qat_dis.yaml @@ -0,0 +1,32 @@ + +Global: + reader_config: configs/ppyoloe_plus_reader.yml + include_nms: True + Evaluation: True + model_dir: ./ppyoloe_plus_crn_x_80e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 5000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_reader.yml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..d1061453051e8f7408f4e605078956a8b634f13c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/ppyoloe_reader.yml @@ -0,0 +1,26 @@ +metric: COCO +num_classes: 80 + +# Datset configuration +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco/ + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco/ + +worker_num: 0 + +# preprocess reader in test +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 4 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov5_reader.yml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov5_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..6ad321a04d12f822e98facd179d9d72b0d8aa741 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov5_reader.yml @@ -0,0 +1,26 @@ +metric: COCO +num_classes: 80 + +# Datset configuration +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco/ + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco/ + +worker_num: 0 + +# preprocess reader in test +TestReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: True, interp: 1} + - Pad: {size: [640, 640], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov5_s_qat_dis.yml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov5_s_qat_dis.yml new file mode 100644 index 
0000000000000000000000000000000000000000..309977ef696ab23cc859fa224486e2ed7e91900e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov5_s_qat_dis.yml @@ -0,0 +1,29 @@ + +Global: + reader_config: configs/yolov5_reader.yml + include_nms: True + Evaluation: True + model_dir: ./yolov5_s_300e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 3000 + eval_iter: 1000 + learning_rate: 0.00001 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + target_metric: 0.365 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov6mt_s_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov6mt_s_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e134494fe2833333f3b2bcf87edb71e0b870a56f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov6mt_s_qat_dis.yaml @@ -0,0 +1,30 @@ + +Global: + reader_config: configs/yolov5_reader.yml + include_nms: True + Evaluation: True + model_dir: ./yolov6mt_s_400e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 8000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 8000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 0.00004 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov7_l_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov7_l_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..801ccb4057c4f36fe379c281a21965ddc63a2e8b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov7_l_qat_dis.yaml @@ -0,0 +1,30 @@ + +Global: + reader_config: configs/yolov5_reader.yml + include_nms: True + Evaluation: True + model_dir: ./yolov7_l_300e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 8000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 8000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 0.00004 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov8_reader.yml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov8_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..202a49415572201811ed53fe806c2b31c9051fde --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov8_reader.yml @@ -0,0 +1,27 @@ +metric: COCO +num_classes: 80 + +# Dataset configuration +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco/ + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco/ + +worker_num: 0 + +# preprocess reader in test +EvalReader: + sample_transforms: + - 
Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: True, interp: 1} + - Pad: {size: [640, 640], fill_value: [114., 114., 114.]} + - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none} + - Permute: {} + batch_size: 4 diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov8_s_qat_dis.yaml b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov8_s_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..8c93e203e918d798e055e260d73f747a6ef9d5cb --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/configs/yolov8_s_qat_dis.yaml @@ -0,0 +1,32 @@ + +Global: + reader_config: configs/yolov8_reader.yml + include_nms: True + Evaluation: True + model_dir: ./yolov8_s_500e_coco_trt_nms/ + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +QuantAware: + onnx_format: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + +TrainConfig: + train_iter: 8000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 10000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/eval.py b/PaddleDetection-release-2.6/deploy/auto_compression/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..6de8aff85ce5f3cffa4119a1a3c26e318101db74 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/eval.py @@ -0,0 +1,163 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
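+
+# Evaluation entry for auto-compression models. A typical invocation,
+# assuming a strategy config such as configs/ppyoloe_plus_x_qat_dis.yaml
+# above, looks like:
+#   python eval.py --config_path=configs/ppyoloe_plus_x_qat_dis.yaml --devices=gpu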
+ +import os +import sys +import numpy as np +import argparse +import paddle +from ppdet.core.workspace import load_config, merge_config +from ppdet.core.workspace import create +from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval +from paddleslim.auto_compression.config_helpers import load_config as load_slim_config +from post_process import PPYOLOEPostProcess + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + '--config_path', + type=str, + default=None, + help="path of compression strategy config.", + required=True) + parser.add_argument( + '--devices', + type=str, + default='gpu', + help="which device used to compress.") + + return parser + + +def reader_wrapper(reader, input_list): + def gen(): + for data in reader: + in_dict = {} + if isinstance(input_list, list): + for input_name in input_list: + in_dict[input_name] = data[input_name] + elif isinstance(input_list, dict): + for input_name in input_list.keys(): + in_dict[input_list[input_name]] = data[input_name] + yield in_dict + + return gen + + +def convert_numpy_data(data, metric): + data_all = {} + data_all = {k: np.array(v) for k, v in data.items()} + if isinstance(metric, VOCMetric): + for k, v in data_all.items(): + if not isinstance(v[0], np.ndarray): + tmp_list = [] + for t in v: + tmp_list.append(np.array(t)) + data_all[k] = np.array(tmp_list) + else: + data_all = {k: np.array(v) for k, v in data.items()} + return data_all + + +def eval(): + + place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace() + exe = paddle.static.Executor(place) + + val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model( + global_config["model_dir"].rstrip('/'), + exe, + model_filename=global_config["model_filename"], + params_filename=global_config["params_filename"]) + print('Loaded model from: {}'.format(global_config["model_dir"])) + + metric = global_config['metric'] + for batch_id, data in enumerate(val_loader): + data_all = convert_numpy_data(data, metric) + data_input = {} + for k, v in data.items(): + if isinstance(global_config['input_list'], list): + if k in global_config['input_list']: + data_input[k] = np.array(v) + elif isinstance(global_config['input_list'], dict): + if k in global_config['input_list'].keys(): + data_input[global_config['input_list'][k]] = np.array(v) + + outs = exe.run(val_program, + feed=data_input, + fetch_list=fetch_targets, + return_numpy=False) + res = {} + if 'arch' in global_config and global_config['arch'] == 'PPYOLOE': + postprocess = PPYOLOEPostProcess( + score_threshold=0.01, nms_threshold=0.6) + res = postprocess(np.array(outs[0]), data_all['scale_factor']) + else: + for out in outs: + v = np.array(out) + if len(v.shape) > 1: + res['bbox'] = v + else: + res['bbox_num'] = v + metric.update(data_all, res) + if batch_id % 100 == 0: + print('Eval iter:', batch_id) + metric.accumulate() + metric.log() + metric.reset() + + +def main(): + global global_config + all_config = load_slim_config(FLAGS.config_path) + assert "Global" in all_config, "Key 'Global' not found in config file." 
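+    # "Global" is expected to carry at least reader_config, model_dir,
+    # model_filename and params_filename, plus optional keys such as
+    # include_nms / Evaluation / arch (see the sample yaml configs above).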
+    global_config = all_config["Global"]
+    reader_cfg = load_config(global_config['reader_config'])
+
+    dataset = reader_cfg['EvalDataset']
+    global val_loader
+    val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
+                                      reader_cfg['worker_num'],
+                                      return_list=True)
+    metric = None
+    if reader_cfg['metric'] == 'COCO':
+        clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
+        anno_file = dataset.get_anno()
+        metric = COCOMetric(
+            anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
+    elif reader_cfg['metric'] == 'VOC':
+        metric = VOCMetric(
+            label_list=dataset.get_label_list(),
+            class_num=reader_cfg['num_classes'],
+            map_type=reader_cfg['map_type'])
+    elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval':
+        anno_file = dataset.get_anno()
+        metric = KeyPointTopDownCOCOEval(anno_file,
+                                         len(dataset), 17, 'output_eval')
+    else:
+        raise ValueError(
+            "metric currently only supports COCO, VOC and KeyPointTopDownCOCOEval.")
+    global_config['metric'] = metric
+
+    eval()
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
+    paddle.set_device(FLAGS.devices)
+
+    main()
diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/post_process.py b/PaddleDetection-release-2.6/deploy/auto_compression/post_process.py
new file mode 100644
index 0000000000000000000000000000000000000000..eea2f019548ec288a23e37b3bd2faf24f9a98935
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/auto_compression/post_process.py
@@ -0,0 +1,157 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import cv2
+
+
+def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200):
+    """
+    Args:
+        box_scores (N, 5): boxes in corner-form and probabilities.
+        iou_threshold: intersection over union threshold.
+        top_k: keep top_k results. If k <= 0, keep all the results.
+        candidate_size: only consider the candidates with the highest scores.
+    Returns:
+        picked: a list of indexes of the kept boxes
+    """
+    scores = box_scores[:, -1]
+    boxes = box_scores[:, :-1]
+    picked = []
+    indexes = np.argsort(scores)
+    indexes = indexes[-candidate_size:]
+    while len(indexes) > 0:
+        current = indexes[-1]
+        picked.append(current)
+        if 0 < top_k == len(picked) or len(indexes) == 1:
+            break
+        current_box = boxes[current, :]
+        indexes = indexes[:-1]
+        rest_boxes = boxes[indexes, :]
+        iou = iou_of(
+            rest_boxes,
+            np.expand_dims(
+                current_box, axis=0), )
+        indexes = indexes[iou <= iou_threshold]
+
+    return box_scores[picked, :]
+
+
+def iou_of(boxes0, boxes1, eps=1e-5):
+    """Return intersection-over-union (Jaccard index) of boxes.
+    Args:
+        boxes0 (N, 4): ground truth boxes.
+        boxes1 (N or 1, 4): predicted boxes.
+        eps: a small number to avoid 0 as denominator.
+    Returns:
+        iou (N): IoU values.
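+        Note: computed as overlap_area / (area0 + area1 - overlap_area + eps).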
+ """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. + right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. + """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PPYOLOEPostProcess(object): + """ + Args: + input_shape (int): network input image size + scale_factor (float): scale factor of ori image + """ + + def __init__(self, + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=10000, + keep_top_k=300): + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def _non_max_suppression(self, prediction, scale_factor): + batch_size = prediction.shape[0] + out_boxes_list = [] + box_num_list = [] + for batch_id in range(batch_size): + bboxes, confidences = prediction[batch_id][..., :4], prediction[ + batch_id][..., 4:] + # nms + picked_box_probs = [] + picked_labels = [] + for class_index in range(0, confidences.shape[1]): + probs = confidences[:, class_index] + mask = probs > self.score_threshold + probs = probs[mask] + if probs.shape[0] == 0: + continue + subset_boxes = bboxes[mask, :] + box_probs = np.concatenate( + [subset_boxes, probs.reshape(-1, 1)], axis=1) + box_probs = hard_nms( + box_probs, + iou_threshold=self.nms_threshold, + top_k=self.nms_top_k) + picked_box_probs.append(box_probs) + picked_labels.extend([class_index] * box_probs.shape[0]) + + if len(picked_box_probs) == 0: + out_boxes_list.append(np.empty((0, 4))) + + else: + picked_box_probs = np.concatenate(picked_box_probs) + # resize output boxes + picked_box_probs[:, 0] /= scale_factor[batch_id][1] + picked_box_probs[:, 2] /= scale_factor[batch_id][1] + picked_box_probs[:, 1] /= scale_factor[batch_id][0] + picked_box_probs[:, 3] /= scale_factor[batch_id][0] + + # clas score box + out_box = np.concatenate( + [ + np.expand_dims( + np.array(picked_labels), axis=-1), np.expand_dims( + picked_box_probs[:, 4], axis=-1), + picked_box_probs[:, :4] + ], + axis=1) + if out_box.shape[0] > self.keep_top_k: + out_box = out_box[out_box[:, 1].argsort()[::-1] + [:self.keep_top_k]] + out_boxes_list.append(out_box) + box_num_list.append(out_box.shape[0]) + + out_boxes_list = np.concatenate(out_boxes_list, axis=0) + box_num_list = np.array(box_num_list) + return out_boxes_list, box_num_list + + def __call__(self, outs, scale_factor): + out_boxes_list, box_num_list = self._non_max_suppression(outs, + scale_factor) + return {'bbox': out_boxes_list, 'bbox_num': box_num_list} diff --git a/PaddleDetection-release-2.6/deploy/auto_compression/run.py b/PaddleDetection-release-2.6/deploy/auto_compression/run.py new file mode 100644 index 0000000000000000000000000000000000000000..d940307db618c80f015b32637e7610784d1affb9 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/auto_compression/run.py @@ -0,0 +1,191 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import numpy as np
+import argparse
+import paddle
+from ppdet.core.workspace import load_config, merge_config
+from ppdet.core.workspace import create
+from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
+from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
+from paddleslim.auto_compression import AutoCompression
+from post_process import PPYOLOEPostProcess
+from paddleslim.common.dataloader import get_feed_vars
+
+
+def argsparser():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        '--config_path',
+        type=str,
+        default=None,
+        help="path of compression strategy config.",
+        required=True)
+    parser.add_argument(
+        '--save_dir',
+        type=str,
+        default='output',
+        help="directory to save compressed model.")
+    parser.add_argument(
+        '--devices',
+        type=str,
+        default='gpu',
+        help="which device used to compress.")
+
+    return parser
+
+
+def reader_wrapper(reader, input_list):
+    def gen():
+        for data in reader:
+            in_dict = {}
+            if isinstance(input_list, list):
+                for input_name in input_list:
+                    in_dict[input_name] = data[input_name]
+            elif isinstance(input_list, dict):
+                for input_name in input_list.keys():
+                    in_dict[input_list[input_name]] = data[input_name]
+            yield in_dict
+
+    return gen
+
+
+def convert_numpy_data(data, metric):
+    data_all = {}
+    data_all = {k: np.array(v) for k, v in data.items()}
+    if isinstance(metric, VOCMetric):
+        for k, v in data_all.items():
+            if not isinstance(v[0], np.ndarray):
+                tmp_list = []
+                for t in v:
+                    tmp_list.append(np.array(t))
+                data_all[k] = np.array(tmp_list)
+    else:
+        data_all = {k: np.array(v) for k, v in data.items()}
+    return data_all
+
+
+def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
+    metric = global_config['metric']
+    for batch_id, data in enumerate(val_loader):
+        data_all = convert_numpy_data(data, metric)
+        data_input = {}
+        for k, v in data.items():
+            if isinstance(global_config['input_list'], list):
+                if k in test_feed_names:
+                    data_input[k] = np.array(v)
+            elif isinstance(global_config['input_list'], dict):
+                if k in global_config['input_list'].keys():
+                    data_input[global_config['input_list'][k]] = np.array(v)
+        outs = exe.run(compiled_test_program,
+                       feed=data_input,
+                       fetch_list=test_fetch_list,
+                       return_numpy=False)
+        res = {}
+        if 'include_nms' in global_config and not global_config['include_nms']:
+            if 'arch' in global_config and global_config['arch'] == 'PPYOLOE':
+                postprocess = PPYOLOEPostProcess(
+                    score_threshold=0.01, nms_threshold=0.6)
+            else:
+                # a bare assert on a non-empty string never fails, so raise instead
+                raise ValueError("Not support arch={} now.".format(
+                    global_config.get('arch')))
+            res = postprocess(np.array(outs[0]), data_all['scale_factor'])
+        else:
+            for out in outs:
+                v = np.array(out)
+                if len(v.shape) > 1:
+                    res['bbox'] = v
+                else:
+                    res['bbox_num'] = v
+
+        metric.update(data_all, res)
+        if batch_id % 100 == 0:
+            print('Eval iter:', batch_id)
+    metric.accumulate()
+    metric.log()
+    map_res = metric.get_results()
+    metric.reset()
+    map_key = 'keypoint' if 'arch' in global_config and global_config[
+        'arch'] == 'keypoint' else 'bbox'
+    return map_res[map_key][0]
+
+
+def main():
+    global global_config
+    all_config = load_slim_config(FLAGS.config_path)
+    assert "Global" in all_config, "Key 'Global' not found in config file."
+    global_config = all_config["Global"]
+    reader_cfg = load_config(global_config['reader_config'])
+
+    train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
+                                        reader_cfg['worker_num'],
+                                        return_list=True)
+    if global_config.get('input_list') is None:
+        global_config['input_list'] = get_feed_vars(
+            global_config['model_dir'], global_config['model_filename'],
+            global_config['params_filename'])
+    train_loader = reader_wrapper(train_loader, global_config['input_list'])
+
+    if 'Evaluation' in global_config.keys() and global_config[
+            'Evaluation'] and paddle.distributed.get_rank() == 0:
+        eval_func = eval_function
+        dataset = reader_cfg['EvalDataset']
+        global val_loader
+        _eval_batch_sampler = paddle.io.BatchSampler(
+            dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
+        val_loader = create('EvalReader')(dataset,
+                                          reader_cfg['worker_num'],
+                                          batch_sampler=_eval_batch_sampler,
+                                          return_list=True)
+        metric = None
+        if reader_cfg['metric'] == 'COCO':
+            clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
+            anno_file = dataset.get_anno()
+            metric = COCOMetric(
+                anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
+        elif reader_cfg['metric'] == 'VOC':
+            metric = VOCMetric(
+                label_list=dataset.get_label_list(),
+                class_num=reader_cfg['num_classes'],
+                map_type=reader_cfg['map_type'])
+        elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval':
+            anno_file = dataset.get_anno()
+            metric = KeyPointTopDownCOCOEval(anno_file,
+                                             len(dataset), 17, 'output_eval')
+        else:
+            raise ValueError(
+                "metric currently only supports COCO, VOC and KeyPointTopDownCOCOEval.")
+        global_config['metric'] = metric
+    else:
+        eval_func = None
+
+    ac = AutoCompression(
+        model_dir=global_config["model_dir"],
+        model_filename=global_config["model_filename"],
+        params_filename=global_config["params_filename"],
+        save_dir=FLAGS.save_dir,
+        config=all_config,
+        train_dataloader=train_loader,
+        eval_callback=eval_func)
+    ac.compress()
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
+    paddle.set_device(FLAGS.devices)
+
+    main()
diff --git a/PaddleDetection-release-2.6/deploy/benchmark/benchmark.sh b/PaddleDetection-release-2.6/deploy/benchmark/benchmark.sh
new file mode 100644
index 0000000000000000000000000000000000000000..e29aaa884d30316237aede0c18b38e2cc520ee4b
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/benchmark/benchmark.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+# All rights `PaddleDetection` reserved
+model_dir=$1
+model_name=$2
+
+export img_dir="demo"
+export log_path="output_pipeline"
+
+
+echo "model_dir : ${model_dir}"
+echo "img_dir: ${img_dir}"
+
+# TODO: support batch size>1
+for use_mkldnn in "True" "False"; do
+    for threads in "1" "6"; do
+        echo "${model_name} ${model_dir}, use_mkldnn: ${use_mkldnn} threads: ${threads}"
+        python deploy/python/infer.py \
+        --model_dir=${model_dir} \
+        --run_benchmark=True \
+        --enable_mkldnn=${use_mkldnn} \
+        --device=CPU \
+        --cpu_threads=${threads} \
+        --image_dir=${img_dir} 2>&1 | tee ${log_path}/${model_name}_cpu_usemkldnn_${use_mkldnn}_cputhreads_${threads}_bs1_infer.log
+    done
+done
+
+for run_mode in "fluid" "trt_fp32" "trt_fp16"; do
+    echo "${model_name} ${model_dir}, run_mode: ${run_mode}"
+    python deploy/python/infer.py \
+    --model_dir=${model_dir} \
+    --run_benchmark=True \
+    --device=GPU \
+    --run_mode=${run_mode} \
+    --image_dir=${img_dir} 2>&1 | tee ${log_path}/${model_name}_gpu_runmode_${run_mode}_bs1_infer.log
+done
+
diff --git a/PaddleDetection-release-2.6/deploy/benchmark/benchmark_quant.sh b/PaddleDetection-release-2.6/deploy/benchmark/benchmark_quant.sh
new file mode 100644
index 0000000000000000000000000000000000000000..a21541dd044bf9bd4a33bb4eb2116b47743e5a8a
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/benchmark/benchmark_quant.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+# All rights `PaddleDetection` reserved
+model_dir=$1
+model_name=$2
+
+export img_dir="demo"
+export log_path="output_pipeline"
+
+
+echo "model_dir : ${model_dir}"
+echo "img_dir: ${img_dir}"
+
+# TODO: support batch size>1
+for run_mode in "trt_int8"; do
+    echo "${model_name} ${model_dir}, run_mode: ${run_mode}"
+    python deploy/python/infer.py \
+    --model_dir=${model_dir} \
+    --run_benchmark=True \
+    --device=GPU \
+    --run_mode=${run_mode} \
+    --image_dir=${img_dir} 2>&1 | tee ${log_path}/${model_name}_gpu_runmode_${run_mode}_bs1_infer.log
+done
+
diff --git a/PaddleDetection-release-2.6/deploy/benchmark/log_parser_excel.py b/PaddleDetection-release-2.6/deploy/benchmark/log_parser_excel.py
new file mode 100644
index 0000000000000000000000000000000000000000..317b3759572c6acef3438fbc654bc5918e8bdd38
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/benchmark/log_parser_excel.py
@@ -0,0 +1,300 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
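+
+# Collects the benchmark logs written by deploy/benchmark/benchmark.sh into
+# one Excel sheet. An illustrative invocation with the default paths:
+#   python log_parser_excel.py --log_path=./output_pipeline \
+#       --output_name=benchmark_excel.xlsx --analysis_trt --analysis_mkl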
+ +import os +import re +import argparse +import pandas as pd + + +def parse_args(): + """ + parse input args + """ + parser = argparse.ArgumentParser() + parser.add_argument( + "--log_path", + type=str, + default="./output_pipeline", + help="benchmark log path") + parser.add_argument( + "--output_name", + type=str, + default="benchmark_excel.xlsx", + help="output excel file name") + parser.add_argument( + "--analysis_trt", dest="analysis_trt", action='store_true') + parser.add_argument( + "--analysis_mkl", dest="analysis_mkl", action='store_true') + return parser.parse_args() + + +def find_all_logs(path_walk): + """ + find all .log files from target dir + """ + for root, ds, files in os.walk(path_walk): + for file_name in files: + if re.match(r'.*.log', file_name): + full_path = os.path.join(root, file_name) + yield file_name, full_path + + +def process_log(file_name): + """ + process log to dict + """ + output_dict = {} + with open(file_name, 'r') as f: + for i, data in enumerate(f.readlines()): + if i == 0: + continue + line_lists = data.split(" ") + + # conf info + if "runtime_device:" in line_lists: + pos_buf = line_lists.index("runtime_device:") + output_dict["runtime_device"] = line_lists[pos_buf + 1].strip() + if "ir_optim:" in line_lists: + pos_buf = line_lists.index("ir_optim:") + output_dict["ir_optim"] = line_lists[pos_buf + 1].strip() + if "enable_memory_optim:" in line_lists: + pos_buf = line_lists.index("enable_memory_optim:") + output_dict["enable_memory_optim"] = line_lists[pos_buf + + 1].strip() + if "enable_tensorrt:" in line_lists: + pos_buf = line_lists.index("enable_tensorrt:") + output_dict["enable_tensorrt"] = line_lists[pos_buf + 1].strip() + if "precision:" in line_lists: + pos_buf = line_lists.index("precision:") + output_dict["precision"] = line_lists[pos_buf + 1].strip() + if "enable_mkldnn:" in line_lists: + pos_buf = line_lists.index("enable_mkldnn:") + output_dict["enable_mkldnn"] = line_lists[pos_buf + 1].strip() + if "cpu_math_library_num_threads:" in line_lists: + pos_buf = line_lists.index("cpu_math_library_num_threads:") + output_dict["cpu_math_library_num_threads"] = line_lists[ + pos_buf + 1].strip() + + # model info + if "model_name:" in line_lists: + pos_buf = line_lists.index("model_name:") + output_dict["model_name"] = list( + filter(None, line_lists[pos_buf + 1].strip().split('/')))[ + -1] + + # data info + if "batch_size:" in line_lists: + pos_buf = line_lists.index("batch_size:") + output_dict["batch_size"] = line_lists[pos_buf + 1].strip() + if "input_shape:" in line_lists: + pos_buf = line_lists.index("input_shape:") + output_dict["input_shape"] = line_lists[pos_buf + 1].strip() + + # perf info + if "cpu_rss(MB):" in line_lists: + pos_buf = line_lists.index("cpu_rss(MB):") + output_dict["cpu_rss(MB)"] = line_lists[pos_buf + 1].strip( + ).split(',')[0] + if "gpu_rss(MB):" in line_lists: + pos_buf = line_lists.index("gpu_rss(MB):") + output_dict["gpu_rss(MB)"] = line_lists[pos_buf + 1].strip( + ).split(',')[0] + if "gpu_util:" in line_lists: + pos_buf = line_lists.index("gpu_util:") + output_dict["gpu_util"] = line_lists[pos_buf + 1].strip().split( + ',')[0] + if "preproce_time(ms):" in line_lists: + pos_buf = line_lists.index("preproce_time(ms):") + output_dict["preproce_time(ms)"] = line_lists[ + pos_buf + 1].strip().split(',')[0] + if "inference_time(ms):" in line_lists: + pos_buf = line_lists.index("inference_time(ms):") + output_dict["inference_time(ms)"] = line_lists[ + pos_buf + 1].strip().split(',')[0] + if "postprocess_time(ms):" in 
line_lists:
+                pos_buf = line_lists.index("postprocess_time(ms):")
+                output_dict["postprocess_time(ms)"] = line_lists[
+                    pos_buf + 1].strip().split(',')[0]
+    return output_dict
+
+
+def filter_df_merge(cpu_df, filter_column=None):
+    """
+    process cpu data frame, merge by 'model_name', 'batch_size'
+    Args:
+        cpu_df (pd.DataFrame): cpu benchmark results to be merged
+        filter_column (str): column whose values distinguish the runs to merge
+    """
+    if not filter_column:
+        raise Exception(
+            "please assign filter_column for filter_df_merge function")
+
+    df_lists = []
+    filter_column_lists = []
+    for k, v in cpu_df.groupby(filter_column, dropna=True):
+        filter_column_lists.append(k)
+        df_lists.append(v)
+    final_output_df = df_lists[-1]
+
+    # merge same model
+    for i in range(len(df_lists) - 1):
+        left_suffix = cpu_df[filter_column].unique()[0]
+        right_suffix = df_lists[i][filter_column].unique()[0]
+        print(left_suffix, right_suffix)
+        if not pd.isnull(right_suffix):
+            final_output_df = pd.merge(
+                final_output_df,
+                df_lists[i],
+                how='left',
+                left_on=['model_name', 'batch_size'],
+                right_on=['model_name', 'batch_size'],
+                suffixes=('', '_{0}_{1}'.format(filter_column, right_suffix)))
+
+    # rename default df columns
+    origin_column_names = list(cpu_df.columns.values)
+    origin_column_names.remove(filter_column)
+    suffix = final_output_df[filter_column].unique()[0]
+    for name in origin_column_names:
+        final_output_df.rename(
+            columns={name: "{0}_{1}_{2}".format(name, filter_column, suffix)},
+            inplace=True)
+    final_output_df.rename(
+        columns={
+            filter_column: "{0}_{1}_{2}".format(filter_column, filter_column,
+                                                suffix)
+        },
+        inplace=True)
+
+    final_output_df.sort_values(
+        by=[
+            "model_name_{0}_{1}".format(filter_column, suffix),
+            "batch_size_{0}_{1}".format(filter_column, suffix)
+        ],
+        inplace=True)
+    return final_output_df
+
+
+def trt_perf_analysis(raw_df):
+    """
+    separate the raw dataframe into a list of dataframes and
+    compare TensorRT precision performance
+    """
+    # filter df by gpu, compare tensorrt and gpu
+    # define default dataframe for gpu performance analysis
+    gpu_df = raw_df.loc[raw_df['runtime_device'] == 'gpu']
+    new_df = filter_df_merge(gpu_df, "precision")
+
+    # calculate qps diff percentile
+    infer_fp32 = "inference_time(ms)_precision_fp32"
+    infer_fp16 = "inference_time(ms)_precision_fp16"
+    infer_int8 = "inference_time(ms)_precision_int8"
+    new_df["fp32_fp16_diff"] = new_df[[infer_fp32, infer_fp16]].apply(
+        lambda x: (float(x[infer_fp16]) - float(x[infer_fp32])) / float(x[infer_fp32]),
+        axis=1)
+    # compare TensorRT fp32 against the plain GPU inference time
+    new_df["fp32_gpu_diff"] = new_df[["inference_time(ms)", infer_fp32]].apply(
+        lambda x: (float(x[infer_fp32]) - float(x["inference_time(ms)"])) / float(x["inference_time(ms)"]),
+        axis=1)
+    new_df["fp16_int8_diff"] = new_df[[infer_fp16, infer_int8]].apply(
+        lambda x: (float(x[infer_int8]) - float(x[infer_fp16])) / float(x[infer_fp16]),
+        axis=1)
+
+    return new_df
+
+
+def mkl_perf_analysis(raw_df):
+    """
+    separate the raw dataframe into a list of dataframes and
+    compare performance with and without mkldnn enabled
+    """
+    # filter df by cpu, compare mkl and cpu
+    # define default dataframe for cpu mkldnn analysis
+    cpu_df = raw_df.loc[raw_df['runtime_device'] == 'cpu']
+    mkl_compare_df = cpu_df.loc[cpu_df['cpu_math_library_num_threads'] == '1']
+    thread_compare_df = cpu_df.loc[cpu_df['enable_mkldnn'] == 'True']
+
+    # define dataframe need to be analyzed
+    output_mkl_df = filter_df_merge(mkl_compare_df, 'enable_mkldnn')
+    output_thread_df = filter_df_merge(thread_compare_df,
+                                       'cpu_math_library_num_threads')
+
+    # calculate performance diff percentile
+    # compare mkl performance with cpu
+    enable_mkldnn = "inference_time(ms)_enable_mkldnn_True"
"inference_time(ms)_enable_mkldnn_True" + disable_mkldnn = "inference_time(ms)_enable_mkldnn_False" + output_mkl_df["mkl_infer_diff"] = output_mkl_df[[ + enable_mkldnn, disable_mkldnn + ]].apply( + lambda x: (float(x[enable_mkldnn]) - float(x[disable_mkldnn])) / float(x[disable_mkldnn]), + axis=1) + cpu_enable_mkldnn = "cpu_rss(MB)_enable_mkldnn_True" + cpu_disable_mkldnn = "cpu_rss(MB)_enable_mkldnn_False" + output_mkl_df["mkl_cpu_rss_diff"] = output_mkl_df[[ + cpu_enable_mkldnn, cpu_disable_mkldnn + ]].apply( + lambda x: (float(x[cpu_enable_mkldnn]) - float(x[cpu_disable_mkldnn])) / float(x[cpu_disable_mkldnn]), + axis=1) + + # compare cpu_multi_thread performance with cpu + num_threads_1 = "inference_time(ms)_cpu_math_library_num_threads_1" + num_threads_6 = "inference_time(ms)_cpu_math_library_num_threads_6" + output_thread_df["mkl_infer_diff"] = output_thread_df[[ + num_threads_6, num_threads_1 + ]].apply( + lambda x: (float(x[num_threads_6]) - float(x[num_threads_1])) / float(x[num_threads_1]), + axis=1) + cpu_num_threads_1 = "cpu_rss(MB)_cpu_math_library_num_threads_1" + cpu_num_threads_6 = "cpu_rss(MB)_cpu_math_library_num_threads_6" + output_thread_df["mkl_cpu_rss_diff"] = output_thread_df[[ + cpu_num_threads_6, cpu_num_threads_1 + ]].apply( + lambda x: (float(x[cpu_num_threads_6]) - float(x[cpu_num_threads_1])) / float(x[cpu_num_threads_1]), + axis=1) + + return output_mkl_df, output_thread_df + + +def main(): + """ + main + """ + args = parse_args() + # create empty DataFrame + origin_df = pd.DataFrame(columns=[ + "model_name", "batch_size", "input_shape", "runtime_device", "ir_optim", + "enable_memory_optim", "enable_tensorrt", "precision", "enable_mkldnn", + "cpu_math_library_num_threads", "preproce_time(ms)", + "inference_time(ms)", "postprocess_time(ms)", "cpu_rss(MB)", + "gpu_rss(MB)", "gpu_util" + ]) + + for file_name, full_path in find_all_logs(args.log_path): + dict_log = process_log(full_path) + origin_df = origin_df.append(dict_log, ignore_index=True) + + raw_df = origin_df.sort_values(by='model_name') + raw_df.sort_values(by=["model_name", "batch_size"], inplace=True) + raw_df.to_excel(args.output_name) + + if args.analysis_trt: + trt_df = trt_perf_analysis(raw_df) + trt_df.to_excel("trt_analysis_{}".format(args.output_name)) + + if args.analysis_mkl: + mkl_df, thread_df = mkl_perf_analysis(raw_df) + mkl_df.to_excel("mkl_enable_analysis_{}".format(args.output_name)) + thread_df.to_excel("mkl_threads_analysis_{}".format(args.output_name)) + + +if __name__ == "__main__": + main() diff --git a/PaddleDetection-release-2.6/deploy/cpp/CMakeLists.txt b/PaddleDetection-release-2.6/deploy/cpp/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..34f8808d53e085c43048c4955a5715d663e4291e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/CMakeLists.txt @@ -0,0 +1,264 @@ +cmake_minimum_required(VERSION 3.0) +project(PaddleObjectDetector CXX C) + +option(WITH_MKL "Compile demo with MKL/OpenBlas support,defaultuseMKL." ON) +option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON) +option(WITH_TENSORRT "Compile demo with TensorRT." 
+
+option(WITH_KEYPOINT "Whether to Compile KeyPoint detector" OFF)
+option(WITH_MOT "Whether to Compile MOT detector" OFF)
+
+SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
+SET(PADDLE_LIB_NAME "" CACHE STRING "libpaddle_inference")
+SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
+SET(CUDA_LIB "" CACHE PATH "Location of libraries")
+SET(CUDNN_LIB "" CACHE PATH "Location of libraries")
+SET(TENSORRT_INC_DIR "" CACHE PATH "Compile demo with TensorRT")
+SET(TENSORRT_LIB_DIR "" CACHE PATH "Compile demo with TensorRT")
+
+include(cmake/yaml-cpp.cmake)
+
+include_directories("${CMAKE_SOURCE_DIR}/")
+include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/ext-yaml-cpp/include")
+link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
+
+if (WITH_KEYPOINT)
+  set(SRCS src/main_keypoint.cc src/preprocess_op.cc src/object_detector.cc src/picodet_postprocess.cc src/utils.cc src/keypoint_detector.cc src/keypoint_postprocess.cc)
+elseif (WITH_MOT)
+  set(SRCS src/main_jde.cc src/preprocess_op.cc src/object_detector.cc src/jde_detector.cc src/tracker.cc src/trajectory.cc src/lapjv.cpp src/picodet_postprocess.cc src/utils.cc)
+else ()
+  set(SRCS src/main.cc src/preprocess_op.cc src/object_detector.cc src/picodet_postprocess.cc src/utils.cc)
+endif()
+
+macro(safe_set_static_flag)
+    foreach(flag_var
+        CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
+        CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
+      if(${flag_var} MATCHES "/MD")
+        string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
+      endif(${flag_var} MATCHES "/MD")
+    endforeach(flag_var)
+endmacro()
+
+if (WITH_MKL)
+    ADD_DEFINITIONS(-DUSE_MKL)
+endif()
+
+if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
+    message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_inference_dir")
+endif()
+message("PADDLE_DIR IS:" ${PADDLE_DIR})
+
+if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
+    message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
+endif()
+
+include_directories("${CMAKE_SOURCE_DIR}/")
+include_directories("${PADDLE_DIR}/")
+include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
+include_directories("${PADDLE_DIR}/third_party/install/glog/include")
+include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
+include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
+if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/include")
+    include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
+endif()
+if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/include")
+    include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
+endif()
+include_directories("${PADDLE_DIR}/third_party/boost")
+include_directories("${PADDLE_DIR}/third_party/eigen3")
+
+if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
+    link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
+endif()
+if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
+    link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
+endif()
+
+link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
+link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
+link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
+link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
+link_directories("${PADDLE_DIR}/third_party/install/paddle2onnx/lib")
+link_directories("${PADDLE_DIR}/third_party/install/onnxruntime/lib")
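+# Finally, link against Paddle's own lib directory and the local build directory.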
+link_directories("${PADDLE_DIR}/paddle/lib/") +link_directories("${CMAKE_CURRENT_BINARY_DIR}") + + + +if (WIN32) + include_directories("${PADDLE_DIR}/paddle/fluid/inference") + include_directories("${PADDLE_DIR}/paddle/include") + link_directories("${PADDLE_DIR}/paddle/fluid/inference") + find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH) + +else () + find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH) + include_directories("${PADDLE_DIR}/paddle/include") + link_directories("${PADDLE_DIR}/paddle/lib") +endif () +include_directories(${OpenCV_INCLUDE_DIRS}) + +if (WIN32) + add_definitions("/DGOOGLE_GLOG_DLL_DECL=") + set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd") + set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT") + set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd") + set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT") +else() + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -o2 -fopenmp -std=c++11") + set(CMAKE_STATIC_LIBRARY_PREFIX "") +endif() + +# TODO let users define cuda lib path +if (WITH_GPU) + if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "") + message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64") + endif() + if (NOT WIN32) + if (NOT DEFINED CUDNN_LIB) + message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64") + endif() + endif(NOT WIN32) +endif() + + +if (NOT WIN32) + if (WITH_TENSORRT AND WITH_GPU) + include_directories("${TENSORRT_INC_DIR}/") + link_directories("${TENSORRT_LIB_DIR}/") + endif() +endif(NOT WIN32) + +if (NOT WIN32) + set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph") + if(EXISTS ${NGRAPH_PATH}) + include(GNUInstallDirs) + include_directories("${NGRAPH_PATH}/include") + link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}") + set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX}) + endif() +endif() + +if(WITH_MKL) + include_directories("${PADDLE_DIR}/third_party/install/mklml/include") + if (WIN32) + set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib + ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib) + else () + set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} + ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX}) + execute_process(COMMAND cp -r ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib) + endif () + set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn") + if(EXISTS ${MKLDNN_PATH}) + include_directories("${MKLDNN_PATH}/include") + if (WIN32) + set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib) + else () + set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0) + endif () + endif() +else() + set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX}) +endif() + + +if (WIN32) + if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}") + set(DEPS + ${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}) + else() + set(DEPS + ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}) + endif() +endif() + + +if (WIN32) + set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}) +else() + set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_SHARED_LIBRARY_SUFFIX}) +endif() + +message("PADDLE_LIB_NAME:" ${PADDLE_LIB_NAME}) 
+message("DEPS:" $DEPS) + +if (NOT WIN32) + set(DEPS ${DEPS} + ${MATH_LIB} ${MKLDNN_LIB} + glog gflags protobuf z xxhash yaml-cpp + ) + if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib") + set(DEPS ${DEPS} snappystream) + endif() + if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib") + set(DEPS ${DEPS} snappy) + endif() +else() + set(DEPS ${DEPS} + ${MATH_LIB} ${MKLDNN_LIB} + glog gflags_static libprotobuf xxhash libyaml-cppmt) + set(DEPS ${DEPS} libcmt shlwapi) + if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib") + set(DEPS ${DEPS} snappy) + endif() + if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib") + set(DEPS ${DEPS} snappystream) + endif() +endif(NOT WIN32) + +if(WITH_GPU) + if(NOT WIN32) + if (WITH_TENSORRT) + set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}) + endif() + set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX}) + else() + set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} ) + set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} ) + set(DEPS ${DEPS} ${CUDNN_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX}) + endif() +endif() + +if (NOT WIN32) + set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread") + set(DEPS ${DEPS} ${EXTERNAL_LIB}) +endif() + +set(DEPS ${DEPS} ${OpenCV_LIBS}) +add_executable(main ${SRCS}) +ADD_DEPENDENCIES(main ext-yaml-cpp) +message("DEPS:" $DEPS) +target_link_libraries(main ${DEPS}) + +if (WIN32 AND WITH_MKL) + add_custom_command(TARGET main POST_BUILD + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}.dll ./release/${PADDLE_LIB_NAME}.dll + ) +endif() + +if (WIN32 AND NOT WITH_MKL) + add_custom_command(TARGET main POST_BUILD + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/openblas/lib/openblas.dll ./openblas.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/openblas/lib/openblas.dll ./release/openblas.dll + ) +endif() + +if (WIN32) + add_custom_command(TARGET main POST_BUILD + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/onnxruntime/lib/onnxruntime.dll ./onnxruntime.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/paddle2onnx/lib/paddle2onnx.dll ./paddle2onnx.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/onnxruntime/lib/onnxruntime.dll ./release/onnxruntime.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/paddle2onnx/lib/paddle2onnx.dll ./release/paddle2onnx.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different 
+  )
+endif()
diff --git a/PaddleDetection-release-2.6/deploy/cpp/README.md b/PaddleDetection-release-2.6/deploy/cpp/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ffa5e251e7913b4af30fa6abe9912c9434af996f
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/README.md
@@ -0,0 +1,54 @@
+# C++ Inference Deployment
+
+
+
+## Build and Deployment Guides per Environment
+- [Linux build and deployment](docs/linux_build.md)
+- [Windows build and deployment (with Visual Studio 2019)](docs/windows_vs2019_build.md)
+- [NV Jetson build and deployment](docs/Jetson_build.md)
+
+
+## C++ Deployment Overview
+[1. Overview](#1-overview)
+
+[2. Main Directories and Files](#2-main-directories-and-files)
+
+
+### 1. Overview
+
+This directory provides users with a cross-platform `C++` deployment solution: after exporting a model trained with `PaddleDetection`, users can quickly run it based on this project, or integrate the code into their own applications.
+
+The main design goals are the following four points:
+- Cross-platform: build, secondary development and deployment all work on `Windows` and `Linux`
+- Extensible: users can implement their own custom data preprocessing and similar logic for new models
+- High performance: besides the advantages brought by `PaddlePaddle` itself, the key steps are optimized for the characteristics of image detection
+- Support for different detection model structures, including `Yolov3`/`Faster_RCNN`/`SSD` and more
+
+### 2. Main Directories and Files
+
+```bash
+deploy/cpp
+|
+├── src
+│   ├── main.cc # integration sample code, program entry point
+│   ├── object_detector.cc # implementation of the model loading and inference wrapper class
+│   └── preprocess_op.cc # implementation of the main preprocessing logic
+|
+├── include
+│   ├── config_parser.h # parser for the exported model's yaml config file
+│   ├── object_detector.h # model loading and inference wrapper class
+│   └── preprocess_op.h # wrapper classes for the main preprocessing logic
+|
+├── docs
+│   ├── linux_build.md # Linux build guide
+│   └── windows_vs2019_build.md # Windows VS2019 build guide
+│
+├── build.sh # build command script
+│
+├── CMakeList.txt # cmake build entry file
+|
+├── CMakeSettings.json # Visual Studio 2019 CMake project build settings
+│
+└── cmake # cmake files of the external dependencies (currently only yaml-cpp)
+
+```
diff --git a/PaddleDetection-release-2.6/deploy/cpp/cmake/yaml-cpp.cmake b/PaddleDetection-release-2.6/deploy/cpp/cmake/yaml-cpp.cmake
new file mode 100644
index 0000000000000000000000000000000000000000..7bc7f34d476d69d57336940bcf6c8c55311b8112
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/cmake/yaml-cpp.cmake
@@ -0,0 +1,30 @@
+
+find_package(Git REQUIRED)
+
+include(ExternalProject)
+
+message("${CMAKE_BUILD_TYPE}")
+
+ExternalProject_Add(
+        ext-yaml-cpp
+        URL https://bj.bcebos.com/paddlex/deploy/deps/yaml-cpp.zip
+        URL_MD5 9542d6de397d1fbd649ed468cb5850e6
+        CMAKE_ARGS
+        -DYAML_CPP_BUILD_TESTS=OFF
+        -DYAML_CPP_BUILD_TOOLS=OFF
+        -DYAML_CPP_INSTALL=OFF
+        -DYAML_CPP_BUILD_CONTRIB=OFF
+        -DMSVC_SHARED_RT=OFF
+        -DBUILD_SHARED_LIBS=OFF
+        -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
+        -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
+        -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
+        -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
+        -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
+        -DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
+        PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp"
+        # Disable install step
+        INSTALL_COMMAND ""
+        LOG_DOWNLOAD ON
+        LOG_BUILD 1
+)
diff --git a/PaddleDetection-release-2.6/deploy/cpp/docs/Jetson_build.md b/PaddleDetection-release-2.6/deploy/cpp/docs/Jetson_build.md
new file mode 100644
index 0000000000000000000000000000000000000000..ea9699a438ed3977e118b155a01b533d83bb12f4
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/docs/Jetson_build.md
@@ -0,0 +1,210 @@
+# Jetson Platform Build Guide
+
+## Introduction
+`NVIDIA Jetson` devices are embedded devices equipped with an `NVIDIA GPU`, and object detection algorithms can be deployed on them. This document is a tutorial for deploying `PaddleDetection` models on `Jetson` hardware.
+
+This document uses the `Jetson TX2` hardware with `JetPack 4.3` as the example.
+
+For the development guide of the `Jetson` platform, please refer to the [NVIDIA Jetson Linux Developer Guide](https://docs.nvidia.com/jetson/l4t/index.html).
+
+## Setting up the Jetson Environment
+For installing the `Jetson` system software, please refer to the [NVIDIA Jetson Linux Developer Guide](https://docs.nvidia.com/jetson/l4t/index.html).
+
+* (1) Check the l4t version of the hardware system
+```
+cat /etc/nv_tegra_release
+```
+* (2) Based on the hardware, choose an installable `JetPack` version; for the mapping between hardware and `JetPack` versions see [jetpack-archive](https://developer.nvidia.com/embedded/jetpack-archive).
+
+* (3) Download `JetPack` and flash the system image following the `Preparing a Jetson Developer Kit for Use` chapter of the [NVIDIA Jetson Linux Developer Guide](https://docs.nvidia.com/jetson/l4t/index.html).
+
+**Note**: please pick the `JetPack` version matching your hardware from [jetpack-archive](https://developer.nvidia.com/embedded/jetpack-archive) before flashing.
+
+## Download or Build the `Paddle` Inference Library
+This document uses the `Paddle` inference library pre-built for `JetPack4.3`; please choose the `Paddle` inference library matching your hardware from [Build and install the Linux inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html).
+
+Here we choose [nv_jetson_cuda10_cudnn7.6_trt6(jetpack4.3)](https://paddle-inference-lib.bj.bcebos.com/2.0.0-nv-jetson-jetpack4.3-all/paddle_inference.tgz), i.e. `Paddle` version `2.0.0-rc0`, `CUDA` version `10.0`, `CUDNN` version `7.6`, `TensorRT` version `6`.
+
+If you need to build a custom `Paddle` library on the `Jetson` platform yourself, please refer to the `NVIDIA Jetson embedded hardware inference library source build` section of [Build and install the Linux inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
+
+### Step1: Download the code
+
+ `git clone https://github.com/PaddlePaddle/PaddleDetection.git`
+
+**Note**: the `C++` inference code lives in the `/root/projects/PaddleDetection/deploy/cpp` directory, which does not depend on any other directory of `PaddleDetection`.
+
+
+### Step2: Download the PaddlePaddle C++ inference library paddle_inference
+
+Extract the downloaded [nv_jetson_cuda10_cudnn7.6_trt6(jetpack4.3)](https://paddle-inference-lib.bj.bcebos.com/2.0.1-nv-jetson-jetpack4.3-all/paddle_inference.tgz).
+
+After downloading and extracting, the `/root/projects/paddle_inference` directory contains:
+```
+paddle_inference
+├── paddle # paddle core library and headers
+|
+├── third_party # third-party dependencies and headers
+|
+└── version.txt # version and build information
+```
+
+**Note:** the pre-built library `nv-jetson-cuda10-cudnn7.6-trt6` is built with `GCC 7.5.0`, while all the others are built with `GCC 4.8.5`. Using a higher GCC version may cause `ABI` compatibility issues; it is recommended to downgrade or [build the inference library yourself](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
+
+
+### Step3: Build
+
+The `cmake` build command lives in `scripts/build.sh`; please adapt the main parameters to your situation. Its main content is explained below.
+
+Note that on the `TX2` platform, `CUDA` and `CUDNN` must be installed via `JetPack`.
+
+```
+# whether to use GPU (i.e. whether to use CUDA)
+WITH_GPU=ON
+
+# whether to use MKL or openblas; must be set to OFF on TX2
+WITH_MKL=OFF
+
+# whether to integrate TensorRT (only effective when WITH_GPU=ON)
+WITH_TENSORRT=ON
+
+# include path of TensorRT
+TENSORRT_INC_DIR=/usr/include/aarch64-linux-gnu
+
+# lib path of TensorRT
+TENSORRT_LIB_DIR=/usr/lib/aarch64-linux-gnu
+
+# path of the Paddle inference library
+PADDLE_DIR=/path/to/paddle_inference/
+
+# name of the Paddle inference library
+PADDLE_LIB_NAME=paddle_inference
+
+# whether to link the Paddle inference library statically
+# when TensorRT is used, the Paddle inference library is usually a shared library
+WITH_STATIC_LIB=OFF
+
+# lib path of CUDA
+CUDA_LIB=/usr/local/cuda-10.0/lib64
+
+# lib path of CUDNN
+CUDNN_LIB=/usr/lib/aarch64-linux-gnu
+
+# whether to enable keypoint model inference
+WITH_KEYPOINT=ON
+
+# path of OPENCV_DIR
+# for generic linux, download https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2 and extract it into the deps folder
+# for TX2, download https://paddlemodels.bj.bcebos.com/TX2_JetPack4.3_opencv_3.4.10_gcc7.5.0.zip and extract it into the deps folder
+OPENCV_DIR=/path/to/opencv
+
+# please double-check that all paths above are correct
+
+# no changes needed below
+cmake .. \
+    -DWITH_GPU=${WITH_GPU} \
+    -DWITH_MKL=OFF \
+    -DWITH_TENSORRT=${WITH_TENSORRT} \
+    -DTENSORRT_INC_DIR=${TENSORRT_INC_DIR} \
+    -DTENSORRT_LIB_DIR=${TENSORRT_LIB_DIR} \
+    -DPADDLE_DIR=${PADDLE_DIR} \
+    -DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
+    -DCUDA_LIB=${CUDA_LIB} \
+    -DCUDNN_LIB=${CUDNN_LIB} \
+    -DOPENCV_DIR=${OPENCV_DIR} \
+    -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME} \
+    -DWITH_KEYPOINT=${WITH_KEYPOINT}
+make
+```
+
+For example, a working configuration could be:
+```
+# whether to use GPU (i.e. whether to use CUDA)
+WITH_GPU=ON
+
+# whether to use MKL or openblas
+WITH_MKL=OFF
+
+# whether to integrate TensorRT (only effective when WITH_GPU=ON)
+WITH_TENSORRT=OFF
+
+# include path of TensorRT
+TENSORRT_INC_DIR=/usr/include/aarch64-linux-gnu
+
+# lib path of TensorRT
+TENSORRT_LIB_DIR=/usr/lib/aarch64-linux-gnu
+
+# path of the Paddle inference library
+PADDLE_DIR=/home/nvidia/PaddleDetection_infer/paddle_inference/
+
+# name of the Paddle inference library
+PADDLE_LIB_NAME=paddle_inference
+
+# whether to link the Paddle inference library statically
+# when TensorRT is used, the Paddle inference library is usually a shared library
+WITH_STATIC_LIB=OFF
+
+# lib path of CUDA
+CUDA_LIB=/usr/local/cuda-10.0/lib64
+
+# lib path of CUDNN
+CUDNN_LIB=/usr/lib/aarch64-linux-gnu/
+
+# whether to enable keypoint model inference
+WITH_KEYPOINT=ON
+```
+
+After adjusting the main parameters in the script, run the `build` script:
+ ```shell
+ sh ./scripts/build.sh
+ ```
+
+### Step4: Inference and Visualization
+After a successful build, the inference entry program is `build/main`; its main command line flags are:
+| flag | description |
+| ---- | ---- |
+| --model_dir | path of the exported detection model |
+| --model_dir_keypoint | (optional) path of the exported keypoint model |
+| --image_file | path of the image to predict |
+| --image_dir | path of the folder of images to predict |
+| --video_file | path of the video to predict |
+| --camera_id | (optional) ID of the camera to use, default -1 (do not use a camera) |
+| --device | runtime device, one of `CPU/GPU/XPU`, default `CPU` |
+| --gpu_id | GPU device id used for inference (default 0) |
+| --run_mode | with GPU, default paddle, one of (paddle/trt_fp32/trt_fp16/trt_int8) |
+| --batch_size | batch size for the detection model, effective when `image_dir` is set |
+| --batch_size_keypoint | batch size for the keypoint model, default 8 |
+| --run_benchmark | whether to repeat inference for benchmarking |
+| --output_dir | folder for the output images, default output |
+| --use_mkldnn | whether to enable MKLDNN acceleration for CPU inference |
+| --cpu_threads | number of cpu threads, default 1 |
+| --use_dark | whether to use DarkPose post-processing for keypoint outputs, default true |
+
+**Notes**:
+- Priority order: `camera_id` > `video_file` > `image_dir` > `image_file`.
+- If --run_benchmark is set to True, install the dependencies first: `pip install pynvml psutil GPUtil`.
+
+
+`Example 1`:
+```shell
+# predict the image `/root/projects/images/test.jpeg` without `GPU`
+./main --model_dir=/root/projects/models/yolov3_darknet --image_file=/root/projects/images/test.jpeg
+```
+
+The `visualized prediction result` of the image is saved as `output.jpg` in the current directory.
+
+
+`Example 2`:
+```shell
+# predict the video `/root/projects/videos/test.mp4` with `GPU`
+./main --model_dir=/root/projects/models/yolov3_darknet --video_file=/root/projects/videos/test.mp4 --device=GPU
+```
+Only `.mp4` videos are currently supported; the `visualized prediction result` is saved as `output.mp4` in the current directory.
+
+`Example 3`:
+```shell
+# joint prediction with the keypoint model and the detection model, using `GPU`
+# persons detected by the detection model are fed to the keypoint model
+./main --model_dir=/root/projects/models/yolov3_darknet --model_dir_keypoint=/root/projects/models/hrnet_w32_256x192 --image_file=/root/projects/images/test.jpeg --device=GPU
+```
+
+## Performance Test
+For benchmarks see [BENCHMARK_INFER](../../BENCHMARK_INFER.md)
diff --git a/PaddleDetection-release-2.6/deploy/cpp/docs/linux_build.md b/PaddleDetection-release-2.6/deploy/cpp/docs/linux_build.md
new file mode 100644
index 0000000000000000000000000000000000000000..ee28e73ee56db3ec46a1674a6af0cb3af1012b3e
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/docs/linux_build.md
@@ -0,0 +1,149 @@
+# Linux Platform Build Guide
+
+## Introduction
+This document was tested on the `Linux` platform with `GCC 8.2`. To build with another G++ version you need to rebuild the Paddle inference library; see [Build the Paddle inference library from source](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html). The prebuilt opencv library used by this document was built with gcc8.2 on ubuntu 16.04; to build in an environment other than gcc8.2, build the opencv library yourself.
+
+## Prerequisites
+* G++ 8.2
+* CUDA 9.0 / CUDA 10.1, cudnn 7+ (only required when using the GPU version of the inference library)
+* CMake 3.0+
+
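+A quick way to sanity-check the toolchain (illustrative commands, assuming the tools are on `PATH`):
+```shell
+g++ --version    # expect 8.2
+cmake --version  # expect 3.0 or newer
+nvcc --version   # only needed for the GPU inference library
+```
+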
+Please make sure the basic software above is installed. **All examples below use `/root/projects/` as the working directory**.
+
+### Step1: Download the code
+
+ `git clone https://github.com/PaddlePaddle/PaddleDetection.git`
+
+**Note**: the `C++` inference code lives in the `/root/projects/PaddleDetection/deploy/cpp` directory, which does not depend on any other directory of `PaddleDetection`.
+
+
+### Step2: Download the PaddlePaddle C++ inference library paddle_inference
+
+The PaddlePaddle C++ inference library provides different pre-built packages for different `CPU` and `CUDA` versions; download the one matching your setup: [C++ inference library download list](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html)
+
+
+After downloading and extracting, the `/root/projects/paddle_inference` directory contains:
+```
+paddle_inference
+├── paddle # paddle core library and headers
+|
+├── third_party # third-party dependencies and headers
+|
+└── version.txt # version and build information
+```
+
+**Note:** all pre-built packages except `nv-jetson-cuda10-cudnn7.5-trt5` are built with `GCC 4.8.5`. Using a higher `GCC` version may cause `ABI` compatibility issues; it is recommended to downgrade or [build the inference library yourself](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
+
+
+### Step3: Build
+
+The `cmake` build command lives in `scripts/build.sh`; please adapt the main parameters to your situation. Its main content is explained below:
+
+```
+# whether to use GPU (i.e. whether to use CUDA)
+WITH_GPU=OFF
+
+# use MKL or openblas
+WITH_MKL=ON
+
+# whether to integrate TensorRT (only effective when WITH_GPU=ON)
+WITH_TENSORRT=OFF
+
+# include path of TensorRT
+TENSORRT_INC_DIR=/path/to/TensorRT/include
+
+# lib path of TensorRT
+TENSORRT_LIB_DIR=/path/to/TensorRT/lib
+
+# path of the Paddle inference library
+PADDLE_DIR=/path/to/paddle_inference
+
+# name of the Paddle inference library
+PADDLE_LIB_NAME=paddle_inference
+
+# lib path of CUDA
+CUDA_LIB=/path/to/cuda/lib
+
+# lib path of CUDNN
+CUDNN_LIB=/path/to/cudnn/lib
+
+# whether to enable keypoint model inference
+WITH_KEYPOINT=ON
+
+# please double-check that all paths above are correct
+
+# no changes needed below
+cmake .. \
+    -DWITH_GPU=${WITH_GPU} \
+    -DWITH_MKL=${WITH_MKL} \
+    -DWITH_TENSORRT=${WITH_TENSORRT} \
+    -DTENSORRT_LIB_DIR=${TENSORRT_LIB_DIR} \
+    -DTENSORRT_INC_DIR=${TENSORRT_INC_DIR} \
+    -DPADDLE_DIR=${PADDLE_DIR} \
+    -DCUDA_LIB=${CUDA_LIB} \
+    -DCUDNN_LIB=${CUDNN_LIB} \
+    -DOPENCV_DIR=${OPENCV_DIR} \
+    -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME} \
+    -DWITH_KEYPOINT=${WITH_KEYPOINT}
+make
+
+```
+
+After adjusting the main parameters in the script, run the `build` script:
+ ```shell
+ sh ./scripts/build.sh
+ ```
+
+**Note**: OPENCV depends on OPENBLAS. Ubuntu users should check whether `libopenblas.so` exists on the system; if not, install it with apt-get install libopenblas-dev.
+
+### Step4: Inference and Visualization
+After a successful build, the inference entry program is `build/main`; its main command line flags are:
+| flag | description |
+| ---- | ---- |
+| --model_dir | path of the exported detection model |
+| --model_dir_keypoint | (optional) path of the exported keypoint model |
+| --image_file | path of the image to predict |
+| --image_dir | path of the folder of images to predict |
+| --video_file | path of the video to predict |
+| --camera_id | (optional) ID of the camera to use, default -1 (do not use a camera) |
+| --device | runtime device, one of `CPU/GPU/XPU`, default `CPU` |
+| --gpu_id | GPU device id used for inference (default 0) |
+| --run_mode | with GPU, default paddle, one of (paddle/trt_fp32/trt_fp16/trt_int8) |
+| --batch_size | batch size for the detection model, effective when `image_dir` is set |
+| --batch_size_keypoint | batch size for the keypoint model, default 8 |
+| --run_benchmark | whether to repeat inference for benchmarking |
+| --output_dir | folder for the output images, default output |
+| --use_mkldnn | whether to enable MKLDNN acceleration for CPU inference |
+| --cpu_threads | number of cpu threads, default 1 |
+| --use_dark | whether to use DarkPose post-processing for keypoint outputs, default true |
+
+**Notes**:
+- Priority order: `camera_id` > `video_file` > `image_dir` > `image_file`.
+- If --run_benchmark is set to True, install the dependencies first: `pip install pynvml psutil GPUtil`.
+
+`Example 1`:
+```shell
+# predict the image `/root/projects/images/test.jpeg` without `GPU`
+./build/main --model_dir=/root/projects/models/yolov3_darknet --image_file=/root/projects/images/test.jpeg
+```
+
+The `visualized prediction result` of the image is saved as `output.jpg` in the current directory.
+
+
+`Example 2`:
+```shell
+# predict the video `/root/projects/videos/test.mp4` with `GPU`
+./build/main --model_dir=/root/projects/models/yolov3_darknet --video_file=/root/projects/videos/test.mp4 --device=GPU
+```
+Only `.mp4` videos are currently supported; the `visualized prediction result` is saved as `output.mp4` in the current directory.
+
+
+`Example 3`:
+```shell
+# joint prediction with the keypoint model and the detection model, using `GPU`
+# persons detected by the detection model are fed to the keypoint model
+./build/main --model_dir=/root/projects/models/yolov3_darknet --model_dir_keypoint=/root/projects/models/hrnet_w32_256x192 --image_file=/root/projects/images/test.jpeg --device=GPU
+```
+
+## Performance Test
+For benchmarks see [BENCHMARK_INFER](../../BENCHMARK_INFER.md)
diff --git a/PaddleDetection-release-2.6/deploy/cpp/docs/windows_vs2019_build.md b/PaddleDetection-release-2.6/deploy/cpp/docs/windows_vs2019_build.md
new file mode 100644
index 0000000000000000000000000000000000000000..1a23cabc7bf640ed548942012354013f500d6be2
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/docs/windows_vs2019_build.md
@@ -0,0 +1,158 @@
+# Visual Studio 2019 Community CMake Build Guide
+
+On the Windows platform we tested with `Visual Studio 2019 Community`. Microsoft has supported managing `CMake` cross-platform build projects directly since `Visual Studio 2017`, but stable and complete support only arrived with `2019`; so if you want to manage your project build with CMake, we recommend building in a `Visual Studio 2019` environment.
+
+
+## Prerequisites
+* Visual Studio 2019 (choose according to the VS version used to build the Paddle inference library; see [binary compatibility across Visual Studio versions](https://docs.microsoft.com/zh-cn/cpp/porting/binary-compat-2015-2017?view=vs-2019))
+* CUDA 9.0 / CUDA 10.0, cudnn 7+ / TensorRT (only required when using the GPU version of the inference library)
+* CMake 3.0+ [CMake download](https://cmake.org/download/)
+
+**Special note: the TensorRT versions required by the Windows inference libraries are:**
+
+| inference library version | TensorRT version |
+| ---- | ---- |
+| cuda10.1_cudnn7.6_avx_mkl_trt6 | TensorRT-6.0.1.5 |
+| cuda10.2_cudnn7.6_avx_mkl_trt7 | TensorRT-7.0.0.11 |
+| cuda11.0_cudnn8.0_avx_mkl_trt7 | TensorRT-7.2.1.6 |
+
+Please make sure the basic software above is installed; we use the community edition of `VS2019`.
+
+**All examples below use `D:\projects` as the working directory**.
+
+### Step1: Download the code
+
+Download the source code
+```shell
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+```
+
+**Note**: the `C++` inference code lives in the `PaddleDetection/deploy/cpp` directory, which does not depend on any other directory of `PaddleDetection`.
+
+
+### Step2: Download the PaddlePaddle C++ inference library paddle_inference
+
+The PaddlePaddle C++ inference library provides different pre-built packages for different `CPU` and `CUDA` versions; download the one matching your setup: [C++ inference library download list](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#windows)
+
+After extracting, the `D:\projects\paddle_inference` directory contains:
+```
+paddle_inference
+├── paddle # paddle core library and headers
+|
+├── third_party # third-party dependencies and headers
+|
+└── version.txt # version and build information
+```
+
+### Step3: Install and configure OpenCV
+
+1. Download the 3.4.6 release for the Windows platform from the OpenCV official site, [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
+2. Run the downloaded executable and extract OpenCV to a chosen directory, e.g. `D:\projects\opencv`
+3. Configure the environment variables as follows (you can skip this if you use absolute paths everywhere)
+    - My Computer -> Properties -> Advanced system settings -> Environment variables
+    - Find Path in the system variables (create it if missing) and double-click to edit
+    - Append the opencv path and save, e.g. `D:\projects\opencv\build\x64\vc14\bin`
+
+### Step4: Build
+
+1. Enter the `cpp` folder
+```
+cd D:\projects\PaddleDetection\deploy\cpp
+```
+
+2. Generate the project files with CMake
+
+The build parameters are explained below (parameters marked with `*` are only needed when using the **GPU version** of the inference library; keep the CUDA library versions aligned, **use versions 9.0 / 10.0, not 9.2, 10.1 and the like**):
+
+| parameter | meaning |
+| ---- | ---- |
+| *CUDA_LIB | CUDA library path |
+| *CUDNN_LIB | CUDNN library path |
+| OPENCV_DIR | OpenCV installation path |
+| PADDLE_DIR | path of the Paddle inference library |
+| PADDLE_LIB_NAME | name of the Paddle inference library |
+
+**Notes:**
+
+1. If the build environment is CPU-only, download the `CPU` inference library and untick `WITH_GPU`
+2. If you use the `openblas` version, untick `WITH_MKL`
+3. If the keypoint models are not needed, untick `WITH_KEYPOINT`
+4. On Windows, `PADDLE_LIB_NAME` must be set to `paddle_inference`
+
+Run the following command to generate the project files:
+```
+cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=path_to_cuda_lib -DCUDNN_LIB=path_to_cudnn_lib -DPADDLE_DIR=path_to_paddle_lib -DPADDLE_LIB_NAME=paddle_inference -DOPENCV_DIR=path_to_opencv -DWITH_KEYPOINT=ON
+```
+
+For example:
+```
-G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=D:\projects\packages\cuda10_0\lib\x64 -DCUDNN_LIB=D:\projects\packages\cuda10_0\lib\x64 -DPADDLE_DIR=D:\projects\packages\paddle_inference -DPADDLE_LIB_NAME=paddle_inference -DOPENCV_DIR=D:\projects\packages\opencv3_4_6 -DWITH_KEYPOINT=ON +``` + + + +3. 编译 +用`Visual Studio 16 2019`打开`cpp`文件夹下的`PaddleObjectDetector.sln`,将编译模式设置为`Release`,点击`生成`->`全部生成 + + +### Step5: 预测及可视化 + +上述`Visual Studio 2019`编译产出的可执行文件在`out\build\x64-Release`目录下,打开`cmd`,并切换到该目录: + +``` +cd D:\projects\PaddleDetection\deploy\cpp\out\build\x64-Release +``` +可执行文件`main`即为样例的预测程序,其主要的命令行参数如下: + +| 参数 | 说明 | +| ---- | ---- | +| --model_dir | 导出的检测预测模型所在路径 | +| --model_dir_keypoint | Option | 导出的关键点预测模型所在路径 | +| --image_file | 要预测的图片文件路径 | +| --image_dir | 要预测的图片文件夹路径 | +| --video_file | 要预测的视频文件路径 | +| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测)| +| --device | 运行时的设备,可选择`CPU/GPU/XPU`,默认为`CPU`| +| --gpu_id | 指定进行推理的GPU device id(默认值为0)| +| --run_mode | 使用GPU时,默认为paddle, 可选(paddle/trt_fp32/trt_fp16/trt_int8)| +| --batch_size | 检测模型预测时的batch size,在指定`image_dir`时有效 | +| --batch_size_keypoint | 关键点模型预测时的batch size,默认为8 | +| --run_benchmark | 是否重复预测来进行benchmark测速 | +| --output_dir | 输出图片所在的文件夹, 默认为output | +| --use_mkldnn | CPU预测中是否开启MKLDNN加速 | +| --cpu_threads | 设置cpu线程数,默认为1 | +| --use_dark | 关键点模型输出预测是否使用DarkPose后处理,默认为true | + +**注意**: +(1)优先级顺序:`camera_id` > `video_file` > `image_dir` > `image_file`。 +(2)如果提示找不到`opencv_world346.dll`,把`D:\projects\packages\opencv3_4_6\build\x64\vc14\bin`文件夹下的`opencv_world346.dll`拷贝到`main.exe`文件夹下即可。 +(3)--run_benchmark如果设置为True,则需要安装依赖`pip install pynvml psutil GPUtil`。 + + +`样例一`: +```shell +#不使用`GPU`测试图片 `D:\\images\\test.jpeg` +.\main --model_dir=D:\\models\\yolov3_darknet --image_file=D:\\images\\test.jpeg +``` + +图片文件`可视化预测结果`会保存在当前目录下`output.jpg`文件中。 + + +`样例二`: +```shell +#使用`GPU`测试视频 `D:\\videos\\test.mp4` +.\main --model_dir=D:\\models\\yolov3_darknet --video_path=D:\\videos\\test.mp4 --device=GPU +``` + +视频文件目前支持`.mp4`格式的预测,`可视化预测结果`会保存在当前目录下`output.mp4`文件中。 + + +`样例三`: +```shell +#使用关键点模型与检测模型联合预测,使用 `GPU`预测 +#检测模型检测到的人送入关键点模型进行关键点预测 +.\main --model_dir=D:\\models\\yolov3_darknet --model_dir_keypoint=D:\\models\\hrnet_w32_256x192 --image_file=D:\\images\\test.jpeg --device=GPU +``` + +## 性能测试 +Benchmark请查看[BENCHMARK_INFER](../../BENCHMARK_INFER.md) diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/config_parser.h b/PaddleDetection-release-2.6/deploy/cpp/include/config_parser.h new file mode 100644 index 0000000000000000000000000000000000000000..1f2e381c5284bb7ce16a6b06f858a32e83290f98 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/include/config_parser.h @@ -0,0 +1,142 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+
+#pragma once
+
+#include <iostream>
+#include <map>
+#include <string>
+#include <vector>
+
+#include "yaml-cpp/yaml.h"
+
+#ifdef _WIN32
+#define OS_PATH_SEP "\\"
+#else
+#define OS_PATH_SEP "/"
+#endif
+
+namespace PaddleDetection {
+
+// Inference model configuration parser
+class ConfigPaser {
+ public:
+  ConfigPaser() {}
+
+  ~ConfigPaser() {}
+
+  bool load_config(const std::string& model_dir,
+                   const std::string& cfg = "infer_cfg.yml") {
+    // Load as a YAML::Node
+    YAML::Node config;
+    config = YAML::LoadFile(model_dir + OS_PATH_SEP + cfg);
+
+    // Get runtime mode : paddle, trt_fp16, trt_fp32
+    if (config["mode"].IsDefined()) {
+      mode_ = config["mode"].as<std::string>();
+    } else {
+      std::cerr << "Please set mode, "
+                << "support value : paddle/trt_fp16/trt_fp32." << std::endl;
+      return false;
+    }
+
+    // Get model arch : YOLO, SSD, RetinaNet, RCNN, Face
+    if (config["arch"].IsDefined()) {
+      arch_ = config["arch"].as<std::string>();
+    } else {
+      std::cerr << "Please set model arch,"
+                << "support value : YOLO, SSD, RetinaNet, RCNN, Face."
+                << std::endl;
+      return false;
+    }
+
+    // Get min_subgraph_size for tensorrt
+    if (config["min_subgraph_size"].IsDefined()) {
+      min_subgraph_size_ = config["min_subgraph_size"].as<int>();
+    } else {
+      std::cerr << "Please set min_subgraph_size." << std::endl;
+      return false;
+    }
+    // Get draw_threshold for visualization
+    if (config["draw_threshold"].IsDefined()) {
+      draw_threshold_ = config["draw_threshold"].as<float>();
+    } else {
+      std::cerr << "Please set draw_threshold." << std::endl;
+      return false;
+    }
+    // Get Preprocess for preprocessing
+    if (config["Preprocess"].IsDefined()) {
+      preprocess_info_ = config["Preprocess"];
+    } else {
+      std::cerr << "Please set Preprocess." << std::endl;
+      return false;
+    }
+    // Get label_list for visualization
+    if (config["label_list"].IsDefined()) {
+      label_list_ = config["label_list"].as<std::vector<std::string>>();
+    } else {
+      std::cerr << "Please set label_list." << std::endl;
+      return false;
+    }
+
+    // Get use_dynamic_shape for TensorRT
+    if (config["use_dynamic_shape"].IsDefined()) {
+      use_dynamic_shape_ = config["use_dynamic_shape"].as<bool>();
+    } else {
+      std::cerr << "Please set use_dynamic_shape." << std::endl;
+      return false;
+    }
+
+    // Get conf_thresh for tracker
+    if (config["tracker"].IsDefined()) {
+      if (config["tracker"]["conf_thres"].IsDefined()) {
+        conf_thresh_ = config["tracker"]["conf_thres"].as<float>();
+      } else {
+        std::cerr << "Please set conf_thres in tracker." << std::endl;
+        return false;
+      }
+    }
+
+    // Get NMS for postprocess
+    if (config["NMS"].IsDefined()) {
+      nms_info_ = config["NMS"];
+    }
+    // Get fpn_stride in PicoDet
+    if (config["fpn_stride"].IsDefined()) {
+      fpn_stride_.clear();
+      for (auto item : config["fpn_stride"]) {
+        fpn_stride_.emplace_back(item.as<int>());
+      }
+    }
+
+    if (config["mask"].IsDefined()) {
+      mask_ = config["mask"].as<bool>();
+    }
+
+    return true;
+  }
+  std::string mode_;
+  float draw_threshold_;
+  std::string arch_;
+  int min_subgraph_size_;
+  YAML::Node preprocess_info_;
+  YAML::Node nms_info_;
+  std::vector<std::string> label_list_;
+  std::vector<int> fpn_stride_;
+  bool use_dynamic_shape_;
+  float conf_thresh_;
+  bool mask_ = false;
+};
+
+}  // namespace PaddleDetection
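For orientation, a minimal caller of the parser might look like the sketch below. This is not part of the diff: the `./yolov3_darknet` directory is a placeholder for any exported-model directory that contains an `infer_cfg.yml`.

```cpp
// Minimal usage sketch for ConfigPaser (illustrative only).
#include <iostream>
#include "include/config_parser.h"

int main() {
  PaddleDetection::ConfigPaser config;
  // Reads <model_dir>/infer_cfg.yml and prints an error on missing keys.
  if (!config.load_config("./yolov3_darknet")) {
    std::cerr << "failed to parse infer_cfg.yml" << std::endl;
    return 1;
  }
  std::cout << "arch: " << config.arch_ << ", mode: " << config.mode_
            << ", draw_threshold: " << config.draw_threshold_ << std::endl;
  std::cout << "labels: " << config.label_list_.size() << std::endl;
  return 0;
}
```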
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/jde_detector.h b/PaddleDetection-release-2.6/deploy/cpp/include/jde_detector.h
new file mode 100644
index 0000000000000000000000000000000000000000..959b9b448b5d8a09909eaca93793d6c0d09003f5
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/jde_detector.h
@@ -0,0 +1,134 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <ctime>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+#include "paddle_inference_api.h"  // NOLINT
+
+#include "include/config_parser.h"
+#include "include/preprocess_op.h"
+#include "include/tracker.h"
+
+using namespace paddle_infer;
+
+namespace PaddleDetection {
+// JDE Detection Result
+struct MOT_Rect {
+  float left;
+  float top;
+  float right;
+  float bottom;
+};
+
+struct MOT_Track {
+  int ids;
+  float score;
+  MOT_Rect rects;
+};
+
+typedef std::vector<MOT_Track> MOT_Result;
+
+// Generate visualization color
+cv::Scalar GetColor(int idx);
+
+// Visualization of detection result
+cv::Mat VisualizeTrackResult(const cv::Mat& img,
+                             const MOT_Result& results,
+                             const float fps,
+                             const int frame_id);
+
+class JDEDetector {
+ public:
+  explicit JDEDetector(const std::string& model_dir,
+                       const std::string& device = "CPU",
+                       bool use_mkldnn = false,
+                       int cpu_threads = 1,
+                       const std::string& run_mode = "paddle",
+                       const int batch_size = 1,
+                       const int gpu_id = 0,
+                       const int trt_min_shape = 1,
+                       const int trt_max_shape = 1280,
+                       const int trt_opt_shape = 640,
+                       bool trt_calib_mode = false,
+                       const int min_box_area = 200) {
+    this->device_ = device;
+    this->gpu_id_ = gpu_id;
+    this->cpu_math_library_num_threads_ = cpu_threads;
+    this->use_mkldnn_ = use_mkldnn;
+
+    this->trt_min_shape_ = trt_min_shape;
+    this->trt_max_shape_ = trt_max_shape;
+    this->trt_opt_shape_ = trt_opt_shape;
+    this->trt_calib_mode_ = trt_calib_mode;
+    config_.load_config(model_dir);
+    this->use_dynamic_shape_ = config_.use_dynamic_shape_;
+    this->min_subgraph_size_ = config_.min_subgraph_size_;
+    threshold_ = config_.draw_threshold_;
+    preprocessor_.Init(config_.preprocess_info_);
+    LoadModel(model_dir, batch_size, run_mode);
+    this->min_box_area_ = min_box_area;
+    this->conf_thresh_ = config_.conf_thresh_;
+  }
+
+  // Load Paddle inference model
+  void LoadModel(const std::string& model_dir,
+                 const int batch_size = 1,
+                 const std::string& run_mode = "paddle");
+
+  // Run predictor
+  void Predict(const std::vector<cv::Mat> imgs,
+               const double threshold = 0.5,
+               const int warmup = 0,
+               const int repeats = 1,
+               MOT_Result* result = nullptr,
+               std::vector<double>* times = nullptr);
+
+ private:
+  std::string device_ = "CPU";
+  int gpu_id_ = 0;
+  int cpu_math_library_num_threads_ = 1;
+  bool use_mkldnn_ = false;
+  int min_subgraph_size_ = 3;
+  bool use_dynamic_shape_ = false;
+  int trt_min_shape_ = 1;
+  int trt_max_shape_ = 1280;
+  int trt_opt_shape_ = 640;
+  bool trt_calib_mode_ = false;
+  // Preprocess image and copy data to input buffer
+  void Preprocess(const cv::Mat& image_mat);
+  // Postprocess result
+  void Postprocess(const cv::Mat dets, const cv::Mat emb, MOT_Result* result);
+
+  std::shared_ptr<Predictor> predictor_;
+  Preprocessor preprocessor_;
+  ImageBlob inputs_;
+  std::vector<float> bbox_data_;
+  std::vector<float> emb_data_;
+  float threshold_;
+  ConfigPaser config_;
+  float min_box_area_;
+  float conf_thresh_;
+};
+
+}  // namespace PaddleDetection
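To make the interface concrete, here is a hedged sketch of the expected call sequence. The model directory and image path are placeholders; the three-slot `times` convention (preprocess / inference / postprocess, in milliseconds) mirrors what `Predict` accumulates.

```cpp
// Hypothetical driver for JDEDetector; paths are placeholders.
#include <iostream>
#include "include/jde_detector.h"

int main() {
  PaddleDetection::JDEDetector detector("./fairmot_dla34_30e_1088x608", "CPU");
  cv::Mat frame = cv::imread("./demo.jpg");
  PaddleDetection::MOT_Result result;
  std::vector<double> times(3, 0.0);  // preprocess / inference / postprocess, ms
  detector.Predict({frame}, 0.5, /*warmup=*/0, /*repeats=*/1, &result, &times);
  for (const auto& track : result) {
    std::cout << "id=" << track.ids << " score=" << track.score << std::endl;
  }
  return 0;
}
```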
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/keypoint_detector.h b/PaddleDetection-release-2.6/deploy/cpp/include/keypoint_detector.h
new file mode 100644
index 0000000000000000000000000000000000000000..ce6aa0e0692d215fc1a704afd37c3787fe8e42ef
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/keypoint_detector.h
@@ -0,0 +1,126 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <ctime>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+#include "paddle_inference_api.h"  // NOLINT
+
+#include "include/config_parser.h"
+#include "include/keypoint_postprocess.h"
+#include "include/preprocess_op.h"
+
+using namespace paddle_infer;
+
+namespace PaddleDetection {
+
+// Visualization of KeyPoint result
+cv::Mat VisualizeKptsResult(const cv::Mat& img,
+                            const std::vector<KeyPointResult>& results,
+                            const std::vector<int>& colormap);
+
+class KeyPointDetector {
+ public:
+  explicit KeyPointDetector(const std::string& model_dir,
+                            const std::string& device = "CPU",
+                            bool use_mkldnn = false,
+                            int cpu_threads = 1,
+                            const std::string& run_mode = "paddle",
+                            const int batch_size = 1,
+                            const int gpu_id = 0,
+                            const int trt_min_shape = 1,
+                            const int trt_max_shape = 1280,
+                            const int trt_opt_shape = 640,
+                            bool trt_calib_mode = false,
+                            bool use_dark = true) {
+    this->device_ = device;
+    this->gpu_id_ = gpu_id;
+    this->cpu_math_library_num_threads_ = cpu_threads;
+    this->use_mkldnn_ = use_mkldnn;
+    this->use_dark = use_dark;
+
+    this->trt_min_shape_ = trt_min_shape;
+    this->trt_max_shape_ = trt_max_shape;
+    this->trt_opt_shape_ = trt_opt_shape;
+    this->trt_calib_mode_ = trt_calib_mode;
+    config_.load_config(model_dir);
+    this->use_dynamic_shape_ = config_.use_dynamic_shape_;
+    this->min_subgraph_size_ = config_.min_subgraph_size_;
+    threshold_ = config_.draw_threshold_;
+    preprocessor_.Init(config_.preprocess_info_);
+    LoadModel(model_dir, batch_size, run_mode);
+  }
+
+  // Load Paddle inference model
+  void LoadModel(const std::string& model_dir,
+                 const int batch_size = 1,
+                 const std::string& run_mode = "paddle");
+
+  // Run predictor
+  void Predict(const std::vector<cv::Mat> imgs,
+               std::vector<std::vector<float>>& center,
+               std::vector<std::vector<float>>& scale,
+               const double threshold = 0.5,
+               const int warmup = 0,
+               const int repeats = 1,
+               std::vector<KeyPointResult>* result = nullptr,
+               std::vector<double>* times = nullptr);
+
+  // Get Model Label list
+  const std::vector<std::string>& GetLabelList() const {
+    return config_.label_list_;
+  }
+
+ private:
+  std::string device_ = "CPU";
+  int gpu_id_ = 0;
+  int cpu_math_library_num_threads_ = 1;
+  bool use_dark = true;
+  bool use_mkldnn_ = false;
+  int min_subgraph_size_ = 3;
+  bool use_dynamic_shape_ = false;
+  int trt_min_shape_ = 1;
+  int trt_max_shape_ = 1280;
+  int trt_opt_shape_ = 640;
+  bool trt_calib_mode_ = false;
+  // Preprocess image and copy data to input buffer
+  void Preprocess(const cv::Mat& image_mat);
+  // Postprocess result
+  void Postprocess(std::vector<float>& output,
+                   std::vector<int> output_shape,
+                   std::vector<int64_t>& idxout,
+                   std::vector<int> idx_shape,
+                   std::vector<KeyPointResult>* result,
+                   std::vector<std::vector<float>>& center,
+                   std::vector<std::vector<float>>& scale);
+
+  std::shared_ptr<Predictor> predictor_;
+  Preprocessor preprocessor_;
+  ImageBlob inputs_;
+  std::vector<float> output_data_;
+  std::vector<int64_t> idx_data_;
+  float threshold_;
+  ConfigPaser config_;
+};
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/keypoint_postprocess.h b/PaddleDetection-release-2.6/deploy/cpp/include/keypoint_postprocess.h
new file mode 100644
index 0000000000000000000000000000000000000000..fa0c7d55f06db986404eb23a7df1144a22e7f33f
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/keypoint_postprocess.h
@@ -0,0 +1,134 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <algorithm>
+#include <vector>
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+namespace PaddleDetection {
+
+std::vector<float> get_3rd_point(std::vector<float>& a, std::vector<float>& b);
+
+std::vector<float> get_dir(float src_point_x, float src_point_y, float rot_rad);
+
+void affine_tranform(
+    float pt_x, float pt_y, cv::Mat& trans, std::vector<float>& preds, int p);
+
+cv::Mat get_affine_transform(std::vector<float>& center,
+                             std::vector<float>& scale,
+                             float rot,
+                             std::vector<int>& output_size,
+                             int inv);
+
+void transform_preds(std::vector<float>& coords,
+                     std::vector<float>& center,
+                     std::vector<float>& scale,
+                     std::vector<int>& output_size,
+                     std::vector<int>& dim,
+                     std::vector<float>& target_coords,
+                     bool affine = false);
+
+void box_to_center_scale(std::vector<int>& box,
+                         int width,
+                         int height,
+                         std::vector<float>& center,
+                         std::vector<float>& scale);
+
+void get_max_preds(float* heatmap,
+                   std::vector<int>& dim,
+                   std::vector<float>& preds,
+                   float* maxvals,
+                   int batchid,
+                   int joint_idx);
+
+void get_final_preds(std::vector<float>& heatmap,
+                     std::vector<int>& dim,
+                     std::vector<int64_t>& idxout,
+                     std::vector<int>& idxdim,
+                     std::vector<float>& center,
+                     std::vector<float> scale,
+                     std::vector<float>& preds,
+                     int batchid,
+                     bool DARK = true);
+
+// Object KeyPoint Result
+struct KeyPointResult {
+  // Keypoints: shape(N x 3); N: number of Joints; 3: x,y,conf
+  std::vector<float> keypoints;
+  int num_joints = -1;
+};
+
+class PoseSmooth {
+ public:
+  explicit PoseSmooth(const int width,
+                      const int height,
+                      std::string filter_type = "OneEuro",
+                      float alpha = 0.5,
+                      float fc_d = 0.1,
+                      float fc_min = 0.1,
+                      float beta = 0.1,
+                      float thres_mult = 0.3)
+      : width(width),
+        height(height),
+        alpha(alpha),
+        fc_d(fc_d),
+        fc_min(fc_min),
+        beta(beta),
+        filter_type(filter_type),
+        thres_mult(thres_mult){};
+
+  // Run predictor
+  KeyPointResult smooth_process(KeyPointResult* result);
+  void PointSmooth(KeyPointResult* result,
+                   KeyPointResult* keypoint_smoothed,
+                   std::vector<float> thresholds,
+                   int index);
+  float OneEuroFilter(float x_cur, float x_pre, int loc);
+  float smoothing_factor(float te, float fc);
+  float ExpSmoothing(float x_cur, float x_pre, int loc = 0);
+
+ private:
+  int width = 0;
+  int height = 0;
+  float alpha = 0.;
+  float fc_d = 1.;
+  float fc_min = 0.;
+  float beta = 1.;
+  float thres_mult = 1.;
+  std::string filter_type = "OneEuro";
filter_type = "OneEuro"; + std::vector thresholds = {0.005, + 0.005, + 0.005, + 0.005, + 0.005, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01}; + KeyPointResult x_prev_hat; + KeyPointResult dx_prev_hat; +}; +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/lapjv.h b/PaddleDetection-release-2.6/deploy/cpp/include/lapjv.h new file mode 100644 index 0000000000000000000000000000000000000000..331defc42c4c38d7360d38b881909fb51ce7e2c7 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/include/lapjv.h @@ -0,0 +1,52 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// The code is based on: +// https://github.com/gatagat/lap/blob/master/lap/lapjv.h +// Ths copyright of gatagat/lap is as follows: +// MIT License + +#ifndef LAPJV_H +#define LAPJV_H + +#define LARGE 1000000 + +#if !defined TRUE +#define TRUE 1 +#endif +#if !defined FALSE +#define FALSE 0 +#endif + +#define NEW(x, t, n) if ((x = (t *)malloc(sizeof(t) * (n))) == 0) {return -1;} +#define FREE(x) if (x != 0) { free(x); x = 0; } +#define SWAP_INDICES(a, b) { int_t _temp_index = a; a = b; b = _temp_index; } +#include + +namespace PaddleDetection { + +typedef signed int int_t; +typedef unsigned int uint_t; +typedef double cost_t; +typedef char boolean; +typedef enum fp_t { FP_1 = 1, FP_2 = 2, FP_DYNAMIC = 3 } fp_t; + +int lapjv_internal( + const cv::Mat &cost, const bool extend_cost, const float cost_limit, + int *x, int *y); + +} // namespace PaddleDetection + +#endif // LAPJV_H + diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/object_detector.h b/PaddleDetection-release-2.6/deploy/cpp/include/object_detector.h new file mode 100644 index 0000000000000000000000000000000000000000..47bd29362c85eafc3825d25af73694803e2a1504 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/include/object_detector.h @@ -0,0 +1,124 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/object_detector.h b/PaddleDetection-release-2.6/deploy/cpp/include/object_detector.h
new file mode 100644
index 0000000000000000000000000000000000000000..47bd29362c85eafc3825d25af73694803e2a1504
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/object_detector.h
@@ -0,0 +1,124 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <ctime>
+#include <map>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+#include "paddle_inference_api.h"  // NOLINT
+
+#include "include/config_parser.h"
+#include "include/picodet_postprocess.h"
+#include "include/preprocess_op.h"
+#include "include/utils.h"
+
+using namespace paddle_infer;
+namespace PaddleDetection {
+
+// Generate visualization colormap for each class
+std::vector<int> GenerateColorMap(int num_class);
+
+// Visualization of detection result
+cv::Mat
+VisualizeResult(const cv::Mat &img,
+                const std::vector<PaddleDetection::ObjectResult> &results,
+                const std::vector<std::string> &labels,
+                const std::vector<int> &colormap, const bool is_rbox);
+
+class ObjectDetector {
+public:
+  explicit ObjectDetector(const std::string &model_dir,
+                          const std::string &device = "CPU",
+                          bool use_mkldnn = false, int cpu_threads = 1,
+                          const std::string &run_mode = "paddle",
+                          const int batch_size = 1, const int gpu_id = 0,
+                          const int trt_min_shape = 1,
+                          const int trt_max_shape = 1280,
+                          const int trt_opt_shape = 640,
+                          bool trt_calib_mode = false) {
+    this->device_ = device;
+    this->gpu_id_ = gpu_id;
+    this->cpu_math_library_num_threads_ = cpu_threads;
+    this->use_mkldnn_ = use_mkldnn;
+
+    this->trt_min_shape_ = trt_min_shape;
+    this->trt_max_shape_ = trt_max_shape;
+    this->trt_opt_shape_ = trt_opt_shape;
+    this->trt_calib_mode_ = trt_calib_mode;
+    config_.load_config(model_dir);
+    this->use_dynamic_shape_ = config_.use_dynamic_shape_;
+    this->min_subgraph_size_ = config_.min_subgraph_size_;
+    threshold_ = config_.draw_threshold_;
+    preprocessor_.Init(config_.preprocess_info_);
+    LoadModel(model_dir, batch_size, run_mode);
+  }
+
+  // Load Paddle inference model
+  void LoadModel(const std::string &model_dir, const int batch_size = 1,
+                 const std::string &run_mode = "paddle");
+
+  // Run predictor
+  void Predict(const std::vector<cv::Mat> imgs, const double threshold = 0.5,
+               const int warmup = 0, const int repeats = 1,
+               std::vector<PaddleDetection::ObjectResult> *result = nullptr,
+               std::vector<int> *bbox_num = nullptr,
+               std::vector<double> *times = nullptr);
+
+  // Get Model Label list
+  const std::vector<std::string> &GetLabelList() const {
+    return config_.label_list_;
+  }
+
+private:
+  std::string device_ = "CPU";
+  int gpu_id_ = 0;
+  int cpu_math_library_num_threads_ = 1;
+  bool use_mkldnn_ = false;
+  int min_subgraph_size_ = 3;
+  bool use_dynamic_shape_ = false;
+  int trt_min_shape_ = 1;
+  int trt_max_shape_ = 1280;
+  int trt_opt_shape_ = 640;
+  bool trt_calib_mode_ = false;
+  // Preprocess image and copy data to input buffer
+  void Preprocess(const cv::Mat &image_mat);
+  // Postprocess result
+  void Postprocess(const std::vector<cv::Mat> mats,
+                   std::vector<PaddleDetection::ObjectResult> *result,
+                   std::vector<int> bbox_num, std::vector<float> output_data_,
+                   std::vector<int> output_mask_data_, bool is_rbox);
+
+  void SOLOv2Postprocess(
+      const std::vector<cv::Mat> mats,
+      std::vector<PaddleDetection::ObjectResult> *result,
+      std::vector<int> *bbox_num, std::vector<int> out_bbox_num_data_,
+      std::vector<int64_t> out_label_data_, std::vector<float> out_score_data_,
+      std::vector<uint8_t> out_global_mask_data_, float threshold = 0.5);
+
+  std::shared_ptr<Predictor> predictor_;
+  Preprocessor preprocessor_;
+  ImageBlob inputs_;
+  float threshold_;
+  ConfigPaser config_;
+};
+
+}  // namespace PaddleDetection
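As with the tracker above, a hedged usage sketch of the main detection API follows; the model directory and image path are placeholders, and `times` again follows the three-slot convention.

```cpp
// Hypothetical single-image detection flow through ObjectDetector.
#include <iostream>
#include "include/object_detector.h"

int main() {
  PaddleDetection::ObjectDetector detector("./yolov3_darknet", "CPU");
  cv::Mat img = cv::imread("./demo.jpg");
  std::vector<PaddleDetection::ObjectResult> results;
  std::vector<int> bbox_num;
  std::vector<double> times(3, 0.0);
  detector.Predict({img}, 0.5, /*warmup=*/0, /*repeats=*/1, &results,
                   &bbox_num, &times);
  auto labels = detector.GetLabelList();
  for (const auto& r : results) {
    std::cout << labels[r.class_id] << " conf=" << r.confidence << std::endl;
  }
  return 0;
}
```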
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/picodet_postprocess.h b/PaddleDetection-release-2.6/deploy/cpp/include/picodet_postprocess.h
new file mode 100644
index 0000000000000000000000000000000000000000..c0705e85d9ac089fd093ba6a1b213dfd08e6e449
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/picodet_postprocess.h
@@ -0,0 +1,37 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <algorithm>
+#include <ctime>
+#include <memory>
+#include <numeric>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "include/utils.h"
+
+namespace PaddleDetection {
+
+void PicoDetPostProcess(std::vector<PaddleDetection::ObjectResult> *results,
+                        std::vector<const float *> outs,
+                        std::vector<int> fpn_stride,
+                        std::vector<float> im_shape,
+                        std::vector<float> scale_factor,
+                        float score_threshold = 0.3, float nms_threshold = 0.5,
+                        int num_class = 80, int reg_max = 7);
+
+}  // namespace PaddleDetection
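A hedged, illustrative-only sketch of calling `PicoDetPostProcess` follows. Real callers pass the network's raw head outputs; here zero-filled dummy buffers are used purely for sizing, and the output ordering (all per-level score buffers first, then all per-level box-distribution buffers) is an assumption about the exported PicoDet layout, not something this header guarantees.

```cpp
// Illustrative PicoDetPostProcess call with dummy zero buffers.
#include <iostream>
#include <vector>
#include "include/picodet_postprocess.h"

int main() {
  const int num_class = 80, reg_max = 7;
  std::vector<int> fpn_stride = {8, 16, 32, 64};
  std::vector<float> im_shape = {320.f, 320.f};   // divisible by all strides
  std::vector<float> scale_factor = {1.f, 1.f};

  // Assumed layout: one score buffer per level, then one box buffer per level.
  std::vector<std::vector<float>> buffers;
  for (int s : fpn_stride) {  // score outputs
    int cells = (320 / s) * (320 / s);
    buffers.emplace_back(cells * num_class, 0.f);
  }
  for (int s : fpn_stride) {  // box-distribution outputs
    int cells = (320 / s) * (320 / s);
    buffers.emplace_back(cells * 4 * (reg_max + 1), 0.f);
  }
  std::vector<const float*> outs;
  for (auto& b : buffers) outs.push_back(b.data());

  std::vector<PaddleDetection::ObjectResult> results;
  PaddleDetection::PicoDetPostProcess(&results, outs, fpn_stride, im_shape,
                                      scale_factor, 0.3f, 0.5f, num_class,
                                      reg_max);
  std::cout << results.size() << " boxes kept" << std::endl;
  return 0;
}
```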
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/preprocess_op.h b/PaddleDetection-release-2.6/deploy/cpp/include/preprocess_op.h
new file mode 100644
index 0000000000000000000000000000000000000000..e3d4a99bb15f2860a7ce4c7bb17b332565de2da1
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/preprocess_op.h
@@ -0,0 +1,239 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <glog/logging.h>
+#include <yaml-cpp/yaml.h>
+
+#include <iostream>
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+namespace PaddleDetection {
+
+// Object for storing all preprocessed data
+class ImageBlob {
+ public:
+  // image width and height
+  std::vector<float> im_shape_;
+  // Buffer for image data after preprocessing
+  std::vector<float> im_data_;
+  // in net data shape(after pad)
+  std::vector<float> in_net_shape_;
+  // Evaluation image width and height
+  // std::vector<float> eval_im_size_f_;
+  // Scale factor for image size to origin image size
+  std::vector<float> scale_factor_;
+  // in net image after preprocessing
+  cv::Mat in_net_im_;
+};
+
+// Abstraction of preprocessing operation class
+class PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) = 0;
+  virtual void Run(cv::Mat* im, ImageBlob* data) = 0;
+};
+
+class InitInfo : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {}
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+};
+
+class NormalizeImage : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    mean_ = item["mean"].as<std::vector<float>>();
+    scale_ = item["std"].as<std::vector<float>>();
+    if (item["is_scale"]) is_scale_ = item["is_scale"].as<bool>();
+    if (item["norm_type"]) norm_type_ = item["norm_type"].as<std::string>();
+  }
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  // CHW or HWC
+  std::vector<float> mean_;
+  std::vector<float> scale_;
+  bool is_scale_ = true;
+  std::string norm_type_ = "mean_std";
+};
+
+class Permute : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {}
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+};
+
+class Resize : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    interp_ = item["interp"].as<int>();
+    keep_ratio_ = item["keep_ratio"].as<bool>();
+    target_size_ = item["target_size"].as<std::vector<int>>();
+  }
+
+  // Compute best resize scale for x-dimension, y-dimension
+  std::pair<float, float> GenerateScale(const cv::Mat& im);
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  int interp_;
+  bool keep_ratio_;
+  std::vector<int> target_size_;
+  std::vector<float> in_net_shape_;
+};
+
+class LetterBoxResize : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    target_size_ = item["target_size"].as<std::vector<int>>();
+  }
+
+  float GenerateScale(const cv::Mat& im);
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  std::vector<int> target_size_;
+  std::vector<float> in_net_shape_;
+};
+// Models with FPN need input shape % stride == 0
+class PadStride : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    stride_ = item["stride"].as<int>();
+  }
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  int stride_;
+};
+
+class TopDownEvalAffine : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    trainsize_ = item["trainsize"].as<std::vector<int>>();
+  }
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  int interp_ = 1;
+  std::vector<int> trainsize_;
+};
+
+class WarpAffine : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    input_h_ = item["input_h"].as<int>();
+    input_w_ = item["input_w"].as<int>();
+    keep_res_ = item["keep_res"].as<bool>();
+  }
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  int input_h_;
+  int input_w_;
+  int interp_ = 1;
+  bool keep_res_ = true;
+  int pad_ = 31;
+};
+
+class Pad : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    size_ = item["size"].as<std::vector<int>>();
+    fill_value_ = item["fill_value"].as<std::vector<float>>();
+  }
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  std::vector<int> size_;
+  std::vector<float> fill_value_;
+};
+
+void CropImg(cv::Mat& img,
+             cv::Mat& crop_img,
+             std::vector<int>& area,
+             std::vector<float>& center,
+             std::vector<float>& scale,
+             float expandratio = 0.15);
+
+// check whether the input size is dynamic
+bool CheckDynamicInput(const std::vector<cv::Mat>& imgs);
+
+// Pad images in batch
+std::vector<cv::Mat> PadBatch(const std::vector<cv::Mat>& imgs);
+
+class Preprocessor {
+ public:
+  void Init(const YAML::Node& config_node) {
+    // initialize image info at first
+    ops_["InitInfo"] = std::make_shared<InitInfo>();
+    for (const auto& item : config_node) {
+      auto op_name = item["type"].as<std::string>();
+
+      ops_[op_name] = CreateOp(op_name);
+      ops_[op_name]->Init(item);
+    }
+  }
+
+  std::shared_ptr<PreprocessOp> CreateOp(const std::string& name) {
+    if (name == "Resize") {
+      return std::make_shared<Resize>();
+    } else if (name == "LetterBoxResize") {
+      return std::make_shared<LetterBoxResize>();
+    } else if (name == "Permute") {
+      return std::make_shared<Permute>();
+    } else if (name == "NormalizeImage") {
+      return std::make_shared<NormalizeImage>();
+    } else if (name == "PadStride") {
+      // use PadStride instead of PadBatch
+      return std::make_shared<PadStride>();
+    } else if (name == "TopDownEvalAffine") {
+      return std::make_shared<TopDownEvalAffine>();
+    } else if (name == "WarpAffine") {
+      return std::make_shared<WarpAffine>();
+    } else if (name == "Pad") {
+      return std::make_shared<Pad>();
+    }
+    std::cerr << "can not find function of OP: " << name
+              << " and return: nullptr" << std::endl;
+    return nullptr;
+  }
+
+  void Run(cv::Mat* im, ImageBlob* data);
+
+ public:
+  static const std::vector<std::string> RUN_ORDER;
+
+ private:
+  std::unordered_map<std::string, std::shared_ptr<PreprocessOp>> ops_;
+};
+
+}  // namespace PaddleDetection
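The detectors drive `Preprocessor` from the `Preprocess` section of `infer_cfg.yml`, but it can also be exercised directly. Below is a hedged sketch under the assumption that the ops behave as their names and `Init` signatures suggest; the inline YAML snippet is hypothetical and only uses keys the `Init` methods above actually read.

```cpp
// Hedged sketch: initializing Preprocessor from an inline YAML pipeline.
#include <iostream>
#include "include/preprocess_op.h"

int main() {
  const char* yaml =
      "- type: Resize\n"
      "  interp: 2\n"
      "  keep_ratio: false\n"
      "  target_size: [320, 320]\n"
      "- type: NormalizeImage\n"
      "  mean: [0.485, 0.456, 0.406]\n"
      "  std: [0.229, 0.224, 0.225]\n"
      "  is_scale: true\n"
      "- type: Permute\n";
  PaddleDetection::Preprocessor pre;
  pre.Init(YAML::Load(yaml));

  cv::Mat im(240, 320, CV_8UC3, cv::Scalar(114, 114, 114));  // dummy image
  PaddleDetection::ImageBlob blob;
  pre.Run(&im, &blob);
  std::cout << "in_net_shape: " << blob.in_net_shape_[0] << "x"
            << blob.in_net_shape_[1] << std::endl;
  return 0;
}
```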
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/tracker.h b/PaddleDetection-release-2.6/deploy/cpp/include/tracker.h
new file mode 100644
index 0000000000000000000000000000000000000000..903c3b3046280766e33ce67ef157ba0ea558e3e1
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/tracker.h
@@ -0,0 +1,63 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// The code is based on:
+// https://github.com/CnybTseng/JDE/blob/master/platforms/common/jdetracker.h
+// The copyright of CnybTseng/JDE is as follows:
+// MIT License
+
+#pragma once
+
+#include <map>
+#include <vector>
+#include <opencv2/opencv.hpp>
+
+#include "trajectory.h"
+
+namespace PaddleDetection {
+
+typedef std::map<int, int> Match;
+typedef std::map<int, int>::iterator MatchIterator;
+
+struct Track
+{
+    int id;
+    float score;
+    cv::Vec4f ltrb;
+};
+
+class JDETracker
+{
+public:
+    static JDETracker *instance(void);
+    virtual bool update(const cv::Mat &dets, const cv::Mat &emb, std::vector<Track> &tracks);
+private:
+    JDETracker(void);
+    virtual ~JDETracker(void) {}
+    cv::Mat motion_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b);
+    void linear_assignment(const cv::Mat &cost, float cost_limit, Match &matches,
+        std::vector<int> &mismatch_row, std::vector<int> &mismatch_col);
+    void remove_duplicate_trajectory(TrajectoryPool &a, TrajectoryPool &b, float iou_thresh=0.15f);
+private:
+    static JDETracker *me;
+    int timestamp;
+    TrajectoryPool tracked_trajectories;
+    TrajectoryPool lost_trajectories;
+    TrajectoryPool removed_trajectories;
+    int max_lost_time;
+    float lambda;
+    float det_thresh;
+};
+
+}  // namespace PaddleDetection
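Because `JDETracker` has a private constructor, callers go through the `instance()` singleton. The sketch below is illustrative only: it assumes the `N x 6` detection layout that `jde_detector.cc` later constructs (columns 0-3 the `ltrb` box, column 4 the confidence) and an arbitrary 128-dim embedding per detection.

```cpp
// Illustrative per-frame update of the singleton JDETracker.
#include <iostream>
#include "include/tracker.h"

int main() {
  cv::Mat dets = cv::Mat::zeros(1, 6, CV_32FC1);
  dets.at<float>(0, 2) = 50.f;   // right
  dets.at<float>(0, 3) = 120.f;  // bottom
  dets.at<float>(0, 4) = 0.9f;   // confidence
  cv::Mat emb = cv::Mat::ones(1, 128, CV_32FC1);  // dummy ReID embedding

  std::vector<PaddleDetection::Track> tracks;
  PaddleDetection::JDETracker::instance()->update(dets, emb, tracks);
  for (const auto& t : tracks) {
    std::cout << "track " << t.id << " score " << t.score << std::endl;
  }
  return 0;
}
```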
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/trajectory.h b/PaddleDetection-release-2.6/deploy/cpp/include/trajectory.h
new file mode 100644
index 0000000000000000000000000000000000000000..d801e280007b52b6fda98d90aebde197cf090ca5
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/trajectory.h
@@ -0,0 +1,202 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// The code is based on:
+// https://github.com/CnybTseng/JDE/blob/master/platforms/common/trajectory.h
+// The copyright of CnybTseng/JDE is as follows:
+// MIT License
+
+#pragma once
+
+#include <vector>
+#include <opencv2/opencv.hpp>
+
+namespace PaddleDetection {
+
+typedef enum
+{
+    New = 0,
+    Tracked = 1,
+    Lost = 2,
+    Removed = 3
+} TrajectoryState;
+
+class Trajectory;
+typedef std::vector<Trajectory> TrajectoryPool;
+typedef std::vector<Trajectory>::iterator TrajectoryPoolIterator;
+typedef std::vector<Trajectory *> TrajectoryPtrPool;
+typedef std::vector<Trajectory *>::iterator TrajectoryPtrPoolIterator;
+
+class TKalmanFilter : public cv::KalmanFilter
+{
+public:
+    TKalmanFilter(void);
+    virtual ~TKalmanFilter(void) {}
+    virtual void init(const cv::Mat &measurement);
+    virtual const cv::Mat &predict();
+    virtual const cv::Mat &correct(const cv::Mat &measurement);
+    virtual void project(cv::Mat &mean, cv::Mat &covariance) const;
+private:
+    float std_weight_position;
+    float std_weight_velocity;
+};
+
+inline TKalmanFilter::TKalmanFilter(void) : cv::KalmanFilter(8, 4)
+{
+    cv::KalmanFilter::transitionMatrix = cv::Mat::eye(8, 8, CV_32F);
+    for (int i = 0; i < 4; ++i)
+        cv::KalmanFilter::transitionMatrix.at<float>(i, i + 4) = 1;
+    cv::KalmanFilter::measurementMatrix = cv::Mat::eye(4, 8, CV_32F);
+    std_weight_position = 1/20.f;
+    std_weight_velocity = 1/160.f;
+}
+
+class Trajectory : public TKalmanFilter
+{
+public:
+    Trajectory();
+    Trajectory(cv::Vec4f &ltrb, float score, const cv::Mat &embedding);
+    Trajectory(const Trajectory &other);
+    Trajectory &operator=(const Trajectory &rhs);
+    virtual ~Trajectory(void) {};
+
+    static int next_id();
+    virtual const cv::Mat &predict(void);
+    virtual void update(Trajectory &traj, int timestamp, bool update_embedding=true);
+    virtual void activate(int timestamp);
+    virtual void reactivate(Trajectory &traj, int timestamp, bool newid=false);
+    virtual void mark_lost(void);
+    virtual void mark_removed(void);
+
+    friend TrajectoryPool operator+(const TrajectoryPool &a, const TrajectoryPool &b);
+    friend TrajectoryPool operator+(const TrajectoryPool &a, const TrajectoryPtrPool &b);
+    friend TrajectoryPool &operator+=(TrajectoryPool &a, const TrajectoryPtrPool &b);
+    friend TrajectoryPool operator-(const TrajectoryPool &a, const TrajectoryPool &b);
+    friend TrajectoryPool &operator-=(TrajectoryPool &a, const TrajectoryPool &b);
+    friend TrajectoryPtrPool operator+(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b);
+    friend TrajectoryPtrPool operator+(const TrajectoryPtrPool &a, TrajectoryPool &b);
+    friend TrajectoryPtrPool operator-(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b);
+
+    friend cv::Mat embedding_distance(const TrajectoryPool &a, const TrajectoryPool &b);
+    friend cv::Mat embedding_distance(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b);
+    friend cv::Mat embedding_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b);
+
+    friend cv::Mat mahalanobis_distance(const TrajectoryPool &a, const TrajectoryPool &b);
+    friend cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b);
+    friend cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b);
+
+    friend cv::Mat iou_distance(const TrajectoryPool &a, const TrajectoryPool &b);
+    friend cv::Mat iou_distance(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b);
+    friend cv::Mat iou_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b);
+private:
+    void update_embedding(const cv::Mat &embedding);
+public:
+    TrajectoryState state;
+    cv::Vec4f ltrb;
+    cv::Mat smooth_embedding;
+    int id;
+    bool is_activated;
+    int timestamp;
+    int starttime;
+    float score;
+private:
+    static int count;
+    cv::Vec4f xyah;
+    cv::Mat current_embedding;
+    float eta;
+    int length;
+};
+
+inline cv::Vec4f ltrb2xyah(cv::Vec4f &ltrb)
+{
+    cv::Vec4f xyah;
+    xyah[0] = (ltrb[0] + ltrb[2]) * 0.5f;
+    xyah[1] = (ltrb[1] + ltrb[3]) * 0.5f;
+    xyah[3] = ltrb[3] - ltrb[1];
+    xyah[2] = (ltrb[2] - ltrb[0]) / xyah[3];
+    return xyah;
+}
+
+inline Trajectory::Trajectory() :
+    state(New), ltrb(cv::Vec4f()), smooth_embedding(cv::Mat()), id(0),
+    is_activated(false), timestamp(0), starttime(0), score(0), eta(0.9), length(0)
+{
+}
+
+inline Trajectory::Trajectory(cv::Vec4f &ltrb_, float score_, const cv::Mat &embedding) :
+    state(New), ltrb(ltrb_), smooth_embedding(cv::Mat()), id(0),
+    is_activated(false), timestamp(0), starttime(0), score(score_), eta(0.9), length(0)
+{
+    xyah = ltrb2xyah(ltrb);
+    update_embedding(embedding);
+}
+
+inline Trajectory::Trajectory(const Trajectory &other):
+    state(other.state), ltrb(other.ltrb), id(other.id), is_activated(other.is_activated),
+    timestamp(other.timestamp), starttime(other.starttime), xyah(other.xyah),
+    score(other.score), eta(other.eta), length(other.length)
+{
+    other.smooth_embedding.copyTo(smooth_embedding);
+    other.current_embedding.copyTo(current_embedding);
+    // copy state in KalmanFilter
+
+    other.statePre.copyTo(cv::KalmanFilter::statePre);
+    other.statePost.copyTo(cv::KalmanFilter::statePost);
+    other.errorCovPre.copyTo(cv::KalmanFilter::errorCovPre);
+    other.errorCovPost.copyTo(cv::KalmanFilter::errorCovPost);
+
+}
+
+inline Trajectory &Trajectory::operator=(const Trajectory &rhs)
+{
+    this->state = rhs.state;
+    this->ltrb = rhs.ltrb;
+    rhs.smooth_embedding.copyTo(this->smooth_embedding);
+    this->id = rhs.id;
+    this->is_activated = rhs.is_activated;
+    this->timestamp = rhs.timestamp;
+    this->starttime = rhs.starttime;
+    this->xyah = rhs.xyah;
+    this->score = rhs.score;
+    rhs.current_embedding.copyTo(this->current_embedding);
+    this->eta = rhs.eta;
+    this->length = rhs.length;
+
+    // copy state in KalmanFilter
+
+    rhs.statePre.copyTo(cv::KalmanFilter::statePre);
+    rhs.statePost.copyTo(cv::KalmanFilter::statePost);
+    rhs.errorCovPre.copyTo(cv::KalmanFilter::errorCovPre);
+    rhs.errorCovPost.copyTo(cv::KalmanFilter::errorCovPost);
+
+    return *this;
+}
+
+inline int Trajectory::next_id()
+{
+    ++count;
+    return count;
+}
+
+inline void Trajectory::mark_lost(void)
+{
+    state = Lost;
+}
+
+inline void Trajectory::mark_removed(void)
+{
+    state = Removed;
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/include/utils.h b/PaddleDetection-release-2.6/deploy/cpp/include/utils.h
new file mode 100644
index 0000000000000000000000000000000000000000..b41db0dacff17339ffcac591b7825cec09d3663d
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/include/utils.h
@@ -0,0 +1,41 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <algorithm>
+#include <ctime>
+#include <memory>
+#include <numeric>
+#include <string>
+#include <utility>
+#include <vector>
+
+namespace PaddleDetection {
+
+// Object Detection Result
+struct ObjectResult {
+  // Rectangle coordinates of detected object: left, right, top, down
+  std::vector<int> rect;
+  // Class id of detected object
+  int class_id;
+  // Confidence of detected object
+  float confidence;
+  // Mask of detected object
+  std::vector<int> mask;
+};
+
+void nms(std::vector<ObjectResult> &input_boxes, float nms_threshold);
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/scripts/build.sh b/PaddleDetection-release-2.6/deploy/cpp/scripts/build.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1937c7a05b6854e47c32c7c7833526b94de083ff
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/scripts/build.sh
@@ -0,0 +1,86 @@
+# whether to use GPU (i.e. whether to use CUDA)
+WITH_GPU=OFF
+
+# whether to use MKL or openblas; set to OFF for TX2
+WITH_MKL=ON
+
+# whether to integrate TensorRT (only effective when WITH_GPU=ON)
+WITH_TENSORRT=OFF
+
+# name of the paddle inference lib; it differs across platforms and versions,
+# check the `lib` name under the `paddle_inference/lib/` folder you downloaded
+PADDLE_LIB_NAME=libpaddle_inference
+
+# TensorRT include path
+TENSORRT_INC_DIR=/path/to/tensorrt/include
+
+# TensorRT lib path
+TENSORRT_LIB_DIR=/path/to/tensorrt/lib
+
+# Paddle inference library path
+PADDLE_DIR=/path/to/paddle_inference
+
+# CUDA lib path
+CUDA_LIB=/path/to/cuda/lib
+
+# CUDNN lib path
+CUDNN_LIB=/path/to/cudnn/lib
+
+# whether to enable keypoint model inference
+WITH_KEYPOINT=OFF
+
+# whether to enable tracking model inference
+WITH_MOT=OFF
+
+MACHINE_TYPE=`uname -m`
+echo "MACHINE_TYPE: "${MACHINE_TYPE}
+
+
+if [ "$MACHINE_TYPE" = "x86_64" ]
+then
+  echo "set OPENCV_DIR for x86_64"
+  # on linux, download the prebuilt opencv with the following commands
+  mkdir -p $(pwd)/deps && cd $(pwd)/deps
+  wget -c https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
+  tar -xvf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz && cd ..
+
+  # set OPENCV_DIR
+  OPENCV_DIR=$(pwd)/deps/opencv-3.4.16_gcc8.2_ffmpeg
+
+elif [ "$MACHINE_TYPE" = "aarch64" ]
+then
+  echo "set OPENCV_DIR for aarch64"
+  # on TX2, download the prebuilt opencv with the following commands
+  mkdir -p $(pwd)/deps && cd $(pwd)/deps
+  wget -c https://bj.bcebos.com/v1/paddledet/data/TX2_JetPack4.3_opencv_3.4.6_gcc7.5.0.tar.gz
+  tar -xvf TX2_JetPack4.3_opencv_3.4.6_gcc7.5.0.tar.gz && cd ..
+
+  # set OPENCV_DIR
+  OPENCV_DIR=$(pwd)/deps/TX2_JetPack4.3_opencv_3.4.6_gcc7.5.0/
+
+else
+  echo "Please set OPENCV_DIR manually"
+fi
+
+echo "OPENCV_DIR: "$OPENCV_DIR
+
+# no changes needed below
+rm -rf build
+mkdir -p build
+cd build
+cmake .. \
+  -DWITH_GPU=${WITH_GPU} \
+  -DWITH_MKL=${WITH_MKL} \
+  -DWITH_TENSORRT=${WITH_TENSORRT} \
+  -DTENSORRT_LIB_DIR=${TENSORRT_LIB_DIR} \
+  -DTENSORRT_INC_DIR=${TENSORRT_INC_DIR} \
+  -DPADDLE_DIR=${PADDLE_DIR} \
+  -DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
+  -DCUDA_LIB=${CUDA_LIB} \
+  -DCUDNN_LIB=${CUDNN_LIB} \
+  -DOPENCV_DIR=${OPENCV_DIR} \
+  -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME} \
+  -DWITH_KEYPOINT=${WITH_KEYPOINT} \
+  -DWITH_MOT=${WITH_MOT}
+
+make
+echo "make finished!"
diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/jde_detector.cc b/PaddleDetection-release-2.6/deploy/cpp/src/jde_detector.cc
new file mode 100644
index 0000000000000000000000000000000000000000..5df8b87a7f89deddb19fb328ab5d5adcd5c5245c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/src/jde_detector.cc
@@ -0,0 +1,368 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+#include <sstream>
+// for setprecision
+#include <chrono>
+#include <iomanip>
+#include "include/jde_detector.h"
+
+using namespace paddle_infer;
+
+namespace PaddleDetection {
+
+// Load Model and create model predictor
+void JDEDetector::LoadModel(const std::string& model_dir,
+                            const int batch_size,
+                            const std::string& run_mode) {
+  paddle_infer::Config config;
+  std::string prog_file = model_dir + OS_PATH_SEP + "model.pdmodel";
+  std::string params_file = model_dir + OS_PATH_SEP + "model.pdiparams";
+  config.SetModel(prog_file, params_file);
+  if (this->device_ == "GPU") {
+    config.EnableUseGpu(200, this->gpu_id_);
+    config.SwitchIrOptim(true);
+    // use tensorrt
+    if (run_mode != "paddle") {
+      auto precision = paddle_infer::Config::Precision::kFloat32;
+      if (run_mode == "trt_fp32") {
+        precision = paddle_infer::Config::Precision::kFloat32;
+      } else if (run_mode == "trt_fp16") {
+        precision = paddle_infer::Config::Precision::kHalf;
+      } else if (run_mode == "trt_int8") {
+        precision = paddle_infer::Config::Precision::kInt8;
+      } else {
+        printf(
+            "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or "
+            "'trt_int8'");
+      }
+      // set tensorrt
+      config.EnableTensorRtEngine(1 << 30,
+                                  batch_size,
+                                  this->min_subgraph_size_,
+                                  precision,
+                                  false,
+                                  this->trt_calib_mode_);
+
+      // set use dynamic shape
+      if (this->use_dynamic_shape_) {
+        // set dynamic shape for the image tensor
+        const std::vector<int> min_input_shape = {
+            1, 3, this->trt_min_shape_, this->trt_min_shape_};
+        const std::vector<int> max_input_shape = {
+            1, 3, this->trt_max_shape_, this->trt_max_shape_};
+        const std::vector<int> opt_input_shape = {
+            1, 3, this->trt_opt_shape_, this->trt_opt_shape_};
+        const std::map<std::string, std::vector<int>> map_min_input_shape = {
+            {"image", min_input_shape}};
+        const std::map<std::string, std::vector<int>> map_max_input_shape = {
+            {"image", max_input_shape}};
+        const std::map<std::string, std::vector<int>> map_opt_input_shape = {
+            {"image", opt_input_shape}};
+
+        config.SetTRTDynamicShapeInfo(
+            map_min_input_shape, map_max_input_shape, map_opt_input_shape);
+        std::cout << "TensorRT dynamic shape enabled" << std::endl;
+      }
+    }
+
+  } else if (this->device_ == "XPU") {
+    config.EnableXpu(10 * 1024 * 1024);
+  } else {
+    config.DisableGpu();
+    if (this->use_mkldnn_) {
+      config.EnableMKLDNN();
+      // cache 10 different shapes for mkldnn to avoid memory leak
+      config.SetMkldnnCacheCapacity(10);
+    }
+    config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
+  }
+  config.SwitchUseFeedFetchOps(false);
+  config.SwitchIrOptim(true);
+  config.DisableGlogInfo();
+  // Memory optimization
+  config.EnableMemoryOptim();
+  predictor_ = std::move(CreatePredictor(config));
+}
+
+// Visualization of tracking results
+cv::Mat VisualizeTrackResult(const cv::Mat& img,
+                             const MOT_Result& results,
+                             const float fps,
+                             const int frame_id) {
+  cv::Mat vis_img = img.clone();
+  int im_h = img.rows;
+  int im_w = img.cols;
+  float text_scale = std::max(1, int(im_w / 1600.));
+  float text_thickness = 2.;
+  float line_thickness = std::max(1, int(im_w / 500.));
+
+  std::ostringstream oss;
+  oss << std::setiosflags(std::ios::fixed) << std::setprecision(4);
+  oss << "frame: " << frame_id << " ";
+  oss << "fps: " << fps << " ";
<< "num: " << results.size(); + std::string text = oss.str(); + + cv::Point origin; + origin.x = 0; + origin.y = int(15 * text_scale); + cv::putText(vis_img, + text, + origin, + cv::FONT_HERSHEY_PLAIN, + text_scale, + (0, 0, 255), + 2); + + for (int i = 0; i < results.size(); ++i) { + const int obj_id = results[i].ids; + const float score = results[i].score; + + cv::Scalar color = GetColor(obj_id); + + cv::Point pt1 = cv::Point(results[i].rects.left, results[i].rects.top); + cv::Point pt2 = cv::Point(results[i].rects.right, results[i].rects.bottom); + cv::Point id_pt = + cv::Point(results[i].rects.left, results[i].rects.top + 10); + cv::Point score_pt = + cv::Point(results[i].rects.left, results[i].rects.top - 10); + cv::rectangle(vis_img, pt1, pt2, color, line_thickness); + + std::ostringstream idoss; + idoss << std::setiosflags(std::ios::fixed) << std::setprecision(4); + idoss << obj_id; + std::string id_text = idoss.str(); + + cv::putText(vis_img, + id_text, + id_pt, + cv::FONT_HERSHEY_PLAIN, + text_scale, + cv::Scalar(0, 255, 255), + text_thickness); + + std::ostringstream soss; + soss << std::setiosflags(std::ios::fixed) << std::setprecision(2); + soss << score; + std::string score_text = soss.str(); + + cv::putText(vis_img, + score_text, + score_pt, + cv::FONT_HERSHEY_PLAIN, + text_scale, + cv::Scalar(0, 255, 255), + text_thickness); + } + return vis_img; +} + +void FilterDets(const float conf_thresh, + const cv::Mat dets, + std::vector* index) { + for (int i = 0; i < dets.rows; ++i) { + float score = *dets.ptr(i, 4); + if (score > conf_thresh) { + index->push_back(i); + } + } +} + +void JDEDetector::Preprocess(const cv::Mat& ori_im) { + // Clone the image : keep the original mat for postprocess + cv::Mat im = ori_im.clone(); + preprocessor_.Run(&im, &inputs_); +} + +void JDEDetector::Postprocess(const cv::Mat dets, + const cv::Mat emb, + MOT_Result* result) { + result->clear(); + std::vector tracks; + std::vector valid; + FilterDets(conf_thresh_, dets, &valid); + cv::Mat new_dets, new_emb; + for (int i = 0; i < valid.size(); ++i) { + new_dets.push_back(dets.row(valid[i])); + new_emb.push_back(emb.row(valid[i])); + } + JDETracker::instance()->update(new_dets, new_emb, tracks); + if (tracks.size() == 0) { + MOT_Track mot_track; + MOT_Rect ret = {*dets.ptr(0, 0), + *dets.ptr(0, 1), + *dets.ptr(0, 2), + *dets.ptr(0, 3)}; + mot_track.ids = 1; + mot_track.score = *dets.ptr(0, 4); + mot_track.rects = ret; + result->push_back(mot_track); + } else { + std::vector::iterator titer; + for (titer = tracks.begin(); titer != tracks.end(); ++titer) { + if (titer->score < threshold_) { + continue; + } else { + float w = titer->ltrb[2] - titer->ltrb[0]; + float h = titer->ltrb[3] - titer->ltrb[1]; + bool vertical = w / h > 1.6; + float area = w * h; + if (area > min_box_area_ && !vertical) { + MOT_Track mot_track; + MOT_Rect ret = { + titer->ltrb[0], titer->ltrb[1], titer->ltrb[2], titer->ltrb[3]}; + mot_track.rects = ret; + mot_track.score = titer->score; + mot_track.ids = titer->id; + result->push_back(mot_track); + } + } + } + } +} + +void JDEDetector::Predict(const std::vector imgs, + const double threshold, + const int warmup, + const int repeats, + MOT_Result* result, + std::vector* times) { + auto preprocess_start = std::chrono::steady_clock::now(); + int batch_size = imgs.size(); + + // in_data_batch + std::vector in_data_all; + std::vector im_shape_all(batch_size * 2); + std::vector scale_factor_all(batch_size * 2); + + // Preprocess image + for (int bs_idx = 0; bs_idx < batch_size; 
+  for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) {
+    cv::Mat im = imgs.at(bs_idx);
+    Preprocess(im);
+    im_shape_all[bs_idx * 2] = inputs_.im_shape_[0];
+    im_shape_all[bs_idx * 2 + 1] = inputs_.im_shape_[1];
+
+    scale_factor_all[bs_idx * 2] = inputs_.scale_factor_[0];
+    scale_factor_all[bs_idx * 2 + 1] = inputs_.scale_factor_[1];
+
+    // TODO: reduce cost time
+    in_data_all.insert(
+        in_data_all.end(), inputs_.im_data_.begin(), inputs_.im_data_.end());
+  }
+
+  // Prepare input tensor
+  auto input_names = predictor_->GetInputNames();
+  for (const auto& tensor_name : input_names) {
+    auto in_tensor = predictor_->GetInputHandle(tensor_name);
+    if (tensor_name == "image") {
+      int rh = inputs_.in_net_shape_[0];
+      int rw = inputs_.in_net_shape_[1];
+      in_tensor->Reshape({batch_size, 3, rh, rw});
+      in_tensor->CopyFromCpu(in_data_all.data());
+    } else if (tensor_name == "im_shape") {
+      in_tensor->Reshape({batch_size, 2});
+      in_tensor->CopyFromCpu(im_shape_all.data());
+    } else if (tensor_name == "scale_factor") {
+      in_tensor->Reshape({batch_size, 2});
+      in_tensor->CopyFromCpu(scale_factor_all.data());
+    }
+  }
+
+  auto preprocess_end = std::chrono::steady_clock::now();
+  std::vector<int> bbox_shape;
+  std::vector<int> emb_shape;
+  // Run predictor
+  // warmup
+  for (int i = 0; i < warmup; i++) {
+    predictor_->Run();
+    // Get output tensor
+    auto output_names = predictor_->GetOutputNames();
+    auto bbox_tensor = predictor_->GetOutputHandle(output_names[0]);
+    bbox_shape = bbox_tensor->shape();
+    auto emb_tensor = predictor_->GetOutputHandle(output_names[1]);
+    emb_shape = emb_tensor->shape();
+    // Calculate bbox length
+    int bbox_size = 1;
+    for (int j = 0; j < bbox_shape.size(); ++j) {
+      bbox_size *= bbox_shape[j];
+    }
+    // Calculate emb length
+    int emb_size = 1;
+    for (int j = 0; j < emb_shape.size(); ++j) {
+      emb_size *= emb_shape[j];
+    }
+
+    bbox_data_.resize(bbox_size);
+    bbox_tensor->CopyToCpu(bbox_data_.data());
+
+    emb_data_.resize(emb_size);
+    emb_tensor->CopyToCpu(emb_data_.data());
+  }
+
+  auto inference_start = std::chrono::steady_clock::now();
+  for (int i = 0; i < repeats; i++) {
+    predictor_->Run();
+    // Get output tensor
+    auto output_names = predictor_->GetOutputNames();
+    auto bbox_tensor = predictor_->GetOutputHandle(output_names[0]);
+    bbox_shape = bbox_tensor->shape();
+    auto emb_tensor = predictor_->GetOutputHandle(output_names[1]);
+    emb_shape = emb_tensor->shape();
+    // Calculate bbox length
+    int bbox_size = 1;
+    for (int j = 0; j < bbox_shape.size(); ++j) {
+      bbox_size *= bbox_shape[j];
+    }
+    // Calculate emb length
+    int emb_size = 1;
+    for (int j = 0; j < emb_shape.size(); ++j) {
+      emb_size *= emb_shape[j];
+    }
+
+    bbox_data_.resize(bbox_size);
+    bbox_tensor->CopyToCpu(bbox_data_.data());
+
+    emb_data_.resize(emb_size);
+    emb_tensor->CopyToCpu(emb_data_.data());
+  }
+  auto inference_end = std::chrono::steady_clock::now();
+  auto postprocess_start = std::chrono::steady_clock::now();
+  // Postprocessing result
+  result->clear();
+
+  cv::Mat dets(bbox_shape[0], 6, CV_32FC1, bbox_data_.data());
+  cv::Mat emb(bbox_shape[0], emb_shape[1], CV_32FC1, emb_data_.data());
+
+  Postprocess(dets, emb, result);
+
+  auto postprocess_end = std::chrono::steady_clock::now();
+
+  std::chrono::duration<float> preprocess_diff =
+      preprocess_end - preprocess_start;
+  (*times)[0] += double(preprocess_diff.count() * 1000);
+  std::chrono::duration<float> inference_diff = inference_end - inference_start;
+  (*times)[1] += double(inference_diff.count() * 1000);
+  std::chrono::duration<float> postprocess_diff =
+      postprocess_end - postprocess_start;
+  (*times)[2] += double(postprocess_diff.count() * 1000);
+}
+
+cv::Scalar GetColor(int idx) {
+  idx = idx * 3;
+  cv::Scalar color =
+      cv::Scalar((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255);
+  return color;
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/keypoint_detector.cc b/PaddleDetection-release-2.6/deploy/cpp/src/keypoint_detector.cc
new file mode 100644
index 0000000000000000000000000000000000000000..b0ee884566749c5ab459d8ec76aa98ae4e1d1f3c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/src/keypoint_detector.cc
@@ -0,0 +1,314 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+#include <sstream>
+// for setprecision
+#include <iomanip>
+#include "include/keypoint_detector.h"
+
+using namespace paddle_infer;
+
+namespace PaddleDetection {
+
+// Load Model and create model predictor
+void KeyPointDetector::LoadModel(const std::string& model_dir,
+                                 const int batch_size,
+                                 const std::string& run_mode) {
+  paddle_infer::Config config;
+  std::string prog_file = model_dir + OS_PATH_SEP + "model.pdmodel";
+  std::string params_file = model_dir + OS_PATH_SEP + "model.pdiparams";
+  config.SetModel(prog_file, params_file);
+  if (this->device_ == "GPU") {
+    config.EnableUseGpu(200, this->gpu_id_);
+    config.SwitchIrOptim(true);
+    // use tensorrt
+    if (run_mode != "paddle") {
+      auto precision = paddle_infer::Config::Precision::kFloat32;
+      if (run_mode == "trt_fp32") {
+        precision = paddle_infer::Config::Precision::kFloat32;
+      } else if (run_mode == "trt_fp16") {
+        precision = paddle_infer::Config::Precision::kHalf;
+      } else if (run_mode == "trt_int8") {
+        precision = paddle_infer::Config::Precision::kInt8;
+      } else {
+        printf(
+            "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or "
+            "'trt_int8'");
+      }
+      // set tensorrt
+      config.EnableTensorRtEngine(1 << 30,
+                                  batch_size,
+                                  this->min_subgraph_size_,
+                                  precision,
+                                  false,
+                                  this->trt_calib_mode_);
+
+      // set use dynamic shape
+      if (this->use_dynamic_shape_) {
+        // set dynamic shape for the image tensor
+        const std::vector<int> min_input_shape = {
+            1, 3, this->trt_min_shape_, this->trt_min_shape_};
+        const std::vector<int> max_input_shape = {
+            1, 3, this->trt_max_shape_, this->trt_max_shape_};
+        const std::vector<int> opt_input_shape = {
+            1, 3, this->trt_opt_shape_, this->trt_opt_shape_};
+        const std::map<std::string, std::vector<int>> map_min_input_shape = {
+            {"image", min_input_shape}};
+        const std::map<std::string, std::vector<int>> map_max_input_shape = {
+            {"image", max_input_shape}};
+        const std::map<std::string, std::vector<int>> map_opt_input_shape = {
+            {"image", opt_input_shape}};
+
+        config.SetTRTDynamicShapeInfo(
+            map_min_input_shape, map_max_input_shape, map_opt_input_shape);
+        std::cout << "TensorRT dynamic shape enabled" << std::endl;
+      }
+    }
+
+  } else if (this->device_ == "XPU") {
+    config.EnableXpu(10 * 1024 * 1024);
+  } else {
+    config.DisableGpu();
+    if (this->use_mkldnn_) {
+      config.EnableMKLDNN();
+      // cache 10 different shapes for mkldnn to avoid memory leak
+      config.SetMkldnnCacheCapacity(10);
+    }
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_); + } + config.SwitchUseFeedFetchOps(false); + config.SwitchIrOptim(true); + config.DisableGlogInfo(); + // Memory optimization + config.EnableMemoryOptim(); + predictor_ = std::move(CreatePredictor(config)); +} + +// Visualization MaskDetector results +cv::Mat VisualizeKptsResult(const cv::Mat& img, + const std::vector& results, + const std::vector& colormap) { + const int edge[][2] = {{0, 1}, + {0, 2}, + {1, 3}, + {2, 4}, + {3, 5}, + {4, 6}, + {5, 7}, + {6, 8}, + {7, 9}, + {8, 10}, + {5, 11}, + {6, 12}, + {11, 13}, + {12, 14}, + {13, 15}, + {14, 16}, + {11, 12}}; + cv::Mat vis_img = img.clone(); + for (int batchid = 0; batchid < results.size(); batchid++) { + for (int i = 0; i < results[batchid].num_joints; i++) { + if (results[batchid].keypoints[i * 3] > 0.5) { + int x_coord = int(results[batchid].keypoints[i * 3 + 1]); + int y_coord = int(results[batchid].keypoints[i * 3 + 2]); + cv::circle(vis_img, + cv::Point2d(x_coord, y_coord), + 1, + cv::Scalar(0, 0, 255), + 2); + } + } + for (int i = 0; i < results[batchid].num_joints; i++) { + int x_start = int(results[batchid].keypoints[edge[i][0] * 3 + 1]); + int y_start = int(results[batchid].keypoints[edge[i][0] * 3 + 2]); + int x_end = int(results[batchid].keypoints[edge[i][1] * 3 + 1]); + int y_end = int(results[batchid].keypoints[edge[i][1] * 3 + 2]); + cv::line(vis_img, + cv::Point2d(x_start, y_start), + cv::Point2d(x_end, y_end), + colormap[i], + 1); + } + } + return vis_img; +} + +void KeyPointDetector::Preprocess(const cv::Mat& ori_im) { + // Clone the image : keep the original mat for postprocess + cv::Mat im = ori_im.clone(); + cv::cvtColor(im, im, cv::COLOR_BGR2RGB); + preprocessor_.Run(&im, &inputs_); +} + +void KeyPointDetector::Postprocess(std::vector& output, + std::vector output_shape, + std::vector& idxout, + std::vector idx_shape, + std::vector* result, + std::vector>& center_bs, + std::vector>& scale_bs) { + std::vector preds(output_shape[1] * 3, 0); + + for (int batchid = 0; batchid < output_shape[0]; batchid++) { + get_final_preds(output, + output_shape, + idxout, + idx_shape, + center_bs[batchid], + scale_bs[batchid], + preds, + batchid, + this->use_dark); + KeyPointResult result_item; + result_item.num_joints = output_shape[1]; + result_item.keypoints.clear(); + for (int i = 0; i < output_shape[1]; i++) { + result_item.keypoints.emplace_back(preds[i * 3]); + result_item.keypoints.emplace_back(preds[i * 3 + 1]); + result_item.keypoints.emplace_back(preds[i * 3 + 2]); + } + result->push_back(result_item); + } +} + +void KeyPointDetector::Predict(const std::vector imgs, + std::vector>& center_bs, + std::vector>& scale_bs, + const double threshold, + const int warmup, + const int repeats, + std::vector* result, + std::vector* times) { + auto preprocess_start = std::chrono::steady_clock::now(); + int batch_size = imgs.size(); + + // in_data_batch + std::vector in_data_all; + std::vector im_shape_all(batch_size * 2); + std::vector scale_factor_all(batch_size * 2); + + // Preprocess image + for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) { + cv::Mat im = imgs.at(bs_idx); + Preprocess(im); + im_shape_all[bs_idx * 2] = inputs_.im_shape_[0]; + im_shape_all[bs_idx * 2 + 1] = inputs_.im_shape_[1]; + + scale_factor_all[bs_idx * 2] = inputs_.scale_factor_[0]; + scale_factor_all[bs_idx * 2 + 1] = inputs_.scale_factor_[1]; + + // TODO: reduce cost time + in_data_all.insert( + in_data_all.end(), inputs_.im_data_.begin(), inputs_.im_data_.end()); + } + + // 
Prepare input tensor + + auto input_names = predictor_->GetInputNames(); + for (const auto& tensor_name : input_names) { + auto in_tensor = predictor_->GetInputHandle(tensor_name); + if (tensor_name == "image") { + int rh = inputs_.in_net_shape_[0]; + int rw = inputs_.in_net_shape_[1]; + in_tensor->Reshape({batch_size, 3, rh, rw}); + in_tensor->CopyFromCpu(in_data_all.data()); + } else if (tensor_name == "im_shape") { + in_tensor->Reshape({batch_size, 2}); + in_tensor->CopyFromCpu(im_shape_all.data()); + } else if (tensor_name == "scale_factor") { + in_tensor->Reshape({batch_size, 2}); + in_tensor->CopyFromCpu(scale_factor_all.data()); + } + } + + auto preprocess_end = std::chrono::steady_clock::now(); + std::vector output_shape, idx_shape; + // Run predictor + // warmup + for (int i = 0; i < warmup; i++) { + predictor_->Run(); + // Get output tensor + auto output_names = predictor_->GetOutputNames(); + auto out_tensor = predictor_->GetOutputHandle(output_names[0]); + output_shape = out_tensor->shape(); + // Calculate output length + int output_size = 1; + for (int j = 0; j < output_shape.size(); ++j) { + output_size *= output_shape[j]; + } + output_data_.resize(output_size); + out_tensor->CopyToCpu(output_data_.data()); + + auto idx_tensor = predictor_->GetOutputHandle(output_names[1]); + idx_shape = idx_tensor->shape(); + // Calculate output length + output_size = 1; + for (int j = 0; j < idx_shape.size(); ++j) { + output_size *= idx_shape[j]; + } + idx_data_.resize(output_size); + idx_tensor->CopyToCpu(idx_data_.data()); + } + + auto inference_start = std::chrono::steady_clock::now(); + for (int i = 0; i < repeats; i++) { + predictor_->Run(); + // Get output tensor + auto output_names = predictor_->GetOutputNames(); + auto out_tensor = predictor_->GetOutputHandle(output_names[0]); + output_shape = out_tensor->shape(); + // Calculate output length + int output_size = 1; + for (int j = 0; j < output_shape.size(); ++j) { + output_size *= output_shape[j]; + } + if (output_size < 6) { + std::cerr << "[WARNING] No object detected." 
<< std::endl; + } + output_data_.resize(output_size); + out_tensor->CopyToCpu(output_data_.data()); + + auto idx_tensor = predictor_->GetOutputHandle(output_names[1]); + idx_shape = idx_tensor->shape(); + // Calculate output length + output_size = 1; + for (int j = 0; j < idx_shape.size(); ++j) { + output_size *= idx_shape[j]; + } + idx_data_.resize(output_size); + idx_tensor->CopyToCpu(idx_data_.data()); + } + auto inference_end = std::chrono::steady_clock::now(); + auto postprocess_start = std::chrono::steady_clock::now(); + // Postprocessing result + Postprocess(output_data_, + output_shape, + idx_data_, + idx_shape, + result, + center_bs, + scale_bs); + auto postprocess_end = std::chrono::steady_clock::now(); + + std::chrono::duration preprocess_diff = + preprocess_end - preprocess_start; + times->push_back(double(preprocess_diff.count() * 1000)); + std::chrono::duration inference_diff = inference_end - inference_start; + times->push_back(double(inference_diff.count() / repeats * 1000)); + std::chrono::duration postprocess_diff = + postprocess_end - postprocess_start; + times->push_back(double(postprocess_diff.count() * 1000)); +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/keypoint_postprocess.cc b/PaddleDetection-release-2.6/deploy/cpp/src/keypoint_postprocess.cc new file mode 100644 index 0000000000000000000000000000000000000000..eb692b0a78bcf48ac96aa45b671300b9ff2db400 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/src/keypoint_postprocess.cc @@ -0,0 +1,316 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
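The Predict() implementations above all follow the same warmup-then-measure benchmarking idiom: run the predictor `warmup` times untimed (so one-off costs such as TensorRT engine build and allocator growth are excluded), then time `repeats` runs with std::chrono::steady_clock. Note that the keypoint detector reports the per-run average (`inference_diff.count() / repeats * 1000`) while the JDE path accumulates the total, so the two numbers are not directly comparable when repeats > 1. Below is a minimal, self-contained sketch of the pattern; `DummyPredictor` is an illustrative stand-in, not part of the Paddle Inference API.

```cpp
// Sketch of the warmup-then-measure timing pattern used by the Predict()
// methods in this deploy module. DummyPredictor is a hypothetical stand-in
// for paddle_infer::Predictor; only std::chrono is real here.
#include <chrono>
#include <cstdio>

struct DummyPredictor {
  void Run() { /* stands in for predictor_->Run() */ }
};

int main() {
  DummyPredictor predictor;
  const int warmup = 10;
  const int repeats = 100;

  // Untimed warmup: excludes engine selection, cache fills, first-touch costs.
  for (int i = 0; i < warmup; ++i) predictor.Run();

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeats; ++i) predictor.Run();
  auto end = std::chrono::steady_clock::now();

  // Average milliseconds per run, matching the keypoint detector's
  // inference_diff.count() / repeats * 1000 computation.
  std::chrono::duration<double> diff = end - start;
  std::printf("avg inference: %.3f ms\n", diff.count() / repeats * 1000.0);
  return 0;
}
```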
+#include "include/keypoint_postprocess.h" +#include +#define PI 3.1415926535 +#define HALF_CIRCLE_DEGREE 180 + +namespace PaddleDetection { + +cv::Point2f get_3rd_point(cv::Point2f& a, cv::Point2f& b) { + cv::Point2f direct{a.x - b.x, a.y - b.y}; + return cv::Point2f(a.x - direct.y, a.y + direct.x); +} + +std::vector get_dir(float src_point_x, + float src_point_y, + float rot_rad) { + float sn = sin(rot_rad); + float cs = cos(rot_rad); + std::vector src_result{0.0, 0.0}; + src_result[0] = src_point_x * cs - src_point_y * sn; + src_result[1] = src_point_x * sn + src_point_y * cs; + return src_result; +} + +void affine_tranform( + float pt_x, float pt_y, cv::Mat& trans, std::vector& preds, int p) { + double new1[3] = {pt_x, pt_y, 1.0}; + cv::Mat new_pt(3, 1, trans.type(), new1); + cv::Mat w = trans * new_pt; + preds[p * 3 + 1] = static_cast(w.at(0, 0)); + preds[p * 3 + 2] = static_cast(w.at(1, 0)); +} + +void get_affine_transform(std::vector& center, + std::vector& scale, + float rot, + std::vector& output_size, + cv::Mat& trans, + int inv) { + float src_w = scale[0]; + float dst_w = static_cast(output_size[0]); + float dst_h = static_cast(output_size[1]); + float rot_rad = rot * PI / HALF_CIRCLE_DEGREE; + std::vector src_dir = get_dir(-0.5 * src_w, 0, rot_rad); + std::vector dst_dir{-0.5f * dst_w, 0.0}; + cv::Point2f srcPoint2f[3], dstPoint2f[3]; + srcPoint2f[0] = cv::Point2f(center[0], center[1]); + srcPoint2f[1] = cv::Point2f(center[0] + src_dir[0], center[1] + src_dir[1]); + srcPoint2f[2] = get_3rd_point(srcPoint2f[0], srcPoint2f[1]); + + dstPoint2f[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5); + dstPoint2f[1] = + cv::Point2f(dst_w * 0.5 + dst_dir[0], dst_h * 0.5 + dst_dir[1]); + dstPoint2f[2] = get_3rd_point(dstPoint2f[0], dstPoint2f[1]); + if (inv == 0) { + trans = cv::getAffineTransform(srcPoint2f, dstPoint2f); + } else { + trans = cv::getAffineTransform(dstPoint2f, srcPoint2f); + } +} + +void transform_preds(std::vector& coords, + std::vector& center, + std::vector& scale, + std::vector& output_size, + std::vector& dim, + std::vector& target_coords, + bool affine) { + if (affine) { + cv::Mat trans(2, 3, CV_64FC1); + get_affine_transform(center, scale, 0, output_size, trans, 1); + for (int p = 0; p < dim[1]; ++p) { + affine_tranform( + coords[p * 2], coords[p * 2 + 1], trans, target_coords, p); + } + } else { + float heat_w = static_cast(output_size[0]); + float heat_h = static_cast(output_size[1]); + float x_scale = scale[0] / heat_w; + float y_scale = scale[1] / heat_h; + float offset_x = center[0] - scale[0] / 2.; + float offset_y = center[1] - scale[1] / 2.; + for (int i = 0; i < dim[1]; i++) { + target_coords[i * 3 + 1] = x_scale * coords[i * 2] + offset_x; + target_coords[i * 3 + 2] = y_scale * coords[i * 2 + 1] + offset_y; + } + } +} + +// only for batchsize == 1 +void get_max_preds(float* heatmap, + std::vector& dim, + std::vector& preds, + float* maxvals, + int batchid, + int joint_idx) { + int num_joints = dim[1]; + int width = dim[3]; + std::vector idx; + idx.resize(num_joints * 2); + + for (int j = 0; j < dim[1]; j++) { + float* index = &( + heatmap[batchid * num_joints * dim[2] * dim[3] + j * dim[2] * dim[3]]); + float* end = index + dim[2] * dim[3]; + float* max_dis = std::max_element(index, end); + auto max_id = std::distance(index, max_dis); + maxvals[j] = *max_dis; + if (*max_dis > 0) { + preds[j * 2] = static_cast(max_id % width); + preds[j * 2 + 1] = static_cast(max_id / width); + } + } +} + +void dark_parse(std::vector& heatmap, + std::vector& dim, + std::vector& 
coords, + int px, + int py, + int index, + int ch) { + /*DARK postprocessing, Zhang et al. Distribution-Aware Coordinate + Representation for Human Pose Estimation (CVPR 2020). + 1) offset = - hassian.inv() * derivative + 2) dx = (heatmap[x+1] - heatmap[x-1])/2. + 3) dxx = (dx[x+1] - dx[x-1])/2. + 4) derivative = Mat([dx, dy]) + 5) hassian = Mat([[dxx, dxy], [dxy, dyy]]) + */ + std::vector<float>::const_iterator first1 = heatmap.begin() + index; + std::vector<float>::const_iterator last1 = + heatmap.begin() + index + dim[2] * dim[3]; + std::vector<float> heatmap_ch(first1, last1); + cv::Mat heatmap_mat = cv::Mat(heatmap_ch).reshape(0, dim[2]); + heatmap_mat.convertTo(heatmap_mat, CV_32FC1); + cv::GaussianBlur(heatmap_mat, heatmap_mat, cv::Size(3, 3), 0, 0); + heatmap_mat = heatmap_mat.reshape(1, 1); + heatmap_ch = std::vector<float>(heatmap_mat.reshape(1, 1)); + + float epsilon = 1e-10; + // sample the heatmap to get values around the target location + float xy = log(fmax(heatmap_ch[py * dim[3] + px], epsilon)); + float xr = log(fmax(heatmap_ch[py * dim[3] + px + 1], epsilon)); + float xl = log(fmax(heatmap_ch[py * dim[3] + px - 1], epsilon)); + + float xr2 = log(fmax(heatmap_ch[py * dim[3] + px + 2], epsilon)); + float xl2 = log(fmax(heatmap_ch[py * dim[3] + px - 2], epsilon)); + float yu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px], epsilon)); + float yd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px], epsilon)); + float yu2 = log(fmax(heatmap_ch[(py + 2) * dim[3] + px], epsilon)); + float yd2 = log(fmax(heatmap_ch[(py - 2) * dim[3] + px], epsilon)); + float xryu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px + 1], epsilon)); + float xryd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px + 1], epsilon)); + float xlyu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px - 1], epsilon)); + float xlyd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px - 1], epsilon)); + + // compute dx/dy and dxx/dyy with sampled values + float dx = 0.5 * (xr - xl); + float dy = 0.5 * (yu - yd); + float dxx = 0.25 * (xr2 - 2 * xy + xl2); + float dxy = 0.25 * (xryu - xryd - xlyu + xlyd); + float dyy = 0.25 * (yu2 - 2 * xy + yd2); + + // finally get the offset from the derivative and the hassian, which are + // built from dx/dy and dxx/dyy + if (dxx * dyy - dxy * dxy != 0) { + float M[2][2] = {dxx, dxy, dxy, dyy}; + float D[2] = {dx, dy}; + cv::Mat hassian(2, 2, CV_32F, M); + cv::Mat derivative(2, 1, CV_32F, D); + cv::Mat offset = -hassian.inv() * derivative; + coords[ch * 2] += offset.at<float>(0, 0); + coords[ch * 2 + 1] += offset.at<float>(1, 0); + } +} + +void get_final_preds(std::vector<float>& heatmap, + std::vector<int>& dim, + std::vector<int64_t>& idxout, + std::vector<int>& idxdim, + std::vector<float>& center, + std::vector<float> scale, + std::vector<float>& preds, + int batchid, + bool DARK) { + std::vector<float> coords; + coords.resize(dim[1] * 2); + int heatmap_height = dim[2]; + int heatmap_width = dim[3]; + + for (int j = 0; j < dim[1]; ++j) { + int index = (batchid * dim[1] + j) * dim[2] * dim[3]; + + int idx = idxout[batchid * dim[1] + j]; + preds[j * 3] = heatmap[index + idx]; + coords[j * 2] = idx % heatmap_width; + coords[j * 2 + 1] = idx / heatmap_width; + + int px = int(coords[j * 2] + 0.5); + int py = int(coords[j * 2 + 1] + 0.5); + + if (DARK && px > 1 && px < heatmap_width - 2 && py > 1 && + py < heatmap_height - 2) { + dark_parse(heatmap, dim, coords, px, py, index, j); + } else { + if (px > 0 && px < heatmap_width - 1) { + float diff_x = heatmap[index + py * dim[3] + px + 1] - + heatmap[index + py * dim[3] + px - 1]; + coords[j * 2] += diff_x > 0 ? 
0.25 : -0.25; + } + if (py > 0 && py < heatmap_height - 1) { + float diff_y = heatmap[index + (py + 1) * dim[3] + px] - + heatmap[index + (py - 1) * dim[3] + px]; + coords[j * 2 + 1] += diff_y > 0 ? 0.25 : -0.25; + } + } + } + + std::vector<int> img_size{heatmap_width, heatmap_height}; + transform_preds(coords, center, scale, img_size, dim, preds); +} + +// Smooth the keypoint results frame by frame +KeyPointResult PoseSmooth::smooth_process(KeyPointResult* result) { + KeyPointResult keypoint_smoothed = *result; + if (this->x_prev_hat.num_joints == -1) { + this->x_prev_hat = *result; + this->dx_prev_hat = *result; + std::fill(dx_prev_hat.keypoints.begin(), dx_prev_hat.keypoints.end(), 0.); + return keypoint_smoothed; + } else { + for (int i = 0; i < result->num_joints; i++) { + this->PointSmooth(result, &keypoint_smoothed, this->thresholds, i); + } + return keypoint_smoothed; + } +} + +void PoseSmooth::PointSmooth(KeyPointResult* result, + KeyPointResult* keypoint_smoothed, + std::vector<float> thresholds, + int index) { + float distance = sqrt(pow((result->keypoints[index * 3 + 1] - + this->x_prev_hat.keypoints[index * 3 + 1]) / + this->width, + 2) + + pow((result->keypoints[index * 3 + 2] - + this->x_prev_hat.keypoints[index * 3 + 2]) / + this->height, + 2)); + if (distance < thresholds[index] * this->thres_mult) { + keypoint_smoothed->keypoints[index * 3 + 1] = + this->x_prev_hat.keypoints[index * 3 + 1]; + keypoint_smoothed->keypoints[index * 3 + 2] = + this->x_prev_hat.keypoints[index * 3 + 2]; + } else { + if (this->filter_type == "OneEuro") { + keypoint_smoothed->keypoints[index * 3 + 1] = + this->OneEuroFilter(result->keypoints[index * 3 + 1], + this->x_prev_hat.keypoints[index * 3 + 1], + index * 3 + 1); + keypoint_smoothed->keypoints[index * 3 + 2] = + this->OneEuroFilter(result->keypoints[index * 3 + 2], + this->x_prev_hat.keypoints[index * 3 + 2], + index * 3 + 2); + } else { + keypoint_smoothed->keypoints[index * 3 + 1] = + this->ExpSmoothing(result->keypoints[index * 3 + 1], + this->x_prev_hat.keypoints[index * 3 + 1], + index * 3 + 1); + keypoint_smoothed->keypoints[index * 3 + 2] = + this->ExpSmoothing(result->keypoints[index * 3 + 2], + this->x_prev_hat.keypoints[index * 3 + 2], + index * 3 + 2); + } + } + return; +} + +float PoseSmooth::OneEuroFilter(float x_cur, float x_pre, int loc) { + float te = 1.0; + this->alpha = this->smoothing_factor(te, this->fc_d); + float dx_cur = (x_cur - x_pre) / te; + float dx_cur_hat = + this->ExpSmoothing(dx_cur, this->dx_prev_hat.keypoints[loc]); + + float fc = this->fc_min + this->beta * abs(dx_cur_hat); + this->alpha = this->smoothing_factor(te, fc); + float x_cur_hat = this->ExpSmoothing(x_cur, x_pre); + this->x_prev_hat.keypoints[loc] = x_cur_hat; + this->dx_prev_hat.keypoints[loc] = dx_cur_hat; + return x_cur_hat; +} + +float PoseSmooth::smoothing_factor(float te, float fc) { + float r = 2 * PI * fc * te; + return r / (r + 1); +} + +float PoseSmooth::ExpSmoothing(float x_cur, float x_pre, int loc) { + return this->alpha * x_cur + (1 - this->alpha) * x_pre; +} +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/lapjv.cpp b/PaddleDetection-release-2.6/deploy/cpp/src/lapjv.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e8a7b58d5d86892f6abfeae8bbd058ad26a8d85a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/src/lapjv.cpp @@ -0,0 +1,405 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// The code is based on: +// https://github.com/gatagat/lap/blob/master/lap/lapjv.cpp +// Ths copyright of gatagat/lap is as follows: +// MIT License + +#include +#include +#include + +#include "include/lapjv.h" + +namespace PaddleDetection { + +/** Column-reduction and reduction transfer for a dense cost matrix. + */ +int _ccrrt_dense(const int n, float *cost[], + int *free_rows, int *x, int *y, float *v) +{ + int n_free_rows; + bool *unique; + + for (int i = 0; i < n; i++) { + x[i] = -1; + v[i] = LARGE; + y[i] = 0; + } + for (int i = 0; i < n; i++) { + for (int j = 0; j < n; j++) { + const float c = cost[i][j]; + if (c < v[j]) { + v[j] = c; + y[j] = i; + } + } + } + NEW(unique, bool, n); + memset(unique, TRUE, n); + { + int j = n; + do { + j--; + const int i = y[j]; + if (x[i] < 0) { + x[i] = j; + } else { + unique[i] = FALSE; + y[j] = -1; + } + } while (j > 0); + } + n_free_rows = 0; + for (int i = 0; i < n; i++) { + if (x[i] < 0) { + free_rows[n_free_rows++] = i; + } else if (unique[i]) { + const int j = x[i]; + float min = LARGE; + for (int j2 = 0; j2 < n; j2++) { + if (j2 == (int)j) { + continue; + } + const float c = cost[i][j2] - v[j2]; + if (c < min) { + min = c; + } + } + v[j] -= min; + } + } + FREE(unique); + return n_free_rows; +} + + +/** Augmenting row reduction for a dense cost matrix. + */ +int _carr_dense( + const int n, float *cost[], + const int n_free_rows, + int *free_rows, int *x, int *y, float *v) +{ + int current = 0; + int new_free_rows = 0; + int rr_cnt = 0; + while (current < n_free_rows) { + int i0; + int j1, j2; + float v1, v2, v1_new; + bool v1_lowers; + + rr_cnt++; + const int free_i = free_rows[current++]; + j1 = 0; + v1 = cost[free_i][0] - v[0]; + j2 = -1; + v2 = LARGE; + for (int j = 1; j < n; j++) { + const float c = cost[free_i][j] - v[j]; + if (c < v2) { + if (c >= v1) { + v2 = c; + j2 = j; + } else { + v2 = v1; + v1 = c; + j2 = j1; + j1 = j; + } + } + } + i0 = y[j1]; + v1_new = v[j1] - (v2 - v1); + v1_lowers = v1_new < v[j1]; + if (rr_cnt < current * n) { + if (v1_lowers) { + v[j1] = v1_new; + } else if (i0 >= 0 && j2 >= 0) { + j1 = j2; + i0 = y[j2]; + } + if (i0 >= 0) { + if (v1_lowers) { + free_rows[--current] = i0; + } else { + free_rows[new_free_rows++] = i0; + } + } + } else { + if (i0 >= 0) { + free_rows[new_free_rows++] = i0; + } + } + x[free_i] = j1; + y[j1] = free_i; + } + return new_free_rows; +} + + +/** Find columns with minimum d[j] and put them on the SCAN list. + */ +int _find_dense(const int n, int lo, float *d, int *cols, int *y) +{ + int hi = lo + 1; + float mind = d[cols[lo]]; + for (int k = hi; k < n; k++) { + int j = cols[k]; + if (d[j] <= mind) { + if (d[j] < mind) { + hi = lo; + mind = d[j]; + } + cols[k] = cols[hi]; + cols[hi++] = j; + } + } + return hi; +} + + +// Scan all columns in TODO starting from arbitrary column in SCAN +// and try to decrease d of the TODO columns using the SCAN column. 
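+// Reading aid (editorial note, not part of the original lap sources):
+// d[j] holds the shortest reduced-cost distance found so far from the
+// free row to column j, and pred[j] the row that achieved it. For the
+// column being scanned, h is its reduced cost cost[i][j] - v[j] under its
+// assigned row i, minus its own distance mind, so cred_ij below is exactly
+// the distance to column j when the path is extended through row i.
+// Columns whose new distance ties the current minimum join SCAN at once,
+// and reaching an unassigned one (y[j] < 0) ends the search.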
+int _scan_dense(const int n, float *cost[], + int *plo, int*phi, + float *d, int *cols, int *pred, + int *y, float *v) +{ + int lo = *plo; + int hi = *phi; + float h, cred_ij; + + while (lo != hi) { + int j = cols[lo++]; + const int i = y[j]; + const float mind = d[j]; + h = cost[i][j] - v[j] - mind; + // For all columns in TODO + for (int k = hi; k < n; k++) { + j = cols[k]; + cred_ij = cost[i][j] - v[j] - h; + if (cred_ij < d[j]) { + d[j] = cred_ij; + pred[j] = i; + if (cred_ij == mind) { + if (y[j] < 0) { + return j; + } + cols[k] = cols[hi]; + cols[hi++] = j; + } + } + } + } + *plo = lo; + *phi = hi; + return -1; +} + + +/** Single iteration of modified Dijkstra shortest path algorithm as explained in the JV paper. + * + * This is a dense matrix version. + * + * \return The closest free column index. + */ +int find_path_dense( + const int n, float *cost[], + const int start_i, + int *y, float *v, + int *pred) +{ + int lo = 0, hi = 0; + int final_j = -1; + int n_ready = 0; + int *cols; + float *d; + + NEW(cols, int, n); + NEW(d, float, n); + + for (int i = 0; i < n; i++) { + cols[i] = i; + pred[i] = start_i; + d[i] = cost[start_i][i] - v[i]; + } + while (final_j == -1) { + // No columns left on the SCAN list. + if (lo == hi) { + n_ready = lo; + hi = _find_dense(n, lo, d, cols, y); + for (int k = lo; k < hi; k++) { + const int j = cols[k]; + if (y[j] < 0) { + final_j = j; + } + } + } + if (final_j == -1) { + final_j = _scan_dense( + n, cost, &lo, &hi, d, cols, pred, y, v); + } + } + + { + const float mind = d[cols[lo]]; + for (int k = 0; k < n_ready; k++) { + const int j = cols[k]; + v[j] += d[j] - mind; + } + } + + FREE(cols); + FREE(d); + + return final_j; +} + + +/** Augment for a dense cost matrix. + */ +int _ca_dense( + const int n, float *cost[], + const int n_free_rows, + int *free_rows, int *x, int *y, float *v) +{ + int *pred; + + NEW(pred, int, n); + + for (int *pfree_i = free_rows; pfree_i < free_rows + n_free_rows; pfree_i++) { + int i = -1, j; + int k = 0; + + j = find_path_dense(n, cost, *pfree_i, y, v, pred); + while (i != *pfree_i) { + i = pred[j]; + y[j] = i; + SWAP_INDICES(j, x[i]); + k++; + } + } + FREE(pred); + return 0; +} + + +/** Solve dense sparse LAP. + */ +int lapjv_internal( + const cv::Mat &cost, const bool extend_cost, const float cost_limit, + int *x, int *y ) { + int n_rows = cost.rows; + int n_cols = cost.cols; + int n; + if (n_rows == n_cols) { + n = n_rows; + } else if (!extend_cost) { + throw std::invalid_argument("Square cost array expected. 
If cost is intentionally non-square, pass extend_cost=True."); + } + + // Get extend cost + if (extend_cost || cost_limit < LARGE) { + n = n_rows + n_cols; + } + cv::Mat cost_expand(n, n, CV_32F); + float expand_value; + if (cost_limit < LARGE) { + expand_value = cost_limit / 2; + } else { + double max_v; + minMaxLoc(cost, nullptr, &max_v); + expand_value = (float)max_v + 1; + } + + for (int i = 0; i < n; ++i) { + for (int j = 0; j < n; ++j) { + cost_expand.at(i, j) = expand_value; + if (i >= n_rows && j >= n_cols) { + cost_expand.at(i, j) = 0; + } else if (i < n_rows && j < n_cols) { + cost_expand.at(i, j) = cost.at(i, j); + } + } + } + + // Convert Mat to pointer array + float **cost_ptr; + NEW(cost_ptr, float *, n); + for (int i = 0; i < n; ++i) { + NEW(cost_ptr[i], float, n); + } + for (int i = 0; i < n; ++i) { + for (int j = 0; j < n; ++j) { + cost_ptr[i][j] = cost_expand.at(i, j); + } + } + + int ret; + int *free_rows; + float *v; + int *x_c; + int *y_c; + + NEW(free_rows, int, n); + NEW(v, float, n); + NEW(x_c, int, n); + NEW(y_c, int, n); + + ret = _ccrrt_dense(n, cost_ptr, free_rows, x_c, y_c, v); + int i = 0; + while (ret > 0 && i < 2) { + ret = _carr_dense(n, cost_ptr, ret, free_rows, x_c, y_c, v); + i++; + } + if (ret > 0) { + ret = _ca_dense(n, cost_ptr, ret, free_rows, x_c, y_c, v); + } + FREE(v); + FREE(free_rows); + for (int i = 0; i < n; ++i) { + FREE(cost_ptr[i]); + } + FREE(cost_ptr); + if (ret != 0) { + if (ret == -1){ + throw "Out of memory."; + } + throw "Unknown error (lapjv_internal)"; + } + // Get output of x, y, opt + for (int i = 0; i < n; ++i) { + if (i < n_rows) { + x[i] = x_c[i]; + if (x[i] >= n_cols) { + x[i] = -1; + } + } + if (i < n_cols) { + y[i] = y_c[i]; + if (y[i] >= n_rows) { + y[i] = -1; + } + } + } + + FREE(x_c); + FREE(y_c); + return ret; +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/main.cc b/PaddleDetection-release-2.6/deploy/cpp/src/main.cc new file mode 100644 index 0000000000000000000000000000000000000000..6912031ba7e887b5d2b8449b026bdab6263ea08b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/src/main.cc @@ -0,0 +1,428 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
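lapjv_internal solves only square problems, so a rectangular n_rows x n_cols cost matrix is first embedded in an n x n one with n = n_rows + n_cols: real costs in the top-left block, zeros in the bottom-right dummy-dummy block, and a padding value everywhere else (max cost + 1, or cost_limit / 2 when a cost limit is supplied) so padded cells are never preferred over real assignments. The sketch below reproduces just that padding step on a made-up 2x3 example, with plain std::vector in place of cv::Mat so it runs without OpenCV.

```cpp
// Simplified sketch of the cost-matrix expansion in lapjv_internal;
// the 2x3 numbers are illustrative only.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  const int n_rows = 2, n_cols = 3;
  const std::vector<float> cost = {1.f, 2.f, 3.f,   // row-major 2x3
                                   4.f, 5.f, 6.f};
  const int n = n_rows + n_cols;
  // Padding value: max cost + 1 here (cost_limit / 2 when a limit is given).
  const float pad = *std::max_element(cost.begin(), cost.end()) + 1.f;

  std::vector<float> expand(n * n, pad);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) {
      if (i >= n_rows && j >= n_cols)
        expand[i * n + j] = 0.f;                      // dummy-dummy block
      else if (i < n_rows && j < n_cols)
        expand[i * n + j] = cost[i * n_cols + j];     // original costs
    }

  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) std::printf("%5.1f ", expand[i * n + j]);
    std::printf("\n");
  }
  return 0;
}
```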
+ +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef _WIN32 +#include +#include +#elif LINUX +#include +#include +#endif + +#include +#include "include/object_detector.h" + +DEFINE_string(model_dir, "", "Path of inference model"); +DEFINE_string(image_file, "", "Path of input image"); +DEFINE_string(image_dir, + "", + "Dir of input image, `image_file` has a higher priority."); +DEFINE_int32(batch_size, 1, "batch_size"); +DEFINE_string( + video_file, + "", + "Path of input video, `video_file` or `camera_id` has a highest priority."); +DEFINE_int32(camera_id, -1, "Device id of camera to predict"); +DEFINE_bool( + use_gpu, + false, + "Deprecated, please use `--device` to set the device you want to run."); +DEFINE_string(device, + "CPU", + "Choose the device you want to run, it can be: CPU/GPU/XPU, " + "default is CPU."); +DEFINE_double(threshold, 0.5, "Threshold of score."); +DEFINE_string(output_dir, "output", "Directory of output visualization files."); +DEFINE_string(run_mode, + "paddle", + "Mode of running(paddle/trt_fp32/trt_fp16/trt_int8)"); +DEFINE_int32(gpu_id, 0, "Device id of GPU to execute"); +DEFINE_bool(run_benchmark, + false, + "Whether to predict a image_file repeatedly for benchmark"); +DEFINE_bool(use_mkldnn, false, "Whether use mkldnn with CPU"); +DEFINE_int32(cpu_threads, 1, "Num of threads with CPU"); +DEFINE_int32(trt_min_shape, 1, "Min shape of TRT DynamicShapeI"); +DEFINE_int32(trt_max_shape, 1280, "Max shape of TRT DynamicShapeI"); +DEFINE_int32(trt_opt_shape, 640, "Opt shape of TRT DynamicShapeI"); +DEFINE_bool(trt_calib_mode, + false, + "If the model is produced by TRT offline quantitative calibration, " + "trt_calib_mode need to set True"); + +void PrintBenchmarkLog(std::vector det_time, int img_num) { + LOG(INFO) << "----------------------- Config info -----------------------"; + LOG(INFO) << "runtime_device: " << FLAGS_device; + LOG(INFO) << "ir_optim: " + << "True"; + LOG(INFO) << "enable_memory_optim: " + << "True"; + int has_trt = FLAGS_run_mode.find("trt"); + if (has_trt >= 0) { + LOG(INFO) << "enable_tensorrt: " + << "True"; + std::string precision = FLAGS_run_mode.substr(4, 8); + LOG(INFO) << "precision: " << precision; + } else { + LOG(INFO) << "enable_tensorrt: " + << "False"; + LOG(INFO) << "precision: " + << "fp32"; + } + LOG(INFO) << "enable_mkldnn: " << (FLAGS_use_mkldnn ? 
"True" : "False"); + LOG(INFO) << "cpu_math_library_num_threads: " << FLAGS_cpu_threads; + LOG(INFO) << "----------------------- Data info -----------------------"; + LOG(INFO) << "batch_size: " << FLAGS_batch_size; + LOG(INFO) << "input_shape: " + << "dynamic shape"; + LOG(INFO) << "----------------------- Model info -----------------------"; + FLAGS_model_dir.erase(FLAGS_model_dir.find_last_not_of("/") + 1); + LOG(INFO) << "model_name: " + << FLAGS_model_dir.substr(FLAGS_model_dir.find_last_of('/') + 1); + LOG(INFO) << "----------------------- Perf info ------------------------"; + LOG(INFO) << "Total number of predicted data: " << img_num + << " and total time spent(ms): " + << std::accumulate(det_time.begin(), det_time.end(), 0); + LOG(INFO) << "preproce_time(ms): " << det_time[0] / img_num + << ", inference_time(ms): " << det_time[1] / img_num + << ", postprocess_time(ms): " << det_time[2] / img_num; +} + +static std::string DirName(const std::string& filepath) { + auto pos = filepath.rfind(OS_PATH_SEP); + if (pos == std::string::npos) { + return ""; + } + return filepath.substr(0, pos); +} + +static bool PathExists(const std::string& path) { +#ifdef _WIN32 + struct _stat buffer; + return (_stat(path.c_str(), &buffer) == 0); +#else + struct stat buffer; + return (stat(path.c_str(), &buffer) == 0); +#endif // !_WIN32 +} + +static void MkDir(const std::string& path) { + if (PathExists(path)) return; + int ret = 0; +#ifdef _WIN32 + ret = _mkdir(path.c_str()); +#else + ret = mkdir(path.c_str(), 0755); +#endif // !_WIN32 + if (ret != 0) { + std::string path_error(path); + path_error += " mkdir failed!"; + throw std::runtime_error(path_error); + } +} + +static void MkDirs(const std::string& path) { + if (path.empty()) return; + if (PathExists(path)) return; + + MkDirs(DirName(path)); + MkDir(path); +} + +void PredictVideo(const std::string& video_path, + PaddleDetection::ObjectDetector* det, + const std::string& output_dir = "output") { + // Open video + cv::VideoCapture capture; + std::string video_out_name = "output.mp4"; + if (FLAGS_camera_id != -1) { + capture.open(FLAGS_camera_id); + } else { + capture.open(video_path.c_str()); + video_out_name = + video_path.substr(video_path.find_last_of(OS_PATH_SEP) + 1); + } + if (!capture.isOpened()) { + printf("can not open video : %s\n", video_path.c_str()); + return; + } + + // Get Video info : resolution, fps, frame count + int video_width = static_cast(capture.get(CV_CAP_PROP_FRAME_WIDTH)); + int video_height = static_cast(capture.get(CV_CAP_PROP_FRAME_HEIGHT)); + int video_fps = static_cast(capture.get(CV_CAP_PROP_FPS)); + int video_frame_count = + static_cast(capture.get(CV_CAP_PROP_FRAME_COUNT)); + printf("fps: %d, frame_count: %d\n", video_fps, video_frame_count); + + // Create VideoWriter for output + cv::VideoWriter video_out; + std::string video_out_path(output_dir); + if (output_dir.rfind(OS_PATH_SEP) != output_dir.size() - 1) { + video_out_path += OS_PATH_SEP; + } + video_out_path += video_out_name; + video_out.open(video_out_path.c_str(), + 0x00000021, + video_fps, + cv::Size(video_width, video_height), + true); + if (!video_out.isOpened()) { + printf("create video writer failed!\n"); + return; + } + + std::vector result; + std::vector bbox_num; + std::vector det_times; + auto labels = det->GetLabelList(); + auto colormap = PaddleDetection::GenerateColorMap(labels.size()); + // Capture all frames and do inference + cv::Mat frame; + int frame_id = 1; + bool is_rbox = false; + while (capture.read(frame)) { + if (frame.empty()) { + 
break; + } + std::vector imgs; + imgs.push_back(frame); + printf("detect frame: %d\n", frame_id); + det->Predict(imgs, FLAGS_threshold, 0, 1, &result, &bbox_num, &det_times); + std::vector out_result; + for (const auto& item : result) { + if (item.confidence < FLAGS_threshold || item.class_id == -1) { + continue; + } + out_result.push_back(item); + if (item.rect.size() > 6) { + is_rbox = true; + printf("class=%d confidence=%.4f rect=[%d %d %d %d %d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3], + item.rect[4], + item.rect[5], + item.rect[6], + item.rect[7]); + } else { + printf("class=%d confidence=%.4f rect=[%d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3]); + } + } + + cv::Mat out_im = PaddleDetection::VisualizeResult( + frame, out_result, labels, colormap, is_rbox); + + video_out.write(out_im); + frame_id += 1; + } + capture.release(); + video_out.release(); +} + +void PredictImage(const std::vector all_img_paths, + const int batch_size, + const double threshold, + const bool run_benchmark, + PaddleDetection::ObjectDetector* det, + const std::string& output_dir = "output") { + std::vector det_t = {0, 0, 0}; + int steps = ceil(float(all_img_paths.size()) / batch_size); + printf("total images = %d, batch_size = %d, total steps = %d\n", + all_img_paths.size(), + batch_size, + steps); + for (int idx = 0; idx < steps; idx++) { + std::vector batch_imgs; + int left_image_cnt = all_img_paths.size() - idx * batch_size; + if (left_image_cnt > batch_size) { + left_image_cnt = batch_size; + } + for (int bs = 0; bs < left_image_cnt; bs++) { + std::string image_file_path = all_img_paths.at(idx * batch_size + bs); + cv::Mat im = cv::imread(image_file_path, 1); + batch_imgs.insert(batch_imgs.end(), im); + } + + // Store all detected result + std::vector result; + std::vector bbox_num; + std::vector det_times; + bool is_rbox = false; + if (run_benchmark) { + det->Predict( + batch_imgs, threshold, 10, 10, &result, &bbox_num, &det_times); + } else { + det->Predict(batch_imgs, threshold, 0, 1, &result, &bbox_num, &det_times); + // get labels and colormap + auto labels = det->GetLabelList(); + auto colormap = PaddleDetection::GenerateColorMap(labels.size()); + + int item_start_idx = 0; + for (int i = 0; i < left_image_cnt; i++) { + cv::Mat im = batch_imgs[i]; + std::vector im_result; + int detect_num = 0; + + for (int j = 0; j < bbox_num[i]; j++) { + PaddleDetection::ObjectResult item = result[item_start_idx + j]; + if (item.confidence < threshold || item.class_id == -1) { + continue; + } + detect_num += 1; + im_result.push_back(item); + if (item.rect.size() > 6) { + is_rbox = true; + printf("class=%d confidence=%.4f rect=[%d %d %d %d %d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3], + item.rect[4], + item.rect[5], + item.rect[6], + item.rect[7]); + } else { + printf("class=%d confidence=%.4f rect=[%d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3]); + } + } + std::cout << all_img_paths.at(idx * batch_size + i) + << " The number of detected box: " << detect_num << std::endl; + item_start_idx = item_start_idx + bbox_num[i]; + // Visualization result + cv::Mat vis_img = PaddleDetection::VisualizeResult( + im, im_result, labels, colormap, is_rbox); + std::vector compression_params; + compression_params.push_back(CV_IMWRITE_JPEG_QUALITY); + 
compression_params.push_back(95); + std::string output_path(output_dir); + if (output_dir.rfind(OS_PATH_SEP) != output_dir.size() - 1) { + output_path += OS_PATH_SEP; + } + std::string image_file_path = all_img_paths.at(idx * batch_size + i); + output_path += + image_file_path.substr(image_file_path.find_last_of('/') + 1); + cv::imwrite(output_path, vis_img, compression_params); + printf("Visualized output saved as %s\n", output_path.c_str()); + } + } + det_t[0] += det_times[0]; + det_t[1] += det_times[1]; + det_t[2] += det_times[2]; + det_times.clear(); + } + PrintBenchmarkLog(det_t, all_img_paths.size()); +} + +int main(int argc, char** argv) { + // Parsing command-line + google::ParseCommandLineFlags(&argc, &argv, true); + if (FLAGS_model_dir.empty() || + (FLAGS_image_file.empty() && FLAGS_image_dir.empty() && + FLAGS_video_file.empty())) { + std::cout << "Usage: ./main --model_dir=/PATH/TO/INFERENCE_MODEL/ " + << "--image_file=/PATH/TO/INPUT/IMAGE/" << std::endl; + return -1; + } + if (!(FLAGS_run_mode == "paddle" || FLAGS_run_mode == "trt_fp32" || + FLAGS_run_mode == "trt_fp16" || FLAGS_run_mode == "trt_int8")) { + std::cout + << "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or 'trt_int8'."; + return -1; + } + transform(FLAGS_device.begin(), + FLAGS_device.end(), + FLAGS_device.begin(), + ::toupper); + if (!(FLAGS_device == "CPU" || FLAGS_device == "GPU" || + FLAGS_device == "XPU")) { + std::cout << "device should be 'CPU', 'GPU' or 'XPU'."; + return -1; + } + if (FLAGS_use_gpu) { + std::cout << "Deprecated, please use `--device` to set the device you want " + "to run."; + return -1; + } + // Load model and create a object detector + PaddleDetection::ObjectDetector det(FLAGS_model_dir, + FLAGS_device, + FLAGS_use_mkldnn, + FLAGS_cpu_threads, + FLAGS_run_mode, + FLAGS_batch_size, + FLAGS_gpu_id, + FLAGS_trt_min_shape, + FLAGS_trt_max_shape, + FLAGS_trt_opt_shape, + FLAGS_trt_calib_mode); + // Do inference on input video or image + if (!PathExists(FLAGS_output_dir)) { + MkDirs(FLAGS_output_dir); + } + if (!FLAGS_video_file.empty() || FLAGS_camera_id != -1) { + PredictVideo(FLAGS_video_file, &det, FLAGS_output_dir); + } else if (!FLAGS_image_file.empty() || !FLAGS_image_dir.empty()) { + std::vector all_img_paths; + std::vector cv_all_img_paths; + if (!FLAGS_image_file.empty()) { + all_img_paths.push_back(FLAGS_image_file); + if (FLAGS_batch_size > 1) { + std::cout << "batch_size should be 1, when set `image_file`." + << std::endl; + return -1; + } + } else { + cv::glob(FLAGS_image_dir, cv_all_img_paths); + for (const auto& img_path : cv_all_img_paths) { + all_img_paths.push_back(img_path); + } + } + PredictImage(all_img_paths, + FLAGS_batch_size, + FLAGS_threshold, + FLAGS_run_benchmark, + &det, + FLAGS_output_dir); + } + return 0; +} diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/main_jde.cc b/PaddleDetection-release-2.6/deploy/cpp/src/main_jde.cc new file mode 100644 index 0000000000000000000000000000000000000000..3bba98dd4c9b6b4cd01cd44d38c564dc6c8d82dc --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/src/main_jde.cc @@ -0,0 +1,269 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef _WIN32 +#include +#include +#elif LINUX +#include +#include +#endif + +#include +#include +#include "include/jde_detector.h" +#include "include/object_detector.h" + +DEFINE_string(model_dir, "", "Path of inference model"); +DEFINE_int32(batch_size, 1, "batch_size"); +DEFINE_string( + video_file, + "", + "Path of input video, `video_file` or `camera_id` has a highest priority."); +DEFINE_int32(camera_id, -1, "Device id of camera to predict"); +DEFINE_bool( + use_gpu, + false, + "Deprecated, please use `--device` to set the device you want to run."); +DEFINE_string(device, + "CPU", + "Choose the device you want to run, it can be: CPU/GPU/XPU, " + "default is CPU."); +DEFINE_double(threshold, 0.5, "Threshold of score."); +DEFINE_string(output_dir, "output", "Directory of output visualization files."); +DEFINE_string(run_mode, + "paddle", + "Mode of running(paddle/trt_fp32/trt_fp16/trt_int8)"); +DEFINE_int32(gpu_id, 0, "Device id of GPU to execute"); +DEFINE_bool(run_benchmark, + false, + "Whether to predict a image_file repeatedly for benchmark"); +DEFINE_bool(use_mkldnn, false, "Whether use mkldnn with CPU"); +DEFINE_int32(cpu_threads, 1, "Num of threads with CPU"); +DEFINE_int32(trt_min_shape, 1, "Min shape of TRT DynamicShapeI"); +DEFINE_int32(trt_max_shape, 1280, "Max shape of TRT DynamicShapeI"); +DEFINE_int32(trt_opt_shape, 640, "Opt shape of TRT DynamicShapeI"); +DEFINE_bool(trt_calib_mode, + false, + "If the model is produced by TRT offline quantitative calibration, " + "trt_calib_mode need to set True"); + +void PrintBenchmarkLog(std::vector det_time, int img_num) { + LOG(INFO) << "----------------------- Config info -----------------------"; + LOG(INFO) << "runtime_device: " << FLAGS_device; + LOG(INFO) << "ir_optim: " + << "True"; + LOG(INFO) << "enable_memory_optim: " + << "True"; + int has_trt = FLAGS_run_mode.find("trt"); + if (has_trt >= 0) { + LOG(INFO) << "enable_tensorrt: " + << "True"; + std::string precision = FLAGS_run_mode.substr(4, 8); + LOG(INFO) << "precision: " << precision; + } else { + LOG(INFO) << "enable_tensorrt: " + << "False"; + LOG(INFO) << "precision: " + << "fp32"; + } + LOG(INFO) << "enable_mkldnn: " << (FLAGS_use_mkldnn ? 
"True" : "False"); + LOG(INFO) << "cpu_math_library_num_threads: " << FLAGS_cpu_threads; + LOG(INFO) << "----------------------- Data info -----------------------"; + LOG(INFO) << "batch_size: " << FLAGS_batch_size; + LOG(INFO) << "input_shape: " + << "dynamic shape"; + LOG(INFO) << "----------------------- Model info -----------------------"; + FLAGS_model_dir.erase(FLAGS_model_dir.find_last_not_of("/") + 1); + LOG(INFO) << "model_name: " + << FLAGS_model_dir.substr(FLAGS_model_dir.find_last_of('/') + 1); + LOG(INFO) << "----------------------- Perf info ------------------------"; + LOG(INFO) << "Total number of predicted data: " << img_num + << " and total time spent(ms): " + << std::accumulate(det_time.begin(), det_time.end(), 0); + LOG(INFO) << "preproce_time(ms): " << det_time[0] / img_num + << ", inference_time(ms): " << det_time[1] / img_num + << ", postprocess_time(ms): " << det_time[2] / img_num; +} + +static std::string DirName(const std::string& filepath) { + auto pos = filepath.rfind(OS_PATH_SEP); + if (pos == std::string::npos) { + return ""; + } + return filepath.substr(0, pos); +} + +static bool PathExists(const std::string& path) { +#ifdef _WIN32 + struct _stat buffer; + return (_stat(path.c_str(), &buffer) == 0); +#else + struct stat buffer; + return (stat(path.c_str(), &buffer) == 0); +#endif // !_WIN32 +} + +static void MkDir(const std::string& path) { + if (PathExists(path)) return; + int ret = 0; +#ifdef _WIN32 + ret = _mkdir(path.c_str()); +#else + ret = mkdir(path.c_str(), 0755); +#endif // !_WIN32 + if (ret != 0) { + std::string path_error(path); + path_error += " mkdir failed!"; + throw std::runtime_error(path_error); + } +} + +static void MkDirs(const std::string& path) { + if (path.empty()) return; + if (PathExists(path)) return; + + MkDirs(DirName(path)); + MkDir(path); +} + +void PredictVideo(const std::string& video_path, + PaddleDetection::JDEDetector* mot, + const std::string& output_dir = "output") { + // Open video + cv::VideoCapture capture; + std::string video_out_name = "output.mp4"; + if (FLAGS_camera_id != -1) { + capture.open(FLAGS_camera_id); + } else { + capture.open(video_path.c_str()); + video_out_name = + video_path.substr(video_path.find_last_of(OS_PATH_SEP) + 1); + } + if (!capture.isOpened()) { + printf("can not open video : %s\n", video_path.c_str()); + return; + } + + // Get Video info : resolution, fps, frame count + int video_width = static_cast(capture.get(CV_CAP_PROP_FRAME_WIDTH)); + int video_height = static_cast(capture.get(CV_CAP_PROP_FRAME_HEIGHT)); + int video_fps = static_cast(capture.get(CV_CAP_PROP_FPS)); + int video_frame_count = + static_cast(capture.get(CV_CAP_PROP_FRAME_COUNT)); + printf("fps: %d, frame_count: %d\n", video_fps, video_frame_count); + + // Create VideoWriter for output + cv::VideoWriter video_out; + std::string video_out_path(output_dir); + if (output_dir.rfind(OS_PATH_SEP) != output_dir.size() - 1) { + video_out_path += OS_PATH_SEP; + } + video_out_path += video_out_name; + video_out.open(video_out_path.c_str(), + 0x00000021, + video_fps, + cv::Size(video_width, video_height), + true); + if (!video_out.isOpened()) { + printf("create video writer failed!\n"); + return; + } + + PaddleDetection::MOT_Result result; + std::vector det_times(3); + double times; + // Capture all frames and do inference + cv::Mat frame; + int frame_id = 1; + while (capture.read(frame)) { + if (frame.empty()) { + break; + } + std::vector imgs; + imgs.push_back(frame); + printf("detect frame: %d\n", frame_id); + mot->Predict(imgs, 
FLAGS_threshold, 0, 1, &result, &det_times); + frame_id += 1; + times = std::accumulate(det_times.begin(), det_times.end(), 0) / frame_id; + + cv::Mat out_im = PaddleDetection::VisualizeTrackResult( + frame, result, 1000. / times, frame_id); + + video_out.write(out_im); + } + capture.release(); + video_out.release(); + PrintBenchmarkLog(det_times, frame_id); + printf("Visualized output saved as %s\n", video_out_path.c_str()); +} + +int main(int argc, char** argv) { + // Parsing command-line + google::ParseCommandLineFlags(&argc, &argv, true); + if (FLAGS_model_dir.empty() || FLAGS_video_file.empty()) { + std::cout << "Usage: ./main --model_dir=/PATH/TO/INFERENCE_MODEL/ " + << "--video_file=/PATH/TO/INPUT/VIDEO/" << std::endl; + return -1; + } + if (!(FLAGS_run_mode == "paddle" || FLAGS_run_mode == "trt_fp32" || + FLAGS_run_mode == "trt_fp16" || FLAGS_run_mode == "trt_int8")) { + std::cout + << "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or 'trt_int8'."; + return -1; + } + transform(FLAGS_device.begin(), + FLAGS_device.end(), + FLAGS_device.begin(), + ::toupper); + if (!(FLAGS_device == "CPU" || FLAGS_device == "GPU" || + FLAGS_device == "XPU")) { + std::cout << "device should be 'CPU', 'GPU' or 'XPU'."; + return -1; + } + if (FLAGS_use_gpu) { + std::cout << "Deprecated, please use `--device` to set the device you want " + "to run."; + return -1; + } + + // Do inference on input video or image + PaddleDetection::JDEDetector mot(FLAGS_model_dir, + FLAGS_device, + FLAGS_use_mkldnn, + FLAGS_cpu_threads, + FLAGS_run_mode, + FLAGS_batch_size, + FLAGS_gpu_id, + FLAGS_trt_min_shape, + FLAGS_trt_max_shape, + FLAGS_trt_opt_shape, + FLAGS_trt_calib_mode); + if (!PathExists(FLAGS_output_dir)) { + MkDirs(FLAGS_output_dir); + } + PredictVideo(FLAGS_video_file, &mot, FLAGS_output_dir); + return 0; +} diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/main_keypoint.cc b/PaddleDetection-release-2.6/deploy/cpp/src/main_keypoint.cc new file mode 100644 index 0000000000000000000000000000000000000000..ab6555367f64b0f13f4707a2367754c4da61f392 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/src/main_keypoint.cc @@ -0,0 +1,598 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
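A C++ footnote on the PrintBenchmarkLog helpers: std::accumulate deduces its accumulator type from the initial value, so the integer literal `0` used in main.cc and main_jde.cc truncates the running double sum to int after every addition, whereas the keypoint variant below passes `0.` and sums exactly. A standalone illustration, with made-up timings:

```cpp
// std::accumulate takes its working type from the init argument, so an
// int 0 truncates each partial sum of doubles; 0. keeps full precision.
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  std::vector<double> t = {1.4, 2.3, 0.9};
  int wrong = std::accumulate(t.begin(), t.end(), 0);     // truncates -> 3
  double right = std::accumulate(t.begin(), t.end(), 0.); // 4.6
  std::printf("wrong=%d right=%.1f\n", wrong, right);
  return 0;
}
```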
+ +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef _WIN32 +#include +#include +#elif LINUX +#include +#endif + +#include +#include "include/keypoint_detector.h" +#include "include/object_detector.h" +#include "include/preprocess_op.h" + +DEFINE_string(model_dir, "", "Path of object detector inference model"); +DEFINE_string(model_dir_keypoint, + "", + "Path of keypoint detector inference model"); +DEFINE_string(image_file, "", "Path of input image"); +DEFINE_string(image_dir, + "", + "Dir of input image, `image_file` has a higher priority."); +DEFINE_int32(batch_size, 1, "batch_size of object detector"); +DEFINE_int32(batch_size_keypoint, 8, "batch_size of keypoint detector"); +DEFINE_string( + video_file, + "", + "Path of input video, `video_file` or `camera_id` has a highest priority."); +DEFINE_int32(camera_id, -1, "Device id of camera to predict"); +DEFINE_bool( + use_gpu, + false, + "Deprecated, please use `--device` to set the device you want to run."); +DEFINE_string(device, + "CPU", + "Choose the device you want to run, it can be: CPU/GPU/XPU, " + "default is CPU."); +DEFINE_double(threshold, 0.5, "Threshold of score."); +DEFINE_double(threshold_keypoint, 0.5, "Threshold of score."); +DEFINE_string(output_dir, "output", "Directory of output visualization files."); +DEFINE_string(run_mode, + "paddle", + "Mode of running(paddle/trt_fp32/trt_fp16/trt_int8)"); +DEFINE_int32(gpu_id, 0, "Device id of GPU to execute"); +DEFINE_bool(run_benchmark, + false, + "Whether to predict a image_file repeatedly for benchmark"); +DEFINE_bool(use_mkldnn, false, "Whether use mkldnn with CPU"); +DEFINE_int32(cpu_threads, 1, "Num of threads with CPU"); +DEFINE_int32(trt_min_shape, 1, "Min shape of TRT DynamicShapeI"); +DEFINE_int32(trt_max_shape, 1280, "Max shape of TRT DynamicShapeI"); +DEFINE_int32(trt_opt_shape, 640, "Opt shape of TRT DynamicShapeI"); +DEFINE_bool(trt_calib_mode, + false, + "If the model is produced by TRT offline quantitative calibration, " + "trt_calib_mode need to set True"); +DEFINE_bool(use_dark, true, "Whether use dark decode in keypoint postprocess"); + +void PrintBenchmarkLog(std::vector det_time, int img_num) { + LOG(INFO) << "----------------------- Config info -----------------------"; + LOG(INFO) << "runtime_device: " << FLAGS_device; + LOG(INFO) << "ir_optim: " + << "True"; + LOG(INFO) << "enable_memory_optim: " + << "True"; + int has_trt = FLAGS_run_mode.find("trt"); + if (has_trt >= 0) { + LOG(INFO) << "enable_tensorrt: " + << "True"; + std::string precision = FLAGS_run_mode.substr(4, 8); + LOG(INFO) << "precision: " << precision; + } else { + LOG(INFO) << "enable_tensorrt: " + << "False"; + LOG(INFO) << "precision: " + << "fp32"; + } + LOG(INFO) << "enable_mkldnn: " << (FLAGS_use_mkldnn ? 
"True" : "False"); + LOG(INFO) << "cpu_math_library_num_threads: " << FLAGS_cpu_threads; + LOG(INFO) << "----------------------- Data info -----------------------"; + LOG(INFO) << "batch_size: " << FLAGS_batch_size; + LOG(INFO) << "input_shape: " + << "dynamic shape"; + LOG(INFO) << "----------------------- Model info -----------------------"; + FLAGS_model_dir.erase(FLAGS_model_dir.find_last_not_of(OS_PATH_SEP) + 1); + LOG(INFO) << "model_name: " << FLAGS_model_dir; + LOG(INFO) << "----------------------- Perf info ------------------------"; + LOG(INFO) << "Total number of predicted data: " << img_num + << " and total time spent(ms): " + << std::accumulate(det_time.begin(), det_time.end(), 0.); + img_num = std::max(1, img_num); + LOG(INFO) << "preproce_time(ms): " << det_time[0] / img_num + << ", inference_time(ms): " << det_time[1] / img_num + << ", postprocess_time(ms): " << det_time[2] / img_num; +} + +void PrintKptsBenchmarkLog(std::vector det_time, int img_num) { + LOG(INFO) << "----------------------- Data info -----------------------"; + LOG(INFO) << "batch_size_keypoint: " << FLAGS_batch_size_keypoint; + LOG(INFO) << "----------------------- Model info -----------------------"; + FLAGS_model_dir_keypoint.erase( + FLAGS_model_dir_keypoint.find_last_not_of(OS_PATH_SEP) + 1); + LOG(INFO) << "keypoint_model_name: " << FLAGS_model_dir_keypoint; + LOG(INFO) << "----------------------- Perf info ------------------------"; + LOG(INFO) << "Total number of predicted data: " << img_num + << " and total time spent(ms): " + << std::accumulate(det_time.begin(), det_time.end(), 0.); + img_num = std::max(1, img_num); + LOG(INFO) << "Average time cost per person:"; + LOG(INFO) << "preproce_time(ms): " << det_time[0] / img_num + << ", inference_time(ms): " << det_time[1] / img_num + << ", postprocess_time(ms): " << det_time[2] / img_num; +} + +static std::string DirName(const std::string& filepath) { + auto pos = filepath.rfind(OS_PATH_SEP); + if (pos == std::string::npos) { + return ""; + } + return filepath.substr(0, pos); +} + +static bool PathExists(const std::string& path) { +#ifdef _WIN32 + struct _stat buffer; + return (_stat(path.c_str(), &buffer) == 0); +#else + struct stat buffer; + return (stat(path.c_str(), &buffer) == 0); +#endif // !_WIN32 +} + +static void MkDir(const std::string& path) { + if (PathExists(path)) return; + int ret = 0; +#ifdef _WIN32 + ret = _mkdir(path.c_str()); +#else + ret = mkdir(path.c_str(), 0755); +#endif // !_WIN32 + if (ret != 0) { + std::string path_error(path); + path_error += " mkdir failed!"; + throw std::runtime_error(path_error); + } +} + +static void MkDirs(const std::string& path) { + if (path.empty()) return; + if (PathExists(path)) return; + + MkDirs(DirName(path)); + MkDir(path); +} + +void PredictVideo(const std::string& video_path, + PaddleDetection::ObjectDetector* det, + PaddleDetection::KeyPointDetector* keypoint, + const std::string& output_dir = "output") { + // Open video + cv::VideoCapture capture; + std::string video_out_name = "output.mp4"; + if (FLAGS_camera_id != -1) { + capture.open(FLAGS_camera_id); + } else { + capture.open(video_path.c_str()); + video_out_name = + video_path.substr(video_path.find_last_of(OS_PATH_SEP) + 1); + } + if (!capture.isOpened()) { + printf("can not open video : %s\n", video_path.c_str()); + return; + } + + // Get Video info : resolution, fps, frame count + int video_width = static_cast(capture.get(CV_CAP_PROP_FRAME_WIDTH)); + int video_height = static_cast(capture.get(CV_CAP_PROP_FRAME_HEIGHT)); + int 
video_fps = static_cast(capture.get(CV_CAP_PROP_FPS)); + int video_frame_count = + static_cast(capture.get(CV_CAP_PROP_FRAME_COUNT)); + printf("fps: %d, frame_count: %d\n", video_fps, video_frame_count); + + // Create VideoWriter for output + cv::VideoWriter video_out; + std::string video_out_path(output_dir); + if (output_dir.rfind(OS_PATH_SEP) != output_dir.size() - 1) { + video_out_path += OS_PATH_SEP; + } + video_out_path += video_out_name; + video_out.open(video_out_path.c_str(), + 0x00000021, + video_fps, + cv::Size(video_width, video_height), + true); + if (!video_out.isOpened()) { + printf("create video writer failed!\n"); + return; + } + PaddleDetection::PoseSmooth smoother = + PaddleDetection::PoseSmooth(video_width, video_height); + + std::vector result; + std::vector bbox_num; + std::vector det_times; + auto labels = det->GetLabelList(); + auto colormap = PaddleDetection::GenerateColorMap(labels.size()); + + // Store keypoint results + std::vector result_kpts; + std::vector imgs_kpts; + std::vector> center_bs; + std::vector> scale_bs; + std::vector colormap_kpts = PaddleDetection::GenerateColorMap(20); + // Capture all frames and do inference + cv::Mat frame; + int frame_id = 1; + bool is_rbox = false; + while (capture.read(frame)) { + if (frame.empty()) { + break; + } + std::vector imgs; + imgs.push_back(frame); + printf("detect frame: %d\n", frame_id); + det->Predict(imgs, FLAGS_threshold, 0, 1, &result, &bbox_num, &det_times); + std::vector out_result; + for (const auto& item : result) { + if (item.confidence < FLAGS_threshold || item.class_id == -1) { + continue; + } + out_result.push_back(item); + if (item.rect.size() > 6) { + is_rbox = true; + printf("class=%d confidence=%.4f rect=[%d %d %d %d %d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3], + item.rect[4], + item.rect[5], + item.rect[6], + item.rect[7]); + } else { + printf("class=%d confidence=%.4f rect=[%d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3]); + } + } + + if (keypoint) { + result_kpts.clear(); + int imsize = out_result.size(); + for (int i = 0; i < imsize; i++) { + auto item = out_result[i]; + cv::Mat crop_img; + std::vector keypoint_times; + std::vector rect = { + item.rect[0], item.rect[1], item.rect[2], item.rect[3]}; + std::vector center; + std::vector scale; + if (item.class_id == 0) { + PaddleDetection::CropImg(frame, crop_img, rect, center, scale); + center_bs.emplace_back(center); + scale_bs.emplace_back(scale); + imgs_kpts.emplace_back(crop_img); + } + + if (imgs_kpts.size() == FLAGS_batch_size_keypoint || + ((i == imsize - 1) && !imgs_kpts.empty())) { + keypoint->Predict(imgs_kpts, + center_bs, + scale_bs, + FLAGS_threshold, + 0, + 1, + &result_kpts, + &keypoint_times); + imgs_kpts.clear(); + center_bs.clear(); + scale_bs.clear(); + } + } + + if (result_kpts.size() == 1) { + for (int i = 0; i < result_kpts.size(); i++) { + result_kpts[i] = smoother.smooth_process(&(result_kpts[i])); + } + } + + cv::Mat out_im = VisualizeKptsResult(frame, result_kpts, colormap_kpts); + video_out.write(out_im); + } else { + // Visualization result + cv::Mat out_im = PaddleDetection::VisualizeResult( + frame, out_result, labels, colormap, is_rbox); + video_out.write(out_im); + } + + frame_id += 1; + } + capture.release(); + video_out.release(); +} + +void PredictImage(const std::vector all_img_paths, + const int batch_size, + const double threshold, + const bool run_benchmark, + 
+                  PaddleDetection::ObjectDetector* det,
+                  PaddleDetection::KeyPointDetector* keypoint,
+                  const std::string& output_dir = "output") {
+  std::vector<double> det_t = {0, 0, 0};
+  int steps = ceil(static_cast<float>(all_img_paths.size()) / batch_size);
+  int kpts_imgs = 0;
+  std::vector<double> keypoint_t = {0, 0, 0};
+  printf("total images = %d, batch_size = %d, total steps = %d\n",
+         static_cast<int>(all_img_paths.size()),
+         batch_size,
+         steps);
+  for (int idx = 0; idx < steps; idx++) {
+    std::vector<cv::Mat> batch_imgs;
+    int left_image_cnt = all_img_paths.size() - idx * batch_size;
+    if (left_image_cnt > batch_size) {
+      left_image_cnt = batch_size;
+    }
+    for (int bs = 0; bs < left_image_cnt; bs++) {
+      std::string image_file_path = all_img_paths.at(idx * batch_size + bs);
+      cv::Mat im = cv::imread(image_file_path, 1);
+      batch_imgs.insert(batch_imgs.end(), im);
+    }
+
+    // Store all detected result
+    std::vector<PaddleDetection::ObjectResult> result;
+    std::vector<int> bbox_num;
+    std::vector<double> det_times;
+
+    // Store keypoint results
+    std::vector<PaddleDetection::KeyPointResult> result_kpts;
+    std::vector<cv::Mat> imgs_kpts;
+    std::vector<std::vector<float>> center_bs;
+    std::vector<std::vector<float>> scale_bs;
+    std::vector<int> colormap_kpts = PaddleDetection::GenerateColorMap(20);
+
+    bool is_rbox = false;
+    if (run_benchmark) {
+      det->Predict(
+          batch_imgs, threshold, 10, 10, &result, &bbox_num, &det_times);
+    } else {
+      det->Predict(batch_imgs, threshold, 0, 1, &result, &bbox_num, &det_times);
+    }
+    // get labels and colormap
+    auto labels = det->GetLabelList();
+    auto colormap = PaddleDetection::GenerateColorMap(labels.size());
+    int item_start_idx = 0;
+    for (int i = 0; i < left_image_cnt; i++) {
+      cv::Mat im = batch_imgs[i];
+      std::vector<PaddleDetection::ObjectResult> im_result;
+      int detect_num = 0;
+      for (int j = 0; j < bbox_num[i]; j++) {
+        PaddleDetection::ObjectResult item = result[item_start_idx + j];
+        if (item.confidence < threshold || item.class_id == -1) {
+          continue;
+        }
+        detect_num += 1;
+        im_result.push_back(item);
+        if (item.rect.size() > 6) {
+          is_rbox = true;
+          printf("class=%d confidence=%.4f rect=[%d %d %d %d %d %d %d %d]\n",
+                 item.class_id,
+                 item.confidence,
+                 item.rect[0],
+                 item.rect[1],
+                 item.rect[2],
+                 item.rect[3],
+                 item.rect[4],
+                 item.rect[5],
+                 item.rect[6],
+                 item.rect[7]);
+        } else {
+          printf("class=%d confidence=%.4f rect=[%d %d %d %d]\n",
+                 item.class_id,
+                 item.confidence,
+                 item.rect[0],
+                 item.rect[1],
+                 item.rect[2],
+                 item.rect[3]);
+        }
+      }
+      std::cout << all_img_paths.at(idx * batch_size + i)
+                << " The number of detected box: " << detect_num << std::endl;
+      item_start_idx = item_start_idx + bbox_num[i];
+
+      std::vector<int> compression_params;
+      compression_params.push_back(CV_IMWRITE_JPEG_QUALITY);
+      compression_params.push_back(95);
+      std::string output_path(output_dir);
+      if (output_dir.rfind(OS_PATH_SEP) != output_dir.size() - 1) {
+        output_path += OS_PATH_SEP;
+      }
+      std::string image_file_path = all_img_paths.at(idx * batch_size + i);
+      if (keypoint) {
+        int imsize = im_result.size();
+        for (int i = 0; i < imsize; i++) {
+          auto item = im_result[i];
+          cv::Mat crop_img;
+          std::vector<double> keypoint_times;
+          std::vector<int> rect = {
+              item.rect[0], item.rect[1], item.rect[2], item.rect[3]};
+          std::vector<float> center;
+          std::vector<float> scale;
+          if (item.class_id == 0) {
+            PaddleDetection::CropImg(im, crop_img, rect, center, scale);
+            center_bs.emplace_back(center);
+            scale_bs.emplace_back(scale);
+            imgs_kpts.emplace_back(crop_img);
+            kpts_imgs += 1;
+          }
+
+          if (imgs_kpts.size() == FLAGS_batch_size_keypoint ||
+              ((i == imsize - 1) && !imgs_kpts.empty())) {
+            if (run_benchmark) {
+              keypoint->Predict(imgs_kpts,
+                                center_bs,
+                                scale_bs,
+                                0.5,
+                                10,
+                                10,
+                                &result_kpts,
+                                &keypoint_times);
+            } else {
+              keypoint->Predict(imgs_kpts,
+                                center_bs,
+                                scale_bs,
+                                0.5,
+                                0,
+                                1,
+                                &result_kpts,
+                                &keypoint_times);
+            }
+            imgs_kpts.clear();
+            center_bs.clear();
+            scale_bs.clear();
+            keypoint_t[0] += keypoint_times[0];
+            keypoint_t[1] += keypoint_times[1];
+            keypoint_t[2] += keypoint_times[2];
+          }
+        }
+        std::string kpts_savepath =
+            output_path + "keypoint_" +
+            image_file_path.substr(image_file_path.find_last_of(OS_PATH_SEP) +
+                                   1);
+        cv::Mat kpts_vis_img =
+            VisualizeKptsResult(im, result_kpts, colormap_kpts);
+        cv::imwrite(kpts_savepath, kpts_vis_img, compression_params);
+        printf("Visualized output saved as %s\n", kpts_savepath.c_str());
+      } else {
+        // Visualization result
+        cv::Mat vis_img = PaddleDetection::VisualizeResult(
+            im, im_result, labels, colormap, is_rbox);
+        std::string det_savepath =
+            output_path +
+            image_file_path.substr(image_file_path.find_last_of(OS_PATH_SEP) +
+                                   1);
+        cv::imwrite(det_savepath, vis_img, compression_params);
+        printf("Visualized output saved as %s\n", det_savepath.c_str());
+      }
+    }
+
+    det_t[0] += det_times[0];
+    det_t[1] += det_times[1];
+    det_t[2] += det_times[2];
+  }
+  PrintBenchmarkLog(det_t, all_img_paths.size());
+  if (keypoint) {
+    PrintKptsBenchmarkLog(keypoint_t, kpts_imgs);
+  }
+}
+
+int main(int argc, char** argv) {
+  // Parsing command-line
+  google::ParseCommandLineFlags(&argc, &argv, true);
+  if (FLAGS_model_dir.empty() ||
+      (FLAGS_image_file.empty() && FLAGS_image_dir.empty() &&
+       FLAGS_video_file.empty())) {
+    std::cout << "Usage: ./main --model_dir=/PATH/TO/INFERENCE_MODEL/ "
+                 "(--model_dir_keypoint=/PATH/TO/INFERENCE_MODEL/) "
+              << "--image_file=/PATH/TO/INPUT/IMAGE/" << std::endl;
+    return -1;
+  }
+  if (!(FLAGS_run_mode == "paddle" || FLAGS_run_mode == "trt_fp32" ||
+        FLAGS_run_mode == "trt_fp16" || FLAGS_run_mode == "trt_int8")) {
+    std::cout
+        << "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or 'trt_int8'."
+        << std::endl;
+    return -1;
+  }
+  std::transform(FLAGS_device.begin(),
+                 FLAGS_device.end(),
+                 FLAGS_device.begin(),
+                 ::toupper);
+  if (!(FLAGS_device == "CPU" || FLAGS_device == "GPU" ||
+        FLAGS_device == "XPU")) {
+    std::cout << "device should be 'CPU', 'GPU' or 'XPU'." << std::endl;
+    return -1;
+  }
+  if (FLAGS_use_gpu) {
+    std::cout << "Deprecated, please use `--device` to set the device you want "
+                 "to run."
+              << std::endl;
+    return -1;
+  }
+  // Load model and create an object detector
+  PaddleDetection::ObjectDetector det(FLAGS_model_dir,
+                                      FLAGS_device,
+                                      FLAGS_use_mkldnn,
+                                      FLAGS_cpu_threads,
+                                      FLAGS_run_mode,
+                                      FLAGS_batch_size,
+                                      FLAGS_gpu_id,
+                                      FLAGS_trt_min_shape,
+                                      FLAGS_trt_max_shape,
+                                      FLAGS_trt_opt_shape,
+                                      FLAGS_trt_calib_mode);
+
+  PaddleDetection::KeyPointDetector* keypoint = nullptr;
+  if (!FLAGS_model_dir_keypoint.empty()) {
+    keypoint = new PaddleDetection::KeyPointDetector(FLAGS_model_dir_keypoint,
+                                                     FLAGS_device,
+                                                     FLAGS_use_mkldnn,
+                                                     FLAGS_cpu_threads,
+                                                     FLAGS_run_mode,
+                                                     FLAGS_batch_size_keypoint,
+                                                     FLAGS_gpu_id,
+                                                     FLAGS_trt_min_shape,
+                                                     FLAGS_trt_max_shape,
+                                                     FLAGS_trt_opt_shape,
+                                                     FLAGS_trt_calib_mode,
+                                                     FLAGS_use_dark);
+  }
+  // Do inference on input video or image
+  if (!PathExists(FLAGS_output_dir)) {
+    MkDirs(FLAGS_output_dir);
+  }
+  if (!FLAGS_video_file.empty() || FLAGS_camera_id != -1) {
+    PredictVideo(FLAGS_video_file, &det, keypoint, FLAGS_output_dir);
+  } else if (!FLAGS_image_file.empty() || !FLAGS_image_dir.empty()) {
+    std::vector<std::string> all_img_paths;
+    std::vector<cv::String> cv_all_img_paths;
+    if (!FLAGS_image_file.empty()) {
+      all_img_paths.push_back(FLAGS_image_file);
+      if (FLAGS_batch_size > 1) {
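+        // A single --image_file implies exactly one input; batching only
+        // applies to --image_dir.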
std::cout << "batch_size should be 1, when set `image_file`." + << std::endl; + return -1; + } + } else { + cv::glob(FLAGS_image_dir, cv_all_img_paths); + for (const auto& img_path : cv_all_img_paths) { + all_img_paths.push_back(img_path); + } + } + PredictImage(all_img_paths, + FLAGS_batch_size, + FLAGS_threshold, + FLAGS_run_benchmark, + &det, + keypoint, + FLAGS_output_dir); + } + delete keypoint; + keypoint = nullptr; + return 0; +} diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/object_detector.cc b/PaddleDetection-release-2.6/deploy/cpp/src/object_detector.cc new file mode 100644 index 0000000000000000000000000000000000000000..d4f2ceb5d7c07142e51e2b0008148e5d90b55adc --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/cpp/src/object_detector.cc @@ -0,0 +1,592 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +#include +// for setprecision +#include +#include + +#include "include/object_detector.h" + +namespace PaddleDetection { + +// Load Model and create model predictor +void ObjectDetector::LoadModel(const std::string &model_dir, + const int batch_size, + const std::string &run_mode) { + paddle_infer::Config config; + std::string prog_file = model_dir + OS_PATH_SEP + "model.pdmodel"; + std::string params_file = model_dir + OS_PATH_SEP + "model.pdiparams"; + config.SetModel(prog_file, params_file); + if (this->device_ == "GPU") { + config.EnableUseGpu(200, this->gpu_id_); + config.SwitchIrOptim(true); + // use tensorrt + if (run_mode != "paddle") { + auto precision = paddle_infer::Config::Precision::kFloat32; + if (run_mode == "trt_fp32") { + precision = paddle_infer::Config::Precision::kFloat32; + } else if (run_mode == "trt_fp16") { + precision = paddle_infer::Config::Precision::kHalf; + } else if (run_mode == "trt_int8") { + precision = paddle_infer::Config::Precision::kInt8; + } else { + printf("run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or " + "'trt_int8'"); + } + // set tensorrt + config.EnableTensorRtEngine(1 << 30, batch_size, this->min_subgraph_size_, + precision, false, this->trt_calib_mode_); + + // set use dynamic shape + if (this->use_dynamic_shape_) { + // set DynamicShape for image tensor + const std::vector min_input_shape = { + batch_size, 3, this->trt_min_shape_, this->trt_min_shape_}; + const std::vector max_input_shape = { + batch_size, 3, this->trt_max_shape_, this->trt_max_shape_}; + const std::vector opt_input_shape = { + batch_size, 3, this->trt_opt_shape_, this->trt_opt_shape_}; + const std::map> map_min_input_shape = { + {"image", min_input_shape}}; + const std::map> map_max_input_shape = { + {"image", max_input_shape}}; + const std::map> map_opt_input_shape = { + {"image", opt_input_shape}}; + + config.SetTRTDynamicShapeInfo(map_min_input_shape, map_max_input_shape, + map_opt_input_shape); + std::cout << "TensorRT dynamic shape enabled" << std::endl; + } + } + + } else if (this->device_ == "XPU") { + config.EnableXpu(10 * 1024 * 1024); + } else { + 
+    config.DisableGpu();
+    if (this->use_mkldnn_) {
+      config.EnableMKLDNN();
+      // cache 10 different shapes for mkldnn to avoid memory leak
+      config.SetMkldnnCacheCapacity(10);
+    }
+    config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
+  }
+  config.SwitchUseFeedFetchOps(false);
+  config.SwitchIrOptim(true);
+  config.DisableGlogInfo();
+  // Memory optimization
+  config.EnableMemoryOptim();
+  predictor_ = std::move(CreatePredictor(config));
+}
+
+// Visualization of MaskDetector results
+cv::Mat
+VisualizeResult(const cv::Mat &img,
+                const std::vector<PaddleDetection::ObjectResult> &results,
+                const std::vector<std::string> &labels,
+                const std::vector<int> &colormap, const bool is_rbox = false) {
+  cv::Mat vis_img = img.clone();
+  int img_h = vis_img.rows;
+  int img_w = vis_img.cols;
+  for (int i = 0; i < results.size(); ++i) {
+    // Configure color and text size
+    std::ostringstream oss;
+    oss << std::setiosflags(std::ios::fixed) << std::setprecision(4);
+    oss << labels[results[i].class_id] << " ";
+    oss << results[i].confidence;
+    std::string text = oss.str();
+    int c1 = colormap[3 * results[i].class_id + 0];
+    int c2 = colormap[3 * results[i].class_id + 1];
+    int c3 = colormap[3 * results[i].class_id + 2];
+    cv::Scalar roi_color = cv::Scalar(c1, c2, c3);
+    int font_face = cv::FONT_HERSHEY_COMPLEX_SMALL;
+    double font_scale = 0.5f;
+    float thickness = 0.5;
+    cv::Size text_size =
+        cv::getTextSize(text, font_face, font_scale, thickness, nullptr);
+    cv::Point origin;
+
+    if (is_rbox) {
+      // Draw object, text, and background
+      for (int k = 0; k < 4; k++) {
+        cv::Point pt1 = cv::Point(results[i].rect[(k * 2) % 8],
+                                  results[i].rect[(k * 2 + 1) % 8]);
+        cv::Point pt2 = cv::Point(results[i].rect[(k * 2 + 2) % 8],
+                                  results[i].rect[(k * 2 + 3) % 8]);
+        cv::line(vis_img, pt1, pt2, roi_color, 2);
+      }
+    } else {
+      int w = results[i].rect[2] - results[i].rect[0];
+      int h = results[i].rect[3] - results[i].rect[1];
+      cv::Rect roi = cv::Rect(results[i].rect[0], results[i].rect[1], w, h);
+      // Draw roi object, text, and background
+      cv::rectangle(vis_img, roi, roi_color, 2);
+
+      // Draw mask
+      std::vector<int> mask_v = results[i].mask;
+      if (mask_v.size() > 0) {
+        cv::Mat mask = cv::Mat(img_h, img_w, CV_32S);
+        std::memcpy(mask.data, mask_v.data(), mask_v.size() * sizeof(int));
+
+        cv::Mat colored_img = vis_img.clone();
+
+        std::vector<std::vector<cv::Point>> contours;
+        cv::Mat hierarchy;
+        mask.convertTo(mask, CV_8U);
+        cv::findContours(mask, contours, hierarchy, cv::RETR_CCOMP,
+                         cv::CHAIN_APPROX_SIMPLE);
+        cv::drawContours(colored_img, contours, -1, roi_color, -1, cv::LINE_8,
+                         hierarchy, 100);
+
+        cv::Mat debug_roi = vis_img;
+        colored_img = 0.4 * colored_img + 0.6 * vis_img;
+        colored_img.copyTo(vis_img, mask);
+      }
+    }
+
+    origin.x = results[i].rect[0];
+    origin.y = results[i].rect[1];
+
+    // Configure text background
+    cv::Rect text_back =
+        cv::Rect(results[i].rect[0], results[i].rect[1] - text_size.height,
+                 text_size.width, text_size.height);
+    // Draw text, and background
+    cv::rectangle(vis_img, text_back, roi_color, -1);
+    cv::putText(vis_img, text, origin, font_face, font_scale,
+                cv::Scalar(255, 255, 255), thickness);
+  }
+  return vis_img;
+}
+
+void ObjectDetector::Preprocess(const cv::Mat &ori_im) {
+  // Clone the image : keep the original mat for postprocess
+  cv::Mat im = ori_im.clone();
+  cv::cvtColor(im, im, cv::COLOR_BGR2RGB);
+  preprocessor_.Run(&im, &inputs_);
+}
+
+void ObjectDetector::Postprocess(
+    const std::vector<cv::Mat> mats,
+    std::vector<PaddleDetection::ObjectResult> *result,
+    std::vector<int> bbox_num, std::vector<float> output_data_,
+    std::vector<int> output_mask_data_, bool is_rbox = false) {
+  result->clear();
+  int start_idx = 0;
+  int total_num = std::accumulate(bbox_num.begin(), bbox_num.end(), 0);
+  int out_mask_dim = -1;
+  if (config_.mask_) {
+    out_mask_dim = output_mask_data_.size() / total_num;
+  }
+
+  for (int im_id = 0; im_id < mats.size(); im_id++) {
+    cv::Mat raw_mat = mats[im_id];
+    int rh = 1;
+    int rw = 1;
+    for (int j = start_idx; j < start_idx + bbox_num[im_id]; j++) {
+      if (is_rbox) {
+        // Class id
+        int class_id = static_cast<int>(round(output_data_[0 + j * 10]));
+        // Confidence score
+        float score = output_data_[1 + j * 10];
+        int x1 = (output_data_[2 + j * 10] * rw);
+        int y1 = (output_data_[3 + j * 10] * rh);
+        int x2 = (output_data_[4 + j * 10] * rw);
+        int y2 = (output_data_[5 + j * 10] * rh);
+        int x3 = (output_data_[6 + j * 10] * rw);
+        int y3 = (output_data_[7 + j * 10] * rh);
+        int x4 = (output_data_[8 + j * 10] * rw);
+        int y4 = (output_data_[9 + j * 10] * rh);
+
+        PaddleDetection::ObjectResult result_item;
+        result_item.rect = {x1, y1, x2, y2, x3, y3, x4, y4};
+        result_item.class_id = class_id;
+        result_item.confidence = score;
+        result->push_back(result_item);
+      } else {
+        // Class id
+        int class_id = static_cast<int>(round(output_data_[0 + j * 6]));
+        // Confidence score
+        float score = output_data_[1 + j * 6];
+        int xmin = (output_data_[2 + j * 6] * rw);
+        int ymin = (output_data_[3 + j * 6] * rh);
+        int xmax = (output_data_[4 + j * 6] * rw);
+        int ymax = (output_data_[5 + j * 6] * rh);
+        int wd = xmax - xmin;
+        int hd = ymax - ymin;
+
+        PaddleDetection::ObjectResult result_item;
+        result_item.rect = {xmin, ymin, xmax, ymax};
+        result_item.class_id = class_id;
+        result_item.confidence = score;
+
+        if (config_.mask_) {
+          std::vector<int> mask;
+          for (int k = 0; k < out_mask_dim; ++k) {
+            if (output_mask_data_[k + j * out_mask_dim] > -1) {
+              mask.push_back(output_mask_data_[k + j * out_mask_dim]);
+            }
+          }
+          result_item.mask = mask;
+        }
+
+        result->push_back(result_item);
+      }
+    }
+    start_idx += bbox_num[im_id];
+  }
+}
+
+// This function is to convert output result from SOLOv2 to class ObjectResult
+void ObjectDetector::SOLOv2Postprocess(
+    const std::vector<cv::Mat> mats,
+    std::vector<PaddleDetection::ObjectResult> *result,
+    std::vector<int> *bbox_num, std::vector<int> out_bbox_num_data_,
+    std::vector<int64_t> out_label_data_, std::vector<float> out_score_data_,
+    std::vector<uint8_t> out_global_mask_data_, float threshold) {
+
+  for (int im_id = 0; im_id < mats.size(); im_id++) {
+    cv::Mat mat = mats[im_id];
+
+    int valid_bbox_count = 0;
+    for (int bbox_id = 0; bbox_id < out_bbox_num_data_[im_id]; ++bbox_id) {
+      if (out_score_data_[bbox_id] >= threshold) {
+        ObjectResult result_item;
+        result_item.class_id = out_label_data_[bbox_id];
+        result_item.confidence = out_score_data_[bbox_id];
+        std::vector<int> global_mask;
+
+        for (int k = 0; k < mat.rows * mat.cols; ++k) {
+          global_mask.push_back(static_cast<int>(
+              out_global_mask_data_[k + bbox_id * mat.rows * mat.cols]));
+        }
+
+        // find minimal bounding box from mask
+        cv::Mat mask(mat.rows, mat.cols, CV_32SC1);
+        std::memcpy(mask.data, global_mask.data(),
+                    global_mask.size() * sizeof(int));
+
+        cv::Mat mask_fp;
+        cv::Mat rowSum;
+        cv::Mat colSum;
+        std::vector<float> sum_of_row(mat.rows);
+        std::vector<float> sum_of_col(mat.cols);
+
+        mask.convertTo(mask_fp, CV_32FC1);
+        cv::reduce(mask_fp, colSum, 0, CV_REDUCE_SUM, CV_32FC1);
+        cv::reduce(mask_fp, rowSum, 1, CV_REDUCE_SUM, CV_32FC1);
+
+        for (int row_id = 0; row_id < mat.rows; ++row_id) {
+          sum_of_row[row_id] = rowSum.at<float>(row_id, 0);
+        }
+
+        for (int col_id = 0; col_id < mat.cols; ++col_id) {
+          sum_of_col[col_id] = colSum.at<float>(0, col_id);
+        }
+
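+        // The first/last rows and columns with non-zero mask mass give the
+        // instance's tight bounding box (x1, y1, x2, y2).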
+        auto it = std::find_if(sum_of_row.begin(), sum_of_row.end(),
+                               [](int x) { return x > 0.5; });
+        int y1 = std::distance(sum_of_row.begin(), it);
+
+        auto it2 = std::find_if(sum_of_col.begin(), sum_of_col.end(),
+                                [](int x) { return x > 0.5; });
+        int x1 = std::distance(sum_of_col.begin(), it2);
+
+        auto rit = std::find_if(sum_of_row.rbegin(), sum_of_row.rend(),
+                                [](int x) { return x > 0.5; });
+        int y2 = std::distance(rit, sum_of_row.rend());
+
+        auto rit2 = std::find_if(sum_of_col.rbegin(), sum_of_col.rend(),
+                                 [](int x) { return x > 0.5; });
+        int x2 = std::distance(rit2, sum_of_col.rend());
+
+        result_item.rect = {x1, y1, x2, y2};
+        result_item.mask = global_mask;
+
+        result->push_back(result_item);
+        valid_bbox_count++;
+      }
+    }
+    bbox_num->push_back(valid_bbox_count);
+  }
+}
+
+void ObjectDetector::Predict(const std::vector<cv::Mat> imgs,
+                             const double threshold, const int warmup,
+                             const int repeats,
+                             std::vector<PaddleDetection::ObjectResult> *result,
+                             std::vector<int> *bbox_num,
+                             std::vector<double> *times) {
+  auto preprocess_start = std::chrono::steady_clock::now();
+  int batch_size = imgs.size();
+
+  // in_data_batch
+  std::vector<float> in_data_all;
+  std::vector<float> im_shape_all(batch_size * 2);
+  std::vector<float> scale_factor_all(batch_size * 2);
+  std::vector<const float *> output_data_list_;
+  std::vector<int> out_bbox_num_data_;
+  std::vector<int> out_mask_data_;
+
+  // these parameters are for SOLOv2 output
+  std::vector<float> out_score_data_;
+  std::vector<uint8_t> out_global_mask_data_;
+  std::vector<int64_t> out_label_data_;
+
+  // in_net img for each batch
+  std::vector<cv::Mat> in_net_img_all(batch_size);
+
+  // Preprocess image
+  for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) {
+    cv::Mat im = imgs.at(bs_idx);
+    Preprocess(im);
+    im_shape_all[bs_idx * 2] = inputs_.im_shape_[0];
+    im_shape_all[bs_idx * 2 + 1] = inputs_.im_shape_[1];
+
+    scale_factor_all[bs_idx * 2] = inputs_.scale_factor_[0];
+    scale_factor_all[bs_idx * 2 + 1] = inputs_.scale_factor_[1];
+
+    in_data_all.insert(in_data_all.end(), inputs_.im_data_.begin(),
+                       inputs_.im_data_.end());
+
+    // collect in_net img
+    in_net_img_all[bs_idx] = inputs_.in_net_im_;
+  }
+
+  // Pad Batch if batch size > 1
+  if (batch_size > 1 && CheckDynamicInput(in_net_img_all)) {
+    in_data_all.clear();
+    std::vector<cv::Mat> pad_img_all = PadBatch(in_net_img_all);
+    int rh = pad_img_all[0].rows;
+    int rw = pad_img_all[0].cols;
+    int rc = pad_img_all[0].channels();
+
+    for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) {
+      cv::Mat pad_img = pad_img_all[bs_idx];
+      pad_img.convertTo(pad_img, CV_32FC3);
+      std::vector<float> pad_data;
+      pad_data.resize(rc * rh * rw);
+      float *base = pad_data.data();
+      for (int i = 0; i < rc; ++i) {
+        cv::extractChannel(pad_img,
+                           cv::Mat(rh, rw, CV_32FC1, base + i * rh * rw), i);
+      }
+      in_data_all.insert(in_data_all.end(), pad_data.begin(), pad_data.end());
+    }
+    // update in_net_shape
+    inputs_.in_net_shape_ = {static_cast<float>(rh), static_cast<float>(rw)};
+  }
+
+  auto preprocess_end = std::chrono::steady_clock::now();
+  // Prepare input tensor
+  auto input_names = predictor_->GetInputNames();
+  for (const auto &tensor_name : input_names) {
+    auto in_tensor = predictor_->GetInputHandle(tensor_name);
+    if (tensor_name == "image") {
+      int rh = inputs_.in_net_shape_[0];
+      int rw = inputs_.in_net_shape_[1];
+      in_tensor->Reshape({batch_size, 3, rh, rw});
+      in_tensor->CopyFromCpu(in_data_all.data());
+    } else if (tensor_name == "im_shape") {
+      in_tensor->Reshape({batch_size, 2});
+      in_tensor->CopyFromCpu(im_shape_all.data());
+    } else if (tensor_name == "scale_factor") {
+      in_tensor->Reshape({batch_size, 2});
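+      // per image layout: {scale_y, scale_x}, matching the order produced
+      // by the Resize op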
+      in_tensor->CopyFromCpu(scale_factor_all.data());
+    }
+  }
+
+  // Run predictor
+  std::vector<std::vector<float>> out_tensor_list;
+  std::vector<std::vector<int>> output_shape_list;
+  bool is_rbox = false;
+  int reg_max = 7;
+  int num_class = 80;
+
+  auto inference_start = std::chrono::steady_clock::now();
+  if (config_.arch_ == "SOLOv2") {
+    // warmup
+    for (int i = 0; i < warmup; i++) {
+      predictor_->Run();
+      // Get output tensor
+      auto output_names = predictor_->GetOutputNames();
+      for (int j = 0; j < output_names.size(); j++) {
+        auto output_tensor = predictor_->GetOutputHandle(output_names[j]);
+        std::vector<int> output_shape = output_tensor->shape();
+        int out_num = std::accumulate(output_shape.begin(), output_shape.end(),
+                                      1, std::multiplies<int>());
+        if (j == 0) {
+          out_bbox_num_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_bbox_num_data_.data());
+        } else if (j == 1) {
+          out_label_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_label_data_.data());
+        } else if (j == 2) {
+          out_score_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_score_data_.data());
+        } else if (config_.mask_ && (j == 3)) {
+          out_global_mask_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_global_mask_data_.data());
+        }
+      }
+    }
+
+    inference_start = std::chrono::steady_clock::now();
+    for (int i = 0; i < repeats; i++) {
+      predictor_->Run();
+      // Get output tensor
+      out_tensor_list.clear();
+      output_shape_list.clear();
+      auto output_names = predictor_->GetOutputNames();
+      for (int j = 0; j < output_names.size(); j++) {
+        auto output_tensor = predictor_->GetOutputHandle(output_names[j]);
+        std::vector<int> output_shape = output_tensor->shape();
+        int out_num = std::accumulate(output_shape.begin(), output_shape.end(),
+                                      1, std::multiplies<int>());
+        output_shape_list.push_back(output_shape);
+        if (j == 0) {
+          out_bbox_num_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_bbox_num_data_.data());
+        } else if (j == 1) {
+          out_label_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_label_data_.data());
+        } else if (j == 2) {
+          out_score_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_score_data_.data());
+        } else if (config_.mask_ && (j == 3)) {
+          out_global_mask_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_global_mask_data_.data());
+        }
+      }
+    }
+  } else {
+    // warmup
+    for (int i = 0; i < warmup; i++) {
+      predictor_->Run();
+      // Get output tensor
+      auto output_names = predictor_->GetOutputNames();
+      for (int j = 0; j < output_names.size(); j++) {
+        auto output_tensor = predictor_->GetOutputHandle(output_names[j]);
+        std::vector<int> output_shape = output_tensor->shape();
+        int out_num = std::accumulate(output_shape.begin(), output_shape.end(),
+                                      1, std::multiplies<int>());
+        if (config_.mask_ && (j == 2)) {
+          out_mask_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_mask_data_.data());
+        } else if (output_tensor->type() == paddle_infer::DataType::INT32) {
+          out_bbox_num_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_bbox_num_data_.data());
+        } else {
+          std::vector<float> out_data;
+          out_data.resize(out_num);
+          output_tensor->CopyToCpu(out_data.data());
+          out_tensor_list.push_back(out_data);
+        }
+      }
+    }
+
+    inference_start = std::chrono::steady_clock::now();
+    for (int i = 0; i < repeats; i++) {
+      predictor_->Run();
+      // Get output tensor
+      out_tensor_list.clear();
+      output_shape_list.clear();
+      auto output_names = predictor_->GetOutputNames();
+      for (int j = 0; j < output_names.size(); j++) {
+        auto output_tensor = predictor_->GetOutputHandle(output_names[j]);
+        std::vector<int> output_shape = output_tensor->shape();
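+        // out_num is the product of all tensor dims, i.e. the element count
+        // to copy back to the host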
+        int out_num = std::accumulate(output_shape.begin(), output_shape.end(),
+                                      1, std::multiplies<int>());
+        output_shape_list.push_back(output_shape);
+        if (config_.mask_ && (j == 2)) {
+          out_mask_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_mask_data_.data());
+        } else if (output_tensor->type() == paddle_infer::DataType::INT32) {
+          out_bbox_num_data_.resize(out_num);
+          output_tensor->CopyToCpu(out_bbox_num_data_.data());
+        } else {
+          std::vector<float> out_data;
+          out_data.resize(out_num);
+          output_tensor->CopyToCpu(out_data.data());
+          out_tensor_list.push_back(out_data);
+        }
+      }
+    }
+  }
+
+  auto inference_end = std::chrono::steady_clock::now();
+  auto postprocess_start = std::chrono::steady_clock::now();
+  // Postprocessing result
+  result->clear();
+  bbox_num->clear();
+  if (config_.arch_ == "PicoDet") {
+    for (int i = 0; i < out_tensor_list.size(); i++) {
+      if (i == 0) {
+        num_class = output_shape_list[i][2];
+      }
+      if (i == config_.fpn_stride_.size()) {
+        reg_max = output_shape_list[i][2] / 4 - 1;
+      }
+      float *buffer = new float[out_tensor_list[i].size()];
+      memcpy(buffer, &out_tensor_list[i][0],
+             out_tensor_list[i].size() * sizeof(float));
+      output_data_list_.push_back(buffer);
+    }
+    PaddleDetection::PicoDetPostProcess(
+        result, output_data_list_, config_.fpn_stride_, inputs_.im_shape_,
+        inputs_.scale_factor_, config_.nms_info_["score_threshold"].as<float>(),
+        config_.nms_info_["nms_threshold"].as<float>(), num_class, reg_max);
+    bbox_num->push_back(result->size());
+  } else if (config_.arch_ == "SOLOv2") {
+    SOLOv2Postprocess(imgs, result, bbox_num, out_bbox_num_data_,
+                      out_label_data_, out_score_data_, out_global_mask_data_,
+                      threshold);
+  } else {
+    is_rbox = output_shape_list[0][output_shape_list[0].size() - 1] % 10 == 0;
+    Postprocess(imgs, result, out_bbox_num_data_, out_tensor_list[0],
+                out_mask_data_, is_rbox);
+    for (int k = 0; k < out_bbox_num_data_.size(); k++) {
+      int tmp = out_bbox_num_data_[k];
+      bbox_num->push_back(tmp);
+    }
+  }
+
+  auto postprocess_end = std::chrono::steady_clock::now();
+
+  std::chrono::duration<float> preprocess_diff =
+      preprocess_end - preprocess_start;
+  times->push_back(static_cast<double>(preprocess_diff.count() * 1000));
+  std::chrono::duration<float> inference_diff = inference_end - inference_start;
+  times->push_back(
+      static_cast<double>(inference_diff.count() / repeats * 1000));
+  std::chrono::duration<float> postprocess_diff =
+      postprocess_end - postprocess_start;
+  times->push_back(static_cast<double>(postprocess_diff.count() * 1000));
+}
+
+std::vector<int> GenerateColorMap(int num_class) {
+  auto colormap = std::vector<int>(3 * num_class, 0);
+  for (int i = 0; i < num_class; ++i) {
+    int j = 0;
+    int lab = i;
+    while (lab) {
+      colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j));
+      colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j));
+      colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j));
+      ++j;
+      lab >>= 3;
+    }
+  }
+  return colormap;
+}
+
+} // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/picodet_postprocess.cc b/PaddleDetection-release-2.6/deploy/cpp/src/picodet_postprocess.cc
new file mode 100644
index 0000000000000000000000000000000000000000..7f40a2658ac04d98e73646996e12f2dd4e016006
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/src/picodet_postprocess.cc
@@ -0,0 +1,128 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// The code is based on:
+// https://github.com/RangiLyu/nanodet/blob/main/demo_mnn/nanodet_mnn.cpp
+
+#include <algorithm>
+#include <cstdint>
+
+#include "include/picodet_postprocess.h"
+
+namespace PaddleDetection {
+
+float fast_exp(float x) {
+  union {
+    uint32_t i;
+    float f;
+  } v{};
+  v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
+  return v.f;
+}
+
+template <typename _Tp>
+int activation_function_softmax(const _Tp *src, _Tp *dst, int length) {
+  const _Tp alpha = *std::max_element(src, src + length);
+  _Tp denominator{0};
+
+  for (int i = 0; i < length; ++i) {
+    dst[i] = fast_exp(src[i] - alpha);
+    denominator += dst[i];
+  }
+
+  for (int i = 0; i < length; ++i) {
+    dst[i] /= denominator;
+  }
+
+  return 0;
+}
+
+// PicoDet decode
+PaddleDetection::ObjectResult
+disPred2Bbox(const float *&dfl_det, int label, float score, int x, int y,
+             int stride, std::vector<float> im_shape, int reg_max) {
+  float ct_x = (x + 0.5) * stride;
+  float ct_y = (y + 0.5) * stride;
+  std::vector<float> dis_pred;
+  dis_pred.resize(4);
+  for (int i = 0; i < 4; i++) {
+    float dis = 0;
+    float *dis_after_sm = new float[reg_max + 1];
+    activation_function_softmax(dfl_det + i * (reg_max + 1), dis_after_sm,
+                                reg_max + 1);
+    for (int j = 0; j < reg_max + 1; j++) {
+      dis += j * dis_after_sm[j];
+    }
+    dis *= stride;
+    dis_pred[i] = dis;
+    delete[] dis_after_sm;
+  }
+  int xmin = (int)(std::max)(ct_x - dis_pred[0], .0f);
+  int ymin = (int)(std::max)(ct_y - dis_pred[1], .0f);
+  int xmax = (int)(std::min)(ct_x + dis_pred[2], (float)im_shape[0]);
+  int ymax = (int)(std::min)(ct_y + dis_pred[3], (float)im_shape[1]);
+
+  PaddleDetection::ObjectResult result_item;
+  result_item.rect = {xmin, ymin, xmax, ymax};
+  result_item.class_id = label;
+  result_item.confidence = score;
+
+  return result_item;
+}
+
+void PicoDetPostProcess(std::vector<PaddleDetection::ObjectResult> *results,
+                        std::vector<const float *> outs,
+                        std::vector<int> fpn_stride,
+                        std::vector<float> im_shape,
+                        std::vector<float> scale_factor, float score_threshold,
+                        float nms_threshold, int num_class, int reg_max) {
+  std::vector<std::vector<PaddleDetection::ObjectResult>> bbox_results;
+  bbox_results.resize(num_class);
+  int in_h = im_shape[0], in_w = im_shape[1];
+  for (int i = 0; i < fpn_stride.size(); ++i) {
+    int feature_h = std::ceil((float)in_h / fpn_stride[i]);
+    int feature_w = std::ceil((float)in_w / fpn_stride[i]);
+    for (int idx = 0; idx < feature_h * feature_w; idx++) {
+      const float *scores = outs[i] + (idx * num_class);
+
+      int row = idx / feature_w;
+      int col = idx % feature_w;
+      float score = 0;
+      int cur_label = 0;
+      for (int label = 0; label < num_class; label++) {
+        if (scores[label] > score) {
+          score = scores[label];
+          cur_label = label;
+        }
+      }
+      if (score > score_threshold) {
+        const float *bbox_pred =
+            outs[i + fpn_stride.size()] + (idx * 4 * (reg_max + 1));
+        bbox_results[cur_label].push_back(
+            disPred2Bbox(bbox_pred, cur_label, score, col, row, fpn_stride[i],
+                         im_shape, reg_max));
+      }
+    }
+  }
+  for (int i = 0; i < (int)bbox_results.size(); i++) {
+    PaddleDetection::nms(bbox_results[i], nms_threshold);
+
+    for (auto box : bbox_results[i]) {
+      box.rect[0] = box.rect[0] / scale_factor[1];
+      box.rect[2] = box.rect[2] / scale_factor[1];
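+      // scale_factor is stored as {scale_y, scale_x}: x coordinates divide
+      // by [1], y coordinates by [0], mapping boxes back to the input image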
+      box.rect[1] = box.rect[1] / scale_factor[0];
+      box.rect[3] = box.rect[3] / scale_factor[0];
+      results->push_back(box);
+    }
+  }
+}
+
+} // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/preprocess_op.cc b/PaddleDetection-release-2.6/deploy/cpp/src/preprocess_op.cc
new file mode 100644
index 0000000000000000000000000000000000000000..e1cbfe4f15a49930ac9759e8d1b71232f167ad04
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/src/preprocess_op.cc
@@ -0,0 +1,355 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <algorithm>
+#include <cmath>
+#include <vector>
+
+#include "include/preprocess_op.h"
+
+namespace PaddleDetection {
+
+void InitInfo::Run(cv::Mat* im, ImageBlob* data) {
+  data->im_shape_ = {static_cast<float>(im->rows),
+                     static_cast<float>(im->cols)};
+  data->scale_factor_ = {1., 1.};
+  data->in_net_shape_ = {static_cast<float>(im->rows),
+                         static_cast<float>(im->cols)};
+}
+
+void NormalizeImage::Run(cv::Mat* im, ImageBlob* data) {
+  double e = 1.0;
+  if (is_scale_) {
+    e /= 255.0;
+  }
+  (*im).convertTo(*im, CV_32FC3, e);
+  if (norm_type_ == "mean_std") {
+    for (int h = 0; h < im->rows; h++) {
+      for (int w = 0; w < im->cols; w++) {
+        im->at<cv::Vec3f>(h, w)[0] =
+            (im->at<cv::Vec3f>(h, w)[0] - mean_[0]) / scale_[0];
+        im->at<cv::Vec3f>(h, w)[1] =
+            (im->at<cv::Vec3f>(h, w)[1] - mean_[1]) / scale_[1];
+        im->at<cv::Vec3f>(h, w)[2] =
+            (im->at<cv::Vec3f>(h, w)[2] - mean_[2]) / scale_[2];
+      }
+    }
+  }
+}
+
+void Permute::Run(cv::Mat* im, ImageBlob* data) {
+  (*im).convertTo(*im, CV_32FC3);
+  int rh = im->rows;
+  int rw = im->cols;
+  int rc = im->channels();
+  (data->im_data_).resize(rc * rh * rw);
+  float* base = (data->im_data_).data();
+  for (int i = 0; i < rc; ++i) {
+    cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, base + i * rh * rw), i);
+  }
+}
+
+void Resize::Run(cv::Mat* im, ImageBlob* data) {
+  auto resize_scale = GenerateScale(*im);
+  cv::resize(
+      *im, *im, cv::Size(), resize_scale.first, resize_scale.second, interp_);
+
+  data->in_net_shape_ = {static_cast<float>(im->rows),
+                         static_cast<float>(im->cols)};
+  data->im_shape_ = {
+      static_cast<float>(im->rows), static_cast<float>(im->cols),
+  };
+  data->scale_factor_ = {
+      resize_scale.second, resize_scale.first,
+  };
+}
+
+std::pair<float, float> Resize::GenerateScale(const cv::Mat& im) {
+  std::pair<float, float> resize_scale;
+  int origin_w = im.cols;
+  int origin_h = im.rows;
+
+  if (keep_ratio_) {
+    int im_size_max = std::max(origin_w, origin_h);
+    int im_size_min = std::min(origin_w, origin_h);
+    int target_size_max =
+        *std::max_element(target_size_.begin(), target_size_.end());
+    int target_size_min =
+        *std::min_element(target_size_.begin(), target_size_.end());
+    float scale_min =
+        static_cast<float>(target_size_min) / static_cast<float>(im_size_min);
+    float scale_max =
+        static_cast<float>(target_size_max) / static_cast<float>(im_size_max);
+    float scale_ratio = std::min(scale_min, scale_max);
+    resize_scale = {scale_ratio, scale_ratio};
+  } else {
+    resize_scale.first =
+        static_cast<float>(target_size_[1]) / static_cast<float>(origin_w);
+    resize_scale.second =
+        static_cast<float>(target_size_[0]) / static_cast<float>(origin_h);
+  }
+  return resize_scale;
+}
+
+void LetterBoxResize::Run(cv::Mat* im, ImageBlob* data) {
+  float resize_scale = GenerateScale(*im);
+  int new_shape_w = std::round(im->cols * resize_scale);
+  int new_shape_h = std::round(im->rows * resize_scale);
+  data->im_shape_ = {static_cast<float>(new_shape_h),
+                     static_cast<float>(new_shape_w)};
+  float padw = (target_size_[1] - new_shape_w) / 2.;
+  float padh = (target_size_[0] - new_shape_h) / 2.;
+
+  int top = std::round(padh - 0.1);
+  int bottom = std::round(padh + 0.1);
+  int left = std::round(padw - 0.1);
+  int right = std::round(padw + 0.1);
+
+  cv::resize(
+      *im, *im, cv::Size(new_shape_w, new_shape_h), 0, 0, cv::INTER_AREA);
+
+  data->in_net_shape_ = {
+      static_cast<float>(im->rows), static_cast<float>(im->cols),
+  };
+  cv::copyMakeBorder(*im,
+                     *im,
+                     top,
+                     bottom,
+                     left,
+                     right,
+                     cv::BORDER_CONSTANT,
+                     cv::Scalar(127.5));
+
+  data->in_net_shape_ = {
+      static_cast<float>(im->rows), static_cast<float>(im->cols),
+  };
+
+  data->scale_factor_ = {
+      resize_scale, resize_scale,
+  };
+}
+
+float LetterBoxResize::GenerateScale(const cv::Mat& im) {
+  int origin_w = im.cols;
+  int origin_h = im.rows;
+
+  int target_h = target_size_[0];
+  int target_w = target_size_[1];
+
+  float ratio_h = static_cast<float>(target_h) / static_cast<float>(origin_h);
+  float ratio_w = static_cast<float>(target_w) / static_cast<float>(origin_w);
+  float resize_scale = std::min(ratio_h, ratio_w);
+  return resize_scale;
+}
+
+void PadStride::Run(cv::Mat* im, ImageBlob* data) {
+  if (stride_ <= 0) {
+    data->in_net_im_ = im->clone();
+    return;
+  }
+  int rc = im->channels();
+  int rh = im->rows;
+  int rw = im->cols;
+  int nh = (rh / stride_) * stride_ + (rh % stride_ != 0) * stride_;
+  int nw = (rw / stride_) * stride_ + (rw % stride_ != 0) * stride_;
+  cv::copyMakeBorder(
+      *im, *im, 0, nh - rh, 0, nw - rw, cv::BORDER_CONSTANT, cv::Scalar(0));
+  data->in_net_im_ = im->clone();
+  data->in_net_shape_ = {
+      static_cast<float>(im->rows), static_cast<float>(im->cols),
+  };
+}
+
+void TopDownEvalAffine::Run(cv::Mat* im, ImageBlob* data) {
+  cv::resize(*im, *im, cv::Size(trainsize_[0], trainsize_[1]), 0, 0, interp_);
+  // todo: Simd::ResizeBilinear();
+  data->in_net_shape_ = {
+      static_cast<float>(trainsize_[1]), static_cast<float>(trainsize_[0]),
+  };
+}
+
+void GetAffineTrans(const cv::Point2f center,
+                    const cv::Point2f input_size,
+                    const cv::Point2f output_size,
+                    cv::Mat* trans) {
+  cv::Point2f srcTri[3];
+  cv::Point2f dstTri[3];
+  float src_w = input_size.x;
+  float dst_w = output_size.x;
+  float dst_h = output_size.y;
+
+  cv::Point2f src_dir(0, -0.5 * src_w);
+  cv::Point2f dst_dir(0, -0.5 * dst_w);
+
+  srcTri[0] = center;
+  srcTri[1] = center + src_dir;
+  cv::Point2f src_d = srcTri[0] - srcTri[1];
+  srcTri[2] = srcTri[1] + cv::Point2f(-src_d.y, src_d.x);
+
+  dstTri[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5);
+  dstTri[1] = cv::Point2f(dst_w * 0.5, dst_h * 0.5) + dst_dir;
+  cv::Point2f dst_d = dstTri[0] - dstTri[1];
+  dstTri[2] = dstTri[1] + cv::Point2f(-dst_d.y, dst_d.x);
+
+  *trans = cv::getAffineTransform(srcTri, dstTri);
+}
+
+void WarpAffine::Run(cv::Mat* im, ImageBlob* data) {
+  cv::cvtColor(*im, *im, cv::COLOR_RGB2BGR);
+  cv::Mat trans(2, 3, CV_32FC1);
+  cv::Point2f center;
+  cv::Point2f input_size;
+  int h = im->rows;
+  int w = im->cols;
+  if (keep_res_) {
+    input_h_ = (h | pad_) + 1;
+    input_w_ = (w | pad_) + 1;
+    input_size = cv::Point2f(input_w_, input_h_);
+    center = cv::Point2f(w / 2, h / 2);
+  } else {
+    float s = std::max(h, w) * 1.0;
+    input_size = cv::Point2f(s, s);
+    center = cv::Point2f(w / 2., h / 2.);
+  }
+  cv::Point2f output_size(input_w_, input_h_);
+
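+  // The affine transform is built from three corresponding point pairs: the
+  // center, a point offset above it, and a third point perpendicular to that
+  // direction, mapped between source and destination coordinates.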
+  GetAffineTrans(center, input_size, output_size, &trans);
+  cv::warpAffine(*im, *im, trans, cv::Size(input_w_, input_h_));
+  data->in_net_shape_ = {
+      static_cast<float>(input_h_), static_cast<float>(input_w_),
+  };
+}
+
+void Pad::Run(cv::Mat* im, ImageBlob* data) {
+  int h = size_[0];
+  int w = size_[1];
+  int rh = im->rows;
+  int rw = im->cols;
+  if (h == rh && w == rw) {
+    data->in_net_im_ = im->clone();
+    return;
+  }
+  cv::copyMakeBorder(
+      *im, *im, 0, h - rh, 0, w - rw, cv::BORDER_CONSTANT, cv::Scalar(114));
+  data->in_net_im_ = im->clone();
+  data->in_net_shape_ = {
+      static_cast<float>(im->rows), static_cast<float>(im->cols),
+  };
+}
+
+// Preprocessor op running order
+const std::vector<std::string> Preprocessor::RUN_ORDER = {"InitInfo",
+                                                          "TopDownEvalAffine",
+                                                          "Resize",
+                                                          "LetterBoxResize",
+                                                          "WarpAffine",
+                                                          "NormalizeImage",
+                                                          "PadStride",
+                                                          "Pad",
+                                                          "Permute"};
+
+void Preprocessor::Run(cv::Mat* im, ImageBlob* data) {
+  for (const auto& name : RUN_ORDER) {
+    if (ops_.find(name) != ops_.end()) {
+      ops_[name]->Run(im, data);
+    }
+  }
+}
+
+void CropImg(cv::Mat& img,
+             cv::Mat& crop_img,
+             std::vector<int>& area,
+             std::vector<float>& center,
+             std::vector<float>& scale,
+             float expandratio) {
+  int crop_x1 = std::max(0, area[0]);
+  int crop_y1 = std::max(0, area[1]);
+  int crop_x2 = std::min(img.cols - 1, area[2]);
+  int crop_y2 = std::min(img.rows - 1, area[3]);
+  int center_x = (crop_x1 + crop_x2) / 2.;
+  int center_y = (crop_y1 + crop_y2) / 2.;
+  int half_h = (crop_y2 - crop_y1) / 2.;
+  int half_w = (crop_x2 - crop_x1) / 2.;
+
+  // adjust h or w to keep image ratio, expand the shorter edge
+  if (half_h * 3 > half_w * 4) {
+    half_w = static_cast<int>(half_h * 0.75);
+  } else {
+    half_h = static_cast<int>(half_w * 4 / 3);
+  }
+
+  crop_x1 =
+      std::max(0, center_x - static_cast<int>(half_w * (1 + expandratio)));
+  crop_y1 =
+      std::max(0, center_y - static_cast<int>(half_h * (1 + expandratio)));
+  crop_x2 = std::min(img.cols - 1,
+                     static_cast<int>(center_x + half_w * (1 + expandratio)));
+  crop_y2 = std::min(img.rows - 1,
+                     static_cast<int>(center_y + half_h * (1 + expandratio)));
+  crop_img =
+      img(cv::Range(crop_y1, crop_y2 + 1), cv::Range(crop_x1, crop_x2 + 1));
+
+  center.clear();
+  center.emplace_back((crop_x1 + crop_x2) / 2);
+  center.emplace_back((crop_y1 + crop_y2) / 2);
+
+  scale.clear();
+  scale.emplace_back((crop_x2 - crop_x1));
+  scale.emplace_back((crop_y2 - crop_y1));
+}
+
+bool CheckDynamicInput(const std::vector<cv::Mat>& imgs) {
+  if (imgs.size() == 1) return false;
+
+  int h = imgs.at(0).rows;
+  int w = imgs.at(0).cols;
+  for (int i = 1; i < imgs.size(); ++i) {
+    int hi = imgs.at(i).rows;
+    int wi = imgs.at(i).cols;
+    if (hi != h || wi != w) {
+      return true;
+    }
+  }
+  return false;
+}
+
+std::vector<cv::Mat> PadBatch(const std::vector<cv::Mat>& imgs) {
+  std::vector<cv::Mat> out_imgs;
+  int max_h = 0;
+  int max_w = 0;
+  int rh = 0;
+  int rw = 0;
+  // find max_h and max_w in batch
+  for (int i = 0; i < imgs.size(); ++i) {
+    rh = imgs.at(i).rows;
+    rw = imgs.at(i).cols;
+    if (rh > max_h) max_h = rh;
+    if (rw > max_w) max_w = rw;
+  }
+  for (int i = 0; i < imgs.size(); ++i) {
+    cv::Mat im = imgs.at(i);
+    cv::copyMakeBorder(im,
+                       im,
+                       0,
+                       max_h - imgs.at(i).rows,
+                       0,
+                       max_w - imgs.at(i).cols,
+                       cv::BORDER_CONSTANT,
+                       cv::Scalar(0));
+    out_imgs.push_back(im);
+  }
+  return out_imgs;
+}
+
+} // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/tracker.cc b/PaddleDetection-release-2.6/deploy/cpp/src/tracker.cc
new file mode 100644
index 0000000000000000000000000000000000000000..f40cb0dd699a4687f4f77714e4bc5ae5416141f6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/src/tracker.cc
@@ -0,0 +1,333 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// The code is based on:
+// https://github.com/CnybTseng/JDE/blob/master/platforms/common/jdetracker.cpp
+// The copyright of CnybTseng/JDE is as follows:
+// MIT License
+
+#include <algorithm>
+#include <cfloat>
+#include <map>
+#include <vector>
+
+#include "include/lapjv.h"
+#include "include/tracker.h"
+
+#define mat2vec4f(m) cv::Vec4f(*m.ptr<float>(0,0), *m.ptr<float>(0,1), *m.ptr<float>(0,2), *m.ptr<float>(0,3))
+
+namespace PaddleDetection {
+
+static std::map<int, float> chi2inv95 = {
+    {1, 3.841459f},
+    {2, 5.991465f},
+    {3, 7.814728f},
+    {4, 9.487729f},
+    {5, 11.070498f},
+    {6, 12.591587f},
+    {7, 14.067140f},
+    {8, 15.507313f},
+    {9, 16.918978f}
+};
+
+JDETracker *JDETracker::me = new JDETracker;
+
+JDETracker *JDETracker::instance(void)
+{
+    return me;
+}
+
+JDETracker::JDETracker(void) : timestamp(0), max_lost_time(30), lambda(0.98f), det_thresh(0.3f)
+{
+}
+
+bool JDETracker::update(const cv::Mat &dets, const cv::Mat &emb, std::vector<Track> &tracks)
+{
+    ++timestamp;
+    TrajectoryPool candidates(dets.rows);
+    for (int i = 0; i < dets.rows; ++i)
+    {
+        float score = *dets.ptr<float>(i, 1);
+        const cv::Mat &ltrb_ = dets(cv::Rect(2, i, 4, 1));
+        cv::Vec4f ltrb = mat2vec4f(ltrb_);
+        const cv::Mat &embedding = emb(cv::Rect(0, i, emb.cols, 1));
+        candidates[i] = Trajectory(ltrb, score, embedding);
+    }
+
+    TrajectoryPtrPool tracked_trajectories;
+    TrajectoryPtrPool unconfirmed_trajectories;
+    for (size_t i = 0; i < this->tracked_trajectories.size(); ++i)
+    {
+        if (this->tracked_trajectories[i].is_activated)
+            tracked_trajectories.push_back(&this->tracked_trajectories[i]);
+        else
+            unconfirmed_trajectories.push_back(&this->tracked_trajectories[i]);
+    }
+
+    TrajectoryPtrPool trajectory_pool = tracked_trajectories + this->lost_trajectories;
+
+    for (size_t i = 0; i < trajectory_pool.size(); ++i)
+        trajectory_pool[i]->predict();
+
+    Match matches;
+    std::vector<int> mismatch_row;
+    std::vector<int> mismatch_col;
+
+    cv::Mat cost = motion_distance(trajectory_pool, candidates);
+    linear_assignment(cost, 0.7f, matches, mismatch_row, mismatch_col);
+
+    MatchIterator miter;
+    TrajectoryPtrPool activated_trajectories;
+    TrajectoryPtrPool retrieved_trajectories;
+
+    for (miter = matches.begin(); miter != matches.end(); miter++)
+    {
+        Trajectory *pt = trajectory_pool[miter->first];
+        Trajectory &ct = candidates[miter->second];
+        if (pt->state == Tracked)
+        {
+            pt->update(ct, timestamp);
+            activated_trajectories.push_back(pt);
+        }
+        else
+        {
+            pt->reactivate(ct, timestamp);
+            retrieved_trajectories.push_back(pt);
+        }
+    }
+
+    TrajectoryPtrPool next_candidates(mismatch_col.size());
+    for (size_t i = 0; i < mismatch_col.size(); ++i)
+        next_candidates[i] = &candidates[mismatch_col[i]];
+
+    TrajectoryPtrPool next_trajectory_pool;
+    for (size_t i = 0; i < mismatch_row.size(); ++i)
+    {
+        int j = mismatch_row[i];
+        if (trajectory_pool[j]->state == Tracked)
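+            // only trajectories still in the Tracked state take part in the
+            // fallback IoU matching stage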
+            next_trajectory_pool.push_back(trajectory_pool[j]);
+    }
+
+    cost = iou_distance(next_trajectory_pool, next_candidates);
+    linear_assignment(cost, 0.5f, matches, mismatch_row, mismatch_col);
+
+    for (miter = matches.begin(); miter != matches.end(); miter++)
+    {
+        Trajectory *pt = next_trajectory_pool[miter->first];
+        Trajectory *ct = next_candidates[miter->second];
+        if (pt->state == Tracked)
+        {
+            pt->update(*ct, timestamp);
+            activated_trajectories.push_back(pt);
+        }
+        else
+        {
+            pt->reactivate(*ct, timestamp);
+            retrieved_trajectories.push_back(pt);
+        }
+    }
+
+    TrajectoryPtrPool lost_trajectories;
+    for (size_t i = 0; i < mismatch_row.size(); ++i)
+    {
+        Trajectory *pt = next_trajectory_pool[mismatch_row[i]];
+        if (pt->state != Lost)
+        {
+            pt->mark_lost();
+            lost_trajectories.push_back(pt);
+        }
+    }
+
+    TrajectoryPtrPool nnext_candidates(mismatch_col.size());
+    for (size_t i = 0; i < mismatch_col.size(); ++i)
+        nnext_candidates[i] = next_candidates[mismatch_col[i]];
+    cost = iou_distance(unconfirmed_trajectories, nnext_candidates);
+    linear_assignment(cost, 0.7f, matches, mismatch_row, mismatch_col);
+
+    for (miter = matches.begin(); miter != matches.end(); miter++)
+    {
+        unconfirmed_trajectories[miter->first]->update(*nnext_candidates[miter->second], timestamp);
+        activated_trajectories.push_back(unconfirmed_trajectories[miter->first]);
+    }
+
+    TrajectoryPtrPool removed_trajectories;
+
+    for (size_t i = 0; i < mismatch_row.size(); ++i)
+    {
+        unconfirmed_trajectories[mismatch_row[i]]->mark_removed();
+        removed_trajectories.push_back(unconfirmed_trajectories[mismatch_row[i]]);
+    }
+
+    for (size_t i = 0; i < mismatch_col.size(); ++i)
+    {
+        if (nnext_candidates[mismatch_col[i]]->score < det_thresh) continue;
+        nnext_candidates[mismatch_col[i]]->activate(timestamp);
+        activated_trajectories.push_back(nnext_candidates[mismatch_col[i]]);
+    }
+
+    for (size_t i = 0; i < this->lost_trajectories.size(); ++i)
+    {
+        Trajectory &lt = this->lost_trajectories[i];
+        if (timestamp - lt.timestamp > max_lost_time)
+        {
+            lt.mark_removed();
+            removed_trajectories.push_back(&lt);
+        }
+    }
+
+    TrajectoryPoolIterator piter;
+    for (piter = this->tracked_trajectories.begin(); piter != this->tracked_trajectories.end(); )
+    {
+        if (piter->state != Tracked)
+            piter = this->tracked_trajectories.erase(piter);
+        else
+            ++piter;
+    }
+
+    this->tracked_trajectories += activated_trajectories;
+    this->tracked_trajectories += retrieved_trajectories;
+
+    this->lost_trajectories -= this->tracked_trajectories;
+    this->lost_trajectories += lost_trajectories;
+    this->lost_trajectories -= this->removed_trajectories;
+    this->removed_trajectories += removed_trajectories;
+    remove_duplicate_trajectory(this->tracked_trajectories, this->lost_trajectories);
+
+    tracks.clear();
+    for (size_t i = 0; i < this->tracked_trajectories.size(); ++i)
+    {
+        if (this->tracked_trajectories[i].is_activated)
+        {
+            Track track = {
+                .id = this->tracked_trajectories[i].id,
+                .score = this->tracked_trajectories[i].score,
+                .ltrb = this->tracked_trajectories[i].ltrb};
+            tracks.push_back(track);
+        }
+    }
+    return true;
+}
+
+cv::Mat JDETracker::motion_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b)
+{
+    if (0 == a.size() || 0 == b.size())
+        return cv::Mat(a.size(), b.size(), CV_32F);
+
+    cv::Mat edists = embedding_distance(a, b);
+    cv::Mat mdists = mahalanobis_distance(a, b);
+    cv::Mat fdists = lambda * edists + (1 - lambda) * mdists;
+
+    const float gate_thresh = chi2inv95[4];
+    for (int i = 0; i < fdists.rows; ++i)
+    {
+        for (int j = 0; j < fdists.cols; ++j)
+        {
+            if (*mdists.ptr<float>(i, j) > gate_thresh)
+                *fdists.ptr<float>(i, j) = FLT_MAX;
+        }
+    }
+
+    return fdists;
+}
+
+void JDETracker::linear_assignment(const cv::Mat &cost, float cost_limit, Match &matches,
+    std::vector<int> &mismatch_row, std::vector<int> &mismatch_col)
+{
+    matches.clear();
+    mismatch_row.clear();
+    mismatch_col.clear();
+    if (cost.empty())
+    {
+        for (int i = 0; i < cost.rows; ++i)
+            mismatch_row.push_back(i);
+        for (int i = 0; i < cost.cols; ++i)
+            mismatch_col.push_back(i);
+        return;
+    }
+
+    float opt = 0;
+    cv::Mat x(cost.rows, 1, CV_32S);
+    cv::Mat y(cost.cols, 1, CV_32S);
+
+    lapjv_internal(cost, true, cost_limit,
+        (int *)x.data, (int *)y.data);
+
+    for (int i = 0; i < x.rows; ++i)
+    {
+        int j = *x.ptr<int>(i);
+        if (j >= 0)
+            matches.insert({i, j});
+        else
+            mismatch_row.push_back(i);
+    }
+
+    for (int i = 0; i < y.rows; ++i)
+    {
+        int j = *y.ptr<int>(i);
+        if (j < 0)
+            mismatch_col.push_back(i);
+    }
+
+    return;
+}
+
+void JDETracker::remove_duplicate_trajectory(TrajectoryPool &a, TrajectoryPool &b, float iou_thresh)
+{
+    if (0 == a.size() || 0 == b.size())
+        return;
+
+    cv::Mat dist = iou_distance(a, b);
+    cv::Mat mask = dist < iou_thresh;
+    std::vector<cv::Point> idx;
+    cv::findNonZero(mask, idx);
+
+    std::vector<int> da;
+    std::vector<int> db;
+    for (size_t i = 0; i < idx.size(); ++i)
+    {
+        int ta = a[idx[i].y].timestamp - a[idx[i].y].starttime;
+        int tb = b[idx[i].x].timestamp - b[idx[i].x].starttime;
+        if (ta > tb)
+            db.push_back(idx[i].x);
+        else
+            da.push_back(idx[i].y);
+    }
+
+    int id = 0;
+    TrajectoryPoolIterator piter;
+    for (piter = a.begin(); piter != a.end(); )
+    {
+        std::vector<int>::iterator iter = find(da.begin(), da.end(), id++);
+        if (iter != da.end())
+            piter = a.erase(piter);
+        else
+            ++piter;
+    }
+
+    id = 0;
+    for (piter = b.begin(); piter != b.end(); )
+    {
+        std::vector<int>::iterator iter = find(db.begin(), db.end(), id++);
+        if (iter != db.end())
+            piter = b.erase(piter);
+        else
+            ++piter;
+    }
+}
+
+} // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/trajectory.cc b/PaddleDetection-release-2.6/deploy/cpp/src/trajectory.cc
new file mode 100644
index 0000000000000000000000000000000000000000..6e69b350fad4fcf43b2ef9cf350c97ce5f8cd884
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/src/trajectory.cc
@@ -0,0 +1,584 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// The code is based on:
+// https://github.com/CnybTseng/JDE/blob/master/platforms/common/trajectory.cpp
+// The copyright of CnybTseng/JDE is as follows:
+// MIT License
+
+#include <algorithm>
+
+#include "include/trajectory.h"
+
+namespace PaddleDetection {
+
+void TKalmanFilter::init(const cv::Mat &measurement)
+{
+    measurement.copyTo(statePost(cv::Rect(0, 0, 1, 4)));
+    statePost(cv::Rect(0, 4, 1, 4)).setTo(0);
+    statePost.copyTo(statePre);
+
+    float varpos = 2 * std_weight_position * (*measurement.ptr<float>(3));
+    varpos *= varpos;
+    float varvel = 10 * std_weight_velocity * (*measurement.ptr<float>(3));
+    varvel *= varvel;
+
+    errorCovPost.setTo(0);
+    *errorCovPost.ptr<float>(0, 0) = varpos;
+    *errorCovPost.ptr<float>(1, 1) = varpos;
+    *errorCovPost.ptr<float>(2, 2) = 1e-4f;
+    *errorCovPost.ptr<float>(3, 3) = varpos;
+    *errorCovPost.ptr<float>(4, 4) = varvel;
+    *errorCovPost.ptr<float>(5, 5) = varvel;
+    *errorCovPost.ptr<float>(6, 6) = 1e-10f;
+    *errorCovPost.ptr<float>(7, 7) = varvel;
+    errorCovPost.copyTo(errorCovPre);
+}
+
+const cv::Mat &TKalmanFilter::predict()
+{
+    float varpos = std_weight_position * (*statePre.ptr<float>(3));
+    varpos *= varpos;
+    float varvel = std_weight_velocity * (*statePre.ptr<float>(3));
+    varvel *= varvel;
+
+    processNoiseCov.setTo(0);
+    *processNoiseCov.ptr<float>(0, 0) = varpos;
+    *processNoiseCov.ptr<float>(1, 1) = varpos;
+    *processNoiseCov.ptr<float>(2, 2) = 1e-4f;
+    *processNoiseCov.ptr<float>(3, 3) = varpos;
+    *processNoiseCov.ptr<float>(4, 4) = varvel;
+    *processNoiseCov.ptr<float>(5, 5) = varvel;
+    *processNoiseCov.ptr<float>(6, 6) = 1e-10f;
+    *processNoiseCov.ptr<float>(7, 7) = varvel;
+
+    return cv::KalmanFilter::predict();
+}
+
+const cv::Mat &TKalmanFilter::correct(const cv::Mat &measurement)
+{
+    float varpos = std_weight_position * (*measurement.ptr<float>(3));
+    varpos *= varpos;
+
+    measurementNoiseCov.setTo(0);
+    *measurementNoiseCov.ptr<float>(0, 0) = varpos;
+    *measurementNoiseCov.ptr<float>(1, 1) = varpos;
+    *measurementNoiseCov.ptr<float>(2, 2) = 1e-2f;
+    *measurementNoiseCov.ptr<float>(3, 3) = varpos;
+
+    return cv::KalmanFilter::correct(measurement);
+}
+
+void TKalmanFilter::project(cv::Mat &mean, cv::Mat &covariance) const
+{
+    float varpos = std_weight_position * (*statePost.ptr<float>(3));
+    varpos *= varpos;
+
+    cv::Mat measurementNoiseCov_ = cv::Mat::eye(4, 4, CV_32F);
+    *measurementNoiseCov_.ptr<float>(0, 0) = varpos;
+    *measurementNoiseCov_.ptr<float>(1, 1) = varpos;
+    *measurementNoiseCov_.ptr<float>(2, 2) = 1e-2f;
+    *measurementNoiseCov_.ptr<float>(3, 3) = varpos;
+
+    mean = measurementMatrix * statePost;
+    cv::Mat temp = measurementMatrix * errorCovPost;
+    gemm(temp, measurementMatrix, 1, measurementNoiseCov_, 1, covariance, cv::GEMM_2_T);
+}
+
+int Trajectory::count = 0;
+
+const cv::Mat &Trajectory::predict(void)
+{
+    if (state != Tracked)
+        *cv::KalmanFilter::statePost.ptr<float>(7) = 0;
+    return TKalmanFilter::predict();
+}
+
+void Trajectory::update(Trajectory &traj, int timestamp_, bool update_embedding_)
+{
+    timestamp = timestamp_;
+    ++length;
+    ltrb = traj.ltrb;
+    xyah = traj.xyah;
+    TKalmanFilter::correct(cv::Mat(traj.xyah));
+    state = Tracked;
+    is_activated = true;
+    score = traj.score;
+    if (update_embedding_)
+        update_embedding(traj.current_embedding);
+}
+
+void Trajectory::activate(int timestamp_)
+{
+    id = next_id();
+    TKalmanFilter::init(cv::Mat(xyah));
+    length = 0;
+    state = Tracked;
+    if (timestamp_ == 1) {
+        is_activated = true;
+    }
+    timestamp = timestamp_;
+    starttime = timestamp_;
+}
+
+void Trajectory::reactivate(Trajectory &traj, int timestamp_, bool newid)
+{
+    TKalmanFilter::correct(cv::Mat(traj.xyah));
+    update_embedding(traj.current_embedding);
+    length = 0;
+    state = Tracked;
+    is_activated = true;
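+    // a previously lost trajectory that re-matches a detection resumes as an
+    // activated track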
+    timestamp = timestamp_;
+    if (newid)
+        id = next_id();
+}
+
+void Trajectory::update_embedding(const cv::Mat &embedding)
+{
+    current_embedding = embedding / cv::norm(embedding);
+    if (smooth_embedding.empty())
+    {
+        smooth_embedding = current_embedding;
+    }
+    else
+    {
+        smooth_embedding = eta * smooth_embedding + (1 - eta) * current_embedding;
+    }
+    smooth_embedding = smooth_embedding / cv::norm(smooth_embedding);
+}
+
+TrajectoryPool operator+(const TrajectoryPool &a, const TrajectoryPool &b)
+{
+    TrajectoryPool sum;
+    sum.insert(sum.end(), a.begin(), a.end());
+
+    std::vector<int> ids(a.size());
+    for (size_t i = 0; i < a.size(); ++i)
+        ids[i] = a[i].id;
+
+    for (size_t i = 0; i < b.size(); ++i)
+    {
+        std::vector<int>::iterator iter = find(ids.begin(), ids.end(), b[i].id);
+        if (iter == ids.end())
+        {
+            sum.push_back(b[i]);
+            ids.push_back(b[i].id);
+        }
+    }
+
+    return sum;
+}
+
+TrajectoryPool operator+(const TrajectoryPool &a, const TrajectoryPtrPool &b)
+{
+    TrajectoryPool sum;
+    sum.insert(sum.end(), a.begin(), a.end());
+
+    std::vector<int> ids(a.size());
+    for (size_t i = 0; i < a.size(); ++i)
+        ids[i] = a[i].id;
+
+    for (size_t i = 0; i < b.size(); ++i)
+    {
+        std::vector<int>::iterator iter = find(ids.begin(), ids.end(), b[i]->id);
+        if (iter == ids.end())
+        {
+            sum.push_back(*b[i]);
+            ids.push_back(b[i]->id);
+        }
+    }
+
+    return sum;
+}
+
+TrajectoryPool &operator+=(TrajectoryPool &a, const TrajectoryPtrPool &b)
+{
+    std::vector<int> ids(a.size());
+    for (size_t i = 0; i < a.size(); ++i)
+        ids[i] = a[i].id;
+
+    for (size_t i = 0; i < b.size(); ++i)
+    {
+        if (b[i]->smooth_embedding.empty())
+            continue;
+        std::vector<int>::iterator iter = find(ids.begin(), ids.end(), b[i]->id);
+        if (iter == ids.end())
+        {
+            a.push_back(*b[i]);
+            ids.push_back(b[i]->id);
+        }
+    }
+
+    return a;
+}
+
+TrajectoryPool operator-(const TrajectoryPool &a, const TrajectoryPool &b)
+{
+    TrajectoryPool dif;
+    std::vector<int> ids(b.size());
+    for (size_t i = 0; i < b.size(); ++i)
+        ids[i] = b[i].id;
+
+    for (size_t i = 0; i < a.size(); ++i)
+    {
+        std::vector<int>::iterator iter = find(ids.begin(), ids.end(), a[i].id);
+        if (iter == ids.end())
+            dif.push_back(a[i]);
+    }
+
+    return dif;
+}
+
+TrajectoryPool &operator-=(TrajectoryPool &a, const TrajectoryPool &b)
+{
+    std::vector<int> ids(b.size());
+    for (size_t i = 0; i < b.size(); ++i)
+        ids[i] = b[i].id;
+
+    TrajectoryPoolIterator piter;
+    for (piter = a.begin(); piter != a.end(); )
+    {
+        std::vector<int>::iterator iter = find(ids.begin(), ids.end(), piter->id);
+        if (iter == ids.end())
+            ++piter;
+        else
+            piter = a.erase(piter);
+    }
+
+    return a;
+}
+
+TrajectoryPtrPool operator+(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b)
+{
+    TrajectoryPtrPool sum;
+    sum.insert(sum.end(), a.begin(), a.end());
+
+    std::vector<int> ids(a.size());
+    for (size_t i = 0; i < a.size(); ++i)
+        ids[i] = a[i]->id;
+
+    for (size_t i = 0; i < b.size(); ++i)
+    {
+        std::vector<int>::iterator iter = find(ids.begin(), ids.end(), b[i]->id);
+        if (iter == ids.end())
+        {
+            sum.push_back(b[i]);
+            ids.push_back(b[i]->id);
+        }
+    }
+
+    return sum;
+}
+
+TrajectoryPtrPool operator+(const TrajectoryPtrPool &a, TrajectoryPool &b)
+{
+    TrajectoryPtrPool sum;
+    sum.insert(sum.end(), a.begin(), a.end());
+
+    std::vector<int> ids(a.size());
+    for (size_t i = 0; i < a.size(); ++i)
+        ids[i] = a[i]->id;
+
+    for (size_t i = 0; i < b.size(); ++i)
+    {
+        std::vector<int>::iterator iter = find(ids.begin(), ids.end(), b[i].id);
+        if (iter == ids.end())
+        {
+            sum.push_back(&b[i]);
+            ids.push_back(b[i].id);
+        }
+    }
+
+    return sum;
+}
+
+cv::Mat embedding_distance(const TrajectoryPool &a, const TrajectoryPool &b)
+{
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      cv::Mat u = a[i].smooth_embedding;
+      cv::Mat v = b[j].smooth_embedding;
+      double uv = u.dot(v);
+      double uu = u.dot(u);
+      double vv = v.dot(v);
+      double dist = std::abs(1. - uv / std::sqrt(uu * vv));
+      // double dist = cv::norm(a[i].smooth_embedding, b[j].smooth_embedding, cv::NORM_L2);
+      distsi[j] = static_cast<float>(std::max(std::min(dist, 2.), 0.));
+    }
+  }
+  return dists;
+}
+
+cv::Mat embedding_distance(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b)
+{
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      // double dist = cv::norm(a[i]->smooth_embedding, b[j]->smooth_embedding, cv::NORM_L2);
+      // distsi[j] = static_cast<float>(dist);
+      cv::Mat u = a[i]->smooth_embedding;
+      cv::Mat v = b[j]->smooth_embedding;
+      double uv = u.dot(v);
+      double uu = u.dot(u);
+      double vv = v.dot(v);
+      double dist = std::abs(1. - uv / std::sqrt(uu * vv));
+      distsi[j] = static_cast<float>(std::max(std::min(dist, 2.), 0.));
+    }
+  }
+
+  return dists;
+}
+
+cv::Mat embedding_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b)
+{
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      // double dist = cv::norm(a[i]->smooth_embedding, b[j].smooth_embedding, cv::NORM_L2);
+      // distsi[j] = static_cast<float>(dist);
+      cv::Mat u = a[i]->smooth_embedding;
+      cv::Mat v = b[j].smooth_embedding;
+      double uv = u.dot(v);
+      double uu = u.dot(u);
+      double vv = v.dot(v);
+      double dist = std::abs(1. - uv / std::sqrt(uu * vv));
+      distsi[j] = static_cast<float>(std::max(std::min(dist, 2.), 0.));
+    }
+  }
+
+  return dists;
+}
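+
+// The mahalanobis_distance() overloads below gate candidate measurements
+// against each trajectory's Kalman state: project() yields the predicted
+// measurement mean and covariance S, and each entry of the returned matrix
+// is the squared Mahalanobis distance (x - mu)^T * S^-1 * (x - mu) between
+// trajectory i and measurement j (cv::Mahalanobis returns the unsquared
+// distance, hence the dist * dist below).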
+cv::Mat mahalanobis_distance(const TrajectoryPool &a, const TrajectoryPool &b)
+{
+  std::vector<cv::Mat> means(a.size());
+  std::vector<cv::Mat> icovariances(a.size());
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    cv::Mat covariance;
+    a[i].project(means[i], covariance);
+    cv::invert(covariance, icovariances[i]);
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      const cv::Mat x(b[j].xyah);
+      float dist = static_cast<float>(cv::Mahalanobis(x, means[i], icovariances[i]));
+      distsi[j] = dist * dist;
+    }
+  }
+
+  return dists;
+}
+
+cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b)
+{
+  std::vector<cv::Mat> means(a.size());
+  std::vector<cv::Mat> icovariances(a.size());
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    cv::Mat covariance;
+    a[i]->project(means[i], covariance);
+    cv::invert(covariance, icovariances[i]);
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      const cv::Mat x(b[j]->xyah);
+      float dist = static_cast<float>(cv::Mahalanobis(x, means[i], icovariances[i]));
+      distsi[j] = dist * dist;
+    }
+  }
+
+  return dists;
+}
+
+cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b)
+{
+  std::vector<cv::Mat> means(a.size());
+  std::vector<cv::Mat> icovariances(a.size());
+
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    cv::Mat covariance;
+    a[i]->project(means[i], covariance);
+    cv::invert(covariance, icovariances[i]);
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      const cv::Mat x(b[j].xyah);
+      float dist = static_cast<float>(cv::Mahalanobis(x, means[i], icovariances[i]));
+      distsi[j] = dist * dist;
+    }
+  }
+
+  return dists;
+}
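+
+// The iou_distance() overloads below turn box overlap into a matching cost:
+//   d = 1 - IoU = 1 - inter / (area_a + area_b - inter),
+// so identical boxes cost 0 and disjoint boxes cost 1. For example, two
+// unit squares offset by half their width overlap with inter = 0.5, giving
+// IoU = 0.5 / 1.5 = 1/3 and a cost of 2/3. Boxes are stored as ltrb
+// (left, top, right, bottom).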
+static inline float calc_inter_area(const cv::Vec4f &a, const cv::Vec4f &b)
+{
+  if (a[2] < b[0] || a[0] > b[2] || a[3] < b[1] || a[1] > b[3])
+    return 0.f;
+
+  float w = std::min(a[2], b[2]) - std::max(a[0], b[0]);
+  float h = std::min(a[3], b[3]) - std::max(a[1], b[1]);
+  return w * h;
+}
+
+cv::Mat iou_distance(const TrajectoryPool &a, const TrajectoryPool &b)
+{
+  std::vector<float> areaa(a.size());
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float w = a[i].ltrb[2] - a[i].ltrb[0];
+    float h = a[i].ltrb[3] - a[i].ltrb[1];
+    areaa[i] = w * h;
+  }
+
+  std::vector<float> areab(b.size());
+  for (size_t j = 0; j < b.size(); ++j)
+  {
+    float w = b[j].ltrb[2] - b[j].ltrb[0];
+    float h = b[j].ltrb[3] - b[j].ltrb[1];
+    areab[j] = w * h;
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    const cv::Vec4f &boxa = a[i].ltrb;
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      const cv::Vec4f &boxb = b[j].ltrb;
+      float inters = calc_inter_area(boxa, boxb);
+      distsi[j] = 1.f - inters / (areaa[i] + areab[j] - inters);
+    }
+  }
+
+  return dists;
+}
+
+cv::Mat iou_distance(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b)
+{
+  std::vector<float> areaa(a.size());
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float w = a[i]->ltrb[2] - a[i]->ltrb[0];
+    float h = a[i]->ltrb[3] - a[i]->ltrb[1];
+    areaa[i] = w * h;
+  }
+
+  std::vector<float> areab(b.size());
+  for (size_t j = 0; j < b.size(); ++j)
+  {
+    float w = b[j]->ltrb[2] - b[j]->ltrb[0];
+    float h = b[j]->ltrb[3] - b[j]->ltrb[1];
+    areab[j] = w * h;
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    const cv::Vec4f &boxa = a[i]->ltrb;
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      const cv::Vec4f &boxb = b[j]->ltrb;
+      float inters = calc_inter_area(boxa, boxb);
+      distsi[j] = 1.f - inters / (areaa[i] + areab[j] - inters);
+    }
+  }
+
+  return dists;
+}
+
+cv::Mat iou_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b)
+{
+  std::vector<float> areaa(a.size());
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    float w = a[i]->ltrb[2] - a[i]->ltrb[0];
+    float h = a[i]->ltrb[3] - a[i]->ltrb[1];
+    areaa[i] = w * h;
+  }
+
+  std::vector<float> areab(b.size());
+  for (size_t j = 0; j < b.size(); ++j)
+  {
+    float w = b[j].ltrb[2] - b[j].ltrb[0];
+    float h = b[j].ltrb[3] - b[j].ltrb[1];
+    areab[j] = w * h;
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i)
+  {
+    const cv::Vec4f &boxa = a[i]->ltrb;
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j)
+    {
+      const cv::Vec4f &boxb = b[j].ltrb;
+      float inters = calc_inter_area(boxa, boxb);
+      distsi[j] = 1.f - inters / (areaa[i] + areab[j] - inters);
+    }
+  }
+
+  return dists;
+}
+
+} // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/cpp/src/utils.cc b/PaddleDetection-release-2.6/deploy/cpp/src/utils.cc
new file mode 100644
index 0000000000000000000000000000000000000000..7b4731cd9e25b3536417ade20d3f9ce5089755fd
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/cpp/src/utils.cc
@@ -0,0 +1,49 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "include/utils.h"
+
+namespace PaddleDetection {
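+
+// nms() below is the classic greedy suppression: boxes are sorted by
+// descending confidence, then any box whose IoU with a higher-scoring
+// surviving box reaches nms_threshold is erased in place. Areas and
+// intersections use the inclusive "+1 pixel" convention.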
+void nms(std::vector<ObjectResult> &input_boxes, float nms_threshold) {
+  std::sort(input_boxes.begin(),
+            input_boxes.end(),
+            [](ObjectResult a, ObjectResult b) { return a.confidence > b.confidence; });
+  std::vector<float> vArea(input_boxes.size());
+  for (int i = 0; i < int(input_boxes.size()); ++i) {
+    vArea[i] = (input_boxes.at(i).rect[2] - input_boxes.at(i).rect[0] + 1)
+               * (input_boxes.at(i).rect[3] - input_boxes.at(i).rect[1] + 1);
+  }
+  for (int i = 0; i < int(input_boxes.size()); ++i) {
+    for (int j = i + 1; j < int(input_boxes.size());) {
+      float xx1 = (std::max)(input_boxes[i].rect[0], input_boxes[j].rect[0]);
+      float yy1 = (std::max)(input_boxes[i].rect[1], input_boxes[j].rect[1]);
+      float xx2 = (std::min)(input_boxes[i].rect[2], input_boxes[j].rect[2]);
+      float yy2 = (std::min)(input_boxes[i].rect[3], input_boxes[j].rect[3]);
+      float w = (std::max)(float(0), xx2 - xx1 + 1);
+      float h = (std::max)(float(0), yy2 - yy1 + 1);
+      float inter = w * h;
+      float ovr = inter / (vArea[i] + vArea[j] - inter);
+      if (ovr >= nms_threshold) {
+        input_boxes.erase(input_boxes.begin() + j);
+        vArea.erase(vArea.begin() + j);
+      } else {
+        j++;
+      }
+    }
+  }
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/README.md b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d470dccffe7c9927eac6946d3ee47ea96c346a56
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/README.md
@@ -0,0 +1,99 @@
+# Export ONNX Model
+## Download pretrained paddle models
+
+* [ppyoloe-s](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams)
+* [ppyoloe-m](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams)
+* [ppyoloe-l](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams)
+* [ppyoloe-x](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams)
+* [ppyoloe-s-400e](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams)
+
+
+## Export paddle model for deployment
+
+```shell
+python ./tools/export_model.py \
+    -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
+    -o weights=ppyoloe_crn_s_300e_coco.pdparams \
+    trt=True \
+    exclude_nms=True \
+    TestReader.inputs_def.image_shape=[3,640,640] \
+    --output_dir ./
+
+# if you want to try the ppyoloe-s-400e model
+python ./tools/export_model.py \
+    -c configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml \
+    -o weights=ppyoloe_crn_s_400e_coco.pdparams \
+    trt=True \
+    exclude_nms=True \
+    TestReader.inputs_def.image_shape=[3,640,640] \
+    --output_dir ./
+```
+
+## Check requirements
+```shell
+pip install "onnx>=1.10.0"
+pip install paddle2onnx
+pip install onnx-simplifier
+pip install onnx-graphsurgeon --index-url https://pypi.ngc.nvidia.com
+# install cuda-python only if you use the cuda-python infer script
+pip install cuda-python
+# install cupy only if you use the cupy infer script
+pip install cupy-cuda117  # builds for cuda110 through cuda117 are available
+```
+
+## Export script
+```shell
+python ./deploy/end2end_ppyoloe/end2end.py \
+    --model-dir ppyoloe_crn_s_300e_coco \
+    --save-file ppyoloe_crn_s_300e_coco.onnx \
+    --opset 11 \
+    --batch-size 1 \
+    --topk-all 100 \
+    --iou-thres 0.6 \
+    --conf-thres 0.4
+# if you want to try the ppyoloe-s-400e model
+python ./deploy/end2end_ppyoloe/end2end.py \
+    --model-dir ppyoloe_crn_s_400e_coco \
+    --save-file ppyoloe_crn_s_400e_coco.onnx \
+    --opset 11 \
+    --batch-size 1 \
+    --topk-all 100 \
+    --iou-thres 0.6 \
+    --conf-thres 0.4
+```
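+
+After the export finishes, it can help to sanity-check the ONNX graph before building an
+engine. A minimal sketch (it assumes `onnx` is installed and reuses the file name from the
+command above; the four output names are the ones `end2end.py` attaches to the NMS node):
+
+```python
+import onnx
+
+model = onnx.load('ppyoloe_crn_s_300e_coco.onnx')
+onnx.checker.check_model(model)  # raises if the graph is malformed
+print([o.name for o in model.graph.output])
+# expected: ['num_dets', 'det_boxes', 'det_scores', 'det_classes']
+```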
+#### Description of all arguments
+
+- `--model-dir` : the path of the exported ppyoloe model directory.
+- `--save-file` : the path to save the exported ONNX model.
+- `--opset` : onnx opset version.
+- `--img-size` : image size for exporting ppyoloe.
+- `--batch-size` : batch size for exporting ppyoloe.
+- `--topk-all` : top-k objects kept for every image.
+- `--iou-thres` : iou threshold for the NMS algorithm.
+- `--conf-thres` : confidence threshold for the NMS algorithm.
+
+### TensorRT backend (TensorRT version >= 8.0.0)
+#### TensorRT engine export
+``` shell
+/path/to/trtexec \
+    --onnx=ppyoloe_crn_s_300e_coco.onnx \
+    --saveEngine=ppyoloe_crn_s_300e_coco.engine \
+    --fp16 # build a TensorRT fp16 engine
+# if you want to try the ppyoloe-s-400e model
+/path/to/trtexec \
+    --onnx=ppyoloe_crn_s_400e_coco.onnx \
+    --saveEngine=ppyoloe_crn_s_400e_coco.engine \
+    --fp16 # build a TensorRT fp16 engine
+```
+#### TensorRT image inference
+
+``` shell
+# cuda-python infer script
+python ./deploy/end2end_ppyoloe/cuda-python.py ppyoloe_crn_s_300e_coco.engine
+# cupy infer script
+python ./deploy/end2end_ppyoloe/cupy-python.py ppyoloe_crn_s_300e_coco.engine
+# if you want to try the ppyoloe-s-400e model
+python ./deploy/end2end_ppyoloe/cuda-python.py ppyoloe_crn_s_400e_coco.engine
+# or
+python ./deploy/end2end_ppyoloe/cupy-python.py ppyoloe_crn_s_400e_coco.engine
+```
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/cuda-python.py b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/cuda-python.py
new file mode 100644
index 0000000000000000000000000000000000000000..3c7bd7c84b3eeaa6bea55416d8a5eabd37ac4d33
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/cuda-python.py
@@ -0,0 +1,161 @@
+import sys
+import requests
+import cv2
+import random
+import time
+import numpy as np
+import tensorrt as trt
+from cuda import cudart
+from pathlib import Path
+from collections import OrderedDict, namedtuple
+
+
+def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleup=True, stride=32):
+    # Resize and pad image while meeting stride-multiple constraints
+    shape = im.shape[:2]  # current shape [height, width]
+    if isinstance(new_shape, int):
+        new_shape = (new_shape, new_shape)
+
+    # Scale ratio (new / old)
+    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
+    if not scaleup:  # only scale down, do not scale up (for better val mAP)
+        r = min(r, 1.0)
+
+    # Compute padding
+    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
+    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
+
+    if auto:  # minimum rectangle
+        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
+
+    dw /= 2  # divide padding into 2 sides
+    dh /= 2
+
+    if shape[::-1] != new_unpad:  # resize
+        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
+    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
+    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
+    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
+    return im, r, (dw, dh)
+
+
+w = Path(sys.argv[1])
+
+assert w.exists() and w.suffix in ('.engine', '.plan'), 'Wrong engine path'
+
+names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
+         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
+         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
+         'skis', 'snowboard', 'sports
ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', + 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', + 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', + 'hair drier', 'toothbrush'] +colors = {name: [random.randint(0, 255) for _ in range(3)] for i, name in enumerate(names)} + +url = 'https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/image1.jpg' +file = requests.get(url) +img = cv2.imdecode(np.frombuffer(file.content, np.uint8), 1) + +_, stream = cudart.cudaStreamCreate() + +mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1) +std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1) + +# Infer TensorRT Engine +Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr')) +logger = trt.Logger(trt.Logger.ERROR) +trt.init_libnvinfer_plugins(logger, namespace="") +with open(w, 'rb') as f, trt.Runtime(logger) as runtime: + model = runtime.deserialize_cuda_engine(f.read()) +bindings = OrderedDict() +fp16 = False # default updated below +for index in range(model.num_bindings): + name = model.get_binding_name(index) + dtype = trt.nptype(model.get_binding_dtype(index)) + shape = tuple(model.get_binding_shape(index)) + data = np.empty(shape, dtype=np.dtype(dtype)) + _, data_ptr = cudart.cudaMallocAsync(data.nbytes, stream) + bindings[name] = Binding(name, dtype, shape, data, data_ptr) + if model.binding_is_input(index) and dtype == np.float16: + fp16 = True +binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items()) +context = model.create_execution_context() + +image = img.copy() +image, ratio, dwdh = letterbox(image, auto=False) +image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + +image_copy = image.copy() + +image = image.transpose((2, 0, 1)) +image = np.expand_dims(image, 0) +image = np.ascontiguousarray(image) + +im = image.astype(np.float32) +im /= 255 +im -= mean +im /= std + +_, image_ptr = cudart.cudaMallocAsync(im.nbytes, stream) +cudart.cudaMemcpyAsync(image_ptr, im.ctypes.data, im.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream) + +# warmup for 10 times +for _ in range(10): + tmp = np.random.randn(1, 3, 640, 640).astype(np.float32) + _, tmp_ptr = cudart.cudaMallocAsync(tmp.nbytes, stream) + binding_addrs['image'] = tmp_ptr + context.execute_v2(list(binding_addrs.values())) + +start = time.perf_counter() +binding_addrs['image'] = image_ptr +context.execute_v2(list(binding_addrs.values())) +print(f'Cost {(time.perf_counter() - start) * 1000}ms') + +nums = bindings['num_dets'].data +boxes = bindings['det_boxes'].data +scores = bindings['det_scores'].data +classes = bindings['det_classes'].data + +cudart.cudaMemcpyAsync(nums.ctypes.data, + bindings['num_dets'].ptr, + nums.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) +cudart.cudaMemcpyAsync(boxes.ctypes.data, + bindings['det_boxes'].ptr, + boxes.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) +cudart.cudaMemcpyAsync(scores.ctypes.data, + bindings['det_scores'].ptr, + scores.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) +cudart.cudaMemcpyAsync(classes.ctypes.data, + bindings['det_classes'].ptr, + classes.data.nbytes, + 
cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) + +cudart.cudaStreamSynchronize(stream) +cudart.cudaStreamDestroy(stream) + +for i in binding_addrs.values(): + cudart.cudaFree(i) + +num = int(nums[0][0]) +box_img = boxes[0, :num].round().astype(np.int32) +score_img = scores[0, :num] +clss_img = classes[0, :num] +for i, (box, score, clss) in enumerate(zip(box_img, score_img, clss_img)): + name = names[int(clss)] + color = colors[name] + cv2.rectangle(image_copy, box[:2].tolist(), box[2:].tolist(), color, 2) + cv2.putText(image_copy, name, (int(box[0]), int(box[1]) - 2), cv2.FONT_HERSHEY_SIMPLEX, + 0.75, [225, 255, 255], thickness=2) + +cv2.imshow('Result', cv2.cvtColor(image_copy, cv2.COLOR_RGB2BGR)) +cv2.waitKey(0) diff --git a/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/cupy-python.py b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/cupy-python.py new file mode 100644 index 0000000000000000000000000000000000000000..a66eb77ecf3aa4c76c143050764429a2a06e8ba1 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/cupy-python.py @@ -0,0 +1,131 @@ +import sys +import requests +import cv2 +import random +import time +import numpy as np +import cupy as cp +import tensorrt as trt +from PIL import Image +from collections import OrderedDict, namedtuple +from pathlib import Path + + +def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleup=True, stride=32): + # Resize and pad image while meeting stride-multiple constraints + shape = im.shape[:2] # current shape [height, width] + if isinstance(new_shape, int): + new_shape = (new_shape, new_shape) + + # Scale ratio (new / old) + r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) + if not scaleup: # only scale down, do not scale up (for better val mAP) + r = min(r, 1.0) + + # Compute padding + new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) + dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding + + if auto: # minimum rectangle + dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding + + dw /= 2 # divide padding into 2 sides + dh /= 2 + + if shape[::-1] != new_unpad: # resize + im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR) + top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) + left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) + im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border + return im, r, (dw, dh) + + +names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', + 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', + 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', + 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', + 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', + 'hair drier', 'toothbrush'] +colors = {name: [random.randint(0, 255) for _ in range(3)] for i, name in enumerate(names)} + +url = 'https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/image1.jpg' +file = 
requests.get(url) +img = cv2.imdecode(np.frombuffer(file.content, np.uint8), 1) + +w = Path(sys.argv[1]) + +assert w.exists() and w.suffix in ('.engine', '.plan'), 'Wrong engine path' + +mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1) +std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1) + +mean = cp.asarray(mean) +std = cp.asarray(std) + +# Infer TensorRT Engine +Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr')) +logger = trt.Logger(trt.Logger.INFO) +trt.init_libnvinfer_plugins(logger, namespace="") +with open(w, 'rb') as f, trt.Runtime(logger) as runtime: + model = runtime.deserialize_cuda_engine(f.read()) +bindings = OrderedDict() +fp16 = False # default updated below +for index in range(model.num_bindings): + name = model.get_binding_name(index) + dtype = trt.nptype(model.get_binding_dtype(index)) + shape = tuple(model.get_binding_shape(index)) + data = cp.empty(shape, dtype=cp.dtype(dtype)) + bindings[name] = Binding(name, dtype, shape, data, int(data.data.ptr)) + if model.binding_is_input(index) and dtype == np.float16: + fp16 = True +binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items()) +context = model.create_execution_context() + +image = img.copy() +image, ratio, dwdh = letterbox(image, auto=False) +image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + +image_copy = image.copy() + +image = image.transpose((2, 0, 1)) +image = np.expand_dims(image, 0) +image = np.ascontiguousarray(image) + +im = cp.asarray(image) +im = im.astype(cp.float32) +im /= 255 +im -= mean +im /= std + +# warmup for 10 times +for _ in range(10): + tmp = cp.random.randn(1, 3, 640, 640).astype(cp.float32) + binding_addrs['image'] = int(tmp.data.ptr) + context.execute_v2(list(binding_addrs.values())) + +start = time.perf_counter() +binding_addrs['image'] = int(im.data.ptr) +context.execute_v2(list(binding_addrs.values())) +print(f'Cost {(time.perf_counter() - start) * 1000}ms') + +nums = bindings['num_dets'].data +boxes = bindings['det_boxes'].data +scores = bindings['det_scores'].data +classes = bindings['det_classes'].data + +num = int(nums[0][0]) +box_img = boxes[0, :num].round().astype(cp.int32) +score_img = scores[0, :num] +clss_img = classes[0, :num] +for i, (box, score, clss) in enumerate(zip(box_img, score_img, clss_img)): + name = names[int(clss)] + color = colors[name] + cv2.rectangle(image_copy, box[:2].tolist(), box[2:].tolist(), color, 2) + cv2.putText(image_copy, name, (int(box[0]), int(box[1]) - 2), cv2.FONT_HERSHEY_SIMPLEX, + 0.75, [225, 255, 255], thickness=2) + +cv2.imshow('Result', cv2.cvtColor(image_copy, cv2.COLOR_RGB2BGR)) +cv2.waitKey(0) diff --git a/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/end2end.py b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/end2end.py new file mode 100644 index 0000000000000000000000000000000000000000..fcfbf019a5d5755768e7defd573203a20a020ef7 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/end2end_ppyoloe/end2end.py @@ -0,0 +1,97 @@ +import argparse +import onnx +import onnx_graphsurgeon as gs +import numpy as np + +from pathlib import Path +from paddle2onnx.legacy.command import program2onnx +from collections import OrderedDict + + +def main(opt): + model_dir = Path(opt.model_dir) + save_file = Path(opt.save_file) + assert model_dir.exists() and model_dir.is_dir() + if save_file.is_dir(): + save_file = (save_file / model_dir.stem).with_suffix('.onnx') + elif save_file.is_file() and save_file.suffix != '.onnx': + save_file = save_file.with_suffix('.onnx') + 
input_shape_dict = {'image': [opt.batch_size, 3, *opt.img_size], + 'scale_factor': [opt.batch_size, 2]} + program2onnx(str(model_dir), str(save_file), + 'model.pdmodel', 'model.pdiparams', + opt.opset, input_shape_dict=input_shape_dict) + onnx_model = onnx.load(save_file) + try: + import onnxsim + onnx_model, check = onnxsim.simplify(onnx_model) + assert check, 'assert check failed' + except Exception as e: + print(f'Simplifier failure: {e}') + onnx.checker.check_model(onnx_model) + graph = gs.import_onnx(onnx_model) + graph.fold_constants() + graph.cleanup().toposort() + mul = concat = None + for node in graph.nodes: + if node.op == 'Div' and node.i(0).op == 'Mul': + mul = node.i(0) + if node.op == 'Concat' and node.o().op == 'Reshape' and node.o().o().op == 'ReduceSum': + concat = node + + assert mul.outputs[0].shape[1] == concat.outputs[0].shape[2], 'Something wrong in outputs shape' + + anchors = mul.outputs[0].shape[1] + classes = concat.outputs[0].shape[1] + + scores = gs.Variable(name='scores', shape=[opt.batch_size, anchors, classes], dtype=np.float32) + graph.layer(op='Transpose', name='lastTranspose', + inputs=[concat.outputs[0]], + outputs=[scores], + attrs=OrderedDict(perm=[0, 2, 1])) + + graph.inputs = [graph.inputs[0]] + + attrs = OrderedDict( + plugin_version="1", + background_class=-1, + max_output_boxes=opt.topk_all, + score_threshold=opt.conf_thres, + iou_threshold=opt.iou_thres, + score_activation=False, + box_coding=0, ) + outputs = [gs.Variable("num_dets", np.int32, [opt.batch_size, 1]), + gs.Variable("det_boxes", np.float32, [opt.batch_size, opt.topk_all, 4]), + gs.Variable("det_scores", np.float32, [opt.batch_size, opt.topk_all]), + gs.Variable("det_classes", np.int32, [opt.batch_size, opt.topk_all])] + graph.layer(op='EfficientNMS_TRT', name="batched_nms", + inputs=[mul.outputs[0], scores], + outputs=outputs, + attrs=attrs) + graph.outputs = outputs + graph.cleanup().toposort() + onnx.save(gs.export_onnx(graph), save_file) + + +def parse_opt(): + parser = argparse.ArgumentParser() + parser.add_argument('--model-dir', type=str, + default=None, + help='paddle static model') + parser.add_argument('--save-file', type=str, + default=None, + help='onnx model save path') + parser.add_argument('--opset', type=int, default=11, help='opset version') + parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size') + parser.add_argument('--batch-size', type=int, default=1, help='batch size') + parser.add_argument('--topk-all', type=int, default=100, help='topk objects for every images') + parser.add_argument('--iou-thres', type=float, default=0.45, help='iou threshold for NMS') + parser.add_argument('--conf-thres', type=float, default=0.25, help='conf threshold for NMS') + opt = parser.parse_args() + opt.img_size *= 2 if len(opt.img_size) == 1 else 1 + return opt + + +if __name__ == '__main__': + opt = parse_opt() + main(opt) diff --git a/PaddleDetection-release-2.6/deploy/lite/Makefile b/PaddleDetection-release-2.6/deploy/lite/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..9c439382acfafea440de93bb2f3fa91977ad3891 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/Makefile @@ -0,0 +1,90 @@ +ARM_ABI = arm8#[arm7/arm8] +export ARM_ABI + +ifeq ($(ARM_ABI), arm8) + ARM_PLAT=arm64-v8a +else + ARM_PLAT=armeabi-v7a +endif +${info ARM_ABI: ${ARM_ABI}} +${info ARM_PLAT: ${ARM_PLAT}; option[arm7/arm8]} + +include ../Makefile.def + +LITE_ROOT=../../../ +${info LITE_ROOT: $(abspath ${LITE_ROOT})} + 
+THIRD_PARTY_DIR=third_party +${info THIRD_PARTY_DIR: $(abspath ${THIRD_PARTY_DIR})} + + +OPENCV_VERSION=opencv4.1.0 +OPENCV_LIBS = ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/libs/libopencv_imgcodecs.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/libs/libopencv_imgproc.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/libs/libopencv_core.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/libtegra_hal.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/liblibjpeg-turbo.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/liblibwebp.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/liblibpng.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/liblibjasper.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/liblibtiff.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/libIlmImf.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/libtbb.a \ + ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/3rdparty/libs/libcpufeatures.a + + +LITE_LIBS = -L${LITE_ROOT}/cxx/lib/ -lpaddle_light_api_shared +############################################################### +# How to use one of static libaray: # +# `libpaddle_api_full_bundled.a` # +# `libpaddle_api_light_bundled.a` # +############################################################### +# Note: default use lite's shared library. # +############################################################### +# 1. Comment above line using `libpaddle_light_api_shared.so` +# 2. Undo comment below line using `libpaddle_api_light_bundled.a` +# LITE_LIBS = ${LITE_ROOT}/cxx/lib/libpaddle_api_light_bundled.a + +CXX_LIBS = $(LITE_LIBS) ${OPENCV_LIBS} $(SYSTEM_LIBS) + +LOCAL_DIRSRCS=$(wildcard src/*.cc) +LOCAL_SRCS=$(notdir $(LOCAL_DIRSRCS)) +LOCAL_OBJS=$(patsubst %.cpp, %.o, $(patsubst %.cc, %.o, $(LOCAL_SRCS))) + +JSON_OBJS = json_reader.o json_value.o json_writer.o + +main: $(LOCAL_OBJS) $(JSON_OBJS) fetch_opencv + $(CC) $(SYSROOT_LINK) $(CXXFLAGS_LINK) $(LOCAL_OBJS) $(JSON_OBJS) -o main $(CXX_LIBS) $(LDFLAGS) + +fetch_opencv: + @ test -d ${THIRD_PARTY_DIR} || mkdir ${THIRD_PARTY_DIR} + @ test -e ${THIRD_PARTY_DIR}/${OPENCV_VERSION}.tar.gz || \ + (echo "fetch opencv libs" && \ + wget -P ${THIRD_PARTY_DIR} https://paddle-inference-dist.bj.bcebos.com/${OPENCV_VERSION}.tar.gz) + @ test -d ${THIRD_PARTY_DIR}/${OPENCV_VERSION} || \ + tar -zxf ${THIRD_PARTY_DIR}/${OPENCV_VERSION}.tar.gz -C ${THIRD_PARTY_DIR} + +fetch_json_code: + @ test -d ${THIRD_PARTY_DIR} || mkdir ${THIRD_PARTY_DIR} + @ test -e ${THIRD_PARTY_DIR}/jsoncpp_code.tar.gz || \ + (echo "fetch jsoncpp_code.tar.gz" && \ + wget -P ${THIRD_PARTY_DIR} https://bj.bcebos.com/v1/paddledet/deploy/jsoncpp_code.tar.gz ) + @ test -d ${THIRD_PARTY_DIR}/jsoncpp_code || \ + tar -zxf ${THIRD_PARTY_DIR}/jsoncpp_code.tar.gz -C ${THIRD_PARTY_DIR} + +LOCAL_INCLUDES = -I./ -Iinclude +OPENCV_INCLUDE = -I${THIRD_PARTY_DIR}/${OPENCV_VERSION}/${ARM_PLAT}/include +JSON_INCLUDE = -I${THIRD_PARTY_DIR}/jsoncpp_code/include +CXX_INCLUDES = ${LOCAL_INCLUDES} ${INCLUDES} ${OPENCV_INCLUDE} ${JSON_INCLUDE} -I$(LITE_ROOT)/cxx/include + + +$(LOCAL_OBJS): %.o: src/%.cc fetch_opencv fetch_json_code + $(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -c $< -o $@ + +$(JSON_OBJS): %.o: ${THIRD_PARTY_DIR}/jsoncpp_code/%.cpp fetch_json_code + $(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -c $< -o $@ + +.PHONY: clean fetch_opencv fetch_json_code +clean: + rm -rf 
$(LOCAL_OBJS) $(JSON_OBJS) + rm -f main diff --git a/PaddleDetection-release-2.6/deploy/lite/README.md b/PaddleDetection-release-2.6/deploy/lite/README.md new file mode 100644 index 0000000000000000000000000000000000000000..30447460eb6c4ccdf5c1013d1ea2d631d9073fba --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/README.md @@ -0,0 +1,306 @@ +# Paddle-Lite端侧部署 + +[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)是飞桨轻量化推理引擎,为手机、IOT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。 +本目录提供了PaddleDetection中主要模型在Paddle-Lite上的端到端部署代码。用户可以通过本教程了解如何使用该部分代码,基于Paddle-Lite实现在移动端部署PaddleDetection模型。 + + +## 1. 准备环境 + +### 运行准备 +- 电脑(编译Paddle Lite) +- 安卓手机(armv7或armv8) + +### 1.1 准备交叉编译环境 +交叉编译环境用于编译 Paddle Lite 和 PaddleDetection 的C++ demo。 +支持多种开发环境,不同开发环境的编译流程请参考对应文档,请确保安装完成Java jdk、Android NDK(R17 < version < R21,其他版本以上未做测试)。 +设置NDK_ROOT命令: +```shell +export NDK_ROOT=[YOUR_NDK_PATH]/android-ndk-r17c +``` + + +1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker) +2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux) +3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os) + +### 1.2 准备预测库 + +预测库有两种获取方式: +1. [**建议**]直接从[Paddle-Lite Release](https://github.com/PaddlePaddle/Paddle-Lite/releases)中, 根据设备类型与架构选择对应的预编译库,请注意使用模型FP32/16版本需要与库相对应,库文件的说明请参考[官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)。 + +**注意**:(1) 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)下载的预测库,注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 目前只提供Android端demo,IOS端demo可以参考[Paddle-Lite IOS demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/master/PaddleLite-ios-demo) +(2)PP-PicoDet部署需要Paddle Lite 2.11以上版本。 + + +2. 
编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下(Lite库在不断更新,如若下列命令无效,请以Lite官方repo为主): +```shell +git clone https://github.com/PaddlePaddle/Paddle-Lite.git +cd Paddle-Lite +# 如果使用编译方式,建议使用develop分支编译预测库 +git checkout develop +# FP32 +./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON +# FP16 +./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON --with_arm82_fp16=ON +``` + +**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_options.html)。 + +直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv/`文件夹,通过编译Paddle-Lite得到的预测库位于`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。 +预测库的文件目录如下: + +``` +inference_lite_lib.android.armv8/ +|-- cxx C++ 预测库和头文件 +| |-- include C++ 头文件 +| | |-- paddle_api.h +| | |-- paddle_image_preprocess.h +| | |-- paddle_lite_factory_helper.h +| | |-- paddle_place.h +| | |-- paddle_use_kernels.h +| | |-- paddle_use_ops.h +| | `-- paddle_use_passes.h +| `-- lib C++预测库 +| |-- libpaddle_api_light_bundled.a C++静态库 +| `-- libpaddle_light_api_shared.so C++动态库 +|-- java Java预测库 +| |-- jar +| | `-- PaddlePredictor.jar +| |-- so +| | `-- libpaddle_lite_jni.so +| `-- src +|-- demo C++和Java示例代码 +| |-- cxx C++ 预测库demo, 请将本文档目录下的PaddleDetection相关代码拷贝至该文件夹下执行交叉编译。 +| `-- java Java 预测库demo +``` + +## 2 开始运行 + +### 2.1 模型转换 + +Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,并转换为推理所使用的文件格式。目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。 + +**注意**:如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。 + +#### 2.1.1 安装paddle_lite_opt工具 +安装`paddle_lite_opt`工具有如下两种方法, **请注意**,无论使用哪种方法,请尽量保证`paddle_lite_opt`工具和预测库的版本一致,以避免未知的Bug。 +1. [**建议**]pip安装paddlelite并进行转换 + ```shell + pip install paddlelite + ``` + +2. 
源码编译Paddle-Lite生成`paddle_lite_opt`工具 + + 模型优化需要Paddle-Lite的`opt`可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下: + ```shell + # 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite + git clone https://github.com/PaddlePaddle/Paddle-Lite.git + cd Paddle-Lite + git checkout develop + # 启动编译 + ./lite/tools/build.sh build_optimize_tool + ``` + + 编译完成后,`opt`文件位于`build.opt/lite/api/`下,可通过如下方式查看`opt`的运行选项和使用方式; + ```shell + cd build.opt/lite/api/ + ./opt + ``` + + `opt`的使用方式与参数与上面的`paddle_lite_opt`完全一致。 + +之后使用`paddle_lite_opt`工具可以进行inference模型的转换。`paddle_lite_opt`的部分参数如下: + +|选项|说明| +|-|-| +|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径| +|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径| +|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现,默认为naive_buffer| +|--optimize_out|优化模型的输出路径| +|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm| +| --enable_fp16| true/false,是否使用fp16进行推理。如果开启,需要使用对应fp16的预测库| + +更详细的`paddle_lite_opt`工具使用说明请参考[使用opt转化模型文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html) + +`--model_file`表示inference模型的model文件地址,`--param_file`表示inference模型的param文件地址;`optimize_out`用于指定输出文件的名称(不需要添加`.nb`的后缀)。直接在命令行中运行`paddle_lite_opt`,也可以查看所有参数及其说明。 + + +#### 2.1.2 转换示例 + +下面以PaddleDetection中的 `PicoDet` 模型为例,介绍使用`paddle_lite_opt`完成预训练模型到inference模型,再到Paddle-Lite优化模型的转换。 + +```shell +# 进入PaddleDetection根目录 +cd PaddleDetection_root_path + +# 将预训练模型导出为inference模型 +python tools/export_model.py -c configs/picodet/picodet_s_320_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams --output_dir=output_inference + +# 将inference模型转化为Paddle-Lite优化模型 +# FP32 +paddle_lite_opt --valid_targets=arm --model_file=output_inference/picodet_s_320_coco/model.pdmodel --param_file=output_inference/picodet_s_320_coco/model.pdiparams --optimize_out=output_inference/picodet_s_320_coco/model +# FP16 +paddle_lite_opt --valid_targets=arm --model_file=output_inference/picodet_s_320_coco/model.pdmodel --param_file=output_inference/picodet_s_320_coco/model.pdiparams --optimize_out=output_inference/picodet_s_320_coco/model --enable_fp16=true + +# 将inference模型配置转化为json格式 +python deploy/lite/convert_yml_to_json.py output_inference/picodet_s_320_coco/infer_cfg.yml +``` + +最终在output_inference/picodet_s_320_coco/文件夹下生成`model.nb` 和 `infer_cfg.json`的文件。 + +**注意**:`--optimize_out` 参数为优化后模型的保存路径,无需加后缀`.nb`;`--model_file` 参数为模型结构信息文件的路径,`--param_file` 参数为模型权重信息文件的路径,请注意文件名。 + +### 2.2 与手机联调 + +首先需要进行一些准备工作。 +1. 准备一台arm8的安卓手机,如果编译的预测库是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI=arm7`。 +2. 电脑上安装ADB工具,用于调试。 ADB安装方式如下: + + 2.1. MAC电脑安装ADB: + + ```shell + brew cask install android-platform-tools + ``` + 2.2. Linux安装ADB + ```shell + sudo apt update + sudo apt install -y wget adb + ``` + 2.3. Window安装ADB + + win上安装需要去谷歌的安卓平台下载ADB软件包进行安装:[链接](https://developer.android.com/studio) + +3. 手机连接电脑后,开启手机`USB调试`选项,选择`文件传输`模式,在电脑终端中输入: + +```shell +adb devices +``` +如果有device输出,则表示安装成功,如下所示: +``` +List of devices attached +744be294 device +``` + +4. 
编译lite部署代码生成移动端可执行文件 + +```shell +cd {PadddleDetection_Root} +cd deploy/lite/ + +inference_lite_path=/{lite prediction library path}/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.with_cv/ +mkdir $inference_lite_path/demo/cxx/lite + +cp -r Makefile src/ include/ *runtime_config.json $inference_lite_path/demo/cxx/lite + +cd $inference_lite_path/demo/cxx/lite + +# 执行编译,等待完成后得到可执行文件main +make ARM_ABI=arm8 +#如果是arm7,则执行 make ARM_ABI = arm7 (或者在Makefile中修改该项) + +``` + +5. 准备优化后的模型、预测库文件、测试图像。 + +```shell +mkdir deploy +cp main *runtime_config.json deploy/ +cd deploy +mkdir model_det +mkdir model_keypoint + +# 将优化后的模型、预测库文件、测试图像放置在预测库中的demo/cxx/detection文件夹下 +cp {PadddleDetection_Root}/output_inference/picodet_s_320_coco/model.nb ./model_det/ +cp {PadddleDetection_Root}/output_inference/picodet_s_320_coco/infer_cfg.json ./model_det/ + +# 如果需要关键点模型,则只需操作: +cp {PadddleDetection_Root}/output_inference/hrnet_w32_256x192/model.nb ./model_keypoint/ +cp {PadddleDetection_Root}/output_inference/hrnet_w32_256x192/infer_cfg.json ./model_keypoint/ + +# 将测试图像复制到deploy文件夹中 +cp [your_test_img].jpg ./demo.jpg + +# 将C++预测动态库so文件复制到deploy文件夹中 +cp ../../../cxx/lib/libpaddle_light_api_shared.so ./ +``` + +执行完成后,deploy文件夹下将有如下文件格式: + +``` +deploy/ +|-- model_det/ +| |--model.nb 优化后的检测模型文件 +| |--infer_cfg.json 检测器模型配置文件 +|-- model_keypoint/ +| |--model.nb 优化后的关键点模型文件 +| |--infer_cfg.json 关键点模型配置文件 +|-- main 生成的移动端执行文件 +|-- det_runtime_config.json 目标检测执行时参数配置文件 +|-- keypoint_runtime_config.json 关键点检测执行时参数配置文件 +|-- libpaddle_light_api_shared.so Paddle-Lite库文件 +``` + +**注意:** +* `det_runtime_config.json` 包含了目标检测的超参数,请按需进行修改: + +```shell +{ + "model_dir_det": "./model_det/", #检测器模型路径 + "batch_size_det": 1, #检测预测时batchsize + "threshold_det": 0.5, #检测器输出阈值 + "image_file": "demo.jpg", #测试图片 + "image_dir": "", #测试图片文件夹 + "run_benchmark": true, #性能测试开关 + "cpu_threads": 4 #线程数 +} +``` + +* `keypoint_runtime_config.json` 同时包含了目标检测和关键点检测的超参数,支持Top-Down方案的推理流程,请按需进行修改: +```shell +{ + "model_dir_det": "./model_det/", #检测模型路径 + "batch_size_det": 1, #检测模型预测时batchsize, 存在关键点模型时只能为1 + "threshold_det": 0.5, #检测器输出阈值 + "model_dir_keypoint": "./model_keypoint/", #关键点模型路径(不使用需为空字符) + "batch_size_keypoint": 8, #关键点预测时batchsize + "threshold_keypoint": 0.5, #关键点输出阈值 + "image_file": "demo.jpg", #测试图片 + "image_dir": "", #测试图片文件夹 + "run_benchmark": true, #性能测试开关 + "cpu_threads": 4 #线程数 + "use_dark_decode": true #是否使用DARK解码关键点坐标 +} +``` + +6. 启动调试,上述步骤完成后就可以使用ADB将文件夹 `deploy/` push到手机上运行,步骤如下: + +```shell +# 将上述deploy文件夹push到手机上 +adb push deploy /data/local/tmp/ + +adb shell +cd /data/local/tmp/deploy +export LD_LIBRARY_PATH=/data/local/tmp/deploy:$LD_LIBRARY_PATH + +# 修改权限为可执行 +chmod 777 main +# 以检测为例,执行程序 +./main det_runtime_config.json +``` + +如果对代码做了修改,则需要重新编译并push到手机上。 + +运行效果如下: + +
+(运行效果示例图略)
    + + +## FAQ +Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗? +A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件及其对应模型配置文件`infer_cfg.json`,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。 + +Q2:换一个图测试怎么做? +A2:替换 deploy 下的测试图像为你想要测试的图像,使用 ADB 再次 push 到手机上即可。 diff --git a/PaddleDetection-release-2.6/deploy/lite/convert_yml_to_json.py b/PaddleDetection-release-2.6/deploy/lite/convert_yml_to_json.py new file mode 100644 index 0000000000000000000000000000000000000000..6282c783050b26a9b07e7e96e87cac4711a9d20b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/convert_yml_to_json.py @@ -0,0 +1,14 @@ +import yaml +import json +import sys + +yamlf = sys.argv[1] + +assert yamlf.endswith(".yml") + +with open(yamlf, 'r') as rf: + yaml_data = yaml.safe_load(rf) + +jsonf = yamlf[:-4] + ".json" +with open(jsonf, 'w') as wf: + json.dump(yaml_data, wf, indent=4) diff --git a/PaddleDetection-release-2.6/deploy/lite/det_runtime_config.json b/PaddleDetection-release-2.6/deploy/lite/det_runtime_config.json new file mode 100644 index 0000000000000000000000000000000000000000..a1bc4ec3bdcf226f8c31caf2ff7b00e7b832050d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/det_runtime_config.json @@ -0,0 +1,10 @@ +{ + "model_dir_det": "./model_det/", + "batch_size_det": 1, + "threshold_det": 0.5, + "image_file": "./demo.jpg", + "image_dir": "", + "run_benchmark": false, + "cpu_threads": 4 + } + \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/lite/include/config_parser.h b/PaddleDetection-release-2.6/deploy/lite/include/config_parser.h new file mode 100644 index 0000000000000000000000000000000000000000..60d94c69e3b17aa9afea5dfb90e286f44d63f0bc --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/include/config_parser.h @@ -0,0 +1,104 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once +#include +#include +#include +#include +#include + +#include "json/json.h" + +#ifdef _WIN32 +#define OS_PATH_SEP "\\" +#else +#define OS_PATH_SEP "/" +#endif + +namespace PaddleDetection { + +void load_jsonf(std::string jsonfile, Json::Value& jsondata); + +// Inference model configuration parser +class ConfigPaser { + public: + ConfigPaser() {} + + ~ConfigPaser() {} + + bool load_config(const std::string& model_dir, + const std::string& cfg = "infer_cfg") { + Json::Value config; + load_jsonf(model_dir + OS_PATH_SEP + cfg + ".json", config); + + // Get model arch : YOLO, SSD, RetinaNet, RCNN, Face, PicoDet, HRNet + if (config.isMember("arch")) { + arch_ = config["arch"].as(); + } else { + std::cerr + << "Please set model arch," + << "support value : YOLO, SSD, RetinaNet, RCNN, Face, PicoDet, HRNet." + << std::endl; + return false; + } + + // Get draw_threshold for visualization + if (config.isMember("draw_threshold")) { + draw_threshold_ = config["draw_threshold"].as(); + } else { + std::cerr << "Please set draw_threshold." 
<< std::endl; + return false; + } + // Get Preprocess for preprocessing + if (config.isMember("Preprocess")) { + preprocess_info_ = config["Preprocess"]; + } else { + std::cerr << "Please set Preprocess." << std::endl; + return false; + } + // Get label_list for visualization + if (config.isMember("label_list")) { + label_list_.clear(); + for (auto item : config["label_list"]) { + label_list_.emplace_back(item.as()); + } + } else { + std::cerr << "Please set label_list." << std::endl; + return false; + } + + // Get NMS for postprocess + if (config.isMember("NMS")) { + nms_info_ = config["NMS"]; + } + // Get fpn_stride in PicoDet + if (config.isMember("fpn_stride")) { + fpn_stride_.clear(); + for (auto item : config["fpn_stride"]) { + fpn_stride_.emplace_back(item.as()); + } + } + + return true; + } + float draw_threshold_; + std::string arch_; + Json::Value preprocess_info_; + Json::Value nms_info_; + std::vector label_list_; + std::vector fpn_stride_; +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/include/keypoint_detector.h b/PaddleDetection-release-2.6/deploy/lite/include/keypoint_detector.h new file mode 100644 index 0000000000000000000000000000000000000000..d41ba0adde31b81c6a797a7c70cae7ec7fdac37d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/include/keypoint_detector.h @@ -0,0 +1,107 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
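+
+// Illustrative usage of the KeyPointDetector declared below (a sketch, not
+// part of the original header; it assumes a model directory holding
+// model.nb and infer_cfg.json, and person crops prepared by the detector):
+//
+//   PaddleDetection::KeyPointDetector kpt_det("./model_keypoint/", /*cpu_threads=*/4);
+//   std::vector<cv::Mat> crops;                      // hypothetical person crops
+//   std::vector<std::vector<float>> centers, scales; // from box_to_center_scale()
+//   std::vector<PaddleDetection::KeyPointResult> results;
+//   std::vector<double> times;
+//   kpt_det.Predict(crops, centers, scales, 0, 1, &results, &times);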
+ +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "paddle_api.h" // NOLINT + +#include "include/config_parser.h" +#include "include/keypoint_postprocess.h" +#include "include/preprocess_op.h" + +using namespace paddle::lite_api; // NOLINT + +namespace PaddleDetection { +// Object KeyPoint Result +struct KeyPointResult { + // Keypoints: shape(N x 3); N: number of Joints; 3: x,y,conf + std::vector keypoints; + int num_joints = -1; +}; + +// Visualiztion KeyPoint Result +cv::Mat VisualizeKptsResult(const cv::Mat& img, + const std::vector& results, + const std::vector& colormap, + float threshold = 0.2); + +class KeyPointDetector { + public: + explicit KeyPointDetector(const std::string& model_dir, + int cpu_threads = 1, + const int batch_size = 1, + bool use_dark = true) { + config_.load_config(model_dir); + threshold_ = config_.draw_threshold_; + use_dark_ = use_dark; + preprocessor_.Init(config_.preprocess_info_); + printf("before keypoint detector\n"); + LoadModel(model_dir, cpu_threads); + printf("create keypoint detector\n"); + } + + // Load Paddle inference model + void LoadModel(std::string model_file, int num_theads); + + // Run predictor + void Predict(const std::vector imgs, + std::vector>& center, + std::vector>& scale, + const int warmup = 0, + const int repeats = 1, + std::vector* result = nullptr, + std::vector* times = nullptr); + + // Get Model Label list + const std::vector& GetLabelList() const { + return config_.label_list_; + } + + bool use_dark(){return this->use_dark_;} + + inline float get_threshold() {return threshold_;}; + + private: + // Preprocess image and copy data to input buffer + void Preprocess(const cv::Mat& image_mat); + // Postprocess result + void Postprocess(std::vector& output, + std::vector& output_shape, + std::vector& idxout, + std::vector& idx_shape, + std::vector* result, + std::vector>& center, + std::vector>& scale); + + std::shared_ptr predictor_; + Preprocessor preprocessor_; + ImageBlob inputs_; + std::vector output_data_; + std::vector idx_data_; + float threshold_; + ConfigPaser config_; + bool use_dark_; +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/include/keypoint_postprocess.h b/PaddleDetection-release-2.6/deploy/lite/include/keypoint_postprocess.h new file mode 100644 index 0000000000000000000000000000000000000000..4e0e54c2640104488ef85e733af1c16bdc2d86aa --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/include/keypoint_postprocess.h @@ -0,0 +1,57 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
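+
+// The helpers declared below implement the top-down keypoint postprocess:
+// box_to_center_scale() turns a detected person box into the (center, scale)
+// pair used for the affine crop, get_max_preds() takes the per-joint argmax
+// over the heatmap, and get_final_preds() maps those peaks (optionally
+// refined with DARK decoding) back to image coordinates via transform_preds().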
+ +#pragma once + +#include +#include +#include +#include + +std::vector get_3rd_point(std::vector& a, std::vector& b); +std::vector get_dir(float src_point_x, float src_point_y, float rot_rad); +void affine_tranform( + float pt_x, float pt_y, cv::Mat& trans, std::vector& x, int p, int num); +cv::Mat get_affine_transform(std::vector& center, + std::vector& scale, + float rot, + std::vector& output_size, + int inv); +void transform_preds(std::vector& coords, + std::vector& center, + std::vector& scale, + std::vector& output_size, + std::vector& dim, + std::vector& target_coords, + bool affine); +void box_to_center_scale(std::vector& box, + int width, + int height, + std::vector& center, + std::vector& scale); +void get_max_preds(std::vector& heatmap, + std::vector& dim, + std::vector& preds, + std::vector& maxvals, + int batchid, + int joint_idx); +void get_final_preds(std::vector& heatmap, + std::vector& dim, + std::vector& idxout, + std::vector& idxdim, + std::vector& center, + std::vector scale, + std::vector& preds, + int batchid, + bool DARK = true); diff --git a/PaddleDetection-release-2.6/deploy/lite/include/object_detector.h b/PaddleDetection-release-2.6/deploy/lite/include/object_detector.h new file mode 100644 index 0000000000000000000000000000000000000000..7874a9b8bba087f5731ac9d91ebd308a8e0d5ef2 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/include/object_detector.h @@ -0,0 +1,98 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
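+
+// Illustrative usage of the ObjectDetector declared below (a sketch, not
+// part of the original header; paths and thresholds are examples):
+//
+//   PaddleDetection::ObjectDetector det("./model_det/", /*cpu_threads=*/4);
+//   cv::Mat image = cv::imread("demo.jpg");
+//   std::vector<PaddleDetection::ObjectResult> results;
+//   std::vector<int> bbox_num;
+//   std::vector<double> times;
+//   det.Predict({image}, 0.5, 0, 1, &results, &bbox_num, &times);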
+ +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "paddle_api.h" // NOLINT + +#include "include/config_parser.h" +#include "include/preprocess_op.h" +#include "include/utils.h" +#include "include/picodet_postprocess.h" + +using namespace paddle::lite_api; // NOLINT + +namespace PaddleDetection { + +// Generate visualization colormap for each class +std::vector GenerateColorMap(int num_class); + +// Visualiztion Detection Result +cv::Mat VisualizeResult(const cv::Mat& img, + const std::vector& results, + const std::vector& lables, + const std::vector& colormap, + const bool is_rbox); + +class ObjectDetector { + public: + explicit ObjectDetector(const std::string& model_dir, + int cpu_threads = 1, + const int batch_size = 1) { + config_.load_config(model_dir); + printf("config created\n"); + threshold_ = config_.draw_threshold_; + preprocessor_.Init(config_.preprocess_info_); + printf("before object detector\n"); + LoadModel(model_dir, cpu_threads); + printf("create object detector\n"); + } + + // Load Paddle inference model + void LoadModel(std::string model_file, int num_theads); + + // Run predictor + void Predict(const std::vector& imgs, + const double threshold = 0.5, + const int warmup = 0, + const int repeats = 1, + std::vector* result = nullptr, + std::vector* bbox_num = nullptr, + std::vector* times = nullptr); + + // Get Model Label list + const std::vector& GetLabelList() const { + return config_.label_list_; + } + + private: + // Preprocess image and copy data to input buffer + void Preprocess(const cv::Mat& image_mat); + // Postprocess result + void Postprocess(const std::vector mats, + std::vector* result, + std::vector bbox_num, + bool is_rbox); + + std::shared_ptr predictor_; + Preprocessor preprocessor_; + ImageBlob inputs_; + std::vector output_data_; + std::vector out_bbox_num_data_; + float threshold_; + ConfigPaser config_; + +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/include/picodet_postprocess.h b/PaddleDetection-release-2.6/deploy/lite/include/picodet_postprocess.h new file mode 100644 index 0000000000000000000000000000000000000000..ac33e92ba167cb8a9c3bfaae9991522c358d6d0c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/include/picodet_postprocess.h @@ -0,0 +1,39 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
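+
+// PicoDetPostProcess() (declared below) decodes the raw PicoDet head
+// outputs: per FPN level one class-score tensor and one box-regression
+// tensor (fpn_stride is read from infer_cfg.json and is typically
+// {8, 16, 32, 64}); boxes are recovered from the reg_max-bin integral
+// distribution, then overlapping candidates are pruned with nms().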
+ +#pragma once + +#include +#include +#include +#include +#include +#include +#include + +#include "include/utils.h" + +namespace PaddleDetection { + +void PicoDetPostProcess(std::vector* results, + std::vector outs, + std::vector fpn_stride, + std::vector im_shape, + std::vector scale_factor, + float score_threshold = 0.3, + float nms_threshold = 0.5, + int num_class = 80, + int reg_max = 7); + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/include/preprocess_op.h b/PaddleDetection-release-2.6/deploy/lite/include/preprocess_op.h new file mode 100644 index 0000000000000000000000000000000000000000..86bb56c80a6f24afdf8a0e139639fe032f170ba3 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/include/preprocess_op.h @@ -0,0 +1,188 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include "json/json.h" + +namespace PaddleDetection { + +// Object for storing all preprocessed data +class ImageBlob { + public: + // image width and height + std::vector im_shape_; + // Buffer for image data after preprocessing + std::vector im_data_; + // in net data shape(after pad) + std::vector in_net_shape_; + // Evaluation image width and height + // std::vector eval_im_size_f_; + // Scale factor for image size to origin image size + std::vector scale_factor_; +}; + +// Abstraction of preprocessing opration class +class PreprocessOp { + public: + virtual void Init(const Json::Value& item) = 0; + virtual void Run(cv::Mat* im, ImageBlob* data) = 0; +}; + +class InitInfo : public PreprocessOp { + public: + virtual void Init(const Json::Value& item) {} + virtual void Run(cv::Mat* im, ImageBlob* data); +}; + +class NormalizeImage : public PreprocessOp { + public: + virtual void Init(const Json::Value& item) { + mean_.clear(); + scale_.clear(); + for (auto tmp : item["mean"]) { + mean_.emplace_back(tmp.as()); + } + for (auto tmp : item["std"]) { + scale_.emplace_back(tmp.as()); + } + is_scale_ = item["is_scale"].as(); + } + + virtual void Run(cv::Mat* im, ImageBlob* data); + + private: + // CHW or HWC + std::vector mean_; + std::vector scale_; + bool is_scale_; +}; + +class Permute : public PreprocessOp { + public: + virtual void Init(const Json::Value& item) {} + virtual void Run(cv::Mat* im, ImageBlob* data); +}; + +class Resize : public PreprocessOp { + public: + virtual void Init(const Json::Value& item) { + interp_ = item["interp"].as(); + // max_size_ = item["target_size"].as(); + keep_ratio_ = item["keep_ratio"].as(); + target_size_.clear(); + for (auto tmp : item["target_size"]) { + target_size_.emplace_back(tmp.as()); + } + } + + // Compute best resize scale for x-dimension, y-dimension + std::pair GenerateScale(const cv::Mat& im); + + virtual void Run(cv::Mat* im, ImageBlob* data); + + private: + int interp_; + bool keep_ratio_; + std::vector target_size_; + std::vector 
in_net_shape_; +}; + +// Models with FPN need input shape % stride == 0 +class PadStride : public PreprocessOp { + public: + virtual void Init(const Json::Value& item) { + stride_ = item["stride"].as(); + } + + virtual void Run(cv::Mat* im, ImageBlob* data); + + private: + int stride_; +}; + +class TopDownEvalAffine : public PreprocessOp { + public: + virtual void Init(const Json::Value& item) { + trainsize_.clear(); + for (auto tmp : item["trainsize"]) { + trainsize_.emplace_back(tmp.as()); + } + } + + virtual void Run(cv::Mat* im, ImageBlob* data); + + private: + int interp_ = 1; + std::vector trainsize_; +}; + +void CropImg(cv::Mat& img, + cv::Mat& crop_img, + std::vector& area, + std::vector& center, + std::vector& scale, + float expandratio = 0.15); + +class Preprocessor { + public: + void Init(const Json::Value& config_node) { + // initialize image info at first + ops_["InitInfo"] = std::make_shared(); + for (const auto& item : config_node) { + auto op_name = item["type"].as(); + + ops_[op_name] = CreateOp(op_name); + ops_[op_name]->Init(item); + } + } + + std::shared_ptr CreateOp(const std::string& name) { + if (name == "Resize") { + return std::make_shared(); + } else if (name == "Permute") { + return std::make_shared(); + } else if (name == "NormalizeImage") { + return std::make_shared(); + } else if (name == "PadStride") { + // use PadStride instead of PadBatch + return std::make_shared(); + } else if (name == "TopDownEvalAffine") { + return std::make_shared(); + } + std::cerr << "can not find function of OP: " << name + << " and return: nullptr" << std::endl; + return nullptr; + } + + void Run(cv::Mat* im, ImageBlob* data); + + public: + static const std::vector RUN_ORDER; + + private: + std::unordered_map> ops_; +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/include/utils.h b/PaddleDetection-release-2.6/deploy/lite/include/utils.h new file mode 100644 index 0000000000000000000000000000000000000000..3802e1267176a050402d1fdf742e54a79f33ffb9 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/include/utils.h @@ -0,0 +1,39 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
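The `Preprocessor` above is a small factory: `Init` walks a JSON array of op nodes, instantiates each op by its `"type"` field via `CreateOp`, and always prepends the implicit `InitInfo` op. In the detector this config comes from `config_.preprocess_info_`; the sketch below builds an equivalent pipeline by hand. The concrete parameter values are assumptions, but the keys (`interp`, `keep_ratio`, `target_size`, `mean`, `std`, `is_scale`) mirror the `Init` methods shown above:

```cpp
// Sketch only: hand-built preprocess config mirroring the exported infer config.
#include <opencv2/opencv.hpp>
#include "include/preprocess_op.h"
#include "json/json.h"

int main() {
  Json::Value ops(Json::arrayValue);

  Json::Value resize;
  resize["type"] = "Resize";
  resize["interp"] = 2;  // cv::INTER_CUBIC
  resize["keep_ratio"] = false;
  resize["target_size"].append(320);
  resize["target_size"].append(320);
  ops.append(resize);

  Json::Value norm;
  norm["type"] = "NormalizeImage";
  norm["is_scale"] = true;
  for (double m : {0.485, 0.456, 0.406}) norm["mean"].append(m);
  for (double s : {0.229, 0.224, 0.225}) norm["std"].append(s);
  ops.append(norm);

  Json::Value permute;
  permute["type"] = "Permute";
  ops.append(permute);

  PaddleDetection::Preprocessor pre;
  pre.Init(ops);  // also registers the implicit InitInfo op

  cv::Mat im = cv::imread("demo.jpg");
  cv::cvtColor(im, im, cv::COLOR_BGR2RGB);
  PaddleDetection::ImageBlob blob;
  pre.Run(&im, &blob);  // ops execute in Preprocessor::RUN_ORDER
  return 0;
}
```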
+
+#pragma once
+
+#include <algorithm>
+#include <ctime>
+#include <iostream>
+#include <numeric>
+#include <string>
+#include <utility>
+#include <vector>
+
+namespace PaddleDetection {
+
+// Object detection result
+struct ObjectResult {
+  // Rectangle coordinates of the detected object: left, top, right, bottom
+  std::vector<int> rect;
+  // Class id of the detected object
+  int class_id;
+  // Confidence of the detected object
+  float confidence;
+};
+
+void nms(std::vector<ObjectResult>& input_boxes, float nms_threshold);
+
+}  // namespace PaddleDetection
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/deploy/lite/keypoint_runtime_config.json b/PaddleDetection-release-2.6/deploy/lite/keypoint_runtime_config.json
new file mode 100644
index 0000000000000000000000000000000000000000..80971e51a8c79534704d50be2a8959f631a3cf83
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/lite/keypoint_runtime_config.json
@@ -0,0 +1,13 @@
+{
+  "model_dir_det": "./model_det/",
+  "batch_size_det": 1,
+  "threshold_det": 0.5,
+  "model_dir_keypoint": "./model_keypoint/",
+  "batch_size_keypoint": 8,
+  "threshold_keypoint": 0.5,
+  "image_file": "./demo.jpg",
+  "image_dir": "",
+  "run_benchmark": false,
+  "cpu_threads": 4,
+  "use_dark_decode": true
+}
diff --git a/PaddleDetection-release-2.6/deploy/lite/src/config_parser.cc b/PaddleDetection-release-2.6/deploy/lite/src/config_parser.cc
new file mode 100644
index 0000000000000000000000000000000000000000..70c43e76c2c85d2917eb1c3384304260c591b85c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/lite/src/config_parser.cc
@@ -0,0 +1,32 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "include/config_parser.h"
+
+namespace PaddleDetection {
+
+void load_jsonf(std::string jsonfile, Json::Value& jsondata) {
+  std::ifstream ifs;
+  ifs.open(jsonfile);
+
+  Json::CharReaderBuilder builder;
+  builder["collectComments"] = true;
+  JSONCPP_STRING errs;
+  if (!parseFromStream(builder, ifs, &jsondata, &errs)) {
+    std::cout << errs << std::endl;
+    return;
+  }
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/lite/src/keypoint_detector.cc b/PaddleDetection-release-2.6/deploy/lite/src/keypoint_detector.cc
new file mode 100644
index 0000000000000000000000000000000000000000..2be7471779355614457f52292443bf05ec73d21c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/lite/src/keypoint_detector.cc
@@ -0,0 +1,224 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and +// limitations under the License. +#include +// for setprecision +#include +#include +#include "include/keypoint_detector.h" + +namespace PaddleDetection { + +// Load Model and create model predictor +void KeyPointDetector::LoadModel(std::string model_file, int num_theads) { + MobileConfig config; + config.set_threads(num_theads); + config.set_model_from_file(model_file + "/model.nb"); + config.set_power_mode(LITE_POWER_HIGH); + + predictor_ = std::move(CreatePaddlePredictor(config)); +} + +// Visualiztion MaskDetector results +cv::Mat VisualizeKptsResult(const cv::Mat& img, + const std::vector& results, + const std::vector& colormap, + float threshold) { + const int edge[][2] = {{0, 1}, + {0, 2}, + {1, 3}, + {2, 4}, + {3, 5}, + {4, 6}, + {5, 7}, + {6, 8}, + {7, 9}, + {8, 10}, + {5, 11}, + {6, 12}, + {11, 13}, + {12, 14}, + {13, 15}, + {14, 16}, + {11, 12}}; + cv::Mat vis_img = img.clone(); + for (int batchid = 0; batchid < results.size(); batchid++) { + for (int i = 0; i < results[batchid].num_joints; i++) { + if (results[batchid].keypoints[i * 3] > threshold) { + int x_coord = int(results[batchid].keypoints[i * 3 + 1]); + int y_coord = int(results[batchid].keypoints[i * 3 + 2]); + cv::circle(vis_img, + cv::Point2d(x_coord, y_coord), + 1, + cv::Scalar(0, 0, 255), + 2); + } + } + for (int i = 0; i < results[batchid].num_joints; i++) { + if (results[batchid].keypoints[edge[i][0] * 3] > threshold && + results[batchid].keypoints[edge[i][1] * 3] > threshold) { + int x_start = int(results[batchid].keypoints[edge[i][0] * 3 + 1]); + int y_start = int(results[batchid].keypoints[edge[i][0] * 3 + 2]); + int x_end = int(results[batchid].keypoints[edge[i][1] * 3 + 1]); + int y_end = int(results[batchid].keypoints[edge[i][1] * 3 + 2]); + cv::line(vis_img, + cv::Point2d(x_start, y_start), + cv::Point2d(x_end, y_end), + colormap[i], + 1); + } + } + } + return vis_img; +} + +void KeyPointDetector::Preprocess(const cv::Mat& ori_im) { + // Clone the image : keep the original mat for postprocess + cv::Mat im = ori_im.clone(); + cv::cvtColor(im, im, cv::COLOR_BGR2RGB); + preprocessor_.Run(&im, &inputs_); +} + +void KeyPointDetector::Postprocess(std::vector& output, + std::vector& output_shape, + std::vector& idxout, + std::vector& idx_shape, + std::vector* result, + std::vector>& center_bs, + std::vector>& scale_bs) { + std::vector preds(output_shape[1] * 3, 0); + + for (int batchid = 0; batchid < output_shape[0]; batchid++) { + get_final_preds(output, + output_shape, + idxout, + idx_shape, + center_bs[batchid], + scale_bs[batchid], + preds, + batchid, + this->use_dark()); + KeyPointResult result_item; + result_item.num_joints = output_shape[1]; + result_item.keypoints.clear(); + for (int i = 0; i < output_shape[1]; i++) { + result_item.keypoints.emplace_back(preds[i * 3]); + result_item.keypoints.emplace_back(preds[i * 3 + 1]); + result_item.keypoints.emplace_back(preds[i * 3 + 2]); + } + result->push_back(result_item); + } +} + +void KeyPointDetector::Predict(const std::vector imgs, + std::vector>& center_bs, + std::vector>& scale_bs, + const int warmup, + const int repeats, + std::vector* result, + std::vector* times) { + auto preprocess_start = std::chrono::steady_clock::now(); + int batch_size = imgs.size(); + + // in_data_batch + std::vector in_data_all; + + // Preprocess image + for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) { + cv::Mat im = imgs.at(bs_idx); + Preprocess(im); + + // TODO: reduce cost time + 
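+    // Note: Preprocess() leaves one CHW float image in inputs_.im_data_;
+    // concatenating the per-crop buffers here lets the batched "image"
+    // input tensor be filled below with a single std::copy_n.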
in_data_all.insert( + in_data_all.end(), inputs_.im_data_.begin(), inputs_.im_data_.end()); + } + + // Prepare input tensor + + auto input_names = predictor_->GetInputNames(); + for (const auto& tensor_name : input_names) { + auto in_tensor = predictor_->GetInputByName(tensor_name); + if (tensor_name == "image") { + int rh = inputs_.in_net_shape_[0]; + int rw = inputs_.in_net_shape_[1]; + in_tensor->Resize({batch_size, 3, rh, rw}); + auto* inptr = in_tensor->mutable_data(); + std::copy_n(in_data_all.data(), in_data_all.size(), inptr); + } + } + + auto preprocess_end = std::chrono::steady_clock::now(); + std::vector output_shape, idx_shape; + // Run predictor + // warmup + for (int i = 0; i < warmup; i++) { + predictor_->Run(); + // Get output tensor + auto output_names = predictor_->GetOutputNames(); + auto out_tensor = predictor_->GetTensor(output_names[0]); + auto idx_tensor = predictor_->GetTensor(output_names[1]); + } + + auto inference_start = std::chrono::steady_clock::now(); + for (int i = 0; i < repeats; i++) { + predictor_->Run(); + // Get output tensor + auto output_names = predictor_->GetOutputNames(); + auto out_tensor = predictor_->GetTensor(output_names[0]); + output_shape = out_tensor->shape(); + // Calculate output length + int output_size = 1; + for (int j = 0; j < output_shape.size(); ++j) { + output_size *= output_shape[j]; + } + if (output_size < 6) { + std::cerr << "[WARNING] No object detected." << std::endl; + } + output_data_.resize(output_size); + std::copy_n( + out_tensor->mutable_data(), output_size, output_data_.data()); + + auto idx_tensor = predictor_->GetTensor(output_names[1]); + idx_shape = idx_tensor->shape(); + // Calculate output length + output_size = 1; + for (int j = 0; j < idx_shape.size(); ++j) { + output_size *= idx_shape[j]; + } + idx_data_.resize(output_size); + std::copy_n( + idx_tensor->mutable_data(), output_size, idx_data_.data()); + } + auto inference_end = std::chrono::steady_clock::now(); + auto postprocess_start = std::chrono::steady_clock::now(); + // Postprocessing result + Postprocess(output_data_, + output_shape, + idx_data_, + idx_shape, + result, + center_bs, + scale_bs); + auto postprocess_end = std::chrono::steady_clock::now(); + + std::chrono::duration preprocess_diff = + preprocess_end - preprocess_start; + times->push_back(double(preprocess_diff.count() * 1000)); + std::chrono::duration inference_diff = inference_end - inference_start; + times->push_back(double(inference_diff.count() / repeats * 1000)); + std::chrono::duration postprocess_diff = + postprocess_end - postprocess_start; + times->push_back(double(postprocess_diff.count() * 1000)); +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/src/keypoint_postprocess.cc b/PaddleDetection-release-2.6/deploy/lite/src/keypoint_postprocess.cc new file mode 100644 index 0000000000000000000000000000000000000000..6c75ece87c2c8f743f0f112ab6bd23fdcc96a270 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/src/keypoint_postprocess.cc @@ -0,0 +1,231 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "include/keypoint_postprocess.h" +#define PI 3.1415926535 +#define HALF_CIRCLE_DEGREE 180 + +cv::Point2f get_3rd_point(cv::Point2f& a, cv::Point2f& b) { + cv::Point2f direct{a.x - b.x, a.y - b.y}; + return cv::Point2f(a.x - direct.y, a.y + direct.x); +} + +std::vector get_dir(float src_point_x, + float src_point_y, + float rot_rad) { + float sn = sin(rot_rad); + float cs = cos(rot_rad); + std::vector src_result{0.0, 0.0}; + src_result[0] = src_point_x * cs - src_point_y * sn; + src_result[1] = src_point_x * sn + src_point_y * cs; + return src_result; +} + +void affine_tranform( + float pt_x, float pt_y, cv::Mat& trans, std::vector& preds, int p) { + double new1[3] = {pt_x, pt_y, 1.0}; + cv::Mat new_pt(3, 1, trans.type(), new1); + cv::Mat w = trans * new_pt; + preds[p * 3 + 1] = static_cast(w.at(0, 0)); + preds[p * 3 + 2] = static_cast(w.at(1, 0)); +} + +void get_affine_transform(std::vector& center, + std::vector& scale, + float rot, + std::vector& output_size, + cv::Mat& trans, + int inv) { + float src_w = scale[0]; + float dst_w = static_cast(output_size[0]); + float dst_h = static_cast(output_size[1]); + float rot_rad = rot * PI / HALF_CIRCLE_DEGREE; + std::vector src_dir = get_dir(-0.5 * src_w, 0, rot_rad); + std::vector dst_dir{static_cast(-0.5) * dst_w, 0.0}; + cv::Point2f srcPoint2f[3], dstPoint2f[3]; + srcPoint2f[0] = cv::Point2f(center[0], center[1]); + srcPoint2f[1] = cv::Point2f(center[0] + src_dir[0], center[1] + src_dir[1]); + srcPoint2f[2] = get_3rd_point(srcPoint2f[0], srcPoint2f[1]); + + dstPoint2f[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5); + dstPoint2f[1] = + cv::Point2f(dst_w * 0.5 + dst_dir[0], dst_h * 0.5 + dst_dir[1]); + dstPoint2f[2] = get_3rd_point(dstPoint2f[0], dstPoint2f[1]); + if (inv == 0) { + trans = cv::getAffineTransform(srcPoint2f, dstPoint2f); + } else { + trans = cv::getAffineTransform(dstPoint2f, srcPoint2f); + } +} + +void transform_preds(std::vector& coords, + std::vector& center, + std::vector& scale, + std::vector& output_size, + std::vector& dim, + std::vector& target_coords, + bool affine=false) { + if (affine) { + cv::Mat trans(2, 3, CV_64FC1); + get_affine_transform(center, scale, 0, output_size, trans, 1); + for (int p = 0; p < dim[1]; ++p) { + affine_tranform( + coords[p * 2], coords[p * 2 + 1], trans, target_coords, p); + } + } else { + float heat_w = static_cast(output_size[0]); + float heat_h = static_cast(output_size[1]); + float x_scale = scale[0] / heat_w; + float y_scale = scale[1] / heat_h; + float offset_x = center[0] - scale[0] / 2.; + float offset_y = center[1] - scale[1] / 2.; + for (int i = 0; i < dim[1]; i++) { + target_coords[i * 3 + 1] = x_scale * coords[i * 2] + offset_x; + target_coords[i * 3 + 2] = y_scale * coords[i * 2 + 1] + offset_y; + } + } +} + +// only for batchsize == 1 +void get_max_preds(std::vector& heatmap, + std::vector& dim, + std::vector& preds, + std::vector& maxvals, + int batchid, + int joint_idx) { + int num_joints = dim[1]; + int width = dim[3]; + std::vector idx; + idx.resize(num_joints * 2); + + for (int j = 0; j < dim[1]; j++) { + float* index = &( + heatmap[batchid * 
num_joints * dim[2] * dim[3] + j * dim[2] * dim[3]]); + float* end = index + dim[2] * dim[3]; + float* max_dis = std::max_element(index, end); + auto max_id = std::distance(index, max_dis); + maxvals[j] = *max_dis; + if (*max_dis > 0) { + preds[j * 2] = static_cast(max_id % width); + preds[j * 2 + 1] = static_cast(max_id / width); + } + } +} + + +void dark_parse(std::vector& heatmap, + std::vector& dim, + std::vector& coords, + int px, + int py, + int index, + int ch){ + /*DARK postpocessing, Zhang et al. Distribution-Aware Coordinate + Representation for Human Pose Estimation (CVPR 2020). + 1) offset = - hassian.inv() * derivative + 2) dx = (heatmap[x+1] - heatmap[x-1])/2. + 3) dxx = (dx[x+1] - dx[x-1])/2. + 4) derivative = Mat([dx, dy]) + 5) hassian = Mat([[dxx, dxy], [dxy, dyy]]) + */ + std::vector::const_iterator first1 = heatmap.begin() + index; + std::vector::const_iterator last1 = heatmap.begin() + index + dim[2] * dim[3]; + std::vector heatmap_ch(first1, last1); + cv::Mat heatmap_mat = cv::Mat(heatmap_ch).reshape(0,dim[2]); + heatmap_mat.convertTo(heatmap_mat, CV_32FC1); + cv::GaussianBlur(heatmap_mat, heatmap_mat, cv::Size(3, 3), 0, 0); + heatmap_mat = heatmap_mat.reshape(1,1); + heatmap_ch = std::vector(heatmap_mat.reshape(1,1)); + + float epsilon = 1e-10; + //sample heatmap to get values in around target location + float xy = log(fmax(heatmap_ch[py * dim[3] + px], epsilon)); + float xr = log(fmax(heatmap_ch[py * dim[3] + px + 1], epsilon)); + float xl = log(fmax(heatmap_ch[py * dim[3] + px - 1], epsilon)); + + float xr2 = log(fmax(heatmap_ch[py * dim[3] + px + 2], epsilon)); + float xl2 = log(fmax(heatmap_ch[py * dim[3] + px - 2], epsilon)); + float yu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px], epsilon)); + float yd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px], epsilon)); + float yu2 = log(fmax(heatmap_ch[(py + 2) * dim[3] + px], epsilon)); + float yd2 = log(fmax(heatmap_ch[(py - 2) * dim[3] + px], epsilon)); + float xryu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px + 1], epsilon)); + float xryd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px + 1], epsilon)); + float xlyu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px - 1], epsilon)); + float xlyd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px - 1], epsilon)); + + //compute dx/dy and dxx/dyy with sampled values + float dx = 0.5 * (xr - xl); + float dy = 0.5 * (yu - yd); + float dxx = 0.25 * (xr2 - 2*xy + xl2); + float dxy = 0.25 * (xryu - xryd - xlyu + xlyd); + float dyy = 0.25 * (yu2 - 2*xy + yd2); + + //finally get offset by derivative and hassian, which combined by dx/dy and dxx/dyy + if(dxx * dyy - dxy*dxy != 0){ + float M[2][2] = {dxx, dxy, dxy, dyy}; + float D[2] = {dx, dy}; + cv::Mat hassian(2,2,CV_32F,M); + cv::Mat derivative(2,1,CV_32F,D); + cv::Mat offset = - hassian.inv() * derivative; + coords[ch * 2] += offset.at(0,0); + coords[ch * 2 + 1] += offset.at(1,0); + } +} + +void get_final_preds(std::vector& heatmap, + std::vector& dim, + std::vector& idxout, + std::vector& idxdim, + std::vector& center, + std::vector scale, + std::vector& preds, + int batchid, + bool DARK) { + std::vector coords; + coords.resize(dim[1] * 2); + int heatmap_height = dim[2]; + int heatmap_width = dim[3]; + + for (int j = 0; j < dim[1]; ++j) { + int index = (batchid * dim[1] + j) * dim[2] * dim[3]; + + int idx = idxout[batchid * dim[1] + j]; + preds[j * 3] = heatmap[index + idx]; + coords[j * 2] = idx % heatmap_width; + coords[j * 2 + 1] = idx / heatmap_width; + + int px = int(coords[j * 2] + 0.5); + int py = int(coords[j * 2 + 1] + 0.5); + + 
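+    // Note: sub-pixel refinement. With DARK, the log-heatmap peak is shifted
+    // by offset = -Hessian^-1 * gradient estimated in dark_parse() above;
+    // otherwise the coordinate is nudged a quarter pixel toward the larger
+    // neighbouring activation.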
+    if (DARK && px > 1 && px < heatmap_width - 2) {
+      dark_parse(heatmap, dim, coords, px, py, index, j);
+    } else {
+      if (px > 0 && px < heatmap_width - 1) {
+        float diff_x = heatmap[index + py * dim[3] + px + 1] -
+                       heatmap[index + py * dim[3] + px - 1];
+        // parenthesized so the shift is +/-0.25, not 1 vs. -0.25
+        coords[j * 2] += (diff_x > 0 ? 1 : -1) * 0.25;
+      }
+      if (py > 0 && py < heatmap_height - 1) {
+        float diff_y = heatmap[index + (py + 1) * dim[3] + px] -
+                       heatmap[index + (py - 1) * dim[3] + px];
+        coords[j * 2 + 1] += (diff_y > 0 ? 1 : -1) * 0.25;
+      }
+    }
+  }
+
+  std::vector<int> img_size{heatmap_width, heatmap_height};
+  transform_preds(coords, center, scale, img_size, dim, preds);
+}
diff --git a/PaddleDetection-release-2.6/deploy/lite/src/main.cc b/PaddleDetection-release-2.6/deploy/lite/src/main.cc
new file mode 100644
index 0000000000000000000000000000000000000000..51f3b338064a90e7b7fd411f964d08ce72f4441e
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/lite/src/main.cc
@@ -0,0 +1,388 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <math.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <algorithm>
+#include <fstream>
+#include <iostream>
+#include <numeric>
+#include <string>
+#include <vector>
+
+#include "include/config_parser.h"
+#include "include/keypoint_detector.h"
+#include "include/object_detector.h"
+#include "include/preprocess_op.h"
+#include "json/json.h"
+
+Json::Value RT_Config;
+
+void PrintBenchmarkLog(std::vector<double> det_time, int img_num) {
+  std::cout << "----------------------- Config info -----------------------"
+            << std::endl;
+  std::cout << "num_threads: " << RT_Config["cpu_threads"].as<int>()
+            << std::endl;
+  std::cout << "----------------------- Data info -----------------------"
+            << std::endl;
+  std::cout << "batch_size_det: " << RT_Config["batch_size_det"].as<int>()
+            << std::endl;
+  std::cout << "----------------------- Model info -----------------------"
+            << std::endl;
+  RT_Config["model_dir_det"].as<std::string>().erase(
+      RT_Config["model_dir_det"].as<std::string>().find_last_not_of("/") + 1);
+  std::cout << "detection model_name: "
+            << RT_Config["model_dir_det"].as<std::string>() << std::endl;
+  std::cout << "----------------------- Perf info ------------------------"
+            << std::endl;
+  std::cout << "Total number of predicted data: " << img_num
+            << " and total time spent(ms): "
+            << std::accumulate(det_time.begin(), det_time.end(), 0.)
+ << std::endl; + img_num = std::max(1, img_num); + std::cout << "preproce_time(ms): " << det_time[0] / img_num + << ", inference_time(ms): " << det_time[1] / img_num + << ", postprocess_time(ms): " << det_time[2] / img_num << std::endl; +} + +void PrintKptsBenchmarkLog(std::vector det_time, int img_num) { + std::cout << "----------------------- Data info -----------------------" + << std::endl; + std::cout << "batch_size_keypoint: " + << RT_Config["batch_size_keypoint"].as() << std::endl; + std::cout << "----------------------- Model info -----------------------" + << std::endl; + RT_Config["model_dir_keypoint"].as().erase( + RT_Config["model_dir_keypoint"].as().find_last_not_of("/") + + 1); + std::cout << "keypoint model_name: " + << RT_Config["model_dir_keypoint"].as() << std::endl; + std::cout << "----------------------- Perf info ------------------------" + << std::endl; + std::cout << "Total number of predicted data: " << img_num + << " and total time spent(ms): " + << std::accumulate(det_time.begin(), det_time.end(), 0.) + << std::endl; + img_num = std::max(1, img_num); + std::cout << "Average time cost per person:" << std::endl + << "preproce_time(ms): " << det_time[0] / img_num + << ", inference_time(ms): " << det_time[1] / img_num + << ", postprocess_time(ms): " << det_time[2] / img_num << std::endl; +} + +void PrintTotalIimeLog(double det_time, + double keypoint_time, + double crop_time) { + std::cout << "----------------------- Time info ------------------------" + << std::endl; + std::cout << "Total Pipeline time(ms) per image: " + << det_time + keypoint_time + crop_time << std::endl; + std::cout << "Average det time(ms) per image: " << det_time + << ", average keypoint time(ms) per image: " << keypoint_time + << ", average crop time(ms) per image: " << crop_time << std::endl; +} + +static std::string DirName(const std::string& filepath) { + auto pos = filepath.rfind(OS_PATH_SEP); + if (pos == std::string::npos) { + return ""; + } + return filepath.substr(0, pos); +} + +static bool PathExists(const std::string& path) { + struct stat buffer; + return (stat(path.c_str(), &buffer) == 0); +} + +static void MkDir(const std::string& path) { + if (PathExists(path)) return; + int ret = 0; + ret = mkdir(path.c_str(), 0755); + if (ret != 0) { + std::string path_error(path); + path_error += " mkdir failed!"; + throw std::runtime_error(path_error); + } +} + +static void MkDirs(const std::string& path) { + if (path.empty()) return; + if (PathExists(path)) return; + + MkDirs(DirName(path)); + MkDir(path); +} + +void PredictImage(const std::vector all_img_paths, + const int batch_size_det, + const double threshold_det, + const bool run_benchmark, + PaddleDetection::ObjectDetector* det, + PaddleDetection::KeyPointDetector* keypoint, + const std::string& output_dir = "output") { + std::vector det_t = {0, 0, 0}; + int steps = ceil(static_cast(all_img_paths.size()) / batch_size_det); + int kpts_imgs = 0; + std::vector keypoint_t = {0, 0, 0}; + double midtimecost = 0; + for (int idx = 0; idx < steps; idx++) { + std::vector batch_imgs; + int left_image_cnt = all_img_paths.size() - idx * batch_size_det; + if (left_image_cnt > batch_size_det) { + left_image_cnt = batch_size_det; + } + for (int bs = 0; bs < left_image_cnt; bs++) { + std::string image_file_path = all_img_paths.at(idx * batch_size_det + bs); + cv::Mat im = cv::imread(image_file_path, 1); + batch_imgs.insert(batch_imgs.end(), im); + } + // Store all detected result + std::vector result; + std::vector bbox_num; + std::vector det_times; + 
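+  // Note: each *_times vector filled by Predict() holds three entries,
+  // {preprocess, inference, postprocess} in milliseconds; they are
+  // accumulated into det_t / keypoint_t for the benchmark logs above.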
+ // Store keypoint results + std::vector result_kpts; + std::vector imgs_kpts; + std::vector> center_bs; + std::vector> scale_bs; + std::vector colormap_kpts = PaddleDetection::GenerateColorMap(20); + bool is_rbox = false; + if (run_benchmark) { + det->Predict( + batch_imgs, threshold_det, 50, 50, &result, &bbox_num, &det_times); + } else { + det->Predict( + batch_imgs, threshold_det, 0, 1, &result, &bbox_num, &det_times); + } + + // get labels and colormap + auto labels = det->GetLabelList(); + auto colormap = PaddleDetection::GenerateColorMap(labels.size()); + int item_start_idx = 0; + for (int i = 0; i < left_image_cnt; i++) { + cv::Mat im = batch_imgs[i]; + std::vector im_result; + int detect_num = 0; + for (int j = 0; j < bbox_num[i]; j++) { + PaddleDetection::ObjectResult item = result[item_start_idx + j]; + if (item.confidence < threshold_det || item.class_id == -1) { + continue; + } + detect_num += 1; + im_result.push_back(item); + if (item.rect.size() > 6) { + is_rbox = true; + printf("class=%d confidence=%.4f rect=[%d %d %d %d %d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3], + item.rect[4], + item.rect[5], + item.rect[6], + item.rect[7]); + } else { + printf("class=%d confidence=%.4f rect=[%d %d %d %d]\n", + item.class_id, + item.confidence, + item.rect[0], + item.rect[1], + item.rect[2], + item.rect[3]); + } + } + std::cout << all_img_paths.at(idx * batch_size_det + i) + << " The number of detected box: " << detect_num << std::endl; + item_start_idx = item_start_idx + bbox_num[i]; + + std::vector compression_params; + compression_params.push_back(cv::IMWRITE_JPEG_QUALITY); + compression_params.push_back(95); + std::string output_path(output_dir); + if (output_dir.rfind(OS_PATH_SEP) != output_dir.size() - 1) { + output_path += OS_PATH_SEP; + } + std::string image_file_path = all_img_paths.at(idx * batch_size_det + i); + if (keypoint) { + int imsize = im_result.size(); + for (int i = 0; i < imsize; i++) { + auto keypoint_start_time = std::chrono::steady_clock::now(); + auto item = im_result[i]; + cv::Mat crop_img; + std::vector keypoint_times; + std::vector rect = { + item.rect[0], item.rect[1], item.rect[2], item.rect[3]}; + std::vector center; + std::vector scale; + if (item.class_id == 0) { + PaddleDetection::CropImg(im, crop_img, rect, center, scale); + center_bs.emplace_back(center); + scale_bs.emplace_back(scale); + imgs_kpts.emplace_back(crop_img); + kpts_imgs += 1; + } + auto keypoint_crop_time = std::chrono::steady_clock::now(); + + std::chrono::duration midtimediff = + keypoint_crop_time - keypoint_start_time; + midtimecost += static_cast(midtimediff.count() * 1000); + + if (imgs_kpts.size() == RT_Config["batch_size_keypoint"].as() || + ((i == imsize - 1) && !imgs_kpts.empty())) { + if (run_benchmark) { + keypoint->Predict(imgs_kpts, + center_bs, + scale_bs, + 10, + 10, + &result_kpts, + &keypoint_times); + } else { + keypoint->Predict(imgs_kpts, + center_bs, + scale_bs, + 0, + 1, + &result_kpts, + &keypoint_times); + } + imgs_kpts.clear(); + center_bs.clear(); + scale_bs.clear(); + keypoint_t[0] += keypoint_times[0]; + keypoint_t[1] += keypoint_times[1]; + keypoint_t[2] += keypoint_times[2]; + } + } + std::string kpts_savepath = + output_path + "keypoint_" + + image_file_path.substr(image_file_path.find_last_of('/') + 1); + cv::Mat kpts_vis_img = VisualizeKptsResult( + im, result_kpts, colormap_kpts, keypoint->get_threshold()); + cv::imwrite(kpts_savepath, kpts_vis_img, compression_params); + 
printf("Visualized output saved as %s\n", kpts_savepath.c_str()); + } else { + // Visualization result + cv::Mat vis_img = PaddleDetection::VisualizeResult( + im, im_result, labels, colormap, is_rbox); + std::string det_savepath = + output_path + "result_" + + image_file_path.substr(image_file_path.find_last_of('/') + 1); + cv::imwrite(det_savepath, vis_img, compression_params); + printf("Visualized output saved as %s\n", det_savepath.c_str()); + } + } + + det_t[0] += det_times[0]; + det_t[1] += det_times[1]; + det_t[2] += det_times[2]; + } + PrintBenchmarkLog(det_t, all_img_paths.size()); + if (keypoint) { + PrintKptsBenchmarkLog(keypoint_t, kpts_imgs); + PrintTotalIimeLog( + (det_t[0] + det_t[1] + det_t[2]) / all_img_paths.size(), + (keypoint_t[0] + keypoint_t[1] + keypoint_t[2]) / all_img_paths.size(), + midtimecost / all_img_paths.size()); + } +} + +int main(int argc, char** argv) { + std::cout << "Usage: " << argv[0] << " [config_path] [image_dir](option)\n"; + if (argc < 2) { + std::cout << "Usage: ./main det_runtime_config.json" << std::endl; + return -1; + } + std::string config_path = argv[1]; + std::string img_path = ""; + + if (argc >= 3) { + img_path = argv[2]; + } + // Parsing command-line + PaddleDetection::load_jsonf(config_path, RT_Config); + if (RT_Config["model_dir_det"].as().empty()) { + std::cout << "Please set [model_det_dir] in " << config_path << std::endl; + return -1; + } + if (RT_Config["image_file"].as().empty() && + RT_Config["image_dir"].as().empty() && img_path.empty()) { + std::cout << "Please set [image_file] or [image_dir] in " << config_path + << " Or use command: <" << argv[0] << " [image_dir]>" + << std::endl; + return -1; + } + if (!img_path.empty()) { + std::cout << "Use image_dir in command line overide the path in config file" + << std::endl; + RT_Config["image_dir"] = img_path; + RT_Config["image_file"] = ""; + } + // Load model and create a object detector + PaddleDetection::ObjectDetector det( + RT_Config["model_dir_det"].as(), + RT_Config["cpu_threads"].as(), + RT_Config["batch_size_det"].as()); + + PaddleDetection::KeyPointDetector* keypoint = nullptr; + if (!RT_Config["model_dir_keypoint"].as().empty()) { + keypoint = new PaddleDetection::KeyPointDetector( + RT_Config["model_dir_keypoint"].as(), + RT_Config["cpu_threads"].as(), + RT_Config["batch_size_keypoint"].as(), + RT_Config["use_dark_decode"].as()); + RT_Config["batch_size_det"] = 1; + printf( + "batchsize of detection forced to be 1 while keypoint model is not " + "empty()"); + } + // Do inference on input image + + if (!RT_Config["image_file"].as().empty() || + !RT_Config["image_dir"].as().empty()) { + if (!PathExists(RT_Config["output_dir"].as())) { + MkDirs(RT_Config["output_dir"].as()); + } + std::vector all_img_paths; + std::vector cv_all_img_paths; + if (!RT_Config["image_file"].as().empty()) { + all_img_paths.push_back(RT_Config["image_file"].as()); + if (RT_Config["batch_size_det"].as() > 1) { + std::cout << "batch_size_det should be 1, when set `image_file`." 
+ << std::endl; + return -1; + } + } else { + cv::glob(RT_Config["image_dir"].as(), cv_all_img_paths); + for (const auto& img_path : cv_all_img_paths) { + all_img_paths.push_back(img_path); + } + } + PredictImage(all_img_paths, + RT_Config["batch_size_det"].as(), + RT_Config["threshold_det"].as(), + RT_Config["run_benchmark"].as(), + &det, + keypoint, + RT_Config["output_dir"].as()); + } + delete keypoint; + keypoint = nullptr; + return 0; +} diff --git a/PaddleDetection-release-2.6/deploy/lite/src/object_detector.cc b/PaddleDetection-release-2.6/deploy/lite/src/object_detector.cc new file mode 100644 index 0000000000000000000000000000000000000000..0909bd9194679485fd2a8b735ff6f7ffdb0bb2c9 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/src/object_detector.cc @@ -0,0 +1,329 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +#include +// for setprecision +#include +#include +#include "include/object_detector.h" + +namespace PaddleDetection { + +// Load Model and create model predictor +void ObjectDetector::LoadModel(std::string model_file, int num_theads) { + MobileConfig config; + config.set_threads(num_theads); + config.set_model_from_file(model_file + "/model.nb"); + config.set_power_mode(LITE_POWER_HIGH); + + predictor_ = CreatePaddlePredictor(config); +} + +// Visualiztion MaskDetector results +cv::Mat VisualizeResult(const cv::Mat& img, + const std::vector& results, + const std::vector& lables, + const std::vector& colormap, + const bool is_rbox = false) { + cv::Mat vis_img = img.clone(); + for (int i = 0; i < results.size(); ++i) { + // Configure color and text size + std::ostringstream oss; + oss << std::setiosflags(std::ios::fixed) << std::setprecision(4); + oss << lables[results[i].class_id] << " "; + oss << results[i].confidence; + std::string text = oss.str(); + int c1 = colormap[3 * results[i].class_id + 0]; + int c2 = colormap[3 * results[i].class_id + 1]; + int c3 = colormap[3 * results[i].class_id + 2]; + cv::Scalar roi_color = cv::Scalar(c1, c2, c3); + int font_face = cv::FONT_HERSHEY_COMPLEX_SMALL; + double font_scale = 0.5f; + float thickness = 0.5; + cv::Size text_size = + cv::getTextSize(text, font_face, font_scale, thickness, nullptr); + cv::Point origin; + + if (is_rbox) { + // Draw object, text, and background + for (int k = 0; k < 4; k++) { + cv::Point pt1 = cv::Point(results[i].rect[(k * 2) % 8], + results[i].rect[(k * 2 + 1) % 8]); + cv::Point pt2 = cv::Point(results[i].rect[(k * 2 + 2) % 8], + results[i].rect[(k * 2 + 3) % 8]); + cv::line(vis_img, pt1, pt2, roi_color, 2); + } + } else { + int w = results[i].rect[2] - results[i].rect[0]; + int h = results[i].rect[3] - results[i].rect[1]; + cv::Rect roi = cv::Rect(results[i].rect[0], results[i].rect[1], w, h); + // Draw roi object, text, and background + cv::rectangle(vis_img, roi, roi_color, 2); + } + + origin.x = results[i].rect[0]; + origin.y = results[i].rect[1]; + + // Configure text background + cv::Rect text_back = 
cv::Rect(results[i].rect[0], + results[i].rect[1] - text_size.height, + text_size.width, + text_size.height); + // Draw text, and background + cv::rectangle(vis_img, text_back, roi_color, -1); + cv::putText(vis_img, + text, + origin, + font_face, + font_scale, + cv::Scalar(255, 255, 255), + thickness); + } + return vis_img; +} + +void ObjectDetector::Preprocess(const cv::Mat& ori_im) { + // Clone the image : keep the original mat for postprocess + cv::Mat im = ori_im.clone(); + cv::cvtColor(im, im, cv::COLOR_BGR2RGB); + preprocessor_.Run(&im, &inputs_); +} + +void ObjectDetector::Postprocess(const std::vector mats, + std::vector* result, + std::vector bbox_num, + bool is_rbox = false) { + result->clear(); + int start_idx = 0; + for (int im_id = 0; im_id < mats.size(); im_id++) { + cv::Mat raw_mat = mats[im_id]; + int rh = 1; + int rw = 1; + if (config_.arch_ == "Face") { + rh = raw_mat.rows; + rw = raw_mat.cols; + } + for (int j = start_idx; j < start_idx + bbox_num[im_id]; j++) { + if (is_rbox) { + // Class id + int class_id = static_cast(round(output_data_[0 + j * 10])); + // Confidence score + float score = output_data_[1 + j * 10]; + int x1 = (output_data_[2 + j * 10] * rw); + int y1 = (output_data_[3 + j * 10] * rh); + int x2 = (output_data_[4 + j * 10] * rw); + int y2 = (output_data_[5 + j * 10] * rh); + int x3 = (output_data_[6 + j * 10] * rw); + int y3 = (output_data_[7 + j * 10] * rh); + int x4 = (output_data_[8 + j * 10] * rw); + int y4 = (output_data_[9 + j * 10] * rh); + + PaddleDetection::ObjectResult result_item; + result_item.rect = {x1, y1, x2, y2, x3, y3, x4, y4}; + result_item.class_id = class_id; + result_item.confidence = score; + result->push_back(result_item); + } else { + // Class id + int class_id = static_cast(round(output_data_[0 + j * 6])); + // Confidence score + float score = output_data_[1 + j * 6]; + int xmin = (output_data_[2 + j * 6] * rw); + int ymin = (output_data_[3 + j * 6] * rh); + int xmax = (output_data_[4 + j * 6] * rw); + int ymax = (output_data_[5 + j * 6] * rh); + int wd = xmax - xmin; + int hd = ymax - ymin; + + PaddleDetection::ObjectResult result_item; + result_item.rect = {xmin, ymin, xmax, ymax}; + result_item.class_id = class_id; + result_item.confidence = score; + result->push_back(result_item); + } + } + start_idx += bbox_num[im_id]; + } +} + +void ObjectDetector::Predict(const std::vector& imgs, + const double threshold, + const int warmup, + const int repeats, + std::vector* result, + std::vector* bbox_num, + std::vector* times) { + auto preprocess_start = std::chrono::steady_clock::now(); + int batch_size = imgs.size(); + + // in_data_batch + std::vector in_data_all; + std::vector im_shape_all(batch_size * 2); + std::vector scale_factor_all(batch_size * 2); + // Preprocess image + for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) { + cv::Mat im = imgs.at(bs_idx); + Preprocess(im); + im_shape_all[bs_idx * 2] = inputs_.im_shape_[0]; + im_shape_all[bs_idx * 2 + 1] = inputs_.im_shape_[1]; + + scale_factor_all[bs_idx * 2] = inputs_.scale_factor_[0]; + scale_factor_all[bs_idx * 2 + 1] = inputs_.scale_factor_[1]; + + // TODO: reduce cost time + in_data_all.insert( + in_data_all.end(), inputs_.im_data_.begin(), inputs_.im_data_.end()); + } + auto preprocess_end = std::chrono::steady_clock::now(); + std::vector output_data_list_; + // Prepare input tensor + + auto input_names = predictor_->GetInputNames(); + for (const auto& tensor_name : input_names) { + auto in_tensor = predictor_->GetInputByName(tensor_name); + if (tensor_name == "image") 
{ + int rh = inputs_.in_net_shape_[0]; + int rw = inputs_.in_net_shape_[1]; + in_tensor->Resize({batch_size, 3, rh, rw}); + auto* inptr = in_tensor->mutable_data(); + std::copy_n(in_data_all.data(), in_data_all.size(), inptr); + } else if (tensor_name == "im_shape") { + in_tensor->Resize({batch_size, 2}); + auto* inptr = in_tensor->mutable_data(); + std::copy_n(im_shape_all.data(), im_shape_all.size(), inptr); + } else if (tensor_name == "scale_factor") { + in_tensor->Resize({batch_size, 2}); + auto* inptr = in_tensor->mutable_data(); + std::copy_n(scale_factor_all.data(), scale_factor_all.size(), inptr); + } + } + + // Run predictor + // warmup + for (int i = 0; i < warmup; i++) { + predictor_->Run(); + // Get output tensor + auto output_names = predictor_->GetOutputNames(); + if (config_.arch_ == "PicoDet") { + for (int j = 0; j < output_names.size(); j++) { + auto output_tensor = predictor_->GetTensor(output_names[j]); + const float* outptr = output_tensor->data(); + std::vector output_shape = output_tensor->shape(); + output_data_list_.push_back(outptr); + } + } else { + auto out_tensor = predictor_->GetTensor(output_names[0]); + auto out_bbox_num = predictor_->GetTensor(output_names[1]); + } + } + + bool is_rbox = false; + auto inference_start = std::chrono::steady_clock::now(); + for (int i = 0; i < repeats; i++) { + predictor_->Run(); + } + auto inference_end = std::chrono::steady_clock::now(); + auto postprocess_start = std::chrono::steady_clock::now(); + // Get output tensor + output_data_list_.clear(); + int num_class = 80; + int reg_max = 7; + auto output_names = predictor_->GetOutputNames(); + // TODO: Unified model output. + if (config_.arch_ == "PicoDet") { + for (int i = 0; i < output_names.size(); i++) { + auto output_tensor = predictor_->GetTensor(output_names[i]); + const float* outptr = output_tensor->data(); + std::vector output_shape = output_tensor->shape(); + if (i == 0) { + num_class = output_shape[2]; + } + if (i == config_.fpn_stride_.size()) { + reg_max = output_shape[2] / 4 - 1; + } + output_data_list_.push_back(outptr); + } + } else { + auto output_tensor = predictor_->GetTensor(output_names[0]); + auto output_shape = output_tensor->shape(); + auto out_bbox_num = predictor_->GetTensor(output_names[1]); + auto out_bbox_num_shape = out_bbox_num->shape(); + // Calculate output length + int output_size = 1; + for (int j = 0; j < output_shape.size(); ++j) { + output_size *= output_shape[j]; + } + is_rbox = output_shape[output_shape.size() - 1] % 10 == 0; + + if (output_size < 6) { + std::cerr << "[WARNING] No object detected." 
<< std::endl; + } + output_data_.resize(output_size); + std::copy_n( + output_tensor->mutable_data(), output_size, output_data_.data()); + + int out_bbox_num_size = 1; + for (int j = 0; j < out_bbox_num_shape.size(); ++j) { + out_bbox_num_size *= out_bbox_num_shape[j]; + } + out_bbox_num_data_.resize(out_bbox_num_size); + std::copy_n(out_bbox_num->mutable_data(), + out_bbox_num_size, + out_bbox_num_data_.data()); + } + // Postprocessing result + result->clear(); + if (config_.arch_ == "PicoDet") { + PaddleDetection::PicoDetPostProcess( + result, output_data_list_, config_.fpn_stride_, + inputs_.im_shape_, inputs_.scale_factor_, + config_.nms_info_["score_threshold"].as(), + config_.nms_info_["nms_threshold"].as(), num_class, reg_max); + bbox_num->push_back(result->size()); + } else { + Postprocess(imgs, result, out_bbox_num_data_, is_rbox); + bbox_num->clear(); + for (int k = 0; k < out_bbox_num_data_.size(); k++) { + int tmp = out_bbox_num_data_[k]; + bbox_num->push_back(tmp); + } + } + auto postprocess_end = std::chrono::steady_clock::now(); + + std::chrono::duration preprocess_diff = + preprocess_end - preprocess_start; + times->push_back(double(preprocess_diff.count() * 1000)); + std::chrono::duration inference_diff = inference_end - inference_start; + times->push_back(double(inference_diff.count() / repeats * 1000)); + std::chrono::duration postprocess_diff = + postprocess_end - postprocess_start; + times->push_back(double(postprocess_diff.count() * 1000)); +} + +std::vector GenerateColorMap(int num_class) { + auto colormap = std::vector(3 * num_class, 0); + for (int i = 0; i < num_class; ++i) { + int j = 0; + int lab = i; + while (lab) { + colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j)); + colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)); + colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)); + ++j; + lab >>= 3; + } + } + return colormap; +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/src/picodet_postprocess.cc b/PaddleDetection-release-2.6/deploy/lite/src/picodet_postprocess.cc new file mode 100644 index 0000000000000000000000000000000000000000..32625249fabf04745ea239a6ec924df244426c86 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/src/picodet_postprocess.cc @@ -0,0 +1,128 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
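The PicoDet post-processing that follows decodes boxes in the GFL style: for each of the four box sides, the head predicts a discrete distribution over `reg_max + 1` bins, and `disPred2Bbox` takes the softmax expectation of the bin index times the FPN stride as the side's distance from the anchor center. A standalone sketch of that decode; the function name and the use of `std::exp` in place of the file's `fast_exp` approximation are my own choices:

```cpp
// Sketch of the GFL-style distance decode used by disPred2Bbox below.
#include <algorithm>
#include <cmath>
#include <vector>

// Expected distance (in pixels) for one box side, given reg_max + 1 logits.
float DecodeSide(const std::vector<float>& logits, int stride) {
  float max_logit = *std::max_element(logits.begin(), logits.end());
  float denom = 0.f, expectation = 0.f;
  for (size_t j = 0; j < logits.size(); ++j) {
    float p = std::exp(logits[j] - max_logit);  // stable softmax numerator
    denom += p;
    expectation += j * p;  // accumulate E[bin index] numerator
  }
  return expectation / denom * stride;  // E[bin index] * stride
}
```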
+// +// The code is based on: +// https://github.com/RangiLyu/nanodet/blob/main/demo_mnn/nanodet_mnn.cpp + +#include "include/picodet_postprocess.h" + +namespace PaddleDetection { + +float fast_exp(float x) { + union { + uint32_t i; + float f; + } v{}; + v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f); + return v.f; +} + +template +int activation_function_softmax(const _Tp *src, _Tp *dst, int length) { + const _Tp alpha = *std::max_element(src, src + length); + _Tp denominator{0}; + + for (int i = 0; i < length; ++i) { + dst[i] = fast_exp(src[i] - alpha); + denominator += dst[i]; + } + + for (int i = 0; i < length; ++i) { + dst[i] /= denominator; + } + + return 0; +} + +// PicoDet decode +PaddleDetection::ObjectResult +disPred2Bbox(const float *&dfl_det, int label, float score, int x, int y, + int stride, std::vector im_shape, int reg_max) { + float ct_x = (x + 0.5) * stride; + float ct_y = (y + 0.5) * stride; + std::vector dis_pred; + dis_pred.resize(4); + for (int i = 0; i < 4; i++) { + float dis = 0; + float *dis_after_sm = new float[reg_max + 1]; + activation_function_softmax(dfl_det + i * (reg_max + 1), dis_after_sm, + reg_max + 1); + for (int j = 0; j < reg_max + 1; j++) { + dis += j * dis_after_sm[j]; + } + dis *= stride; + dis_pred[i] = dis; + delete[] dis_after_sm; + } + int xmin = (int)(std::max)(ct_x - dis_pred[0], .0f); + int ymin = (int)(std::max)(ct_y - dis_pred[1], .0f); + int xmax = (int)(std::min)(ct_x + dis_pred[2], (float)im_shape[0]); + int ymax = (int)(std::min)(ct_y + dis_pred[3], (float)im_shape[1]); + + PaddleDetection::ObjectResult result_item; + result_item.rect = {xmin, ymin, xmax, ymax}; + result_item.class_id = label; + result_item.confidence = score; + + return result_item; +} + +void PicoDetPostProcess(std::vector *results, + std::vector outs, + std::vector fpn_stride, + std::vector im_shape, + std::vector scale_factor, float score_threshold, + float nms_threshold, int num_class, int reg_max) { + std::vector> bbox_results; + bbox_results.resize(num_class); + int in_h = im_shape[0], in_w = im_shape[1]; + for (int i = 0; i < fpn_stride.size(); ++i) { + int feature_h = ceil((float)in_h / fpn_stride[i]); + int feature_w = ceil((float)in_w / fpn_stride[i]); + for (int idx = 0; idx < feature_h * feature_w; idx++) { + const float *scores = outs[i] + (idx * num_class); + + int row = idx / feature_w; + int col = idx % feature_w; + float score = 0; + int cur_label = 0; + for (int label = 0; label < num_class; label++) { + if (scores[label] > score) { + score = scores[label]; + cur_label = label; + } + } + if (score > score_threshold) { + const float *bbox_pred = + outs[i + fpn_stride.size()] + (idx * 4 * (reg_max + 1)); + bbox_results[cur_label].push_back( + disPred2Bbox(bbox_pred, cur_label, score, col, row, fpn_stride[i], + im_shape, reg_max)); + } + } + } + for (int i = 0; i < (int)bbox_results.size(); i++) { + PaddleDetection::nms(bbox_results[i], nms_threshold); + + for (auto box : bbox_results[i]) { + box.rect[0] = box.rect[0] / scale_factor[1]; + box.rect[2] = box.rect[2] / scale_factor[1]; + box.rect[1] = box.rect[1] / scale_factor[0]; + box.rect[3] = box.rect[3] / scale_factor[0]; + results->push_back(box); + } + } +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/src/preprocess_op.cc b/PaddleDetection-release-2.6/deploy/lite/src/preprocess_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..fbbc5adb1d431c800b0624107d8c281f4b53c9cd --- /dev/null +++ 
b/PaddleDetection-release-2.6/deploy/lite/src/preprocess_op.cc @@ -0,0 +1,185 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include + +#include "include/preprocess_op.h" + +namespace PaddleDetection { + +void InitInfo::Run(cv::Mat* im, ImageBlob* data) { + data->im_shape_ = {static_cast(im->rows), + static_cast(im->cols)}; + data->scale_factor_ = {1., 1.}; + data->in_net_shape_ = {static_cast(im->rows), + static_cast(im->cols)}; +} + +void NormalizeImage::Run(cv::Mat* im, ImageBlob* data) { + double e = 1.0; + if (is_scale_) { + e *= 1./255.0; + } + (*im).convertTo(*im, CV_32FC3, e); + for (int h = 0; h < im->rows; h++) { + for (int w = 0; w < im->cols; w++) { + im->at(h, w)[0] = + (im->at(h, w)[0] - mean_[0]) / scale_[0]; + im->at(h, w)[1] = + (im->at(h, w)[1] - mean_[1]) / scale_[1]; + im->at(h, w)[2] = + (im->at(h, w)[2] - mean_[2]) / scale_[2]; + } + } +} + +void Permute::Run(cv::Mat* im, ImageBlob* data) { + (*im).convertTo(*im, CV_32FC3); + int rh = im->rows; + int rw = im->cols; + int rc = im->channels(); + (data->im_data_).resize(rc * rh * rw); + float* base = (data->im_data_).data(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, base + i * rh * rw), i); + } +} + +void Resize::Run(cv::Mat* im, ImageBlob* data) { + auto resize_scale = GenerateScale(*im); + data->im_shape_ = {static_cast(im->cols * resize_scale.first), + static_cast(im->rows * resize_scale.second)}; + data->in_net_shape_ = {static_cast(im->cols * resize_scale.first), + static_cast(im->rows * resize_scale.second)}; + cv::resize( + *im, *im, cv::Size(), resize_scale.first, resize_scale.second, interp_); + data->im_shape_ = { + static_cast(im->rows), static_cast(im->cols), + }; + data->scale_factor_ = { + resize_scale.second, resize_scale.first, + }; +} + +std::pair Resize::GenerateScale(const cv::Mat& im) { + std::pair resize_scale; + int origin_w = im.cols; + int origin_h = im.rows; + + if (keep_ratio_) { + int im_size_max = std::max(origin_w, origin_h); + int im_size_min = std::min(origin_w, origin_h); + int target_size_max = + *std::max_element(target_size_.begin(), target_size_.end()); + int target_size_min = + *std::min_element(target_size_.begin(), target_size_.end()); + float scale_min = + static_cast(target_size_min) / static_cast(im_size_min); + float scale_max = + static_cast(target_size_max) / static_cast(im_size_max); + float scale_ratio = std::min(scale_min, scale_max); + resize_scale = {scale_ratio, scale_ratio}; + } else { + resize_scale.first = + static_cast(target_size_[1]) / static_cast(origin_w); + resize_scale.second = + static_cast(target_size_[0]) / static_cast(origin_h); + } + return resize_scale; +} + +void PadStride::Run(cv::Mat* im, ImageBlob* data) { + if (stride_ <= 0) { + return; + } + int rc = im->channels(); + int rh = im->rows; + int rw = im->cols; + int nh = (rh / stride_) * stride_ + (rh % stride_ != 0) * stride_; + int nw = (rw / stride_) * 
stride_ + (rw % stride_ != 0) * stride_; + cv::copyMakeBorder( + *im, *im, 0, nh - rh, 0, nw - rw, cv::BORDER_CONSTANT, cv::Scalar(0)); + data->in_net_shape_ = { + static_cast(im->rows), static_cast(im->cols), + }; +} + +void TopDownEvalAffine::Run(cv::Mat* im, ImageBlob* data) { + cv::resize(*im, *im, cv::Size(trainsize_[0], trainsize_[1]), 0, 0, interp_); + // todo: Simd::ResizeBilinear(); + data->in_net_shape_ = { + static_cast(trainsize_[1]), static_cast(trainsize_[0]), + }; +} + +// Preprocessor op running order +const std::vector Preprocessor::RUN_ORDER = {"InitInfo", + "TopDownEvalAffine", + "Resize", + "NormalizeImage", + "PadStride", + "Permute"}; + +void Preprocessor::Run(cv::Mat* im, ImageBlob* data) { + for (const auto& name : RUN_ORDER) { + if (ops_.find(name) != ops_.end()) { + ops_[name]->Run(im, data); + } + } +} + +void CropImg(cv::Mat& img, + cv::Mat& crop_img, + std::vector& area, + std::vector& center, + std::vector& scale, + float expandratio) { + int crop_x1 = std::max(0, area[0]); + int crop_y1 = std::max(0, area[1]); + int crop_x2 = std::min(img.cols - 1, area[2]); + int crop_y2 = std::min(img.rows - 1, area[3]); + + int center_x = (crop_x1 + crop_x2) / 2.; + int center_y = (crop_y1 + crop_y2) / 2.; + int half_h = (crop_y2 - crop_y1) / 2.; + int half_w = (crop_x2 - crop_x1) / 2.; + + if (half_h * 3 > half_w * 4) { + half_w = static_cast(half_h * 0.75); + } else { + half_h = static_cast(half_w * 4 / 3); + } + + crop_x1 = + std::max(0, center_x - static_cast(half_w * (1 + expandratio))); + crop_y1 = + std::max(0, center_y - static_cast(half_h * (1 + expandratio))); + crop_x2 = std::min(img.cols - 1, + static_cast(center_x + half_w * (1 + expandratio))); + crop_y2 = std::min(img.rows - 1, + static_cast(center_y + half_h * (1 + expandratio))); + crop_img = + img(cv::Range(crop_y1, crop_y2 + 1), cv::Range(crop_x1, crop_x2 + 1)); + + center.clear(); + center.emplace_back((crop_x1 + crop_x2) / 2); + center.emplace_back((crop_y1 + crop_y2) / 2); + scale.clear(); + scale.emplace_back((crop_x2 - crop_x1)); + scale.emplace_back((crop_y2 - crop_y1)); +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/lite/src/utils.cc b/PaddleDetection-release-2.6/deploy/lite/src/utils.cc new file mode 100644 index 0000000000000000000000000000000000000000..7b4731cd9e25b3536417ade20d3f9ce5089755fd --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/lite/src/utils.cc @@ -0,0 +1,49 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
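The `nms` implementation in utils.cc below greedily keeps the highest-confidence box and erases any later box whose IoU with it reaches `nms_threshold`. A self-contained illustration of the overlap test it applies; the +1 inclusive-pixel convention matches the code, while the sample boxes are hypothetical:

```cpp
// Sketch of the IoU test applied inside nms() below.
#include <algorithm>
#include <cstdio>

float IoU(const int a[4], const int b[4]) {
  float xx1 = std::max(a[0], b[0]), yy1 = std::max(a[1], b[1]);
  float xx2 = std::min(a[2], b[2]), yy2 = std::min(a[3], b[3]);
  float w = std::max(0.f, xx2 - xx1 + 1), h = std::max(0.f, yy2 - yy1 + 1);
  float inter = w * h;
  float area_a = float(a[2] - a[0] + 1) * (a[3] - a[1] + 1);
  float area_b = float(b[2] - b[0] + 1) * (b[3] - b[1] + 1);
  return inter / (area_a + area_b - inter);
}

int main() {
  int a[4] = {0, 0, 99, 99}, b[4] = {50, 0, 149, 99};
  // Overlap is 50x100 between two 100x100 boxes:
  // IoU = 5000 / (10000 + 10000 - 5000) ~= 0.33,
  // so with nms_threshold = 0.5 both boxes survive.
  std::printf("IoU = %.3f\n", IoU(a, b));
  return 0;
}
```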
+ +#include "include/utils.h" + +namespace PaddleDetection { + +void nms(std::vector &input_boxes, float nms_threshold) { + std::sort(input_boxes.begin(), + input_boxes.end(), + [](ObjectResult a, ObjectResult b) { return a.confidence > b.confidence; }); + std::vector vArea(input_boxes.size()); + for (int i = 0; i < int(input_boxes.size()); ++i) { + vArea[i] = (input_boxes.at(i).rect[2] - input_boxes.at(i).rect[0] + 1) + * (input_boxes.at(i).rect[3] - input_boxes.at(i).rect[1] + 1); + } + for (int i = 0; i < int(input_boxes.size()); ++i) { + for (int j = i + 1; j < int(input_boxes.size());) { + float xx1 = (std::max)(input_boxes[i].rect[0], input_boxes[j].rect[0]); + float yy1 = (std::max)(input_boxes[i].rect[1], input_boxes[j].rect[1]); + float xx2 = (std::min)(input_boxes[i].rect[2], input_boxes[j].rect[2]); + float yy2 = (std::min)(input_boxes[i].rect[3], input_boxes[j].rect[3]); + float w = (std::max)(float(0), xx2 - xx1 + 1); + float h = (std::max)(float(0), yy2 - yy1 + 1); + float inter = w * h; + float ovr = inter / (vArea[i] + vArea[j] - inter); + if (ovr >= nms_threshold) { + input_boxes.erase(input_boxes.begin() + j); + vArea.erase(vArea.begin() + j); + } + else { + j++; + } + } + } +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/pipeline/README.md b/PaddleDetection-release-2.6/deploy/pipeline/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f05510af944332b0578da5f1cfa6b1f223eea8fd --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/README.md @@ -0,0 +1,164 @@ +简体中文 | [English](README_en.md) + + + + + +**PaddleDetection深入探索核心行业的高频场景,提供了行人、车辆场景的开箱即用分析工具,支持图片/单镜头视频/多镜头视频/在线视频流多种输入方式,广泛应用于智慧交通、智慧城市、工业巡检等领域。支持服务器端部署及TensorRT加速,T4服务器上可达到实时。** + +- 🚶‍♂️🚶‍♀️ **PP-Human支持四大产业级功能:五大异常行为识别、26种人体属性分析、实时人流计数、跨镜头(ReID)跟踪。** + +- 🚗🚙 **PP-Vehicle囊括四大交通场景核心功能:车牌识别、属性识别、车流量统计、违章检测。** + +![](https://user-images.githubusercontent.com/22989727/202134414-713a00d6-a0a4-4a77-b6e8-05cdb5d42b1e.gif) + +## 📣 近期更新 + +- 🔥🔥🔥 **2023.02.15: Jetson部署专用小模型PP-YOLOE-PLUS-Tiny发布,可在AGX平台实现4路视频流实时预测;PP-Vehicle发布违法分析功能车辆逆行和压车道线** +- **2022.8.20:PP-Vehicle首发,提供车牌识别、车辆属性分析(颜色、车型)、车流量统计以及违章检测四大功能,完善的文档教程支持高效完成二次开发与模型优化** +- **2022.7.13:PP-Human v2发布,新增打架、打电话、抽烟、闯入四大行为识别,底层算法性能升级,覆盖行人检测、跟踪、属性三类核心算法能力,提供保姆级全流程开发及模型优化策略** +- 2022.4.18:新增PP-Human全流程实战教程, 覆盖训练、部署、动作类型扩展等内容,AIStudio项目请见[链接](https://aistudio.baidu.com/aistudio/projectdetail/3842982) +- 2022.4.10:新增PP-Human范例,赋能社区智能精细化管理, AIStudio快速上手教程[链接](https://aistudio.baidu.com/aistudio/projectdetail/3679564) +- 2022.4.5:全新发布实时行人分析工具PP-Human,支持行人跟踪、人流量统计、人体属性识别与摔倒检测四大能力,基于真实场景数据特殊优化,精准识别各类摔倒姿势,适应不同环境背景、光线及摄像角度 + +## 🔮 功能介绍与效果展示 + +### PP-Human + +| ⭐ 功能 | 💟 方案优势 | 💡示例图 | +| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- | +| **跨镜跟踪(ReID)** | 超强性能:针对目标遮挡、完整度、模糊度等难点特殊优化,实现mAP 98.8、1.5ms/人 | | +| **属性分析** | 兼容多种数据格式:支持图片、视频、在线视频流输入

    高性能:融合开源数据集与企业真实数据进行训练,实现mAP 95.4、2ms/人

    支持26种属性:性别、年龄、眼镜、上衣、鞋子、帽子、背包等26种高频属性 | | +| **行为识别(包含摔倒、打架、抽烟、打电话、人员闯入)** | 功能丰富:支持摔倒、打架、抽烟、打电话、人员闯入五种高频异常行为识别

    鲁棒性强:对光照、视角、背景环境无限制

    性能高:与视频识别技术相比,模型计算量大幅降低,支持本地化与服务化快速部署

    训练速度快:仅需15分钟即可产出高精度行为识别模型 | | +| **人流量计数**
    **轨迹记录** | 简洁易用:单个参数即可开启人流量计数与轨迹记录功能 | | + +### PP-Vehicle + +| ⭐ 功能 | 💟 方案优势 | 💡示例图 | +| ---------- | ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------- | +| **车牌识别** | 支持传统车牌和新能源绿色车牌

    车牌识别采用长间隔采样识别与多次结果统计投票方式,算力消耗少,识别精度高,结果稳定性好。 检测模型 hmean: 0.979; 识别模型 acc: 0.773 | | +| **车辆属性分析** | 支持多种车型、颜色类别识别

    使用更强力的Backbone模型PP-HGNet、PP-LCNet,精度高、速度快。识别精度: 90.81 | | +| **违章检测** | 简单易用:一行命令即可实现违停检测,自定义设置区域

    检测、跟踪效果好,可实现违停车辆车牌识别 | | +| **车流量计数** | 简单易用:一行命令即可开启功能,自定义出入位置

    可提供目标跟踪轨迹显示,统计准确度高 | | +| **违法分析-车辆逆行** | 简单易用:一行命令即可开启功能

    车道线分割使用高精度模型PP-LiteSeg | |
+| **违法分析-压车道线** | 简单易用:一行命令即可开启功能

    车道线分割使用高精度模型PP-LiteSeg | |
+
+## 🗳 模型库
+
+### PP-Human
+
    +端到端模型效果(点击展开) + +| 任务 | 端到端速度(ms) | 模型方案 | 模型体积 | +|:---------:|:---------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------:| +| 行人检测(高精度) | 25.1ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 行人检测(轻量级) | 16.2ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| 行人检测(超轻量级) | 10ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M | +| 行人跟踪(高精度) | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 行人跟踪(轻量级) | 21.0ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| 行人跟踪(超轻量级) | 13.2ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M | +| 跨镜跟踪(REID) | 单人1.5ms | [REID](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | REID:92M | +| 属性识别(高精度) | 单人8.5ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | 目标检测:182M
    属性识别:86M | +| 属性识别(轻量级) | 单人7.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | 目标检测:182M
    属性识别:86M | +| 摔倒识别 | 单人10ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪:182M
    关键点检测:101M
    基于关键点行为识别:21.8M | +| 闯入识别 | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 打架识别 | 19.7ms | [视频分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| 抽烟识别 | 单人15.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测:182M
    基于人体id的目标检测:27M | +| 打电话识别 | 单人ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测:182M
    基于人体id的图像分类:45M | + + +点击模型方案中的模型即可下载指定模型,下载后解压存放至`./output_inference`目录中 + +
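Each model-zoo table ends with the same instruction: click a model link in the 模型方案 column, then unzip the archive under `./output_inference`. A minimal sketch of that step, using the high-precision pedestrian tracking model linked above (any other model URL from these tables works the same way):

```shell
# Download a pipeline model and unpack it where the pipeline configs expect it.
mkdir -p output_inference && cd output_inference
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
unzip mot_ppyoloe_l_36e_pipeline.zip
cd ..
```

The `.tar.gz` entries (for example the PP-YOLOE-PLUS-Tiny models) unpack with `tar -xzf` instead of `unzip`.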
    + +### PP-Vehicle + +
    +端到端模型效果(点击展开) + +| 任务 | 端到端速度(ms)| 模型方案 | 模型体积 | +| :---------: | :-------: | :------: |:------: | +| 车辆检测(高精度) | 25.7ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| 车辆检测(轻量级) | 13.2ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M | +| 车辆检测(超轻量级) | 10ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M | +| 车辆跟踪(高精度) | 40ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| 车辆跟踪(轻量级) | 25ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M | +| 车辆跟踪(超轻量级) | 13.2ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M | +| 车牌识别 | 4.68ms | [车牌检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz)
    [车牌字符识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | 车牌检测:3.9M
    车牌字符识别: 12M | +| 车辆属性 | 7.31ms | [车辆属性](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M | +| 车道线检测 | 47ms | [车道线模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) | 47M | + +点击模型方案中的模型即可下载指定模型,下载后解压存放至`./output_inference`目录中 + +
    + +## 📚 详细文档 + +### 🚶‍♀️ 行人分析工具PP-Human + +#### [快速开始](docs/tutorials/PPHuman_QUICK_STARTED.md) + +#### 行为识别 + +- [快速开始](docs/tutorials/pphuman_action.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/action_recognotion/README.md) + +#### 行人属性/特征识别 + +- [快速开始](docs/tutorials/pphuman_attribute.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/pphuman_attribute.md) + +#### 跨镜跟踪/ReID + +- [快速开始](docs/tutorials/pphuman_mtmct.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/pphuman_mtmct.md) + +#### 行人跟踪、人流计数与轨迹记录 + +- [快速开始](docs/tutorials/pphuman_mot.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/pphuman_mot.md) + +### 🚘 车辆分析工具PP-Vehicle + +#### [快速开始](docs/tutorials/PPVehicle_QUICK_STARTED.md) + +#### 车牌识别 + +- [快速开始](docs/tutorials/ppvehicle_plate.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/ppvehicle_plate.md) + +#### 车辆属性分析 + +- [快速开始](docs/tutorials/ppvehicle_attribute.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/ppvehicle_attribute.md) + +#### 违章检测 + +- [快速开始](docs/tutorials/ppvehicle_illegal_parking.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/pphuman_mot.md) + +#### 车辆跟踪、车流计数与轨迹记录 + +- [快速开始](docs/tutorials/ppvehicle_mot.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/pphuman_mot.md) + +#### 车辆违法压线 + +- [快速开始](docs/tutorials/ppvehicle_press.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/ppvehicle_violation.md) + +#### 车辆逆行 + +- [快速开始](docs/tutorials/ppvehicle_retrograde.md) + +- [二次开发教程](../../docs/advanced_tutorials/customization/ppvehicle_violation.md) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/README_en.md b/PaddleDetection-release-2.6/deploy/pipeline/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..ef5c0077a06163fc9872b30797d10a72fa826282 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/README_en.md @@ -0,0 +1,161 @@ +[简体中文](README.md) | English + + + +**PaddleDetection has provide out-of-the-box tools in pedestrian and vehicle analysis, and it support multiple input format such as images/videos/multi-videos/online video streams. This make it popular in smart-city\smart transportation and so on. It can be deployed easily with GPU server and TensorRT, which achieves real-time performace.** + +- 🚶‍♂️🚶‍♀️ **PP-Human has four major toolbox for pedestrian analysis: five example of behavior analysis、26 attributes recognition、in-out counting、multi-target-multi-camera tracking(REID).** + +- 🚗🚙 **PP-Vehicle has four major toolbox for vehicle analysis: The license plate recognition、vechile attributes、in-out counting、illegal_parking recognition.** + +![](https://user-images.githubusercontent.com/22989727/202134414-713a00d6-a0a4-4a77-b6e8-05cdb5d42b1e.gif) + +## 📣 Updates + +- 🔥🔥🔥 **PP-YOLOE-PLUS-Tiny was launched for Jetson deploy, which has achieved 20fps while four rtsp streams work at the same time; PP-Vehicle was launched with retrograde and lane line press.** +- 🔥 **2022.8.20:PP-Vehicle was first launched with four major toolbox for vehicle analysis,and it also provide detailed documentation for user to train with their own datas and model optimize.** +- 🔥 2022.7.13:PP-Human v2 launched with a full upgrade of four industrial features: behavior analysis, attributes recognition, visitor traffic statistics and ReID. It provides a strong core algorithm for pedestrian detection, tracking and attribute analysis with a simple and detailed development process and model optimization strategy. 
+- 2022.4.18: Add PP-Human practical tutorials covering training, deployment, and action-type expansion. For details, please see the AIStudio project [Link](https://aistudio.baidu.com/aistudio/projectdetail/3842982)
+
+- 2022.4.10: Add PP-Human examples, empowering refined management of intelligent communities. A quick start on AIStudio: [Link](https://aistudio.baidu.com/aistudio/projectdetail/3679564)
+- 2022.4.5: Launch the real-time pedestrian analysis tool PP-Human. It supports pedestrian tracking, visitor traffic statistics, attribute recognition, and falling detection. Thanks to specific optimization on real-scene data, it accurately recognizes various falling gestures and adapts to different environmental backgrounds, lighting, and camera angles.
+
+![](https://user-images.githubusercontent.com/48054808/184843170-c3ef7d29-913b-4c6e-b533-b83892a8b0e2.gif)
+
+
+## 🔮 Features and demonstration
+
+### PP-Human
+
+| ⭐ Feature | 💟 Advantages | 💡Example |
+| -------------------------------------------------- | ------------------------------------------------------------------------------------------------- | ------------------------------------- |
+| **ReID** | Extraordinary performance: specially optimized for hard cases such as target occlusion, incomplete and blurry objects, achieving mAP 98.8 at 1.5ms/person | |
+| **Attribute analysis** | Compatible with a variety of data formats: supports image, video, and online video stream input

    High performance: trained on open-source datasets fused with real enterprise data, achieving mAP 94.86 at 2ms/person

    Supports 26 high-frequency attributes: gender, age, glasses, tops, shoes, hats, backpacks, and more | |
+| **Behaviour detection** | Rich functionality: supports detection of five high-frequency anomalous behaviors: falling, fighting, smoking, phoning, and intrusion

    Robust: unaffected by environmental background, lighting, or camera angle.

    High performance: requires far less computation than video-recognition approaches; supports rapid on-premise and service-oriented deployment

    Fast training: takes only 15 minutes to produce a high-precision behavior detection model | |
+| **Visitor traffic statistics**
    **Trace record** | Simple and easy to use: a single parameter initiates visitor traffic statistics and trace recording | |
+
+### PP-Vehicle
+
+| ⭐ Feature | 💟 Advantages | 💡 Example |
+| ---------- | ------------------------------------------------------------------------------------------ | ---------------------------------------------------- |
+| **License Plate Recognition** | Supports both traditional plates and new-energy green plates

    Samples frames at long intervals within a time window for plate recognition and votes over the repeated results, which lowers compute cost, raises recognition accuracy, and stabilizes the output.

    hmean of text detector: 0.979; accuracy of recognition model: 0.773

    | |
+| **Vehicle Attributes** | Identifies 10 vehicle colors and 9 vehicle types

    More powerful backbones, PP-HGNet and PP-LCNet, with higher accuracy and faster speed

    Recognition accuracy: 90.81

    | |
+| **Illegal Parking** | Easy to use: a single command enables it, with a user-defined illegal-parking area

    Recognizes the license plate of the illegally parked vehicle

    | |
+| **in-out counting** | Easy to use: a single command enables it, with a user-defined entrance/exit line

    Visualizes target trajectories, with high tracking accuracy | |
+| **vehicle retrograde** | Easy to use with a single command

    High-precision segmentation model PP-LiteSeg | |
+| **vehicle press line** | Easy to use with a single command

    High-precision segmentation model PP-LiteSeg | |
+
+## 🗳 Model Zoo
+
    +PP-Human End-to-end model results (click to expand) + +| Task | End-to-End Speed(ms) | Model | Size | +|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:| +| Pedestrian detection (high precision) | 25.1ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian detection (lightweight) | 16.2ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Pedestrian detection (super lightweight) | 10ms(Jetson AGX) | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M | +| Pedestrian tracking (high precision) | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian tracking (lightweight) | 21.0ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Pedestrian tracking(super lightweight) | 13.2ms(Jetson AGX) | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M | +| MTMCT(REID) | Single Person 1.5ms | [REID](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | REID:92M | +| Attribute recognition (high precision) | Single person8.5ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection:182M
    Attribute recognition:86M | +| Attribute recognition (lightweight) | Single person 7.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection:182M
    Attribute recognition:86M | +| Falling detection | Single person 10ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [Behavior detection based on key points](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking:182M
    Keypoint detection:101M
    Behavior detection based on key points: 21.8M | +| Intrusion detection | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Fighting detection | 19.7ms | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| Smoking detection | Single person 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Object detection based on Human Id](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | Object detection:182M
    Object detection based on Human ID: 27M | +| Phoning detection | Single person ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Image classification based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | Object detection:182M
    Image classification based on Human ID:45M | + +
    + +
    +PP-Vehicle End-to-end model results (click to expand) + +| Task | End-to-End Speed(ms) | Model | Size | +|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:| +| Vehicle detection (high precision) | 25.7ms | [object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| Vehicle detection (lightweight) | 13.2ms | [object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M | +| Vehicle detection (super lightweight) | 10ms(Jetson AGX) | [object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M | +| Vehicle tracking (high precision) | 40ms | [multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| Vehicle tracking (lightweight) | 25ms | [multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Vehicle tracking (super lightweight) | 13.2ms(Jetson AGX) | [multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M | +| Plate Recognition | 4.68ms | [plate detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz)
    [plate recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | Plate detection:3.9M
    Plate recognition:12M | +| Vehicle attribute | 7.31ms | [attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M | +| Lane line Segmentation | 47ms | [Lane line Segmentation](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) | 47M | + +
    + + +Click to download the model, then unzip and save it in the `. /output_inference`. + +## 📚 Doc Tutorials + +### 🚶‍♀️ PP-Human + +#### [A Quick Start](docs/tutorials/PPHuman_QUICK_STARTED_en.md) + +#### Pedestrian attribute/feature recognition + +* [A quick start](docs/tutorials/pphuman_attribute_en.md) + +* [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_attribute_en.md) + +#### Behavior detection + +* [A quick start](docs/tutorials/pphuman_action_en.md) + +* [Customized development tutorials](../../docs/advanced_tutorials/customization/action_recognotion/README_en.md) + +#### ReID + +* [A quick start](docs/tutorials/pphuman_mtmct_en.md) + +* [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_mtmct_en.md) + +#### Pedestrian tracking, visitor traffic statistics, trace records + +* [A quick start](docs/tutorials/pphuman_mot_en.md) + +* [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_mot_en.md) + + +### 🚘 PP-Vehicle + +#### [A Quick Start](docs/tutorials/PPVehicle_QUICK_STARTED.md) + +#### Vehicle Plate License + +- [A quick start](docs/tutorials/ppvehicle_plate_en.md) + +- [Customized development tutorials](../../docs/advanced_tutorials/customization/ppvehicle_plate.md) + +#### Vehicle Attributes + +- [A quick start](docs/tutorials/ppvehicle_attribute_en.md) + +- [Customized development tutorials](../../docs/advanced_tutorials/customization/ppvehicle_attribute_en.md) + +#### Illegal Parking + +- [A quick start](docs/tutorials/ppvehicle_illegal_parking_en.md) + +- [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_mot_en.md) + +#### Vehicle Tracking/in-out counint/Route Visualize + +- [A quick start](docs/tutorials/ppvehicle_mot_en.md) + +- [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_mot_en.md) + +#### Vehicle Press Line + +- [A quick start](docs/tutorials/ppvehicle_press_en.md) + +- [Customized development tutorials](../../docs/advanced_tutorials/customization/ppvehicle_violation_en.md) + +#### Vehicle Retrograde + +- [A quick start](docs/tutorials/ppvehicle_retrograde_en.md) + +- [Customized development tutorials](../../docs/advanced_tutorials/customization/ppvehicle_violation_en.md) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/__init__.py b/PaddleDetection-release-2.6/deploy/pipeline/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97043fd7ba6885aac81cad5a49924c23c67d4d47 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/cfg_utils.py b/PaddleDetection-release-2.6/deploy/pipeline/cfg_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1ad6ed999460a5df60cc6e61782747031488c1d4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/cfg_utils.py @@ -0,0 +1,221 @@ +import ast +import yaml +import copy +import argparse +from argparse import ArgumentParser, RawDescriptionHelpFormatter + + +class ArgsParser(ArgumentParser): + def __init__(self): + super(ArgsParser, self).__init__( + formatter_class=RawDescriptionHelpFormatter) + self.add_argument( + "-o", "--opt", nargs='*', help="set configuration options") + + def parse_args(self, argv=None): + args = super(ArgsParser, self).parse_args(argv) + assert args.config is not None, \ + "Please specify --config=configure_file_path." + args.opt = self._parse_opt(args.opt) + return args + + def _parse_opt(self, opts): + config = {} + if not opts: + return config + for s in opts: + s = s.strip() + k, v = s.split('=', 1) + if '.' not in k: + config[k] = yaml.load(v, Loader=yaml.Loader) + else: + keys = k.split('.') + if keys[0] not in config: + config[keys[0]] = {} + cur = config[keys[0]] + for idx, key in enumerate(keys[1:]): + if idx == len(keys) - 2: + cur[key] = yaml.load(v, Loader=yaml.Loader) + else: + cur[key] = {} + cur = cur[key] + return config + + +def argsparser(): + parser = ArgsParser() + + parser.add_argument( + "--config", + type=str, + default=None, + help=("Path of configure"), + required=True) + parser.add_argument( + "--image_file", type=str, default=None, help="Path of image file.") + parser.add_argument( + "--image_dir", + type=str, + default=None, + help="Dir of image file, `image_file` has a higher priority.") + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or `camera_id` has a highest priority." + ) + parser.add_argument( + "--video_dir", + type=str, + default=None, + help="Dir of video file, `video_file` has a higher priority.") + parser.add_argument( + "--rtsp", + type=str, + nargs='+', + default=None, + help="list of rtsp inputs, for one or multiple rtsp input.") + parser.add_argument( + "--camera_id", + type=int, + default=-1, + help="device id of camera to predict.") + parser.add_argument( + "--output_dir", + type=str, + default="output", + help="Directory of output visualization files.") + parser.add_argument( + "--pushurl", + type=str, + default="", + help="url of output visualization stream.") + parser.add_argument( + "--run_mode", + type=str, + default='paddle', + help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU." 
+ ) + parser.add_argument( + "--enable_mkldnn", + type=ast.literal_eval, + default=False, + help="Whether use mkldnn with CPU.") + parser.add_argument( + "--cpu_threads", type=int, default=1, help="Num of threads with CPU.") + parser.add_argument( + "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.") + parser.add_argument( + "--trt_max_shape", + type=int, + default=1280, + help="max_shape for TensorRT.") + parser.add_argument( + "--trt_opt_shape", + type=int, + default=640, + help="opt_shape for TensorRT.") + parser.add_argument( + "--trt_calib_mode", + type=bool, + default=False, + help="If the model is produced by TRT offline quantitative " + "calibration, trt_calib_mode need to set True.") + parser.add_argument( + "--do_entrance_counting", + action='store_true', + help="Whether counting the numbers of identifiers entering " + "or getting out from the entrance. Note that only support single-class MOT." + ) + parser.add_argument( + "--do_break_in_counting", + action='store_true', + help="Whether counting the numbers of identifiers break in " + "the area. Note that only support single-class MOT and " + "the video should be taken by a static camera.") + parser.add_argument( + "--illegal_parking_time", + type=int, + default=-1, + help="illegal parking time which units are seconds, default is -1 which means not recognition illegal parking" + ) + parser.add_argument( + "--region_type", + type=str, + default='horizontal', + help="Area type for entrance counting or break in counting, 'horizontal' and " + "'vertical' used when do entrance counting. 'custom' used when do break in counting. " + "Note that only support single-class MOT, and the video should be taken by a static camera." + ) + parser.add_argument( + '--region_polygon', + nargs='+', + type=int, + default=[], + help="Clockwise point coords (x0,y0,x1,y1...) of polygon of area when " + "do_break_in_counting. 
Note that only support single-class MOT and " + "the video should be taken by a static camera.") + parser.add_argument( + "--secs_interval", + type=int, + default=2, + help="The seconds interval to count after tracking") + parser.add_argument( + "--draw_center_traj", + action='store_true', + help="Whether drawing the trajectory of center") + + return parser + + +def merge_cfg(args): + # load config + with open(args.config) as f: + pred_config = yaml.safe_load(f) + + def merge(cfg, arg): + # update cfg from arg directly + merge_cfg = copy.deepcopy(cfg) + for k, v in cfg.items(): + if k in arg: + merge_cfg[k] = arg[k] + else: + if isinstance(v, dict): + merge_cfg[k] = merge(v, arg) + + return merge_cfg + + def merge_opt(cfg, arg): + merge_cfg = copy.deepcopy(cfg) + # merge opt + if 'opt' in arg.keys() and arg['opt']: + for name, value in arg['opt'].items( + ): # example: {'MOT': {'batch_size': 3}} + if name not in merge_cfg.keys(): + print("No", name, "in config file!") + continue + for sub_k, sub_v in value.items(): + if sub_k not in merge_cfg[name].keys(): + print("No", sub_k, "in config file of", name, "!") + continue + merge_cfg[name][sub_k] = sub_v + + return merge_cfg + + args_dict = vars(args) + pred_config = merge(pred_config, args_dict) + pred_config = merge_opt(pred_config, args_dict) + + return pred_config + + +def print_arguments(cfg): + print('----------- Running Arguments -----------') + buffer = yaml.dump(cfg) + print(buffer) + print('------------------------------------------') diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_calling.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_calling.yml new file mode 100644 index 0000000000000000000000000000000000000000..8d74712aa0b8c9f16d7b89aec5307d16253438c3 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_calling.yml @@ -0,0 +1,17 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ID_BASED_CLSACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip + batch_size: 8 + threshold: 0.8 + display_frames: 80 + skip_frame_num: 2 + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_fall_down.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_fall_down.yml new file mode 100644 index 0000000000000000000000000000000000000000..5dc38bb23b161cb7e84e027a3a4dd381da3d246b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_fall_down.yml @@ -0,0 +1,22 @@ +crop_thresh: 0.5 +kpt_thresh: 0.2 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +KPT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip + batch_size: 8 + +SKELETON_ACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip + batch_size: 1 + max_frames: 50 + display_frames: 80 + coord_size: [384, 512] + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_fight_recognition.yml 
b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_fight_recognition.yml new file mode 100644 index 0000000000000000000000000000000000000000..76826ebaa45c0345e94d5ab8218293844cc96697 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_fight_recognition.yml @@ -0,0 +1,11 @@ +visual: True +warmup_frame: 50 + +VIDEO_ACTION: + model_dir: https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip + batch_size: 1 + frame_len: 8 + sample_freq: 7 + short_size: 340 + target_size: 320 + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_human_attr.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_human_attr.yml new file mode 100644 index 0000000000000000000000000000000000000000..ff39f978e5779860099b78742f1cbb6b054c1fe4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_human_attr.yml @@ -0,0 +1,19 @@ +crop_thresh: 0.5 +attr_thresh: 0.5 +visual: True +warmup_frame: 50 + +DET: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + batch_size: 1 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip + batch_size: 8 + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_human_mot.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_human_mot.yml new file mode 100644 index 0000000000000000000000000000000000000000..7b9e739d4aa9139055c732728370fc414f31cfee --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_human_mot.yml @@ -0,0 +1,9 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml new file mode 100644 index 0000000000000000000000000000000000000000..d19bdf3d4652251668105015b72fdc6eacf6b628 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml @@ -0,0 +1,19 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +VEHICLE_PLATE: + det_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz + det_limit_side_len: 736 + det_limit_type: "min" + rec_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz + rec_image_shape: [3, 48, 320] + rec_batch_num: 6 + word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_reid.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_reid.yml new file mode 100644 index 0000000000000000000000000000000000000000..42c7f6f20d7c194aa03b8d54df81958211b72452 --- /dev/null +++ 
b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_reid.yml @@ -0,0 +1,14 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +REID: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip + batch_size: 16 + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_smoking.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_smoking.yml new file mode 100644 index 0000000000000000000000000000000000000000..41a1475303ee25fe6f35c58d39891a868d9cecab --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_smoking.yml @@ -0,0 +1,17 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ID_BASED_DETACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip + batch_size: 8 + threshold: 0.6 + display_frames: 80 + skip_frame_num: 2 + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml new file mode 100644 index 0000000000000000000000000000000000000000..926f40d47f2439988bb423c29b5dc74c585d5167 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml @@ -0,0 +1,20 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +DET: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + batch_size: 1 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +VEHICLE_ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip + batch_size: 8 + color_threshold: 0.5 + type_threshold: 0.5 + enable: True diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_plate.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_plate.yml new file mode 100644 index 0000000000000000000000000000000000000000..9f8d0740a8f41084f3fe37efced9d62227481bc3 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_plate.yml @@ -0,0 +1,23 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +DET: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + batch_size: 1 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +VEHICLE_PLATE: + det_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz + det_limit_side_len: 736 + det_limit_type: "min" + rec_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz + rec_image_shape: [3, 48, 320] + rec_batch_num: 6 + word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt + enable: True diff --git 
a/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_violation.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_violation.yml new file mode 100644 index 0000000000000000000000000000000000000000..5c3119b8ffd1737c3b749345d0ac2e8d8e6e314a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/examples/infer_cfg_vehicle_violation.yml @@ -0,0 +1,31 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +DET: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + batch_size: 1 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + skip_frame_num: -1 # preferably no more than 3 + enable: True + +LANE_SEG: + lane_seg_config: deploy/pipeline/config/lane_seg_config.yml + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip + +VEHICLE_PRESSING: + enable: True + +VEHICLE_RETROGRADE: + frame_len: 8 + sample_freq: 7 + enable: True + filter_horizontal_flag: True + keep_right_flag: True + deviation: 45 + move_scale: 0.01 + fence_line: [570, 163, 1030, 752] #[x1,y1,x2,y2] y2>y1. diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/infer_cfg_pphuman.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/infer_cfg_pphuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..95349291dae2ddd50f7371db4084eb4244fd42a0 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/infer_cfg_pphuman.yml @@ -0,0 +1,63 @@ +crop_thresh: 0.5 +attr_thresh: 0.5 +kpt_thresh: 0.2 +visual: True +warmup_frame: 50 + +DET: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + batch_size: 1 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + skip_frame_num: -1 # preferably no more than 3 + enable: False + +KPT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip + batch_size: 8 + +ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip + batch_size: 8 + enable: False + +VIDEO_ACTION: + model_dir: https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip + batch_size: 1 + frame_len: 8 + sample_freq: 7 + short_size: 340 + target_size: 320 + enable: False + +SKELETON_ACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip + batch_size: 1 + max_frames: 50 + display_frames: 80 + coord_size: [384, 512] + enable: False + +ID_BASED_DETACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip + batch_size: 8 + threshold: 0.6 + display_frames: 80 + skip_frame_num: 2 + enable: False + +ID_BASED_CLSACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip + batch_size: 8 + threshold: 0.8 + display_frames: 80 + skip_frame_num: 2 + enable: False + +REID: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip + batch_size: 16 + enable: False diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/infer_cfg_ppvehicle.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/infer_cfg_ppvehicle.yml new file mode 100644 index 
0000000000000000000000000000000000000000..2d4eada3ce89b9c0da9e1def5af1fa44ac77c7d5 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/infer_cfg_ppvehicle.yml @@ -0,0 +1,48 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +DET: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + batch_size: 1 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + skip_frame_num: -1 # preferably no more than 3 + enable: False + +VEHICLE_PLATE: + det_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz + det_limit_side_len: 736 + det_limit_type: "min" + rec_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz + rec_image_shape: [3, 48, 320] + rec_batch_num: 6 + word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt + enable: False + +VEHICLE_ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip + batch_size: 8 + color_threshold: 0.5 + type_threshold: 0.5 + enable: False + +LANE_SEG: + lane_seg_config: deploy/pipeline/config/lane_seg_config.yml + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip + +VEHICLE_PRESSING: + enable: False + +VEHICLE_RETROGRADE: + frame_len: 8 + sample_freq: 7 + enable: False + filter_horizontal_flag: True + keep_right_flag: True + deviation: 23 + move_scale: 0.01 + fence_line: [] #[x1,y1,x2,y2] y2>y1. diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/lane_seg_config.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/lane_seg_config.yml new file mode 100644 index 0000000000000000000000000000000000000000..85fa85928f78afd2b6e5617766b0174cbe74502f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/lane_seg_config.yml @@ -0,0 +1,19 @@ +type: PLSLaneseg + + +PLSLaneseg: + run_mode: 'paddle' + batch_size: 1 + device: gpu + min_subgraph_size: 3 + use_dynamic_shape: False + trt_min_shape: [100,100] + trt_max_shape: [2000,3000] + trt_opt_shape: [512,1024] + trt_calib_mode: False + cpu_threads: 10 + enable_mkldnn: False #Enable to use mkldnn to speed up when using cpu. + + filter_horizontal_flag: True #Whether to filter horizontal roads + horizontal_filtration_degree: 23 + horizontal_filtering_threshold: 0.25 diff --git a/PaddleDetection-release-2.6/deploy/pipeline/config/tracker_config.yml b/PaddleDetection-release-2.6/deploy/pipeline/config/tracker_config.yml new file mode 100644 index 0000000000000000000000000000000000000000..c4a3f60268108be5ab285eeddea2c704958ce2a5 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/config/tracker_config.yml @@ -0,0 +1,55 @@ +# config of tracker for MOT SDE Detector, use 'OCSORTTracker' as default, 'JDETracker' here is just BYTETracker. +# The tracker of MOT JDE Detector (such as FairMOT) is exported together with the model. +# Here 'min_box_area' and 'vertical_ratio' are set for pedestrian, you can modify for other objects tracking. 
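+#
+# The `type` key below selects which of the tracker sections defined in this
+# file actually takes effect; the remaining sections are kept for reference.
+# For MTMCT, switch it to 'DeepSORTTracker' as the note below says.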
+ +type: BOTSORTTracker # choose one tracker in ['JDETracker', 'OCSORTTracker', 'DeepSORTTracker','BOTSORTTracker'] +# When using for MTMCT(Multi-Target Multi-Camera Tracking), you should modify to 'DeepSORTTracker' + + +# just as BYTETracker, used for FairMOT in PP-Tracking project and for ByteTrack in PP-Humanv1 project +JDETracker: + use_byte: True + det_thresh: 0.3 + conf_thres: 0.6 + low_conf_thres: 0.1 + match_thres: 0.9 + min_box_area: 0 + vertical_ratio: 0 # 1.6 for pedestrian + + +# used for OC-SORT in PP-Humanv2 project and PP-Vehicle project +OCSORTTracker: + det_thresh: 0.4 + max_age: 30 + min_hits: 3 + iou_threshold: 0.3 + delta_t: 3 + inertia: 0.2 + min_box_area: 0 + vertical_ratio: 0 + use_byte: False + use_angle_cost: False + + +# used for DeepSORT and MTMCT in PP-Tracking project +DeepSORTTracker: + input_size: [64, 192] # An unique operation to scale the sub-image of the selected detected boxes to a fixed size + min_box_area: 0 + vertical_ratio: -1 + budget: 100 + max_age: 70 + n_init: 3 + metric_type: cosine + matching_threshold: 0.2 + max_iou_distance: 0.9 + +BOTSORTTracker: + track_high_thresh: 0.3 + track_low_thresh: 0.2 + new_track_thresh: 0.4 + match_thresh: 0.7 + track_buffer: 30 + min_box_area: 0 + camera_motion: False + cmc_method: 'sparseOptFlow' # only camera_motion is True, + # sparseOptFlow | files (Vidstab GMC) | orb | ecc diff --git a/PaddleDetection-release-2.6/deploy/pipeline/datacollector.py b/PaddleDetection-release-2.6/deploy/pipeline/datacollector.py new file mode 100644 index 0000000000000000000000000000000000000000..49c5e085c94dbebfbd4d5edeb16844eea5171f97 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/datacollector.py @@ -0,0 +1,134 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import copy +from collections import Counter + + +class Result(object): + def __init__(self): + self.res_dict = { + 'det': dict(), + 'mot': dict(), + 'attr': dict(), + 'kpt': dict(), + 'video_action': dict(), + 'skeleton_action': dict(), + 'reid': dict(), + 'det_action': dict(), + 'cls_action': dict(), + 'vehicleplate': dict(), + 'vehicle_attr': dict(), + 'lanes': dict(), + 'vehicle_press': dict(), + 'vehicle_retrograde': dict() + } + + def update(self, res, name): + self.res_dict[name].update(res) + + def get(self, name): + if name in self.res_dict and len(self.res_dict[name]) > 0: + return self.res_dict[name] + return None + + def clear(self, name): + self.res_dict[name].clear() + + +class DataCollector(object): + """ + DataCollector of Pipeline, collect results in every frames and assign it to each track ids. + mainly used in mtmct. 
+ + data struct: + collector: + - [id1]: (all results of N frames) + - frames(list of int): Nx[int] + - rects(list of rect): Nx[rect(conf, xmin, ymin, xmax, ymax)] + - features(list of array(256,)): Nx[array(256,)] + - qualities(list of float): Nx[float] + - attrs(list of attr): refer to attrs for details + - kpts(list of kpts): refer to kpts for details + - skeleton_action(list of skeleton_action): refer to skeleton_action for details + ... + - [idN] + """ + + def __init__(self): + #id, frame, rect, score, label, attrs, kpts, skeleton_action + self.mots = { + "frames": [], + "rects": [], + "attrs": [], + "kpts": [], + "features": [], + "qualities": [], + "skeleton_action": [], + "vehicleplate": [] + } + self.collector = {} + + def append(self, frameid, Result): + mot_res = Result.get('mot') + attr_res = Result.get('attr') + kpt_res = Result.get('kpt') + skeleton_action_res = Result.get('skeleton_action') + reid_res = Result.get('reid') + vehicleplate_res = Result.get('vehicleplate') + + rects = [] + if reid_res is not None: + rects = reid_res['rects'] + elif mot_res is not None: + rects = mot_res['boxes'] + + for idx, mot_item in enumerate(rects): + ids = int(mot_item[0]) + if ids not in self.collector: + self.collector[ids] = copy.deepcopy(self.mots) + self.collector[ids]["frames"].append(frameid) + self.collector[ids]["rects"].append([mot_item[2:]]) + if attr_res: + self.collector[ids]["attrs"].append(attr_res['output'][idx]) + if kpt_res: + self.collector[ids]["kpts"].append( + [kpt_res['keypoint'][0][idx], kpt_res['keypoint'][1][idx]]) + if skeleton_action_res and (idx + 1) in skeleton_action_res: + self.collector[ids]["skeleton_action"].append( + skeleton_action_res[idx + 1]) + else: + # action model generate result per X frames, Not available every frames + self.collector[ids]["skeleton_action"].append(None) + if reid_res: + self.collector[ids]["features"].append(reid_res['features'][ + idx]) + self.collector[ids]["qualities"].append(reid_res['qualities'][ + idx]) + if vehicleplate_res and vehicleplate_res['plate'][idx] != "": + self.collector[ids]["vehicleplate"].append(vehicleplate_res[ + 'plate'][idx]) + + def get_res(self): + return self.collector + + def get_carlp(self, trackid): + lps = self.collector[trackid]["vehicleplate"] + counter = Counter(lps) + carlp = counter.most_common() + if len(carlp) > 0: + return carlp[0][0] + else: + return None \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/images/ppvehicleplate.jpg b/PaddleDetection-release-2.6/deploy/pipeline/docs/images/ppvehicleplate.jpg new file mode 100644 index 0000000000000000000000000000000000000000..85e0aabf8bc0c4d40b2cce4d1b9a58e8bc8e1dbb Binary files /dev/null and b/PaddleDetection-release-2.6/deploy/pipeline/docs/images/ppvehicleplate.jpg differ diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..236287ddb5125224d5d2e30e9958a2dcaecd296f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED.md @@ -0,0 +1,243 @@ +[English](PPHuman_QUICK_STARTED_en.md) | 简体中文 + +# PP-Human快速开始 + +## 目录 + +- [环境准备](#环境准备) +- [模型下载](#模型下载) +- [配置文件说明](#配置文件说明) +- [预测部署](#预测部署) + - [在线视频流](#在线视频流) + - [Jetson部署说明](#Jetson部署说明) + - [参数说明](#参数说明) +- [方案介绍](#方案介绍) + - [行人检测](#行人检测) + - [行人跟踪](#行人跟踪) + - [跨镜行人跟踪](#跨镜行人跟踪) + - [属性识别](#属性识别) + - 
[行为识别](#行为识别) + +## 环境准备 + +环境要求: PaddleDetection版本 >= release/2.4 或 develop版本 + +PaddlePaddle和PaddleDetection安装 + +``` +# PaddlePaddle CUDA10.1 +python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + +# PaddlePaddle CPU +python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + +# 克隆PaddleDetection仓库 +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# 安装其他依赖 +cd PaddleDetection +pip install -r requirements.txt +``` + +1. 详细安装文档参考[文档](../../../../docs/tutorials/INSTALL_cn.md) +2. 如果需要TensorRT推理加速(测速方式),请安装带`TensorRT版本Paddle`。您可以从[Paddle安装包](https://paddleinference.paddlepaddle.org.cn/v2.2/user_guides/download_lib.html#python)下载安装,或者按照[指导文档](https://www.paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html)使用docker或自编译方式准备Paddle环境。 + +## 模型下载 + +PP-Human提供了目标检测、属性识别、行为识别、ReID预训练模型,以实现不同使用场景,用户可以直接下载使用 + +| 任务 | 端到端速度(ms)| 模型方案 | 模型体积 | +| :---------: | :-------: | :------: |:------: | +| 行人检测(高精度) | 25.1ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 行人检测(轻量级) | 16.2ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| 行人检测(超轻量级) | 10ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M | +| 行人跟踪(高精度) | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 行人跟踪(轻量级) | 21.0ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| 行人跟踪(超轻量级) | 13.2ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M | +| 跨镜跟踪(REID) | 单人1.5ms | [REID](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | REID:92M | +| 属性识别(高精度) | 单人8.5ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | 目标检测:182M
    属性识别:86M | +| 属性识别(轻量级) | 单人7.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | 目标检测:182M
    属性识别:86M | +| 摔倒识别 | 单人10ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪:182M
    关键点检测:101M
    基于关键点行为识别:21.8M | +| 闯入识别 | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 多目标跟踪:182M | +| 打架识别 | 19.7ms | [视频分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| 抽烟识别 | 单人15.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测:182M
    基于人体id的目标检测:27M | +| 打电话识别 | 单人6.0ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测:182M
    基于人体id的图像分类:45M | + +下载模型后,解压至`./output_inference`文件夹。 + +在配置文件中,模型路径默认为模型的下载路径,如果用户不修改,则在推理时会自动下载对应的模型。 + +**注意:** + +- 模型精度为融合数据集结果,数据集包含开源数据集和企业数据集 +- ReID模型精度为Market1501数据集测试结果 +- 预测速度为T4下,开启TensorRT FP16的效果, 模型预测速度包含数据预处理、模型预测、后处理全流程 + +## 配置文件说明 + +PP-Human相关配置位于```deploy/pipeline/config/infer_cfg_pphuman.yml```中,存放模型路径,该配置文件中包含了目前PP-Human支持的所有功能。如果想要查看某个单一功能的配置,请参见```deploy/pipeline/config/examples/```中相关配置。此外,配置文件中的内容可以通过```-o```命令行参数修改,如修改属性的模型目录,则可通过```-o ATTR.model_dir="DIR_PATH"```进行设置。 + +功能及任务类型对应表单如下: + +| 输入类型 | 功能 | 任务类型 | 配置项 | +|-------|-------|----------|-----| +| 图片 | 属性识别 | 目标检测 属性识别 | DET ATTR | +| 单镜头视频 | 属性识别 | 多目标跟踪 属性识别 | MOT ATTR | +| 单镜头视频 | 行为识别 | 多目标跟踪 关键点检测 摔倒识别 | MOT KPT SKELETON_ACTION | + +例如基于视频输入的属性识别,任务类型包含多目标跟踪和属性识别,具体配置如下: + +``` +crop_thresh: 0.5 +attr_thresh: 0.5 +visual: True + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip + batch_size: 8 + enable: True +``` + +**注意:** + +- 如果用户需要实现不同任务,可以在配置文件对应enable选项设置为True。 + + +## 预测部署 + +1. 直接使用默认配置或者examples中配置文件,或者直接在`infer_cfg_pphuman.yml`中修改配置: +``` +# 例:行人检测,指定配置文件路径和测试图片,图片输入默认打开检测模型 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --image_file=test_image.jpg --device=gpu + +# 例:行人属性识别,直接使用examples中配置 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4 --device=gpu +``` + +2. 使用命令行进行功能开启,或者模型路径修改: +``` +# 例:行人跟踪,指定配置文件路径,模型路径和测试视频, 命令行中指定的模型路径优先级高于配置文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o MOT.enable=True MOT.model_dir=ppyoloe_infer/ --video_file=test_video.mp4 --device=gpu + +# 例:行为识别,以摔倒识别为例,命令行中开启SKELETON_ACTION模型 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o SKELETON_ACTION.enbale=True --video_file=test_video.mp4 --device=gpu +``` + +### 在线视频流 + +在线视频流解码功能基于opencv的capture函数,支持rtsp、rtmp格式。 + +- rtsp拉流预测 + +对rtsp拉流的支持,使用--rtsp RTSP [RTSP ...]参数指定一路或者多路rtsp视频流,如果是多路地址中间用空格隔开。(或者video_file后面的视频地址直接更换为rtsp流地址),示例如下: +``` +# 例:行人属性识别,单路视频流 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE] --device=gpu + +# 例:行人属性识别,多路视频流 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE1] rtsp://[YOUR_RTSP_SITE2] --device=gpu +``` + +- 视频结果推流rtsp + +预测结果进行rtsp推流,使用--pushurl rtsp:[IP] 推流到IP地址端,PC端可以使用[VLC播放器](https://vlc.onl/)打开网络流进行播放,播放地址为 `rtsp:[IP]/videoname`。其中`videoname`是预测的视频文件名,如果视频来源是本地摄像头则`videoname`默认为`output`. +``` +# 例:行人属性识别,单路视频流,该示例播放地址为 rtsp://[YOUR_SERVER_IP]:8554/test_video +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4 --device=gpu --pushurl rtsp://[YOUR_SERVER_IP]:8554 +``` +注: +1. rtsp推流服务基于 [rtsp-simple-server](https://github.com/aler9/rtsp-simple-server), 如使用推流功能请先开启该服务. +使用方法很简单,以linux平台为例:1)下载对应平台release包;2)解压后在命令行执行命令 `./rtsp-simple-server`即可,成功后进入服务开启状态就可以接收视频流了。 +2. 
rtsp推流如果模型处理速度跟不上会出现很明显的卡顿现象,建议跟踪模型使用ppyoloe_s或ppyoloe-plus-tiny版本,方式为修改配置中跟踪模型mot_ppyoloe_l_36e_pipeline.zip替换为mot_ppyoloe_s_36e_pipeline.zip。 + + +### Jetson部署说明 + +由于Jetson平台算力相比服务器有较大差距,有如下使用建议: + +1. 模型选择轻量级版本,我们最新提供了轻量级[PP-YOLOE-Plus Tiny模型](../../../../configs/pphuman/README.md),该模型在Jetson AGX上可以实现4路视频流20fps实时跟踪。 +2. 如果需进一步提升速度,建议开启跟踪跳帧功能,推荐使用2或者3: `skip_frame_num: 3`,该功能当前默认关闭。 + +上述修改可以直接修改配置文件(推荐),也可以在命令行中修改(字段较长,不推荐)。 + +PP-YOLOE-Plus Tiny模型在AGX平台不同功能开启时的速度如下:(跟踪人数为3人情况下,以属性为例,总耗时为跟踪13.3+5.2*3≈29ms) + +| 功能 | 平均每帧耗时(ms) | 运行帧率(fps) | +|:----------|:----------|:----------| +| 跟踪 | 13 | 77 | +| 属性识别 | 29 | 34 | +| 摔倒识别 | 64.5 | 15.5 | +| 抽烟识别 | 68.8 | 14.5 | +| 打电话识别 | 22.5 | 44.5 | +| 打架识别 | 3.98 | 251 | + + + +### 参数说明 + +| 参数 | 是否必须|含义 | +|-------|-------|----------| +| --config | Yes | 配置文件路径 | +| -o | Option | 覆盖配置文件中对应的配置 | +| --image_file | Option | 需要预测的图片 | +| --image_dir | Option | 要预测的图片文件夹路径 | +| --video_file | Option | 需要预测的视频,或者rtsp流地址(推荐使用rtsp参数) | +| --rtsp | Option | rtsp视频流地址,支持一路或者多路同时输入 | +| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测,可设置为:0 - (摄像头数目-1) ),预测过程中在可视化界面按`q`退出输出预测结果到:output/output.mp4| +| --device | Option | 运行时的设备,可选择`CPU/GPU/XPU`,默认为`CPU`| +| --pushurl | Option| 对预测结果视频进行推流的地址,以rtsp://开头,该选项优先级高于视频结果本地存储,打开时不再另外存储本地预测结果视频| +| --output_dir | Option|可视化结果保存的根目录,默认为output/| +| --run_mode | Option |使用GPU时,默认为paddle, 可选(paddle/trt_fp32/trt_fp16/trt_int8)| +| --enable_mkldnn | Option | CPU预测中是否开启MKLDNN加速,默认为False | +| --cpu_threads | Option| 设置cpu线程数,默认为1 | +| --trt_calib_mode | Option| TensorRT是否使用校准功能,默认为False。使用TensorRT的int8功能时,需设置为True,使用PaddleSlim量化后的模型时需要设置为False | +| --do_entrance_counting | Option | 是否统计出入口流量,默认为False | +| --draw_center_traj | Option | 是否绘制跟踪轨迹,默认为False | +| --region_type | Option | 'horizontal'(默认值)、'vertical':表示流量统计方向选择;'custom':表示设置闯入区域 | +| --region_polygon | Option | 设置闯入区域多边形多点的坐标,无默认值 | +| --do_break_in_counting | Option | 此项表示做区域闯入检查 | + +## 方案介绍 + +PP-Human v2整体方案如下图所示: + +
+  (图:PP-Human v2 整体方案示意图)
+
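+
+上文参数说明中,`--do_break_in_counting`与`--region_polygon`配合使用可实现区域闯入判断,其基本思路是判断行人检测框下边界中点是否落入指定多边形区域内。下面给出一个仅作原理示意的最小Python实现(非pipeline源码,函数名为示例假设):
+
+```
+# 原理示意(非pipeline源码):用射线法判断行人检测框下边界中点是否落入闯入区域
+def point_in_polygon(x, y, polygon):
+    # polygon: [(x1, y1), (x2, y2), ...];射线每与多边形一条边相交,翻转一次内外状态
+    inside = False
+    j = len(polygon) - 1
+    for i in range(len(polygon)):
+        xi, yi = polygon[i]
+        xj, yj = polygon[j]
+        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
+            inside = not inside
+        j = i
+    return inside
+
+def is_break_in(bbox, region_polygon):
+    # bbox: [xmin, ymin, xmax, ymax];region_polygon: 与 --region_polygon 相同的扁平坐标列表
+    foot_x, foot_y = (bbox[0] + bbox[2]) / 2, bbox[3]  # 下边界中点近似行人落脚点
+    pts = list(zip(region_polygon[0::2], region_polygon[1::2]))
+    return point_in_polygon(foot_x, foot_y, pts)
+
+# 例:--region_polygon 600 300 1300 300 1300 800 600 800
+assert is_break_in([900, 400, 1000, 700], [600, 300, 1300, 300, 1300, 800, 600, 800])
+```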
    + + +### 行人检测 +- 采用PP-YOLOE L 作为目标检测模型 +- 详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/)和[检测跟踪文档](pphuman_mot.md) + +### 行人跟踪 +- 采用SDE方案完成行人跟踪 +- 检测模型使用PP-YOLOE L(高精度)和S(轻量级) +- 跟踪模块采用OC-SORT方案 +- 详细文档参考[OC-SORT](../../../../configs/mot/ocsort)和[检测跟踪文档](pphuman_mot.md) + +### 跨镜行人跟踪 +- 使用PP-YOLOE + OC-SORT得到单镜头多目标跟踪轨迹 +- 使用ReID(StrongBaseline网络)对每一帧的检测结果提取特征 +- 多镜头轨迹特征进行匹配,得到跨镜头跟踪结果 +- 详细文档参考[跨镜跟踪](pphuman_mtmct.md) + +### 属性识别 +- 使用PP-YOLOE + OC-SORT跟踪人体 +- 使用PP-HGNet、PP-LCNet(多分类模型)完成识别属性,主要属性包括年龄、性别、帽子、眼睛、上衣下衣款式、背包等 +- 详细文档参考[属性识别](pphuman_attribute.md) + +### 行为识别: +- 提供四种行为识别方案 +- 1. 基于骨骼点的行为识别,例如摔倒识别 +- 2. 基于图像分类的行为识别,例如打电话识别 +- 3. 基于检测的行为识别,例如吸烟识别 +- 4. 基于视频分类的行为识别,例如打架识别 +- 详细文档参考[行为识别](pphuman_action.md) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED_en.md new file mode 100644 index 0000000000000000000000000000000000000000..cd71732710516ec3add025a4003b0b39de2d11e0 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED_en.md @@ -0,0 +1,247 @@ +English | [简体中文](PPHuman_QUICK_STARTED.md) + +# Quick Start for PP-Human + +## Contents + +- [Environment Preparation](#Environment-Preparation) +- [Model Download](#Model-Download) +- [Configuration](#Configuration) +- [Inference Deployment](#Inference-Deployment) + - [rtsp_stream](#rtsp_stream) + - [Nvidia_Jetson](#Nvidia_Jetson) + - [Parameters](#Parameters) +- [Solutions](#Solutions) + - [Pedestrian Detection](#edestrian-Detection) + - [Pedestrian Tracking](#Pedestrian-Tracking) + - [Multi-camera & multi-pedestrain tracking](#Multi-camera-&-multi-pedestrain-tracking) + - [Attribute Recognition](#Attribute-Recognition) + - [Behavior Recognition](#Behavior-Recognition) + +## Environment Preparation + +Environment requirements: PaddleDetection>= release/2.4 or develop version + +Installation of PaddlePaddle and PaddleDetection + +``` +# PaddlePaddle CUDA10.1 +python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + +# PaddlePaddle CPU +python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + +#Clone PaddleDetection repositories +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# Install dependencies +cd PaddleDetection +pip install -r requirements.txt +``` + +1. For installation details, please refer to [Installation Tutorials](../../../../docs/tutorials/INSTALL.md) +2. If you need TensorRT inference acceleration (speed measurement), please install PaddlePaddle with `TensorRT version`. You can download and install it from the [PaddlePaddle Installation Package](https://paddleinference.paddlepaddle.org.cn/v2.2/user_guides/download_lib.html#python) or follow the [Instructions](https://www. paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html) or use docker, or self-compiling to prepare the environment. + +## Model Download + +PP-Human provides object detection, attribute recognition, behaviour recognition and ReID pre-trained models for different applications. Developers can download them directly. 
+
+| Task | End-to-End Speed (ms) | Model Solution | Model Size |
+|:--------------------------------------:|:--------------------:|:------------------:|:----------:|
+| Pedestrian Detection (high precision) | 25.1ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
+| Pedestrian Detection (lightweight) | 16.2ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
+| Pedestrian Detection (super lightweight) | 10ms (Jetson AGX) | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M |
+| Pedestrian Tracking (high precision) | 31.8ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
+| Pedestrian Tracking (lightweight) | 21.0ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
+| Pedestrian Tracking (super lightweight) | 13.2ms (Jetson AGX) | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/pphuman/ppyoloe_plus_crn_t_auxhead_320_60e_pphuman.tar.gz) | 17M |
+| MTMCT (ReID) | Single Person 1.5ms | [ReID](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | ReID: 92M |
+| Attribute Recognition (high precision) | Single Person 8.5ms | [Object Detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute Recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | Object Detection:182M
    Attribute Recognition:86M |
+| Attribute Recognition (Lightweight) | Single Person 7.1ms | [Object Detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute Recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | Object Detection:182M
    Attribute Recognition:86M |
+| Falling Detection | Single Person 10ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Keypoint Detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [Skeleton Based Action Recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-Object Tracking:182M
    Keypoint Detection:101M
    Skeleton Based Action Recognition:21.8M |
+| Breaking-In Detection | 31.8ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | Multi-Object Tracking:182M |
+| Fighting Detection | 19.7ms | [Video Classification](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | 90M |
+| Smoking Detection | Single Person 15.1ms | [Object Detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Object Detection Based On Body ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | Object Detection:182M
    Object Detection Based On Body ID:27M | +| Phone-calling Detection | Single Person 6.0ms | [Object Detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Image Classification Based On Body ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | Object Detection:182M
    Image Classification Based On Body ID:45M | + +Download the model and unzip it into the `. /output_inference` folder. + +In the configuration file, the model path defaults to the download path of the model. If the user does not change it, the corresponding model will be downloaded automatically upon inference. + +**Note:** + +- Model accuracy is tested on fused datasets, which contain both open source and enterprise datasets. +- ReID model accuracy is tested on the Market1501 dataset +- Prediction speed is obtained at T4 with TensorRT FP16 enabled, which includes data pre-processing, model inference and post-processing. + +## Configuration + +The PP-Human-related configuration is located in ``deploy/pipeline/config/infer_cfg_pphuman.yml``, and this configuration file contains all the features currently supported by PP-Human. If you want to see the configuration for a specific feature, please refer to the relevant configuration in ``deploy/pipeline/config/examples/``. In addition, the contents of the configuration file can be modified with the `-o`command line parameter. E.g. to modify the model directory of an attribute, developers can run ```-o ATTR.model_dir="DIR_PATH"``. + +The features and corresponding task types are as follows. + +| Input | Feature | Task | Config | +| ------------------- | --------------------- | ---------------------------------------------------------- | ----------------------- | +| Image | Attribute Recognition | Object Detection Attribute Recognition | DET ATTR | +| Single-camera video | Attribute Recognition | Multi-Object Tracking Attribute Recognition | MOT ATTR | +| Single-camera video | Behaviour Recognition | Multi-Object Tracking Keypoint Detection Falling detection | MOT KPT SKELETON_ACTION | + +Take attribute recognition based on video input as an example: Its task type includes multi-object tracking and attributes recognition. The specific configuration is as follows. + +``` +crop_thresh: 0.5 +attr_thresh: 0.5 +visual: True + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip + batch_size: 8 + enable: True +``` + +**Note:** + +- If developer needs to carry out different tasks, set the corresponding enables option to be True in the configuration file. + +## Inference Deployment + +1. Use the default configuration directly or the configuration file in examples, or modify the configuration in `infer_cfg_pphuman.yml` + + ``` + # Example: In pedestrian detection model, specify configuration file path and test image, and image input opens detection model by default + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --image_file=test_image.jpg --device=gpu + # Example: In pedestrian attribute recognition, directly configure the examples + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4 --device=gpu + ``` + +2. Use the command line to enable functions or change the model path. + +``` +# Example: Pedestrian tracking, specify config file path, model path and test video. The specified model path on the command line has a higher priority than the config file. 
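+# Note: -o accepts multiple space-separated KEY=VALUE overrides; nested config keys use dot notation (e.g. MOT.enable), matching the yaml structure above.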
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o MOT.enable=True MOT.model_dir=ppyoloe_infer/ --video_file=test_video.mp4 --device=gpu
+
+# Example: In behaviour recognition, with falling recognition as an example, enable the SKELETON_ACTION model on the command line
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o SKELETON_ACTION.enable=True --video_file=test_video.mp4 --device=gpu
+```
+
+### rtsp_stream
+
+Online stream decoding is based on the OpenCV capture function and normally supports rtsp and rtmp.
+
+- rtsp pull stream
+
+To pull an rtsp stream, use the `--rtsp RTSP [RTSP ...]` parameter to specify one or more rtsp streams, separating multiple addresses with a space (alternatively, replace the video address after `--video_file` with the rtsp stream address directly). Examples are as follows:
+
+```
+# Example: Single video stream for pedestrian attribute recognition
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE] --device=gpu
+# Example: Multiple video streams for pedestrian attribute recognition
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE1] rtsp://[YOUR_RTSP_SITE2] --device=gpu
+```
+
+- rtsp push stream
+
+To push the prediction results as an rtsp stream, use the `--pushurl rtsp:[IP]` parameter to push to the given address. You can visualize the output video with the [VLC Player](https://vlc.onl/) `open network` function; the full url path is `rtsp:[IP]/videoname`, where `videoname` is the basename of the video file being inferred. When the video comes from a local camera, `videoname` defaults to `output`.
+
+```
+# Example: Pedestrian attribute recognition; in this example the full url path is rtsp://[YOUR_SERVER_IP]:8554/test_video
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4 --device=gpu --pushurl rtsp://[YOUR_SERVER_IP]:8554
+```
+Note:
+1. rtsp push streaming is based on [rtsp-simple-server](https://github.com/aler9/rtsp-simple-server); please start this service first.
+It is easy to use: 1) download the [release package](https://github.com/aler9/rtsp-simple-server/releases) that matches your platform; 2) run the command `./rtsp-simple-server`, which then works as an rtsp server.
+2. The output visualization will freeze frequently if the model costs too much time; we suggest using a faster tracking model such as ppyoloe_s or ppyoloe_plus_tiny, which simply means replacing mot_ppyoloe_l_36e_pipeline.zip with mot_ppyoloe_s_36e_pipeline.zip in the model config yaml file.
+
+
+### Nvidia_Jetson
+
+Due to the large gap in computing power of the Jetson platform compared to the server, we suggest:
+
+1. Choose a lightweight model. We provide a new model named [PP-YOLOE-Plus Tiny](../../../../configs/pphuman/README.md), which achieves 20fps with four rtsp streams working together on Jetson AGX.
+2. 
For further speedup, you can set frame skipping of tracking; we recommend 2 or 3: `skip_frame_num: 3` + +PP-YOLOE-Plus Tiny module speed test data on AGX:(three people in video, for example of attribute,the whole time cost per frame is 13.3+5.2*3≈29ms) + +| module | time cost per frame(ms) | speed(fps) | +|:----------|:----------|:----------| +| tracking | 13 | 77 | +| Attribute | 29 | 34 | +| falldown | 64.5 | 15.5 | +| smoking | 68.8 | 14.5 | +| calling | 22.5 | 44.5 | +| fighting | 3.98 | 251 | + + + +### Parameters + +| Parameters | Necessity | Implications | +| ---------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| --config | Yes | Path to configuration file | +| -o | Option | Overwrite the corresponding configuration in the configuration file | +| --image_file | Option | Images to be predicted | +| --image_dir | Option | Path to the images folder to be predicted | +| --video_file | Option | Video to be predicted, or rtsp stream address (rtsp parameter recommended) | +| --rtsp | Option | rtsp video stream address, supports one or more simultaneous streams input | +| --camera_id | Option | The camera ID for prediction, default is -1 ( for no camera prediction, can be set to 0 - (number of cameras - 1) ), press `q` in the visualization interface during the prediction process to output the prediction result to: output/output.mp4 | +| --device | Option | Running device, options include `CPU/GPU/XPU`, and the default is `CPU`. | +| --pushurl | Option | push the output video to rtsp stream, normaly start with `rtsp://`; this has higher priority than local video save, while this is set, pipeline will not save local visualize video, the default is "", means this will not work now.| +| --output_dir | Option | The root directory for the visualization results, and the default is output/ | +| --run_mode | Option | For GPU, the default is paddle, with (paddle/trt_fp32/trt_fp16/trt_int8) as optional | +| --enable_mkldnn | Option | Whether to enable MKLDNN acceleration in CPU prediction, the default is False | +| --cpu_threads | Option | Set the number of cpu threads, and the default is 1 | +| --trt_calib_mode | Option | Whether TensorRT uses the calibration function, and the default is False; set to True when using TensorRT's int8 function and False when using the PaddleSlim quantized model | +| --do_entrance_counting | Option | Whether to count entrance/exit traffic flows, the default is False | +| --draw_center_traj | Option | Whether to map the trace, the default is False | +| --region_type | Option | 'horizontal' (default), 'vertical': traffic count direction; 'custom': set break-in area | +| --region_polygon | Option | Set the coordinates of the polygon multipoint in the break-in area. No default. 
| +| --do_break_in_counting | Option | Area break-in checks | + +## Solutions + +The overall solution for PP-Human v2 is shown in the graph below: + +### Pedestrian detection + +- Take PP-YOLOE L as the object detection model +- For detailed documentation, please refer to [PP-YOLOE](../../../../configs/ppyoloe/) and [Multiple-Object-Tracking](pphuman_mot_en.md) + +### Pedestrian tracking + +- Vehicle tracking by SDE solution +- Adopt PP-YOLOE L (high precision) and S (lightweight) for detection models +- Adopt the OC-SORT solution for racking module +- Refer to [OC-SORT](../../../../configs/mot/ocsort) and [Multi-Object Tracking](pphuman_mot_en.md) for details + +### Multi-camera & multi-pedestrain tracking + +- Use PP-YOLOE & OC-SORT to acquire single-camera multi-object tracking trajectory +- Extract features for each frame using ReID (StrongBaseline network). +- Match multi-camera trajectory features to obtain multi-camera tracking results. +- Refer to [Multi-camera & multi-pedestrain tracking](pphuman_mtmct_en.md) for details. + +### Attribute Recognition + +- Use PP-YOLOE + OC-SORT to track the human body. +- Use PP-HGNet, PP-LCNet (multi-classification model) to complete the attribute recognition. Main attributes include age, gender, hat, eyes, top and bottom dressing style, backpack. +- Refer to [attribute recognition](pphuman_attribute_en.md) for details. + +### Behaviour Recognition: + +- Four behaviour recognition solutions are provided: + +- 1. Behaviour recognition based on skeletal points, e.g. falling recognition + +- 2. Behaviour recognition based on image classification, e.g. phone call recognition + +- 3. Behaviour recognition based on detection, e.g. smoking recognition + +- 4. Behaviour recognition based on Video classification, e.g. fighting recognition + +- For details, please refer to [Behaviour Recognition](pphuman_action_en.md) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..b131dfd09a0d215cb59b9f47f32ec23c656ff268 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED.md @@ -0,0 +1,244 @@ +[English](PPVehicle_QUICK_STARTED_en.md) | 简体中文 + +# PP-Vehicle快速开始 + +## 目录 + +- [环境准备](#环境准备) +- [模型下载](#模型下载) +- [配置文件说明](#配置文件说明) +- [预测部署](#预测部署) + - [在线视频流](#在线视频流) + - [Jetson部署说明](#Jetson部署说明) + - [参数说明](#参数说明) +- [方案介绍](#方案介绍) + - [车辆检测](#车辆检测) + - [车辆跟踪](#车辆跟踪) + - [车牌识别](#车牌识别) + - [属性识别](#属性识别) + - [违章停车识别](#违章停车识别) + + +## 环境准备 + +环境要求: PaddleDetection版本 >= release/2.5 或 develop版本 + +PaddlePaddle和PaddleDetection安装 + +``` +# PaddlePaddle CUDA10.1 +python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + +# PaddlePaddle CPU +python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + +# 克隆PaddleDetection仓库 +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# 安装其他依赖 +cd PaddleDetection +pip install -r requirements.txt +``` + +1. 详细安装文档参考[文档](../../../../docs/tutorials/INSTALL_cn.md) +2. 
如果需要TensorRT推理加速(测速方式),请安装带`TensorRT版本Paddle`。您可以从[Paddle安装包](https://paddleinference.paddlepaddle.org.cn/v2.2/user_guides/download_lib.html#python)下载安装,或者按照[指导文档](https://www.paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html)使用docker或自编译方式准备Paddle环境。 + +## 模型下载 + +PP-Vehicle提供了目标检测、属性识别、行为识别、ReID预训练模型,以实现不同使用场景,用户可以直接下载使用 + +| 任务 | 端到端速度(ms)| 模型方案 | 模型体积 | +| :---------: | :-------: | :------: |:------: | +| 车辆检测(高精度) | 25.7ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| 车辆检测(轻量级) | 13.2ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M | +| 车辆检测(超轻量级) | 10ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M | +| 车辆跟踪(高精度) | 40ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M | +| 车辆跟踪(轻量级) | 25ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M | +| 车辆跟踪(超轻量级) | 13.2ms(Jetson AGX) | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M | +| 车牌识别 | 4.68ms | [车牌检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz)
    [车牌字符识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | 车牌检测:3.9M
    车牌字符识别: 12M | +| 车辆属性 | 7.31ms | [车辆属性](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M | +| 车道线检测 | 47ms | [车道线模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) | 47M | + +下载模型后,解压至`./output_inference`文件夹。 + +在配置文件中,模型路径默认为模型的下载路径,如果用户不修改,则在推理时会自动下载对应的模型。 + +**注意:** + +- 检测跟踪模型精度为公开数据集BDD100K-MOT和UA-DETRAC整合后的联合数据集PPVehicle的结果,具体参照[ppvehicle](../../../../configs/ppvehicle) +- 预测速度为T4下,开启TensorRT FP16的效果, 模型预测速度包含数据预处理、模型预测、后处理全流程 + +## 配置文件说明 + +PP-Vehicle相关配置位于```deploy/pipeline/config/infer_cfg_ppvehicle.yml```中,存放模型路径,完成不同功能需要设置不同的任务类型 + +功能及任务类型对应表单如下: + +| 输入类型 | 功能 | 任务类型 | 配置项 | +|-------|-------|----------|-----| +| 图片 | 属性识别 | 目标检测 属性识别 | DET ATTR | +| 单镜头视频 | 属性识别 | 多目标跟踪 属性识别 | MOT ATTR | +| 单镜头视频 | 车牌识别 | 多目标跟踪 车牌识别 | MOT VEHICLEPLATE | + +例如基于视频输入的属性识别,任务类型包含多目标跟踪和属性识别,具体配置如下: + +``` +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +VEHICLE_ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip + batch_size: 8 + color_threshold: 0.5 + type_threshold: 0.5 + enable: True +``` + +**注意:** + +- 如果用户需要实现不同任务,可以在配置文件对应enable选项设置为True。 +- 如果用户仅需要修改模型文件路径,可以在命令行中--config后面紧跟着 `-o MOT.model_dir=ppyoloe/` 进行修改即可,也可以手动修改配置文件中的相应模型路径,详细说明参考下方参数说明文档。 + + +## 预测部署 + +1. 直接使用默认配置或者examples中配置文件,或者直接在`infer_cfg_ppvehicle.yml`中修改配置: +``` +# 例:车辆检测,指定配置文件路径和测试图片 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml --image_file=test_image.jpg --device=gpu + +# 例:车辆车牌识别,指定配置文件路径和测试视频 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_plate.yml --video_file=test_video.mp4 --device=gpu +``` + +2. 使用命令行进行功能开启,或者模型路径修改: +``` +# 例:车辆跟踪,指定配置文件路径和测试视频,命令行中开启MOT模型并修改模型路径,命令行中指定的模型路径优先级高于配置文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml -o MOT.enable=True MOT.model_dir=ppyoloe_infer/ --video_file=test_video.mp4 --device=gpu + +# 例:车辆违章分析,指定配置文件和测试视频,命令行中指定违停区域设置、违停时间判断。 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml \ + --video_file=../car_test.mov \ + --device=gpu \ + --draw_center_traj \ + --illegal_parking_time=3 \ + --region_type=custom \ + --region_polygon 600 300 1300 300 1300 800 600 800 + +``` + +### 在线视频流 + +在线视频流解码功能基于opencv的capture函数,支持rtsp、rtmp格式。 + +- rtsp拉流预测 + +对rtsp拉流的支持,使用--rtsp RTSP [RTSP ...]参数指定一路或者多路rtsp视频流,如果是多路地址中间用空格隔开。(或者video_file后面的视频地址直接更换为rtsp流地址),示例如下: +``` +# 例:车辆属性识别,单路视频流 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE] --device=gpu + +# 例:车辆属性识别,多路视频流 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE1] rtsp://[YOUR_RTSP_SITE2] --device=gpu +``` + +- 视频结果推流rtsp + +预测结果进行rtsp推流,使用--pushurl rtsp:[IP] 推流到IP地址端,PC端可以使用[VLC播放器](https://vlc.onl/)打开网络流进行播放,播放地址为 `rtsp:[IP]/videoname`。其中`videoname`是预测的视频文件名,如果视频来源是本地摄像头则`videoname`默认为`output`. 
+``` +# 例:车辆属性识别,单路视频流,该示例播放地址为 rtsp://[YOUR_SERVER_IP]:8554/test_video +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml -o visual=False --video_file=test_video.mp4 --device=gpu --pushurl rtsp://[YOUR_SERVER_IP]:8554 +``` +注: +1. rtsp推流服务基于 [rtsp-simple-server](https://github.com/aler9/rtsp-simple-server), 如使用推流功能请先开启该服务. +使用方法很简单,以linux平台为例:1)下载对应平台release包;2)解压后在命令行执行命令 `./rtsp-simple-server`即可,成功后进入服务开启状态就可以接收视频流了。 +2. rtsp推流如果模型处理速度跟不上会出现很明显的卡顿现象,建议跟踪模型使用ppyoloe_s版本,即修改配置中跟踪模型mot_ppyoloe_l_36e_pipeline.zip替换为mot_ppyoloe_s_36e_pipeline.zip。 + +### Jetson部署说明 + +由于Jetson平台算力相比服务器有较大差距,有如下使用建议: + +1. 模型选择轻量级版本,我们最新提供了轻量级[PP-YOLOE-Plus Tiny模型](../../../../configs/ppvehicle/README.md),该模型在Jetson AGX上可以实现4路视频流20fps实时跟踪。 +2. 如果需进一步提升速度,建议开启跟踪跳帧功能,推荐使用2或者3: `skip_frame_num: 3`,该功能当前默认关闭。 + +上述修改可以直接修改配置文件(推荐),也可以在命令行中修改(字段较长,不推荐)。 + +PP-YOLOE-Plus Tiny模型在AGX平台不同功能开启时的速度如下:(测试视频跟踪车辆为1个) + +| 功能 | 平均每帧耗时(ms) | 运行帧率(fps) | +|:----------|:----------|:----------| +| 跟踪 | 13 | 77 | +| 属性识别 | 20.2 | 49.4 | +| 车牌识别 | - | - | + + +### 参数说明 + +| 参数 | 是否必须|含义 | +|-------|-------|----------| +| --config | Yes | 配置文件路径 | +| -o | Option | 覆盖配置文件中对应的配置 | +| --image_file | Option | 需要预测的图片 | +| --image_dir | Option | 要预测的图片文件夹路径 | +| --video_file | Option | 需要预测的视频,或者rtsp流地址 | +| --rtsp | Option | rtsp视频流地址,支持一路或者多路同时输入 | +| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测,可设置为:0 - (摄像头数目-1) ),预测过程中在可视化界面按`q`退出输出预测结果到:output/output.mp4| +| --device | Option | 运行时的设备,可选择`CPU/GPU/XPU`,默认为`CPU`| +| --pushurl | Option| 对预测结果视频进行推流的地址,以rtsp://开头,该选项优先级高于视频结果本地存储,打开时不再另外存储本地预测结果视频, 默认为空,表示没有开启| +| --output_dir | Option|可视化结果保存的根目录,默认为output/| +| --run_mode | Option |使用GPU时,默认为paddle, 可选(paddle/trt_fp32/trt_fp16/trt_int8)| +| --enable_mkldnn | Option | CPU预测中是否开启MKLDNN加速,默认为False | +| --cpu_threads | Option| 设置cpu线程数,默认为1 | +| --trt_calib_mode | Option| TensorRT是否使用校准功能,默认为False。使用TensorRT的int8功能时,需设置为True,使用PaddleSlim量化后的模型时需要设置为False | +| --do_entrance_counting | Option | 是否统计出入口流量,默认为False | +| --draw_center_traj | Option | 是否绘制跟踪轨迹,默认为False | +| --region_type | Option | 'horizontal'(默认值)、'vertical':表示流量统计方向选择;'custom':表示设置车辆禁停区域 | +| --region_polygon | Option | 设置禁停区域多边形多点的坐标,无默认值 | +| --illegal_parking_time | Option | 设置禁停时间阈值,单位秒(s),-1(默认值)表示不做检查 | + +## 方案介绍 + +PP-Vehicle 整体方案如下图所示: + +
+  (图:PP-Vehicle 整体方案示意图)
+
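+
+结合上表的`--region_polygon`与`--illegal_parking_time`参数,违停判断的基本思路是:被跟踪车辆的中心点持续落在禁停区域内且超过时间阈值,即判定为违章停车。下面给出一个仅作原理示意的最小实现(非pipeline源码,类名与函数组织为示例假设):
+
+```
+# 原理示意(非pipeline源码):基于跟踪轨迹的违停判断
+import time
+import cv2
+import numpy as np
+
+class IllegalParkingChecker:
+    def __init__(self, region_polygon, illegal_parking_time):
+        # region_polygon 为扁平坐标列表,如 600 300 1300 300 1300 800 600 800
+        self.contour = np.array(region_polygon, dtype=np.float32).reshape(-1, 1, 2)
+        self.threshold = illegal_parking_time  # 违停时间阈值,单位秒
+        self.enter_time = {}                   # track_id -> 进入禁停区域的时间戳
+
+    def update(self, track_id, center, now=None):
+        now = time.time() if now is None else now
+        inside = cv2.pointPolygonTest(self.contour, (float(center[0]), float(center[1])), False) >= 0
+        if inside:
+            start = self.enter_time.setdefault(track_id, now)
+            return now - start >= self.threshold  # True 即判定为违停
+        self.enter_time.pop(track_id, None)  # 车辆离开区域后重新计时
+        return False
+```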
    + + +### 车辆检测 +- 采用PP-YOLOE L 作为目标检测模型 +- 详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/)和[检测跟踪文档](ppvehicle_mot.md) + +### 车辆跟踪 +- 采用SDE方案完成车辆跟踪 +- 检测模型使用PP-YOLOE L(高精度)和S(轻量级) +- 跟踪模块采用OC-SORT方案 +- 详细文档参考[OC-SORT](../../../../configs/mot/ocsort)和[检测跟踪文档](ppvehicle_mot.md) + +### 属性识别 +- 使用PaddleClas提供的特色模型PP-LCNet,实现对车辆颜色及车型属性的识别。 +- 详细文档参考[属性识别](ppvehicle_attribute.md) + +### 车牌识别 +- 使用PaddleOCR特色模型ch_PP-OCRv3_det+ch_PP-OCRv3_rec模型,识别车牌号码 +- 详细文档参考[车牌识别](ppvehicle_plate.md) + +### 违章停车识别 +- 车辆跟踪模型使用高精度模型PP-YOLOE L,根据车辆的跟踪轨迹以及指定的违停区域判断是否违章停车,如果存在则展示违章停车车牌号。 +- 详细文档参考[违章停车识别](ppvehicle_illegal_parking.md) + +### 违法分析-逆行 +- 违法分析-逆行,通过使用高精度分割模型PP-Seg,对车道线进行分割拟合,然后与车辆轨迹组合判断车辆行驶方向是否与道路方向一致。 +- 详细文档参考[违法分析-逆行](ppvehicle_retrograde.md) + +### 违法分析-压线 +- 违法分析-逆行,通过使用高精度分割模型PP-Seg,对车道线进行分割拟合,然后与车辆区域是否覆盖实线区域,进行压线判断。 +- 详细文档参考[违法分析-压线](ppvehicle_press.md) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED_en.md new file mode 100644 index 0000000000000000000000000000000000000000..7cdb42e18f5baee5d966410081c2a0c1346dff8e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED_en.md @@ -0,0 +1,254 @@ +English | [简体中文](PPVehicle_QUICK_STARTED.md) + +# Quick Start for PP-Vehicle + +## Content + +- [Environment Preparation](#Environment-Preparation) +- [Model Download](#Model-Download) +- [Configuration](#Configuration) +- [Inference Deployment](#Inference-Deployment) + - [rtsp_stream](#rtsp_stream) + - [Nvidia_Jetson](#Nvidia_Jetson) + - [Parameters](#Parameters) +- [Solutions](#Solutions) + - [Vehicle Detection](#Vehicle-Detection) + - [Vehicle Tracking](#Vehicle-Tracking) + - [License Plate Recognition](#License-Plate-Recognition) + - [Attribute Recognition](#Attribute-Recognition) + - [Illegal Parking Detection](#Illegal-Parking-Detection) + +## Environment Preparation + +Environment Preparation: PaddleDetection version >= release/2.5 or develop + +Installation of PaddlePaddle and PaddleDetection + +``` +# PaddlePaddle CUDA10.1 +python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + +# PaddlePaddle CPU +python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + +# Clone PaddleDetectionrepositories +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# Installing dependencies +cd PaddleDetection +pip install -r requirements.txt +``` + +1. For installation details, please refer to [Installation Tutorials](../../../../docs/tutorials/INSTALL.md) +2. If you need TensorRT inference acceleration (speed measurement), please install PaddlePaddle with `TensorRT version`. You can download and install it from the [PaddlePaddle Installation Package](https://paddleinference.paddlepaddle.org.cn/v2.2/user_guides/download_lib.html#python) or follow the [Instructions]([https://www](https://www). paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html) or use docker, or self-compiling to prepare the environment. + +## Model Download + +PP-Vehicle provides object detection, attribute recognition, behaviour recognition and ReID pre-trained models for different applications. Developers can download them directly. 
+
+| Task | End-to-End Speed (ms) | Model Solution | Model Size |
+|:---------------------------------:|:----------:|:------------------:|:----------:|
+| Vehicle Detection (high precision) | 25.7ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M |
+| Vehicle Detection (lightweight) | 13.2ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M |
+| Vehicle Detection (super lightweight) | 10ms (Jetson AGX) | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M |
+| Vehicle Tracking (high precision) | 40ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M |
+| Vehicle Tracking (lightweight) | 25ms | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M |
+| Vehicle Tracking (super lightweight) | 13.2ms (Jetson AGX) | [Multi-Object Tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppvehicle/ppyoloe_plus_crn_t_auxhead_320_60e_ppvehicle.tar.gz) | 17M |
+| License plate recognition | 4.68ms | [License plate detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz)
    [License plate character recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | License plate detection: 3.9M
    License plate character recognition: 12M | +| Vehicle Attribute Recognition | 7.31ms | [Vehicle Attribute](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M | +| Lane line Segmentation | 47ms | [Lane line Segmentation](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) | 47M | + +Download the model and unzip it into the `. /output_inference` folder. + +In the configuration file, the model path defaults to the download path of the model. If the user does not change it, the corresponding model will be downloaded automatically upon inference. + +**Notes:** + +- The accuracy of detection tracking model is obtained from the joint dataset PPVehicle (integration of the public dataset BDD100K-MOT and UA-DETRAC). For more details, please refer to [PP-Vehicle](../../../../configs/ppvehicle) +- Inference speed is obtained at T4 with TensorRT FP16 enabled, which includes data pre-processing, model inference and post-processing. + +## Configuration + +PP-Vehicle related configuration locates in ``deploy/pipeline/config/infer_cfg_ppvehicle.yml``. Developers need to set specific task types to use different features. + +The features and corresponding task types are as follows. + +| Input | Feature | Task | Config | +| ------------------- | --------------------- | ------------------------------------------- | ---------------- | +| Image | Attribute Recognition | Object Detection Attribute Recognition | DET ATTR | +| Single-camera video | Attribute Recognition | Multi-Object Tracking Attribute Recognition | MOT ATTR | +| Single-camera video | License-plate Recognition | Multi-Object Tracking License-plate Recognition | MOT VEHICLEPLATE | + +Take attribute recognition based on video input as an example: Its task type includes multi-object tracking and attributes recognition. The specific configuration is as follows. + +``` +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +VEHICLE_ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip + batch_size: 8 + color_threshold: 0.5 + type_threshold: 0.5 + enable: True +``` + +**Notes:** + +- If the developer needs to carry out different tasks, set the corresponding enables option to be True in the configuration file. +- If the developer only needs to modify the model file path, run the command line with `-o MOT.model_dir=ppyoloe/` after --config, or manually modify the corresponding model path in the configuration file. For more details, please refer to the following parameter descriptions + +## Inference Deployment + +1. Use the default configuration directly or the configuration file in examples, or modify the configuration in `infer_cfg_ppvehicle.yml` + +``` +# Example:In vehicle detection,specify configuration file path and test image +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml --image_file=test_image.jpg --device=gpu + +# Example:In license plate recognition,directly configure the examples +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_plate.yml --video_file=test_video.mp4 --device=gpu +``` + +2. Use the command line to enable functions or change the model path. 
+
+   ```
+   # Example: In vehicle tracking, specify the configuration file path and test video. Turn on the MOT model and modify the model path on the command line; the model path specified on the command line has higher priority than the configuration file
+   python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml -o MOT.enable=True MOT.model_dir=ppyoloe_infer/ --video_file=test_video.mp4 --device=gpu
+
+   # Example: In vehicle illegal parking analysis, specify the configuration file and test video, and set the illegal parking region and parking-time threshold on the command line
+   python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml \
+                                      --video_file=../car_test.mov \
+                                      --device=gpu \
+                                      --draw_center_traj \
+                                      --illegal_parking_time=3 \
+                                      --region_type=custom \
+                                      --region_polygon 600 300 1300 300 1300 800 600 800
+   ```
+
+### rtsp_stream
+
+Online stream decoding is based on the OpenCV capture function and normally supports rtsp and rtmp.
+
+- rtsp pull stream
+
+To pull an rtsp stream, use the `--rtsp RTSP [RTSP ...]` parameter to specify one or more rtsp streams, separating multiple addresses with a space (alternatively, replace the video address after `--video_file` with the rtsp stream address directly). Examples are as follows:
+
+   ```
+   # Example: Single video stream for vehicle attribute recognition
+   python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE] --device=gpu
+   # Example: Multiple video streams for vehicle attribute recognition
+   python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE1] rtsp://[YOUR_RTSP_SITE2] --device=gpu
+   ```
+
+- rtsp push stream
+
+To push the prediction results as an rtsp stream, use the `--pushurl rtsp:[IP]` parameter to push to the given address. You can visualize the output video with the [VLC Player](https://vlc.onl/) `open network` function; the full url path is `rtsp:[IP]/videoname`, where `videoname` is the basename of the video file being inferred. When the video comes from a local camera, `videoname` defaults to `output`.
+
+```
+# Example: License plate recognition; in this example the full url path is rtsp://[YOUR_SERVER_IP]:8554/test_video
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_vehicle_plate.yml --video_file=test_video.mp4 --device=gpu --pushurl rtsp://[YOUR_SERVER_IP]:8554
+```
+Note:
+1. rtsp push streaming is based on [rtsp-simple-server](https://github.com/aler9/rtsp-simple-server); please start this service first.
+It is easy to use: 1) download the [release package](https://github.com/aler9/rtsp-simple-server/releases) that matches your platform; 2) run the command `./rtsp-simple-server`, which then works as an rtsp server.
+2. The output visualization will freeze frequently if the model costs too much time; we suggest using a faster tracking model such as ppyoloe_s, which simply means replacing mot_ppyoloe_l_36e_pipeline.zip with mot_ppyoloe_s_36e_pipeline.zip in the model config yaml file.
+
+### Nvidia_Jetson
+
+Due to the large gap in computing power of the Jetson platform compared to the server, we suggest:
+
+1. Choose a lightweight model. We provide a new model named [PP-YOLOE-Plus Tiny](../../../../configs/ppvehicle/README.md), which achieves 20fps with four rtsp streams working together on Jetson AGX.
+2. 
For further speedup, you can set frame skipping of tracking; we recommend 2 or 3: `skip_frame_num: 3` + +PP-YOLOE-Plus Tiny module speed test data on AGX:(a single car in the test video) + +| module | time cost per frame(ms) | speed(fps) | +|:----------|:----------|:----------| +| tracking | 13 | 77 | +| Attribute | 20.2 | 49.4 | +| Plate | - | - | + + +### Parameters + +# + +| Parameters | Necessity | Implications | +| ---------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| --config | Yes | Path to configuration file | +| -o | Option | Overwrite the corresponding configuration in the configuration file | +| --image_file | Option | Images to be predicted | +| --image_dir | Option | Path to the images folder to be predicted | +| --video_file | Option | Video to be predicted, or rtsp stream address (rtsp parameter recommended) | +| --rtsp | Option | rtsp video stream address, supports one or more simultaneous streams input | +| --camera_id | Option | The camera ID for prediction, default is -1 ( for no camera prediction, can be set to 0 - (number of cameras - 1) ), press `q` in the visualization interface during the prediction process to output the prediction result to: output/output.mp4 | +| --device | Option | Running device, options include `CPU/GPU/XPU`, and the default is `CPU`. | +| --pushurl | Option | push the output video to rtsp stream, normaly start with `rtsp://`; this has higher priority than local video save, while this is set, pipeline will not save local visualize video, the default is "", means this will not work now.| +| --output_dir | Option | The root directory for the visualization results, and the default is output/ | +| --run_mode | Option | For GPU, the default is paddle, with (paddle/trt_fp32/trt_fp16/trt_int8) as optional | +| --enable_mkldnn | Option | Whether to enable MKLDNN acceleration in CPU prediction, the default is False | +| --cpu_threads | Option | Set the number of cpu threads, and the default is 1 | +| --trt_calib_mode | Option | Whether TensorRT uses the calibration function, and the default is False; set to True when using TensorRT's int8 function and False when using the PaddleSlim quantized model | +| --do_entrance_counting | Option | Whether to count entrance/exit traffic flows, the default is False | +| --draw_center_traj | Option | Whether to draw center trajectory, the default is False | +| --region_type | Option | 'horizontal' (default), 'vertical': traffic count direction; 'custom': set illegal parking area | +| --region_polygon | Option | Set the coordinates of the polygon multipoint in the illegal parking area. No default. | +| --illegal_parking_time | Option | Set the time threshold for illegal parking in seconds (s), -1 (default) indicates no check | + +## Solutions + +The overall solution for PP-Vehicle v2 is shown in the graph below: + +
+  (Figure: PP-Vehicle overall pipeline diagram)
+
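+
+As noted in the rtsp_stream section above, online stream decoding relies on OpenCV's capture interface. A minimal standalone sketch of pulling one rtsp stream (illustrative only, not the pipeline source):
+
+```
+# Minimal sketch of rtsp stream pulling with OpenCV (illustrative, not the pipeline source)
+import cv2
+
+cap = cv2.VideoCapture("rtsp://[YOUR_RTSP_SITE]")  # placeholder stream address
+while cap.isOpened():
+    ret, frame = cap.read()
+    if not ret:  # stream dropped or finished
+        break
+    # ... feed `frame` into the detection / tracking / attribute models ...
+cap.release()
+```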
    + +### + +### Vehicle detection + +- Take PP-YOLOE L as the object detection model +- For detailed documentation, please refer to [PP-YOLOE](../../../../configs/ppyoloe/) and [Multiple-Object-Tracking](ppvehicle_mot_en.md) + +### Vehicle tracking + +- Vehicle tracking by SDE solution +- Adopt PP-YOLOE L (high precision) and S (lightweight) for detection models +- Adopt the OC-SORT solution for racking module +- Refer to [OC-SORT](../../../../configs/mot/ocsort) and [Multi-Object Tracking](ppvehicle_mot_en.md) for details + +### Attribute Recognition + +- Use PP-LCNet provided by PaddleClas to recognize vehicle colours and model attributes. +- For details, please refer to [Attribute Recognition](ppvehicle_attribute_en.md) + +### License plate recognition + +- Use ch_PP-OCRv3_det+ch_PP-OCRv3_rec model to recognize license plate number +- For details, please refer to [Plate Recognition](ppvehicle_plate_en.md) + +### Illegal Parking Detection + +- Use vehicle tracking model (high precision) PP-YOLOE L to determine whether the parking is illegal based on the vehicle's trajectory and the designated illegal parking area. If it is illegal parking, display the illegal parking plate number. + +- For details, please refer to [Illegal Parking Detection](ppvehicle_illegal_parking_en.md) + +#### Vehicle Press Line + +- Use segmentation model PP-LiteSeg to get the lane line in frame, combine it with vehicle route to find out the vehicle against traffic. +- For details, please refer to [Vehicle Press Line](ppvehicle_press_en.md) + +#### Vehicle Retrograde + +- Use segmentation model PP-LiteSeg to get the lane line in frame, combine it with vehicle detection box to juege if the car is pressing on lines. +- For details, please refer to [Vehicle Retrograde](ppvehicle_retrograde_en.md) + diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_action.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_action.md new file mode 100644 index 0000000000000000000000000000000000000000..9f536e58bf74f033027fe8661a1ed468790adf21 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_action.md @@ -0,0 +1,272 @@ +[English](pphuman_action_en.md) | 简体中文 + +# PP-Human行为识别模块 + +## 目录 + +- [基于骨骼点的行为识别](#基于骨骼点的行为识别) +- [基于图像分类的行为识别](#基于图像分类的行为识别) +- [基于检测的行为识别](#基于检测的行为识别) +- [基于行人轨迹的行为识别](#基于行人轨迹的行为识别) +- [基于视频分类的行为识别](#基于视频分类的行为识别) + +行为识别在智慧社区,安防监控等方向具有广泛应用,根据行为的不同,PP-Human中集成了基于视频分类、基于检测、基于图像分类,基于行人轨迹以及基于骨骼点的行为识别模块,方便用户根据需求进行选择。 + +## 基于骨骼点的行为识别 + +应用行为:摔倒识别 + +
+  (图:摔倒识别效果示例)
    数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用
    +
    + +### 模型库 + +基于骨骼点的行为识别包含行人检测/跟踪,关键点检测和摔倒行为识别三个模型,首先需要下载以下预训练模型 + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 16.2ms
    跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 关键点识别 | HRNet | AP: 87.1 | 单人 2.9ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)| +| 摔倒行为识别 | ST-GCN | 准确率: 96.43 | 单人 2.7ms | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | + +注: +1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/),[CrowdHuman](http://www.crowdhuman.org/),[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。 +2. 关键点模型使用[COCO](https://cocodataset.org/),[UAV-Human](https://github.com/SUTDCV/UAV-Human)和部分业务数据融合训练, 精度在业务数据测试集上得到。 +3. 摔倒行为识别模型使用[NTU-RGB+D](https://rose1.ntu.edu.sg/dataset/actionRecognition/),[UR Fall Detection Dataset](http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html)和部分业务数据融合训练,精度在业务数据测试集上得到。 +4. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。 + +### 配置说明 +[配置文件](../../config/infer_cfg_pphuman.yml)中与行为识别相关的参数如下: +``` +SKELETON_ACTION: # 基于骨骼点的行为识别模型配置 + model_dir: output_inference/STGCN # 模型所在路径 + batch_size: 1 # 预测批大小。 当前仅支持为1进行推理 + max_frames: 50 # 动作片段对应的帧数。在行人ID对应时序骨骼点结果时达到该帧数后,会通过行为识别模型判断该段序列的动作类型。与训练设置一致时效果最佳。 + display_frames: 80 # 显示帧数。当预测结果为摔倒时,在对应人物ID中显示状态的持续时间。 + coord_size: [384, 512] # 坐标统一缩放到的尺度大小。与训练设置一致时效果最佳。 + enable: False # 是否开启该功能 +``` + +### 使用方法 +1. 从`模型库`中下载`行人检测/跟踪`、`关键点识别`、`摔倒行为识别`三个预测部署模型并解压到```./output_inference```路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 目前行为识别模块仅支持视频输入,根据期望开启的行为识别方案类型,设置infer_cfg_pphuman.yml中`SKELETON_ACTION`的enable: True, 然后启动命令如下: + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + ``` + +3. 若修改模型路径,有以下两种方式: + + - ```./deploy/pipeline/config/infer_cfg_pphuman.yml```下可以配置不同模型路径,关键点模型和摔倒行为识别模型分别对应`KPT`和`SKELETON_ACTION`字段,修改对应字段下的路径为实际期望的路径即可。 + - 命令行中--config后面紧跟着增加`-o KPT.model_dir=xxx SKELETON_ACTION.model_dir=xxx `修改模型路径: + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + -o KPT.model_dir=./dark_hrnet_w32_256x192 SKELETON_ACTION.model_dir=./STGCN \ + --video_file=test_video.mp4 \ + --device=gpu + ``` + +4. 启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + + +### 方案说明 +1. 使用多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。 +3. 使用[关键点识别模型](../../../../configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml)得到对应的17个骨骼特征点。骨骼特征点的顺序及类型与COCO一致,详见[如何准备关键点数据集](../../../../docs/tutorials/data/PrepareKeypointDataSet.md)中的`COCO数据集`部分。 +4. 每个跟踪ID对应的目标行人各自累计骨骼特征点结果,组成该人物的时序关键点序列。当累计到预定帧数或跟踪丢失后,使用行为识别模型判断时序关键点序列的动作类型。当前版本模型支持摔倒行为的识别,预测得到的`class id`对应关系为: +``` +0: 摔倒, +1: 其他 +``` +- 摔倒行为识别模型使用了[ST-GCN](https://arxiv.org/abs/1801.07455),并基于[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md)套件完成模型训练。 + +## 基于图像分类的行为识别 + +应用行为:打电话识别 + +
+  (图:打电话识别效果示例)
    数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用
    +
    + +### 模型库 + +基于图像分类的行为识别包含行人检测/跟踪,打电话识别两个模型,首先需要下载以下预训练模型 + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 16.2ms
    跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 打电话识别 | PP-HGNet | 准确率: 86.85 | 单人 2.94ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | + + +注: +1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/),[CrowdHuman](http://www.crowdhuman.org/),[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。 +2. 打电话行为识别模型使用[UAV-Human](https://github.com/SUTDCV/UAV-Human)的打电话行为部分进行训练和测试。 +3. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。 + +### 配置说明 +[配置文件](../../config/infer_cfg_pphuman.yml)中相关的参数如下: +``` +ID_BASED_CLSACTION: # 基于分类的行为识别模型配置 + model_dir: output_inference/PPHGNet_tiny_calling_halfbody # 模型所在路径 + batch_size: 8 # 预测批大小 + threshold: 0.45 #识别为对应行为的阈值 + display_frames: 80 # 显示帧数。当识别到对应动作时,在对应人物ID中显示状态的持续时间。 + enable: False # 是否开启该功能 +``` + +### 使用方法 +1. 从`模型库`中下载`行人检测/跟踪`、`打电话行为识别`两个预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件`deploy/pipeline/config/infer_cfg_pphuman.yml`中`ID_BASED_CLSACTION`下的`enable`为`True`; +3. 仅支持输入视频,启动命令如下: +``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. 启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + +### 方案说明 +1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。 +3. 通过在帧级别的行人图像通过图像分类的方式实现。当图片所属类别为对应行为时,即认为在一定时间段内该人物处于该行为状态中。该任务使用[PP-HGNet](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md)实现,当前版本模型支持打电话行为的识别,预测得到的`class id`对应关系为: +``` +0: 打电话, +1: 其他 +``` +- 基于分类的行为识别基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md#3.3)完成模型训练。 + + +## 基于检测的行为识别 + +应用行为:吸烟识别 + +
+  (图:吸烟识别效果示例)
    数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用
    +
    + +### 模型库 +在这里,我们提供了行人检测/跟踪、吸烟行为识别的预训练模型,用户可以直接下载使用。 + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 16.2ms
    跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 吸烟行为识别 | PP-YOLOE | mAP: 39.7 | 单人 2.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | + +注: +1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/),[CrowdHuman](http://www.crowdhuman.org/),[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。 +2. 抽烟行为识别模型使用业务数据进行训练和测试。 +3. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。 + +### 配置说明 +[配置文件](../../config/infer_cfg_pphuman.yml)中相关的参数如下: +``` +ID_BASED_DETACTION: # 基于检测的行为识别模型配置 + model_dir: output_inference/ppyoloe_crn_s_80e_smoking_visdrone # 模型所在路径 + batch_size: 8 # 预测批大小 + threshold: 0.4 # 识别为对应行为的阈值 + display_frames: 80 # 显示帧数。当识别到对应动作时,在对应人物ID中显示状态的持续时间。 + enable: False # 是否开启该功能 +``` + +### 使用方法 +1. 从`模型库`中下载`行人检测/跟踪`、`抽烟行为识别`两个预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件`deploy/pipeline/config/infer_cfg_pphuman.yml`中`ID_BASED_DETACTION`下的`enable`为`True`; +3. 仅支持输入视频,启动命令如下: +``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. 启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + +### 方案说明 +1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。 +3. 通过在帧级别的行人图像中检测该行为的典型特定目标实现。当检测到特定目标(在这里即烟头)以后,即认为在一定时间段内该人物处于该行为状态中。该任务使用[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)实现,当前版本模型支持吸烟行为的识别,预测得到的`class id`对应关系为: +``` +0: 吸烟, +1: 其他 +``` + +## 基于行人轨迹的行为识别 + +应用行为:闯入识别 + +
+  (图:区域闯入识别效果示例)
+
    + +具体使用请参照[PP-Human检测跟踪模块](pphuman_mot.md)的`5. 区域闯入判断和计数`。 + +### 方案说明 +1. 使用多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的下边界中点在相邻帧位于用户所选区域的内外位置,来识别是否闯入所选区域。 + + +## 基于视频分类的行为识别 + +应用行为:打架识别 + +
+  (图:打架识别效果示例)
    数据来源及版权归属:Surveillance Camera Fight Dataset。
    +
    + +该方案关注的场景为监控摄像头下的打架行为识别。打架行为涉及多人,基于骨骼点技术的方案更适用于单人的行为识别。此外,打架行为对时序信息依赖较强,基于检测和分类的方案也不太适用。由于监控场景背景复杂,人的密集程度、光线、拍摄角度等都会对识别造成影响,本方案采用基于视频分类的方式判断视频中是否存在打架行为。针对摄像头距离人较远的情况,通过增大输入图像分辨率优化。由于训练数据有限,采用数据增强的方式提升模型的泛化性能。 + +### 模型库 +在这里,我们提供了打架识别的预训练模型,用户可以直接下载使用。 + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| 打架识别 | PP-TSM | 准确率:89.06% | 2s视频 128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | + +注: +1. 打架识别模型基于6个公开数据集训练得到:Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、Hockey Fight Detection Dataset、Video Fight Detection Dataset、Real Life Violence Situations Dataset、UBI Abnormal Event Detection Dataset。 +2. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。 + + +### 配置说明 +[配置文件](../../config/infer_cfg_pphuman.yml)中与行为识别相关的参数如下: +``` +VIDEO_ACTION: # 基于视频分类的行为识别模型配置 + model_dir: output_inference/ppTSM # 模型所在路径 + batch_size: 1 # 预测批大小。当前仅支持为1进行推理 + frame_len: 8 # 累计抽样帧数量,达到该数量后执行一次识别 + sample_freq: 7 # 抽样频率,即间隔多少帧抽样一帧 + short_size: 340 # 视频帧尺度变换最小边的长度 + target_size: 320 # 目标视频帧的大小 + enable: False # 是否开启该功能 +``` + +### 使用方法 +1. 从上表链接中下载`打架识别`任务的预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件`deploy/pphuman/config/infer_cfg_pphuman.yml`中`VIDEO_ACTION`下的`enable`为`True`; +3. 仅支持输入视频,启动命令如下: +``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. 启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + + +### 方案说明 + +目前打架识别模型使用的是[PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md),并在PP-TSM视频分类模型训练流程的基础上修改适配,完成模型训练。对于输入的视频或者视频流,进行等间隔抽帧,当视频帧累计到指定数目时,输入到视频分类模型中判断是否存在打架行为。 + + +## 参考文献 +``` +@inproceedings{stgcn2018aaai, + title = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition}, + author = {Sijie Yan and Yuanjun Xiong and Dahua Lin}, + booktitle = {AAAI}, + year = {2018}, +} +````` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_action_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_action_en.md new file mode 100644 index 0000000000000000000000000000000000000000..537892a88bc02fddad0e9a80b2b25d4fb1ac45d3 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_action_en.md @@ -0,0 +1,276 @@ +English | [简体中文](pphuman_action.md) + +# Action Recognition Module of PP-Human + +Action Recognition is widely used in the intelligent community/smart city, and security monitoring. PP-Human provides the module of video-classification-based, detection-based, image-classification-based and skeleton-based action recognition. + +## Model Zoo + +There are multiple available pretrained models including pedestrian detection/tracking, keypoint detection, fighting, calling, smoking and fall detection models. Users can download and use them directly. 
+
+| Task | Algorithm | Precision | Inference Speed(ms) | Model Weights | Model Inference and Deployment |
+|:-----------------------------|:---------:|:-------------------------:|:-----------------------------------:|:-----------------:|:-------------------------------------------------------------------------------------------:|
+| Pedestrian Detection/Tracking | PP-YOLOE | mAP: 56.3<br>MOTA: 72.0 | Detection: 28ms<br>
    Tracking:33.1ms |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Calling Recognition | PP-HGNet | Precision Rate: 86.85 | Single Person 2.94ms | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.pdparams) | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | +| Smoking Recognition | PP-YOLOE | mAP: 39.7 | Single Person 2.0ms | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | +| Keypoint Detection | HRNet | AP: 87.1 | Single Person 2.9ms |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | +| Falling Recognition | ST-GCN | Precision Rate: 96.43 | Single Person 2.7ms | - |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | +| Fighting Recognition | PP-TSM | Precision Rate: 89.06% | 128ms for a 2sec video | [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | + +Note: + +1. The precision of the pedestrian detection/ tracking model is obtained by trainning and testing on [MOT17](https://motchallenge.net/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) and some business data. + +2. The keypoint detection model is trained on [COCO](https://cocodataset.org/), [UAV-Human](https://github.com/SUTDCV/UAV-Human), and some business data, and the precision is obtained on test sets of business data. + +3. The falling action recognition model is trained on [NTU-RGB+D](https://rose1.ntu.edu.sg/dataset/actionRecognition/), [UR Fall Detection Dataset](http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html), and some business data, and the precision is obtained on the testing set of business data. + +4. The calling action recognition model is trained and tested on [UAV-Human](https://github.com/SUTDCV/UAV-Human), by using video frames of calling in this dataset. + +5. The smoking action recognition model is trained and tested on business data. + +6. The fighting action recognition model is trained and tested on 6 public datasets, including Surveillance Camera Fight Dataset, A Dataset for Automatic Violence Detection in Videos, Hockey Fight Detection Dataset, Video Fight Detection Dataset, Real Life Violence Situations Dataset, UBI Abnormal Event Detection Dataset. + +7. The inference speed is the speed of using TensorRT FP16 on NVIDIA T4, including the total time of data pre-training, model inference, and post-processing. + + +## Skeleton-based action recognition -- falling detection + +
+<div align="center">
+  [demo GIF: falling detection]
+  <br>
+  Data source and copyright owner: Skyinfor Technology. Thanks for the provision of actual scenario data, which are only used for academic research here.
+</div>
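+The skeleton-based solution in this section buffers time-ordered keypoints per tracking ID and only runs the recognizer once enough frames have been collected. Below is a minimal sketch of that buffering rule, keyed to the `max_frames` field documented in the next section; the `KeypointBuffer` class and trigger logic are illustrative assumptions, not the actual pipeline implementation.
+
+```python
+# Minimal sketch of per-track keypoint buffering for skeleton-based
+# action recognition. `max_frames` mirrors the SKELETON_ACTION config
+# field below; the class and trigger rule are illustrative assumptions.
+from collections import defaultdict
+
+class KeypointBuffer:
+    def __init__(self, max_frames=50):
+        self.max_frames = max_frames
+        self.tracks = defaultdict(list)   # track id -> list of per-frame keypoints
+
+    def update(self, track_id, keypoints):
+        """Append one frame of 17 COCO keypoints; return the full
+        sequence once it is long enough to classify, else None."""
+        seq = self.tracks[track_id]
+        seq.append(keypoints)
+        if len(seq) >= self.max_frames:
+            self.tracks[track_id] = []    # reset and hand the clip to the recognizer
+            return seq
+        return None
+
+buffer = KeypointBuffer(max_frames=50)
+for frame_idx in range(120):
+    fake_kpts = [(0.0, 0.0)] * 17         # placeholder for (x, y) keypoints
+    clip = buffer.update(track_id=1, keypoints=fake_kpts)
+    if clip is not None:
+        print(f"frame {frame_idx}: classify a {len(clip)}-frame sequence")
+```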
    + +### Description of Configuration + +Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follow: + +``` +SKELETON_ACTION: # Config for skeleton-based action recognition model + model_dir: output_inference/STGCN # Path of the model + batch_size: 1 # The size of the inference batch. Current now only support 1. + max_frames: 50 # The number of frames of action segments. When frames of time-ordered skeleton keypoints of each pedestrian ID achieve the max value,the action type will be judged by the action recognition model. If the setting is the same as the training, there will be an ideal inference result. + display_frames: 80 # The number of display frames. When the inferred action type is falling down, the time length of the act will be displayed in the ID. + coord_size: [384, 512] # The unified size of the coordinate, which is the best when it is the same as the training setting. + enable: False # Whether to enable this function +``` + + +## How to Use + +1. Download models `Pedestrian Detection/Tracking`, `Keypoint Detection` and `Falling Recognition` from the links in the Model Zoo and unzip them to ```./output_inference```. The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path. + +2. Now the only available input is the video input in the action recognition module. set the "enable: True" of `SKELETON_ACTION` in infer_cfg_pphuman.yml. And then run the command: + + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu + ``` + +3. There are two ways to modify the model path: + + - In ```./deploy/pipeline/config/infer_cfg_pphuman.yml```, you can configurate different model paths,which is proper only if you match keypoint models and action recognition models with the fields of `KPT` and `SKELETON_ACTION` respectively, and modify the corresponding path of each field into the expected path. + - Add `-o KPT.model_dir=xxx SKELETON_ACTION.model_dir=xxx ` in the command line following the --config to change the model path: + + + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + -o KPT.model_dir=./dark_hrnet_w32_256x192 SKELETON_ACTION.model_dir=./STGCN \ + --video_file=test_video.mp4 \ + --device=gpu + ``` +4. For detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md) + +### Introduction to the Solution + +1. Get the pedestrian detection box and the tracking ID number of the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE, and for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe). + +2. Capture every pedestrian in frames of the input video accordingly by using the coordinate of the detection box. +3. In this strategy, we use the [keypoint detection model](../../../../configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) to obtain 17 skeleton keypoints. Their sequences and types are identical to those of COCO. For details, please refer to the `COCO dataset` part of [how to prepare keypoint datasets](../../../../docs/tutorials/data/PrepareKeypointDataSet_en.md). + +4. Each target pedestrian with a tracking ID has their own accumulation of skeleton keypoints, which is used to form a keypoint sequence in time order. 
When the number of accumulated frames reaches a preset threshold, or the tracking is lost, the action recognition model is applied to judge the action type of the time-ordered keypoint sequence. The current model only supports recognizing the act of falling down, and the relationship between the action type and `class id` is:
+
+```
+0: Fall down
+
+1: Others
+```
+- The falling action recognition model uses [ST-GCN](https://arxiv.org/abs/1801.07455) and employs the [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md) toolkit to complete model training.
+
+## Image-Classification-Based Action Recognition -- Calling Recognition
+
+<div align="center">
+  [demo GIF: calling recognition]
+  <br>
+  Data source and copyright owner: Skyinfor Technology. Thanks for the provision of actual scenario data, which are only used for academic research here.
+</div>
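+As a rough illustration of the frame-level decision rule this classification-based solution uses (a hypothetical function, not the pipeline code), with `threshold` mirroring the `ID_BASED_CLSACTION` configuration in the next section:
+
+```python
+# Minimal sketch of the frame-level decision rule for
+# classification-based action recognition. Class 0 is "Calling" as
+# documented below; this is an illustration, not the pipeline code.
+def classify_action(scores, threshold=0.45):
+    """scores: softmax outputs [p_calling, p_others] for one pedestrian crop."""
+    return "Calling" if scores[0] >= threshold else "Others"
+
+print(classify_action([0.62, 0.38]))  # -> Calling
+print(classify_action([0.31, 0.69]))  # -> Others
+```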
    + +### Description of Configuration + +Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follow: + +``` +ID_BASED_CLSACTION: # config for classfication-based action recognition model + model_dir: output_inference/PPHGNet_tiny_calling_halfbody # Path of the model + batch_size: 8 # The size of the inference batch + threshold: 0.45 # Threshold for corresponding behavior + display_frames: 80 # The number of display frames. When the corresponding action is detected, the time length of the act will be displayed in the ID. + enable: False # Whether to enable this function +``` + +### How to Use + +1. Download models `Pedestrian Detection/Tracking` and `Calling Recognition` from the links in `Model Zoo` and unzip them to ```./output_inference```. The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path. + +2. Now the only available input is the video input in the action recognition module. Set the "enable: True" of `ID_BASED_CLSACTION` in infer_cfg_pphuman.yml. + +3. Run this command: + ```python + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu + ``` +4. For detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md) + +### Introduction to the Solution +1. Get the pedestrian detection box and the tracking ID number of the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE, and for details, please refer to [PP-YOLOE](../../../configs/ppyoloe). + +2. Capture every pedestrian in frames of the input video accordingly by using the coordinate of the detection box. +3. With image classification through pedestrian images at the frame level, when the category to which the image belongs is the corresponding behavior, it is considered that the character is in the behavior state for a certain period of time. This task is implemented with [PP-HGNet](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md). In current version, the behavior of calling is supported and the relationship between the action type and `class id` is: +``` +0: Calling + +1: Others +``` + + +## Detection-based Action Recognition -- Smoking Detection + +
+<div align="center">
+  [demo GIF: smoking detection]
+  <br>
+  Data source and copyright owner: Skyinfor Technology. Thanks for the provision of actual scenario data, which are only used for academic research here.
+</div>
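+As a rough illustration of the decision rule this detection-based solution uses (hypothetical names; the cigarette class id of 0 is also an assumption), with `threshold` mirroring the `ID_BASED_DETACTION` configuration in the next section:
+
+```python
+# Minimal sketch: a behavior is flagged when the target object (here,
+# a cigarette) is detected above a confidence threshold. Illustrative
+# only; box format is (class_id, score, x1, y1, x2, y2).
+def detect_behavior(boxes, target_class=0, threshold=0.4):
+    """boxes: detections inside one frame-level pedestrian crop."""
+    return any(cls == target_class and score >= threshold
+               for cls, score, *_ in boxes)
+
+boxes = [(0, 0.55, 10, 20, 40, 60), (1, 0.90, 0, 0, 5, 5)]
+print("Smoking" if detect_behavior(boxes) else "Others")  # -> Smoking
+```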
    + +### Description of Configuration + +Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follow: +``` +ID_BASED_DETACTION: # Config for detection-based action recognition model + model_dir: output_inference/ppyoloe_crn_s_80e_smoking_visdrone # Path of the model + batch_size: 8 # The size of the inference batch + threshold: 0.4 # Threshold for corresponding behavior. + display_frames: 80 # The number of display frames. When the corresponding action is detected, the time length of the act will be displayed in the ID. + enable: False # Whether to enable this function +``` + +### How to Use + +1. Download models `Pedestrian Detection/Tracking` and `Smoking Recognition` from the links in `Model Zoo` and unzip them to ```./output_inference```. The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path. + +2. Now the only available input is the video input in the action recognition module. set the "enable: True" of `ID_BASED_DETACTION` in infer_cfg_pphuman.yml. + +3. Run this command: + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu + ``` +4. For detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md) + +### Introduction to the Solution +1. Get the pedestrian detection box and the tracking ID number of the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE, and for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe). + +2. Capture every pedestrian in frames of the input video accordingly by using the coordinate of the detection box. + +3. We detecting the typical specific target of this behavior in frame-level pedestrian images. When a specific target (in this case, cigarette is the target) is detected, it is considered that the character is in the behavior state for a certain period of time. This task is implemented by [PP-YOLOE](../../../../configs/ppyoloe/). In current version, the behavior of smoking is supported and the relationship between the action type and `class id` is: + +``` +0: Smoking + +1: Others +``` + +## Video-Classification-Based Action Recognition -- Fighting Detection +With wider and wider deployment of surveillance cameras, it is time-consuming and labor-intensive and inefficient to manually check whether there are abnormal behaviors such as fighting. AI + security assistant smart security. A fight recognition module is integrated into PP-Human to identify whether there is fighting in the video. We provide pre-trained models that users can download and use directly. + +| Task | Model | Acc. | Speed(ms) | Weight | Deploy Model | +| ---- | ---- | ---------- | ---- | ---- | ---------- | +| Fighting Detection | PP-TSM | 89.06% | 128ms for a 2-sec video| [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | + + +The model is trained with 6 public dataset, including Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、Hockey Fight Detection Dataset、Video Fight Detection Dataset、Real Life Violence Situations Dataset、UBI Abnormal Event Detection Dataset. + +This project focuses on is the identification of fighting behavior under surveillance cameras. 
Fighting behavior involves multiple people, so skeleton-based approaches, which are better suited to single-person behavior recognition, do not apply well. In addition, fighting depends strongly on temporal information, so detection-based and classification-based schemes are not suitable either. Since the surveillance background is complex, and crowd density, lighting and filming angle may all affect the accuracy, this solution uses a video-classification-based method to determine whether there is fighting in the video.
+For the case where the camera is far away from the people, it is optimized by increasing the resolution of the input image. Due to the limited training data, data augmentation is used to improve the generalization performance of the model.
+
+
+### Description of Configuration
+
+Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follows:
+```
+VIDEO_ACTION: # Config for video-classification-based action recognition model
+  model_dir: output_inference/ppTSM # Path of the model
+  batch_size: 1 # The size of the inference batch. Currently only 1 is supported.
+  frame_len: 8 # Number of sampled frames to accumulate; inference runs once this many frames have been collected.
+  sample_freq: 7 # Sampling frequency, i.e. one frame is sampled every `sample_freq` frames.
+  short_size: 340 # The length of the shorter side for video frame scaling transforms.
+  target_size: 320 # Target size of the input video frames.
+  enable: False # Whether to enable this function
+```
+
+### How to Use
+
+1. Download the model `Fighting Detection` from the link in the above table and unzip it to ```./output_inference```. The model is automatically downloaded by default. If you download it manually, you need to modify `model_dir` to the model storage path.
+
+2. Modify the file names in the `ppTSM` folder to `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`;
+
+3. Only video input is currently supported in this module. Set "enable: True" for `VIDEO_ACTION` in infer_cfg_pphuman.yml.
+
+4. Run this command:
+   ```bash
+   python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
+                                      --video_file=test_video.mp4 \
+                                      --device=gpu
+   ```
+5. For detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md).
+
+
+The result is shown as follows:
+
+  [demo GIF: fighting detection result]
    + +Data source and copyright owner: Surveillance Camera Fight Dataset. + +### Introduction to the Solution +The current fight recognition model is using [PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md), and adaptated to complete the model training. For the input video or video stream, we extraction frame at a certain interval. When the video frame accumulates to the specified number, it is input into the video classification model to determine whether there is fighting. + + +## Custom Training + +The pretrained models are provided and can be used directly, including pedestrian detection/ tracking, keypoint detection, smoking, calling and fighting recognition. If users need to train custom action or optimize the model performance, please refer the link below. + +| Task | Model | Development Document | +| ---- | ---- | -------- | +| pedestrian detection/tracking | PP-YOLOE | [doc](../../../../configs/ppyoloe/README.md#getting-start) | +| keypoint detection | HRNet | [doc](../../../../configs/keypoint/README_en.md#3training-and-testing) | +| action recognition (fall down) | ST-GCN | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md) | +| action recognition (smoking) | PP-YOLOE | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_det.md) | +| action recognition (calling) | PP-HGNet | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md) | +| action recognition (fighting) | PP-TSM | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md) | + + +## Reference + +``` +@inproceedings{stgcn2018aaai, + title = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition}, + author = {Sijie Yan and Yuanjun Xiong and Dahua Lin}, + booktitle = {AAAI}, + year = {2018}, +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_attribute.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_attribute.md new file mode 100644 index 0000000000000000000000000000000000000000..4210e4adcd0b723965625a05aa6d320b29f3850a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_attribute.md @@ -0,0 +1,111 @@ +[English](pphuman_attribute_en.md) | 简体中文 + +# PP-Human属性识别模块 + +行人属性识别在智慧社区,工业巡检,交通监控等方向都具有广泛应用,PP-Human中集成了属性识别模块,属性包含性别、年龄、帽子、眼镜、上衣下衣款式等。我们提供了预训练模型,用户可以直接下载使用。 + +| 任务 | 算法 | 精度 | 预测速度(ms) |下载链接 | +|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | +| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
MOTA: 72.0 | 检测: 16.2ms<br>
    跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 行人属性高精度模型 | PP-HGNet_small | mA: 95.4 | 单人 1.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | +| 行人属性轻量级模型 | PP-LCNet_x1_0 | mA: 94.5 | 单人 0.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | +| 行人属性精度与速度均衡模型 | PP-HGNet_tiny | mA: 95.2 | 单人 1.14ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_person_attribute_952_infer.zip) | + + +1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/),[CrowdHuman](http://www.crowdhuman.org/),[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。 +2. 行人属性分析精度为[PA100k](https://github.com/xh-liu/HydraPlus-Net#pa-100k-dataset),[RAPv2](http://www.rapdataset.com/rapv2.html),[PETA](http://mmlab.ie.cuhk.edu.hk/projects/PETA.html)和部分业务数据融合训练测试得到 +3. 预测速度为V100 机器上使用TensorRT FP16时的速度, 该处测速速度为模型预测速度 +4. 属性模型应用依赖跟踪模型结果,请在[跟踪模型页面](./pphuman_mot.md)下载跟踪模型,依自身需求选择高精或轻量级下载。 +5. 模型下载后解压放置在PaddleDetection/output_inference/目录下。 + +## 使用方法 + +1. 从上表链接中下载模型并解压到```PaddleDetection/output_inference```路径下,并修改配置文件中模型路径,也可默认自动下载模型。设置```deploy/pipeline/config/infer_cfg_pphuman.yml```中`ATTR`的enable: True + +`infer_cfg_pphuman.yml`中配置项说明: +``` +ATTR: #模块名称 + model_dir: output_inference/PPLCNet_x1_0_person_attribute_945_infer/ #模型路径 + batch_size: 8 #推理最大batchsize + enable: False #功能是否开启 +``` + +2. 图片输入时,启动命令如下(更多命令参数说明,请参考[快速开始-参数说明](./PPHuman_QUICK_STARTED.md#41-参数说明))。 +```python +#单张图片 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_file=test_image.jpg \ + --device=gpu \ + +#图片文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_dir=images/ \ + --device=gpu \ + +``` +3. 视频输入时,启动命令如下 +```python +#单个视频文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + +#视频文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_dir=test_videos/ \ + --device=gpu \ +``` + +4. 若修改模型路径,有以下两种方式: + + - 方法一:```./deploy/pipeline/config/infer_cfg_pphuman.yml```下可以配置不同模型路径,属性识别模型修改ATTR字段下配置 + - 方法二:命令行中--config后面紧跟着增加`-o ATTR.model_dir`修改模型路径: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml + -o ATTR.model_dir=output_inference/PPLCNet_x1_0_person_attribute_945_infer/\ + --video_file=test_video.mp4 \ + --device=gpu +``` + +测试效果如下: + +
+  [示例图片:行人属性识别效果]
    + +数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用 + +## 方案说明 + +1. 目标检测/多目标跟踪获取图片/视频输入中的行人检测框,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../configs/ppyoloe/README_cn.md) +2. 通过行人检测框的坐标在输入图像中截取每个行人 +3. 使用属性识别分析每个行人对应属性,属性类型与PA100k数据集相同,具体属性列表如下: +``` +- 性别:男、女 +- 年龄:小于18、18-60、大于60 +- 朝向:朝前、朝后、侧面 +- 配饰:眼镜、帽子、无 +- 正面持物:是、否 +- 包:双肩包、单肩包、手提包 +- 上衣风格:带条纹、带logo、带格子、拼接风格 +- 下装风格:带条纹、带图案 +- 短袖上衣:是、否 +- 长袖上衣:是、否 +- 长外套:是、否 +- 长裤:是、否 +- 短裤:是、否 +- 短裙&裙子:是、否 +- 穿靴:是、否 +``` + +4. 属性识别模型方案为[StrongBaseline](https://arxiv.org/pdf/2107.03576.pdf),模型结构更改为基于PP-HGNet、PP-LCNet的多分类网络结构,引入Weighted BCE loss提升模型效果。 + +## 参考文献 +``` +@article{jia2020rethinking, + title={Rethinking of pedestrian attribute recognition: Realistic datasets with efficient method}, + author={Jia, Jian and Huang, Houjing and Yang, Wenjie and Chen, Xiaotang and Huang, Kaiqi}, + journal={arXiv preprint arXiv:2005.11909}, + year={2020} +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_attribute_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_attribute_en.md new file mode 100644 index 0000000000000000000000000000000000000000..a8a72f3b350f3f7b5a070f6036f19d4a5c756a2b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_attribute_en.md @@ -0,0 +1,107 @@ +English | [简体中文](pphuman_attribute.md) + +# Attribute Recognition Modules of PP-Human + +Pedestrian attribute recognition has been widely used in the intelligent community, industrial, and transportation monitoring. Many attribute recognition modules have been gathered in PP-Human, including gender, age, hats, eyes, clothing and up to 26 attributes in total. Also, the pre-trained models are offered here and users can download and use them directly. + +| Task | Algorithm | Precision | Inference Speed(ms) | Download Link | +|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | +| High-Precision Model | PP-HGNet_small | mA: 95.4 | per person 1.54ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.tar) | +| Fast Model | PP-LCNet_x1_0 | mA: 94.5 | per person 0.54ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.tar) | +| Balanced Model | PP-HGNet_tiny | mA: 95.2 | per person 1.14ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_person_attribute_952_infer.tar) | + +1. The precision of pedestiran attribute analysis is obtained by training and testing on the dataset consist of [PA100k](https://github.com/xh-liu/HydraPlus-Net#pa-100k-dataset),[RAPv2](http://www.rapdataset.com/rapv2.html),[PETA](http://mmlab.ie.cuhk.edu.hk/projects/PETA.html) and some business data. +2. The inference speed is V100, the speed of using TensorRT FP16. +3. This model of Attribute is based on the result of tracking, please download tracking model in the [Page of Mot](./pphuman_mot_en.md). The High precision and Faster model are both available. +4. You should place the model unziped in the directory of `PaddleDetection/output_inference/`. + +## Instruction + +1. 
Download the model from the link in the above table, and unzip it to```./output_inference```, and set the "enable: True" in ATTR of infer_cfg_pphuman.yml + +The meaning of configs of `infer_cfg_pphuman.yml`: +``` +ATTR: #module name + model_dir: output_inference/PPLCNet_x1_0_person_attribute_945_infer/ #model path + batch_size: 8 #maxmum batchsize when inference + enable: False #whether to enable this model +``` + +2. When inputting the image, run the command as follows (please refer to [QUICK_STARTED-Parameters](./PPHuman_QUICK_STARTED.md#41-参数说明) for more details): +```python +#single image +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_file=test_image.jpg \ + --device=gpu \ + +#image directory +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_dir=images/ \ + --device=gpu \ + +``` +3. When inputting the video, run the command as follows: +```python +#a single video file +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + +#directory of videos +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_dir=test_videos/ \ + --device=gpu \ +``` +4. If you want to change the model path, there are two methods: + + - The first: In ```./deploy/pipeline/config/infer_cfg_pphuman.yml``` you can configurate different model paths. In attribute recognition models, you can modify the configuration in the field of ATTR. + - The second: Add `-o ATTR.model_dir` in the command line following the --config to change the model path: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + -o ATTR.model_dir=output_inference/PPLCNet_x1_0_person_attribute_945_infer/\ + --video_file=test_video.mp4 \ + --device=gpu +``` + +The test result is: + +
+  [demo image: pedestrian attribute results]
    + +Data Source and Copyright:Skyinfor Technology. Thanks for the provision of actual scenario data, which are only used for academic research here. + +## Introduction to the Solution + +1. The PP-YOLOE model is used to handle detection boxs of input images/videos from object detection/ multi-object tracking. For details, please refer to the document [PP-YOLOE](../../../configs/ppyoloe). +2. Capture every pedestrian in the input images with the help of coordiantes of detection boxes. +3. Analyze the listed labels of pedestirans through attribute recognition. They are the same as those in the PA100k dataset. The label list is as follows: +``` +- Gender +- Age: Less than 18; 18-60; Over 60 +- Orientation: Front; Back; Side +- Accessories: Glasses; Hat; None +- HoldObjectsInFront: Yes; No +- Bag: BackPack; ShoulderBag; HandBag +- TopStyle: UpperStride; UpperLogo; UpperPlaid; UpperSplice +- BottomStyle: LowerStripe; LowerPattern +- ShortSleeve: Yes; No +- LongSleeve: Yes; No +- LongCoat: Yes; No +- Trousers: Yes; No +- Shorts: Yes; No +- Skirt&Dress: Yes; No +- Boots: Yes; No +``` + +4. The model adopted in the attribute recognition is [StrongBaseline](https://arxiv.org/pdf/2107.03576.pdf), where the structure is the multi-class network structure based on PP-HGNet、PP-LCNet, and Weighted BCE loss is introduced for effect optimization. + +## Reference +``` +@article{jia2020rethinking, + title={Rethinking of pedestrian attribute recognition: Realistic datasets with efficient method}, + author={Jia, Jian and Huang, Houjing and Yang, Wenjie and Chen, Xiaotang and Huang, Kaiqi}, + journal={arXiv preprint arXiv:2005.11909}, + year={2020} +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mot.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mot.md new file mode 100644 index 0000000000000000000000000000000000000000..d36282a6b89878493a22ff425fd6d9bb6bce005c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mot.md @@ -0,0 +1,116 @@ +[English](pphuman_mot_en.md) | 简体中文 + +# PP-Human检测跟踪模块 + +行人检测与跟踪在智慧社区,工业巡检,交通监控等方向都具有广泛应用,PP-Human中集成了检测跟踪模块,是关键点检测、属性行为识别等任务的基础。我们提供了预训练模型,用户可以直接下载使用。 + +| 任务 | 算法 | 精度 | 预测速度(ms) |下载链接 | +|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | +| 行人检测/跟踪 | PP-YOLOE-l | mAP: 57.8
MOTA: 82.2 | 检测: 25.1ms<br>跟踪:31.8ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
+| 行人检测/跟踪 | PP-YOLOE-s | mAP: 53.2<br>MOTA: 73.9 | 检测: 16.2ms<br>
    跟踪:21.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | + +1. 检测/跟踪模型精度为[COCO-Person](http://cocodataset.org/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) 和部分业务数据融合训练测试得到,验证集为业务数据 +2. 预测速度为T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程 + +## 使用方法 + +1. 从上表链接中下载模型并解压到```./output_inference```路径下,并修改配置文件中模型路径。默认为自动下载模型,无需做改动。 +2. 图片输入时,是纯检测任务,启动命令如下 +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_file=test_image.jpg \ + --device=gpu +``` +3. 视频输入时,是跟踪任务,注意首先设置infer_cfg_pphuman.yml中的MOT配置的`enable=True`,如果希望跳帧加速检测跟踪流程,可以设置`skip_frame_num: 2`,建议跳帧帧数最大不超过3: +``` +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + skip_frame_num: 2 + enable: True +``` +然后启动命令如下 +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. 若修改模型路径,有以下两种方式: + + - ```./deploy/pipeline/config/infer_cfg_pphuman.yml```下可以配置不同模型路径,检测和跟踪模型分别对应`DET`和`MOT`字段,修改对应字段下的路径为实际期望的路径即可。 + - 命令行中--config后面紧跟着增加`-o MOT.model_dir`修改模型路径: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + -o MOT.model_dir=ppyoloe/\ + --video_file=test_video.mp4 \ + --device=gpu \ + --region_type=horizontal \ + --do_entrance_counting \ + --draw_center_traj + +``` +**注意:** + - `--do_entrance_counting`表示是否统计出入口流量,不设置即默认为False。 + - `--draw_center_traj`表示是否绘制跟踪轨迹,不设置即默认为False。注意绘制跟踪轨迹的测试视频最好是静止摄像头拍摄的。 + - `--region_type`表示流量计数的区域,当设置`--do_entrance_counting`时可选择`horizontal`或者`vertical`,默认是`horizontal`,表示以视频图片的中心水平线为出入口,同一物体框的中心点在相邻两秒内分别在区域中心水平线的两侧,即完成计数加一。 + +测试效果如下: + +
+  [示例动图:行人跟踪及流量计数效果]
    + +数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用 + +5. 区域闯入判断和计数 + +注意首先设置infer_cfg_pphuman.yml中的MOT配置的enable=True,然后启动命令如下 +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --draw_center_traj \ + --do_break_in_counting \ + --region_type=custom \ + --region_polygon 200 200 400 200 300 400 100 400 +``` +**注意:** + - 区域闯入的测试视频必须是静止摄像头拍摄的,镜头不能抖动或移动。 + - `--do_break_in_counting`表示是否进行区域出入后计数,不设置即默认为False。 + - `--region_type`表示流量计数的区域,当设置`--do_break_in_counting`时仅可选择`custom`,默认是`custom`,表示以用户自定义区域为出入口,同一物体框的下边界中点坐标在相邻两秒内从区域外到区域内,即完成计数加一。 + - `--region_polygon`表示用户自定义区域的多边形的点坐标序列,每两个为一对点坐标(x,y),**按顺时针顺序**连成一个**封闭区域**,至少需要3对点也即6个整数,默认值是`[]`,需要用户自行设置点坐标,如是四边形区域,坐标顺序是`左上、右上、右下、左下`。用户可以运行[此段代码](../../tools/get_video_info.py)获取所测视频的分辨率帧数,以及可以自定义画出自己想要的多边形区域的可视化并自己调整。 + 自定义多边形区域的可视化代码运行如下: + ```python + python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400 + ``` + 快速画出想要的区域的小技巧:先任意取点得到图片,用画图工具打开,鼠标放到想要的区域点上会显示出坐标,记录下来并取整,作为这段可视化代码的region_polygon参数,并再次运行可视化,微调点坐标参数直至满意。 + + +测试效果如下: + +
+  [示例动图:区域闯入判断和计数效果]
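+上述区域闯入判断的核心,是检测框下边界中点是否位于用户自定义多边形内。下面用射线法给出该判断的一个最小示意(假设性代码,非Pipeline的实际实现),`region` 的格式与 `--region_polygon` 参数一致:
+
+```python
+# Minimal ray-casting sketch of the point-in-polygon test behind
+# break-in counting. Function names are illustrative assumptions.
+def point_in_polygon(x, y, polygon):
+    """polygon: [(x0, y0), (x1, y1), ...] in clockwise order."""
+    inside = False
+    n = len(polygon)
+    for i in range(n):
+        x1, y1 = polygon[i]
+        x2, y2 = polygon[(i + 1) % n]
+        # Count crossings of a horizontal ray cast to the right of (x, y).
+        if (y1 > y) != (y2 > y):
+            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
+            if x < x_cross:
+                inside = not inside
+    return inside
+
+region = [200, 200, 400, 200, 300, 400, 100, 400]   # same order as --region_polygon
+polygon = list(zip(region[0::2], region[1::2]))
+# Bottom-center point of a tracked bounding box (x1, y1, x2, y2):
+x1, y1, x2, y2 = 220, 150, 300, 320
+print(point_in_polygon((x1 + x2) / 2, y2, polygon))  # -> True
+```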
    + +## 方案说明 + +1. 使用目标检测/多目标跟踪技术来获取图片/视频输入中的行人检测框,检测模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe)。 +2. 多目标跟踪模型方案采用[ByteTrack](https://arxiv.org/pdf/2110.06864.pdf)和[OC-SORT](https://arxiv.org/pdf/2203.14360.pdf),采用PP-YOLOE替换原文的YOLOX作为检测器,采用BYTETracker和OCSORTTracker作为跟踪器,详细文档参考[ByteTrack](../../../../configs/mot/bytetrack)和[OC-SORT](../../../../configs/mot/ocsort)。 + +## 参考文献 +``` +@article{zhang2021bytetrack, + title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box}, + author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang}, + journal={arXiv preprint arXiv:2110.06864}, + year={2021} +} + +@article{cao2022observation, + title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking}, + author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris}, + journal={arXiv preprint arXiv:2203.14360}, + year={2022} +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mot_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mot_en.md new file mode 100644 index 0000000000000000000000000000000000000000..221102646dfd5821c5132ade76263fac605a9718 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mot_en.md @@ -0,0 +1,111 @@ +English | [简体中文](pphuman_mot.md) + +# Detection and Tracking Module of PP-Human + +Pedestrian detection and tracking is widely used in the intelligent community, industrial inspection, transportation monitoring and so on. PP-Human has the detection and tracking module, which is fundamental to keypoint detection, attribute action recognition, etc. Users enjoy easy access to pretrained models here. + +| Task | Algorithm | Precision | Inference Speed(ms) | Download Link | +|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | +| Pedestrian Detection/ Tracking | PP-YOLOE-l | mAP: 57.8
MOTA: 82.2 | Detection: 25.1ms<br>Tracking: 31.8ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
+| Pedestrian Detection/ Tracking | PP-YOLOE-s | mAP: 53.2<br>MOTA: 73.9 | Detection: 16.2ms<br>
    Tracking:21.0ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | + +1. The precision of the pedestrian detection/ tracking model is obtained by trainning and testing on [COCO-Person](http://cocodataset.org/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) and some business data. +2. The inference speed is the speed of using TensorRT FP16 on T4, the total number of data pre-training, model inference, and post-processing. + +## How to Use + +1. Download models from the links of the above table and unizp them to ```./output_inference```. +2. When use the image as input, it's a detection task, the start command is as follows: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_file=test_image.jpg \ + --device=gpu +``` +3. When use the video as input, it's a tracking task, first you should set the "enable: True" in MOT of infer_cfg_pphuman.yml. If you want skip some frames speed up the detection and tracking process, you can set `skip_frame_num: 2`, it is recommended that the maximum number of skip_frame_num should not exceed 3: +``` +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + skip_frame_num: 2 + enable: True +``` +and then the start command is as follows: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. There are two ways to modify the model path: + + - In `./deploy/pipeline/config/infer_cfg_pphuman.yml`, you can configurate different model paths,which is proper only if you match keypoint models and action recognition models with the fields of `DET` and `MOT` respectively, and modify the corresponding path of each field into the expected path. + - Add `-o MOT.model_dir` in the command line following the --config to change the model path: + +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + -o MOT.model_dir=ppyoloe/\ + --video_file=test_video.mp4 \ + --device=gpu \ + --region_type=horizontal \ + --do_entrance_counting \ + --draw_center_traj + +``` +**Note:** + + - `--do_entrance_counting` is whether to calculate flow at the gateway, and the default setting is False. + - `--draw_center_traj` means whether to draw the track, and the default setting is False. It's worth noting that the test video of track drawing should be filmed by the still camera. + - `--region_type` means the region type of flow counting. When set `--do_entrance_counting`, you can select from `horizontal` or `vertical`, the default setting is `horizontal`, means that the central horizontal line of the video picture is used as the entrance and exit, and when the central point of the same object box is on both sides of the central horizontal line of the area in two adjacent seconds, the counting plus one is completed. + +The test result is: + +
+  [demo GIF: pedestrian tracking and entrance counting]
    + +Data source and copyright owner:Skyinfor Technology. Thanks for the provision of actual scenario data, which are only used for academic research here. + +5. Break in and counting + +Please set the "enable: True" in MOT of infer_cfg_pphuman.yml at first, and then the start command is as follows: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --draw_center_traj \ + --do_break_in_counting \ + --region_type=custom \ + --region_polygon 200 200 400 200 300 400 100 400 +``` + +**Note:** + - `--do_break_in_counting` is whether to calculate flow when break in the user-defined region, and the default setting is False. + - `--region_type` means the region type of flow counting. When set `--do_break_in_counting`, only `custom` can be selected, and the default is `custom`, which means that the user-defined region is used as the entrance and exit, and when the midpoint coords of the bottom boundary of the same object moves from outside to inside the region within two adjacent seconds, the counting plus one is completed. + - `--region_polygon` means the point coords sequence of the polygon in the user-defined region. Every two integers are a pair of point coords (x,y), which are connected into a closed area in clockwise order. At least 3 pairs of points, that is, 6 integers, are required. The default value is `[]`, and the user needs to set the point coords by himself. Users can run this [code](../../tools/get_video_info.py) to obtain the resolution and frame number of the measured video, and can customize the visualization of drawing the polygon area they want and adjust it by themselves. + The visualization code of the custom polygon region runs as follows: + ```python + python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400 + ``` + +The test result is: + +
+  [demo GIF: break-in detection and counting]
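+As a rough illustration of the break-in rule above (hypothetical helper names, not the pipeline's actual implementation), the sketch below counts a break-in when a track's bottom-center point moves from outside to inside the user-defined polygon; matplotlib is used here only for the point-in-polygon test:
+
+```python
+# Minimal sketch (illustrative, not the pipeline's actual code) of the
+# break-in rule: count +1 when a track's bottom-center point moves from
+# outside to inside the user-defined region between two observations.
+from matplotlib.path import Path
+
+region = [200, 200, 400, 200, 300, 400, 100, 400]   # --region_polygon format
+polygon = Path(list(zip(region[0::2], region[1::2])))
+
+def update_break_in(prev_inside, track_id, box, count):
+    """box is (x1, y1, x2, y2); returns the updated count."""
+    bottom_center = ((box[0] + box[2]) / 2.0, box[3])
+    inside = polygon.contains_point(bottom_center)
+    if prev_inside.get(track_id) is False and inside:
+        count += 1                     # outside -> inside: break-in event
+    prev_inside[track_id] = inside
+    return count
+
+prev_inside, count = {}, 0
+for box in [(220, 50, 300, 150), (220, 150, 300, 320)]:  # same track, two adjacent seconds
+    count = update_break_in(prev_inside, track_id=1, box=box, count=count)
+print(count)  # -> 1
+```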
    + + +## Introduction to the Solution + +1. Get the pedestrian detection box of the image/ video input through object detection and multi-object tracking. The detection model is PP-YOLOE, please refer to [PP-YOLOE](../../../../configs/ppyoloe) for details. + +2. The multi-object tracking solution is based on [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf) and [OC-SORT](https://arxiv.org/pdf/2203.14360.pdf), and replace the original YOLOX with PP-YOLOE as the detector,and BYTETracker or OC-SORT Tracker as the tracker, please refer to [ByteTrack](../../../../configs/mot/bytetrack) and [OC-SORT](../../../../configs/mot/ocsort). + +## Reference +``` +@article{zhang2021bytetrack, + title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box}, + author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang}, + journal={arXiv preprint arXiv:2110.06864}, + year={2021} +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mtmct.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mtmct.md new file mode 100644 index 0000000000000000000000000000000000000000..069365a6f3ad47720c2dd204f9ba84ed3dbd3a60 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mtmct.md @@ -0,0 +1,90 @@ +[English](pphuman_mtmct_en.md) | 简体中文 + +# PP-Human跨镜头跟踪模块 + +跨镜头跟踪任务,是在单镜头跟踪的基础上,实现不同摄像头中人员的身份匹配关联。在安放、智慧零售等方向有较多的应用。 +PP-Human跨镜头跟踪模块主要目的在于提供一套简洁、高效的跨镜跟踪Pipeline,REID模型完全基于开源数据集训练。 + +## 使用方法 + +1. 下载模型 [行人跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)和[REID模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) 并解压到```./output_inference```路径下,修改配置文件中模型路径。也可简单起见直接用默认配置,自动下载模型。 MOT模型请参考[mot说明](./pphuman_mot.md)文件下载。 + +2. 跨镜头跟踪模式下,要求输入的多个视频放在同一目录下,同时开启infer_cfg_pphuman.yml 中的REID选择中的enable=True, 命令如下: +```python +python3 deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_dir=[your_video_file_directory] --device=gpu +``` + +3. 相关配置在`./deploy/pipeline/config/infer_cfg_pphuman.yml`文件中修改: + +```python +python3 deploy/pipeline/pipeline.py + --config deploy/pipeline/config/infer_cfg_pphuman.yml -o REID.model_dir=reid_best/ + --video_dir=[your_video_file_directory] + --device=gpu +``` + +## 方案说明 + +跨镜头跟踪模块,主要由跨镜头跟踪Pipeline及REID模型两部分组成。 +1. 跨镜头跟踪Pipeline + +``` + +单镜头跟踪[id+bbox] + │ +根据bbox截取原图中目标——│ + │ │ + REID模型 质量评估(遮挡、完整度、亮度等) + │ │ + [feature] [quality] + │ │ + datacollector—————│ + │ + 特征排序、筛选 + │ + 多视频各id相似度计算 + │ + id聚类、重新分配id +``` + +2. 模型方案为[reid-strong-baseline](https://github.com/michuanhaohao/reid-strong-baseline), Backbone为ResNet50, 主要特色为模型结构简单。 +本跨镜跟踪中所用REID模型在上述基础上,整合多个开源数据集并压缩模型特征到128维以提升泛化性能。大幅提升了在实际应用中的泛化效果。 + +### 其他建议 +- 提供的REID模型基于开源数据集训练得到,建议加入自有数据,训练更加强有力的REID模型,将非常明显提升跨镜跟踪效果。 +- 质量评估部分基于简单逻辑+OpenCV实现,效果有限,如果有条件建议针对性训练质量判断模型。 + + +### 示例效果 + +- camera 1: +
+  [示例动图:camera 1 跨镜跟踪效果]
+
+- camera 2:
+
+  [示例动图:camera 2 跨镜跟踪效果]
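+作为上述流程中“多视频各id相似度计算”一步的最小示意(假设性代码,实际Pipeline还包含质量评估、特征筛选与id聚类),下例用余弦相似度在两路摄像头的REID特征之间做匹配:
+
+```python
+# Minimal sketch of cross-camera ID matching with cosine similarity.
+# Assumes one 128-dim feature per track and a fixed threshold; both
+# are simplifications of the real pipeline.
+import numpy as np
+
+def match_ids(feats_cam1, feats_cam2, sim_threshold=0.6):
+    """feats_cam*: dict of track_id -> 128-dim feature vector.
+    Returns pairs (id_cam1, id_cam2) judged to be the same person."""
+    pairs = []
+    for id1, f1 in feats_cam1.items():
+        f1 = f1 / np.linalg.norm(f1)
+        for id2, f2 in feats_cam2.items():
+            f2 = f2 / np.linalg.norm(f2)
+            if float(f1 @ f2) >= sim_threshold:
+                pairs.append((id1, id2))
+    return pairs
+
+rng = np.random.default_rng(0)
+person = rng.normal(size=128)
+cam1 = {1: person + 0.05 * rng.normal(size=128)}
+cam2 = {7: person + 0.05 * rng.normal(size=128), 8: rng.normal(size=128)}
+print(match_ids(cam1, cam2))  # -> [(1, 7)] with high probability
+```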
    + + +## 参考文献 +``` +@InProceedings{Luo_2019_CVPR_Workshops, +author = {Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei}, +title = {Bag of Tricks and a Strong Baseline for Deep Person Re-Identification}, +booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, +month = {June}, +year = {2019} +} + +@ARTICLE{Luo_2019_Strong_TMM, +author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}}, +journal={IEEE Transactions on Multimedia}, +title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification}, +year={2019}, +pages={1-1}, +doi={10.1109/TMM.2019.2958756}, +ISSN={1941-0077}, +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md new file mode 100644 index 0000000000000000000000000000000000000000..c7a5d9909a043a601b6e996929b46c100a9c8437 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md @@ -0,0 +1,93 @@ +English | [简体中文](pphuman_mtmct.md) + +# Multi-Target Multi-Camera Tracking Module of PP-Human + +Multi-target multi-camera tracking, or MTMCT, matches the identity of a person in different cameras based on the single-camera tracking. MTMCT is usually applied to the security system and the smart retailing. +The MTMCT module of PP-Human aims to provide a multi-target multi-camera pipleline which is simple, and efficient. + +## How to Use + +1. Download [REID model](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) and unzip it to ```./output_inference```. For the MOT model, please refer to [mot description](./pphuman_mot.md). + +2. In the MTMCT mode, input videos are required to be put in the same directory. set the REID "enable: True" in the infer_cfg_pphuman.yml. The command line is: +```python +python3 deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_dir=[your_video_file_directory] --device=gpu +``` + +3. Configuration can be modified in `./deploy/pipeline/config/infer_cfg_pphuman.yml`. + +```python +python3 deploy/pipeline/pipeline.py + --config deploy/pipeline/config/infer_cfg_pphuman.yml -o REID.model_dir=reid_best/ + --video_dir=[your_video_file_directory] + --device=gpu +``` + +## Intorduction to the Solution + +MTMCT module consists of the multi-target multi-camera tracking pipeline and the REID model. + +1. Multi-Target Multi-Camera Tracking Pipeline + +``` + +single-camera tracking[id+bbox] + │ +capture the target in the original image according to bbox——│ + │ │ + REID model quality assessment (covered or not, complete or not, brightness, etc.) + │ │ + [feature] [quality] + │ │ + datacollector—————│ + │ + sort out and filter features + │ + calculate the similarity of IDs in the videos + │ + make the IDs cluster together and rearrange them +``` + +2. The model solution is [reid-strong-baseline](https://github.com/michuanhaohao/reid-strong-baseline), with ResNet50 as the backbone. + +Under the above circumstances, the REID model used in MTMCT integrates open-source datasets and compresses model features to 128-dimensional features to optimize the generalization. In this way, the actual generalization result becomes much better. + +### Other Suggestions + +- The provided REID model is obtained from open-source dataset training. 
It is recommended to add your own data to train a more powerful REID model, which will noticeably improve the MTMCT effect.
+- The quality assessment is based on simple rules plus OpenCV, so its effect is limited. If possible, it is advisable to train a dedicated quality assessment model.
+
+
+### Example
+
+- camera 1:
+
+  [demo GIF: camera 1 MTMCT result]
+
+- camera 2:
+
+  [demo GIF: camera 2 MTMCT result]
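+As a minimal sketch of the "calculate the similarity of IDs" step above (illustrative assumptions only: one feature per track, a fixed threshold, and mutual-best-match assignment; the real pipeline also performs quality assessment, feature filtering and clustering):
+
+```python
+# Minimal sketch of one-to-one cross-camera matching: build a cosine
+# similarity matrix between per-track REID features, then accept only
+# mutual best matches above a threshold. Illustrative assumptions.
+import numpy as np
+
+def mutual_best_matches(feats_a, feats_b, threshold=0.6):
+    """feats_a: (m, d) array, feats_b: (n, d) array of track features."""
+    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
+    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
+    sim = a @ b.T                                   # (m, n) cosine matrix
+    matches = []
+    for i in range(sim.shape[0]):
+        j = int(np.argmax(sim[i]))
+        # Accept (i, j) only if i is also j's best match and sim clears the threshold.
+        if sim[i, j] >= threshold and int(np.argmax(sim[:, j])) == i:
+            matches.append((i, j))
+    return matches
+
+rng = np.random.default_rng(1)
+base = rng.normal(size=(2, 128))                    # two distinct people
+cam_a = base + 0.05 * rng.normal(size=(2, 128))
+cam_b = base[::-1] + 0.05 * rng.normal(size=(2, 128))
+print(mutual_best_matches(cam_a, cam_b))            # -> [(0, 1), (1, 0)]
+```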
    + + +## Reference +``` +@InProceedings{Luo_2019_CVPR_Workshops, +author = {Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei}, +title = {Bag of Tricks and a Strong Baseline for Deep Person Re-Identification}, +booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, +month = {June}, +year = {2019} +} + +@ARTICLE{Luo_2019_Strong_TMM, +author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}}, +journal={IEEE Transactions on Multimedia}, +title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification}, +year={2019}, +pages={1-1}, +doi={10.1109/TMM.2019.2958756}, +ISSN={1941-0077}, +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_attribute.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_attribute.md new file mode 100644 index 0000000000000000000000000000000000000000..c55923f6a5c452f7d962411679f9fa55df39dd94 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_attribute.md @@ -0,0 +1,117 @@ +[English](ppvehicle_attribute_en.md) | 简体中文 + +# PP-Vehicle属性识别模块 + +车辆属性识别在智慧城市,智慧交通等方向具有广泛应用。在PP-Vehicle中,集成了车辆属性识别模块,可识别车辆颜色及车型属性的识别。 + +| 任务 | 算法 | 精度 | 预测速度 | 下载链接| +|-----------|------|-----------|----------|---------------| +| 车辆检测/跟踪 | PP-YOLOE | mAP 63.9 | 38.67ms | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | +| 车辆属性识别 | PPLCNet | 90.81 | 7.31 ms | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | + + +注意: +1. 属性模型预测速度是基于NVIDIA T4, 开启TensorRT FP16得到。模型预测速度包含数据预处理、模型预测、后处理部分。 +2. 关于PP-LCNet的介绍可以参考[PP-LCNet](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/models/PP-LCNet.md)介绍,相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。 +3. 属性模型的训练和精度测试均基于[VeRi数据集](https://www.v7labs.com/open-datasets/veri-dataset)。 + + +- 当前提供的预训练模型支持识别10种车辆颜色及9种车型,同VeRi数据集,具体如下: + +```yaml +# 车辆颜色 +- "yellow" +- "orange" +- "green" +- "gray" +- "red" +- "blue" +- "white" +- "golden" +- "brown" +- "black" + +# 车型 +- "sedan" +- "suv" +- "van" +- "hatchback" +- "mpv" +- "pickup" +- "bus" +- "truck" +- "estate" +``` + +## 使用方法 + +### 配置项说明 + +[配置文件](../../config/infer_cfg_ppvehicle.yml)中与属性相关的参数如下: +``` +VEHICLE_ATTR: + model_dir: output_inference/vehicle_attribute_infer/ # 车辆属性模型调用路径 + batch_size: 8 # 模型预测时的batch_size大小 + color_threshold: 0.5 # 颜色属性阈值,需要置信度达到此阈值才会确定具体颜色,否则为'Unknown‘ + type_threshold: 0.5 # 车型属性阈值,需要置信度达到此阈值才会确定具体属性,否则为'Unknown‘ + enable: False # 是否开启该功能 +``` + +### 使用命令 + +1. 从模型库下载`车辆检测/跟踪`, `车辆属性识别`两个预测部署模型并解压到`./output_inference`路径下;默认会自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件中`VEHICLE_ATTR`项的`enable: True`,以启用该功能。 +3. 图片输入时,启动命令如下(更多命令参数说明,请参考[快速开始-参数说明](./PPVehicle_QUICK_STARTED.md)): + +```bash +# 预测单张图片文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --image_file=test_image.jpg \ + --device=gpu + +# 预测包含一张或多张图片的文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --image_dir=images/ \ + --device=gpu +``` + +4. 视频输入时,启动命令如下: + +```bash +#预测单个视频文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_file=test_video.mp4 \ + --device=gpu + +#预测包含一个或多个视频的文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_dir=test_videos/ \ + --device=gpu +``` + +5. 
若修改模型路径,有以下两种方式: + + - 方法一:`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`下可以配置不同模型路径,属性识别模型修改`VEHICLE_ATTR`字段下配置 + - 方法二:直接在命令行中增加`-o`,以覆盖配置文件中的默认模型路径: + +```bash +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + -o VEHICLE_ATTR.model_dir=output_inference/vehicle_attribute_infer +``` + +测试效果如下: + +
+  [示例动图:车辆属性识别效果]
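+结合上文 `VEHICLE_ATTR` 配置中的 `color_threshold` 与 `type_threshold`,下面给出阈值判定逻辑的一个最小示意(假设性代码,非Pipeline的实际实现):仅当得分最高的类别超过对应阈值时才输出该类别,否则输出 `Unknown`。
+
+```python
+# Minimal sketch of the VEHICLE_ATTR decision rule: commit the
+# top-scoring color/type only if it clears its threshold, else report
+# "Unknown". Label lists follow the VeRi classes documented above;
+# names are illustrative, not the pipeline implementation.
+COLORS = ["yellow", "orange", "green", "gray", "red",
+          "blue", "white", "golden", "brown", "black"]
+TYPES = ["sedan", "suv", "van", "hatchback", "mpv",
+         "pickup", "bus", "truck", "estate"]
+
+def decode_attrs(color_scores, type_scores,
+                 color_threshold=0.5, type_threshold=0.5):
+    best_c = max(range(len(COLORS)), key=lambda i: color_scores[i])
+    best_t = max(range(len(TYPES)), key=lambda i: type_scores[i])
+    color = COLORS[best_c] if color_scores[best_c] >= color_threshold else "Unknown"
+    vtype = TYPES[best_t] if type_scores[best_t] >= type_threshold else "Unknown"
+    return color, vtype
+
+color_scores = [0.02, 0.01, 0.01, 0.05, 0.80, 0.03, 0.03, 0.02, 0.02, 0.01]
+type_scores = [0.30, 0.25, 0.05, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]
+print(decode_attrs(color_scores, type_scores))  # -> ('red', 'Unknown')
+```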
    + +## 方案说明 +车辆属性识别模型使用了[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)。关于该模型的数据准备、训练、测试等详细内容,请见[PULC 车辆属性识别模型](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/PULC/PULC_vehicle_attribute.md). + +车辆属性识别模型选用了轻量级、高精度的PPLCNet。并在该模型的基础上,进一步使用了以下优化方案: + +- 使用SSLD预训练模型,在不改变推理速度的前提下,精度可以提升约0.5个百分点 +- 融合EDA数据增强策略,精度可以再提升0.52个百分点 +- 使用SKL-UGI知识蒸馏, 精度可以继续提升0.23个百分点 diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_attribute_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_attribute_en.md new file mode 100644 index 0000000000000000000000000000000000000000..fa666da3255056d987fdc50fcf4f5ae3b00eeb3a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_attribute_en.md @@ -0,0 +1,121 @@ +English | [简体中文](ppvehicle_attribute.md) + +# Attribute Recognition Module of PP-Vehicle + +Vehicle attribute recognition is widely used in smart cities, smart transportation and other scenarios. In PP-Vehicle, a vehicle attribute recognition module is integrated, which can identify vehicle color and model. + +| Task | Algorithm | Precision | Inference Speed | Download | +|-----------|------|-----------|----------|---------------------| +| Vehicle Detection/Tracking | PP-YOLOE | mAP 63.9 | 38.67ms | [Inference and Deployment Model](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | +| Vehicle Attribute Recognition | PPLCNet | 90.81 | 7.31 ms | [Inference and Deployment Model](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | + + +Note: +1. The inference speed of the attribute model is obtained from the test on NVIDIA T4, with TensorRT FP16. The time includes data pre-process, model inference and post-process. +2. For introductions, please refer to [PP-LCNet Series](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/models/PP-LCNet_en.md). Related paper is available on PP-LCNet paper +3. The training and test phase of vehicle attribute recognition model are both obtained from [VeRi dataset](https://www.v7labs.com/open-datasets/veri-dataset). + + +- The provided pre-trained model supports 10 colors and 9 models, which is the same with VeRi dataset. The details are as follows: + +```yaml +# Vehicle Colors +- "yellow" +- "orange" +- "green" +- "gray" +- "red" +- "blue" +- "white" +- "golden" +- "brown" +- "black" + +# Vehicle Models +- "sedan" +- "suv" +- "van" +- "hatchback" +- "mpv" +- "pickup" +- "bus" +- "truck" +- "estate" +``` + +## Instructions + +### Description of Configuration + +Parameters related to vehicle attribute recognition in the [config file](../../config/infer_cfg_ppvehicle.yml) are as follows: + +```yaml +VEHICLE_ATTR: + model_dir: output_inference/vehicle_attribute_infer/ # Path of the model + batch_size: 8 # The size of the inference batch + color_threshold: 0.5 # Threshold of color. Confidence is required to reach this threshold to determine the specific attribute, otherwise it will be 'Unknown‘. + type_threshold: 0.5 # Threshold of vehicle model. Confidence is required to reach this threshold to determine the specific attribute, otherwise it will be 'Unknown‘. + enable: False # Whether to enable this function +``` + +### How to Use +1. Download models `Vehicle Detection/Tracking` and `Vehicle Attribute Recognition` from the links in `Model Zoo` and unzip them to ```./output_inference```. 
The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path to use this function. + +2. Set the "enable: True" of `VEHICLE_ATTR` in infer_cfg_ppvehicle.yml. + +3. For image input, please run these commands. (Description of more parameters, please refer to [QUICK_STARTED - Parameter_Description](./PPVehicle_QUICK_STARTED.md). + +```bash +# For single image +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --image_file=test_image.jpg \ + --device=gpu + +# For folder contains one or multiple images +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --image_dir=images/ \ + --device=gpu +``` + +4. For video input, please run these commands. + +```bash +# For single video +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_file=test_video.mp4 \ + --device=gpu + +# For folder contains one or multiple videos +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_dir=test_videos/ \ + --device=gpu +``` + +5. There are two ways to modify the model path: + + - Method 1:Set paths of each model in `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`. For vehicle attribute recognition, the path should be modified under the `VEHICLE_ATTR` field. + - Method 2: Directly add `-o` in command line to override the default model path in the configuration file: + +```bash +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + -o VEHICLE_ATTR.model_dir=output_inference/vehicle_attribute_infer +``` + +The result is shown as follow: + +
+  [demo image: vehicle attribute results]
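+As a rough sketch of the threshold rule described in the configuration above (hypothetical names and shapes, not the actual pipeline code), applied to a batch of crops:
+
+```python
+# Minimal sketch of threshold-gated attribute decoding for a batch of
+# vehicle crops. Scores are assumed softmax outputs per head; names
+# and shapes are illustrative assumptions.
+import numpy as np
+
+COLORS = ["yellow", "orange", "green", "gray", "red",
+          "blue", "white", "golden", "brown", "black"]
+
+def decode_colors(color_scores, color_threshold=0.5):
+    """color_scores: (batch, 10) array -> list of labels or 'Unknown'."""
+    idx = color_scores.argmax(axis=1)
+    conf = color_scores[np.arange(len(idx)), idx]
+    return [COLORS[i] if c >= color_threshold else "Unknown"
+            for i, c in zip(idx, conf)]
+
+batch = np.array([
+    [0.02, 0.01, 0.01, 0.05, 0.80, 0.03, 0.03, 0.02, 0.02, 0.01],  # confident: red
+    [0.10, 0.10, 0.10, 0.12, 0.10, 0.10, 0.12, 0.10, 0.08, 0.08],  # ambiguous
+])
+print(decode_colors(batch))  # -> ['red', 'Unknown']
+```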
    + + +### Features to the Solution + +The vehicle attribute recognition model adopts PULC, Practical Ultra Lightweight image Classification from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). For details on data preparation, training, and testing of the model, please refer to [PULC Recognition Model of Vehicle Attribute](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/PULC/PULC_vehicle_attribute_en.md). + +The vehicle attribute recognition model adopts the lightweight and high-precision PPLCNet. And on top of PPLCNet, our model optimized via:: + +- Improved about 0.5 percentage points accuracy by using the SSLD pre-trained model without changing the inference speed. +- Improved 0.52 percentage points accuracy further by integrating EDA data augmentation strategy. +- Improved 0.23 percentage points accuracy by using SKL-UGI knowledge distillation. diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_illegal_parking.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_illegal_parking.md new file mode 100644 index 0000000000000000000000000000000000000000..6627411855d09cb7fcde4697dfa2a67216ec8415 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_illegal_parking.md @@ -0,0 +1,92 @@ + +# PP-Vehicle违章停车识别模块 + +禁停区域违章停车识别在车辆应用场景中有着非常广泛的应用,借助AI的力量可以减轻人力投入,精准快速识别出违停车辆并进一步采取如广播驱离行为。PP-Vehicle中基于车辆跟踪模型、车牌检测模型和车牌识别模型实现了违章停车识别功能,具体模型信息如下: + +| 任务 | 算法 | 精度 | 预测速度(ms) |预测模型下载链接 | +|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | +| 车辆检测/跟踪 | PP-YOLOE-l | mAP: 63.9 | - |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | +| 车牌检测模型 | ch_PP-OCRv3_det | hmean: 0.979 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz) | +| 车牌识别模型 | ch_PP-OCRv3_rec | acc: 0.773 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | + +1. 跟踪模型使用PPVehicle数据集(整合了BDD100K-MOT和UA-DETRAC),是将BDD100K-MOT中的car, truck, bus, van和UA-DETRAC中的car, bus, van都合并为1类vehicle(1)后的数据集。 +2. 车牌检测、识别模型使用PP-OCRv3模型在CCPD2019、CCPD2020混合车牌数据集上fine-tune得到。 + +## 使用方法 + +1. 用户可从上表链接中下载模型并解压到```PaddleDetection/output_inference```路径下,并修改配置文件中模型路径,也可默认自动下载模型。在```deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml```中可手动设置三个模型的模型路径。 + +`infer_cfg_illegal_parking.yml`中配置项说明: +``` +MOT: # 跟踪模块 + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip # 跟踪模型路径 + tracker_config: deploy/pipeline/config/tracker_config.yml # 跟踪配置文件路径 + batch_size: 1 # 跟踪batch size + enable: True # 是否开启跟踪功能 + +VEHICLE_PLATE: # 车牌识别模块 + det_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz # 车牌检测模型路径 + det_limit_side_len: 480 # 检测模型单边输入尺寸 + det_limit_type: "max" # 检测模型输入尺寸长短边选择,"max"表示长边 + rec_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz # 车牌识别模型路径 + rec_image_shape: [3, 48, 320] # 车牌识别模型输入尺寸 + rec_batch_num: 6 # 车牌识别batch size + word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt # OCR模型查询字典 + enable: True # 是否开启车牌识别功能 +``` + +2. 
输入视频,启动命令如下 +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --draw_center_traj \ + --illegal_parking_time=5 \ + --region_type=custom \ + --region_polygon 100 1000 1000 1000 900 1700 0 1700 +``` + +参数说明如下: +- config:配置文件路径; +- video_file:测试视频路径; +- device:推理设备配置; +- draw_center_traj:画出车辆中心运动轨迹; +- illegal_parking_time:非法停车时间,单位为秒; +- region_type:非法停车区域类型,custom表示自定义; +- region_polygon:自定义非法停车多边形,至少为3个点。 + +**注意:** + - 违章停车的测试视频必须是静止摄像头拍摄的,镜头不能抖动或移动。 + - 判断车辆是否在违停区域内是**以车辆的中心点**作为参考,车辆擦边而过等场景不算作违章停车。 + - `--region_polygon`表示用户自定义区域的多边形的点坐标序列,每两个为一对点坐标(x,y),**按顺时针顺序**连成一个**封闭区域**,至少需要3对点也即6个整数,默认值是`[]`,需要用户自行设置点坐标,如是四边形区域,坐标顺序是`左上、右上、右下、左下`。用户可以运行[此段代码](../../tools/get_video_info.py)获取所测视频的分辨率帧数,以及可以自定义画出自己想要的多边形区域的可视化并自己调整。 + 自定义多边形区域的可视化代码运行如下: + ```python + python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400 + ``` + 快速画出想要的区域的小技巧:先任意取点得到图片,用画图工具打开,鼠标放到想要的区域点上会显示出坐标,记录下来并取整,作为这段可视化代码的region_polygon参数,并再次运行可视化,微调点坐标参数直至满意。 + + +3. 若修改模型路径,有以下两种方式: + + - 方法一:```./deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml```下可以配置不同模型路径; + - 方法二:命令行中--config配置项后面增加`-o VEHICLE_PLATE.det_model_dir=[YOUR_DETMODEL_PATH] VEHICLE_PLATE.rec_model_dir=[YOUR_RECMODEL_PATH]`修改模型路径。 + + +测试效果如下: + +
+
+可视化视频中左上角num后面的数值表示当前帧中车辆的数目;Total count表示画面中出现的车辆的总数,包括出现又消失的车辆。
+
+## 方案说明
+
+1. 目标检测/多目标跟踪获取图片/视频输入中的车辆检测框,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)
+2. 基于跟踪算法获取每辆车的轨迹,如果车辆中心在违停区域内且在指定时间内未发生移动,则视为违章停车;
+3. 使用车牌识别模型得到违章停车车牌并可视化。
+
+## 参考资料
+
+1. PaddleDetection特色检测模型[PP-YOLOE](../../../../configs/ppyoloe)。
+2. Paddle字符识别模型库[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)。
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_illegal_parking_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_illegal_parking_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..a5ae30f6076dc90d0909b0fecaaa3f4a6404fde3
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_illegal_parking_en.md
@@ -0,0 +1,80 @@
+
+# PP-Vehicle Illegal Parking Recognition Module
+
+Recognition of illegal parking in no-parking areas has a very wide range of applications in vehicle scenarios. With the help of AI, human effort can be reduced and illegally parked vehicles can be identified accurately and quickly, after which further actions such as broadcasting to expel the vehicles can be taken. Based on the vehicle tracking model, the license plate detection model and the license plate recognition model, PP-Vehicle realizes the illegal parking recognition function. The specific model information is as follows:
+
+| Task | Algorithm | Precision | Inference Speed(ms) | Inference Model Download Link |
+|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: |
+| Vehicle Tracking | PP-YOLOE-l | mAP: 63.9 | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| Plate Detection | ch_PP-OCRv3_det | hmean: 0.979 | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz) |
+| Plate Recognition | ch_PP-OCRv3_rec | acc: 0.773 | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) |
+
+1. The tracking model uses the PPVehicle dataset (which integrates BDD100K-MOT and UA-DETRAC), merging car, truck, bus, van from BDD100K-MOT and car, bus, van from UA-DETRAC into one class named vehicle(1).
+2. The license plate detection and recognition models are obtained by fine-tuning the PP-OCRv3 model on the mixed CCPD2019 and CCPD2020 license plate datasets.
+
+## Instructions
+
+1. Users can download the models from the links in the table above and unzip them to the ```PaddleDetection/output_inference``` path and modify the model path in the configuration file, or let the models be downloaded automatically by default. The paths of the three models can be set manually in ```deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml```.
+
+Description of configuration items in `infer_cfg_illegal_parking.yml`:
+```
+MOT:                          # Tracking Module
+  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip  # Path of Tracking Model
+  tracker_config: deploy/pipeline/config/tracker_config.yml  # Config Path of Tracking
+  batch_size: 1               # Tracking batch size
+  enable: True                # Whether to Enable Tracking Function
+
+VEHICLE_PLATE:                # Plate Recognition Module
+  det_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz  # Path of Plate Detection Model
+  det_limit_side_len: 480     # Single Side Size of Detection Model
+  det_limit_type: "max"       # Detection Model Input Size Selection of Long and Short Sides, "max" Represents the Long Side
+  rec_model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz  # Path of Plate Recognition Model
+  rec_image_shape: [3, 48, 320]  # The Input Size of Plate Recognition Model
+  rec_batch_num: 6            # Plate Recognition batch size
+  word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt  # OCR Model Look-up Table
+  enable: True                # Whether to Enable Plate Recognition Function
+```
+
+2. Input video; the command is as follows:
+
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   --draw_center_traj \
+                                   --illegal_parking_time=5 \
+                                   --region_type=custom \
+                                   --region_polygon 100 1000 1000 1000 900 1700 0 1700
+```
+
+The parameter description:
+- config: config path;
+- video_file: video path to be tested;
+- device: inference device;
+- draw_center_traj: draw the trajectory of the vehicle center;
+- illegal_parking_time: illegal parking time, in seconds;
+- region_type: illegal parking region type; 'custom' means the region is customized;
+- region_polygon: customized illegal parking region, defined by at least three points.
+
+3. Methods to modify the model path:
+
+    - Method 1: Configure different model paths in the ```./deploy/pipeline/config/examples/infer_cfg_illegal_parking.yml``` file;
+    - Method 2: In the command line, add `-o VEHICLE_PLATE.det_model_dir=[YOUR_DETMODEL_PATH] VEHICLE_PLATE.rec_model_dir=[YOUR_RECMODEL_PATH]` after the --config configuration item to modify the model path.
+
+
+Test Result:
+
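+For reference, the illegal parking decision hinges on whether a tracked vehicle's center stays inside the `--region_polygon` area for longer than `--illegal_parking_time`. Below is a minimal, self-contained sketch of the point-in-region membership test (a standard ray-casting check, not the pipeline's actual implementation):
+
+```python
+# Illustrative sketch: is a vehicle center inside region_polygon?
+# Standard ray-casting point-in-polygon test; not PP-Vehicle's own code.
+
+def point_in_polygon(x, y, polygon):
+    """polygon: flat list [x1, y1, x2, y2, ...] in clockwise order."""
+    pts = list(zip(polygon[0::2], polygon[1::2]))
+    inside = False
+    j = len(pts) - 1
+    for i in range(len(pts)):
+        (xi, yi), (xj, yj) = pts[i], pts[j]
+        # Count how often a ray shot to the right of (x, y) crosses an edge.
+        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
+            inside = not inside
+        j = i
+    return inside
+
+region = [100, 1000, 1000, 1000, 900, 1700, 0, 1700]  # polygon from the example command
+print(point_in_polygon(500, 1300, region))  # True: this center is inside the region
+```
+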
+
+## Method Description
+
+1. Object detection/multi-object tracking obtains the vehicle detection boxes from the image/video input. The model scheme is PP-YOLOE; for detailed documentation, refer to [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md).
+2. The trajectory of each vehicle is obtained from the tracking algorithm. If the center of a vehicle stays inside the illegal parking area and does not move within the specified time, it is considered illegal parking;
+3. The license plate recognition model is used to get the plate of the illegally parked vehicle and visualize it.
+
+## References
+
+1. Detection model in PaddleDetection: [PP-YOLOE](../../../../configs/ppyoloe).
+2. Character recognition model library in Paddle: [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR).
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_mot.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_mot.md
new file mode 100644
index 0000000000000000000000000000000000000000..1163e62c4c4c403a2639ce2f0c25d672b2b38f3f
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_mot.md
@@ -0,0 +1,120 @@
+[English](ppvehicle_mot_en.md) | 简体中文
+
+# PP-Vehicle车辆跟踪模块
+
+【应用介绍】
+车辆检测与跟踪在交通监控、自动驾驶等方向都具有广泛应用,PP-Vehicle中集成了检测跟踪模块,是车牌检测、车辆属性识别等任务的基础。我们提供了预训练模型,用户可以直接下载使用。
+
+【模型下载】
+| 任务 | 算法 | 精度 | 预测速度(ms) | 下载链接 |
+|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: |
+| 车辆检测/跟踪 | PP-YOLOE-l | mAP: 63.9 <br> MOTA: 50.1 | 检测: 25.1ms <br> 跟踪: 31.8ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| 车辆检测/跟踪 | PP-YOLOE-s | mAP: 61.3 <br> MOTA: 46.8 | 检测: 16.2ms <br> 跟踪: 21.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) |
+
+1. 检测/跟踪模型精度为PPVehicle数据集训练得到,整合了BDD100K-MOT和UA-DETRAC,是将BDD100K-MOT中的`car, truck, bus, van`和UA-DETRAC中的`car, bus, van`都合并为1类`vehicle(1)`后的数据集,检测精度mAP是在PPVehicle的验证集上测得,跟踪精度MOTA是在BDD100K-MOT的验证集上测得(`car, truck, bus, van`合并为1类`vehicle`)。训练具体流程请参照[ppvehicle](../../../../configs/ppvehicle)。
+2. 预测速度为T4机器上使用TensorRT FP16时的速度,速度包含数据预处理、模型预测、后处理全流程。
+
+## 使用方法
+
+【配置项说明】
+
+配置文件中与车辆跟踪相关的参数如下:
+```
+DET:
+  model_dir: output_inference/mot_ppyoloe_l_36e_ppvehicle/  # 车辆检测模型调用路径
+  batch_size: 1  # 模型预测时的batch_size大小
+
+MOT:
+  model_dir: output_inference/mot_ppyoloe_l_36e_ppvehicle/  # 车辆跟踪模型调用路径
+  tracker_config: deploy/pipeline/config/tracker_config.yml
+  batch_size: 1  # 模型预测时的batch_size大小,跟踪任务只能设置为1
+  skip_frame_num: -1  # 跳帧预测的帧数,-1表示不进行跳帧,建议跳帧帧数最大不超过3
+  enable: False  # 是否开启该功能,使用跟踪前必须确保设置为True
+```
+
+【使用命令】
+1. 从上表链接中下载模型并解压到```./output_inference```路径下,并修改配置文件中模型路径。默认为自动下载模型,无需做改动。
+2. 图片输入时,是纯检测任务,启动命令如下
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --image_file=test_image.jpg \
+                                   --device=gpu
+```
+3. 视频输入时,是跟踪任务,注意首先设置infer_cfg_ppvehicle.yml中MOT配置的`enable=True`,如果希望跳帧加速检测跟踪流程,可以设置`skip_frame_num: 2`,建议跳帧帧数最大不超过3:
+```
+MOT:
+  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip
+  tracker_config: deploy/pipeline/config/tracker_config.yml
+  batch_size: 1
+  skip_frame_num: 2
+  enable: True
+```
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu
+```
+4. 若修改模型路径,有以下两种方式:
+    - 方法一:```./deploy/pipeline/config/infer_cfg_ppvehicle.yml```下可以配置不同模型路径,检测和跟踪模型分别对应`DET`和`MOT`字段,修改对应字段下的路径为实际期望的路径即可。
+    - 方法二:命令行中--config配置项后面增加`-o MOT.model_dir=[YOUR_DETMODEL_PATH]`修改模型路径。
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   --region_type=horizontal \
+                                   --do_entrance_counting \
+                                   --draw_center_traj \
+                                   -o MOT.model_dir=ppyoloe/
+```
+**注意:**
+ - `--do_entrance_counting`表示是否统计出入口流量,不设置即默认为False。
+ - `--draw_center_traj`表示是否绘制跟踪轨迹,不设置即默认为False。注意绘制跟踪轨迹的测试视频最好是静止摄像头拍摄的。
+ - `--region_type`表示流量计数的区域,当设置`--do_entrance_counting`时可选择`horizontal`或者`vertical`,默认是`horizontal`,表示以视频图片的中心水平线为出入口,同一物体框的中心点在相邻两秒内分别在区域中心水平线的两侧,即完成计数加一。
+
+5. 区域闯入判断和计数
+
+注意首先设置infer_cfg_ppvehicle.yml中MOT配置的enable=True,然后启动命令如下
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   --draw_center_traj \
+                                   --do_break_in_counting \
+                                   --region_type=custom \
+                                   --region_polygon 200 200 400 200 300 400 100 400
+```
+**注意:**
+ - 区域闯入的测试视频必须是静止摄像头拍摄的,镜头不能抖动或移动。
+ - `--do_break_in_counting`表示是否进行区域出入后计数,不设置即默认为False。
+ - `--region_type`表示流量计数的区域,当设置`--do_break_in_counting`时仅可选择`custom`,默认是`custom`,表示以用户自定义区域为出入口,同一物体框的下边界中点坐标在相邻两秒内从区域外到区域内,即完成计数加一。
+ - `--region_polygon`表示用户自定义区域的多边形的点坐标序列,每两个为一对点坐标(x,y),**按顺时针顺序**连成一个**封闭区域**,至少需要3对点也即6个整数,默认值是`[]`,需要用户自行设置点坐标,如是四边形区域,坐标顺序是`左上、右上、右下、左下`。用户可以运行[此段代码](../../tools/get_video_info.py)获取所测视频的分辨率帧数,以及可以自定义画出自己想要的多边形区域的可视化并自己调整。
+   自定义多边形区域的可视化代码运行如下:
+   ```bash
+   python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400
+   ```
+   快速画出想要的区域的小技巧:先任意取点得到图片,用画图工具打开,鼠标放到想要的区域点上会显示出坐标,记录下来并取整,作为这段可视化代码的region_polygon参数,并再次运行可视化,微调点坐标参数直至满意。
+
+
+【效果展示】
+
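+As a concrete reading of the `--do_entrance_counting` rule above (the count increases by one when a track's center point falls on opposite sides of the image's center horizontal line in two adjacent sampled seconds), here is a minimal sketch; names and data are hypothetical, not the pipeline's actual code:
+
+```python
+# Illustrative sketch of horizontal-line entrance counting.
+
+def crossed_line(prev_cy, cur_cy, line_y):
+    """True if a track center moved across the horizontal line between samples."""
+    return (prev_cy - line_y) * (cur_cy - line_y) < 0
+
+line_y = 1080 / 2             # center horizontal line of a 1080p frame
+prev = {1: 500.0, 2: 700.0}   # track id -> center y one second ago (hypothetical)
+cur = {1: 560.0, 2: 720.0}    # track id -> center y now
+
+count = sum(1 for tid in cur if tid in prev and crossed_line(prev[tid], cur[tid], line_y))
+print(count)  # 1: track 1 crosses y=540, track 2 stays on the same side
+```
+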
+
+## 方案说明
+
+【实现方案及特色】
+1. 使用目标检测/多目标跟踪技术来获取图片/视频输入中的车辆检测框,检测模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe)和[ppvehicle](../../../../configs/ppvehicle)。
+2. 多目标跟踪模型方案采用[OC-SORT](https://arxiv.org/pdf/2203.14360.pdf),采用PP-YOLOE替换原文的YOLOX作为检测器,采用OCSORTTracker作为跟踪器,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。
+
+## 参考文献
+```
+@article{cao2022observation,
+  title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
+  author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
+  journal={arXiv preprint arXiv:2203.14360},
+  year={2022}
+}
+```
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_mot_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_mot_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d63c3ed0182d5e6c427852e826acdf84a968b649
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_mot_en.md
@@ -0,0 +1,143 @@
+English | [简体中文](ppvehicle_mot.md)
+
+# PP-Vehicle Vehicle Tracking Module
+
+【Application Introduction】
+
+Vehicle detection and tracking are widely used in traffic monitoring and autonomous driving. The detection and tracking module is integrated in PP-Vehicle, providing a solid foundation for tasks such as license plate detection and vehicle attribute recognition. We provide pre-trained models that can be used directly by developers.
+
+【Model Download】
+
+| Task | Algorithm | Accuracy | Inference Speed(ms) | Download Link |
+| -------------------------- | ---------- | ----------------------- | ------------------------------------ | ------------------------------------------------------------------------------------------ |
+| Vehicle Detection/Tracking | PP-YOLOE-l | mAP: 63.9 <br> MOTA: 50.1 | Detection: 25.1ms <br> Tracking: 31.8ms | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| Vehicle Detection/Tracking | PP-YOLOE-s | mAP: 61.3 <br> MOTA: 46.8 | Detection: 16.2ms <br> Tracking: 21.0ms | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) |
+
+1. The detection/tracking model uses the PPVehicle dataset (which integrates BDD100K-MOT and UA-DETRAC). The dataset merges car, truck, bus, van from BDD100K-MOT and car, bus, van from UA-DETRAC into one class vehicle(1). The detection accuracy mAP was measured on the validation set of PPVehicle, and the tracking accuracy MOTA was obtained on the validation set of BDD100K-MOT (`car, truck, bus, van` combined into one class `vehicle`). For more details about the training procedure, please refer to [ppvehicle](../../../../configs/ppvehicle).
+2. Inference speed is obtained on T4 with TensorRT FP16 enabled, and includes data pre-processing, model inference and post-processing.
+
+## How To Use
+
+【Config】
+
+The parameters related to vehicle tracking in the configuration file are as follows.
+
+```
+DET:
+  model_dir: output_inference/mot_ppyoloe_l_36e_ppvehicle/  # Vehicle detection model path
+  batch_size: 1  # Batch size for model inference
+
+MOT:
+  model_dir: output_inference/mot_ppyoloe_l_36e_ppvehicle/  # Vehicle tracking model path
+  tracker_config: deploy/pipeline/config/tracker_config.yml
+  batch_size: 1  # Batch size for model inference; must be 1 for the tracking task
+  skip_frame_num: -1  # Number of frames to skip; -1 means no skipping, and skipping at most 3 frames is recommended
+  enable: False  # Whether to enable this function; make sure it is set to True before tracking
+```
+
+【Usage】
+
+1. Download the model from the link in the table above and unzip it to ```./output_inference```, and change the model path in the configuration file. By default the model is downloaded automatically, so no changes are needed.
+
+2. Image input will start a pure detection task, and the start command is as follows
+
+    ```bash
+    python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                       --image_file=test_image.jpg \
+                                       --device=gpu
+    ```
+
+3. Video input will start a tracking task. Please set `enable: True` for the MOT configuration in infer_cfg_ppvehicle.yml. If frame skipping is needed for faster detection and tracking, `skip_frame_num: 2` is recommended, and the maximum should not exceed 3.
+
+    ```
+    MOT:
+      model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip
+      tracker_config: deploy/pipeline/config/tracker_config.yml
+      batch_size: 1
+      skip_frame_num: 2
+      enable: True
+    ```
+
+    ```bash
+    python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                       --video_file=test_video.mp4 \
+                                       --device=gpu
+    ```
+
+4. There are two ways to modify the model path
+
+    - Configure different model paths in ```./deploy/pipeline/config/infer_cfg_ppvehicle.yml```. The detection and tracking models correspond to the `DET` and `MOT` fields respectively; modify the path under the corresponding field to the actual path.
+
+    - **[Recommended]** Add `-o MOT.model_dir=[YOUR_DETMODEL_PATH]` after the config in the command line to modify the model path
+
+    ```bash
+    python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                       --video_file=test_video.mp4 \
+                                       --device=gpu \
+                                       --region_type=horizontal \
+                                       --do_entrance_counting \
+                                       --draw_center_traj \
+                                       -o MOT.model_dir=ppyoloe/
+    ```
+
+**Note:**
+
+- `--do_entrance_counting`: Whether to count entrance/exit traffic flows; the default is False.
+
+- `--draw_center_traj`: Whether to draw the center trajectory; the default is False. The test video is preferably taken by a still camera.
+
+- `--region_type`: The region for traffic counting. When setting `--do_entrance_counting`, there are two options: `horizontal` or `vertical`. The default is `horizontal`, which means the center horizontal line of the video picture serves as the entrance and exit; when the center point of the same object box falls on the two sides of the center horizontal line in two adjacent seconds, the count increases by one.
+
+5. Regional break-in and counting
+
+Please set `enable: True` for the MOT config in `infer_cfg_ppvehicle.yml` before running the start command:
+
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   --draw_center_traj \
+                                   --do_break_in_counting \
+                                   --region_type=custom \
+                                   --region_polygon 200 200 400 200 300 400 100 400
+```
+
+**Note:**
+
+- The test video for area break-in must be taken by a still camera, with no shaky or moving footage.
+
+- `--do_break_in_counting`: Whether to count entries into the area; the default is False.
+
+- `--region_type` indicates the region for traffic counting. When setting `--do_break_in_counting`, only `custom` can be selected, and the default is `custom`, which means the customized region serves as the entrance: when the midpoint of the lower boundary of the same object box moves from outside to inside the region within two adjacent seconds, the count increases by one.
+
+- `--region_polygon` indicates the sequence of point coordinates of the polygon of the customized region. Every two values form one point coordinate (x, y); the points are connected **in clockwise order** into a **closed region**, and at least 3 pairs of points (i.e. 6 integers) are needed. The default value is `[]`, so developers need to set the point coordinates themselves. For a quadrilateral region, the coordinate order is `top left, top right, bottom right, bottom left`. Developers can run [this code](../../tools/get_video_info.py) to get the resolution and frame count of the test video, and it also supports drawing and adjusting a visualization of the customized polygon region.
+  The visualization code for the customized polygon region runs as follows.
+
+    ```bash
+    python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400
+    ```
+
+  A quick tip for drawing a customized area: first take a frame from the video and open it with a drawing tool; hovering the mouse over a desired point displays its coordinates. Record and round them, use them as the region_polygon parameter of this visualization code, run the visualization again, and fine-tune the point coordinates until satisfied.
+
+【Showcase】
+
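+To make the break-in rule above concrete (the count increases by one when the midpoint of a track's lower bounding-box edge moves from outside to inside the custom region between two adjacent sampled seconds), here is a small sketch using `matplotlib.path.Path` for the polygon test. It is an illustration under those assumptions, not the pipeline's implementation:
+
+```python
+# Illustrative sketch of region break-in counting.
+from matplotlib.path import Path
+
+region_polygon = [200, 200, 400, 200, 300, 400, 100, 400]  # from the example command
+region = Path(list(zip(region_polygon[0::2], region_polygon[1::2])))
+
+def bottom_center(box):
+    """box = (x1, y1, x2, y2); midpoint of the lower boundary."""
+    x1, y1, x2, y2 = box
+    return ((x1 + x2) / 2.0, y2)
+
+prev_box = (240, 80, 300, 150)   # hypothetical positions one second apart
+cur_box = (250, 180, 310, 260)
+
+was_in = region.contains_point(bottom_center(prev_box))
+now_in = region.contains_point(bottom_center(cur_box))
+print(int(not was_in and now_in))  # 1: outside -> inside completes one break-in
+```
+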
+
+## Solution
+
+【Solution and feature】
+
+- Object detection/multi-object tracking is used to obtain the vehicle detection boxes from the image/video input, with PP-YOLOE as the detection model. For details, please refer to [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md) and [PPVehicle](../../../../configs/ppvehicle)
+- [OC-SORT](https://arxiv.org/pdf/2203.14360.pdf) is adopted as the multi-object tracking model, with PP-YOLOE replacing the original YOLOX as the detector and OCSORTTracker as the tracker. For more details, please refer to [OC-SORT](../../../../configs/mot/ocsort)
+
+## Reference
+
+```
+@article{cao2022observation,
+  title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
+  author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
+  journal={arXiv preprint arXiv:2203.14360},
+  year={2022}
+}
+```
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_plate.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_plate.md
new file mode 100644
index 0000000000000000000000000000000000000000..e2c5ab56ea5d5651d3ae4f650fc7e2b4db82f79a
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_plate.md
@@ -0,0 +1,88 @@
+[English](ppvehicle_plate_en.md) | 简体中文
+
+# PP-Vehicle车牌识别模块
+
+车牌识别,在车辆应用场景中有着非常广泛的应用,起到车辆身份识别的作用,比如车辆出入口自动闸机。PP-Vehicle中提供了车辆的跟踪及其车牌识别的功能,并提供模型下载:
+
+| 任务 | 算法 | 精度 | 预测速度(ms) | 预测模型下载链接 |
+|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: |
+| 车辆检测/跟踪 | PP-YOLOE-l | mAP: 63.9 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| 车牌检测模型 | ch_PP-OCRv3_det | hmean: 0.979 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz) |
+| 车牌识别模型 | ch_PP-OCRv3_rec | acc: 0.773 | - | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) |
+
+1. 跟踪模型使用PPVehicle数据集(整合了BDD100K-MOT和UA-DETRAC),是将BDD100K-MOT中的car, truck, bus, van和UA-DETRAC中的car, bus, van都合并为1类vehicle(1)后的数据集。
+2. 车牌检测、识别模型使用PP-OCRv3模型在CCPD2019、CCPD2020混合车牌数据集上fine-tune得到。
+
+## 使用方法
+
+1. 从上表链接中下载模型并解压到```PaddleDetection/output_inference```路径下,并修改配置文件中模型路径,也可默认自动下载模型。设置```deploy/pipeline/config/infer_cfg_ppvehicle.yml```中`VEHICLE_PLATE`的`enable: True`
+
+`infer_cfg_ppvehicle.yml`中配置项说明:
+```
+VEHICLE_PLATE:  # 模块名称
+  det_model_dir: output_inference/ch_PP-OCRv3_det_infer/  # 车牌检测模型路径
+  det_limit_side_len: 480  # 检测模型单边输入尺寸
+  det_limit_type: "max"  # 检测模型输入尺寸长短边选择,"max"表示长边
+  rec_model_dir: output_inference/ch_PP-OCRv3_rec_infer/  # 车牌识别模型路径
+  rec_image_shape: [3, 48, 320]  # 车牌识别模型输入尺寸
+  rec_batch_num: 6  # 车牌识别batch size
+  word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt  # OCR模型查询字典
+  basemode: "idbased"  # 流程类型,'idbased'表示基于跟踪模型
+  enable: False  # 功能是否开启
+```
+
+2. 图片输入时,启动命令如下(更多命令参数说明,请参考[快速开始-参数说明](./PPVehicle_QUICK_STARTED.md#41-参数说明))。
+```bash
+# 单张图片
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --image_file=test_image.jpg \
+                                   --device=gpu
+
+# 图片文件夹
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --image_dir=images/ \
+                                   --device=gpu
+```
+
+3. 
视频输入时,启动命令如下 +```python +#单个视频文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + +#视频文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_dir=test_videos/ \ + --device=gpu \ +``` + +4. 若修改模型路径,有以下两种方式: + + - 方法一:```./deploy/pipeline/config/infer_cfg_ppvehicle.yml```下可以配置不同模型路径,车牌识别模型修改`VEHICLE_PLATE`字段下配置 + - 方法二:命令行中--config配置项后面增加`-o VEHICLE_PLATE.det_model_dir=[YOUR_DETMODEL_PATH] VEHICLE_PLATE.rec_model_dir=[YOUR_RECMODEL_PATH]`修改模型路径。 + + +测试效果如下: + +
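+As a concrete reading of the plate-stabilization strategy described in the 方案说明 below (vote over all historical recognition results of the same track id and keep the most frequent one), here is a minimal sketch; the data and the score threshold are hypothetical:
+
+```python
+# Illustrative sketch of per-track license plate voting.
+from collections import Counter, defaultdict
+
+history = defaultdict(Counter)  # track id -> Counter of OCR readings
+
+def update_plate(track_id, ocr_text, ocr_score, score_thr=0.5):
+    """Accumulate one reading and return the current majority plate for the id."""
+    if ocr_text and ocr_score >= score_thr:
+        history[track_id][ocr_text] += 1
+    best = history[track_id].most_common(1)
+    return best[0][0] if best else None
+
+# A noisy sequence of readings for one vehicle (hypothetical):
+for text, score in [("京A88888", 0.9), ("京A83888", 0.4),
+                    ("京A88888", 0.8), ("京A8B888", 0.7)]:
+    plate = update_plate(1, text, score)
+
+print(plate)  # 京A88888: two votes beat the single-vote misreads
+```
+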
+
+## 方案说明
+
+1. 目标检测/多目标跟踪获取图片/视频输入中的车辆检测框,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)
+2. 通过车辆检测框的坐标在输入图像中截取每个车辆
+3. 使用车牌检测模型在每张车辆截图中识别车牌所在位置,同理截取车牌区域,模型方案为PP-OCRv3_det模型,经CCPD数据集在车牌场景fine-tune得到。
+4. 使用字符识别模型识别车牌中的字符。模型方案为PP-OCRv3_rec模型,经CCPD数据集在车牌场景fine-tune得到。
+
+**性能优化措施:**
+
+1. 使用跳帧策略,每10帧做一次车牌检测,避免每帧做车牌检测的算力消耗。
+2. 车牌结果稳定策略,避免单帧结果的波动,利用同一个id的历史所有车牌识别结果进行投票,得到该id最大可能的正确结果。
+
+## 参考资料
+
+1. PaddleDetection特色检测模型[PP-YOLOE](../../../../configs/ppyoloe)。
+2. Paddle字符识别模型库[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)。
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_plate_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_plate_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d33e1e895a8a8e2e2d7d105fd93e42c13e581dd1
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_plate_en.md
@@ -0,0 +1,90 @@
+English | [简体中文](ppvehicle_plate.md)
+
+# PP-Vehicle License Plate Recognition Module
+
+License plate recognition has a very wide range of applications, serving as vehicle identification, for example at automatic vehicle entrance/exit gates.
+
+PP-Vehicle supports vehicle tracking and license plate recognition. Models are available for download:
+
+| Task | Algorithm | Accuracy | Inference Speed(ms) | Model Download |
+|:-------------------------- |:---------------:|:------------:|:-------------------:|:------------------------------------------------------------------------------------------:|
+| Vehicle Detection/Tracking | PP-YOLOE-l | mAP: 63.9 | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| Plate Detection Model | ch_PP-OCRv3_det | hmean: 0.979 | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz) |
+| Plate Recognition Model | ch_PP-OCRv3_rec | acc: 0.773 | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) |
+
+1. The tracking model uses the PPVehicle dataset (which integrates BDD100K-MOT and UA-DETRAC). The dataset merges car, truck, bus, van from BDD100K-MOT and car, bus, van from UA-DETRAC into one class vehicle(1).
+2. The license plate detection and recognition models are obtained by fine-tuning the PP-OCRv3 model on the mixed CCPD2019 and CCPD2020 license plate datasets.
+
+## How to Use
+
+1. Download the models from the table above and unzip them to ```PaddleDetection/output_inference```, and modify the model path in the configuration file. The models can also be downloaded automatically by default. Set `enable: True` for `VEHICLE_PLATE` in `deploy/pipeline/config/infer_cfg_ppvehicle.yml`.
+
+Config description of `infer_cfg_ppvehicle.yml`:
+
+```
+VEHICLE_PLATE:  # Module name
+  det_model_dir: output_inference/ch_PP-OCRv3_det_infer/  # Path of plate detection model
+  det_limit_side_len: 480  # Single side input size of the detection model
+  det_limit_type: "max"  # Side selection for the detection model input size; "max" means the long side
+  rec_model_dir: output_inference/ch_PP-OCRv3_rec_infer/  # Path of plate recognition model
+  rec_image_shape: [3, 48, 320]  # Input size of the plate recognition model
+  rec_batch_num: 6  # Plate recognition batch size
+  word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt  # OCR model look-up table
+  basemode: "idbased"  # Process type; 'idbased' means it is based on the tracking model
+  enable: False  # Whether to enable this function
+```
+
+2. For image input, the start command is as follows (for more descriptions of the command parameters, please refer to [Quick Start - Parameter Description](./PPVehicle_QUICK_STARTED.md)).
+
+    ```bash
+    # Single image
+    python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                       --image_file=test_image.jpg \
+                                       --device=gpu
+
+    # Image folder
+    python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                       --image_dir=images/ \
+                                       --device=gpu
+    ```
+
+3. For video input, the start command is as follows
+
+```bash
+# Single video
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu
+
+# Video folder
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_dir=test_videos/ \
+                                   --device=gpu
+```
+
+4. There are two ways to modify the model path
+
+    - Method 1: Set the model paths in ```./deploy/pipeline/config/infer_cfg_ppvehicle.yml``` and modify the configuration under the `VEHICLE_PLATE` field for the license plate recognition models.
+    - Method 2: **[Recommended]** Add `-o VEHICLE_PLATE.det_model_dir=[YOUR_DETMODEL_PATH] VEHICLE_PLATE.rec_model_dir=[YOUR_RECMODEL_PATH]` after `--config` in the command line.
+
+The test results are as follows:
+
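+Step 2 of the solution below crops each vehicle from the frame using its detection box before plate detection runs on the crop. A minimal sketch of that cropping step with boundary clamping (illustrative only; the pipeline's own preprocessing may differ):
+
+```python
+# Illustrative sketch: crop a vehicle region from a frame by its detection box.
+import numpy as np
+
+def crop_box(frame, box, expand=0):
+    """frame: HxWxC array; box: (x1, y1, x2, y2) floats from the detector."""
+    h, w = frame.shape[:2]
+    x1 = max(int(box[0]) - expand, 0)
+    y1 = max(int(box[1]) - expand, 0)
+    x2 = min(int(box[2]) + expand, w)
+    y2 = min(int(box[3]) + expand, h)
+    return frame[y1:y2, x1:x2]
+
+frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # placeholder image
+vehicle = crop_box(frame, (100.5, 200.2, 400.9, 600.7), expand=4)
+print(vehicle.shape)  # (408, 308, 3)
+```
+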
+
+## Solutions
+
+1. PP-YOLOE is adopted for object detection/multi-object tracking to obtain the vehicle detection boxes from the image/video input. For details, please refer to [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)
+2. The coordinates of each vehicle detection box are used to crop the vehicle image from the input frame
+3. The license plate detection model locates the license plate in each vehicle crop, and the plate region is cropped in the same way. The solution adopts the PP-OCRv3_det model, fine-tuned on the CCPD dataset for license plate scenes.
+4. A character recognition model recognizes the characters on the plate. The solution adopts the PP-OCRv3_rec model, fine-tuned on the CCPD dataset for license plate scenes.
+
+**Performance optimization measures:**
+
+1. A frame-skipping strategy is used: license plates are detected once every 10 frames to avoid the computing cost of plate detection on every frame.
+2. A plate-stabilization strategy avoids fluctuations of single-frame results: all historical plate recognition results of the same id are voted on to obtain the most likely correct result for that id.
+
+## Reference
+
+1. PaddleDetection featured detection model [PP-YOLOE](../../../../configs/ppyoloe).
+2. Paddle OCR model library [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR).
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_press.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_press.md
new file mode 100644
index 0000000000000000000000000000000000000000..acf9dc9095f070896e0f76343eb52c118f16f081
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_press.md
@@ -0,0 +1,115 @@
+[English](ppvehicle_press_en.md) | 简体中文
+
+# PP-Vehicle压实线识别模块
+
+车辆压实线识别在智慧城市,智慧交通等方向具有广泛应用。在PP-Vehicle中,集成了车辆压实线识别模块,可识别车辆是否违章压实线。
+
+| 任务 | 算法 | 精度 | 预测速度 | 下载链接 |
+|-----------|------|-----------|----------|---------------|
+| 车辆检测/跟踪 | PP-YOLOE | mAP 63.9 | 38.67ms | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| 车道线识别 | PP-liteseg | mIoU 32.69 | 47ms | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) |
+
+注意:
+1. 车辆检测/跟踪模型预测速度是基于NVIDIA T4,开启TensorRT FP16得到。模型预测速度包含数据预处理、模型预测、后处理部分。
+2. 车辆检测/跟踪模型的训练和精度测试均基于[VeRi数据集](https://www.v7labs.com/open-datasets/veri-dataset)。
+3. 车道线模型预测速度基于Tesla P40,python端预测,模型预测速度包含数据预处理、模型预测、后处理部分。
+4. 车道线模型训练和精度测试均基于[BDD100K-LaneSeg](https://bdd-data.berkeley.edu/portal.html#download)和[Apollo Scape](http://apolloscape.auto/lane_segmentation.html#to_dataset_href),两个数据集的车道线分割标签见[lane_dataset_label](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip)
+
+## 使用方法
+
+### 配置项说明
+
+[配置文件](../../config/infer_cfg_ppvehicle.yml)中与车辆压线相关的参数如下:
+```
+VEHICLE_PRESSING:
+  enable: True  # 是否开启功能
+LANE_SEG:
+  lane_seg_config: deploy/pipeline/config/lane_seg_config.yml  # 车道线提取配置文件
+  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip  # 模型文件路径
+```
+[车道线配置文件](../../config/lane_seg_config.yml)中与车道线提取相关的参数如下:
+```
+type: PLSLaneseg  # 选择分割模型
+
+PLSLaneseg:
+  batch_size: 1  # 图片batch_size
+  device: gpu  # 选择gpu还是cpu
+  filter_flag: True  # 是否过滤水平方向道路线
+  horizontal_filtration_degree: 23  # 过滤水平方向车道线阈值,当分割出来的车道线最大倾斜角与最小倾斜角差值小于阈值时,不进行过滤
+  horizontal_filtering_threshold: 0.25  # 确定竖直方向与水平方向分开阈值 thr = (min_degree+max_degree)*0.25,根据车道线倾斜角与thr的大小比较,将车道线分为垂直方向与水平方向
+```
+
+### 使用命令
+
+1. 从模型库下载`车辆检测/跟踪`、`车道线识别`两个预测部署模型并解压到`./output_inference`路径下;默认会自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。
+2. 修改配置文件中`VEHICLE_PRESSING`项的`enable: True`,以启用该功能。
+3. 图片输入时,启动命令如下(更多命令参数说明,请参考[快速开始-参数说明](./PPVehicle_QUICK_STARTED.md)):
+
+```bash
+# 预测单张图片文件
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --image_file=test_image.jpg \
+                                   --device=gpu
+
+# 预测包含一张或多张图片的文件夹
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --image_dir=images/ \
+                                   --device=gpu
+```
+
+4. 视频输入时,启动命令如下:
+
+```bash
+# 预测单个视频文件
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu
+
+# 预测包含一个或多个视频的文件夹
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --video_dir=test_videos/ \
+                                   --device=gpu
+```
+
+5. 若修改模型路径,有以下两种方式:
+
+    - 方法一:`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`下可以配置不同模型路径,车道线识别模型修改`LANE_SEG`字段下配置
+    - 方法二:直接在命令行中增加`-o`,以覆盖配置文件中的默认模型路径:
+
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                      LANE_SEG.model_dir=output_inference
+```
+
+测试效果如下:
+
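+The judgment rule stated in the 方案说明 below is whether the bottom edge of the vehicle's detection box intersects a lane line. A minimal sketch of such a 2-D segment intersection test (the standard orientation method, ignoring degenerate collinear cases; not the pipeline's actual code):
+
+```python
+# Illustrative sketch: does the bottom edge of a vehicle box touch a lane line?
+
+def ccw(a, b, c):
+    """Twice the signed area of triangle (a, b, c)."""
+    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
+
+def segments_intersect(p1, p2, q1, q2):
+    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
+    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
+    return ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0))
+
+x1, y1, x2, y2 = 300, 500, 500, 650          # hypothetical vehicle box
+bottom = ((x1, y2), (x2, y2))                # bottom edge of the box
+lane = ((400, 300), (420, 800))              # one clustered lane line segment
+print(segments_intersect(*bottom, *lane))    # True: the vehicle presses this line
+```
+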
+
+## 方案说明
+
+1. 车道线识别模型使用了[PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)的超轻量分割方案。训练样本[标签](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip)分为4类:
+   0 背景
+   1 双黄线
+   2 实线
+   3 虚线
+   车辆压线分析过滤虚线类;
+
+2. 车道线通过对分割结果聚类得到,且默认过滤水平方向车道线,若不过滤可在[车道线配置文件](../../config/lane_seg_config.yml)修改`filter_flag`参数;
+
+3. 车辆压线判断条件:车辆的检测框底边线与车道线是否有交点;
+
+**性能优化措施**
+1. 因摄像头视角原因,可以根据实际情况决定是否过滤水平方向车道线;
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_press_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_press_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..93b93a5e626307d5962fdb296c81966d5f5afd19
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_press_en.md
@@ -0,0 +1,115 @@
+English | [简体中文](ppvehicle_press.md)
+
+# PP-Vehicle Press Line Identification Module
+
+Vehicle line-pressing identification is widely used in smart cities, intelligent transportation and other scenarios. PP-Vehicle integrates a line-pressing identification module to identify whether a vehicle illegally presses solid lane lines.
+
+| Task | Algorithm | Precision | Inference Speed | Download |
+|-----------|------|-----------|----------|---------------|
+| Vehicle detection/tracking | PP-YOLOE | mAP 63.9 | 38.67ms | [Inference deploy model](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| Lane line segmentation | PP-liteseg | mIoU 32.69 | 47ms | [Inference deploy model](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) |
+
+Notes:
+1. The inference speed of the vehicle detection/tracking model is obtained on NVIDIA T4 with TensorRT FP16 enabled, and includes data preprocessing, model inference and post-processing.
+2. The training and precision test of the vehicle detection/tracking model are based on the [VeRi dataset](https://www.v7labs.com/open-datasets/veri-dataset).
+3. The inference speed of the lane line segmentation model is obtained on Tesla P40 with Python inference, and includes data preprocessing, model inference and post-processing.
+4. The lane line model is trained and evaluated on [BDD100K-LaneSeg](https://bdd-data.berkeley.edu/portal.html#download) and [Apollo Scape](http://apolloscape.auto/lane_segmentation.html#to_dataset_href); the lane segmentation labels of the two datasets are available at [lane_dataset_label](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip)
+
+## Instructions
+
+### Description of Configuration
+
+The parameters related to vehicle line pressing in the [config file](../../config/infer_cfg_ppvehicle.yml) are as follows:
+```
+VEHICLE_PRESSING:
+  enable: True  # Whether to enable the function
+LANE_SEG:
+  lane_seg_config: deploy/pipeline/config/lane_seg_config.yml  # Lane line segmentation config file
+  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip  # Model path
+```
+The parameters related to lane line extraction in the [lane line seg config file](../../config/lane_seg_config.yml) are as follows:
+```
+type: PLSLaneseg  # Select the segmentation model
+
+PLSLaneseg:
+  batch_size: 1  # Image batch size
+  device: gpu  # gpu or cpu
+  filter_flag: True  # Whether to filter horizontal lane lines
+  horizontal_filtration_degree: 23  # Threshold for filtering horizontal lane lines: when the difference between the maximum and minimum inclination angles of the segmented lane lines is less than this value, no filtering is performed
+  horizontal_filtering_threshold: 0.25  # Threshold separating the vertical from the horizontal direction: thr = (min_degree + max_degree) * 0.25; each lane line is classified as vertical or horizontal by comparing its inclination angle with thr
+```
+
+### How to Use
+
+1. Download the two inference deployment models, `vehicle detection/tracking` and `lane line segmentation`, from the model zoo and unzip them to the `./output_inference` path. By default, the models are downloaded automatically; if you download them manually, you need to modify the model directory to the model storage path.
+
+2. Set `enable: True` under `VEHICLE_PRESSING` in the config file to enable this function.
+
+3. For image input, please run these commands (for descriptions of more parameters, please refer to [QUICK_STARTED - Parameter_Description](./PPVehicle_QUICK_STARTED.md)):
+
+```bash
+# For single image
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --image_file=test_image.jpg \
+                                   --device=gpu
+
+# For a folder containing one or multiple images
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --image_dir=images/ \
+                                   --device=gpu
+```
+
+4. For video input, please run these commands.
+
+```bash
+# For single video
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu
+
+# For a folder containing one or multiple videos
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                   --video_dir=test_videos/ \
+                                   --device=gpu
+```
+
+5. There are two ways to modify the model path:
+
+    - Method 1: Set the path of each model in `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`; for lane line segmentation, the path should be modified under the `LANE_SEG` field.
+    - Method 2: Directly add `-o` in the command line to override the default model path in the configuration file:
+
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   -o VEHICLE_PRESSING.enable=true \
+                                      LANE_SEG.model_dir=output_inference
+```
+
+The result is shown as follows:
+
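+The `filter_flag` / `horizontal_filtration_degree` / `horizontal_filtering_threshold` entries above drop near-horizontal lane lines. The sketch below is our reading of those config comments, not PaddleDetection's actual implementation:
+
+```python
+# Rough sketch of angle-based lane filtering, based on the config comments only.
+import math
+
+def inclination_deg(line):
+    """line: ((x1, y1), (x2, y2)); inclination angle folded into [0, 90]."""
+    (x1, y1), (x2, y2) = line
+    ang = abs(math.degrees(math.atan2(y2 - y1, x2 - x1)))
+    return min(ang, 180.0 - ang)
+
+def filter_horizontal(lines, degree=23, ratio=0.25):
+    angles = [inclination_deg(l) for l in lines]
+    if max(angles) - min(angles) < degree:  # lines roughly parallel: keep all
+        return lines
+    thr = (min(angles) + max(angles)) * ratio
+    # keep only lines steeper than thr, i.e. the "vertical" lane lines
+    return [l for l, a in zip(lines, angles) if a > thr]
+
+lines = [((0, 400), (800, 420)),   # near-horizontal (hypothetical)
+         ((300, 0), (340, 700)),   # near-vertical
+         ((600, 0), (560, 700))]   # near-vertical
+print(len(filter_horizontal(lines)))  # 2: the horizontal line is dropped
+```
+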
+
+## Features to the Solution
+
+1. The lane line segmentation model uses the ultra-lightweight segmentation scheme of [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg). The training [labels](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip) are divided into four categories:
+   0 Background
+   1 Double yellow line
+   2 Solid line
+   3 Dashed line
+   The line-pressing analysis filters out the dashed-line class;
+
+2. Lane lines are obtained by clustering the segmentation results, and horizontal lane lines are filtered by default. If you do not want them filtered, modify the `filter_flag` parameter in the [lane line seg config file](../../config/lane_seg_config.yml);
+
+3. Judgment condition for vehicle line pressing: whether the bottom edge line of the vehicle detection box intersects a lane line;
+
+**Performance optimization measures:**
+1. Due to the camera angle, you can decide whether to filter horizontal lane lines according to the actual situation;
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_retrograde.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_retrograde.md
new file mode 100644
index 0000000000000000000000000000000000000000..8f229d16ac7f4edc1e860da38faf7ec1897ceecc
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_retrograde.md
@@ -0,0 +1,126 @@
+[English](ppvehicle_retrograde_en.md) | 简体中文
+
+# PP-Vehicle车辆逆行识别模块
+
+车辆逆行识别在智慧城市,智慧交通等方向具有广泛应用。在PP-Vehicle中,集成了车辆逆行识别模块,可识别车辆是否逆行。
+
+| 任务 | 算法 | 精度 | 预测速度 | 下载链接 |
+|-----------|------|-----------|----------|---------------|
+| 车辆检测/跟踪 | PP-YOLOE | mAP 63.9 | 38.67ms | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| 车道线识别 | PP-liteseg | mIoU 32.69 | 47ms | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) |
+
+注意:
+1. 车辆检测/跟踪模型预测速度是基于NVIDIA T4,开启TensorRT FP16得到。模型预测速度包含数据预处理、模型预测、后处理部分。
+2. 车辆检测/跟踪模型的训练和精度测试均基于[VeRi数据集](https://www.v7labs.com/open-datasets/veri-dataset)。
+3. 车道线模型预测速度基于Tesla P40,python端预测,模型预测速度包含数据预处理、模型预测、后处理部分。
+4. 车道线模型训练和精度测试均基于[BDD100K-LaneSeg](https://bdd-data.berkeley.edu/portal.html#download)和[Apollo Scape](http://apolloscape.auto/lane_segmentation.html#to_dataset_href),两个数据集的标签文件见[lane_dataset_label](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip)
+
+## 使用方法
+
+### 配置项说明
+
+[配置文件](../../config/infer_cfg_ppvehicle.yml)中与车辆逆行识别模块相关的参数如下:
+```
+LANE_SEG:
+  lane_seg_config: deploy/pipeline/config/lane_seg_config.yml  # 车道线提取配置文件
+  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip  # 模型文件路径
+
+VEHICLE_RETROGRADE:
+  frame_len: 8  # 采样帧数
+  sample_freq: 7  # 采样频率
+  enable: True  # 开启车辆逆行判断功能
+  filter_horizontal_flag: False  # 是否过滤水平方向车辆
+  keep_right_flag: True  # 按车辆靠右行驶规则,若车辆靠左行驶,则设为False
+  deviation: 23  # 过滤水平方向车辆的角度阈值,如果大于该角度则过滤
+  move_scale: 0.01  # 过滤静止车辆阈值,若车辆移动像素大于图片对角线*move_scale,则认为车辆移动,反之车辆静止
+  fence_line: []  # 车道中间线坐标,格式[x1,y1,x2,y2] 且y2>y1。若为空,由程序根据车流方向自动判断
+```
+[车道线配置文件](../../config/lane_seg_config.yml)中与车道线提取相关的参数如下:
+```
+type: PLSLaneseg  # 选择分割模型
+
+PLSLaneseg:
+  batch_size: 1  # 图片batch_size
+  device: gpu  # 选择gpu还是cpu
+  filter_flag: True  # 是否过滤水平方向道路线
+  horizontal_filtration_degree: 23  # 过滤水平方向车道线阈值,当分割出来的车道线最大倾斜角与最小倾斜角差值小于阈值时,不进行过滤
+  horizontal_filtering_threshold: 0.25  # 确定竖直方向与水平方向分开阈值 thr = (min_degree+max_degree)*0.25,根据车道线倾斜角与thr的大小比较,将车道线分为垂直方向与水平方向
+```
+
+### 使用命令
+
+1. 从模型库下载`车辆检测/跟踪`、`车道线识别`两个预测部署模型并解压到`./output_inference`路径下;默认会自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。
+2. 修改配置文件中`VEHICLE_RETROGRADE`项的`enable: True`,以启用该功能。
+
+3. 车辆逆行识别功能需要视频输入时,启动命令如下:
+
+```bash
+# 预测单个视频文件
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_RETROGRADE.enable=true \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu
+
+# 预测包含一个或多个视频的文件夹
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_RETROGRADE.enable=true \
+                                   --video_dir=test_video \
+                                   --device=gpu
+```
+
+4. 若修改模型路径,有以下两种方式:
+
+    - 方法一:`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`下可以配置不同模型路径,车道线识别模型修改`LANE_SEG`字段下配置
+    - 方法二:直接在命令行中增加`-o`,以覆盖配置文件中的默认模型路径:
+
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   -o LANE_SEG.model_dir=output_inference/ \
+                                      VEHICLE_RETROGRADE.enable=true
+```
+
+测试效果如下:
+
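+As a heavily simplified reading of the retrograde judgment described below (compare each track's moving direction with the direction expected for its side of the fence_line under the keep-right rule), here is an illustrative sketch; the actual logic in the pipeline is more involved:
+
+```python
+# Heavily simplified sketch of the retrograde check; illustrative only.
+
+def side_of_line(pt, line):
+    """line = (x1, y1, x2, y2) with y2 > y1; the sign tells which side pt is on."""
+    x1, y1, x2, y2 = line
+    return (x2 - x1) * (pt[1] - y1) - (y2 - y1) * (pt[0] - x1)
+
+def is_retrograde(track, fence_line, keep_right=True):
+    """track: list of (x, y) centers, oldest first; image y grows downward."""
+    moving_down = track[-1][1] > track[0][1]
+    on_right = side_of_line(track[-1], fence_line) < 0
+    if not keep_right:
+        on_right = not on_right
+    # assumed convention: under right-hand traffic, the right side of the
+    # centerline moves toward the camera (downward); a mismatch is retrograde
+    return on_right != moving_down
+
+fence = (500, 0, 500, 1000)  # hypothetical vertical centerline
+print(is_retrograde([(600, 200), (610, 400)], fence))  # False: right side, down
+print(is_retrograde([(600, 400), (605, 150)], fence))  # True: right side, up
+```
+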
    + +**注意:** + - 车道线中间线自动判断条件:在采样的视频段内同时有两个相反方向的车辆,且判断一次后固定,不再更新; + - 因摄像头角度以及2d视角问题,车道线中间线判断存在不准确情况; + - 可在配置文件手动输入中间线坐标.参考[车辆违章配置文件](../../config/examples/infer_cfg_vehicle_violation.yml) + + +## 方案说明 +1.车辆在采样视频段内,根据车道中间线的位置与车辆轨迹,判断车辆是否逆行,判断流程图: +
+
+2. 车道线识别模型使用了[PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)的超轻量分割方案。训练样本[标签](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip)分为4类:
+   0 背景
+   1 双黄线
+   2 实线
+   3 虚线
+   车辆逆行分析过滤虚线类;
+
+3. 车道线通过对分割结果聚类得到,且默认过滤水平方向车道线,若不过滤可在[车道线配置文件](../../config/lane_seg_config.yml)修改`filter_flag`参数;
+
+4. 车辆逆行判断默认过滤水平方向车辆,若不过滤可在[配置文件](../../config/infer_cfg_ppvehicle.yml)修改`filter_horizontal_flag`参数;
+
+5. 车辆逆行默认按靠右行驶规则判断,若修改,可在[配置文件](../../config/infer_cfg_ppvehicle.yml)修改`keep_right_flag`参数;
+
+**性能优化措施**:
+1. 因摄像头视角原因,可以根据实际情况决定是否过滤水平方向车道线与水平方向车辆;
+
+2. 车道中间线可手动输入;
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_retrograde_en.md b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_retrograde_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..650efe199a1df5bcf0a12458c78cc2bc2b8837de
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/docs/tutorials/ppvehicle_retrograde_en.md
@@ -0,0 +1,125 @@
+English | [简体中文](ppvehicle_retrograde.md)
+
+# PP-Vehicle Vehicle Retrograde Identification Module
+
+Vehicle retrograde (wrong-way driving) identification is widely used in smart cities, intelligent transportation and other scenarios. PP-Vehicle integrates a vehicle retrograde identification module to identify whether a vehicle is driving against the traffic direction.
+
+| Task | Algorithm | Precision | Inference Speed | Download |
+|-----------|------|-----------|----------|---------------|
+| Vehicle detection/tracking | PP-YOLOE | mAP 63.9 | 38.67ms | [Inference deploy model](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) |
+| Lane line segmentation | PP-liteseg | mIoU 32.69 | 47ms | [Inference deploy model](https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip) |
+
+Notes:
+1. The inference speed of the vehicle detection/tracking model is obtained on NVIDIA T4 with TensorRT FP16 enabled, and includes data preprocessing, model inference and post-processing.
+2. The training and precision test of the vehicle detection/tracking model are based on the [VeRi dataset](https://www.v7labs.com/open-datasets/veri-dataset).
+3. The inference speed of the lane line segmentation model is obtained on Tesla P40 with Python inference, and includes data preprocessing, model inference and post-processing.
+4. The lane line model is trained and evaluated on [BDD100K-LaneSeg](https://bdd-data.berkeley.edu/portal.html#download) and [Apollo Scape](http://apolloscape.auto/lane_segmentation.html#to_dataset_href); the label files of the two datasets are available at [lane_dataset_label](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip)
+
+## Instructions
+
+### Description of Configuration
+
+The parameters related to vehicle retrograde identification in the [config file](../../config/infer_cfg_ppvehicle.yml) are as follows:
+```
+LANE_SEG:
+  lane_seg_config: deploy/pipeline/config/lane_seg_config.yml  # Lane line segmentation config file
+  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/pp_lite_stdc2_bdd100k.zip  # Model path
+
+VEHICLE_RETROGRADE:
+  frame_len: 8  # Number of sampling frames
+  sample_freq: 7  # Sampling frequency
+  enable: True  # Whether to enable the function
+  filter_horizontal_flag: False  # Whether to filter vehicles in the horizontal direction
+  keep_right_flag: True  # According to the keep-right driving rule; set to False if vehicles keep left
+  deviation: 23  # Angle threshold for filtering horizontal vehicles: a vehicle is filtered if its angle is greater than this value
+  move_scale: 0.01  # Threshold for filtering stationary vehicles: if a vehicle moves more than image diagonal * move_scale pixels it is considered moving; otherwise it is stationary
+  fence_line: []  # Lane centerline coordinates in the format [x1,y1,x2,y2] with y2 > y1; if empty, the program determines the line automatically from the traffic flow direction
+```
+The parameters related to lane line extraction in the [lane line seg config file](../../config/lane_seg_config.yml) are as follows:
+```
+type: PLSLaneseg  # Select the segmentation model
+
+PLSLaneseg:
+  batch_size: 1  # Image batch size
+  device: gpu  # gpu or cpu
+  filter_flag: True  # Whether to filter horizontal lane lines
+  horizontal_filtration_degree: 23  # Threshold for filtering horizontal lane lines: when the difference between the maximum and minimum inclination angles of the segmented lane lines is less than this value, no filtering is performed
+  horizontal_filtering_threshold: 0.25  # Threshold separating the vertical from the horizontal direction: thr = (min_degree + max_degree) * 0.25; each lane line is classified as vertical or horizontal by comparing its inclination angle with thr
+```
+
+### How to Use
+
+1. Download the two inference deployment models, `vehicle detection/tracking` and `lane line segmentation`, from the model zoo and unzip them to the `./output_inference` path. By default, the models are downloaded automatically; if you download them manually, you need to modify the model directory to the model storage path.
+
+2. Set `enable: True` under `VEHICLE_RETROGRADE` in the config file to enable this function.
+
+3. The vehicle retrograde identification function requires video input; the start command is as follows:
+
+```bash
+# For single video
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_RETROGRADE.enable=true \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu
+
+# For a folder containing one or multiple videos
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   -o VEHICLE_RETROGRADE.enable=true \
+                                   --video_dir=test_video \
+                                   --device=gpu
+```
+
+4. There are two ways to modify the model path:
+
+    - Method 1: Set the path of each model in `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`; for lane line segmentation, the path should be modified under the `LANE_SEG` field.
+    - Method 2: Directly add `-o` in the command line to override the default model path in the configuration file:
+
+```bash
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \
+                                   --video_file=test_video.mp4 \
+                                   --device=gpu \
+                                   -o LANE_SEG.model_dir=output_inference/ \
+                                      VEHICLE_RETROGRADE.enable=true
+```
+
+The result is shown as follows:
+
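+The `move_scale` entry above filters out stationary vehicles: a track only counts as moving if its displacement exceeds the image diagonal times `move_scale`. A one-function sketch of that rule as stated by the config comment (illustrative, not the source code):
+
+```python
+# Illustrative sketch of the move_scale stationary-vehicle filter.
+import math
+
+def is_moving(track, img_w, img_h, move_scale=0.01):
+    """track: list of (x, y) centers across the sampled frames."""
+    (x0, y0), (x1, y1) = track[0], track[-1]
+    return math.hypot(x1 - x0, y1 - y0) > math.hypot(img_w, img_h) * move_scale
+
+w, h = 1920, 1080  # diagonal ~2203 px, so the threshold is ~22 px
+print(is_moving([(900, 500), (908, 504)], w, h))  # False: ~9 px, treated as parked
+print(is_moving([(900, 500), (960, 540)], w, h))  # True: ~72 px of movement
+```
+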
+
+**Note:**
+ - Condition for automatically determining the lane centerline: there are two vehicles moving in opposite directions in the sampled video segment; once determined, the line is fixed and no longer updated;
+ - Due to the camera angle and the 2D perspective, the automatically determined lane centerline may be inaccurate;
+ - You can manually enter the centerline coordinates in the configuration file; see the example [infer_cfg_vehicle_violation.yml](../../config/examples/infer_cfg_vehicle_violation.yml).
+
+
+## Features to the Solution
+1. Within the sampled video segment, whether a vehicle is retrograde is judged from the position of the lane centerline and the vehicle trajectory. The decision flow chart:
+
+
+2. The lane line segmentation model uses the ultra-lightweight segmentation scheme of [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg). The training [labels](https://bj.bcebos.com/v1/paddledet/data/mot/bdd100k/lane_dataset_label.zip) are divided into four categories:
+   0 Background
+   1 Double yellow line
+   2 Solid line
+   3 Dashed line
+   The retrograde analysis filters out the dashed-line class;
+
+3. Lane lines are obtained by clustering the segmentation results, and horizontal lane lines are filtered by default. If you do not want them filtered, modify the `filter_flag` parameter in the [lane line seg config file](../../config/lane_seg_config.yml);
+
+4. Vehicles in the horizontal direction are filtered by default when judging retrograde vehicles. If you do not want them filtered, modify the `filter_horizontal_flag` parameter in the [config file](../../config/infer_cfg_ppvehicle.yml);
+
+5. Vehicles are judged according to the keep-right driving rule by default. To change this, modify the `keep_right_flag` parameter in the [config file](../../config/infer_cfg_ppvehicle.yml);
+
+
+**Performance optimization measures:**
+1. Due to the camera's viewing angle, you can decide whether to filter horizontal lane lines and horizontal vehicles according to the actual situation;
+
+2. The lane centerline can be entered manually;
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/download.py b/PaddleDetection-release-2.6/deploy/pipeline/download.py
new file mode 100644
index 0000000000000000000000000000000000000000..a166b3ba16587f866fc0b4822610ba9bbc43f433
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/download.py
@@ -0,0 +1,340 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+ +import os, sys +import os.path as osp +import hashlib +import requests +import shutil +import tqdm +import time +import tarfile +import zipfile +from paddle.utils.download import _get_unique_endpoints + +PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX = 'https://paddledet.bj.bcebos.com/' + +DOWNLOAD_RETRY_LIMIT = 3 + +WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/infer_weights") + +MODEL_URL_MD5_DICT = { + 'https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz': + '1b8eae0f098635699bd4e8bccf3067a7', + 'https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz': + '64fa0e0701efd93c7db52a9b685b3de6', + "https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip": + "3859d1a26e0c498285c2374b1a347013", + "https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip": + "4ed58b546be2a76d8ccbb138f64874ac", + "https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip": + "a20d5f6ca087bff0e9f2b18df45a36f2", + "https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip": + "1dfb161bf12bbc1365b2ed6866674483", + "https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip": + "5d4609142501258608bf0a1445eedaba", + "https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip": + "cf1c3c4bae90b975accb954d13129ea4", + "https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip": + "4cd12ae55be8f0eb2b90c08ac3b48218", + "https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip": + "cf86b87ace97540dace6ef08e62b584a", + "https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip": + "fdc4dac38393b8e2b5921c1e1fdd5315" +} + + +def is_url(path): + """ + Whether path is URL. + Args: + path (string): URL string or not. + """ + return path.startswith('http://') \ + or path.startswith('https://') \ + or path.startswith('ppdet://') + + +def parse_url(url): + url = url.replace("ppdet://", PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX) + return url + + +def map_path(url, root_dir, path_depth=1): + # parse path after download to decompress under root_dir + assert path_depth > 0, "path_depth should be a positive integer" + dirname = url + for _ in range(path_depth): + dirname = osp.dirname(dirname) + fpath = osp.relpath(url, dirname) + + zip_formats = ['.zip', '.tar', '.gz'] + for zip_format in zip_formats: + fpath = fpath.replace(zip_format, '') + return osp.join(root_dir, fpath) + + +def _md5check(fullname, md5sum=None): + if md5sum is None: + return True + + md5 = hashlib.md5() + with open(fullname, 'rb') as f: + for chunk in iter(lambda: f.read(4096), b""): + md5.update(chunk) + calc_md5sum = md5.hexdigest() + + if calc_md5sum != md5sum: + return False + return True + + +def _check_exist_file_md5(filename, md5sum, url): + return _md5check(filename, md5sum) + + +def _download(url, path, md5sum=None): + """ + Download from url, save to path. + url (str): download url + path (str): download to given path + """ + if not osp.exists(path): + os.makedirs(path) + + fname = osp.split(url)[-1] + fullname = osp.join(path, fname) + retry_cnt = 0 + while not (osp.exists(fullname) and _check_exist_file_md5(fullname, md5sum, + url)): + if retry_cnt < DOWNLOAD_RETRY_LIMIT: + retry_cnt += 1 + else: + raise RuntimeError("Download from {} failed. 
" + "Retry limit reached".format(url)) + + # NOTE: windows path join may incur \, which is invalid in url + if sys.platform == "win32": + url = url.replace('\\', '/') + + req = requests.get(url, stream=True) + if req.status_code != 200: + raise RuntimeError("Downloading from {} failed with code " + "{}!".format(url, req.status_code)) + + # For protecting download interupted, download to + # tmp_fullname firstly, move tmp_fullname to fullname + # after download finished + tmp_fullname = fullname + "_tmp" + total_size = req.headers.get('content-length') + with open(tmp_fullname, 'wb') as f: + if total_size: + for chunk in tqdm.tqdm( + req.iter_content(chunk_size=1024), + total=(int(total_size) + 1023) // 1024, + unit='KB'): + f.write(chunk) + else: + for chunk in req.iter_content(chunk_size=1024): + if chunk: + f.write(chunk) + shutil.move(tmp_fullname, fullname) + return fullname + + +def _download_dist(url, path, md5sum=None): + env = os.environ + if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env: + trainer_id = int(env['PADDLE_TRAINER_ID']) + num_trainers = int(env['PADDLE_TRAINERS_NUM']) + if num_trainers <= 1: + return _download(url, path, md5sum) + else: + fname = osp.split(url)[-1] + fullname = osp.join(path, fname) + lock_path = fullname + '.download.lock' + + if not osp.isdir(path): + os.makedirs(path) + + if not osp.exists(fullname): + from paddle.distributed import ParallelEnv + unique_endpoints = _get_unique_endpoints(ParallelEnv() + .trainer_endpoints[:]) + with open(lock_path, 'w'): # touch + os.utime(lock_path, None) + if ParallelEnv().current_endpoint in unique_endpoints: + _download(url, path, md5sum) + os.remove(lock_path) + else: + while os.path.exists(lock_path): + time.sleep(0.5) + return fullname + else: + return _download(url, path, md5sum) + + +def _move_and_merge_tree(src, dst): + """ + Move src directory to dst, if dst is already exists, + merge src to dst + """ + if not osp.exists(dst): + shutil.move(src, dst) + elif osp.isfile(src): + shutil.move(src, dst) + else: + for fp in os.listdir(src): + src_fp = osp.join(src, fp) + dst_fp = osp.join(dst, fp) + if osp.isdir(src_fp): + if osp.isdir(dst_fp): + _move_and_merge_tree(src_fp, dst_fp) + else: + shutil.move(src_fp, dst_fp) + elif osp.isfile(src_fp) and \ + not osp.isfile(dst_fp): + shutil.move(src_fp, dst_fp) + + +def _decompress(fname): + """ + Decompress for zip and tar file + """ + + # For protecting decompressing interupted, + # decompress to fpath_tmp directory firstly, if decompress + # successed, move decompress files to fpath and delete + # fpath_tmp and remove download compress file. 
fpath = osp.split(fname)[0] + fpath_tmp = osp.join(fpath, 'tmp') + if osp.isdir(fpath_tmp): + shutil.rmtree(fpath_tmp) + os.makedirs(fpath_tmp) + + if fname.find('tar') >= 0: + with tarfile.open(fname) as tf: + tf.extractall(path=fpath_tmp) + elif fname.find('zip') >= 0: + with zipfile.ZipFile(fname) as zf: + zf.extractall(path=fpath_tmp) + elif fname.find('.txt') >= 0: + return + else: + raise TypeError("Unsupported compress file type {}".format(fname)) + + for f in os.listdir(fpath_tmp): + src_dir = osp.join(fpath_tmp, f) + dst_dir = osp.join(fpath, f) + _move_and_merge_tree(src_dir, dst_dir) + + shutil.rmtree(fpath_tmp) + os.remove(fname) + + +def _decompress_dist(fname): + env = os.environ + if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env: + trainer_id = int(env['PADDLE_TRAINER_ID']) + num_trainers = int(env['PADDLE_TRAINERS_NUM']) + if num_trainers <= 1: + _decompress(fname) + else: + lock_path = fname + '.decompress.lock' + from paddle.distributed import ParallelEnv + unique_endpoints = _get_unique_endpoints(ParallelEnv() + .trainer_endpoints[:]) + # NOTE(dkp): _decompress_dist is always performed after + # _download_dist; in _download_dist the sub-trainers wait + # for the download lock file to be released by sleeping. If + # decompression is very fast and finishes within the sleeping gap, + # e.g. on tiny datasets such as coco_ce or spine_coco, the main + # trainer may finish decompressing and release the lock file, so we + # only create the lock file in the main trainer and all sub-trainers + # wait 1s for the main trainer to create the lock file; since 1s is + # twice the sleeping gap, this waiting time keeps the whole + # trainer pipeline in order + # **change this if you have more elegant methods** + if ParallelEnv().current_endpoint in unique_endpoints: + with open(lock_path, 'w'): # touch + os.utime(lock_path, None) + _decompress(fname) + os.remove(lock_path) + else: + time.sleep(1) + while os.path.exists(lock_path): + time.sleep(0.5) + else: + _decompress(fname) + + +def get_path(url, root_dir=WEIGHTS_HOME, md5sum=None, check_exist=True): + """ Download from given url to root_dir. + If the file or directory specified by url exists under + root_dir, return the path directly; otherwise download + from url, decompress it, and return the path. + url (str): download url + root_dir (str): root dir for downloading + md5sum (str): md5 sum of download package + """ + # parse path after download to decompress under root_dir + fullpath = map_path(url, root_dir) + + # For some zip files the decompressed directory name differs + # from the zip file name; rename using the following map + decompress_name_map = {"ppTSM_fight": "ppTSM", } + for k, v in decompress_name_map.items(): + if fullpath.find(k) >= 0: + fullpath = osp.join(osp.split(fullpath)[0], v) + + if osp.exists(fullpath) and check_exist: + if not osp.isfile(fullpath) or \ + _check_exist_file_md5(fullpath, md5sum, url): + return fullpath, True + else: + os.remove(fullpath) + + fullname = _download_dist(url, root_dir, md5sum) + + # the new weights format with the 'pdparams' postfix does + # not need to be decompressed + if osp.splitext(fullname)[-1] not in ['.pdparams', '.yml']: + _decompress_dist(fullname) + + return fullpath, False + + +def get_weights_path(url): + """Get weights path from WEIGHTS_HOME; if it does not exist, + download it from url.
+ """ + url = parse_url(url) + md5sum = None + if url in MODEL_URL_MD5_DICT.keys(): + md5sum = MODEL_URL_MD5_DICT[url] + path, _ = get_path(url, WEIGHTS_HOME, md5sum) + return path + + +def auto_download_model(model_path): + # auto download + if is_url(model_path): + weight = get_weights_path(model_path) + return weight + return None + + +if __name__ == "__main__": + model_path = "https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip" + auto_download_model(model_path) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pipe_utils.py b/PaddleDetection-release-2.6/deploy/pipeline/pipe_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..4f4a83f7c1d1f809066f232f12634a3be4e38390 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pipe_utils.py @@ -0,0 +1,265 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import time +import os +import ast +import glob +import yaml +import copy +import numpy as np +import subprocess as sp + +from python.keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop + + +class Times(object): + def __init__(self): + self.time = 0. + # start time + self.st = 0. + # end time + self.et = 0. + + def start(self): + self.st = time.time() + + def end(self, repeats=1, accumulative=True): + self.et = time.time() + if accumulative: + self.time += (self.et - self.st) / repeats + else: + self.time = (self.et - self.st) / repeats + + def reset(self): + self.time = 0. + self.st = 0. + self.et = 0. + + def value(self): + return round(self.time, 4) + + +class PipeTimer(Times): + def __init__(self): + super(PipeTimer, self).__init__() + self.total_time = Times() + self.module_time = { + 'det': Times(), + 'mot': Times(), + 'attr': Times(), + 'kpt': Times(), + 'video_action': Times(), + 'skeleton_action': Times(), + 'reid': Times(), + 'det_action': Times(), + 'cls_action': Times(), + 'vehicle_attr': Times(), + 'vehicleplate': Times(), + 'lanes': Times(), + 'vehicle_press': Times(), + 'vehicle_retrograde': Times() + } + self.img_num = 0 + self.track_num = 0 + + def get_total_time(self): + total_time = self.total_time.value() + total_time = round(total_time, 4) + average_latency = total_time / max(1, self.img_num) + qps = 0 + if total_time > 0: + qps = 1 / average_latency + return total_time, average_latency, qps + + def info(self): + total_time, average_latency, qps = self.get_total_time() + print("------------------ Inference Time Info ----------------------") + print("total_time(ms): {}, img_num: {}".format(total_time * 1000, + self.img_num)) + + for k, v in self.module_time.items(): + v_time = round(v.value(), 4) + if v_time > 0 and k in ['det', 'mot', 'video_action']: + print("{} time(ms): {}; per frame average time(ms): {}".format( + k, v_time * 1000, v_time * 1000 / self.img_num)) + elif v_time > 0: + print("{} time(ms): {}; per trackid average time(ms): {}". 
format(k, v_time * 1000, v_time * 1000 / self.track_num)) + + print("average latency time(ms): {:.2f}, QPS: {:.2f}".format( + average_latency * 1000, qps)) + return qps + + def report(self, average=False): + dic = {} + dic['total'] = round(self.total_time.value() / max(1, self.img_num), + 4) if average else self.total_time.value() + dic['det'] = round(self.module_time['det'].value() / + max(1, self.img_num), + 4) if average else self.module_time['det'].value() + dic['mot'] = round(self.module_time['mot'].value() / + max(1, self.img_num), + 4) if average else self.module_time['mot'].value() + dic['attr'] = round(self.module_time['attr'].value() / + max(1, self.img_num), + 4) if average else self.module_time['attr'].value() + dic['kpt'] = round(self.module_time['kpt'].value() / + max(1, self.img_num), + 4) if average else self.module_time['kpt'].value() + dic['video_action'] = self.module_time['video_action'].value() + dic['skeleton_action'] = round( + self.module_time['skeleton_action'].value() / max(1, self.img_num), + 4) if average else self.module_time['skeleton_action'].value() + + dic['img_num'] = self.img_num + return dic + + +class PushStream(object): + def __init__(self, pushurl="rtsp://127.0.0.1:8554/"): + self.command = "" + # set the push stream url as needed + self.pushurl = pushurl + + def initcmd(self, fps, width, height): + self.command = [ + 'ffmpeg', '-y', '-f', 'rawvideo', '-vcodec', 'rawvideo', '-pix_fmt', + 'bgr24', '-s', "{}x{}".format(width, height), '-r', str(fps), '-i', + '-', '-pix_fmt', 'yuv420p', '-f', 'rtsp', self.pushurl + ] + self.pipe = sp.Popen(self.command, stdin=sp.PIPE) + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--infer_img or --infer_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +def crop_image_with_det(batch_input, det_res, thresh=0.3): + boxes = det_res['boxes'] + score = det_res['boxes'][:, 1] + boxes_num = det_res['boxes_num'] + start_idx = 0 + crop_res = [] + for b_id, input in enumerate(batch_input): + boxes_num_i = boxes_num[b_id] + if boxes_num_i == 0: + continue + boxes_i = boxes[start_idx:start_idx + boxes_num_i, :] + score_i = score[start_idx:start_idx + boxes_num_i] + res = [] + for box, s in zip(boxes_i, score_i): + if s > thresh: + crop_image, new_box, ori_box = expand_crop(input, box) + if crop_image is not None: + res.append(crop_image) + crop_res.append(res) + # advance the offset so the next image reads its own boxes + # (otherwise batches larger than 1 would reuse the first image's boxes) + start_idx += boxes_num_i + return crop_res + + +def normal_crop(image, rect): + imgh, imgw, c = image.shape + label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()] + org_rect = [xmin, ymin, xmax, ymax] + if label != 0: + return None, None, None + xmin = max(0, xmin) + ymin = max(0, ymin) + xmax = min(imgw, xmax) + ymax = min(imgh, ymax) + return
image[ymin:ymax, xmin:xmax, :], [xmin, ymin, xmax, ymax], org_rect + + +def crop_image_with_mot(input, mot_res, expand=True): + res = mot_res['boxes'] + crop_res = [] + new_bboxes = [] + ori_bboxes = [] + for box in res: + if expand: + crop_image, new_bbox, ori_bbox = expand_crop(input, box[1:]) + else: + crop_image, new_bbox, ori_bbox = normal_crop(input, box[1:]) + if crop_image is not None: + crop_res.append(crop_image) + new_bboxes.append(new_bbox) + ori_bboxes.append(ori_bbox) + return crop_res, new_bboxes, ori_bboxes + + +def parse_mot_res(input): + mot_res = [] + boxes, scores, ids = input[0] + for box, score, i in zip(boxes[0], scores[0], ids[0]): + xmin, ymin, w, h = box + res = [i, 0, score, xmin, ymin, xmin + w, ymin + h] + mot_res.append(res) + return {'boxes': np.array(mot_res)} + + +def refine_keypoint_coordinary(kpts, bbox, coord_size): + """ + This function is used to adjust coordinate values to a fixed scale. + """ + tl = bbox[:, 0:2] + wh = bbox[:, 2:] - tl + tl = np.expand_dims(np.transpose(tl, (1, 0)), (2, 3)) + wh = np.expand_dims(np.transpose(wh, (1, 0)), (2, 3)) + target_w, target_h = coord_size + res = (kpts - tl) / wh * np.expand_dims( + np.array([[target_w], [target_h]]), (2, 3)) + return res + + +def parse_mot_keypoint(input, coord_size): + parsed_skeleton_with_mot = {} + ids = [] + skeleton = [] + for tracker_id, kpt_seq in input: + ids.append(tracker_id) + kpts = np.array(kpt_seq.kpts, dtype=np.float32)[:, :, :2] + kpts = np.expand_dims(np.transpose(kpts, [2, 0, 1]), + -1) #T, K, C -> C, T, K, 1 + bbox = np.array(kpt_seq.bboxes, dtype=np.float32) + skeleton.append(refine_keypoint_coordinary(kpts, bbox, coord_size)) + parsed_skeleton_with_mot["mot_id"] = ids + parsed_skeleton_with_mot["skeleton"] = skeleton + return parsed_skeleton_with_mot \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pipeline.py b/PaddleDetection-release-2.6/deploy/pipeline/pipeline.py new file mode 100644 index 0000000000000000000000000000000000000000..3407f479e82a5a7ac96e29da98168df25ab44e20 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pipeline.py @@ -0,0 +1,1320 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
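As a quick illustration of `refine_keypoint_coordinary` from `pipe_utils.py` above — a minimal sketch, assuming the module is importable from the deploy/pipeline directory; the numbers are toy values:

```python
import numpy as np
from pipe_utils import refine_keypoint_coordinary

# One track: C=2 coords, T=2 frames, K=1 keypoint -> shape (2, 2, 1, 1).
kpts = np.array([[[[10.]], [[30.]]],   # x per frame
                 [[[20.]], [[60.]]]])  # y per frame
# Per-frame boxes in xyxy format, shape (T, 4).
bbox = np.array([[0., 0., 100., 100.],
                 [0., 0., 100., 100.]])

# Rescale keypoints from box-relative image coordinates to a fixed 64x48 canvas.
print(refine_keypoint_coordinary(kpts, bbox, coord_size=(64, 48)))
# x: 10 -> 6.4, 30 -> 19.2;  y: 20 -> 9.6, 60 -> 28.8
```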
+ +import os +import yaml +import glob +import cv2 +import numpy as np +import math +import paddle +import sys +import copy +import threading +import queue +import time +from collections import defaultdict +from datacollector import DataCollector, Result +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from cfg_utils import argsparser, print_arguments, merge_cfg +from pipe_utils import PipeTimer +from pipe_utils import get_test_images, crop_image_with_det, crop_image_with_mot, parse_mot_res, parse_mot_keypoint +from pipe_utils import PushStream + +from python.infer import Detector, DetectorPicoDet +from python.keypoint_infer import KeyPointDetector +from python.keypoint_postprocess import translate_to_ori_images +from python.preprocess import decode_image, ShortSizeScale +from python.visualize import visualize_box_mask, visualize_attr, visualize_pose, visualize_action, visualize_vehicleplate, visualize_vehiclepress, visualize_lane, visualize_vehicle_retrograde + +from pptracking.python.mot_sde_infer import SDE_Detector +from pptracking.python.mot.visualize import plot_tracking_dict +from pptracking.python.mot.utils import flow_statistic, update_object_info + +from pphuman.attr_infer import AttrDetector +from pphuman.video_action_infer import VideoActionRecognizer +from pphuman.action_infer import SkeletonActionRecognizer, DetActionRecognizer, ClsActionRecognizer +from pphuman.action_utils import KeyPointBuff, ActionVisualHelper +from pphuman.reid import ReID +from pphuman.mtmct import mtmct_process + +from ppvehicle.vehicle_plate import PlateRecognizer +from ppvehicle.vehicle_attr import VehicleAttr +from ppvehicle.vehicle_pressing import VehiclePressingRecognizer +from ppvehicle.vehicle_retrograde import VehicleRetrogradeRecognizer +from ppvehicle.lane_seg_infer import LaneSegPredictor + +from download import auto_download_model + + +class Pipeline(object): + """ + Pipeline + + Args: + args (argparse.Namespace): arguments in pipeline, which contains environment and runtime settings + cfg (dict): config of models in pipeline + """ + + def __init__(self, args, cfg): + self.multi_camera = False + reid_cfg = cfg.get('REID', False) + self.enable_mtmct = reid_cfg['enable'] if reid_cfg else False + self.is_video = False + self.output_dir = args.output_dir + self.vis_result = cfg['visual'] + self.input = self._parse_input(args.image_file, args.image_dir, + args.video_file, args.video_dir, + args.camera_id, args.rtsp) + if self.multi_camera: + self.predictor = [] + for name in self.input: + predictor_item = PipePredictor( + args, cfg, is_video=True, multi_camera=True) + predictor_item.set_file_name(name) + self.predictor.append(predictor_item) + + else: + self.predictor = PipePredictor(args, cfg, self.is_video) + if self.is_video: + self.predictor.set_file_name(self.input) + + def _parse_input(self, image_file, image_dir, video_file, video_dir, + camera_id, rtsp): + + # parse input as is_video and multi_camera + + if image_file is not None or image_dir is not None: + input = get_test_images(image_dir, image_file) + self.is_video = False + self.multi_camera = False + + elif video_file is not None: + assert os.path.exists( + video_file + ) or 'rtsp' in video_file, "video_file not exists and not an rtsp site." 
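+ # NOTE: a video_file containing "rtsp" is treated as a live stream address rather than a local file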
+ self.multi_camera = False + input = video_file + self.is_video = True + + elif video_dir is not None: + videof = [os.path.join(video_dir, x) for x in os.listdir(video_dir)] + if len(videof) > 1: + self.multi_camera = True + videof.sort() + input = videof + else: + input = videof[0] + self.is_video = True + + elif rtsp is not None: + if len(rtsp) > 1: + rtsp = [rtsp_item for rtsp_item in rtsp if 'rtsp' in rtsp_item] + self.multi_camera = True + input = rtsp + else: + self.multi_camera = False + input = rtsp[0] + self.is_video = True + + elif camera_id != -1: + self.multi_camera = False + input = camera_id + self.is_video = True + + else: + raise ValueError( + "Illegal Input, please set one of ['video_file', 'camera_id', 'image_file', 'image_dir']" + ) + + return input + + def run_multithreads(self): + if self.multi_camera: + multi_res = [] + threads = [] + for idx, (predictor, + input) in enumerate(zip(self.predictor, self.input)): + thread = threading.Thread( + name=str(idx).zfill(3), + target=predictor.run, + args=(input, idx)) + threads.append(thread) + + for thread in threads: + thread.start() + + for predictor, thread in zip(self.predictor, threads): + thread.join() + collector_data = predictor.get_result() + multi_res.append(collector_data) + + if self.enable_mtmct: + mtmct_process( + multi_res, + self.input, + mtmct_vis=self.vis_result, + output_dir=self.output_dir) + + else: + self.predictor.run(self.input) + + def run(self): + if self.multi_camera: + multi_res = [] + for predictor, input in zip(self.predictor, self.input): + predictor.run(input) + collector_data = predictor.get_result() + multi_res.append(collector_data) + if self.enable_mtmct: + mtmct_process( + multi_res, + self.input, + mtmct_vis=self.vis_result, + output_dir=self.output_dir) + + else: + self.predictor.run(self.input) + + +def get_model_dir(cfg): + """ + Auto download inference model if the model_path is a url link. + Otherwise it will use the model_path directly. + """ + for key in cfg.keys(): + if type(cfg[key]) == dict and \ + ("enable" in cfg[key].keys() and cfg[key]['enable'] + or "enable" not in cfg[key].keys()): + + if "model_dir" in cfg[key].keys(): + model_dir = cfg[key]["model_dir"] + downloaded_model_dir = auto_download_model(model_dir) + if downloaded_model_dir: + model_dir = downloaded_model_dir + cfg[key]["model_dir"] = model_dir + print(key, " model dir: ", model_dir) + elif key == "VEHICLE_PLATE": + det_model_dir = cfg[key]["det_model_dir"] + downloaded_det_model_dir = auto_download_model(det_model_dir) + if downloaded_det_model_dir: + det_model_dir = downloaded_det_model_dir + cfg[key]["det_model_dir"] = det_model_dir + print("det_model_dir model dir: ", det_model_dir) + + rec_model_dir = cfg[key]["rec_model_dir"] + downloaded_rec_model_dir = auto_download_model(rec_model_dir) + if downloaded_rec_model_dir: + rec_model_dir = downloaded_rec_model_dir + cfg[key]["rec_model_dir"] = rec_model_dir + print("rec_model_dir model dir: ", rec_model_dir) + + elif key == "MOT": # for idbased and skeletonbased actions + model_dir = cfg[key]["model_dir"] + downloaded_model_dir = auto_download_model(model_dir) + if downloaded_model_dir: + model_dir = downloaded_model_dir + cfg[key]["model_dir"] = model_dir + print("mot_model_dir model_dir: ", model_dir) + + +class PipePredictor(object): + """ + Predictor in single camera + + The pipeline for image input: + + 1. Detection + 2. Detection -> Attribute + + The pipeline for video input: + + 1. Tracking + 2. Tracking -> Attribute + 3. 
Tracking -> KeyPoint -> SkeletonAction Recognition + 4. VideoAction Recognition + + Args: + args (argparse.Namespace): arguments in pipeline, which contains environment and runtime settings + cfg (dict): config of models in pipeline + is_video (bool): whether the input is video, default as False + multi_camera (bool): whether to use multi camera in pipeline, + default as False + """ + + def __init__(self, args, cfg, is_video=True, multi_camera=False): + # general module for pphuman and ppvehicle + self.with_mot = cfg.get('MOT', False)['enable'] if cfg.get( + 'MOT', False) else False + self.with_human_attr = cfg.get('ATTR', False)['enable'] if cfg.get( + 'ATTR', False) else False + if self.with_mot: + print('Multi-Object Tracking enabled') + if self.with_human_attr: + print('Human Attribute Recognition enabled') + + # only for pphuman + self.with_skeleton_action = cfg.get( + 'SKELETON_ACTION', False)['enable'] if cfg.get('SKELETON_ACTION', + False) else False + self.with_video_action = cfg.get( + 'VIDEO_ACTION', False)['enable'] if cfg.get('VIDEO_ACTION', + False) else False + self.with_idbased_detaction = cfg.get( + 'ID_BASED_DETACTION', False)['enable'] if cfg.get( + 'ID_BASED_DETACTION', False) else False + self.with_idbased_clsaction = cfg.get( + 'ID_BASED_CLSACTION', False)['enable'] if cfg.get( + 'ID_BASED_CLSACTION', False) else False + self.with_mtmct = cfg.get('REID', False)['enable'] if cfg.get( + 'REID', False) else False + + if self.with_skeleton_action: + print('SkeletonAction Recognition enabled') + if self.with_video_action: + print('VideoAction Recognition enabled') + if self.with_idbased_detaction: + print('IDBASED Detection Action Recognition enabled') + if self.with_idbased_clsaction: + print('IDBASED Classification Action Recognition enabled') + if self.with_mtmct: + print("MTMCT enabled") + + # only for ppvehicle + self.with_vehicleplate = cfg.get( + 'VEHICLE_PLATE', False)['enable'] if cfg.get('VEHICLE_PLATE', + False) else False + if self.with_vehicleplate: + print('Vehicle Plate Recognition enabled') + + self.with_vehicle_attr = cfg.get( + 'VEHICLE_ATTR', False)['enable'] if cfg.get('VEHICLE_ATTR', + False) else False + if self.with_vehicle_attr: + print('Vehicle Attribute Recognition enabled') + + self.with_vehicle_press = cfg.get( + 'VEHICLE_PRESSING', False)['enable'] if cfg.get('VEHICLE_PRESSING', + False) else False + if self.with_vehicle_press: + print('Vehicle Pressing Recognition enabled') + + self.with_vehicle_retrograde = cfg.get( + 'VEHICLE_RETROGRADE', False)['enable'] if cfg.get( + 'VEHICLE_RETROGRADE', False) else False + if self.with_vehicle_retrograde: + print('Vehicle Retrograde Recognition enabled') + + self.modebase = { + "framebased": False, + "videobased": False, + "idbased": False, + "skeletonbased": False + } + + self.basemode = { + "MOT": "idbased", + "ATTR": "idbased", + "VIDEO_ACTION": "videobased", + "SKELETON_ACTION": "skeletonbased", + "ID_BASED_DETACTION": "idbased", + "ID_BASED_CLSACTION": "idbased", + "REID": "idbased", + "VEHICLE_PLATE": "idbased", + "VEHICLE_ATTR": "idbased", + "VEHICLE_PRESSING": "idbased", + "VEHICLE_RETROGRADE": "idbased", + } + + self.is_video = is_video + self.multi_camera = multi_camera + self.cfg = cfg + + self.output_dir = args.output_dir + self.draw_center_traj = args.draw_center_traj + self.secs_interval = args.secs_interval + self.do_entrance_counting = args.do_entrance_counting + self.do_break_in_counting = args.do_break_in_counting + self.region_type = args.region_type + self.region_polygon = 
args.region_polygon + self.illegal_parking_time = args.illegal_parking_time + + self.warmup_frame = self.cfg['warmup_frame'] + self.pipeline_res = Result() + self.pipe_timer = PipeTimer() + self.file_name = None + self.collector = DataCollector() + + self.pushurl = args.pushurl + + # auto download inference model + get_model_dir(self.cfg) + + if self.with_vehicleplate: + vehicleplate_cfg = self.cfg['VEHICLE_PLATE'] + self.vehicleplate_detector = PlateRecognizer(args, vehicleplate_cfg) + basemode = self.basemode['VEHICLE_PLATE'] + self.modebase[basemode] = True + + if self.with_human_attr: + attr_cfg = self.cfg['ATTR'] + basemode = self.basemode['ATTR'] + self.modebase[basemode] = True + self.attr_predictor = AttrDetector.init_with_cfg(args, attr_cfg) + + if self.with_vehicle_attr: + vehicleattr_cfg = self.cfg['VEHICLE_ATTR'] + basemode = self.basemode['VEHICLE_ATTR'] + self.modebase[basemode] = True + self.vehicle_attr_predictor = VehicleAttr.init_with_cfg( + args, vehicleattr_cfg) + + if self.with_vehicle_press: + vehiclepress_cfg = self.cfg['VEHICLE_PRESSING'] + basemode = self.basemode['VEHICLE_PRESSING'] + self.modebase[basemode] = True + self.vehicle_press_predictor = VehiclePressingRecognizer( + vehiclepress_cfg) + + if self.with_vehicle_press or self.with_vehicle_retrograde: + laneseg_cfg = self.cfg['LANE_SEG'] + self.laneseg_predictor = LaneSegPredictor( + laneseg_cfg['lane_seg_config'], laneseg_cfg['model_dir']) + + if not is_video: + + det_cfg = self.cfg['DET'] + model_dir = det_cfg['model_dir'] + batch_size = det_cfg['batch_size'] + self.det_predictor = Detector( + model_dir, args.device, args.run_mode, batch_size, + args.trt_min_shape, args.trt_max_shape, args.trt_opt_shape, + args.trt_calib_mode, args.cpu_threads, args.enable_mkldnn) + else: + if self.with_idbased_detaction: + idbased_detaction_cfg = self.cfg['ID_BASED_DETACTION'] + basemode = self.basemode['ID_BASED_DETACTION'] + self.modebase[basemode] = True + + self.det_action_predictor = DetActionRecognizer.init_with_cfg( + args, idbased_detaction_cfg) + self.det_action_visual_helper = ActionVisualHelper(1) + + if self.with_idbased_clsaction: + idbased_clsaction_cfg = self.cfg['ID_BASED_CLSACTION'] + basemode = self.basemode['ID_BASED_CLSACTION'] + self.modebase[basemode] = True + + self.cls_action_predictor = ClsActionRecognizer.init_with_cfg( + args, idbased_clsaction_cfg) + self.cls_action_visual_helper = ActionVisualHelper(1) + + if self.with_skeleton_action: + skeleton_action_cfg = self.cfg['SKELETON_ACTION'] + display_frames = skeleton_action_cfg['display_frames'] + self.coord_size = skeleton_action_cfg['coord_size'] + basemode = self.basemode['SKELETON_ACTION'] + self.modebase[basemode] = True + skeleton_action_frames = skeleton_action_cfg['max_frames'] + + self.skeleton_action_predictor = SkeletonActionRecognizer.init_with_cfg( + args, skeleton_action_cfg) + self.skeleton_action_visual_helper = ActionVisualHelper( + display_frames) + + kpt_cfg = self.cfg['KPT'] + kpt_model_dir = kpt_cfg['model_dir'] + kpt_batch_size = kpt_cfg['batch_size'] + self.kpt_predictor = KeyPointDetector( + kpt_model_dir, + args.device, + args.run_mode, + kpt_batch_size, + args.trt_min_shape, + args.trt_max_shape, + args.trt_opt_shape, + args.trt_calib_mode, + args.cpu_threads, + args.enable_mkldnn, + use_dark=False) + self.kpt_buff = KeyPointBuff(skeleton_action_frames) + + if self.with_vehicleplate: + vehicleplate_cfg = self.cfg['VEHICLE_PLATE'] + self.vehicleplate_detector = PlateRecognizer(args, + vehicleplate_cfg) + basemode = 
self.basemode['VEHICLE_PLATE'] + self.modebase[basemode] = True + + if self.with_mtmct: + reid_cfg = self.cfg['REID'] + basemode = self.basemode['REID'] + self.modebase[basemode] = True + self.reid_predictor = ReID.init_with_cfg(args, reid_cfg) + + if self.with_vehicle_retrograde: + vehicleretrograde_cfg = self.cfg['VEHICLE_RETROGRADE'] + basemode = self.basemode['VEHICLE_RETROGRADE'] + self.modebase[basemode] = True + self.vehicle_retrograde_predictor = VehicleRetrogradeRecognizer( + vehicleretrograde_cfg) + + if self.with_mot or self.modebase["idbased"] or self.modebase[ + "skeletonbased"]: + mot_cfg = self.cfg['MOT'] + model_dir = mot_cfg['model_dir'] + tracker_config = mot_cfg['tracker_config'] + batch_size = mot_cfg['batch_size'] + skip_frame_num = mot_cfg.get('skip_frame_num', -1) + basemode = self.basemode['MOT'] + self.modebase[basemode] = True + self.mot_predictor = SDE_Detector( + model_dir, + tracker_config, + args.device, + args.run_mode, + batch_size, + args.trt_min_shape, + args.trt_max_shape, + args.trt_opt_shape, + args.trt_calib_mode, + args.cpu_threads, + args.enable_mkldnn, + skip_frame_num=skip_frame_num, + draw_center_traj=self.draw_center_traj, + secs_interval=self.secs_interval, + do_entrance_counting=self.do_entrance_counting, + do_break_in_counting=self.do_break_in_counting, + region_type=self.region_type, + region_polygon=self.region_polygon) + + if self.with_video_action: + video_action_cfg = self.cfg['VIDEO_ACTION'] + basemode = self.basemode['VIDEO_ACTION'] + self.modebase[basemode] = True + self.video_action_predictor = VideoActionRecognizer.init_with_cfg( + args, video_action_cfg) + + def set_file_name(self, path): + if type(path) == int: + self.file_name = path + elif path is not None: + self.file_name = os.path.split(path)[-1] + if "." 
in self.file_name: + self.file_name = self.file_name.split(".")[-2] + else: + # use camera id + self.file_name = None + + def get_result(self): + return self.collector.get_res() + + def run(self, input, thread_idx=0): + if self.is_video: + self.predict_video(input, thread_idx=thread_idx) + else: + self.predict_image(input) + self.pipe_timer.info() + self.mot_predictor.det_times.tracking_info(average=True) + + def predict_image(self, input): + # det + # det -> attr + batch_loop_cnt = math.ceil( + float(len(input)) / self.det_predictor.batch_size) + self.warmup_frame = min(10, len(input) // 2) - 1 + for i in range(batch_loop_cnt): + start_index = i * self.det_predictor.batch_size + end_index = min((i + 1) * self.det_predictor.batch_size, len(input)) + batch_file = input[start_index:end_index] + batch_input = [decode_image(f, {})[0] for f in batch_file] + + if i > self.warmup_frame: + self.pipe_timer.total_time.start() + self.pipe_timer.module_time['det'].start() + # det output format: class, score, xmin, ymin, xmax, ymax + det_res = self.det_predictor.predict_image( + batch_input, visual=False) + det_res = self.det_predictor.filter_box(det_res, + self.cfg['crop_thresh']) + if i > self.warmup_frame: + self.pipe_timer.module_time['det'].end() + self.pipe_timer.track_num += len(det_res['boxes']) + self.pipeline_res.update(det_res, 'det') + + if self.with_human_attr: + crop_inputs = crop_image_with_det(batch_input, det_res) + attr_res_list = [] + + if i > self.warmup_frame: + self.pipe_timer.module_time['attr'].start() + + for crop_input in crop_inputs: + attr_res = self.attr_predictor.predict_image( + crop_input, visual=False) + attr_res_list.extend(attr_res['output']) + + if i > self.warmup_frame: + self.pipe_timer.module_time['attr'].end() + + attr_res = {'output': attr_res_list} + self.pipeline_res.update(attr_res, 'attr') + + if self.with_vehicle_attr: + crop_inputs = crop_image_with_det(batch_input, det_res) + vehicle_attr_res_list = [] + + if i > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].start() + + for crop_input in crop_inputs: + attr_res = self.vehicle_attr_predictor.predict_image( + crop_input, visual=False) + vehicle_attr_res_list.extend(attr_res['output']) + + if i > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].end() + + attr_res = {'output': vehicle_attr_res_list} + self.pipeline_res.update(attr_res, 'vehicle_attr') + + if self.with_vehicleplate: + if i > self.warmup_frame: + self.pipe_timer.module_time['vehicleplate'].start() + crop_inputs = crop_image_with_det(batch_input, det_res) + platelicenses = [] + for crop_input in crop_inputs: + platelicense = self.vehicleplate_detector.get_platelicense( + crop_input) + platelicenses.extend(platelicense['plate']) + if i > self.warmup_frame: + self.pipe_timer.module_time['vehicleplate'].end() + vehicleplate_res = {'vehicleplate': platelicenses} + self.pipeline_res.update(vehicleplate_res, 'vehicleplate') + + if self.with_vehicle_press: + vehicle_press_res_list = [] + if i > self.warmup_frame: + self.pipe_timer.module_time['vehicle_press'].start() + + lanes, direction = self.laneseg_predictor.run(batch_input) + if len(lanes) == 0: + print(" no lanes!") + continue + + lanes_res = {'output': lanes, 'direction': direction} + self.pipeline_res.update(lanes_res, 'lanes') + + vehicle_press_res_list = self.vehicle_press_predictor.run( + lanes, det_res) + vehiclepress_res = {'output': vehicle_press_res_list} + self.pipeline_res.update(vehiclepress_res, 'vehicle_press') + + self.pipe_timer.img_num += 
len(batch_input) + if i > self.warmup_frame: + self.pipe_timer.total_time.end() + + if self.cfg['visual']: + self.visualize_image(batch_file, batch_input, self.pipeline_res) + + def capturevideo(self, capture, queue): + frame_id = 0 + while (1): + if queue.full(): + time.sleep(0.1) + else: + ret, frame = capture.read() + if not ret: + return + frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) + queue.put(frame_rgb) + + def predict_video(self, video_file, thread_idx=0): + # mot + # mot -> attr + # mot -> pose -> action + capture = cv2.VideoCapture(video_file) + + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("video fps: %d, frame_count: %d" % (fps, frame_count)) + + if len(self.pushurl) > 0: + video_out_name = 'output' if self.file_name is None else self.file_name + pushurl = os.path.join(self.pushurl, video_out_name) + print("the result will push stream to url:{}".format(pushurl)) + pushstream = PushStream(pushurl) + pushstream.initcmd(fps, width, height) + elif self.cfg['visual']: + video_out_name = 'output' if ( + self.file_name is None or + type(self.file_name) == int) else self.file_name + if type(video_file) == str and "rtsp" in video_file: + video_out_name = video_out_name + "_t" + str(thread_idx).zfill( + 2) + "_rtsp" + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name + ".mp4") + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + + frame_id = 0 + + entrance, records, center_traj = None, None, None + if self.draw_center_traj: + center_traj = [{}] + id_set = set() + interval_id_set = set() + in_id_list = list() + out_id_list = list() + prev_center = dict() + records = list() + if self.do_entrance_counting or self.do_break_in_counting or self.illegal_parking_time != -1: + if self.region_type == 'horizontal': + entrance = [0, height / 2., width, height / 2.] + elif self.region_type == 'vertical': + entrance = [width / 2, 0., width / 2, height] + elif self.region_type == 'custom': + entrance = [] + assert len( + self.region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting." + assert len( + self.region_polygon + ) > 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.' 
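+ # region_polygon is a flat list [x1, y1, x2, y2, ...]; the loop below folds it into point pairs, then [width, height] is appended to record the frame size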
+ + for i in range(0, len(self.region_polygon), 2): + entrance.append( + [self.region_polygon[i], self.region_polygon[i + 1]]) + entrance.append([width, height]) + else: + raise ValueError("region_type:{} unsupported.".format( + self.region_type)) + + video_fps = fps + + video_action_imgs = [] + + if self.with_video_action: + short_size = self.cfg["VIDEO_ACTION"]["short_size"] + scale = ShortSizeScale(short_size) + + object_in_region_info = { + } # store info for vehicle parking in region + illegal_parking_dict = None + cars_count = 0 + retrograde_traj_len = 0 + framequeue = queue.Queue(10) + + thread = threading.Thread( + target=self.capturevideo, args=(capture, framequeue)) + thread.start() + time.sleep(1) + + while (not framequeue.empty()): + if frame_id % 10 == 0: + print('Thread: {}; frame id: {}'.format(thread_idx, frame_id)) + + frame_rgb = framequeue.get() + if frame_id > self.warmup_frame: + self.pipe_timer.total_time.start() + + if self.modebase["idbased"] or self.modebase["skeletonbased"]: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['mot'].start() + + mot_skip_frame_num = self.mot_predictor.skip_frame_num + reuse_det_result = False + if mot_skip_frame_num > 1 and frame_id > 0 and frame_id % mot_skip_frame_num > 0: + reuse_det_result = True + res = self.mot_predictor.predict_image( + [copy.deepcopy(frame_rgb)], + visual=False, + reuse_det_result=reuse_det_result, + frame_count=frame_id) + + # mot output format: id, class, score, xmin, ymin, xmax, ymax + mot_res = parse_mot_res(res) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['mot'].end() + self.pipe_timer.track_num += len(mot_res['boxes']) + + if frame_id % 10 == 0: + print("Thread: {}; trackid number: {}".format( + thread_idx, len(mot_res['boxes']))) + + # flow_statistic only support single class MOT + boxes, scores, ids = res[0] # batch size = 1 in MOT + mot_result = (frame_id + 1, boxes[0], scores[0], + ids[0]) # single class + statistic = flow_statistic( + mot_result, + self.secs_interval, + self.do_entrance_counting, + self.do_break_in_counting, + self.region_type, + video_fps, + entrance, + id_set, + interval_id_set, + in_id_list, + out_id_list, + prev_center, + records, + ids2names=self.mot_predictor.pred_config.labels) + records = statistic['records'] + + if self.illegal_parking_time != -1: + object_in_region_info, illegal_parking_dict = update_object_info( + object_in_region_info, mot_result, self.region_type, + entrance, video_fps, self.illegal_parking_time) + if len(illegal_parking_dict) != 0: + # build relationship between id and plate + for key, value in illegal_parking_dict.items(): + plate = self.collector.get_carlp(key) + illegal_parking_dict[key]['plate'] = plate + + # nothing detected + if len(mot_res['boxes']) == 0: + frame_id += 1 + if frame_id > self.warmup_frame: + self.pipe_timer.img_num += 1 + self.pipe_timer.total_time.end() + if self.cfg['visual']: + _, _, fps = self.pipe_timer.get_total_time() + im = self.visualize_video(frame_rgb, mot_res, frame_id, + fps, entrance, records, + center_traj) # visualize + if len(self.pushurl) > 0: + pushstream.pipe.stdin.write(im.tobytes()) + else: + writer.write(im) + if self.file_name is None: # use camera_id + cv2.imshow('Paddle-Pipeline', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + continue + + self.pipeline_res.update(mot_res, 'mot') + crop_input, new_bboxes, ori_bboxes = crop_image_with_mot( + frame_rgb, mot_res) + + if self.with_vehicleplate and frame_id % 10 == 0: + if frame_id > self.warmup_frame: + 
self.pipe_timer.module_time['vehicleplate'].start() + plate_input, _, _ = crop_image_with_mot( + frame_rgb, mot_res, expand=False) + platelicense = self.vehicleplate_detector.get_platelicense( + plate_input) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicleplate'].end() + self.pipeline_res.update(platelicense, 'vehicleplate') + else: + self.pipeline_res.clear('vehicleplate') + + if self.with_human_attr: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['attr'].start() + attr_res = self.attr_predictor.predict_image( + crop_input, visual=False) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['attr'].end() + self.pipeline_res.update(attr_res, 'attr') + + if self.with_vehicle_attr: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].start() + attr_res = self.vehicle_attr_predictor.predict_image( + crop_input, visual=False) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].end() + self.pipeline_res.update(attr_res, 'vehicle_attr') + + if self.with_vehicle_press or self.with_vehicle_retrograde: + if frame_id == 0 or cars_count == 0 or cars_count > len( + mot_res['boxes']): + + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['lanes'].start() + lanes, directions = self.laneseg_predictor.run( + [copy.deepcopy(frame_rgb)]) + lanes_res = {'output': lanes, 'directions': directions} + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['lanes'].end() + + if frame_id == 0 or (len(lanes) > 0 and frame_id > 0): + self.pipeline_res.update(lanes_res, 'lanes') + + cars_count = len(mot_res['boxes']) + + if self.with_vehicle_press: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_press'].start() + press_lane = copy.deepcopy(self.pipeline_res.get('lanes')) + if press_lane is None: + continue + + vehicle_press_res_list = self.vehicle_press_predictor.mot_run( + press_lane, mot_res['boxes']) + vehiclepress_res = {'output': vehicle_press_res_list} + + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_press'].end() + + self.pipeline_res.update(vehiclepress_res, 'vehicle_press') + + if self.with_idbased_detaction: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['det_action'].start() + det_action_res = self.det_action_predictor.predict( + crop_input, mot_res) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['det_action'].end() + self.pipeline_res.update(det_action_res, 'det_action') + + if self.cfg['visual']: + self.det_action_visual_helper.update(det_action_res) + + if self.with_idbased_clsaction: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['cls_action'].start() + cls_action_res = self.cls_action_predictor.predict_with_mot( + crop_input, mot_res) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['cls_action'].end() + self.pipeline_res.update(cls_action_res, 'cls_action') + + if self.cfg['visual']: + self.cls_action_visual_helper.update(cls_action_res) + + if self.with_skeleton_action: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['kpt'].start() + kpt_pred = self.kpt_predictor.predict_image( + crop_input, visual=False) + keypoint_vector, score_vector = translate_to_ori_images( + kpt_pred, np.array(new_bboxes)) + kpt_res = {} + kpt_res['keypoint'] = [ + keypoint_vector.tolist(), score_vector.tolist() + ] if len(keypoint_vector) > 0 else [[], []] + kpt_res['bbox'] = ori_bboxes + if frame_id > self.warmup_frame: + 
self.pipe_timer.module_time['kpt'].end() + + self.pipeline_res.update(kpt_res, 'kpt') + + self.kpt_buff.update(kpt_res, mot_res) # collect kpt output + state = self.kpt_buff.get_state( + ) # whether frame num is enough or the tracker is lost + + skeleton_action_res = {} + if state: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time[ + 'skeleton_action'].start() + collected_keypoint = self.kpt_buff.get_collected_keypoint( + ) # reorganize kpt output by ID + skeleton_action_input = parse_mot_keypoint( + collected_keypoint, self.coord_size) + skeleton_action_res = self.skeleton_action_predictor.predict_skeleton_with_mot( + skeleton_action_input) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['skeleton_action'].end() + self.pipeline_res.update(skeleton_action_res, + 'skeleton_action') + + if self.cfg['visual']: + self.skeleton_action_visual_helper.update( + skeleton_action_res) + + if self.with_mtmct and frame_id % 10 == 0: + crop_input, img_qualities, rects = self.reid_predictor.crop_image_with_mot( + frame_rgb, mot_res) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['reid'].start() + reid_res = self.reid_predictor.predict_batch(crop_input) + + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['reid'].end() + + reid_res_dict = { + 'features': reid_res, + "qualities": img_qualities, + "rects": rects + } + self.pipeline_res.update(reid_res_dict, 'reid') + else: + self.pipeline_res.clear('reid') + + if self.with_video_action: + # get the params + frame_len = self.cfg["VIDEO_ACTION"]["frame_len"] + sample_freq = self.cfg["VIDEO_ACTION"]["sample_freq"] + + if sample_freq * frame_len > frame_count: # video is too short + sample_freq = int(frame_count / frame_len) + + # filter the warmup frames + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['video_action'].start() + + # collect frames + if frame_id % sample_freq == 0: + # Scale image + scaled_img = scale(frame_rgb) + video_action_imgs.append(scaled_img) + + # the number of collected frames is enough to predict video action + if len(video_action_imgs) == frame_len: + classes, scores = self.video_action_predictor.predict( + video_action_imgs) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['video_action'].end() + + video_action_res = {"class": classes[0], "score": scores[0]} + self.pipeline_res.update(video_action_res, 'video_action') + + print("video_action_res:", video_action_res) + + video_action_imgs.clear() # next clip + + if self.with_vehicle_retrograde: + # get the params + frame_len = self.cfg["VEHICLE_RETROGRADE"]["frame_len"] + sample_freq = self.cfg["VEHICLE_RETROGRADE"]["sample_freq"] + + if sample_freq * frame_len > frame_count: # video is too short + sample_freq = int(frame_count / frame_len) + + # filter the warmup frames + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_retrograde'].start() + + if frame_id % sample_freq == 0: + + frame_mot_res = copy.deepcopy(self.pipeline_res.get('mot')) + self.vehicle_retrograde_predictor.update_center_traj( + frame_mot_res, max_len=frame_len) + retrograde_traj_len = retrograde_traj_len + 1 + + # the number of collected frames is enough to predict + if retrograde_traj_len == frame_len: + retrograde_mot_res = copy.deepcopy( + self.pipeline_res.get('mot')) + retrograde_lanes = copy.deepcopy( + self.pipeline_res.get('lanes')) + frame_shape = frame_rgb.shape + + if retrograde_lanes is None: + continue + retrograde_res, fence_line = self.vehicle_retrograde_predictor.mot_run(
lanes_res=retrograde_lanes, + det_res=retrograde_mot_res, + frame_shape=frame_shape) + + retrograde_res_update = self.pipeline_res.get( + 'vehicle_retrograde') + + if retrograde_res_update is not None: + retrograde_res_update = retrograde_res_update['output'] + if retrograde_res is not None: + for retrograde_res_id in retrograde_res: + if retrograde_res_id not in retrograde_res_update: + retrograde_res_update.append( + retrograde_res_id) + else: + retrograde_res_update = [] + + retrograde_res_dict = { + 'output': retrograde_res_update, + "fence_line": fence_line, + } + + if retrograde_res is not None and len(retrograde_res) > 0: + print("retrograde res:", retrograde_res) + + self.pipeline_res.update(retrograde_res_dict, + 'vehicle_retrograde') + + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_retrograde'].end() + + retrograde_traj_len = 0 + + self.collector.append(frame_id, self.pipeline_res) + + if frame_id > self.warmup_frame: + self.pipe_timer.img_num += 1 + self.pipe_timer.total_time.end() + frame_id += 1 + + if self.cfg['visual']: + _, _, fps = self.pipe_timer.get_total_time() + + im = self.visualize_video(frame_rgb, self.pipeline_res, + self.collector, frame_id, fps, + entrance, records, center_traj, + self.illegal_parking_time != -1, + illegal_parking_dict) # visualize + if len(self.pushurl) > 0: + pushstream.pipe.stdin.write(im.tobytes()) + else: + writer.write(im) + if self.file_name is None: # use camera_id + cv2.imshow('Paddle-Pipeline', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + + if self.cfg['visual'] and len(self.pushurl) == 0: + writer.release() + print('save result to {}'.format(out_path)) + + def visualize_video(self, + image_rgb, + result, + collector, + frame_id, + fps, + entrance=None, + records=None, + center_traj=None, + do_illegal_parking_recognition=False, + illegal_parking_dict=None): + image = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2BGR) + mot_res = copy.deepcopy(result.get('mot')) + + if mot_res is not None: + ids = mot_res['boxes'][:, 0] + scores = mot_res['boxes'][:, 2] + boxes = mot_res['boxes'][:, 3:] + boxes[:, 2] = boxes[:, 2] - boxes[:, 0] + boxes[:, 3] = boxes[:, 3] - boxes[:, 1] + else: + boxes = np.zeros([0, 4]) + ids = np.zeros([0]) + scores = np.zeros([0]) + + # single class, still need to be defaultdict type for ploting + num_classes = 1 + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + online_tlwhs[0] = boxes + online_scores[0] = scores + online_ids[0] = ids + + if mot_res is not None: + image = plot_tracking_dict( + image, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=self.mot_predictor.pred_config.labels, + do_entrance_counting=self.do_entrance_counting, + do_break_in_counting=self.do_break_in_counting, + do_illegal_parking_recognition=do_illegal_parking_recognition, + illegal_parking_dict=illegal_parking_dict, + entrance=entrance, + records=records, + center_traj=center_traj) + + human_attr_res = result.get('attr') + if human_attr_res is not None: + boxes = mot_res['boxes'][:, 1:] + human_attr_res = human_attr_res['output'] + image = visualize_attr(image, human_attr_res, boxes) + image = np.array(image) + + vehicle_attr_res = result.get('vehicle_attr') + if vehicle_attr_res is not None: + boxes = mot_res['boxes'][:, 1:] + vehicle_attr_res = vehicle_attr_res['output'] + image = visualize_attr(image, vehicle_attr_res, boxes) + image = np.array(image) + + lanes_res = result.get('lanes') + if lanes_res 
is not None: + lanes = lanes_res['output'][0] + image = visualize_lane(image, lanes) + image = np.array(image) + + vehiclepress_res = result.get('vehicle_press') + if vehiclepress_res is not None: + press_vehicle = vehiclepress_res['output'] + if len(press_vehicle) > 0: + image = visualize_vehiclepress( + image, press_vehicle, threshold=self.cfg['crop_thresh']) + image = np.array(image) + + if mot_res is not None: + vehicleplate = False + plates = [] + for trackid in mot_res['boxes'][:, 0]: + plate = collector.get_carlp(trackid) + if plate != None: + vehicleplate = True + plates.append(plate) + else: + plates.append("") + if vehicleplate: + boxes = mot_res['boxes'][:, 1:] + image = visualize_vehicleplate(image, plates, boxes) + image = np.array(image) + + kpt_res = result.get('kpt') + if kpt_res is not None: + image = visualize_pose( + image, + kpt_res, + visual_thresh=self.cfg['kpt_thresh'], + returnimg=True) + + video_action_res = result.get('video_action') + if video_action_res is not None: + video_action_score = None + if video_action_res and video_action_res["class"] == 1: + video_action_score = video_action_res["score"] + mot_boxes = None + if mot_res: + mot_boxes = mot_res['boxes'] + image = visualize_action( + image, + mot_boxes, + action_visual_collector=None, + action_text="SkeletonAction", + video_action_score=video_action_score, + video_action_text="Fight") + + vehicle_retrograde_res = result.get('vehicle_retrograde') + if vehicle_retrograde_res is not None: + mot_retrograde_res = copy.deepcopy(result.get('mot')) + image = visualize_vehicle_retrograde(image, mot_retrograde_res, + vehicle_retrograde_res) + image = np.array(image) + + visual_helper_for_display = [] + action_to_display = [] + + skeleton_action_res = result.get('skeleton_action') + if skeleton_action_res is not None: + visual_helper_for_display.append(self.skeleton_action_visual_helper) + action_to_display.append("Falling") + + det_action_res = result.get('det_action') + if det_action_res is not None: + visual_helper_for_display.append(self.det_action_visual_helper) + action_to_display.append("Smoking") + + cls_action_res = result.get('cls_action') + if cls_action_res is not None: + visual_helper_for_display.append(self.cls_action_visual_helper) + action_to_display.append("Calling") + + if len(visual_helper_for_display) > 0: + image = visualize_action(image, mot_res['boxes'], + visual_helper_for_display, + action_to_display) + + return image + + def visualize_image(self, im_files, images, result): + start_idx, boxes_num_i = 0, 0 + det_res = result.get('det') + human_attr_res = result.get('attr') + vehicle_attr_res = result.get('vehicle_attr') + vehicleplate_res = result.get('vehicleplate') + lanes_res = result.get('lanes') + vehiclepress_res = result.get('vehicle_press') + + for i, (im_file, im) in enumerate(zip(im_files, images)): + if det_res is not None: + det_res_i = {} + boxes_num_i = det_res['boxes_num'][i] + det_res_i['boxes'] = det_res['boxes'][start_idx:start_idx + + boxes_num_i, :] + im = visualize_box_mask( + im, + det_res_i, + labels=['target'], + threshold=self.cfg['crop_thresh']) + im = np.ascontiguousarray(np.copy(im)) + im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + if human_attr_res is not None: + human_attr_res_i = human_attr_res['output'][start_idx:start_idx + + boxes_num_i] + im = visualize_attr(im, human_attr_res_i, det_res_i['boxes']) + if vehicle_attr_res is not None: + vehicle_attr_res_i = vehicle_attr_res['output'][ + start_idx:start_idx + boxes_num_i] + im = visualize_attr(im, 
vehicle_attr_res_i, det_res_i['boxes']) + if vehicleplate_res is not None: + plates = vehicleplate_res['vehicleplate'] + det_res_i['boxes'][:, 4:6] = det_res_i[ + 'boxes'][:, 4:6] - det_res_i['boxes'][:, 2:4] + im = visualize_vehicleplate(im, plates, det_res_i['boxes']) + if vehiclepress_res is not None: + press_vehicle = vehiclepress_res['output'][i] + if len(press_vehicle) > 0: + im = visualize_vehiclepress( + im, press_vehicle, threshold=self.cfg['crop_thresh']) + im = np.ascontiguousarray(np.copy(im)) + if lanes_res is not None: + lanes = lanes_res['output'][i] + im = visualize_lane(im, lanes) + im = np.ascontiguousarray(np.copy(im)) + + img_name = os.path.split(im_file)[-1] + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, img_name) + cv2.imwrite(out_path, im) + print("save result to: " + out_path) + start_idx += boxes_num_i + + +def main(): + cfg = merge_cfg(FLAGS) # use command params to update config + print_arguments(cfg) + + pipeline = Pipeline(FLAGS, cfg) + # pipeline.run() + pipeline.run_multithreads() + + +if __name__ == '__main__': + paddle.enable_static() + + # parse params from command + parser = argsparser() + FLAGS = parser.parse_args() + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pphuman/action_infer.py b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/action_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..45c04ad5198c92466764eb58696921c9f11c8bc9 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/action_infer.py @@ -0,0 +1,691 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
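For reference, a minimal sketch of driving this pipeline programmatically, mirroring `main()` above; the config and video paths are placeholders, and the flag names assume the standard `argsparser` in `cfg_utils.py`:

```python
import paddle
from cfg_utils import argsparser, merge_cfg, print_arguments
from pipeline import Pipeline

paddle.enable_static()
parser = argsparser()
# Placeholder paths; point these at a real config and video.
FLAGS = parser.parse_args([
    '--config', 'deploy/pipeline/config/infer_cfg_ppvehicle.yml',
    '--video_file', 'test_video.mp4',
    '--device', 'gpu',
])
FLAGS.device = FLAGS.device.upper()

cfg = merge_cfg(FLAGS)   # overlay command-line params onto the YAML config
print_arguments(cfg)
Pipeline(FLAGS, cfg).run_multithreads()
```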
+
+import os
+import yaml
+import glob
+
+import cv2
+import numpy as np
+import math
+import random  # used by AutoPadding when random_pad is enabled
+import paddle
+import sys
+try:
+    from collections.abc import Sequence
+except Exception:
+    from collections import Sequence
+
+# add deploy path of PaddleDetection to sys.path
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
+sys.path.insert(0, parent_path)
+
+from paddle.inference import Config, create_predictor
+from python.utils import argsparser, Timer, get_current_memory_mb
+from python.benchmark_utils import PaddleInferBenchmark
+from python.infer import Detector, print_arguments
+from attr_infer import AttrDetector
+
+
+class SkeletonActionRecognizer(Detector):
+    """
+    Args:
+        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
+        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
+        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
+        batch_size (int): size of per batch in inference
+        trt_min_shape (int): min shape for dynamic shape in trt
+        trt_max_shape (int): max shape for dynamic shape in trt
+        trt_opt_shape (int): opt shape for dynamic shape in trt
+        trt_calib_mode (bool): If the model is produced by TRT offline quantitative
+            calibration, trt_calib_mode needs to be set to True
+        cpu_threads (int): cpu threads
+        enable_mkldnn (bool): whether to open MKLDNN
+        threshold (float): The threshold of score for visualization
+        window_size (int): Temporal size of skeleton feature.
+        random_pad (bool): Whether to do random padding when frame length < window_size.
+    """
+
+    def __init__(self,
+                 model_dir,
+                 device='CPU',
+                 run_mode='paddle',
+                 batch_size=1,
+                 trt_min_shape=1,
+                 trt_max_shape=1280,
+                 trt_opt_shape=640,
+                 trt_calib_mode=False,
+                 cpu_threads=1,
+                 enable_mkldnn=False,
+                 output_dir='output',
+                 threshold=0.5,
+                 window_size=100,
+                 random_pad=False):
+        assert batch_size == 1, "SkeletonActionRecognizer only supports batch_size=1 now."
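+        # The recognizer therefore consumes one (C, T, K, M) skeleton sequence
+        # per call; predict_skeleton() below loops over sequences one at a time.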
+ super(SkeletonActionRecognizer, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, + delete_shuffle_pass=True) + + @classmethod + def init_with_cfg(cls, args, cfg): + return cls(model_dir=cfg['model_dir'], + batch_size=cfg['batch_size'], + window_size=cfg['max_frames'], + device=args.device, + run_mode=args.run_mode, + trt_min_shape=args.trt_min_shape, + trt_max_shape=args.trt_max_shape, + trt_opt_shape=args.trt_opt_shape, + trt_calib_mode=args.trt_calib_mode, + cpu_threads=args.cpu_threads, + enable_mkldnn=args.enable_mkldnn) + + def predict(self, repeats=1): + ''' + Args: + repeats (int): repeat number for prediction + Returns: + results (dict): + ''' + # model prediction + output_names = self.predictor.get_output_names() + for i in range(repeats): + self.predictor.run() + output_tensor = self.predictor.get_output_handle(output_names[0]) + np_output = output_tensor.copy_to_cpu() + result = dict(output=np_output) + return result + + def predict_skeleton(self, skeleton_list, run_benchmark=False, repeats=1): + results = [] + for i, skeleton in enumerate(skeleton_list): + if run_benchmark: + # preprocess + inputs = self.preprocess(skeleton) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(skeleton) + self.det_times.preprocess_time_s.end() + + # model prediction + result = self.predict(repeats=repeats) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(skeleton) + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + else: + # preprocess + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(skeleton) + self.det_times.preprocess_time_s.end() + + # model prediction + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + # postprocess + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(skeleton) + + results.append(result) + return results + + def predict_skeleton_with_mot(self, skeleton_with_mot, run_benchmark=False): + """ + skeleton_with_mot (dict): includes individual skeleton sequences, which shape is [C, T, K, 1] + and its corresponding track id. 
+ """ + + skeleton_list = skeleton_with_mot["skeleton"] + mot_id = skeleton_with_mot["mot_id"] + act_res = self.predict_skeleton(skeleton_list, run_benchmark, repeats=1) + results = list(zip(mot_id, act_res)) + return results + + def preprocess(self, data): + preprocess_ops = [] + for op_info in self.pred_config.preprocess_infos: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + preprocess_ops.append(eval(op_type)(**new_op_info)) + + input_lst = [] + data = action_preprocess(data, preprocess_ops) + input_lst.append(data) + input_names = self.predictor.get_input_names() + inputs = {} + inputs['data_batch_0'] = np.stack(input_lst, axis=0).astype('float32') + + for i in range(len(input_names)): + input_tensor = self.predictor.get_input_handle(input_names[i]) + input_tensor.copy_from_cpu(inputs[input_names[i]]) + + return inputs + + def postprocess(self, inputs, result): + # postprocess output of predictor + output_logit = result['output'][0] + classes = np.argpartition(output_logit, -1)[-1:] + classes = classes[np.argsort(-output_logit[classes])] + scores = output_logit[classes] + result = {'class': classes, 'score': scores} + return result + + +def action_preprocess(input, preprocess_ops): + """ + input (str | numpy.array): if input is str, it should be a legal file path with numpy array saved. + Otherwise it should be numpy.array as direct input. + return (numpy.array) + """ + if isinstance(input, str): + assert os.path.isfile(input) is not None, "{0} not exists".format(input) + data = np.load(input) + else: + data = input + for operator in preprocess_ops: + data = operator(data) + return data + + +class AutoPadding(object): + """ + Sample or Padding frame skeleton feature. + Args: + window_size (int): Temporal size of skeleton feature. + random_pad (bool): Whether do random padding when frame length < window size. Default: False. 
+ """ + + def __init__(self, window_size=100, random_pad=False): + self.window_size = window_size + self.random_pad = random_pad + + def get_frame_num(self, data): + C, T, V, M = data.shape + for i in range(T - 1, -1, -1): + tmp = np.sum(data[:, i, :, :]) + if tmp > 0: + T = i + 1 + break + return T + + def __call__(self, results): + data = results + + C, T, V, M = data.shape + T = self.get_frame_num(data) + if T == self.window_size: + data_pad = data[:, :self.window_size, :, :] + elif T < self.window_size: + begin = random.randint( + 0, self.window_size - T) if self.random_pad else 0 + data_pad = np.zeros((C, self.window_size, V, M)) + data_pad[:, begin:begin + T, :, :] = data[:, :T, :, :] + else: + if self.random_pad: + index = np.random.choice( + T, self.window_size, replace=False).astype('int64') + else: + index = np.linspace(0, T, self.window_size).astype("int64") + data_pad = data[:, index, :, :] + + return data_pad + + +def get_test_skeletons(input_file): + assert input_file is not None, "--action_file can not be None" + input_data = np.load(input_file) + if input_data.ndim == 4: + return [input_data] + elif input_data.ndim == 5: + output = list( + map(lambda x: np.squeeze(x, 0), + np.split(input_data, input_data.shape[0], 0))) + return output + else: + raise ValueError( + "Now only support input with shape: (N, C, T, K, M) or (C, T, K, M)") + + +class DetActionRecognizer(object): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + threshold (float): The threshold of score for action feature object detection. + display_frames (int): The duration for corresponding detected action. + skip_frame_num (int): The number of frames for interval prediction. A skipped frame will + reuse the result of its last frame. If it is set to 0, no frame will be skipped. Default + is 0. 
+ + """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + display_frames=20, + skip_frame_num=0): + super(DetActionRecognizer, self).__init__() + self.detector = Detector( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold) + self.threshold = threshold + self.frame_life = display_frames + self.result_history = {} + self.skip_frame_num = skip_frame_num + self.skip_frame_cnt = 0 + self.id_in_last_frame = [] + + @classmethod + def init_with_cfg(cls, args, cfg): + return cls(model_dir=cfg['model_dir'], + batch_size=cfg['batch_size'], + threshold=cfg['threshold'], + display_frames=cfg['display_frames'], + skip_frame_num=cfg['skip_frame_num'], + device=args.device, + run_mode=args.run_mode, + trt_min_shape=args.trt_min_shape, + trt_max_shape=args.trt_max_shape, + trt_opt_shape=args.trt_opt_shape, + trt_calib_mode=args.trt_calib_mode, + cpu_threads=args.cpu_threads, + enable_mkldnn=args.enable_mkldnn) + + def predict(self, images, mot_result): + if self.skip_frame_cnt == 0 or (not self.check_id_is_same(mot_result)): + det_result = self.detector.predict_image(images, visual=False) + result = self.postprocess(det_result, mot_result) + else: + result = self.reuse_result(mot_result) + + self.skip_frame_cnt += 1 + if self.skip_frame_cnt >= self.skip_frame_num: + self.skip_frame_cnt = 0 + + return result + + def postprocess(self, det_result, mot_result): + np_boxes_num = det_result['boxes_num'] + if np_boxes_num[0] <= 0: + return [[], []] + + mot_bboxes = mot_result.get('boxes') + + cur_box_idx = 0 + mot_id = [] + act_res = [] + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + + # Current now, class 0 is positive, class 1 is negative. 
+ action_ret = {'class': 1.0, 'score': -1.0} + box_num = np_boxes_num[idx] + boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num] + cur_box_idx += box_num + isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0) + valid_boxes = boxes[isvalid, :] + + if valid_boxes.shape[0] >= 1: + action_ret['class'] = valid_boxes[0, 0] + action_ret['score'] = valid_boxes[0, 1] + self.result_history[ + tracker_id] = [0, self.frame_life, valid_boxes[0, 1]] + else: + history_det, life_remain, history_score = self.result_history.get( + tracker_id, [1, self.frame_life, -1.0]) + action_ret['class'] = history_det + action_ret['score'] = -1.0 + life_remain -= 1 + if life_remain <= 0 and tracker_id in self.result_history: + del (self.result_history[tracker_id]) + elif tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + else: + self.result_history[tracker_id] = [ + history_det, life_remain, history_score + ] + + mot_id.append(tracker_id) + act_res.append(action_ret) + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + def check_id_is_same(self, mot_result): + mot_bboxes = mot_result.get('boxes') + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + if tracker_id not in self.id_in_last_frame: + return False + return True + + def reuse_result(self, mot_result): + # This function reusing previous results of the same ID directly. + mot_bboxes = mot_result.get('boxes') + + mot_id = [] + act_res = [] + + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, 0, -1.0]) + + life_remain -= 1 + if tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + + action_ret = {'class': history_cls, 'score': history_score} + mot_id.append(tracker_id) + act_res.append(action_ret) + + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + +class ClsActionRecognizer(AttrDetector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + threshold (float): The threshold of score for action feature object detection. + display_frames (int): The duration for corresponding detected action. + skip_frame_num (int): The number of frames for interval prediction. A skipped frame will + reuse the result of its last frame. If it is set to 0, no frame will be skipped. Default + is 0. 
+ """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + display_frames=80, + skip_frame_num=0): + super(ClsActionRecognizer, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold) + self.threshold = threshold + self.frame_life = display_frames + self.result_history = {} + self.skip_frame_num = skip_frame_num + self.skip_frame_cnt = 0 + self.id_in_last_frame = [] + + @classmethod + def init_with_cfg(cls, args, cfg): + return cls(model_dir=cfg['model_dir'], + batch_size=cfg['batch_size'], + threshold=cfg['threshold'], + display_frames=cfg['display_frames'], + skip_frame_num=cfg['skip_frame_num'], + device=args.device, + run_mode=args.run_mode, + trt_min_shape=args.trt_min_shape, + trt_max_shape=args.trt_max_shape, + trt_opt_shape=args.trt_opt_shape, + trt_calib_mode=args.trt_calib_mode, + cpu_threads=args.cpu_threads, + enable_mkldnn=args.enable_mkldnn) + + def predict_with_mot(self, images, mot_result): + if self.skip_frame_cnt == 0 or (not self.check_id_is_same(mot_result)): + images = self.crop_half_body(images) + cls_result = self.predict_image(images, visual=False)["output"] + result = self.match_action_with_id(cls_result, mot_result) + else: + result = self.reuse_result(mot_result) + + self.skip_frame_cnt += 1 + if self.skip_frame_cnt >= self.skip_frame_num: + self.skip_frame_cnt = 0 + + return result + + def crop_half_body(self, images): + crop_images = [] + for image in images: + h = image.shape[0] + crop_images.append(image[:h // 2 + 1, :, :]) + return crop_images + + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + batch_res = [] + for res in im_results: + action_res = res.tolist() + for cid, score in enumerate(action_res): + action_res[cid] = score + batch_res.append(action_res) + result = {'output': batch_res} + return result + + def match_action_with_id(self, cls_result, mot_result): + mot_bboxes = mot_result.get('boxes') + + mot_id = [] + act_res = [] + + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + + cls_id_res = 1 + cls_score_res = -1.0 + for cls_id in range(len(cls_result[idx])): + score = cls_result[idx][cls_id] + if score > cls_score_res: + cls_id_res = cls_id + cls_score_res = score + + # Current now, class 0 is positive, class 1 is negative. 
+ if cls_id_res == 1 or (cls_id_res == 0 and + cls_score_res < self.threshold): + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, self.frame_life, -1.0]) + cls_id_res = history_cls + cls_score_res = 1 - cls_score_res + life_remain -= 1 + if life_remain <= 0 and tracker_id in self.result_history: + del (self.result_history[tracker_id]) + elif tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + else: + self.result_history[ + tracker_id] = [cls_id_res, life_remain, cls_score_res] + else: + self.result_history[ + tracker_id] = [cls_id_res, self.frame_life, cls_score_res] + + action_ret = {'class': cls_id_res, 'score': cls_score_res} + mot_id.append(tracker_id) + act_res.append(action_ret) + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + def check_id_is_same(self, mot_result): + mot_bboxes = mot_result.get('boxes') + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + if tracker_id not in self.id_in_last_frame: + return False + return True + + def reuse_result(self, mot_result): + # This function reusing previous results of the same ID directly. + mot_bboxes = mot_result.get('boxes') + + mot_id = [] + act_res = [] + + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, 0, -1.0]) + + life_remain -= 1 + if tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + + action_ret = {'class': history_cls, 'score': history_score} + mot_id.append(tracker_id) + act_res.append(action_ret) + + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + +def main(): + detector = SkeletonActionRecognizer( + FLAGS.model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + threshold=FLAGS.threshold, + output_dir=FLAGS.output_dir, + window_size=FLAGS.window_size, + random_pad=FLAGS.random_pad) + # predict from numpy array + input_list = get_test_skeletons(FLAGS.action_file) + detector.predict_skeleton(input_list, FLAGS.run_benchmark, repeats=10) + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mems = { + 'cpu_rss_mb': detector.cpu_mem / len(input_list), + 'gpu_rss_mb': detector.gpu_mem / len(input_list), + 'gpu_util': detector.gpu_util * 100 / len(input_list) + } + + perf_info = detector.det_times.report(average=True) + model_dir = FLAGS.model_dir + mode = FLAGS.run_mode + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + data_info = { + 'batch_size': FLAGS.batch_size, + 'shape': "dynamic_shape", + 'data_num': perf_info['img_num'] + } + det_log = PaddleInferBenchmark(detector.config, model_info, data_info, + perf_info, mems) + det_log('SkeletonAction') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device" + + main() diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pphuman/action_utils.py 
b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/action_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..483116584e1e5e52aced38dd10ff170014a1b439 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/action_utils.py @@ -0,0 +1,114 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +class KeyPointSequence(object): + def __init__(self, max_size=100): + self.frames = 0 + self.kpts = [] + self.bboxes = [] + self.max_size = max_size + + def save(self, kpt, bbox): + self.kpts.append(kpt) + self.bboxes.append(bbox) + self.frames += 1 + if self.frames == self.max_size: + return True + return False + + +class KeyPointBuff(object): + def __init__(self, max_size=100): + self.flag_track_interrupt = False + self.keypoint_saver = dict() + self.max_size = max_size + self.id_to_pop = set() + self.flag_to_pop = False + + def get_state(self): + return self.flag_to_pop + + def update(self, kpt_res, mot_res): + kpts = kpt_res.get('keypoint')[0] + bboxes = kpt_res.get('bbox') + mot_bboxes = mot_res.get('boxes') + updated_id = set() + + for idx in range(len(kpts)): + tracker_id = mot_bboxes[idx, 0] + updated_id.add(tracker_id) + + kpt_seq = self.keypoint_saver.get(tracker_id, + KeyPointSequence(self.max_size)) + is_full = kpt_seq.save(kpts[idx], bboxes[idx]) + self.keypoint_saver[tracker_id] = kpt_seq + + #Scene1: result should be popped when frames meet max size + if is_full: + self.id_to_pop.add(tracker_id) + self.flag_to_pop = True + + #Scene2: result of a lost tracker should be popped + interrupted_id = set(self.keypoint_saver.keys()) - updated_id + if len(interrupted_id) > 0: + self.flag_to_pop = True + self.id_to_pop.update(interrupted_id) + + def get_collected_keypoint(self): + """ + Output (List): List of keypoint results for Skeletonbased Recognition task, where + the format of each element is [tracker_id, KeyPointSequence of tracker_id] + """ + output = [] + for tracker_id in self.id_to_pop: + output.append([tracker_id, self.keypoint_saver[tracker_id]]) + del (self.keypoint_saver[tracker_id]) + self.flag_to_pop = False + self.id_to_pop.clear() + return output + + +class ActionVisualHelper(object): + def __init__(self, frame_life=20): + self.frame_life = frame_life + self.action_history = {} + + def get_visualize_ids(self): + id_detected = self.check_detected() + return id_detected + + def check_detected(self): + id_detected = set() + deperate_id = [] + for mot_id in self.action_history: + self.action_history[mot_id]["life_remain"] -= 1 + if int(self.action_history[mot_id]["class"]) == 0: + id_detected.add(mot_id) + if self.action_history[mot_id]["life_remain"] == 0: + deperate_id.append(mot_id) + for mot_id in deperate_id: + del (self.action_history[mot_id]) + return id_detected + + def update(self, action_res_list): + for mot_id, action_res in action_res_list: + if mot_id in self.action_history: + if int(action_res["class"]) != 0 and int(self.action_history[ + mot_id]["class"]) == 0: + 
continue + action_info = self.action_history.get(mot_id, {}) + action_info["class"] = action_res["class"] + action_info["life_remain"] = self.frame_life + self.action_history[mot_id] = action_info diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pphuman/attr_infer.py b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/attr_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..bf9e80bec402d3a64016b7b93edbce40474603d5 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/attr_infer.py @@ -0,0 +1,348 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import yaml +import glob +from functools import reduce + +import cv2 +import numpy as np +import math +import paddle +from paddle.inference import Config +from paddle.inference import create_predictor + +import sys +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..']))) +sys.path.insert(0, parent_path) + +from python.benchmark_utils import PaddleInferBenchmark +from python.preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, WarpAffine +from python.visualize import visualize_attr +from python.utils import argsparser, Timer, get_current_memory_mb +from python.infer import Detector, get_test_images, print_arguments, load_predictor + +from PIL import Image, ImageDraw, ImageFont + + +class AttrDetector(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + output_dir (str): The path of output + threshold (float): The threshold of score for visualization + """ + + def __init__( + self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, ): + super(AttrDetector, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, ) + + @classmethod + def init_with_cfg(cls, args, cfg): + return cls(model_dir=cfg['model_dir'], + batch_size=cfg['batch_size'], + 
device=args.device, + run_mode=args.run_mode, + trt_min_shape=args.trt_min_shape, + trt_max_shape=args.trt_max_shape, + trt_opt_shape=args.trt_opt_shape, + trt_calib_mode=args.trt_calib_mode, + cpu_threads=args.cpu_threads, + enable_mkldnn=args.enable_mkldnn) + + def get_label(self): + return self.pred_config.labels + + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + + labels = self.pred_config.labels + age_list = ['AgeLess18', 'Age18-60', 'AgeOver60'] + direct_list = ['Front', 'Side', 'Back'] + bag_list = ['HandBag', 'ShoulderBag', 'Backpack'] + upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice'] + lower_list = [ + 'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts', + 'Skirt&Dress' + ] + glasses_threshold = 0.3 + hold_threshold = 0.6 + batch_res = [] + for res in im_results: + res = res.tolist() + label_res = [] + # gender + gender = 'Female' if res[22] > self.threshold else 'Male' + label_res.append(gender) + # age + age = age_list[np.argmax(res[19:22])] + label_res.append(age) + # direction + direction = direct_list[np.argmax(res[23:])] + label_res.append(direction) + # glasses + glasses = 'Glasses: ' + if res[1] > glasses_threshold: + glasses += 'True' + else: + glasses += 'False' + label_res.append(glasses) + # hat + hat = 'Hat: ' + if res[0] > self.threshold: + hat += 'True' + else: + hat += 'False' + label_res.append(hat) + # hold obj + hold_obj = 'HoldObjectsInFront: ' + if res[18] > hold_threshold: + hold_obj += 'True' + else: + hold_obj += 'False' + label_res.append(hold_obj) + # bag + bag = bag_list[np.argmax(res[15:18])] + bag_score = res[15 + np.argmax(res[15:18])] + bag_label = bag if bag_score > self.threshold else 'No bag' + label_res.append(bag_label) + # upper + upper_label = 'Upper:' + sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve' + upper_label += ' {}'.format(sleeve) + upper_res = res[4:8] + if np.max(upper_res) > self.threshold: + upper_label += ' {}'.format(upper_list[np.argmax(upper_res)]) + label_res.append(upper_label) + # lower + lower_res = res[8:14] + lower_label = 'Lower: ' + has_lower = False + for i, l in enumerate(lower_res): + if l > self.threshold: + lower_label += ' {}'.format(lower_list[i]) + has_lower = True + if not has_lower: + lower_label += ' {}'.format(lower_list[np.argmax(lower_res)]) + + label_res.append(lower_label) + # shoe + shoe = 'Boots' if res[14] > self.threshold else 'No boots' + label_res.append(shoe) + + batch_res.append(label_res) + result = {'output': batch_res} + return result + + def predict(self, repeats=1): + ''' + Args: + repeats (int): repeats number for prediction + Returns: + result (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, + matix element:[class, score, x_min, y_min, x_max, y_max] + MaskRCNN's result include 'masks': np.ndarray: + shape: [N, im_h, im_w] + ''' + # model prediction + for i in range(repeats): + self.predictor.run() + output_names = self.predictor.get_output_names() + output_tensor = self.predictor.get_output_handle(output_names[0]) + np_output = output_tensor.copy_to_cpu() + result = dict(output=np_output) + return result + + def predict_image(self, + image_list, + run_benchmark=False, + repeats=1, + visual=True): + batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size) + results = [] + for i in range(batch_loop_cnt): + start_index = i * self.batch_size + end_index = min((i + 1) * self.batch_size, len(image_list)) + batch_image_list = image_list[start_index:end_index] 
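+            # In benchmark mode every stage below runs twice: a warmup pass,
+            # then the timed pass recorded in self.det_times.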
+ if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result = self.predict(repeats=repeats) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(batch_image_list) + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + else: + # preprocess + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + # postprocess + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(batch_image_list) + + if visual: + visualize( + batch_image_list, result, output_dir=self.output_dir) + + results.append(result) + if visual: + print('Test iter {}'.format(i)) + + results = self.merge_batch_result(results) + return results + + def merge_batch_result(self, batch_result): + if len(batch_result) == 1: + return batch_result[0] + res_key = batch_result[0].keys() + results = {k: [] for k in res_key} + for res in batch_result: + for k, v in res.items(): + results[k].extend(v) + return results + + +def visualize(image_list, batch_res, output_dir='output'): + + # visualize the predict result + batch_res = batch_res['output'] + for image_file, res in zip(image_list, batch_res): + im = visualize_attr(image_file, [res]) + if not os.path.exists(output_dir): + os.makedirs(output_dir) + img_name = os.path.split(image_file)[-1] + out_path = os.path.join(output_dir, img_name) + cv2.imwrite(out_path, im) + print("save result to: " + out_path) + + +def main(): + detector = AttrDetector( + FLAGS.model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + threshold=FLAGS.threshold, + output_dir=FLAGS.output_dir) + + # predict from image + if FLAGS.image_dir is None and FLAGS.image_file is not None: + assert FLAGS.batch_size == 1, "batch_size should be 1, when image_file is not None" + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mems = { + 'cpu_rss_mb': detector.cpu_mem / len(img_list), + 'gpu_rss_mb': detector.gpu_mem / len(img_list), + 'gpu_util': detector.gpu_util * 100 / len(img_list) + } + + perf_info = detector.det_times.report(average=True) + model_dir = FLAGS.model_dir + mode = FLAGS.run_mode + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + data_info = { + 'batch_size': FLAGS.batch_size, + 'shape': "dynamic_shape", + 'data_num': perf_info['img_num'] + } + det_log = 
PaddleInferBenchmark(detector.config, model_info, data_info, + perf_info, mems) + det_log('Attr') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device" + + main() diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pphuman/mtmct.py b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/mtmct.py new file mode 100644 index 0000000000000000000000000000000000000000..f7ff199f94bdb0b973797aabe8d24d4e603bba99 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/mtmct.py @@ -0,0 +1,381 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from pptracking.python.mot.visualize import plot_tracking +from python.visualize import visualize_attr +import os +import re +import cv2 +import gc +import numpy as np +try: + from sklearn import preprocessing + from sklearn.cluster import AgglomerativeClustering +except: + print( + 'Warning: Unable to use MTMCT in PP-Human, please install sklearn, for example: `pip install sklearn`' + ) + pass +import pandas as pd +from tqdm import tqdm +from functools import reduce +import warnings +warnings.filterwarnings("ignore") + + +def gen_restxt(output_dir_filename, map_tid, cid_tid_dict): + pattern = re.compile(r'c(\d)_t(\d)') + f_w = open(output_dir_filename, 'w') + for key, res in cid_tid_dict.items(): + cid, tid = pattern.search(key).groups() + cid = int(cid) + 1 + rects = res["rects"] + frames = res["frames"] + for idx, bbox in enumerate(rects): + bbox[0][3:] -= bbox[0][1:3] + fid = frames[idx] + 1 + rect = [max(int(x), 0) for x in bbox[0][1:]] + if key in map_tid: + new_tid = map_tid[key] + f_w.write( + str(cid) + ' ' + str(new_tid) + ' ' + str(fid) + ' ' + + ' '.join(map(str, rect)) + '\n') + print('gen_res: write file in {}'.format(output_dir_filename)) + f_w.close() + + +def get_mtmct_matching_results(pred_mtmct_file, secs_interval=0.5, + video_fps=20): + res = np.loadtxt(pred_mtmct_file) # 'cid, tid, fid, x1, y1, w, h, -1, -1' + camera_ids = list(map(int, np.unique(res[:, 0]))) + + res = res[:, :7] + # each line in res: 'cid, tid, fid, x1, y1, w, h' + + camera_tids = [] + camera_results = dict() + for c_id in camera_ids: + camera_results[c_id] = res[res[:, 0] == c_id] + tids = np.unique(camera_results[c_id][:, 1]) + tids = list(map(int, tids)) + camera_tids.append(tids) + + # select common tids throughout each video + common_tids = reduce(np.intersect1d, camera_tids) + + # get mtmct matching results by cid_tid_fid_results[c_id][t_id][f_id] + cid_tid_fid_results = dict() + cid_tid_to_fids = dict() + interval = int(secs_interval * video_fps) # preferably less than 10 + for c_id in camera_ids: + cid_tid_fid_results[c_id] = dict() + cid_tid_to_fids[c_id] = dict() + for t_id in common_tids: + tid_mask = 
camera_results[c_id][:, 1] == t_id + cid_tid_fid_results[c_id][t_id] = dict() + + camera_trackid_results = camera_results[c_id][tid_mask] + fids = np.unique(camera_trackid_results[:, 2]) + fids = fids[fids % interval == 0] + fids = list(map(int, fids)) + cid_tid_to_fids[c_id][t_id] = fids + + for f_id in fids: + st_frame = f_id + ed_frame = f_id + interval + + st_mask = camera_trackid_results[:, 2] >= st_frame + ed_mask = camera_trackid_results[:, 2] < ed_frame + frame_mask = np.logical_and(st_mask, ed_mask) + cid_tid_fid_results[c_id][t_id][f_id] = camera_trackid_results[ + frame_mask] + + return camera_results, cid_tid_fid_results + + +def save_mtmct_vis_results(camera_results, captures, output_dir, + multi_res=None): + # camera_results: 'cid, tid, fid, x1, y1, w, h' + camera_ids = list(camera_results.keys()) + + import shutil + save_dir = os.path.join(output_dir, 'mtmct_vis') + if os.path.exists(save_dir): + shutil.rmtree(save_dir) + os.makedirs(save_dir) + + for idx, video_file in enumerate(captures): + capture = cv2.VideoCapture(video_file) + cid = camera_ids[idx] + basename = os.path.basename(video_file) + video_out_name = "vis_" + basename + out_path = os.path.join(save_dir, video_out_name) + print("Start visualizing output video: {}".format(out_path)) + + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + frame_id = 0 + while (1): + if frame_id % 50 == 0: + print('frame id: ', frame_id) + ret, frame = capture.read() + frame_id += 1 + if not ret: + if frame_id == 1: + print("video read failed!") + break + frame_results = camera_results[cid][camera_results[cid][:, 2] == + frame_id] + boxes = frame_results[:, -4:] + ids = frame_results[:, 1] + image = plot_tracking(frame, boxes, ids, frame_id=frame_id, fps=fps) + + # add attr vis + if multi_res: + tid_list = multi_res.keys() # c0_t1, c0_t2... + all_attr_result = [multi_res[i]["attrs"] + for i in tid_list] # all cid_tid result + if any( + all_attr_result + ): # at least one cid_tid[attrs] is not None will goes to attrs_vis + attr_res = [] + cid_str = 'c' + str(cid - 1) + "_" + for k in tid_list: + if not k.startswith(cid_str): + continue + if (frame_id - 1) >= len(multi_res[k]['attrs']): + t_attr = None + else: + t_attr = multi_res[k]['attrs'][frame_id - 1] + attr_res.append(t_attr) + assert len(attr_res) == len(boxes) + image = visualize_attr( + image, attr_res, boxes, is_mtmct=True) + + writer.write(image) + writer.release() + + +def get_euclidean(x, y, **kwargs): + m = x.shape[0] + n = y.shape[0] + distmat = (np.power(x, 2).sum(axis=1, keepdims=True).repeat( + n, axis=1) + np.power(y, 2).sum(axis=1, keepdims=True).repeat( + m, axis=1).T) + distmat -= np.dot(2 * x, y.T) + return distmat + + +def cosine_similarity(x, y, eps=1e-12): + """ + Computes cosine similarity between two tensors. + Value == 1 means the same vector + Value == 0 means perpendicular vectors + """ + x_n, y_n = np.linalg.norm( + x, axis=1, keepdims=True), np.linalg.norm( + y, axis=1, keepdims=True) + x_norm = x / np.maximum(x_n, eps * np.ones_like(x_n)) + y_norm = y / np.maximum(y_n, eps * np.ones_like(y_n)) + sim_mt = np.dot(x_norm, y_norm.T) + return sim_mt + + +def get_cosine(x, y, eps=1e-12): + """ + Computes cosine distance between two tensors. 
+    Despite its name, the value returned here is the cosine *similarity*
+    matrix; get_sim_matrix_new below turns it into a distance-like cost
+    with ``1. - sim`` before clustering.
+    """
+    sim_mt = cosine_similarity(x, y, eps)
+    return sim_mt
+
+
+def get_dist_mat(x, y, func_name="euclidean"):
+    if func_name == "cosine":
+        dist_mat = get_cosine(x, y)
+    elif func_name == "euclidean":
+        dist_mat = get_euclidean(x, y)
+    print("Using {} as distance function during evaluation".format(func_name))
+    return dist_mat
+
+
+def intracam_ignore(st_mask, cid_tids):
+    count = len(cid_tids)
+    for i in range(count):
+        for j in range(count):
+            if cid_tids[i][1] == cid_tids[j][1]:
+                st_mask[i, j] = 0.
+    return st_mask
+
+
+def get_sim_matrix_new(cid_tid_dict, cid_tids):
+    # Note: camera independent get_sim_matrix function,
+    # which is different from the one in camera_utils.py.
+    count = len(cid_tids)
+
+    q_arr = np.array(
+        [cid_tid_dict[cid_tids[i]]['mean_feat'] for i in range(count)])
+    g_arr = np.array(
+        [cid_tid_dict[cid_tids[i]]['mean_feat'] for i in range(count)])
+    # compute the similarity matrix
+    distmat = get_dist_mat(q_arr, g_arr, func_name="cosine")
+
+    # mask the elements that belong to the same video
+    st_mask = np.ones((count, count), dtype=np.float32)
+    st_mask = intracam_ignore(st_mask, cid_tids)
+
+    sim_matrix = distmat * st_mask
+    np.fill_diagonal(sim_matrix, 0.)
+    return 1. - sim_matrix
+
+
+def get_match(cluster_labels):
+    cluster_dict = dict()
+    cluster = list()
+    for i, l in enumerate(cluster_labels):
+        if l in cluster_dict:
+            cluster_dict[l].append(i)
+        else:
+            cluster_dict[l] = [i]
+    for idx in cluster_dict:
+        cluster.append(cluster_dict[idx])
+    return cluster
+
+
+def get_cid_tid(cluster_labels, cid_tids):
+    cluster = list()
+    for labels in cluster_labels:
+        cid_tid_list = list()
+        for label in labels:
+            cid_tid_list.append(cid_tids[label])
+        cluster.append(cid_tid_list)
+    return cluster
+
+
+def get_labels(cid_tid_dict, cid_tids):
+    # compute cost matrix between features
+    cost_matrix = get_sim_matrix_new(cid_tid_dict, cid_tids)
+
+    # cluster all the features
+    cluster1 = AgglomerativeClustering(
+        n_clusters=None,
+        distance_threshold=0.5,
+        affinity='precomputed',
+        linkage='complete')
+    cluster_labels1 = cluster1.fit_predict(cost_matrix)
+    labels = get_match(cluster_labels1)
+
+    # grouped (cid, tid) pairs per cluster; not used in the return value
+    grouped_cid_tids = get_cid_tid(labels, cid_tids)
+    return labels
+
+
+def sub_cluster(cid_tid_dict):
+    '''
+    cid_tid_dict: all camera_id and track_id
+    '''
+    # get all keys
+    cid_tids = sorted([key for key in cid_tid_dict.keys()])
+
+    # cluster all track ids
+    clu = get_labels(cid_tid_dict, cid_tids)
+
+    # relabel every cluster group
+    new_clu = list()
+    for c_list in clu:
+        new_clu.append([cid_tids[c] for c in c_list])
+    cid_tid_label = dict()
+    for i, c_list in enumerate(new_clu):
+        for c in c_list:
+            cid_tid_label[c] = i + 1
+    return cid_tid_label
+
+
+def distill_idfeat(mot_res):
+    qualities_list = mot_res["qualities"]
+    feature_list = mot_res["features"]
+    rects = mot_res["rects"]
+
+    qualities_new = []
+    feature_new = []
+    # filter out rects smaller than 100 * 20 pixels
+    for idx, rect in enumerate(rects):
+        conf, xmin, ymin, xmax, ymax = rect[0]
+        if (xmax - xmin) * (ymax - ymin) > 2000:
+            qualities_new.append(qualities_list[idx])
+            feature_new.append(feature_list[idx])
+    # take all features if fewer than 2 rects pass the filter
+    if len(qualities_new) < 2:
+        qualities_new = qualities_list
+        feature_new = feature_list
+
+    # if more than 20 frames are available, keep every second frame
+    skipf = 1
+    if len(qualities_new) > 20:
+ skipf = 2 + quality_skip = np.array(qualities_new[::skipf]) + feature_skip = np.array(feature_new[::skipf]) + + #sort features with image qualities, take the most trustworth features + topk_argq = np.argsort(quality_skip)[::-1] + if (quality_skip > 0.6).sum() > 1: + topk_feat = feature_skip[topk_argq[quality_skip > 0.6]] + else: + topk_feat = feature_skip[topk_argq] + + #get final features by mean or cluster, at most take five + mean_feat = np.mean(topk_feat[:5], axis=0) + return mean_feat + + +def res2dict(multi_res): + cid_tid_dict = {} + for cid, c_res in enumerate(multi_res): + for tid, res in c_res.items(): + key = "c" + str(cid) + "_t" + str(tid) + if key not in cid_tid_dict: + if len(res["features"]) == 0: + continue + cid_tid_dict[key] = res + cid_tid_dict[key]['mean_feat'] = distill_idfeat(res) + return cid_tid_dict + + +def mtmct_process(multi_res, captures, mtmct_vis=True, output_dir="output"): + cid_tid_dict = res2dict(multi_res) + if len(cid_tid_dict) == 0: + print("no tracking result found, mtmct will be skiped.") + return + map_tid = sub_cluster(cid_tid_dict) + + if not os.path.exists(output_dir): + os.mkdir(output_dir) + pred_mtmct_file = os.path.join(output_dir, 'mtmct_result.txt') + gen_restxt(pred_mtmct_file, map_tid, cid_tid_dict) + + if mtmct_vis: + camera_results, cid_tid_fid_res = get_mtmct_matching_results( + pred_mtmct_file) + + save_mtmct_vis_results( + camera_results, + captures, + output_dir=output_dir, + multi_res=cid_tid_dict) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pphuman/reid.py b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/reid.py new file mode 100644 index 0000000000000000000000000000000000000000..21b725ce4c7a587652a54176451841ff97d2bd8d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/reid.py @@ -0,0 +1,204 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
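The cross-camera association above reduces to agglomerative clustering over a precomputed `1 - cosine-similarity` cost, as in `get_labels`. A self-contained sketch of that core step with synthetic features (all data illustrative; it uses the older scikit-learn `affinity=` keyword, matching the call above):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# three tracks from two cameras; tracks 0 and 2 share an identity
feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

dist = 1.0 - feats @ feats.T               # 1 - cosine similarity
np.fill_diagonal(dist, 0.0)

labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.5,
    affinity='precomputed', linkage='complete').fit_predict(dist)
print(labels)                              # tracks 0 and 2 get the same label
```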
+ +import os +import sys +import cv2 +import numpy as np +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from python.infer import PredictConfig +from pptracking.python.det_infer import load_predictor +from python.utils import Timer + + +class ReID(object): + """ + ReID of SDE methods + + Args: + pred_config (object): config of model, defined by `Config(model_dir)` + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of per batch in inference, default 50 means at most + 50 sub images can be made a batch and send into ReID model + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=50, + trt_min_shape=1, + trt_max_shape=1088, + trt_opt_shape=608, + trt_calib_mode=False, + cpu_threads=4, + enable_mkldnn=False): + self.pred_config = self.set_config(model_dir) + self.predictor, self.config = load_predictor( + model_dir, + run_mode=run_mode, + batch_size=batch_size, + min_subgraph_size=self.pred_config.min_subgraph_size, + device=device, + use_dynamic_shape=self.pred_config.use_dynamic_shape, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn) + self.det_times = Timer() + self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0 + self.batch_size = batch_size + self.input_wh = (128, 256) + + @classmethod + def init_with_cfg(cls, args, cfg): + return cls(model_dir=cfg['model_dir'], + batch_size=cfg['batch_size'], + device=args.device, + run_mode=args.run_mode, + trt_min_shape=args.trt_min_shape, + trt_max_shape=args.trt_max_shape, + trt_opt_shape=args.trt_opt_shape, + trt_calib_mode=args.trt_calib_mode, + cpu_threads=args.cpu_threads, + enable_mkldnn=args.enable_mkldnn) + + def set_config(self, model_dir): + return PredictConfig(model_dir) + + def check_img_quality(self, crop, bbox, xyxy): + if crop is None: + return None + #eclipse + eclipse_quality = 1.0 + inner_rect = np.zeros(xyxy.shape) + inner_rect[:, :2] = np.maximum(xyxy[:, :2], bbox[None, :2]) + inner_rect[:, 2:] = np.minimum(xyxy[:, 2:], bbox[None, 2:]) + wh_array = inner_rect[:, 2:] - inner_rect[:, :2] + filt = np.logical_and(wh_array[:, 0] > 0, wh_array[:, 1] > 0) + wh_array = wh_array[filt] + if wh_array.shape[0] > 1: + eclipse_ratio = wh_array / (bbox[2:] - bbox[:2]) + eclipse_area_ratio = eclipse_ratio[:, 0] * eclipse_ratio[:, 1] + ear_lst = eclipse_area_ratio.tolist() + ear_lst.sort(reverse=True) + eclipse_quality = 1.0 - ear_lst[1] + bbox_wh = (bbox[2:] - bbox[:2]) + height_quality = bbox_wh[1] / (bbox_wh[0] * 2) + eclipse_quality = min(eclipse_quality, height_quality) + + #definition + cropgray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY) + definition = int(cv2.Laplacian(cropgray, cv2.CV_64F, ksize=3).var()) + brightness = int(cropgray.mean()) + bd_quality = min(1., brightness / 50.) 
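+        # The final score below blends the occlusion ("eclipse") term with the
+        # brightness term using a fixed heuristic weight. Note that
+        # `definition` (Laplacian variance) is computed above but not used.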
+ + eclipse_weight = 0.7 + return eclipse_quality * eclipse_weight + bd_quality * (1 - + eclipse_weight) + + def normal_crop(self, image, rect): + imgh, imgw, c = image.shape + label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()] + xmin = max(0, xmin) + ymin = max(0, ymin) + xmax = min(imgw, xmax) + ymax = min(imgh, ymax) + if label != 0 or xmax <= xmin or ymax <= ymin: + print("Warning! label missed!!") + return None, None, None + return image[ymin:ymax, xmin:xmax, :] + + def crop_image_with_mot(self, image, mot_res): + res = mot_res['boxes'] + crop_res = [] + img_quality = [] + rects = [] + for box in res: + crop_image = self.normal_crop(image, box[1:]) + quality_item = self.check_img_quality(crop_image, box[3:], + res[:, 3:]) + if crop_image is not None: + crop_res.append(crop_image) + img_quality.append(quality_item) + rects.append(box) + return crop_res, img_quality, rects + + def preprocess(self, + imgs, + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]): + im_batch = [] + for img in imgs: + img = cv2.resize(img, self.input_wh) + img = img.astype('float32') / 255. + img -= np.array(mean) + img /= np.array(std) + im_batch.append(img.transpose((2, 0, 1))) + inputs = {} + inputs['x'] = np.array(im_batch).astype('float32') + return inputs + + def predict(self, crops, repeats=1, add_timer=True, seq_name=''): + # preprocess + if add_timer: + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(crops) + input_names = self.predictor.get_input_names() + for i in range(len(input_names)): + input_tensor = self.predictor.get_input_handle(input_names[i]) + input_tensor.copy_from_cpu(inputs[input_names[i]]) + + if add_timer: + self.det_times.preprocess_time_s.end() + self.det_times.inference_time_s.start() + + # model prediction + for i in range(repeats): + self.predictor.run() + output_names = self.predictor.get_output_names() + feature_tensor = self.predictor.get_output_handle(output_names[0]) + pred_embs = feature_tensor.copy_to_cpu() + if add_timer: + self.det_times.inference_time_s.end(repeats=repeats) + self.det_times.postprocess_time_s.start() + + if add_timer: + self.det_times.postprocess_time_s.end() + self.det_times.img_num += 1 + return pred_embs + + def predict_batch(self, imgs, batch_size=4): + batch_feat = [] + for b in range(0, len(imgs), batch_size): + b_end = min(len(imgs), b + batch_size) + batch_imgs = imgs[b:b_end] + feat = self.predict(batch_imgs) + batch_feat.extend(feat.tolist()) + + return batch_feat diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pphuman/video_action_infer.py b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/video_action_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..6a10355f385a633fb7c63d90398bc32998643b8e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/video_action_infer.py @@ -0,0 +1,314 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
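The `VideoActionRecognizer` defined below postprocesses raw logits with a softmax followed by top-k selection. A standalone sketch of that step with a fake logit vector — the max-subtraction is a numerically safer variant of this module's `softmax`:

```python
import numpy as np

logits = np.array([0.2, 2.5, -1.0])        # fake classifier output
probs = np.exp(logits - logits.max())      # stable softmax
probs /= probs.sum()

top_k = 1
classes = np.argpartition(probs, -top_k)[-top_k:]
classes = classes[np.argsort(-probs[classes])]
print(classes, probs[classes])             # -> [1] and its probability
```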
+ +import os +import yaml +import glob + +import cv2 +import numpy as np +import math +import paddle +import sys +import paddle.nn.functional as F +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from paddle.inference import Config, create_predictor +from python.utils import argsparser, Timer, get_current_memory_mb +from python.benchmark_utils import PaddleInferBenchmark +from python.infer import Detector, print_arguments +from video_action_preprocess import VideoDecoder, Sampler, Scale, CenterCrop, Normalization, Image2Array + + +def softmax(x): + f_x = np.exp(x) / np.sum(np.exp(x)) + return f_x + + +class VideoActionRecognizer(object): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + num_seg=8, + seg_len=1, + short_size=256, + target_size=224, + top_k=1, + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + ir_optim=True): + + self.num_seg = num_seg + self.seg_len = seg_len + self.short_size = short_size + self.target_size = target_size + self.top_k = top_k + + assert batch_size == 1, "VideoActionRecognizer only support batch_size=1 now." 
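+        # predict() expands a single clip along a new batch axis itself, so a
+        # batch size other than 1 would have no effect here.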
+ + self.model_dir = model_dir + self.device = device + self.run_mode = run_mode + self.batch_size = batch_size + self.trt_min_shape = trt_min_shape + self.trt_max_shape = trt_max_shape + self.trt_opt_shape = trt_opt_shape + self.trt_calib_mode = trt_calib_mode + self.cpu_threads = cpu_threads + self.enable_mkldnn = enable_mkldnn + self.ir_optim = ir_optim + + self.recognize_times = Timer() + + model_file_path = glob.glob(os.path.join(model_dir, "*.pdmodel"))[0] + params_file_path = glob.glob(os.path.join(model_dir, "*.pdiparams"))[0] + self.config = Config(model_file_path, params_file_path) + + if device == "GPU" or device == "gpu": + self.config.enable_use_gpu(8000, 0) + else: + self.config.disable_gpu() + if self.enable_mkldnn: + # cache 10 different shapes for mkldnn to avoid memory leak + self.config.set_mkldnn_cache_capacity(10) + self.config.enable_mkldnn() + + self.config.switch_ir_optim(self.ir_optim) # default true + + precision_map = { + 'trt_int8': Config.Precision.Int8, + 'trt_fp32': Config.Precision.Float32, + 'trt_fp16': Config.Precision.Half + } + if run_mode in precision_map.keys(): + self.config.enable_tensorrt_engine( + max_batch_size=8, precision_mode=precision_map[run_mode]) + + self.config.enable_memory_optim() + # use zero copy + self.config.switch_use_feed_fetch_ops(False) + + self.predictor = create_predictor(self.config) + + @classmethod + def init_with_cfg(cls, args, cfg): + return cls(model_dir=cfg['model_dir'], + short_size=cfg['short_size'], + target_size=cfg['target_size'], + batch_size=cfg['batch_size'], + device=args.device, + run_mode=args.run_mode, + trt_min_shape=args.trt_min_shape, + trt_max_shape=args.trt_max_shape, + trt_opt_shape=args.trt_opt_shape, + trt_calib_mode=args.trt_calib_mode, + cpu_threads=args.cpu_threads, + enable_mkldnn=args.enable_mkldnn) + + def preprocess_batch(self, file_list): + batched_inputs = [] + for file in file_list: + inputs = self.preprocess(file) + batched_inputs.append(inputs) + batched_inputs = [ + np.concatenate([item[i] for item in batched_inputs]) + for i in range(len(batched_inputs[0])) + ] + self.input_file = file_list + return batched_inputs + + def get_timer(self): + return self.recognize_times + + def predict(self, input): + ''' + Args: + input (str) or (list): video file path or image data list + Returns: + results (dict): + ''' + + input_names = self.predictor.get_input_names() + input_tensor = self.predictor.get_input_handle(input_names[0]) + + output_names = self.predictor.get_output_names() + output_tensor = self.predictor.get_output_handle(output_names[0]) + + # preprocess + self.recognize_times.preprocess_time_s.start() + if type(input) == str: + inputs = self.preprocess_video(input) + else: + inputs = self.preprocess_frames(input) + self.recognize_times.preprocess_time_s.end() + + inputs = np.expand_dims( + inputs, axis=0).repeat( + self.batch_size, axis=0).copy() + + input_tensor.copy_from_cpu(inputs) + + # model prediction + self.recognize_times.inference_time_s.start() + self.predictor.run() + self.recognize_times.inference_time_s.end() + + output = output_tensor.copy_to_cpu() + + # postprocess + self.recognize_times.postprocess_time_s.start() + classes, scores = self.postprocess(output) + self.recognize_times.postprocess_time_s.end() + + return classes, scores + + def preprocess_frames(self, frame_list): + """ + frame_list: list, frame list + return: list + """ + + results = {} + results['frames_len'] = len(frame_list) + results["imgs"] = frame_list + + img_mean = [0.485, 0.456, 0.406] + img_std = 
[0.229, 0.224, 0.225]
+        ops = [
+            CenterCrop(self.target_size), Image2Array(),
+            Normalization(img_mean, img_std)
+        ]
+        for op in ops:
+            results = op(results)
+
+        res = np.expand_dims(results['imgs'], axis=0).copy()
+        return [res]
+
+    def preprocess_video(self, input_file):
+        """
+        input_file: str, file path
+        return: list
+        """
+        assert os.path.isfile(input_file), "{0} does not exist".format(
+            input_file)
+
+        results = {'filename': input_file}
+        img_mean = [0.485, 0.456, 0.406]
+        img_std = [0.229, 0.224, 0.225]
+        ops = [
+            VideoDecoder(), Sampler(
+                self.num_seg, self.seg_len, valid_mode=True),
+            Scale(self.short_size), CenterCrop(self.target_size),
+            Image2Array(), Normalization(img_mean, img_std)
+        ]
+        for op in ops:
+            results = op(results)
+
+        res = np.expand_dims(results['imgs'], axis=0).copy()
+        return [res]
+
+    def postprocess(self, output):
+        output = output.flatten()  # numpy.ndarray
+        output = softmax(output)
+        classes = np.argpartition(output, -self.top_k)[-self.top_k:]
+        classes = classes[np.argsort(-output[classes])]
+        scores = output[classes]
+        return classes, scores
+
+
+def main():
+    if not FLAGS.run_benchmark:
+        assert FLAGS.batch_size == 1
+        assert FLAGS.use_fp16 is False
+    else:
+        assert FLAGS.use_gpu is True
+
+    recognizer = VideoActionRecognizer(
+        FLAGS.model_dir,
+        short_size=FLAGS.short_size,
+        target_size=FLAGS.target_size,
+        device=FLAGS.device,
+        run_mode=FLAGS.run_mode,
+        batch_size=FLAGS.batch_size,
+        trt_min_shape=FLAGS.trt_min_shape,
+        trt_max_shape=FLAGS.trt_max_shape,
+        trt_opt_shape=FLAGS.trt_opt_shape,
+        trt_calib_mode=FLAGS.trt_calib_mode,
+        cpu_threads=FLAGS.cpu_threads,
+        enable_mkldnn=FLAGS.enable_mkldnn, )
+
+    if not FLAGS.run_benchmark:
+        classes, scores = recognizer.predict(FLAGS.video_file)
+        print("Current video file: {}".format(FLAGS.video_file))
+        print("\ttop-1 class: {0}".format(classes[0]))
+        print("\ttop-1 score: {0}".format(scores[0]))
+    else:
+        cm, gm, gu = get_current_memory_mb()
+        mems = {'cpu_rss_mb': cm, 'gpu_rss_mb': gm, 'gpu_util': gu * 100}
+
+        perf_info = recognizer.recognize_times.report()
+        model_dir = FLAGS.model_dir
+        mode = FLAGS.run_mode
+        model_info = {
+            'model_name': model_dir.strip('/').split('/')[-1],
+            'precision': mode.split('_')[-1]
+        }
+        data_info = {
+            'batch_size': FLAGS.batch_size,
+            'shape': "dynamic_shape",
+            'data_num': perf_info['img_num']
+        }
+        recognize_log = PaddleInferBenchmark(recognizer.config, model_info,
+                                             data_info, perf_info, mems)
+        recognize_log('Fight')
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    print_arguments(FLAGS)
+    FLAGS.device = FLAGS.device.upper()
+    assert FLAGS.device in ['CPU', 'GPU', 'XPU'
+                            ], "device should be CPU, GPU or XPU"
+
+    main()
diff --git a/PaddleDetection-release-2.6/deploy/pipeline/pphuman/video_action_preprocess.py b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/video_action_preprocess.py
new file mode 100644
index 0000000000000000000000000000000000000000..eccec048dbe98e6326f169924ab86e5317aaba41
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pipeline/pphuman/video_action_preprocess.py
@@ -0,0 +1,548 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import random
+
+import cv2
+import numpy as np
+from PIL import Image
+import paddle
+try:
+    from collections.abc import Sequence
+except Exception:
+    from collections import Sequence
+
+
+class Sampler(object):
+    """
+    Sample frame ids.
+    NOTE: frames are read with PIL here, which differs slightly from cv2.
+    Args:
+        num_seg(int): number of segments.
+        seg_len(int): number of sampled frames in each segment.
+        valid_mode(bool): True or False.
+    Returns:
+        frames_idx: indices of the sampled frames.
+    """
+
+    def __init__(self,
+                 num_seg,
+                 seg_len,
+                 frame_interval=None,
+                 valid_mode=True,
+                 dense_sample=False,
+                 linspace_sample=False,
+                 use_pil=True):
+        self.num_seg = num_seg
+        self.seg_len = seg_len
+        self.frame_interval = frame_interval
+        self.valid_mode = valid_mode
+        self.dense_sample = dense_sample
+        self.linspace_sample = linspace_sample
+        self.use_pil = use_pil
+
+    def _get(self, frames_idx, results):
+        data_format = results['format']
+
+        if data_format == "frame":
+            frame_dir = results['frame_dir']
+            imgs = []
+            for idx in frames_idx:
+                img = Image.open(
+                    os.path.join(frame_dir, results['suffix'].format(
+                        idx))).convert('RGB')
+                imgs.append(img)
+
+        elif data_format == "video":
+            if results['backend'] == 'cv2':
+                frames = np.array(results['frames'])
+                imgs = []
+                for idx in frames_idx:
+                    imgbuf = frames[idx]
+                    img = Image.fromarray(imgbuf, mode='RGB')
+                    imgs.append(img)
+            elif results['backend'] == 'decord':
+                container = results['frames']
+                if self.use_pil:
+                    frames_select = container.get_batch(frames_idx)
+                    # decord array -> PIL image
+                    np_frames = frames_select.asnumpy()
+                    imgs = []
+                    for i in range(np_frames.shape[0]):
+                        imgbuf = np_frames[i]
+                        imgs.append(Image.fromarray(imgbuf, mode='RGB'))
+                else:
+                    if frames_idx.ndim != 1:
+                        frames_idx = np.squeeze(frames_idx)
+                    frame_dict = {
+                        idx: container[idx].asnumpy()
+                        for idx in np.unique(frames_idx)
+                    }
+                    imgs = [frame_dict[idx] for idx in frames_idx]
+            elif results['backend'] == 'pyav':
+                imgs = []
+                frames = np.array(results['frames'])
+                for idx in frames_idx:
+                    imgbuf = frames[idx]
+                    imgs.append(imgbuf)
+                imgs = np.stack(imgs)  # thwc
+            else:
+                raise NotImplementedError
+        else:
+            raise NotImplementedError
+        results['imgs'] = imgs  # all image data
+        return results
+
+    def _get_train_clips(self, num_frames):
+        ori_seg_len = self.seg_len * self.frame_interval
+        avg_interval = (num_frames - ori_seg_len + 1) // self.num_seg
+
+        if avg_interval > 0:
+            base_offsets = np.arange(self.num_seg) * avg_interval
+            clip_offsets = base_offsets + np.random.randint(
+                avg_interval, size=self.num_seg)
+        elif num_frames > max(self.num_seg, ori_seg_len):
+            clip_offsets = np.sort(
+                np.random.randint(
+                    num_frames - ori_seg_len + 1, size=self.num_seg))
+        elif avg_interval == 0:
+            ratio = (num_frames - ori_seg_len + 1.0) / self.num_seg
+            clip_offsets = np.around(np.arange(self.num_seg) * ratio)
+        else:
+            # np.int was removed from recent NumPy; use a concrete dtype
+            clip_offsets = np.zeros((self.num_seg, ), dtype=np.int64)
+        return clip_offsets
+
+    def _get_test_clips(self, num_frames):
+        ori_seg_len = self.seg_len * self.frame_interval
+        avg_interval = (num_frames - ori_seg_len + 1) / float(self.num_seg)
+        if num_frames > ori_seg_len - 1:
base_offsets = np.arange(self.num_seg) * avg_interval + clip_offsets = (base_offsets + avg_interval / 2.0).astype(np.int) + else: + clip_offsets = np.zeros((self.num_seg, ), dtype=np.int) + return clip_offsets + + def __call__(self, results): + """ + Args: + frames_len: length of frames. + return: + sampling id. + """ + frames_len = int(results['frames_len']) # total number of frames + + frames_idx = [] + if self.frame_interval is not None: + assert isinstance(self.frame_interval, int) + if not self.valid_mode: + offsets = self._get_train_clips(frames_len) + else: + offsets = self._get_test_clips(frames_len) + + offsets = offsets[:, None] + np.arange(self.seg_len)[ + None, :] * self.frame_interval + offsets = np.concatenate(offsets) + + offsets = offsets.reshape((-1, self.seg_len)) + offsets = np.mod(offsets, frames_len) + offsets = np.concatenate(offsets) + + if results['format'] == 'video': + frames_idx = offsets + elif results['format'] == 'frame': + frames_idx = list(offsets + 1) + else: + raise NotImplementedError + + return self._get(frames_idx, results) + + print("self.frame_interval:", self.frame_interval) + + if self.linspace_sample: # default if False + if 'start_idx' in results and 'end_idx' in results: + offsets = np.linspace(results['start_idx'], results['end_idx'], + self.num_seg) + else: + offsets = np.linspace(0, frames_len - 1, self.num_seg) + offsets = np.clip(offsets, 0, frames_len - 1).astype(np.int64) + if results['format'] == 'video': + frames_idx = list(offsets) + frames_idx = [x % frames_len for x in frames_idx] + elif results['format'] == 'frame': + frames_idx = list(offsets + 1) + else: + raise NotImplementedError + return self._get(frames_idx, results) + + average_dur = int(frames_len / self.num_seg) + + print("results['format']:", results['format']) + + if self.dense_sample: # For ppTSM, default is False + if not self.valid_mode: # train + sample_pos = max(1, 1 + frames_len - 64) + t_stride = 64 // self.num_seg + start_idx = 0 if sample_pos == 1 else np.random.randint( + 0, sample_pos - 1) + offsets = [(idx * t_stride + start_idx) % frames_len + 1 + for idx in range(self.num_seg)] + frames_idx = offsets + else: + sample_pos = max(1, 1 + frames_len - 64) + t_stride = 64 // self.num_seg + start_list = np.linspace(0, sample_pos - 1, num=10, dtype=int) + offsets = [] + for start_idx in start_list.tolist(): + offsets += [(idx * t_stride + start_idx) % frames_len + 1 + for idx in range(self.num_seg)] + frames_idx = offsets + else: + for i in range(self.num_seg): + idx = 0 + if not self.valid_mode: + if average_dur >= self.seg_len: + idx = random.randint(0, average_dur - self.seg_len) + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + else: + if average_dur >= self.seg_len: + idx = (average_dur - 1) // 2 + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + + for jj in range(idx, idx + self.seg_len): + if results['format'] == 'video': + frames_idx.append(int(jj % frames_len)) + elif results['format'] == 'frame': + frames_idx.append(jj + 1) + + elif results['format'] == 'MRI': + frames_idx.append(jj) + else: + raise NotImplementedError + + return self._get(frames_idx, results) + + +class Scale(object): + """ + Scale images. + Args: + short_size(float | int): Short size of an image will be scaled to the short_size. + fixed_ratio(bool): Set whether to zoom according to a fixed ratio. default: True + do_round(bool): Whether to round up when calculating the zoom ratio. 
default: False + backend(str): Choose pillow or cv2 as the graphics processing backend. default: 'pillow' + """ + + def __init__(self, + short_size, + fixed_ratio=True, + keep_ratio=None, + do_round=False, + backend='pillow'): + self.short_size = short_size + assert (fixed_ratio and not keep_ratio) or ( + not fixed_ratio + ), "fixed_ratio and keep_ratio cannot be true at the same time" + self.fixed_ratio = fixed_ratio + self.keep_ratio = keep_ratio + self.do_round = do_round + + assert backend in [ + 'pillow', 'cv2' + ], "Scale's backend must be pillow or cv2, but get {backend}" + + self.backend = backend + + def __call__(self, results): + """ + Performs resize operations. + Args: + imgs (Sequence[PIL.Image]): List where each item is a PIL.Image. + For example, [PIL.Image0, PIL.Image1, PIL.Image2, ...] + return: + resized_imgs: List where each item is a PIL.Image after scaling. + """ + imgs = results['imgs'] + resized_imgs = [] + for i in range(len(imgs)): + img = imgs[i] + if isinstance(img, np.ndarray): + h, w, _ = img.shape + elif isinstance(img, Image.Image): + w, h = img.size + else: + raise NotImplementedError + + if w <= h: + ow = self.short_size + if self.fixed_ratio: # default is True + oh = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + oh = self.short_size + else: + scale_factor = self.short_size / w + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * + self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * + self.short_size / h) + else: + oh = self.short_size + if self.fixed_ratio: + ow = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + ow = self.short_size + else: + scale_factor = self.short_size / h + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * + self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * + self.short_size / h) + + if type(img) == np.ndarray: + img = Image.fromarray(img, mode='RGB') + + if self.backend == 'pillow': + resized_imgs.append(img.resize((ow, oh), Image.BILINEAR)) + elif self.backend == 'cv2' and (self.keep_ratio is not None): + resized_imgs.append( + cv2.resize( + img, (ow, oh), interpolation=cv2.INTER_LINEAR)) + else: + resized_imgs.append( + Image.fromarray( + cv2.resize( + np.asarray(img), (ow, oh), + interpolation=cv2.INTER_LINEAR))) + results['imgs'] = resized_imgs + return results + + +class CenterCrop(object): + """ + Center crop images + Args: + target_size(int): Center crop a square with the target_size from an image. + do_round(bool): Whether to round up the coordinates of the upper left corner of the cropping area. default: True + """ + + def __init__(self, target_size, do_round=True, backend='pillow'): + self.target_size = target_size + self.do_round = do_round + self.backend = backend + + def __call__(self, results): + """ + Performs Center crop operations. + Args: + imgs: List where each item is a PIL.Image. + For example, [PIL.Image0, PIL.Image1, PIL.Image2, ...] + return: + ccrop_imgs: List where each item is a PIL.Image after Center crop. 
+ """ + imgs = results['imgs'] + ccrop_imgs = [] + th, tw = self.target_size, self.target_size + if isinstance(imgs, paddle.Tensor): + h, w = imgs.shape[-2:] + x1 = int(round((w - tw) / 2.0)) if self.do_round else (w - tw) // 2 + y1 = int(round((h - th) / 2.0)) if self.do_round else (h - th) // 2 + ccrop_imgs = imgs[:, :, y1:y1 + th, x1:x1 + tw] + else: + for img in imgs: + if self.backend == 'pillow': + w, h = img.size + elif self.backend == 'cv2': + h, w, _ = img.shape + else: + raise NotImplementedError + assert (w >= self.target_size) and (h >= self.target_size), \ + "image width({}) and height({}) should be larger than crop size".format( + w, h, self.target_size) + x1 = int(round((w - tw) / 2.0)) if self.do_round else ( + w - tw) // 2 + y1 = int(round((h - th) / 2.0)) if self.do_round else ( + h - th) // 2 + if self.backend == 'cv2': + ccrop_imgs.append(img[y1:y1 + th, x1:x1 + tw]) + elif self.backend == 'pillow': + ccrop_imgs.append(img.crop((x1, y1, x1 + tw, y1 + th))) + results['imgs'] = ccrop_imgs + return results + + +class Image2Array(object): + """ + transfer PIL.Image to Numpy array and transpose dimensions from 'dhwc' to 'dchw'. + Args: + transpose: whether to transpose or not, default True, False for slowfast. + """ + + def __init__(self, transpose=True, data_format='tchw'): + assert data_format in [ + 'tchw', 'cthw' + ], "Target format must in ['tchw', 'cthw'], but got {data_format}" + self.transpose = transpose + self.data_format = data_format + + def __call__(self, results): + """ + Performs Image to NumpyArray operations. + Args: + imgs: List where each item is a PIL.Image. + For example, [PIL.Image0, PIL.Image1, PIL.Image2, ...] + return: + np_imgs: Numpy array. + """ + imgs = results['imgs'] + if 'backend' in results and results[ + 'backend'] == 'pyav': # [T,H,W,C] in [0, 1] + if self.transpose: + if self.data_format == 'tchw': + t_imgs = imgs.transpose((0, 3, 1, 2)) # tchw + else: + t_imgs = imgs.transpose((3, 0, 1, 2)) # cthw + results['imgs'] = t_imgs + else: + t_imgs = np.stack(imgs).astype('float32') + if self.transpose: + if self.data_format == 'tchw': + t_imgs = t_imgs.transpose(0, 3, 1, 2) # tchw + else: + t_imgs = t_imgs.transpose(3, 0, 1, 2) # cthw + results['imgs'] = t_imgs + return results + + +class VideoDecoder(object): + """ + Decode mp4 file to frames. + Args: + filepath: the file path of mp4 file + """ + + def __init__(self, + backend='cv2', + mode='train', + sampling_rate=32, + num_seg=8, + num_clips=1, + target_fps=30): + + self.backend = backend + # params below only for TimeSformer + self.mode = mode + self.sampling_rate = sampling_rate + self.num_seg = num_seg + self.num_clips = num_clips + self.target_fps = target_fps + + def __call__(self, results): + """ + Perform mp4 decode operations. + return: + List where each item is a numpy array after decoder. 
+ """ + file_path = results['filename'] + results['format'] = 'video' + results['backend'] = self.backend + + if self.backend == 'cv2': # here + cap = cv2.VideoCapture(file_path) + videolen = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) + + sampledFrames = [] + for i in range(videolen): + ret, frame = cap.read() + # maybe first frame is empty + if ret == False: + continue + img = frame[:, :, ::-1] + sampledFrames.append(img) + results['frames'] = sampledFrames + results['frames_len'] = len(sampledFrames) + + elif self.backend == 'decord': + container = de.VideoReader(file_path) + frames_len = len(container) + results['frames'] = container + results['frames_len'] = frames_len + else: + raise NotImplementedError + return results + + +class Normalization(object): + """ + Normalization. + Args: + mean(Sequence[float]): mean values of different channels. + std(Sequence[float]): std values of different channels. + tensor_shape(list): size of mean, default [3,1,1]. For slowfast, [1,1,1,3] + """ + + def __init__(self, mean, std, tensor_shape=[3, 1, 1], inplace=False): + if not isinstance(mean, Sequence): + raise TypeError( + 'Mean must be list, tuple or np.ndarray, but got {type(mean)}') + if not isinstance(std, Sequence): + raise TypeError( + 'Std must be list, tuple or np.ndarray, but got {type(std)}') + + self.inplace = inplace + if not inplace: + self.mean = np.array(mean).reshape(tensor_shape).astype(np.float32) + self.std = np.array(std).reshape(tensor_shape).astype(np.float32) + else: + self.mean = np.array(mean, dtype=np.float32) + self.std = np.array(std, dtype=np.float32) + + def __call__(self, results): + """ + Performs normalization operations. + Args: + imgs: Numpy array. + return: + np_imgs: Numpy array after normalization. + """ + + if self.inplace: # default is False + n = len(results['imgs']) + h, w, c = results['imgs'][0].shape + norm_imgs = np.empty((n, h, w, c), dtype=np.float32) + for i, img in enumerate(results['imgs']): + norm_imgs[i] = img + + for img in norm_imgs: # [n,h,w,c] + mean = np.float64(self.mean.reshape(1, -1)) # [1, 3] + stdinv = 1 / np.float64(self.std.reshape(1, -1)) # [1, 3] + cv2.subtract(img, mean, img) + cv2.multiply(img, stdinv, img) + else: + imgs = results['imgs'] + norm_imgs = imgs / 255.0 + norm_imgs -= self.mean + norm_imgs /= self.std + if 'backend' in results and results['backend'] == 'pyav': + norm_imgs = paddle.to_tensor(norm_imgs, dtype=paddle.float32) + results['imgs'] = norm_imgs + return results diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/lane_seg_infer.py b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/lane_seg_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..4a0be96a5c72cdc0c0e11ae5b30fc0efe9a81374 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/lane_seg_infer.py @@ -0,0 +1,231 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import argparse +import codecs +import os + +import yaml +import numpy as np +import cv2 +from sklearn.cluster import DBSCAN +from pptracking.python.det_infer import load_predictor + + +class LaneSegPredictor: + def __init__(self, lane_seg_config, model_dir): + """ + Prepare for prediction. + The usage and docs of paddle inference, please refer to + https://paddleinference.paddlepaddle.org.cn/product_introduction/summary.html + """ + if not os.path.exists(lane_seg_config): + raise ValueError("Cannot find : {},".format(lane_seg_config)) + + args = yaml.safe_load(open(lane_seg_config)) + self.model_dir = model_dir + self.args = args[args['type']] + + self.shape = None + self.filter_horizontal_flag = self.args['filter_horizontal_flag'] + self.horizontal_filtration_degree = self.args[ + 'horizontal_filtration_degree'] + self.horizontal_filtering_threshold = self.args[ + 'horizontal_filtering_threshold'] + + try: + self.predictor, _ = load_predictor( + model_dir=self.model_dir, + run_mode=self.args['run_mode'], + batch_size=self.args['batch_size'], + device=self.args['device'], + min_subgraph_size=self.args['min_subgraph_size'], + use_dynamic_shape=self.args['use_dynamic_shape'], + trt_min_shape=self.args['trt_min_shape'], + trt_max_shape=self.args['trt_max_shape'], + trt_opt_shape=self.args['trt_opt_shape'], + trt_calib_mode=self.args['trt_calib_mode'], + cpu_threads=self.args['cpu_threads'], + enable_mkldnn=self.args['enable_mkldnn']) + except Exception as e: + print(str(e)) + exit() + + def run(self, img): + + input_names = self.predictor.get_input_names() + input_handle = self.predictor.get_input_handle(input_names[0]) + output_names = self.predictor.get_output_names() + output_handle = self.predictor.get_output_handle(output_names[0]) + + img = np.array(img) + self.shape = img.shape[1:3] + img = self.normalize(img) + img = np.transpose(img, (0, 3, 1, 2)) + input_handle.reshape(img.shape) + input_handle.copy_from_cpu(img) + + self.predictor.run() + + results = output_handle.copy_to_cpu() + results = self.postprocess(results) + + return self.get_line(results) + + def normalize(self, im, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)): + mean = np.array(mean)[np.newaxis, np.newaxis, :] + std = np.array(std)[np.newaxis, np.newaxis, :] + im = im.astype(np.float32, copy=False) / 255.0 + im -= mean + im /= std + return im + + def postprocess(self, pred): + + pred = np.argmax(pred, axis=1) + pred[pred == 3] = 0 + pred[pred > 0] = 255 + + return pred + + def get_line(self, results): + lines = [] + directions = [] + for i in range(results.shape[0]): + line, direction = self.hough_line(np.uint8(results[i])) + lines.append(line) + directions.append(direction) + return lines, directions + + def get_distance(self, array_1, array_2): + lon_a = array_1[0] + lat_a = array_1[1] + lon_b = array_2[0] + lat_b = array_2[1] + + s = pow(pow((lat_b - lat_a), 2) + pow((lon_b - lon_a), 2), 0.5) + return s + + def get_angle(self, array): + import math + x1, y1, x2, y2 = array + a_x = x2 - x1 + a_y = y2 - y1 + angle1 = math.atan2(a_y, a_x) + angle1 = int(angle1 * 180 / math.pi) + if angle1 > 90: + angle1 = 180 - angle1 + return angle1 + + def get_proportion(self, lines): + + proportion = 0.0 + h, w = self.shape + for line in lines: + x1, y1, x2, y2 = line + length = abs(y2 - y1) / h + abs(x2 - x1) / w + proportion = proportion + length + + return proportion + + def line_cluster(self, linesP): + + points = [] + for i in range(0, len(linesP)): + l = linesP[i] + x_center = (float( + (max(l[2], l[0]) - min(l[2], l[0]))) / 
2.0 + min(l[2], l[0])) + y_center = (float( + (max(l[3], l[1]) - min(l[3], l[1]))) / 2.0 + min(l[3], l[1])) + points.append([x_center, y_center]) + + dbscan = DBSCAN( + eps=50, min_samples=2, metric=self.get_distance).fit(points) + + labels = dbscan.labels_ + n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0) + cluster_list = list([] for i in range(n_clusters_)) + if linesP is not None: + for i in range(0, len(linesP)): + if labels[i] == -1: + continue + l = linesP[i] + x1, y1, x2, y2 = l + if y2 >= y1: + cluster_list[labels[i]].append([x1, y1, x2, y2]) + else: + ll = [x2, y2, x1, y1] + cluster_list[labels[i]].append(ll) + + return cluster_list + + def hough_line(self, + binary_img, + min_line=50, + min_line_points=50, + max_line_gap=10): + linesP = cv2.HoughLinesP(binary_img, 1, np.pi / 180, min_line, None, + min_line_points, max_line_gap) + if linesP is None: + return [], None + + coarse_cluster_list = self.line_cluster(linesP[:, 0]) + filter_lines_output, direction = self.filter_lines(coarse_cluster_list) + + return filter_lines_output, direction + + def filter_lines(self, coarse_cluster_list): + + lines = [] + angles = [] + for i in range(len(coarse_cluster_list)): + if len(coarse_cluster_list[i]) == 0: + continue + coarse_cluster_list[i] = np.array(coarse_cluster_list[i]) + distance = abs(coarse_cluster_list[i][:, 3] - coarse_cluster_list[i] + [:, 1]) + abs(coarse_cluster_list[i][:, 2] - + coarse_cluster_list[i][:, 0]) + l = coarse_cluster_list[i][np.argmax(distance)] + angles.append(self.get_angle(l)) + lines.append(l) + + if len(lines) == 0: + return [], None + if not self.filter_horizontal_flag: + return lines, None + + #filter horizontal roads + angles = np.array(angles) + + max_angle, min_angle = np.max(angles), np.min(angles) + + if (max_angle - min_angle) < self.horizontal_filtration_degree: + return lines, np.mean(angles) + + thr_angle = ( + max_angle + min_angle) * self.horizontal_filtering_threshold + lines = np.array(lines) + + min_angle_line = lines[np.where(angles < thr_angle)] + max_angle_line = lines[np.where(angles >= thr_angle)] + + max_angle_line_pro = self.get_proportion(max_angle_line) + min_angle_line_pro = self.get_proportion(min_angle_line) + + if max_angle_line_pro >= min_angle_line_pro: + angle_list = angles[np.where(angles >= thr_angle)] + return max_angle_line, np.mean(angle_list) + else: + angle_list = angles[np.where(angles < thr_angle)] + return min_angle_line, np.mean(angle_list) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/rec_word_dict.txt b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/rec_word_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..84b885d8352226e49b1d5d791b8f43a663e246aa --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/rec_word_dict.txt @@ -0,0 +1,6623 @@ +' +疗 +绚 +诚 +娇 +溜 +题 +贿 +者 +廖 +更 +纳 +加 +奉 +公 +一 +就 +汴 +计 +与 +路 +房 +原 +妇 +2 +0 +8 +- +7 +其 +> +: +] +, +, +骑 +刈 +全 +消 +昏 +傈 +安 +久 +钟 +嗅 +不 +影 +处 +驽 +蜿 +资 +关 +椤 +地 +瘸 +专 +问 +忖 +票 +嫉 +炎 +韵 +要 +月 +田 +节 +陂 +鄙 +捌 +备 +拳 +伺 +眼 +网 +盎 +大 +傍 +心 +东 +愉 +汇 +蹿 +科 +每 +业 +里 +航 +晏 +字 +平 +录 +先 +1 +3 +彤 +鲶 +产 +稍 +督 +腴 +有 +象 +岳 +注 +绍 +在 +泺 +文 +定 +核 +名 +水 +过 +理 +让 +偷 +率 +等 +这 +发 +” +为 +含 +肥 +酉 +相 +鄱 +七 +编 +猥 +锛 +日 +镀 +蒂 +掰 +倒 +辆 +栾 +栗 +综 +涩 +州 +雌 +滑 +馀 +了 +机 +块 +司 +宰 +甙 +兴 +矽 +抚 +保 +用 +沧 +秩 +如 +收 +息 +滥 +页 +疑 +埠 +! +! 
+姥 +异 +橹 +钇 +向 +下 +跄 +的 +椴 +沫 +国 +绥 +獠 +报 +开 +民 +蜇 +何 +分 +凇 +长 +讥 +藏 +掏 +施 +羽 +中 +讲 +派 +嘟 +人 +提 +浼 +间 +世 +而 +古 +多 +倪 +唇 +饯 +控 +庚 +首 +赛 +蜓 +味 +断 +制 +觉 +技 +替 +艰 +溢 +潮 +夕 +钺 +外 +摘 +枋 +动 +双 +单 +啮 +户 +枇 +确 +锦 +曜 +杜 +或 +能 +效 +霜 +盒 +然 +侗 +电 +晁 +放 +步 +鹃 +新 +杖 +蜂 +吒 +濂 +瞬 +评 +总 +隍 +对 +独 +合 +也 +是 +府 +青 +天 +诲 +墙 +组 +滴 +级 +邀 +帘 +示 +已 +时 +骸 +仄 +泅 +和 +遨 +店 +雇 +疫 +持 +巍 +踮 +境 +只 +亨 +目 +鉴 +崤 +闲 +体 +泄 +杂 +作 +般 +轰 +化 +解 +迂 +诿 +蛭 +璀 +腾 +告 +版 +服 +省 +师 +小 +规 +程 +线 +海 +办 +引 +二 +桧 +牌 +砺 +洄 +裴 +修 +图 +痫 +胡 +许 +犊 +事 +郛 +基 +柴 +呼 +食 +研 +奶 +律 +蛋 +因 +葆 +察 +戏 +褒 +戒 +再 +李 +骁 +工 +貂 +油 +鹅 +章 +啄 +休 +场 +给 +睡 +纷 +豆 +器 +捎 +说 +敏 +学 +会 +浒 +设 +诊 +格 +廓 +查 +来 +霓 +室 +溆 +¢ +诡 +寥 +焕 +舜 +柒 +狐 +回 +戟 +砾 +厄 +实 +翩 +尿 +五 +入 +径 +惭 +喹 +股 +宇 +篝 +| +; +美 +期 +云 +九 +祺 +扮 +靠 +锝 +槌 +系 +企 +酰 +阊 +暂 +蚕 +忻 +豁 +本 +羹 +执 +条 +钦 +H +獒 +限 +进 +季 +楦 +于 +芘 +玖 +铋 +茯 +未 +答 +粘 +括 +样 +精 +欠 +矢 +甥 +帷 +嵩 +扣 +令 +仔 +风 +皈 +行 +支 +部 +蓉 +刮 +站 +蜡 +救 +钊 +汗 +松 +嫌 +成 +可 +. +鹤 +院 +从 +交 +政 +怕 +活 +调 +球 +局 +验 +髌 +第 +韫 +谗 +串 +到 +圆 +年 +米 +/ +* +友 +忿 +检 +区 +看 +自 +敢 +刃 +个 +兹 +弄 +流 +留 +同 +没 +齿 +星 +聆 +轼 +湖 +什 +三 +建 +蛔 +儿 +椋 +汕 +震 +颧 +鲤 +跟 +力 +情 +璺 +铨 +陪 +务 +指 +族 +训 +滦 +鄣 +濮 +扒 +商 +箱 +十 +召 +慷 +辗 +所 +莞 +管 +护 +臭 +横 +硒 +嗓 +接 +侦 +六 +露 +党 +馋 +驾 +剖 +高 +侬 +妪 +幂 +猗 +绺 +骐 +央 +酐 +孝 +筝 +课 +徇 +缰 +门 +男 +西 +项 +句 +谙 +瞒 +秃 +篇 +教 +碲 +罚 +声 +呐 +景 +前 +富 +嘴 +鳌 +稀 +免 +朋 +啬 +睐 +去 +赈 +鱼 +住 +肩 +愕 +速 +旁 +波 +厅 +健 +茼 +厥 +鲟 +谅 +投 +攸 +炔 +数 +方 +击 +呋 +谈 +绩 +别 +愫 +僚 +躬 +鹧 +胪 +炳 +招 +喇 +膨 +泵 +蹦 +毛 +结 +5 +4 +谱 +识 +陕 +粽 +婚 +拟 +构 +且 +搜 +任 +潘 +比 +郢 +妨 +醪 +陀 +桔 +碘 +扎 +选 +哈 +骷 +楷 +亿 +明 +缆 +脯 +监 +睫 +逻 +婵 +共 +赴 +淝 +凡 +惦 +及 +达 +揖 +谩 +澹 +减 +焰 +蛹 +番 +祁 +柏 +员 +禄 +怡 +峤 +龙 +白 +叽 +生 +闯 +起 +细 +装 +谕 +竟 +聚 +钙 +上 +导 +渊 +按 +艾 +辘 +挡 +耒 +盹 +饪 +臀 +记 +邮 +蕙 +受 +各 +医 +搂 +普 +滇 +朗 +茸 +带 +翻 +酚 +( +光 +堤 +墟 +蔷 +万 +幻 +〓 +瑙 +辈 +昧 +盏 +亘 +蛀 +吉 +铰 +请 +子 +假 +闻 +税 +井 +诩 +哨 +嫂 +好 +面 +琐 +校 +馊 +鬣 +缂 +营 +访 +炖 +占 +农 +缀 +否 +经 +钚 +棵 +趟 +张 +亟 +吏 +茶 +谨 +捻 +论 +迸 +堂 +玉 +信 +吧 +瞠 +乡 +姬 +寺 +咬 +溏 +苄 +皿 +意 +赉 +宝 +尔 +钰 +艺 +特 +唳 +踉 +都 +荣 +倚 +登 +荐 +丧 +奇 +涵 +批 +炭 +近 +符 +傩 +感 +道 +着 +菊 +虹 +仲 +众 +懈 +濯 +颞 +眺 +南 +释 +北 +缝 +标 +既 +茗 +整 +撼 +迤 +贲 +挎 +耱 +拒 +某 +妍 +卫 +哇 +英 +矶 +藩 +治 +他 +元 +领 +膜 +遮 +穗 +蛾 +飞 +荒 +棺 +劫 +么 +市 +火 +温 +拈 +棚 +洼 +转 +果 +奕 +卸 +迪 +伸 +泳 +斗 +邡 +侄 +涨 +屯 +萋 +胭 +氡 +崮 +枞 +惧 +冒 +彩 +斜 +手 +豚 +随 +旭 +淑 +妞 +形 +菌 +吲 +沱 +争 +驯 +歹 +挟 +兆 +柱 +传 +至 +包 +内 +响 +临 +红 +功 +弩 +衡 +寂 +禁 +老 +棍 +耆 +渍 +织 +害 +氵 +渑 +布 +载 +靥 +嗬 +虽 +苹 +咨 +娄 +库 +雉 +榜 +帜 +嘲 +套 +瑚 +亲 +簸 +欧 +边 +6 +腿 +旮 +抛 +吹 +瞳 +得 +镓 +梗 +厨 +继 +漾 +愣 +憨 +士 +策 +窑 +抑 +躯 +襟 +脏 +参 +贸 +言 +干 +绸 +鳄 +穷 +藜 +音 +折 +详 +) +举 +悍 +甸 +癌 +黎 +谴 +死 +罩 +迁 +寒 +驷 +袖 +媒 +蒋 +掘 +模 +纠 +恣 +观 +祖 +蛆 +碍 +位 +稿 +主 +澧 +跌 +筏 +京 +锏 +帝 +贴 +证 +糠 +才 +黄 +鲸 +略 +炯 +饱 +四 +出 +园 +犀 +牧 +容 +汉 +杆 +浈 +汰 +瑷 +造 +虫 +瘩 +怪 +驴 +济 +应 +花 +沣 +谔 +夙 +旅 +价 +矿 +以 +考 +s +u +呦 +晒 +巡 +茅 +准 +肟 +瓴 +詹 +仟 +褂 +译 +桌 +混 +宁 +怦 +郑 +抿 +些 +余 +鄂 +饴 +攒 +珑 +群 +阖 +岔 +琨 +藓 +预 +环 +洮 +岌 +宀 +杲 +瀵 +最 +常 +囡 +周 +踊 +女 +鼓 +袭 +喉 +简 +范 +薯 +遐 +疏 +粱 +黜 +禧 +法 +箔 +斤 +遥 +汝 +奥 +直 +贞 +撑 +置 +绱 +集 +她 +馅 +逗 +钧 +橱 +魉 +[ +恙 +躁 +唤 +9 +旺 +膘 +待 +脾 +惫 +购 +吗 +依 +盲 +度 +瘿 +蠖 +俾 +之 +镗 +拇 +鲵 +厝 +簧 +续 +款 +展 +啃 +表 +剔 +品 +钻 +腭 +损 +清 +锶 +统 +涌 +寸 +滨 +贪 +链 +吠 +冈 +伎 +迥 +咏 +吁 +览 +防 +迅 +失 +汾 +阔 +逵 +绀 +蔑 +列 +川 +凭 +努 +熨 +揪 +利 +俱 +绉 +抢 +鸨 +我 +即 +责 +膦 +易 +毓 +鹊 +刹 +玷 +岿 +空 +嘞 +绊 +排 +术 +估 +锷 +违 +们 +苟 +铜 +播 +肘 +件 +烫 +审 +鲂 +广 +像 +铌 +惰 +铟 +巳 +胍 +鲍 +康 +憧 +色 +恢 +想 +拷 +尤 +疳 +知 +S +Y +F +D +A +峄 +裕 +帮 +握 +搔 +氐 +氘 +难 +墒 +沮 +雨 +叁 +缥 +悴 +藐 +湫 +娟 +苑 +稠 +颛 +簇 +后 +阕 +闭 +蕤 +缚 +怎 +佞 +码 +嘤 +蔡 +痊 +舱 +螯 +帕 +赫 +昵 +升 +烬 +岫 +、 +疵 +蜻 +髁 +蕨 +隶 +烛 +械 +丑 +盂 +梁 +强 +鲛 +由 +拘 +揉 +劭 +龟 +撤 +钩 +呕 +孛 +费 +妻 +漂 +求 +阑 +崖 +秤 +甘 +通 +深 +补 +赃 +坎 +床 +啪 +承 +吼 +量 +暇 +钼 +烨 +阂 +擎 +脱 +逮 +称 +P +神 +属 +矗 +华 +届 +狍 +葑 +汹 +育 +患 +窒 +蛰 +佼 +静 +槎 +运 +鳗 +庆 +逝 +曼 +疱 +克 +代 +官 +此 +麸 +耧 +蚌 +晟 +例 +础 +榛 +副 +测 +唰 +缢 +迹 +灬 +霁 +身 +岁 +赭 
+扛 +又 +菡 +乜 +雾 +板 +读 +陷 +徉 +贯 +郁 +虑 +变 +钓 +菜 +圾 +现 +琢 +式 +乐 +维 +渔 +浜 +左 +吾 +脑 +钡 +警 +T +啵 +拴 +偌 +漱 +湿 +硕 +止 +骼 +魄 +积 +燥 +联 +踢 +玛 +则 +窿 +见 +振 +畿 +送 +班 +钽 +您 +赵 +刨 +印 +讨 +踝 +籍 +谡 +舌 +崧 +汽 +蔽 +沪 +酥 +绒 +怖 +财 +帖 +肱 +私 +莎 +勋 +羔 +霸 +励 +哼 +帐 +将 +帅 +渠 +纪 +婴 +娩 +岭 +厘 +滕 +吻 +伤 +坝 +冠 +戊 +隆 +瘁 +介 +涧 +物 +黍 +并 +姗 +奢 +蹑 +掣 +垸 +锴 +命 +箍 +捉 +病 +辖 +琰 +眭 +迩 +艘 +绌 +繁 +寅 +若 +毋 +思 +诉 +类 +诈 +燮 +轲 +酮 +狂 +重 +反 +职 +筱 +县 +委 +磕 +绣 +奖 +晋 +濉 +志 +徽 +肠 +呈 +獐 +坻 +口 +片 +碰 +几 +村 +柿 +劳 +料 +获 +亩 +惕 +晕 +厌 +号 +罢 +池 +正 +鏖 +煨 +家 +棕 +复 +尝 +懋 +蜥 +锅 +岛 +扰 +队 +坠 +瘾 +钬 +@ +卧 +疣 +镇 +譬 +冰 +彷 +频 +黯 +据 +垄 +采 +八 +缪 +瘫 +型 +熹 +砰 +楠 +襁 +箐 +但 +嘶 +绳 +啤 +拍 +盥 +穆 +傲 +洗 +盯 +塘 +怔 +筛 +丿 +台 +恒 +喂 +葛 +永 +¥ +烟 +酒 +桦 +书 +砂 +蚝 +缉 +态 +瀚 +袄 +圳 +轻 +蛛 +超 +榧 +遛 +姒 +奘 +铮 +右 +荽 +望 +偻 +卡 +丶 +氰 +附 +做 +革 +索 +戚 +坨 +桷 +唁 +垅 +榻 +岐 +偎 +坛 +莨 +山 +殊 +微 +骇 +陈 +爨 +推 +嗝 +驹 +澡 +藁 +呤 +卤 +嘻 +糅 +逛 +侵 +郓 +酌 +德 +摇 +※ +鬃 +被 +慨 +殡 +羸 +昌 +泡 +戛 +鞋 +河 +宪 +沿 +玲 +鲨 +翅 +哽 +源 +铅 +语 +照 +邯 +址 +荃 +佬 +顺 +鸳 +町 +霭 +睾 +瓢 +夸 +椁 +晓 +酿 +痈 +咔 +侏 +券 +噎 +湍 +签 +嚷 +离 +午 +尚 +社 +锤 +背 +孟 +使 +浪 +缦 +潍 +鞅 +军 +姹 +驶 +笑 +鳟 +鲁 +》 +孽 +钜 +绿 +洱 +礴 +焯 +椰 +颖 +囔 +乌 +孔 +巴 +互 +性 +椽 +哞 +聘 +昨 +早 +暮 +胶 +炀 +隧 +低 +彗 +昝 +铁 +呓 +氽 +藉 +喔 +癖 +瑗 +姨 +权 +胱 +韦 +堑 +蜜 +酋 +楝 +砝 +毁 +靓 +歙 +锲 +究 +屋 +喳 +骨 +辨 +碑 +武 +鸠 +宫 +辜 +烊 +适 +坡 +殃 +培 +佩 +供 +走 +蜈 +迟 +翼 +况 +姣 +凛 +浔 +吃 +飘 +债 +犟 +金 +促 +苛 +崇 +坂 +莳 +畔 +绂 +兵 +蠕 +斋 +根 +砍 +亢 +欢 +恬 +崔 +剁 +餐 +榫 +快 +扶 +‖ +濒 +缠 +鳜 +当 +彭 +驭 +浦 +篮 +昀 +锆 +秸 +钳 +弋 +娣 +瞑 +夷 +龛 +苫 +拱 +致 +% +嵊 +障 +隐 +弑 +初 +娓 +抉 +汩 +累 +蓖 +" +唬 +助 +苓 +昙 +押 +毙 +破 +城 +郧 +逢 +嚏 +獭 +瞻 +溱 +婿 +赊 +跨 +恼 +璧 +萃 +姻 +貉 +灵 +炉 +密 +氛 +陶 +砸 +谬 +衔 +点 +琛 +沛 +枳 +层 +岱 +诺 +脍 +榈 +埂 +征 +冷 +裁 +打 +蹴 +素 +瘘 +逞 +蛐 +聊 +激 +腱 +萘 +踵 +飒 +蓟 +吆 +取 +咙 +簋 +涓 +矩 +曝 +挺 +揣 +座 +你 +史 +舵 +焱 +尘 +苏 +笈 +脚 +溉 +榨 +诵 +樊 +邓 +焊 +义 +庶 +儋 +蟋 +蒲 +赦 +呷 +杞 +诠 +豪 +还 +试 +颓 +茉 +太 +除 +紫 +逃 +痴 +草 +充 +鳕 +珉 +祗 +墨 +渭 +烩 +蘸 +慕 +璇 +镶 +穴 +嵘 +恶 +骂 +险 +绋 +幕 +碉 +肺 +戳 +刘 +潞 +秣 +纾 +潜 +銮 +洛 +须 +罘 +销 +瘪 +汞 +兮 +屉 +r +林 +厕 +质 +探 +划 +狸 +殚 +善 +煊 +烹 +〒 +锈 +逯 +宸 +辍 +泱 +柚 +袍 +远 +蹋 +嶙 +绝 +峥 +娥 +缍 +雀 +徵 +认 +镱 +谷 += +贩 +勉 +撩 +鄯 +斐 +洋 +非 +祚 +泾 +诒 +饿 +撬 +威 +晷 +搭 +芍 +锥 +笺 +蓦 +候 +琊 +档 +礁 +沼 +卵 +荠 +忑 +朝 +凹 +瑞 +头 +仪 +弧 +孵 +畏 +铆 +突 +衲 +车 +浩 +气 +茂 +悖 +厢 +枕 +酝 +戴 +湾 +邹 +飚 +攘 +锂 +写 +宵 +翁 +岷 +无 +喜 +丈 +挑 +嗟 +绛 +殉 +议 +槽 +具 +醇 +淞 +笃 +郴 +阅 +饼 +底 +壕 +砚 +弈 +询 +缕 +庹 +翟 +零 +筷 +暨 +舟 +闺 +甯 +撞 +麂 +茌 +蔼 +很 +珲 +捕 +棠 +角 +阉 +媛 +娲 +诽 +剿 +尉 +爵 +睬 +韩 +诰 +匣 +危 +糍 +镯 +立 +浏 +阳 +少 +盆 +舔 +擘 +匪 +申 +尬 +铣 +旯 +抖 +赘 +瓯 +居 +ˇ +哮 +游 +锭 +茏 +歌 +坏 +甚 +秒 +舞 +沙 +仗 +劲 +潺 +阿 +燧 +郭 +嗖 +霏 +忠 +材 +奂 +耐 +跺 +砀 +输 +岖 +媳 +氟 +极 +摆 +灿 +今 +扔 +腻 +枝 +奎 +药 +熄 +吨 +话 +q +额 +慑 +嘌 +协 +喀 +壳 +埭 +视 +著 +於 +愧 +陲 +翌 +峁 +颅 +佛 +腹 +聋 +侯 +咎 +叟 +秀 +颇 +存 +较 +罪 +哄 +岗 +扫 +栏 +钾 +羌 +己 +璨 +枭 +霉 +煌 +涸 +衿 +键 +镝 +益 +岢 +奏 +连 +夯 +睿 +冥 +均 +糖 +狞 +蹊 +稻 +爸 +刿 +胥 +煜 +丽 +肿 +璃 +掸 +跚 +灾 +垂 +樾 +濑 +乎 +莲 +窄 +犹 +撮 +战 +馄 +软 +络 +显 +鸢 +胸 +宾 +妲 +恕 +埔 +蝌 +份 +遇 +巧 +瞟 +粒 +恰 +剥 +桡 +博 +讯 +凯 +堇 +阶 +滤 +卖 +斌 +骚 +彬 +兑 +磺 +樱 +舷 +两 +娱 +福 +仃 +差 +找 +桁 +÷ +净 +把 +阴 +污 +戬 +雷 +碓 +蕲 +楚 +罡 +焖 +抽 +妫 +咒 +仑 +闱 +尽 +邑 +菁 +爱 +贷 +沥 +鞑 +牡 +嗉 +崴 +骤 +塌 +嗦 +订 +拮 +滓 +捡 +锻 +次 +坪 +杩 +臃 +箬 +融 +珂 +鹗 +宗 +枚 +降 +鸬 +妯 +阄 +堰 +盐 +毅 +必 +杨 +崃 +俺 +甬 +状 +莘 +货 +耸 +菱 +腼 +铸 +唏 +痤 +孚 +澳 +懒 +溅 +翘 +疙 +杷 +淼 +缙 +骰 +喊 +悉 +砻 +坷 +艇 +赁 +界 +谤 +纣 +宴 +晃 +茹 +归 +饭 +梢 +铡 +街 +抄 +肼 +鬟 +苯 +颂 +撷 +戈 +炒 +咆 +茭 +瘙 +负 +仰 +客 +琉 +铢 +封 +卑 +珥 +椿 +镧 +窨 +鬲 +寿 +御 +袤 +铃 +萎 +砖 +餮 +脒 +裳 +肪 +孕 +嫣 +馗 +嵇 +恳 +氯 +江 +石 +褶 +冢 +祸 +阻 +狈 +羞 +银 +靳 +透 +咳 +叼 +敷 +芷 +啥 +它 +瓤 +兰 +痘 +懊 +逑 +肌 +往 +捺 +坊 +甩 +呻 +〃 +沦 +忘 +膻 +祟 +菅 +剧 +崆 +智 +坯 +臧 +霍 +墅 +攻 +眯 +倘 +拢 +骠 +铐 +庭 +岙 +瓠 +′ +缺 +泥 +迢 +捶 +? +? 
+郏 +喙 +掷 +沌 +纯 +秘 +种 +听 +绘 +固 +螨 +团 +香 +盗 +妒 +埚 +蓝 +拖 +旱 +荞 +铀 +血 +遏 +汲 +辰 +叩 +拽 +幅 +硬 +惶 +桀 +漠 +措 +泼 +唑 +齐 +肾 +念 +酱 +虚 +屁 +耶 +旗 +砦 +闵 +婉 +馆 +拭 +绅 +韧 +忏 +窝 +醋 +葺 +顾 +辞 +倜 +堆 +辋 +逆 +玟 +贱 +疾 +董 +惘 +倌 +锕 +淘 +嘀 +莽 +俭 +笏 +绑 +鲷 +杈 +择 +蟀 +粥 +嗯 +驰 +逾 +案 +谪 +褓 +胫 +哩 +昕 +颚 +鲢 +绠 +躺 +鹄 +崂 +儒 +俨 +丝 +尕 +泌 +啊 +萸 +彰 +幺 +吟 +骄 +苣 +弦 +脊 +瑰 +〈 +诛 +镁 +析 +闪 +剪 +侧 +哟 +框 +螃 +守 +嬗 +燕 +狭 +铈 +缮 +概 +迳 +痧 +鲲 +俯 +售 +笼 +痣 +扉 +挖 +满 +咋 +援 +邱 +扇 +歪 +便 +玑 +绦 +峡 +蛇 +叨 +〖 +泽 +胃 +斓 +喋 +怂 +坟 +猪 +该 +蚬 +炕 +弥 +赞 +棣 +晔 +娠 +挲 +狡 +创 +疖 +铕 +镭 +稷 +挫 +弭 +啾 +翔 +粉 +履 +苘 +哦 +楼 +秕 +铂 +土 +锣 +瘟 +挣 +栉 +习 +享 +桢 +袅 +磨 +桂 +谦 +延 +坚 +蔚 +噗 +署 +谟 +猬 +钎 +恐 +嬉 +雒 +倦 +衅 +亏 +璩 +睹 +刻 +殿 +王 +算 +雕 +麻 +丘 +柯 +骆 +丸 +塍 +谚 +添 +鲈 +垓 +桎 +蚯 +芥 +予 +飕 +镦 +谌 +窗 +醚 +菀 +亮 +搪 +莺 +蒿 +羁 +足 +J +真 +轶 +悬 +衷 +靛 +翊 +掩 +哒 +炅 +掐 +冼 +妮 +l +谐 +稚 +荆 +擒 +犯 +陵 +虏 +浓 +崽 +刍 +陌 +傻 +孜 +千 +靖 +演 +矜 +钕 +煽 +杰 +酗 +渗 +伞 +栋 +俗 +泫 +戍 +罕 +沾 +疽 +灏 +煦 +芬 +磴 +叱 +阱 +榉 +湃 +蜀 +叉 +醒 +彪 +租 +郡 +篷 +屎 +良 +垢 +隗 +弱 +陨 +峪 +砷 +掴 +颁 +胎 +雯 +绵 +贬 +沐 +撵 +隘 +篙 +暖 +曹 +陡 +栓 +填 +臼 +彦 +瓶 +琪 +潼 +哪 +鸡 +摩 +啦 +俟 +锋 +域 +耻 +蔫 +疯 +纹 +撇 +毒 +绶 +痛 +酯 +忍 +爪 +赳 +歆 +嘹 +辕 +烈 +册 +朴 +钱 +吮 +毯 +癜 +娃 +谀 +邵 +厮 +炽 +璞 +邃 +丐 +追 +词 +瓒 +忆 +轧 +芫 +谯 +喷 +弟 +半 +冕 +裙 +掖 +墉 +绮 +寝 +苔 +势 +顷 +褥 +切 +衮 +君 +佳 +嫒 +蚩 +霞 +佚 +洙 +逊 +镖 +暹 +唛 +& +殒 +顶 +碗 +獗 +轭 +铺 +蛊 +废 +恹 +汨 +崩 +珍 +那 +杵 +曲 +纺 +夏 +薰 +傀 +闳 +淬 +姘 +舀 +拧 +卷 +楂 +恍 +讪 +厩 +寮 +篪 +赓 +乘 +灭 +盅 +鞣 +沟 +慎 +挂 +饺 +鼾 +杳 +树 +缨 +丛 +絮 +娌 +臻 +嗳 +篡 +侩 +述 +衰 +矛 +圈 +蚜 +匕 +筹 +匿 +濞 +晨 +叶 +骋 +郝 +挚 +蚴 +滞 +增 +侍 +描 +瓣 +吖 +嫦 +蟒 +匾 +圣 +赌 +毡 +癞 +恺 +百 +曳 +需 +篓 +肮 +庖 +帏 +卿 +驿 +遗 +蹬 +鬓 +骡 +歉 +芎 +胳 +屐 +禽 +烦 +晌 +寄 +媾 +狄 +翡 +苒 +船 +廉 +终 +痞 +殇 +々 +畦 +饶 +改 +拆 +悻 +萄 +£ +瓿 +乃 +訾 +桅 +匮 +溧 +拥 +纱 +铍 +骗 +蕃 +龋 +缬 +父 +佐 +疚 +栎 +醍 +掳 +蓄 +x +惆 +颜 +鲆 +榆 +〔 +猎 +敌 +暴 +谥 +鲫 +贾 +罗 +玻 +缄 +扦 +芪 +癣 +落 +徒 +臾 +恿 +猩 +托 +邴 +肄 +牵 +春 +陛 +耀 +刊 +拓 +蓓 +邳 +堕 +寇 +枉 +淌 +啡 +湄 +兽 +酷 +萼 +碚 +濠 +萤 +夹 +旬 +戮 +梭 +琥 +椭 +昔 +勺 +蜊 +绐 +晚 +孺 +僵 +宣 +摄 +冽 +旨 +萌 +忙 +蚤 +眉 +噼 +蟑 +付 +契 +瓜 +悼 +颡 +壁 +曾 +窕 +颢 +澎 +仿 +俑 +浑 +嵌 +浣 +乍 +碌 +褪 +乱 +蔟 +隙 +玩 +剐 +葫 +箫 +纲 +围 +伐 +决 +伙 +漩 +瑟 +刑 +肓 +镳 +缓 +蹭 +氨 +皓 +典 +畲 +坍 +铑 +檐 +塑 +洞 +倬 +储 +胴 +淳 +戾 +吐 +灼 +惺 +妙 +毕 +珐 +缈 +虱 +盖 +羰 +鸿 +磅 +谓 +髅 +娴 +苴 +唷 +蚣 +霹 +抨 +贤 +唠 +犬 +誓 +逍 +庠 +逼 +麓 +籼 +釉 +呜 +碧 +秧 +氩 +摔 +霄 +穸 +纨 +辟 +妈 +映 +完 +牛 +缴 +嗷 +炊 +恩 +荔 +茆 +掉 +紊 +慌 +莓 +羟 +阙 +萁 +磐 +另 +蕹 +辱 +鳐 +湮 +吡 +吩 +唐 +睦 +垠 +舒 +圜 +冗 +瞿 +溺 +芾 +囱 +匠 +僳 +汐 +菩 +饬 +漓 +黑 +霰 +浸 +濡 +窥 +毂 +蒡 +兢 +驻 +鹉 +芮 +诙 +迫 +雳 +厂 +忐 +臆 +猴 +鸣 +蚪 +栈 +箕 +羡 +渐 +莆 +捍 +眈 +哓 +趴 +蹼 +埕 +嚣 +骛 +宏 +淄 +斑 +噜 +严 +瑛 +垃 +椎 +诱 +压 +庾 +绞 +焘 +廿 +抡 +迄 +棘 +夫 +纬 +锹 +眨 +瞌 +侠 +脐 +竞 +瀑 +孳 +骧 +遁 +姜 +颦 +荪 +滚 +萦 +伪 +逸 +粳 +爬 +锁 +矣 +役 +趣 +洒 +颔 +诏 +逐 +奸 +甭 +惠 +攀 +蹄 +泛 +尼 +拼 +阮 +鹰 +亚 +颈 +惑 +勒 +〉 +际 +肛 +爷 +刚 +钨 +丰 +养 +冶 +鲽 +辉 +蔻 +画 +覆 +皴 +妊 +麦 +返 +醉 +皂 +擀 +〗 +酶 +凑 +粹 +悟 +诀 +硖 +港 +卜 +z +杀 +涕 +± +舍 +铠 +抵 +弛 +段 +敝 +镐 +奠 +拂 +轴 +跛 +袱 +e +t +沉 +菇 +俎 +薪 +峦 +秭 +蟹 +历 +盟 +菠 +寡 +液 +肢 +喻 +染 +裱 +悱 +抱 +氙 +赤 +捅 +猛 +跑 +氮 +谣 +仁 +尺 +辊 +窍 +烙 +衍 +架 +擦 +倏 +璐 +瑁 +币 +楞 +胖 +夔 +趸 +邛 +惴 +饕 +虔 +蝎 +§ +哉 +贝 +宽 +辫 +炮 +扩 +饲 +籽 +魏 +菟 +锰 +伍 +猝 +末 +琳 +哚 +蛎 +邂 +呀 +姿 +鄞 +却 +歧 +仙 +恸 +椐 +森 +牒 +寤 +袒 +婆 +虢 +雅 +钉 +朵 +贼 +欲 +苞 +寰 +故 +龚 +坭 +嘘 +咫 +礼 +硷 +兀 +睢 +汶 +’ +铲 +烧 +绕 +诃 +浃 +钿 +哺 +柜 +讼 +颊 +璁 +腔 +洽 +咐 +脲 +簌 +筠 +镣 +玮 +鞠 +谁 +兼 +姆 +挥 +梯 +蝴 +谘 +漕 +刷 +躏 +宦 +弼 +b +垌 +劈 +麟 +莉 +揭 +笙 +渎 +仕 +嗤 +仓 +配 +怏 +抬 +错 +泯 +镊 +孰 +猿 +邪 +仍 +秋 +鼬 +壹 +歇 +吵 +炼 +< +尧 +射 +柬 +廷 +胧 +霾 +凳 +隋 +肚 +浮 +梦 +祥 +株 +堵 +退 +L +鹫 +跎 +凶 +毽 +荟 +炫 +栩 +玳 +甜 +沂 +鹿 +顽 +伯 +爹 +赔 +蛴 +徐 +匡 +欣 +狰 +缸 +雹 +蟆 +疤 +默 +沤 +啜 +痂 +衣 +禅 +w +i +h +辽 +葳 +黝 +钗 +停 +沽 +棒 +馨 +颌 +肉 +吴 +硫 +悯 +劾 +娈 +马 +啧 +吊 +悌 +镑 +峭 +帆 +瀣 +涉 +咸 +疸 +滋 +泣 +翦 +拙 +癸 +钥 +蜒 ++ +尾 +庄 +凝 +泉 +婢 +渴 +谊 +乞 +陆 +锉 +糊 +鸦 +淮 +I +B +N +晦 +弗 +乔 +庥 +葡 +尻 +席 +橡 +傣 +渣 +拿 +惩 +麋 +斛 +缃 +矮 +蛏 +岘 +鸽 +姐 +膏 +催 +奔 +镒 +喱 +蠡 +摧 +钯 +胤 +柠 +拐 +璋 +鸥 +卢 +荡 +倾 +^ +_ +珀 +逄 +萧 +塾 +掇 +贮 +笆 +聂 +圃 +冲 +嵬 +M +滔 +笕 +值 
+炙 +偶 +蜱 +搐 +梆 +汪 +蔬 +腑 +鸯 +蹇 +敞 +绯 +仨 +祯 +谆 +梧 +糗 +鑫 +啸 +豺 +囹 +猾 +巢 +柄 +瀛 +筑 +踌 +沭 +暗 +苁 +鱿 +蹉 +脂 +蘖 +牢 +热 +木 +吸 +溃 +宠 +序 +泞 +偿 +拜 +檩 +厚 +朐 +毗 +螳 +吞 +媚 +朽 +担 +蝗 +橘 +畴 +祈 +糟 +盱 +隼 +郜 +惜 +珠 +裨 +铵 +焙 +琚 +唯 +咚 +噪 +骊 +丫 +滢 +勤 +棉 +呸 +咣 +淀 +隔 +蕾 +窈 +饨 +挨 +煅 +短 +匙 +粕 +镜 +赣 +撕 +墩 +酬 +馁 +豌 +颐 +抗 +酣 +氓 +佑 +搁 +哭 +递 +耷 +涡 +桃 +贻 +碣 +截 +瘦 +昭 +镌 +蔓 +氚 +甲 +猕 +蕴 +蓬 +散 +拾 +纛 +狼 +猷 +铎 +埋 +旖 +矾 +讳 +囊 +糜 +迈 +粟 +蚂 +紧 +鲳 +瘢 +栽 +稼 +羊 +锄 +斟 +睁 +桥 +瓮 +蹙 +祉 +醺 +鼻 +昱 +剃 +跳 +篱 +跷 +蒜 +翎 +宅 +晖 +嗑 +壑 +峻 +癫 +屏 +狠 +陋 +袜 +途 +憎 +祀 +莹 +滟 +佶 +溥 +臣 +约 +盛 +峰 +磁 +慵 +婪 +拦 +莅 +朕 +鹦 +粲 +裤 +哎 +疡 +嫖 +琵 +窟 +堪 +谛 +嘉 +儡 +鳝 +斩 +郾 +驸 +酊 +妄 +胜 +贺 +徙 +傅 +噌 +钢 +栅 +庇 +恋 +匝 +巯 +邈 +尸 +锚 +粗 +佟 +蛟 +薹 +纵 +蚊 +郅 +绢 +锐 +苗 +俞 +篆 +淆 +膀 +鲜 +煎 +诶 +秽 +寻 +涮 +刺 +怀 +噶 +巨 +褰 +魅 +灶 +灌 +桉 +藕 +谜 +舸 +薄 +搀 +恽 +借 +牯 +痉 +渥 +愿 +亓 +耘 +杠 +柩 +锔 +蚶 +钣 +珈 +喘 +蹒 +幽 +赐 +稗 +晤 +莱 +泔 +扯 +肯 +菪 +裆 +腩 +豉 +疆 +骜 +腐 +倭 +珏 +唔 +粮 +亡 +润 +慰 +伽 +橄 +玄 +誉 +醐 +胆 +龊 +粼 +塬 +陇 +彼 +削 +嗣 +绾 +芽 +妗 +垭 +瘴 +爽 +薏 +寨 +龈 +泠 +弹 +赢 +漪 +猫 +嘧 +涂 +恤 +圭 +茧 +烽 +屑 +痕 +巾 +赖 +荸 +凰 +腮 +畈 +亵 +蹲 +偃 +苇 +澜 +艮 +换 +骺 +烘 +苕 +梓 +颉 +肇 +哗 +悄 +氤 +涠 +葬 +屠 +鹭 +植 +竺 +佯 +诣 +鲇 +瘀 +鲅 +邦 +移 +滁 +冯 +耕 +癔 +戌 +茬 +沁 +巩 +悠 +湘 +洪 +痹 +锟 +循 +谋 +腕 +鳃 +钠 +捞 +焉 +迎 +碱 +伫 +急 +榷 +奈 +邝 +卯 +辄 +皲 +卟 +醛 +畹 +忧 +稳 +雄 +昼 +缩 +阈 +睑 +扌 +耗 +曦 +涅 +捏 +瞧 +邕 +淖 +漉 +铝 +耦 +禹 +湛 +喽 +莼 +琅 +诸 +苎 +纂 +硅 +始 +嗨 +傥 +燃 +臂 +赅 +嘈 +呆 +贵 +屹 +壮 +肋 +亍 +蚀 +卅 +豹 +腆 +邬 +迭 +浊 +} +童 +螂 +捐 +圩 +勐 +触 +寞 +汊 +壤 +荫 +膺 +渌 +芳 +懿 +遴 +螈 +泰 +蓼 +蛤 +茜 +舅 +枫 +朔 +膝 +眙 +避 +梅 +判 +鹜 +璜 +牍 +缅 +垫 +藻 +黔 +侥 +惚 +懂 +踩 +腰 +腈 +札 +丞 +唾 +慈 +顿 +摹 +荻 +琬 +~ +斧 +沈 +滂 +胁 +胀 +幄 +莜 +Z +匀 +鄄 +掌 +绰 +茎 +焚 +赋 +萱 +谑 +汁 +铒 +瞎 +夺 +蜗 +野 +娆 +冀 +弯 +篁 +懵 +灞 +隽 +芡 +脘 +俐 +辩 +芯 +掺 +喏 +膈 +蝈 +觐 +悚 +踹 +蔗 +熠 +鼠 +呵 +抓 +橼 +峨 +畜 +缔 +禾 +崭 +弃 +熊 +摒 +凸 +拗 +穹 +蒙 +抒 +祛 +劝 +闫 +扳 +阵 +醌 +踪 +喵 +侣 +搬 +仅 +荧 +赎 +蝾 +琦 +买 +婧 +瞄 +寓 +皎 +冻 +赝 +箩 +莫 +瞰 +郊 +笫 +姝 +筒 +枪 +遣 +煸 +袋 +舆 +痱 +涛 +母 +〇 +启 +践 +耙 +绲 +盘 +遂 +昊 +搞 +槿 +诬 +纰 +泓 +惨 +檬 +亻 +越 +C +o +憩 +熵 +祷 +钒 +暧 +塔 +阗 +胰 +咄 +娶 +魔 +琶 +钞 +邻 +扬 +杉 +殴 +咽 +弓 +〆 +髻 +】 +吭 +揽 +霆 +拄 +殖 +脆 +彻 +岩 +芝 +勃 +辣 +剌 +钝 +嘎 +甄 +佘 +皖 +伦 +授 +徕 +憔 +挪 +皇 +庞 +稔 +芜 +踏 +溴 +兖 +卒 +擢 +饥 +鳞 +煲 +‰ +账 +颗 +叻 +斯 +捧 +鳍 +琮 +讹 +蛙 +纽 +谭 +酸 +兔 +莒 +睇 +伟 +觑 +羲 +嗜 +宜 +褐 +旎 +辛 +卦 +诘 +筋 +鎏 +溪 +挛 +熔 +阜 +晰 +鳅 +丢 +奚 +灸 +呱 +献 +陉 +黛 +鸪 +甾 +萨 +疮 +拯 +洲 +疹 +辑 +叙 +恻 +谒 +允 +柔 +烂 +氏 +逅 +漆 +拎 +惋 +扈 +湟 +纭 +啕 +掬 +擞 +哥 +忽 +涤 +鸵 +靡 +郗 +瓷 +扁 +廊 +怨 +雏 +钮 +敦 +E +懦 +憋 +汀 +拚 +啉 +腌 +岸 +f +痼 +瞅 +尊 +咀 +眩 +飙 +忌 +仝 +迦 +熬 +毫 +胯 +篑 +茄 +腺 +凄 +舛 +碴 +锵 +诧 +羯 +後 +漏 +汤 +宓 +仞 +蚁 +壶 +谰 +皑 +铄 +棰 +罔 +辅 +晶 +苦 +牟 +闽 +\ +烃 +饮 +聿 +丙 +蛳 +朱 +煤 +涔 +鳖 +犁 +罐 +荼 +砒 +淦 +妤 +黏 +戎 +孑 +婕 +瑾 +戢 +钵 +枣 +捋 +砥 +衩 +狙 +桠 +稣 +阎 +肃 +梏 +诫 +孪 +昶 +婊 +衫 +嗔 +侃 +塞 +蜃 +樵 +峒 +貌 +屿 +欺 +缫 +阐 +栖 +诟 +珞 +荭 +吝 +萍 +嗽 +恂 +啻 +蜴 +磬 +峋 +俸 +豫 +谎 +徊 +镍 +韬 +魇 +晴 +U +囟 +猜 +蛮 +坐 +囿 +伴 +亭 +肝 +佗 +蝠 +妃 +胞 +滩 +榴 +氖 +垩 +苋 +砣 +扪 +馏 +姓 +轩 +厉 +夥 +侈 +禀 +垒 +岑 +赏 +钛 +辐 +痔 +披 +纸 +碳 +“ +坞 +蠓 +挤 +荥 +沅 +悔 +铧 +帼 +蒌 +蝇 +a +p +y +n +g +哀 +浆 +瑶 +凿 +桶 +馈 +皮 +奴 +苜 +佤 +伶 +晗 +铱 +炬 +优 +弊 +氢 +恃 +甫 +攥 +端 +锌 +灰 +稹 +炝 +曙 +邋 +亥 +眶 +碾 +拉 +萝 +绔 +捷 +浍 +腋 +姑 +菖 +凌 +涞 +麽 +锢 +桨 +潢 +绎 +镰 +殆 +锑 +渝 +铬 +困 +绽 +觎 +匈 +糙 +暑 +裹 +鸟 +盔 +肽 +迷 +綦 +『 +亳 +佝 +俘 +钴 +觇 +骥 +仆 +疝 +跪 +婶 +郯 +瀹 +唉 +脖 +踞 +针 +晾 +忒 +扼 +瞩 +叛 +椒 +疟 +嗡 +邗 +肆 +跆 +玫 +忡 +捣 +咧 +唆 +艄 +蘑 +潦 +笛 +阚 +沸 +泻 +掊 +菽 +贫 +斥 +髂 +孢 +镂 +赂 +麝 +鸾 +屡 +衬 +苷 +恪 +叠 +希 +粤 +爻 +喝 +茫 +惬 +郸 +绻 +庸 +撅 +碟 +宄 +妹 +膛 +叮 +饵 +崛 +嗲 +椅 +冤 +搅 +咕 +敛 +尹 +垦 +闷 +蝉 +霎 +勰 +败 +蓑 +泸 +肤 +鹌 +幌 +焦 +浠 +鞍 +刁 +舰 +乙 +竿 +裔 +。 +茵 +函 +伊 +兄 +丨 +娜 +匍 +謇 +莪 +宥 +似 +蝽 +翳 +酪 +翠 +粑 +薇 +祢 +骏 +赠 +叫 +Q +噤 +噻 +竖 +芗 +莠 +潭 +俊 +羿 +耜 +O +郫 +趁 +嗪 +囚 +蹶 +芒 +洁 +笋 +鹑 +敲 +硝 +啶 +堡 +渲 +揩 +』 +携 +宿 +遒 +颍 +扭 +棱 +割 +萜 +蔸 +葵 +琴 +捂 +饰 +衙 +耿 +掠 +募 +岂 +窖 +涟 +蔺 +瘤 +柞 +瞪 +怜 +匹 +距 +楔 +炜 +哆 +秦 +缎 +幼 +茁 +绪 +痨 +恨 +楸 +娅 +瓦 +桩 +雪 +嬴 +伏 +榔 +妥 +铿 +拌 +眠 +雍 +缇 +‘ +卓 +搓 +哌 +觞 +噩 +屈 +哧 +髓 +咦 +巅 +娑 +侑 +淫 +膳 +祝 +勾 +姊 +莴 
+胄 +疃 +薛 +蜷 +胛 +巷 +芙 +芋 +熙 +闰 +勿 +窃 +狱 +剩 +钏 +幢 +陟 +铛 +慧 +靴 +耍 +k +浙 +浇 +飨 +惟 +绗 +祜 +澈 +啼 +咪 +磷 +摞 +诅 +郦 +抹 +跃 +壬 +吕 +肖 +琏 +颤 +尴 +剡 +抠 +凋 +赚 +泊 +津 +宕 +殷 +倔 +氲 +漫 +邺 +涎 +怠 +$ +垮 +荬 +遵 +俏 +叹 +噢 +饽 +蜘 +孙 +筵 +疼 +鞭 +羧 +牦 +箭 +潴 +c +眸 +祭 +髯 +啖 +坳 +愁 +芩 +驮 +倡 +巽 +穰 +沃 +胚 +怒 +凤 +槛 +剂 +趵 +嫁 +v +邢 +灯 +鄢 +桐 +睽 +檗 +锯 +槟 +婷 +嵋 +圻 +诗 +蕈 +颠 +遭 +痢 +芸 +怯 +馥 +竭 +锗 +徜 +恭 +遍 +籁 +剑 +嘱 +苡 +龄 +僧 +桑 +潸 +弘 +澶 +楹 +悲 +讫 +愤 +腥 +悸 +谍 +椹 +呢 +桓 +葭 +攫 +阀 +翰 +躲 +敖 +柑 +郎 +笨 +橇 +呃 +魁 +燎 +脓 +葩 +磋 +垛 +玺 +狮 +沓 +砜 +蕊 +锺 +罹 +蕉 +翱 +虐 +闾 +巫 +旦 +茱 +嬷 +枯 +鹏 +贡 +芹 +汛 +矫 +绁 +拣 +禺 +佃 +讣 +舫 +惯 +乳 +趋 +疲 +挽 +岚 +虾 +衾 +蠹 +蹂 +飓 +氦 +铖 +孩 +稞 +瑜 +壅 +掀 +勘 +妓 +畅 +髋 +W +庐 +牲 +蓿 +榕 +练 +垣 +唱 +邸 +菲 +昆 +婺 +穿 +绡 +麒 +蚱 +掂 +愚 +泷 +涪 +漳 +妩 +娉 +榄 +讷 +觅 +旧 +藤 +煮 +呛 +柳 +腓 +叭 +庵 +烷 +阡 +罂 +蜕 +擂 +猖 +咿 +媲 +脉 +【 +沏 +貅 +黠 +熏 +哲 +烁 +坦 +酵 +兜 +× +潇 +撒 +剽 +珩 +圹 +乾 +摸 +樟 +帽 +嗒 +襄 +魂 +轿 +憬 +锡 +〕 +喃 +皆 +咖 +隅 +脸 +残 +泮 +袂 +鹂 +珊 +囤 +捆 +咤 +误 +徨 +闹 +淙 +芊 +淋 +怆 +囗 +拨 +梳 +渤 +R +G +绨 +蚓 +婀 +幡 +狩 +麾 +谢 +唢 +裸 +旌 +伉 +纶 +裂 +驳 +砼 +咛 +澄 +樨 +蹈 +宙 +澍 +倍 +貔 +操 +勇 +蟠 +摈 +砧 +虬 +够 +缁 +悦 +藿 +撸 +艹 +摁 +淹 +豇 +虎 +榭 +ˉ +吱 +d +° +喧 +荀 +踱 +侮 +奋 +偕 +饷 +犍 +惮 +坑 +璎 +徘 +宛 +妆 +袈 +倩 +窦 +昂 +荏 +乖 +K +怅 +撰 +鳙 +牙 +袁 +酞 +X +痿 +琼 +闸 +雁 +趾 +荚 +虻 +涝 +《 +杏 +韭 +偈 +烤 +绫 +鞘 +卉 +症 +遢 +蓥 +诋 +杭 +荨 +匆 +竣 +簪 +辙 +敕 +虞 +丹 +缭 +咩 +黟 +m +淤 +瑕 +咂 +铉 +硼 +茨 +嶂 +痒 +畸 +敬 +涿 +粪 +窘 +熟 +叔 +嫔 +盾 +忱 +裘 +憾 +梵 +赡 +珙 +咯 +娘 +庙 +溯 +胺 +葱 +痪 +摊 +荷 +卞 +乒 +髦 +寐 +铭 +坩 +胗 +枷 +爆 +溟 +嚼 +羚 +砬 +轨 +惊 +挠 +罄 +竽 +菏 +氧 +浅 +楣 +盼 +枢 +炸 +阆 +杯 +谏 +噬 +淇 +渺 +俪 +秆 +墓 +泪 +跻 +砌 +痰 +垡 +渡 +耽 +釜 +讶 +鳎 +煞 +呗 +韶 +舶 +绷 +鹳 +缜 +旷 +铊 +皱 +龌 +檀 +霖 +奄 +槐 +艳 +蝶 +旋 +哝 +赶 +骞 +蚧 +腊 +盈 +丁 +` +蜚 +矸 +蝙 +睨 +嚓 +僻 +鬼 +醴 +夜 +彝 +磊 +笔 +拔 +栀 +糕 +厦 +邰 +纫 +逭 +纤 +眦 +膊 +馍 +躇 +烯 +蘼 +冬 +诤 +暄 +骶 +哑 +瘠 +」 +臊 +丕 +愈 +咱 +螺 +擅 +跋 +搏 +硪 +谄 +笠 +淡 +嘿 +骅 +谧 +鼎 +皋 +姚 +歼 +蠢 +驼 +耳 +胬 +挝 +涯 +狗 +蒽 +孓 +犷 +凉 +芦 +箴 +铤 +孤 +嘛 +坤 +V +茴 +朦 +挞 +尖 +橙 +诞 +搴 +碇 +洵 +浚 +帚 +蜍 +漯 +柘 +嚎 +讽 +芭 +荤 +咻 +祠 +秉 +跖 +埃 +吓 +糯 +眷 +馒 +惹 +娼 +鲑 +嫩 +讴 +轮 +瞥 +靶 +褚 +乏 +缤 +宋 +帧 +删 +驱 +碎 +扑 +俩 +俄 +偏 +涣 +竹 +噱 +皙 +佰 +渚 +唧 +斡 +# +镉 +刀 +崎 +筐 +佣 +夭 +贰 +肴 +峙 +哔 +艿 +匐 +牺 +镛 +缘 +仡 +嫡 +劣 +枸 +堀 +梨 +簿 +鸭 +蒸 +亦 +稽 +浴 +{ +衢 +束 +槲 +j +阁 +揍 +疥 +棋 +潋 +聪 +窜 +乓 +睛 +插 +冉 +阪 +苍 +搽 +「 +蟾 +螟 +幸 +仇 +樽 +撂 +慢 +跤 +幔 +俚 +淅 +覃 +觊 +溶 +妖 +帛 +侨 +曰 +妾 +泗 +· +: +瀘 +風 +Ë +( +) +∶ +紅 +紗 +瑭 +雲 +頭 +鶏 +財 +許 +• +¥ +樂 +焗 +麗 +— +; +滙 +東 +榮 +繪 +興 +… +門 +業 +π +楊 +國 +顧 +é +盤 +寳 +Λ +龍 +鳳 +島 +誌 +緣 +結 +銭 +萬 +勝 +祎 +璟 +優 +歡 +臨 +時 +購 += +★ +藍 +昇 +鐵 +觀 +勅 +農 +聲 +畫 +兿 +術 +發 +劉 +記 +專 +耑 +園 +書 +壴 +種 +Ο +● +褀 +號 +銀 +匯 +敟 +锘 +葉 +橪 +廣 +進 +蒄 +鑽 +阝 +祙 +貢 +鍋 +豊 +夬 +喆 +團 +閣 +開 +燁 +賓 +館 +酡 +沔 +順 ++ +硚 +劵 +饸 +陽 +車 +湓 +復 +萊 +氣 +軒 +華 +堃 +迮 +纟 +戶 +馬 +學 +裡 +電 +嶽 +獨 +マ +シ +サ +ジ +燘 +袪 +環 +❤ +臺 +灣 +専 +賣 +孖 +聖 +攝 +線 +▪ +α +傢 +俬 +夢 +達 +莊 +喬 +貝 +薩 +劍 +羅 +壓 +棛 +饦 +尃 +璈 +囍 +醫 +G +I +A +# +N +鷄 +髙 +嬰 +啓 +約 +隹 +潔 +賴 +藝 +~ +寶 +籣 +麺 +  +嶺 +√ +義 +網 +峩 +長 +∧ +魚 +機 +構 +② +鳯 +偉 +L +B +㙟 +畵 +鴿 +' +詩 +溝 +嚞 +屌 +藔 +佧 +玥 +蘭 +織 +1 +3 +9 +0 +7 +點 +砭 +鴨 +鋪 +銘 +廳 +弍 +‧ +創 +湯 +坶 +℃ +卩 +骝 +& +烜 +荘 +當 +潤 +扞 +係 +懷 +碶 +钅 +蚨 +讠 +☆ +叢 +爲 +埗 +涫 +塗 +→ +楽 +現 +鯨 +愛 +瑪 +鈺 +忄 +悶 +藥 +飾 +樓 +視 +孬 +ㆍ +燚 +苪 +師 +① +丼 +锽 +│ +韓 +標 +è +兒 +閏 +匋 +張 +漢 +Ü +髪 +會 +閑 +檔 +習 +裝 +の +峯 +菘 +輝 +И +雞 +釣 +億 +浐 +K +O +R +8 +H +E +P +T +W +D +S +C +M +F +姌 +饹 +» +晞 +廰 +ä +嵯 +鷹 +負 +飲 +絲 +冚 +楗 +澤 +綫 +區 +❋ +← +質 +靑 +揚 +③ +滬 +統 +産 +協 +﹑ +乸 +畐 +經 +運 +際 +洺 +岽 +為 +粵 +諾 +崋 +豐 +碁 +ɔ +V +2 +6 +齋 +誠 +訂 +´ +勑 +雙 +陳 +無 +í +泩 +媄 +夌 +刂 +i +c +t +o +r +a +嘢 +耄 +燴 +暃 +壽 +媽 +靈 +抻 +體 +唻 +É +冮 +甹 +鎮 +錦 +ʌ +蜛 +蠄 +尓 +駕 +戀 +飬 +逹 +倫 +貴 +極 +Я +Й +寬 +磚 +嶪 +郎 +職 +| +間 +n +d +剎 +伈 +課 +飛 +橋 +瘊 +№ +譜 +骓 +圗 +滘 +縣 +粿 +咅 +養 +濤 +彳 +® +% +Ⅱ +啰 +㴪 +見 +矞 +薬 +糁 +邨 +鲮 +顔 +罱 +З +選 +話 +贏 +氪 +俵 +競 +瑩 +繡 +枱 +β +綉 +á +獅 +爾 +™ +麵 +戋 +淩 +徳 +個 +劇 +場 +務 +簡 +寵 +h +實 +膠 +轱 +圖 +築 +嘣 +樹 +㸃 +營 +耵 +孫 +饃 +鄺 +飯 +麯 +遠 +輸 +坫 +孃 +乚 
+閃 +鏢 +㎡ +題 +廠 +關 +↑ +爺 +將 +軍 +連 +篦 +覌 +參 +箸 +- +窠 +棽 +寕 +夀 +爰 +歐 +呙 +閥 +頡 +熱 +雎 +垟 +裟 +凬 +勁 +帑 +馕 +夆 +疌 +枼 +馮 +貨 +蒤 +樸 +彧 +旸 +靜 +龢 +暢 +㐱 +鳥 +珺 +鏡 +灡 +爭 +堷 +廚 +Ó +騰 +診 +┅ +蘇 +褔 +凱 +頂 +豕 +亞 +帥 +嘬 +⊥ +仺 +桖 +複 +饣 +絡 +穂 +顏 +棟 +納 +▏ +濟 +親 +設 +計 +攵 +埌 +烺 +ò +頤 +燦 +蓮 +撻 +節 +講 +濱 +濃 +娽 +洳 +朿 +燈 +鈴 +護 +膚 +铔 +過 +補 +Z +U +5 +4 +坋 +闿 +䖝 +餘 +缐 +铞 +貿 +铪 +桼 +趙 +鍊 +[ +㐂 +垚 +菓 +揸 +捲 +鐘 +滏 +𣇉 +爍 +輪 +燜 +鴻 +鮮 +動 +鹞 +鷗 +丄 +慶 +鉌 +翥 +飮 +腸 +⇋ +漁 +覺 +來 +熘 +昴 +翏 +鲱 +圧 +鄉 +萭 +頔 +爐 +嫚 +г +貭 +類 +聯 +幛 +輕 +訓 +鑒 +夋 +锨 +芃 +珣 +䝉 +扙 +嵐 +銷 +處 +ㄱ +語 +誘 +苝 +歸 +儀 +燒 +楿 +內 +粢 +葒 +奧 +麥 +礻 +滿 +蠔 +穵 +瞭 +態 +鱬 +榞 +硂 +鄭 +黃 +煙 +祐 +奓 +逺 +* +瑄 +獲 +聞 +薦 +讀 +這 +樣 +決 +問 +啟 +們 +執 +説 +轉 +單 +隨 +唘 +帶 +倉 +庫 +還 +贈 +尙 +皺 +■ +餅 +產 +○ +∈ +報 +狀 +楓 +賠 +琯 +嗮 +禮 +` +傳 +> +≤ +嗞 +Φ +≥ +換 +咭 +∣ +↓ +曬 +ε +応 +寫 +″ +終 +様 +純 +費 +療 +聨 +凍 +壐 +郵 +ü +黒 +∫ +製 +塊 +調 +軽 +確 +撃 +級 +馴 +Ⅲ +涇 +繹 +數 +碼 +證 +狒 +処 +劑 +< +晧 +賀 +衆 +] +櫥 +兩 +陰 +絶 +對 +鯉 +憶 +◎ +p +e +Y +蕒 +煖 +頓 +測 +試 +鼽 +僑 +碩 +妝 +帯 +≈ +鐡 +舖 +權 +喫 +倆 +ˋ +該 +悅 +ā +俫 +. +f +s +b +m +k +g +u +j +貼 +淨 +濕 +針 +適 +備 +l +/ +給 +謢 +強 +觸 +衛 +與 +⊙ +$ +緯 +變 +⑴ +⑵ +⑶ +㎏ +殺 +∩ +幚 +─ +價 +▲ +離 +ú +ó +飄 +烏 +関 +閟 +﹝ +﹞ +邏 +輯 +鍵 +驗 +訣 +導 +歷 +屆 +層 +▼ +儱 +錄 +熳 +ē +艦 +吋 +錶 +辧 +飼 +顯 +④ +禦 +販 +気 +対 +枰 +閩 +紀 +幹 +瞓 +貊 +淚 +△ +眞 +墊 +Ω +獻 +褲 +縫 +緑 +亜 +鉅 +餠 +{ +} +◆ +蘆 +薈 +█ +◇ +溫 +彈 +晳 +粧 +犸 +穩 +訊 +崬 +凖 +熥 +П +舊 +條 +紋 +圍 +Ⅳ +筆 +尷 +難 +雜 +錯 +綁 +識 +頰 +鎖 +艶 +□ +殁 +殼 +⑧ +├ +▕ +鵬 +ǐ +ō +ǒ +糝 +綱 +▎ +μ +盜 +饅 +醬 +籤 +蓋 +釀 +鹽 +據 +à +ɡ +辦 +◥ +彐 +┌ +婦 +獸 +鲩 +伱 +ī +蒟 +蒻 +齊 +袆 +腦 +寧 +凈 +妳 +煥 +詢 +偽 +謹 +啫 +鯽 +騷 +鱸 +損 +傷 +鎻 +髮 +買 +冏 +儥 +両 +﹢ +∞ +載 +喰 +z +羙 +悵 +燙 +曉 +員 +組 +徹 +艷 +痠 +鋼 +鼙 +縮 +細 +嚒 +爯 +≠ +維 +" +鱻 +壇 +厍 +帰 +浥 +犇 +薡 +軎 +² +應 +醜 +刪 +緻 +鶴 +賜 +噁 +軌 +尨 +镔 +鷺 +槗 +彌 +葚 +濛 +請 +溇 +緹 +賢 +訪 +獴 +瑅 +資 +縤 +陣 +蕟 +栢 +韻 +祼 +恁 +伢 +謝 +劃 +涑 +總 +衖 +踺 +砋 +凉 +籃 +駿 +苼 +瘋 +昽 +紡 +驊 +腎 +﹗ +響 +杋 +剛 +嚴 +禪 +歓 +槍 +傘 +檸 +檫 +炣 +勢 +鏜 +鎢 +銑 +尐 +減 +奪 +惡 +θ +僮 +婭 +臘 +ū +ì +殻 +鉄 +∑ +蛲 +焼 +緖 +續 +紹 +懮 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_attr.py b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_attr.py new file mode 100644 index 0000000000000000000000000000000000000000..eb1b9423b64b3f1649f6e95790a0fa5b9fbcd7ff --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_attr.py @@ -0,0 +1,150 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
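The `rec_word_dict.txt` file above is the character dictionary consumed by the plate recognizer's `CTCLabelDecode` postprocess (it is passed as `character_dict_path` in `vehicle_plate.py` below). As a rough illustration of how such a dictionary is used, here is a greedy CTC decoding sketch; treating index 0 as the CTC blank and loading one character per line are assumptions about the decoder's conventions, and the toy charset stands in for the full 6,000+ entry dictionary:

```python
import numpy as np


def load_dict(path):
    # one character per line, as in rec_word_dict.txt above
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def ctc_greedy_decode(logits, charset):
    """logits: (T, num_classes); class 0 is assumed to be the CTC blank."""
    ids = logits.argmax(axis=1)
    out, prev = [], -1
    for i in ids:
        if i != 0 and i != prev:  # drop blanks and collapsed repeats
            out.append(charset[i - 1])
        prev = i
    return "".join(out)


charset = ["京", "A", "1", "2"]  # toy stand-in for the full dictionary
logits = np.eye(5)[[1, 1, 0, 2, 3, 3, 4]]  # T=7 one-hot frames
print(ctc_greedy_decode(logits, charset))  # 京A12
```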
+ +import os +import yaml +import glob + +import cv2 +import numpy as np +import math +import paddle +import sys +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 3))) +sys.path.insert(0, parent_path) + +from paddle.inference import Config, create_predictor +from python.utils import argsparser, Timer, get_current_memory_mb +from python.benchmark_utils import PaddleInferBenchmark +from python.infer import Detector, print_arguments +from pipeline.pphuman.attr_infer import AttrDetector + + +class VehicleAttr(AttrDetector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + type_threshold (float): The threshold of score for vehicle type recognition. + color_threshold (float): The threshold of score for vehicle color recognition. + """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + color_threshold=0.5, + type_threshold=0.5): + super(VehicleAttr, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir) + self.color_threshold = color_threshold + self.type_threshold = type_threshold + self.result_history = {} + self.color_list = [ + "yellow", "orange", "green", "gray", "red", "blue", "white", + "golden", "brown", "black" + ] + self.type_list = [ + "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck", + "estate" + ] + + @classmethod + def init_with_cfg(cls, args, cfg): + return cls(model_dir=cfg['model_dir'], + batch_size=cfg['batch_size'], + color_threshold=cfg['color_threshold'], + type_threshold=cfg['type_threshold'], + device=args.device, + run_mode=args.run_mode, + trt_min_shape=args.trt_min_shape, + trt_max_shape=args.trt_max_shape, + trt_opt_shape=args.trt_opt_shape, + trt_calib_mode=args.trt_calib_mode, + cpu_threads=args.cpu_threads, + enable_mkldnn=args.enable_mkldnn) + + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + batch_res = [] + for res in im_results: + res = res.tolist() + attr_res = [] + color_res_str = "Color: " + type_res_str = "Type: " + color_idx = np.argmax(res[:10]) + type_idx = np.argmax(res[10:]) + + if res[color_idx] >= self.color_threshold: + color_res_str += self.color_list[color_idx] + else: + color_res_str += "Unknown" + attr_res.append(color_res_str) + + if res[type_idx + 10] >= self.type_threshold: + type_res_str += self.type_list[type_idx] + else: + type_res_str += "Unknown" 
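+            # each image contributes one ["Color: ...", "Type: ..."] pair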
+ attr_res.append(type_res_str) + + batch_res.append(attr_res) + result = {'output': batch_res} + return result + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device" + + main() diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_plate.py b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_plate.py new file mode 100644 index 0000000000000000000000000000000000000000..01f260e7f183d5efccec776e02e51f9c3e6fe37e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_plate.py @@ -0,0 +1,331 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import yaml +import glob +from functools import reduce + +import time +import cv2 +import numpy as np +import math +import paddle + +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 3))) +sys.path.insert(0, parent_path) + +from python.infer import get_test_images +from python.preprocess import preprocess, NormalizeImage, Permute, Resize_Mult32 +from pipeline.ppvehicle.vehicle_plateutils import create_predictor, get_infer_gpuid, get_rotate_crop_image, draw_boxes +from pipeline.ppvehicle.vehicleplate_postprocess import build_post_process +from pipeline.cfg_utils import merge_cfg, print_arguments, argsparser + + +class PlateDetector(object): + def __init__(self, args, cfg): + self.args = args + self.pre_process_list = { + 'Resize_Mult32': { + 'limit_side_len': cfg['det_limit_side_len'], + 'limit_type': cfg['det_limit_type'], + }, + 'NormalizeImage': { + 'mean': [0.485, 0.456, 0.406], + 'std': [0.229, 0.224, 0.225], + 'is_scale': True, + }, + 'Permute': {} + } + postprocess_params = {} + postprocess_params['name'] = 'DBPostProcess' + postprocess_params["thresh"] = 0.3 + postprocess_params["box_thresh"] = 0.6 + postprocess_params["max_candidates"] = 1000 + postprocess_params["unclip_ratio"] = 1.5 + postprocess_params["use_dilation"] = False + postprocess_params["score_mode"] = "fast" + + self.postprocess_op = build_post_process(postprocess_params) + self.predictor, self.input_tensor, self.output_tensors, self.config = create_predictor( + args, cfg, 'det') + + def preprocess(self, im_path): + preprocess_ops = [] + for op_type, new_op_info in self.pre_process_list.items(): + preprocess_ops.append(eval(op_type)(**new_op_info)) + + input_im_lst = [] + input_im_info_lst = [] + + im, im_info = preprocess(im_path, preprocess_ops) + input_im_lst.append(im) + input_im_info_lst.append(im_info['im_shape'] / im_info['scale_factor']) + + return np.stack(input_im_lst, axis=0), input_im_info_lst + + def order_points_clockwise(self, pts): + rect = np.zeros((4, 2), dtype="float32") + s = pts.sum(axis=1) + rect[0] = pts[np.argmin(s)] + rect[2] 
= pts[np.argmax(s)] + diff = np.diff(pts, axis=1) + rect[1] = pts[np.argmin(diff)] + rect[3] = pts[np.argmax(diff)] + return rect + + def clip_det_res(self, points, img_height, img_width): + for pno in range(points.shape[0]): + points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1)) + points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1)) + return points + + def filter_tag_det_res(self, dt_boxes, image_shape): + img_height, img_width = image_shape[0:2] + dt_boxes_new = [] + for box in dt_boxes: + box = self.order_points_clockwise(box) + box = self.clip_det_res(box, img_height, img_width) + rect_width = int(np.linalg.norm(box[0] - box[1])) + rect_height = int(np.linalg.norm(box[0] - box[3])) + if rect_width <= 3 or rect_height <= 3: + continue + dt_boxes_new.append(box) + dt_boxes = np.array(dt_boxes_new) + return dt_boxes + + def filter_tag_det_res_only_clip(self, dt_boxes, image_shape): + img_height, img_width = image_shape[0:2] + dt_boxes_new = [] + for box in dt_boxes: + box = self.clip_det_res(box, img_height, img_width) + dt_boxes_new.append(box) + dt_boxes = np.array(dt_boxes_new) + return dt_boxes + + def predict_image(self, img_list): + st = time.time() + + dt_batch_boxes = [] + for image in img_list: + img, shape_list = self.preprocess(image) + if img is None: + return None, 0 + self.input_tensor.copy_from_cpu(img) + self.predictor.run() + outputs = [] + for output_tensor in self.output_tensors: + output = output_tensor.copy_to_cpu() + outputs.append(output) + + preds = {} + preds['maps'] = outputs[0] + + #self.predictor.try_shrink_memory() + post_result = self.postprocess_op(preds, shape_list) + # print("post_result length:{}".format(len(post_result))) + + org_shape = image.shape + dt_boxes = post_result[0]['points'] + dt_boxes = self.filter_tag_det_res(dt_boxes, org_shape) + dt_batch_boxes.append(dt_boxes) + + et = time.time() + return dt_batch_boxes, et - st + + +class TextRecognizer(object): + def __init__(self, args, cfg, use_gpu=True): + self.rec_image_shape = cfg['rec_image_shape'] + self.rec_batch_num = cfg['rec_batch_num'] + word_dict_path = cfg['word_dict_path'] + use_space_char = True + + postprocess_params = { + 'name': 'CTCLabelDecode', + "character_dict_path": word_dict_path, + "use_space_char": use_space_char + } + self.postprocess_op = build_post_process(postprocess_params) + self.predictor, self.input_tensor, self.output_tensors, self.config = \ + create_predictor(args, cfg, 'rec') + self.use_onnx = False + + def resize_norm_img(self, img, max_wh_ratio): + imgC, imgH, imgW = self.rec_image_shape + + assert imgC == img.shape[2] + imgW = int((imgH * max_wh_ratio)) + if self.use_onnx: + w = self.input_tensor.shape[3:][0] + if w is not None and w > 0: + imgW = w + + h, w = img.shape[:2] + ratio = w / float(h) + if math.ceil(imgH * ratio) > imgW: + resized_w = imgW + else: + resized_w = int(math.ceil(imgH * ratio)) + resized_image = cv2.resize(img, (resized_w, imgH)) + resized_image = resized_image.astype('float32') + resized_image = resized_image.transpose((2, 0, 1)) / 255 + resized_image -= 0.5 + resized_image /= 0.5 + padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) + padding_im[:, :, 0:resized_w] = resized_image + return padding_im + + def predict_text(self, img_list): + img_num = len(img_list) + # Calculate the aspect ratio of all text bars + width_list = [] + for img in img_list: + width_list.append(img.shape[1] / float(img.shape[0])) + # Sorting can speed up the recognition process + indices = np.argsort(np.array(width_list)) + 
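+        # Sorting by aspect ratio groups crops of similar width into the same
+        # batch, so resize_norm_img wastes less padding when it pads every
+        # image up to the batch's maximum width/height ratio.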
rec_res = [['', 0.0]] * img_num + batch_num = self.rec_batch_num + st = time.time() + for beg_img_no in range(0, img_num, batch_num): + end_img_no = min(img_num, beg_img_no + batch_num) + norm_img_batch = [] + imgC, imgH, imgW = self.rec_image_shape + max_wh_ratio = imgW / imgH + # max_wh_ratio = 0 + for ino in range(beg_img_no, end_img_no): + h, w = img_list[indices[ino]].shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + for ino in range(beg_img_no, end_img_no): + norm_img = self.resize_norm_img(img_list[indices[ino]], + max_wh_ratio) + norm_img = norm_img[np.newaxis, :] + norm_img_batch.append(norm_img) + norm_img_batch = np.concatenate(norm_img_batch) + norm_img_batch = norm_img_batch.copy() + if self.use_onnx: + input_dict = {} + input_dict[self.input_tensor.name] = norm_img_batch + outputs = self.predictor.run(self.output_tensors, input_dict) + preds = outputs[0] + else: + self.input_tensor.copy_from_cpu(norm_img_batch) + self.predictor.run() + outputs = [] + for output_tensor in self.output_tensors: + output = output_tensor.copy_to_cpu() + outputs.append(output) + if len(outputs) != 1: + preds = outputs + else: + preds = outputs[0] + rec_result = self.postprocess_op(preds) + for rno in range(len(rec_result)): + rec_res[indices[beg_img_no + rno]] = rec_result[rno] + return rec_res, time.time() - st + + +class PlateRecognizer(object): + def __init__(self, args, cfg): + use_gpu = args.device.lower() == "gpu" + self.platedetector = PlateDetector(args, cfg) + self.textrecognizer = TextRecognizer(args, cfg, use_gpu=use_gpu) + + def get_platelicense(self, image_list): + plate_text_list = [] + plateboxes, det_time = self.platedetector.predict_image(image_list) + for idx, boxes_pcar in enumerate(plateboxes): + plate_pcar_list = [] + for box in boxes_pcar: + plate_images = get_rotate_crop_image(image_list[idx], box) + plate_texts = self.textrecognizer.predict_text([plate_images]) + plate_pcar_list.append(plate_texts) + plate_text_list.append(plate_pcar_list) + return self.check_plate(plate_text_list) + + def check_plate(self, text_list): + plate_all = {"plate": []} + for text_pcar in text_list: + platelicense = "" + for text_info in text_pcar: + text = text_info[0][0][0] + if len(text) > 2 and len(text) < 10: + platelicense = self.replace_cn_code(text) + plate_all["plate"].append(platelicense) + return plate_all + + def replace_cn_code(self, text): + simcode = { + '浙': 'ZJ-', + '粤': 'GD-', + '京': 'BJ-', + '津': 'TJ-', + '冀': 'HE-', + '晋': 'SX-', + '蒙': 'NM-', + '辽': 'LN-', + '黑': 'HLJ-', + '沪': 'SH-', + '吉': 'JL-', + '苏': 'JS-', + '皖': 'AH-', + '赣': 'JX-', + '鲁': 'SD-', + '豫': 'HA-', + '鄂': 'HB-', + '湘': 'HN-', + '桂': 'GX-', + '琼': 'HI-', + '渝': 'CQ-', + '川': 'SC-', + '贵': 'GZ-', + '云': 'YN-', + '藏': 'XZ-', + '陕': 'SN-', + '甘': 'GS-', + '青': 'QH-', + '宁': 'NX-', + '闽': 'FJ-', + '·': ' ' + } + for _char in text: + if _char in simcode: + text = text.replace(_char, simcode[_char]) + return text + + +def main(): + cfg = merge_cfg(FLAGS) + print_arguments(cfg) + vehicleplate_cfg = cfg['VEHICLE_PLATE'] + detector = PlateRecognizer(FLAGS, vehicleplate_cfg) + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + for img in img_list: + image = cv2.imread(img) + results = detector.get_platelicense([image]) + print(results) + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be 
CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_plateutils.py b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_plateutils.py new file mode 100644 index 0000000000000000000000000000000000000000..8a93c945cb95ea4b456b1be19572a717ca61150d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_plateutils.py @@ -0,0 +1,505 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +import os +import sys +import platform +import cv2 +import numpy as np +import paddle +from PIL import Image, ImageDraw, ImageFont +import math +from paddle import inference +import time +import ast + + +def create_predictor(args, cfg, mode): + if mode == "det": + model_dir = cfg['det_model_dir'] + else: + model_dir = cfg['rec_model_dir'] + + if model_dir is None: + print("not find {} model file path {}".format(mode, model_dir)) + sys.exit(0) + + model_file_path = model_dir + "/inference.pdmodel" + params_file_path = model_dir + "/inference.pdiparams" + if not os.path.exists(model_file_path): + raise ValueError("not find model file path {}".format(model_file_path)) + if not os.path.exists(params_file_path): + raise ValueError("not find params file path {}".format( + params_file_path)) + + config = inference.Config(model_file_path, params_file_path) + + batch_size = 1 + + if args.device == "GPU": + gpu_id = get_infer_gpuid() + if gpu_id is None: + print( + "GPU is not found in current device by nvidia-smi. Please check your device or ignore it if run on jetson." 
+ ) + config.enable_use_gpu(500, 0) + + precision_map = { + 'trt_int8': inference.PrecisionType.Int8, + 'trt_fp32': inference.PrecisionType.Float32, + 'trt_fp16': inference.PrecisionType.Half + } + min_subgraph_size = 15 + if args.run_mode in precision_map.keys(): + config.enable_tensorrt_engine( + workspace_size=(1 << 25) * batch_size, + max_batch_size=batch_size, + min_subgraph_size=min_subgraph_size, + precision_mode=precision_map[args.run_mode]) + use_dynamic_shape = True + + if mode == "det": + min_input_shape = { + "x": [1, 3, 50, 50], + "conv2d_92.tmp_0": [1, 120, 20, 20], + "conv2d_91.tmp_0": [1, 24, 10, 10], + "conv2d_59.tmp_0": [1, 96, 20, 20], + "nearest_interp_v2_1.tmp_0": [1, 256, 10, 10], + "nearest_interp_v2_2.tmp_0": [1, 256, 20, 20], + "conv2d_124.tmp_0": [1, 256, 20, 20], + "nearest_interp_v2_3.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_4.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_5.tmp_0": [1, 64, 20, 20], + "elementwise_add_7": [1, 56, 2, 2], + "nearest_interp_v2_0.tmp_0": [1, 256, 2, 2] + } + max_input_shape = { + "x": [1, 3, 1536, 1536], + "conv2d_92.tmp_0": [1, 120, 400, 400], + "conv2d_91.tmp_0": [1, 24, 200, 200], + "conv2d_59.tmp_0": [1, 96, 400, 400], + "nearest_interp_v2_1.tmp_0": [1, 256, 200, 200], + "conv2d_124.tmp_0": [1, 256, 400, 400], + "nearest_interp_v2_2.tmp_0": [1, 256, 400, 400], + "nearest_interp_v2_3.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_4.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_5.tmp_0": [1, 64, 400, 400], + "elementwise_add_7": [1, 56, 400, 400], + "nearest_interp_v2_0.tmp_0": [1, 256, 400, 400] + } + opt_input_shape = { + "x": [1, 3, 640, 640], + "conv2d_92.tmp_0": [1, 120, 160, 160], + "conv2d_91.tmp_0": [1, 24, 80, 80], + "conv2d_59.tmp_0": [1, 96, 160, 160], + "nearest_interp_v2_1.tmp_0": [1, 256, 80, 80], + "nearest_interp_v2_2.tmp_0": [1, 256, 160, 160], + "conv2d_124.tmp_0": [1, 256, 160, 160], + "nearest_interp_v2_3.tmp_0": [1, 64, 160, 160], + "nearest_interp_v2_4.tmp_0": [1, 64, 160, 160], + "nearest_interp_v2_5.tmp_0": [1, 64, 160, 160], + "elementwise_add_7": [1, 56, 40, 40], + "nearest_interp_v2_0.tmp_0": [1, 256, 40, 40] + } + min_pact_shape = { + "nearest_interp_v2_26.tmp_0": [1, 256, 20, 20], + "nearest_interp_v2_27.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_28.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_29.tmp_0": [1, 64, 20, 20] + } + max_pact_shape = { + "nearest_interp_v2_26.tmp_0": [1, 256, 400, 400], + "nearest_interp_v2_27.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_28.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_29.tmp_0": [1, 64, 400, 400] + } + opt_pact_shape = { + "nearest_interp_v2_26.tmp_0": [1, 256, 160, 160], + "nearest_interp_v2_27.tmp_0": [1, 64, 160, 160], + "nearest_interp_v2_28.tmp_0": [1, 64, 160, 160], + "nearest_interp_v2_29.tmp_0": [1, 64, 160, 160] + } + min_input_shape.update(min_pact_shape) + max_input_shape.update(max_pact_shape) + opt_input_shape.update(opt_pact_shape) + elif mode == "rec": + imgH = int(cfg['rec_image_shape'][-2]) + min_input_shape = {"x": [1, 3, imgH, 10]} + max_input_shape = {"x": [batch_size, 3, imgH, 2304]} + opt_input_shape = {"x": [batch_size, 3, imgH, 320]} + config.exp_disable_tensorrt_ops(["transpose2"]) + elif mode == "cls": + min_input_shape = {"x": [1, 3, 48, 10]} + max_input_shape = {"x": [batch_size, 3, 48, 1024]} + opt_input_shape = {"x": [batch_size, 3, 48, 320]} + else: + use_dynamic_shape = False + if use_dynamic_shape: + config.set_trt_dynamic_shape_info( + min_input_shape, max_input_shape, opt_input_shape) + + else: + 
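+        # CPU fallback: pin the math-library thread count and optionally
+        # enable MKLDNN, with a small shape cache to bound its memory use.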
config.disable_gpu() + if hasattr(args, "cpu_threads"): + config.set_cpu_math_library_num_threads(args.cpu_threads) + else: + # default cpu threads as 10 + config.set_cpu_math_library_num_threads(10) + if args.enable_mkldnn: + # cache 10 different shapes for mkldnn to avoid memory leak + config.set_mkldnn_cache_capacity(10) + config.enable_mkldnn() + if args.run_mode == "fp16": + config.enable_mkldnn_bfloat16() + # enable memory optim + config.enable_memory_optim() + config.disable_glog_info() + config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") + config.delete_pass("matmul_transpose_reshape_fuse_pass") + if mode == 'table': + config.delete_pass("fc_fuse_pass") # not supported for table + config.switch_use_feed_fetch_ops(False) + config.switch_ir_optim(True) + + # create predictor + predictor = inference.create_predictor(config) + input_names = predictor.get_input_names() + for name in input_names: + input_tensor = predictor.get_input_handle(name) + output_tensors = get_output_tensors(cfg, mode, predictor) + return predictor, input_tensor, output_tensors, config + + +def get_output_tensors(cfg, mode, predictor): + output_names = predictor.get_output_names() + output_tensors = [] + output_name = 'softmax_0.tmp_0' + if output_name in output_names: + return [predictor.get_output_handle(output_name)] + else: + for output_name in output_names: + output_tensor = predictor.get_output_handle(output_name) + output_tensors.append(output_tensor) + return output_tensors + + +def get_infer_gpuid(): + sysstr = platform.system() + if sysstr == "Windows": + return 0 + + if not paddle.fluid.core.is_compiled_with_rocm(): + cmd = "env | grep CUDA_VISIBLE_DEVICES" + else: + cmd = "env | grep HIP_VISIBLE_DEVICES" + env_cuda = os.popen(cmd).readlines() + if len(env_cuda) == 0: + return 0 + else: + gpu_id = env_cuda[0].strip().split("=")[1] + return int(gpu_id[0]) + + +def draw_e2e_res(dt_boxes, strs, img_path): + src_im = cv2.imread(img_path) + for box, str in zip(dt_boxes, strs): + box = box.astype(np.int32).reshape((-1, 1, 2)) + cv2.polylines(src_im, [box], True, color=(255, 255, 0), thickness=2) + cv2.putText( + src_im, + str, + org=(int(box[0, 0, 0]), int(box[0, 0, 1])), + fontFace=cv2.FONT_HERSHEY_COMPLEX, + fontScale=0.7, + color=(0, 255, 0), + thickness=1) + return src_im + + +def draw_text_det_res(dt_boxes, img_path): + src_im = cv2.imread(img_path) + for box in dt_boxes: + box = np.array(box).astype(np.int32).reshape(-1, 2) + cv2.polylines(src_im, [box], True, color=(255, 255, 0), thickness=2) + return src_im + + +def resize_img(img, input_size=600): + """ + resize img and limit the longest side of the image to input_size + """ + img = np.array(img) + im_shape = img.shape + im_size_max = np.max(im_shape[0:2]) + im_scale = float(input_size) / float(im_size_max) + img = cv2.resize(img, None, None, fx=im_scale, fy=im_scale) + return img + + +def draw_ocr(image, + boxes, + txts=None, + scores=None, + drop_score=0.5, + font_path="./doc/fonts/simfang.ttf"): + """ + Visualize the results of OCR detection and recognition + args: + image(Image|array): RGB image + boxes(list): boxes with shape(N, 4, 2) + txts(list): the texts + scores(list): txxs corresponding scores + drop_score(float): only scores greater than drop_threshold will be visualized + font_path: the path of font which is used to draw text + return(array): + the visualized img + """ + if scores is None: + scores = [1] * len(boxes) + box_num = len(boxes) + for i in range(box_num): + if scores is not None and (scores[i] < drop_score or + 
math.isnan(scores[i])): + continue + box = np.reshape(np.array(boxes[i]), [-1, 1, 2]).astype(np.int64) + image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) + if txts is not None: + img = np.array(resize_img(image, input_size=600)) + txt_img = text_visual( + txts, + scores, + img_h=img.shape[0], + img_w=600, + threshold=drop_score, + font_path=font_path) + img = np.concatenate([np.array(img), np.array(txt_img)], axis=1) + return img + return image + + +def draw_ocr_box_txt(image, + boxes, + txts, + scores=None, + drop_score=0.5, + font_path="./doc/simfang.ttf"): + h, w = image.height, image.width + img_left = image.copy() + img_right = Image.new('RGB', (w, h), (255, 255, 255)) + + import random + + random.seed(0) + draw_left = ImageDraw.Draw(img_left) + draw_right = ImageDraw.Draw(img_right) + for idx, (box, txt) in enumerate(zip(boxes, txts)): + if scores is not None and scores[idx] < drop_score: + continue + color = (random.randint(0, 255), random.randint(0, 255), + random.randint(0, 255)) + draw_left.polygon(box, fill=color) + draw_right.polygon( + [ + box[0][0], box[0][1], box[1][0], box[1][1], box[2][0], + box[2][1], box[3][0], box[3][1] + ], + outline=color) + box_height = math.sqrt((box[0][0] - box[3][0])**2 + (box[0][1] - box[3][ + 1])**2) + box_width = math.sqrt((box[0][0] - box[1][0])**2 + (box[0][1] - box[1][ + 1])**2) + if box_height > 2 * box_width: + font_size = max(int(box_width * 0.9), 10) + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") + cur_y = box[0][1] + for c in txt: + char_size = font.getsize(c) + draw_right.text( + (box[0][0] + 3, cur_y), c, fill=(0, 0, 0), font=font) + cur_y += char_size[1] + else: + font_size = max(int(box_height * 0.8), 10) + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") + draw_right.text( + [box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font) + img_left = Image.blend(image, img_left, 0.5) + img_show = Image.new('RGB', (w * 2, h), (255, 255, 255)) + img_show.paste(img_left, (0, 0, w, h)) + img_show.paste(img_right, (w, 0, w * 2, h)) + return np.array(img_show) + + +def str_count(s): + """ + Count the number of Chinese characters, + a single English character and a single number + equal to half the length of Chinese characters. 
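+    The result approximates the display width measured in full-width (CJK)
+    character cells; text_visual uses it to decide where to wrap a line.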
+ args: + s(string): the input of string + return(int): + the number of Chinese characters + """ + import string + count_zh = count_pu = 0 + s_len = len(s) + en_dg_count = 0 + for c in s: + if c in string.ascii_letters or c.isdigit() or c.isspace(): + en_dg_count += 1 + elif c.isalpha(): + count_zh += 1 + else: + count_pu += 1 + return s_len - math.ceil(en_dg_count / 2) + + +def text_visual(texts, + scores, + img_h=400, + img_w=600, + threshold=0., + font_path="./doc/simfang.ttf"): + """ + create new blank img and draw txt on it + args: + texts(list): the text will be draw + scores(list|None): corresponding score of each txt + img_h(int): the height of blank img + img_w(int): the width of blank img + font_path: the path of font which is used to draw text + return(array): + """ + if scores is not None: + assert len(texts) == len( + scores), "The number of txts and corresponding scores must match" + + def create_blank_img(): + blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255 + blank_img[:, img_w - 1:] = 0 + blank_img = Image.fromarray(blank_img).convert("RGB") + draw_txt = ImageDraw.Draw(blank_img) + return blank_img, draw_txt + + blank_img, draw_txt = create_blank_img() + + font_size = 20 + txt_color = (0, 0, 0) + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") + + gap = font_size + 5 + txt_img_list = [] + count, index = 1, 0 + for idx, txt in enumerate(texts): + index += 1 + if scores[idx] < threshold or math.isnan(scores[idx]): + index -= 1 + continue + first_line = True + while str_count(txt) >= img_w // font_size - 4: + tmp = txt + txt = tmp[:img_w // font_size - 4] + if first_line: + new_txt = str(index) + ': ' + txt + first_line = False + else: + new_txt = ' ' + txt + draw_txt.text((0, gap * count), new_txt, txt_color, font=font) + txt = tmp[img_w // font_size - 4:] + if count >= img_h // gap - 1: + txt_img_list.append(np.array(blank_img)) + blank_img, draw_txt = create_blank_img() + count = 0 + count += 1 + if first_line: + new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx]) + else: + new_txt = " " + txt + " " + '%.3f' % (scores[idx]) + draw_txt.text((0, gap * count), new_txt, txt_color, font=font) + # whether add new blank img or not + if count >= img_h // gap - 1 and idx + 1 < len(texts): + txt_img_list.append(np.array(blank_img)) + blank_img, draw_txt = create_blank_img() + count = 0 + count += 1 + txt_img_list.append(np.array(blank_img)) + if len(txt_img_list) == 1: + blank_img = np.array(txt_img_list[0]) + else: + blank_img = np.concatenate(txt_img_list, axis=1) + return np.array(blank_img) + + +def base64_to_cv2(b64str): + import base64 + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +def draw_boxes(image, boxes, scores=None, drop_score=0.5): + if scores is None: + scores = [1] * len(boxes) + for (box, score) in zip(boxes, scores): + if score < drop_score: + continue + box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) + image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) + return image + + +def get_rotate_crop_image(img, points): + ''' + img_height, img_width = img.shape[0:2] + left = int(np.min(points[:, 0])) + right = int(np.max(points[:, 0])) + top = int(np.min(points[:, 1])) + bottom = int(np.max(points[:, 1])) + img_crop = img[top:bottom, left:right, :].copy() + points[:, 0] = points[:, 0] - left + points[:, 1] = points[:, 1] - top + ''' + assert len(points) == 4, "shape of points must be 4*2" + 
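+    # Estimate the target size from the quad's opposite edges, then warp the
+    # four (possibly rotated) corners onto an axis-aligned rectangle.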
img_crop_width = int( + max( + np.linalg.norm(points[0] - points[1]), + np.linalg.norm(points[2] - points[3]))) + img_crop_height = int( + max( + np.linalg.norm(points[0] - points[3]), + np.linalg.norm(points[1] - points[2]))) + pts_std = np.float32([[0, 0], [img_crop_width, 0], + [img_crop_width, img_crop_height], + [0, img_crop_height]]) + M = cv2.getPerspectiveTransform(points, pts_std) + dst_img = cv2.warpPerspective( + img, + M, (img_crop_width, img_crop_height), + borderMode=cv2.BORDER_REPLICATE, + flags=cv2.INTER_CUBIC) + dst_img_height, dst_img_width = dst_img.shape[0:2] + if dst_img_height * 1.0 / dst_img_width >= 1.5: + dst_img = np.rot90(dst_img) + return dst_img + + +def check_gpu(use_gpu): + if use_gpu and not paddle.is_compiled_with_cuda(): + use_gpu = False + return use_gpu + + +if __name__ == '__main__': + pass diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_pressing.py b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_pressing.py new file mode 100644 index 0000000000000000000000000000000000000000..77f7d4ca71dbe2a2ac529bf33ca47554450c6fff --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_pressing.py @@ -0,0 +1,81 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
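The `judge` method below is a standard two-segment intersection test: a
bounding-box overlap check followed by two cross-product straddle tests. A
minimal standalone sketch of the same idea (names are illustrative, not from
this file):

    def segments_intersect(a1, a2, b1, b2):
        """Return True if segment a1-a2 intersects segment b1-b2."""

        def cross(o, p, q):
            # z-component of (p - o) x (q - o)
            return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

        # Quick rejection: the axis-aligned bounding boxes must overlap.
        if (max(a1[0], a2[0]) < min(b1[0], b2[0]) or
                max(b1[0], b2[0]) < min(a1[0], a2[0]) or
                max(a1[1], a2[1]) < min(b1[1], b2[1]) or
                max(b1[1], b2[1]) < min(a1[1], a2[1])):
            return False
        # Each segment's endpoints must straddle (or touch) the other segment.
        return (cross(a1, a2, b1) * cross(a1, a2, b2) <= 0 and
                cross(b1, b2, a1) * cross(b1, b2, a2) <= 0)

Lane "pressing" then reduces to testing each lane segment against the bottom
edge of every vehicle box, which is what `is_intersect` does.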
+ +import os + +import numpy as np +import math + + +class VehiclePressingRecognizer(object): + def __init__(self, cfg): + self.cfg = cfg + + def judge(self, Ax1, Ay1, Ax2, Ay2, Bx1, By1, Bx2, By2): + + if (max(Ax1,Ax2)>=min(Bx1,Bx2) and min(Ax1,Ax2)<=max(Bx1,Bx2)) and \ + (max(Ay1,Ay2)>=min(By1,By2) and min(Ay1,Ay2)<=max(By1,By2)): + + if ((Bx1-Ax1)*(Ay2-Ay1)-(By1-Ay1)*(Ax2-Ax1)) * ((Bx2-Ax1)*(Ay2-Ay1)-(By2-Ay1)*(Ax2-Ax1))<=0 \ + and ((Ax1-Bx1)*(By2-By1)-(Ay1-By1)*(Bx2-Bx1)) * ((Ax2-Bx1)*(By2-By1)-(Ay2-By1)*(Bx2-Bx1)) <=0: + return True + else: + return False + else: + return False + + def is_intersect(self, line, bbox): + Ax1, Ay1, Ax2, Ay2 = line + + xmin, ymin, xmax, ymax = bbox + + bottom = self.judge(Ax1, Ay1, Ax2, Ay2, xmin, ymax, xmax, ymax) + return bottom + + def run(self, lanes, det_res): + intersect_bbox_list = [] + start_idx, boxes_num_i = 0, 0 + + for i in range(len(lanes)): + lane = lanes[i] + if det_res is not None: + det_res_i = {} + boxes_num_i = det_res['boxes_num'][i] + det_res_i['boxes'] = det_res['boxes'][start_idx:start_idx + + boxes_num_i, :] + intersect_bbox = [] + + for line in lane: + for bbox in det_res_i['boxes']: + if self.is_intersect(line, bbox[2:]): + intersect_bbox.append(bbox) + intersect_bbox_list.append(intersect_bbox) + + start_idx += boxes_num_i + + return intersect_bbox_list + + def mot_run(self, lanes, det_res): + + intersect_bbox_list = [] + if det_res is None: + return intersect_bbox_list + lanes_res = lanes['output'] + for i in range(len(lanes_res)): + lane = lanes_res[i] + for line in lane: + for bbox in det_res: + if self.is_intersect(line, bbox[3:]): + intersect_bbox_list.append(bbox) + return intersect_bbox_list \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_retrograde.py b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_retrograde.py new file mode 100644 index 0000000000000000000000000000000000000000..f9c32e321fca632940bab14123cddb9371019bf5 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicle_retrograde.py @@ -0,0 +1,320 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import numpy as np +import math + + +class VehicleRetrogradeRecognizer(object): + def __init__(self, cfg): + self.cfg = cfg + self.filter_horizontal_flag = self.cfg['filter_horizontal_flag'] + self.deviation = self.cfg['deviation'] + self.move_scale = self.cfg['move_scale'] + self.keep_right_flag = self.cfg['keep_right_flag'] + self.center_traj_retrograde = [{}] #retrograde recognizer record use + self.fence_line = None if len(self.cfg[ + 'fence_line']) == 0 else self.cfg['fence_line'] + + def update_center_traj(self, mot_res, max_len): + from collections import deque, defaultdict + if mot_res is not None: + ids = mot_res['boxes'][:, 0] + scores = mot_res['boxes'][:, 2] + boxes = mot_res['boxes'][:, 3:] + boxes[:, 2] = boxes[:, 2] - boxes[:, 0] + boxes[:, 3] = boxes[:, 3] - boxes[:, 1] + else: + boxes = np.zeros([0, 4]) + ids = np.zeros([0]) + scores = np.zeros([0]) + + # single class, still need to be defaultdict type for ploting + num_classes = 1 + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + online_tlwhs[0] = boxes + online_ids[0] = ids + + if mot_res is not None: + for cls_id in range(num_classes): + tlwhs = online_tlwhs[cls_id] + obj_ids = online_ids[cls_id] + for i, tlwh in enumerate(tlwhs): + x1, y1, w, h = tlwh + center = tuple(map(int, (x1 + w / 2., y1 + h))) + obj_id = int(obj_ids[i]) + if self.center_traj_retrograde is not None: + if obj_id not in self.center_traj_retrograde[cls_id]: + self.center_traj_retrograde[cls_id][obj_id] = deque( + maxlen=max_len) + self.center_traj_retrograde[cls_id][obj_id].append( + center) + + def get_angle(self, array): + + x1, y1, x2, y2 = array + a_x = x2 - x1 + a_y = y2 - y1 + angle1 = math.atan2(a_y, a_x) + angle1 = int(angle1 * 180 / math.pi) + + a_x = x2 - x1 if y2 >= y1 else x1 - x2 + a_y = y2 - y1 if y2 >= y1 else y1 - y2 + angle2 = math.atan2(a_y, a_x) + angle2 = int(angle2 * 180 / math.pi) + if angle2 > 90: + angle2 = 180 - angle2 + + return angle1, angle2 + + def is_move(self, array, frame_shape): + x1, y1, x2, y2 = array + h, w, _ = frame_shape + + if abs(x1 - x2) > w * self.move_scale or abs(y1 - + y2) > h * self.move_scale: + return True + else: + return False + + def get_distance_point2line(self, point, line): + + line_point1, line_point2 = np.array(line[0:2]), np.array(line[2:]) + vec1 = line_point1 - point + vec2 = line_point2 - point + distance = np.abs(np.cross(vec1, vec2)) / np.linalg.norm(line_point1 - + line_point2) + + return distance + + def driving_direction(self, line1, line2, is_init=False): + x1, y1 = line1[2] - line1[0], line1[3] - line1[1] + x2, y2 = line2[0] - line1[0], line2[1] - line1[1] + result = x1 * y2 - x2 * y1 + + distance = self.get_distance_point2line([x2, y2], line1) + + if result < 0: + result = 1 + elif result == 0: + if line2[3] >= line2[1]: + return -1 + else: + return 1 + else: + result = -1 + + return result, distance + + def get_long_fence_line(self, h, w, line): + + x1, y1, x2, y2 = line + if x1 == x2: + return [x1, 0, x1, h] + if y1 == y2: + return [0, y1, w, y1] + k = (y2 - y1) / (x2 - x1) + b = y1 - k * x1 + + if k == 1 and b == 0: + return [0, 0, w, h] + if k == -1 and b == 0: + return [w, 0, h, h] + + top = [-b / k, 0] + left = [0, b] + right = [w, k * w + b] + bottom = [(h - b) / k, h] + candidate = np.array([top, left, right, bottom]) + + flag = np.array([0, 0, 0, 0]) + + if top[0] >= 0 and top[0] <= w: + flag[0] = 1 + if left[1] > 0 and left[1] <= h: + flag[1] = 1 + if right[1] > 0 and right[1] <= h: + flag[2] = 1 + if bottom[0] > 
0 and bottom[0] < w: + flag[3] = 1 + + ind = np.where(flag == 1) + candidate = candidate[ind] + candidate_sort = candidate[candidate[:, 1].argsort()] + + return [ + int(candidate_sort[0][0]), int(candidate_sort[0][1]), + int(candidate_sort[1][0]), int(candidate_sort[1][1]) + ] + + def init_fence_line(self, lanes, pos_dir_traj, neg_dir_traj, frame_shape): + + fence_lines_candidate = None + h, w, _ = frame_shape + abs_distance = h * h + w * w + + for lane in lanes[0]: + pos_dir_distansce = h * h + w * w + neg_dir_distansce = h * h + w * w + pos_dir = 0 + neg_dir = 0 + + for traj_line in pos_dir_traj: + dir_result, distansce = self.driving_direction( + lane, traj_line['traj_line']) + if dir_result > 0: + pos_dir_distansce = distansce if distansce < pos_dir_distansce else pos_dir_distansce + pos_dir = 1 + else: + neg_dir_distansce = distansce if distansce < neg_dir_distansce else neg_dir_distansce + neg_dir = 1 + + if pos_dir > 0 and neg_dir > 0: + continue + + for traj_line in neg_dir_traj: + + dir_result, distansce = self.driving_direction( + lane, traj_line['traj_line']) + + if dir_result > 0: + pos_dir_distansce = distansce if distansce < pos_dir_distansce else pos_dir_distansce + pos_dir = 1 + else: + neg_dir_distansce = distansce if distansce < neg_dir_distansce else neg_dir_distansce + neg_dir = 1 + + if pos_dir > 0 and neg_dir > 0: + diff_dir_distance = abs(pos_dir_distansce - neg_dir_distansce) + if diff_dir_distance < abs_distance: + fence_lines_candidate = lane + abs_distance = diff_dir_distance + + if fence_lines_candidate is None: + return None + + fence_lines_candidate = self.get_long_fence_line(h, w, + fence_lines_candidate) + + return fence_lines_candidate + + def judge_retrograde(self, traj_line): + + line1 = self.fence_line + x1, y1 = line1[2] - line1[0], line1[3] - line1[1] + + line2 = traj_line['traj_line'] + x2_start_point, y2_start_point = line2[0] - line1[0], line2[1] - line1[ + 1] + x2_end_point, y2_end_point = line2[2] - line1[0], line2[3] - line1[1] + + start_point_dir = x1 * y2_start_point - x2_start_point * y1 + end_point_dir = x1 * y2_end_point - x2_end_point * y1 + + if start_point_dir < 0: + start_point_dir = 1 + + elif start_point_dir == 0: + if line2[3] >= line2[1]: + start_point_dir = -1 + else: + start_point_dir = 1 + else: + start_point_dir = -1 + + if end_point_dir < 0: + end_point_dir = 1 + + elif end_point_dir == 0: + if line2[3] >= line2[1]: + end_point_dir = -1 + else: + end_point_dir = 1 + else: + end_point_dir = -1 + + if self.keep_right_flag: + driver_dir = -1 if (line2[3] - line2[1]) >= 0 else 1 + else: + driver_dir = -1 if (line2[3] - line2[1]) <= 0 else 1 + + return start_point_dir == driver_dir and start_point_dir == end_point_dir + + def mot_run(self, lanes_res, det_res, frame_shape): + + det = det_res['boxes'] + directions = lanes_res['directions'] + lanes = lanes_res['output'] + if len(directions) > 0: + direction = directions[0] + else: + return [], self.fence_line + + if len(det) == 0: + return [], self.fence_line + + traj_lines = [] + pos_dir_traj = [] + neg_dir_traj = [] + for i in range(len(det)): + class_id = int(det[i][1]) + mot_id = int(det[i][0]) + traj_i = self.center_traj_retrograde[class_id][mot_id] + if len(traj_i) < 2: + continue + + traj_line = { + 'index': i, + 'mot_id': mot_id, + 'traj_line': + [traj_i[0][0], traj_i[0][1], traj_i[-1][0], traj_i[-1][1]] + } + + if not self.is_move(traj_line['traj_line'], frame_shape): + continue + angle, angle_deviation = self.get_angle(traj_line['traj_line']) + if direction is not None and 
self.filter_horizontal_flag: + if abs(angle_deviation - direction) > self.deviation: + continue + + traj_line['angle'] = angle + traj_lines.append(traj_line) + + if self.fence_line is None: + if angle >= 0: + pos_dir_traj.append(traj_line) + else: + neg_dir_traj.append(traj_line) + + if len(traj_lines) == 0: + return [], self.fence_line + + if self.fence_line is None: + + if len(pos_dir_traj) < 1 or len(neg_dir_traj) < 1: + return [], None + + self.fence_line = self.init_fence_line(lanes, pos_dir_traj, + neg_dir_traj, frame_shape) + return [], self.fence_line + + else: + retrograde_list = [] + for traj_line in traj_lines: + if self.judge_retrograde(traj_line) == False: + retrograde_list.append(det[traj_line['index']][0]) + + return retrograde_list, self.fence_line diff --git a/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicleplate_postprocess.py b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicleplate_postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..66a00a3410340995a058368abfc333a35f454b66 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/ppvehicle/vehicleplate_postprocess.py @@ -0,0 +1,296 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import paddle +from paddle.nn import functional as F +import re +from shapely.geometry import Polygon +import cv2 +import copy + + +def build_post_process(config, global_config=None): + support_dict = ['DBPostProcess', 'CTCLabelDecode'] + + config = copy.deepcopy(config) + module_name = config.pop('name') + if module_name == "None": + return + if global_config is not None: + config.update(global_config) + assert module_name in support_dict, Exception( + 'post process only support {}'.format(support_dict)) + module_class = eval(module_name)(**config) + return module_class + + +class DBPostProcess(object): + """ + The post process for Differentiable Binarization (DB). 
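+    Thresholds the probability map, extracts contours, scores each candidate
+    box, expands ("unclips") it back to full text size with pyclipper, and
+    rescales the boxes to the source image resolution.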
+ """ + + def __init__(self, + thresh=0.3, + box_thresh=0.7, + max_candidates=1000, + unclip_ratio=2.0, + use_dilation=False, + score_mode="fast", + **kwargs): + self.thresh = thresh + self.box_thresh = box_thresh + self.max_candidates = max_candidates + self.unclip_ratio = unclip_ratio + self.min_size = 3 + self.score_mode = score_mode + assert score_mode in [ + "slow", "fast" + ], "Score mode must be in [slow, fast] but got: {}".format(score_mode) + + self.dilation_kernel = None if not use_dilation else np.array( + [[1, 1], [1, 1]]) + + def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height): + ''' + _bitmap: single map with shape (1, H, W), + whose values are binarized as {0, 1} + ''' + + bitmap = _bitmap + height, width = bitmap.shape + + outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, + cv2.CHAIN_APPROX_SIMPLE) + if len(outs) == 3: + img, contours, _ = outs[0], outs[1], outs[2] + elif len(outs) == 2: + contours, _ = outs[0], outs[1] + + num_contours = min(len(contours), self.max_candidates) + + boxes = [] + scores = [] + for index in range(num_contours): + contour = contours[index] + points, sside = self.get_mini_boxes(contour) + if sside < self.min_size: + continue + points = np.array(points) + if self.score_mode == "fast": + score = self.box_score_fast(pred, points.reshape(-1, 2)) + else: + score = self.box_score_slow(pred, contour) + if self.box_thresh > score: + continue + + box = self.unclip(points).reshape(-1, 1, 2) + box, sside = self.get_mini_boxes(box) + if sside < self.min_size + 2: + continue + box = np.array(box) + + box[:, 0] = np.clip( + np.round(box[:, 0] / width * dest_width), 0, dest_width) + box[:, 1] = np.clip( + np.round(box[:, 1] / height * dest_height), 0, dest_height) + boxes.append(box.astype(np.int16)) + scores.append(score) + return np.array(boxes, dtype=np.int16), scores + + def unclip(self, box): + try: + import pyclipper + except Exception as e: + raise RuntimeError( + 'Unable to use vehicleplate postprocess in PP-Vehicle, please install pyclipper, for example: `pip install pyclipper`, see https://github.com/fonttools/pyclipper' + ) + unclip_ratio = self.unclip_ratio + poly = Polygon(box) + distance = poly.area * unclip_ratio / poly.length + offset = pyclipper.PyclipperOffset() + offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON) + expanded = np.array(offset.Execute(distance)) + return expanded + + def get_mini_boxes(self, contour): + bounding_box = cv2.minAreaRect(contour) + points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0]) + + index_1, index_2, index_3, index_4 = 0, 1, 2, 3 + if points[1][1] > points[0][1]: + index_1 = 0 + index_4 = 1 + else: + index_1 = 1 + index_4 = 0 + if points[3][1] > points[2][1]: + index_2 = 2 + index_3 = 3 + else: + index_2 = 3 + index_3 = 2 + + box = [ + points[index_1], points[index_2], points[index_3], points[index_4] + ] + return box, min(bounding_box[1]) + + def box_score_fast(self, bitmap, _box): + ''' + box_score_fast: use bbox mean score as the mean score + ''' + h, w = bitmap.shape[:2] + box = _box.copy() + xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1) + xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1) + ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1) + ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1) + + mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8) + box[:, 0] = box[:, 0] - xmin + box[:, 1] = box[:, 1] - ymin + cv2.fillPoly(mask, box.reshape(1, -1, 
2).astype(np.int32), 1) + return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0] + + def box_score_slow(self, bitmap, contour): + ''' + box_score_slow: use polyon mean score as the mean score + ''' + h, w = bitmap.shape[:2] + contour = contour.copy() + contour = np.reshape(contour, (-1, 2)) + + xmin = np.clip(np.min(contour[:, 0]), 0, w - 1) + xmax = np.clip(np.max(contour[:, 0]), 0, w - 1) + ymin = np.clip(np.min(contour[:, 1]), 0, h - 1) + ymax = np.clip(np.max(contour[:, 1]), 0, h - 1) + + mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8) + + contour[:, 0] = contour[:, 0] - xmin + contour[:, 1] = contour[:, 1] - ymin + + cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1) + return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0] + + def __call__(self, outs_dict, shape_list): + pred = outs_dict['maps'] + if isinstance(pred, paddle.Tensor): + pred = pred.numpy() + pred = pred[:, 0, :, :] + segmentation = pred > self.thresh + + boxes_batch = [] + for batch_index in range(pred.shape[0]): + src_h, src_w = shape_list[batch_index] + if self.dilation_kernel is not None: + mask = cv2.dilate( + np.array(segmentation[batch_index]).astype(np.uint8), + self.dilation_kernel) + else: + mask = segmentation[batch_index] + boxes, scores = self.boxes_from_bitmap(pred[batch_index], mask, + src_w, src_h) + + boxes_batch.append({'points': boxes}) + return boxes_batch + + +class BaseRecLabelDecode(object): + """ Convert between text-label and text-index """ + + def __init__(self, character_dict_path=None, use_space_char=False): + self.beg_str = "sos" + self.end_str = "eos" + + self.character_str = [] + if character_dict_path is None: + self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" + dict_character = list(self.character_str) + else: + with open(character_dict_path, "rb") as fin: + lines = fin.readlines() + for line in lines: + line = line.decode('utf-8').strip("\n").strip("\r\n") + self.character_str.append(line) + if use_space_char: + self.character_str.append(" ") + dict_character = list(self.character_str) + + dict_character = self.add_special_char(dict_character) + self.dict = {} + for i, char in enumerate(dict_character): + self.dict[char] = i + self.character = dict_character + + def add_special_char(self, dict_character): + return dict_character + + def decode(self, text_index, text_prob=None, is_remove_duplicate=False): + """ convert text-index into text-label. 
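+        With is_remove_duplicate=True this is greedy CTC decoding: collapse
+        consecutive repeats, then drop the blank token (index 0).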
""" + result_list = [] + ignored_tokens = self.get_ignored_tokens() + batch_size = len(text_index) + for batch_idx in range(batch_size): + selection = np.ones(len(text_index[batch_idx]), dtype=bool) + if is_remove_duplicate: + selection[1:] = text_index[batch_idx][1:] != text_index[ + batch_idx][:-1] + for ignored_token in ignored_tokens: + selection &= text_index[batch_idx] != ignored_token + + char_list = [ + self.character[text_id] + for text_id in text_index[batch_idx][selection] + ] + if text_prob is not None: + conf_list = text_prob[batch_idx][selection] + else: + conf_list = [1] * len(selection) + if len(conf_list) == 0: + conf_list = [0] + + text = ''.join(char_list) + result_list.append((text, np.mean(conf_list).tolist())) + return result_list + + def get_ignored_tokens(self): + return [0] # for ctc blank + + +class CTCLabelDecode(BaseRecLabelDecode): + """ Convert between text-label and text-index """ + + def __init__(self, character_dict_path=None, use_space_char=False, + **kwargs): + super(CTCLabelDecode, self).__init__(character_dict_path, + use_space_char) + + def __call__(self, preds, label=None, *args, **kwargs): + if isinstance(preds, tuple) or isinstance(preds, list): + preds = preds[-1] + if isinstance(preds, paddle.Tensor): + preds = preds.numpy() + preds_idx = preds.argmax(axis=2) + preds_prob = preds.max(axis=2) + text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True) + if label is None: + return text + label = self.decode(label) + return text, label + + def add_special_char(self, dict_character): + dict_character = ['blank'] + dict_character + return dict_character diff --git a/PaddleDetection-release-2.6/deploy/pipeline/tools/ccpd2ocr_all.py b/PaddleDetection-release-2.6/deploy/pipeline/tools/ccpd2ocr_all.py new file mode 100644 index 0000000000000000000000000000000000000000..2f0d168e4e2eb2680fddce030b02fbbcc07f62d8 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/tools/ccpd2ocr_all.py @@ -0,0 +1,167 @@ +import cv2 +import os +import json +from tqdm import tqdm +import numpy as np + +provinces = [ + "皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", + "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", + "新", "警", "学", "O" +] +alphabets = [ + 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', + 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'O' +] +ads = [ + 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', + 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', + '6', '7', '8', '9', 'O' +] + + +def make_label_2020(img_dir, save_gt_folder, phase): + crop_img_save_dir = os.path.join(save_gt_folder, phase, 'crop_imgs') + os.makedirs(crop_img_save_dir, exist_ok=True) + + f_det = open( + os.path.join(save_gt_folder, phase, 'det.txt'), 'w', encoding='utf-8') + f_rec = open( + os.path.join(save_gt_folder, phase, 'rec.txt'), 'w', encoding='utf-8') + + i = 0 + for filename in tqdm(os.listdir(os.path.join(img_dir, phase))): + str_list = filename.split('-') + if len(str_list) < 5: + continue + coord_list = str_list[3].split('_') + txt_list = str_list[4].split('_') + boxes = [] + for coord in coord_list: + boxes.append([int(x) for x in coord.split("&")]) + boxes = [boxes[2], boxes[3], boxes[0], boxes[1]] + lp_number = provinces[int(txt_list[0])] + alphabets[int(txt_list[ + 1])] + ''.join([ads[int(x)] for x in txt_list[2:]]) + + # det + det_info = [{'points': boxes, 'transcription': lp_number}] + f_det.write('{}\t{}\n'.format( + 
os.path.join("CCPD2020/ccpd_green", phase, filename), + json.dumps( + det_info, ensure_ascii=False))) + + # rec + boxes = np.float32(boxes) + img = cv2.imread(os.path.join(img_dir, phase, filename)) + # crop_img = img[int(boxes[:,1].min()):int(boxes[:,1].max()),int(boxes[:,0].min()):int(boxes[:,0].max())] + crop_img = get_rotate_crop_image(img, boxes) + crop_img_save_filename = '{}_{}.jpg'.format(i, '_'.join(txt_list)) + crop_img_save_path = os.path.join(crop_img_save_dir, + crop_img_save_filename) + cv2.imwrite(crop_img_save_path, crop_img) + f_rec.write('{}/{}/crop_imgs/{}\t{}\n'.format( + "CCPD2020/PPOCR", phase, crop_img_save_filename, lp_number)) + i += 1 + f_det.close() + f_rec.close() + + +def make_label_2019(list_dir, save_gt_folder, phase): + crop_img_save_dir = os.path.join(save_gt_folder, phase, 'crop_imgs') + os.makedirs(crop_img_save_dir, exist_ok=True) + + f_det = open( + os.path.join(save_gt_folder, phase, 'det.txt'), 'w', encoding='utf-8') + f_rec = open( + os.path.join(save_gt_folder, phase, 'rec.txt'), 'w', encoding='utf-8') + + with open(os.path.join(list_dir, phase + ".txt"), 'r') as rf: + imglist = rf.readlines() + + i = 0 + for idx, filename in enumerate(imglist): + if idx % 1000 == 0: + print("{}/{}".format(idx, len(imglist))) + filename = filename.strip() + str_list = filename.split('-') + if len(str_list) < 5: + continue + coord_list = str_list[3].split('_') + txt_list = str_list[4].split('_') + boxes = [] + for coord in coord_list: + boxes.append([int(x) for x in coord.split("&")]) + boxes = [boxes[2], boxes[3], boxes[0], boxes[1]] + lp_number = provinces[int(txt_list[0])] + alphabets[int(txt_list[ + 1])] + ''.join([ads[int(x)] for x in txt_list[2:]]) + + # det + det_info = [{'points': boxes, 'transcription': lp_number}] + f_det.write('{}\t{}\n'.format( + os.path.join("CCPD2019", filename), + json.dumps( + det_info, ensure_ascii=False))) + + # rec + boxes = np.float32(boxes) + imgpath = os.path.join(list_dir[:-7], filename) + img = cv2.imread(imgpath) + # crop_img = img[int(boxes[:,1].min()):int(boxes[:,1].max()),int(boxes[:,0].min()):int(boxes[:,0].max())] + crop_img = get_rotate_crop_image(img, boxes) + crop_img_save_filename = '{}_{}.jpg'.format(i, '_'.join(txt_list)) + crop_img_save_path = os.path.join(crop_img_save_dir, + crop_img_save_filename) + cv2.imwrite(crop_img_save_path, crop_img) + f_rec.write('{}/{}/crop_imgs/{}\t{}\n'.format( + "CCPD2019/PPOCR", phase, crop_img_save_filename, lp_number)) + i += 1 + f_det.close() + f_rec.close() + + +def get_rotate_crop_image(img, points): + ''' + img_height, img_width = img.shape[0:2] + left = int(np.min(points[:, 0])) + right = int(np.max(points[:, 0])) + top = int(np.min(points[:, 1])) + bottom = int(np.max(points[:, 1])) + img_crop = img[top:bottom, left:right, :].copy() + points[:, 0] = points[:, 0] - left + points[:, 1] = points[:, 1] - top + ''' + assert len(points) == 4, "shape of points must be 4*2" + img_crop_width = int( + max( + np.linalg.norm(points[0] - points[1]), + np.linalg.norm(points[2] - points[3]))) + img_crop_height = int( + max( + np.linalg.norm(points[0] - points[3]), + np.linalg.norm(points[1] - points[2]))) + pts_std = np.float32([[0, 0], [img_crop_width, 0], + [img_crop_width, img_crop_height], + [0, img_crop_height]]) + M = cv2.getPerspectiveTransform(points, pts_std) + dst_img = cv2.warpPerspective( + img, + M, (img_crop_width, img_crop_height), + borderMode=cv2.BORDER_REPLICATE, + flags=cv2.INTER_CUBIC) + dst_img_height, dst_img_width = dst_img.shape[0:2] + if dst_img_height * 1.0 / 
dst_img_width >= 1.5: + dst_img = np.rot90(dst_img) + return dst_img + + +img_dir = './CCPD2020/ccpd_green' +save_gt_folder = './CCPD2020/PPOCR' +# phase = 'train' # change to val and test to make val dataset and test dataset +for phase in ['train', 'val', 'test']: + make_label_2020(img_dir, save_gt_folder, phase) + +list_dir = './CCPD2019/splits/' +save_gt_folder = './CCPD2019/PPOCR' + +for phase in ['train', 'val', 'test']: + make_label_2019(list_dir, save_gt_folder, phase) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/tools/clip_video.py b/PaddleDetection-release-2.6/deploy/pipeline/tools/clip_video.py new file mode 100644 index 0000000000000000000000000000000000000000..fbfb9cd08169b90bc71a436f6a414c4d6d1f480f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/tools/clip_video.py @@ -0,0 +1,36 @@ +import cv2 + + +def cut_video(video_path, frameToStart, frametoStop, saved_video_path): + cap = cv2.VideoCapture(video_path) + FPS = cap.get(cv2.CAP_PROP_FPS) + + TOTAL_FRAME = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) # 获取视频总帧数 + + size = (cap.get(cv2.CAP_PROP_FRAME_WIDTH), + cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) + + videoWriter = cv2.VideoWriter( + saved_video_path, + apiPreference=0, + fourcc=cv2.VideoWriter_fourcc(* 'mp4v'), + fps=FPS, + frameSize=(int(size[0]), int(size[1]))) + + COUNT = 0 + while True: + success, frame = cap.read() + if success: + COUNT += 1 + if COUNT <= frametoStop and COUNT > frameToStart: # 选取起始帧 + videoWriter.write(frame) + else: + print("cap.read failed!") + break + if COUNT > frametoStop: + break + + cap.release() + videoWriter.release() + + print(saved_video_path) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/tools/create_dataset_list.py b/PaddleDetection-release-2.6/deploy/pipeline/tools/create_dataset_list.py new file mode 100644 index 0000000000000000000000000000000000000000..261e15e8f2568d48a474647fa453ffd0560a3f3a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/tools/create_dataset_list.py @@ -0,0 +1,147 @@ +# coding: utf8 +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import glob +import os.path +import argparse +import warnings + + +def parse_args(): + parser = argparse.ArgumentParser( + description='PaddleSeg generate file list on cityscapes or your customized dataset.' 
+ ) + parser.add_argument('dataset_root', help='dataset root directory', type=str) + parser.add_argument( + '--type', + help='dataset type: \n' + '- cityscapes \n' + '- custom(default)', + default="custom", + type=str) + parser.add_argument( + '--separator', + dest='separator', + help='file list separator', + default=" ", + type=str) + parser.add_argument( + '--folder', + help='the folder names of images and labels', + type=str, + nargs=2, + default=['images', 'labels']) + parser.add_argument( + '--second_folder', + help='the second-level folder names of train set, validation set, test set', + type=str, + nargs='*', + default=['train', 'val', 'test']) + parser.add_argument( + '--format', + help='data format of images and labels, e.g. jpg or png.', + type=str, + nargs=2, + default=['jpg', 'png']) + parser.add_argument( + '--postfix', + help='postfix of images or labels', + type=str, + nargs=2, + default=['', '']) + + return parser.parse_args() + + +def get_files(image_or_label, dataset_split, args): + dataset_root = args.dataset_root + postfix = args.postfix + format = args.format + folder = args.folder + + pattern = '*%s.%s' % (postfix[image_or_label], format[image_or_label]) + + search_files = os.path.join(dataset_root, folder[image_or_label], + dataset_split, pattern) + search_files2 = os.path.join(dataset_root, folder[image_or_label], + dataset_split, "*", pattern) # 包含子目录 + search_files3 = os.path.join(dataset_root, folder[image_or_label], + dataset_split, "*", "*", pattern) # 包含三级目录 + search_files4 = os.path.join(dataset_root, folder[image_or_label], + dataset_split, "*", "*", "*", + pattern) # 包含四级目录 + search_files5 = os.path.join(dataset_root, folder[image_or_label], + dataset_split, "*", "*", "*", "*", + pattern) # 包含五级目录 + + filenames = glob.glob(search_files) + filenames2 = glob.glob(search_files2) + filenames3 = glob.glob(search_files3) + filenames4 = glob.glob(search_files4) + filenames5 = glob.glob(search_files5) + + filenames = filenames + filenames2 + filenames3 + filenames4 + filenames5 + + return sorted(filenames) + + +def generate_list(args): + dataset_root = args.dataset_root + separator = args.separator + + for dataset_split in args.second_folder: + print("Creating {}.txt...".format(dataset_split)) + image_files = get_files(0, dataset_split, args) + label_files = get_files(1, dataset_split, args) + if not image_files: + img_dir = os.path.join(dataset_root, args.folder[0], dataset_split) + warnings.warn("No images in {} !!!".format(img_dir)) + num_images = len(image_files) + + if not label_files: + label_dir = os.path.join(dataset_root, args.folder[1], + dataset_split) + warnings.warn("No labels in {} !!!".format(label_dir)) + num_label = len(label_files) + + if num_images != num_label and num_label > 0: + raise Exception( + "Number of images = {} number of labels = {} \n" + "Either number of images is equal to number of labels, " + "or number of labels is equal to 0.\n" + "Please check your dataset!".format(num_images, num_label)) + + file_list = os.path.join(dataset_root, dataset_split + '.txt') + with open(file_list, "w") as f: + for item in range(num_images): + left = image_files[item].replace(dataset_root, '', 1) + if left[0] == os.path.sep: + left = left.lstrip(os.path.sep) + + try: + right = label_files[item].replace(dataset_root, '', 1) + if right[0] == os.path.sep: + right = right.lstrip(os.path.sep) + line = left + separator + right + '\n' + except: + line = left + '\n' + + f.write(line) + print(line) + + +if __name__ == '__main__': + args = parse_args() + 
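+    # Example invocation (paths are illustrative):
+    #   python create_dataset_list.py ./mydata --folder images labels \
+    #       --second_folder train val --format jpg png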
generate_list(args) diff --git a/PaddleDetection-release-2.6/deploy/pipeline/tools/get_video_info.py b/PaddleDetection-release-2.6/deploy/pipeline/tools/get_video_info.py new file mode 100644 index 0000000000000000000000000000000000000000..39aa30d81212577666f25d4e14a147113197b1ed --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/tools/get_video_info.py @@ -0,0 +1,71 @@ +import os +import sys +import cv2 +import numpy as np +import argparse + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or `camera_id` has a highest priority." + ) + parser.add_argument( + '--region_polygon', + nargs='+', + type=int, + default=[], + help="Clockwise point coords (x0,y0,x1,y1...) of polygon of area when " + "do_break_in_counting. Note that only support single-class MOT and " + "the video should be taken by a static camera.") + return parser + + +def get_video_info(video_file, region_polygon): + entrance = [] + assert len(region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points." + for i in range(0, len(region_polygon), 2): + entrance.append([region_polygon[i], region_polygon[i + 1]]) + + if not os.path.exists(video_file): + print("video path '{}' not exists".format(video_file)) + sys.exit(-1) + capture = cv2.VideoCapture(video_file) + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + print("video width: %d, height: %d" % (width, height)) + np_masks = np.zeros((height, width, 1), np.uint8) + + entrance = np.array(entrance) + cv2.fillPoly(np_masks, [entrance], 255) + + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("video fps: %d, frame_count: %d" % (fps, frame_count)) + cnt = 0 + while (1): + ret, frame = capture.read() + cnt += 1 + if cnt == 3: break + + alpha = 0.3 + img = np.array(frame).astype('float32') + mask = np_masks[:, :, 0] + color_mask = [0, 0, 255] + idx = np.nonzero(mask) + color_mask = np.array(color_mask) + img[idx[0], idx[1], :] *= 1.0 - alpha + img[idx[0], idx[1], :] += alpha * color_mask + cv2.imwrite('region_vis.jpg', img) + + +if __name__ == "__main__": + parser = argsparser() + FLAGS = parser.parse_args() + get_video_info(FLAGS.video_file, FLAGS.region_polygon) + + # python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400 diff --git a/PaddleDetection-release-2.6/deploy/pipeline/tools/lane_to_mask.py b/PaddleDetection-release-2.6/deploy/pipeline/tools/lane_to_mask.py new file mode 100644 index 0000000000000000000000000000000000000000..ece2efb87d03b8db209e14373a71140bf3a8f766 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/tools/lane_to_mask.py @@ -0,0 +1,508 @@ +# coding: utf8 +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
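+# The converters below rasterize BDD100K poly2d labels with matplotlib and
+# recover per-polygon ids from the rendered figure by encoding (index + 1)
+# across the red and green channels.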
+"""Convert poly2d to mask/bitmask.""" + +import os +from functools import partial +from multiprocessing import Pool +from typing import Callable, Dict, List + +import matplotlib # type: ignore +import matplotlib.pyplot as plt # type: ignore +import numpy as np +from PIL import Image +from scalabel.common.parallel import NPROC +from scalabel.common.typing import NDArrayU8 +from scalabel.label.io import group_and_sort, load +from scalabel.label.transforms import poly_to_patch +from scalabel.label.typing import Config, Frame, ImageSize, Label, Poly2D +from scalabel.label.utils import ( + check_crowd, + check_ignored, + get_leaf_categories, ) +from tqdm import tqdm + +from bdd100k.common.logger import logger +from bdd100k.common.typing import BDD100KConfig +from bdd100k.common.utils import get_bdd100k_instance_id, load_bdd100k_config +from bdd100k.label.label import drivables, labels, lane_categories +from bdd100k.label.to_coco import parse_args +from bdd100k.label.to_scalabel import bdd100k_to_scalabel + +IGNORE_LABEL = 255 +STUFF_NUM = 30 +LANE_DIRECTION_MAP = {"parallel": 0, "vertical": 1} +LANE_STYLE_MAP = {"solid": 0, "dashed": 1} + + +def frame_to_mask( + out_path: str, + shape: ImageSize, + colors: List[NDArrayU8], + poly2ds: List[List[Poly2D]], + with_instances: bool=True, + back_color: int=0, + closed: bool=True, ) -> None: + """Converting a frame of poly2ds to mask/bitmask.""" + assert len(colors) == len(poly2ds) + height, width = shape.height, shape.width + + assert back_color >= 0 + if with_instances: + img: NDArrayU8 = ( + np.ones( + [height, width, 4], dtype=np.uint8) * back_color # type: ignore + ) + else: + img = ( + np.ones( + [height, width, 1], dtype=np.uint8) * back_color # type: ignore + ) + + if len(colors) == 0: + pil_img = Image.fromarray(img.squeeze()) + pil_img.save(out_path) + + matplotlib.use("Agg") + fig = plt.figure(facecolor="0") + fig.set_size_inches((width / fig.get_dpi()), height / fig.get_dpi()) + ax = fig.add_axes([0, 0, 1, 1]) + ax.axis("off") + ax.set_xlim(0, width) + ax.set_ylim(0, height) + ax.set_facecolor((0, 0, 0, 0)) + ax.invert_yaxis() + + for i, poly2d in enumerate(poly2ds): + for poly in poly2d: + ax.add_patch( + poly_to_patch( + poly.vertices, + poly.types, + # (0, 0, 0) for the background + color=( + ((i + 1) >> 8) / 255.0, + ((i + 1) % 255) / 255.0, + 0.0, ), + closed=closed, )) + + fig.canvas.draw() + out: NDArrayU8 = np.frombuffer(fig.canvas.tostring_rgb(), np.uint8) + out = out.reshape((height, width, -1)).astype(np.int32) + out = (out[..., 0] << 8) + out[..., 1] + plt.close() + + for i, color in enumerate(colors): + # 0 is for the background + img[out == i + 1] = color + + img[img == 255] = 0 + + pil_img = Image.fromarray(img.squeeze()) + pil_img.save(out_path) + + +def set_instance_color(label: Label, category_id: int, + ann_id: int) -> NDArrayU8: + """Set the color for an instance given its attributes and ID.""" + attributes = label.attributes + if attributes is None: + truncated, occluded, crowd, ignored = 0, 0, 0, 0 + else: + truncated = int(attributes.get("truncated", False)) + occluded = int(attributes.get("occluded", False)) + crowd = int(check_crowd(label)) + ignored = int(check_ignored(label)) + color: NDArrayU8 = np.array( + [ + category_id & 255, + (truncated << 3) + (occluded << 2) + (crowd << 1) + ignored, + ann_id >> 8, + ann_id & 255, + ], + dtype=np.uint8, ) + return color + + +def set_lane_color(label: Label, category_id: int) -> NDArrayU8: + """Set the color for the lane given its attributes and category.""" + attributes 
= label.attributes + if attributes is None: + lane_direction, lane_style = 0, 0 + else: + lane_direction = LANE_DIRECTION_MAP[str( + attributes.get("laneDirection", "parallel"))] + lane_style = LANE_STYLE_MAP[str(attributes.get("laneStyle", "solid"))] + + #value = category_id + (lane_direction << 5) + (lane_style << 4) + value = category_id + if lane_style == 0 and (category_id == 3 or category_id == 2): + value = 1 + if lane_style == 0: + value = 2 + else: + value = 3 + + color: NDArrayU8 = np.array([value], dtype=np.uint8) + return color + + +def frames_to_masks( + nproc: int, + out_paths: List[str], + shapes: List[ImageSize], + colors_list: List[List[NDArrayU8]], + poly2ds_list: List[List[List[Poly2D]]], + with_instances: bool=True, + back_color: int=0, + closed: bool=True, ) -> None: + """Execute the mask conversion in parallel.""" + with Pool(nproc) as pool: + pool.starmap( + partial( + frame_to_mask, + with_instances=with_instances, + back_color=back_color, + closed=closed, ), + tqdm( + zip(out_paths, shapes, colors_list, poly2ds_list), + total=len(out_paths), ), ) + + +def seg_to_masks( + frames: List[Frame], + out_base: str, + config: Config, + nproc: int=NPROC, + mode: str="sem_seg", + back_color: int=IGNORE_LABEL, + closed: bool=True, ) -> None: + """Converting segmentation poly2d to 1-channel masks.""" + os.makedirs(out_base, exist_ok=True) + img_shape = config.imageSize + + out_paths: List[str] = [] + shapes: List[ImageSize] = [] + colors_list: List[List[NDArrayU8]] = [] + poly2ds_list: List[List[List[Poly2D]]] = [] + + categories = dict( + sem_seg=labels, drivable=drivables, lane_mark=lane_categories)[mode] + cat_name2id = { + cat.name: cat.trainId + for cat in categories if cat.trainId != IGNORE_LABEL + } + + logger.info("Preparing annotations for Semseg to Bitmasks") + + for image_anns in tqdm(frames): + # Mask in .png format + image_name = image_anns.name.replace(".jpg", ".png") + image_name = os.path.split(image_name)[-1] + out_path = os.path.join(out_base, image_name) + out_paths.append(out_path) + + if img_shape is None: + if image_anns.size is not None: + img_shape = image_anns.size + else: + raise ValueError("Image shape not defined!") + shapes.append(img_shape) + + colors: List[NDArrayU8] = [] + poly2ds: List[List[Poly2D]] = [] + colors_list.append(colors) + poly2ds_list.append(poly2ds) + + if image_anns.labels is None: + continue + + for label in image_anns.labels: + if label.category not in cat_name2id: + continue + if label.poly2d is None: + continue + + category_id = cat_name2id[label.category] + if mode in ["sem_seg", "drivable"]: + color: NDArrayU8 = np.array([category_id], dtype=np.uint8) + else: + color = set_lane_color(label, category_id) + + colors.append(color) + poly2ds.append(label.poly2d) + + logger.info("Start Conversion for Seg to Masks") + frames_to_masks( + nproc, + out_paths, + shapes, + colors_list, + poly2ds_list, + with_instances=False, + back_color=back_color, + closed=closed, ) + + +ToMasksFunc = Callable[[List[Frame], str, Config, int], None] +semseg_to_masks: ToMasksFunc = partial( + seg_to_masks, mode="sem_seg", back_color=IGNORE_LABEL, closed=True) +drivable_to_masks: ToMasksFunc = partial( + seg_to_masks, + mode="drivable", + back_color=len(drivables) - 1, + closed=True, ) +lanemark_to_masks: ToMasksFunc = partial( + seg_to_masks, mode="lane_mark", back_color=IGNORE_LABEL, closed=False) + + +def insseg_to_bitmasks(frames: List[Frame], + out_base: str, + config: Config, + nproc: int=NPROC) -> None: + """Converting instance segmentation 
poly2d to bitmasks.""" + os.makedirs(out_base, exist_ok=True) + img_shape = config.imageSize + + out_paths: List[str] = [] + shapes: List[ImageSize] = [] + colors_list: List[List[NDArrayU8]] = [] + poly2ds_list: List[List[List[Poly2D]]] = [] + + categories = get_leaf_categories(config.categories) + cat_name2id = {cat.name: i + 1 for i, cat in enumerate(categories)} + + logger.info("Preparing annotations for InsSeg to Bitmasks") + + for image_anns in tqdm(frames): + ann_id = 0 + + # Bitmask in .png format + image_name = image_anns.name.replace(".jpg", ".png") + image_name = os.path.split(image_name)[-1] + out_path = os.path.join(out_base, image_name) + out_paths.append(out_path) + + if img_shape is None: + if image_anns.size is not None: + img_shape = image_anns.size + else: + raise ValueError("Image shape not defined!") + shapes.append(img_shape) + + colors: List[NDArrayU8] = [] + poly2ds: List[List[Poly2D]] = [] + colors_list.append(colors) + poly2ds_list.append(poly2ds) + + labels_ = image_anns.labels + if labels_ is None or len(labels_) == 0: + continue + + # Scores higher, rendering later + if labels_[0].score is not None: + labels_ = sorted(labels_, key=lambda label: float(label.score)) + + for label in labels_: + if label.poly2d is None: + continue + if label.category not in cat_name2id: + continue + + ann_id += 1 + category_id = cat_name2id[label.category] + color = set_instance_color(label, category_id, ann_id) + colors.append(color) + poly2ds.append(label.poly2d) + + logger.info("Start conversion for InsSeg to Bitmasks") + frames_to_masks(nproc, out_paths, shapes, colors_list, poly2ds_list) + + +def panseg_to_bitmasks(frames: List[Frame], + out_base: str, + config: Config, + nproc: int=NPROC) -> None: + """Converting panoptic segmentation poly2d to bitmasks.""" + os.makedirs(out_base, exist_ok=True) + img_shape = config.imageSize + + out_paths: List[str] = [] + shapes: List[ImageSize] = [] + colors_list: List[List[NDArrayU8]] = [] + poly2ds_list: List[List[List[Poly2D]]] = [] + cat_name2id = {cat.name: cat.id for cat in labels} + + logger.info("Preparing annotations for InsSeg to Bitmasks") + + for image_anns in tqdm(frames): + cur_ann_id = STUFF_NUM + + # Bitmask in .png format + image_name = image_anns.name.replace(".jpg", ".png") + image_name = os.path.split(image_name)[-1] + out_path = os.path.join(out_base, image_name) + out_paths.append(out_path) + + if img_shape is None: + if image_anns.size is not None: + img_shape = image_anns.size + else: + raise ValueError("Image shape not defined!") + shapes.append(img_shape) + + colors: List[NDArrayU8] = [] + poly2ds: List[List[Poly2D]] = [] + colors_list.append(colors) + poly2ds_list.append(poly2ds) + + labels_ = image_anns.labels + if labels_ is None or len(labels_) == 0: + continue + + # Scores higher, rendering later + if labels_[0].score is not None: + labels_ = sorted(labels_, key=lambda label: float(label.score)) + + for label in labels_: + if label.poly2d is None: + continue + if label.category not in cat_name2id: + continue + + category_id = cat_name2id[label.category] + if category_id == 0: + continue + if category_id <= STUFF_NUM: + ann_id = category_id + else: + cur_ann_id += 1 + ann_id = cur_ann_id + + color = set_instance_color(label, category_id, ann_id) + colors.append(color) + poly2ds.append(label.poly2d) + + logger.info("Start conversion for PanSeg to Bitmasks") + frames_to_masks(nproc, out_paths, shapes, colors_list, poly2ds_list) + + +def segtrack_to_bitmasks(frames: List[Frame], + out_base: str, + config: Config, 
+ nproc: int=NPROC) -> None: + """Converting segmentation tracking poly2d to bitmasks.""" + frames_list = group_and_sort(frames) + img_shape = config.imageSize + + out_paths: List[str] = [] + shapes: List[ImageSize] = [] + colors_list: List[List[NDArrayU8]] = [] + poly2ds_list: List[List[List[Poly2D]]] = [] + + categories = get_leaf_categories(config.categories) + cat_name2id = {cat.name: i + 1 for i, cat in enumerate(categories)} + + logger.info("Preparing annotations for SegTrack to Bitmasks") + + for video_anns in tqdm(frames_list): + global_instance_id: int = 1 + instance_id_maps: Dict[str, int] = {} + + video_name = video_anns[0].videoName + out_dir = os.path.join(out_base, video_name) + if not os.path.isdir(out_dir): + os.makedirs(out_dir) + + for image_anns in video_anns: + # Bitmask in .png format + image_name = image_anns.name.replace(".jpg", ".png") + image_name = os.path.split(image_name)[-1] + out_path = os.path.join(out_dir, image_name) + out_paths.append(out_path) + + if img_shape is None: + if image_anns.size is not None: + img_shape = image_anns.size + else: + raise ValueError("Image shape not defined!") + shapes.append(img_shape) + + colors: List[NDArrayU8] = [] + poly2ds: List[List[Poly2D]] = [] + colors_list.append(colors) + poly2ds_list.append(poly2ds) + + labels_ = image_anns.labels + if labels_ is None or len(labels_) == 0: + continue + + # Scores higher, rendering later + if labels_[0].score is not None: + labels_ = sorted(labels_, key=lambda label: float(label.score)) + + for label in labels_: + if label.poly2d is None: + continue + if label.category not in cat_name2id: + continue + + instance_id, global_instance_id = get_bdd100k_instance_id( + instance_id_maps, global_instance_id, label.id) + category_id = cat_name2id[label.category] + color = set_instance_color(label, category_id, instance_id) + colors.append(color) + poly2ds.append(label.poly2d) + + logger.info("Start Conversion for SegTrack to Bitmasks") + frames_to_masks(nproc, out_paths, shapes, colors_list, poly2ds_list) + + +def main() -> None: + """Main function.""" + args = parse_args() + args.mode = "lane_mark" + + os.environ["QT_QPA_PLATFORM"] = "offscreen" # matplotlib offscreen render + + convert_funcs: Dict[str, ToMasksFunc] = dict( + sem_seg=semseg_to_masks, + drivable=drivable_to_masks, + lane_mark=lanemark_to_masks, + pan_seg=panseg_to_bitmasks, + ins_seg=insseg_to_bitmasks, + seg_track=segtrack_to_bitmasks, ) + + dataset = load(args.input, args.nproc) + if args.config is not None: + bdd100k_config = load_bdd100k_config(args.config) + elif dataset.config is not None: + bdd100k_config = BDD100KConfig(config=dataset.config) + else: + bdd100k_config = load_bdd100k_config(args.mode) + + if args.mode in ["ins_seg", "seg_track"]: + frames = bdd100k_to_scalabel(dataset.frames, bdd100k_config) + else: + frames = dataset.frames + + convert_funcs[args.mode](frames, args.output, bdd100k_config.scalabel, + args.nproc) + + logger.info("Finished!") + + +if __name__ == "__main__": + main() diff --git a/PaddleDetection-release-2.6/deploy/pipeline/tools/split_fight_train_test_dataset.py b/PaddleDetection-release-2.6/deploy/pipeline/tools/split_fight_train_test_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..5ca8fce64d00ccefee965c899a0d2b96863ff1dc --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pipeline/tools/split_fight_train_test_dataset.py @@ -0,0 +1,80 @@ +import os +import glob +import random +import fnmatch +import re +import sys + +class_id = {"nofight": 0, "fight": 1} 
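+# Assumed frame layout (matching the `level` argument of get_list below):
+#   level=1: <frame_dir>/<video_folder>/img_xxxxx.jpg, where key_func maps the
+#            folder name into class_id;
+#   level=2: <frame_dir>/<class_name>/<video_folder>/img_xxxxx.jpg, with
+#            <class_name> in {"fight", "nofight"}.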
+
+
+def get_list(path, key_func=lambda x: x[-11:], rgb_prefix='img_', level=1):
+    if level == 1:
+        frame_folders = glob.glob(os.path.join(path, '*'))
+    elif level == 2:
+        frame_folders = glob.glob(os.path.join(path, '*', '*'))
+    else:
+        raise ValueError('level can be only 1 or 2')
+
+    def count_files(directory):
+        lst = os.listdir(directory)
+        cnt = len(fnmatch.filter(lst, rgb_prefix + '*'))
+        return cnt
+
+    # check RGB frames
+    video_dict = {}
+    for f in frame_folders:
+        cnt = count_files(f)
+        k = key_func(f)
+        if level == 2:
+            k = k.split("/")[0]
+
+        video_dict[f] = str(cnt) + " " + str(class_id[k])
+
+    return video_dict
+
+
+def fight_splits(video_dict, train_percent=0.8):
+    videos = list(video_dict.keys())
+
+    train_num = int(len(videos) * train_percent)
+
+    train_list = []
+    val_list = []
+
+    random.shuffle(videos)
+
+    for i in range(train_num):
+        train_list.append(videos[i] + " " + str(video_dict[videos[i]]))
+    for i in range(train_num, len(videos)):
+        val_list.append(videos[i] + " " + str(video_dict[videos[i]]))
+
+    print("train:", len(train_list), ",val:", len(val_list))
+
+    with open("fight_train_list.txt", "w") as f:
+        for item in train_list:
+            f.write(item + "\n")
+
+    with open("fight_val_list.txt", "w") as f:
+        for item in val_list:
+            f.write(item + "\n")
+
+
+if __name__ == "__main__":
+    frame_dir = sys.argv[1]  # e.g. "rawframes"
+    level = int(sys.argv[2])  # e.g. 2
+    train_percent = float(sys.argv[3])  # e.g. 0.8
+
+    if level == 2:
+
+        def key_func(x):
+            return '/'.join(x.split('/')[-2:])
+    else:
+
+        def key_func(x):
+            return x.split('/')[-1]
+
+    video_dict = get_list(frame_dir, key_func=key_func, level=level)
+    print("number:", len(video_dict))
+
+    fight_splits(video_dict, train_percent)
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/README.md b/PaddleDetection-release-2.6/deploy/pptracking/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..13c4f964bb9063f28d6e08dfb8c6b828a81d2536
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/README.md
@@ -0,0 +1 @@
+README_en.md
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/README_cn.md b/PaddleDetection-release-2.6/deploy/pptracking/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..c68690fb49fd76af203b86dceace075e7004d1be
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/README_cn.md
@@ -0,0 +1,93 @@
+[English](README_en.md) | 简体中文
+
+# 实时多目标跟踪系统PP-Tracking
+
+PP-Tracking是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。
+PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。
+
    + +
    + +
    + +
    + 视频来源:VisDrone和BDD100K公开数据集
    + + + +## 一、快速开始 + +### AI Studio公开项目案例 +PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。 + +### Python端预测部署 +PP-Tracking 支持Python预测部署,教程请参考[PP-Tracking Python部署文档](python/README.md)。 + +### C++端预测部署 +PP-Tracking 支持C++预测部署,教程请参考[PP-Tracking C++部署文档](cpp/README.md)。 + +### GUI可视化界面预测部署 +PP-Tracking 提供了简洁的GUI可视化界面,教程请参考[PP-Tracking可视化界面试用版使用文档](https://github.com/yangyudong2020/PP-Tracking_GUi)。 + + +## 二、算法介绍 + +PP-Tracking 支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式。 +- 单镜头跟踪同时支持**FairMOT**和**DeepSORT**两种多目标跟踪算法,跨镜头跟踪只支持**DeepSORT**算法。 +- 单镜头跟踪的功能包括行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪以及流量统计,模型主要是基于FairMOT进行优化,实现了实时跟踪的效果,同时基于不同应用场景提供了针对性的预训练模型。 +- DeepSORT算法方案(包括跨镜头跟踪用到的DeepSORT),选用的检测器是PaddleDetection自研的高性能检测模型[PP-YOLOv2](../../configs/ppyolo/)和轻量级特色检测模型[PP-PicoDet](../../configs/picodet/),选用的ReID模型是PaddleClas自研的超轻量骨干网络模型[PP-LCNet](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models/PP-LCNet.md) + +PP-Tracking中提供的多场景预训练模型以及导出后的预测部署模型如下: + +| 场景 | 数据集 | 精度(MOTA) | 预测速度(FPS) | 配置文件 | 模型权重 | 预测部署模型 | +| :---------: |:--------------- | :-------: | :------: | :------:|:-----: | :--------: | +| 行人跟踪 | MOT17 | 65.3 | 23.9 | [配置文件](../../configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar) | +| 行人小目标跟踪 | VisDrone-pedestrian | 40.5 | 8.35 | [配置文件](../../configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.yml) | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.tar) | +| 车辆跟踪 | BDD100k-vehicle | 32.6 | 24.3 | [配置文件](../../configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.yml) | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.tar) | +| 车辆小目标跟踪 | VisDrone-vehicle | 39.8 | 22.8 | [配置文件](../../configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.yml) | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.tar) +| 多类别跟踪 | BDD100k | - | 12.5 | [配置文件](../../configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.yml) | [下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.tar) | +| 多类别小目标跟踪 | VisDrone | 20.4 | 6.74 | [配置文件](../../configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) | [下载链接](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.tar) | + +**注意:** +1. 模型预测速度的设备为**NVIDIA Jetson Xavier NX**,速度为**TensorRT FP16**速度,测试环境为CUDA 10.2、JETPACK 4.5.1、TensorRT 7.1。 +2. 
模型权重是指使用PaddleDetection训练完直接保存的权重,更多跟踪模型权重请参考[多目标跟踪模型库](../../configs/mot/README.md#模型库)去下载,也可按照相应模型配置文件去训练。 +3. 预测部署模型是指导出后的前向参数的模型,因为PP-Tracking项目的部署过程中只需要前向参数,可根据[多目标跟踪模型库](../../configs/mot/README.md#模型库)去下载并导出,也可按照相应模型配置文件去训练并导出。导出后的模型文件夹应包括`infer_cfg.yml`、`model.pdiparams`、`model.pdiparams.info`和`model.pdmodel`四个文件,一般会将它们以tar格式打包。 + + +## 引用 +``` +@ARTICLE{9573394, + author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, + year={2021}, + volume={}, + number={}, + pages={1-1}, + doi={10.1109/TPAMI.2021.3119563} +} +@InProceedings{bdd100k, + author = {Yu, Fisher and Chen, Haofeng and Wang, Xin and Xian, Wenqi and Chen, + Yingying and Liu, Fangchen and Madhavan, Vashisht and Darrell, Trevor}, + title = {BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning}, + booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + month = {June}, + year = {2020} +} +@article{zhang2020fair, + title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking}, + author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu}, + journal={arXiv preprint arXiv:2004.01888}, + year={2020} +} +@inproceedings{Wojke2018deep, + title={Deep Cosine Metric Learning for Person Re-identification}, + author={Wojke, Nicolai and Bewley, Alex}, + booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)}, + year={2018}, + pages={748--756}, + organization={IEEE}, + doi={10.1109/WACV.2018.00087} +} +``` diff --git a/PaddleDetection-release-2.6/deploy/pptracking/README_en.md b/PaddleDetection-release-2.6/deploy/pptracking/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..3f76f9f73e71849853ec47442d8a137606a77be9 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/README_en.md @@ -0,0 +1,94 @@ +English | [简体中文](README_cn.md) + +# Real-time Multi-Object Tracking system PP-Tracking + +PP-Tracking is the first open source real-time Multi-Object Tracking system, and it is based on PaddlePaddle deep learning framework. It has rich models, wide application and high efficiency deployment. + +PP-Tracking supports two paradigms: single camera tracking (MOT) and multi-camera tracking (MTMCT). Aiming at the difficulties and pain points of actual business, PP-Tracking provides various MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small object tracking, traffic statistics and multi-camera tracking. The deployment method supports API and GUI visual interface, and the deployment language supports Python and C++, The deployment platform environment supports Linux, NVIDIA Jetson, etc. + +
    + +
    + +
    + +
    + video source:VisDrone and BDD100K dataset
+
+
+
+## 1. Quick Start
+
+### AI Studio public project case
+PP-Tracking provides an AI Studio public project case. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
+
+### Python prediction and deployment
+PP-Tracking supports Python prediction and deployment. Please refer to this [doc](python/README.md).
+
+### C++ prediction and deployment
+PP-Tracking supports C++ prediction and deployment. Please refer to this [doc](cpp/README.md).
+
+### GUI prediction and deployment
+PP-Tracking supports GUI prediction and deployment. Please refer to this [doc](https://github.com/yangyudong2020/PP-Tracking_GUi).
+
+
+## 2. Model Zoo
+
+PP-Tracking supports two paradigms: single camera tracking (MOT) and multi-camera tracking (MTMCT).
+- Single camera tracking supports the **FairMOT** and **DeepSORT** MOT models; multi-camera tracking only supports **DeepSORT**.
+- The applications of single camera tracking include pedestrian tracking, vehicle tracking, multi-class tracking, small object tracking and traffic statistics. The models are mainly optimized based on FairMOT to achieve real-time tracking. At the same time, PP-Tracking provides pre-trained models for different application scenarios.
+- In DeepSORT (including the DeepSORT used in multi-camera tracking), the selected detectors are PaddleDetection's self-developed high-performance detector [PP-YOLOv2](../../configs/ppyolo/) and lightweight detector [PP-PicoDet](../../configs/picodet/), and the selected ReID model is PaddleClas's self-developed ultra lightweight backbone [PP-LCNet](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models/PP-LCNet.md).
+
+PP-Tracking provides multi-scenario pre-trained models and the exported models for deployment:
+
+| Scene | Dataset | MOTA | Speed(FPS) | config | model weights | inference model |
+| :---------: |:--------------- | :-------: | :------: | :------:|:-----: | :------------: |
+| pedestrian | MOT17 | 65.3 | 23.9 | [config](../../configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams) | [download](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar) |
+| pedestrian(small objects) | VisDrone-pedestrian | 40.5 | 8.35 | [config](../../configs/mot/pedestrian/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.pdparams) | [download](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.tar) |
+| vehicle | BDD100k-vehicle | 32.6 | 24.3 | [config](../../configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.pdparams) | [download](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100kmot_vehicle.tar) |
+| vehicle(small objects) | VisDrone-vehicle | 39.8 | 22.8 | [config](../../configs/mot/vehicle/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.pdparams) | [download](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_vehicle.tar) |
+| multi-class | BDD100k | - | 12.5 | [config](../../configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.pdparams) | [download](https://bj.bcebos.com/v1/paddledet/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot.tar) |
+| multi-class(small objects) | VisDrone | 20.4 | 6.74 | [config](../../configs/mot/mcfairmot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [download](https://bj.bcebos.com/v1/paddledet/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.tar) |
+
+**Note:**
+1. The speed was measured on an **NVIDIA Jetson Xavier NX** with **TensorRT FP16**, in an environment of CUDA 10.2, JetPack 4.5.1 and TensorRT 7.1.
+2. `model weights` are the weights saved directly after PaddleDetection training. For more tracking model weights, please refer to the [model zoo](../../configs/mot/README.md#模型库); you can also train with the corresponding model config file to obtain them.
+3. `inference model` is the exported model that keeps only the forward parameters, which is all the PP-Tracking deployment needs. It can be downloaded and exported according to the [model zoo](../../configs/mot/README.md#模型库), or trained with the corresponding model config file and then exported. The exported model folder should contain four files: `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`, which are generally packaged together in tar format.
+
+
+## Citations
+```
+@ARTICLE{9573394,
+  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  title={Detection and Tracking Meet Drones Challenge},
+  year={2021},
+  volume={},
+  number={},
+  pages={1-1},
+  doi={10.1109/TPAMI.2021.3119563}
+}
+@InProceedings{bdd100k,
+    author = {Yu, Fisher and Chen, Haofeng and Wang, Xin and Xian, Wenqi and Chen,
+              Yingying and Liu, Fangchen and Madhavan, Vashisht and Darrell, Trevor},
+    title = {BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning},
+    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+    month = {June},
+    year = {2020}
+}
+@article{zhang2020fair,
+  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
+  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
+  journal={arXiv preprint arXiv:2004.01888},
+  year={2020}
+}
+@inproceedings{Wojke2018deep,
+  title={Deep Cosine Metric Learning for Person Re-identification},
+  author={Wojke, Nicolai and Bewley, Alex},
+  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
+  year={2018},
+  pages={748--756},
+  organization={IEEE},
+  doi={10.1109/WACV.2018.00087}
+}
+```
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/CMakeLists.txt b/PaddleDetection-release-2.6/deploy/pptracking/cpp/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3656cde94a914f3df903571254d9303a02fdac79
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/CMakeLists.txt
@@ -0,0 +1,242 @@
+cmake_minimum_required(VERSION 3.0)
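+# The options and dependency paths below are normally filled in by
+# scripts/build.sh (see the C++ deployment README) or passed as -D flags.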
+project(PaddleObjectDetector CXX C)
+
+option(WITH_MKL        "Compile demo with MKL/OpenBlas support, default use MKL."       ON)
+option(WITH_GPU        "Compile demo with GPU/CPU, default use CPU."                    ON)
+option(WITH_TENSORRT   "Compile demo with TensorRT."                                    OFF)
+
+SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
+SET(PADDLE_LIB_NAME "" CACHE STRING "libpaddle_inference")
+SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
+SET(CUDA_LIB "" CACHE PATH "Location of libraries")
+SET(CUDNN_LIB "" CACHE PATH "Location of libraries")
+SET(TENSORRT_INC_DIR "" CACHE PATH "Compile demo with TensorRT")
+SET(TENSORRT_LIB_DIR "" CACHE PATH "Compile demo with TensorRT")
+
+include(cmake/yaml-cpp.cmake)
+
+include_directories("${CMAKE_SOURCE_DIR}/")
+include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/ext-yaml-cpp/include")
+link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
+
+set(SRCS src/main.cc src/preprocess_op.cc src/pipeline.cc src/jde_predictor.cc src/sde_predictor.cc src/tracker.cc src/trajectory.cc src/lapjv.cpp src/postprocess.cc)
+
+macro(safe_set_static_flag)
+    foreach(flag_var
+        CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
+        CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
+      if(${flag_var} MATCHES "/MD")
+        string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
+      endif(${flag_var} MATCHES "/MD")
+    endforeach(flag_var)
+endmacro()
+
+if (WITH_MKL)
+    ADD_DEFINITIONS(-DUSE_MKL)
+endif()
+
+if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
+    message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_inference_dir")
+endif()
+message("PADDLE_DIR IS:" ${PADDLE_DIR})
+
+if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
+    message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
+endif()
+
+include_directories("${CMAKE_SOURCE_DIR}/")
+include_directories("${PADDLE_DIR}/")
+include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
+include_directories("${PADDLE_DIR}/third_party/install/glog/include")
+include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
+include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
+if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/include")
+    include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
+endif()
+if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/include")
+    include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
+endif()
+include_directories("${PADDLE_DIR}/third_party/boost")
+include_directories("${PADDLE_DIR}/third_party/eigen3")
+
+if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
+    link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
+endif()
+if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
+    link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
+endif()
+
+link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
+link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
+link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
+link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
+link_directories("${PADDLE_DIR}/paddle/lib/")
+link_directories("${CMAKE_CURRENT_BINARY_DIR}")
+
+
+
+if (WIN32)
+  include_directories("${PADDLE_DIR}/paddle/fluid/inference")
+  include_directories("${PADDLE_DIR}/paddle/include")
+  link_directories("${PADDLE_DIR}/paddle/fluid/inference")
+  find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
+
+else ()
+  find_package(OpenCV
REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH) + include_directories("${PADDLE_DIR}/paddle/include") + link_directories("${PADDLE_DIR}/paddle/lib") +endif () +include_directories(${OpenCV_INCLUDE_DIRS}) + +if (WIN32) + add_definitions("/DGOOGLE_GLOG_DLL_DECL=") + set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd") + set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT") + set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd") + set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT") +else() + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -o2 -fopenmp -std=c++11") + set(CMAKE_STATIC_LIBRARY_PREFIX "") +endif() + +# TODO let users define cuda lib path +if (WITH_GPU) + if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "") + message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64") + endif() + if (NOT WIN32) + if (NOT DEFINED CUDNN_LIB) + message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64") + endif() + endif(NOT WIN32) +endif() + + +if (NOT WIN32) + if (WITH_TENSORRT AND WITH_GPU) + include_directories("${TENSORRT_INC_DIR}/") + link_directories("${TENSORRT_LIB_DIR}/") + endif() +endif(NOT WIN32) + +if (NOT WIN32) + set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph") + if(EXISTS ${NGRAPH_PATH}) + include(GNUInstallDirs) + include_directories("${NGRAPH_PATH}/include") + link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}") + set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX}) + endif() +endif() + +if(WITH_MKL) + include_directories("${PADDLE_DIR}/third_party/install/mklml/include") + if (WIN32) + set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib + ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib) + else () + set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} + ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX}) + execute_process(COMMAND cp -r ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib) + endif () + set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn") + if(EXISTS ${MKLDNN_PATH}) + include_directories("${MKLDNN_PATH}/include") + if (WIN32) + set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib) + else () + set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0) + endif () + endif() +else() + set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX}) +endif() + + +if (WIN32) + if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}") + set(DEPS + ${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}) + else() + set(DEPS + ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}) + endif() +endif() + + +if (WIN32) + set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}) +else() + set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_SHARED_LIBRARY_SUFFIX}) +endif() + +message("PADDLE_LIB_NAME:" ${PADDLE_LIB_NAME}) +message("DEPS:" $DEPS) + +if (NOT WIN32) + set(DEPS ${DEPS} + ${MATH_LIB} ${MKLDNN_LIB} + glog gflags protobuf z xxhash yaml-cpp + ) + if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib") + set(DEPS ${DEPS} snappystream) + endif() + if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib") + set(DEPS ${DEPS} snappy) + endif() +else() + set(DEPS ${DEPS} + ${MATH_LIB} ${MKLDNN_LIB} + glog 
gflags_static libprotobuf xxhash libyaml-cppmt) + set(DEPS ${DEPS} libcmt shlwapi) + if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib") + set(DEPS ${DEPS} snappy) + endif() + if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib") + set(DEPS ${DEPS} snappystream) + endif() +endif(NOT WIN32) + +if(WITH_GPU) + if(NOT WIN32) + if (WITH_TENSORRT) + set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}) + endif() + set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX}) + else() + set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} ) + set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} ) + set(DEPS ${DEPS} ${CUDNN_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX}) + endif() +endif() + +if (NOT WIN32) + set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread") + set(DEPS ${DEPS} ${EXTERNAL_LIB}) +endif() + +set(DEPS ${DEPS} ${OpenCV_LIBS}) +add_executable(main ${SRCS}) +ADD_DEPENDENCIES(main ext-yaml-cpp) +message("DEPS:" $DEPS) +target_link_libraries(main ${DEPS}) + +if (WIN32 AND WITH_MKL) + add_custom_command(TARGET main POST_BUILD + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}.dll ./release/${PADDLE_LIB_NAME}.dll + ) +endif() + +if (WIN32) + add_custom_command(TARGET main POST_BUILD + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}.dll ./release/${PADDLE_LIB_NAME}.dll + ) +endif() diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/README.md b/PaddleDetection-release-2.6/deploy/pptracking/cpp/README.md new file mode 100644 index 0000000000000000000000000000000000000000..425ec11b8be35010d2db5d6cefbf920afb4f519c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/README.md @@ -0,0 +1,175 @@ +# C++端预测部署 + +在PaddlePaddle中预测引擎和训练引擎底层有着不同的优化方法, 预测引擎使用了AnalysisPredictor,专门针对推理进行了优化,该引擎可以对模型进行多项图优化,减少不必要的内存拷贝。如果用户在部署已训练模型的过程中对性能有较高的要求,我们提供了独立于PaddleDetection的预测脚本,方便用户直接集成部署。当前C++部署支持基于Fairmot的单镜头类别预测部署,并支持人流量统计、出入口计数功能。 + +主要包含三个步骤: +- 准备环境 +- 导出预测模型 +- C++预测 + +## 一、准备环境 + +环境要求: + +- GCC 8.2 +- CUDA 10.1/10.2/11.1; CUDNN 7.6/8.1 +- CMake 3.0+ +- TensorRT 6/7 + +NVIDIA Jetson用户请参考[Jetson平台编译指南](../../cpp/docs/Jetson_build.md#jetson环境搭建)完成JetPack安装 + +### 1. 下载代码 + +``` +git clone https://github.com/PaddlePaddle/PaddleDetection.git +# C++部署代码与其他目录代码独立 +cd deploy/pptracking/cpp +``` + +### 2. 
下载或编译PaddlePaddle C++预测库 + +请根据环境选择适当的预测库进行下载,参考[C++预测库下载列表](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html) + +下载并解压后`./paddle_inference`目录包含内容为: + +``` +paddle_inference +├── paddle # paddle核心库和头文件 +| +├── third_party # 第三方依赖库和头文件 +| +└── version.txt # 版本和编译信息 +``` + +**注意:** 如果用户环境与官网提供环境不一致(如cuda 、cudnn、tensorrt版本不一致等),或对飞桨源代码有修改需求,或希望进行定制化构建,可参考[文档](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html)自行源码编译预测库。 + +### 3. 编译 + +
    +Linux编译: + +编译`cmake`的命令在`scripts/build.sh`中,请根据实际情况修改主要参数,其主要内容说明如下: + +``` +# 是否使用GPU(即是否使用 CUDA) +WITH_GPU=ON + +# 是否使用MKL or openblas,TX2需要设置为OFF +WITH_MKL=OFF + +# 是否集成 TensorRT(仅WITH_GPU=ON 有效) +WITH_TENSORRT=ON + +# TensorRT 的include路径 +TENSORRT_INC_DIR=/path/to/TensorRT/include + +# TensorRT 的lib路径 +TENSORRT_LIB_DIR=/path/to/TensorRT/lib + +# Paddle 预测库路径 +PADDLE_DIR=/path/to/paddle_inference/ + +# Paddle 预测库名称 +PADDLE_LIB_NAME=libpaddle_inference + +# CUDA 的 lib 路径 +CUDA_LIB=/path/to/cuda/lib + +# CUDNN 的 lib 路径 +CUDNN_LIB=/path/to/cudnn/lib + +# OPENCV路径 +OPENCV_DIR=/path/to/opencv +``` + +修改脚本设置好主要参数后,执行```build.sh```脚本: + +``` +sh ./scripts/build.sh +``` + + +
    +
+
+Windows编译:
+
+- 安装配置OpenCV
+  1. 在OpenCV官网下载适用于Windows平台的3.4.6版本,[下载地址](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
+  2. 运行下载的可执行文件,将OpenCV解压至指定目录,如`D:\projects\opencv`
+  3. 配置环境变量,如下流程所示(如果使用全局绝对路径,可以不用设置环境变量)
+
+     - 我的电脑->属性->高级系统设置->环境变量
+     - 在系统变量中找到Path(如没有,自行创建),并双击编辑
+     - 新建,将opencv路径填入并保存,如`D:\projects\opencv\build\x64\vc14\bin`
+
+- 使用CMake生成项目文件
+
+  执行如下命令生成项目文件:
+```
+cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=path_to_cuda_lib -DCUDNN_LIB=path_to_cudnn_lib -DPADDLE_DIR=path_to_paddle_lib -DPADDLE_LIB_NAME=paddle_inference -DOPENCV_DIR=path_to_opencv -DWITH_KEYPOINT=ON
+```
+
+- 编译
+  用`Visual Studio 2019`打开`cpp`文件夹下的`PaddleObjectDetector.sln`,将编译模式设置为`Release`,点击`生成`->`全部生成`
+
+编译产出的可执行文件在`Release`目录下
+
    + +**注意:** + +1. `TX2`平台的`CUDA`、`CUDNN`需要通过`JetPack`安装。 +2. 已提供linux和tx2平台的opencv下载方式,其他环境请自行安装[opencv](https://opencv.org/) +3. Windows用户推荐使用Visual Studio 2019编译 + +## 二、导出预测模型 + +将训练保存的权重导出为预测库需要的模型格式,使用PaddleDetection下的```tools/export_model.py```导出模型 + +``` +python tools/export_model.py -c configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams +``` + +预测模型会默认导出到```output_inference/fairmot_hrnetv2_w18_dlafpn_30e_576x320```目录下,包括```infer_cfg.yml```, ```model.pdiparams```, ```model.pdiparams.info```, ```model.pdmodel``` + +导出模型也可以通过[预测模型列表](../README.md)中'算法介绍部分'直接下载使用 + +## 三、C++预测 + +完成以上步骤后,可以通过```build/main```(Linux)或```main.exe```(Windows)进行预测,参数列表如下: + +| 参数 | 说明 | +| ---- | ---- | +| --track_model_dir | 导出的跟踪预测模型所在路径 | +| --video_file | 要预测的视频文件路径 | +| --device | 运行时的设备,可选择`CPU/GPU/XPU`,默认为`CPU`| +| --gpu_id | 指定进行推理的GPU device id(默认值为0)| +| --run_mode | 使用GPU时,默认为paddle, 可选(paddle/trt_fp32/trt_fp16/trt_int8)| +| --output_dir | 输出图片所在的文件夹, 默认为output | +| --use_mkldnn | CPU预测中是否开启MKLDNN加速 | +| --cpu_threads | 设置cpu线程数,默认为1 | +| --do_entrance_counting | 是否进行出入口流量统计,默认为否 | +| --save_result | 是否保存跟踪结果 | + +样例一: + +```shell +# 使用CPU测试视频 `test.mp4` , 模型和测试视频均移至`build`目录下 + +./main --track_model_dir=./fairmot_hrnetv2_w18_dlafpn_30e_576x320 --video_file=test.mp4 + +# 视频可视化预测结果默认保存在当前目录下output/test.mp4文件中 +``` + + +样例二: + +```shell +# 使用GPU测试视频 `test.mp4` , 模型和测试视频均移至`build`目录下,实现出入口计数功能,并保存跟踪结果 + +./main -video_file=test.mp4 -track_model_dir=./fairmot_dla34_30e_1088x608/ --device=gpu --do_entrance_counting=True --save_result=True + +# 视频可视化预测结果默认保存在当前目录下`output/test.mp4`中 +# 跟踪结果保存在`output/mot_output.txt`中 +# 计数结果保存在`output/flow_statistic.txt`中 +``` diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/cmake/yaml-cpp.cmake b/PaddleDetection-release-2.6/deploy/pptracking/cpp/cmake/yaml-cpp.cmake new file mode 100644 index 0000000000000000000000000000000000000000..7bc7f34d476d69d57336940bcf6c8c55311b8112 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/cmake/yaml-cpp.cmake @@ -0,0 +1,30 @@ + +find_package(Git REQUIRED) + +include(ExternalProject) + +message("${CMAKE_BUILD_TYPE}") + +ExternalProject_Add( + ext-yaml-cpp + URL https://bj.bcebos.com/paddlex/deploy/deps/yaml-cpp.zip + URL_MD5 9542d6de397d1fbd649ed468cb5850e6 + CMAKE_ARGS + -DYAML_CPP_BUILD_TESTS=OFF + -DYAML_CPP_BUILD_TOOLS=OFF + -DYAML_CPP_INSTALL=OFF + -DYAML_CPP_BUILD_CONTRIB=OFF + -DMSVC_SHARED_RT=OFF + -DBUILD_SHARED_LIBS=OFF + -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} + -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} + -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG} + -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE} + -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib + -DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib + PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp" + # Disable install step + INSTALL_COMMAND "" + LOG_DOWNLOAD ON + LOG_BUILD 1 +) diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/config_parser.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/config_parser.h new file mode 100644 index 0000000000000000000000000000000000000000..c71b160db7a3b2c8b19fd5443b8f766cd2fc7b29 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/config_parser.h @@ -0,0 +1,137 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <iostream>
+#include <map>
+#include <string>
+#include <vector>
+
+#include "yaml-cpp/yaml.h"
+
+#ifdef _WIN32
+#define OS_PATH_SEP "\\"
+#else
+#define OS_PATH_SEP "/"
+#endif
+
+namespace PaddleDetection {
+
+// Inference model configuration parser
+class ConfigPaser {
+ public:
+  ConfigPaser() {}
+
+  ~ConfigPaser() {}
+
+  bool load_config(const std::string& model_dir,
+                   const std::string& cfg = "infer_cfg.yml") {
+    // Load as a YAML::Node
+    YAML::Node config;
+    config = YAML::LoadFile(model_dir + OS_PATH_SEP + cfg);
+
+    // Get runtime mode : paddle, trt_fp16, trt_fp32
+    if (config["mode"].IsDefined()) {
+      mode_ = config["mode"].as<std::string>();
+    } else {
+      std::cerr << "Please set mode, "
+                << "support value : paddle/trt_fp16/trt_fp32." << std::endl;
+      return false;
+    }
+
+    // Get model arch: FairMot or YOLO/Picodet/LCNet for DeepSort
+    if (config["arch"].IsDefined()) {
+      arch_ = config["arch"].as<std::string>();
+    } else {
+      std::cerr << "Please set model arch,"
+                << "support value : FairMot, YOLO, PicoDet, LCNet etc"
+                << std::endl;
+      return false;
+    }
+
+    // Get min_subgraph_size for tensorrt
+    if (config["min_subgraph_size"].IsDefined()) {
+      min_subgraph_size_ = config["min_subgraph_size"].as<int>();
+    } else {
+      std::cerr << "Please set min_subgraph_size." << std::endl;
+      return false;
+    }
+    // Get draw_threshold for visualization
+    if (config["draw_threshold"].IsDefined()) {
+      draw_threshold_ = config["draw_threshold"].as<float>();
+    } else {
+      std::cerr << "Please set draw_threshold." << std::endl;
+      return false;
+    }
+    // Get Preprocess for preprocessing
+    if (config["Preprocess"].IsDefined()) {
+      preprocess_info_ = config["Preprocess"];
+    } else {
+      std::cerr << "Please set Preprocess." << std::endl;
+      return false;
+    }
+    // Get label_list for visualization
+    if (config["label_list"].IsDefined()) {
+      label_list_ = config["label_list"].as<std::vector<std::string>>();
+    } else {
+      std::cerr << "Please set label_list." << std::endl;
+      return false;
+    }
+
+    // Get use_dynamic_shape for TensorRT
+    if (config["use_dynamic_shape"].IsDefined()) {
+      use_dynamic_shape_ = config["use_dynamic_shape"].as<bool>();
+    } else {
+      std::cerr << "Please set use_dynamic_shape." << std::endl;
+      return false;
+    }
+
+    // Get conf_thresh for tracker
+    if (config["tracker"].IsDefined()) {
+      if (config["tracker"]["conf_thres"].IsDefined()) {
+        conf_thresh_ = config["tracker"]["conf_thres"].as<float>();
+      } else {
+        std::cerr << "Please set conf_thres in tracker."
<< std::endl; + return false; + } + } + + // Get NMS for postprocess + if (config["NMS"].IsDefined()) { + nms_info_ = config["NMS"]; + } + // Get fpn_stride in PicoDet + if (config["fpn_stride"].IsDefined()) { + fpn_stride_.clear(); + for (auto item : config["fpn_stride"]) { + fpn_stride_.emplace_back(item.as()); + } + } + + return true; + } + std::string mode_; + float draw_threshold_; + std::string arch_; + int min_subgraph_size_; + YAML::Node preprocess_info_; + YAML::Node nms_info_; + std::vector label_list_; + std::vector fpn_stride_; + bool use_dynamic_shape_; + float conf_thresh_; +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/jde_predictor.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/jde_predictor.h new file mode 100644 index 0000000000000000000000000000000000000000..32f5921061cd783a246eca29dff4ecb5241338c7 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/jde_predictor.h @@ -0,0 +1,97 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "paddle_inference_api.h" // NOLINT + +#include "include/config_parser.h" +#include "include/preprocess_op.h" +#include "include/utils.h" + +using namespace paddle_infer; // NOLINT + +namespace PaddleDetection { + +class JDEPredictor { + public: + explicit JDEPredictor(const std::string& device = "CPU", + const std::string& model_dir = "", + const double threshold = -1., + const std::string& run_mode = "paddle", + const int gpu_id = 0, + const bool use_mkldnn = false, + const int cpu_threads = 1, + bool trt_calib_mode = false, + const int min_box_area = 200) { + this->device_ = device; + this->gpu_id_ = gpu_id; + this->use_mkldnn_ = use_mkldnn; + this->cpu_math_library_num_threads_ = cpu_threads; + this->trt_calib_mode_ = trt_calib_mode; + this->min_box_area_ = min_box_area; + + config_.load_config(model_dir); + this->min_subgraph_size_ = config_.min_subgraph_size_; + preprocessor_.Init(config_.preprocess_info_); + LoadModel(model_dir, run_mode); + this->conf_thresh_ = config_.conf_thresh_; + } + + // Load Paddle inference model + void LoadModel(const std::string& model_dir, + const std::string& run_mode = "paddle"); + + // Run predictor + void Predict(const std::vector imgs, + const double threshold = 0.5, + MOTResult* result = nullptr, + std::vector* times = nullptr); + + private: + std::string device_ = "CPU"; + float threhold = 0.5; + int gpu_id_ = 0; + bool use_mkldnn_ = false; + int cpu_math_library_num_threads_ = 1; + int min_subgraph_size_ = 3; + bool trt_calib_mode_ = false; + + // Preprocess image and copy data to input buffer + void Preprocess(const cv::Mat& image_mat); + // Postprocess result + void Postprocess(const cv::Mat dets, const cv::Mat emb, MOTResult* result); + + std::shared_ptr predictor_; + Preprocessor preprocessor_; + ImageBlob inputs_; + 
std::vector bbox_data_; + std::vector emb_data_; + double threshold_; + ConfigPaser config_; + float min_box_area_; + float conf_thresh_; +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/lapjv.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/lapjv.h new file mode 100644 index 0000000000000000000000000000000000000000..ffaa010c00525e9babac1bfe276189187dc840f8 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/lapjv.h @@ -0,0 +1,64 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// The code is based on: +// https://github.com/gatagat/lap/blob/master/lap/lapjv.h +// Ths copyright of gatagat/lap is as follows: +// MIT License + +#ifndef DEPLOY_PPTRACKING_CPP_INCLUDE_LAPJV_H_ +#define DEPLOY_PPTRACKING_CPP_INCLUDE_LAPJV_H_ +#define LARGE 1000000 + +#if !defined TRUE +#define TRUE 1 +#endif +#if !defined FALSE +#define FALSE 0 +#endif + +#define NEW(x, t, n) \ + if ((x = reinterpret_cast(malloc(sizeof(t) * (n)))) == 0) { \ + return -1; \ + } +#define FREE(x) \ + if (x != 0) { \ + free(x); \ + x = 0; \ + } +#define SWAP_INDICES(a, b) \ + { \ + int_t _temp_index = a; \ + a = b; \ + b = _temp_index; \ + } +#include + +namespace PaddleDetection { + +typedef signed int int_t; +typedef unsigned int uint_t; +typedef double cost_t; +typedef char boolean; +typedef enum fp_t { FP_1 = 1, FP_2 = 2, FP_DYNAMIC = 3 } fp_t; + +int lapjv_internal(const cv::Mat &cost, + const bool extend_cost, + const float cost_limit, + int *x, + int *y); + +} // namespace PaddleDetection + +#endif // DEPLOY_PPTRACKING_CPP_INCLUDE_LAPJV_H_ diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/pipeline.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/pipeline.h new file mode 100644 index 0000000000000000000000000000000000000000..d17b4d35aef5efaaa7e99889f5a7c52ec864b394 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/pipeline.h @@ -0,0 +1,142 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
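+// Pipeline is the top-level driver that wires the JDE/SDE predictors into
+// single-camera (MOT) and cross-camera (MTMCT) tracking runs; see Run(),
+// PredictMOT() and PredictMTMCT() below.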
+ +#ifndef DEPLOY_PPTRACKING_CPP_INCLUDE_PIPELINE_H_ +#define DEPLOY_PPTRACKING_CPP_INCLUDE_PIPELINE_H_ + +#include + +#include +#include +#include +#include +#include +#include +#include + +#ifdef _WIN32 +#include +#include +#elif LINUX +#include +#include +#endif + +#include "include/jde_predictor.h" +#include "include/sde_predictor.h" + +namespace PaddleDetection { + +class Pipeline { + public: + explicit Pipeline(const std::string& device, + const double threshold, + const std::string& output_dir, + const std::string& run_mode = "paddle", + const int gpu_id = 0, + const bool use_mkldnn = false, + const int cpu_threads = 1, + const bool trt_calib_mode = false, + const bool do_entrance_counting = false, + const bool save_result = false, + const std::string& scene = "pedestrian", + const bool tiny_obj = false, + const bool is_mtmct = false, + const int secs_interval = 10, + const std::string track_model_dir = "", + const std::string det_model_dir = "", + const std::string reid_model_dir = "") { + std::vector input; + this->input_ = input; + this->device_ = device; + this->threshold_ = threshold; + this->output_dir_ = output_dir; + this->run_mode_ = run_mode; + this->gpu_id_ = gpu_id; + this->use_mkldnn_ = use_mkldnn; + this->cpu_threads_ = cpu_threads; + this->trt_calib_mode_ = trt_calib_mode; + this->do_entrance_counting_ = do_entrance_counting; + this->secs_interval_ = secs_interval_; + this->save_result_ = save_result; + SelectModel(scene, + tiny_obj, + is_mtmct, + track_model_dir, + det_model_dir, + reid_model_dir); + InitPredictor(); + } + + // Set input, it must execute before Run() + void SetInput(const std::string& input_video); + void ClearInput(); + + // Run pipeline in video + void Run(); + void PredictMOT(const std::string& video_path); + void PredictMTMCT(const std::vector video_inputs); + + // Run pipeline in stream + void RunMOTStream(const cv::Mat img, + const int frame_id, + const int video_fps, + const Rect entrance, + cv::Mat out_img, + std::vector* records, + std::set* count_set, + std::set* interval_count_set, + std::vector* in_count_list, + std::vector* out_count_list, + std::map>* prev_center, + std::vector* flow_records); + void RunMTMCTStream(const std::vector imgs, + std::vector* records); + + void PrintBenchmarkLog(const std::vector det_time, const int img_num); + + private: + // Select model according to scenes, it must execute before Run() + void SelectModel(const std::string& scene = "pedestrian", + const bool tiny_obj = false, + const bool is_mtmct = false, + const std::string track_model_dir = "", + const std::string det_model_dir = "", + const std::string reid_model_dir = ""); + void InitPredictor(); + + std::shared_ptr jde_sct_; + std::shared_ptr sde_sct_; + + std::vector input_; + std::vector stream_; + std::string device_; + double threshold_; + std::string output_dir_; + std::string track_model_dir_; + std::string det_model_dir_; + std::string reid_model_dir_; + std::string run_mode_ = "paddle"; + int gpu_id_ = 0; + bool use_mkldnn_ = false; + int cpu_threads_ = 1; + bool trt_calib_mode_ = false; + bool do_entrance_counting_ = false; + bool save_result_ = false; + int secs_interval_ = 10; +}; + +} // namespace PaddleDetection + +#endif // DEPLOY_PPTRACKING_CPP_INCLUDE_PIPELINE_H_ diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/postprocess.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/postprocess.h new file mode 100644 index 0000000000000000000000000000000000000000..41b10960351d69a0e3388adfc763c11f162a0bbd --- 
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/postprocess.h
@@ -0,0 +1,62 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <glog/logging.h>
+
+#include <map>
+#include <memory>
+#include <set>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+#include "include/utils.h"
+
+namespace PaddleDetection {
+
+// Generate visualization color
+cv::Scalar GetColor(int idx);
+
+// Visualize Tracking Results
+cv::Mat VisualizeTrackResult(const cv::Mat& img,
+                             const MOTResult& results,
+                             const float fps,
+                             const int frame_id);
+
+// Pedestrian/Vehicle Counting
+void FlowStatistic(const MOTResult& results,
+                   const int frame_id,
+                   const int secs_interval,
+                   const bool do_entrance_counting,
+                   const int video_fps,
+                   const Rect entrance,
+                   std::set<int>* id_set,
+                   std::set<int>* interval_id_set,
+                   std::vector<int>* in_id_list,
+                   std::vector<int>* out_id_list,
+                   std::map<int, std::vector<float>>* prev_center,
+                   std::vector<std::string>* records);
+
+// Save Tracking Results
+void SaveMOTResult(const MOTResult& results,
+                   const int frame_id,
+                   std::vector<std::string>* records);
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/predictor.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/predictor.h
new file mode 100644
index 0000000000000000000000000000000000000000..cfb6306513ac9c04e5e3254fc8e8f681b5bcedbb
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/predictor.h
@@ -0,0 +1,99 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
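+
+// Typical usage, sketched from the constructor and Predict() declared
+// below (the model path and frame are placeholders; passing only
+// track_model_dir selects the JDE branch):
+//
+//   PaddleDetection::Predictor predictor("GPU", "./fairmot_model");
+//   PaddleDetection::MOTResult result;
+//   std::vector<double> times(3);
+//   predictor.Predict({frame}, 0.5, &result, &times);  // frame: cv::Mat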
+
+#pragma once
+
+#include <ctime>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+#include "paddle_inference_api.h"  // NOLINT
+
+#include "include/config_parser.h"
+#include "include/jde_predictor.h"
+#include "include/preprocess_op.h"
+#include "include/sde_predictor.h"
+
+using namespace paddle_infer;  // NOLINT
+
+namespace PaddleDetection {
+
+class Predictor {
+ public:
+  explicit Predictor(const std::string& device = "CPU",
+                     const std::string& track_model_dir = "",
+                     const std::string& det_model_dir = "",
+                     const std::string& reid_model_dir = "",
+                     const double threshold = -1.,
+                     const std::string& run_mode = "paddle",
+                     const int gpu_id = 0,
+                     const bool use_mkldnn = false,
+                     const int cpu_threads = 1,
+                     bool trt_calib_mode = false,
+                     const int min_box_area = 200) {
+    if (track_model_dir.empty() && det_model_dir.empty()) {
+      throw "Predictor must receive track_model or det_model!";
+    }
+
+    if (!track_model_dir.empty() && !det_model_dir.empty()) {
+      throw "Predictor only receive one of track_model or det_model!";
+    }
+
+    if (!track_model_dir.empty()) {
+      jde_sct_ = std::make_shared<PaddleDetection::JDEPredictor>(device,
+                                                                 track_model_dir,
+                                                                 threshold,
+                                                                 run_mode,
+                                                                 gpu_id,
+                                                                 use_mkldnn,
+                                                                 cpu_threads,
+                                                                 trt_calib_mode,
+                                                                 min_box_area);
+      use_jde_ = true;
+    }
+    if (!det_model_dir.empty()) {
+      sde_sct_ = std::make_shared<PaddleDetection::SDEPredictor>(device,
+                                                                 det_model_dir,
+                                                                 reid_model_dir,
+                                                                 threshold,
+                                                                 run_mode,
+                                                                 gpu_id,
+                                                                 use_mkldnn,
+                                                                 cpu_threads,
+                                                                 trt_calib_mode,
+                                                                 min_box_area);
+      use_jde_ = false;
+    }
+  }
+
+  // Run predictor
+  void Predict(const std::vector<cv::Mat> imgs,
+               const double threshold = 0.5,
+               MOTResult* result = nullptr,
+               std::vector<double>* times = nullptr);
+
+ private:
+  std::shared_ptr<PaddleDetection::JDEPredictor> jde_sct_;
+  std::shared_ptr<PaddleDetection::SDEPredictor> sde_sct_;
+  bool use_jde_ = true;
+};
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/preprocess_op.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/preprocess_op.h
new file mode 100644
index 0000000000000000000000000000000000000000..b45388c91cde778b9b83e8f1c297878edf02573e
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/preprocess_op.h
@@ -0,0 +1,171 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
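+
+// Typical usage, sketched from the Preprocessor interface below (the
+// "infer_cfg.yml" path and its "Preprocess" key follow the layout of
+// configs exported by PaddleDetection; treat them as placeholders):
+//
+//   YAML::Node config = YAML::LoadFile("infer_cfg.yml");
+//   PaddleDetection::Preprocessor preprocessor;
+//   preprocessor.Init(config["Preprocess"]);
+//   cv::Mat im = cv::imread("demo.jpg");
+//   PaddleDetection::ImageBlob blob;
+//   preprocessor.Run(&im, &blob);  // blob.im_data_ now holds the net input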
+
+#pragma once
+
+#include <glog/logging.h>
+#include <yaml-cpp/yaml.h>
+
+#include <iostream>
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+namespace PaddleDetection {
+
+// Object for storing all preprocessed data
+class ImageBlob {
+ public:
+  // image width and height
+  std::vector<float> im_shape_;
+  // Buffer for image data after preprocessing
+  std::vector<float> im_data_;
+  // in net data shape(after pad)
+  std::vector<float> in_net_shape_;
+  // Evaluation image width and height
+  // std::vector<float> eval_im_size_f_;
+  // Scale factor for image size to origin image size
+  std::vector<float> scale_factor_;
+};
+
+// Abstraction of preprocessing operation class
+class PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) = 0;
+  virtual void Run(cv::Mat* im, ImageBlob* data) = 0;
+};
+
+class InitInfo : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {}
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+};
+
+class NormalizeImage : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    mean_ = item["mean"].as<std::vector<float>>();
+    scale_ = item["std"].as<std::vector<float>>();
+    is_scale_ = item["is_scale"].as<bool>();
+  }
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  // CHW or HWC
+  std::vector<float> mean_;
+  std::vector<float> scale_;
+  bool is_scale_;
+};
+
+class Permute : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {}
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+};
+
+class Resize : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    interp_ = item["interp"].as<int>();
+    keep_ratio_ = item["keep_ratio"].as<bool>();
+    target_size_ = item["target_size"].as<std::vector<int>>();
+  }
+
+  // Compute best resize scale for x-dimension, y-dimension
+  std::pair<float, float> GenerateScale(const cv::Mat& im);
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  int interp_;
+  bool keep_ratio_;
+  std::vector<int> target_size_;
+  std::vector<int> in_net_shape_;
+};
+
+class LetterBoxResize : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    target_size_ = item["target_size"].as<std::vector<int>>();
+  }
+
+  float GenerateScale(const cv::Mat& im);
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  std::vector<int> target_size_;
+  std::vector<int> in_net_shape_;
+};
+// Models with FPN need input shape % stride == 0
+class PadStride : public PreprocessOp {
+ public:
+  virtual void Init(const YAML::Node& item) {
+    stride_ = item["stride"].as<int>();
+  }
+
+  virtual void Run(cv::Mat* im, ImageBlob* data);
+
+ private:
+  int stride_;
+};
+
+class Preprocessor {
+ public:
+  void Init(const YAML::Node& config_node) {
+    // initialize image info at first
+    ops_["InitInfo"] = std::make_shared<InitInfo>();
+    for (const auto& item : config_node) {
+      auto op_name = item["type"].as<std::string>();
+
+      ops_[op_name] = CreateOp(op_name);
+      ops_[op_name]->Init(item);
+    }
+  }
+
+  std::shared_ptr<PreprocessOp> CreateOp(const std::string& name) {
+    if (name == "Resize") {
+      return std::make_shared<Resize>();
+    } else if (name == "LetterBoxResize") {
+      return std::make_shared<LetterBoxResize>();
+    } else if (name == "Permute") {
+      return std::make_shared<Permute>();
+    } else if (name == "NormalizeImage") {
+      return std::make_shared<NormalizeImage>();
+    } else if (name == "PadStride") {
+      // use PadStride instead of PadBatch
+      return std::make_shared<PadStride>();
+    }
+    std::cerr << "can not find function of OP: " << name
+              << " and return: nullptr" << std::endl;
+    return nullptr;
+  }
+
+  void Run(cv::Mat* im, ImageBlob* data);
+
+ public:
+  static const std::vector<std::string> RUN_ORDER;
+
+ private:
+  std::unordered_map<std::string, std::shared_ptr<PreprocessOp>> ops_;
+};
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/sde_predictor.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/sde_predictor.h
new file mode 100644
index 0000000000000000000000000000000000000000..3919eb105d1ea8ab8dfba63f668ae40e109d763a
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/sde_predictor.h
@@ -0,0 +1,106 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <ctime>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+#include "paddle_inference_api.h"  // NOLINT
+
+#include "include/config_parser.h"
+#include "include/preprocess_op.h"
+#include "include/utils.h"
+
+using namespace paddle_infer;  // NOLINT
+
+namespace PaddleDetection {
+
+class SDEPredictor {
+ public:
+  explicit SDEPredictor(const std::string& device,
+                        const std::string& det_model_dir = "",
+                        const std::string& reid_model_dir = "",
+                        const double threshold = -1.,
+                        const std::string& run_mode = "paddle",
+                        const int gpu_id = 0,
+                        const bool use_mkldnn = false,
+                        const int cpu_threads = 1,
+                        bool trt_calib_mode = false,
+                        const int min_box_area = 200) {
+    this->device_ = device;
+    this->gpu_id_ = gpu_id;
+    this->use_mkldnn_ = use_mkldnn;
+    this->cpu_math_library_num_threads_ = cpu_threads;
+    this->trt_calib_mode_ = trt_calib_mode;
+    this->min_box_area_ = min_box_area;
+
+    det_config_.load_config(det_model_dir);
+    this->min_subgraph_size_ = det_config_.min_subgraph_size_;
+    det_preprocessor_.Init(det_config_.preprocess_info_);
+
+    reid_config_.load_config(reid_model_dir);
+    reid_preprocessor_.Init(reid_config_.preprocess_info_);
+
+    LoadModel(det_model_dir, reid_model_dir, run_mode);
+    this->conf_thresh_ = det_config_.conf_thresh_;
+  }
+
+  // Load Paddle inference model
+  void LoadModel(const std::string& det_model_dir,
+                 const std::string& reid_model_dir,
+                 const std::string& run_mode = "paddle");
+
+  // Run predictor
+  void Predict(const std::vector<cv::Mat> imgs,
+               const double threshold = 0.5,
+               MOTResult* result = nullptr,
+               std::vector<double>* times = nullptr);
+
+ private:
+  std::string device_ = "CPU";
+  float threhold = 0.5;
+  int gpu_id_ = 0;
+  bool use_mkldnn_ = false;
+  int cpu_math_library_num_threads_ = 1;
+  int min_subgraph_size_ = 3;
+  bool trt_calib_mode_ = false;
+
+  // Preprocess image and copy data to input buffer
+  void Preprocess(const cv::Mat& image_mat);
+  // Postprocess result
+  void Postprocess(const cv::Mat dets, const cv::Mat emb, MOTResult* result);
+
+  std::shared_ptr<Predictor> det_predictor_;
+  std::shared_ptr<Predictor> reid_predictor_;
+  Preprocessor det_preprocessor_;
+  Preprocessor reid_preprocessor_;
+  ImageBlob inputs_;
+  std::vector<float> bbox_data_;
+  std::vector<float> emb_data_;
+  double threshold_;
+  ConfigPaser det_config_;
+  ConfigPaser reid_config_;
+  float min_box_area_ = 200;
+  float conf_thresh_;
+};
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/tracker.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/tracker.h
new file mode 100644
index 0000000000000000000000000000000000000000..244530f140a3728b1f37d032c2d74693bd7e8f74
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/tracker.h
@@ -0,0 +1,72 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// The code is based on:
+// https://github.com/CnybTseng/JDE/blob/master/platforms/common/jdetracker.h
+// The copyright of CnybTseng/JDE is as follows:
+// MIT License
+
+#pragma once
+
+#include <map>
+#include <vector>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+#include "include/trajectory.h"
+
+namespace PaddleDetection {
+
+typedef std::map<int, int> Match;
+typedef std::map<int, int>::iterator MatchIterator;
+
+struct Track {
+  int id;
+  float score;
+  cv::Vec4f ltrb;
+};
+
+class JDETracker {
+ public:
+  static JDETracker *instance(void);
+  virtual bool update(const cv::Mat &dets,
+                      const cv::Mat &emb,
+                      std::vector<Track> *tracks);
+
+ private:
+  JDETracker(void);
+  virtual ~JDETracker(void) {}
+  cv::Mat motion_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b);
+  void linear_assignment(const cv::Mat &cost,
+                         float cost_limit,
+                         Match *matches,
+                         std::vector<int> *mismatch_row,
+                         std::vector<int> *mismatch_col);
+  void remove_duplicate_trajectory(TrajectoryPool *a,
+                                   TrajectoryPool *b,
+                                   float iou_thresh = 0.15f);
+
+ private:
+  static JDETracker *me;
+  int timestamp;
+  TrajectoryPool tracked_trajectories;
+  TrajectoryPool lost_trajectories;
+  TrajectoryPool removed_trajectories;
+  int max_lost_time;
+  float lambda;
+  float det_thresh;
+};
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/trajectory.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/trajectory.h
new file mode 100644
index 0000000000000000000000000000000000000000..c21e8cac368a77983da2844794b7c778a97573a3
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/trajectory.h
@@ -0,0 +1,230 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
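+
+// Note on the state layout used below: TKalmanFilter tracks an 8-D
+// constant-velocity state [x, y, a, h, vx, vy, va, vh] (box center,
+// aspect ratio, height, and their velocities) and observes the 4-D
+// measurement [x, y, a, h]; ltrb2xyah() converts LTRB boxes into that
+// measurement space.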
+
+// The code is based on:
+// https://github.com/CnybTseng/JDE/blob/master/platforms/common/trajectory.h
+// The copyright of CnybTseng/JDE is as follows:
+// MIT License
+
+#pragma once
+
+#include <algorithm>
+
+#include <map>
+#include <string>
+#include <vector>
+#include "opencv2/video/tracking.hpp"
+
+namespace PaddleDetection {
+
+typedef enum { New = 0, Tracked = 1, Lost = 2, Removed = 3 } TrajectoryState;
+
+class Trajectory;
+typedef std::vector<Trajectory> TrajectoryPool;
+typedef std::vector<Trajectory>::iterator TrajectoryPoolIterator;
+typedef std::vector<Trajectory *> TrajectoryPtrPool;
+typedef std::vector<Trajectory *>::iterator TrajectoryPtrPoolIterator;
+
+class TKalmanFilter : public cv::KalmanFilter {
+ public:
+  TKalmanFilter(void);
+  virtual ~TKalmanFilter(void) {}
+  virtual void init(const cv::Mat &measurement);
+  virtual const cv::Mat &predict();
+  virtual const cv::Mat &correct(const cv::Mat &measurement);
+  virtual void project(cv::Mat *mean, cv::Mat *covariance) const;
+
+ private:
+  float std_weight_position;
+  float std_weight_velocity;
+};
+
+inline TKalmanFilter::TKalmanFilter(void) : cv::KalmanFilter(8, 4) {
+  cv::KalmanFilter::transitionMatrix = cv::Mat::eye(8, 8, CV_32F);
+  for (int i = 0; i < 4; ++i)
+    cv::KalmanFilter::transitionMatrix.at<float>(i, i + 4) = 1;
+  cv::KalmanFilter::measurementMatrix = cv::Mat::eye(4, 8, CV_32F);
+  std_weight_position = 1 / 20.f;
+  std_weight_velocity = 1 / 160.f;
+}
+
+class Trajectory : public TKalmanFilter {
+ public:
+  Trajectory();
+  Trajectory(const cv::Vec4f &ltrb, float score, const cv::Mat &embedding);
+  Trajectory(const Trajectory &other);
+  Trajectory &operator=(const Trajectory &rhs);
+  virtual ~Trajectory(void) {}
+
+  static int next_id();
+  virtual const cv::Mat &predict(void);
+  virtual void update(Trajectory *traj,
+                      int timestamp,
+                      bool update_embedding = true);
+  virtual void activate(int timestamp);
+  virtual void reactivate(Trajectory *traj, int timestamp, bool newid = false);
+  virtual void mark_lost(void);
+  virtual void mark_removed(void);
+
+  friend TrajectoryPool operator+(const TrajectoryPool &a,
+                                  const TrajectoryPool &b);
+  friend TrajectoryPool operator+(const TrajectoryPool &a,
+                                  const TrajectoryPtrPool &b);
+  friend TrajectoryPool &operator+=(TrajectoryPool &a,  // NOLINT
+                                    const TrajectoryPtrPool &b);
+  friend TrajectoryPool operator-(const TrajectoryPool &a,
+                                  const TrajectoryPool &b);
+  friend TrajectoryPool &operator-=(TrajectoryPool &a,  // NOLINT
+                                    const TrajectoryPool &b);
+  friend TrajectoryPtrPool operator+(const TrajectoryPtrPool &a,
+                                     const TrajectoryPtrPool &b);
+  friend TrajectoryPtrPool operator+(const TrajectoryPtrPool &a,
+                                     TrajectoryPool *b);
+  friend TrajectoryPtrPool operator-(const TrajectoryPtrPool &a,
+                                     const TrajectoryPtrPool &b);
+
+  friend cv::Mat embedding_distance(const TrajectoryPool &a,
+                                    const TrajectoryPool &b);
+  friend cv::Mat embedding_distance(const TrajectoryPtrPool &a,
+                                    const TrajectoryPtrPool &b);
+  friend cv::Mat embedding_distance(const TrajectoryPtrPool &a,
+                                    const TrajectoryPool &b);
+
+  friend cv::Mat mahalanobis_distance(const TrajectoryPool &a,
+                                      const TrajectoryPool &b);
+  friend cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a,
+                                      const TrajectoryPtrPool &b);
+  friend cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a,
+                                      const TrajectoryPool &b);
+
+  friend cv::Mat iou_distance(const TrajectoryPool &a, const TrajectoryPool &b);
+  friend cv::Mat iou_distance(const TrajectoryPtrPool &a,
+                              const TrajectoryPtrPool &b);
+  friend cv::Mat iou_distance(const TrajectoryPtrPool &a,
+                              const TrajectoryPool &b);
+
+ private:
+  void update_embedding(const cv::Mat &embedding);
+
+ public:
+  TrajectoryState state;
+  cv::Vec4f ltrb;
+  cv::Mat smooth_embedding;
+  int id;
+  bool is_activated;
+  int timestamp;
+  int starttime;
+  float score;
+
+ private:
+  static int count;
+  cv::Vec4f xyah;
+  cv::Mat current_embedding;
+  float eta;
+  int length;
+};
+
+inline cv::Vec4f ltrb2xyah(const cv::Vec4f &ltrb) {
+  cv::Vec4f xyah;
+  xyah[0] = (ltrb[0] + ltrb[2]) * 0.5f;
+  xyah[1] = (ltrb[1] + ltrb[3]) * 0.5f;
+  xyah[3] = ltrb[3] - ltrb[1];
+  xyah[2] = (ltrb[2] - ltrb[0]) / xyah[3];
+  return xyah;
+}
+
+inline Trajectory::Trajectory()
+    : state(New),
+      ltrb(cv::Vec4f()),
+      smooth_embedding(cv::Mat()),
+      id(0),
+      is_activated(false),
+      timestamp(0),
+      starttime(0),
+      score(0),
+      eta(0.9),
+      length(0) {}
+
+inline Trajectory::Trajectory(const cv::Vec4f &ltrb_,
+                              float score_,
+                              const cv::Mat &embedding)
+    : state(New),
+      ltrb(ltrb_),
+      smooth_embedding(cv::Mat()),
+      id(0),
+      is_activated(false),
+      timestamp(0),
+      starttime(0),
+      score(score_),
+      eta(0.9),
+      length(0) {
+  xyah = ltrb2xyah(ltrb);
+  update_embedding(embedding);
+}
+
+inline Trajectory::Trajectory(const Trajectory &other)
+    : state(other.state),
+      ltrb(other.ltrb),
+      id(other.id),
+      is_activated(other.is_activated),
+      timestamp(other.timestamp),
+      starttime(other.starttime),
+      xyah(other.xyah),
+      score(other.score),
+      eta(other.eta),
+      length(other.length) {
+  other.smooth_embedding.copyTo(smooth_embedding);
+  other.current_embedding.copyTo(current_embedding);
+  // copy state in KalmanFilter
+
+  other.statePre.copyTo(cv::KalmanFilter::statePre);
+  other.statePost.copyTo(cv::KalmanFilter::statePost);
+  other.errorCovPre.copyTo(cv::KalmanFilter::errorCovPre);
+  other.errorCovPost.copyTo(cv::KalmanFilter::errorCovPost);
+}
+
+inline Trajectory &Trajectory::operator=(const Trajectory &rhs) {
+  this->state = rhs.state;
+  this->ltrb = rhs.ltrb;
+  rhs.smooth_embedding.copyTo(this->smooth_embedding);
+  this->id = rhs.id;
+  this->is_activated = rhs.is_activated;
+  this->timestamp = rhs.timestamp;
+  this->starttime = rhs.starttime;
+  this->xyah = rhs.xyah;
+  this->score = rhs.score;
+  rhs.current_embedding.copyTo(this->current_embedding);
+  this->eta = rhs.eta;
+  this->length = rhs.length;
+
+  // copy state in KalmanFilter
+
+  rhs.statePre.copyTo(cv::KalmanFilter::statePre);
+  rhs.statePost.copyTo(cv::KalmanFilter::statePost);
+  rhs.errorCovPre.copyTo(cv::KalmanFilter::errorCovPre);
+  rhs.errorCovPost.copyTo(cv::KalmanFilter::errorCovPost);
+
+  return *this;
+}
+
+inline int Trajectory::next_id() {
+  ++count;
+  return count;
+}
+
+inline void Trajectory::mark_lost(void) { state = Lost; }
+
+inline void Trajectory::mark_removed(void) { state = Removed; }
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/utils.h b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/utils.h
new file mode 100644
index 0000000000000000000000000000000000000000..9d94492a430be3e25ceb572ceb181a5dce755637
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/include/utils.h
@@ -0,0 +1,44 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <iostream>
+#include <map>
+#include <set>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "include/tracker.h"
+
+namespace PaddleDetection {
+
+struct Rect {
+  float left;
+  float top;
+  float right;
+  float bottom;
+};
+
+struct MOTTrack {
+  int ids;
+  float score;
+  Rect rects;
+  int class_id = -1;
+};
+
+typedef std::vector<MOTTrack> MOTResult;
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/scripts/build.sh b/PaddleDetection-release-2.6/deploy/pptracking/cpp/scripts/build.sh
new file mode 100644
index 0000000000000000000000000000000000000000..8b8d2cf7f970e774fd838aecefa08e258d569fe6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/scripts/build.sh
@@ -0,0 +1,78 @@
+# Whether to use GPU (i.e. whether to use CUDA)
+WITH_GPU=OFF
+
+# Whether to use MKL or OpenBLAS; must be set to OFF on TX2
+WITH_MKL=ON
+
+# Whether to integrate TensorRT (only effective when WITH_GPU=ON)
+WITH_TENSORRT=OFF
+
+# Name of the Paddle inference lib; it differs across platforms and versions,
+# so check the lib name under `paddle_inference/lib/` in the downloaded library
+PADDLE_LIB_NAME=libpaddle_inference
+
+# TensorRT include path
+TENSORRT_INC_DIR=/path/to/tensorrt/include
+
+# TensorRT lib path
+TENSORRT_LIB_DIR=/path/to/tensorrt/lib
+
+# Paddle inference library path
+PADDLE_DIR=/path/to/paddle_inference
+
+# CUDA lib path
+CUDA_LIB=/path/to/cuda/lib
+
+# CUDNN lib path
+CUDNN_LIB=/path/to/cudnn/lib
+
+MACHINE_TYPE=`uname -m`
+echo "MACHINE_TYPE: "${MACHINE_TYPE}
+
+
+if [ "$MACHINE_TYPE" = "x86_64" ]
+then
+  echo "set OPENCV_DIR for x86_64"
+  # On Linux, download the precompiled OpenCV with the commands below
+  mkdir -p $(pwd)/deps && cd $(pwd)/deps
+  wget -c https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
+  tar -xvf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz && cd ..
+
+  # set OPENCV_DIR
+  OPENCV_DIR=$(pwd)/deps/opencv-3.4.16_gcc8.2_ffmpeg
+
+elif [ "$MACHINE_TYPE" = "aarch64" ]
+then
+  echo "set OPENCV_DIR for aarch64"
+  # On the TX2 platform, download the precompiled OpenCV with the commands below
+  mkdir -p $(pwd)/deps && cd $(pwd)/deps
+  wget -c https://bj.bcebos.com/v1/paddledet/data/TX2_JetPack4.3_opencv_3.4.6_gcc7.5.0.tar.gz
+  tar -xvf TX2_JetPack4.3_opencv_3.4.6_gcc7.5.0.tar.gz && cd ..
+
+  # set OPENCV_DIR
+  OPENCV_DIR=$(pwd)/deps/TX2_JetPack4.3_opencv_3.4.6_gcc7.5.0/
+
+else
+  echo "Please set OPENCV_DIR manually"
+fi
+
+echo "OPENCV_DIR: "$OPENCV_DIR
+
+# No changes needed below this line
+rm -rf build
+mkdir -p build
+cd build
+cmake .. \
+    -DWITH_GPU=${WITH_GPU} \
+    -DWITH_MKL=${WITH_MKL} \
+    -DWITH_TENSORRT=${WITH_TENSORRT} \
+    -DTENSORRT_LIB_DIR=${TENSORRT_LIB_DIR} \
+    -DTENSORRT_INC_DIR=${TENSORRT_INC_DIR} \
+    -DPADDLE_DIR=${PADDLE_DIR} \
+    -DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
+    -DCUDA_LIB=${CUDA_LIB} \
+    -DCUDNN_LIB=${CUDNN_LIB} \
+    -DOPENCV_DIR=${OPENCV_DIR} \
+    -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME} \
+
+make
+echo "make finished!"
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/jde_predictor.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/jde_predictor.cc
new file mode 100644
index 0000000000000000000000000000000000000000..0673d8a0bb1d3dbb436e0614d53d563f245fa063
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/jde_predictor.cc
@@ -0,0 +1,235 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+#include <sstream>
+// for setprecision
+#include <chrono>
+#include <iomanip>
+#include "include/jde_predictor.h"
+
+using namespace paddle_infer;  // NOLINT
+
+namespace PaddleDetection {
+
+// Load Model and create model predictor
+void JDEPredictor::LoadModel(const std::string& model_dir,
+                             const std::string& run_mode) {
+  paddle_infer::Config config;
+  std::string prog_file = model_dir + OS_PATH_SEP + "model.pdmodel";
+  std::string params_file = model_dir + OS_PATH_SEP + "model.pdiparams";
+  config.SetModel(prog_file, params_file);
+  if (this->device_ == "GPU") {
+    config.EnableUseGpu(200, this->gpu_id_);
+    config.SwitchIrOptim(true);
+    // use tensorrt
+    if (run_mode != "paddle") {
+      auto precision = paddle_infer::Config::Precision::kFloat32;
+      if (run_mode == "trt_fp32") {
+        precision = paddle_infer::Config::Precision::kFloat32;
+      } else if (run_mode == "trt_fp16") {
+        precision = paddle_infer::Config::Precision::kHalf;
+      } else if (run_mode == "trt_int8") {
+        precision = paddle_infer::Config::Precision::kInt8;
+      } else {
+        printf(
+            "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or "
+            "'trt_int8'");
+      }
+      // set tensorrt
+      config.EnableTensorRtEngine(1 << 30,
+                                  1,
+                                  this->min_subgraph_size_,
+                                  precision,
+                                  false,
+                                  this->trt_calib_mode_);
+    }
+  } else if (this->device_ == "XPU") {
+    config.EnableXpu(10 * 1024 * 1024);
+  } else {
+    config.DisableGpu();
+    if (this->use_mkldnn_) {
+      config.EnableMKLDNN();
+      // cache 10 different shapes for mkldnn to avoid memory leak
+      config.SetMkldnnCacheCapacity(10);
+    }
+    config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
+  }
+  config.SwitchUseFeedFetchOps(false);
+  config.SwitchIrOptim(true);
+  config.DisableGlogInfo();
+  // Memory optimization
+  config.EnableMemoryOptim();
+  predictor_ = std::move(CreatePredictor(config));
+}
+
+void FilterDets(const float conf_thresh,
+                const cv::Mat dets,
+                std::vector<int>* index) {
+  for (int i = 0; i < dets.rows; ++i) {
+    float score = *dets.ptr<float>(i, 4);
+    if (score > conf_thresh) {
+      index->push_back(i);
+    }
+  }
+}
+
+void JDEPredictor::Preprocess(const cv::Mat& ori_im) {
+  // Clone the image : keep the original mat for postprocess
+  cv::Mat im = ori_im.clone();
+  preprocessor_.Run(&im, &inputs_);
+}
+
+void JDEPredictor::Postprocess(const cv::Mat dets,
+                               const cv::Mat emb,
+                               MOTResult* result) {
+  result->clear();
+  std::vector<Track> tracks;
+  std::vector<int> valid;
+  FilterDets(conf_thresh_, dets, &valid);
+  cv::Mat new_dets, new_emb;
+  for (int i = 0; i < valid.size(); ++i) {
+    new_dets.push_back(dets.row(valid[i]));
+    new_emb.push_back(emb.row(valid[i]));
+  }
+  JDETracker::instance()->update(new_dets, new_emb, &tracks);
+  if (tracks.size() == 0) {
+    MOTTrack mot_track;
+    Rect ret = {*dets.ptr<float>(0, 0),
+                *dets.ptr<float>(0, 1),
+                *dets.ptr<float>(0, 2),
+                *dets.ptr<float>(0, 3)};
+    mot_track.ids = 1;
+    mot_track.score = *dets.ptr<float>(0, 4);
+    mot_track.rects = ret;
+    result->push_back(mot_track);
+  } else {
+    std::vector<Track>::iterator titer;
+    for (titer = tracks.begin(); titer != tracks.end(); ++titer) {
+      if (titer->score < threshold_) {
+        continue;
+      } else {
+        float w = titer->ltrb[2] - titer->ltrb[0];
+        float h = titer->ltrb[3] - titer->ltrb[1];
+        bool vertical = w / h > 1.6;
+        float area = w * h;
+        if (area > min_box_area_ && !vertical) {
+          MOTTrack mot_track;
+          Rect ret = {
+              titer->ltrb[0], titer->ltrb[1], titer->ltrb[2], titer->ltrb[3]};
+          mot_track.rects = ret;
+          mot_track.score = titer->score;
+          mot_track.ids = titer->id;
+          result->push_back(mot_track);
+        }
+      }
+    }
+  }
+}
+
+void JDEPredictor::Predict(const std::vector<cv::Mat> imgs,
+                           const double threshold,
+                           MOTResult* result,
+                           std::vector<double>* times) {
+  auto preprocess_start = std::chrono::steady_clock::now();
+  int batch_size = imgs.size();
+
+  // in_data_batch
+  std::vector<float> in_data_all;
+  std::vector<float> im_shape_all(batch_size * 2);
+  std::vector<float> scale_factor_all(batch_size * 2);
+
+  // Preprocess image
+  for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) {
+    cv::Mat im = imgs.at(bs_idx);
+    Preprocess(im);
+    im_shape_all[bs_idx * 2] = inputs_.im_shape_[0];
+    im_shape_all[bs_idx * 2 + 1] = inputs_.im_shape_[1];
+
+    scale_factor_all[bs_idx * 2] = inputs_.scale_factor_[0];
+    scale_factor_all[bs_idx * 2 + 1] = inputs_.scale_factor_[1];
+
+    in_data_all.insert(
+        in_data_all.end(), inputs_.im_data_.begin(), inputs_.im_data_.end());
+  }
+
+  // Prepare input tensor
+  auto input_names = predictor_->GetInputNames();
+  for (const auto& tensor_name : input_names) {
+    auto in_tensor = predictor_->GetInputHandle(tensor_name);
+    if (tensor_name == "image") {
+      int rh = inputs_.in_net_shape_[0];
+      int rw = inputs_.in_net_shape_[1];
+      in_tensor->Reshape({batch_size, 3, rh, rw});
+      in_tensor->CopyFromCpu(in_data_all.data());
+    } else if (tensor_name == "im_shape") {
+      in_tensor->Reshape({batch_size, 2});
+      in_tensor->CopyFromCpu(im_shape_all.data());
+    } else if (tensor_name == "scale_factor") {
+      in_tensor->Reshape({batch_size, 2});
+      in_tensor->CopyFromCpu(scale_factor_all.data());
+    }
+  }
+
+  auto preprocess_end = std::chrono::steady_clock::now();
+  std::vector<int> bbox_shape;
+  std::vector<int> emb_shape;
+
+  // Run predictor
+  auto inference_start = std::chrono::steady_clock::now();
+  predictor_->Run();
+  // Get output tensor
+  auto output_names = predictor_->GetOutputNames();
+  auto bbox_tensor = predictor_->GetOutputHandle(output_names[0]);
+  bbox_shape = bbox_tensor->shape();
+  auto emb_tensor = predictor_->GetOutputHandle(output_names[1]);
+  emb_shape = emb_tensor->shape();
+  // Calculate bbox length
+  int bbox_size = 1;
+  for (int j = 0; j < bbox_shape.size(); ++j) {
+    bbox_size *= bbox_shape[j];
+  }
+  // Calculate emb length
+  int emb_size = 1;
+  for (int j = 0; j < emb_shape.size(); ++j) {
+    emb_size *= emb_shape[j];
+  }
+
+  bbox_data_.resize(bbox_size);
+  bbox_tensor->CopyToCpu(bbox_data_.data());
+
+  emb_data_.resize(emb_size);
+  emb_tensor->CopyToCpu(emb_data_.data());
+  auto inference_end = std::chrono::steady_clock::now();
+
+  // Postprocessing result
+  auto postprocess_start = std::chrono::steady_clock::now();
+  result->clear();
+
+  cv::Mat dets(bbox_shape[0], 6, CV_32FC1, bbox_data_.data());
+  cv::Mat emb(bbox_shape[0], emb_shape[1], CV_32FC1, emb_data_.data());
+
+  Postprocess(dets, emb, result);
+
+  auto postprocess_end = std::chrono::steady_clock::now();
+
+  std::chrono::duration<float> preprocess_diff =
+      preprocess_end - preprocess_start;
+  (*times)[0] += static_cast<double>(preprocess_diff.count() * 1000);
+  std::chrono::duration<float> inference_diff = inference_end - inference_start;
+  (*times)[1] += static_cast<double>(inference_diff.count() * 1000);
+  std::chrono::duration<float> postprocess_diff =
+      postprocess_end - postprocess_start;
+  (*times)[2] += static_cast<double>(postprocess_diff.count() * 1000);
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/lapjv.cpp b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/lapjv.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..bb710e740e5291c5332ad91770e3649ec317ed20
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/lapjv.cpp
@@ -0,0 +1,409 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// The code is based on:
+// https://github.com/gatagat/lap/blob/master/lap/lapjv.cpp
+// The copyright of gatagat/lap is as follows:
+// MIT License
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "include/lapjv.h"
+
+namespace PaddleDetection {
+
+/** Column-reduction and reduction transfer for a dense cost matrix.
+ */
+int _ccrrt_dense(
+    const int n, float *cost[], int *free_rows, int *x, int *y, float *v) {
+  int n_free_rows;
+  bool *unique;
+
+  for (int i = 0; i < n; i++) {
+    x[i] = -1;
+    v[i] = LARGE;
+    y[i] = 0;
+  }
+  for (int i = 0; i < n; i++) {
+    for (int j = 0; j < n; j++) {
+      const float c = cost[i][j];
+      if (c < v[j]) {
+        v[j] = c;
+        y[j] = i;
+      }
+    }
+  }
+  NEW(unique, bool, n);
+  memset(unique, TRUE, n);
+  {
+    int j = n;
+    do {
+      j--;
+      const int i = y[j];
+      if (x[i] < 0) {
+        x[i] = j;
+      } else {
+        unique[i] = FALSE;
+        y[j] = -1;
+      }
+    } while (j > 0);
+  }
+  n_free_rows = 0;
+  for (int i = 0; i < n; i++) {
+    if (x[i] < 0) {
+      free_rows[n_free_rows++] = i;
+    } else if (unique[i]) {
+      const int j = x[i];
+      float min = LARGE;
+      for (int j2 = 0; j2 < n; j2++) {
+        if (j2 == static_cast<int>(j)) {
+          continue;
+        }
+        const float c = cost[i][j2] - v[j2];
+        if (c < min) {
+          min = c;
+        }
+      }
+      v[j] -= min;
+    }
+  }
+  FREE(unique);
+  return n_free_rows;
+}
+
+/** Augmenting row reduction for a dense cost matrix.
+ */
+int _carr_dense(const int n,
+                float *cost[],
+                const int n_free_rows,
+                int *free_rows,
+                int *x,
+                int *y,
+                float *v) {
+  int current = 0;
+  int new_free_rows = 0;
+  int rr_cnt = 0;
+  while (current < n_free_rows) {
+    int i0;
+    int j1, j2;
+    float v1, v2, v1_new;
+    bool v1_lowers;
+
+    rr_cnt++;
+    const int free_i = free_rows[current++];
+    j1 = 0;
+    v1 = cost[free_i][0] - v[0];
+    j2 = -1;
+    v2 = LARGE;
+    for (int j = 1; j < n; j++) {
+      const float c = cost[free_i][j] - v[j];
+      if (c < v2) {
+        if (c >= v1) {
+          v2 = c;
+          j2 = j;
+        } else {
+          v2 = v1;
+          v1 = c;
+          j2 = j1;
+          j1 = j;
+        }
+      }
+    }
+    i0 = y[j1];
+    v1_new = v[j1] - (v2 - v1);
+    v1_lowers = v1_new < v[j1];
+    if (rr_cnt < current * n) {
+      if (v1_lowers) {
+        v[j1] = v1_new;
+      } else if (i0 >= 0 && j2 >= 0) {
+        j1 = j2;
+        i0 = y[j2];
+      }
+      if (i0 >= 0) {
+        if (v1_lowers) {
+          free_rows[--current] = i0;
+        } else {
+          free_rows[new_free_rows++] = i0;
+        }
+      }
+    } else {
+      if (i0 >= 0) {
+        free_rows[new_free_rows++] = i0;
+      }
+    }
+    x[free_i] = j1;
+    y[j1] = free_i;
+  }
+  return new_free_rows;
+}
+
+/** Find columns with minimum d[j] and put them on the SCAN list.
+ */
+int _find_dense(const int n, int lo, float *d, int *cols, int *y) {
+  int hi = lo + 1;
+  float mind = d[cols[lo]];
+  for (int k = hi; k < n; k++) {
+    int j = cols[k];
+    if (d[j] <= mind) {
+      if (d[j] < mind) {
+        hi = lo;
+        mind = d[j];
+      }
+      cols[k] = cols[hi];
+      cols[hi++] = j;
+    }
+  }
+  return hi;
+}
+
+// Scan all columns in TODO starting from arbitrary column in SCAN
+// and try to decrease d of the TODO columns using the SCAN column.
+int _scan_dense(const int n,
+                float *cost[],
+                int *plo,
+                int *phi,
+                float *d,
+                int *cols,
+                int *pred,
+                int *y,
+                float *v) {
+  int lo = *plo;
+  int hi = *phi;
+  float h, cred_ij;
+
+  while (lo != hi) {
+    int j = cols[lo++];
+    const int i = y[j];
+    const float mind = d[j];
+    h = cost[i][j] - v[j] - mind;
+    // For all columns in TODO
+    for (int k = hi; k < n; k++) {
+      j = cols[k];
+      cred_ij = cost[i][j] - v[j] - h;
+      if (cred_ij < d[j]) {
+        d[j] = cred_ij;
+        pred[j] = i;
+        if (cred_ij == mind) {
+          if (y[j] < 0) {
+            return j;
+          }
+          cols[k] = cols[hi];
+          cols[hi++] = j;
+        }
+      }
+    }
+  }
+  *plo = lo;
+  *phi = hi;
+  return -1;
+}
+
+/** Single iteration of modified Dijkstra shortest path algorithm as explained
+ * in the JV paper.
+ *
+ * This is a dense matrix version.
+ *
+ * \return The closest free column index.
+ */
+int find_path_dense(const int n,
+                    float *cost[],
+                    const int start_i,
+                    int *y,
+                    float *v,
+                    int *pred) {
+  int lo = 0, hi = 0;
+  int final_j = -1;
+  int n_ready = 0;
+  int *cols;
+  float *d;
+
+  NEW(cols, int, n);
+  NEW(d, float, n);
+
+  for (int i = 0; i < n; i++) {
+    cols[i] = i;
+    pred[i] = start_i;
+    d[i] = cost[start_i][i] - v[i];
+  }
+  while (final_j == -1) {
+    // No columns left on the SCAN list.
+    if (lo == hi) {
+      n_ready = lo;
+      hi = _find_dense(n, lo, d, cols, y);
+      for (int k = lo; k < hi; k++) {
+        const int j = cols[k];
+        if (y[j] < 0) {
+          final_j = j;
+        }
+      }
+    }
+    if (final_j == -1) {
+      final_j = _scan_dense(n, cost, &lo, &hi, d, cols, pred, y, v);
+    }
+  }
+
+  {
+    const float mind = d[cols[lo]];
+    for (int k = 0; k < n_ready; k++) {
+      const int j = cols[k];
+      v[j] += d[j] - mind;
+    }
+  }
+
+  FREE(cols);
+  FREE(d);
+
+  return final_j;
+}
+
+/** Augment for a dense cost matrix.
+ */
+int _ca_dense(const int n,
+              float *cost[],
+              const int n_free_rows,
+              int *free_rows,
+              int *x,
+              int *y,
+              float *v) {
+  int *pred;
+
+  NEW(pred, int, n);
+
+  for (int *pfree_i = free_rows; pfree_i < free_rows + n_free_rows; pfree_i++) {
+    int i = -1, j;
+    int k = 0;
+
+    j = find_path_dense(n, cost, *pfree_i, y, v, pred);
+    while (i != *pfree_i) {
+      i = pred[j];
+      y[j] = i;
+      SWAP_INDICES(j, x[i]);
+      k++;
+    }
+  }
+  FREE(pred);
+  return 0;
+}
+
+/** Solve dense LAP.
+ */
+int lapjv_internal(const cv::Mat &cost,
+                   const bool extend_cost,
+                   const float cost_limit,
+                   int *x,
+                   int *y) {
+  int n_rows = cost.rows;
+  int n_cols = cost.cols;
+  int n;
+  if (n_rows == n_cols) {
+    n = n_rows;
+  } else if (!extend_cost) {
+    throw std::invalid_argument(
+        "Square cost array expected. If cost is intentionally non-square, pass "
+        "extend_cost=True.");
+  }
+
+  // Get extend cost
+  if (extend_cost || cost_limit < LARGE) {
+    n = n_rows + n_cols;
+  }
+  cv::Mat cost_expand(n, n, CV_32F);
+  float expand_value;
+  if (cost_limit < LARGE) {
+    expand_value = cost_limit / 2;
+  } else {
+    double max_v;
+    minMaxLoc(cost, nullptr, &max_v);
+    expand_value = static_cast<float>(max_v) + 1.;
+  }
+
+  for (int i = 0; i < n; ++i) {
+    for (int j = 0; j < n; ++j) {
+      cost_expand.at<float>(i, j) = expand_value;
+      if (i >= n_rows && j >= n_cols) {
+        cost_expand.at<float>(i, j) = 0;
+      } else if (i < n_rows && j < n_cols) {
+        cost_expand.at<float>(i, j) = cost.at<float>(i, j);
+      }
+    }
+  }
+
+  // Convert Mat to pointer array
+  float **cost_ptr;
+  NEW(cost_ptr, float *, n);
+  for (int i = 0; i < n; ++i) {
+    NEW(cost_ptr[i], float, n);
+  }
+  for (int i = 0; i < n; ++i) {
+    for (int j = 0; j < n; ++j) {
+      cost_ptr[i][j] = cost_expand.at<float>(i, j);
+    }
+  }
+
+  int ret;
+  int *free_rows;
+  float *v;
+  int *x_c;
+  int *y_c;
+
+  NEW(free_rows, int, n);
+  NEW(v, float, n);
+  NEW(x_c, int, n);
+  NEW(y_c, int, n);
+
+  ret = _ccrrt_dense(n, cost_ptr, free_rows, x_c, y_c, v);
+  int i = 0;
+  while (ret > 0 && i < 2) {
+    ret = _carr_dense(n, cost_ptr, ret, free_rows, x_c, y_c, v);
+    i++;
+  }
+  if (ret > 0) {
+    ret = _ca_dense(n, cost_ptr, ret, free_rows, x_c, y_c, v);
+  }
+  FREE(v);
+  FREE(free_rows);
+  for (int i = 0; i < n; ++i) {
+    FREE(cost_ptr[i]);
+  }
+  FREE(cost_ptr);
+  if (ret != 0) {
+    if (ret == -1) {
+      throw "Out of memory.";
+    }
+    throw "Unknown error (lapjv_internal)";
+  }
+  // Get output of x, y, opt
+  for (int i = 0; i < n; ++i) {
+    if (i < n_rows) {
+      x[i] = x_c[i];
+      if (x[i] >= n_cols) {
+        x[i] = -1;
+      }
+    }
+    if (i < n_cols) {
+      y[i] = y_c[i];
+      if (y[i] >= n_rows) {
+        y[i] = -1;
+      }
+    }
+  }
+
+  FREE(x_c);
+  FREE(y_c);
+  return ret;
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/main.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/main.cc
new file mode 100644
index 0000000000000000000000000000000000000000..40ffc08017f1d5324b25242af88d8d4c7a6e3597
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/main.cc
@@ -0,0 +1,172 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <glog/logging.h>
+
+#include <algorithm>
+#include <fstream>
+#include <iostream>
+#include <numeric>
+#include <sstream>
+#include <string>
+#include <vector>
+
+#ifdef _WIN32
+#include <direct.h>
+#include <io.h>
+#else
+#include <sys/stat.h>
+#include <sys/types.h>
+#endif
+
+#include <gflags/gflags.h>
+#include "include/pipeline.h"
+
+DEFINE_string(video_file, "", "Path of input video.");
+DEFINE_string(video_other_file,
+              "",
+              "Path of other input video used for MTMCT.");
+DEFINE_string(device,
+              "CPU",
+              "Choose the device you want to run, it can be: CPU/GPU/XPU, "
+              "default is CPU.");
+DEFINE_double(threshold, 0.5, "Threshold of score.");
+DEFINE_string(output_dir, "output", "Directory of output visualization files.");
+DEFINE_string(run_mode,
+              "paddle",
+              "Mode of running(paddle/trt_fp32/trt_fp16/trt_int8)");
+DEFINE_int32(gpu_id, 0, "Device id of GPU to execute");
+DEFINE_bool(use_mkldnn, false, "Whether use mkldnn with CPU");
+DEFINE_int32(cpu_threads, 1, "Num of threads with CPU");
+DEFINE_bool(trt_calib_mode,
+            false,
+            "If the model is produced by TRT offline quantitative calibration, "
+            "trt_calib_mode need to set True");
+DEFINE_bool(tiny_obj, false, "Whether tracking tiny object");
+DEFINE_bool(do_entrance_counting,
+            false,
+            "Whether counting the numbers of identifiers entering "
+            "or getting out from the entrance.");
+DEFINE_int32(secs_interval, 10, "The seconds interval to count after tracking");
+DEFINE_bool(save_result, false, "Whether saving result after tracking");
+DEFINE_string(
+    scene,
+    "",
+    "scene of tracking system, it can be : pedestrian/vehicle/multiclass");
+DEFINE_bool(is_mtmct, false, "Whether use multi-target multi-camera tracking");
+DEFINE_string(track_model_dir, "", "Path of tracking model");
+DEFINE_string(det_model_dir, "", "Path of detection model");
+DEFINE_string(reid_model_dir, "", "Path of reid model");
+
+static std::string DirName(const std::string& filepath) {
+  auto pos = filepath.rfind(OS_PATH_SEP);
+  if (pos == std::string::npos) {
+    return "";
+  }
+  return filepath.substr(0, pos);
+}
+
+static bool PathExists(const std::string& path) {
+#ifdef _WIN32
+  struct _stat buffer;
+  return (_stat(path.c_str(), &buffer) == 0);
+#else
+  struct stat buffer;
+  return (stat(path.c_str(), &buffer) == 0);
+#endif  // !_WIN32
+}
+
+static void MkDir(const std::string& path) {
+  if (PathExists(path)) return;
+  int ret = 0;
+#ifdef _WIN32
+  ret = _mkdir(path.c_str());
+#else
+  ret = mkdir(path.c_str(), 0755);
+#endif  // !_WIN32
+  if (ret != 0) {
+    std::string path_error(path);
+    path_error += " mkdir failed!";
+    throw std::runtime_error(path_error);
+  }
+}
+
+static void MkDirs(const std::string& path) {
+  if (path.empty()) return;
+  if (PathExists(path)) return;
+
+  MkDirs(DirName(path));
+  MkDir(path);
+}
+
+int main(int argc, char** argv) {
+  // Parsing command-line
+  google::ParseCommandLineFlags(&argc, &argv, true);
+  bool has_model_dir =
+      !(FLAGS_track_model_dir.empty() && FLAGS_det_model_dir.empty() &&
+        FLAGS_reid_model_dir.empty());
+  if (FLAGS_video_file.empty() || (FLAGS_scene.empty() && !has_model_dir)) {
+    LOG(ERROR) << "Usage: \n"
+               << "1. ./main -video_file=/PATH/TO/INPUT/VIDEO "
+               << "-scene=pedestrian/vehicle/multiclass\n"
+               << "2. ./main -video_file=/PATH/TO/INPUT/VIDEO "
+               << "-track_model_dir=/PATH/TO/MODEL_DIR" << std::endl;
+
+    return -1;
+  }
+  if (!(FLAGS_run_mode == "paddle" || FLAGS_run_mode == "trt_fp32" ||
+        FLAGS_run_mode == "trt_fp16" || FLAGS_run_mode == "trt_int8")) {
+    LOG(ERROR)
+        << "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or 'trt_int8'.";
+    return -1;
+  }
+  transform(FLAGS_device.begin(),
+            FLAGS_device.end(),
+            FLAGS_device.begin(),
+            ::toupper);
+  if (!(FLAGS_device == "CPU" || FLAGS_device == "GPU" ||
+        FLAGS_device == "XPU")) {
+    LOG(ERROR) << "device should be 'CPU', 'GPU' or 'XPU'.";
+    return -1;
+  }
+
+  if (!PathExists(FLAGS_output_dir)) {
+    MkDirs(FLAGS_output_dir);
+  }
+
+  PaddleDetection::Pipeline pipeline(FLAGS_device,
+                                     FLAGS_threshold,
+                                     FLAGS_output_dir,
+                                     FLAGS_run_mode,
+                                     FLAGS_gpu_id,
+                                     FLAGS_use_mkldnn,
+                                     FLAGS_cpu_threads,
+                                     FLAGS_trt_calib_mode,
+                                     FLAGS_do_entrance_counting,
+                                     FLAGS_save_result,
+                                     FLAGS_scene,
+                                     FLAGS_tiny_obj,
+                                     FLAGS_is_mtmct,
+                                     FLAGS_secs_interval,
+                                     FLAGS_track_model_dir,
+                                     FLAGS_det_model_dir,
+                                     FLAGS_reid_model_dir);
+
+  pipeline.SetInput(FLAGS_video_file);
+  if (!FLAGS_video_other_file.empty()) {
+    pipeline.SetInput(FLAGS_video_other_file);
+  }
+  pipeline.Run();
+  return 0;
+}
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/pipeline.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/pipeline.cc
new file mode 100644
index 0000000000000000000000000000000000000000..7c22c630f9f07fd2059813e951aa3a7f47f48fdb
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/pipeline.cc
@@ -0,0 +1,367 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
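+
+// Example invocation of the demo binary built from this deploy directory
+// (illustrative; paths are placeholders, flags are defined in src/main.cc):
+//
+//   ./build/main --video_file=demo.mp4 --scene=pedestrian \
+//       --device=GPU --save_result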
+
+#include <sstream>
+// for setprecision
+#include <chrono>
+#include <iomanip>
+#include <numeric>
+#include <string>
+
+#include "include/pipeline.h"
+#include "include/postprocess.h"
+#include "include/predictor.h"
+
+namespace PaddleDetection {
+
+void Pipeline::SetInput(const std::string& input_video) {
+  input_.push_back(input_video);
+}
+
+void Pipeline::ClearInput() {
+  input_.clear();
+  stream_.clear();
+}
+
+void Pipeline::SelectModel(const std::string& scene,
+                           const bool tiny_obj,
+                           const bool is_mtmct,
+                           const std::string track_model_dir,
+                           const std::string det_model_dir,
+                           const std::string reid_model_dir) {
+  // model_dir has higher priority
+  if (!track_model_dir.empty()) {
+    track_model_dir_ = track_model_dir;
+    return;
+  }
+  if (!det_model_dir.empty() && !reid_model_dir.empty()) {
+    det_model_dir_ = det_model_dir;
+    reid_model_dir_ = reid_model_dir;
+    return;
+  }
+
+  // Single camera model, based on FairMot
+  if (scene == "pedestrian") {
+    if (tiny_obj) {
+      track_model_dir_ = "../pedestrian_track_tiny";
+    } else {
+      track_model_dir_ = "../pedestrian_track";
+    }
+  } else if (scene == "vehicle") {
+    if (tiny_obj) {
+      track_model_dir_ = "../vehicle_track_tiny";
+    } else {
+      track_model_dir_ = "../vehicle_track";
+    }
+  } else if (scene == "multiclass") {
+    if (tiny_obj) {
+      track_model_dir_ = "../multiclass_track_tiny";
+    } else {
+      track_model_dir_ = "../multiclass_track";
+    }
+  }
+
+  // Multi-camera model, based on PicoDet & LCNet
+  if (is_mtmct && scene == "pedestrian") {
+    det_model_dir_ = "../pedestrian_det";
+    reid_model_dir_ = "../pedestrian_reid";
+  } else if (is_mtmct && scene == "vehicle") {
+    det_model_dir_ = "../vehicle_det";
+    reid_model_dir_ = "../vehicle_reid";
+  } else if (is_mtmct && scene == "multiclass") {
+    throw "Multi-camera tracking is not supported in multiclass scene now.";
+  }
+}
+
+void Pipeline::InitPredictor() {
+  if (track_model_dir_.empty() && det_model_dir_.empty()) {
+    throw "Predictor must receive track_model or det_model!";
+  }
+
+  if (!track_model_dir_.empty()) {
+    jde_sct_ = std::make_shared<PaddleDetection::JDEPredictor>(device_,
+                                                               track_model_dir_,
+                                                               threshold_,
+                                                               run_mode_,
+                                                               gpu_id_,
+                                                               use_mkldnn_,
+                                                               cpu_threads_,
+                                                               trt_calib_mode_);
+  }
+  if (!det_model_dir_.empty()) {
+    sde_sct_ = std::make_shared<PaddleDetection::SDEPredictor>(device_,
+                                                               det_model_dir_,
+                                                               reid_model_dir_,
+                                                               threshold_,
+                                                               run_mode_,
+                                                               gpu_id_,
+                                                               use_mkldnn_,
+                                                               cpu_threads_,
+                                                               trt_calib_mode_);
+  }
+}
+
+void Pipeline::Run() {
+  if (track_model_dir_.empty() && det_model_dir_.empty()) {
+    LOG(ERROR) << "Pipeline must use SelectModel before Run";
+    return;
+  }
+  if (input_.size() == 0) {
+    LOG(ERROR) << "Pipeline must use SetInput before Run";
+    return;
+  }
+
+  if (!track_model_dir_.empty()) {
+    // single camera
+    if (input_.size() > 1) {
+      throw "Single camera tracking expects exactly one input video!";
+    }
+    PredictMOT(input_[0]);
+  } else {
+    // multi cameras
+    if (input_.size() != 2) {
+      throw "Multi camera tracking expects exactly two input videos!";
+    }
+    PredictMTMCT(input_);
+  }
+}
+
+void Pipeline::PredictMOT(const std::string& video_path) {
+  // Open video
+  cv::VideoCapture capture;
+  capture.open(video_path.c_str());
+  if (!capture.isOpened()) {
+    printf("can not open video : %s\n", video_path.c_str());
+    return;
+  }
+
+  // Get Video info : resolution, fps
+  int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
+  int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
+  int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
+
+  LOG(INFO) << "----------------------- Input info -----------------------";
+  LOG(INFO) << "video_width: " << video_width;
+  LOG(INFO) << "video_height: " << video_height;
+  LOG(INFO) << "input fps: " << video_fps;
+
+  // Create VideoWriter for output
+  cv::VideoWriter video_out;
+  std::string video_out_path = output_dir_ + OS_PATH_SEP + "mot_output.mp4";
+  int fcc = cv::VideoWriter::fourcc('m', 'p', '4', 'v');
+  video_out.open(video_out_path.c_str(),
+                 fcc,  // 0x00000021,
+                 video_fps,
+                 cv::Size(video_width, video_height),
+                 true);
+  if (!video_out.isOpened()) {
+    printf("create video writer failed!\n");
+    return;
+  }
+
+  PaddleDetection::MOTResult result;
+  std::vector<double> det_times(3);
+  std::set<int> id_set;
+  std::set<int> interval_id_set;
+  std::vector<int> in_id_list;
+  std::vector<int> out_id_list;
+  std::map<int, std::vector<float>> prev_center;
+  Rect entrance = {0,
+                   static_cast<float>(video_height) / 2,
+                   static_cast<float>(video_width),
+                   static_cast<float>(video_height) / 2};
+  double times;
+  double total_time;
+  // Capture all frames and do inference
+  cv::Mat frame;
+  int frame_id = 0;
+
+  std::vector<std::string> records;
+  std::vector<std::string> flow_records;
+  records.push_back("result format: frame_id, track_id, x1, y1, w, h\n");
+
+  LOG(INFO) << "------------------- Predict info ------------------------";
+  while (capture.read(frame)) {
+    if (frame.empty()) {
+      break;
+    }
+    std::vector<cv::Mat> imgs;
+    imgs.push_back(frame);
+    jde_sct_->Predict(imgs, threshold_, &result, &det_times);
+    frame_id += 1;
+    total_time = std::accumulate(det_times.begin(), det_times.end(), 0.);
+    times = total_time / frame_id;
+
+    LOG(INFO) << "frame_id: " << frame_id
+              << " predict time(s): " << times / 1000;
+
+    cv::Mat out_img = PaddleDetection::VisualizeTrackResult(
+        frame, result, 1000. / times, frame_id);
+
+    // TODO(qianhui): the entrance line can be set by users
+    PaddleDetection::FlowStatistic(result,
+                                   frame_id,
+                                   secs_interval_,
+                                   do_entrance_counting_,
+                                   video_fps,
+                                   entrance,
+                                   &id_set,
+                                   &interval_id_set,
+                                   &in_id_list,
+                                   &out_id_list,
+                                   &prev_center,
+                                   &flow_records);
+
+    if (save_result_) {
+      PaddleDetection::SaveMOTResult(result, frame_id, &records);
+    }
+
+    // Draw the entrance line
+    if (do_entrance_counting_) {
+      float line_thickness = std::max(1, static_cast<int>(video_width / 500.));
+      cv::Point pt1 = cv::Point(entrance.left, entrance.top);
+      cv::Point pt2 = cv::Point(entrance.right, entrance.bottom);
+      cv::line(out_img, pt1, pt2, cv::Scalar(0, 255, 255), line_thickness);
+    }
+    video_out.write(out_img);
+  }
+  capture.release();
+  video_out.release();
+  PrintBenchmarkLog(det_times, frame_id);
+  LOG(INFO) << "-------------------- Final Output info -------------------";
+  LOG(INFO) << "Total frame: " << frame_id;
+  LOG(INFO) << "Visualized output saved as " << video_out_path.c_str();
+  if (save_result_) {
+    FILE* fp;
+
+    std::string result_output_path =
+        output_dir_ + OS_PATH_SEP + "mot_output.txt";
+    if ((fp = fopen(result_output_path.c_str(), "w+")) == NULL) {
+      printf("Open %s error.\n", result_output_path.c_str());
+      return;
+    }
+    for (size_t l = 0; l < records.size(); ++l) {
+      fprintf(fp, "%s", records[l].c_str());
+    }
+
+    fclose(fp);
+    LOG(INFO) << "txt result output saved as " << result_output_path.c_str();
+
+    result_output_path = output_dir_ + OS_PATH_SEP + "flow_statistic.txt";
+    if ((fp = fopen(result_output_path.c_str(), "w+")) == NULL) {
+      printf("Open %s error.\n", result_output_path.c_str());
+      return;
+    }
+    for (size_t l = 0; l < flow_records.size(); ++l) {
+      fprintf(fp, "%s", flow_records[l].c_str());
+    }
+    fclose(fp);
+    LOG(INFO) << "txt flow statistic saved as " << result_output_path.c_str();
+  }
+}
+
+void Pipeline::PredictMTMCT(const std::vector<std::string> video_path) {
+  throw "Not implemented!";
+}
+
+void Pipeline::RunMOTStream(const cv::Mat img,
+                            const int frame_id,
+                            const int video_fps,
+                            const Rect entrance,
+                            cv::Mat out_img,
+                            std::vector<std::string>* records,
+                            std::set<int>* id_set,
+                            std::set<int>* interval_id_set,
+                            std::vector<int>* in_id_list,
+                            std::vector<int>* out_id_list,
+                            std::map<int, std::vector<float>>* prev_center,
+                            std::vector<std::string>* flow_records) {
+  PaddleDetection::MOTResult result;
+  std::vector<double> det_times(3);
+  double times;
+  double total_time;
+
+  LOG(INFO) << "------------------- Predict info ------------------------";
+  std::vector<cv::Mat> imgs;
+  imgs.push_back(img);
+  jde_sct_->Predict(imgs, threshold_, &result, &det_times);
+  total_time = std::accumulate(det_times.begin(), det_times.end(), 0.);
+  times = total_time / frame_id;
+
+  LOG(INFO) << "frame_id: " << frame_id << " predict time(s): " << times / 1000;
+
+  out_img = PaddleDetection::VisualizeTrackResult(
+      img, result, 1000. / times, frame_id);
+
+  // Count total number
+  // Count in & out number
+  PaddleDetection::FlowStatistic(result,
+                                 frame_id,
+                                 secs_interval_,
+                                 do_entrance_counting_,
+                                 video_fps,
+                                 entrance,
+                                 id_set,
+                                 interval_id_set,
+                                 in_id_list,
+                                 out_id_list,
+                                 prev_center,
+                                 flow_records);
+
+  PrintBenchmarkLog(det_times, frame_id);
+  if (save_result_) {
+    PaddleDetection::SaveMOTResult(result, frame_id, records);
+  }
+}
+
+void Pipeline::RunMTMCTStream(const std::vector<cv::Mat> imgs,
+                              std::vector<std::string>* records) {
+  throw "Not implemented!";
+}
+
+void Pipeline::PrintBenchmarkLog(const std::vector<double> det_time,
+                                 const int img_num) {
+  LOG(INFO) << "----------------------- Config info -----------------------";
+  LOG(INFO) << "runtime_device: " << device_;
+  LOG(INFO) << "ir_optim: "
+            << "True";
+  LOG(INFO) << "enable_memory_optim: "
+            << "True";
+  // std::string::find returns npos when "trt" is absent, so compare
+  // against npos rather than relying on an int conversion.
+  auto has_trt = run_mode_.find("trt");
+  if (has_trt != std::string::npos) {
+    LOG(INFO) << "enable_tensorrt: "
+              << "True";
+    std::string precision = run_mode_.substr(4, 8);
+    LOG(INFO) << "precision: " << precision;
+  } else {
+    LOG(INFO) << "enable_tensorrt: "
+              << "False";
+    LOG(INFO) << "precision: "
+              << "fp32";
+  }
+  LOG(INFO) << "enable_mkldnn: " << (use_mkldnn_ ? "True" : "False");
+  LOG(INFO) << "cpu_math_library_num_threads: " << cpu_threads_;
+  LOG(INFO) << "----------------------- Perf info ------------------------";
+  LOG(INFO) << "Total number of predicted data: " << img_num
+            << " and total time spent(s): "
+            << std::accumulate(det_time.begin(), det_time.end(), 0.) / 1000;
+  int num = std::max(1, img_num);
+  LOG(INFO) << "preprocess_time(ms): " << det_time[0] / num
+            << ", inference_time(ms): " << det_time[1] / num
+            << ", postprocess_time(ms): " << det_time[2] / num;
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/postprocess.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/postprocess.cc
new file mode 100644
index 0000000000000000000000000000000000000000..47ea5a7cc48513f19ec927bf5fd88b24770e1b04
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/postprocess.cc
@@ -0,0 +1,207 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <sstream>
+// for setprecision
+#include <chrono>
+#include <iomanip>
+#include "include/postprocess.h"
+
+namespace PaddleDetection {
+
+cv::Scalar GetColor(int idx) {
+  idx = idx * 3;
+  cv::Scalar color =
+      cv::Scalar((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255);
+  return color;
+}
+
+cv::Mat VisualizeTrackResult(const cv::Mat& img,
+                             const MOTResult& results,
+                             const float fps,
+                             const int frame_id) {
+  cv::Mat vis_img = img.clone();
+  int im_h = img.rows;
+  int im_w = img.cols;
+  float text_scale = std::max(1, static_cast<int>(im_w / 1600.));
+  float text_thickness = 2.;
+  float line_thickness = std::max(1, static_cast<int>(im_w / 500.));
+
+  std::ostringstream oss;
+  oss << std::setiosflags(std::ios::fixed) << std::setprecision(4);
+  oss << "frame: " << frame_id << " ";
+  oss << "fps: " << fps << " ";
+  oss << "num: " << results.size();
+  std::string text = oss.str();
+
+  cv::Point origin;
+  origin.x = 0;
+  origin.y = static_cast<int>(15 * text_scale);
+  cv::putText(vis_img,
+              text,
+              origin,
+              cv::FONT_HERSHEY_PLAIN,
+              text_scale,
+              cv::Scalar(0, 0, 255),
+              2);
+
+  for (int i = 0; i < results.size(); ++i) {
+    const int obj_id = results[i].ids;
+    const float score = results[i].score;
+
+    cv::Scalar color = GetColor(obj_id);
+
+    cv::Point pt1 = cv::Point(results[i].rects.left, results[i].rects.top);
+    cv::Point pt2 = cv::Point(results[i].rects.right, results[i].rects.bottom);
+    cv::Point id_pt =
+        cv::Point(results[i].rects.left, results[i].rects.top + 10);
+    cv::Point score_pt =
+        cv::Point(results[i].rects.left, results[i].rects.top - 10);
+    cv::rectangle(vis_img, pt1, pt2, color, line_thickness);
+
+    std::ostringstream idoss;
+    idoss << std::setiosflags(std::ios::fixed) << std::setprecision(4);
+    idoss << obj_id;
+    std::string id_text = idoss.str();
+
+    cv::putText(vis_img,
+                id_text,
+                id_pt,
+                cv::FONT_HERSHEY_PLAIN,
+                text_scale,
+                cv::Scalar(0, 255, 255),
+                text_thickness);
+
+    std::ostringstream soss;
+    soss << std::setiosflags(std::ios::fixed) << std::setprecision(2);
+    soss << score;
+    std::string score_text = soss.str();
+
+    cv::putText(vis_img,
+                score_text,
+                score_pt,
+                cv::FONT_HERSHEY_PLAIN,
+                text_scale,
+                cv::Scalar(0, 255, 255),
+                text_thickness);
+  }
+  return vis_img;
+}
+
+void FlowStatistic(const MOTResult& results,
+                   const int frame_id,
+                   const int secs_interval,
+                   const bool do_entrance_counting,
+                   const int video_fps,
+                   const Rect entrance,
+                   std::set<int>* id_set,
+                   std::set<int>* interval_id_set,
+                   std::vector<int>* in_id_list,
+                   std::vector<int>* out_id_list,
+                   std::map<int, std::vector<float>>* prev_center,
+                   std::vector<std::string>* records) {
+  if (frame_id == 0) interval_id_set->clear();
+
+  if (do_entrance_counting) {
+    // Count the in and out numbers.
+    // The horizontal center line is used as the entrance, just for
+    // simplification. If a person was above the horizontal center line in the
+    // previous frame and is below it in the current frame, the in count is
+    // increased by one. If a person was below the line in the previous frame
+    // and is above it in the current frame, the out count is increased by one.
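+    // For example, with entrance_y = 540 on a 1080-pixel-high video, a track
+    // whose box center moves from y = 520 to y = 560 crosses downward and is
+    // counted as "in", while a move from y = 560 to y = 520 is counted as
+    // "out"; a track that stays on one side only refreshes prev_center.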
+    // TODO(qianhui): if the entrance is not the horizontal center line,
+    // the counting method should be optimized.
+
+    float entrance_y = entrance.top;
+    for (const auto& result : results) {
+      float center_x = (result.rects.left + result.rects.right) / 2;
+      float center_y = (result.rects.top + result.rects.bottom) / 2;
+      int ids = result.ids;
+      std::map<int, std::vector<float>>::iterator iter;
+      iter = prev_center->find(ids);
+      if (iter != prev_center->end()) {
+        if (iter->second[1] <= entrance_y && center_y > entrance_y) {
+          in_id_list->push_back(ids);
+        }
+        if (iter->second[1] >= entrance_y && center_y < entrance_y) {
+          out_id_list->push_back(ids);
+        }
+        (*prev_center)[ids][0] = center_x;
+        (*prev_center)[ids][1] = center_y;
+      } else {
+        prev_center->insert(
+            std::pair<int, std::vector<float>>(ids, {center_x, center_y}));
+      }
+    }
+  }
+
+  // Count the total number and the number within a user-set interval
+  for (const auto& result : results) {
+    id_set->insert(result.ids);
+    interval_id_set->insert(result.ids);
+  }
+
+  std::ostringstream os;
+  os << "Frame id: " << frame_id << ", Total count: " << id_set->size();
+  if (do_entrance_counting) {
+    os << ", In count: " << in_id_list->size()
+       << ", Out count: " << out_id_list->size();
+  }
+
+  // Reset counting at the beginning of each interval
+  int curr_interval_count = -1;
+  if (frame_id % video_fps == 0 && frame_id / video_fps % secs_interval == 0) {
+    curr_interval_count = interval_id_set->size();
+    os << ", Count during " << secs_interval
+       << " secs: " << curr_interval_count;
+    interval_id_set->clear();
+  }
+  os << "\n";
+  std::string record = os.str();
+  records->push_back(record);
+  LOG(INFO) << record;
+}
+
+void SaveMOTResult(const MOTResult& results,
+                   const int frame_id,
+                   std::vector<std::string>* records) {
+  // result format: frame_id, track_id, x1, y1, w, h
+  std::string record;
+  for (int i = 0; i < results.size(); ++i) {
+    MOTTrack mot_track = results[i];
+    int ids = mot_track.ids;
+    float score = mot_track.score;
+    Rect rects = mot_track.rects;
+    float x1 = rects.left;
+    float y1 = rects.top;
+    float x2 = rects.right;
+    float y2 = rects.bottom;
+    float w = x2 - x1;
+    float h = y2 - y1;
+    if (w == 0 || h == 0) {
+      continue;
+    }
+    std::ostringstream os;
+    os << frame_id << " " << ids << " " << x1 << " " << y1 << " " << w << " "
+       << h << "\n";
+    record = os.str();
+    records->push_back(record);
+  }
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/predictor.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/predictor.cc
new file mode 100644
index 0000000000000000000000000000000000000000..ea479f3ab049143147a938bb575f8995dee55c95
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/predictor.cc
@@ -0,0 +1,35 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+#include +// for setprecision +#include +#include +#include "include/predictor.h" + +using namespace paddle_infer; // NOLINT + +namespace PaddleDetection { + +void Predictor::Predict(const std::vector imgs, + const double threshold, + MOTResult* result, + std::vector* times) { + if (use_jde_) { + jde_sct_->Predict(imgs, threshold, result, times); + } else { + sde_sct_->Predict(imgs, threshold, result, times); + } +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/preprocess_op.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/preprocess_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..3158ad67b60b473e0f38d609b55accf35a37a6a8 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/preprocess_op.cc @@ -0,0 +1,187 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include + +#include "include/preprocess_op.h" + +namespace PaddleDetection { + +void InitInfo::Run(cv::Mat* im, ImageBlob* data) { + data->im_shape_ = {static_cast(im->rows), + static_cast(im->cols)}; + data->scale_factor_ = {1., 1.}; + data->in_net_shape_ = {static_cast(im->rows), + static_cast(im->cols)}; +} + +void NormalizeImage::Run(cv::Mat* im, ImageBlob* data) { + double e = 1.0; + if (is_scale_) { + e /= 255.0; + } + (*im).convertTo(*im, CV_32FC3, e); + for (int h = 0; h < im->rows; h++) { + for (int w = 0; w < im->cols; w++) { + im->at(h, w)[0] = + (im->at(h, w)[0] - mean_[0]) / scale_[0]; + im->at(h, w)[1] = + (im->at(h, w)[1] - mean_[1]) / scale_[1]; + im->at(h, w)[2] = + (im->at(h, w)[2] - mean_[2]) / scale_[2]; + } + } +} + +void Permute::Run(cv::Mat* im, ImageBlob* data) { + (*im).convertTo(*im, CV_32FC3); + int rh = im->rows; + int rw = im->cols; + int rc = im->channels(); + (data->im_data_).resize(rc * rh * rw); + float* base = (data->im_data_).data(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, base + i * rh * rw), i); + } +} + +void Resize::Run(cv::Mat* im, ImageBlob* data) { + auto resize_scale = GenerateScale(*im); + data->im_shape_ = {static_cast(im->cols * resize_scale.first), + static_cast(im->rows * resize_scale.second)}; + data->in_net_shape_ = {static_cast(im->cols * resize_scale.first), + static_cast(im->rows * resize_scale.second)}; + cv::resize( + *im, *im, cv::Size(), resize_scale.first, resize_scale.second, interp_); + data->im_shape_ = { + static_cast(im->rows), static_cast(im->cols), + }; + data->scale_factor_ = { + resize_scale.second, resize_scale.first, + }; +} + +std::pair Resize::GenerateScale(const cv::Mat& im) { + std::pair resize_scale; + int origin_w = im.cols; + int origin_h = im.rows; + + if (keep_ratio_) { + int im_size_max = std::max(origin_w, origin_h); + int im_size_min = std::min(origin_w, origin_h); + int target_size_max = + *std::max_element(target_size_.begin(), target_size_.end()); + int target_size_min = + 
*std::min_element(target_size_.begin(), target_size_.end()); + float scale_min = + static_cast(target_size_min) / static_cast(im_size_min); + float scale_max = + static_cast(target_size_max) / static_cast(im_size_max); + float scale_ratio = std::min(scale_min, scale_max); + resize_scale = {scale_ratio, scale_ratio}; + } else { + resize_scale.first = + static_cast(target_size_[1]) / static_cast(origin_w); + resize_scale.second = + static_cast(target_size_[0]) / static_cast(origin_h); + } + return resize_scale; +} + +void LetterBoxResize::Run(cv::Mat* im, ImageBlob* data) { + float resize_scale = GenerateScale(*im); + int new_shape_w = std::round(im->cols * resize_scale); + int new_shape_h = std::round(im->rows * resize_scale); + data->im_shape_ = {static_cast(new_shape_h), + static_cast(new_shape_w)}; + float padw = (target_size_[1] - new_shape_w) / 2.; + float padh = (target_size_[0] - new_shape_h) / 2.; + + int top = std::round(padh - 0.1); + int bottom = std::round(padh + 0.1); + int left = std::round(padw - 0.1); + int right = std::round(padw + 0.1); + + cv::resize( + *im, *im, cv::Size(new_shape_w, new_shape_h), 0, 0, cv::INTER_AREA); + + data->in_net_shape_ = { + static_cast(im->rows), static_cast(im->cols), + }; + cv::copyMakeBorder(*im, + *im, + top, + bottom, + left, + right, + cv::BORDER_CONSTANT, + cv::Scalar(127.5)); + + data->in_net_shape_ = { + static_cast(im->rows), static_cast(im->cols), + }; + + data->scale_factor_ = { + resize_scale, resize_scale, + }; +} + +float LetterBoxResize::GenerateScale(const cv::Mat& im) { + int origin_w = im.cols; + int origin_h = im.rows; + + int target_h = target_size_[0]; + int target_w = target_size_[1]; + + float ratio_h = static_cast(target_h) / static_cast(origin_h); + float ratio_w = static_cast(target_w) / static_cast(origin_w); + float resize_scale = std::min(ratio_h, ratio_w); + return resize_scale; +} + +void PadStride::Run(cv::Mat* im, ImageBlob* data) { + if (stride_ <= 0) { + return; + } + int rc = im->channels(); + int rh = im->rows; + int rw = im->cols; + int nh = (rh / stride_) * stride_ + (rh % stride_ != 0) * stride_; + int nw = (rw / stride_) * stride_ + (rw % stride_ != 0) * stride_; + cv::copyMakeBorder( + *im, *im, 0, nh - rh, 0, nw - rw, cv::BORDER_CONSTANT, cv::Scalar(0)); + data->in_net_shape_ = { + static_cast(im->rows), static_cast(im->cols), + }; +} + +// Preprocessor op running order +const std::vector Preprocessor::RUN_ORDER = {"InitInfo", + "Resize", + "LetterBoxResize", + "NormalizeImage", + "PadStride", + "Permute"}; + +void Preprocessor::Run(cv::Mat* im, ImageBlob* data) { + for (const auto& name : RUN_ORDER) { + if (ops_.find(name) != ops_.end()) { + ops_[name]->Run(im, data); + } + } +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/sde_predictor.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/sde_predictor.cc new file mode 100644 index 0000000000000000000000000000000000000000..e469e8ddc5a154e2ba9b97560b6434427f5e7df1 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/sde_predictor.cc @@ -0,0 +1,46 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +#include +// for setprecision +#include +#include +#include "include/sde_predictor.h" + +using namespace paddle_infer; // NOLINT + +namespace PaddleDetection { + +// Load Model and create model predictor +void SDEPredictor::LoadModel(const std::string& det_model_dir, + const std::string& reid_model_dir, + const std::string& run_mode) { + throw "Not Implement"; +} + +void SDEPredictor::Preprocess(const cv::Mat& ori_im) { throw "Not Implement"; } + +void SDEPredictor::Postprocess(const cv::Mat dets, + const cv::Mat emb, + MOTResult* result) { + throw "Not Implement"; +} + +void SDEPredictor::Predict(const std::vector imgs, + const double threshold, + MOTResult* result, + std::vector* times) { + throw "Not Implement"; +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/tracker.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/tracker.cc new file mode 100644 index 0000000000000000000000000000000000000000..9540e39f6701750ae8af5229ecd9cfa264460095 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/tracker.cc @@ -0,0 +1,304 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
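+
+// Overview (summary of update() below): detections are associated with
+// existing trajectories in three cascaded assignment passes. First a fused
+// motion/appearance cost, lambda * embedding_distance +
+// (1 - lambda) * mahalanobis_distance, gated by the chi-square 95% quantile;
+// then an IoU cost over the remaining tracked trajectories; finally an IoU
+// cost over the unconfirmed trajectories. Unmatched detections scoring at
+// least det_thresh start new trajectories, and trajectories lost for more
+// than max_lost_time frames are removed.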
+ +// The code is based on: +// https://github.com/CnybTseng/JDE/blob/master/platforms/common/jdetracker.cpp +// Ths copyright of CnybTseng/JDE is as follows: +// MIT License + +#include +#include +#include +#include + +#include "include/lapjv.h" +#include "include/tracker.h" + +#define mat2vec4f(m) \ + cv::Vec4f(*m.ptr(0, 0), \ + *m.ptr(0, 1), \ + *m.ptr(0, 2), \ + *m.ptr(0, 3)) + +namespace PaddleDetection { + +static std::map chi2inv95 = {{1, 3.841459f}, + {2, 5.991465f}, + {3, 7.814728f}, + {4, 9.487729f}, + {5, 11.070498f}, + {6, 12.591587f}, + {7, 14.067140f}, + {8, 15.507313f}, + {9, 16.918978f}}; + +JDETracker *JDETracker::me = new JDETracker; + +JDETracker *JDETracker::instance(void) { return me; } + +JDETracker::JDETracker(void) + : timestamp(0), max_lost_time(30), lambda(0.98f), det_thresh(0.3f) {} + +bool JDETracker::update(const cv::Mat &dets, + const cv::Mat &emb, + std::vector *tracks) { + ++timestamp; + TrajectoryPool candidates(dets.rows); + for (int i = 0; i < dets.rows; ++i) { + float score = *dets.ptr(i, 1); + const cv::Mat <rb_ = dets(cv::Rect(2, i, 4, 1)); + cv::Vec4f ltrb = mat2vec4f(ltrb_); + const cv::Mat &embedding = emb(cv::Rect(0, i, emb.cols, 1)); + candidates[i] = Trajectory(ltrb, score, embedding); + } + + TrajectoryPtrPool tracked_trajectories; + TrajectoryPtrPool unconfirmed_trajectories; + for (size_t i = 0; i < this->tracked_trajectories.size(); ++i) { + if (this->tracked_trajectories[i].is_activated) + tracked_trajectories.push_back(&this->tracked_trajectories[i]); + else + unconfirmed_trajectories.push_back(&this->tracked_trajectories[i]); + } + + TrajectoryPtrPool trajectory_pool = + tracked_trajectories + &(this->lost_trajectories); + + for (size_t i = 0; i < trajectory_pool.size(); ++i) + trajectory_pool[i]->predict(); + + Match matches; + std::vector mismatch_row; + std::vector mismatch_col; + + cv::Mat cost = motion_distance(trajectory_pool, candidates); + linear_assignment(cost, 0.7f, &matches, &mismatch_row, &mismatch_col); + + MatchIterator miter; + TrajectoryPtrPool activated_trajectories; + TrajectoryPtrPool retrieved_trajectories; + + for (miter = matches.begin(); miter != matches.end(); miter++) { + Trajectory *pt = trajectory_pool[miter->first]; + Trajectory &ct = candidates[miter->second]; + if (pt->state == Tracked) { + pt->update(&ct, timestamp); + activated_trajectories.push_back(pt); + } else { + pt->reactivate(&ct, timestamp); + retrieved_trajectories.push_back(pt); + } + } + + TrajectoryPtrPool next_candidates(mismatch_col.size()); + for (size_t i = 0; i < mismatch_col.size(); ++i) + next_candidates[i] = &candidates[mismatch_col[i]]; + + TrajectoryPtrPool next_trajectory_pool; + for (size_t i = 0; i < mismatch_row.size(); ++i) { + int j = mismatch_row[i]; + if (trajectory_pool[j]->state == Tracked) + next_trajectory_pool.push_back(trajectory_pool[j]); + } + + cost = iou_distance(next_trajectory_pool, next_candidates); + linear_assignment(cost, 0.5f, &matches, &mismatch_row, &mismatch_col); + + for (miter = matches.begin(); miter != matches.end(); miter++) { + Trajectory *pt = next_trajectory_pool[miter->first]; + Trajectory *ct = next_candidates[miter->second]; + if (pt->state == Tracked) { + pt->update(ct, timestamp); + activated_trajectories.push_back(pt); + } else { + pt->reactivate(ct, timestamp); + retrieved_trajectories.push_back(pt); + } + } + + TrajectoryPtrPool lost_trajectories; + for (size_t i = 0; i < mismatch_row.size(); ++i) { + Trajectory *pt = next_trajectory_pool[mismatch_row[i]]; + if (pt->state != Lost) { + 
pt->mark_lost();
+      lost_trajectories.push_back(pt);
+    }
+  }
+
+  TrajectoryPtrPool nnext_candidates(mismatch_col.size());
+  for (size_t i = 0; i < mismatch_col.size(); ++i)
+    nnext_candidates[i] = next_candidates[mismatch_col[i]];
+  cost = iou_distance(unconfirmed_trajectories, nnext_candidates);
+  linear_assignment(cost, 0.7f, &matches, &mismatch_row, &mismatch_col);
+
+  for (miter = matches.begin(); miter != matches.end(); miter++) {
+    unconfirmed_trajectories[miter->first]->update(
+        nnext_candidates[miter->second], timestamp);
+    activated_trajectories.push_back(unconfirmed_trajectories[miter->first]);
+  }
+
+  TrajectoryPtrPool removed_trajectories;
+
+  for (size_t i = 0; i < mismatch_row.size(); ++i) {
+    unconfirmed_trajectories[mismatch_row[i]]->mark_removed();
+    removed_trajectories.push_back(unconfirmed_trajectories[mismatch_row[i]]);
+  }
+
+  for (size_t i = 0; i < mismatch_col.size(); ++i) {
+    if (nnext_candidates[mismatch_col[i]]->score < det_thresh) continue;
+    nnext_candidates[mismatch_col[i]]->activate(timestamp);
+    activated_trajectories.push_back(nnext_candidates[mismatch_col[i]]);
+  }
+
+  for (size_t i = 0; i < this->lost_trajectories.size(); ++i) {
+    Trajectory &lt = this->lost_trajectories[i];
+    if (timestamp - lt.timestamp > max_lost_time) {
+      lt.mark_removed();
+      removed_trajectories.push_back(&lt);
+    }
+  }
+
+  TrajectoryPoolIterator piter;
+  for (piter = this->tracked_trajectories.begin();
+       piter != this->tracked_trajectories.end();) {
+    if (piter->state != Tracked)
+      piter = this->tracked_trajectories.erase(piter);
+    else
+      ++piter;
+  }
+
+  this->tracked_trajectories += activated_trajectories;
+  this->tracked_trajectories += retrieved_trajectories;
+
+  this->lost_trajectories -= this->tracked_trajectories;
+  this->lost_trajectories += lost_trajectories;
+  this->lost_trajectories -= this->removed_trajectories;
+  this->removed_trajectories += removed_trajectories;
+  remove_duplicate_trajectory(&this->tracked_trajectories,
+                              &this->lost_trajectories);
+
+  tracks->clear();
+  for (size_t i = 0; i < this->tracked_trajectories.size(); ++i) {
+    if (this->tracked_trajectories[i].is_activated) {
+      Track track = {this->tracked_trajectories[i].id,
+                     this->tracked_trajectories[i].score,
+                     this->tracked_trajectories[i].ltrb};
+      tracks->push_back(track);
+    }
+  }
+  return 0;
+}
+
+cv::Mat JDETracker::motion_distance(const TrajectoryPtrPool &a,
+                                    const TrajectoryPool &b) {
+  if (0 == a.size() || 0 == b.size())
+    return cv::Mat(a.size(), b.size(), CV_32F);
+
+  cv::Mat edists = embedding_distance(a, b);
+  cv::Mat mdists = mahalanobis_distance(a, b);
+  cv::Mat fdists = lambda * edists + (1 - lambda) * mdists;
+
+  const float gate_thresh = chi2inv95[4];
+  for (int i = 0; i < fdists.rows; ++i) {
+    for (int j = 0; j < fdists.cols; ++j) {
+      if (*mdists.ptr<float>(i, j) > gate_thresh)
+        *fdists.ptr<float>(i, j) = FLT_MAX;
+    }
+  }
+
+  return fdists;
+}
+
+void JDETracker::linear_assignment(const cv::Mat &cost,
+                                   float cost_limit,
+                                   Match *matches,
+                                   std::vector<int> *mismatch_row,
+                                   std::vector<int> *mismatch_col) {
+  matches->clear();
+  mismatch_row->clear();
+  mismatch_col->clear();
+  if (cost.empty()) {
+    for (int i = 0; i < cost.rows; ++i) mismatch_row->push_back(i);
+    for (int i = 0; i < cost.cols; ++i) mismatch_col->push_back(i);
+    return;
+  }
+
+  float opt = 0;
+  cv::Mat x(cost.rows, 1, CV_32S);
+  cv::Mat y(cost.cols, 1, CV_32S);
+
+  lapjv_internal(cost,
+                 true,
+                 cost_limit,
+                 reinterpret_cast<int *>(x.data),
+                 reinterpret_cast<int *>(y.data));
+
+  for (int i = 0; i < x.rows; ++i) {
+    int j = *x.ptr<int>(i);
+    if (j >= 0)
matches->insert({i, j}); + else + mismatch_row->push_back(i); + } + + for (int i = 0; i < y.rows; ++i) { + int j = *y.ptr(i); + if (j < 0) mismatch_col->push_back(i); + } + + return; +} + +void JDETracker::remove_duplicate_trajectory(TrajectoryPool *a, + TrajectoryPool *b, + float iou_thresh) { + if (a->size() == 0 || b->size() == 0) return; + + cv::Mat dist = iou_distance(*a, *b); + cv::Mat mask = dist < iou_thresh; + std::vector idx; + cv::findNonZero(mask, idx); + + std::vector da; + std::vector db; + for (size_t i = 0; i < idx.size(); ++i) { + int ta = (*a)[idx[i].y].timestamp - (*a)[idx[i].y].starttime; + int tb = (*b)[idx[i].x].timestamp - (*b)[idx[i].x].starttime; + if (ta > tb) + db.push_back(idx[i].x); + else + da.push_back(idx[i].y); + } + + int id = 0; + TrajectoryPoolIterator piter; + for (piter = a->begin(); piter != a->end();) { + std::vector::iterator iter = find(da.begin(), da.end(), id++); + if (iter != da.end()) + piter = a->erase(piter); + else + ++piter; + } + + id = 0; + for (piter = b->begin(); piter != b->end();) { + std::vector::iterator iter = find(db.begin(), db.end(), id++); + if (iter != db.end()) + piter = b->erase(piter); + else + ++piter; + } +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/trajectory.cc b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/trajectory.cc new file mode 100644 index 0000000000000000000000000000000000000000..0ff2e1a5fc7088eec94f052d933d22589d2a81c0 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/cpp/src/trajectory.cc @@ -0,0 +1,517 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
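+
+// Note on the filter below: TKalmanFilter specializes cv::KalmanFilter with a
+// constant-velocity model over an 8-dim state, the 4-dim (x, y, aspect, h)
+// box measurement plus its velocities. Process and measurement noise are
+// rescaled on every step from the current box height via std_weight_position
+// and std_weight_velocity, so larger targets tolerate proportionally larger
+// position error.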
+ +// The code is based on: +// https://github.com/CnybTseng/JDE/blob/master/platforms/common/trajectory.cpp +// Ths copyright of CnybTseng/JDE is as follows: +// MIT License + +#include "include/trajectory.h" +#include + +namespace PaddleDetection { + +void TKalmanFilter::init(const cv::Mat &measurement) { + measurement.copyTo(statePost(cv::Rect(0, 0, 1, 4))); + statePost(cv::Rect(0, 4, 1, 4)).setTo(0); + statePost.copyTo(statePre); + + float varpos = 2 * std_weight_position * (*measurement.ptr(3)); + varpos *= varpos; + float varvel = 10 * std_weight_velocity * (*measurement.ptr(3)); + varvel *= varvel; + + errorCovPost.setTo(0); + *errorCovPost.ptr(0, 0) = varpos; + *errorCovPost.ptr(1, 1) = varpos; + *errorCovPost.ptr(2, 2) = 1e-4f; + *errorCovPost.ptr(3, 3) = varpos; + *errorCovPost.ptr(4, 4) = varvel; + *errorCovPost.ptr(5, 5) = varvel; + *errorCovPost.ptr(6, 6) = 1e-10f; + *errorCovPost.ptr(7, 7) = varvel; + errorCovPost.copyTo(errorCovPre); +} + +const cv::Mat &TKalmanFilter::predict() { + float varpos = std_weight_position * (*statePre.ptr(3)); + varpos *= varpos; + float varvel = std_weight_velocity * (*statePre.ptr(3)); + varvel *= varvel; + + processNoiseCov.setTo(0); + *processNoiseCov.ptr(0, 0) = varpos; + *processNoiseCov.ptr(1, 1) = varpos; + *processNoiseCov.ptr(2, 2) = 1e-4f; + *processNoiseCov.ptr(3, 3) = varpos; + *processNoiseCov.ptr(4, 4) = varvel; + *processNoiseCov.ptr(5, 5) = varvel; + *processNoiseCov.ptr(6, 6) = 1e-10f; + *processNoiseCov.ptr(7, 7) = varvel; + + return cv::KalmanFilter::predict(); +} + +const cv::Mat &TKalmanFilter::correct(const cv::Mat &measurement) { + float varpos = std_weight_position * (*measurement.ptr(3)); + varpos *= varpos; + + measurementNoiseCov.setTo(0); + *measurementNoiseCov.ptr(0, 0) = varpos; + *measurementNoiseCov.ptr(1, 1) = varpos; + *measurementNoiseCov.ptr(2, 2) = 1e-2f; + *measurementNoiseCov.ptr(3, 3) = varpos; + + return cv::KalmanFilter::correct(measurement); +} + +void TKalmanFilter::project(cv::Mat *mean, cv::Mat *covariance) const { + float varpos = std_weight_position * (*statePost.ptr(3)); + varpos *= varpos; + + cv::Mat measurementNoiseCov_ = cv::Mat::eye(4, 4, CV_32F); + *measurementNoiseCov_.ptr(0, 0) = varpos; + *measurementNoiseCov_.ptr(1, 1) = varpos; + *measurementNoiseCov_.ptr(2, 2) = 1e-2f; + *measurementNoiseCov_.ptr(3, 3) = varpos; + + *mean = measurementMatrix * statePost; + cv::Mat temp = measurementMatrix * errorCovPost; + gemm(temp, + measurementMatrix, + 1, + measurementNoiseCov_, + 1, + *covariance, + cv::GEMM_2_T); +} + +int Trajectory::count = 0; + +const cv::Mat &Trajectory::predict(void) { + if (state != Tracked) *cv::KalmanFilter::statePost.ptr(7) = 0; + return TKalmanFilter::predict(); +} + +void Trajectory::update(Trajectory *traj, + int timestamp_, + bool update_embedding_) { + timestamp = timestamp_; + ++length; + ltrb = traj->ltrb; + xyah = traj->xyah; + TKalmanFilter::correct(cv::Mat(traj->xyah)); + state = Tracked; + is_activated = true; + score = traj->score; + if (update_embedding_) update_embedding(traj->current_embedding); +} + +void Trajectory::activate(int timestamp_) { + id = next_id(); + TKalmanFilter::init(cv::Mat(xyah)); + length = 0; + state = Tracked; + if (timestamp_ == 1) { + is_activated = true; + } + timestamp = timestamp_; + starttime = timestamp_; +} + +void Trajectory::reactivate(Trajectory *traj, int timestamp_, bool newid) { + TKalmanFilter::correct(cv::Mat(traj->xyah)); + update_embedding(traj->current_embedding); + length = 0; + state = Tracked; + is_activated = 
true; + timestamp = timestamp_; + if (newid) id = next_id(); +} + +void Trajectory::update_embedding(const cv::Mat &embedding) { + current_embedding = embedding / cv::norm(embedding); + if (smooth_embedding.empty()) { + smooth_embedding = current_embedding; + } else { + smooth_embedding = eta * smooth_embedding + (1 - eta) * current_embedding; + } + smooth_embedding = smooth_embedding / cv::norm(smooth_embedding); +} + +TrajectoryPool operator+(const TrajectoryPool &a, const TrajectoryPool &b) { + TrajectoryPool sum; + sum.insert(sum.end(), a.begin(), a.end()); + + std::vector ids(a.size()); + for (size_t i = 0; i < a.size(); ++i) ids[i] = a[i].id; + + for (size_t i = 0; i < b.size(); ++i) { + std::vector::iterator iter = find(ids.begin(), ids.end(), b[i].id); + if (iter == ids.end()) { + sum.push_back(b[i]); + ids.push_back(b[i].id); + } + } + + return sum; +} + +TrajectoryPool operator+(const TrajectoryPool &a, const TrajectoryPtrPool &b) { + TrajectoryPool sum; + sum.insert(sum.end(), a.begin(), a.end()); + + std::vector ids(a.size()); + for (size_t i = 0; i < a.size(); ++i) ids[i] = a[i].id; + + for (size_t i = 0; i < b.size(); ++i) { + std::vector::iterator iter = find(ids.begin(), ids.end(), b[i]->id); + if (iter == ids.end()) { + sum.push_back(*b[i]); + ids.push_back(b[i]->id); + } + } + + return sum; +} + +TrajectoryPool &operator+=(TrajectoryPool &a, // NOLINT + const TrajectoryPtrPool &b) { + std::vector ids(a.size()); + for (size_t i = 0; i < a.size(); ++i) ids[i] = a[i].id; + + for (size_t i = 0; i < b.size(); ++i) { + if (b[i]->smooth_embedding.empty()) continue; + std::vector::iterator iter = find(ids.begin(), ids.end(), b[i]->id); + if (iter == ids.end()) { + a.push_back(*b[i]); + ids.push_back(b[i]->id); + } + } + + return a; +} + +TrajectoryPool operator-(const TrajectoryPool &a, const TrajectoryPool &b) { + TrajectoryPool dif; + std::vector ids(b.size()); + for (size_t i = 0; i < b.size(); ++i) ids[i] = b[i].id; + + for (size_t i = 0; i < a.size(); ++i) { + std::vector::iterator iter = find(ids.begin(), ids.end(), a[i].id); + if (iter == ids.end()) dif.push_back(a[i]); + } + + return dif; +} + +TrajectoryPool &operator-=(TrajectoryPool &a, // NOLINT + const TrajectoryPool &b) { + std::vector ids(b.size()); + for (size_t i = 0; i < b.size(); ++i) ids[i] = b[i].id; + + TrajectoryPoolIterator piter; + for (piter = a.begin(); piter != a.end();) { + std::vector::iterator iter = find(ids.begin(), ids.end(), piter->id); + if (iter == ids.end()) + ++piter; + else + piter = a.erase(piter); + } + + return a; +} + +TrajectoryPtrPool operator+(const TrajectoryPtrPool &a, + const TrajectoryPtrPool &b) { + TrajectoryPtrPool sum; + sum.insert(sum.end(), a.begin(), a.end()); + + std::vector ids(a.size()); + for (size_t i = 0; i < a.size(); ++i) ids[i] = a[i]->id; + + for (size_t i = 0; i < b.size(); ++i) { + std::vector::iterator iter = find(ids.begin(), ids.end(), b[i]->id); + if (iter == ids.end()) { + sum.push_back(b[i]); + ids.push_back(b[i]->id); + } + } + + return sum; +} + +TrajectoryPtrPool operator+(const TrajectoryPtrPool &a, TrajectoryPool *b) { + TrajectoryPtrPool sum; + sum.insert(sum.end(), a.begin(), a.end()); + + std::vector ids(a.size()); + for (size_t i = 0; i < a.size(); ++i) ids[i] = a[i]->id; + + for (size_t i = 0; i < b->size(); ++i) { + std::vector::iterator iter = find(ids.begin(), ids.end(), (*b)[i].id); + if (iter == ids.end()) { + sum.push_back(&(*b)[i]); + ids.push_back((*b)[i].id); + } + } + + return sum; +} + +TrajectoryPtrPool operator-(const 
TrajectoryPtrPool &a, + const TrajectoryPtrPool &b) { + TrajectoryPtrPool dif; + std::vector ids(b.size()); + for (size_t i = 0; i < b.size(); ++i) ids[i] = b[i]->id; + + for (size_t i = 0; i < a.size(); ++i) { + std::vector::iterator iter = find(ids.begin(), ids.end(), a[i]->id); + if (iter == ids.end()) dif.push_back(a[i]); + } + + return dif; +} + +cv::Mat embedding_distance(const TrajectoryPool &a, const TrajectoryPool &b) { + cv::Mat dists(a.size(), b.size(), CV_32F); + for (size_t i = 0; i < a.size(); ++i) { + float *distsi = dists.ptr(i); + for (size_t j = 0; j < b.size(); ++j) { + cv::Mat u = a[i].smooth_embedding; + cv::Mat v = b[j].smooth_embedding; + double uv = u.dot(v); + double uu = u.dot(u); + double vv = v.dot(v); + double dist = std::abs(1. - uv / std::sqrt(uu * vv)); + // double dist = cv::norm(a[i].smooth_embedding, b[j].smooth_embedding, + // cv::NORM_L2); + distsi[j] = static_cast(std::max(std::min(dist, 2.), 0.)); + } + } + return dists; +} + +cv::Mat embedding_distance(const TrajectoryPtrPool &a, + const TrajectoryPtrPool &b) { + cv::Mat dists(a.size(), b.size(), CV_32F); + for (size_t i = 0; i < a.size(); ++i) { + float *distsi = dists.ptr(i); + for (size_t j = 0; j < b.size(); ++j) { + // double dist = cv::norm(a[i]->smooth_embedding, b[j]->smooth_embedding, + // cv::NORM_L2); + // distsi[j] = static_cast(dist); + cv::Mat u = a[i]->smooth_embedding; + cv::Mat v = b[j]->smooth_embedding; + double uv = u.dot(v); + double uu = u.dot(u); + double vv = v.dot(v); + double dist = std::abs(1. - uv / std::sqrt(uu * vv)); + distsi[j] = static_cast(std::max(std::min(dist, 2.), 0.)); + } + } + + return dists; +} + +cv::Mat embedding_distance(const TrajectoryPtrPool &a, + const TrajectoryPool &b) { + cv::Mat dists(a.size(), b.size(), CV_32F); + for (size_t i = 0; i < a.size(); ++i) { + float *distsi = dists.ptr(i); + for (size_t j = 0; j < b.size(); ++j) { + // double dist = cv::norm(a[i]->smooth_embedding, b[j].smooth_embedding, + // cv::NORM_L2); + // distsi[j] = static_cast(dist); + cv::Mat u = a[i]->smooth_embedding; + cv::Mat v = b[j].smooth_embedding; + double uv = u.dot(v); + double uu = u.dot(u); + double vv = v.dot(v); + double dist = std::abs(1. 
- uv / std::sqrt(uu * vv)); + distsi[j] = static_cast(std::max(std::min(dist, 2.), 0.)); + } + } + + return dists; +} + +cv::Mat mahalanobis_distance(const TrajectoryPool &a, const TrajectoryPool &b) { + std::vector means(a.size()); + std::vector icovariances(a.size()); + for (size_t i = 0; i < a.size(); ++i) { + cv::Mat covariance; + a[i].project(&means[i], &covariance); + cv::invert(covariance, icovariances[i]); + } + + cv::Mat dists(a.size(), b.size(), CV_32F); + for (size_t i = 0; i < a.size(); ++i) { + float *distsi = dists.ptr(i); + for (size_t j = 0; j < b.size(); ++j) { + const cv::Mat x(b[j].xyah); + float dist = + static_cast(cv::Mahalanobis(x, means[i], icovariances[i])); + distsi[j] = dist * dist; + } + } + + return dists; +} + +cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a, + const TrajectoryPtrPool &b) { + std::vector means(a.size()); + std::vector icovariances(a.size()); + for (size_t i = 0; i < a.size(); ++i) { + cv::Mat covariance; + a[i]->project(&means[i], &covariance); + cv::invert(covariance, icovariances[i]); + } + + cv::Mat dists(a.size(), b.size(), CV_32F); + for (size_t i = 0; i < a.size(); ++i) { + float *distsi = dists.ptr(i); + for (size_t j = 0; j < b.size(); ++j) { + const cv::Mat x(b[j]->xyah); + float dist = + static_cast(cv::Mahalanobis(x, means[i], icovariances[i])); + distsi[j] = dist * dist; + } + } + + return dists; +} + +cv::Mat mahalanobis_distance(const TrajectoryPtrPool &a, + const TrajectoryPool &b) { + std::vector means(a.size()); + std::vector icovariances(a.size()); + + for (size_t i = 0; i < a.size(); ++i) { + cv::Mat covariance; + a[i]->project(&means[i], &covariance); + cv::invert(covariance, icovariances[i]); + } + + cv::Mat dists(a.size(), b.size(), CV_32F); + for (size_t i = 0; i < a.size(); ++i) { + float *distsi = dists.ptr(i); + for (size_t j = 0; j < b.size(); ++j) { + const cv::Mat x(b[j].xyah); + float dist = + static_cast(cv::Mahalanobis(x, means[i], icovariances[i])); + distsi[j] = dist * dist; + } + } + + return dists; +} + +static inline float calc_inter_area(const cv::Vec4f &a, const cv::Vec4f &b) { + if (a[2] < b[0] || a[0] > b[2] || a[3] < b[1] || a[1] > b[3]) return 0.f; + + float w = std::min(a[2], b[2]) - std::max(a[0], b[0]); + float h = std::min(a[3], b[3]) - std::max(a[1], b[1]); + return w * h; +} + +cv::Mat iou_distance(const TrajectoryPool &a, const TrajectoryPool &b) { + std::vector areaa(a.size()); + for (size_t i = 0; i < a.size(); ++i) { + float w = a[i].ltrb[2] - a[i].ltrb[0]; + float h = a[i].ltrb[3] - a[i].ltrb[1]; + areaa[i] = w * h; + } + + std::vector areab(b.size()); + for (size_t j = 0; j < b.size(); ++j) { + float w = b[j].ltrb[2] - b[j].ltrb[0]; + float h = b[j].ltrb[3] - b[j].ltrb[1]; + areab[j] = w * h; + } + + cv::Mat dists(a.size(), b.size(), CV_32F); + for (size_t i = 0; i < a.size(); ++i) { + const cv::Vec4f &boxa = a[i].ltrb; + float *distsi = dists.ptr(i); + for (size_t j = 0; j < b.size(); ++j) { + const cv::Vec4f &boxb = b[j].ltrb; + float inters = calc_inter_area(boxa, boxb); + distsi[j] = 1.f - inters / (areaa[i] + areab[j] - inters); + } + } + + return dists; +} + +cv::Mat iou_distance(const TrajectoryPtrPool &a, const TrajectoryPtrPool &b) { + std::vector areaa(a.size()); + for (size_t i = 0; i < a.size(); ++i) { + float w = a[i]->ltrb[2] - a[i]->ltrb[0]; + float h = a[i]->ltrb[3] - a[i]->ltrb[1]; + areaa[i] = w * h; + } + + std::vector areab(b.size()); + for (size_t j = 0; j < b.size(); ++j) { + float w = b[j]->ltrb[2] - b[j]->ltrb[0]; + float h = b[j]->ltrb[3] - b[j]->ltrb[1]; + 
areab[j] = w * h;
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i) {
+    const cv::Vec4f &boxa = a[i]->ltrb;
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j) {
+      const cv::Vec4f &boxb = b[j]->ltrb;
+      float inters = calc_inter_area(boxa, boxb);
+      distsi[j] = 1.f - inters / (areaa[i] + areab[j] - inters);
+    }
+  }
+
+  return dists;
+}
+
+cv::Mat iou_distance(const TrajectoryPtrPool &a, const TrajectoryPool &b) {
+  std::vector<float> areaa(a.size());
+  for (size_t i = 0; i < a.size(); ++i) {
+    float w = a[i]->ltrb[2] - a[i]->ltrb[0];
+    float h = a[i]->ltrb[3] - a[i]->ltrb[1];
+    areaa[i] = w * h;
+  }
+
+  std::vector<float> areab(b.size());
+  for (size_t j = 0; j < b.size(); ++j) {
+    float w = b[j].ltrb[2] - b[j].ltrb[0];
+    float h = b[j].ltrb[3] - b[j].ltrb[1];
+    areab[j] = w * h;
+  }
+
+  cv::Mat dists(a.size(), b.size(), CV_32F);
+  for (size_t i = 0; i < a.size(); ++i) {
+    const cv::Vec4f &boxa = a[i]->ltrb;
+    float *distsi = dists.ptr<float>(i);
+    for (size_t j = 0; j < b.size(); ++j) {
+      const cv::Vec4f &boxb = b[j].ltrb;
+      float inters = calc_inter_area(boxa, boxb);
+      distsi[j] = 1.f - inters / (areaa[i] + areab[j] - inters);
+    }
+  }
+
+  return dists;
+}
+
+}  // namespace PaddleDetection
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/README.md b/PaddleDetection-release-2.6/deploy/pptracking/python/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..6731622dfcab1c6733c6fea0029fb70eefded8d0
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/README.md
@@ -0,0 +1,222 @@
+# PP-Tracking Python-side Inference Deployment
+
+## Contents
+- [Introduction](#introduction)
+- [1. FairMOT model export and inference](#1-fairmot-model-export-and-inference)
+- [2. DeepSORT model export and inference](#2-deepsort-model-export-and-inference)
+- [3. ByteTrack and OC_SORT model export and inference](#3-bytetrack-and-oc_sort-model-export-and-inference)
+- [4. Vehicle cross-camera (MTMCT) model export and inference](#4-vehicle-cross-camera-mtmct-model-export-and-inference)
+- [5. Parameter reference](#5-parameter-reference)
+
+## Introduction
+In PaddlePaddle the inference engine and the training engine are optimized differently under the hood. The inference engine uses AnalysisPredictor, which is optimized specifically for inference and is the Python interface of the [C++ inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html); it applies a number of graph optimizations to the model and removes unnecessary memory copies. For users with high performance requirements when deploying trained models, we provide inference scripts that are independent of PaddleDetection and easy to integrate.
+
+Deployment consists of two steps:
+- export the inference model
+- run inference with Python
+
+During training, PaddleDetection keeps both the forward network and optimizer-related parameters, while deployment only needs the forward parameters; see [Exporting models](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/EXPORT_MODEL.md) for details.
+The exported directory contains four files: `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`.
+
+PP-Tracking also provides a public AI Studio project; see the tutorial [PP-Tracking: hands-on multi-object tracking](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
+
+## 1. FairMOT model export and inference
+### 1.1 Export the inference model
+```bash
+# Export from the released PaddleDetection weights
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams
+
+# Or export a checkpoint saved after your own training
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml -o weights=output/fairmot_hrnetv2_w18_dlafpn_30e_576x320/model_final.pdparams
+
+# Or download an already exported model released by PaddleDetection
+wget https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar
+tar -xvf fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar
+```
+**Note:**
+ Exported models are saved under `output_inference` by default; if you download one instead, place it in the corresponding directory.
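+
+After exporting (or downloading) a model, it can save time to sanity-check the
+directory before running inference. The sketch below is an illustrative helper,
+not part of the toolkit; it only checks that the four files listed in the
+introduction are present (the default path is just an example):
+
+```python
+# Illustrative check of an exported model directory (path is an example).
+import os
+
+def check_exported_model(model_dir):
+    expected = ["infer_cfg.yml", "model.pdmodel",
+                "model.pdiparams", "model.pdiparams.info"]
+    missing = [f for f in expected
+               if not os.path.isfile(os.path.join(model_dir, f))]
+    if missing:
+        raise FileNotFoundError(f"{model_dir} is missing: {missing}")
+    print(f"{model_dir} looks complete.")
+
+check_exported_model("output_inference/fairmot_hrnetv2_w18_dlafpn_30e_576x320")
+```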
+### 1.2 Run Python inference with the exported model
+```bash
+# Download the pedestrian tracking demo video:
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+
+# Predict on the video with Python
+python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_hrnetv2_w18_dlafpn_30e_576x320 --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 --save_mot_txts --save_images
+```
+
+### 1.3 Python inference with flow counting, entrance in/out statistics and trajectory drawing
+```bash
+# Download the entrance counting demo video:
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/entrance_count_demo.mp4
+
+# Predict on the video with Python
+python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_hrnetv2_w18_dlafpn_30e_576x320 --video_file=entrance_count_demo.mp4 --device=GPU --do_entrance_counting --draw_center_traj
+```
+
+**Notes:**
+ - Tracking models predict on videos, not single images. By default the visualized tracking result is saved as a video; add `--save_mot_txts` to also save the tracking results as txt files, or `--save_images` to save the visualized frames.
+ - Each line of the tracking result txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
+ - `--threshold` is the confidence threshold for visualization, 0.5 by default; results below it are filtered out. Adjust it as needed for better visualization.
+ - `--do_entrance_counting` enables entrance in/out counting (default False); `--draw_center_traj` enables drawing tracking trajectories (default False). For trajectory drawing, the test video should preferably come from a static camera.
+ - For multi-class or vehicle FairMOT models, exporting and Python inference only require the corresponding config and weights, e.g.:
+ ```bash
+ job_name=mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone
+ model_type=mot/mcfairmot
+ config=configs/${model_type}/${job_name}.yml
+ # Export the model
+ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/${job_name}.pdparams
+ # Predict on a video with Python
+ python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/${job_name} --video_file={your video name}.mp4 --device=GPU --threshold=0.5 --save_mot_txts --save_images
+ ```
+ - Each line of a multi-class tracking result txt file is `frame,id,x1,y1,w,h,score,cls_id,-1,-1`.
+ - The VisDrone multi-class tracking demo video can be downloaded with: `wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/visdrone_demo.mp4`
+ - The BDD100K vehicle and multi-class demo video can be downloaded with: `wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/bdd100k_demo.mp4`
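+
+The txt files written by `--save_mot_txts` use the plain comma-separated format
+documented above (`frame,id,x1,y1,w,h,score,-1,-1,-1` for single-class models).
+A minimal parsing sketch, assuming a single-class result file (the path below
+is an example):
+
+```python
+# Minimal sketch: group a single-class MOT result txt by track id.
+def load_mot_txt(path):
+    tracks = {}  # track_id -> list of (frame, x1, y1, w, h, score)
+    with open(path) as f:
+        for line in f:
+            if not line.strip():
+                continue
+            frame, tid, x1, y1, w, h, score = line.strip().split(",")[:7]
+            tracks.setdefault(int(tid), []).append(
+                (int(frame), float(x1), float(y1),
+                 float(w), float(h), float(score)))
+    return tracks
+
+tracks = load_mot_txt("output/mot17_demo.txt")  # example path
+print(f"{len(tracks)} identities tracked")
+```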
+
+
+## 2. DeepSORT model export and inference
+### 2.1 Export the inference models
+Step 1: export the detection model
+```bash
+# Export the PPYOLOE pedestrian detection model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+```
+
+Step 2: export the pedestrian ReID model
+```bash
+# Export the PCB Pyramid ReID model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
+# Or export the PPLCNet ReID model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
+```
+
+### 2.2 Pedestrian tracking with the exported models
+```bash
+# Download the pedestrian tracking demo video:
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+
+# Use the exported PPYOLOE pedestrian detector and PPLCNet ReID model
+python3.7 deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts --threshold=0.5
+```
+
+### 2.3 Vehicle tracking with the exported models
+```bash
+# Download the vehicle demo video
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/bdd100k_demo.mp4
+
+# Download the exported PPYOLOE vehicle detection model:
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip
+unzip mot_ppyoloe_l_36e_ppvehicle.zip
+
+# Download the exported vehicle ReID model:
+wget https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar
+tar -xvf deepsort_pplcnet_vehicle.tar
+
+# Use the exported PPYOLOE vehicle detector and PPLCNet vehicle ReID model
+python deploy/pptracking/python/mot_sde_infer.py --model_dir=mot_ppyoloe_l_36e_ppvehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --tracker_config=deploy/pptracking/python/tracker_config.yml --device=GPU --threshold=0.5 --video_file=bdd100k_demo.mp4 --save_mot_txts --save_images
+```
+
+**Notes:**
+ - Before running, manually set the tracker type in `tracker_config.yml` to `type: DeepSORTTracker`.
+ - Tracking models predict on videos, not single images. By default the visualized result is saved as a video; add `--save_mot_txts` (one txt per video) or `--save_images` to save visualized frames.
+ - Each line of the tracking result txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
+ - `--threshold` is the confidence threshold for visualization, 0.5 by default; results below it are filtered out. Adjust it as needed.
+ - DeepSORT supports single-class tracking only, not multi-class, and the ReID model should preferably be trained on the same object class as the detector: use a pedestrian ReID model for pedestrian tracking and a vehicle ReID model for vehicle tracking.
+
+
+
+## 3. ByteTrack and OC_SORT model export and inference
+### 3.1 Export the inference model
+```bash
+# Export the PPYOLOE pedestrian detection model
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+```
+
+### 3.2 Pedestrian tracking with the exported model
+```bash
+# Download the pedestrian tracking demo video:
+wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
+
+# Use the exported PPYOLOE pedestrian detector
+python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts
+
+# Use the exported PPYOLOE pedestrian detector together with the PPLCNet ReID model
+python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 --save_mot_txts --save_images
+```
+**Notes:**
+ - To run the ByteTrack model, make sure the tracker type in `tracker_config.yml` is `type: JDETracker`.
+ - Switch the tracker type in `tracker_config.yml` to `type: OCSORTTracker` to run the OC_SORT model.
+ - The ByteTrack model runs from the exported detector plus the separate `--tracker_config` file; for real-time tracking no ReID model is required. `--reid_model_dir` is the path of an exported ReID model and is empty by default; whether to add it depends on the actual effect.
+ - Tracking models predict on videos, not single images. By default the visualized result is saved as a video; add `--save_mot_txts` (one txt per video) or `--save_images` to save visualized frames.
+ - Each line of the tracking result txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
+ - `--threshold` is the confidence threshold for visualization, 0.5 by default; results below it are filtered out. Adjust it as needed.
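+
+The steps above and below require manually switching the `type:` field in
+`deploy/pptracking/python/tracker_config.yml` (`JDETracker` for ByteTrack,
+`OCSORTTracker` for OC_SORT, `DeepSORTTracker` for DeepSORT and cross-camera
+tracking). A small convenience sketch for flipping it from the command line,
+assuming the `type:` key sits at the start of a line in that file:
+
+```python
+# Illustrative helper: rewrite the tracker type in tracker_config.yml.
+import re
+import sys
+
+def set_tracker_type(cfg_path, tracker_type):
+    assert tracker_type in ("JDETracker", "OCSORTTracker", "DeepSORTTracker")
+    with open(cfg_path) as f:
+        text = f.read()
+    text = re.sub(r"^type:\s*\w+", f"type: {tracker_type}", text, flags=re.M)
+    with open(cfg_path, "w") as f:
+        f.write(text)
+
+if __name__ == "__main__":
+    set_tracker_type("deploy/pptracking/python/tracker_config.yml", sys.argv[1])
+```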
+
+
+## 4. Vehicle cross-camera (MTMCT) model export and inference
+### 4.1 Download the exported models
+Step 1: download the exported detection model
+```bash
+# Download the exported PPYOLOE vehicle detection model:
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip
+unzip mot_ppyoloe_l_36e_ppvehicle.zip
+```
+Step 2: download the exported ReID model
+```bash
+wget https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar
+tar -xvf deepsort_pplcnet_vehicle.tar
+```
+
+### 4.2 Cross-camera tracking with the exported models
+```bash
+# Download the demo test video
+wget https://paddledet.bj.bcebos.com/data/mot/demo/mtmct-demo.tar
+tar -xvf mtmct-demo.tar
+
+# Use the exported PPYOLOE vehicle detector and PPLCNet vehicle ReID model
+python deploy/pptracking/python/mot_sde_infer.py --model_dir=mot_ppyoloe_l_36e_ppvehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --tracker_config=deploy/pptracking/python/tracker_config.yml --mtmct_dir=mtmct-demo --mtmct_cfg=deploy/pptracking/python/mtmct_cfg.yml --device=GPU --threshold=0.5 --save_mot_txts --save_images
+```
+
+**Notes:**
+ - Before running, manually set the tracker type in `tracker_config.yml` to `type: DeepSORTTracker`; cross-camera tracking supports DeepSORT only.
+ - Tracking models predict on videos, not single images. By default the visualized result is saved as a video; add `--save_mot_txts` (one txt per video) or `--save_images` to save visualized frames.
+ - Each line of a cross-camera tracking result txt file is `camera_id,frame,id,x1,y1,w,h,-1,-1`.
+ - `--threshold` is the confidence threshold for visualization, 0.5 by default; results below it are filtered out. Adjust it as needed.
+ - DeepSORT supports single-class tracking only, and the ReID model should preferably be trained on the same object class as the detector: a pedestrian ReID model for pedestrian tracking, a vehicle ReID model for vehicle tracking.
+ - `--mtmct_dir` is the folder of one MTMCT scene; it contains image folders of the videos captured by the different cameras of that scene, at least two of them.
+ - `--mtmct_cfg` is the config file of one MTMCT scene; it contains switches for some trick operations and the file paths of the camera-related settings of the scene. You can change the paths and enable or disable the operations as needed.
+
+
+## 5. Parameter reference
+
+| Parameter | Required | Meaning |
+|-------|-------|----------|
+| --model_dir | Yes | Path of the exported model above |
+| --reid_model_dir | Optional | Path of the exported ReID model |
+| --image_file | Optional | Image to predict |
+| --image_dir | Optional | Folder of images to predict |
+| --video_file | Optional | Video to predict |
+| --camera_id | Optional | ID of the camera used for prediction, -1 by default (no camera; can be set to 0 - number of cameras minus 1). During prediction, press `q` in the visualization window to quit and write the result to output/output.mp4 |
+| --device | Optional | Runtime device, one of `CPU/GPU/XPU`, `CPU` by default |
+| --run_mode | Optional | With GPU, paddle by default; one of (paddle/trt_fp32/trt_fp16/trt_int8) |
+| --batch_size | Optional | Batch size for inference, effective with `image_dir`, 1 by default |
+| --threshold | Optional | Score threshold for prediction, 0.5 by default |
+| --output_dir | Optional | Root directory of visualized results, output/ by default |
+| --run_benchmark | Optional | Whether to run the benchmark; requires `--image_file` or `--image_dir`, False by default |
+| --enable_mkldnn | Optional | Whether to enable MKLDNN acceleration for CPU inference, False by default |
+| --cpu_threads | Optional | Number of CPU threads, 1 by default |
+| --trt_calib_mode | Optional | Whether TensorRT uses calibration, False by default. Set True for TensorRT int8; set False for models quantized by PaddleSlim |
+| --save_mot_txts | Optional | Whether tracking tasks save txt result files, False by default |
+| --save_images | Optional | Whether tracking tasks save visualized frames, False by default |
+| --do_entrance_counting | Optional | Whether tracking tasks count entrance in/out flow, False by default |
+| --draw_center_traj | Optional | Whether tracking tasks draw trajectories, False by default |
+| --mtmct_dir | Optional | Folder of images for MTMCT cross-camera tracking, None by default |
+| --mtmct_cfg | Optional | Config file for MTMCT cross-camera tracking, None by default |
+
+Notes:
+
+- Parameter priority: `camera_id` > `video_file` > `image_dir` > `image_file`.
+- run_mode: paddle uses AnalysisPredictor with float32 precision; the other values use AnalysisPredictor with TensorRT at the corresponding precision.
+- If the installed PaddlePaddle does not support TensorRT-based inference, compile it yourself; see the [inference library build tutorial](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html).
+- If --run_benchmark is set to True, install the dependencies with `pip install pynvml psutil GPUtil`.
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/benchmark_utils.py b/PaddleDetection-release-2.6/deploy/pptracking/python/benchmark_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..adf36217955ed71103ad46a7e7ae5cb488e93d96
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/benchmark_utils.py
@@ -0,0 +1,289 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import logging
+
+import paddle
+import paddle.inference as paddle_infer
+
+from pathlib import Path
+
+CUR_DIR = os.path.dirname(os.path.abspath(__file__))
+LOG_PATH_ROOT = f"{CUR_DIR}/../../output"
+
+
+class PaddleInferBenchmark(object):
+    def __init__(self,
+                 config,
+                 model_info: dict={},
+                 data_info: dict={},
+                 perf_info: dict={},
+                 resource_info: dict={},
+                 **kwargs):
+        """
+        Construct PaddleInferBenchmark Class to format logs.
+ args: + config(paddle.inference.Config): paddle inference config + model_info(dict): basic model info + {'model_name': 'resnet50' + 'precision': 'fp32'} + data_info(dict): input data info + {'batch_size': 1 + 'shape': '3,224,224' + 'data_num': 1000} + perf_info(dict): performance result + {'preprocess_time_s': 1.0 + 'inference_time_s': 2.0 + 'postprocess_time_s': 1.0 + 'total_time_s': 4.0} + resource_info(dict): + cpu and gpu resources + {'cpu_rss': 100 + 'gpu_rss': 100 + 'gpu_util': 60} + """ + # PaddleInferBenchmark Log Version + self.log_version = "1.0.3" + + # Paddle Version + self.paddle_version = paddle.__version__ + self.paddle_commit = paddle.__git_commit__ + paddle_infer_info = paddle_infer.get_version() + self.paddle_branch = paddle_infer_info.strip().split(': ')[-1] + + # model info + self.model_info = model_info + + # data info + self.data_info = data_info + + # perf info + self.perf_info = perf_info + + try: + # required value + self.model_name = model_info['model_name'] + self.precision = model_info['precision'] + + self.batch_size = data_info['batch_size'] + self.shape = data_info['shape'] + self.data_num = data_info['data_num'] + + self.inference_time_s = round(perf_info['inference_time_s'], 4) + except: + self.print_help() + raise ValueError( + "Set argument wrong, please check input argument and its type") + + self.preprocess_time_s = perf_info.get('preprocess_time_s', 0) + self.postprocess_time_s = perf_info.get('postprocess_time_s', 0) + self.with_tracker = True if 'tracking_time_s' in perf_info else False + self.tracking_time_s = perf_info.get('tracking_time_s', 0) + self.total_time_s = perf_info.get('total_time_s', 0) + + self.inference_time_s_90 = perf_info.get("inference_time_s_90", "") + self.inference_time_s_99 = perf_info.get("inference_time_s_99", "") + self.succ_rate = perf_info.get("succ_rate", "") + self.qps = perf_info.get("qps", "") + + # conf info + self.config_status = self.parse_config(config) + + # mem info + if isinstance(resource_info, dict): + self.cpu_rss_mb = int(resource_info.get('cpu_rss_mb', 0)) + self.cpu_vms_mb = int(resource_info.get('cpu_vms_mb', 0)) + self.cpu_shared_mb = int(resource_info.get('cpu_shared_mb', 0)) + self.cpu_dirty_mb = int(resource_info.get('cpu_dirty_mb', 0)) + self.cpu_util = round(resource_info.get('cpu_util', 0), 2) + + self.gpu_rss_mb = int(resource_info.get('gpu_rss_mb', 0)) + self.gpu_util = round(resource_info.get('gpu_util', 0), 2) + self.gpu_mem_util = round(resource_info.get('gpu_mem_util', 0), 2) + else: + self.cpu_rss_mb = 0 + self.cpu_vms_mb = 0 + self.cpu_shared_mb = 0 + self.cpu_dirty_mb = 0 + self.cpu_util = 0 + + self.gpu_rss_mb = 0 + self.gpu_util = 0 + self.gpu_mem_util = 0 + + # init benchmark logger + self.benchmark_logger() + + def benchmark_logger(self): + """ + benchmark logger + """ + # remove other logging handler + for handler in logging.root.handlers[:]: + logging.root.removeHandler(handler) + + # Init logger + FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s' + log_output = f"{LOG_PATH_ROOT}/{self.model_name}.log" + Path(f"{LOG_PATH_ROOT}").mkdir(parents=True, exist_ok=True) + logging.basicConfig( + level=logging.INFO, + format=FORMAT, + handlers=[ + logging.FileHandler( + filename=log_output, mode='w'), + logging.StreamHandler(), + ]) + self.logger = logging.getLogger(__name__) + self.logger.info( + f"Paddle Inference benchmark log will be saved to {log_output}") + + def parse_config(self, config) -> dict: + """ + parse paddle predictor config + args: + 
config(paddle.inference.Config): paddle inference config
+        return:
+            config_status(dict): dict style config info
+        """
+        config_status = {}
+        if isinstance(config, paddle_infer.Config):
+            config_status['runtime_device'] = "gpu" if config.use_gpu(
+            ) else "cpu"
+            config_status['ir_optim'] = config.ir_optim()
+            config_status['enable_tensorrt'] = config.tensorrt_engine_enabled()
+            config_status['precision'] = self.precision
+            config_status['enable_mkldnn'] = config.mkldnn_enabled()
+            config_status[
+                'cpu_math_library_num_threads'] = config.cpu_math_library_num_threads(
+                )
+        elif isinstance(config, dict):
+            config_status['runtime_device'] = config.get('runtime_device', "")
+            config_status['ir_optim'] = config.get('ir_optim', "")
+            config_status['enable_tensorrt'] = config.get('enable_tensorrt', "")
+            config_status['precision'] = config.get('precision', "")
+            config_status['enable_mkldnn'] = config.get('enable_mkldnn', "")
+            config_status['cpu_math_library_num_threads'] = config.get(
+                'cpu_math_library_num_threads', "")
+        else:
+            self.print_help()
+            raise ValueError(
+                "Set argument config wrong, please check input argument and its type"
+            )
+        return config_status
+
+    def report(self, identifier=None):
+        """
+        print log report
+        args:
+            identifier(string): identify log
+        """
+        if identifier:
+            identifier = f"[{identifier}]"
+        else:
+            identifier = ""
+
+        self.logger.info("\n")
+        self.logger.info(
+            "---------------------- Paddle info ----------------------")
+        self.logger.info(f"{identifier} paddle_version: {self.paddle_version}")
+        self.logger.info(f"{identifier} paddle_commit: {self.paddle_commit}")
+        self.logger.info(f"{identifier} paddle_branch: {self.paddle_branch}")
+        self.logger.info(f"{identifier} log_api_version: {self.log_version}")
+        self.logger.info(
+            "----------------------- Conf info -----------------------")
+        self.logger.info(
+            f"{identifier} runtime_device: {self.config_status['runtime_device']}"
+        )
+        self.logger.info(
+            f"{identifier} ir_optim: {self.config_status['ir_optim']}")
+        self.logger.info(f"{identifier} enable_memory_optim: {True}")
+        self.logger.info(
+            f"{identifier} enable_tensorrt: {self.config_status['enable_tensorrt']}"
+        )
+        self.logger.info(
+            f"{identifier} enable_mkldnn: {self.config_status['enable_mkldnn']}")
+        self.logger.info(
+            f"{identifier} cpu_math_library_num_threads: {self.config_status['cpu_math_library_num_threads']}"
+        )
+        self.logger.info(
+            "----------------------- Model info ----------------------")
+        self.logger.info(f"{identifier} model_name: {self.model_name}")
+        self.logger.info(f"{identifier} precision: {self.precision}")
+        self.logger.info(
+            "----------------------- Data info -----------------------")
+        self.logger.info(f"{identifier} batch_size: {self.batch_size}")
+        self.logger.info(f"{identifier} input_shape: {self.shape}")
+        self.logger.info(f"{identifier} data_num: {self.data_num}")
+        self.logger.info(
+            "----------------------- Perf info -----------------------")
+        self.logger.info(
+            f"{identifier} cpu_rss(MB): {self.cpu_rss_mb}, cpu_vms: {self.cpu_vms_mb}, cpu_shared_mb: {self.cpu_shared_mb}, cpu_dirty_mb: {self.cpu_dirty_mb}, cpu_util: {self.cpu_util}%"
+        )
+        self.logger.info(
+            f"{identifier} gpu_rss(MB): {self.gpu_rss_mb}, gpu_util: {self.gpu_util}%, gpu_mem_util: {self.gpu_mem_util}%"
+        )
+        self.logger.info(
+            f"{identifier} total time spent(s): {self.total_time_s}")
+
+        if self.with_tracker:
+            self.logger.info(
+                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
+                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
+                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}, "
+                f"tracking_time(ms): {round(self.tracking_time_s*1000, 1)}")
{round(self.inference_time_s*1000, 1)}, " + f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}, " + f"tracking_time(ms): {round(self.tracking_time_s*1000, 1)}") + else: + self.logger.info( + f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, " + f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, " + f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}" + ) + if self.inference_time_s_90: + self.looger.info( + f"{identifier} 90%_cost: {self.inference_time_s_90}, 99%_cost: {self.inference_time_s_99}, succ_rate: {self.succ_rate}" + ) + if self.qps: + self.logger.info(f"{identifier} QPS: {self.qps}") + + def print_help(self): + """ + print function help + """ + print("""Usage: + ==== Print inference benchmark logs. ==== + config = paddle.inference.Config() + model_info = {'model_name': 'resnet50' + 'precision': 'fp32'} + data_info = {'batch_size': 1 + 'shape': '3,224,224' + 'data_num': 1000} + perf_info = {'preprocess_time_s': 1.0 + 'inference_time_s': 2.0 + 'postprocess_time_s': 1.0 + 'total_time_s': 4.0} + resource_info = {'cpu_rss_mb': 100 + 'gpu_rss_mb': 100 + 'gpu_util': 60} + log = PaddleInferBenchmark(config, model_info, data_info, perf_info, resource_info) + log('Test') + """) + + def __call__(self, identifier=None): + """ + __call__ + args: + identifier(string): identify log + """ + self.report(identifier) diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/det_infer.py b/PaddleDetection-release-2.6/deploy/pptracking/python/det_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..3dec3e6d6353b324270feaeeb4cd37fbd055e972 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/det_infer.py @@ -0,0 +1,595 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
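+#
+# Example invocation (a minimal sketch; the flag names come from the shared
+# argsparser in mot_utils, and the model/image paths are hypothetical):
+#
+#   python det_infer.py --model_dir=output_inference/my_det_model \
+#       --image_file=demo.jpg --device=GPU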
+ +import os +import yaml +import glob +from functools import reduce + +import cv2 +import numpy as np +import math + +import paddle +from paddle.inference import Config +from paddle.inference import create_predictor + +import sys +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..']))) +sys.path.insert(0, parent_path) + +from benchmark_utils import PaddleInferBenchmark +from picodet_postprocess import PicoDetPostProcess +from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, Pad, decode_image +from mot.visualize import visualize_box_mask +from mot_utils import argsparser, Timer, get_current_memory_mb + +# Global dictionary +SUPPORT_MODELS = { + 'YOLO', + 'PPYOLOE', + 'PicoDet', + 'JDE', + 'FairMOT', + 'DeepSORT', + 'StrongBaseline', +} + + +def bench_log(detector, img_list, model_info, batch_size=1, name=None): + mems = { + 'cpu_rss_mb': detector.cpu_mem / len(img_list), + 'gpu_rss_mb': detector.gpu_mem / len(img_list), + 'gpu_util': detector.gpu_util * 100 / len(img_list) + } + perf_info = detector.det_times.report(average=True) + data_info = { + 'batch_size': batch_size, + 'shape': "dynamic_shape", + 'data_num': perf_info['img_num'] + } + log = PaddleInferBenchmark(detector.config, model_info, data_info, + perf_info, mems) + log(name) + + +class Detector(object): + """ + Args: + pred_config (object): config of model, defined by `Config(model_dir)` + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + output_dir (str): The path of output + threshold (float): The threshold of score for visualization + """ + + def __init__( + self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, ): + self.pred_config = self.set_config(model_dir) + self.predictor, self.config = load_predictor( + model_dir, + run_mode=run_mode, + batch_size=batch_size, + min_subgraph_size=self.pred_config.min_subgraph_size, + device=device, + use_dynamic_shape=self.pred_config.use_dynamic_shape, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn) + self.det_times = Timer() + self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0 + self.batch_size = batch_size + self.output_dir = output_dir + self.threshold = threshold + + def set_config(self, model_dir): + return PredictConfig(model_dir) + + def preprocess(self, image_list): + preprocess_ops = [] + for op_info in self.pred_config.preprocess_infos: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + preprocess_ops.append(eval(op_type)(**new_op_info)) + + input_im_lst = [] + input_im_info_lst = [] + for im_path in image_list: + im, im_info = 
preprocess(im_path, preprocess_ops)
+            input_im_lst.append(im)
+            input_im_info_lst.append(im_info)
+        inputs = create_inputs(input_im_lst, input_im_info_lst)
+        input_names = self.predictor.get_input_names()
+        for i in range(len(input_names)):
+            input_tensor = self.predictor.get_input_handle(input_names[i])
+            input_tensor.copy_from_cpu(inputs[input_names[i]])
+
+        return inputs
+
+    def postprocess(self, inputs, result):
+        # postprocess output of predictor
+        np_boxes_num = result['boxes_num']
+        if np_boxes_num[0] <= 0:
+            print('[WARNING] No object detected.')
+            result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]}
+        result = {k: v for k, v in result.items() if v is not None}
+        return result
+
+    def predict(self, repeats=1):
+        '''
+        Args:
+            repeats (int): repeats number for prediction
+        Returns:
+            result (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
+                           matrix element: [class, score, x_min, y_min, x_max, y_max]
+        '''
+        # model prediction
+        np_boxes, np_boxes_num = None, None
+        for i in range(repeats):
+            self.predictor.run()
+            output_names = self.predictor.get_output_names()
+            boxes_tensor = self.predictor.get_output_handle(output_names[0])
+            np_boxes = boxes_tensor.copy_to_cpu()
+            boxes_num = self.predictor.get_output_handle(output_names[1])
+            np_boxes_num = boxes_num.copy_to_cpu()
+        result = dict(boxes=np_boxes, boxes_num=np_boxes_num)
+        return result
+
+    def merge_batch_result(self, batch_result):
+        if len(batch_result) == 1:
+            return batch_result[0]
+        res_key = batch_result[0].keys()
+        results = {k: [] for k in res_key}
+        for res in batch_result:
+            for k, v in res.items():
+                results[k].append(v)
+        for k, v in results.items():
+            results[k] = np.concatenate(v)
+        return results
+
+    def get_timer(self):
+        return self.det_times
+
+    def predict_image(self,
+                      image_list,
+                      run_benchmark=False,
+                      repeats=1,
+                      visual=True):
+        batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size)
+        results = []
+        for i in range(batch_loop_cnt):
+            start_index = i * self.batch_size
+            end_index = min((i + 1) * self.batch_size, len(image_list))
+            batch_image_list = image_list[start_index:end_index]
+            if run_benchmark:
+                # preprocess
+                inputs = self.preprocess(batch_image_list)  # warmup
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                result = self.predict(repeats=repeats)  # warmup
+                self.det_times.inference_time_s.start()
+                result = self.predict(repeats=repeats)
+                self.det_times.inference_time_s.end(repeats=repeats)
+
+                # postprocess
+                result_warmup = self.postprocess(inputs, result)  # warmup
+                self.det_times.postprocess_time_s.start()
+                result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+                self.det_times.img_num += len(batch_image_list)
+
+                cm, gm, gu = get_current_memory_mb()
+                self.cpu_mem += cm
+                self.gpu_mem += gm
+                self.gpu_util += gu
+            else:
+                # preprocess
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                self.det_times.inference_time_s.start()
+                result = self.predict()
+                self.det_times.inference_time_s.end()
+
+                # postprocess
+                self.det_times.postprocess_time_s.start()
+                result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+                self.det_times.img_num += len(batch_image_list)
+
+                if visual:
+                    visualize(
+                        batch_image_list,
+                        result,
+                        self.pred_config.labels,
+                        output_dir=self.output_dir,
+                        threshold=self.threshold)
+
+ results.append(result) + if visual: + print('Test iter {}'.format(i)) + + results = self.merge_batch_result(results) + return results + + def predict_video(self, video_file, camera_id): + video_out_name = 'output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_out_name = os.path.split(video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name) + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + index = 1 + while (1): + ret, frame = capture.read() + if not ret: + break + print('detect frame: %d' % (index)) + index += 1 + results = self.predict_image([frame], visual=False) + + im = visualize_box_mask( + frame, + results, + self.pred_config.labels, + threshold=self.threshold) + im = np.array(im) + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + writer.release() + + +def create_inputs(imgs, im_info): + """generate input for different model type + Args: + imgs (list(numpy)): list of images (np.ndarray) + im_info (list(dict)): list of image info + Returns: + inputs (dict): input of model + """ + inputs = {} + + im_shape = [] + scale_factor = [] + if len(imgs) == 1: + inputs['image'] = np.array((imgs[0], )).astype('float32') + inputs['im_shape'] = np.array( + (im_info[0]['im_shape'], )).astype('float32') + inputs['scale_factor'] = np.array( + (im_info[0]['scale_factor'], )).astype('float32') + return inputs + + for e in im_info: + im_shape.append(np.array((e['im_shape'], )).astype('float32')) + scale_factor.append(np.array((e['scale_factor'], )).astype('float32')) + + inputs['im_shape'] = np.concatenate(im_shape, axis=0) + inputs['scale_factor'] = np.concatenate(scale_factor, axis=0) + + imgs_shape = [[e.shape[1], e.shape[2]] for e in imgs] + max_shape_h = max([e[0] for e in imgs_shape]) + max_shape_w = max([e[1] for e in imgs_shape]) + padding_imgs = [] + for img in imgs: + im_c, im_h, im_w = img.shape[:] + padding_im = np.zeros( + (im_c, max_shape_h, max_shape_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = img + padding_imgs.append(padding_im) + inputs['image'] = np.stack(padding_imgs, axis=0) + return inputs + + +class PredictConfig(): + """set config of preprocess, postprocess and visualize + Args: + model_dir (str): root path of model.yml + """ + + def __init__(self, model_dir): + # parsing Yaml config for Preprocess + deploy_file = os.path.join(model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.labels = yml_conf['label_list'] + self.mask = False + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + if 'mask' in yml_conf: + self.mask = yml_conf['mask'] + self.tracker = None + if 'tracker' in yml_conf: + self.tracker = yml_conf['tracker'] + if 'NMS' in yml_conf: + self.nms = yml_conf['NMS'] + if 'fpn_stride' in yml_conf: + self.fpn_stride = yml_conf['fpn_stride'] 
+ self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[ + 'arch'], SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +def load_predictor(model_dir, + run_mode='paddle', + batch_size=1, + device='CPU', + min_subgraph_size=3, + use_dynamic_shape=False, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False): + """set AnalysisConfig, generate AnalysisPredictor + Args: + model_dir (str): root path of __model__ and __params__ + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8) + use_dynamic_shape (bool): use dynamic shape or not + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + Returns: + predictor (PaddlePredictor): AnalysisPredictor + Raises: + ValueError: predict by TensorRT need device == 'GPU'. + """ + if device != 'GPU' and run_mode != 'paddle': + raise ValueError( + "Predict by TensorRT mode: {}, expect device=='GPU', but device == {}" + .format(run_mode, device)) + infer_model = os.path.join(model_dir, 'model.pdmodel') + infer_params = os.path.join(model_dir, 'model.pdiparams') + if not os.path.exists(infer_model): + infer_model = os.path.join(model_dir, 'inference.pdmodel') + infer_params = os.path.join(model_dir, 'inference.pdiparams') + if not os.path.exists(infer_model): + raise ValueError( + "Cannot find any inference model in dir: {},".format(model_dir)) + config = Config(infer_model, infer_params) + if device == 'GPU': + # initial GPU memory(M), device ID + config.enable_use_gpu(200, 0) + # optimize graph and fuse op + config.switch_ir_optim(True) + elif device == 'XPU': + config.enable_lite_engine() + config.enable_xpu(10 * 1024 * 1024) + else: + config.disable_gpu() + config.set_cpu_math_library_num_threads(cpu_threads) + if enable_mkldnn: + try: + # cache 10 different shapes for mkldnn to avoid memory leak + config.set_mkldnn_cache_capacity(10) + config.enable_mkldnn() + except Exception as e: + print( + "The current environment does not support `mkldnn`, so disable mkldnn." 
+ ) + pass + + precision_map = { + 'trt_int8': Config.Precision.Int8, + 'trt_fp32': Config.Precision.Float32, + 'trt_fp16': Config.Precision.Half + } + if run_mode in precision_map.keys(): + config.enable_tensorrt_engine( + workspace_size=1 << 25, + max_batch_size=batch_size, + min_subgraph_size=min_subgraph_size, + precision_mode=precision_map[run_mode], + use_static=False, + use_calib_mode=trt_calib_mode) + + if use_dynamic_shape: + min_input_shape = { + 'image': [batch_size, 3, trt_min_shape, trt_min_shape] + } + max_input_shape = { + 'image': [batch_size, 3, trt_max_shape, trt_max_shape] + } + opt_input_shape = { + 'image': [batch_size, 3, trt_opt_shape, trt_opt_shape] + } + config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape, + opt_input_shape) + print('trt set dynamic shape done!') + + # disable print log when predict + config.disable_glog_info() + # enable shared memory + config.enable_memory_optim() + # disable feed, fetch OP, needed by zero_copy_run + config.switch_use_feed_fetch_ops(False) + predictor = create_predictor(config) + return predictor, config + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--infer_img or --infer_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +def visualize(image_list, result, labels, output_dir='output/', threshold=0.5): + # visualize the predict result + start_idx = 0 + for idx, image_file in enumerate(image_list): + im_bboxes_num = result['boxes_num'][idx] + im_results = {} + if 'boxes' in result: + im_results['boxes'] = result['boxes'][start_idx:start_idx + + im_bboxes_num, :] + start_idx += im_bboxes_num + im = visualize_box_mask( + image_file, im_results, labels, threshold=threshold) + img_name = os.path.split(image_file)[-1] + if not os.path.exists(output_dir): + os.makedirs(output_dir) + out_path = os.path.join(output_dir, img_name) + im.save(out_path, quality=95) + print("save result to: " + out_path) + + +def print_arguments(args): + print('----------- Running Arguments -----------') + for arg, value in sorted(vars(args).items()): + print('%s: %s' % (arg, value)) + print('------------------------------------------') + + +def main(): + deploy_file = os.path.join(FLAGS.model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + arch = yml_conf['arch'] + detector_func = 'Detector' + detector = eval(detector_func)(FLAGS.model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + 
threshold=FLAGS.threshold, + output_dir=FLAGS.output_dir) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + else: + # predict from image + if FLAGS.image_dir is None and FLAGS.image_file is not None: + assert FLAGS.batch_size == 1, "batch_size should be 1, when image_file is not None" + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + model_dir = FLAGS.model_dir + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, model_info, name='DET') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + main() diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/__init__.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5f57110ca5731af0d91236516cbc3154f08b44be --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/__init__.py @@ -0,0 +1,25 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import matching +from . import tracker +from . import motion +from . import utils +from . import mtmct + +from .matching import * +from .tracker import * +from .motion import * +from .utils import * +from .mtmct import * diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/__init__.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f6a88c5673a50452415b1f86f7b18bac12297f49 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/__init__.py @@ -0,0 +1,21 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import jde_matching +from . import deepsort_matching +from . 
import ocsort_matching + +from .jde_matching import * +from .deepsort_matching import * +from .ocsort_matching import * diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/deepsort_matching.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/deepsort_matching.py new file mode 100644 index 0000000000000000000000000000000000000000..3859ccfbd1f384cc24716a94342230c2c8a2387f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/deepsort_matching.py @@ -0,0 +1,379 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/nwojke/deep_sort/tree/master/deep_sort +""" + +import numpy as np +from scipy.optimize import linear_sum_assignment +from ..motion import kalman_filter + +INFTY_COST = 1e+5 + +__all__ = [ + 'iou_1toN', + 'iou_cost', + '_nn_euclidean_distance', + '_nn_cosine_distance', + 'NearestNeighborDistanceMetric', + 'min_cost_matching', + 'matching_cascade', + 'gate_cost_matrix', +] + + +def iou_1toN(bbox, candidates): + """ + Computer intersection over union (IoU) by one box to N candidates. + + Args: + bbox (ndarray): A bounding box in format `(top left x, top left y, width, height)`. + candidates (ndarray): A matrix of candidate bounding boxes (one per row) in the + same format as `bbox`. + + Returns: + ious (ndarray): The intersection over union in [0, 1] between the `bbox` + and each candidate. A higher score means a larger fraction of the + `bbox` is occluded by the candidate. + """ + bbox_tl = bbox[:2] + bbox_br = bbox[:2] + bbox[2:] + candidates_tl = candidates[:, :2] + candidates_br = candidates[:, :2] + candidates[:, 2:] + + tl = np.c_[np.maximum(bbox_tl[0], candidates_tl[:, 0])[:, np.newaxis], + np.maximum(bbox_tl[1], candidates_tl[:, 1])[:, np.newaxis]] + br = np.c_[np.minimum(bbox_br[0], candidates_br[:, 0])[:, np.newaxis], + np.minimum(bbox_br[1], candidates_br[:, 1])[:, np.newaxis]] + wh = np.maximum(0., br - tl) + + area_intersection = wh.prod(axis=1) + area_bbox = bbox[2:].prod() + area_candidates = candidates[:, 2:].prod(axis=1) + ious = area_intersection / (area_bbox + area_candidates - area_intersection) + return ious + + +def iou_cost(tracks, detections, track_indices=None, detection_indices=None): + """ + IoU distance metric. + + Args: + tracks (list[Track]): A list of tracks. + detections (list[Detection]): A list of detections. + track_indices (Optional[list[int]]): A list of indices to tracks that + should be matched. Defaults to all `tracks`. + detection_indices (Optional[list[int]]): A list of indices to detections + that should be matched. Defaults to all `detections`. + + Returns: + cost_matrix (ndarray): A cost matrix of shape len(track_indices), + len(detection_indices) where entry (i, j) is + `1 - iou(tracks[track_indices[i]], detections[detection_indices[j]])`. 
+ """ + if track_indices is None: + track_indices = np.arange(len(tracks)) + if detection_indices is None: + detection_indices = np.arange(len(detections)) + + cost_matrix = np.zeros((len(track_indices), len(detection_indices))) + for row, track_idx in enumerate(track_indices): + if tracks[track_idx].time_since_update > 1: + cost_matrix[row, :] = 1e+5 + continue + + bbox = tracks[track_idx].to_tlwh() + candidates = np.asarray([detections[i].tlwh for i in detection_indices]) + cost_matrix[row, :] = 1. - iou_1toN(bbox, candidates) + return cost_matrix + + +def _nn_euclidean_distance(s, q): + """ + Compute pair-wise squared (Euclidean) distance between points in `s` and `q`. + + Args: + s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M. + q (ndarray): Query points: an LxM matrix of L samples of dimensionality M. + + Returns: + distances (ndarray): A vector of length M that contains for each entry in `q` the + smallest Euclidean distance to a sample in `s`. + """ + s, q = np.asarray(s), np.asarray(q) + if len(s) == 0 or len(q) == 0: + return np.zeros((len(s), len(q))) + s2, q2 = np.square(s).sum(axis=1), np.square(q).sum(axis=1) + distances = -2. * np.dot(s, q.T) + s2[:, None] + q2[None, :] + distances = np.clip(distances, 0., float(np.inf)) + + return np.maximum(0.0, distances.min(axis=0)) + + +def _nn_cosine_distance(s, q): + """ + Compute pair-wise cosine distance between points in `s` and `q`. + + Args: + s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M. + q (ndarray): Query points: an LxM matrix of L samples of dimensionality M. + + Returns: + distances (ndarray): A vector of length M that contains for each entry in `q` the + smallest Euclidean distance to a sample in `s`. + """ + s = np.asarray(s) / np.linalg.norm(s, axis=1, keepdims=True) + q = np.asarray(q) / np.linalg.norm(q, axis=1, keepdims=True) + distances = 1. - np.dot(s, q.T) + + return distances.min(axis=0) + + +class NearestNeighborDistanceMetric(object): + """ + A nearest neighbor distance metric that, for each target, returns + the closest distance to any sample that has been observed so far. + + Args: + metric (str): Either "euclidean" or "cosine". + matching_threshold (float): The matching threshold. Samples with larger + distance are considered an invalid match. + budget (Optional[int]): If not None, fix samples per class to at most + this number. Removes the oldest samples when the budget is reached. + + Attributes: + samples (Dict[int -> List[ndarray]]): A dictionary that maps from target + identities to the list of samples that have been observed so far. + """ + + def __init__(self, metric, matching_threshold, budget=None): + if metric == "euclidean": + self._metric = _nn_euclidean_distance + elif metric == "cosine": + self._metric = _nn_cosine_distance + else: + raise ValueError( + "Invalid metric; must be either 'euclidean' or 'cosine'") + self.matching_threshold = matching_threshold + self.budget = budget + self.samples = {} + + def partial_fit(self, features, targets, active_targets): + """ + Update the distance metric with new data. + + Args: + features (ndarray): An NxM matrix of N features of dimensionality M. + targets (ndarray): An integer array of associated target identities. + active_targets (List[int]): A list of targets that are currently + present in the scene. 
+ """ + for feature, target in zip(features, targets): + self.samples.setdefault(target, []).append(feature) + if self.budget is not None: + self.samples[target] = self.samples[target][-self.budget:] + self.samples = {k: self.samples[k] for k in active_targets} + + def distance(self, features, targets): + """ + Compute distance between features and targets. + + Args: + features (ndarray): An NxM matrix of N features of dimensionality M. + targets (list[int]): A list of targets to match the given `features` against. + + Returns: + cost_matrix (ndarray): a cost matrix of shape len(targets), len(features), + where element (i, j) contains the closest squared distance between + `targets[i]` and `features[j]`. + """ + cost_matrix = np.zeros((len(targets), len(features))) + for i, target in enumerate(targets): + cost_matrix[i, :] = self._metric(self.samples[target], features) + return cost_matrix + + +def min_cost_matching(distance_metric, + max_distance, + tracks, + detections, + track_indices=None, + detection_indices=None): + """ + Solve linear assignment problem. + + Args: + distance_metric : + Callable[List[Track], List[Detection], List[int], List[int]) -> ndarray + The distance metric is given a list of tracks and detections as + well as a list of N track indices and M detection indices. The + metric should return the NxM dimensional cost matrix, where element + (i, j) is the association cost between the i-th track in the given + track indices and the j-th detection in the given detection_indices. + max_distance (float): Gating threshold. Associations with cost larger + than this value are disregarded. + tracks (list[Track]): A list of predicted tracks at the current time + step. + detections (list[Detection]): A list of detections at the current time + step. + track_indices (list[int]): List of track indices that maps rows in + `cost_matrix` to tracks in `tracks`. + detection_indices (List[int]): List of detection indices that maps + columns in `cost_matrix` to detections in `detections`. + + Returns: + A tuple (List[(int, int)], List[int], List[int]) with the following + three entries: + * A list of matched track and detection indices. + * A list of unmatched track indices. + * A list of unmatched detection indices. + """ + if track_indices is None: + track_indices = np.arange(len(tracks)) + if detection_indices is None: + detection_indices = np.arange(len(detections)) + + if len(detection_indices) == 0 or len(track_indices) == 0: + return [], track_indices, detection_indices # Nothing to match. 
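+
+    # Note: costs above max_distance are clamped to just over the threshold
+    # right below, so the assignment solver can still run; those pairs are
+    # rejected again when the assignment result is post-filtered.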
+ + cost_matrix = distance_metric(tracks, detections, track_indices, + detection_indices) + + cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5 + indices = linear_sum_assignment(cost_matrix) + + matches, unmatched_tracks, unmatched_detections = [], [], [] + for col, detection_idx in enumerate(detection_indices): + if col not in indices[1]: + unmatched_detections.append(detection_idx) + for row, track_idx in enumerate(track_indices): + if row not in indices[0]: + unmatched_tracks.append(track_idx) + for row, col in zip(indices[0], indices[1]): + track_idx = track_indices[row] + detection_idx = detection_indices[col] + if cost_matrix[row, col] > max_distance: + unmatched_tracks.append(track_idx) + unmatched_detections.append(detection_idx) + else: + matches.append((track_idx, detection_idx)) + return matches, unmatched_tracks, unmatched_detections + + +def matching_cascade(distance_metric, + max_distance, + cascade_depth, + tracks, + detections, + track_indices=None, + detection_indices=None): + """ + Run matching cascade. + + Args: + distance_metric : + Callable[List[Track], List[Detection], List[int], List[int]) -> ndarray + The distance metric is given a list of tracks and detections as + well as a list of N track indices and M detection indices. The + metric should return the NxM dimensional cost matrix, where element + (i, j) is the association cost between the i-th track in the given + track indices and the j-th detection in the given detection_indices. + max_distance (float): Gating threshold. Associations with cost larger + than this value are disregarded. + cascade_depth (int): The cascade depth, should be se to the maximum + track age. + tracks (list[Track]): A list of predicted tracks at the current time + step. + detections (list[Detection]): A list of detections at the current time + step. + track_indices (list[int]): List of track indices that maps rows in + `cost_matrix` to tracks in `tracks`. + detection_indices (List[int]): List of detection indices that maps + columns in `cost_matrix` to detections in `detections`. + + Returns: + A tuple (List[(int, int)], List[int], List[int]) with the following + three entries: + * A list of matched track and detection indices. + * A list of unmatched track indices. + * A list of unmatched detection indices. + """ + if track_indices is None: + track_indices = list(range(len(tracks))) + if detection_indices is None: + detection_indices = list(range(len(detections))) + + unmatched_detections = detection_indices + matches = [] + for level in range(cascade_depth): + if len(unmatched_detections) == 0: # No detections left + break + + track_indices_l = [ + k for k in track_indices if tracks[k].time_since_update == 1 + level + ] + if len(track_indices_l) == 0: # Nothing to match at this level + continue + + matches_l, _, unmatched_detections = \ + min_cost_matching( + distance_metric, max_distance, tracks, detections, + track_indices_l, unmatched_detections) + matches += matches_l + unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches)) + return matches, unmatched_tracks, unmatched_detections + + +def gate_cost_matrix(kf, + cost_matrix, + tracks, + detections, + track_indices, + detection_indices, + gated_cost=INFTY_COST, + only_position=False): + """ + Invalidate infeasible entries in cost matrix based on the state + distributions obtained by Kalman filtering. + + Args: + kf (object): The Kalman filter. 
+ cost_matrix (ndarray): The NxM dimensional cost matrix, where N is the + number of track indices and M is the number of detection indices, + such that entry (i, j) is the association cost between + `tracks[track_indices[i]]` and `detections[detection_indices[j]]`. + tracks (list[Track]): A list of predicted tracks at the current time + step. + detections (list[Detection]): A list of detections at the current time + step. + track_indices (List[int]): List of track indices that maps rows in + `cost_matrix` to tracks in `tracks`. + detection_indices (List[int]): List of detection indices that maps + columns in `cost_matrix` to detections in `detections`. + gated_cost (Optional[float]): Entries in the cost matrix corresponding + to infeasible associations are set this value. Defaults to a very + large value. + only_position (Optional[bool]): If True, only the x, y position of the + state distribution is considered during gating. Default False. + """ + gating_dim = 2 if only_position else 4 + gating_threshold = kalman_filter.chi2inv95[gating_dim] + measurements = np.asarray( + [detections[i].to_xyah() for i in detection_indices]) + for row, track_idx in enumerate(track_indices): + track = tracks[track_idx] + gating_distance = kf.gating_distance(track.mean, track.covariance, + measurements, only_position) + cost_matrix[row, gating_distance > gating_threshold] = gated_cost + return cost_matrix diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/jde_matching.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/jde_matching.py new file mode 100644 index 0000000000000000000000000000000000000000..ac28f90167f1b98c3193c375bd74536cf109a3ee --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/jde_matching.py @@ -0,0 +1,163 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
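+#
+# Typical usage of the helpers below: build a cost matrix with iou_distance()
+# or embedding_distance(), optionally blend in motion information with
+# fuse_motion(), then solve it with linear_assignment(cost_matrix, thresh).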
+""" +This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/matching.py +""" + +try: + import lap +except: + print( + 'Warning: Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' + ) + pass + +import scipy +import numpy as np +from scipy.spatial.distance import cdist +from ..motion import kalman_filter +import warnings +warnings.filterwarnings("ignore") + +__all__ = [ + 'merge_matches', + 'linear_assignment', + 'bbox_ious', + 'iou_distance', + 'embedding_distance', + 'fuse_motion', +] + + +def merge_matches(m1, m2, shape): + O, P, Q = shape + m1 = np.asarray(m1) + m2 = np.asarray(m2) + + M1 = scipy.sparse.coo_matrix( + (np.ones(len(m1)), (m1[:, 0], m1[:, 1])), shape=(O, P)) + M2 = scipy.sparse.coo_matrix( + (np.ones(len(m2)), (m2[:, 0], m2[:, 1])), shape=(P, Q)) + + mask = M1 * M2 + match = mask.nonzero() + match = list(zip(match[0], match[1])) + unmatched_O = tuple(set(range(O)) - set([i for i, j in match])) + unmatched_Q = tuple(set(range(Q)) - set([j for i, j in match])) + + return match, unmatched_O, unmatched_Q + + +def linear_assignment(cost_matrix, thresh): + try: + import lap + except Exception as e: + raise RuntimeError( + 'Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' + ) + if cost_matrix.size == 0: + return np.empty( + (0, 2), dtype=int), tuple(range(cost_matrix.shape[0])), tuple( + range(cost_matrix.shape[1])) + matches, unmatched_a, unmatched_b = [], [], [] + cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh) + for ix, mx in enumerate(x): + if mx >= 0: + matches.append([ix, mx]) + unmatched_a = np.where(x < 0)[0] + unmatched_b = np.where(y < 0)[0] + matches = np.asarray(matches) + return matches, unmatched_a, unmatched_b + + +def bbox_ious(atlbrs, btlbrs): + boxes = np.ascontiguousarray(atlbrs, dtype=np.float32) + query_boxes = np.ascontiguousarray(btlbrs, dtype=np.float32) + N = boxes.shape[0] + K = query_boxes.shape[0] + ious = np.zeros((N, K), dtype=boxes.dtype) + if N * K == 0: + return ious + + for k in range(K): + box_area = ((query_boxes[k, 2] - query_boxes[k, 0] + 1) * + (query_boxes[k, 3] - query_boxes[k, 1] + 1)) + for n in range(N): + iw = (min(boxes[n, 2], query_boxes[k, 2]) - max( + boxes[n, 0], query_boxes[k, 0]) + 1) + if iw > 0: + ih = (min(boxes[n, 3], query_boxes[k, 3]) - max( + boxes[n, 1], query_boxes[k, 1]) + 1) + if ih > 0: + ua = float((boxes[n, 2] - boxes[n, 0] + 1) * (boxes[ + n, 3] - boxes[n, 1] + 1) + box_area - iw * ih) + ious[n, k] = iw * ih / ua + return ious + + +def iou_distance(atracks, btracks): + """ + Compute cost based on IoU between two list[STrack]. + """ + if (len(atracks) > 0 and isinstance(atracks[0], np.ndarray)) or ( + len(btracks) > 0 and isinstance(btracks[0], np.ndarray)): + atlbrs = atracks + btlbrs = btracks + else: + atlbrs = [track.tlbr for track in atracks] + btlbrs = [track.tlbr for track in btracks] + _ious = bbox_ious(atlbrs, btlbrs) + cost_matrix = 1 - _ious + + return cost_matrix + + +def embedding_distance(tracks, detections, metric='euclidean'): + """ + Compute cost based on features between two list[STrack]. 
+    """
+    cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float32)
+    if cost_matrix.size == 0:
+        return cost_matrix
+    det_features = np.asarray(
+        [track.curr_feat for track in detections], dtype=np.float32)
+    track_features = np.asarray(
+        [track.smooth_feat for track in tracks], dtype=np.float32)
+    cost_matrix = np.maximum(0.0, cdist(track_features, det_features,
+                                        metric))  # Normalized features
+    return cost_matrix
+
+
+def fuse_motion(kf,
+                cost_matrix,
+                tracks,
+                detections,
+                only_position=False,
+                lambda_=0.98):
+    if cost_matrix.size == 0:
+        return cost_matrix
+    gating_dim = 2 if only_position else 4
+    gating_threshold = kalman_filter.chi2inv95[gating_dim]
+    measurements = np.asarray([det.to_xyah() for det in detections])
+    for row, track in enumerate(tracks):
+        gating_distance = kf.gating_distance(
+            track.mean,
+            track.covariance,
+            measurements,
+            only_position,
+            metric='maha')
+        cost_matrix[row, gating_distance > gating_threshold] = np.inf
+        cost_matrix[row] = lambda_ * cost_matrix[row] + (1 - lambda_
+                                                         ) * gating_distance
+    return cost_matrix
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/ocsort_matching.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/ocsort_matching.py
new file mode 100644
index 0000000000000000000000000000000000000000..3caae6cdca56a03ab1ac4dff8d44d1a653b202b2
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/matching/ocsort_matching.py
@@ -0,0 +1,169 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
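+#
+# The association helpers below follow OC-SORT: iou_batch() computes pairwise
+# IoU, speed_direction_batch() derives motion directions for the velocity
+# consistency cost, and associate()/associate_only_iou() resolve the final
+# matches.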
+""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/association.py +""" + +import os +import numpy as np + + +def iou_batch(bboxes1, bboxes2): + """ + From SORT: Computes IOU between two bboxes in the form [x1,y1,x2,y2] + """ + bboxes2 = np.expand_dims(bboxes2, 0) + bboxes1 = np.expand_dims(bboxes1, 1) + + xx1 = np.maximum(bboxes1[..., 0], bboxes2[..., 0]) + yy1 = np.maximum(bboxes1[..., 1], bboxes2[..., 1]) + xx2 = np.minimum(bboxes1[..., 2], bboxes2[..., 2]) + yy2 = np.minimum(bboxes1[..., 3], bboxes2[..., 3]) + w = np.maximum(0., xx2 - xx1) + h = np.maximum(0., yy2 - yy1) + wh = w * h + o = wh / ((bboxes1[..., 2] - bboxes1[..., 0]) * + (bboxes1[..., 3] - bboxes1[..., 1]) + + (bboxes2[..., 2] - bboxes2[..., 0]) * + (bboxes2[..., 3] - bboxes2[..., 1]) - wh) + return (o) + + +def speed_direction_batch(dets, tracks): + tracks = tracks[..., np.newaxis] + CX1, CY1 = (dets[:, 0] + dets[:, 2]) / 2.0, (dets[:, 1] + dets[:, 3]) / 2.0 + CX2, CY2 = (tracks[:, 0] + tracks[:, 2]) / 2.0, ( + tracks[:, 1] + tracks[:, 3]) / 2.0 + dx = CX1 - CX2 + dy = CY1 - CY2 + norm = np.sqrt(dx**2 + dy**2) + 1e-6 + dx = dx / norm + dy = dy / norm + return dy, dx # size: num_track x num_det + + +def linear_assignment(cost_matrix): + try: + import lap + _, x, y = lap.lapjv(cost_matrix, extend_cost=True) + match = np.array([[y[i], i] for i in x if i >= 0]) + return match + except ImportError: + from scipy.optimize import linear_sum_assignment + x, y = linear_sum_assignment(cost_matrix) + return np.array(list(zip(x, y))) + + +def associate(detections, trackers, iou_threshold, velocities, previous_obs, + vdc_weight): + if (len(trackers) == 0): + return np.empty( + (0, 2), dtype=int), np.arange(len(detections)), np.empty( + (0, 5), dtype=int) + + Y, X = speed_direction_batch(detections, previous_obs) + inertia_Y, inertia_X = velocities[:, 0], velocities[:, 1] + inertia_Y = np.repeat(inertia_Y[:, np.newaxis], Y.shape[1], axis=1) + inertia_X = np.repeat(inertia_X[:, np.newaxis], X.shape[1], axis=1) + diff_angle_cos = inertia_X * X + inertia_Y * Y + diff_angle_cos = np.clip(diff_angle_cos, a_min=-1, a_max=1) + diff_angle = np.arccos(diff_angle_cos) + diff_angle = (np.pi / 2.0 - np.abs(diff_angle)) / np.pi + + valid_mask = np.ones(previous_obs.shape[0]) + valid_mask[np.where(previous_obs[:, 4] < 0)] = 0 + + iou_matrix = iou_batch(detections, trackers) + scores = np.repeat( + detections[:, -1][:, np.newaxis], trackers.shape[0], axis=1) + # iou_matrix = iou_matrix * scores # a trick sometiems works, we don't encourage this + valid_mask = np.repeat(valid_mask[:, np.newaxis], X.shape[1], axis=1) + + angle_diff_cost = (valid_mask * diff_angle) * vdc_weight + angle_diff_cost = angle_diff_cost.T + angle_diff_cost = angle_diff_cost * scores + + if min(iou_matrix.shape) > 0: + a = (iou_matrix > iou_threshold).astype(np.int32) + if a.sum(1).max() == 1 and a.sum(0).max() == 1: + matched_indices = np.stack(np.where(a), axis=1) + else: + matched_indices = linear_assignment(-(iou_matrix + angle_diff_cost)) + else: + matched_indices = np.empty(shape=(0, 2)) + + unmatched_detections = [] + for d, det in enumerate(detections): + if (d not in matched_indices[:, 0]): + unmatched_detections.append(d) + unmatched_trackers = [] + for t, trk in enumerate(trackers): + if (t not in matched_indices[:, 1]): + unmatched_trackers.append(t) + + # filter out matched with low IOU + matches = [] + for m in matched_indices: + if (iou_matrix[m[0], m[1]] < iou_threshold): + unmatched_detections.append(m[0]) + 
unmatched_trackers.append(m[1]) + else: + matches.append(m.reshape(1, 2)) + if (len(matches) == 0): + matches = np.empty((0, 2), dtype=int) + else: + matches = np.concatenate(matches, axis=0) + + return matches, np.array(unmatched_detections), np.array(unmatched_trackers) + + +def associate_only_iou(detections, trackers, iou_threshold): + if (len(trackers) == 0): + return np.empty( + (0, 2), dtype=int), np.arange(len(detections)), np.empty( + (0, 5), dtype=int) + + iou_matrix = iou_batch(detections, trackers) + + if min(iou_matrix.shape) > 0: + a = (iou_matrix > iou_threshold).astype(np.int32) + if a.sum(1).max() == 1 and a.sum(0).max() == 1: + matched_indices = np.stack(np.where(a), axis=1) + else: + matched_indices = linear_assignment(-iou_matrix) + else: + matched_indices = np.empty(shape=(0, 2)) + + unmatched_detections = [] + for d, det in enumerate(detections): + if (d not in matched_indices[:, 0]): + unmatched_detections.append(d) + unmatched_trackers = [] + for t, trk in enumerate(trackers): + if (t not in matched_indices[:, 1]): + unmatched_trackers.append(t) + + # filter out matched with low IOU + matches = [] + for m in matched_indices: + if (iou_matrix[m[0], m[1]] < iou_threshold): + unmatched_detections.append(m[0]) + unmatched_trackers.append(m[1]) + else: + matches.append(m.reshape(1, 2)) + if (len(matches) == 0): + matches = np.empty((0, 2), dtype=int) + else: + matches = np.concatenate(matches, axis=0) + return matches, np.array(unmatched_detections), np.array(unmatched_trackers) diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/__init__.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b06f5c4c09010234134104da059ff6ffb3107dc5 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/__init__.py @@ -0,0 +1,19 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import kalman_filter + +from .kalman_filter import * +from .gmc import * +from .ocsort_kalman_filter import * \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/gmc.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/gmc.py new file mode 100644 index 0000000000000000000000000000000000000000..e99bdcc297b82a9171efce1ea17a917cde67a6dc --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/gmc.py @@ -0,0 +1,365 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/WWangYuHsiang/SMILEtrack/blob/main/BoT-SORT/tracker/gmc.py +""" + +import cv2 +import matplotlib.pyplot as plt +import numpy as np +import copy +import time + + +class GMC: + def __init__(self, method='sparseOptFlow', downscale=2, verbose=None): + super(GMC, self).__init__() + + self.method = method + self.downscale = max(1, int(downscale)) + + if self.method == 'orb': + self.detector = cv2.FastFeatureDetector_create(20) + self.extractor = cv2.ORB_create() + self.matcher = cv2.BFMatcher(cv2.NORM_HAMMING) + + elif self.method == 'sift': + self.detector = cv2.SIFT_create( + nOctaveLayers=3, contrastThreshold=0.02, edgeThreshold=20) + self.extractor = cv2.SIFT_create( + nOctaveLayers=3, contrastThreshold=0.02, edgeThreshold=20) + self.matcher = cv2.BFMatcher(cv2.NORM_L2) + + elif self.method == 'ecc': + number_of_iterations = 5000 + termination_eps = 1e-6 + self.warp_mode = cv2.MOTION_EUCLIDEAN + self.criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, + number_of_iterations, termination_eps) + + elif self.method == 'sparseOptFlow': + self.feature_params = dict( + maxCorners=1000, + qualityLevel=0.01, + minDistance=1, + blockSize=3, + useHarrisDetector=False, + k=0.04) + # self.gmc_file = open('GMC_results.txt', 'w') + + elif self.method == 'file' or self.method == 'files': + seqName = verbose[0] + ablation = verbose[1] + if ablation: + filePath = r'tracker/GMC_files/MOT17_ablation' + else: + filePath = r'tracker/GMC_files/MOTChallenge' + + if '-FRCNN' in seqName: + seqName = seqName[:-6] + elif '-DPM' in seqName: + seqName = seqName[:-4] + elif '-SDP' in seqName: + seqName = seqName[:-4] + + self.gmcFile = open(filePath + "/GMC-" + seqName + ".txt", 'r') + + if self.gmcFile is None: + raise ValueError("Error: Unable to open GMC file in directory:" + + filePath) + elif self.method == 'none' or self.method == 'None': + self.method = 'none' + else: + raise ValueError("Error: Unknown CMC method:" + method) + + self.prevFrame = None + self.prevKeyPoints = None + self.prevDescriptors = None + + self.initializedFirstFrame = False + + def apply(self, raw_frame, detections=None): + if self.method == 'orb' or self.method == 'sift': + return self.applyFeaures(raw_frame, detections) + elif self.method == 'ecc': + return self.applyEcc(raw_frame, detections) + elif self.method == 'sparseOptFlow': + return self.applySparseOptFlow(raw_frame, detections) + elif self.method == 'file': + return self.applyFile(raw_frame, detections) + elif self.method == 'none': + return np.eye(2, 3) + else: + return np.eye(2, 3) + + def applyEcc(self, raw_frame, detections=None): + + # Initialize + height, width, _ = raw_frame.shape + frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY) + H = np.eye(2, 3, dtype=np.float32) + + # Downscale image (TODO: consider using pyramids) + if self.downscale > 1.0: + frame = cv2.GaussianBlur(frame, (3, 3), 1.5) + frame = cv2.resize(frame, (width // self.downscale, + height // self.downscale)) + width = width // self.downscale + height = height // self.downscale + + # Handle first frame + if not self.initializedFirstFrame: + # 
Initialize data + self.prevFrame = frame.copy() + + # Initialization done + self.initializedFirstFrame = True + + return H + + # Run the ECC algorithm. The results are stored in warp_matrix. + # (cc, H) = cv2.findTransformECC(self.prevFrame, frame, H, self.warp_mode, self.criteria) + try: + (cc, + H) = cv2.findTransformECC(self.prevFrame, frame, H, self.warp_mode, + self.criteria, None, 1) + except: + print('Warning: find transform failed. Set warp as identity') + + return H + + def applyFeaures(self, raw_frame, detections=None): + + # Initialize + height, width, _ = raw_frame.shape + frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY) + H = np.eye(2, 3) + + # Downscale image (TODO: consider using pyramids) + if self.downscale > 1.0: + # frame = cv2.GaussianBlur(frame, (3, 3), 1.5) + frame = cv2.resize(frame, (width // self.downscale, + height // self.downscale)) + width = width // self.downscale + height = height // self.downscale + + # find the keypoints + mask = np.zeros_like(frame) + # mask[int(0.05 * height): int(0.95 * height), int(0.05 * width): int(0.95 * width)] = 255 + mask[int(0.02 * height):int(0.98 * height), int(0.02 * width):int( + 0.98 * width)] = 255 + if detections is not None: + for det in detections: + tlbr = (det[:4] / self.downscale).astype(np.int_) + mask[tlbr[1]:tlbr[3], tlbr[0]:tlbr[2]] = 0 + + keypoints = self.detector.detect(frame, mask) + + # compute the descriptors + keypoints, descriptors = self.extractor.compute(frame, keypoints) + + # Handle first frame + if not self.initializedFirstFrame: + # Initialize data + self.prevFrame = frame.copy() + self.prevKeyPoints = copy.copy(keypoints) + self.prevDescriptors = copy.copy(descriptors) + + # Initialization done + self.initializedFirstFrame = True + + return H + + # Match descriptors. 
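+        # knnMatch returns the two nearest candidates per descriptor; the
+        # ratio test below keeps a match only when the best candidate is
+        # clearly closer than the runner-up, before the spatial filtering
+        # that follows.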
+ knnMatches = self.matcher.knnMatch(self.prevDescriptors, descriptors, 2) + + # Filtered matches based on smallest spatial distance + matches = [] + spatialDistances = [] + + maxSpatialDistance = 0.25 * np.array([width, height]) + + # Handle empty matches case + if len(knnMatches) == 0: + # Store to next iteration + self.prevFrame = frame.copy() + self.prevKeyPoints = copy.copy(keypoints) + self.prevDescriptors = copy.copy(descriptors) + + return H + + for m, n in knnMatches: + if m.distance < 0.9 * n.distance: + prevKeyPointLocation = self.prevKeyPoints[m.queryIdx].pt + currKeyPointLocation = keypoints[m.trainIdx].pt + + spatialDistance = ( + prevKeyPointLocation[0] - currKeyPointLocation[0], + prevKeyPointLocation[1] - currKeyPointLocation[1]) + + if (np.abs(spatialDistance[0]) < maxSpatialDistance[0]) and \ + (np.abs(spatialDistance[1]) < maxSpatialDistance[1]): + spatialDistances.append(spatialDistance) + matches.append(m) + + meanSpatialDistances = np.mean(spatialDistances, 0) + stdSpatialDistances = np.std(spatialDistances, 0) + + inliesrs = (spatialDistances - meanSpatialDistances + ) < 2.5 * stdSpatialDistances + + goodMatches = [] + prevPoints = [] + currPoints = [] + for i in range(len(matches)): + if inliesrs[i, 0] and inliesrs[i, 1]: + goodMatches.append(matches[i]) + prevPoints.append(self.prevKeyPoints[matches[i].queryIdx].pt) + currPoints.append(keypoints[matches[i].trainIdx].pt) + + prevPoints = np.array(prevPoints) + currPoints = np.array(currPoints) + + # Draw the keypoint matches on the output image + if 0: + matches_img = np.hstack((self.prevFrame, frame)) + matches_img = cv2.cvtColor(matches_img, cv2.COLOR_GRAY2BGR) + W = np.size(self.prevFrame, 1) + for m in goodMatches: + prev_pt = np.array( + self.prevKeyPoints[m.queryIdx].pt, dtype=np.int_) + curr_pt = np.array(keypoints[m.trainIdx].pt, dtype=np.int_) + curr_pt[0] += W + color = np.random.randint(0, 255, (3, )) + color = (int(color[0]), int(color[1]), int(color[2])) + + matches_img = cv2.line(matches_img, prev_pt, curr_pt, + tuple(color), 1, cv2.LINE_AA) + matches_img = cv2.circle(matches_img, prev_pt, 2, + tuple(color), -1) + matches_img = cv2.circle(matches_img, curr_pt, 2, + tuple(color), -1) + + plt.figure() + plt.imshow(matches_img) + plt.show() + + # Find rigid matrix + if (np.size(prevPoints, 0) > 4) and ( + np.size(prevPoints, 0) == np.size(prevPoints, 0)): + H, inliesrs = cv2.estimateAffinePartial2D(prevPoints, currPoints, + cv2.RANSAC) + + # Handle downscale + if self.downscale > 1.0: + H[0, 2] *= self.downscale + H[1, 2] *= self.downscale + else: + print('Warning: not enough matching points') + + # Store to next iteration + self.prevFrame = frame.copy() + self.prevKeyPoints = copy.copy(keypoints) + self.prevDescriptors = copy.copy(descriptors) + + return H + + def applySparseOptFlow(self, raw_frame, detections=None): + + t0 = time.time() + + # Initialize + height, width, _ = raw_frame.shape + frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY) + H = np.eye(2, 3) + + # Downscale image + if self.downscale > 1.0: + # frame = cv2.GaussianBlur(frame, (3, 3), 1.5) + frame = cv2.resize(frame, (width // self.downscale, + height // self.downscale)) + + # find the keypoints + keypoints = cv2.goodFeaturesToTrack( + frame, mask=None, **self.feature_params) + + # Handle first frame + if not self.initializedFirstFrame: + # Initialize data + self.prevFrame = frame.copy() + self.prevKeyPoints = copy.copy(keypoints) + + # Initialization done + self.initializedFirstFrame = True + + return H + + if self.prevFrame.shape != 
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/kalman_filter.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/kalman_filter.py
new file mode 100644
index 0000000000000000000000000000000000000000..18951aabd6fbdebc69191dea7b07da1dbea8d52c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/kalman_filter.py
@@ -0,0 +1,316 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
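For orientation, a minimal usage sketch of the `KalmanFilter` class this file defines (the import path and the measurement values are illustrative assumptions, not part of the module):

```python
import numpy as np
from kalman_filter import KalmanFilter, chi2inv95  # adjust path to your layout

kf = KalmanFilter()

# A detection as (center_x, center_y, aspect_ratio, height).
z0 = np.array([640., 360., 0.5, 120.], dtype=np.float32)
mean, cov = kf.initiate(z0)        # 8-dim state, 8x8 covariance

mean, cov = kf.predict(mean, cov)  # constant-velocity time update

# Correct with the matched detection from the next frame.
z1 = np.array([644., 358., 0.5, 122.], dtype=np.float32)
mean, cov = kf.update(mean, cov, z1)

# Gate candidate detections by squared Mahalanobis distance.
d2 = kf.gating_distance(mean, cov, z1[None, :])
print(d2 < chi2inv95[4])  # 0.95 quantile, 4 degrees of freedom
```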
+""" +This code is based on https://github.com/nwojke/deep_sort/blob/master/deep_sort/kalman_filter.py +""" + +import numpy as np +import scipy.linalg + +use_numba = True +try: + import numba as nb + + @nb.njit(fastmath=True, cache=True) + def nb_project(mean, covariance, std, _update_mat): + innovation_cov = np.diag(np.square(std)) + mean = np.dot(_update_mat, mean) + covariance = np.dot(np.dot(_update_mat, covariance), _update_mat.T) + return mean, covariance + innovation_cov + + @nb.njit(fastmath=True, cache=True) + def nb_multi_predict(mean, covariance, motion_cov, motion_mat): + mean = np.dot(mean, motion_mat.T) + left = np.dot(motion_mat, covariance) + covariance = np.dot(left, motion_mat.T) + motion_cov + return mean, covariance + + @nb.njit(fastmath=True, cache=True) + def nb_update(mean, covariance, proj_mean, proj_cov, measurement, meas_mat): + kalman_gain = np.linalg.solve(proj_cov, (covariance @meas_mat.T).T).T + innovation = measurement - proj_mean + mean = mean + innovation @kalman_gain.T + covariance = covariance - kalman_gain @proj_cov @kalman_gain.T + return mean, covariance + +except: + use_numba = False + print( + 'Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`' + ) + pass + +__all__ = ['KalmanFilter'] +""" +Table for the 0.95 quantile of the chi-square distribution with N degrees of +freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv +function and used as Mahalanobis gating threshold. +""" + +chi2inv95 = { + 1: 3.8415, + 2: 5.9915, + 3: 7.8147, + 4: 9.4877, + 5: 11.070, + 6: 12.592, + 7: 14.067, + 8: 15.507, + 9: 16.919 +} + + +class KalmanFilter(object): + """ + A simple Kalman filter for tracking bounding boxes in image space. + + The 8-dimensional state space + + x, y, a, h, vx, vy, va, vh + + contains the bounding box center position (x, y), aspect ratio a, height h, + and their respective velocities. + + Object motion follows a constant velocity model. The bounding box location + (x, y, a, h) is taken as direct observation of the state space (linear + observation model). + + """ + + def __init__(self): + ndim, dt = 4, 1. + + # Create Kalman filter model matrices. + self._motion_mat = np.eye(2 * ndim, 2 * ndim, dtype=np.float32) + for i in range(ndim): + self._motion_mat[i, ndim + i] = dt + self._update_mat = np.eye(ndim, 2 * ndim, dtype=np.float32) + + # Motion and observation uncertainty are chosen relative to the current + # state estimate. These weights control the amount of uncertainty in + # the model. This is a bit hacky. + self._std_weight_position = 1. / 20 + self._std_weight_velocity = 1. / 160 + + def initiate(self, measurement): + """ + Create track from unassociated measurement. + + Args: + measurement (ndarray): Bounding box coordinates (x, y, a, h) with + center position (x, y), aspect ratio a, and height h. + + Returns: + The mean vector (8 dimensional) and covariance matrix (8x8 + dimensional) of the new track. Unobserved velocities are + initialized to 0 mean. 
+ """ + mean_pos = measurement + mean_vel = np.zeros_like(mean_pos) + mean = np.r_[mean_pos, mean_vel] + + std = [ + 2 * self._std_weight_position * measurement[3], + 2 * self._std_weight_position * measurement[3], 1e-2, + 2 * self._std_weight_position * measurement[3], + 10 * self._std_weight_velocity * measurement[3], + 10 * self._std_weight_velocity * measurement[3], 1e-5, + 10 * self._std_weight_velocity * measurement[3] + ] + covariance = np.diag(np.square(std)) + return mean, np.float32(covariance) + + def predict(self, mean, covariance): + """ + Run Kalman filter prediction step. + + Args: + mean (ndarray): The 8 dimensional mean vector of the object state + at the previous time step. + covariance (ndarray): The 8x8 dimensional covariance matrix of the + object state at the previous time step. + + Returns: + The mean vector and covariance matrix of the predicted state. + Unobserved velocities are initialized to 0 mean. + """ + std_pos = [ + self._std_weight_position * mean[3], self._std_weight_position * + mean[3], 1e-2, self._std_weight_position * mean[3] + ] + std_vel = [ + self._std_weight_velocity * mean[3], self._std_weight_velocity * + mean[3], 1e-5, self._std_weight_velocity * mean[3] + ] + motion_cov = np.diag(np.square(np.r_[std_pos, std_vel])) + + #mean = np.dot(self._motion_mat, mean) + mean = np.dot(mean, self._motion_mat.T) + covariance = np.linalg.multi_dot( + (self._motion_mat, covariance, self._motion_mat.T)) + motion_cov + + return mean, covariance + + def project(self, mean, covariance): + """ + Project state distribution to measurement space. + + Args + mean (ndarray): The state's mean vector (8 dimensional array). + covariance (ndarray): The state's covariance matrix (8x8 dimensional). + + Returns: + The projected mean and covariance matrix of the given state estimate. + """ + std = np.array( + [ + self._std_weight_position * mean[3], self._std_weight_position * + mean[3], 1e-1, self._std_weight_position * mean[3] + ], + dtype=np.float32) + + if use_numba: + return nb_project(mean, covariance, std, self._update_mat) + + innovation_cov = np.diag(np.square(std)) + + mean = np.dot(self._update_mat, mean) + covariance = np.linalg.multi_dot((self._update_mat, covariance, + self._update_mat.T)) + return mean, covariance + innovation_cov + + def multi_predict(self, mean, covariance): + """ + Run Kalman filter prediction step (Vectorized version). + + Args: + mean (ndarray): The Nx8 dimensional mean matrix of the object states + at the previous time step. + covariance (ndarray): The Nx8x8 dimensional covariance matrics of the + object states at the previous time step. + + Returns: + The mean vector and covariance matrix of the predicted state. + Unobserved velocities are initialized to 0 mean. 
+ """ + std_pos = np.array([ + self._std_weight_position * mean[:, 3], self._std_weight_position * + mean[:, 3], 1e-2 * np.ones_like(mean[:, 3]), + self._std_weight_position * mean[:, 3] + ]) + std_vel = np.array([ + self._std_weight_velocity * mean[:, 3], self._std_weight_velocity * + mean[:, 3], 1e-5 * np.ones_like(mean[:, 3]), + self._std_weight_velocity * mean[:, 3] + ]) + sqr = np.square(np.r_[std_pos, std_vel]).T + + if use_numba: + + means = [] + covariances = [] + for i in range(len(mean)): + a, b = nb_multi_predict(mean[i], covariance[i], + np.diag(sqr[i]), self._motion_mat) + means.append(a) + covariances.append(b) + return np.asarray(means), np.asarray(covariances) + + motion_cov = [] + for i in range(len(mean)): + motion_cov.append(np.diag(sqr[i])) + motion_cov = np.asarray(motion_cov) + + mean = np.dot(mean, self._motion_mat.T) + left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2)) + covariance = np.dot(left, self._motion_mat.T) + motion_cov + + return mean, covariance + + def update(self, mean, covariance, measurement): + """ + Run Kalman filter correction step. + + Args: + mean (ndarray): The predicted state's mean vector (8 dimensional). + covariance (ndarray): The state's covariance matrix (8x8 dimensional). + measurement (ndarray): The 4 dimensional measurement vector + (x, y, a, h), where (x, y) is the center position, a the aspect + ratio, and h the height of the bounding box. + + Returns: + The measurement-corrected state distribution. + """ + projected_mean, projected_cov = self.project(mean, covariance) + + if use_numba: + + return nb_update(mean, covariance, projected_mean, projected_cov, + measurement, self._update_mat) + + kalman_gain = np.linalg.solve(projected_cov, + (covariance @self._update_mat.T).T).T + innovation = measurement - projected_mean + mean = mean + innovation @kalman_gain.T + covariance = covariance - kalman_gain @projected_cov @kalman_gain.T + return mean, covariance + + def gating_distance(self, + mean, + covariance, + measurements, + only_position=False, + metric='maha'): + """ + Compute gating distance between state distribution and measurements. + A suitable distance threshold can be obtained from `chi2inv95`. If + `only_position` is False, the chi-square distribution has 4 degrees of + freedom, otherwise 2. + + Args: + mean (ndarray): Mean vector over the state distribution (8 + dimensional). + covariance (ndarray): Covariance of the state distribution (8x8 + dimensional). + measurements (ndarray): An Nx4 dimensional matrix of N measurements, + each in format (x, y, a, h) where (x, y) is the bounding box center + position, a the aspect ratio, and h the height. + only_position (Optional[bool]): If True, distance computation is + done with respect to the bounding box center position only. + metric (str): Metric type, 'gaussian' or 'maha'. + + Returns + An array of length N, where the i-th element contains the squared + Mahalanobis distance between (mean, covariance) and `measurements[i]`. 
+ """ + mean, covariance = self.project(mean, covariance) + if only_position: + mean, covariance = mean[:2], covariance[:2, :2] + measurements = measurements[:, :2] + + d = measurements - mean + if metric == 'gaussian': + return np.sum(d * d, axis=1) + elif metric == 'maha': + cholesky_factor = np.linalg.cholesky(covariance) + z = scipy.linalg.solve_triangular( + cholesky_factor, + d.T, + lower=True, + check_finite=False, + overwrite_b=True) + squared_maha = np.sum(z * z, axis=0) + return squared_maha + else: + raise ValueError('invalid distance metric') diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/ocsort_kalman_filter.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/ocsort_kalman_filter.py new file mode 100644 index 0000000000000000000000000000000000000000..8cfd9c5970bea7b9c135389348ef9238446359cf --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/motion/ocsort_kalman_filter.py @@ -0,0 +1,93 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/danbochman/SORT/blob/danny_opencv/kalman_filter.py +""" + +import numpy as np +from numpy import dot, zeros, eye +from numpy.linalg import inv + +use_numba = True +try: + import numba as nb + + @nb.njit(fastmath=True, cache=True) + def nb_predict(x, F, P, Q): + x = dot(F, x) + P = dot(dot(F, P), F.T) + Q + return x, P + + @nb.njit(fastmath=True, cache=True) + def nb_update(x, z, H, P, R, _I): + + y = z - np.dot(H, x) + PHT = dot(P, H.T) + + S = dot(H, PHT) + R + K = dot(PHT, inv(S)) + + x = x + dot(K, y) + + I_KH = _I - dot(K, H) + P = dot(dot(I_KH, P), I_KH.T) + dot(dot(K, R), K.T) + return x, P +except: + use_numba = False + print( + 'Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`' + ) + pass + + +class OCSORTKalmanFilter: + def __init__(self, dim_x, dim_z): + self.dim_x = dim_x + self.dim_z = dim_z + self.x = zeros((dim_x, 1)) + self.P = eye(dim_x) + self.Q = eye(dim_x) + self.F = eye(dim_x) + self.H = zeros((dim_z, dim_x)) + self.R = eye(dim_z) + self.M = zeros((dim_z, dim_z)) + + self._I = eye(dim_x) + + def predict(self): + if use_numba: + self.x, self.P = nb_predict(self.x, self.F, self.P, self.Q) + else: + self.x = dot(self.F, self.x) + self.P = dot(dot(self.F, self.P), self.F.T) + self.Q + + def update(self, z): + + if z is None: + return + + if use_numba: + self.x, self.P = nb_update(self.x, z, self.H, self.P, self.R, + self._I) + else: + y = z - np.dot(self.H, self.x) + PHT = dot(self.P, self.H.T) + + S = dot(self.H, PHT) + self.R + K = dot(PHT, inv(S)) + + self.x = self.x + dot(K, y) + + I_KH = self._I - dot(K, self.H) + self.P = dot(dot(I_KH, self.P), I_KH.T) + dot(dot(K, self.R), K.T) diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/__init__.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..4d0c4f3b6616ab2419d43242a5b3cb33651a4a75 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/__init__.py @@ -0,0 +1,24 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import utils +from . import postprocess +from .utils import * +from .postprocess import * + +# The following codes are strongly related to zone and camera parameters +from . import camera_utils +from . import zone +from .camera_utils import * +from .zone import * diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/camera_utils.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/camera_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..445e6386cff826742e8f7f7d5171ca247e148b67 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/camera_utils.py @@ -0,0 +1,288 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/LCFractal/AIC21-MTMC/tree/main/reid/reid-matching/tools + +Note: The following codes are strongly related to camera parameters of the AIC21 test-set S06, + so they can only be used in S06, and can not be used for other MTMCT datasets. 
+""" + +import numpy as np +try: + from sklearn.cluster import AgglomerativeClustering +except: + print( + 'Warning: Unable to use MTMCT in PP-Tracking, please install sklearn, for example: `pip install sklearn`' + ) + pass +from .utils import get_dire, get_match, get_cid_tid, combin_feature, combin_cluster +from .utils import normalize, intracam_ignore, visual_rerank + +__all__ = [ + 'st_filter', + 'get_labels_with_camera', +] + +CAM_DIST = [[0, 40, 55, 100, 120, 145], [40, 0, 15, 60, 80, 105], + [55, 15, 0, 40, 65, 90], [100, 60, 40, 0, 20, 45], + [120, 80, 65, 20, 0, 25], [145, 105, 90, 45, 25, 0]] + + +def st_filter(st_mask, cid_tids, cid_tid_dict): + count = len(cid_tids) + for i in range(count): + i_tracklet = cid_tid_dict[cid_tids[i]] + i_cid = i_tracklet['cam'] + i_dire = get_dire(i_tracklet['zone_list'], i_cid) + i_iot = i_tracklet['io_time'] + for j in range(count): + j_tracklet = cid_tid_dict[cid_tids[j]] + j_cid = j_tracklet['cam'] + j_dire = get_dire(j_tracklet['zone_list'], j_cid) + j_iot = j_tracklet['io_time'] + + match_dire = True + cam_dist = CAM_DIST[i_cid - 41][j_cid - 41] + # if time overlopped + if i_iot[0] - cam_dist < j_iot[0] and j_iot[0] < i_iot[ + 1] + cam_dist: + match_dire = False + if i_iot[0] - cam_dist < j_iot[1] and j_iot[1] < i_iot[ + 1] + cam_dist: + match_dire = False + + # not match after go out + if i_dire[1] in [1, 2]: # i out + if i_iot[0] < j_iot[1] + cam_dist: + match_dire = False + + if i_dire[1] in [1, 2]: + if i_dire[0] in [3] and i_cid > j_cid: + match_dire = False + if i_dire[0] in [4] and i_cid < j_cid: + match_dire = False + + if i_cid in [41] and i_dire[1] in [4]: + if i_iot[0] < j_iot[1] + cam_dist: + match_dire = False + if i_iot[1] > 199: + match_dire = False + if i_cid in [46] and i_dire[1] in [3]: + if i_iot[0] < j_iot[1] + cam_dist: + match_dire = False + + # match after come into + if i_dire[0] in [1, 2]: + if i_iot[1] > j_iot[0] - cam_dist: + match_dire = False + + if i_dire[0] in [1, 2]: + if i_dire[1] in [3] and i_cid > j_cid: + match_dire = False + if i_dire[1] in [4] and i_cid < j_cid: + match_dire = False + + is_ignore = False + if ((i_dire[0] == i_dire[1] and i_dire[0] in [3, 4]) or + (j_dire[0] == j_dire[1] and j_dire[0] in [3, 4])): + is_ignore = True + + if not is_ignore: + # direction conflict + if (i_dire[0] in [3] and j_dire[0] in [4]) or ( + i_dire[1] in [3] and j_dire[1] in [4]): + match_dire = False + # filter before going next scene + if i_dire[1] in [3] and i_cid < j_cid: + if i_iot[1] > j_iot[1] - cam_dist: + match_dire = False + if i_dire[1] in [4] and i_cid > j_cid: + if i_iot[1] > j_iot[1] - cam_dist: + match_dire = False + + if i_dire[0] in [3] and i_cid < j_cid: + if i_iot[0] < j_iot[0] + cam_dist: + match_dire = False + if i_dire[0] in [4] and i_cid > j_cid: + if i_iot[0] < j_iot[0] + cam_dist: + match_dire = False + ## 3-30 + ## 4-1 + if i_dire[0] in [3] and i_cid > j_cid: + if i_iot[1] > j_iot[0] - cam_dist: + match_dire = False + if i_dire[0] in [4] and i_cid < j_cid: + if i_iot[1] > j_iot[0] - cam_dist: + match_dire = False + # filter before going next scene + ## 4-7 + if i_dire[1] in [3] and i_cid > j_cid: + if i_iot[0] < j_iot[1] + cam_dist: + match_dire = False + if i_dire[1] in [4] and i_cid < j_cid: + if i_iot[0] < j_iot[1] + cam_dist: + match_dire = False + else: + if i_iot[1] > 199: + if i_dire[0] in [3] and i_cid < j_cid: + if i_iot[0] < j_iot[0] + cam_dist: + match_dire = False + if i_dire[0] in [4] and i_cid > j_cid: + if i_iot[0] < j_iot[0] + cam_dist: + match_dire = False + if i_dire[0] in [3] 
and i_cid > j_cid: + match_dire = False + if i_dire[0] in [4] and i_cid < j_cid: + match_dire = False + if i_iot[0] < 1: + if i_dire[1] in [3] and i_cid > j_cid: + match_dire = False + if i_dire[1] in [4] and i_cid < j_cid: + match_dire = False + + if not match_dire: + st_mask[i, j] = 0.0 + st_mask[j, i] = 0.0 + return st_mask + + +def subcam_list(cid_tid_dict, cid_tids): + sub_3_4 = dict() + sub_4_3 = dict() + for cid_tid in cid_tids: + cid, tid = cid_tid + tracklet = cid_tid_dict[cid_tid] + zs, ze = get_dire(tracklet['zone_list'], cid) + if zs in [3] and cid not in [46]: # 4 to 3 + if not cid + 1 in sub_4_3: + sub_4_3[cid + 1] = [] + sub_4_3[cid + 1].append(cid_tid) + if ze in [4] and cid not in [41]: # 4 to 3 + if not cid in sub_4_3: + sub_4_3[cid] = [] + sub_4_3[cid].append(cid_tid) + if zs in [4] and cid not in [41]: # 3 to 4 + if not cid - 1 in sub_3_4: + sub_3_4[cid - 1] = [] + sub_3_4[cid - 1].append(cid_tid) + if ze in [3] and cid not in [46]: # 3 to 4 + if not cid in sub_3_4: + sub_3_4[cid] = [] + sub_3_4[cid].append(cid_tid) + sub_cid_tids = dict() + for i in sub_3_4: + sub_cid_tids[(i, i + 1)] = sub_3_4[i] + for i in sub_4_3: + sub_cid_tids[(i, i - 1)] = sub_4_3[i] + return sub_cid_tids + + +def subcam_list2(cid_tid_dict, cid_tids): + sub_dict = dict() + for cid_tid in cid_tids: + cid, tid = cid_tid + if cid not in [41]: + if not cid in sub_dict: + sub_dict[cid] = [] + sub_dict[cid].append(cid_tid) + if cid not in [46]: + if not cid + 1 in sub_dict: + sub_dict[cid + 1] = [] + sub_dict[cid + 1].append(cid_tid) + return sub_dict + + +def get_sim_matrix(cid_tid_dict, + cid_tids, + use_ff=True, + use_rerank=True, + use_st_filter=False): + # Note: camera releated get_sim_matrix function, + # which is different from the one in utils.py. + count = len(cid_tids) + + q_arr = np.array( + [cid_tid_dict[cid_tids[i]]['mean_feat'] for i in range(count)]) + g_arr = np.array( + [cid_tid_dict[cid_tids[i]]['mean_feat'] for i in range(count)]) + q_arr = normalize(q_arr, axis=1) + g_arr = normalize(g_arr, axis=1) + + st_mask = np.ones((count, count), dtype=np.float32) + st_mask = intracam_ignore(st_mask, cid_tids) + + # different from utils.py + if use_st_filter: + st_mask = st_filter(st_mask, cid_tids, cid_tid_dict) + + visual_sim_matrix = visual_rerank( + q_arr, g_arr, cid_tids, use_ff=use_ff, use_rerank=use_rerank) + visual_sim_matrix = visual_sim_matrix.astype('float32') + + np.set_printoptions(precision=3) + sim_matrix = visual_sim_matrix * st_mask + + np.fill_diagonal(sim_matrix, 0) + return sim_matrix + + +def get_labels_with_camera(cid_tid_dict, + cid_tids, + use_ff=True, + use_rerank=True, + use_st_filter=False): + # 1st cluster + sub_cid_tids = subcam_list(cid_tid_dict, cid_tids) + sub_labels = dict() + dis_thrs = [0.7, 0.5, 0.5, 0.5, 0.5, 0.7, 0.5, 0.5, 0.5, 0.5] + + for i, sub_c_to_c in enumerate(sub_cid_tids): + sim_matrix = get_sim_matrix( + cid_tid_dict, + sub_cid_tids[sub_c_to_c], + use_ff=use_ff, + use_rerank=use_rerank, + use_st_filter=use_st_filter) + cluster_labels = AgglomerativeClustering( + n_clusters=None, + distance_threshold=1 - dis_thrs[i], + affinity='precomputed', + linkage='complete').fit_predict(1 - sim_matrix) + labels = get_match(cluster_labels) + cluster_cid_tids = get_cid_tid(labels, sub_cid_tids[sub_c_to_c]) + sub_labels[sub_c_to_c] = cluster_cid_tids + labels, sub_cluster = combin_cluster(sub_labels, cid_tids) + + # 2nd cluster + cid_tid_dict_new = combin_feature(cid_tid_dict, sub_cluster) + sub_cid_tids = subcam_list2(cid_tid_dict_new, cid_tids) + sub_labels = 
dict() + for i, sub_c_to_c in enumerate(sub_cid_tids): + sim_matrix = get_sim_matrix( + cid_tid_dict_new, + sub_cid_tids[sub_c_to_c], + use_ff=use_ff, + use_rerank=use_rerank, + use_st_filter=use_st_filter) + cluster_labels = AgglomerativeClustering( + n_clusters=None, + distance_threshold=1 - 0.1, + affinity='precomputed', + linkage='complete').fit_predict(1 - sim_matrix) + labels = get_match(cluster_labels) + cluster_cid_tids = get_cid_tid(labels, sub_cid_tids[sub_c_to_c]) + sub_labels[sub_c_to_c] = cluster_cid_tids + labels, sub_cluster = combin_cluster(sub_labels, cid_tids) + + return labels diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/postprocess.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..32be0466d0ed2899d01d192f82561f5e30cc9ad9 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/postprocess.py @@ -0,0 +1,383 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/LCFractal/AIC21-MTMC/tree/main/reid/reid-matching/tools +""" + +import os +import re +import cv2 +from tqdm import tqdm +import numpy as np +try: + import motmetrics as mm +except: + print( + 'Warning: Unable to use motmetrics in MTMCT in PP-Tracking, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + pass +from functools import reduce + +from .utils import parse_pt_gt, parse_pt, compare_dataframes_mtmc +from .utils import get_labels, getData, gen_new_mot +from .camera_utils import get_labels_with_camera +from .zone import Zone +from ..visualize import plot_tracking + +__all__ = [ + 'trajectory_fusion', + 'sub_cluster', + 'gen_res', + 'print_mtmct_result', + 'get_mtmct_matching_results', + 'save_mtmct_crops', + 'save_mtmct_vis_results', +] + + +def trajectory_fusion(mot_feature, cid, cid_bias, use_zone=False, zone_path=''): + cur_bias = cid_bias[cid] + mot_list_break = {} + if use_zone: + zones = Zone(zone_path=zone_path) + zones.set_cam(cid) + mot_list = parse_pt(mot_feature, zones) + else: + mot_list = parse_pt(mot_feature) + + if use_zone: + mot_list = zones.break_mot(mot_list, cid) + mot_list = zones.filter_mot(mot_list, cid) # filter by zone + mot_list = zones.filter_bbox(mot_list, cid) # filter bbox + + mot_list_break = gen_new_mot(mot_list) # save break feature for gen result + + tid_data = dict() + for tid in mot_list: + tracklet = mot_list[tid] + if len(tracklet) <= 1: + continue + frame_list = list(tracklet.keys()) + frame_list.sort() + # filter area too large + zone_list = [tracklet[f]['zone'] for f in frame_list] + feature_list = [ + tracklet[f]['feat'] for f in frame_list + if (tracklet[f]['bbox'][3] - tracklet[f]['bbox'][1]) * + (tracklet[f]['bbox'][2] - tracklet[f]['bbox'][0]) > 2000 + ] + if len(feature_list) < 2: + feature_list = [tracklet[f]['feat'] for f in frame_list] + 
io_time = [
+            cur_bias + frame_list[0] / 10., cur_bias + frame_list[-1] / 10.
+        ]
+        all_feat = np.array([feat for feat in feature_list])
+        mean_feat = np.mean(all_feat, axis=0)
+        tid_data[tid] = {
+            'cam': cid,
+            'tid': tid,
+            'mean_feat': mean_feat,
+            'zone_list': zone_list,
+            'frame_list': frame_list,
+            'tracklet': tracklet,
+            'io_time': io_time
+        }
+    return tid_data, mot_list_break
+
+
+def sub_cluster(cid_tid_dict,
+                scene_cluster,
+                use_ff=True,
+                use_rerank=True,
+                use_camera=False,
+                use_st_filter=False):
+    '''
+    cid_tid_dict: all camera_id and track_id
+    scene_cluster: like [41, 42, 43, 44, 45, 46] in AIC21 MTMCT S06 test videos
+    '''
+    assert (len(scene_cluster) != 0), "Error: scene_cluster length equals 0"
+    cid_tids = sorted(
+        [key for key in cid_tid_dict.keys() if key[0] in scene_cluster])
+    if use_camera:
+        clu = get_labels_with_camera(
+            cid_tid_dict,
+            cid_tids,
+            use_ff=use_ff,
+            use_rerank=use_rerank,
+            use_st_filter=use_st_filter)
+    else:
+        clu = get_labels(
+            cid_tid_dict,
+            cid_tids,
+            use_ff=use_ff,
+            use_rerank=use_rerank,
+            use_st_filter=use_st_filter)
+    new_clu = list()
+    for c_list in clu:
+        if len(c_list) <= 1: continue
+        cam_list = [cid_tids[c][0] for c in c_list]
+        if len(cam_list) != len(set(cam_list)): continue
+        new_clu.append([cid_tids[c] for c in c_list])
+    all_clu = new_clu
+    cid_tid_label = dict()
+    for i, c_list in enumerate(all_clu):
+        for c in c_list:
+            cid_tid_label[c] = i + 1
+    return cid_tid_label
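`sub_cluster` and the `get_labels*` helpers it dispatches to merge tracklets across cameras by agglomerative clustering on a precomputed distance matrix (1 minus appearance similarity). A toy sketch of that pattern with made-up features; note the `affinity` keyword matches the scikit-learn versions this code targets (recent scikit-learn renames it to `metric`):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy appearance features for 4 tracklets, L2-normalized as in the pipeline.
feats = np.array([[1., 0.], [0.99, 0.14], [0., 1.], [0.1, 0.995]])
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

sim = feats @ feats.T  # cosine similarity
labels = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.5,  # same threshold as the first cluster pass
    affinity='precomputed',
    linkage='complete').fit_predict(1 - sim)
print(labels)  # e.g. [0 0 1 1]: tracklets 0/1 and 2/3 merge
```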
+def gen_res(output_dir_filename,
+            scene_cluster,
+            map_tid,
+            mot_list_breaks,
+            use_roi=False,
+            roi_dir=''):
+    f_w = open(output_dir_filename, 'w')
+    for idx, mot_feature in enumerate(mot_list_breaks):
+        cid = scene_cluster[idx]
+        img_rects = parse_pt_gt(mot_feature)
+        if use_roi:
+            assert (roi_dir != ''), "Error: roi_dir must not be empty!"
+            roi = cv2.imread(os.path.join(roi_dir, f'c{cid:03d}/roi.jpg'), 0)
+            height, width = roi.shape
+
+        for fid in img_rects:
+            tid_rects = img_rects[fid]
+            fid = int(fid) + 1
+            for tid_rect in tid_rects:
+                tid = tid_rect[0]
+                rect = tid_rect[1:]
+                cx = 0.5 * rect[0] + 0.5 * rect[2]
+                cy = 0.5 * rect[1] + 0.5 * rect[3]
+                w = rect[2] - rect[0]
+                w = min(w * 1.2, w + 40)
+                h = rect[3] - rect[1]
+                h = min(h * 1.2, h + 40)
+                rect[2] -= rect[0]
+                rect[3] -= rect[1]
+                rect[0] = max(0, rect[0])
+                rect[1] = max(0, rect[1])
+                x1, y1 = max(0, cx - 0.5 * w), max(0, cy - 0.5 * h)
+                if use_roi:
+                    x2, y2 = min(width, cx + 0.5 * w), min(height, cy + 0.5 * h)
+                else:
+                    x2, y2 = cx + 0.5 * w, cy + 0.5 * h
+                w, h = x2 - x1, y2 - y1
+                new_rect = list(map(int, [x1, y1, w, h]))
+                rect = list(map(int, rect))
+                if (cid, tid) in map_tid:
+                    new_tid = map_tid[(cid, tid)]
+                    f_w.write(
+                        str(cid) + ' ' + str(new_tid) + ' ' + str(fid) + ' ' +
+                        ' '.join(map(str, new_rect)) + ' -1 -1' + '\n')
+    print('gen_res: write file in {}'.format(output_dir_filename))
+    f_w.close()
+
+
+def print_mtmct_result(gt_file, pred_file):
+    names = [
+        'CameraId', 'Id', 'FrameId', 'X', 'Y', 'Width', 'Height', 'Xworld',
+        'Yworld'
+    ]
+    gt = getData(gt_file, names=names)
+    pred = getData(pred_file, names=names)
+    summary = compare_dataframes_mtmc(gt, pred)
+    print('MTMCT summary: ', summary.columns.tolist())
+
+    formatters = {
+        'idf1': '{:2.2f}'.format,
+        'idp': '{:2.2f}'.format,
+        'idr': '{:2.2f}'.format,
+        'mota': '{:2.2f}'.format
+    }
+    summary = summary[['idf1', 'idp', 'idr', 'mota']]
+    summary.loc[:, 'idp'] *= 100
+    summary.loc[:, 'idr'] *= 100
+    summary.loc[:, 'idf1'] *= 100
+    summary.loc[:, 'mota'] *= 100
+    try:
+        import motmetrics as mm
+    except Exception as e:
+        raise RuntimeError(
+            'Unable to use motmetrics in MTMCT in PP-Tracking, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics'
+        )
+    print(
+        mm.io.render_summary(
+            summary,
+            formatters=formatters,
+            namemap=mm.io.motchallenge_metric_names))
+
+
+def get_mtmct_matching_results(pred_mtmct_file, secs_interval=0.5,
+                               video_fps=20):
+    res = np.loadtxt(pred_mtmct_file)  # 'cid, tid, fid, x1, y1, w, h, -1, -1'
+    camera_ids = list(map(int, np.unique(res[:, 0])))
+
+    res = res[:, :7]
+    # each line in res: 'cid, tid, fid, x1, y1, w, h'
+
+    camera_tids = []
+    camera_results = dict()
+    for c_id in camera_ids:
+        camera_results[c_id] = res[res[:, 0] == c_id]
+        tids = np.unique(camera_results[c_id][:, 1])
+        tids = list(map(int, tids))
+        camera_tids.append(tids)
+
+    # select common tids throughout each video
+    common_tids = reduce(np.intersect1d, camera_tids)
+    if len(common_tids) == 0:
+        print(
+            'No common tracked ids in these videos, please check your MOT result or select new videos.'
+ ) + return None, None + + # get mtmct matching results by cid_tid_fid_results[c_id][t_id][f_id] + cid_tid_fid_results = dict() + cid_tid_to_fids = dict() + interval = int(secs_interval * video_fps) # preferably less than 10 + for c_id in camera_ids: + cid_tid_fid_results[c_id] = dict() + cid_tid_to_fids[c_id] = dict() + for t_id in common_tids: + tid_mask = camera_results[c_id][:, 1] == t_id + cid_tid_fid_results[c_id][t_id] = dict() + + camera_trackid_results = camera_results[c_id][tid_mask] + fids = np.unique(camera_trackid_results[:, 2]) + fids = fids[fids % interval == 0] + fids = list(map(int, fids)) + cid_tid_to_fids[c_id][t_id] = fids + + for f_id in fids: + st_frame = f_id + ed_frame = f_id + interval + + st_mask = camera_trackid_results[:, 2] >= st_frame + ed_mask = camera_trackid_results[:, 2] < ed_frame + frame_mask = np.logical_and(st_mask, ed_mask) + cid_tid_fid_results[c_id][t_id][f_id] = camera_trackid_results[ + frame_mask] + + return camera_results, cid_tid_fid_results + + +def save_mtmct_crops(cid_tid_fid_res, + images_dir, + crops_dir, + width=300, + height=200): + camera_ids = cid_tid_fid_res.keys() + seqs_folder = os.listdir(images_dir) + seqs = [] + for x in seqs_folder: + if os.path.isdir(os.path.join(images_dir, x)): + seqs.append(x) + assert len(seqs) == len(camera_ids) + seqs.sort() + + if not os.path.exists(crops_dir): + os.makedirs(crops_dir) + + common_tids = list(cid_tid_fid_res[list(camera_ids)[0]].keys()) + + # get crops by name 'tid_cid_fid.jpg + for t_id in common_tids: + for i, c_id in enumerate(camera_ids): + infer_dir = os.path.join(images_dir, seqs[i]) + if os.path.exists(os.path.join(infer_dir, 'img1')): + infer_dir = os.path.join(infer_dir, 'img1') + all_images = os.listdir(infer_dir) + all_images.sort() + + for f_id in cid_tid_fid_res[c_id][t_id].keys(): + frame_idx = f_id - 1 if f_id > 0 else 0 + im_path = os.path.join(infer_dir, all_images[frame_idx]) + + im = cv2.imread(im_path) # (H, W, 3) + + # only select one track + track = cid_tid_fid_res[c_id][t_id][f_id][0] + + cid, tid, fid, x1, y1, w, h = [int(v) for v in track] + clip = im[y1:(y1 + h), x1:(x1 + w)] + clip = cv2.resize(clip, (width, height)) + + cv2.imwrite( + os.path.join(crops_dir, + 'tid{:06d}_cid{:06d}_fid{:06d}.jpg'.format( + tid, cid, fid)), clip) + + print("Finish cropping image of tracked_id {} in camera: {}".format( + t_id, c_id)) + + +def save_mtmct_vis_results(camera_results, + images_dir, + save_dir, + save_videos=False): + # camera_results: 'cid, tid, fid, x1, y1, w, h' + camera_ids = camera_results.keys() + seqs_folder = os.listdir(images_dir) + seqs = [] + for x in seqs_folder: + if os.path.isdir(os.path.join(images_dir, x)): + seqs.append(x) + assert len(seqs) == len(camera_ids) + seqs.sort() + + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + for i, c_id in enumerate(camera_ids): + print("Start visualization for camera {} of sequence {}.".format( + c_id, seqs[i])) + cid_save_dir = os.path.join(save_dir, '{}'.format(seqs[i])) + if not os.path.exists(cid_save_dir): + os.makedirs(cid_save_dir) + + infer_dir = os.path.join(images_dir, seqs[i]) + if os.path.exists(os.path.join(infer_dir, 'img1')): + infer_dir = os.path.join(infer_dir, 'img1') + all_images = os.listdir(infer_dir) + all_images.sort() + + for f_id, im_path in enumerate(all_images): + img = cv2.imread(os.path.join(infer_dir, im_path)) + tracks = camera_results[c_id][camera_results[c_id][:, 2] == f_id] + if tracks.shape[0] > 0: + tracked_ids = tracks[:, 1] + xywhs = tracks[:, 3:] + online_im = 
plot_tracking( + img, xywhs, tracked_ids, scores=None, frame_id=f_id) + else: + online_im = img + print('Frame {} of seq {} has no tracking results'.format( + f_id, seqs[i])) + + cv2.imwrite( + os.path.join(cid_save_dir, '{:05d}.jpg'.format(f_id)), + online_im) + if f_id % 40 == 0: + print('Processing frame {}'.format(f_id)) + + if save_videos: + output_video_path = os.path.join(cid_save_dir, '..', + '{}_mtmct_vis.mp4'.format(seqs[i])) + cmd_str = 'ffmpeg -f image2 -i {}/%05d.jpg {}'.format( + cid_save_dir, output_video_path) + os.system(cmd_str) + print('Save camera {} video in {}.'.format(seqs[i], + output_video_path)) diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/utils.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f0b52aa67638b6659e18b470ae384720a68f5294 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/utils.py @@ -0,0 +1,604 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/LCFractal/AIC21-MTMC/tree/main/reid/reid-matching/tools +""" + +import os +import re +import cv2 +import gc +import numpy as np +import pandas as pd +from tqdm import tqdm +import warnings +warnings.filterwarnings("ignore") + +__all__ = [ + 'parse_pt', 'parse_bias', 'get_dire', 'parse_pt_gt', + 'compare_dataframes_mtmc', 'get_sim_matrix', 'get_labels', 'getData', + 'gen_new_mot' +] + + +def parse_pt(mot_feature, zones=None): + mot_list = dict() + for line in mot_feature: + fid = int(re.sub('[a-z,A-Z]', "", mot_feature[line]['frame'])) + tid = mot_feature[line]['id'] + bbox = list(map(lambda x: int(float(x)), mot_feature[line]['bbox'])) + if tid not in mot_list: + mot_list[tid] = dict() + out_dict = mot_feature[line] + if zones is not None: + out_dict['zone'] = zones.get_zone(bbox) + else: + out_dict['zone'] = None + mot_list[tid][fid] = out_dict + return mot_list + + +def gen_new_mot(mot_list): + out_dict = dict() + for tracklet in mot_list: + tracklet = mot_list[tracklet] + for f in tracklet: + out_dict[tracklet[f]['imgname']] = tracklet[f] + return out_dict + + +def mergesetfeat1_notrk(P, neg_vector, in_feats, in_labels): + out_feats = [] + for i in range(in_feats.shape[0]): + camera_id = in_labels[i, 1] + feat = in_feats[i] - neg_vector[camera_id] + feat = P[camera_id].dot(feat) + feat = feat / np.linalg.norm(feat, ord=2) + out_feats.append(feat) + out_feats = np.vstack(out_feats) + return out_feats + + +def compute_P2(prb_feats, gal_feats, gal_labels, la=3.0): + X = gal_feats + neg_vector = {} + u_labels = np.unique(gal_labels[:, 1]) + P = {} + for label in u_labels: + curX = gal_feats[gal_labels[:, 1] == label, :] + neg_vector[label] = np.mean(curX, axis=0) + P[label] = np.linalg.inv( + curX.T.dot(curX) + curX.shape[0] * la * np.eye(X.shape[1])) + return P, neg_vector + + +def parse_bias(cameras_bias): + cid_bias = dict() + for cameras in 
cameras_bias.keys(): + cameras_id = re.sub('[a-z,A-Z]', "", cameras) + cameras_id = int(cameras_id) + bias = cameras_bias[cameras] + cid_bias[cameras_id] = float(bias) + return cid_bias + + +def get_dire(zone_list, cid): + zs, ze = zone_list[0], zone_list[-1] + return (zs, ze) + + +def intracam_ignore(st_mask, cid_tids): + count = len(cid_tids) + for i in range(count): + for j in range(count): + if cid_tids[i][0] == cid_tids[j][0]: + st_mask[i, j] = 0. + return st_mask + + +def mergesetfeat(in_feats, in_labels, in_tracks): + trackset = list(set(list(in_tracks))) + out_feats = [] + out_labels = [] + for track in trackset: + feat = np.mean(in_feats[in_tracks == track], axis=0) + feat = feat / np.linalg.norm(feat, ord=2) + label = in_labels[in_tracks == track][0] + out_feats.append(feat) + out_labels.append(label) + out_feats = np.vstack(out_feats) + out_labels = np.vstack(out_labels) + return out_feats, out_labels + + +def mergesetfeat3(X, labels, gX, glabels, beta=0.08, knn=20, lr=0.5): + for i in range(0, X.shape[0]): + if i % 1000 == 0: + print('feat3:%d/%d' % (i, X.shape[0])) + knnX = gX[glabels[:, 1] != labels[i, 1], :] + sim = knnX.dot(X[i, :]) + knnX = knnX[sim > 0, :] + sim = sim[sim > 0] + if len(sim) > 0: + idx = np.argsort(-sim) + if len(sim) > 2 * knn: + sim = sim[idx[:2 * knn]] + knnX = knnX[idx[:2 * knn], :] + else: + sim = sim[idx] + knnX = knnX[idx, :] + knn = min(knn, len(sim)) + knn_pos_weight = np.exp((sim[:knn] - 1) / beta) + knn_neg_weight = np.ones(len(sim) - knn) + knn_pos_prob = knn_pos_weight / np.sum(knn_pos_weight) + knn_neg_prob = knn_neg_weight / np.sum(knn_neg_weight) + X[i, :] += lr * (knn_pos_prob.dot(knnX[:knn, :]) - + knn_neg_prob.dot(knnX[knn:, :])) + X[i, :] /= np.linalg.norm(X[i, :]) + return X + + +def run_fic(prb_feats, gal_feats, prb_labels, gal_labels, la=3.0): + P, neg_vector = compute_P2(prb_feats, gal_feats, gal_labels, la) + prb_feats_new = mergesetfeat1_notrk(P, neg_vector, prb_feats, prb_labels) + gal_feats_new = mergesetfeat1_notrk(P, neg_vector, gal_feats, gal_labels) + return prb_feats_new, gal_feats_new + + +def run_fac(prb_feats, + gal_feats, + prb_labels, + gal_labels, + beta=0.08, + knn=20, + lr=0.5, + prb_epoch=2, + gal_epoch=3): + gal_feats_new = gal_feats.copy() + for i in range(prb_epoch): + gal_feats_new = mergesetfeat3(gal_feats_new, gal_labels, gal_feats, + gal_labels, beta, knn, lr) + prb_feats_new = prb_feats.copy() + for i in range(gal_epoch): + prb_feats_new = mergesetfeat3(prb_feats_new, prb_labels, gal_feats_new, + gal_labels, beta, knn, lr) + return prb_feats_new, gal_feats_new + + +def euclidean_distance(qf, gf): + m = qf.shape[0] + n = gf.shape[0] + dist_mat = 2 - 2 * np.matmul(qf, gf.T) + return dist_mat + + +def find_topk(a, k, axis=-1, largest=True, sorted=True): + if axis is None: + axis_size = a.size + else: + axis_size = a.shape[axis] + assert 1 <= k <= axis_size + + a = np.asanyarray(a) + if largest: + index_array = np.argpartition(a, axis_size - k, axis=axis) + topk_indices = np.take(index_array, -np.arange(k) - 1, axis=axis) + else: + index_array = np.argpartition(a, k - 1, axis=axis) + topk_indices = np.take(index_array, np.arange(k), axis=axis) + topk_values = np.take_along_axis(a, topk_indices, axis=axis) + if sorted: + sorted_indices_in_topk = np.argsort(topk_values, axis=axis) + if largest: + sorted_indices_in_topk = np.flip(sorted_indices_in_topk, axis=axis) + sorted_topk_values = np.take_along_axis( + topk_values, sorted_indices_in_topk, axis=axis) + sorted_topk_indices = np.take_along_axis( + topk_indices, 
sorted_indices_in_topk, axis=axis) + return sorted_topk_values, sorted_topk_indices + return topk_values, topk_indices + + +def batch_numpy_topk(qf, gf, k1, N=6000): + m = qf.shape[0] + n = gf.shape[0] + initial_rank = [] + for j in range(n // N + 1): + temp_gf = gf[j * N:j * N + N] + temp_qd = [] + for i in range(m // N + 1): + temp_qf = qf[i * N:i * N + N] + temp_d = euclidean_distance(temp_qf, temp_gf) + temp_qd.append(temp_d) + temp_qd = np.concatenate(temp_qd, axis=0) + temp_qd = temp_qd / (np.max(temp_qd, axis=0)[0]) + temp_qd = temp_qd.T + initial_rank.append( + find_topk( + temp_qd, k=k1, axis=1, largest=False, sorted=True)[1]) + del temp_qd + del temp_gf + del temp_qf + del temp_d + initial_rank = np.concatenate(initial_rank, axis=0) + return initial_rank + + +def batch_euclidean_distance(qf, gf, N=6000): + m = qf.shape[0] + n = gf.shape[0] + dist_mat = [] + for j in range(n // N + 1): + temp_gf = gf[j * N:j * N + N] + temp_qd = [] + for i in range(m // N + 1): + temp_qf = qf[i * N:i * N + N] + temp_d = euclidean_distance(temp_qf, temp_gf) + temp_qd.append(temp_d) + temp_qd = np.concatenate(temp_qd, axis=0) + temp_qd = temp_qd / (np.max(temp_qd, axis=0)[0]) + dist_mat.append(temp_qd.T) + del temp_qd + del temp_gf + del temp_qf + del temp_d + dist_mat = np.concatenate(dist_mat, axis=0) + return dist_mat + + +def batch_v(feat, R, all_num): + V = np.zeros((all_num, all_num), dtype=np.float32) + m = feat.shape[0] + for i in tqdm(range(m)): + temp_gf = feat[i].reshape(1, -1) + temp_qd = euclidean_distance(temp_gf, feat) + temp_qd = temp_qd / (np.max(temp_qd)) + temp_qd = temp_qd.reshape(-1) + temp_qd = temp_qd[R[i].tolist()] + weight = np.exp(-temp_qd) + weight = weight / np.sum(weight) + V[i, R[i]] = weight.astype(np.float32) + return V + + +def k_reciprocal_neigh(initial_rank, i, k1): + forward_k_neigh_index = initial_rank[i, :k1 + 1] + backward_k_neigh_index = initial_rank[forward_k_neigh_index, :k1 + 1] + fi = np.where(backward_k_neigh_index == i)[0] + return forward_k_neigh_index[fi] + + +def ReRank2(probFea, galFea, k1=20, k2=6, lambda_value=0.3): + query_num = probFea.shape[0] + all_num = query_num + galFea.shape[0] + feat = np.concatenate((probFea, galFea), axis=0) + + initial_rank = batch_numpy_topk(feat, feat, k1 + 1, N=6000) + del probFea + del galFea + gc.collect() # empty memory + R = [] + for i in tqdm(range(all_num)): + # k-reciprocal neighbors + k_reciprocal_index = k_reciprocal_neigh(initial_rank, i, k1) + k_reciprocal_expansion_index = k_reciprocal_index + for j in range(len(k_reciprocal_index)): + candidate = k_reciprocal_index[j] + candidate_k_reciprocal_index = k_reciprocal_neigh( + initial_rank, candidate, int(np.around(k1 / 2))) + if len( + np.intersect1d(candidate_k_reciprocal_index, + k_reciprocal_index)) > 2. 
/ 3 * len( + candidate_k_reciprocal_index): + k_reciprocal_expansion_index = np.append( + k_reciprocal_expansion_index, candidate_k_reciprocal_index) + k_reciprocal_expansion_index = np.unique(k_reciprocal_expansion_index) + R.append(k_reciprocal_expansion_index) + + gc.collect() # empty memory + V = batch_v(feat, R, all_num) + del R + gc.collect() # empty memory + initial_rank = initial_rank[:, :k2] + + # Faster version + if k2 != 1: + V_qe = np.zeros_like(V, dtype=np.float16) + for i in range(all_num): + V_qe[i, :] = np.mean(V[initial_rank[i], :], axis=0) + V = V_qe + del V_qe + del initial_rank + gc.collect() # empty memory + invIndex = [] + for i in range(all_num): + invIndex.append(np.where(V[:, i] != 0)[0]) + jaccard_dist = np.zeros((query_num, all_num), dtype=np.float32) + for i in tqdm(range(query_num)): + temp_min = np.zeros(shape=[1, all_num], dtype=np.float32) + indNonZero = np.where(V[i, :] != 0)[0] + indImages = [invIndex[ind] for ind in indNonZero] + for j in range(len(indNonZero)): + temp_min[0, indImages[j]] = temp_min[0, indImages[j]] + np.minimum( + V[i, indNonZero[j]], V[indImages[j], indNonZero[j]]) + jaccard_dist[i] = 1 - temp_min / (2. - temp_min) + del V + gc.collect() # empty memory + original_dist = batch_euclidean_distance(feat, feat[:query_num, :]) + final_dist = jaccard_dist * (1 - lambda_value + ) + original_dist * lambda_value + del original_dist + del jaccard_dist + final_dist = final_dist[:query_num, query_num:] + return final_dist + + +def visual_rerank(prb_feats, + gal_feats, + cid_tids, + use_ff=False, + use_rerank=False): + """Rerank by visual cures.""" + gal_labels = np.array([[0, item[0]] for item in cid_tids]) + prb_labels = gal_labels.copy() + if use_ff: + print('current use ff finetuned parameters....') + # Step1-1: fic. finetuned parameters: [la] + prb_feats, gal_feats = run_fic(prb_feats, gal_feats, prb_labels, + gal_labels, 3.0) + # Step1=2: fac. finetuned parameters: [beta,knn,lr,prb_epoch,gal_epoch] + prb_feats, gal_feats = run_fac(prb_feats, gal_feats, prb_labels, + gal_labels, 0.08, 20, 0.5, 1, 1) + if use_rerank: + print('current use rerank finetuned parameters....') + # Step2: k-reciprocal. 
finetuned parameters: [k1,k2,lambda_value] + sims = ReRank2(prb_feats, gal_feats, 20, 3, 0.3) + else: + sims = 1.0 - np.dot(prb_feats, gal_feats.T) + + # NOTE: sims here is actually dist, the smaller the more similar + return 1.0 - sims + + +def normalize(nparray, axis=0): + try: + from sklearn import preprocessing + except Exception as e: + raise RuntimeError( + 'Unable to use sklearn in MTMCT in PP-Tracking, please install sklearn, for example: `pip install sklearn`' + ) + nparray = preprocessing.normalize(nparray, norm='l2', axis=axis) + return nparray + + +def get_match(cluster_labels): + cluster_dict = dict() + cluster = list() + for i, l in enumerate(cluster_labels): + if l in list(cluster_dict.keys()): + cluster_dict[l].append(i) + else: + cluster_dict[l] = [i] + for idx in cluster_dict: + cluster.append(cluster_dict[idx]) + return cluster + + +def get_cid_tid(cluster_labels, cid_tids): + cluster = list() + for labels in cluster_labels: + cid_tid_list = list() + for label in labels: + cid_tid_list.append(cid_tids[label]) + cluster.append(cid_tid_list) + return cluster + + +def combin_feature(cid_tid_dict, sub_cluster): + for sub_ct in sub_cluster: + if len(sub_ct) < 2: continue + mean_feat = np.array([cid_tid_dict[i]['mean_feat'] for i in sub_ct]) + for i in sub_ct: + cid_tid_dict[i]['mean_feat'] = mean_feat.mean(axis=0) + return cid_tid_dict + + +def combin_cluster(sub_labels, cid_tids): + cluster = list() + for sub_c_to_c in sub_labels: + if len(cluster) < 1: + cluster = sub_labels[sub_c_to_c] + continue + for c_ts in sub_labels[sub_c_to_c]: + is_add = False + for i_c, c_set in enumerate(cluster): + if len(set(c_ts) & set(c_set)) > 0: + new_list = list(set(c_ts) | set(c_set)) + cluster[i_c] = new_list + is_add = True + break + if not is_add: + cluster.append(c_ts) + labels = list() + num_tr = 0 + for c_ts in cluster: + label_list = list() + for c_t in c_ts: + label_list.append(cid_tids.index(c_t)) + num_tr += 1 + label_list.sort() + labels.append(label_list) + return labels, cluster + + +def parse_pt_gt(mot_feature): + img_rects = dict() + for line in mot_feature: + fid = int(re.sub('[a-z,A-Z]', "", mot_feature[line]['frame'])) + tid = mot_feature[line]['id'] + rect = list(map(lambda x: int(float(x)), mot_feature[line]['bbox'])) + if fid not in img_rects: + img_rects[fid] = list() + rect.insert(0, tid) + img_rects[fid].append(rect) + return img_rects + + +# eval result +def compare_dataframes_mtmc(gts, ts): + try: + import motmetrics as mm + except Exception as e: + raise RuntimeError( + 'Unable to use motmetrics in MTMCT in PP-Tracking, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + """Compute ID-based evaluation metrics for MTMCT + Return: + df (pandas.DataFrame): Results of the evaluations in a df with only the 'idf1', 'idp', and 'idr' columns. 
+ """ + gtds = [] + tsds = [] + gtcams = gts['CameraId'].drop_duplicates().tolist() + tscams = ts['CameraId'].drop_duplicates().tolist() + maxFrameId = 0 + + for k in sorted(gtcams): + gtd = gts.query('CameraId == %d' % k) + gtd = gtd[['FrameId', 'Id', 'X', 'Y', 'Width', 'Height']] + # max FrameId in gtd only + mfid = gtd['FrameId'].max() + gtd['FrameId'] += maxFrameId + gtd = gtd.set_index(['FrameId', 'Id']) + gtds.append(gtd) + + if k in tscams: + tsd = ts.query('CameraId == %d' % k) + tsd = tsd[['FrameId', 'Id', 'X', 'Y', 'Width', 'Height']] + # max FrameId among both gtd and tsd + mfid = max(mfid, tsd['FrameId'].max()) + tsd['FrameId'] += maxFrameId + tsd = tsd.set_index(['FrameId', 'Id']) + tsds.append(tsd) + + maxFrameId += mfid + + # compute multi-camera tracking evaluation stats + multiCamAcc = mm.utils.compare_to_groundtruth( + pd.concat(gtds), pd.concat(tsds), 'iou') + metrics = list(mm.metrics.motchallenge_metrics) + metrics.extend(['num_frames', 'idfp', 'idfn', 'idtp']) + mh = mm.metrics.create() + summary = mh.compute(multiCamAcc, metrics=metrics, name='MultiCam') + return summary + + +def get_sim_matrix(cid_tid_dict, + cid_tids, + use_ff=True, + use_rerank=True, + use_st_filter=False): + # Note: camera independent get_sim_matrix function, + # which is different from the one in camera_utils.py. + count = len(cid_tids) + + q_arr = np.array( + [cid_tid_dict[cid_tids[i]]['mean_feat'] for i in range(count)]) + g_arr = np.array( + [cid_tid_dict[cid_tids[i]]['mean_feat'] for i in range(count)]) + q_arr = normalize(q_arr, axis=1) + g_arr = normalize(g_arr, axis=1) + + st_mask = np.ones((count, count), dtype=np.float32) + st_mask = intracam_ignore(st_mask, cid_tids) + + visual_sim_matrix = visual_rerank( + q_arr, g_arr, cid_tids, use_ff=use_ff, use_rerank=use_rerank) + visual_sim_matrix = visual_sim_matrix.astype('float32') + + np.set_printoptions(precision=3) + sim_matrix = visual_sim_matrix * st_mask + + np.fill_diagonal(sim_matrix, 0) + return sim_matrix + + +def get_labels(cid_tid_dict, + cid_tids, + use_ff=True, + use_rerank=True, + use_st_filter=False): + try: + from sklearn.cluster import AgglomerativeClustering + except Exception as e: + raise RuntimeError( + 'Unable to use sklearn in MTMCT in PP-Tracking, please install sklearn, for example: `pip install sklearn`' + ) + # 1st cluster + sim_matrix = get_sim_matrix( + cid_tid_dict, + cid_tids, + use_ff=use_ff, + use_rerank=use_rerank, + use_st_filter=use_st_filter) + cluster_labels = AgglomerativeClustering( + n_clusters=None, + distance_threshold=0.5, + affinity='precomputed', + linkage='complete').fit_predict(1 - sim_matrix) + labels = get_match(cluster_labels) + sub_cluster = get_cid_tid(labels, cid_tids) + + # 2nd cluster + cid_tid_dict_new = combin_feature(cid_tid_dict, sub_cluster) + sim_matrix = get_sim_matrix( + cid_tid_dict_new, + cid_tids, + use_ff=use_ff, + use_rerank=use_rerank, + use_st_filter=use_st_filter) + cluster_labels = AgglomerativeClustering( + n_clusters=None, + distance_threshold=0.9, + affinity='precomputed', + linkage='complete').fit_predict(1 - sim_matrix) + labels = get_match(cluster_labels) + sub_cluster = get_cid_tid(labels, cid_tids) + + return labels + + +def getData(fpath, names=None, sep='\s+|\t+|,'): + """ Get the necessary track data from a file handle. + Args: + fpath (str) : Original path of file reading from. + names (list[str]): List of column names for the data. + sep (str): Allowed separators regular expression string. 
+    Return:
+        df (pandas.DataFrame): Data frame containing the data loaded from the
+            stream with optionally assigned column names. No index is set on the data.
+    """
+    try:
+        df = pd.read_csv(
+            fpath,
+            sep=sep,
+            index_col=None,
+            skipinitialspace=True,
+            header=None,
+            names=names,
+            engine='python')
+        return df
+
+    except Exception as e:
+        raise ValueError("Could not read input from %s. Error: %s" %
+                         (fpath, repr(e)))
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/zone.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/zone.py
new file mode 100644
index 0000000000000000000000000000000000000000..f3fa162e573550a4f497fb7ddd3e578796df6176
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/mtmct/zone.py
@@ -0,0 +1,412 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is based on https://github.com/LCFractal/AIC21-MTMC/tree/main/reid/reid-matching/tools
+
+Note: The following codes are strongly related to zone of the AIC21 test-set S06,
+    so they can only be used in S06, and can not be used for other MTMCT datasets.
+"""
+
+import os
+import cv2
+import numpy as np
+try:
+    from sklearn.cluster import AgglomerativeClustering
+except:
+    print(
+        'Warning: Unable to use MTMCT in PP-Tracking, please install sklearn, for example: `pip install sklearn`'
+    )
+    pass
+
+BBOX_B = 10 / 15
+
+
+class Zone(object):
+    def __init__(self, zone_path='datasets/zone'):
+        # zone colors (see get_zone): 1: white, 2: red, 3: green, 4: blue
+        # w r not high speed
+        # b g high speed
+        assert zone_path != '', "Error: zone_path must not be empty!"
+        zones = {}
+        for img_name in os.listdir(zone_path):
+            camnum = int(img_name.split('.')[0][-3:])
+            zone_img = cv2.imread(os.path.join(zone_path, img_name))
+            zones[camnum] = zone_img
+        self.zones = zones
+        self.current_cam = 0
+
+    def set_cam(self, cam):
+        self.current_cam = cam
+
+    def get_zone(self, bbox):
+        cx = int((bbox[0] + bbox[2]) / 2)
+        cy = int((bbox[1] + bbox[3]) / 2)
+        pix = self.zones[self.current_cam][max(cy - 1, 0), max(cx - 1, 0), :]
+        zone_num = 0
+        if pix[0] > 50 and pix[1] > 50 and pix[2] > 50:  # white
+            zone_num = 1
+        if pix[0] < 50 and pix[1] < 50 and pix[2] > 50:  # red
+            zone_num = 2
+        if pix[0] < 50 and pix[1] > 50 and pix[2] < 50:  # green
+            zone_num = 3
+        if pix[0] > 50 and pix[1] < 50 and pix[2] < 50:  # blue
+            zone_num = 4
+        return zone_num
+
+    def is_ignore(self, zone_list, frame_list, cid):
+        # zone ids: 0 not in any crossroad, 1 white, 2 red, 3 green, 4 blue
+        # returns: 0 keep the tracklet, 1 move it to the sub list, 2 ignore it
+        zs, ze = zone_list[0], zone_list[-1]
+        fs, fe = frame_list[0], frame_list[-1]
+        if zs == ze:
+            # the tracklet always stays in one zone: decide whether to exclude it
+            if ze in [1, 2]:
+                return 2
+            if zs != 0 and 0 in zone_list:
+                return 0
+            if fe - fs > 1500:
+                return 2
+            if fs < 2:
+                if cid in [45]:
+                    if ze in [3, 4]:
+                        return 1
+                    else:
+                        return 2
+            if fe > 1999:
+                if cid in [41]:
+                    if ze not in [3]:
+                        return 2
+                    else:
+                        return 0
+            if fs < 2 or fe > 1999:
+                if ze in [3, 4]:
+                    return 0
+            if ze in [3, 4]:
+                return 1
+            return 2
+        else:
+            # the zone changes within one camera
+            if cid in [41, 42, 43, 44, 45, 46]:
+                # the tracklet comes from a road extension: exclude it
+                if zs == 1 and ze == 2:
+                    return 2
+                if zs == 2 and ze == 1:
+                    return 2
+            if cid in [41]:
+                # on camera 41, no vehicle moves on to camera 42
+                if (zs in [1, 2]) and ze == 4:
+                    return 2
+                if zs == 4 and (ze in [1, 2]):
+                    return 2
+            if cid in [46]:
+                # on camera 46, no vehicle moves on to camera 45
+                if (zs in [1, 2]) and ze == 3:
+                    return 2
+                if zs == 3 and (ze in [1, 2]):
+                    return 2
+            return 0
+
+    def filter_mot(self, mot_list, cid):
+        new_mot_list = dict()
+        sub_mot_list = dict()
+        for tracklet in mot_list:
+            tracklet_dict = mot_list[tracklet]
+            frame_list = list(tracklet_dict.keys())
+            frame_list.sort()
+            zone_list = []
+            for f in frame_list:
+                zone_list.append(tracklet_dict[f]['zone'])
+            ignore_type = self.is_ignore(zone_list, frame_list, cid)
+            if ignore_type == 0:
+                new_mot_list[tracklet] = tracklet_dict
+            if ignore_type == 1:
+                sub_mot_list[tracklet] = tracklet_dict
+        return new_mot_list
+
+    def filter_bbox(self, mot_list, cid):
+        new_mot_list = dict()
+        yh = self.zones[cid].shape[0]
+        for tracklet in mot_list:
+            tracklet_dict = mot_list[tracklet]
+            frame_list = list(tracklet_dict.keys())
+            frame_list.sort()
+            bbox_list = []
+            for f in frame_list:
+                bbox_list.append(tracklet_dict[f]['bbox'])
+            bbox_x = [b[0] for b in bbox_list]
+            bbox_y = [b[1] for b in bbox_list]
+            bbox_w = [b[2] - b[0] for b in bbox_list]
+            bbox_h = [b[3] - b[1] for b in bbox_list]
+            new_frame_list = list()
+            if 0 in bbox_x or 0 in bbox_y:
+                b0 = [
+                    i for i, f in enumerate(frame_list)
+                    if bbox_x[i] < 5 or bbox_y[i] + bbox_h[i] > yh - 5
+                ]
+                if len(b0) == len(frame_list):
+                    if cid in [41, 42, 44, 45, 46]:
+                        continue
+                    max_w = max(bbox_w)
+                    max_h = max(bbox_h)
+                    for i, f in enumerate(frame_list):
+                        if bbox_w[i] > max_w * BBOX_B and bbox_h[
+                                i] > max_h * BBOX_B:
+                            new_frame_list.append(f)
+                else:
+                    l_i, r_i = 0, len(frame_list) - 1
+                    if len(b0) == 0:
+                        continue
+                    if b0[0] == 0:
+                        for i in range(len(b0) - 1):
+                            if b0[i] + 1 == b0[i + 1]:
+                                l_i = b0[i + 1]
+                            else:
+                                break
+                    if b0[-1] == len(frame_list) - 1:
+                        for i in range(len(b0) - 1):
+                            i = len(b0) - 1 - i
+                            if b0[i] - 1 == b0[i - 1]:
+                                r_i = b0[i - 1]
+                            else:
+                                break
+
+                    max_lw, max_lh = bbox_w[l_i], bbox_h[l_i]
+                    max_rw, max_rh = bbox_w[r_i], bbox_h[r_i]
+                    for i, f in enumerate(frame_list):
+                        if i < l_i:
+                            if bbox_w[i] > max_lw * BBOX_B and bbox_h[
+                                    i] > max_lh * BBOX_B:
+                                new_frame_list.append(f)
+                        elif i > r_i:
+                            if bbox_w[i] > max_rw * BBOX_B and bbox_h[
+                                    i] > max_rh * BBOX_B:
+                                new_frame_list.append(f)
+                        else:
+                            new_frame_list.append(f)
+                new_tracklet_dict = dict()
+                for f in new_frame_list:
+                    new_tracklet_dict[f] = tracklet_dict[f]
+                new_mot_list[tracklet] = new_tracklet_dict
+            else:
+                new_mot_list[tracklet] = tracklet_dict
+        return new_mot_list
+
+    def break_mot(self, mot_list, cid):
+        new_mot_list = dict()
+        new_num_tracklets = max(mot_list) + 1
+        for tracklet in mot_list:
+            tracklet_dict = mot_list[tracklet]
+            frame_list = list(tracklet_dict.keys())
+            frame_list.sort()
+            zone_list = []
+            back_tracklet = False
+            new_zone_f = 0
+            pre_frame = frame_list[0]
+            time_break = False
+            for f in frame_list:
+                if f - pre_frame > 100:
+                    if cid in [44, 45]:
+                        time_break = True
+                        break
+                    if cid not in [41, 44, 45, 46]:
+                        break
+                pre_frame = f
+                new_zone = tracklet_dict[f]['zone']
+                if len(zone_list) > 0 and zone_list[-1] == new_zone:
+                    continue
+                if new_zone_f > 1:
+                    if len(zone_list) > 1 and new_zone in zone_list:
+                        back_tracklet = True
+                    zone_list.append(new_zone)
+                    new_zone_f = 0
+                else:
+                    new_zone_f += 1
+            if back_tracklet:
+                new_tracklet_dict = dict()
+                pre_bbox = -1
+                pre_arrow = 0
+                have_break = False
+                for f in frame_list:
+                    now_bbox = tracklet_dict[f]['bbox']
+                    if type(pre_bbox) == int:
+                        if pre_bbox == -1:
+                            pre_bbox = now_bbox
+                    now_arrow = now_bbox[0] - pre_bbox[0]
+                    if pre_arrow * now_arrow < 0 and len(
+                            new_tracklet_dict) > 15 and not have_break:
+                        new_mot_list[tracklet] = new_tracklet_dict
+                        new_tracklet_dict = dict()
+                        have_break = True
+                    if have_break:
+                        tracklet_dict[f]['id'] = new_num_tracklets
+                    new_tracklet_dict[f] = tracklet_dict[f]
+                    pre_bbox, pre_arrow = now_bbox, now_arrow
+
+                if have_break:
+                    new_mot_list[new_num_tracklets] = new_tracklet_dict
+                    new_num_tracklets += 1
+                else:
+                    new_mot_list[tracklet] = new_tracklet_dict
+            elif time_break:
+                new_tracklet_dict = dict()
+                have_break = False
+                pre_frame = frame_list[0]
+                for f in frame_list:
+                    if f - pre_frame > 100:
+                        new_mot_list[tracklet] = new_tracklet_dict
+                        new_tracklet_dict = dict()
+                        have_break = True
+                    new_tracklet_dict[f] = tracklet_dict[f]
+                    pre_frame = f
+                if have_break:
+                    new_mot_list[new_num_tracklets] = new_tracklet_dict
+                    new_num_tracklets += 1
+                else:
+                    new_mot_list[tracklet] = new_tracklet_dict
+            else:
+                new_mot_list[tracklet] = tracklet_dict
+        return new_mot_list
+
+    def intra_matching(self, mot_list, sub_mot_list):
+        sub_zone_dict = dict()
+        new_mot_list = dict()
+        new_mot_list, new_sub_mot_list = self.do_intra_matching2(mot_list,
+                                                                 sub_mot_list)
+        return new_mot_list
+
+    def do_intra_matching2(self, mot_list, sub_list):
+        new_zone_dict = dict()
+
+        def get_trac_info(tracklet1):
+            t1_f = list(tracklet1)
+            t1_f.sort()
+            t1_fs = t1_f[0]
+            t1_fe = t1_f[-1]
+            t1_zs = tracklet1[t1_fs]['zone']
+            t1_ze = tracklet1[t1_fe]['zone']
+            t1_boxs = tracklet1[t1_fs]['bbox']
+            t1_boxe = tracklet1[t1_fe]['bbox']
+            t1_boxs = [(t1_boxs[2] + t1_boxs[0]) / 2,
+                       (t1_boxs[3] + t1_boxs[1]) / 2]
+            t1_boxe = [(t1_boxe[2] + t1_boxe[0]) / 2,
+                       (t1_boxe[3] + t1_boxe[1]) / 2]
+            return t1_fs, t1_fe, t1_zs, t1_ze, t1_boxs, t1_boxe
+
+        for t1id in sub_list:
+            tracklet1 = sub_list[t1id]
+            if tracklet1 == -1:
+                continue
+            t1_fs, t1_fe, t1_zs, t1_ze, t1_boxs, t1_boxe = get_trac_info(
+                tracklet1)
+            sim_dict = dict()
+            for t2id in mot_list:
+                tracklet2 = mot_list[t2id]
+                t2_fs, t2_fe, t2_zs, t2_ze, t2_boxs, t2_boxe = get_trac_info(
+                    tracklet2)
+                if t1_ze == t2_zs:
+                    if abs(t2_fs - t1_fe) < 5 and abs(t2_boxe[0] - t1_boxs[
+                            0]) < 50 and abs(t2_boxe[1] - t1_boxs[1]) < 50:
+                        t1_feat = tracklet1[t1_fe]['feat']
+                        t2_feat = tracklet2[t2_fs]['feat']
+                        sim_dict[t2id] = np.matmul(t1_feat, t2_feat)
+                if t1_zs == t2_ze:
+                    if abs(t2_fe - t1_fs) < 5 and abs(t2_boxs[0] - t1_boxe[
+                            0]) < 50 and abs(t2_boxs[1] - t1_boxe[1]) < 50:
+                        t1_feat = tracklet1[t1_fs]['feat']
+                        t2_feat = tracklet2[t2_fe]['feat']
+                        sim_dict[t2id] = np.matmul(t1_feat, t2_feat)
+            if len(sim_dict) > 0:
+                max_sim = 0
+                max_id = 0
+                for t2id in sim_dict:
+                    if sim_dict[t2id] > max_sim:
+                        max_sim = sim_dict[t2id]  # keep the best similarity seen so far
+                        max_id = t2id
+                if max_sim > 0.5:
+                    t2 = mot_list[max_id]
+                    for t1f in tracklet1:
+                        if t1f not in t2:
+                            tracklet1[t1f]['id'] = max_id
+                            t2[t1f] = tracklet1[t1f]
+                    mot_list[max_id] = t2
+                    sub_list[t1id] = -1
+        return mot_list, sub_list
+
+    def do_intra_matching(self, sub_zone_dict, sub_zone):
+        new_zone_dict = dict()
+        id_list = list(sub_zone_dict)
+        id2index = dict()
+        for index, id in enumerate(id_list):
+            id2index[id] = index
+
+        def get_trac_info(tracklet1):
+            t1_f = list(tracklet1)
+            t1_f.sort()
+            t1_fs = t1_f[0]
+            t1_fe = t1_f[-1]
+            t1_zs = tracklet1[t1_fs]['zone']
+            t1_ze = tracklet1[t1_fe]['zone']
+            t1_boxs = tracklet1[t1_fs]['bbox']
+            t1_boxe = tracklet1[t1_fe]['bbox']
+            t1_boxs = [(t1_boxs[2] + t1_boxs[0]) / 2,
+                       (t1_boxs[3] + t1_boxs[1]) / 2]
+            t1_boxe = [(t1_boxe[2] + t1_boxe[0]) / 2,
+                       (t1_boxe[3] + t1_boxe[1]) / 2]
+            return t1_fs, t1_fe, t1_zs, t1_ze, t1_boxs, t1_boxe
+
+        sim_matrix = np.zeros([len(id_list), len(id_list)])
+
+        for t1id in sub_zone_dict:
+            tracklet1 = sub_zone_dict[t1id]
+            t1_fs, t1_fe, t1_zs, t1_ze, t1_boxs, t1_boxe = get_trac_info(
+                tracklet1)
+            t1_feat = tracklet1[t1_fe]['feat']
+            for t2id in sub_zone_dict:
+                if t1id == t2id:
+                    continue
+                tracklet2 = sub_zone_dict[t2id]
+                t2_fs, t2_fe, t2_zs, t2_ze, t2_boxs, t2_boxe = get_trac_info(
+                    tracklet2)
+                if t1_zs != t1_ze and t2_ze != t2_zs or t1_fe > t2_fs:
+                    continue
+                if abs(t1_boxe[0] - t2_boxs[0]) > 50 or abs(t1_boxe[1] -
+                                                            t2_boxs[1]) > 50:
+                    continue
+                if t2_fs - t1_fe > 5:
+                    continue
+                t2_feat = tracklet2[t2_fs]['feat']
+                sim_matrix[id2index[t1id], id2index[t2id]] = np.matmul(t1_feat,
+                                                                       t2_feat)
+                sim_matrix[id2index[t2id], id2index[t1id]] = np.matmul(t1_feat,
+                                                                       t2_feat)
+        sim_matrix = 1 - sim_matrix
+        cluster_labels = AgglomerativeClustering(
+            n_clusters=None,
+            distance_threshold=0.7,
+            affinity='precomputed',
+            linkage='complete').fit_predict(sim_matrix)
+        new_zone_dict = dict()
+        label2id = dict()
+        for index, label in enumerate(cluster_labels):
+            tracklet = sub_zone_dict[id_list[index]]
+            if label not in label2id:
+                new_id = tracklet[list(tracklet)[0]]
+                new_tracklet = dict()
+            else:
+                new_id = label2id[label]
+                new_tracklet = new_zone_dict[label2id[label]]
+            for tf in tracklet:
+                tracklet[tf]['id'] = new_id
+                new_tracklet[tf] = tracklet[tf]
+            new_zone_dict[label] = new_tracklet
+
+        return new_zone_dict
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/__init__.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..a3c4229ed8a0e87c8bf6a138ce6945c9242c6fc4
--- /dev/null
+++ 
b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/__init__.py @@ -0,0 +1,30 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import base_jde_tracker +from . import base_sde_tracker + +from .base_jde_tracker import * +from .base_sde_tracker import * + +from . import jde_tracker +from . import deepsort_tracker +from . import ocsort_tracker +from . import center_tracker + +from .jde_tracker import * +from .deepsort_tracker import * +from .ocsort_tracker import * +from .botsort_tracker import * +from .center_tracker import * diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/base_jde_tracker.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/base_jde_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..860c7fb6217781d2bbe1a40aa3a6cd04592bb15f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/base_jde_tracker.py @@ -0,0 +1,304 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
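+# Conventions used by the trackers in this package: `tlwh` is (top-left x,
+# top-left y, width, height), `tlbr` is (min x, min y, max x, max y), and
+# `xyah` is (center x, center y, aspect ratio w/h, height), which is also the
+# Kalman filter state layout. An STrack typically moves through the TrackState
+# machine New -> Tracked -> Lost -> Removed, and BaseTrack._count_dict keeps
+# one id counter per object class so track ids are assigned independently per
+# class.
+#
+# A minimal per-frame driving loop looks roughly like this (illustrative
+# sketch only; the real wiring lives in the pptracking deploy pipeline):
+#
+#     tracker = JDETracker(use_byte=True)        # from .jde_tracker
+#     for pred_dets in dets_per_frame:           # each an [N, 6] array of
+#         online = tracker.update(pred_dets)     # cls_id, score, x0, y0, x1, y1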
+""" +This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py +""" + +import numpy as np +from collections import defaultdict +from collections import deque, OrderedDict +from ..matching import jde_matching as matching + +__all__ = [ + 'TrackState', + 'BaseTrack', + 'STrack', + 'joint_stracks', + 'sub_stracks', + 'remove_duplicate_stracks', +] + + +class TrackState(object): + New = 0 + Tracked = 1 + Lost = 2 + Removed = 3 + + +class BaseTrack(object): + _count_dict = defaultdict(int) # support single class and multi classes + + track_id = 0 + is_activated = False + state = TrackState.New + + history = OrderedDict() + features = [] + curr_feat = None + score = 0 + start_frame = 0 + frame_id = 0 + time_since_update = 0 + + # multi-camera + location = (np.inf, np.inf) + + @property + def end_frame(self): + return self.frame_id + + @staticmethod + def next_id(cls_id): + BaseTrack._count_dict[cls_id] += 1 + return BaseTrack._count_dict[cls_id] + + # @even: reset track id + @staticmethod + def init_count(num_classes): + """ + Initiate _count for all object classes + :param num_classes: + """ + for cls_id in range(num_classes): + BaseTrack._count_dict[cls_id] = 0 + + @staticmethod + def reset_track_count(cls_id): + BaseTrack._count_dict[cls_id] = 0 + + def activate(self, *args): + raise NotImplementedError + + def predict(self): + raise NotImplementedError + + def update(self, *args, **kwargs): + raise NotImplementedError + + def mark_lost(self): + self.state = TrackState.Lost + + def mark_removed(self): + self.state = TrackState.Removed + + +class STrack(BaseTrack): + def __init__(self, tlwh, score, cls_id, buff_size=30, temp_feat=None): + # wait activate + self._tlwh = np.asarray(tlwh, dtype=np.float32) + self.score = score + self.cls_id = cls_id + self.track_len = 0 + + self.kalman_filter = None + self.mean, self.covariance = None, None + self.is_activated = False + + self.use_reid = True if temp_feat is not None else False + if self.use_reid: + self.smooth_feat = None + self.update_features(temp_feat) + self.features = deque([], maxlen=buff_size) + self.alpha = 0.9 + + def update_features(self, feat): + # L2 normalizing, this function has no use for BYTETracker + feat /= np.linalg.norm(feat) + self.curr_feat = feat + if self.smooth_feat is None: + self.smooth_feat = feat + else: + self.smooth_feat = self.alpha * self.smooth_feat + (1.0 - self.alpha + ) * feat + self.features.append(feat) + self.smooth_feat /= np.linalg.norm(self.smooth_feat) + + def predict(self): + mean_state = self.mean.copy() + if self.state != TrackState.Tracked: + mean_state[7] = 0 + self.mean, self.covariance = self.kalman_filter.predict(mean_state, + self.covariance) + + @staticmethod + def multi_predict(tracks, kalman_filter): + if len(tracks) > 0: + multi_mean = np.asarray([track.mean.copy() for track in tracks]) + multi_covariance = np.asarray( + [track.covariance for track in tracks]) + for i, st in enumerate(tracks): + if st.state != TrackState.Tracked: + multi_mean[i][7] = 0 + multi_mean, multi_covariance = kalman_filter.multi_predict( + multi_mean, multi_covariance) + for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)): + tracks[i].mean = mean + tracks[i].covariance = cov + + @staticmethod + def multi_gmc(stracks, H=np.eye(2, 3)): + if len(stracks) > 0: + multi_mean = np.asarray([st.mean.copy() for st in stracks]) + multi_covariance = np.asarray([st.covariance for st in stracks]) + + R = H[:2, :2] + R8x8 = np.kron(np.eye(4, dtype=float), R) + t = 
H[:2, 2] + + for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)): + mean = R8x8.dot(mean) + mean[:2] += t + cov = R8x8.dot(cov).dot(R8x8.transpose()) + + stracks[i].mean = mean + stracks[i].covariance = cov + + def reset_track_id(self): + self.reset_track_count(self.cls_id) + + def activate(self, kalman_filter, frame_id): + """Start a new track""" + self.kalman_filter = kalman_filter + # update track id for the object class + self.track_id = self.next_id(self.cls_id) + self.mean, self.covariance = self.kalman_filter.initiate( + self.tlwh_to_xyah(self._tlwh)) + + self.track_len = 0 + self.state = TrackState.Tracked # set flag 'tracked' + + if frame_id == 1: # to record the first frame's detection result + self.is_activated = True + + self.frame_id = frame_id + self.start_frame = frame_id + + def re_activate(self, new_track, frame_id, new_id=False): + self.mean, self.covariance = self.kalman_filter.update( + self.mean, self.covariance, self.tlwh_to_xyah(new_track.tlwh)) + if self.use_reid: + self.update_features(new_track.curr_feat) + self.track_len = 0 + self.state = TrackState.Tracked + self.is_activated = True + self.frame_id = frame_id + if new_id: # update track id for the object class + self.track_id = self.next_id(self.cls_id) + + def update(self, new_track, frame_id, update_feature=True): + self.frame_id = frame_id + self.track_len += 1 + + new_tlwh = new_track.tlwh + self.mean, self.covariance = self.kalman_filter.update( + self.mean, self.covariance, self.tlwh_to_xyah(new_tlwh)) + self.state = TrackState.Tracked # set flag 'tracked' + self.is_activated = True # set flag 'activated' + + self.score = new_track.score + if update_feature and self.use_reid: + self.update_features(new_track.curr_feat) + + @property + def tlwh(self): + """Get current position in bounding box format `(top left x, top left y, + width, height)`. + """ + if self.mean is None: + return self._tlwh.copy() + + ret = self.mean[:4].copy() + ret[2] *= ret[3] + ret[:2] -= ret[2:] / 2 + return ret + + @property + def tlbr(self): + """Convert bounding box to format `(min x, min y, max x, max y)`, i.e., + `(top left, bottom right)`. + """ + ret = self.tlwh.copy() + ret[2:] += ret[:2] + return ret + + @staticmethod + def tlwh_to_xyah(tlwh): + """Convert bounding box to format `(center x, center y, aspect ratio, + height)`, where the aspect ratio is `width / height`. 
+ """ + ret = np.asarray(tlwh).copy() + ret[:2] += ret[2:] / 2 + ret[2] /= ret[3] + return ret + + def to_xyah(self): + return self.tlwh_to_xyah(self.tlwh) + + @staticmethod + def tlbr_to_tlwh(tlbr): + ret = np.asarray(tlbr).copy() + ret[2:] -= ret[:2] + return ret + + @staticmethod + def tlwh_to_tlbr(tlwh): + ret = np.asarray(tlwh).copy() + ret[2:] += ret[:2] + return ret + + def __repr__(self): + return 'OT_({}-{})_({}-{})'.format(self.cls_id, self.track_id, + self.start_frame, self.end_frame) + + +def joint_stracks(tlista, tlistb): + exists = {} + res = [] + for t in tlista: + exists[t.track_id] = 1 + res.append(t) + for t in tlistb: + tid = t.track_id + if not exists.get(tid, 0): + exists[tid] = 1 + res.append(t) + return res + + +def sub_stracks(tlista, tlistb): + stracks = {} + for t in tlista: + stracks[t.track_id] = t + for t in tlistb: + tid = t.track_id + if stracks.get(tid, 0): + del stracks[tid] + return list(stracks.values()) + + +def remove_duplicate_stracks(stracksa, stracksb): + pdist = matching.iou_distance(stracksa, stracksb) + pairs = np.where(pdist < 0.15) + dupa, dupb = list(), list() + for p, q in zip(*pairs): + timep = stracksa[p].frame_id - stracksa[p].start_frame + timeq = stracksb[q].frame_id - stracksb[q].start_frame + if timep > timeq: + dupb.append(q) + else: + dupa.append(p) + resa = [t for i, t in enumerate(stracksa) if not i in dupa] + resb = [t for i, t in enumerate(stracksb) if not i in dupb] + return resa, resb diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/base_sde_tracker.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/base_sde_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..1d50c7ee19d47a63e7a2e7774931156559d25b52 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/base_sde_tracker.py @@ -0,0 +1,153 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/nwojke/deep_sort/blob/master/deep_sort/track.py +""" + +import datetime + +__all__ = ['TrackState', 'Track'] + + +class TrackState(object): + """ + Enumeration type for the single target track state. Newly created tracks are + classified as `tentative` until enough evidence has been collected. Then, + the track state is changed to `confirmed`. Tracks that are no longer alive + are classified as `deleted` to mark them for removal from the set of active + tracks. + """ + Tentative = 1 + Confirmed = 2 + Deleted = 3 + + +class Track(object): + """ + A single target track with state space `(x, y, a, h)` and associated + velocities, where `(x, y)` is the center of the bounding box, `a` is the + aspect ratio and `h` is the height. + + Args: + mean (ndarray): Mean vector of the initial state distribution. + covariance (ndarray): Covariance matrix of the initial state distribution. + track_id (int): A unique track identifier. 
+        n_init (int): Number of consecutive detections before the track is confirmed.
+            The track state is set to `Deleted` if a miss occurs within the first
+            `n_init` frames.
+        max_age (int): The maximum number of consecutive misses before the track
+            state is set to `Deleted`.
+        cls_id (int): The category id of the tracked box.
+        score (float): The confidence score of the tracked box.
+        feature (Optional[ndarray]): Feature vector of the detection this track
+            originates from. If not None, this feature is added to the `features` cache.
+
+    Attributes:
+        hits (int): Total number of measurement updates.
+        age (int): Total number of frames since first occurrence.
+        time_since_update (int): Total number of frames since last measurement
+            update.
+        state (TrackState): The current track state.
+        features (List[ndarray]): A cache of features. On each measurement update,
+            the associated feature vector is added to this list.
+    """
+
+    def __init__(self,
+                 mean,
+                 covariance,
+                 track_id,
+                 n_init,
+                 max_age,
+                 cls_id,
+                 score,
+                 feature=None):
+        self.mean = mean
+        self.covariance = covariance
+        self.track_id = track_id
+        self.hits = 1
+        self.age = 1
+        self.time_since_update = 0
+        self.cls_id = cls_id
+        self.score = score
+        self.start_time = datetime.datetime.now()
+
+        self.state = TrackState.Tentative
+        self.features = []
+        self.feat = feature
+        if feature is not None:
+            self.features.append(feature)
+
+        self._n_init = n_init
+        self._max_age = max_age
+
+    def to_tlwh(self):
+        """Get position in format `(top left x, top left y, width, height)`."""
+        ret = self.mean[:4].copy()
+        ret[2] *= ret[3]
+        ret[:2] -= ret[2:] / 2
+        return ret
+
+    def to_tlbr(self):
+        """Get position in bounding box format `(min x, min y, max x, max y)`."""
+        ret = self.to_tlwh()
+        ret[2:] = ret[:2] + ret[2:]
+        return ret
+
+    def predict(self, kalman_filter):
+        """
+        Propagate the state distribution to the current time step using a Kalman
+        filter prediction step.
+        """
+        self.mean, self.covariance = kalman_filter.predict(self.mean,
+                                                           self.covariance)
+        self.age += 1
+        self.time_since_update += 1
+
+    def update(self, kalman_filter, detection):
+        """
+        Perform Kalman filter measurement update step and update the associated
+        detection feature cache.
+        """
+        self.mean, self.covariance = kalman_filter.update(self.mean,
+                                                          self.covariance,
+                                                          detection.to_xyah())
+        self.features.append(detection.feature)
+        self.feat = detection.feature
+        self.cls_id = detection.cls_id
+        self.score = detection.score
+
+        self.hits += 1
+        self.time_since_update = 0
+        if self.state == TrackState.Tentative and self.hits >= self._n_init:
+            self.state = TrackState.Confirmed
+
+    def mark_missed(self):
+        """Mark this track as missed (no association at the current time step).
+        """
+        if self.state == TrackState.Tentative:
+            self.state = TrackState.Deleted
+        elif self.time_since_update > self._max_age:
+            self.state = TrackState.Deleted
+
+    def is_tentative(self):
+        """Returns True if this track is tentative (unconfirmed)."""
+        return self.state == TrackState.Tentative
+
+    def is_confirmed(self):
+        """Returns True if this track is confirmed."""
+        return self.state == TrackState.Confirmed
+
+    def is_deleted(self):
+        """Returns True if this track is dead and should be deleted."""
+        return self.state == TrackState.Deleted
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/botsort_tracker.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/botsort_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..7aa83b5318753619eb8e9b3d0471f04f44c7722c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/botsort_tracker.py
@@ -0,0 +1,238 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is based on https://github.com/WWangYuHsiang/SMILEtrack/blob/main/BoT-SORT/tracker/bot_sort.py
+"""
+
+import cv2
+import matplotlib.pyplot as plt
+import numpy as np
+from collections import deque
+
+from ..matching import jde_matching as matching
+from ..motion import GMC
+from .base_jde_tracker import TrackState, STrack
+from .base_jde_tracker import joint_stracks, sub_stracks, remove_duplicate_stracks
+from ..motion import KalmanFilter
+
+
+class BOTSORTTracker(object):
+    """
+    BOTSORT tracker, supports single class only.
+
+    Args:
+        track_high_thresh (float): score threshold for high score detections
+        track_low_thresh (float): detections below this score are discarded
+        new_track_thresh (float): minimum score to start a new track
+        match_thresh (float): IoU threshold for association
+        track_buffer (int): frames to keep lost tracks alive, default 30
+        min_box_area (float): minimum box area to keep
+        camera_motion (bool): whether to use camera motion compensation, default False
+        cmc_method (str): camera motion compensation method, default sparseOptFlow
+        frame_rate (int): fps; buffer_size = int(frame_rate / 30.0 * track_buffer)
+    """
+
+    def __init__(self,
+                 track_high_thresh=0.3,
+                 track_low_thresh=0.2,
+                 new_track_thresh=0.4,
+                 match_thresh=0.7,
+                 track_buffer=30,
+                 min_box_area=0,
+                 camera_motion=False,
+                 cmc_method='sparseOptFlow',
+                 frame_rate=30):
+
+        self.tracked_stracks = []  # type: list[STrack]
+        self.lost_stracks = []  # type: list[STrack]
+        self.removed_stracks = []  # type: list[STrack]
+
+        self.frame_id = 0
+
+        self.track_high_thresh = track_high_thresh
+        self.track_low_thresh = track_low_thresh
+        self.new_track_thresh = new_track_thresh
+        self.match_thresh = match_thresh
+        self.buffer_size = int(frame_rate / 30.0 * track_buffer)
+        self.max_time_lost = self.buffer_size
+        self.kalman_filter = KalmanFilter()
+        self.min_box_area = min_box_area
+
+        self.camera_motion = camera_motion
+        self.gmc = GMC(method=cmc_method)
+
+    def update(self, output_results, img=None):
+        self.frame_id += 
1 + activated_starcks = [] + refind_stracks = [] + lost_stracks = [] + removed_stracks = [] + + if len(output_results): + bboxes = output_results[:, 2:6] + scores = output_results[:, 1] + classes = output_results[:, 0] + + # Remove bad detections + lowest_inds = scores > self.track_low_thresh + bboxes = bboxes[lowest_inds] + scores = scores[lowest_inds] + classes = classes[lowest_inds] + + # Find high threshold detections + remain_inds = scores > self.track_high_thresh + dets = bboxes[remain_inds] + scores_keep = scores[remain_inds] + classes_keep = classes[remain_inds] + + else: + bboxes = [] + scores = [] + classes = [] + dets = [] + scores_keep = [] + classes_keep = [] + + if len(dets) > 0: + '''Detections''' + detections = [ + STrack(STrack.tlbr_to_tlwh(tlbr), s, c) + for (tlbr, s, c) in zip(dets, scores_keep, classes_keep) + ] + else: + detections = [] + ''' Add newly detected tracklets to tracked_stracks''' + unconfirmed = [] + tracked_stracks = [] # type: list[STrack] + for track in self.tracked_stracks: + if not track.is_activated: + unconfirmed.append(track) + else: + tracked_stracks.append(track) + ''' Step 2: First association, with high score detection boxes''' + strack_pool = joint_stracks(tracked_stracks, self.lost_stracks) + + # Predict the current location with KF + STrack.multi_predict(strack_pool, self.kalman_filter) + + # Fix camera motion + if self.camera_motion: + warp = self.gmc.apply(img[0], dets) + STrack.multi_gmc(strack_pool, warp) + STrack.multi_gmc(unconfirmed, warp) + + # Associate with high score detection boxes + ious_dists = matching.iou_distance(strack_pool, detections) + matches, u_track, u_detection = matching.linear_assignment( + ious_dists, thresh=self.match_thresh) + + for itracked, idet in matches: + track = strack_pool[itracked] + det = detections[idet] + if track.state == TrackState.Tracked: + track.update(detections[idet], self.frame_id) + activated_starcks.append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + ''' Step 3: Second association, with low score detection boxes''' + if len(scores): + inds_high = scores < self.track_high_thresh + inds_low = scores > self.track_low_thresh + inds_second = np.logical_and(inds_low, inds_high) + dets_second = bboxes[inds_second] + scores_second = scores[inds_second] + classes_second = classes[inds_second] + else: + dets_second = [] + scores_second = [] + classes_second = [] + + # association the untrack to the low score detections + if len(dets_second) > 0: + '''Detections''' + detections_second = [ + STrack(STrack.tlbr_to_tlwh(tlbr), s, c) for (tlbr, s, c) in + zip(dets_second, scores_second, classes_second) + ] + else: + detections_second = [] + + r_tracked_stracks = [ + strack_pool[i] for i in u_track + if strack_pool[i].state == TrackState.Tracked + ] + dists = matching.iou_distance(r_tracked_stracks, detections_second) + matches, u_track, u_detection_second = matching.linear_assignment( + dists, thresh=0.5) + for itracked, idet in matches: + track = r_tracked_stracks[itracked] + det = detections_second[idet] + if track.state == TrackState.Tracked: + track.update(det, self.frame_id) + activated_starcks.append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + + for it in u_track: + track = r_tracked_stracks[it] + if not track.state == TrackState.Lost: + track.mark_lost() + lost_stracks.append(track) + '''Deal with unconfirmed tracks, usually tracks with only one beginning frame''' + detections = 
[detections[i] for i in u_detection] + dists = matching.iou_distance(unconfirmed, detections) + + matches, u_unconfirmed, u_detection = matching.linear_assignment( + dists, thresh=0.7) + for itracked, idet in matches: + unconfirmed[itracked].update(detections[idet], self.frame_id) + activated_starcks.append(unconfirmed[itracked]) + for it in u_unconfirmed: + track = unconfirmed[it] + track.mark_removed() + removed_stracks.append(track) + """ Step 4: Init new stracks""" + for inew in u_detection: + track = detections[inew] + if track.score < self.new_track_thresh: + continue + + track.activate(self.kalman_filter, self.frame_id) + activated_starcks.append(track) + """ Step 5: Update state""" + for track in self.lost_stracks: + if self.frame_id - track.end_frame > self.max_time_lost: + track.mark_removed() + removed_stracks.append(track) + """ Merge """ + self.tracked_stracks = [ + t for t in self.tracked_stracks if t.state == TrackState.Tracked + ] + self.tracked_stracks = joint_stracks(self.tracked_stracks, + activated_starcks) + self.tracked_stracks = joint_stracks(self.tracked_stracks, + refind_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks) + self.lost_stracks.extend(lost_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks) + self.removed_stracks.extend(removed_stracks) + self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks( + self.tracked_stracks, self.lost_stracks) + + # output_stracks = [track for track in self.tracked_stracks if track.is_activated] + output_stracks = [track for track in self.tracked_stracks] + + return output_stracks diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/center_tracker.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/center_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..9b30ba9269711b21a60aa553e97d68f4950b7d1a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/center_tracker.py @@ -0,0 +1,143 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
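+# Unlike the array-based trackers in this package, CenterTracker below works on
+# lists of detection dicts: each result carries at least 'bbox'
+# ([x0, y0, x1, y1]), 'score', 'class' and the CenterTrack 'tracking' offset,
+# while 'ct' (the box center) is derived when missing. Association is greedy on
+# squared center distance by default, or Hungarian when hungarian=True.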
+""" +This code is based on https://github.com/xingyizhou/CenterTrack/blob/master/src/lib/utils/tracker.py +""" + +import copy +import numpy as np +import sklearn + +__all__ = ['CenterTracker'] + + +class CenterTracker(object): + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=1, + min_box_area=0, + vertical_ratio=-1, + track_thresh=0.4, + pre_thresh=0.5, + new_thresh=0.4, + out_thresh=0.4, + hungarian=False): + self.num_classes = num_classes + self.min_box_area = min_box_area + self.vertical_ratio = vertical_ratio + + self.track_thresh = track_thresh + self.pre_thresh = max(track_thresh, pre_thresh) + self.new_thresh = max(track_thresh, new_thresh) + self.out_thresh = max(track_thresh, out_thresh) + self.hungarian = hungarian + + self.reset() + + def init_track(self, results): + print('Initialize tracking!') + for item in results: + if item['score'] > self.new_thresh: + self.id_count += 1 + item['tracking_id'] = self.id_count + if not ('ct' in item): + bbox = item['bbox'] + item['ct'] = [(bbox[0] + bbox[2]) / 2, + (bbox[1] + bbox[3]) / 2] + self.tracks.append(item) + + def reset(self): + self.id_count = 0 + self.tracks = [] + + def update(self, results, public_det=None): + N = len(results) + M = len(self.tracks) + + dets = np.array([det['ct'] + det['tracking'] for det in results], + np.float32) # N x 2 + track_size = np.array([((track['bbox'][2] - track['bbox'][0]) * \ + (track['bbox'][3] - track['bbox'][1])) \ + for track in self.tracks], np.float32) # M + track_cat = np.array([track['class'] for track in self.tracks], + np.int32) # M + item_size = np.array([((item['bbox'][2] - item['bbox'][0]) * \ + (item['bbox'][3] - item['bbox'][1])) \ + for item in results], np.float32) # N + item_cat = np.array([item['class'] for item in results], np.int32) # N + tracks = np.array([pre_det['ct'] for pre_det in self.tracks], + np.float32) # M x 2 + dist = (((tracks.reshape(1, -1, 2) - \ + dets.reshape(-1, 1, 2)) ** 2).sum(axis=2)) # N x M + + invalid = ((dist > track_size.reshape(1, M)) + \ + (dist > item_size.reshape(N, 1)) + \ + (item_cat.reshape(N, 1) != track_cat.reshape(1, M))) > 0 + dist = dist + invalid * 1e18 + + if self.hungarian: + item_score = np.array([item['score'] for item in results], + np.float32) + dist[dist > 1e18] = 1e18 + from sklearn.utils.linear_assignment_ import linear_assignment + matched_indices = linear_assignment(dist) + else: + matched_indices = greedy_assignment(copy.deepcopy(dist)) + + unmatched_dets = [d for d in range(dets.shape[0]) \ + if not (d in matched_indices[:, 0])] + unmatched_tracks = [d for d in range(tracks.shape[0]) \ + if not (d in matched_indices[:, 1])] + + if self.hungarian: + matches = [] + for m in matched_indices: + if dist[m[0], m[1]] > 1e16: + unmatched_dets.append(m[0]) + unmatched_tracks.append(m[1]) + else: + matches.append(m) + matches = np.array(matches).reshape(-1, 2) + else: + matches = matched_indices + + ret = [] + for m in matches: + track = results[m[0]] + track['tracking_id'] = self.tracks[m[1]]['tracking_id'] + ret.append(track) + + # Private detection: create tracks for all un-matched detections + for i in unmatched_dets: + track = results[i] + if track['score'] > self.new_thresh: + self.id_count += 1 + track['tracking_id'] = self.id_count + ret.append(track) + + self.tracks = ret + return ret + + +def greedy_assignment(dist): + matched_indices = [] + if dist.shape[1] == 0: + return np.array(matched_indices, np.int32).reshape(-1, 2) + for i in range(dist.shape[0]): + j = dist[i].argmin() + if dist[i][j] < 1e16: + 
dist[:, j] = 1e18
+            matched_indices.append([i, j])
+    return np.array(matched_indices, np.int32).reshape(-1, 2)
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/deepsort_tracker.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/deepsort_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..4679dec3e57f79c5d06bcc89aae80be211ac19ab
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/deepsort_tracker.py
@@ -0,0 +1,183 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is based on https://github.com/nwojke/deep_sort/blob/master/deep_sort/tracker.py
+"""
+
+import numpy as np
+
+from ..motion import KalmanFilter
+from ..matching.deepsort_matching import NearestNeighborDistanceMetric
+from ..matching.deepsort_matching import iou_cost, min_cost_matching, matching_cascade, gate_cost_matrix
+from .base_sde_tracker import Track
+from ..utils import Detection
+
+__all__ = ['DeepSORTTracker']
+
+
+class DeepSORTTracker(object):
+    """
+    DeepSORT tracker
+
+    Args:
+        input_size (list): input feature map size to reid model, [h, w] format,
+            [64, 192] as default.
+        min_box_area (int): min box area to filter out low quality boxes
+        vertical_ratio (float): w/h, the vertical ratio of the bbox to filter
+            bad results, set 1.6 default for pedestrian tracking. If set <=0
+            means no need to filter bboxes.
+        budget (int): If not None, fix samples per class to at most this number.
+            Removes the oldest samples when the budget is reached.
+        max_age (int): maximum number of consecutive misses before a track is deleted
+        n_init (int): Number of frames that a track remains in initialization
+            phase. Number of consecutive detections before the track is confirmed.
+            The track state is set to `Deleted` if a miss occurs within the first
+            `n_init` frames.
+        metric_type (str): either "euclidean" or "cosine", the distance metric
+            used for measurement to track association.
+        matching_threshold (float): samples with larger distance are
+            considered an invalid match.
+        max_iou_distance (float): maximum IoU distance for the IoU association stage
+        motion (str): motion model, 'KalmanFilter' as default
+    """
+
+    def __init__(self,
+                 input_size=[64, 192],
+                 min_box_area=0,
+                 vertical_ratio=-1,
+                 budget=100,
+                 max_age=70,
+                 n_init=3,
+                 metric_type='cosine',
+                 matching_threshold=0.2,
+                 max_iou_distance=0.9,
+                 motion='KalmanFilter'):
+        self.input_size = input_size
+        self.min_box_area = min_box_area
+        self.vertical_ratio = vertical_ratio
+        self.max_age = max_age
+        self.n_init = n_init
+        self.metric = NearestNeighborDistanceMetric(metric_type,
+                                                    matching_threshold, budget)
+        self.max_iou_distance = max_iou_distance
+        if motion == 'KalmanFilter':
+            self.motion = KalmanFilter()
+
+        self.tracks = []
+        self._next_id = 1
+
+    def predict(self):
+        """
+        Propagate track state distributions one time step forward.
+        This function should be called once every time step, before `update`.
+ """ + for track in self.tracks: + track.predict(self.motion) + + def update(self, pred_dets, pred_embs): + """ + Perform measurement update and track management. + Args: + pred_dets (np.array): Detection results of the image, the shape is + [N, 6], means 'cls_id, score, x0, y0, x1, y1'. + pred_embs (np.array): Embedding results of the image, the shape is + [N, 128], usually pred_embs.shape[1] is a multiple of 128. + """ + pred_cls_ids = pred_dets[:, 0:1] + pred_scores = pred_dets[:, 1:2] + pred_xyxys = pred_dets[:, 2:6] + pred_tlwhs = np.concatenate((pred_xyxys[:, 0:2], pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1), axis=1) + + detections = [ + Detection(tlwh, score, feat, cls_id) + for tlwh, score, feat, cls_id in zip(pred_tlwhs, pred_scores, + pred_embs, pred_cls_ids) + ] + + # Run matching cascade. + matches, unmatched_tracks, unmatched_detections = \ + self._match(detections) + + # Update track set. + for track_idx, detection_idx in matches: + self.tracks[track_idx].update(self.motion, + detections[detection_idx]) + for track_idx in unmatched_tracks: + self.tracks[track_idx].mark_missed() + for detection_idx in unmatched_detections: + self._initiate_track(detections[detection_idx]) + self.tracks = [t for t in self.tracks if not t.is_deleted()] + + # Update distance metric. + active_targets = [t.track_id for t in self.tracks if t.is_confirmed()] + features, targets = [], [] + for track in self.tracks: + if not track.is_confirmed(): + continue + features += track.features + targets += [track.track_id for _ in track.features] + track.features = [] + self.metric.partial_fit( + np.asarray(features), np.asarray(targets), active_targets) + output_stracks = self.tracks + return output_stracks + + def _match(self, detections): + def gated_metric(tracks, dets, track_indices, detection_indices): + features = np.array([dets[i].feature for i in detection_indices]) + targets = np.array([tracks[i].track_id for i in track_indices]) + cost_matrix = self.metric.distance(features, targets) + cost_matrix = gate_cost_matrix(self.motion, cost_matrix, tracks, + dets, track_indices, + detection_indices) + return cost_matrix + + # Split track set into confirmed and unconfirmed tracks. + confirmed_tracks = [ + i for i, t in enumerate(self.tracks) if t.is_confirmed() + ] + unconfirmed_tracks = [ + i for i, t in enumerate(self.tracks) if not t.is_confirmed() + ] + + # Associate confirmed tracks using appearance features. + matches_a, unmatched_tracks_a, unmatched_detections = \ + matching_cascade( + gated_metric, self.metric.matching_threshold, self.max_age, + self.tracks, detections, confirmed_tracks) + + # Associate remaining tracks together with unconfirmed tracks using IOU. 
+ iou_track_candidates = unconfirmed_tracks + [ + k for k in unmatched_tracks_a + if self.tracks[k].time_since_update == 1 + ] + unmatched_tracks_a = [ + k for k in unmatched_tracks_a + if self.tracks[k].time_since_update != 1 + ] + matches_b, unmatched_tracks_b, unmatched_detections = \ + min_cost_matching( + iou_cost, self.max_iou_distance, self.tracks, + detections, iou_track_candidates, unmatched_detections) + + matches = matches_a + matches_b + unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b)) + return matches, unmatched_tracks, unmatched_detections + + def _initiate_track(self, detection): + mean, covariance = self.motion.initiate(detection.to_xyah()) + self.tracks.append( + Track(mean, covariance, self._next_id, self.n_init, self.max_age, + detection.cls_id, detection.score, detection.feature)) + self._next_id += 1 diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/jde_tracker.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/jde_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..bcfd0f266f4176dfd569d909c3b833131572c03f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/jde_tracker.py @@ -0,0 +1,337 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py +""" + +import numpy as np +from collections import defaultdict + +from ..matching import jde_matching as matching +from ..motion import KalmanFilter +from .base_jde_tracker import TrackState, STrack +from .base_jde_tracker import joint_stracks, sub_stracks, remove_duplicate_stracks + +__all__ = ['JDETracker'] + + +class JDETracker(object): + __shared__ = ['num_classes'] + """ + JDE tracker, support single class and multi classes + + Args: + use_byte (bool): Whether use ByteTracker, default False + num_classes (int): the number of classes + det_thresh (float): threshold of detection score + track_buffer (int): buffer for tracker + min_box_area (int): min box area to filter out low quality boxes + vertical_ratio (float): w/h, the vertical ratio of the bbox to filter + bad results. If set <= 0 means no need to filter bboxes,usually set + 1.6 for pedestrian tracking. 
+ tracked_thresh (float): linear assignment threshold of tracked + stracks and detections + r_tracked_thresh (float): linear assignment threshold of + tracked stracks and unmatched detections + unconfirmed_thresh (float): linear assignment threshold of + unconfirmed stracks and unmatched detections + conf_thres (float): confidence threshold for tracking, also used in + ByteTracker as higher confidence threshold + match_thres (float): linear assignment threshold of tracked + stracks and detections in ByteTracker + low_conf_thres (float): lower confidence threshold for tracking in + ByteTracker + input_size (list): input feature map size to reid model, [h, w] format, + [64, 192] as default. + motion (str): motion model, KalmanFilter as default + metric_type (str): either "euclidean" or "cosine", the distance metric + used for measurement to track association. + """ + + def __init__(self, + use_byte=False, + num_classes=1, + det_thresh=0.3, + track_buffer=30, + min_box_area=0, + vertical_ratio=0, + tracked_thresh=0.7, + r_tracked_thresh=0.5, + unconfirmed_thresh=0.7, + conf_thres=0, + match_thres=0.8, + low_conf_thres=0.2, + input_size=[64, 192], + motion='KalmanFilter', + metric_type='euclidean'): + self.use_byte = use_byte + self.num_classes = num_classes + self.det_thresh = det_thresh if not use_byte else conf_thres + 0.1 + self.track_buffer = track_buffer + self.min_box_area = min_box_area + self.vertical_ratio = vertical_ratio + + self.tracked_thresh = tracked_thresh + self.r_tracked_thresh = r_tracked_thresh + self.unconfirmed_thresh = unconfirmed_thresh + self.conf_thres = conf_thres + self.match_thres = match_thres + self.low_conf_thres = low_conf_thres + + self.input_size = input_size + if motion == 'KalmanFilter': + self.motion = KalmanFilter() + self.metric_type = metric_type + + self.frame_id = 0 + self.tracked_tracks_dict = defaultdict(list) # dict(list[STrack]) + self.lost_tracks_dict = defaultdict(list) # dict(list[STrack]) + self.removed_tracks_dict = defaultdict(list) # dict(list[STrack]) + + self.max_time_lost = 0 + # max_time_lost will be calculated: int(frame_rate / 30.0 * track_buffer) + + def update(self, pred_dets, pred_embs=None): + """ + Processes the image frame and finds bounding box(detections). + Associates the detection with corresponding tracklets and also handles + lost, removed, refound and active tracklets. + + Args: + pred_dets (np.array): Detection results of the image, the shape is + [N, 6], means 'cls_id, score, x0, y0, x1, y1'. + pred_embs (np.array): Embedding results of the image, the shape is + [N, 128] or [N, 512]. + + Return: + output_stracks_dict (dict(list)): The list contains information + regarding the online_tracklets for the received image tensor. 
+ """ + self.frame_id += 1 + if self.frame_id == 1: + STrack.init_count(self.num_classes) + activated_tracks_dict = defaultdict(list) + refined_tracks_dict = defaultdict(list) + lost_tracks_dict = defaultdict(list) + removed_tracks_dict = defaultdict(list) + output_tracks_dict = defaultdict(list) + + pred_dets_dict = defaultdict(list) + pred_embs_dict = defaultdict(list) + + # unify single and multi classes detection and embedding results + for cls_id in range(self.num_classes): + cls_idx = (pred_dets[:, 0:1] == cls_id).squeeze(-1) + pred_dets_dict[cls_id] = pred_dets[cls_idx] + if pred_embs is not None: + pred_embs_dict[cls_id] = pred_embs[cls_idx] + else: + pred_embs_dict[cls_id] = None + + for cls_id in range(self.num_classes): + """ Step 1: Get detections by class""" + pred_dets_cls = pred_dets_dict[cls_id] + pred_embs_cls = pred_embs_dict[cls_id] + remain_inds = (pred_dets_cls[:, 1:2] > self.conf_thres).squeeze(-1) + if remain_inds.sum() > 0: + pred_dets_cls = pred_dets_cls[remain_inds] + if pred_embs_cls is None: + # in original ByteTrack + detections = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), + tlbrs[1], + cls_id, + 30, + temp_feat=None) for tlbrs in pred_dets_cls + ] + else: + pred_embs_cls = pred_embs_cls[remain_inds] + detections = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], cls_id, + 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls, pred_embs_cls) + ] + else: + detections = [] + ''' Add newly detected tracklets to tracked_stracks''' + unconfirmed_dict = defaultdict(list) + tracked_tracks_dict = defaultdict(list) + for track in self.tracked_tracks_dict[cls_id]: + if not track.is_activated: + # previous tracks which are not active in the current frame are added in unconfirmed list + unconfirmed_dict[cls_id].append(track) + else: + # Active tracks are added to the local list 'tracked_stracks' + tracked_tracks_dict[cls_id].append(track) + """ Step 2: First association, with embedding""" + # building tracking pool for the current frame + track_pool_dict = defaultdict(list) + track_pool_dict[cls_id] = joint_stracks( + tracked_tracks_dict[cls_id], self.lost_tracks_dict[cls_id]) + + # Predict the current location with KalmanFilter + STrack.multi_predict(track_pool_dict[cls_id], self.motion) + + if pred_embs_cls is None: + # in original ByteTrack + dists = matching.iou_distance(track_pool_dict[cls_id], + detections) + matches, u_track, u_detection = matching.linear_assignment( + dists, thresh=self.match_thres) # not self.tracked_thresh + else: + dists = matching.embedding_distance( + track_pool_dict[cls_id], + detections, + metric=self.metric_type) + dists = matching.fuse_motion( + self.motion, dists, track_pool_dict[cls_id], detections) + matches, u_track, u_detection = matching.linear_assignment( + dists, thresh=self.tracked_thresh) + + for i_tracked, idet in matches: + # i_tracked is the id of the track and idet is the detection + track = track_pool_dict[cls_id][i_tracked] + det = detections[idet] + if track.state == TrackState.Tracked: + # If the track is active, add the detection to the track + track.update(detections[idet], self.frame_id) + activated_tracks_dict[cls_id].append(track) + else: + # We have obtained a detection from a track which is not active, + # hence put the track in refind_stracks list + track.re_activate(det, self.frame_id, new_id=False) + refined_tracks_dict[cls_id].append(track) + + # None of the steps below happen if there are no undetected tracks. 
+ """ Step 3: Second association, with IOU""" + if self.use_byte: + inds_low = pred_dets_dict[cls_id][:, 1:2] > self.low_conf_thres + inds_high = pred_dets_dict[cls_id][:, 1:2] < self.conf_thres + inds_second = np.logical_and(inds_low, inds_high).squeeze(-1) + pred_dets_cls_second = pred_dets_dict[cls_id][inds_second] + + # association the untrack to the low score detections + if len(pred_dets_cls_second) > 0: + if pred_embs_dict[cls_id] is None: + # in original ByteTrack + detections_second = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), + tlbrs[1], + cls_id, + 30, + temp_feat=None) + for tlbrs in pred_dets_cls_second + ] + else: + pred_embs_cls_second = pred_embs_dict[cls_id][ + inds_second] + detections_second = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], + cls_id, 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls_second, pred_embs_cls_second) + ] + else: + detections_second = [] + r_tracked_stracks = [ + track_pool_dict[cls_id][i] for i in u_track + if track_pool_dict[cls_id][i].state == TrackState.Tracked + ] + dists = matching.iou_distance(r_tracked_stracks, + detections_second) + matches, u_track, u_detection_second = matching.linear_assignment( + dists, thresh=0.4) # not r_tracked_thresh + else: + detections = [detections[i] for i in u_detection] + r_tracked_stracks = [] + for i in u_track: + if track_pool_dict[cls_id][i].state == TrackState.Tracked: + r_tracked_stracks.append(track_pool_dict[cls_id][i]) + dists = matching.iou_distance(r_tracked_stracks, detections) + + matches, u_track, u_detection = matching.linear_assignment( + dists, thresh=self.r_tracked_thresh) + + for i_tracked, idet in matches: + track = r_tracked_stracks[i_tracked] + det = detections[ + idet] if not self.use_byte else detections_second[idet] + if track.state == TrackState.Tracked: + track.update(det, self.frame_id) + activated_tracks_dict[cls_id].append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refined_tracks_dict[cls_id].append(track) + + for it in u_track: + track = r_tracked_stracks[it] + if not track.state == TrackState.Lost: + track.mark_lost() + lost_tracks_dict[cls_id].append(track) + '''Deal with unconfirmed tracks, usually tracks with only one beginning frame''' + detections = [detections[i] for i in u_detection] + dists = matching.iou_distance(unconfirmed_dict[cls_id], detections) + matches, u_unconfirmed, u_detection = matching.linear_assignment( + dists, thresh=self.unconfirmed_thresh) + for i_tracked, idet in matches: + unconfirmed_dict[cls_id][i_tracked].update(detections[idet], + self.frame_id) + activated_tracks_dict[cls_id].append(unconfirmed_dict[cls_id][ + i_tracked]) + for it in u_unconfirmed: + track = unconfirmed_dict[cls_id][it] + track.mark_removed() + removed_tracks_dict[cls_id].append(track) + """ Step 4: Init new stracks""" + for inew in u_detection: + track = detections[inew] + if track.score < self.det_thresh: + continue + track.activate(self.motion, self.frame_id) + activated_tracks_dict[cls_id].append(track) + """ Step 5: Update state""" + for track in self.lost_tracks_dict[cls_id]: + if self.frame_id - track.end_frame > self.max_time_lost: + track.mark_removed() + removed_tracks_dict[cls_id].append(track) + + self.tracked_tracks_dict[cls_id] = [ + t for t in self.tracked_tracks_dict[cls_id] + if t.state == TrackState.Tracked + ] + self.tracked_tracks_dict[cls_id] = joint_stracks( + self.tracked_tracks_dict[cls_id], activated_tracks_dict[cls_id]) + self.tracked_tracks_dict[cls_id] = joint_stracks( + 
self.tracked_tracks_dict[cls_id], refined_tracks_dict[cls_id]) + self.lost_tracks_dict[cls_id] = sub_stracks( + self.lost_tracks_dict[cls_id], self.tracked_tracks_dict[cls_id]) + self.lost_tracks_dict[cls_id].extend(lost_tracks_dict[cls_id]) + self.lost_tracks_dict[cls_id] = sub_stracks( + self.lost_tracks_dict[cls_id], self.removed_tracks_dict[cls_id]) + self.removed_tracks_dict[cls_id].extend(removed_tracks_dict[cls_id]) + self.tracked_tracks_dict[cls_id], self.lost_tracks_dict[ + cls_id] = remove_duplicate_stracks( + self.tracked_tracks_dict[cls_id], + self.lost_tracks_dict[cls_id]) + + # get scores of lost tracks + output_tracks_dict[cls_id] = [ + track for track in self.tracked_tracks_dict[cls_id] + if track.is_activated + ] + + return output_tracks_dict diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/ocsort_tracker.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/ocsort_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..02b1028e1a59a74357d3ee8aef2446902458ffd2 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/tracker/ocsort_tracker.py @@ -0,0 +1,370 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/ocsort.py +""" + +import time +import numpy as np +from ..matching.ocsort_matching import associate, linear_assignment, iou_batch, associate_only_iou +from ..motion.ocsort_kalman_filter import OCSORTKalmanFilter + + +def k_previous_obs(observations, cur_age, k): + if len(observations) == 0: + return [-1, -1, -1, -1, -1] + for i in range(k): + dt = k - i + if cur_age - dt in observations: + return observations[cur_age - dt] + max_age = max(observations.keys()) + return observations[max_age] + + +def convert_bbox_to_z(bbox): + """ + Takes a bounding box in the form [x1,y1,x2,y2] and returns z in the form + [x,y,s,r] where x,y is the centre of the box and s is the scale/area and r is + the aspect ratio + """ + w = bbox[2] - bbox[0] + h = bbox[3] - bbox[1] + x = bbox[0] + w / 2. + y = bbox[1] + h / 2. 
+    s = w * h  # scale is just area
+    r = w / float(h + 1e-6)
+    return np.array([x, y, s, r]).reshape((4, 1))
+
+
+def convert_x_to_bbox(x, score=None):
+    """
+    Takes a bounding box in the centre form [x,y,s,r] and returns it in the form
+    [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right
+    """
+    w = np.sqrt(x[2] * x[3])
+    h = x[2] / w
+    if score is None:
+        return np.array(
+            [x[0] - w / 2., x[1] - h / 2., x[0] + w / 2.,
+             x[1] + h / 2.]).reshape((1, 4))
+    else:
+        score = np.array([score])
+        return np.array([
+            x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2., score
+        ]).reshape((1, 5))
+
+
+def speed_direction(bbox1, bbox2):
+    cx1, cy1 = (bbox1[0] + bbox1[2]) / 2.0, (bbox1[1] + bbox1[3]) / 2.0
+    cx2, cy2 = (bbox2[0] + bbox2[2]) / 2.0, (bbox2[1] + bbox2[3]) / 2.0
+    speed = np.array([cy2 - cy1, cx2 - cx1])
+    norm = np.sqrt((cy2 - cy1)**2 + (cx2 - cx1)**2) + 1e-6
+    return speed / norm
+
+
+class KalmanBoxTracker(object):
+    """
+    This class represents the internal state of individual tracked objects observed as bbox.
+
+    Args:
+        bbox (np.array): bbox in [x1,y1,x2,y2,score] format.
+        delta_t (int): delta_t of previous observation
+    """
+    count = 0
+
+    def __init__(self, bbox, delta_t=3):
+
+        self.kf = OCSORTKalmanFilter(dim_x=7, dim_z=4)
+        self.kf.F = np.array([[1., 0, 0, 0, 1., 0, 0], [0, 1., 0, 0, 0, 1., 0],
+                              [0, 0, 1., 0, 0, 0, 1.], [0, 0, 0, 1., 0, 0, 0],
+                              [0, 0, 0, 0, 1., 0, 0], [0, 0, 0, 0, 0, 1., 0],
+                              [0, 0, 0, 0, 0, 0, 1.]])
+        self.kf.H = np.array([[1., 0, 0, 0, 0, 0, 0], [0, 1., 0, 0, 0, 0, 0],
+                              [0, 0, 1., 0, 0, 0, 0], [0, 0, 0, 1., 0, 0, 0]])
+        self.kf.R[2:, 2:] *= 10.
+        self.kf.P[4:, 4:] *= 1000.
+        # give high uncertainty to the unobservable initial velocities
+        self.kf.P *= 10.
+        self.kf.Q[-1, -1] *= 0.01
+        self.kf.Q[4:, 4:] *= 0.01
+
+        self.score = bbox[4]
+        self.kf.x[:4] = convert_bbox_to_z(bbox)
+        self.time_since_update = 0
+        self.id = KalmanBoxTracker.count
+        KalmanBoxTracker.count += 1
+        self.history = []
+        self.hits = 0
+        self.hit_streak = 0
+        self.age = 0
+        """
+        NOTE: [-1,-1,-1,-1,-1] is a compromising placeholder for non-observation status, the same as the return of
+        function k_previous_obs. It is ugly and I do not like it. But to support generating the observation array in a
+        fast and unified way, which you will see below as k_observations = np.array([k_previous_obs(...)]), let's bear it for now.
+        """
+        self.last_observation = np.array([-1, -1, -1, -1, -1])  # placeholder
+        self.observations = dict()
+        self.history_observations = []
+        self.velocity = None
+        self.delta_t = delta_t
+
+    def update(self, bbox, angle_cost=False):
+        """
+        Updates the state vector with observed bbox.
+        """
+        if bbox is not None:
+            if angle_cost and self.last_observation.sum(
+            ) >= 0:  # a previous observation exists
+                previous_box = None
+                for i in range(self.delta_t):
+                    dt = self.delta_t - i
+                    if self.age - dt in self.observations:
+                        previous_box = self.observations[self.age - dt]
+                        break
+                if previous_box is None:
+                    previous_box = self.last_observation
+                # Estimate the track speed direction with observations
+                # \Delta t steps away
+                self.velocity = speed_direction(previous_box, bbox)
+            """
+            Insert new observations. This is an ugly way to maintain both self.observations
+            and self.history_observations. Bear it for the moment.
+            """
+            self.last_observation = bbox
+            self.observations[self.age] = bbox
+            self.history_observations.append(bbox)
+
+            self.time_since_update = 0
+            self.history = []
+            self.hits += 1
+            self.hit_streak += 1
+            self.kf.update(convert_bbox_to_z(bbox))
+        else:
+            self.kf.update(bbox)
+
+    def predict(self):
+        """
+        Advances the state vector and returns the predicted bounding box estimate.
+        """
+        if ((self.kf.x[6] + self.kf.x[2]) <= 0):
+            self.kf.x[6] *= 0.0
+
+        self.kf.predict()
+        self.age += 1
+        if (self.time_since_update > 0):
+            self.hit_streak = 0
+        self.time_since_update += 1
+        self.history.append(convert_x_to_bbox(self.kf.x, score=self.score))
+        return self.history[-1]
+
+    def get_state(self):
+        return convert_x_to_bbox(self.kf.x, score=self.score)
+
+
+class OCSORTTracker(object):
+    """
+    OCSORT tracker, supports single class only.
+
+    Args:
+        det_thresh (float): detection score threshold
+        max_age (int): maximum number of consecutive misses before a track is deleted
+        min_hits (int): minimum number of hits before a track is reported
+        iou_threshold (float): IoU threshold for association
+        delta_t (int): delta_t of previous observation
+        inertia (float): weight (vdc_weight) of the angle_diff_cost in association
+        vertical_ratio (float): w/h, the vertical ratio of the bbox to filter
+            bad results. If set <= 0 means no need to filter bboxes, usually set
+            1.6 for pedestrian tracking.
+        min_box_area (int): min box area to filter out low quality boxes
+        use_byte (bool): whether to use BYTE association, default False
+        use_angle_cost (bool): whether to use the angle cost, default False
+    """
+
+    def __init__(self,
+                 det_thresh=0.6,
+                 max_age=30,
+                 min_hits=3,
+                 iou_threshold=0.3,
+                 delta_t=3,
+                 inertia=0.2,
+                 vertical_ratio=-1,
+                 min_box_area=0,
+                 use_byte=False,
+                 use_angle_cost=False):
+        self.det_thresh = det_thresh
+        self.max_age = max_age
+        self.min_hits = min_hits
+        self.iou_threshold = iou_threshold
+        self.delta_t = delta_t
+        self.inertia = inertia
+        self.vertical_ratio = vertical_ratio
+        self.min_box_area = min_box_area
+        self.use_byte = use_byte
+        self.use_angle_cost = use_angle_cost
+
+        self.trackers = []
+        self.frame_count = 0
+        KalmanBoxTracker.count = 0
+
+    def update(self, pred_dets, pred_embs=None):
+        """
+        Args:
+            pred_dets (np.array): Detection results of the image, the shape is
+                [N, 6], means 'cls_id, score, x0, y0, x1, y1'.
+            pred_embs (np.array): Embedding results of the image, the shape is
+                [N, 128] or [N, 512], default as None.
+
+        Return:
+            tracking boxes (np.array): [M, 6], means 'x0, y0, x1, y1, score, id'.
+        """
+        if pred_dets is None:
+            return np.empty((0, 6))
+
+        self.frame_count += 1
+
+        bboxes = pred_dets[:, 2:]
+        scores = pred_dets[:, 1:2]
+        dets = np.concatenate((bboxes, scores), axis=1)
+        scores = scores.squeeze(-1)
+
+        inds_low = scores > 0.1
+        inds_high = scores < self.det_thresh
+        inds_second = np.logical_and(inds_low, inds_high)
+        # self.det_thresh > score > 0.1, for second matching
+        dets_second = dets[inds_second]  # detections for second matching
+        remain_inds = scores > self.det_thresh
+        dets = dets[remain_inds]
+
+        # get predicted locations from existing trackers.
+        trks = np.zeros((len(self.trackers), 5))
+        to_del = []
+        ret = []
+        for t, trk in enumerate(trks):
+            pos = self.trackers[t].predict()[0]
+            trk[:] = [pos[0], pos[1], pos[2], pos[3], 0]
+            if np.any(np.isnan(pos)):
+                to_del.append(t)
+        trks = np.ma.compress_rows(np.ma.masked_invalid(trks))
+        for t in reversed(to_del):
+            self.trackers.pop(t)
+
+        if self.use_angle_cost:
+            velocities = np.array([
+                trk.velocity if trk.velocity is not None else np.array((0, 0))
+                for trk in self.trackers
+            ])
+            k_observations = np.array([
+                k_previous_obs(trk.observations, trk.age, self.delta_t)
+                for trk in self.trackers
+            ])
+
+        last_boxes = np.array([trk.last_observation for trk in self.trackers])
+        """
+        First round of association
+        """
+        if self.use_angle_cost:
+            matched, unmatched_dets, unmatched_trks = associate(
+                dets, trks, self.iou_threshold, velocities, k_observations,
+                self.inertia)
+        else:
+            matched, unmatched_dets, unmatched_trks = associate_only_iou(
+                dets, trks, self.iou_threshold)
+
+        for m in matched:
+            self.trackers[m[1]].update(
+                dets[m[0], :], angle_cost=self.use_angle_cost)
+        """
+        Second round of association by OCR
+        """
+        # BYTE association
+        if self.use_byte and len(dets_second) > 0 and unmatched_trks.shape[
+                0] > 0:
+            u_trks = trks[unmatched_trks]
+            iou_left = iou_batch(
+                dets_second,
+                u_trks)  # iou between low score detections and unmatched tracks
+            iou_left = np.array(iou_left)
+            if iou_left.max() > self.iou_threshold:
+                """
+                NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, you may
+                get a higher performance especially on MOT17/MOT20 datasets. But we keep it
+                uniform here for simplicity.
+                """
+                matched_indices = linear_assignment(-iou_left)
+                to_remove_trk_indices = []
+                for m in matched_indices:
+                    det_ind, trk_ind = m[0], unmatched_trks[m[1]]
+                    if iou_left[m[0], m[1]] < self.iou_threshold:
+                        continue
+                    self.trackers[trk_ind].update(
+                        dets_second[det_ind, :], angle_cost=self.use_angle_cost)
+                    to_remove_trk_indices.append(trk_ind)
+                unmatched_trks = np.setdiff1d(unmatched_trks,
+                                              np.array(to_remove_trk_indices))
+
+        if unmatched_dets.shape[0] > 0 and unmatched_trks.shape[0] > 0:
+            left_dets = dets[unmatched_dets]
+            left_trks = last_boxes[unmatched_trks]
+            iou_left = iou_batch(left_dets, left_trks)
+            iou_left = np.array(iou_left)
+            if iou_left.max() > self.iou_threshold:
+                """
+                NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, you may
+                get a higher performance especially on MOT17/MOT20 datasets.
+                But we keep it uniform here for simplicity.
+                """
+                rematched_indices = linear_assignment(-iou_left)
+                to_remove_det_indices = []
+                to_remove_trk_indices = []
+                for m in rematched_indices:
+                    det_ind, trk_ind = unmatched_dets[m[0]], unmatched_trks[m[
+                        1]]
+                    if iou_left[m[0], m[1]] < self.iou_threshold:
+                        continue
+                    self.trackers[trk_ind].update(
+                        dets[det_ind, :], angle_cost=self.use_angle_cost)
+                    to_remove_det_indices.append(det_ind)
+                    to_remove_trk_indices.append(trk_ind)
+                unmatched_dets = np.setdiff1d(unmatched_dets,
+                                              np.array(to_remove_det_indices))
+                unmatched_trks = np.setdiff1d(unmatched_trks,
+                                              np.array(to_remove_trk_indices))
+
+        for m in unmatched_trks:
+            self.trackers[m].update(None)
+
+        # create and initialise new trackers for unmatched detections
+        for i in unmatched_dets:
+            trk = KalmanBoxTracker(dets[i, :], delta_t=self.delta_t)
+            self.trackers.append(trk)
+
+        i = len(self.trackers)
+        for trk in reversed(self.trackers):
+            if trk.last_observation.sum() < 0:
+                d = trk.get_state()[0]
+            else:
+                d = trk.last_observation  # tlbr + score
+            if (trk.time_since_update < 1) and (
+                    trk.hit_streak >= self.min_hits or
+                    self.frame_count <= self.min_hits):
+                # +1 as MOT benchmark requires positive
+                ret.append(np.concatenate((d, [trk.id + 1])).reshape(1, -1))
+            i -= 1
+            # remove dead tracklet
+            if trk.time_since_update > self.max_age:
+                self.trackers.pop(i)
+        if len(ret) > 0:
+            return np.concatenate(ret)
+        return np.empty((0, 6))
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/utils.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..b5da3c79f3844e383a4f408f9adc9ede5a5efc5a
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/utils.py
@@ -0,0 +1,436 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import time
+import numpy as np
+import collections
+import math
+
+__all__ = [
+    'MOTTimer', 'Detection', 'write_mot_results', 'load_det_results',
+    'preprocess_reid', 'get_crops', 'clip_box', 'scale_coords',
+    'flow_statistic', 'update_object_info'
+]
+
+
+class MOTTimer(object):
+    """
+    This class is used to compute and print the current FPS during evaluation.
+    """
+
+    def __init__(self, window_size=20):
+        self.start_time = 0.
+        self.diff = 0.
+        self.duration = 0.
+        self.deque = collections.deque(maxlen=window_size)
+
+    def tic(self):
+        # use time.time instead of time.clock because time.clock
+        # does not normalize for multithreading
+        self.start_time = time.time()
+
+    def toc(self, average=True):
+        self.diff = time.time() - self.start_time
+        self.deque.append(self.diff)
+        if average:
+            self.duration = np.mean(self.deque)
+        else:
+            self.duration = np.sum(self.deque)
+        return self.duration
+
+    def clear(self):
+        self.start_time = 0.
+        self.diff = 0.
+        self.duration = 0.
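+# A minimal usage sketch of MOTTimer (illustrative comment only; the JDE/SDE
+# predict_video loops in this PR consume it the same way):
+#
+#     timer = MOTTimer()
+#     timer.tic()
+#     # ... detect and track one frame ...
+#     timer.toc()
+#     fps = 1. / timer.duration  # averaged over the last `window_size` frames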
+
+
+class Detection(object):
+    """
+    This class represents a bounding box detection in a single image.
+
+    Args:
+        tlwh (Tensor): Bounding box in format `(top left x, top left y,
+            width, height)`.
+        score (Tensor): Bounding box confidence score.
+        feature (Tensor): A feature vector that describes the object
+            contained in this image.
+        cls_id (Tensor): Bounding box category id.
+    """
+
+    def __init__(self, tlwh, score, feature, cls_id):
+        self.tlwh = np.asarray(tlwh, dtype=np.float32)
+        self.score = float(score)
+        self.feature = np.asarray(feature, dtype=np.float32)
+        self.cls_id = int(cls_id)
+
+    def to_tlbr(self):
+        """
+        Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
+        `(top left, bottom right)`.
+        """
+        ret = self.tlwh.copy()
+        ret[2:] += ret[:2]
+        return ret
+
+    def to_xyah(self):
+        """
+        Convert bounding box to format `(center x, center y, aspect ratio,
+        height)`, where the aspect ratio is `width / height`.
+        """
+        ret = self.tlwh.copy()
+        ret[:2] += ret[2:] / 2
+        ret[2] /= ret[3]
+        return ret
+
+
+def write_mot_results(filename, results, data_type='mot', num_classes=1):
+    # support single and multi classes
+    if data_type in ['mot', 'mcmot']:
+        save_format = '{frame},{id},{x1},{y1},{w},{h},{score},{cls_id},-1,-1\n'
+    elif data_type == 'kitti':
+        save_format = '{frame} {id} car 0 0 -10 {x1} {y1} {x2} {y2} -10 -10 -10 -1000 -1000 -1000 -10\n'
+    else:
+        raise ValueError(data_type)
+
+    f = open(filename, 'w')
+    for cls_id in range(num_classes):
+        for frame_id, tlwhs, tscores, track_ids in results[cls_id]:
+            if data_type == 'kitti':
+                frame_id -= 1
+            for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
+                if track_id < 0: continue
+                if data_type == 'mot':
+                    cls_id = -1
+
+                x1, y1, w, h = tlwh
+                x2, y2 = x1 + w, y1 + h
+                line = save_format.format(
+                    frame=frame_id,
+                    id=track_id,
+                    x1=x1,
+                    y1=y1,
+                    x2=x2,
+                    y2=y2,
+                    w=w,
+                    h=h,
+                    score=score,
+                    cls_id=cls_id)
+                f.write(line)
+    f.close()
+    print('MOT results saved to {}'.format(filename))
+
+
+def load_det_results(det_file, num_frames):
+    assert os.path.exists(det_file) and os.path.isfile(det_file), \
+        '{} does not exist or is not a file.'.format(det_file)
+    labels = np.loadtxt(det_file, dtype='float32', delimiter=',')
+    assert labels.shape[1] == 7, \
+        "Each line of {} should have 7 items: '[frame_id],[x0],[y0],[w],[h],[score],[class_id]'.".format(det_file)
+    results_list = []
+    for frame_i in range(num_frames):
+        results = {'bbox': [], 'score': [], 'cls_id': []}
+        labels_with_frame = labels[labels[:, 0] == frame_i + 1]
+        # each line of labels_with_frame:
+        # [frame_id],[x0],[y0],[w],[h],[score],[class_id]
+        for l in labels_with_frame:
+            results['bbox'].append(l[1:5])
+            results['score'].append(l[5:6])
+            results['cls_id'].append(l[6:7])
+        results_list.append(results)
+    return results_list
+
+
+def scale_coords(coords, input_shape, im_shape, scale_factor):
+    # Note: ratio has only one value, scale_factor[0] == scale_factor[1]
+    #
+    # This function is only used for JDE YOLOv3 or other detectors with
+    # LetterBoxResize and JDEBBoxPostProcess, whose output coords have not
+    # been scaled back to the original image.
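+    # LetterBoxResize pads the resized image symmetrically to input_shape,
+    # so the inverse transform subtracts the per-side padding and then
+    # divides by the resize ratio to recover original-image coordinates.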
+ + ratio = scale_factor[0] + pad_w = (input_shape[1] - int(im_shape[1])) / 2 + pad_h = (input_shape[0] - int(im_shape[0])) / 2 + coords[:, 0::2] -= pad_w + coords[:, 1::2] -= pad_h + coords[:, 0:4] /= ratio + coords[:, :4] = np.clip(coords[:, :4], a_min=0, a_max=coords[:, :4].max()) + return coords.round() + + +def clip_box(xyxy, ori_image_shape): + H, W = ori_image_shape + xyxy[:, 0::2] = np.clip(xyxy[:, 0::2], a_min=0, a_max=W) + xyxy[:, 1::2] = np.clip(xyxy[:, 1::2], a_min=0, a_max=H) + w = xyxy[:, 2:3] - xyxy[:, 0:1] + h = xyxy[:, 3:4] - xyxy[:, 1:2] + mask = np.logical_and(h > 0, w > 0) + keep_idx = np.nonzero(mask) + return xyxy[keep_idx[0]], keep_idx + + +def get_crops(xyxy, ori_img, w, h): + crops = [] + xyxy = xyxy.astype(np.int64) + ori_img = ori_img.transpose(1, 0, 2) # [h,w,3]->[w,h,3] + for i, bbox in enumerate(xyxy): + crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :] + crops.append(crop) + crops = preprocess_reid(crops, w, h) + return crops + + +def preprocess_reid(imgs, + w=64, + h=192, + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]): + im_batch = [] + for img in imgs: + img = cv2.resize(img, (w, h)) + img = img[:, :, ::-1].astype('float32').transpose((2, 0, 1)) / 255 + img_mean = np.array(mean).reshape((3, 1, 1)) + img_std = np.array(std).reshape((3, 1, 1)) + img -= img_mean + img /= img_std + img = np.expand_dims(img, axis=0) + im_batch.append(img) + im_batch = np.concatenate(im_batch, 0) + return im_batch + + +def flow_statistic(result, + secs_interval, + do_entrance_counting, + do_break_in_counting, + region_type, + video_fps, + entrance, + id_set, + interval_id_set, + in_id_list, + out_id_list, + prev_center, + records, + data_type='mot', + ids2names=['pedestrian']): + # Count in/out number: + # Note that 'region_type' should be one of ['horizontal', 'vertical', 'custom'], + # 'horizontal' and 'vertical' means entrance is the center line as the entrance when do_entrance_counting, + # 'custom' means entrance is a region defined by users when do_break_in_counting. + + if do_entrance_counting: + assert region_type in [ + 'horizontal', 'vertical' + ], "region_type should be 'horizontal' or 'vertical' when do entrance counting." + entrance_x, entrance_y = entrance[0], entrance[1] + frame_id, tlwhs, tscores, track_ids = result + for tlwh, score, track_id in zip(tlwhs, tscores, track_ids): + if track_id < 0: continue + if data_type == 'kitti': + frame_id -= 1 + x1, y1, w, h = tlwh + center_x = x1 + w / 2. + center_y = y1 + h / 2. + if track_id in prev_center: + if region_type == 'horizontal': + # horizontal center line + if prev_center[track_id][1] <= entrance_y and \ + center_y > entrance_y: + in_id_list.append(track_id) + if prev_center[track_id][1] >= entrance_y and \ + center_y < entrance_y: + out_id_list.append(track_id) + else: + # vertical center line + if prev_center[track_id][0] <= entrance_x and \ + center_x > entrance_x: + in_id_list.append(track_id) + if prev_center[track_id][0] >= entrance_x and \ + center_x < entrance_x: + out_id_list.append(track_id) + prev_center[track_id][0] = center_x + prev_center[track_id][1] = center_y + else: + prev_center[track_id] = [center_x, center_y] + + if do_break_in_counting: + assert region_type in [ + 'custom' + ], "region_type should be 'custom' when do break_in counting." + assert len( + entrance + ) >= 4, "entrance should be at least 3 points and (w,h) of image when do break_in counting." 
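+        # The last element of `entrance` is the (w, h) image size; the
+        # preceding elements are the polygon vertices, split apart below.
+        # Pedestrians are anchored at the bottom-center of the box (their
+        # feet), other classes at the box center.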
+        im_w, im_h = entrance[-1][:]
+        entrance = np.array(entrance[:-1])
+
+        frame_id, tlwhs, tscores, track_ids = result
+        for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
+            if track_id < 0: continue
+            if data_type == 'kitti':
+                frame_id -= 1
+            x1, y1, w, h = tlwh
+            center_x = min(x1 + w / 2., im_w - 1)
+            if ids2names[0] == 'pedestrian':
+                center_y = min(y1 + h, im_h - 1)
+            else:
+                center_y = min(y1 + h / 2, im_h - 1)
+
+            # counting objects in region of the first frame
+            if frame_id == 1:
+                if in_quadrangle([center_x, center_y], entrance, im_h, im_w):
+                    in_id_list.append(-1)
+                else:
+                    prev_center[track_id] = [center_x, center_y]
+            else:
+                if track_id in prev_center:
+                    if not in_quadrangle(prev_center[track_id], entrance, im_h,
+                                         im_w) and in_quadrangle(
+                                             [center_x, center_y], entrance,
+                                             im_h, im_w):
+                        in_id_list.append(track_id)
+                    prev_center[track_id] = [center_x, center_y]
+                else:
+                    prev_center[track_id] = [center_x, center_y]
+
+    # Count total number, and the number within a user-set interval
+    frame_id, tlwhs, tscores, track_ids = result
+    for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
+        if track_id < 0: continue
+        id_set.add(track_id)
+        interval_id_set.add(track_id)
+
+    # Reset counting at the interval beginning
+    if frame_id % video_fps == 0 and frame_id / video_fps % secs_interval == 0:
+        curr_interval_count = len(interval_id_set)
+        interval_id_set.clear()
+    info = "Frame id: {}, Total count: {}".format(frame_id, len(id_set))
+    if do_entrance_counting:
+        info += ", In count: {}, Out count: {}".format(
+            len(in_id_list), len(out_id_list))
+    if do_break_in_counting:
+        info += ", Break_in count: {}".format(len(in_id_list))
+    if frame_id % video_fps == 0 and frame_id / video_fps % secs_interval == 0:
+        info += ", Count during {} secs: {}".format(secs_interval,
+                                                    curr_interval_count)
+    # print(info)
+    info += "\n"
+    records.append(info)
+
+    return {
+        "id_set": id_set,
+        "interval_id_set": interval_id_set,
+        "in_id_list": in_id_list,
+        "out_id_list": out_id_list,
+        "prev_center": prev_center,
+        "records": records,
+    }
+
+
+def distance(center_1, center_2):
+    return math.sqrt(
+        math.pow(center_1[0] - center_2[0], 2) + math.pow(center_1[1] -
+                                                          center_2[1], 2))
+
+
+# update vehicle parking info
+def update_object_info(object_in_region_info,
+                       result,
+                       region_type,
+                       entrance,
+                       fps,
+                       illegal_parking_time,
+                       distance_threshold_frame=3,
+                       distance_threshold_interval=50):
+    '''
+    For consecutive frames, if the distance between centers is smaller than
+    distance_threshold_frame, the object is regarded as parked.
+    For parking in general, the accumulated moving distance should stay
+    smaller than distance_threshold_interval.
+    The moving distance of the vehicle is scaled according to y, i.e. it is
+    inversely proportional to y.
+    '''
+
+    assert region_type in [
+        'custom'
+    ], "region_type should be 'custom' when do break_in counting."
+    assert len(
+        entrance
+    ) >= 4, "entrance should be at least 3 points and (w,h) of image when do break_in counting."
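+    # As in flow_statistic, the trailing element of `entrance` is the image
+    # size (w, h) and the rest are the region polygon vertices.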
+
+    frame_id, tlwhs, tscores, track_ids = result  # result from mot
+
+    im_w, im_h = entrance[-1][:]
+    entrance = np.array(entrance[:-1])
+
+    illegal_parking_dict = {}
+    for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
+        if track_id < 0: continue
+
+        x1, y1, w, h = tlwh
+        center_x = min(x1 + w / 2., im_w - 1)
+        center_y = min(y1 + h / 2, im_h - 1)
+
+        if not in_quadrangle([center_x, center_y], entrance, im_h, im_w):
+            continue
+
+        current_center = (center_x, center_y)
+        if track_id not in object_in_region_info.keys(
+        ):  # first time appear in region
+            object_in_region_info[track_id] = {}
+            object_in_region_info[track_id]["start_frame"] = frame_id
+            object_in_region_info[track_id]["end_frame"] = frame_id
+            object_in_region_info[track_id]["prev_center"] = current_center
+            object_in_region_info[track_id]["start_center"] = current_center
+        else:
+            prev_center = object_in_region_info[track_id]["prev_center"]
+
+            dis = distance(current_center, prev_center)
+            scaled_dis = 200 * dis / (
+                current_center[1] + 1)  # scale distance according to y
+            dis = scaled_dis
+
+            if dis < distance_threshold_frame:  # not move
+                object_in_region_info[track_id]["end_frame"] = frame_id
+                object_in_region_info[track_id]["prev_center"] = current_center
+            else:  # move
+                object_in_region_info[track_id]["start_frame"] = frame_id
+                object_in_region_info[track_id]["end_frame"] = frame_id
+                object_in_region_info[track_id]["prev_center"] = current_center
+                object_in_region_info[track_id]["start_center"] = current_center
+
+        # whether current object is parking
+        distance_from_start = distance(
+            object_in_region_info[track_id]["start_center"], current_center)
+        if distance_from_start > distance_threshold_interval:
+            # moved
+            object_in_region_info[track_id]["start_frame"] = frame_id
+            object_in_region_info[track_id]["end_frame"] = frame_id
+            object_in_region_info[track_id]["prev_center"] = current_center
+            object_in_region_info[track_id]["start_center"] = current_center
+            continue
+
+        if (object_in_region_info[track_id]["end_frame"] -
+                object_in_region_info[track_id]["start_frame"]) / fps \
+                >= illegal_parking_time \
+                and distance_from_start < distance_threshold_interval:
+            illegal_parking_dict[track_id] = {'bbox': [x1, y1, w, h]}
+
+    return object_in_region_info, illegal_parking_dict
+
+
+def in_quadrangle(point, entrance, im_h, im_w):
+    mask = np.zeros((im_h, im_w, 1), np.uint8)
+    cv2.fillPoly(mask, [entrance], 255)
+    p = tuple(map(int, point))
+    if mask[p[1], p[0], :] > 0:
+        return True
+    else:
+        return False
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot/visualize.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/visualize.py
new file mode 100644
index 0000000000000000000000000000000000000000..205f4aac2312e1c7238cfd16490e416b2e5dfe8f
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot/visualize.py
@@ -0,0 +1,375 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
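+
+# Visualization helpers for MOT deployment: overlays of detection boxes,
+# track ids and scores, center trajectories, and entrance/break-in counting
+# regions (see plot_tracking and plot_tracking_dict below).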
+
+from __future__ import division
+
+import os
+import cv2
+import numpy as np
+from PIL import Image, ImageDraw, ImageFile
+ImageFile.LOAD_TRUNCATED_IMAGES = True
+from collections import deque
+
+
+def visualize_box_mask(im, results, labels, threshold=0.5):
+    """
+    Args:
+        im (str/np.ndarray): path of image/np.ndarray read by cv2
+        results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
+            matrix element: [class, score, x_min, y_min, x_max, y_max]
+        labels (list): labels: ['class1', ..., 'classn']
+        threshold (float): Threshold of score.
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    if isinstance(im, str):
+        im = Image.open(im).convert('RGB')
+    else:
+        im = Image.fromarray(im)
+    if 'boxes' in results and len(results['boxes']) > 0:
+        im = draw_box(im, results['boxes'], labels, threshold=threshold)
+    return im
+
+
+def get_color_map_list(num_classes):
+    """
+    Args:
+        num_classes (int): number of classes
+    Returns:
+        color_map (list): RGB color list
+    """
+    color_map = num_classes * [0, 0, 0]
+    for i in range(0, num_classes):
+        j = 0
+        lab = i
+        while lab:
+            color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+            color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+            color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+            j += 1
+            lab >>= 3
+    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
+    return color_map
+
+
+def draw_box(im, np_boxes, labels, threshold=0.5):
+    """
+    Args:
+        im (PIL.Image.Image): PIL image
+        np_boxes (np.ndarray): shape:[N,6], N: number of box,
+            matrix element: [class, score, x_min, y_min, x_max, y_max]
+        labels (list): labels: ['class1', ..., 'classn']
+        threshold (float): score threshold for drawing a box
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    draw_thickness = min(im.size) // 320
+    draw = ImageDraw.Draw(im)
+    clsid2color = {}
+    color_list = get_color_map_list(len(labels))
+    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
+    np_boxes = np_boxes[expect_boxes, :]
+
+    for dt in np_boxes:
+        clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
+        if clsid not in clsid2color:
+            clsid2color[clsid] = color_list[clsid]
+        color = tuple(clsid2color[clsid])
+
+        if len(bbox) == 4:
+            xmin, ymin, xmax, ymax = bbox
+            print('class_id:{:d}, confidence:{:.4f}, left_top:[{:.2f},{:.2f}],'
+                  'right_bottom:[{:.2f},{:.2f}]'.format(
+                      int(clsid), score, xmin, ymin, xmax, ymax))
+            # draw bbox
+            draw.line(
+                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+                 (xmin, ymin)],
+                width=draw_thickness,
+                fill=color)
+        elif len(bbox) == 8:
+            x1, y1, x2, y2, x3, y3, x4, y4 = bbox
+            draw.line(
+                [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
+                width=2,
+                fill=color)
+            xmin = min(x1, x2, x3, x4)
+            ymin = min(y1, y2, y3, y4)
+
+        # draw label
+        text = "{} {:.4f}".format(labels[clsid], score)
+        tw, th = draw.textsize(text)
+        draw.rectangle(
+            [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
+        draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
+    return im
+
+
+def get_color(idx):
+    idx = idx * 3
+    color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
+    return color
+
+
+def plot_tracking(image,
+                  tlwhs,
+                  obj_ids,
+                  scores=None,
+                  frame_id=0,
+                  fps=0.,
+                  ids2names=[],
+                  do_entrance_counting=False,
+                  entrance=None):
+    im = np.ascontiguousarray(np.copy(image))
+    im_h, im_w = im.shape[:2]
+
+    text_scale = max(0.5, image.shape[1] / 3000.)
+ text_thickness = 2 + line_thickness = max(1, int(image.shape[1] / 500.)) + + cv2.putText( + im, + 'frame: %d fps: %.2f num: %d' % (frame_id, fps, len(tlwhs)), + (0, int(15 * text_scale) + 5), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + for i, tlwh in enumerate(tlwhs): + x1, y1, w, h = tlwh + intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h))) + obj_id = int(obj_ids[i]) + id_text = 'ID: {}'.format(int(obj_id)) + if ids2names != []: + assert len( + ids2names) == 1, "plot_tracking only supports single classes." + id_text = 'ID: {}_'.format(ids2names[0]) + id_text + _line_thickness = 1 if obj_id <= 0 else line_thickness + color = get_color(abs(obj_id)) + cv2.rectangle( + im, intbox[0:2], intbox[2:4], color=color, thickness=line_thickness) + cv2.putText( + im, + id_text, (intbox[0], intbox[1] - 25), + cv2.FONT_ITALIC, + text_scale, (0, 255, 255), + thickness=text_thickness) + + if scores is not None: + text = 'score: {:.2f}'.format(float(scores[i])) + cv2.putText( + im, + text, (intbox[0], intbox[1] - 6), + cv2.FONT_ITALIC, + text_scale, (0, 255, 0), + thickness=text_thickness) + if do_entrance_counting: + entrance_line = tuple(map(int, entrance)) + cv2.rectangle( + im, + entrance_line[0:2], + entrance_line[2:4], + color=(0, 255, 255), + thickness=line_thickness) + return im + + +def plot_tracking_dict(image, + num_classes, + tlwhs_dict, + obj_ids_dict, + scores_dict, + frame_id=0, + fps=0., + ids2names=[], + do_entrance_counting=False, + do_break_in_counting=False, + do_illegal_parking_recognition=False, + illegal_parking_dict=None, + entrance=None, + records=None, + center_traj=None): + im = np.ascontiguousarray(np.copy(image)) + im_h, im_w = im.shape[:2] + if do_break_in_counting or do_illegal_parking_recognition: + entrance = np.array(entrance[:-1]) # last pair is [im_w, im_h] + + text_scale = max(0.5, image.shape[1] / 3000.) 
+ text_thickness = 2 + line_thickness = max(1, int(image.shape[1] / 500.)) + + if num_classes == 1: + if records is not None: + start = records[-1].find('Total') + end = records[-1].find('In') + cv2.putText( + im, + records[-1][start:end], (0, int(40 * text_scale) + 10), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + + if num_classes == 1 and do_entrance_counting: + entrance_line = tuple(map(int, entrance)) + cv2.rectangle( + im, + entrance_line[0:2], + entrance_line[2:4], + color=(0, 255, 255), + thickness=line_thickness) + # find start location for entrance counting data + start = records[-1].find('In') + cv2.putText( + im, + records[-1][start:-1], (0, int(60 * text_scale) + 10), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + + if num_classes == 1 and (do_break_in_counting or + do_illegal_parking_recognition): + np_masks = np.zeros((im_h, im_w, 1), np.uint8) + cv2.fillPoly(np_masks, [entrance], 255) + + # Draw region mask + alpha = 0.3 + im = np.array(im).astype('float32') + mask = np_masks[:, :, 0] + color_mask = [0, 0, 255] + idx = np.nonzero(mask) + color_mask = np.array(color_mask) + im[idx[0], idx[1], :] *= 1.0 - alpha + im[idx[0], idx[1], :] += alpha * color_mask + im = np.array(im).astype('uint8') + + if do_break_in_counting: + # find start location for break in counting data + start = records[-1].find('Break_in') + cv2.putText( + im, + records[-1][start:-1], + (entrance[0][0] - 10, entrance[0][1] - 10), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + + if illegal_parking_dict is not None and len(illegal_parking_dict) != 0: + for key, value in illegal_parking_dict.items(): + x1, y1, w, h = value['bbox'] + plate = value['plate'] + if plate is None: + plate = "" + + # red box + cv2.rectangle(im, (int(x1), int(y1)), + (int(x1 + w), int(y1 + h)), (0, 0, 255), 2) + + cv2.putText( + im, + "illegal_parking:" + plate, + (int(x1) + 5, int(16 * text_scale + y1 + 15)), + cv2.FONT_ITALIC, + text_scale * 1.5, (0, 0, 255), + thickness=text_thickness) + + for cls_id in range(num_classes): + tlwhs = tlwhs_dict[cls_id] + obj_ids = obj_ids_dict[cls_id] + scores = scores_dict[cls_id] + cv2.putText( + im, + 'frame: %d fps: %.2f num: %d' % (frame_id, fps, len(tlwhs)), + (0, int(15 * text_scale) + 5), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + + record_id = set() + for i, tlwh in enumerate(tlwhs): + x1, y1, w, h = tlwh + intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h))) + center = tuple(map(int, (x1 + w / 2., y1 + h / 2.))) + obj_id = int(obj_ids[i]) + if center_traj is not None: + record_id.add(obj_id) + if obj_id not in center_traj[cls_id]: + center_traj[cls_id][obj_id] = deque(maxlen=30) + center_traj[cls_id][obj_id].append(center) + + id_text = '{}'.format(int(obj_id)) + if ids2names != []: + id_text = '{}_{}'.format(ids2names[cls_id], id_text) + else: + id_text = 'class{}_{}'.format(cls_id, id_text) + + _line_thickness = 1 if obj_id <= 0 else line_thickness + + in_region = False + if do_break_in_counting: + center_x = min(x1 + w / 2., im_w - 1) + center_down_y = min(y1 + h, im_h - 1) + if in_quadrangle([center_x, center_down_y], entrance, im_h, + im_w): + in_region = True + + color = get_color(abs(obj_id)) if in_region == False else (0, 0, + 255) + cv2.rectangle( + im, + intbox[0:2], + intbox[2:4], + color=color, + thickness=line_thickness) + cv2.putText( + im, + id_text, (intbox[0], intbox[1] - 25), + cv2.FONT_ITALIC, + text_scale, + color, + thickness=text_thickness) + + if 
do_break_in_counting and in_region: + cv2.putText( + im, + 'Break in now.', (intbox[0], intbox[1] - 50), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + + if scores is not None: + text = 'score: {:.2f}'.format(float(scores[i])) + cv2.putText( + im, + text, (intbox[0], intbox[1] - 6), + cv2.FONT_ITALIC, + text_scale, + color, + thickness=text_thickness) + if center_traj is not None: + for traj in center_traj: + for i in traj.keys(): + if i not in record_id: + continue + for point in traj[i]: + cv2.circle(im, point, 3, (0, 0, 255), -1) + return im + + +def in_quadrangle(point, entrance, im_h, im_w): + mask = np.zeros((im_h, im_w, 1), np.uint8) + cv2.fillPoly(mask, [entrance], 255) + p = tuple(map(int, point)) + if mask[p[1], p[0], :] > 0: + return True + else: + return False diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot_jde_infer.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot_jde_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..e3a9958f7a79b568d0cb1ba9c649aa0499b21e0b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot_jde_infer.py @@ -0,0 +1,508 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
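+# JDE/FairMOT deployment inference: one exported model predicts detection
+# boxes and ReID embeddings in a single pass, which JDETracker then
+# associates across frames.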
+ +import os +import time +import yaml +import cv2 +import numpy as np +from collections import defaultdict +import paddle + +from benchmark_utils import PaddleInferBenchmark +from preprocess import decode_image +from mot_utils import argsparser, Timer, get_current_memory_mb +from det_infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig + +# add python path +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from mot import JDETracker +from mot.utils import MOTTimer, write_mot_results, flow_statistic +from mot.visualize import plot_tracking, plot_tracking_dict + +# Global dictionary +MOT_JDE_SUPPORT_MODELS = { + 'JDE', + 'FairMOT', +} + + +class JDE_Detector(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + output_dir (string): The path of output, default as 'output' + threshold (float): Score threshold of the detected bbox, default as 0.5 + save_images (bool): Whether to save visualization image results, default as False + save_mot_txts (bool): Whether to save tracking results (txt), default as False + draw_center_traj (bool): Whether drawing the trajectory of center, default as False + secs_interval (int): The seconds interval to count after tracking, default as 10 + skip_frame_num (int): Skip frame num to get faster MOT results, default as -1 + do_entrance_counting(bool): Whether counting the numbers of identifiers entering + or getting out from the entrance, default as False,only support single class + counting in MOT. + do_break_in_counting(bool): Whether counting the numbers of identifiers break in + the area, default as False,only support single class counting in MOT, + and the video should be taken by a static camera. + region_type (str): Area type for entrance counting or break in counting, 'horizontal' + and 'vertical' used when do entrance counting. 'custom' used when do break in counting. + Note that only support single-class MOT, and the video should be taken by a static camera. + region_polygon (list): Clockwise point coords (x0,y0,x1,y1...) of polygon of area when + do_break_in_counting. Note that only support single-class MOT and + the video should be taken by a static camera. 
+ """ + + def __init__(self, + model_dir, + tracker_config=None, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1088, + trt_opt_shape=608, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + save_images=False, + save_mot_txts=False, + draw_center_traj=False, + secs_interval=10, + skip_frame_num=-1, + do_entrance_counting=False, + do_break_in_counting=False, + region_type='horizontal', + region_polygon=[]): + super(JDE_Detector, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, ) + self.save_images = save_images + self.save_mot_txts = save_mot_txts + self.draw_center_traj = draw_center_traj + self.secs_interval = secs_interval + self.skip_frame_num = skip_frame_num + self.do_entrance_counting = do_entrance_counting + self.do_break_in_counting = do_break_in_counting + self.region_type = region_type + self.region_polygon = region_polygon + if self.region_type == 'custom': + assert len( + self.region_polygon + ) > 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.' + + assert batch_size == 1, "MOT model only supports batch_size=1." + self.det_times = Timer(with_tracker=True) + self.num_classes = len(self.pred_config.labels) + if self.skip_frame_num > 1: + self.previous_det_result = None + + # tracker config + assert self.pred_config.tracker, "The exported JDE Detector model should have tracker." + cfg = self.pred_config.tracker + min_box_area = cfg.get('min_box_area', 0.0) + vertical_ratio = cfg.get('vertical_ratio', 0.0) + conf_thres = cfg.get('conf_thres', 0.0) + tracked_thresh = cfg.get('tracked_thresh', 0.7) + metric_type = cfg.get('metric_type', 'euclidean') + + self.tracker = JDETracker( + num_classes=self.num_classes, + min_box_area=min_box_area, + vertical_ratio=vertical_ratio, + conf_thres=conf_thres, + tracked_thresh=tracked_thresh, + metric_type=metric_type) + + def postprocess(self, inputs, result): + # postprocess output of predictor + np_boxes = result['pred_dets'] + if np_boxes.shape[0] <= 0: + print('[WARNNING] No object detected.') + result = {'pred_dets': np.zeros([0, 6]), 'pred_embs': None} + result = {k: v for k, v in result.items() if v is not None} + return result + + def tracking(self, det_results): + pred_dets = det_results['pred_dets'] # cls_id, score, x0, y0, x1, y1 + pred_embs = det_results['pred_embs'] + online_targets_dict = self.tracker.update(pred_dets, pred_embs) + + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + for cls_id in range(self.num_classes): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: continue + if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > self.tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + return online_tlwhs, online_scores, online_ids + + def predict(self, repeats=1): + ''' + Args: + repeats (int): repeats number for prediction + Returns: + result (dict): include 'pred_dets': np.ndarray: shape:[N,6], N: number of box, + matix element:[class, 
score, x_min, y_min, x_max, y_max] + FairMOT(JDE)'s result include 'pred_embs': np.ndarray: + shape: [N, 128] + ''' + # model prediction + np_pred_dets, np_pred_embs = None, None + for i in range(repeats): + self.predictor.run() + output_names = self.predictor.get_output_names() + boxes_tensor = self.predictor.get_output_handle(output_names[0]) + np_pred_dets = boxes_tensor.copy_to_cpu() + embs_tensor = self.predictor.get_output_handle(output_names[1]) + np_pred_embs = embs_tensor.copy_to_cpu() + + result = dict(pred_dets=np_pred_dets, pred_embs=np_pred_embs) + return result + + def predict_image(self, + image_list, + run_benchmark=False, + repeats=1, + visual=True, + seq_name=None, + reuse_det_result=False): + mot_results = [] + num_classes = self.num_classes + image_list.sort() + ids2names = self.pred_config.labels + data_type = 'mcmot' if num_classes > 1 else 'mot' + for frame_id, img_file in enumerate(image_list): + batch_image_list = [img_file] # bs=1 in MOT model + if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result_warmup = self.predict(repeats=repeats) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + det_result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + + # tracking + result_warmup = self.tracking(det_result) + self.det_times.tracking_time_s.start() + online_tlwhs, online_scores, online_ids = self.tracking( + det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + + else: + self.det_times.preprocess_time_s.start() + if not reuse_det_result: + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + self.det_times.inference_time_s.start() + if not reuse_det_result: + result = self.predict() + self.det_times.inference_time_s.end() + + self.det_times.postprocess_time_s.start() + if not reuse_det_result: + det_result = self.postprocess(inputs, result) + self.previous_det_result = det_result + else: + assert self.previous_det_result is not None + det_result = self.previous_det_result + self.det_times.postprocess_time_s.end() + + # tracking process + self.det_times.tracking_time_s.start() + online_tlwhs, online_scores, online_ids = self.tracking( + det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + if visual: + if len(image_list) > 1 and frame_id % 10 == 0: + print('Tracking frame {}'.format(frame_id)) + frame, _ = decode_image(img_file, {}) + + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + if seq_name is None: + seq_name = image_list[0].split('/')[-2] + save_dir = os.path.join(self.output_dir, seq_name) + if not os.path.exists(save_dir): + os.makedirs(save_dir) + cv2.imwrite( + os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im) + + mot_results.append([online_tlwhs, online_scores, online_ids]) + return mot_results + + def predict_video(self, video_file, camera_id): + video_out_name = 'mot_output.mp4' + if camera_id != -1: + capture = 
cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_out_name = os.path.split(video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name) + video_format = 'mp4v' + fourcc = cv2.VideoWriter_fourcc(*video_format) + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + + frame_id = 0 + timer = MOTTimer() + results = defaultdict(list) # support single class and multi classes + num_classes = self.num_classes + data_type = 'mcmot' if num_classes > 1 else 'mot' + ids2names = self.pred_config.labels + + center_traj = None + entrance = None + records = None + if self.draw_center_traj: + center_traj = [{} for i in range(num_classes)] + if num_classes == 1: + id_set = set() + interval_id_set = set() + in_id_list = list() + out_id_list = list() + prev_center = dict() + records = list() + if self.do_entrance_counting or self.do_break_in_counting: + if self.region_type == 'horizontal': + entrance = [0, height / 2., width, height / 2.] + elif self.region_type == 'vertical': + entrance = [width / 2, 0., width / 2, height] + elif self.region_type == 'custom': + entrance = [] + assert len( + self.region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting." + for i in range(0, len(self.region_polygon), 2): + entrance.append([ + self.region_polygon[i], self.region_polygon[i + 1] + ]) + entrance.append([width, height]) + else: + raise ValueError("region_type:{} is not supported.".format( + self.region_type)) + + video_fps = fps + + while (1): + ret, frame = capture.read() + if not ret: + break + if frame_id % 10 == 0: + print('Tracking frame: %d' % (frame_id)) + + timer.tic() + mot_skip_frame_num = self.skip_frame_num + reuse_det_result = False + if mot_skip_frame_num > 1 and frame_id > 0 and frame_id % mot_skip_frame_num > 0: + reuse_det_result = True + seq_name = video_out_name.split('.')[0] + mot_results = self.predict_image( + [frame], + visual=False, + seq_name=seq_name, + reuse_det_result=reuse_det_result) + timer.toc() + + online_tlwhs, online_scores, online_ids = mot_results[0] + for cls_id in range(num_classes): + results[cls_id].append( + (frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id], + online_ids[cls_id])) + + # NOTE: just implement flow statistic for single class + if num_classes == 1: + result = (frame_id + 1, online_tlwhs[0], online_scores[0], + online_ids[0]) + statistic = flow_statistic( + result, + self.secs_interval, + self.do_entrance_counting, + self.do_break_in_counting, + self.region_type, + video_fps, + entrance, + id_set, + interval_id_set, + in_id_list, + out_id_list, + prev_center, + records, + data_type, + ids2names=self.pred_config.labels) + records = statistic['records'] + + fps = 1. 
/ timer.duration + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names, + do_entrance_counting=self.do_entrance_counting, + entrance=entrance, + records=records, + center_traj=center_traj) + + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + frame_id += 1 + + if self.save_mot_txts: + result_filename = os.path.join( + self.output_dir, video_out_name.split('.')[-2] + '.txt') + + write_mot_results(result_filename, results, data_type, num_classes) + + if num_classes == 1: + result_filename = os.path.join( + self.output_dir, + video_out_name.split('.')[-2] + '_flow_statistic.txt') + f = open(result_filename, 'w') + for line in records: + f.write(line) + print('Flow statistic save in {}'.format(result_filename)) + f.close() + + writer.release() + + +def main(): + detector = JDE_Detector( + FLAGS.model_dir, + tracker_config=None, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=1, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + output_dir=FLAGS.output_dir, + threshold=FLAGS.threshold, + save_images=FLAGS.save_images, + save_mot_txts=FLAGS.save_mot_txts, + draw_center_traj=FLAGS.draw_center_traj, + secs_interval=FLAGS.secs_interval, + skip_frame_num=FLAGS.skip_frame_num, + do_entrance_counting=FLAGS.do_entrance_counting, + do_break_in_counting=FLAGS.do_break_in_counting, + region_type=FLAGS.region_type, + region_polygon=FLAGS.region_polygon) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + else: + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + model_dir = FLAGS.model_dir + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, model_info, name='MOT') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot_sde_infer.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot_sde_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..499ee2c2dfe2f8f8b1266601464b61a7fb8ae48d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot_sde_infer.py @@ -0,0 +1,952 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import time +import yaml +import cv2 +import re +import glob +import numpy as np +from collections import defaultdict +import paddle + +from benchmark_utils import PaddleInferBenchmark +from preprocess import decode_image + +# add python path +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..']))) +sys.path.insert(0, parent_path) + +from det_infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig, load_predictor +from mot_utils import argsparser, Timer, get_current_memory_mb, video2frames, _is_valid_video +from mot.tracker import JDETracker, DeepSORTTracker, OCSORTTracker, BOTSORTTracker +from mot.utils import MOTTimer, write_mot_results, get_crops, clip_box, flow_statistic +from mot.visualize import plot_tracking, plot_tracking_dict + +from mot.mtmct.utils import parse_bias +from mot.mtmct.postprocess import trajectory_fusion, sub_cluster, gen_res, print_mtmct_result +from mot.mtmct.postprocess import get_mtmct_matching_results, save_mtmct_crops, save_mtmct_vis_results + + +class SDE_Detector(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + tracker_config (str): tracker config path + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + output_dir (string): The path of output, default as 'output' + threshold (float): Score threshold of the detected bbox, default as 0.5 + save_images (bool): Whether to save visualization image results, default as False + save_mot_txts (bool): Whether to save tracking results (txt), default as False + draw_center_traj (bool): Whether drawing the trajectory of center, default as False + secs_interval (int): The seconds interval to count after tracking, default as 10 + skip_frame_num (int): Skip frame num to get faster MOT results, default as -1 + warmup_frame (int):Warmup frame num to test speed of MOT,default as 50 + do_entrance_counting(bool): Whether counting the numbers of identifiers entering + or getting out from the entrance, default as False,only support single class + counting in MOT, and the video should be taken by a static camera. + do_break_in_counting(bool): Whether counting the numbers of identifiers break in + the area, default as False,only support single class counting in MOT, + and the video should be taken by a static camera. + region_type (str): Area type for entrance counting or break in counting, 'horizontal' + and 'vertical' used when do entrance counting. 'custom' used when do break in counting. + Note that only support single-class MOT, and the video should be taken by a static camera. + region_polygon (list): Clockwise point coords (x0,y0,x1,y1...) of polygon of area when + do_break_in_counting. Note that only support single-class MOT and + the video should be taken by a static camera. 
+ reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT + mtmct_dir (str): MTMCT dir, default None, set for doing MTMCT + """ + + def __init__(self, + model_dir, + tracker_config, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + save_images=False, + save_mot_txts=False, + draw_center_traj=False, + secs_interval=10, + skip_frame_num=-1, + warmup_frame=50, + do_entrance_counting=False, + do_break_in_counting=False, + region_type='horizontal', + region_polygon=[], + reid_model_dir=None, + mtmct_dir=None): + super(SDE_Detector, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, ) + self.save_images = save_images + self.save_mot_txts = save_mot_txts + self.draw_center_traj = draw_center_traj + self.secs_interval = secs_interval + self.skip_frame_num = skip_frame_num + self.warmup_frame = warmup_frame + self.do_entrance_counting = do_entrance_counting + self.do_break_in_counting = do_break_in_counting + self.region_type = region_type + self.region_polygon = region_polygon + if self.region_type == 'custom': + assert len( + self.region_polygon + ) > 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.' + + assert batch_size == 1, "MOT model only supports batch_size=1." + self.det_times = Timer(with_tracker=True) + self.num_classes = len(self.pred_config.labels) + if self.skip_frame_num > 1: + self.previous_det_result = None + + # reid config + self.use_reid = False if reid_model_dir is None else True + if self.use_reid: + self.reid_pred_config = self.set_config(reid_model_dir) + self.reid_predictor, self.config = load_predictor( + reid_model_dir, + run_mode=run_mode, + batch_size=50, # reid_batch_size + min_subgraph_size=self.reid_pred_config.min_subgraph_size, + device=device, + use_dynamic_shape=self.reid_pred_config.use_dynamic_shape, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn) + else: + self.reid_pred_config = None + self.reid_predictor = None + + assert tracker_config is not None, 'Note that tracker_config should be set.' 
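+        # `tracker_config` is a YAML file keyed by tracker type; an
+        # illustrative minimal example (the keys mirror the cfg.get() reads
+        # below, values shown are just the defaults):
+        #
+        #     type: OCSORTTracker
+        #     OCSORTTracker:
+        #       det_thresh: 0.4
+        #       max_age: 30
+        #       min_hits: 3
+        #       iou_threshold: 0.3
+        #       use_byte: false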
+        self.tracker_config = tracker_config
+        tracker_cfg = yaml.safe_load(open(self.tracker_config))
+        cfg = tracker_cfg[tracker_cfg['type']]
+
+        # tracker config
+        self.use_deepsort_tracker = tracker_cfg['type'] == 'DeepSORTTracker'
+        self.use_ocsort_tracker = tracker_cfg['type'] == 'OCSORTTracker'
+        self.use_botsort_tracker = tracker_cfg['type'] == 'BOTSORTTracker'
+
+        if self.use_deepsort_tracker:
+            if self.reid_pred_config is not None and hasattr(
+                    self.reid_pred_config, 'tracker'):
+                cfg = self.reid_pred_config.tracker
+            budget = cfg.get('budget', 100)
+            max_age = cfg.get('max_age', 30)
+            max_iou_distance = cfg.get('max_iou_distance', 0.7)
+            matching_threshold = cfg.get('matching_threshold', 0.2)
+            min_box_area = cfg.get('min_box_area', 0)
+            vertical_ratio = cfg.get('vertical_ratio', 0)
+
+            self.tracker = DeepSORTTracker(
+                budget=budget,
+                max_age=max_age,
+                max_iou_distance=max_iou_distance,
+                matching_threshold=matching_threshold,
+                min_box_area=min_box_area,
+                vertical_ratio=vertical_ratio, )
+
+        elif self.use_ocsort_tracker:
+            det_thresh = cfg.get('det_thresh', 0.4)
+            max_age = cfg.get('max_age', 30)
+            min_hits = cfg.get('min_hits', 3)
+            iou_threshold = cfg.get('iou_threshold', 0.3)
+            delta_t = cfg.get('delta_t', 3)
+            inertia = cfg.get('inertia', 0.2)
+            min_box_area = cfg.get('min_box_area', 0)
+            vertical_ratio = cfg.get('vertical_ratio', 0)
+            use_byte = cfg.get('use_byte', False)
+            use_angle_cost = cfg.get('use_angle_cost', False)
+
+            self.tracker = OCSORTTracker(
+                det_thresh=det_thresh,
+                max_age=max_age,
+                min_hits=min_hits,
+                iou_threshold=iou_threshold,
+                delta_t=delta_t,
+                inertia=inertia,
+                min_box_area=min_box_area,
+                vertical_ratio=vertical_ratio,
+                use_byte=use_byte,
+                use_angle_cost=use_angle_cost)
+
+        elif self.use_botsort_tracker:
+            track_high_thresh = cfg.get('track_high_thresh', 0.3)
+            track_low_thresh = cfg.get('track_low_thresh', 0.2)
+            new_track_thresh = cfg.get('new_track_thresh', 0.4)
+            match_thresh = cfg.get('match_thresh', 0.7)
+            track_buffer = cfg.get('track_buffer', 30)
+            camera_motion = cfg.get('camera_motion', False)
+            cmc_method = cfg.get('cmc_method', 'sparseOptFlow')
+
+            self.tracker = BOTSORTTracker(
+                track_high_thresh=track_high_thresh,
+                track_low_thresh=track_low_thresh,
+                new_track_thresh=new_track_thresh,
+                match_thresh=match_thresh,
+                track_buffer=track_buffer,
+                camera_motion=camera_motion,
+                cmc_method=cmc_method)
+
+        else:
+            # use ByteTracker
+            use_byte = cfg.get('use_byte', False)
+            det_thresh = cfg.get('det_thresh', 0.3)
+            min_box_area = cfg.get('min_box_area', 0)
+            vertical_ratio = cfg.get('vertical_ratio', 0)
+            match_thres = cfg.get('match_thres', 0.9)
+            conf_thres = cfg.get('conf_thres', 0.6)
+            low_conf_thres = cfg.get('low_conf_thres', 0.1)
+
+            self.tracker = JDETracker(
+                use_byte=use_byte,
+                det_thresh=det_thresh,
+                num_classes=self.num_classes,
+                min_box_area=min_box_area,
+                vertical_ratio=vertical_ratio,
+                match_thres=match_thres,
+                conf_thres=conf_thres,
+                low_conf_thres=low_conf_thres, )
+
+        self.do_mtmct = mtmct_dir is not None
+        self.mtmct_dir = mtmct_dir
+
+    def postprocess(self, inputs, result):
+        # postprocess output of predictor
+        keep_idx = result['boxes'][:, 1] > self.threshold
+        result['boxes'] = result['boxes'][keep_idx]
+        np_boxes_num = [len(result['boxes'])]
+        if np_boxes_num[0] <= 0:
+            print('[WARNING] No object detected.')
+            result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]}
+        result = {k: v for k, v in result.items() if v is not None}
+        return result
+
+    def reidprocess(self, det_results, repeats=1):
+        pred_dets = det_results['boxes']  # cls_id, score, x0, y0, x1, y1
+        pred_xyxys = pred_dets[:, 2:6]
+
+        ori_image = det_results['ori_image']
+        ori_image_shape = ori_image.shape[:2]
+        pred_xyxys, keep_idx = clip_box(pred_xyxys, ori_image_shape)
+
+        if len(keep_idx[0]) == 0:
+            det_results['boxes'] = np.zeros((1, 6), dtype=np.float32)
+            det_results['embeddings'] = None
+            return det_results
+
+        pred_dets = pred_dets[keep_idx[0]]
+        pred_xyxys = pred_dets[:, 2:6]
+
+        w, h = self.tracker.input_size
+        crops = get_crops(pred_xyxys, ori_image, w, h)
+
+        # to keep fast speed, only use topk crops
+        crops = crops[:50]  # reid_batch_size
+        det_results['crops'] = np.array(crops).astype('float32')
+        det_results['boxes'] = pred_dets[:50]
+
+        input_names = self.reid_predictor.get_input_names()
+        for i in range(len(input_names)):
+            input_tensor = self.reid_predictor.get_input_handle(input_names[i])
+            input_tensor.copy_from_cpu(det_results[input_names[i]])
+
+        # model prediction
+        for i in range(repeats):
+            self.reid_predictor.run()
+            output_names = self.reid_predictor.get_output_names()
+            feature_tensor = self.reid_predictor.get_output_handle(output_names[
+                0])
+            pred_embs = feature_tensor.copy_to_cpu()
+
+        det_results['embeddings'] = pred_embs
+        return det_results
+
+    def tracking(self, det_results, img=None):
+        pred_dets = det_results['boxes']  # cls_id, score, x0, y0, x1, y1
+        pred_embs = det_results.get('embeddings', None)
+
+        if self.use_deepsort_tracker:
+            # use DeepSORTTracker, only support single class
+            self.tracker.predict()
+            online_targets = self.tracker.update(pred_dets, pred_embs)
+            online_tlwhs, online_scores, online_ids = [], [], []
+            if self.do_mtmct:
+                online_tlbrs, online_feats = [], []
+            for t in online_targets:
+                if not t.is_confirmed() or t.time_since_update > 1:
+                    continue
+                tlwh = t.to_tlwh()
+                tscore = t.score
+                tid = t.track_id
+                if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
+                        3] > self.tracker.vertical_ratio:
+                    continue
+                online_tlwhs.append(tlwh)
+                online_scores.append(tscore)
+                online_ids.append(tid)
+                if self.do_mtmct:
+                    online_tlbrs.append(t.to_tlbr())
+                    online_feats.append(t.feat)
+
+            tracking_outs = {
+                'online_tlwhs': online_tlwhs,
+                'online_scores': online_scores,
+                'online_ids': online_ids,
+            }
+            if self.do_mtmct:
+                seq_name = det_results['seq_name']
+                frame_id = det_results['frame_id']
+
+                tracking_outs['feat_data'] = {}
+                for _tlbr, _id, _feat in zip(online_tlbrs, online_ids,
+                                             online_feats):
+                    feat_data = {}
+                    feat_data['bbox'] = _tlbr
+                    feat_data['frame'] = f"{frame_id:06d}"
+                    feat_data['id'] = _id
+                    _imgname = f'{seq_name}_{_id}_{frame_id}.jpg'
+                    feat_data['imgname'] = _imgname
+                    feat_data['feat'] = _feat
+                    tracking_outs['feat_data'].update({_imgname: feat_data})
+            return tracking_outs
+
+        elif self.use_ocsort_tracker:
+            # use OCSORTTracker, only support single class
+            online_targets = self.tracker.update(pred_dets, pred_embs)
+            online_tlwhs = defaultdict(list)
+            online_scores = defaultdict(list)
+            online_ids = defaultdict(list)
+            for t in online_targets:
+                tlwh = [t[0], t[1], t[2] - t[0], t[3] - t[1]]
+                tscore = float(t[4])
+                tid = int(t[5])
+                if tlwh[2] * tlwh[3] <= self.tracker.min_box_area:
+                    continue
+                if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
+                        3] > self.tracker.vertical_ratio:
+                    continue
+                if tlwh[2] * tlwh[3] > 0:
+                    online_tlwhs[0].append(tlwh)
+                    online_ids[0].append(tid)
+                    online_scores[0].append(tscore)
+            tracking_outs = {
+                'online_tlwhs': online_tlwhs,
'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs + + elif self.use_botsort_tracker: + # use BOTSORTTracker, only support singe class + online_targets = self.tracker.update(pred_dets, img) + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: + continue + online_tlwhs[0].append(tlwh) + online_ids[0].append(tid) + online_scores[0].append(tscore) + + tracking_outs = { + 'online_tlwhs': online_tlwhs, + 'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs + + else: + # use ByteTracker, support multiple class + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + if self.do_mtmct: + online_tlbrs, online_feats = defaultdict(list), defaultdict( + list) + online_targets_dict = self.tracker.update(pred_dets, pred_embs) + for cls_id in range(self.num_classes): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: + continue + if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > self.tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + if self.do_mtmct: + online_tlbrs[cls_id].append(t.tlbr) + online_feats[cls_id].append(t.curr_feat) + + if self.do_mtmct: + assert self.num_classes == 1, 'MTMCT only support single class.' + tracking_outs = { + 'online_tlwhs': online_tlwhs[0], + 'online_scores': online_scores[0], + 'online_ids': online_ids[0], + } + seq_name = det_results['seq_name'] + frame_id = det_results['frame_id'] + tracking_outs['feat_data'] = {} + for _tlbr, _id, _feat in zip(online_tlbrs[0], online_ids[0], + online_feats[0]): + feat_data = {} + feat_data['bbox'] = _tlbr + feat_data['frame'] = f"{frame_id:06d}" + feat_data['id'] = _id + _imgname = f'{seq_name}_{_id}_{frame_id}.jpg' + feat_data['imgname'] = _imgname + feat_data['feat'] = _feat + tracking_outs['feat_data'].update({_imgname: feat_data}) + return tracking_outs + + else: + tracking_outs = { + 'online_tlwhs': online_tlwhs, + 'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs + + def predict_image(self, + image_list, + run_benchmark=False, + repeats=1, + visual=True, + seq_name=None, + reuse_det_result=False, + frame_count=0): + num_classes = self.num_classes + image_list.sort() + ids2names = self.pred_config.labels + if self.do_mtmct: + mot_features_dict = {} # cid_tid_fid feats + else: + mot_results = [] + for frame_id, img_file in enumerate(image_list): + if self.do_mtmct: + if frame_id % 10 == 0: + print('Tracking frame: %d' % (frame_id)) + batch_image_list = [img_file] # bs=1 in MOT model + frame, _ = decode_image(img_file, {}) + if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result_warmup = self.predict(repeats=repeats) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + 
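+                # in benchmark mode each stage runs once untimed (warmup)
+                # before the timed run, so cold-start cost is excluded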
self.det_times.postprocess_time_s.start() + det_result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + + # tracking + if self.use_reid: + det_result['frame_id'] = frame_id + det_result['seq_name'] = seq_name + det_result['ori_image'] = frame + det_result = self.reidprocess(det_result) + if self.use_botsort_tracker: + result_warmup = self.tracking(det_result, batch_image_list) + else: + result_warmup = self.tracking(det_result) + self.det_times.tracking_time_s.start() + if self.use_reid: + det_result = self.reidprocess(det_result) + tracking_outs = self.tracking(det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + + else: + if frame_count > self.warmup_frame: + self.det_times.preprocess_time_s.start() + if not reuse_det_result: + inputs = self.preprocess(batch_image_list) + if frame_count > self.warmup_frame: + self.det_times.preprocess_time_s.end() + if frame_count > self.warmup_frame: + self.det_times.inference_time_s.start() + if not reuse_det_result: + result = self.predict() + if frame_count > self.warmup_frame: + self.det_times.inference_time_s.end() + if frame_count > self.warmup_frame: + self.det_times.postprocess_time_s.start() + if not reuse_det_result: + det_result = self.postprocess(inputs, result) + self.previous_det_result = det_result + else: + assert self.previous_det_result is not None + det_result = self.previous_det_result + if frame_count > self.warmup_frame: + self.det_times.postprocess_time_s.end() + + # tracking process + if frame_count > self.warmup_frame: + self.det_times.tracking_time_s.start() + if self.use_reid: + det_result['frame_id'] = frame_id + det_result['seq_name'] = seq_name + det_result['ori_image'] = frame + det_result = self.reidprocess(det_result) + if self.use_botsort_tracker: + tracking_outs = self.tracking(det_result, batch_image_list) + else: + tracking_outs = self.tracking(det_result) + if frame_count > self.warmup_frame: + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + online_tlwhs = tracking_outs['online_tlwhs'] + online_scores = tracking_outs['online_scores'] + online_ids = tracking_outs['online_ids'] + + if self.do_mtmct: + feat_data_dict = tracking_outs['feat_data'] + mot_features_dict = dict(mot_features_dict, **feat_data_dict) + else: + mot_results.append([online_tlwhs, online_scores, online_ids]) + + if visual: + if len(image_list) > 1 and frame_id % 10 == 0: + print('Tracking frame {}'.format(frame_id)) + frame, _ = decode_image(img_file, {}) + if isinstance(online_tlwhs, defaultdict): + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + else: + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + save_dir = os.path.join(self.output_dir, seq_name) + if not os.path.exists(save_dir): + os.makedirs(save_dir) + cv2.imwrite( + os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im) + + if self.do_mtmct: + return mot_features_dict + else: + return mot_results + + def predict_video(self, video_file, camera_id): + video_out_name = 'output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_out_name = os.path.split(video_file)[-1] + # Get Video info : resolution, fps, frame count + width = 
int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name) + video_format = 'mp4v' + fourcc = cv2.VideoWriter_fourcc(*video_format) + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + + frame_id = 0 + timer = MOTTimer() + results = defaultdict(list) + num_classes = self.num_classes + data_type = 'mcmot' if num_classes > 1 else 'mot' + ids2names = self.pred_config.labels + + center_traj = None + entrance = None + records = None + if self.draw_center_traj: + center_traj = [{} for i in range(num_classes)] + if num_classes == 1: + id_set = set() + interval_id_set = set() + in_id_list = list() + out_id_list = list() + prev_center = dict() + records = list() + if self.do_entrance_counting or self.do_break_in_counting: + if self.region_type == 'horizontal': + entrance = [0, height / 2., width, height / 2.] + elif self.region_type == 'vertical': + entrance = [width / 2, 0., width / 2, height] + elif self.region_type == 'custom': + entrance = [] + assert len( + self.region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting." + for i in range(0, len(self.region_polygon), 2): + entrance.append([ + self.region_polygon[i], self.region_polygon[i + 1] + ]) + entrance.append([width, height]) + else: + raise ValueError("region_type:{} is not supported.".format( + self.region_type)) + + video_fps = fps + + while (1): + ret, frame = capture.read() + if not ret: + break + if frame_id % 10 == 0: + print('Tracking frame: %d' % (frame_id)) + + timer.tic() + mot_skip_frame_num = self.skip_frame_num + reuse_det_result = False + if mot_skip_frame_num > 1 and frame_id > 0 and frame_id % mot_skip_frame_num > 0: + reuse_det_result = True + seq_name = video_out_name.split('.')[0] + mot_results = self.predict_image( + [frame], + visual=False, + seq_name=seq_name, + reuse_det_result=reuse_det_result, + frame_count=frame_id) + timer.toc() + + # bs=1 in MOT model + online_tlwhs, online_scores, online_ids = mot_results[0] + + # flow statistic for one class, and only for bytetracker + if num_classes == 1 and not self.use_deepsort_tracker and not self.use_ocsort_tracker: + result = (frame_id + 1, online_tlwhs[0], online_scores[0], + online_ids[0]) + statistic = flow_statistic( + result, + self.secs_interval, + self.do_entrance_counting, + self.do_break_in_counting, + self.region_type, + video_fps, + entrance, + id_set, + interval_id_set, + in_id_list, + out_id_list, + prev_center, + records, + data_type, + ids2names=self.pred_config.labels) + records = statistic['records'] + + fps = 1. 
/ timer.duration + if self.use_deepsort_tracker or self.use_ocsort_tracker or self.use_botsort_tracker: + # use DeepSORTTracker or OCSORTTracker, only support singe class + if isinstance(online_tlwhs, defaultdict): + online_tlwhs = online_tlwhs[0] + online_scores = online_scores[0] + online_ids = online_ids[0] + + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names, + do_entrance_counting=self.do_entrance_counting, + entrance=entrance) + else: + # use ByteTracker, support multiple class + for cls_id in range(num_classes): + results[cls_id].append( + (frame_id + 1, online_tlwhs[cls_id], + online_scores[cls_id], online_ids[cls_id])) + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names, + do_entrance_counting=self.do_entrance_counting, + entrance=entrance, + records=records, + center_traj=center_traj) + + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + frame_id += 1 + + if self.save_mot_txts: + result_filename = os.path.join( + self.output_dir, video_out_name.split('.')[-2] + '.txt') + write_mot_results(result_filename, results) + + result_filename = os.path.join( + self.output_dir, + video_out_name.split('.')[-2] + '_flow_statistic.txt') + f = open(result_filename, 'w') + for line in records: + f.write(line) + print('Flow statistic save in {}'.format(result_filename)) + f.close() + + writer.release() + + def predict_mtmct(self, mtmct_dir, mtmct_cfg): + cameras_bias = mtmct_cfg['cameras_bias'] + cid_bias = parse_bias(cameras_bias) + scene_cluster = list(cid_bias.keys()) + # 1.zone releated parameters + use_zone = mtmct_cfg.get('use_zone', False) + zone_path = mtmct_cfg.get('zone_path', None) + + # 2.tricks parameters, can be used for other mtmct dataset + use_ff = mtmct_cfg.get('use_ff', False) + use_rerank = mtmct_cfg.get('use_rerank', False) + + # 3.camera releated parameters + use_camera = mtmct_cfg.get('use_camera', False) + use_st_filter = mtmct_cfg.get('use_st_filter', False) + + # 4.zone releated parameters + use_roi = mtmct_cfg.get('use_roi', False) + roi_dir = mtmct_cfg.get('roi_dir', False) + + mot_list_breaks = [] + cid_tid_dict = dict() + + output_dir = self.output_dir + if not os.path.exists(output_dir): + os.makedirs(output_dir) + + seqs = os.listdir(mtmct_dir) + for seq in sorted(seqs): + fpath = os.path.join(mtmct_dir, seq) + if os.path.isfile(fpath) and _is_valid_video(fpath): + seq = seq.split('.')[-2] + print('ffmpeg processing of video {}'.format(fpath)) + frames_path = video2frames( + video_path=fpath, outpath=mtmct_dir, frame_rate=25) + fpath = os.path.join(mtmct_dir, seq) + + if os.path.isdir(fpath) == False: + print('{} is not a image folder.'.format(fpath)) + continue + if os.path.exists(os.path.join(fpath, 'img1')): + fpath = os.path.join(fpath, 'img1') + assert os.path.isdir(fpath), '{} should be a directory'.format( + fpath) + image_list = glob.glob(os.path.join(fpath, '*.jpg')) + image_list.sort() + assert len(image_list) > 0, '{} has no images.'.format(fpath) + print('start tracking seq: {}'.format(seq)) + + mot_features_dict = self.predict_image( + image_list, visual=False, seq_name=seq) + + cid = int(re.sub('[a-z,A-Z]', "", seq)) + tid_data, mot_list_break = trajectory_fusion( + mot_features_dict, + cid, + cid_bias, + 
use_zone=use_zone, + zone_path=zone_path) + mot_list_breaks.append(mot_list_break) + # single seq process + for line in tid_data: + tracklet = tid_data[line] + tid = tracklet['tid'] + if (cid, tid) not in cid_tid_dict: + cid_tid_dict[(cid, tid)] = tracklet + + map_tid = sub_cluster( + cid_tid_dict, + scene_cluster, + use_ff=use_ff, + use_rerank=use_rerank, + use_camera=use_camera, + use_st_filter=use_st_filter) + + pred_mtmct_file = os.path.join(output_dir, 'mtmct_result.txt') + if use_camera: + gen_res(pred_mtmct_file, scene_cluster, map_tid, mot_list_breaks) + else: + gen_res( + pred_mtmct_file, + scene_cluster, + map_tid, + mot_list_breaks, + use_roi=use_roi, + roi_dir=roi_dir) + + camera_results, cid_tid_fid_res = get_mtmct_matching_results( + pred_mtmct_file) + + crops_dir = os.path.join(output_dir, 'mtmct_crops') + save_mtmct_crops( + cid_tid_fid_res, images_dir=mtmct_dir, crops_dir=crops_dir) + + save_dir = os.path.join(output_dir, 'mtmct_vis') + save_mtmct_vis_results( + camera_results, + images_dir=mtmct_dir, + save_dir=save_dir, + save_videos=FLAGS.save_images) + + +def main(): + deploy_file = os.path.join(FLAGS.model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + arch = yml_conf['arch'] + detector = SDE_Detector( + FLAGS.model_dir, + tracker_config=FLAGS.tracker_config, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=1, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + output_dir=FLAGS.output_dir, + threshold=FLAGS.threshold, + save_images=FLAGS.save_images, + save_mot_txts=FLAGS.save_mot_txts, + draw_center_traj=FLAGS.draw_center_traj, + secs_interval=FLAGS.secs_interval, + skip_frame_num=FLAGS.skip_frame_num, + warmup_frame=FLAGS.warmup_frame, + do_entrance_counting=FLAGS.do_entrance_counting, + do_break_in_counting=FLAGS.do_break_in_counting, + region_type=FLAGS.region_type, + region_polygon=FLAGS.region_polygon, + reid_model_dir=FLAGS.reid_model_dir, + mtmct_dir=FLAGS.mtmct_dir, ) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + detector.det_times.info(average=True) + elif FLAGS.mtmct_dir is not None: + with open(FLAGS.mtmct_cfg) as f: + mtmct_cfg = yaml.safe_load(f) + detector.predict_mtmct(FLAGS.mtmct_dir, mtmct_cfg) + else: + # predict from image + if FLAGS.image_dir is None and FLAGS.image_file is not None: + assert FLAGS.batch_size == 1, "--batch_size should be 1 in MOT models." 
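+        # An illustrative invocation (paths are placeholders; see the MOT
+        # deployment docs for full usage):
+        #   python mot_sde_infer.py --model_dir=output_inference/<det_model> \
+        #       --tracker_config=tracker_config.yml --image_dir=<image_folder> --device=GPU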
+ img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + seq_name = FLAGS.image_dir.split('/')[-1] + detector.predict_image( + img_list, FLAGS.run_benchmark, repeats=10, seq_name=seq_name) + + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + model_dir = FLAGS.model_dir + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, model_info, name='MOT') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mot_utils.py b/PaddleDetection-release-2.6/deploy/pptracking/python/mot_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..9d7b18f921d35b987e8df28d32f7c5e030803234 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mot_utils.py @@ -0,0 +1,389 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import time +import os +import sys +import ast +import argparse + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + parser.add_argument( + "--image_file", type=str, default=None, help="Path of image file.") + parser.add_argument( + "--image_dir", + type=str, + default=None, + help="Dir of image file, `image_file` has a higher priority.") + parser.add_argument( + "--batch_size", type=int, default=1, help="batch_size for inference.") + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or `camera_id` has a highest priority." + ) + parser.add_argument( + "--camera_id", + type=int, + default=-1, + help="device id of camera to predict.") + parser.add_argument( + "--threshold", type=float, default=0.5, help="Threshold of score.") + parser.add_argument( + "--output_dir", + type=str, + default="output", + help="Directory of output visualization files.") + parser.add_argument( + "--run_mode", + type=str, + default='paddle', + help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU." 
+ ) + parser.add_argument( + "--use_gpu", + type=ast.literal_eval, + default=False, + help="Deprecated, please use `--device`.") + parser.add_argument( + "--run_benchmark", + type=ast.literal_eval, + default=False, + help="Whether to predict a image_file repeatedly for benchmark") + parser.add_argument( + "--enable_mkldnn", + type=ast.literal_eval, + default=False, + help="Whether use mkldnn with CPU.") + parser.add_argument( + "--cpu_threads", type=int, default=1, help="Num of threads with CPU.") + parser.add_argument( + "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.") + parser.add_argument( + "--trt_max_shape", + type=int, + default=1280, + help="max_shape for TensorRT.") + parser.add_argument( + "--trt_opt_shape", + type=int, + default=640, + help="opt_shape for TensorRT.") + parser.add_argument( + "--trt_calib_mode", + type=bool, + default=False, + help="If the model is produced by TRT offline quantitative " + "calibration, trt_calib_mode need to set True.") + parser.add_argument( + '--save_images', + action='store_true', + help='Save visualization image results.') + parser.add_argument( + '--save_mot_txts', + action='store_true', + help='Save tracking results (txt).') + parser.add_argument( + '--save_mot_txt_per_img', + action='store_true', + help='Save tracking results (txt) for each image.') + parser.add_argument( + '--scaled', + type=bool, + default=False, + help="Whether coords after detector outputs are scaled, False in JDE YOLOv3 " + "True in general detector.") + parser.add_argument( + "--tracker_config", type=str, default=None, help=("tracker donfig")) + parser.add_argument( + "--reid_model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py.")) + parser.add_argument( + "--reid_batch_size", + type=int, + default=50, + help="max batch_size for reid model inference.") + parser.add_argument( + '--use_dark', + type=ast.literal_eval, + default=True, + help='whether to use darkpose to get better keypoint position predict ') + parser.add_argument( + '--skip_frame_num', + type=int, + default=-1, + help='Skip frames to speed up the process of getting mot results.') + parser.add_argument( + '--warmup_frame', + type=int, + default=50, + help='Warmup frames to test speed of the process of getting mot results.' + ) + parser.add_argument( + "--do_entrance_counting", + action='store_true', + help="Whether counting the numbers of identifiers entering " + "or getting out from the entrance. Note that only support single-class MOT." + ) + parser.add_argument( + "--do_break_in_counting", + action='store_true', + help="Whether counting the numbers of identifiers break in " + "the area. Note that only support single-class MOT and " + "the video should be taken by a static camera.") + parser.add_argument( + "--region_type", + type=str, + default='horizontal', + help="Area type for entrance counting or break in counting, 'horizontal' and " + "'vertical' used when do entrance counting. 'custom' used when do break in counting. " + "Note that only support single-class MOT, and the video should be taken by a static camera." + ) + parser.add_argument( + '--region_polygon', + nargs='+', + type=int, + default=[], + help="Clockwise point coords (x0,y0,x1,y1...) of polygon of area when " + "do_break_in_counting. 
Note that only support single-class MOT and " + "the video should be taken by a static camera.") + parser.add_argument( + "--secs_interval", + type=int, + default=2, + help="The seconds interval to count after tracking") + parser.add_argument( + "--draw_center_traj", + action='store_true', + help="Whether drawing the trajectory of center") + parser.add_argument( + "--mtmct_dir", + type=str, + default=None, + help="The MTMCT scene video folder.") + parser.add_argument( + "--mtmct_cfg", type=str, default=None, help="The MTMCT config.") + return parser + + +class Times(object): + def __init__(self): + self.time = 0. + # start time + self.st = 0. + # end time + self.et = 0. + + def start(self): + self.st = time.time() + + def end(self, repeats=1, accumulative=True): + self.et = time.time() + if accumulative: + self.time += (self.et - self.st) / repeats + else: + self.time = (self.et - self.st) / repeats + + def reset(self): + self.time = 0. + self.st = 0. + self.et = 0. + + def value(self): + return round(self.time, 4) + + +class Timer(Times): + def __init__(self, with_tracker=False): + super(Timer, self).__init__() + self.with_tracker = with_tracker + self.preprocess_time_s = Times() + self.inference_time_s = Times() + self.postprocess_time_s = Times() + self.tracking_time_s = Times() + self.img_num = 0 + + def info(self, average=False): + pre_time = self.preprocess_time_s.value() + infer_time = self.inference_time_s.value() + post_time = self.postprocess_time_s.value() + track_time = self.tracking_time_s.value() + + total_time = pre_time + infer_time + post_time + if self.with_tracker: + total_time = total_time + track_time + total_time = round(total_time, 4) + print("------------------ Inference Time Info ----------------------") + print("total_time(ms): {}, img_num: {}".format(total_time * 1000, + self.img_num)) + preprocess_time = round(pre_time / max(1, self.img_num), + 4) if average else pre_time + postprocess_time = round(post_time / max(1, self.img_num), + 4) if average else post_time + inference_time = round(infer_time / max(1, self.img_num), + 4) if average else infer_time + tracking_time = round(track_time / max(1, self.img_num), + 4) if average else track_time + + average_latency = total_time / max(1, self.img_num) + qps = 0 + if total_time > 0: + qps = 1 / average_latency + print("average latency time(ms): {:.2f}, QPS: {:2f}".format( + average_latency * 1000, qps)) + if self.with_tracker: + print( + "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}, tracking_time(ms): {:.2f}". + format(preprocess_time * 1000, inference_time * 1000, + postprocess_time * 1000, tracking_time * 1000)) + else: + print( + "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}". 
+ format(preprocess_time * 1000, inference_time * 1000, + postprocess_time * 1000)) + + def tracking_info(self, average=True): + pre_time = self.preprocess_time_s.value() + infer_time = self.inference_time_s.value() + post_time = self.postprocess_time_s.value() + track_time = self.tracking_time_s.value() + + total_time = pre_time + infer_time + post_time + if self.with_tracker: + total_time = total_time + track_time + total_time = round(total_time, 4) + print( + "------------------ Tracking Module Time Info ----------------------" + ) + + preprocess_time = round(pre_time / max(1, self.img_num), + 4) if average else pre_time + postprocess_time = round(post_time / max(1, self.img_num), + 4) if average else post_time + inference_time = round(infer_time / max(1, self.img_num), + 4) if average else infer_time + tracking_time = round(track_time / max(1, self.img_num), + 4) if average else track_time + + if self.with_tracker: + print( + "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}, tracking_time(ms): {:.2f}". + format(preprocess_time * 1000, inference_time * 1000, + postprocess_time * 1000, tracking_time * 1000)) + else: + print( + "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}". + format(preprocess_time * 1000, inference_time * 1000, + postprocess_time * 1000)) + + def report(self, average=False): + dic = {} + pre_time = self.preprocess_time_s.value() + infer_time = self.inference_time_s.value() + post_time = self.postprocess_time_s.value() + track_time = self.tracking_time_s.value() + + dic['preprocess_time_s'] = round(pre_time / max(1, self.img_num), + 4) if average else pre_time + dic['inference_time_s'] = round(infer_time / max(1, self.img_num), + 4) if average else infer_time + dic['postprocess_time_s'] = round(post_time / max(1, self.img_num), + 4) if average else post_time + dic['img_num'] = self.img_num + total_time = pre_time + infer_time + post_time + if self.with_tracker: + dic['tracking_time_s'] = round(track_time / max(1, self.img_num), + 4) if average else track_time + total_time = total_time + track_time + dic['total_time_s'] = round(total_time, 4) + return dic + + +def get_current_memory_mb(): + """ + It is used to Obtain the memory usage of the CPU and GPU during the running of the program. + And this function Current program is time-consuming. + """ + import pynvml + import psutil + import GPUtil + gpu_id = int(os.environ.get('CUDA_VISIBLE_DEVICES', 0)) + + pid = os.getpid() + p = psutil.Process(pid) + info = p.memory_full_info() + cpu_mem = info.uss / 1024. / 1024. + gpu_mem = 0 + gpu_percent = 0 + gpus = GPUtil.getGPUs() + if gpu_id is not None and len(gpus) > 0: + gpu_percent = gpus[gpu_id].load + pynvml.nvmlInit() + handle = pynvml.nvmlDeviceGetHandleByIndex(0) + meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle) + gpu_mem = meminfo.used / 1024. / 1024. 
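+    # NOTE: the NVML handle above is hard-coded to device index 0, so on a
+    # multi-GPU machine gpu_mem reflects card 0 regardless of gpu_id; treat
+    # these numbers as approximate profiling aids.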
+    return round(cpu_mem, 4), round(gpu_mem, 4), round(gpu_percent, 4)
+
+
+def video2frames(video_path, outpath, frame_rate=25, **kargs):
+    def _dict2str(kargs):
+        cmd_str = ''
+        for k, v in kargs.items():
+            cmd_str += (' ' + str(k) + ' ' + str(v))
+        return cmd_str
+
+    ffmpeg = ['ffmpeg ', ' -y -loglevel ', ' error ']
+    vid_name = os.path.basename(video_path).split('.')[0]
+    out_full_path = os.path.join(outpath, vid_name)
+
+    if not os.path.exists(out_full_path):
+        os.makedirs(out_full_path)
+
+    # video file name
+    outformat = os.path.join(out_full_path, '%05d.jpg')
+
+    cmd = ffmpeg + [
+        ' -i ', video_path, ' -r ', str(frame_rate), ' -f image2 ', outformat
+    ]
+    cmd = ''.join(cmd) + _dict2str(kargs)
+
+    if os.system(cmd) != 0:
+        raise RuntimeError('ffmpeg process video: {} error'.format(video_path))
+
+    sys.stdout.flush()
+    return out_full_path
+
+
+def _is_valid_video(f, extensions=('.mp4', '.avi', '.mov', '.rmvb', '.flv')):
+    return f.lower().endswith(extensions)
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/mtmct_cfg.yml b/PaddleDetection-release-2.6/deploy/pptracking/python/mtmct_cfg.yml
new file mode 100644
index 0000000000000000000000000000000000000000..03ae9d1c933d7001f6e284d156a209624578b78d
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/mtmct_cfg.yml
@@ -0,0 +1,17 @@
+# config for MTMCT
+MTMCT: True
+cameras_bias: # default for scene S01. For S06, should modify as 'c041: 0 c042: 0'
+  c003: 0
+  c004: 0
+# 1. zone related parameters
+use_zone: False
+zone_path: dataset/mot/aic21mtmct_vehicle/S06/zone
+# 2. tricks parameters, can be used for other mtmct datasets
+use_ff: False
+use_rerank: False
+# 3. camera related parameters
+use_camera: False
+use_st_filter: False
+# 4. ROI related parameters
+use_roi: False
+roi_dir: dataset/mot/aic21mtmct_vehicle/S06
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/picodet_postprocess.py b/PaddleDetection-release-2.6/deploy/pptracking/python/picodet_postprocess.py
new file mode 100644
index 0000000000000000000000000000000000000000..7df13f8278d13c51179c5502987926dec637bec4
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/picodet_postprocess.py
@@ -0,0 +1,227 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+from scipy.special import softmax
+
+
+def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200):
+    """
+    Args:
+        box_scores (N, 5): boxes in corner-form and probabilities.
+        iou_threshold: intersection over union threshold.
+        top_k: keep top_k results. If k <= 0, keep all the results.
+        candidate_size: only consider the candidates with the highest scores.
+ Returns: + picked: a list of indexes of the kept boxes + """ + scores = box_scores[:, -1] + boxes = box_scores[:, :-1] + picked = [] + indexes = np.argsort(scores) + indexes = indexes[-candidate_size:] + while len(indexes) > 0: + current = indexes[-1] + picked.append(current) + if 0 < top_k == len(picked) or len(indexes) == 1: + break + current_box = boxes[current, :] + indexes = indexes[:-1] + rest_boxes = boxes[indexes, :] + iou = iou_of( + rest_boxes, + np.expand_dims( + current_box, axis=0), ) + indexes = indexes[iou <= iou_threshold] + + return box_scores[picked, :] + + +def iou_of(boxes0, boxes1, eps=1e-5): + """Return intersection-over-union (Jaccard index) of boxes. + Args: + boxes0 (N, 4): ground truth boxes. + boxes1 (N or 1, 4): predicted boxes. + eps: a small number to avoid 0 as denominator. + Returns: + iou (N): IoU values. + """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. + right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. + """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PicoDetPostProcess(object): + """ + Args: + input_shape (int): network input image size + ori_shape (int): ori image shape of before padding + scale_factor (float): scale factor of ori image + enable_mkldnn (bool): whether to open MKLDNN + """ + + def __init__(self, + input_shape, + ori_shape, + scale_factor, + strides=[8, 16, 32, 64], + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=1000, + keep_top_k=100): + self.ori_shape = ori_shape + self.input_shape = input_shape + self.scale_factor = scale_factor + self.strides = strides + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def warp_boxes(self, boxes, ori_shape): + """Apply transform to boxes + """ + width, height = ori_shape[1], ori_shape[0] + n = len(boxes) + if n: + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + n * 4, 2) # x1y1, x2y2, x1y2, x2y1 + # xy = xy @ M.T # transform + xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate( + (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + # clip boxes + xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) + xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) + return xy.astype(np.float32) + else: + return boxes + + def __call__(self, scores, raw_boxes): + batch_size = raw_boxes[0].shape[0] + reg_max = int(raw_boxes[0].shape[-1] / 4 - 1) + out_boxes_num = [] + out_boxes_list = [] + for batch_id in range(batch_size): + # generate centers + decode_boxes = [] + select_scores = [] + for stride, box_distribute, score in zip(self.strides, raw_boxes, + scores): + box_distribute = box_distribute[batch_id] + score = score[batch_id] + # centers + fm_h = self.input_shape[0] / stride + fm_w = self.input_shape[1] / stride + h_range = np.arange(fm_h) + w_range = np.arange(fm_w) + ww, hh = np.meshgrid(w_range, h_range) + ct_row = (hh.flatten() 
+ 0.5) * stride + ct_col = (ww.flatten() + 0.5) * stride + center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1) + + # box distribution to distance + reg_range = np.arange(reg_max + 1) + box_distance = box_distribute.reshape((-1, reg_max + 1)) + box_distance = softmax(box_distance, axis=1) + box_distance = box_distance * np.expand_dims(reg_range, axis=0) + box_distance = np.sum(box_distance, axis=1).reshape((-1, 4)) + box_distance = box_distance * stride + + # top K candidate + topk_idx = np.argsort(score.max(axis=1))[::-1] + topk_idx = topk_idx[:self.nms_top_k] + center = center[topk_idx] + score = score[topk_idx] + box_distance = box_distance[topk_idx] + + # decode box + decode_box = center + [-1, -1, 1, 1] * box_distance + + select_scores.append(score) + decode_boxes.append(decode_box) + + # nms + bboxes = np.concatenate(decode_boxes, axis=0) + confidences = np.concatenate(select_scores, axis=0) + picked_box_probs = [] + picked_labels = [] + for class_index in range(0, confidences.shape[1]): + probs = confidences[:, class_index] + mask = probs > self.score_threshold + probs = probs[mask] + if probs.shape[0] == 0: + continue + subset_boxes = bboxes[mask, :] + box_probs = np.concatenate( + [subset_boxes, probs.reshape(-1, 1)], axis=1) + box_probs = hard_nms( + box_probs, + iou_threshold=self.nms_threshold, + top_k=self.keep_top_k, ) + picked_box_probs.append(box_probs) + picked_labels.extend([class_index] * box_probs.shape[0]) + + if len(picked_box_probs) == 0: + out_boxes_list.append(np.empty((0, 4))) + out_boxes_num.append(0) + + else: + picked_box_probs = np.concatenate(picked_box_probs) + + # resize output boxes + picked_box_probs[:, :4] = self.warp_boxes( + picked_box_probs[:, :4], self.ori_shape[batch_id]) + im_scale = np.concatenate([ + self.scale_factor[batch_id][::-1], + self.scale_factor[batch_id][::-1] + ]) + picked_box_probs[:, :4] /= im_scale + # clas score box + out_boxes_list.append( + np.concatenate( + [ + np.expand_dims( + np.array(picked_labels), + axis=-1), np.expand_dims( + picked_box_probs[:, 4], axis=-1), + picked_box_probs[:, :4] + ], + axis=1)) + out_boxes_num.append(len(picked_labels)) + + out_boxes_list = np.concatenate(out_boxes_list, axis=0) + out_boxes_num = np.asarray(out_boxes_num).astype(np.int32) + return out_boxes_list, out_boxes_num diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/preprocess.py b/PaddleDetection-release-2.6/deploy/pptracking/python/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..427479c814d6b3250921ead6b7b2a07ea6352173 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/pptracking/python/preprocess.py @@ -0,0 +1,286 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
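+# NOTE: the ops below (Resize, NormalizeImage, Permute, Pad, ...) are chained
+# by preprocess() at the end of this file. An illustrative composition (the
+# mean/std values are just common ImageNet defaults, not mandated here):
+#   ops = [Resize((640, 640)),
+#          NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+#          Permute()]
+#   im, im_info = preprocess('demo.jpg', ops)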
+ +import cv2 +import numpy as np + + +def decode_image(im_file, im_info): + """read rgb image + Args: + im_file (str|np.ndarray): input can be image path or np.ndarray + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + if isinstance(im_file, str): + with open(im_file, 'rb') as f: + im_read = f.read() + data = np.frombuffer(im_read, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + else: + im = im_file + im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32) + im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32) + return im, im_info + + +class Resize(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + """ + Args: + im (np.ndarray): image (np.ndarray) + Returns: + im_scale_x: the resize ratio of X + im_scale_y: the resize ratio of Y + """ + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + return im_scale_y, im_scale_x + + +class NormalizeImage(object): + """normalize image + Args: + mean (list): im - mean + std (list): im / std + is_scale (bool): whether need im / 255 + is_channel_first (bool): if True: image shape is CHW, else: HWC + """ + + def __init__(self, mean, std, is_scale=True): + self.mean = mean + self.std = std + self.is_scale = is_scale + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.astype(np.float32, copy=False) + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + + if self.is_scale: + im = im / 255.0 + im -= mean + im /= std + return im, im_info + + +class Permute(object): + """permute image + Args: + to_bgr (bool): 
whether convert RGB to BGR + channel_first (bool): whether convert HWC to CHW + """ + + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.transpose((2, 0, 1)).copy() + return im, im_info + + +class PadStride(object): + """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config + Args: + stride (bool): model with FPN need image shape % stride == 0 + """ + + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class LetterBoxResize(object): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. + """ + super(LetterBoxResize, self).__init__() + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + + def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + height, width = self.target_size + h, w = im.shape[:2] + im, ratio, padw, padh = self.letterbox(im, height=height, width=width) + + new_shape = [round(h * ratio), round(w * ratio)] + im_info['im_shape'] = np.array(new_shape, dtype=np.float32) + im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32) + return im, im_info + + +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. 
+        Args:
+            size (list[int]): image target size
+            fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0)
+        """
+        super(Pad, self).__init__()
+        if isinstance(size, int):
+            size = [size, size]
+        self.size = size
+        self.fill_value = fill_value
+
+    def __call__(self, im, im_info):
+        im_h, im_w = im.shape[:2]
+        h, w = self.size
+        if h == im_h and w == im_w:
+            im = im.astype(np.float32)
+            return im, im_info
+
+        canvas = np.ones((h, w, 3), dtype=np.float32)
+        canvas *= np.array(self.fill_value, dtype=np.float32)
+        canvas[0:im_h, 0:im_w, :] = im.astype(np.float32)
+        im = canvas
+        return im, im_info
+
+
+def preprocess(im, preprocess_ops):
+    # process image by preprocess_ops
+    im_info = {
+        'scale_factor': np.array(
+            [1., 1.], dtype=np.float32),
+        'im_shape': None,
+    }
+    im, im_info = decode_image(im, im_info)
+    for operator in preprocess_ops:
+        im, im_info = operator(im, im_info)
+    return im, im_info
diff --git a/PaddleDetection-release-2.6/deploy/pptracking/python/tracker_config.yml b/PaddleDetection-release-2.6/deploy/pptracking/python/tracker_config.yml
new file mode 100644
index 0000000000000000000000000000000000000000..c4a3f60268108be5ab285eeddea2c704958ce2a5
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/pptracking/python/tracker_config.yml
@@ -0,0 +1,55 @@
+# Config of the tracker for the MOT SDE Detector: the top-level `type` key selects the tracker; 'JDETracker' here is just BYTETracker.
+# The tracker of a MOT JDE Detector (such as FairMOT) is exported together with the model.
+# Here 'min_box_area' and 'vertical_ratio' are set for pedestrians; modify them when tracking other objects.
+
+type: BOTSORTTracker # choose one tracker in ['JDETracker', 'OCSORTTracker', 'DeepSORTTracker', 'BOTSORTTracker']
+# When used for MTMCT (Multi-Target Multi-Camera Tracking), modify this to 'DeepSORTTracker'
+
+
+# just as BYTETracker, used for FairMOT in the PP-Tracking project and for ByteTrack in the PP-Human v1 project
+JDETracker:
+  use_byte: True
+  det_thresh: 0.3
+  conf_thres: 0.6
+  low_conf_thres: 0.1
+  match_thres: 0.9
+  min_box_area: 0
+  vertical_ratio: 0 # 1.6 for pedestrian
+
+
+# used for OC-SORT in the PP-Human v2 and PP-Vehicle projects
+OCSORTTracker:
+  det_thresh: 0.4
+  max_age: 30
+  min_hits: 3
+  iou_threshold: 0.3
+  delta_t: 3
+  inertia: 0.2
+  min_box_area: 0
+  vertical_ratio: 0
+  use_byte: False
+  use_angle_cost: False
+
+
+# used for DeepSORT and MTMCT in the PP-Tracking project
+DeepSORTTracker:
+  input_size: [64, 192] # a unique operation that scales the sub-images of the selected detected boxes to a fixed size
+  min_box_area: 0
+  vertical_ratio: -1
+  budget: 100
+  max_age: 70
+  n_init: 3
+  metric_type: cosine
+  matching_threshold: 0.2
+  max_iou_distance: 0.9
+
+BOTSORTTracker:
+  track_high_thresh: 0.3
+  track_low_thresh: 0.2
+  new_track_thresh: 0.4
+  match_thresh: 0.7
+  track_buffer: 30
+  min_box_area: 0
+  camera_motion: False
+  cmc_method: 'sparseOptFlow' # used only when camera_motion is True:
+                              # sparseOptFlow | files (Vidstab GMC) | orb | ecc
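The snippet below is a minimal sketch of how `SDE_Detector.__init__` consumes this file: the top-level `type` key names the active tracker and indexes its parameter block (the config path is illustrative):

```python
import yaml

# Load the tracker config and select the parameter block named by `type`,
# mirroring `cfg = tracker_cfg[tracker_cfg['type']]` in SDE_Detector.
with open('deploy/pptracking/python/tracker_config.yml') as f:
    tracker_cfg = yaml.safe_load(f)
params = tracker_cfg[tracker_cfg['type']]
print(tracker_cfg['type'], params.get('match_thresh'))
```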
diff --git a/PaddleDetection-release-2.6/deploy/python/README.md b/PaddleDetection-release-2.6/deploy/python/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a190a878848af7cd5e8b12d4c0f05a177fa2d156
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/python/README.md
@@ -0,0 +1,104 @@
+# Python Deployment
+
+In PaddlePaddle the inference engine and the training engine are optimized differently under the hood. The inference engine uses AnalysisPredictor, which is optimized specifically for inference; it is the Python interface of the [C++ inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html), applies a series of graph optimizations to the model, and removes unnecessary memory copies. For users with high performance requirements when deploying trained models, we provide inference scripts that are independent of PaddleDetection and easy to integrate directly.
+
+
+Python deployment consists of two steps:
+- Export the inference model
+- Run inference with Python
+
+## 1. Export the Inference Model
+
+During training, PaddleDetection keeps both the forward network and optimizer-related parameters, while deployment only needs the forward parameters; see [Exporting models](../EXPORT_MODEL.md) for details, e.g.
+
+```bash
+# Export a YOLOv3 detection model
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
+ -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams
+
+# Export a HigherHRNet (bottom-up) keypoint detection model
+python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams
+
+# Export an HRNet (top-down) keypoint detection model
+python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_384x288.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams
+
+# Export a FairMOT multi-object tracking model
+python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+
+# Export a ByteTrack multi-object tracking model (only the detector is exported)
+python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+```
+
+The export directory contains four files: `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`.
+
+
+## 2. Inference with Python
+
+### 2.1 General Detection
+Run the following command in a terminal:
+```bash
+python deploy/python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco --image_file=./demo/000000014439.jpg --device=GPU
+```
+
+### 2.2 Keypoint Detection
+Run the following command in a terminal:
+```bash
+# Standalone keypoint inference, top-down (HRNet) / bottom-up (HigherHRNet); in this mode the top-down HRNet model only supports single-person cropped images
+python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=GPU --threshold=0.5
+python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=GPU --threshold=0.5
+
+# Joint deployment of a detector with a top-down keypoint model (joint inference only supports top-down keypoint models)
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/yolov3_darknet53_270e_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU
+```
+**Notes:**
+ - For details on exporting and running keypoint models, see [keypoint](../../configs/keypoint/README.md) and the documentation of each model;
+ - The keypoint deployment here only covers the basic forward pass; for more keypoint features use the PP-Human project, see [pipeline](../pipeline/README.md);
+
+
+### 2.3 Multi-Object Tracking
+Run the following command in a terminal:
+```bash
+# FairMOT tracking
+python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU
+
+# ByteTrack tracking
+python deploy/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/python/tracker_config.yml --video_file={your video name}.mp4 --device=GPU --scaled=True
+
+# FairMOT multi-object tracking combined with HRNet keypoint detection (joint inference only supports top-down keypoint models)
+python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU
+```
+
+**Notes:**
+ - For details on exporting and running MOT models, see [mot](../../configs/mot/README.md) and the documentation of each model;
+ - The tracking deployment here covers the basic forward pass and joint keypoint deployment; for more tracking features use the PP-Human project, see [pipeline](../pipeline/README.md), or the PP-Tracking project (trajectory drawing, entrance/exit flow counting), see [pptracking](../pptracking/README.md);
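+In addition to the CLI, the tracker can be driven from Python. The sketch below is a minimal example assuming the `SDE_Detector` API of `mot_sde_infer.py` introduced in this patch; the model path is a placeholder:
+
+```python
+# Minimal sketch (assumed API, paths are placeholders): run ByteTrack on a video.
+from mot_sde_infer import SDE_Detector
+
+detector = SDE_Detector(
+    model_dir='output_inference/ppyoloe_crn_l_36e_640x640_mot17half',
+    tracker_config='deploy/python/tracker_config.yml',
+    device='GPU')
+# camera_id=-1 means "read from the video file"; results go to the output/ dir
+detector.predict_video('demo_video.mp4', camera_id=-1)
+```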
+
+The parameters are as follows:
+
+| Parameter | Required | Description |
+|-------|-------|---------------------------------------------------------------------------------------------|
+| --model_dir | Yes | Path of the exported model described above |
+| --image_file | Option | Image file to predict |
+| --image_dir | Option | Directory of image files to predict |
+| --video_file | Option | Video file to predict |
+| --camera_id | Option | ID of the camera to predict from, default -1 (no camera; can be set to 0 - (number of cameras - 1)); press `q` in the visualization window to quit and write the result to output/output.mp4 |
+| --device | Option | Device to run on, one of `CPU/GPU/XPU`, default `CPU` |
+| --run_mode | Option | When using GPU, default paddle; options (paddle/trt_fp32/trt_fp16/trt_int8) |
+| --batch_size | Option | Batch size for inference, effective when `image_dir` is set, default 1 |
+| --threshold | Option | Score threshold for prediction, default 0.5 |
+| --output_dir | Option | Root directory for visualized results, default output/ |
+| --run_benchmark | Option | Whether to run a benchmark; also requires `--image_file` or `--image_dir`, default False |
+| --enable_mkldnn | Option | Whether to enable MKLDNN acceleration for CPU inference, default False |
+| --cpu_threads | Option | Number of CPU threads, default 1 |
+| --trt_calib_mode | Option | Whether TensorRT uses calibration, default False; set True for TensorRT int8, and False for models quantized with PaddleSlim |
+| --save_images | Option | Whether to save visualized results |
+| --save_results | Option | Whether to save image prediction results as JSON files in the output folder |
+
+
+Notes:
+
+- Parameter priority: `camera_id` > `video_file` > `image_dir` > `image_file`.
+- run_mode: paddle means AnalysisPredictor with float32 precision; the other options use AnalysisPredictor with the corresponding TensorRT precisions.
+- If the installed PaddlePaddle does not support TensorRT inference, you need to build it yourself; see the [inference library build guide](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html).
+- If --run_benchmark is set to True, install the dependencies with `pip install pynvml psutil GPUtil`.
+- To evaluate an exported model on the COCO dataset, add `--save_results` and `--use_coco_category` at inference time to save the JSON files needed for COCO evaluation.
diff --git a/PaddleDetection-release-2.6/deploy/python/benchmark_utils.py b/PaddleDetection-release-2.6/deploy/python/benchmark_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..adf36217955ed71103ad46a7e7ae5cb488e93d96
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/python/benchmark_utils.py
@@ -0,0 +1,289 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import logging
+
+import paddle
+import paddle.inference as paddle_infer
+
+from pathlib import Path
+
+CUR_DIR = os.path.dirname(os.path.abspath(__file__))
+LOG_PATH_ROOT = f"{CUR_DIR}/../../output"
+
+
+class PaddleInferBenchmark(object):
+    def __init__(self,
+                 config,
+                 model_info: dict={},
+                 data_info: dict={},
+                 perf_info: dict={},
+                 resource_info: dict={},
+                 **kwargs):
+        """
+        Construct PaddleInferBenchmark Class to format logs.
+ args: + config(paddle.inference.Config): paddle inference config + model_info(dict): basic model info + {'model_name': 'resnet50' + 'precision': 'fp32'} + data_info(dict): input data info + {'batch_size': 1 + 'shape': '3,224,224' + 'data_num': 1000} + perf_info(dict): performance result + {'preprocess_time_s': 1.0 + 'inference_time_s': 2.0 + 'postprocess_time_s': 1.0 + 'total_time_s': 4.0} + resource_info(dict): + cpu and gpu resources + {'cpu_rss': 100 + 'gpu_rss': 100 + 'gpu_util': 60} + """ + # PaddleInferBenchmark Log Version + self.log_version = "1.0.3" + + # Paddle Version + self.paddle_version = paddle.__version__ + self.paddle_commit = paddle.__git_commit__ + paddle_infer_info = paddle_infer.get_version() + self.paddle_branch = paddle_infer_info.strip().split(': ')[-1] + + # model info + self.model_info = model_info + + # data info + self.data_info = data_info + + # perf info + self.perf_info = perf_info + + try: + # required value + self.model_name = model_info['model_name'] + self.precision = model_info['precision'] + + self.batch_size = data_info['batch_size'] + self.shape = data_info['shape'] + self.data_num = data_info['data_num'] + + self.inference_time_s = round(perf_info['inference_time_s'], 4) + except: + self.print_help() + raise ValueError( + "Set argument wrong, please check input argument and its type") + + self.preprocess_time_s = perf_info.get('preprocess_time_s', 0) + self.postprocess_time_s = perf_info.get('postprocess_time_s', 0) + self.with_tracker = True if 'tracking_time_s' in perf_info else False + self.tracking_time_s = perf_info.get('tracking_time_s', 0) + self.total_time_s = perf_info.get('total_time_s', 0) + + self.inference_time_s_90 = perf_info.get("inference_time_s_90", "") + self.inference_time_s_99 = perf_info.get("inference_time_s_99", "") + self.succ_rate = perf_info.get("succ_rate", "") + self.qps = perf_info.get("qps", "") + + # conf info + self.config_status = self.parse_config(config) + + # mem info + if isinstance(resource_info, dict): + self.cpu_rss_mb = int(resource_info.get('cpu_rss_mb', 0)) + self.cpu_vms_mb = int(resource_info.get('cpu_vms_mb', 0)) + self.cpu_shared_mb = int(resource_info.get('cpu_shared_mb', 0)) + self.cpu_dirty_mb = int(resource_info.get('cpu_dirty_mb', 0)) + self.cpu_util = round(resource_info.get('cpu_util', 0), 2) + + self.gpu_rss_mb = int(resource_info.get('gpu_rss_mb', 0)) + self.gpu_util = round(resource_info.get('gpu_util', 0), 2) + self.gpu_mem_util = round(resource_info.get('gpu_mem_util', 0), 2) + else: + self.cpu_rss_mb = 0 + self.cpu_vms_mb = 0 + self.cpu_shared_mb = 0 + self.cpu_dirty_mb = 0 + self.cpu_util = 0 + + self.gpu_rss_mb = 0 + self.gpu_util = 0 + self.gpu_mem_util = 0 + + # init benchmark logger + self.benchmark_logger() + + def benchmark_logger(self): + """ + benchmark logger + """ + # remove other logging handler + for handler in logging.root.handlers[:]: + logging.root.removeHandler(handler) + + # Init logger + FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s' + log_output = f"{LOG_PATH_ROOT}/{self.model_name}.log" + Path(f"{LOG_PATH_ROOT}").mkdir(parents=True, exist_ok=True) + logging.basicConfig( + level=logging.INFO, + format=FORMAT, + handlers=[ + logging.FileHandler( + filename=log_output, mode='w'), + logging.StreamHandler(), + ]) + self.logger = logging.getLogger(__name__) + self.logger.info( + f"Paddle Inference benchmark log will be saved to {log_output}") + + def parse_config(self, config) -> dict: + """ + parse paddle predictor config + args: + 
config(paddle.inference.Config): paddle inference config + return: + config_status(dict): dict style config info + """ + if isinstance(config, paddle_infer.Config): + config_status = {} + config_status['runtime_device'] = "gpu" if config.use_gpu( + ) else "cpu" + config_status['ir_optim'] = config.ir_optim() + config_status['enable_tensorrt'] = config.tensorrt_engine_enabled() + config_status['precision'] = self.precision + config_status['enable_mkldnn'] = config.mkldnn_enabled() + config_status[ + 'cpu_math_library_num_threads'] = config.cpu_math_library_num_threads( + ) + elif isinstance(config, dict): + config_status['runtime_device'] = config.get('runtime_device', "") + config_status['ir_optim'] = config.get('ir_optim', "") + config_status['enable_tensorrt'] = config.get('enable_tensorrt', "") + config_status['precision'] = config.get('precision', "") + config_status['enable_mkldnn'] = config.get('enable_mkldnn', "") + config_status['cpu_math_library_num_threads'] = config.get( + 'cpu_math_library_num_threads', "") + else: + self.print_help() + raise ValueError( + "Set argument config wrong, please check input argument and its type" + ) + return config_status + + def report(self, identifier=None): + """ + print log report + args: + identifier(string): identify log + """ + if identifier: + identifier = f"[{identifier}]" + else: + identifier = "" + + self.logger.info("\n") + self.logger.info( + "---------------------- Paddle info ----------------------") + self.logger.info(f"{identifier} paddle_version: {self.paddle_version}") + self.logger.info(f"{identifier} paddle_commit: {self.paddle_commit}") + self.logger.info(f"{identifier} paddle_branch: {self.paddle_branch}") + self.logger.info(f"{identifier} log_api_version: {self.log_version}") + self.logger.info( + "----------------------- Conf info -----------------------") + self.logger.info( + f"{identifier} runtime_device: {self.config_status['runtime_device']}" + ) + self.logger.info( + f"{identifier} ir_optim: {self.config_status['ir_optim']}") + self.logger.info(f"{identifier} enable_memory_optim: {True}") + self.logger.info( + f"{identifier} enable_tensorrt: {self.config_status['enable_tensorrt']}" + ) + self.logger.info( + f"{identifier} enable_mkldnn: {self.config_status['enable_mkldnn']}") + self.logger.info( + f"{identifier} cpu_math_library_num_threads: {self.config_status['cpu_math_library_num_threads']}" + ) + self.logger.info( + "----------------------- Model info ----------------------") + self.logger.info(f"{identifier} model_name: {self.model_name}") + self.logger.info(f"{identifier} precision: {self.precision}") + self.logger.info( + "----------------------- Data info -----------------------") + self.logger.info(f"{identifier} batch_size: {self.batch_size}") + self.logger.info(f"{identifier} input_shape: {self.shape}") + self.logger.info(f"{identifier} data_num: {self.data_num}") + self.logger.info( + "----------------------- Perf info -----------------------") + self.logger.info( + f"{identifier} cpu_rss(MB): {self.cpu_rss_mb}, cpu_vms: {self.cpu_vms_mb}, cpu_shared_mb: {self.cpu_shared_mb}, cpu_dirty_mb: {self.cpu_dirty_mb}, cpu_util: {self.cpu_util}%" + ) + self.logger.info( + f"{identifier} gpu_rss(MB): {self.gpu_rss_mb}, gpu_util: {self.gpu_util}%, gpu_mem_util: {self.gpu_mem_util}%" + ) + self.logger.info( + f"{identifier} total time spent(s): {self.total_time_s}") + + if self.with_tracker: + self.logger.info( + f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, " + f"inference_time(ms): 
{round(self.inference_time_s*1000, 1)}, "
+                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}, "
+                f"tracking_time(ms): {round(self.tracking_time_s*1000, 1)}")
+        else:
+            self.logger.info(
+                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
+                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
+                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}"
+            )
+        if self.inference_time_s_90:
+            self.logger.info(
+                f"{identifier} 90%_cost: {self.inference_time_s_90}, 99%_cost: {self.inference_time_s_99}, succ_rate: {self.succ_rate}"
+            )
+        if self.qps:
+            self.logger.info(f"{identifier} QPS: {self.qps}")
+
+    def print_help(self):
+        """
+        print function help
+        """
+        print("""Usage:
+                ==== Print inference benchmark logs. ====
+                config = paddle.inference.Config()
+                model_info = {'model_name': 'resnet50'
+                              'precision': 'fp32'}
+                data_info = {'batch_size': 1
+                             'shape': '3,224,224'
+                             'data_num': 1000}
+                perf_info = {'preprocess_time_s': 1.0
+                             'inference_time_s': 2.0
+                             'postprocess_time_s': 1.0
+                             'total_time_s': 4.0}
+                resource_info = {'cpu_rss_mb': 100
+                                 'gpu_rss_mb': 100
+                                 'gpu_util': 60}
+                log = PaddleInferBenchmark(config, model_info, data_info, perf_info, resource_info)
+                log('Test')
+                """)
+
+    def __call__(self, identifier=None):
+        """
+        __call__
+        args:
+            identifier(string): identify log
+        """
+        self.report(identifier)
diff --git a/PaddleDetection-release-2.6/deploy/python/det_keypoint_unite_infer.py b/PaddleDetection-release-2.6/deploy/python/det_keypoint_unite_infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..7b57714d18c7a34bc8b740ec39ed7443da9a10d6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/python/det_keypoint_unite_infer.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
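+
+# NOTE: This script chains two exported models: the detector first produces
+# person boxes, then a top-down keypoint model (e.g. HRNet) estimates a pose
+# inside each cropped box. Bottom-up keypoint models are not supported in this
+# unite mode (see the assert in main()).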
+ +import os +import json +import cv2 +import math +import numpy as np +import paddle +import yaml + +from det_keypoint_unite_utils import argsparser +from preprocess import decode_image +from infer import Detector, DetectorPicoDet, PredictConfig, print_arguments, get_test_images, bench_log +from keypoint_infer import KeyPointDetector, PredictConfig_KeyPoint +from visualize import visualize_pose +from benchmark_utils import PaddleInferBenchmark +from utils import get_current_memory_mb +from keypoint_postprocess import translate_to_ori_images + +KEYPOINT_SUPPORT_MODELS = { + 'HigherHRNet': 'keypoint_bottomup', + 'HRNet': 'keypoint_topdown' +} + + +def predict_with_given_det(image, det_res, keypoint_detector, + keypoint_batch_size, run_benchmark): + keypoint_res = {} + + rec_images, records, det_rects = keypoint_detector.get_person_from_rect( + image, det_res) + + if len(det_rects) == 0: + keypoint_res['keypoint'] = [[], []] + return keypoint_res + + keypoint_vector = [] + score_vector = [] + + rect_vector = det_rects + keypoint_results = keypoint_detector.predict_image( + rec_images, run_benchmark, repeats=10, visual=False) + keypoint_vector, score_vector = translate_to_ori_images(keypoint_results, + np.array(records)) + keypoint_res['keypoint'] = [ + keypoint_vector.tolist(), score_vector.tolist() + ] if len(keypoint_vector) > 0 else [[], []] + keypoint_res['bbox'] = rect_vector + return keypoint_res + + +def topdown_unite_predict(detector, + topdown_keypoint_detector, + image_list, + keypoint_batch_size=1, + save_res=False): + det_timer = detector.get_timer() + store_res = [] + for i, img_file in enumerate(image_list): + # Decode image in advance in det + pose prediction + det_timer.preprocess_time_s.start() + image, _ = decode_image(img_file, {}) + det_timer.preprocess_time_s.end() + + if FLAGS.run_benchmark: + results = detector.predict_image( + [image], run_benchmark=True, repeats=10) + + cm, gm, gu = get_current_memory_mb() + detector.cpu_mem += cm + detector.gpu_mem += gm + detector.gpu_util += gu + else: + results = detector.predict_image([image], visual=False) + results = detector.filter_box(results, FLAGS.det_threshold) + if results['boxes_num'] > 0: + keypoint_res = predict_with_given_det( + image, results, topdown_keypoint_detector, keypoint_batch_size, + FLAGS.run_benchmark) + + if save_res: + save_name = img_file if isinstance(img_file, str) else i + store_res.append([ + save_name, keypoint_res['bbox'], + [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]] + ]) + else: + results["keypoint"] = [[], []] + keypoint_res = results + if FLAGS.run_benchmark: + cm, gm, gu = get_current_memory_mb() + topdown_keypoint_detector.cpu_mem += cm + topdown_keypoint_detector.gpu_mem += gm + topdown_keypoint_detector.gpu_util += gu + else: + if not os.path.exists(FLAGS.output_dir): + os.makedirs(FLAGS.output_dir) + visualize_pose( + img_file, + keypoint_res, + visual_thresh=FLAGS.keypoint_threshold, + save_dir=FLAGS.output_dir) + if save_res: + """ + 1) store_res: a list of image_data + 2) image_data: [imageid, rects, [keypoints, scores]] + 3) rects: list of rect [xmin, ymin, xmax, ymax] + 4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list + 5) scores: mean of all joint conf + """ + with open("det_keypoint_unite_image_results.json", 'w') as wf: + json.dump(store_res, wf, indent=4) + + +def topdown_unite_predict_video(detector, + topdown_keypoint_detector, + camera_id, + keypoint_batch_size=1, + save_res=False): + video_name = 'output.mp4' + if camera_id != -1: + capture 
= cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(FLAGS.video_file) + video_name = os.path.split(FLAGS.video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(FLAGS.output_dir): + os.makedirs(FLAGS.output_dir) + out_path = os.path.join(FLAGS.output_dir, video_name) + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + index = 0 + store_res = [] + keypoint_smoothing = KeypointSmoothing( + width, height, filter_type=FLAGS.filter_type, beta=0.05) + + while (1): + ret, frame = capture.read() + if not ret: + break + index += 1 + print('detect frame: %d' % (index)) + + frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) + + results = detector.predict_image([frame2], visual=False) + results = detector.filter_box(results, FLAGS.det_threshold) + if results['boxes_num'] == 0: + writer.write(frame) + continue + + keypoint_res = predict_with_given_det( + frame2, results, topdown_keypoint_detector, keypoint_batch_size, + FLAGS.run_benchmark) + + if FLAGS.smooth and len(keypoint_res['keypoint'][0]) == 1: + current_keypoints = np.array(keypoint_res['keypoint'][0][0]) + smooth_keypoints = keypoint_smoothing.smooth_process( + current_keypoints) + + keypoint_res['keypoint'][0][0] = smooth_keypoints.tolist() + + im = visualize_pose( + frame, + keypoint_res, + visual_thresh=FLAGS.keypoint_threshold, + returnimg=True) + + if save_res: + store_res.append([ + index, keypoint_res['bbox'], + [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]] + ]) + + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + writer.release() + print('output_video saved to: {}'.format(out_path)) + if save_res: + """ + 1) store_res: a list of frame_data + 2) frame_data: [frameid, rects, [keypoints, scores]] + 3) rects: list of rect [xmin, ymin, xmax, ymax] + 4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list + 5) scores: mean of all joint conf + """ + with open("det_keypoint_unite_video_results.json", 'w') as wf: + json.dump(store_res, wf, indent=4) + + +class KeypointSmoothing(object): + # The following code are modified from: + # https://github.com/jaantollander/OneEuroFilter + + def __init__(self, + width, + height, + filter_type, + alpha=0.5, + fc_d=0.1, + fc_min=0.1, + beta=0.1, + thres_mult=0.3): + super(KeypointSmoothing, self).__init__() + self.image_width = width + self.image_height = height + self.threshold = np.array([ + 0.005, 0.005, 0.005, 0.005, 0.005, 0.01, 0.01, 0.01, 0.01, 0.01, + 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01 + ]) * thres_mult + self.filter_type = filter_type + self.alpha = alpha + self.dx_prev_hat = None + self.x_prev_hat = None + self.fc_d = fc_d + self.fc_min = fc_min + self.beta = beta + + if self.filter_type == 'OneEuro': + self.smooth_func = self.one_euro_filter + elif self.filter_type == 'EMA': + self.smooth_func = self.ema_filter + else: + raise ValueError('filter type must be one_euro or ema') + + def smooth_process(self, current_keypoints): + if self.x_prev_hat is None: + self.x_prev_hat = current_keypoints[:, :2] + self.dx_prev_hat = np.zeros(current_keypoints[:, :2].shape) + return current_keypoints + else: + result = current_keypoints 
+ num_keypoints = len(current_keypoints) + for i in range(num_keypoints): + result[i, :2] = self.smooth(current_keypoints[i, :2], + self.threshold[i], i) + return result + + def smooth(self, current_keypoint, threshold, index): + distance = np.sqrt( + np.square((current_keypoint[0] - self.x_prev_hat[index][0]) / + self.image_width) + np.square((current_keypoint[ + 1] - self.x_prev_hat[index][1]) / self.image_height)) + if distance < threshold: + result = self.x_prev_hat[index] + else: + result = self.smooth_func(current_keypoint, self.x_prev_hat[index], + index) + + return result + + def one_euro_filter(self, x_cur, x_pre, index): + te = 1 + self.alpha = self.smoothing_factor(te, self.fc_d) + dx_cur = (x_cur - x_pre) / te + dx_cur_hat = self.exponential_smoothing(dx_cur, self.dx_prev_hat[index]) + + fc = self.fc_min + self.beta * np.abs(dx_cur_hat) + self.alpha = self.smoothing_factor(te, fc) + x_cur_hat = self.exponential_smoothing(x_cur, x_pre) + self.dx_prev_hat[index] = dx_cur_hat + self.x_prev_hat[index] = x_cur_hat + return x_cur_hat + + def ema_filter(self, x_cur, x_pre, index): + x_cur_hat = self.exponential_smoothing(x_cur, x_pre) + self.x_prev_hat[index] = x_cur_hat + return x_cur_hat + + def smoothing_factor(self, te, fc): + r = 2 * math.pi * fc * te + return r / (r + 1) + + def exponential_smoothing(self, x_cur, x_pre, index=0): + return self.alpha * x_cur + (1 - self.alpha) * x_pre + + +def main(): + deploy_file = os.path.join(FLAGS.det_model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + arch = yml_conf['arch'] + detector_func = 'Detector' + if arch == 'PicoDet': + detector_func = 'DetectorPicoDet' + + detector = eval(detector_func)(FLAGS.det_model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + threshold=FLAGS.det_threshold) + + topdown_keypoint_detector = KeyPointDetector( + FLAGS.keypoint_model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.keypoint_batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + use_dark=FLAGS.use_dark) + keypoint_arch = topdown_keypoint_detector.pred_config.arch + assert KEYPOINT_SUPPORT_MODELS[ + keypoint_arch] == 'keypoint_topdown', 'Detection-Keypoint unite inference only supports topdown models.' 
+ + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + topdown_unite_predict_video(detector, topdown_keypoint_detector, + FLAGS.camera_id, FLAGS.keypoint_batch_size, + FLAGS.save_res) + else: + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + topdown_unite_predict(detector, topdown_keypoint_detector, img_list, + FLAGS.keypoint_batch_size, FLAGS.save_res) + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + topdown_keypoint_detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + det_model_dir = FLAGS.det_model_dir + det_model_info = { + 'model_name': det_model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, det_model_info, name='Det') + keypoint_model_dir = FLAGS.keypoint_model_dir + keypoint_model_info = { + 'model_name': keypoint_model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(topdown_keypoint_detector, img_list, keypoint_model_info, + FLAGS.keypoint_batch_size, 'KeyPoint') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/python/det_keypoint_unite_utils.py b/PaddleDetection-release-2.6/deploy/python/det_keypoint_unite_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..7de1295128d9151cf55a7b1e427d6ee946db8bc4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/det_keypoint_unite_utils.py @@ -0,0 +1,141 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import ast +import argparse + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--det_model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + parser.add_argument( + "--keypoint_model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + parser.add_argument( + "--image_file", type=str, default=None, help="Path of image file.") + parser.add_argument( + "--image_dir", + type=str, + default=None, + help="Dir of image file, `image_file` has a higher priority.") + parser.add_argument( + "--keypoint_batch_size", + type=int, + default=8, + help=("batch_size for keypoint inference. In detection-keypoint unit" + "inference, the batch size in detection is 1. Then collate det " + "result in batch for keypoint inference.")) + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or `camera_id` has a highest priority." 
+ ) + parser.add_argument( + "--camera_id", + type=int, + default=-1, + help="device id of camera to predict.") + parser.add_argument( + "--det_threshold", type=float, default=0.5, help="Threshold of score.") + parser.add_argument( + "--keypoint_threshold", + type=float, + default=0.5, + help="Threshold of score.") + parser.add_argument( + "--output_dir", + type=str, + default="output", + help="Directory of output visualization files.") + parser.add_argument( + "--run_mode", + type=str, + default='paddle', + help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU." + ) + parser.add_argument( + "--run_benchmark", + type=ast.literal_eval, + default=False, + help="Whether to predict a image_file repeatedly for benchmark") + parser.add_argument( + "--enable_mkldnn", + type=ast.literal_eval, + default=False, + help="Whether use mkldnn with CPU.") + parser.add_argument( + "--cpu_threads", type=int, default=1, help="Num of threads with CPU.") + parser.add_argument( + "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.") + parser.add_argument( + "--trt_max_shape", + type=int, + default=1280, + help="max_shape for TensorRT.") + parser.add_argument( + "--trt_opt_shape", + type=int, + default=640, + help="opt_shape for TensorRT.") + parser.add_argument( + "--trt_calib_mode", + type=bool, + default=False, + help="If the model is produced by TRT offline quantitative " + "calibration, trt_calib_mode need to set True.") + parser.add_argument( + '--use_dark', + type=ast.literal_eval, + default=True, + help='whether to use darkpose to get better keypoint position predict ') + parser.add_argument( + '--save_res', + type=bool, + default=False, + help=( + "whether to save predict results to json file" + "1) store_res: a list of image_data" + "2) image_data: [imageid, rects, [keypoints, scores]]" + "3) rects: list of rect [xmin, ymin, xmax, ymax]" + "4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list" + "5) scores: mean of all joint conf")) + parser.add_argument( + '--smooth', + type=ast.literal_eval, + default=False, + help='smoothing keypoints for each frame, new incoming keypoints will be more stable.' + ) + parser.add_argument( + '--filter_type', + type=str, + default='OneEuro', + help='when set --smooth True, choose filter type you want to use, it can be [OneEuro] or [EMA].' + ) + return parser diff --git a/PaddleDetection-release-2.6/deploy/python/infer.py b/PaddleDetection-release-2.6/deploy/python/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..31e491b12783a80aebfdbc09866b114063387717 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/infer.py @@ -0,0 +1,1084 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import yaml +import glob +import json +from pathlib import Path +from functools import reduce + +import cv2 +import numpy as np +import math +import paddle +from paddle.inference import Config +from paddle.inference import create_predictor + +import sys +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..']))) +sys.path.insert(0, parent_path) + +from benchmark_utils import PaddleInferBenchmark +from picodet_postprocess import PicoDetPostProcess +from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, WarpAffine, Pad, decode_image +from keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop +from visualize import visualize_box_mask +from utils import argsparser, Timer, get_current_memory_mb, multiclass_nms, coco_clsid2catid + +# Global dictionary +SUPPORT_MODELS = { + 'YOLO', 'PPYOLOE', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet', + 'S2ANet', 'JDE', 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', + 'TOOD', 'RetinaNet', 'StrongBaseline', 'STGCN', 'YOLOX', 'YOLOF', 'PPHGNet', + 'PPLCNet', 'DETR', 'CenterTrack' +} + +TUNED_TRT_DYNAMIC_MODELS = {'DETR'} + + +def bench_log(detector, img_list, model_info, batch_size=1, name=None): + mems = { + 'cpu_rss_mb': detector.cpu_mem / len(img_list), + 'gpu_rss_mb': detector.gpu_mem / len(img_list), + 'gpu_util': detector.gpu_util * 100 / len(img_list) + } + perf_info = detector.det_times.report(average=True) + data_info = { + 'batch_size': batch_size, + 'shape': "dynamic_shape", + 'data_num': perf_info['img_num'] + } + log = PaddleInferBenchmark(detector.config, model_info, data_info, + perf_info, mems) + log(name) + + +class Detector(object): + """ + Args: + pred_config (object): config of model, defined by `Config(model_dir)` + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + enable_mkldnn_bfloat16 (bool): whether to turn on mkldnn bfloat16 + output_dir (str): The path of output + threshold (float): The threshold of score for visualization + delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT. + Used by action model. 
+ """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + enable_mkldnn_bfloat16=False, + output_dir='output', + threshold=0.5, + delete_shuffle_pass=False): + self.pred_config = self.set_config(model_dir) + self.predictor, self.config = load_predictor( + model_dir, + self.pred_config.arch, + run_mode=run_mode, + batch_size=batch_size, + min_subgraph_size=self.pred_config.min_subgraph_size, + device=device, + use_dynamic_shape=self.pred_config.use_dynamic_shape, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + enable_mkldnn_bfloat16=enable_mkldnn_bfloat16, + delete_shuffle_pass=delete_shuffle_pass) + self.det_times = Timer() + self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0 + self.batch_size = batch_size + self.output_dir = output_dir + self.threshold = threshold + + def set_config(self, model_dir): + return PredictConfig(model_dir) + + def preprocess(self, image_list): + preprocess_ops = [] + for op_info in self.pred_config.preprocess_infos: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + preprocess_ops.append(eval(op_type)(**new_op_info)) + + input_im_lst = [] + input_im_info_lst = [] + for im_path in image_list: + im, im_info = preprocess(im_path, preprocess_ops) + input_im_lst.append(im) + input_im_info_lst.append(im_info) + inputs = create_inputs(input_im_lst, input_im_info_lst) + input_names = self.predictor.get_input_names() + for i in range(len(input_names)): + input_tensor = self.predictor.get_input_handle(input_names[i]) + if input_names[i] == 'x': + input_tensor.copy_from_cpu(inputs['image']) + else: + input_tensor.copy_from_cpu(inputs[input_names[i]]) + + return inputs + + def postprocess(self, inputs, result): + # postprocess output of predictor + np_boxes_num = result['boxes_num'] + assert isinstance(np_boxes_num, np.ndarray), \ + '`np_boxes_num` should be a `numpy.ndarray`' + + result = {k: v for k, v in result.items() if v is not None} + return result + + def filter_box(self, result, threshold): + np_boxes_num = result['boxes_num'] + boxes = result['boxes'] + start_idx = 0 + filter_boxes = [] + filter_num = [] + for i in range(len(np_boxes_num)): + boxes_num = np_boxes_num[i] + boxes_i = boxes[start_idx:start_idx + boxes_num, :] + idx = boxes_i[:, 1] > threshold + filter_boxes_i = boxes_i[idx, :] + filter_boxes.append(filter_boxes_i) + filter_num.append(filter_boxes_i.shape[0]) + start_idx += boxes_num + boxes = np.concatenate(filter_boxes) + filter_num = np.array(filter_num) + filter_res = {'boxes': boxes, 'boxes_num': filter_num} + return filter_res + + def predict(self, repeats=1, run_benchmark=False): + ''' + Args: + repeats (int): repeats number for prediction + Returns: + result (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, + matix element:[class, score, x_min, y_min, x_max, y_max] + MaskRCNN's result include 'masks': np.ndarray: + shape: [N, im_h, im_w] + ''' + # model prediction + np_boxes_num, np_boxes, np_masks = np.array([0]), None, None + + if run_benchmark: + for i in range(repeats): + self.predictor.run() + paddle.device.cuda.synchronize() + result = dict( + boxes=np_boxes, masks=np_masks, boxes_num=np_boxes_num) + return result + + for i in range(repeats): + self.predictor.run() + output_names = 
self.predictor.get_output_names() + boxes_tensor = self.predictor.get_output_handle(output_names[0]) + np_boxes = boxes_tensor.copy_to_cpu() + if len(output_names) == 1: + # some exported model can not get tensor 'bbox_num' + np_boxes_num = np.array([len(np_boxes)]) + else: + boxes_num = self.predictor.get_output_handle(output_names[1]) + np_boxes_num = boxes_num.copy_to_cpu() + if self.pred_config.mask: + masks_tensor = self.predictor.get_output_handle(output_names[2]) + np_masks = masks_tensor.copy_to_cpu() + result = dict(boxes=np_boxes, masks=np_masks, boxes_num=np_boxes_num) + return result + + def merge_batch_result(self, batch_result): + if len(batch_result) == 1: + return batch_result[0] + res_key = batch_result[0].keys() + results = {k: [] for k in res_key} + for res in batch_result: + for k, v in res.items(): + results[k].append(v) + for k, v in results.items(): + if k not in ['masks', 'segm']: + results[k] = np.concatenate(v) + return results + + def get_timer(self): + return self.det_times + + def predict_image_slice(self, + img_list, + slice_size=[640, 640], + overlap_ratio=[0.25, 0.25], + combine_method='nms', + match_threshold=0.6, + match_metric='ios', + run_benchmark=False, + repeats=1, + visual=True, + save_results=False): + # slice infer only support bs=1 + results = [] + try: + import sahi + from sahi.slicing import slice_image + except Exception as e: + print( + 'sahi not found, plaese install sahi. ' + 'for example: `pip install sahi`, see https://github.com/obss/sahi.' + ) + raise e + num_classes = len(self.pred_config.labels) + for i in range(len(img_list)): + ori_image = img_list[i] + slice_image_result = sahi.slicing.slice_image( + image=ori_image, + slice_height=slice_size[0], + slice_width=slice_size[1], + overlap_height_ratio=overlap_ratio[0], + overlap_width_ratio=overlap_ratio[1]) + sub_img_num = len(slice_image_result) + merged_bboxs = [] + print('slice to {} sub_samples.', sub_img_num) + + batch_image_list = [ + slice_image_result.images[_ind] for _ind in range(sub_img_num) + ] + if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result = self.predict(repeats=50, run_benchmark=True) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats, run_benchmark=True) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += 1 + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + else: + # preprocess + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + # postprocess + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += 1 + + st, ed = 0, result['boxes_num'][0] # start_index, end_index + for _ind in range(sub_img_num): + boxes_num = result['boxes_num'][_ind] + ed = st + boxes_num + shift_amount = slice_image_result.starting_pixels[_ind] + 
result['boxes'][st:ed][:, 2:4] = result['boxes'][ + st:ed][:, 2:4] + shift_amount + result['boxes'][st:ed][:, 4:6] = result['boxes'][ + st:ed][:, 4:6] + shift_amount + merged_bboxs.append(result['boxes'][st:ed]) + st = ed + + merged_results = {'boxes': []} + if combine_method == 'nms': + final_boxes = multiclass_nms( + np.concatenate(merged_bboxs), num_classes, match_threshold, + match_metric) + merged_results['boxes'] = np.concatenate(final_boxes) + elif combine_method == 'concat': + merged_results['boxes'] = np.concatenate(merged_bboxs) + else: + raise ValueError( + "Now only support 'nms' or 'concat' to fuse detection results." + ) + merged_results['boxes_num'] = np.array( + [len(merged_results['boxes'])], dtype=np.int32) + + if visual: + visualize( + [ori_image], # should be list + merged_results, + self.pred_config.labels, + output_dir=self.output_dir, + threshold=self.threshold) + + results.append(merged_results) + print('Test iter {}'.format(i)) + + results = self.merge_batch_result(results) + if save_results: + Path(self.output_dir).mkdir(exist_ok=True) + self.save_coco_results( + img_list, results, use_coco_category=FLAGS.use_coco_category) + return results + + def predict_image(self, + image_list, + run_benchmark=False, + repeats=1, + visual=True, + save_results=False): + batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size) + results = [] + for i in range(batch_loop_cnt): + start_index = i * self.batch_size + end_index = min((i + 1) * self.batch_size, len(image_list)) + batch_image_list = image_list[start_index:end_index] + if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result = self.predict(repeats=50, run_benchmark=True) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats, run_benchmark=True) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(batch_image_list) + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + else: + # preprocess + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + # postprocess + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(batch_image_list) + + if visual: + visualize( + batch_image_list, + result, + self.pred_config.labels, + output_dir=self.output_dir, + threshold=self.threshold) + results.append(result) + print('Test iter {}'.format(i)) + results = self.merge_batch_result(results) + if save_results: + Path(self.output_dir).mkdir(exist_ok=True) + self.save_coco_results( + image_list, results, use_coco_category=FLAGS.use_coco_category) + return results + + def predict_video(self, video_file, camera_id): + video_out_name = 'output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_out_name = os.path.split(video_file)[-1] + # Get 
Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name) + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + index = 1 + while (1): + ret, frame = capture.read() + if not ret: + break + print('detect frame: %d' % (index)) + index += 1 + results = self.predict_image([frame[:, :, ::-1]], visual=False) + + im = visualize_box_mask( + frame, + results, + self.pred_config.labels, + threshold=self.threshold) + im = np.array(im) + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + writer.release() + + def save_coco_results(self, image_list, results, use_coco_category=False): + bbox_results = [] + mask_results = [] + idx = 0 + print("Start saving coco json files...") + for i, box_num in enumerate(results['boxes_num']): + file_name = os.path.split(image_list[i])[-1] + if use_coco_category: + img_id = int(os.path.splitext(file_name)[0]) + else: + img_id = i + + if 'boxes' in results: + boxes = results['boxes'][idx:idx + box_num].tolist() + bbox_results.extend([{ + 'image_id': img_id, + 'category_id': coco_clsid2catid[int(box[0])] \ + if use_coco_category else int(box[0]), + 'file_name': file_name, + 'bbox': [box[2], box[3], box[4] - box[2], + box[5] - box[3]], # xyxy -> xywh + 'score': box[1]} for box in boxes]) + + if 'masks' in results: + import pycocotools.mask as mask_util + + boxes = results['boxes'][idx:idx + box_num].tolist() + masks = results['masks'][i][:box_num].astype(np.uint8) + seg_res = [] + for box, mask in zip(boxes, masks): + rle = mask_util.encode( + np.array( + mask[:, :, None], dtype=np.uint8, order="F"))[0] + if 'counts' in rle: + rle['counts'] = rle['counts'].decode("utf8") + seg_res.append({ + 'image_id': img_id, + 'category_id': coco_clsid2catid[int(box[0])] \ + if use_coco_category else int(box[0]), + 'file_name': file_name, + 'segmentation': rle, + 'score': box[1]}) + mask_results.extend(seg_res) + + idx += box_num + + if bbox_results: + bbox_file = os.path.join(self.output_dir, "bbox.json") + with open(bbox_file, 'w') as f: + json.dump(bbox_results, f) + print(f"The bbox result is saved to {bbox_file}") + if mask_results: + mask_file = os.path.join(self.output_dir, "mask.json") + with open(mask_file, 'w') as f: + json.dump(mask_results, f) + print(f"The mask result is saved to {mask_file}") + + +class DetectorSOLOv2(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + enable_mkldnn_bfloat16 (bool): Whether to turn on mkldnn bfloat16 + 
output_dir (str): The path of output + threshold (float): The threshold of score for visualization + + """ + + def __init__( + self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + enable_mkldnn_bfloat16=False, + output_dir='./', + threshold=0.5, ): + super(DetectorSOLOv2, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + enable_mkldnn_bfloat16=enable_mkldnn_bfloat16, + output_dir=output_dir, + threshold=threshold, ) + + def predict(self, repeats=1, run_benchmark=False): + ''' + Args: + repeats (int): repeat number for prediction + Returns: + result (dict): 'segm': np.ndarray,shape:[N, im_h, im_w] + 'cate_label': label of segm, shape:[N] + 'cate_score': confidence score of segm, shape:[N] + ''' + np_segms, np_label, np_score, np_boxes_num = None, None, None, np.array( + [0]) + + if run_benchmark: + for i in range(repeats): + self.predictor.run() + paddle.device.cuda.synchronize() + result = dict( + segm=np_segms, + label=np_label, + score=np_score, + boxes_num=np_boxes_num) + return result + + for i in range(repeats): + self.predictor.run() + output_names = self.predictor.get_output_names() + np_boxes_num = self.predictor.get_output_handle(output_names[ + 0]).copy_to_cpu() + np_label = self.predictor.get_output_handle(output_names[ + 1]).copy_to_cpu() + np_score = self.predictor.get_output_handle(output_names[ + 2]).copy_to_cpu() + np_segms = self.predictor.get_output_handle(output_names[ + 3]).copy_to_cpu() + + result = dict( + segm=np_segms, + label=np_label, + score=np_score, + boxes_num=np_boxes_num) + return result + + +class DetectorPicoDet(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to turn on MKLDNN + enable_mkldnn_bfloat16 (bool): whether to turn on MKLDNN_BFLOAT16 + """ + + def __init__( + self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + enable_mkldnn_bfloat16=False, + output_dir='./', + threshold=0.5, ): + super(DetectorPicoDet, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + enable_mkldnn_bfloat16=enable_mkldnn_bfloat16, + output_dir=output_dir, + threshold=threshold, ) + + def postprocess(self, inputs, result): + # postprocess output of predictor + np_score_list = result['boxes'] + np_boxes_list = 
result['boxes_num'] + postprocessor = PicoDetPostProcess( + inputs['image'].shape[2:], + inputs['im_shape'], + inputs['scale_factor'], + strides=self.pred_config.fpn_stride, + nms_threshold=self.pred_config.nms['nms_threshold']) + np_boxes, np_boxes_num = postprocessor(np_score_list, np_boxes_list) + result = dict(boxes=np_boxes, boxes_num=np_boxes_num) + return result + + def predict(self, repeats=1, run_benchmark=False): + ''' + Args: + repeats (int): repeat number for prediction + Returns: + result (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, + matix element:[class, score, x_min, y_min, x_max, y_max] + ''' + np_score_list, np_boxes_list = [], [] + + if run_benchmark: + for i in range(repeats): + self.predictor.run() + paddle.device.cuda.synchronize() + result = dict(boxes=np_score_list, boxes_num=np_boxes_list) + return result + + for i in range(repeats): + self.predictor.run() + np_score_list.clear() + np_boxes_list.clear() + output_names = self.predictor.get_output_names() + num_outs = int(len(output_names) / 2) + for out_idx in range(num_outs): + np_score_list.append( + self.predictor.get_output_handle(output_names[out_idx]) + .copy_to_cpu()) + np_boxes_list.append( + self.predictor.get_output_handle(output_names[ + out_idx + num_outs]).copy_to_cpu()) + result = dict(boxes=np_score_list, boxes_num=np_boxes_list) + return result + + +def create_inputs(imgs, im_info): + """generate input for different model type + Args: + imgs (list(numpy)): list of images (np.ndarray) + im_info (list(dict)): list of image info + Returns: + inputs (dict): input of model + """ + inputs = {} + + im_shape = [] + scale_factor = [] + if len(imgs) == 1: + inputs['image'] = np.array((imgs[0], )).astype('float32') + inputs['im_shape'] = np.array( + (im_info[0]['im_shape'], )).astype('float32') + inputs['scale_factor'] = np.array( + (im_info[0]['scale_factor'], )).astype('float32') + return inputs + + for e in im_info: + im_shape.append(np.array((e['im_shape'], )).astype('float32')) + scale_factor.append(np.array((e['scale_factor'], )).astype('float32')) + + inputs['im_shape'] = np.concatenate(im_shape, axis=0) + inputs['scale_factor'] = np.concatenate(scale_factor, axis=0) + + imgs_shape = [[e.shape[1], e.shape[2]] for e in imgs] + max_shape_h = max([e[0] for e in imgs_shape]) + max_shape_w = max([e[1] for e in imgs_shape]) + padding_imgs = [] + for img in imgs: + im_c, im_h, im_w = img.shape[:] + padding_im = np.zeros( + (im_c, max_shape_h, max_shape_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = img + padding_imgs.append(padding_im) + inputs['image'] = np.stack(padding_imgs, axis=0) + return inputs + + +class PredictConfig(): + """set config of preprocess, postprocess and visualize + Args: + model_dir (str): root path of model.yml + """ + + def __init__(self, model_dir): + # parsing Yaml config for Preprocess + deploy_file = os.path.join(model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.labels = yml_conf['label_list'] + self.mask = False + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + if 'mask' in yml_conf: + self.mask = yml_conf['mask'] + self.tracker = None + if 'tracker' in yml_conf: + self.tracker = yml_conf['tracker'] + if 'NMS' in yml_conf: + self.nms = yml_conf['NMS'] + if 'fpn_stride' in yml_conf: + self.fpn_stride = yml_conf['fpn_stride'] + if 
self.arch == 'RCNN' and yml_conf.get('export_onnx', False): + print( + 'The RCNN export model is used for ONNX and it only supports batch_size = 1' + ) + self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[ + 'arch'], SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +def load_predictor(model_dir, + arch, + run_mode='paddle', + batch_size=1, + device='CPU', + min_subgraph_size=3, + use_dynamic_shape=False, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + enable_mkldnn_bfloat16=False, + delete_shuffle_pass=False, + tuned_trt_shape_file="shape_range_info.pbtxt"): + """set AnalysisConfig, generate AnalysisPredictor + Args: + model_dir (str): root path of __model__ and __params__ + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8) + use_dynamic_shape (bool): use dynamic shape or not + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT. + Used by action model. + Returns: + predictor (PaddlePredictor): AnalysisPredictor + Raises: + ValueError: predict by TensorRT need device == 'GPU'. + """ + if device != 'GPU' and run_mode != 'paddle': + raise ValueError( + "Predict by TensorRT mode: {}, expect device=='GPU', but device == {}" + .format(run_mode, device)) + infer_model = os.path.join(model_dir, 'model.pdmodel') + infer_params = os.path.join(model_dir, 'model.pdiparams') + if not os.path.exists(infer_model): + infer_model = os.path.join(model_dir, 'inference.pdmodel') + infer_params = os.path.join(model_dir, 'inference.pdiparams') + if not os.path.exists(infer_model): + raise ValueError( + "Cannot find any inference model in dir: {},".format(model_dir)) + config = Config(infer_model, infer_params) + if device == 'GPU': + # initial GPU memory(M), device ID + config.enable_use_gpu(200, 0) + # optimize graph and fuse op + config.switch_ir_optim(True) + elif device == 'XPU': + if config.lite_engine_enabled(): + config.enable_lite_engine() + config.enable_xpu(10 * 1024 * 1024) + elif device == 'NPU': + if config.lite_engine_enabled(): + config.enable_lite_engine() + config.enable_npu() + else: + config.disable_gpu() + config.set_cpu_math_library_num_threads(cpu_threads) + if enable_mkldnn: + try: + # cache 10 different shapes for mkldnn to avoid memory leak + config.set_mkldnn_cache_capacity(10) + config.enable_mkldnn() + if enable_mkldnn_bfloat16: + config.enable_mkldnn_bfloat16() + except Exception as e: + print( + "The current environment does not support `mkldnn`, so disable mkldnn." 
+ ) + pass + + precision_map = { + 'trt_int8': Config.Precision.Int8, + 'trt_fp32': Config.Precision.Float32, + 'trt_fp16': Config.Precision.Half + } + if run_mode in precision_map.keys(): + if arch in TUNED_TRT_DYNAMIC_MODELS: + config.collect_shape_range_info(tuned_trt_shape_file) + config.enable_tensorrt_engine( + workspace_size=(1 << 25) * batch_size, + max_batch_size=batch_size, + min_subgraph_size=min_subgraph_size, + precision_mode=precision_map[run_mode], + use_static=False, + use_calib_mode=trt_calib_mode) + if arch in TUNED_TRT_DYNAMIC_MODELS: + config.enable_tuned_tensorrt_dynamic_shape(tuned_trt_shape_file, + True) + + if use_dynamic_shape: + min_input_shape = { + 'image': [batch_size, 3, trt_min_shape, trt_min_shape], + 'scale_factor': [batch_size, 2] + } + max_input_shape = { + 'image': [batch_size, 3, trt_max_shape, trt_max_shape], + 'scale_factor': [batch_size, 2] + } + opt_input_shape = { + 'image': [batch_size, 3, trt_opt_shape, trt_opt_shape], + 'scale_factor': [batch_size, 2] + } + config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape, + opt_input_shape) + print('trt set dynamic shape done!') + + # disable print log when predict + config.disable_glog_info() + # enable shared memory + config.enable_memory_optim() + # disable feed, fetch OP, needed by zero_copy_run + config.switch_use_feed_fetch_ops(False) + if delete_shuffle_pass: + config.delete_pass("shuffle_channel_detect_pass") + predictor = create_predictor(config) + return predictor, config + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +def visualize(image_list, result, labels, output_dir='output/', threshold=0.5): + # visualize the predict result + start_idx = 0 + for idx, image_file in enumerate(image_list): + im_bboxes_num = result['boxes_num'][idx] + im_results = {} + if 'boxes' in result: + im_results['boxes'] = result['boxes'][start_idx:start_idx + + im_bboxes_num, :] + if 'masks' in result: + im_results['masks'] = result['masks'][start_idx:start_idx + + im_bboxes_num, :] + if 'segm' in result: + im_results['segm'] = result['segm'][start_idx:start_idx + + im_bboxes_num, :] + if 'label' in result: + im_results['label'] = result['label'][start_idx:start_idx + + im_bboxes_num] + if 'score' in result: + im_results['score'] = result['score'][start_idx:start_idx + + im_bboxes_num] + + start_idx += im_bboxes_num + im = visualize_box_mask( + image_file, im_results, labels, threshold=threshold) + img_name = os.path.split(image_file)[-1] + if not os.path.exists(output_dir): + os.makedirs(output_dir) + out_path = os.path.join(output_dir, img_name) + im.save(out_path, 
quality=95) + print("save result to: " + out_path) + + +def print_arguments(args): + print('----------- Running Arguments -----------') + for arg, value in sorted(vars(args).items()): + print('%s: %s' % (arg, value)) + print('------------------------------------------') + + +def main(): + deploy_file = os.path.join(FLAGS.model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + arch = yml_conf['arch'] + detector_func = 'Detector' + if arch == 'SOLOv2': + detector_func = 'DetectorSOLOv2' + elif arch == 'PicoDet': + detector_func = 'DetectorPicoDet' + + detector = eval(detector_func)( + FLAGS.model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + enable_mkldnn_bfloat16=FLAGS.enable_mkldnn_bfloat16, + threshold=FLAGS.threshold, + output_dir=FLAGS.output_dir) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + else: + # predict from image + if FLAGS.image_dir is None and FLAGS.image_file is not None: + assert FLAGS.batch_size == 1, "batch_size should be 1, when image_file is not None" + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + if FLAGS.slice_infer: + detector.predict_image_slice( + img_list, + FLAGS.slice_size, + FLAGS.overlap_ratio, + FLAGS.combine_method, + FLAGS.match_threshold, + FLAGS.match_metric, + visual=FLAGS.save_images, + save_results=FLAGS.save_results) + else: + detector.predict_image( + img_list, + FLAGS.run_benchmark, + repeats=100, + visual=FLAGS.save_images, + save_results=FLAGS.save_results) + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + model_dir = FLAGS.model_dir + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, model_info, name='DET') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU' + ], "device should be CPU, GPU, XPU or NPU" + assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device" + + assert not ( + FLAGS.enable_mkldnn == False and FLAGS.enable_mkldnn_bfloat16 == True + ), 'To enable mkldnn bfloat, please turn on both enable_mkldnn and enable_mkldnn_bfloat16' + + main() diff --git a/PaddleDetection-release-2.6/deploy/python/keypoint_infer.py b/PaddleDetection-release-2.6/deploy/python/keypoint_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..52e12fda74fdac95d33c95eea3de062ce43ee774 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/keypoint_infer.py @@ -0,0 +1,415 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import time +import yaml +import glob +from functools import reduce + +from PIL import Image +import cv2 +import math +import numpy as np +import paddle + +import sys +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..']))) +sys.path.insert(0, parent_path) + +from preprocess import preprocess, NormalizeImage, Permute +from keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop +from keypoint_postprocess import HrHRNetPostProcess, HRNetPostProcess +from visualize import visualize_pose +from paddle.inference import Config +from paddle.inference import create_predictor +from utils import argsparser, Timer, get_current_memory_mb +from benchmark_utils import PaddleInferBenchmark +from infer import Detector, get_test_images, print_arguments + +# Global dictionary +KEYPOINT_SUPPORT_MODELS = { + 'HigherHRNet': 'keypoint_bottomup', + 'HRNet': 'keypoint_topdown' +} + + +class KeyPointDetector(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + use_dark(bool): whether to use postprocess in DarkPose + """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + use_dark=True): + super(KeyPointDetector, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, ) + self.use_dark = use_dark + + def set_config(self, model_dir): + return PredictConfig_KeyPoint(model_dir) + + def get_person_from_rect(self, image, results): + # crop the person result from image + self.det_times.preprocess_time_s.start() + valid_rects = results['boxes'] + rect_images = [] + new_rects = [] + org_rects = [] + for rect in valid_rects: + rect_image, new_rect, org_rect = expand_crop(image, rect) + if rect_image is None or rect_image.size == 0: + continue + rect_images.append(rect_image) + new_rects.append(new_rect) + org_rects.append(org_rect) + self.det_times.preprocess_time_s.end() + return rect_images, new_rects, org_rects + + def postprocess(self, inputs, result): + np_heatmap = result['heatmap'] + np_masks = result['masks'] + # postprocess 
output of predictor + if KEYPOINT_SUPPORT_MODELS[ + self.pred_config.arch] == 'keypoint_bottomup': + results = {} + h, w = inputs['im_shape'][0] + preds = [np_heatmap] + if np_masks is not None: + preds += np_masks + preds += [h, w] + keypoint_postprocess = HrHRNetPostProcess() + kpts, scores = keypoint_postprocess(*preds) + results['keypoint'] = kpts + results['score'] = scores + return results + elif KEYPOINT_SUPPORT_MODELS[ + self.pred_config.arch] == 'keypoint_topdown': + results = {} + imshape = inputs['im_shape'][:, ::-1] + center = np.round(imshape / 2.) + scale = imshape / 200. + keypoint_postprocess = HRNetPostProcess(use_dark=self.use_dark) + kpts, scores = keypoint_postprocess(np_heatmap, center, scale) + results['keypoint'] = kpts + results['score'] = scores + return results + else: + raise ValueError("Unsupported arch: {}, expect {}".format( + self.pred_config.arch, KEYPOINT_SUPPORT_MODELS)) + + def predict(self, repeats=1): + ''' + Args: + repeats (int): repeat number for prediction + Returns: + results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, + matix element:[class, score, x_min, y_min, x_max, y_max] + MaskRCNN's results include 'masks': np.ndarray: + shape: [N, im_h, im_w] + ''' + # model prediction + np_heatmap, np_masks = None, None + for i in range(repeats): + self.predictor.run() + output_names = self.predictor.get_output_names() + heatmap_tensor = self.predictor.get_output_handle(output_names[0]) + np_heatmap = heatmap_tensor.copy_to_cpu() + if self.pred_config.tagmap: + masks_tensor = self.predictor.get_output_handle(output_names[1]) + heat_k = self.predictor.get_output_handle(output_names[2]) + inds_k = self.predictor.get_output_handle(output_names[3]) + np_masks = [ + masks_tensor.copy_to_cpu(), heat_k.copy_to_cpu(), + inds_k.copy_to_cpu() + ] + result = dict(heatmap=np_heatmap, masks=np_masks) + return result + + def predict_image(self, + image_list, + run_benchmark=False, + repeats=1, + visual=True): + results = [] + batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size) + for i in range(batch_loop_cnt): + start_index = i * self.batch_size + end_index = min((i + 1) * self.batch_size, len(image_list)) + batch_image_list = image_list[start_index:end_index] + if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result_warmup = self.predict(repeats=repeats) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(batch_image_list) + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + + else: + # preprocess + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + # postprocess + self.det_times.postprocess_time_s.start() + result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + self.det_times.img_num += len(batch_image_list) + + if 
visual: + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + visualize( + batch_image_list, + result, + visual_thresh=self.threshold, + save_dir=self.output_dir) + + results.append(result) + if visual: + print('Test iter {}'.format(i)) + results = self.merge_batch_result(results) + return results + + def predict_video(self, video_file, camera_id): + video_name = 'output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_name = os.path.split(video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_name) + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + index = 1 + while (1): + ret, frame = capture.read() + if not ret: + break + print('detect frame: %d' % (index)) + index += 1 + results = self.predict_image([frame[:, :, ::-1]], visual=False) + im_results = {} + im_results['keypoint'] = [results['keypoint'], results['score']] + im = visualize_pose( + frame, im_results, visual_thresh=self.threshold, returnimg=True) + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + writer.release() + + +def create_inputs(imgs, im_info): + """generate input for different model type + Args: + imgs (list(numpy)): list of image (np.ndarray) + im_info (list(dict)): list of image info + Returns: + inputs (dict): input of model + """ + inputs = {} + inputs['image'] = np.stack(imgs, axis=0).astype('float32') + im_shape = [] + for e in im_info: + im_shape.append(np.array((e['im_shape'])).astype('float32')) + inputs['im_shape'] = np.stack(im_shape, axis=0) + return inputs + + +class PredictConfig_KeyPoint(): + """set config of preprocess, postprocess and visualize + Args: + model_dir (str): root path of model.yml + """ + + def __init__(self, model_dir): + # parsing Yaml config for Preprocess + deploy_file = os.path.join(model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.archcls = KEYPOINT_SUPPORT_MODELS[yml_conf['arch']] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.labels = yml_conf['label_list'] + self.tagmap = False + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + if 'keypoint_bottomup' == self.archcls: + self.tagmap = True + self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in KEYPOINT_SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[ + 'arch'], KEYPOINT_SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +def visualize(image_list, results, 
visual_thresh=0.6, save_dir='output'): + im_results = {} + for i, image_file in enumerate(image_list): + skeletons = results['keypoint'] + scores = results['score'] + skeleton = skeletons[i:i + 1] + score = scores[i:i + 1] + im_results['keypoint'] = [skeleton, score] + visualize_pose( + image_file, + im_results, + visual_thresh=visual_thresh, + save_dir=save_dir) + + +def main(): + detector = KeyPointDetector( + FLAGS.model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + threshold=FLAGS.threshold, + output_dir=FLAGS.output_dir, + use_dark=FLAGS.use_dark) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + else: + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mems = { + 'cpu_rss_mb': detector.cpu_mem / len(img_list), + 'gpu_rss_mb': detector.gpu_mem / len(img_list), + 'gpu_util': detector.gpu_util * 100 / len(img_list) + } + perf_info = detector.det_times.report(average=True) + model_dir = FLAGS.model_dir + mode = FLAGS.run_mode + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + data_info = { + 'batch_size': 1, + 'shape': "dynamic_shape", + 'data_num': perf_info['img_num'] + } + det_log = PaddleInferBenchmark(detector.config, model_info, + data_info, perf_info, mems) + det_log('KeyPoint') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device" + + main() diff --git a/PaddleDetection-release-2.6/deploy/python/keypoint_postprocess.py b/PaddleDetection-release-2.6/deploy/python/keypoint_postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..69f1d3fd9dcc83278d331cd361b36d50e64ef508 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/keypoint_postprocess.py @@ -0,0 +1,369 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
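For reference, keypoint_infer.py above can also be driven from Python instead of FLAGS; a minimal sketch, assuming an exported top-down model directory and a sample image (both paths below are hypothetical placeholders, not files shipped in this patch):

    from keypoint_infer import KeyPointDetector
    from infer import get_test_images

    detector = KeyPointDetector(
        model_dir='output_inference/tinypose_128x96',  # hypothetical export dir
        device='GPU',
        run_mode='paddle',
        batch_size=1,
        use_dark=True)  # DARK sub-pixel decoding, matching --use_dark above

    img_list = get_test_images(None, 'demo/hrnet_demo.jpg')  # hypothetical image
    results = detector.predict_image(img_list, run_benchmark=False, visual=True)
    # results carries the 'keypoint' ([N, num_joints, 3] of x, y, score) and
    # 'score' entries produced by the postprocess above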
+
+from scipy.optimize import linear_sum_assignment
+from collections import abc, defaultdict
+import cv2
+import numpy as np
+import math
+import paddle
+import paddle.nn as nn
+from keypoint_preprocess import get_affine_mat_kernel, get_affine_transform
+
+
+class HrHRNetPostProcess(object):
+    """
+    HrHRNet postprocess contains:
+        1) get topk keypoints in the output heatmap
+        2) sample the tagmap's value corresponding to each of the topk coordinates
+        3) match joints and group them into people with the Hungarian algorithm
+        4) adjust each coordinate by +-0.25 to decrease the error std
+        5) salvage missing joints by checking positivity of heatmap - tagdiff_norm
+    Args:
+        max_num_people (int): max number of people supported in postprocess
+        heat_thresh (float): topk values below this threshold will be ignored
+        tag_thresh (float): joints whose sampled tagmap values lie within this distance are grouped into the same person at init
+
+        inputs (list[heatmap]): the output list of the model, [heatmap, heatmap_maxpool, tagmap]; heatmap_maxpool is used to get topk
+        original_height, original_width (float): the original image size
+    """
+
+    def __init__(self, max_num_people=30, heat_thresh=0.2, tag_thresh=1.):
+        self.max_num_people = max_num_people
+        self.heat_thresh = heat_thresh
+        self.tag_thresh = tag_thresh
+
+    def lerp(self, j, y, x, heatmap):
+        H, W = heatmap.shape[-2:]
+        left = np.clip(x - 1, 0, W - 1)
+        right = np.clip(x + 1, 0, W - 1)
+        up = np.clip(y - 1, 0, H - 1)
+        down = np.clip(y + 1, 0, H - 1)
+        offset_y = np.where(heatmap[j, down, x] > heatmap[j, up, x], 0.25,
+                            -0.25)
+        offset_x = np.where(heatmap[j, y, right] > heatmap[j, y, left], 0.25,
+                            -0.25)
+        return offset_y + 0.5, offset_x + 0.5
+
+    def __call__(self, heatmap, tagmap, heat_k, inds_k, original_height,
+                 original_width):
+
+        N, J, H, W = heatmap.shape
+        assert N == 1, "only support batch size 1"
+        heatmap = heatmap[0]
+        tagmap = tagmap[0]
+        heats = heat_k[0]
+        inds_np = inds_k[0]
+        y = inds_np // W
+        x = inds_np % W
+        tags = tagmap[np.arange(J)[None, :].repeat(self.max_num_people),
+                      y.flatten(), x.flatten()].reshape(J, -1,
+                                                        tagmap.shape[-1])
+        coords = np.stack((y, x), axis=2)
+        # threshold
+        mask = heats > self.heat_thresh
+        # cluster
+        cluster = defaultdict(lambda: {
+            'coords': np.zeros((J, 2), dtype=np.float32),
+            'scores': np.zeros(J, dtype=np.float32),
+            'tags': []
+        })
+        for jid, m in enumerate(mask):
+            num_valid = m.sum()
+            if num_valid == 0:
+                continue
+            valid_inds = np.where(m)[0]
+            valid_tags = tags[jid, m, :]
+            if len(cluster) == 0:  # initialize
+                for i in valid_inds:
+                    tag = tags[jid, i]
+                    key = tag[0]
+                    cluster[key]['tags'].append(tag)
+                    cluster[key]['scores'][jid] = heats[jid, i]
+                    cluster[key]['coords'][jid] = coords[jid, i]
+                continue
+            candidates = list(cluster.keys())[:self.max_num_people]
+            centroids = [
+                np.mean(
+                    cluster[k]['tags'], axis=0) for k in candidates
+            ]
+            num_clusters = len(centroids)
+            # shape is (num_valid, num_clusters, tag_dim)
+            dist = valid_tags[:, None, :] - np.array(centroids)[None, ...]
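+            # the L2 norm below collapses dist into a (num_valid, num_clusters)
+            # cost matrix that the Hungarian assignment (linear_sum_assignment) consumes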
+ l2_dist = np.linalg.norm(dist, ord=2, axis=2) + # modulate dist with heat value, see `use_detection_val` + cost = np.round(l2_dist) * 100 - heats[jid, m, None] + # pad the cost matrix, otherwise new pose are ignored + if num_valid > num_clusters: + cost = np.pad(cost, ((0, 0), (0, num_valid - num_clusters)), + 'constant', + constant_values=((0, 0), (0, 1e-10))) + rows, cols = linear_sum_assignment(cost) + for y, x in zip(rows, cols): + tag = tags[jid, y] + if y < num_valid and x < num_clusters and \ + l2_dist[y, x] < self.tag_thresh: + key = candidates[x] # merge to cluster + else: + key = tag[0] # initialize new cluster + cluster[key]['tags'].append(tag) + cluster[key]['scores'][jid] = heats[jid, y] + cluster[key]['coords'][jid] = coords[jid, y] + + # shape is [k, J, 2] and [k, J] + pose_tags = np.array([cluster[k]['tags'] for k in cluster]) + pose_coords = np.array([cluster[k]['coords'] for k in cluster]) + pose_scores = np.array([cluster[k]['scores'] for k in cluster]) + valid = pose_scores > 0 + + pose_kpts = np.zeros((pose_scores.shape[0], J, 3), dtype=np.float32) + if valid.sum() == 0: + return pose_kpts, pose_kpts + + # refine coords + valid_coords = pose_coords[valid].astype(np.int32) + y = valid_coords[..., 0].flatten() + x = valid_coords[..., 1].flatten() + _, j = np.nonzero(valid) + offsets = self.lerp(j, y, x, heatmap) + pose_coords[valid, 0] += offsets[0] + pose_coords[valid, 1] += offsets[1] + + # mean score before salvage + mean_score = pose_scores.mean(axis=1) + pose_kpts[valid, 2] = pose_scores[valid] + + # salvage missing joints + if True: + for pid, coords in enumerate(pose_coords): + tag_mean = np.array(pose_tags[pid]).mean(axis=0) + norm = np.sum((tagmap - tag_mean)**2, axis=3)**0.5 + score = heatmap - np.round(norm) # (J, H, W) + flat_score = score.reshape(J, -1) + max_inds = np.argmax(flat_score, axis=1) + max_scores = np.max(flat_score, axis=1) + salvage_joints = (pose_scores[pid] == 0) & (max_scores > 0) + if salvage_joints.sum() == 0: + continue + y = max_inds[salvage_joints] // W + x = max_inds[salvage_joints] % W + offsets = self.lerp(salvage_joints.nonzero()[0], y, x, heatmap) + y = y.astype(np.float32) + offsets[0] + x = x.astype(np.float32) + offsets[1] + pose_coords[pid][salvage_joints, 0] = y + pose_coords[pid][salvage_joints, 1] = x + pose_kpts[pid][salvage_joints, 2] = max_scores[salvage_joints] + pose_kpts[..., :2] = transpred(pose_coords[..., :2][..., ::-1], + original_height, original_width, + min(H, W)) + return pose_kpts, mean_score + + +def transpred(kpts, h, w, s): + trans, _ = get_affine_mat_kernel(h, w, s, inv=True) + + return warp_affine_joints(kpts[..., :2].copy(), trans) + + +def warp_affine_joints(joints, mat): + """Apply affine transformation defined by the transform matrix on the + joints. + + Args: + joints (np.ndarray[..., 2]): Origin coordinate of joints. + mat (np.ndarray[3, 2]): The affine matrix. + + Returns: + matrix (np.ndarray[..., 2]): Result coordinate of joints. 
+ """ + joints = np.array(joints) + shape = joints.shape + joints = joints.reshape(-1, 2) + return np.dot(np.concatenate( + (joints, joints[:, 0:1] * 0 + 1), axis=1), + mat.T).reshape(shape) + + +class HRNetPostProcess(object): + def __init__(self, use_dark=True): + self.use_dark = use_dark + + def flip_back(self, output_flipped, matched_parts): + assert output_flipped.ndim == 4,\ + 'output_flipped should be [batch_size, num_joints, height, width]' + + output_flipped = output_flipped[:, :, :, ::-1] + + for pair in matched_parts: + tmp = output_flipped[:, pair[0], :, :].copy() + output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] + output_flipped[:, pair[1], :, :] = tmp + + return output_flipped + + def get_max_preds(self, heatmaps): + """get predictions from score maps + + Args: + heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) + + Returns: + preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords + maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints + """ + assert isinstance(heatmaps, + np.ndarray), 'heatmaps should be numpy.ndarray' + assert heatmaps.ndim == 4, 'batch_images should be 4-ndim' + + batch_size = heatmaps.shape[0] + num_joints = heatmaps.shape[1] + width = heatmaps.shape[3] + heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1)) + idx = np.argmax(heatmaps_reshaped, 2) + maxvals = np.amax(heatmaps_reshaped, 2) + + maxvals = maxvals.reshape((batch_size, num_joints, 1)) + idx = idx.reshape((batch_size, num_joints, 1)) + + preds = np.tile(idx, (1, 1, 2)).astype(np.float32) + + preds[:, :, 0] = (preds[:, :, 0]) % width + preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) + + pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) + pred_mask = pred_mask.astype(np.float32) + + preds *= pred_mask + + return preds, maxvals + + def gaussian_blur(self, heatmap, kernel): + border = (kernel - 1) // 2 + batch_size = heatmap.shape[0] + num_joints = heatmap.shape[1] + height = heatmap.shape[2] + width = heatmap.shape[3] + for i in range(batch_size): + for j in range(num_joints): + origin_max = np.max(heatmap[i, j]) + dr = np.zeros((height + 2 * border, width + 2 * border)) + dr[border:-border, border:-border] = heatmap[i, j].copy() + dr = cv2.GaussianBlur(dr, (kernel, kernel), 0) + heatmap[i, j] = dr[border:-border, border:-border].copy() + heatmap[i, j] *= origin_max / np.max(heatmap[i, j]) + return heatmap + + def dark_parse(self, hm, coord): + heatmap_height = hm.shape[0] + heatmap_width = hm.shape[1] + px = int(coord[0]) + py = int(coord[1]) + if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2: + dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1]) + dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px]) + dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2]) + dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \ + + hm[py-1][px-1]) + dyy = 0.25 * ( + hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px]) + derivative = np.matrix([[dx], [dy]]) + hessian = np.matrix([[dxx, dxy], [dxy, dyy]]) + if dxx * dyy - dxy**2 != 0: + hessianinv = hessian.I + offset = -hessianinv * derivative + offset = np.squeeze(np.array(offset.T), axis=0) + coord += offset + return coord + + def dark_postprocess(self, hm, coords, kernelsize): + """ + refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py + + """ + hm = self.gaussian_blur(hm, kernelsize) + hm = np.maximum(hm, 1e-10) + hm = np.log(hm) + for n in range(coords.shape[0]): + for p in range(coords.shape[1]): + coords[n, 
p] = self.dark_parse(hm[n][p], coords[n][p]) + return coords + + def get_final_preds(self, heatmaps, center, scale, kernelsize=3): + """the highest heatvalue location with a quarter offset in the + direction from the highest response to the second highest response. + + Args: + heatmaps (numpy.ndarray): The predicted heatmaps + center (numpy.ndarray): The boxes center + scale (numpy.ndarray): The scale factor + + Returns: + preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords + maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints + """ + + coords, maxvals = self.get_max_preds(heatmaps) + + heatmap_height = heatmaps.shape[2] + heatmap_width = heatmaps.shape[3] + + if self.use_dark: + coords = self.dark_postprocess(heatmaps, coords, kernelsize) + else: + for n in range(coords.shape[0]): + for p in range(coords.shape[1]): + hm = heatmaps[n][p] + px = int(math.floor(coords[n][p][0] + 0.5)) + py = int(math.floor(coords[n][p][1] + 0.5)) + if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1: + diff = np.array([ + hm[py][px + 1] - hm[py][px - 1], + hm[py + 1][px] - hm[py - 1][px] + ]) + coords[n][p] += np.sign(diff) * .25 + preds = coords.copy() + + # Transform back + for i in range(coords.shape[0]): + preds[i] = transform_preds(coords[i], center[i], scale[i], + [heatmap_width, heatmap_height]) + + return preds, maxvals + + def __call__(self, output, center, scale): + preds, maxvals = self.get_final_preds(output, center, scale) + return np.concatenate( + (preds, maxvals), axis=-1), np.mean( + maxvals, axis=1) + + +def transform_preds(coords, center, scale, output_size): + target_coords = np.zeros(coords.shape) + trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1) + for p in range(coords.shape[0]): + target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) + return target_coords + + +def affine_transform(pt, t): + new_pt = np.array([pt[0], pt[1], 1.]).T + new_pt = np.dot(t, new_pt) + return new_pt[:2] + + +def translate_to_ori_images(keypoint_result, batch_records): + kpts = keypoint_result['keypoint'] + scores = keypoint_result['score'] + kpts[..., 0] += batch_records[:, 0:1] + kpts[..., 1] += batch_records[:, 1:2] + return kpts, scores diff --git a/PaddleDetection-release-2.6/deploy/python/keypoint_preprocess.py b/PaddleDetection-release-2.6/deploy/python/keypoint_preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..b4e50e887a2b4bdecaf6670d0385dd1d7f889824 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/keypoint_preprocess.py @@ -0,0 +1,243 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
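To sanity-check the top-down decoding path above, HRNetPostProcess can be exercised on synthetic heatmaps. A minimal smoke test, assuming the deploy dependencies (numpy, cv2, scipy, paddle) are installed; the shapes follow the docstrings above and the data is random, not from a real model:

    import numpy as np
    from keypoint_postprocess import HRNetPostProcess

    # batch of 1 instance, 17 COCO joints, 64x48 heatmaps (H, W)
    heatmaps = np.random.rand(1, 17, 64, 48).astype(np.float32)
    center = np.array([[96., 128.]])  # bbox center (x, y) in the source image
    scale = np.array([[0.96, 1.28]])  # bbox size / 200, as used by transform_preds

    post = HRNetPostProcess(use_dark=True)
    kpts, mean_score = post(heatmaps, center, scale)
    print(kpts.shape)        # (1, 17, 3): x, y, confidence per joint
    print(mean_score.shape)  # (1, 1): mean confidence per instance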
+""" +this code is based on https://github.com/open-mmlab/mmpose/mmpose/core/post_processing/post_transforms.py +""" +import cv2 +import numpy as np + + +class EvalAffine(object): + def __init__(self, size, stride=64): + super(EvalAffine, self).__init__() + self.size = size + self.stride = stride + + def __call__(self, image, im_info): + s = self.size + h, w, _ = image.shape + trans, size_resized = get_affine_mat_kernel(h, w, s, inv=False) + image_resized = cv2.warpAffine(image, trans, size_resized) + return image_resized, im_info + + +def get_affine_mat_kernel(h, w, s, inv=False): + if w < h: + w_ = s + h_ = int(np.ceil((s / w * h) / 64.) * 64) + scale_w = w + scale_h = h_ / w_ * w + + else: + h_ = s + w_ = int(np.ceil((s / h * w) / 64.) * 64) + scale_h = h + scale_w = w_ / h_ * h + + center = np.array([np.round(w / 2.), np.round(h / 2.)]) + + size_resized = (w_, h_) + trans = get_affine_transform( + center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv) + + return trans, size_resized + + +def get_affine_transform(center, + input_size, + rot, + output_size, + shift=(0., 0.), + inv=False): + """Get the affine transform matrix, given the center/scale/rot/output_size. + + Args: + center (np.ndarray[2, ]): Center of the bounding box (x, y). + scale (np.ndarray[2, ]): Scale of the bounding box + wrt [width, height]. + rot (float): Rotation angle (degree). + output_size (np.ndarray[2, ]): Size of the destination heatmaps. + shift (0-100%): Shift translation ratio wrt the width/height. + Default (0., 0.). + inv (bool): Option to inverse the affine transform direction. + (inv=False: src->dst or inv=True: dst->src) + + Returns: + np.ndarray: The transform matrix. + """ + assert len(center) == 2 + assert len(output_size) == 2 + assert len(shift) == 2 + if not isinstance(input_size, (np.ndarray, list)): + input_size = np.array([input_size, input_size], dtype=np.float32) + scale_tmp = input_size + + shift = np.array(shift) + src_w = scale_tmp[0] + dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = rotate_point([0., src_w * -0.5], rot_rad) + dst_dir = np.array([0., dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + src[2, :] = _get_3rd_point(src[0, :], src[1, :]) + + dst = np.zeros((3, 2), dtype=np.float32) + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +def get_warp_matrix(theta, size_input, size_dst, size_target): + """This code is based on + https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py + + Calculate the transformation matrix under the constraint of unbiased. + Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased + Data Processing for Human Pose Estimation (CVPR 2020). + + Args: + theta (float): Rotation angle in degrees. + size_input (np.ndarray): Size of input image [w, h]. + size_dst (np.ndarray): Size of output image [w, h]. + size_target (np.ndarray): Size of ROI in input plane [w, h]. + + Returns: + matrix (np.ndarray): A matrix for transformation. 
+ """ + theta = np.deg2rad(theta) + matrix = np.zeros((2, 3), dtype=np.float32) + scale_x = size_dst[0] / size_target[0] + scale_y = size_dst[1] / size_target[1] + matrix[0, 0] = np.cos(theta) * scale_x + matrix[0, 1] = -np.sin(theta) * scale_x + matrix[0, 2] = scale_x * ( + -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] * + np.sin(theta) + 0.5 * size_target[0]) + matrix[1, 0] = np.sin(theta) * scale_y + matrix[1, 1] = np.cos(theta) * scale_y + matrix[1, 2] = scale_y * ( + -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] * + np.cos(theta) + 0.5 * size_target[1]) + return matrix + + +def rotate_point(pt, angle_rad): + """Rotate a point by an angle. + + Args: + pt (list[float]): 2 dimensional point to be rotated + angle_rad (float): rotation angle by radian + + Returns: + list[float]: Rotated point. + """ + assert len(pt) == 2 + sn, cs = np.sin(angle_rad), np.cos(angle_rad) + new_x = pt[0] * cs - pt[1] * sn + new_y = pt[0] * sn + pt[1] * cs + rotated_pt = [new_x, new_y] + + return rotated_pt + + +def _get_3rd_point(a, b): + """To calculate the affine matrix, three pairs of points are required. This + function is used to get the 3rd point, given 2D points a & b. + + The 3rd point is defined by rotating vector `a - b` by 90 degrees + anticlockwise, using b as the rotation center. + + Args: + a (np.ndarray): point(x,y) + b (np.ndarray): point(x,y) + + Returns: + np.ndarray: The 3rd point. + """ + assert len(a) == 2 + assert len(b) == 2 + direction = a - b + third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32) + + return third_pt + + +class TopDownEvalAffine(object): + """apply affine transform to image and coords + + Args: + trainsize (list): [w, h], the standard size used to train + use_udp (bool): whether to use Unbiased Data Processing. + records(dict): the dict contained the image and coords + + Returns: + records (dict): contain the image and coords after tranformed + + """ + + def __init__(self, trainsize, use_udp=False): + self.trainsize = trainsize + self.use_udp = use_udp + + def __call__(self, image, im_info): + rot = 0 + imshape = im_info['im_shape'][::-1] + center = im_info['center'] if 'center' in im_info else imshape / 2. + scale = im_info['scale'] if 'scale' in im_info else imshape + if self.use_udp: + trans = get_warp_matrix( + rot, center * 2.0, + [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + else: + trans = get_affine_transform(center, scale, rot, self.trainsize) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + + return image, im_info + + +def expand_crop(images, rect, expand_ratio=0.3): + imgh, imgw, c = images.shape + label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()] + if label != 0: + return None, None, None + org_rect = [xmin, ymin, xmax, ymax] + h_half = (ymax - ymin) * (1 + expand_ratio) / 2. + w_half = (xmax - xmin) * (1 + expand_ratio) / 2. + if h_half > w_half * 4 / 3: + w_half = h_half * 0.75 + center = [(ymin + ymax) / 2., (xmin + xmax) / 2.] 
+ ymin = max(0, int(center[0] - h_half)) + ymax = min(imgh - 1, int(center[0] + h_half)) + xmin = max(0, int(center[1] - w_half)) + xmax = min(imgw - 1, int(center[1] + w_half)) + return images[ymin:ymax, xmin:xmax, :], [xmin, ymin, xmax, ymax], org_rect diff --git a/PaddleDetection-release-2.6/deploy/python/mot_centertrack_infer.py b/PaddleDetection-release-2.6/deploy/python/mot_centertrack_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..c04a96876ae3d3b785366029e119be3d943f92fa --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/mot_centertrack_infer.py @@ -0,0 +1,505 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import copy +import math +import time +import yaml +import cv2 +import numpy as np +from collections import defaultdict +import paddle + +from benchmark_utils import PaddleInferBenchmark +from utils import gaussian_radius, gaussian2D, draw_umich_gaussian +from preprocess import preprocess, decode_image, WarpAffine, NormalizeImage, Permute +from utils import argsparser, Timer, get_current_memory_mb +from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig +from keypoint_preprocess import get_affine_transform + +# add python path +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from pptracking.python.mot import CenterTracker +from pptracking.python.mot.utils import MOTTimer, write_mot_results +from pptracking.python.mot.visualize import plot_tracking + + +def transform_preds_with_trans(coords, trans): + target_coords = np.ones((coords.shape[0], 3), np.float32) + target_coords[:, :2] = coords + target_coords = np.dot(trans, target_coords.transpose()).transpose() + return target_coords[:, :2] + + +def affine_transform(pt, t): + new_pt = np.array([pt[0], pt[1], 1.]).T + new_pt = np.dot(t, new_pt) + return new_pt[:2] + + +def affine_transform_bbox(bbox, trans, width, height): + bbox = np.array(copy.deepcopy(bbox), dtype=np.float32) + bbox[:2] = affine_transform(bbox[:2], trans) + bbox[2:] = affine_transform(bbox[2:], trans) + bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, width - 1) + bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, height - 1) + return bbox + + +class CenterTrack(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + output_dir (string): The path 
of output, default as 'output' + threshold (float): Score threshold of the detected bbox, default as 0.5 + save_images (bool): Whether to save visualization image results, default as False + save_mot_txts (bool): Whether to save tracking results (txt), default as False + """ + + def __init__( + self, + model_dir, + tracker_config=None, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=960, + trt_opt_shape=544, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + save_images=False, + save_mot_txts=False, ): + super(CenterTrack, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, ) + self.save_images = save_images + self.save_mot_txts = save_mot_txts + assert batch_size == 1, "MOT model only supports batch_size=1." + self.det_times = Timer(with_tracker=True) + self.num_classes = len(self.pred_config.labels) + + # tracker config + cfg = self.pred_config.tracker + min_box_area = cfg.get('min_box_area', -1) + vertical_ratio = cfg.get('vertical_ratio', -1) + track_thresh = cfg.get('track_thresh', 0.4) + pre_thresh = cfg.get('pre_thresh', 0.5) + + self.tracker = CenterTracker( + num_classes=self.num_classes, + min_box_area=min_box_area, + vertical_ratio=vertical_ratio, + track_thresh=track_thresh, + pre_thresh=pre_thresh) + + self.pre_image = None + + def get_additional_inputs(self, dets, meta, with_hm=True): + # Render input heatmap from previous trackings. + trans_input = meta['trans_input'] + inp_width, inp_height = int(meta['inp_width']), int(meta['inp_height']) + input_hm = np.zeros((1, inp_height, inp_width), dtype=np.float32) + + for det in dets: + if det['score'] < self.tracker.pre_thresh: + continue + bbox = affine_transform_bbox(det['bbox'], trans_input, inp_width, + inp_height) + h, w = bbox[3] - bbox[1], bbox[2] - bbox[0] + if (h > 0 and w > 0): + radius = gaussian_radius( + (math.ceil(h), math.ceil(w)), min_overlap=0.7) + radius = max(0, int(radius)) + ct = np.array( + [(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], + dtype=np.float32) + ct_int = ct.astype(np.int32) + if with_hm: + input_hm[0] = draw_umich_gaussian(input_hm[0], ct_int, + radius) + if with_hm: + input_hm = input_hm[np.newaxis] + return input_hm + + def preprocess(self, image_list): + preprocess_ops = [] + for op_info in self.pred_config.preprocess_infos: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + preprocess_ops.append(eval(op_type)(**new_op_info)) + + assert len(image_list) == 1, 'MOT only support bs=1' + im_path = image_list[0] + im, im_info = preprocess(im_path, preprocess_ops) + #inputs = create_inputs(im, im_info) + inputs = {} + inputs['image'] = np.array((im, )).astype('float32') + inputs['im_shape'] = np.array( + (im_info['im_shape'], )).astype('float32') + inputs['scale_factor'] = np.array( + (im_info['scale_factor'], )).astype('float32') + + inputs['trans_input'] = im_info['trans_input'] + inputs['inp_width'] = im_info['inp_width'] + inputs['inp_height'] = im_info['inp_height'] + inputs['center'] = im_info['center'] + inputs['scale'] = im_info['scale'] + inputs['out_height'] = im_info['out_height'] + inputs['out_width'] = im_info['out_width'] + + if self.pre_image is None: + self.pre_image = inputs['image'] + # 
initializing tracker for the first frame + self.tracker.init_track([]) + inputs['pre_image'] = self.pre_image + self.pre_image = inputs['image'] # Note: update for next image + + # render input heatmap from tracker status + pre_hm = self.get_additional_inputs( + self.tracker.tracks, inputs, with_hm=True) + inputs['pre_hm'] = pre_hm #.to_tensor(pre_hm) + + input_names = self.predictor.get_input_names() + for i in range(len(input_names)): + input_tensor = self.predictor.get_input_handle(input_names[i]) + if input_names[i] == 'x': + input_tensor.copy_from_cpu(inputs['image']) + else: + input_tensor.copy_from_cpu(inputs[input_names[i]]) + + return inputs + + def postprocess(self, inputs, result): + # postprocess output of predictor + np_bboxes = result['bboxes'] + if np_bboxes.shape[0] <= 0: + print('[WARNNING] No object detected and tracked.') + result = {'bboxes': np.zeros([0, 6]), 'cts': None, 'tracking': None} + return result + result = {k: v for k, v in result.items() if v is not None} + return result + + def centertrack_post_process(self, dets, meta, out_thresh): + if not ('bboxes' in dets): + return [{}] + + preds = [] + c, s = meta['center'], meta['scale'] + h, w = meta['out_height'], meta['out_width'] + trans = get_affine_transform( + center=c, + input_size=s, + rot=0, + output_size=[w, h], + shift=(0., 0.), + inv=True).astype(np.float32) + for i, dets_bbox in enumerate(dets['bboxes']): + if dets_bbox[1] < out_thresh: + break + item = {} + item['score'] = dets_bbox[1] + item['class'] = int(dets_bbox[0]) + 1 + item['ct'] = transform_preds_with_trans( + dets['cts'][i].reshape([1, 2]), trans).reshape(2) + + if 'tracking' in dets: + tracking = transform_preds_with_trans( + (dets['tracking'][i] + dets['cts'][i]).reshape([1, 2]), + trans).reshape(2) + item['tracking'] = tracking - item['ct'] + + if 'bboxes' in dets: + bbox = transform_preds_with_trans( + dets_bbox[2:6].reshape([2, 2]), trans).reshape(4) + item['bbox'] = bbox + + preds.append(item) + return preds + + def tracking(self, inputs, det_results): + result = self.centertrack_post_process( + det_results, inputs, self.tracker.out_thresh) + online_targets = self.tracker.update(result) + + online_tlwhs, online_scores, online_ids = [], [], [] + for t in online_targets: + bbox = t['bbox'] + tlwh = [bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]] + tscore = float(t['score']) + tid = int(t['tracking_id']) + if tlwh[2] * tlwh[3] > 0: + online_tlwhs.append(tlwh) + online_ids.append(tid) + online_scores.append(tscore) + return online_tlwhs, online_scores, online_ids + + def predict(self, repeats=1): + ''' + Args: + repeats (int): repeats number for prediction + Returns: + result (dict): include 'bboxes', 'cts' and 'tracking': + np.ndarray: shape:[N,6],[N,2] and [N,2], N: number of box + ''' + # model prediction + np_bboxes, np_cts, np_tracking = None, None, None + for i in range(repeats): + self.predictor.run() + output_names = self.predictor.get_output_names() + bboxes_tensor = self.predictor.get_output_handle(output_names[0]) + np_bboxes = bboxes_tensor.copy_to_cpu() + cts_tensor = self.predictor.get_output_handle(output_names[1]) + np_cts = cts_tensor.copy_to_cpu() + tracking_tensor = self.predictor.get_output_handle(output_names[2]) + np_tracking = tracking_tensor.copy_to_cpu() + + result = dict( + bboxes=np_bboxes, + cts=np_cts, + tracking=np_tracking) + return result + + def predict_image(self, + image_list, + run_benchmark=False, + repeats=1, + visual=True, + seq_name=None): + mot_results = [] + num_classes = self.num_classes + 
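+        # frames must be fed in order: pre_image and the tracker state carry
+        # over between calls, so the image list is sorted first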
image_list.sort() + ids2names = self.pred_config.labels + data_type = 'mcmot' if num_classes > 1 else 'mot' + for frame_id, img_file in enumerate(image_list): + batch_image_list = [img_file] # bs=1 in MOT model + if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result_warmup = self.predict(repeats=repeats) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + det_result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + + # tracking + result_warmup = self.tracking(inputs, det_result) + self.det_times.tracking_time_s.start() + online_tlwhs, online_scores, online_ids = self.tracking(inputs, + det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + + else: + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + self.det_times.postprocess_time_s.start() + det_result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + + # tracking process + self.det_times.tracking_time_s.start() + online_tlwhs, online_scores, online_ids = self.tracking(inputs, + det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + if visual: + if len(image_list) > 1 and frame_id % 10 == 0: + print('Tracking frame {}'.format(frame_id)) + frame, _ = decode_image(img_file, {}) + + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + if seq_name is None: + seq_name = image_list[0].split('/')[-2] + save_dir = os.path.join(self.output_dir, seq_name) + if not os.path.exists(save_dir): + os.makedirs(save_dir) + cv2.imwrite( + os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im) + + mot_results.append([online_tlwhs, online_scores, online_ids]) + return mot_results + + def predict_video(self, video_file, camera_id): + video_out_name = 'mot_output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_out_name = os.path.split(video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name) + video_format = 'mp4v' + fourcc = cv2.VideoWriter_fourcc(*video_format) + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + + frame_id = 1 + timer = MOTTimer() + results = defaultdict(list) # centertrack onpy support single class + num_classes = self.num_classes + data_type = 'mcmot' if num_classes > 1 else 'mot' + ids2names = self.pred_config.labels + while (1): + ret, frame = capture.read() + if not 
ret: + break + if frame_id % 10 == 0: + print('Tracking frame: %d' % (frame_id)) + frame_id += 1 + + timer.tic() + seq_name = video_out_name.split('.')[0] + mot_results = self.predict_image( + [frame[:, :, ::-1]], visual=False, seq_name=seq_name) + timer.toc() + + fps = 1. / timer.duration + online_tlwhs, online_scores, online_ids = mot_results[0] + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names) + + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + + if self.save_mot_txts: + result_filename = os.path.join( + self.output_dir, video_out_name.split('.')[-2] + '.txt') + + write_mot_results(result_filename, results, data_type, num_classes) + + writer.release() + + +def main(): + detector = CenterTrack( + FLAGS.model_dir, + tracker_config=None, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=1, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + output_dir=FLAGS.output_dir, + threshold=FLAGS.threshold, + save_images=FLAGS.save_images, + save_mot_txts=FLAGS.save_mot_txts) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + else: + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + model_dir = FLAGS.model_dir + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, model_info, name='MOT') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/python/mot_jde_infer.py b/PaddleDetection-release-2.6/deploy/python/mot_jde_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..51a2562ee554a3eeb2489821d376486dcba0985c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/mot_jde_infer.py @@ -0,0 +1,381 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
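The CenterTrack deployment above is normally launched via the argsparser FLAGS in main(); a minimal programmatic sketch, assuming a hypothetical exported model directory and test video (neither ships with this patch):

    from mot_centertrack_infer import CenterTrack

    tracker = CenterTrack(
        'output_inference/centertrack_dla34_70e_mot17half',  # hypothetical export dir
        device='GPU',
        threshold=0.4,
        save_mot_txts=True)  # also write MOT-format txt results
    tracker.predict_video('demo/test_video.mp4', camera_id=-1)  # hypothetical video
    # annotated frames are written under ./output; pass camera_id >= 0 to read
    # from a webcam stream instead of a file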
+ +import os +import time +import yaml +import cv2 +import numpy as np +from collections import defaultdict +import paddle + +from benchmark_utils import PaddleInferBenchmark +from preprocess import decode_image +from utils import argsparser, Timer, get_current_memory_mb +from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig + +# add python path +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from pptracking.python.mot import JDETracker +from pptracking.python.mot.utils import MOTTimer, write_mot_results +from pptracking.python.mot.visualize import plot_tracking_dict + +# Global dictionary +MOT_JDE_SUPPORT_MODELS = { + 'JDE', + 'FairMOT', +} + + +class JDE_Detector(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + output_dir (string): The path of output, default as 'output' + threshold (float): Score threshold of the detected bbox, default as 0.5 + save_images (bool): Whether to save visualization image results, default as False + save_mot_txts (bool): Whether to save tracking results (txt), default as False + """ + + def __init__( + self, + model_dir, + tracker_config=None, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1088, + trt_opt_shape=608, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + save_images=False, + save_mot_txts=False, ): + super(JDE_Detector, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, ) + self.save_images = save_images + self.save_mot_txts = save_mot_txts + assert batch_size == 1, "MOT model only supports batch_size=1." + self.det_times = Timer(with_tracker=True) + self.num_classes = len(self.pred_config.labels) + + # tracker config + assert self.pred_config.tracker, "The exported JDE Detector model should have tracker." 
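+        # tracker hyper-parameters come from the exported infer_cfg.yml;
+        # the .get() defaults below apply only when a key is missing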
+        cfg = self.pred_config.tracker
+        min_box_area = cfg.get('min_box_area', 0.0)
+        vertical_ratio = cfg.get('vertical_ratio', 0.0)
+        conf_thres = cfg.get('conf_thres', 0.0)
+        tracked_thresh = cfg.get('tracked_thresh', 0.7)
+        metric_type = cfg.get('metric_type', 'euclidean')
+
+        self.tracker = JDETracker(
+            num_classes=self.num_classes,
+            min_box_area=min_box_area,
+            vertical_ratio=vertical_ratio,
+            conf_thres=conf_thres,
+            tracked_thresh=tracked_thresh,
+            metric_type=metric_type)
+
+    def postprocess(self, inputs, result):
+        # postprocess output of predictor
+        np_boxes = result['pred_dets']
+        if np_boxes.shape[0] <= 0:
+            print('[WARNING] No object detected.')
+            result = {'pred_dets': np.zeros([0, 6]), 'pred_embs': None}
+        result = {k: v for k, v in result.items() if v is not None}
+        return result
+
+    def tracking(self, det_results):
+        pred_dets = det_results['pred_dets']  # cls_id, score, x0, y0, x1, y1
+        pred_embs = det_results['pred_embs']
+        online_targets_dict = self.tracker.update(pred_dets, pred_embs)
+
+        online_tlwhs = defaultdict(list)
+        online_scores = defaultdict(list)
+        online_ids = defaultdict(list)
+        for cls_id in range(self.num_classes):
+            online_targets = online_targets_dict[cls_id]
+            for t in online_targets:
+                tlwh = t.tlwh
+                tid = t.track_id
+                tscore = t.score
+                if tlwh[2] * tlwh[3] <= self.tracker.min_box_area:
+                    continue
+                if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
+                        3] > self.tracker.vertical_ratio:
+                    continue
+                online_tlwhs[cls_id].append(tlwh)
+                online_ids[cls_id].append(tid)
+                online_scores[cls_id].append(tscore)
+        return online_tlwhs, online_scores, online_ids
+
+    def predict(self, repeats=1):
+        '''
+        Args:
+            repeats (int): repeat number for prediction
+        Returns:
+            result (dict): includes 'pred_dets': np.ndarray: shape:[N,6], N: number of boxes,
+                            matrix element: [class, score, x_min, y_min, x_max, y_max]
+                            FairMOT(JDE)'s result also includes 'pred_embs': np.ndarray:
+                            shape: [N, 128]
+        '''
+        # model prediction
+        np_pred_dets, np_pred_embs = None, None
+        for i in range(repeats):
+            self.predictor.run()
+            output_names = self.predictor.get_output_names()
+            boxes_tensor = self.predictor.get_output_handle(output_names[0])
+            np_pred_dets = boxes_tensor.copy_to_cpu()
+            embs_tensor = self.predictor.get_output_handle(output_names[1])
+            np_pred_embs = embs_tensor.copy_to_cpu()
+
+        result = dict(pred_dets=np_pred_dets, pred_embs=np_pred_embs)
+        return result
+
+    def predict_image(self,
+                      image_list,
+                      run_benchmark=False,
+                      repeats=1,
+                      visual=True,
+                      seq_name=None):
+        mot_results = []
+        num_classes = self.num_classes
+        image_list.sort()
+        ids2names = self.pred_config.labels
+        data_type = 'mcmot' if num_classes > 1 else 'mot'
+        for frame_id, img_file in enumerate(image_list):
+            batch_image_list = [img_file]  # bs=1 in MOT model
+            if run_benchmark:
+                # preprocess
+                inputs = self.preprocess(batch_image_list)  # warmup
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                result_warmup = self.predict(repeats=repeats)  # warmup
+                self.det_times.inference_time_s.start()
+                result = self.predict(repeats=repeats)
+                self.det_times.inference_time_s.end(repeats=repeats)
+
+                # postprocess
+                result_warmup = self.postprocess(inputs, result)  # warmup
+                self.det_times.postprocess_time_s.start()
+                det_result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+
+                # tracking
+                result_warmup = self.tracking(det_result)
+                self.det_times.tracking_time_s.start()
+                online_tlwhs,
online_scores, online_ids = self.tracking( + det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + + else: + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + self.det_times.postprocess_time_s.start() + det_result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + + # tracking process + self.det_times.tracking_time_s.start() + online_tlwhs, online_scores, online_ids = self.tracking( + det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + if visual: + if len(image_list) > 1 and frame_id % 10 == 0: + print('Tracking frame {}'.format(frame_id)) + frame, _ = decode_image(img_file, {}) + + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + if seq_name is None: + seq_name = image_list[0].split('/')[-2] + save_dir = os.path.join(self.output_dir, seq_name) + if not os.path.exists(save_dir): + os.makedirs(save_dir) + cv2.imwrite( + os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im) + + mot_results.append([online_tlwhs, online_scores, online_ids]) + return mot_results + + def predict_video(self, video_file, camera_id): + video_out_name = 'mot_output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_out_name = os.path.split(video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name) + video_format = 'mp4v' + fourcc = cv2.VideoWriter_fourcc(*video_format) + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + + frame_id = 1 + timer = MOTTimer() + results = defaultdict(list) # support single class and multi classes + num_classes = self.num_classes + data_type = 'mcmot' if num_classes > 1 else 'mot' + ids2names = self.pred_config.labels + while (1): + ret, frame = capture.read() + if not ret: + break + if frame_id % 10 == 0: + print('Tracking frame: %d' % (frame_id)) + frame_id += 1 + + timer.tic() + seq_name = video_out_name.split('.')[0] + mot_results = self.predict_image( + [frame[:, :, ::-1]], visual=False, seq_name=seq_name) + timer.toc() + + online_tlwhs, online_scores, online_ids = mot_results[0] + for cls_id in range(num_classes): + results[cls_id].append( + (frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id], + online_ids[cls_id])) + + fps = 1. 
/ timer.duration + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names) + + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + + if self.save_mot_txts: + result_filename = os.path.join( + self.output_dir, video_out_name.split('.')[-2] + '.txt') + + write_mot_results(result_filename, results, data_type, num_classes) + + writer.release() + + +def main(): + detector = JDE_Detector( + FLAGS.model_dir, + tracker_config=None, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=1, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + output_dir=FLAGS.output_dir, + threshold=FLAGS.threshold, + save_images=FLAGS.save_images, + save_mot_txts=FLAGS.save_mot_txts) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + else: + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + model_dir = FLAGS.model_dir + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, model_info, name='MOT') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/python/mot_keypoint_unite_infer.py b/PaddleDetection-release-2.6/deploy/python/mot_keypoint_unite_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..edf394152c28d682cdb5845050b78dbb27e8b22f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/mot_keypoint_unite_infer.py @@ -0,0 +1,301 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
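+# Added usage sketch (illustrative commentary, not part of the original
+# patch): this script couples a MOT detector with a top-down keypoint model.
+# Assuming exported models under ./fairmot_dla34/ and ./tinypose_256x192/
+# (hypothetical paths), a typical invocation would be:
+#
+#     python deploy/python/mot_keypoint_unite_infer.py \
+#         --mot_model_dir=./fairmot_dla34 \
+#         --keypoint_model_dir=./tinypose_256x192 \
+#         --video_file=test.mp4 --device=GPU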
+ +import os +import json +import cv2 +import math +import numpy as np +import paddle +import yaml +import copy +from collections import defaultdict + +from mot_keypoint_unite_utils import argsparser +from preprocess import decode_image +from infer import print_arguments, get_test_images, bench_log +from mot_sde_infer import SDE_Detector +from mot_jde_infer import JDE_Detector, MOT_JDE_SUPPORT_MODELS +from keypoint_infer import KeyPointDetector, KEYPOINT_SUPPORT_MODELS +from det_keypoint_unite_infer import predict_with_given_det +from visualize import visualize_pose +from benchmark_utils import PaddleInferBenchmark +from utils import get_current_memory_mb +from keypoint_postprocess import translate_to_ori_images + +# add python path +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict +from pptracking.python.mot.utils import MOTTimer as FPSTimer + + +def convert_mot_to_det(tlwhs, scores): + results = {} + num_mot = len(tlwhs) + xyxys = copy.deepcopy(tlwhs) + for xyxy in xyxys.copy(): + xyxy[2:] = xyxy[2:] + xyxy[:2] + # support single class now + results['boxes'] = np.vstack( + [np.hstack([0, scores[i], xyxys[i]]) for i in range(num_mot)]) + results['boxes_num'] = np.array([num_mot]) + return results + + +def mot_topdown_unite_predict(mot_detector, + topdown_keypoint_detector, + image_list, + keypoint_batch_size=1, + save_res=False): + det_timer = mot_detector.get_timer() + store_res = [] + image_list.sort() + num_classes = mot_detector.num_classes + for i, img_file in enumerate(image_list): + # Decode image in advance in mot + pose prediction + det_timer.preprocess_time_s.start() + image, _ = decode_image(img_file, {}) + det_timer.preprocess_time_s.end() + + if FLAGS.run_benchmark: + mot_results = mot_detector.predict_image( + [image], run_benchmark=True, repeats=10) + + cm, gm, gu = get_current_memory_mb() + mot_detector.cpu_mem += cm + mot_detector.gpu_mem += gm + mot_detector.gpu_util += gu + else: + mot_results = mot_detector.predict_image([image], visual=False) + + online_tlwhs, online_scores, online_ids = mot_results[ + 0] # only support bs=1 in MOT model + results = convert_mot_to_det( + online_tlwhs[0], + online_scores[0]) # only support single class for mot + pose + if results['boxes_num'] == 0: + continue + + keypoint_res = predict_with_given_det( + image, results, topdown_keypoint_detector, keypoint_batch_size, + FLAGS.run_benchmark) + + if save_res: + save_name = img_file if isinstance(img_file, str) else i + store_res.append([ + save_name, keypoint_res['bbox'], + [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]] + ]) + if FLAGS.run_benchmark: + cm, gm, gu = get_current_memory_mb() + topdown_keypoint_detector.cpu_mem += cm + topdown_keypoint_detector.gpu_mem += gm + topdown_keypoint_detector.gpu_util += gu + else: + if not os.path.exists(FLAGS.output_dir): + os.makedirs(FLAGS.output_dir) + visualize_pose( + img_file, + keypoint_res, + visual_thresh=FLAGS.keypoint_threshold, + save_dir=FLAGS.output_dir) + + if save_res: + """ + 1) store_res: a list of image_data + 2) image_data: [imageid, rects, [keypoints, scores]] + 3) rects: list of rect [xmin, ymin, xmax, ymax] + 4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list + 5) scores: mean of all joint conf + """ + with open("det_keypoint_unite_image_results.json", 'w') as wf: + json.dump(store_res, wf, indent=4) + + +def mot_topdown_unite_predict_video(mot_detector, 
+ topdown_keypoint_detector, + camera_id, + keypoint_batch_size=1, + save_res=False): + video_name = 'output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(FLAGS.video_file) + video_name = os.path.split(FLAGS.video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(FLAGS.output_dir): + os.makedirs(FLAGS.output_dir) + out_path = os.path.join(FLAGS.output_dir, video_name) + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + frame_id = 0 + timer_mot, timer_kp, timer_mot_kp = FPSTimer(), FPSTimer(), FPSTimer() + + num_classes = mot_detector.num_classes + assert num_classes == 1, 'Only one category mot model supported for uniting keypoint deploy.' + data_type = 'mot' + + while (1): + ret, frame = capture.read() + if not ret: + break + if frame_id % 10 == 0: + print('Tracking frame: %d' % (frame_id)) + frame_id += 1 + timer_mot_kp.tic() + + # mot model + timer_mot.tic() + + frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) + + mot_results = mot_detector.predict_image([frame2], visual=False) + timer_mot.toc() + online_tlwhs, online_scores, online_ids = mot_results[0] + results = convert_mot_to_det( + online_tlwhs[0], + online_scores[0]) # only support single class for mot + pose + if results['boxes_num'] == 0: + continue + + # keypoint model + timer_kp.tic() + keypoint_res = predict_with_given_det( + frame2, results, topdown_keypoint_detector, keypoint_batch_size, + FLAGS.run_benchmark) + timer_kp.toc() + timer_mot_kp.toc() + + kp_fps = 1. / timer_kp.duration + mot_kp_fps = 1. 
/ timer_mot_kp.duration + + im = visualize_pose( + frame, + keypoint_res, + visual_thresh=FLAGS.keypoint_threshold, + returnimg=True, + ids=online_ids[0]) + + im = plot_tracking_dict( + im, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=mot_kp_fps) + + writer.write(im) + if camera_id != -1: + cv2.imshow('Tracking and keypoint results', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + + writer.release() + print('output_video saved to: {}'.format(out_path)) + + +def main(): + deploy_file = os.path.join(FLAGS.mot_model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + arch = yml_conf['arch'] + mot_detector_func = 'SDE_Detector' + if arch in MOT_JDE_SUPPORT_MODELS: + mot_detector_func = 'JDE_Detector' + + mot_detector = eval(mot_detector_func)(FLAGS.mot_model_dir, + FLAGS.tracker_config, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=1, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + threshold=FLAGS.mot_threshold, + output_dir=FLAGS.output_dir) + + topdown_keypoint_detector = KeyPointDetector( + FLAGS.keypoint_model_dir, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.keypoint_batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + threshold=FLAGS.keypoint_threshold, + output_dir=FLAGS.output_dir, + use_dark=FLAGS.use_dark) + keypoint_arch = topdown_keypoint_detector.pred_config.arch + assert KEYPOINT_SUPPORT_MODELS[ + keypoint_arch] == 'keypoint_topdown', 'MOT-Keypoint unite inference only supports topdown models.' 
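+    # Added note (not in the original patch): the arch-based dispatch above
+    # resolves the detector class from a string via eval(); an eval-free
+    # equivalent would be an explicit mapping, e.g.:
+    #     detector_cls = {'JDE_Detector': JDE_Detector,
+    #                     'SDE_Detector': SDE_Detector}[mot_detector_func]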
+ + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + mot_topdown_unite_predict_video( + mot_detector, topdown_keypoint_detector, FLAGS.camera_id, + FLAGS.keypoint_batch_size, FLAGS.save_res) + else: + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + mot_topdown_unite_predict(mot_detector, topdown_keypoint_detector, + img_list, FLAGS.keypoint_batch_size, + FLAGS.save_res) + if not FLAGS.run_benchmark: + mot_detector.det_times.info(average=True) + topdown_keypoint_detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + mot_model_dir = FLAGS.mot_model_dir + mot_model_info = { + 'model_name': mot_model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(mot_detector, img_list, mot_model_info, name='MOT') + + keypoint_model_dir = FLAGS.keypoint_model_dir + keypoint_model_info = { + 'model_name': keypoint_model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(topdown_keypoint_detector, img_list, keypoint_model_info, + FLAGS.keypoint_batch_size, 'KeyPoint') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/python/mot_keypoint_unite_utils.py b/PaddleDetection-release-2.6/deploy/python/mot_keypoint_unite_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..246f46fe95dfa70ca00d3c33faa47ae0046d548a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/mot_keypoint_unite_utils.py @@ -0,0 +1,139 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import ast +import argparse + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--mot_model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + parser.add_argument( + "--keypoint_model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + parser.add_argument( + "--image_file", type=str, default=None, help="Path of image file.") + parser.add_argument( + "--image_dir", + type=str, + default=None, + help="Dir of image file, `image_file` has a higher priority.") + parser.add_argument( + "--keypoint_batch_size", + type=int, + default=1, + help=("batch_size for keypoint inference. In detection-keypoint unit" + "inference, the batch size in detection is 1. 
Then collate det " + "result in batch for keypoint inference.")) + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or `camera_id` has a highest priority." + ) + parser.add_argument( + "--camera_id", + type=int, + default=-1, + help="device id of camera to predict.") + parser.add_argument( + "--mot_threshold", type=float, default=0.5, help="Threshold of score.") + parser.add_argument( + "--keypoint_threshold", + type=float, + default=0.5, + help="Threshold of score.") + parser.add_argument( + "--output_dir", + type=str, + default="output", + help="Directory of output visualization files.") + parser.add_argument( + "--run_mode", + type=str, + default='paddle', + help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU." + ) + parser.add_argument( + "--run_benchmark", + type=ast.literal_eval, + default=False, + help="Whether to predict a image_file repeatedly for benchmark") + parser.add_argument( + "--enable_mkldnn", + type=ast.literal_eval, + default=False, + help="Whether use mkldnn with CPU.") + parser.add_argument( + "--cpu_threads", type=int, default=1, help="Num of threads with CPU.") + parser.add_argument( + "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.") + parser.add_argument( + "--trt_max_shape", + type=int, + default=1088, + help="max_shape for TensorRT.") + parser.add_argument( + "--trt_opt_shape", + type=int, + default=608, + help="opt_shape for TensorRT.") + parser.add_argument( + "--trt_calib_mode", + type=bool, + default=False, + help="If the model is produced by TRT offline quantitative " + "calibration, trt_calib_mode need to set True.") + parser.add_argument( + '--save_images', + action='store_true', + help='Save visualization image results.') + parser.add_argument( + '--save_mot_txts', + action='store_true', + help='Save tracking results (txt).') + parser.add_argument( + '--use_dark', + type=bool, + default=True, + help='whether to use darkpose to get better keypoint position predict ') + parser.add_argument( + '--save_res', + type=bool, + default=False, + help=( + "whether to save predict results to json file" + "1) store_res: a list of image_data" + "2) image_data: [imageid, rects, [keypoints, scores]]" + "3) rects: list of rect [xmin, ymin, xmax, ymax]" + "4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list" + "5) scores: mean of all joint conf")) + parser.add_argument( + "--tracker_config", type=str, default=None, help=("tracker donfig")) + return parser diff --git a/PaddleDetection-release-2.6/deploy/python/mot_sde_infer.py b/PaddleDetection-release-2.6/deploy/python/mot_sde_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..b4a487facddc4a6eb9492ba367c5333b0c77d9a9 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/mot_sde_infer.py @@ -0,0 +1,522 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import time +import yaml +import cv2 +import numpy as np +from collections import defaultdict +import paddle + +from benchmark_utils import PaddleInferBenchmark +from preprocess import decode_image +from utils import argsparser, Timer, get_current_memory_mb +from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig, load_predictor + +# add python path +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from pptracking.python.mot import JDETracker, DeepSORTTracker +from pptracking.python.mot.utils import MOTTimer, write_mot_results, get_crops, clip_box +from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict + + +class SDE_Detector(Detector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + tracker_config (str): tracker config path + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + output_dir (string): The path of output, default as 'output' + threshold (float): Score threshold of the detected bbox, default as 0.5 + save_images (bool): Whether to save visualization image results, default as False + save_mot_txts (bool): Whether to save tracking results (txt), default as False + reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT + """ + + def __init__(self, + model_dir, + tracker_config, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + save_images=False, + save_mot_txts=False, + reid_model_dir=None): + super(SDE_Detector, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold, ) + self.save_images = save_images + self.save_mot_txts = save_mot_txts + assert batch_size == 1, "MOT model only supports batch_size=1." 
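+        # Added note (not in the original patch): the setup below loads an
+        # optional ReID predictor (reid_model_dir) and then builds the
+        # tracker named by 'type' in tracker_config.yml: DeepSORTTracker
+        # (matches appearance embeddings, needs ReID) or JDETracker in
+        # ByteTrack mode (detection-only association), e.g. with a
+        # hypothetical exported model path:
+        #     SDE_Detector('./bytetrack_yolox',
+        #                  'deploy/python/tracker_config.yml')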
+ self.det_times = Timer(with_tracker=True) + self.num_classes = len(self.pred_config.labels) + + # reid config + self.use_reid = False if reid_model_dir is None else True + if self.use_reid: + self.reid_pred_config = self.set_config(reid_model_dir) + self.reid_predictor, self.config = load_predictor( + reid_model_dir, + run_mode=run_mode, + batch_size=50, # reid_batch_size + min_subgraph_size=self.reid_pred_config.min_subgraph_size, + device=device, + use_dynamic_shape=self.reid_pred_config.use_dynamic_shape, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn) + else: + self.reid_pred_config = None + self.reid_predictor = None + + assert tracker_config is not None, 'Note that tracker_config should be set.' + self.tracker_config = tracker_config + tracker_cfg = yaml.safe_load(open(self.tracker_config)) + cfg = tracker_cfg[tracker_cfg['type']] + + # tracker config + self.use_deepsort_tracker = True if tracker_cfg[ + 'type'] == 'DeepSORTTracker' else False + if self.use_deepsort_tracker: + # use DeepSORTTracker + if self.reid_pred_config is not None and hasattr( + self.reid_pred_config, 'tracker'): + cfg = self.reid_pred_config.tracker + budget = cfg.get('budget', 100) + max_age = cfg.get('max_age', 30) + max_iou_distance = cfg.get('max_iou_distance', 0.7) + matching_threshold = cfg.get('matching_threshold', 0.2) + min_box_area = cfg.get('min_box_area', 0) + vertical_ratio = cfg.get('vertical_ratio', 0) + + self.tracker = DeepSORTTracker( + budget=budget, + max_age=max_age, + max_iou_distance=max_iou_distance, + matching_threshold=matching_threshold, + min_box_area=min_box_area, + vertical_ratio=vertical_ratio, ) + else: + # use ByteTracker + use_byte = cfg.get('use_byte', False) + det_thresh = cfg.get('det_thresh', 0.3) + min_box_area = cfg.get('min_box_area', 0) + vertical_ratio = cfg.get('vertical_ratio', 0) + match_thres = cfg.get('match_thres', 0.9) + conf_thres = cfg.get('conf_thres', 0.6) + low_conf_thres = cfg.get('low_conf_thres', 0.1) + + self.tracker = JDETracker( + use_byte=use_byte, + det_thresh=det_thresh, + num_classes=self.num_classes, + min_box_area=min_box_area, + vertical_ratio=vertical_ratio, + match_thres=match_thres, + conf_thres=conf_thres, + low_conf_thres=low_conf_thres, ) + + def postprocess(self, inputs, result): + # postprocess output of predictor + np_boxes_num = result['boxes_num'] + if np_boxes_num[0] <= 0: + print('[WARNNING] No object detected.') + result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]} + result = {k: v for k, v in result.items() if v is not None} + return result + + def reidprocess(self, det_results, repeats=1): + pred_dets = det_results['boxes'] + pred_xyxys = pred_dets[:, 2:6] + + ori_image = det_results['ori_image'] + ori_image_shape = ori_image.shape[:2] + pred_xyxys, keep_idx = clip_box(pred_xyxys, ori_image_shape) + + if len(keep_idx[0]) == 0: + det_results['boxes'] = np.zeros((1, 6), dtype=np.float32) + det_results['embeddings'] = None + return det_results + + pred_dets = pred_dets[keep_idx[0]] + pred_xyxys = pred_dets[:, 2:6] + + w, h = self.tracker.input_size + crops = get_crops(pred_xyxys, ori_image, w, h) + + # to keep fast speed, only use topk crops + crops = crops[:50] # reid_batch_size + det_results['crops'] = np.array(crops).astype('float32') + det_results['boxes'] = pred_dets[:50] + + input_names = self.reid_predictor.get_input_names() + for i in range(len(input_names)): + input_tensor = 
self.reid_predictor.get_input_handle(input_names[i]) + input_tensor.copy_from_cpu(det_results[input_names[i]]) + + # model prediction + for i in range(repeats): + self.reid_predictor.run() + output_names = self.reid_predictor.get_output_names() + feature_tensor = self.reid_predictor.get_output_handle(output_names[ + 0]) + pred_embs = feature_tensor.copy_to_cpu() + + det_results['embeddings'] = pred_embs + return det_results + + def tracking(self, det_results): + pred_dets = det_results['boxes'] # 'cls_id, score, x0, y0, x1, y1' + pred_embs = det_results.get('embeddings', None) + + if self.use_deepsort_tracker: + # use DeepSORTTracker, only support singe class + self.tracker.predict() + online_targets = self.tracker.update(pred_dets, pred_embs) + online_tlwhs, online_scores, online_ids = [], [], [] + for t in online_targets: + if not t.is_confirmed() or t.time_since_update > 1: + continue + tlwh = t.to_tlwh() + tscore = t.score + tid = t.track_id + if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > self.tracker.vertical_ratio: + continue + online_tlwhs.append(tlwh) + online_scores.append(tscore) + online_ids.append(tid) + + tracking_outs = { + 'online_tlwhs': online_tlwhs, + 'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs + else: + # use ByteTracker, support multiple class + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + online_targets_dict = self.tracker.update(pred_dets, pred_embs) + for cls_id in range(self.num_classes): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: + continue + if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > self.tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + + tracking_outs = { + 'online_tlwhs': online_tlwhs, + 'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs + + def predict_image(self, + image_list, + run_benchmark=False, + repeats=1, + visual=True, + seq_name=None): + num_classes = self.num_classes + image_list.sort() + ids2names = self.pred_config.labels + mot_results = [] + for frame_id, img_file in enumerate(image_list): + batch_image_list = [img_file] # bs=1 in MOT model + frame, _ = decode_image(img_file, {}) + if run_benchmark: + # preprocess + inputs = self.preprocess(batch_image_list) # warmup + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + # model prediction + result_warmup = self.predict(repeats=repeats) # warmup + self.det_times.inference_time_s.start() + result = self.predict(repeats=repeats) + self.det_times.inference_time_s.end(repeats=repeats) + + # postprocess + result_warmup = self.postprocess(inputs, result) # warmup + self.det_times.postprocess_time_s.start() + det_result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + + # tracking + if self.use_reid: + det_result['frame_id'] = frame_id + det_result['seq_name'] = seq_name + det_result['ori_image'] = frame + det_result = self.reidprocess(det_result) + result_warmup = self.tracking(det_result) + self.det_times.tracking_time_s.start() + if self.use_reid: + det_result = self.reidprocess(det_result) + tracking_outs = self.tracking(det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num 
+= 1 + + cm, gm, gu = get_current_memory_mb() + self.cpu_mem += cm + self.gpu_mem += gm + self.gpu_util += gu + + else: + self.det_times.preprocess_time_s.start() + inputs = self.preprocess(batch_image_list) + self.det_times.preprocess_time_s.end() + + self.det_times.inference_time_s.start() + result = self.predict() + self.det_times.inference_time_s.end() + + self.det_times.postprocess_time_s.start() + det_result = self.postprocess(inputs, result) + self.det_times.postprocess_time_s.end() + + # tracking process + self.det_times.tracking_time_s.start() + if self.use_reid: + det_result['frame_id'] = frame_id + det_result['seq_name'] = seq_name + det_result['ori_image'] = frame + det_result = self.reidprocess(det_result) + tracking_outs = self.tracking(det_result) + self.det_times.tracking_time_s.end() + self.det_times.img_num += 1 + + online_tlwhs = tracking_outs['online_tlwhs'] + online_scores = tracking_outs['online_scores'] + online_ids = tracking_outs['online_ids'] + + mot_results.append([online_tlwhs, online_scores, online_ids]) + + if visual: + if len(image_list) > 1 and frame_id % 10 == 0: + print('Tracking frame {}'.format(frame_id)) + frame, _ = decode_image(img_file, {}) + if isinstance(online_tlwhs, defaultdict): + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + else: + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + save_dir = os.path.join(self.output_dir, seq_name) + if not os.path.exists(save_dir): + os.makedirs(save_dir) + cv2.imwrite( + os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im) + + return mot_results + + def predict_video(self, video_file, camera_id): + video_out_name = 'output.mp4' + if camera_id != -1: + capture = cv2.VideoCapture(camera_id) + else: + capture = cv2.VideoCapture(video_file) + video_out_name = os.path.split(video_file)[-1] + # Get Video info : resolution, fps, frame count + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("fps: %d, frame_count: %d" % (fps, frame_count)) + + if not os.path.exists(self.output_dir): + os.makedirs(self.output_dir) + out_path = os.path.join(self.output_dir, video_out_name) + video_format = 'mp4v' + fourcc = cv2.VideoWriter_fourcc(*video_format) + writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) + + frame_id = 1 + timer = MOTTimer() + results = defaultdict(list) + num_classes = self.num_classes + data_type = 'mcmot' if num_classes > 1 else 'mot' + ids2names = self.pred_config.labels + + while (1): + ret, frame = capture.read() + if not ret: + break + if frame_id % 10 == 0: + print('Tracking frame: %d' % (frame_id)) + frame_id += 1 + + timer.tic() + seq_name = video_out_name.split('.')[0] + mot_results = self.predict_image( + [frame[:, :, ::-1]], visual=False, seq_name=seq_name) + timer.toc() + + # bs=1 in MOT model + online_tlwhs, online_scores, online_ids = mot_results[0] + + fps = 1. 
/ timer.duration + if self.use_deepsort_tracker: + # use DeepSORTTracker, only support singe class + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names) + else: + # use ByteTracker, support multiple class + for cls_id in range(num_classes): + results[cls_id].append( + (frame_id + 1, online_tlwhs[cls_id], + online_scores[cls_id], online_ids[cls_id])) + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names) + + writer.write(im) + if camera_id != -1: + cv2.imshow('Mask Detection', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + + if self.save_mot_txts: + result_filename = os.path.join( + self.output_dir, video_out_name.split('.')[-2] + '.txt') + write_mot_results(result_filename, results) + + writer.release() + + +def main(): + deploy_file = os.path.join(FLAGS.model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + arch = yml_conf['arch'] + detector = SDE_Detector( + FLAGS.model_dir, + tracker_config=FLAGS.tracker_config, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=1, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, + output_dir=FLAGS.output_dir, + threshold=FLAGS.threshold, + save_images=FLAGS.save_images, + save_mot_txts=FLAGS.save_mot_txts, ) + + # predict from video file or camera video stream + if FLAGS.video_file is not None or FLAGS.camera_id != -1: + detector.predict_video(FLAGS.video_file, FLAGS.camera_id) + else: + # predict from image + if FLAGS.image_dir is None and FLAGS.image_file is not None: + assert FLAGS.batch_size == 1, "--batch_size should be 1 in MOT models." + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + seq_name = FLAGS.image_dir.split('/')[-1] + detector.predict_image( + img_list, FLAGS.run_benchmark, repeats=10, seq_name=seq_name) + + if not FLAGS.run_benchmark: + detector.det_times.info(average=True) + else: + mode = FLAGS.run_mode + model_dir = FLAGS.model_dir + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + bench_log(detector, img_list, model_info, name='MOT') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/PaddleDetection-release-2.6/deploy/python/picodet_postprocess.py b/PaddleDetection-release-2.6/deploy/python/picodet_postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..7df13f8278d13c51179c5502987926dec637bec4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/picodet_postprocess.py @@ -0,0 +1,227 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +from scipy.special import softmax + + +def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200): + """ + Args: + box_scores (N, 5): boxes in corner-form and probabilities. + iou_threshold: intersection over union threshold. + top_k: keep top_k results. If k <= 0, keep all the results. + candidate_size: only consider the candidates with the highest scores. + Returns: + picked: a list of indexes of the kept boxes + """ + scores = box_scores[:, -1] + boxes = box_scores[:, :-1] + picked = [] + indexes = np.argsort(scores) + indexes = indexes[-candidate_size:] + while len(indexes) > 0: + current = indexes[-1] + picked.append(current) + if 0 < top_k == len(picked) or len(indexes) == 1: + break + current_box = boxes[current, :] + indexes = indexes[:-1] + rest_boxes = boxes[indexes, :] + iou = iou_of( + rest_boxes, + np.expand_dims( + current_box, axis=0), ) + indexes = indexes[iou <= iou_threshold] + + return box_scores[picked, :] + + +def iou_of(boxes0, boxes1, eps=1e-5): + """Return intersection-over-union (Jaccard index) of boxes. + Args: + boxes0 (N, 4): ground truth boxes. + boxes1 (N or 1, 4): predicted boxes. + eps: a small number to avoid 0 as denominator. + Returns: + iou (N): IoU values. + """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. + right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. 
+ """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PicoDetPostProcess(object): + """ + Args: + input_shape (int): network input image size + ori_shape (int): ori image shape of before padding + scale_factor (float): scale factor of ori image + enable_mkldnn (bool): whether to open MKLDNN + """ + + def __init__(self, + input_shape, + ori_shape, + scale_factor, + strides=[8, 16, 32, 64], + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=1000, + keep_top_k=100): + self.ori_shape = ori_shape + self.input_shape = input_shape + self.scale_factor = scale_factor + self.strides = strides + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def warp_boxes(self, boxes, ori_shape): + """Apply transform to boxes + """ + width, height = ori_shape[1], ori_shape[0] + n = len(boxes) + if n: + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + n * 4, 2) # x1y1, x2y2, x1y2, x2y1 + # xy = xy @ M.T # transform + xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate( + (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + # clip boxes + xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) + xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) + return xy.astype(np.float32) + else: + return boxes + + def __call__(self, scores, raw_boxes): + batch_size = raw_boxes[0].shape[0] + reg_max = int(raw_boxes[0].shape[-1] / 4 - 1) + out_boxes_num = [] + out_boxes_list = [] + for batch_id in range(batch_size): + # generate centers + decode_boxes = [] + select_scores = [] + for stride, box_distribute, score in zip(self.strides, raw_boxes, + scores): + box_distribute = box_distribute[batch_id] + score = score[batch_id] + # centers + fm_h = self.input_shape[0] / stride + fm_w = self.input_shape[1] / stride + h_range = np.arange(fm_h) + w_range = np.arange(fm_w) + ww, hh = np.meshgrid(w_range, h_range) + ct_row = (hh.flatten() + 0.5) * stride + ct_col = (ww.flatten() + 0.5) * stride + center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1) + + # box distribution to distance + reg_range = np.arange(reg_max + 1) + box_distance = box_distribute.reshape((-1, reg_max + 1)) + box_distance = softmax(box_distance, axis=1) + box_distance = box_distance * np.expand_dims(reg_range, axis=0) + box_distance = np.sum(box_distance, axis=1).reshape((-1, 4)) + box_distance = box_distance * stride + + # top K candidate + topk_idx = np.argsort(score.max(axis=1))[::-1] + topk_idx = topk_idx[:self.nms_top_k] + center = center[topk_idx] + score = score[topk_idx] + box_distance = box_distance[topk_idx] + + # decode box + decode_box = center + [-1, -1, 1, 1] * box_distance + + select_scores.append(score) + decode_boxes.append(decode_box) + + # nms + bboxes = np.concatenate(decode_boxes, axis=0) + confidences = np.concatenate(select_scores, axis=0) + picked_box_probs = [] + picked_labels = [] + for class_index in range(0, confidences.shape[1]): + probs = confidences[:, class_index] + mask = probs > self.score_threshold + probs = probs[mask] + if probs.shape[0] == 0: + continue + subset_boxes = bboxes[mask, :] + box_probs = np.concatenate( + [subset_boxes, probs.reshape(-1, 1)], axis=1) + box_probs = hard_nms( + box_probs, + iou_threshold=self.nms_threshold, + top_k=self.keep_top_k, ) + picked_box_probs.append(box_probs) + picked_labels.extend([class_index] * 
box_probs.shape[0])
+
+            if len(picked_box_probs) == 0:
+                out_boxes_list.append(np.empty((0, 4)))
+                out_boxes_num.append(0)
+
+            else:
+                picked_box_probs = np.concatenate(picked_box_probs)
+
+                # resize output boxes
+                picked_box_probs[:, :4] = self.warp_boxes(
+                    picked_box_probs[:, :4], self.ori_shape[batch_id])
+                im_scale = np.concatenate([
+                    self.scale_factor[batch_id][::-1],
+                    self.scale_factor[batch_id][::-1]
+                ])
+                picked_box_probs[:, :4] /= im_scale
+                # each output row is [class, score, x0, y0, x1, y1]
+                out_boxes_list.append(
+                    np.concatenate(
+                        [
+                            np.expand_dims(
+                                np.array(picked_labels), axis=-1),
+                            np.expand_dims(
+                                picked_box_probs[:, 4], axis=-1),
+                            picked_box_probs[:, :4]
+                        ],
+                        axis=1))
+                out_boxes_num.append(len(picked_labels))
+
+        out_boxes_list = np.concatenate(out_boxes_list, axis=0)
+        out_boxes_num = np.asarray(out_boxes_num).astype(np.int32)
+        return out_boxes_list, out_boxes_num
diff --git a/PaddleDetection-release-2.6/deploy/python/preprocess.py b/PaddleDetection-release-2.6/deploy/python/preprocess.py
new file mode 100644
index 0000000000000000000000000000000000000000..6f1a5a2a1a0e38e3edbd9685ad4013b6579ddb87
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/python/preprocess.py
@@ -0,0 +1,522 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
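+# Added usage sketch (illustrative commentary, not part of the original
+# patch): the operator classes in this module are chained by preprocess()
+# at the bottom of the file; a typical detection pipeline would be:
+#
+#     ops = [Resize(target_size=[640, 640], keep_ratio=False),
+#            NormalizeImage(mean=[0.485, 0.456, 0.406],
+#                           std=[0.229, 0.224, 0.225]),
+#            Permute(),
+#            PadStride(32)]
+#     im, im_info = preprocess('demo/test.jpg', ops)  # hypothetical image path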
+ +import cv2 +import numpy as np +from keypoint_preprocess import get_affine_transform +from PIL import Image + + +def decode_image(im_file, im_info): + """read rgb image + Args: + im_file (str|np.ndarray): input can be image path or np.ndarray + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + if isinstance(im_file, str): + with open(im_file, 'rb') as f: + im_read = f.read() + data = np.frombuffer(im_read, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + else: + im = im_file + im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32) + im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32) + return im, im_info + + +class Resize_Mult32(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, limit_side_len, limit_type, interp=cv2.INTER_LINEAR): + self.limit_side_len = limit_side_len + self.limit_type = limit_type + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, img): + """ + Args: + img (np.ndarray): image (np.ndarray) + Returns: + im_scale_x: the resize ratio of X + im_scale_y: the resize ratio of Y + """ + limit_side_len = self.limit_side_len + h, w, c = img.shape + + # limit the max side + if self.limit_type == 'max': + if h > w: + ratio = float(limit_side_len) / h + else: + ratio = float(limit_side_len) / w + elif self.limit_type == 'min': + if h < w: + ratio = float(limit_side_len) / h + else: + ratio = float(limit_side_len) / w + elif self.limit_type == 'resize_long': + ratio = float(limit_side_len) / max(h, w) + else: + raise Exception('not support limit type, image ') + resize_h = int(h * ratio) + resize_w = int(w * ratio) + + resize_h = max(int(round(resize_h / 32) * 32), 32) + resize_w = max(int(round(resize_w / 32) * 32), 32) + + im_scale_y = resize_h / float(h) + im_scale_x = resize_w / float(w) + return im_scale_y, im_scale_x + + +class Resize(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = 
self.generate_scale(im)
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=self.interp)
+        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
+        im_info['scale_factor'] = np.array(
+            [im_scale_y, im_scale_x]).astype('float32')
+        return im, im_info
+
+    def generate_scale(self, im):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+        Returns:
+            im_scale_x: the resize ratio of X
+            im_scale_y: the resize ratio of Y
+        """
+        origin_shape = im.shape[:2]
+        im_c = im.shape[2]
+        if self.keep_ratio:
+            im_size_min = np.min(origin_shape)
+            im_size_max = np.max(origin_shape)
+            target_size_min = np.min(self.target_size)
+            target_size_max = np.max(self.target_size)
+            im_scale = float(target_size_min) / float(im_size_min)
+            if np.round(im_scale * im_size_max) > target_size_max:
+                im_scale = float(target_size_max) / float(im_size_max)
+            im_scale_x = im_scale
+            im_scale_y = im_scale
+        else:
+            resize_h, resize_w = self.target_size
+            im_scale_y = resize_h / float(origin_shape[0])
+            im_scale_x = resize_w / float(origin_shape[1])
+        return im_scale_y, im_scale_x
+
+
+class ShortSizeScale(object):
+    """
+    Scale images by short size.
+    Args:
+        short_size(float | int): Short size of an image will be scaled to the short_size.
+        fixed_ratio(bool): Set whether to zoom according to a fixed ratio. default: True
+        keep_ratio(bool): Whether to keep the aspect ratio; mutually exclusive with fixed_ratio. default: None
+        do_round(bool): Whether to round up when calculating the zoom ratio. default: False
+        backend(str): Choose pillow or cv2 as the graphics processing backend. default: 'pillow'
+    """
+
+    def __init__(self,
+                 short_size,
+                 fixed_ratio=True,
+                 keep_ratio=None,
+                 do_round=False,
+                 backend='pillow'):
+        self.short_size = short_size
+        assert (fixed_ratio and not keep_ratio) or (
+            not fixed_ratio
+        ), "fixed_ratio and keep_ratio cannot be true at the same time"
+        self.fixed_ratio = fixed_ratio
+        self.keep_ratio = keep_ratio
+        self.do_round = do_round
+
+        assert backend in [
+            'pillow', 'cv2'
+        ], f"Scale's backend must be pillow or cv2, but got {backend}"
+
+        self.backend = backend
+
+    def __call__(self, img):
+        """
+        Performs resize operations.
+        Args:
+            img (PIL.Image): a PIL.Image.
+        Returns:
+            resized_img: a PIL.Image after scaling.
+ """ + + result_img = None + + if isinstance(img, np.ndarray): + h, w, _ = img.shape + elif isinstance(img, Image.Image): + w, h = img.size + else: + raise NotImplementedError + + if w <= h: + ow = self.short_size + if self.fixed_ratio: # default is True + oh = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + oh = self.short_size + else: + scale_factor = self.short_size / w + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * self.short_size / h) + else: + oh = self.short_size + if self.fixed_ratio: + ow = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + ow = self.short_size + else: + scale_factor = self.short_size / h + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * self.short_size / h) + + if type(img) == np.ndarray: + img = Image.fromarray(img, mode='RGB') + + if self.backend == 'pillow': + result_img = img.resize((ow, oh), Image.BILINEAR) + elif self.backend == 'cv2' and (self.keep_ratio is not None): + result_img = cv2.resize( + img, (ow, oh), interpolation=cv2.INTER_LINEAR) + else: + result_img = Image.fromarray( + cv2.resize( + np.asarray(img), (ow, oh), interpolation=cv2.INTER_LINEAR)) + + return result_img + + +class NormalizeImage(object): + """normalize image + Args: + mean (list): im - mean + std (list): im / std + is_scale (bool): whether need im / 255 + norm_type (str): type in ['mean_std', 'none'] + """ + + def __init__(self, mean, std, is_scale=True, norm_type='mean_std'): + self.mean = mean + self.std = std + self.is_scale = is_scale + self.norm_type = norm_type + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.astype(np.float32, copy=False) + if self.is_scale: + scale = 1.0 / 255.0 + im *= scale + + if self.norm_type == 'mean_std': + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im -= mean + im /= std + return im, im_info + + +class Permute(object): + """permute image + Args: + to_bgr (bool): whether convert RGB to BGR + channel_first (bool): whether convert HWC to CHW + """ + + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.transpose((2, 0, 1)).copy() + return im, im_info + + +class PadStride(object): + """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config + Args: + stride (bool): model with FPN need image shape % stride == 0 + """ + + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) 
+ padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class LetterBoxResize(object): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. + """ + super(LetterBoxResize, self).__init__() + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + + def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + height, width = self.target_size + h, w = im.shape[:2] + im, ratio, padw, padh = self.letterbox(im, height=height, width=width) + + new_shape = [round(h * ratio), round(w * ratio)] + im_info['im_shape'] = np.array(new_shape, dtype=np.float32) + im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32) + return im, im_info + + +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. 
+ Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + +class WarpAffine(object): + """Warp affine the image + """ + + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + input_w=512, + scale=0.4, + shift=0.1, + down_ratio=4): + self.keep_res = keep_res + self.pad = pad + self.input_h = input_h + self.input_w = input_w + self.scale = scale + self.shift = shift + self.down_ratio = down_ratio + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + + h, w = img.shape[:2] + + if self.keep_res: + # True in detection eval/infer + input_h = (h | self.pad) + 1 + input_w = (w | self.pad) + 1 + s = np.array([input_w, input_h], dtype=np.float32) + c = np.array([w // 2, h // 2], dtype=np.float32) + + else: + # False in centertrack eval_mot/eval_mot + s = max(h, w) * 1.0 + input_h, input_w = self.input_h, self.input_w + c = np.array([w / 2., h / 2.], dtype=np.float32) + + trans_input = get_affine_transform(c, s, 0, [input_w, input_h]) + img = cv2.resize(img, (w, h)) + inp = cv2.warpAffine( + img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR) + + if not self.keep_res: + out_h = input_h // self.down_ratio + out_w = input_w // self.down_ratio + trans_output = get_affine_transform(c, s, 0, [out_w, out_h]) + + im_info.update({ + 'center': c, + 'scale': s, + 'out_height': out_h, + 'out_width': out_w, + 'inp_height': input_h, + 'inp_width': input_w, + 'trans_input': trans_input, + 'trans_output': trans_output, + }) + return inp, im_info + + +def preprocess(im, preprocess_ops): + # process image by preprocess_ops + im_info = { + 'scale_factor': np.array( + [1., 1.], dtype=np.float32), + 'im_shape': None, + } + im, im_info = decode_image(im, im_info) + for operator in preprocess_ops: + im, im_info = operator(im, im_info) + return im, im_info diff --git a/PaddleDetection-release-2.6/deploy/python/tracker_config.yml b/PaddleDetection-release-2.6/deploy/python/tracker_config.yml new file mode 100644 index 0000000000000000000000000000000000000000..9531c549e3f6993da81147a41d55d47b35a12fef --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/tracker_config.yml @@ -0,0 +1,32 @@ +# config of tracker for MOT SDE Detector, use 'JDETracker' as default. +# The tracker of MOT JDE Detector (such as FairMOT) is exported together with the model. +# Here 'min_box_area' and 'vertical_ratio' are set for pedestrian, you can modify for other objects tracking. 
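+# Added note (illustrative commentary, not part of the original patch):
+# mot_sde_infer.py loads this file with yaml.safe_load() and keeps only the
+# section named by 'type', roughly:
+#     tracker_cfg = yaml.safe_load(open('tracker_config.yml'))
+#     cfg = tracker_cfg[tracker_cfg['type']]  # e.g. the JDETracker block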
+ +type: JDETracker # 'JDETracker', 'DeepSORTTracker' or 'CenterTracker' + +# BYTETracker +JDETracker: + use_byte: True + det_thresh: 0.3 + conf_thres: 0.6 + low_conf_thres: 0.1 + match_thres: 0.9 + min_box_area: 0 + vertical_ratio: 0 # 1.6 for pedestrian + +DeepSORTTracker: + input_size: [64, 192] + min_box_area: 0 + vertical_ratio: -1 + budget: 100 + max_age: 70 + n_init: 3 + metric_type: cosine + matching_threshold: 0.2 + max_iou_distance: 0.9 + +CenterTracker: + min_box_area: -1 + vertical_ratio: -1 + track_thresh: 0.4 + pre_thresh: 0.5 diff --git a/PaddleDetection-release-2.6/deploy/python/utils.py b/PaddleDetection-release-2.6/deploy/python/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d1f7d59f8571f2795af059300be69016f47fb4d7 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/python/utils.py @@ -0,0 +1,534 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import time +import os +import ast +import argparse +import numpy as np + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + parser.add_argument( + "--image_file", type=str, default=None, help="Path of image file.") + parser.add_argument( + "--image_dir", + type=str, + default=None, + help="Dir of image file, `image_file` has a higher priority.") + parser.add_argument( + "--batch_size", type=int, default=1, help="batch_size for inference.") + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or `camera_id` has a highest priority." + ) + parser.add_argument( + "--camera_id", + type=int, + default=-1, + help="device id of camera to predict.") + parser.add_argument( + "--threshold", type=float, default=0.5, help="Threshold of score.") + parser.add_argument( + "--output_dir", + type=str, + default="output", + help="Directory of output visualization files.") + parser.add_argument( + "--run_mode", + type=str, + default='paddle', + help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU." 
+    )
+    parser.add_argument(
+        "--use_gpu",
+        type=ast.literal_eval,
+        default=False,
+        help="Deprecated, please use `--device`.")
+    parser.add_argument(
+        "--run_benchmark",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to predict an image_file repeatedly for benchmark.")
+    parser.add_argument(
+        "--enable_mkldnn",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use MKL-DNN with CPU.")
+    parser.add_argument(
+        "--enable_mkldnn_bfloat16",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use MKL-DNN bfloat16 inference with CPU.")
+    parser.add_argument(
+        "--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
+    parser.add_argument(
+        "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_max_shape",
+        type=int,
+        default=1280,
+        help="max_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_opt_shape",
+        type=int,
+        default=640,
+        help="opt_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_calib_mode",
+        type=bool,
+        default=False,
+        help="If the model is produced by TRT offline quantization "
+        "calibration, trt_calib_mode needs to be set to True.")
+    parser.add_argument(
+        '--save_images',
+        type=ast.literal_eval,
+        default=True,
+        help='Save visualization image results.')
+    parser.add_argument(
+        '--save_mot_txts',
+        action='store_true',
+        help='Save tracking results (txt).')
+    parser.add_argument(
+        '--save_mot_txt_per_img',
+        action='store_true',
+        help='Save tracking results (txt) for each image.')
+    parser.add_argument(
+        '--scaled',
+        type=bool,
+        default=False,
+        help="Whether the coords output by the detector are already scaled: "
+        "False in JDE YOLOv3, True in general detectors.")
+    parser.add_argument(
+        "--tracker_config",
+        type=str,
+        default=None,
+        help="Path of tracker config file.")
+    parser.add_argument(
+        "--reid_model_dir",
+        type=str,
+        default=None,
+        help=("Directory including 'model.pdiparams', 'model.pdmodel', "
+              "'infer_cfg.yml', created by tools/export_model.py."))
+    parser.add_argument(
+        "--reid_batch_size",
+        type=int,
+        default=50,
+        help="max batch_size for reid model inference.")
+    parser.add_argument(
+        '--use_dark',
+        type=ast.literal_eval,
+        default=True,
+        help='Whether to use DarkPose to get better keypoint position predictions.')
+    parser.add_argument(
+        "--action_file",
+        type=str,
+        default=None,
+        help="Path of input file for action recognition.")
+    parser.add_argument(
+        "--window_size",
+        type=int,
+        default=50,
+        help="Temporal size of skeleton feature for action recognition.")
+    parser.add_argument(
+        "--random_pad",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to do random padding for action recognition.")
+    parser.add_argument(
+        "--save_results",
+        action='store_true',
+        default=False,
+        help="Whether to save detection results to file in COCO format.")
+    parser.add_argument(
+        '--use_coco_category',
+        action='store_true',
+        default=False,
+        help='Whether to use the COCO format dictionary `clsid2catid`.')
+    parser.add_argument(
+        "--slice_infer",
+        action='store_true',
+        help="Whether to slice the image and merge the inference results for small object detection."
+    )
+    parser.add_argument(
+        '--slice_size',
+        nargs='+',
+        type=int,
+        default=[640, 640],
+        help="Height and width of each image slice.")
+    parser.add_argument(
+        "--overlap_ratio",
+        nargs='+',
+        type=float,
+        default=[0.25, 0.25],
+        help="Overlap ratio (height, width) between adjacent image slices.")
+    parser.add_argument(
+        "--combine_method",
+        type=str,
+        default='nms',
+        help="Combine method of the sliced images' detection results, choose in ['nms', 'nmm', 'concat']."
+    )
+    parser.add_argument(
+        "--match_threshold",
+        type=float,
+        default=0.6,
+        help="Combine method matching threshold.")
+    parser.add_argument(
+        "--match_metric",
+        type=str,
+        default='ios',
+        help="Combine method matching metric, choose in ['iou', 'ios'].")
+    return parser
+
+
+class Times(object):
+    def __init__(self):
+        self.time = 0.
+        # start time
+        self.st = 0.
+        # end time
+        self.et = 0.
+
+    def start(self):
+        self.st = time.time()
+
+    def end(self, repeats=1, accumulative=True):
+        self.et = time.time()
+        if accumulative:
+            self.time += (self.et - self.st) / repeats
+        else:
+            self.time = (self.et - self.st) / repeats
+
+    def reset(self):
+        self.time = 0.
+        self.st = 0.
+        self.et = 0.
+
+    def value(self):
+        return round(self.time, 4)
+
+
+class Timer(Times):
+    def __init__(self, with_tracker=False):
+        super(Timer, self).__init__()
+        self.with_tracker = with_tracker
+        self.preprocess_time_s = Times()
+        self.inference_time_s = Times()
+        self.postprocess_time_s = Times()
+        self.tracking_time_s = Times()
+        self.img_num = 0
+
+    def info(self, average=False):
+        pre_time = self.preprocess_time_s.value()
+        infer_time = self.inference_time_s.value()
+        post_time = self.postprocess_time_s.value()
+        track_time = self.tracking_time_s.value()
+
+        total_time = pre_time + infer_time + post_time
+        if self.with_tracker:
+            total_time = total_time + track_time
+        total_time = round(total_time, 4)
+        print("------------------ Inference Time Info ----------------------")
+        print("total_time(ms): {}, img_num: {}".format(total_time * 1000,
+                                                       self.img_num))
+        preprocess_time = round(pre_time / max(1, self.img_num),
+                                4) if average else pre_time
+        postprocess_time = round(post_time / max(1, self.img_num),
+                                 4) if average else post_time
+        inference_time = round(infer_time / max(1, self.img_num),
+                               4) if average else infer_time
+        tracking_time = round(track_time / max(1, self.img_num),
+                              4) if average else track_time
+
+        average_latency = total_time / max(1, self.img_num)
+        qps = 0
+        if total_time > 0:
+            qps = 1 / average_latency
+        print("average latency time(ms): {:.2f}, QPS: {:.2f}".format(
+            average_latency * 1000, qps))
+        if self.with_tracker:
+            print(
+                "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}, tracking_time(ms): {:.2f}".
+                format(preprocess_time * 1000, inference_time * 1000,
+                       postprocess_time * 1000, tracking_time * 1000))
+        else:
+            print(
+                "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}".
+                format(preprocess_time * 1000, inference_time * 1000,
+                       postprocess_time * 1000))
+
+    def report(self, average=False):
+        dic = {}
+        pre_time = self.preprocess_time_s.value()
+        infer_time = self.inference_time_s.value()
+        post_time = self.postprocess_time_s.value()
+        track_time = self.tracking_time_s.value()
+
+        dic['preprocess_time_s'] = round(pre_time / max(1, self.img_num),
+                                         4) if average else pre_time
+        dic['inference_time_s'] = round(infer_time / max(1, self.img_num),
+                                        4) if average else infer_time
+        dic['postprocess_time_s'] = round(post_time / max(1, self.img_num),
+                                          4) if average else post_time
+        dic['img_num'] = self.img_num
+        total_time = pre_time + infer_time + post_time
+        if self.with_tracker:
+            dic['tracking_time_s'] = round(track_time / max(1, self.img_num),
+                                           4) if average else track_time
+            total_time = total_time + track_time
+        dic['total_time_s'] = round(total_time, 4)
+        return dic
+
+
+def get_current_memory_mb():
+    """
+    Obtain the CPU and GPU memory usage of the current process.
+    Note that calling this function is itself time-consuming.
+    """
+    import pynvml
+    import psutil
+    import GPUtil
+    gpu_id = int(os.environ.get('CUDA_VISIBLE_DEVICES', 0))
+
+    pid = os.getpid()
+    p = psutil.Process(pid)
+    info = p.memory_full_info()
+    cpu_mem = info.uss / 1024. / 1024.
+    gpu_mem = 0
+    gpu_percent = 0
+    gpus = GPUtil.getGPUs()
+    if gpu_id is not None and len(gpus) > 0:
+        gpu_percent = gpus[gpu_id].load
+        pynvml.nvmlInit()
+        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
+        meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
+        gpu_mem = meminfo.used / 1024. / 1024.
+    return round(cpu_mem, 4), round(gpu_mem, 4), round(gpu_percent, 4)
+
+
+def multiclass_nms(bboxs, num_classes, match_threshold=0.6, match_metric='iou'):
+    final_boxes = []
+    for c in range(num_classes):
+        idxs = bboxs[:, 0] == c
+        if np.count_nonzero(idxs) == 0: continue
+        r = nms(bboxs[idxs, 1:], match_threshold, match_metric)
+        final_boxes.append(np.concatenate([np.full((r.shape[0], 1), c), r], 1))
+    return final_boxes
+
+
+def nms(dets, match_threshold=0.6, match_metric='iou'):
+    """ Apply NMS to avoid detecting too many overlapping bounding boxes.
+        Args:
+            dets: shape [N, 5], [score, x1, y1, x2, y2]
+            match_metric: 'iou' or 'ios'
+            match_threshold: overlap threshold for the match metric.
+ """ + if dets.shape[0] == 0: + return dets[[], :] + scores = dets[:, 0] + x1 = dets[:, 1] + y1 = dets[:, 2] + x2 = dets[:, 3] + y2 = dets[:, 4] + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + order = scores.argsort()[::-1] + + ndets = dets.shape[0] + suppressed = np.zeros((ndets), dtype=np.int32) + + for _i in range(ndets): + i = order[_i] + if suppressed[i] == 1: + continue + ix1 = x1[i] + iy1 = y1[i] + ix2 = x2[i] + iy2 = y2[i] + iarea = areas[i] + for _j in range(_i + 1, ndets): + j = order[_j] + if suppressed[j] == 1: + continue + xx1 = max(ix1, x1[j]) + yy1 = max(iy1, y1[j]) + xx2 = min(ix2, x2[j]) + yy2 = min(iy2, y2[j]) + w = max(0.0, xx2 - xx1 + 1) + h = max(0.0, yy2 - yy1 + 1) + inter = w * h + if match_metric == 'iou': + union = iarea + areas[j] - inter + match_value = inter / union + elif match_metric == 'ios': + smaller = min(iarea, areas[j]) + match_value = inter / smaller + else: + raise ValueError() + if match_value >= match_threshold: + suppressed[j] = 1 + keep = np.where(suppressed == 0)[0] + dets = dets[keep, :] + return dets + + +coco_clsid2catid = { + 0: 1, + 1: 2, + 2: 3, + 3: 4, + 4: 5, + 5: 6, + 6: 7, + 7: 8, + 8: 9, + 9: 10, + 10: 11, + 11: 13, + 12: 14, + 13: 15, + 14: 16, + 15: 17, + 16: 18, + 17: 19, + 18: 20, + 19: 21, + 20: 22, + 21: 23, + 22: 24, + 23: 25, + 24: 27, + 25: 28, + 26: 31, + 27: 32, + 28: 33, + 29: 34, + 30: 35, + 31: 36, + 32: 37, + 33: 38, + 34: 39, + 35: 40, + 36: 41, + 37: 42, + 38: 43, + 39: 44, + 40: 46, + 41: 47, + 42: 48, + 43: 49, + 44: 50, + 45: 51, + 46: 52, + 47: 53, + 48: 54, + 49: 55, + 50: 56, + 51: 57, + 52: 58, + 53: 59, + 54: 60, + 55: 61, + 56: 62, + 57: 63, + 58: 64, + 59: 65, + 60: 67, + 61: 70, + 62: 72, + 63: 73, + 64: 74, + 65: 75, + 66: 76, + 67: 77, + 68: 78, + 69: 79, + 70: 80, + 71: 81, + 72: 82, + 73: 84, + 74: 85, + 75: 86, + 76: 87, + 77: 88, + 78: 89, + 79: 90 +} + + +def gaussian_radius(bbox_size, min_overlap): + height, width = bbox_size + + a1 = 1 + b1 = (height + width) + c1 = width * height * (1 - min_overlap) / (1 + min_overlap) + sq1 = np.sqrt(b1**2 - 4 * a1 * c1) + radius1 = (b1 + sq1) / (2 * a1) + + a2 = 4 + b2 = 2 * (height + width) + c2 = (1 - min_overlap) * width * height + sq2 = np.sqrt(b2**2 - 4 * a2 * c2) + radius2 = (b2 + sq2) / 2 + + a3 = 4 * min_overlap + b3 = -2 * min_overlap * (height + width) + c3 = (min_overlap - 1) * width * height + sq3 = np.sqrt(b3**2 - 4 * a3 * c3) + radius3 = (b3 + sq3) / 2 + return min(radius1, radius2, radius3) + + +def gaussian2D(shape, sigma_x=1, sigma_y=1): + m, n = [(ss - 1.) / 2. 
for ss in shape]
+    y, x = np.ogrid[-m:m + 1, -n:n + 1]
+
+    h = np.exp(-(x * x / (2 * sigma_x * sigma_x) + y * y / (2 * sigma_y *
+                                                            sigma_y)))
+    h[h < np.finfo(h.dtype).eps * h.max()] = 0
+    return h
+
+
+def draw_umich_gaussian(heatmap, center, radius, k=1):
+    """
+    draw_umich_gaussian, refer to https://github.com/xingyizhou/CenterNet/blob/master/src/lib/utils/image.py#L126
+    """
+    diameter = 2 * radius + 1
+    gaussian = gaussian2D(
+        (diameter, diameter), sigma_x=diameter / 6, sigma_y=diameter / 6)
+
+    x, y = int(center[0]), int(center[1])
+
+    height, width = heatmap.shape[0:2]
+
+    left, right = min(x, radius), min(width - x, radius + 1)
+    top, bottom = min(y, radius), min(height - y, radius + 1)
+
+    masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]
+    masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:
+                               radius + right]
+    if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0:
+        np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)
+    return heatmap
diff --git a/PaddleDetection-release-2.6/deploy/python/visualize.py b/PaddleDetection-release-2.6/deploy/python/visualize.py
new file mode 100644
index 0000000000000000000000000000000000000000..5d4ea4de12766dc067ae894def7374604484295d
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/python/visualize.py
@@ -0,0 +1,579 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import division
+
+import os
+import cv2
+import numpy as np
+from PIL import Image, ImageDraw, ImageFile
+ImageFile.LOAD_TRUNCATED_IMAGES = True
+import math
+
+
+def visualize_box_mask(im, results, labels, threshold=0.5):
+    """
+    Args:
+        im (str/np.ndarray): path of image/np.ndarray read by cv2
+        results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of boxes,
+                        matrix element:[class, score, x_min, y_min, x_max, y_max]
+                        MaskRCNN's results include 'masks': np.ndarray:
+                        shape:[N, im_h, im_w]
+        labels (list): labels:['class1', ..., 'classn']
+        threshold (float): Threshold of score.
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    if isinstance(im, str):
+        im = Image.open(im).convert('RGB')
+    elif isinstance(im, np.ndarray):
+        im = Image.fromarray(im)
+    if 'masks' in results and 'boxes' in results and len(results['boxes']) > 0:
+        im = draw_mask(
+            im, results['boxes'], results['masks'], labels, threshold=threshold)
+    if 'boxes' in results and len(results['boxes']) > 0:
+        im = draw_box(im, results['boxes'], labels, threshold=threshold)
+    if 'segm' in results:
+        im = draw_segm(
+            im,
+            results['segm'],
+            results['label'],
+            results['score'],
+            labels,
+            threshold=threshold)
+    return im
+
+
+def get_color_map_list(num_classes):
+    """
+    Args:
+        num_classes (int): number of classes
+    Returns:
+        color_map (list): RGB color list
+    """
+    color_map = num_classes * [0, 0, 0]
+    for i in range(0, num_classes):
+        j = 0
+        lab = i
+        while lab:
+            color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+            color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+            color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+            j += 1
+            lab >>= 3
+    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
+    return color_map
+
+
+def draw_mask(im, np_boxes, np_masks, labels, threshold=0.5):
+    """
+    Args:
+        im (PIL.Image.Image): PIL image
+        np_boxes (np.ndarray): shape:[N,6], N: number of boxes,
+                               matrix element:[class, score, x_min, y_min, x_max, y_max]
+        np_masks (np.ndarray): shape:[N, im_h, im_w]
+        labels (list): labels:['class1', ..., 'classn']
+        threshold (float): threshold of mask
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    color_list = get_color_map_list(len(labels))
+    w_ratio = 0.4
+    alpha = 0.7
+    im = np.array(im).astype('float32')
+    clsid2color = {}
+    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
+    np_boxes = np_boxes[expect_boxes, :]
+    np_masks = np_masks[expect_boxes, :, :]
+    im_h, im_w = im.shape[:2]
+    np_masks = np_masks[:, :im_h, :im_w]
+    for i in range(len(np_masks)):
+        clsid, score = int(np_boxes[i][0]), np_boxes[i][1]
+        mask = np_masks[i]
+        if clsid not in clsid2color:
+            clsid2color[clsid] = color_list[clsid]
+        color_mask = clsid2color[clsid]
+        for c in range(3):
+            color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
+        idx = np.nonzero(mask)
+        color_mask = np.array(color_mask)
+        im[idx[0], idx[1], :] *= 1.0 - alpha
+        im[idx[0], idx[1], :] += alpha * color_mask
+    return Image.fromarray(im.astype('uint8'))
+
+
+def draw_box(im, np_boxes, labels, threshold=0.5):
+    """
+    Args:
+        im (PIL.Image.Image): PIL image
+        np_boxes (np.ndarray): shape:[N,6], N: number of boxes,
+                               matrix element:[class, score, x_min, y_min, x_max, y_max]
+        labels (list): labels:['class1', ..., 'classn']
+        threshold (float): threshold of box
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    draw_thickness = min(im.size) // 320
+    draw = ImageDraw.Draw(im)
+    clsid2color = {}
+    color_list = get_color_map_list(len(labels))
+    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
+    np_boxes = np_boxes[expect_boxes, :]
+
+    for dt in np_boxes:
+        clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
+        if clsid not in clsid2color:
+            clsid2color[clsid] = color_list[clsid]
+        color = tuple(clsid2color[clsid])
+
+        if len(bbox) == 4:
+            xmin, ymin, xmax, ymax = bbox
+            print('class_id:{:d}, confidence:{:.4f}, left_top:[{:.2f},{:.2f}], '
+                  'right_bottom:[{:.2f},{:.2f}]'.format(
+                      int(clsid), score, xmin, ymin, xmax, ymax))
+            # draw bbox
+            draw.line(
+                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+                 (xmin, ymin)],
+                width=draw_thickness,
+                fill=color)
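+        # an 8-element bbox is a quadrilateral [x1, y1, ..., x4, y4]
+        # (e.g. from rotated-box models); it is drawn edge by edge below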
+ elif len(bbox) == 8: + x1, y1, x2, y2, x3, y3, x4, y4 = bbox + draw.line( + [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)], + width=2, + fill=color) + xmin = min(x1, x2, x3, x4) + ymin = min(y1, y2, y3, y4) + + # draw label + text = "{} {:.4f}".format(labels[clsid], score) + tw, th = draw.textsize(text) + draw.rectangle( + [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color) + draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255)) + return im + + +def draw_segm(im, + np_segms, + np_label, + np_score, + labels, + threshold=0.5, + alpha=0.7): + """ + Draw segmentation on image + """ + mask_color_id = 0 + w_ratio = .4 + color_list = get_color_map_list(len(labels)) + im = np.array(im).astype('float32') + clsid2color = {} + np_segms = np_segms.astype(np.uint8) + for i in range(np_segms.shape[0]): + mask, score, clsid = np_segms[i], np_score[i], np_label[i] + if score < threshold: + continue + + if clsid not in clsid2color: + clsid2color[clsid] = color_list[clsid] + color_mask = clsid2color[clsid] + for c in range(3): + color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255 + idx = np.nonzero(mask) + color_mask = np.array(color_mask) + idx0 = np.minimum(idx[0], im.shape[0] - 1) + idx1 = np.minimum(idx[1], im.shape[1] - 1) + im[idx0, idx1, :] *= 1.0 - alpha + im[idx0, idx1, :] += alpha * color_mask + sum_x = np.sum(mask, axis=0) + x = np.where(sum_x > 0.5)[0] + sum_y = np.sum(mask, axis=1) + y = np.where(sum_y > 0.5)[0] + x0, x1, y0, y1 = x[0], x[-1], y[0], y[-1] + cv2.rectangle(im, (x0, y0), (x1, y1), + tuple(color_mask.astype('int32').tolist()), 1) + bbox_text = '%s %.2f' % (labels[clsid], score) + t_size = cv2.getTextSize(bbox_text, 0, 0.3, thickness=1)[0] + cv2.rectangle(im, (x0, y0), (x0 + t_size[0], y0 - t_size[1] - 3), + tuple(color_mask.astype('int32').tolist()), -1) + cv2.putText( + im, + bbox_text, (x0, y0 - 2), + cv2.FONT_HERSHEY_SIMPLEX, + 0.3, (0, 0, 0), + 1, + lineType=cv2.LINE_AA) + return Image.fromarray(im.astype('uint8')) + + +def get_color(idx): + idx = idx * 3 + color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255) + return color + + +def visualize_pose(imgfile, + results, + visual_thresh=0.6, + save_name='pose.jpg', + save_dir='output', + returnimg=False, + ids=None): + try: + import matplotlib.pyplot as plt + import matplotlib + plt.switch_backend('agg') + except Exception as e: + print('Matplotlib not found, please install matplotlib.' 
+ 'for example: `pip install matplotlib`.') + raise e + skeletons, scores = results['keypoint'] + skeletons = np.array(skeletons) + kpt_nums = 17 + if len(skeletons) > 0: + kpt_nums = skeletons.shape[1] + if kpt_nums == 17: #plot coco keypoint + EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8), + (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14), + (13, 15), (14, 16), (11, 12)] + else: #plot mpii keypoint + EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7), (7, 8), + (8, 9), (10, 11), (11, 12), (13, 14), (14, 15), (8, 12), + (8, 13)] + NUM_EDGES = len(EDGES) + + colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \ + [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \ + [170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]] + cmap = matplotlib.cm.get_cmap('hsv') + plt.figure() + + img = cv2.imread(imgfile) if type(imgfile) == str else imgfile + + color_set = results['colors'] if 'colors' in results else None + + if 'bbox' in results and ids is None: + bboxs = results['bbox'] + for j, rect in enumerate(bboxs): + xmin, ymin, xmax, ymax = rect + color = colors[0] if color_set is None else colors[color_set[j] % + len(colors)] + cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1) + + canvas = img.copy() + for i in range(kpt_nums): + for j in range(len(skeletons)): + if skeletons[j][i, 2] < visual_thresh: + continue + if ids is None: + color = colors[i] if color_set is None else colors[color_set[j] + % + len(colors)] + else: + color = get_color(ids[j]) + + cv2.circle( + canvas, + tuple(skeletons[j][i, 0:2].astype('int32')), + 2, + color, + thickness=-1) + + to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0) + fig = matplotlib.pyplot.gcf() + + stickwidth = 2 + + for i in range(NUM_EDGES): + for j in range(len(skeletons)): + edge = EDGES[i] + if skeletons[j][edge[0], 2] < visual_thresh or skeletons[j][edge[ + 1], 2] < visual_thresh: + continue + + cur_canvas = canvas.copy() + X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]] + Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]] + mX = np.mean(X) + mY = np.mean(Y) + length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5 + angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1])) + polygon = cv2.ellipse2Poly((int(mY), int(mX)), + (int(length / 2), stickwidth), + int(angle), 0, 360, 1) + if ids is None: + color = colors[i] if color_set is None else colors[color_set[j] + % + len(colors)] + else: + color = get_color(ids[j]) + cv2.fillConvexPoly(cur_canvas, polygon, color) + canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0) + if returnimg: + return canvas + save_name = os.path.join( + save_dir, os.path.splitext(os.path.basename(imgfile))[0] + '_vis.jpg') + plt.imsave(save_name, canvas[:, :, ::-1]) + print("keypoint visualize image saved to: " + save_name) + plt.close() + + +def visualize_attr(im, results, boxes=None, is_mtmct=False): + if isinstance(im, str): + im = Image.open(im) + im = np.ascontiguousarray(np.copy(im)) + im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + else: + im = np.ascontiguousarray(np.copy(im)) + + im_h, im_w = im.shape[:2] + text_scale = max(0.5, im.shape[0] / 3000.) + text_thickness = 1 + + line_inter = im.shape[0] / 40. 
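+    # attribute strings are stacked top-down inside each box; `line_inter`
+    # scales the per-line spacing with the image height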
+ for i, res in enumerate(results): + if boxes is None: + text_w = 3 + text_h = 1 + elif is_mtmct: + box = boxes[i] # multi camera, bbox shape is x,y, w,h + text_w = int(box[0]) + 3 + text_h = int(box[1]) + else: + box = boxes[i] # single camera, bbox shape is 0, 0, x,y, w,h + text_w = int(box[2]) + 3 + text_h = int(box[3]) + for text in res: + text_h += int(line_inter) + text_loc = (text_w, text_h) + cv2.putText( + im, + text, + text_loc, + cv2.FONT_ITALIC, + text_scale, (0, 255, 255), + thickness=text_thickness) + return im + + +def visualize_action(im, + mot_boxes, + action_visual_collector=None, + action_text="", + video_action_score=None, + video_action_text=""): + im = cv2.imread(im) if isinstance(im, str) else im + im_h, im_w = im.shape[:2] + + text_scale = max(1, im.shape[1] / 400.) + text_thickness = 2 + + if action_visual_collector: + id_action_dict = {} + for collector, action_type in zip(action_visual_collector, action_text): + id_detected = collector.get_visualize_ids() + for pid in id_detected: + id_action_dict[pid] = id_action_dict.get(pid, []) + id_action_dict[pid].append(action_type) + for mot_box in mot_boxes: + # mot_box is a format with [mot_id, class, score, xmin, ymin, w, h] + if mot_box[0] in id_action_dict: + text_position = (int(mot_box[3] + mot_box[5] * 0.75), + int(mot_box[4] - 10)) + display_text = ', '.join(id_action_dict[mot_box[0]]) + cv2.putText(im, display_text, text_position, + cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), 2) + + if video_action_score: + cv2.putText( + im, + video_action_text + ': %.2f' % video_action_score, + (int(im_w / 2), int(15 * text_scale) + 5), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + + return im + + +def visualize_vehicleplate(im, results, boxes=None): + if isinstance(im, str): + im = Image.open(im) + im = np.ascontiguousarray(np.copy(im)) + im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + else: + im = np.ascontiguousarray(np.copy(im)) + + im_h, im_w = im.shape[:2] + text_scale = max(1.0, im.shape[0] / 400.) + text_thickness = 2 + + line_inter = im.shape[0] / 40. 
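+    # the plate text is drawn just below each vehicle box; boxes with an
+    # empty OCR result are skipped in the loop below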
+ for i, res in enumerate(results): + if boxes is None: + text_w = 3 + text_h = 1 + else: + box = boxes[i] + text = res + if text == "": + continue + text_w = int(box[2]) + text_h = int(box[5] + box[3]) + text_loc = (text_w, text_h) + cv2.putText( + im, + "LP: " + text, + text_loc, + cv2.FONT_ITALIC, + text_scale, (0, 255, 255), + thickness=text_thickness) + return im + + +def draw_press_box_lanes(im, np_boxes, labels, threshold=0.5): + """ + Args: + im (PIL.Image.Image): PIL image + np_boxes (np.ndarray): shape:[N,6], N: number of box, + matix element:[class, score, x_min, y_min, x_max, y_max] + labels (list): labels:['class1', ..., 'classn'] + threshold (float): threshold of box + Returns: + im (PIL.Image.Image): visualized image + """ + + if isinstance(im, str): + im = Image.open(im).convert('RGB') + elif isinstance(im, np.ndarray): + im = Image.fromarray(im) + + draw_thickness = min(im.size) // 320 + draw = ImageDraw.Draw(im) + clsid2color = {} + color_list = get_color_map_list(len(labels)) + + if np_boxes.shape[1] == 7: + np_boxes = np_boxes[:, 1:] + + expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1) + np_boxes = np_boxes[expect_boxes, :] + + for dt in np_boxes: + clsid, bbox, score = int(dt[0]), dt[2:], dt[1] + if clsid not in clsid2color: + clsid2color[clsid] = color_list[clsid] + color = tuple(clsid2color[clsid]) + + if len(bbox) == 4: + xmin, ymin, xmax, ymax = bbox + # draw bbox + draw.line( + [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), + (xmin, ymin)], + width=draw_thickness, + fill=(0, 0, 255)) + elif len(bbox) == 8: + x1, y1, x2, y2, x3, y3, x4, y4 = bbox + draw.line( + [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)], + width=2, + fill=color) + xmin = min(x1, x2, x3, x4) + ymin = min(y1, y2, y3, y4) + + # draw label + text = "{}".format(labels[clsid]) + tw, th = draw.textsize(text) + draw.rectangle( + [(xmin + 1, ymax - th), (xmin + tw + 1, ymax)], fill=color) + draw.text((xmin + 1, ymax - th), text, fill=(0, 0, 255)) + return im + + +def visualize_vehiclepress(im, results, threshold=0.5): + results = np.array(results) + labels = ['violation'] + im = draw_press_box_lanes(im, results, labels, threshold=threshold) + return im + + +def visualize_lane(im, lanes): + if isinstance(im, str): + im = Image.open(im).convert('RGB') + elif isinstance(im, np.ndarray): + im = Image.fromarray(im) + + draw_thickness = min(im.size) // 320 + draw = ImageDraw.Draw(im) + + if len(lanes) > 0: + for lane in lanes: + draw.line( + [(lane[0], lane[1]), (lane[2], lane[3])], + width=draw_thickness, + fill=(0, 0, 255)) + + return im + + +def visualize_vehicle_retrograde(im, mot_res, vehicle_retrograde_res): + if isinstance(im, str): + im = Image.open(im).convert('RGB') + elif isinstance(im, np.ndarray): + im = Image.fromarray(im) + + draw_thickness = min(im.size) // 320 + draw = ImageDraw.Draw(im) + + lane = vehicle_retrograde_res['fence_line'] + if lane is not None: + draw.line( + [(lane[0], lane[1]), (lane[2], lane[3])], + width=draw_thickness, + fill=(0, 0, 0)) + + mot_id = vehicle_retrograde_res['output'] + if mot_id is None or len(mot_id) == 0: + return im + + if mot_res is None: + return im + np_boxes = mot_res['boxes'] + + if np_boxes is not None: + for dt in np_boxes: + if dt[0] not in mot_id: + continue + bbox = dt[3:] + if len(bbox) == 4: + xmin, ymin, xmax, ymax = bbox + # draw bbox + draw.line( + [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), + (xmin, ymin)], + width=draw_thickness, + fill=(0, 255, 0)) + + # draw label + text = "retrograde" + tw, 
th = draw.textsize(text)
+                draw.rectangle(
+                    [(xmax + 1, ymin - th), (xmax + tw + 1, ymin)],
+                    fill=(0, 255, 0))
+                draw.text((xmax + 1, ymin - th), text, fill=(0, 255, 0))
+
+    return im
diff --git a/PaddleDetection-release-2.6/deploy/serving/README.md b/PaddleDetection-release-2.6/deploy/serving/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fdae78b1fe6c6932e1117ec28ba1099fca94b407
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/README.md
@@ -0,0 +1,188 @@
+# 服务端预测部署
+
+`PaddleDetection`训练出来的模型可以使用[Serving](https://github.com/PaddlePaddle/Serving) 部署在服务端。
+本教程以在COCO数据集上用`configs/yolov3/yolov3_darknet53_270e_coco.yml`算法训练的模型进行部署。
+预训练模型权重文件为[yolov3_darknet53_270e_coco.pdparams](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) 。
+
+## 1. 首先验证模型
+```
+python tools/infer.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --infer_img=demo/000000014439.jpg -o use_gpu=True weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams
+```
+
+## 2. 安装 paddle serving
+请参考[PaddleServing](https://github.com/PaddlePaddle/Serving/tree/v0.7.0) 中安装教程安装(版本>=0.7.0)。
+
+## 3. 导出模型
+PaddleDetection在训练过程中保存包括网络前向和优化器相关的参数,而在部署过程中只需要前向参数,具体参考:[导出模型](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/EXPORT_MODEL.md)
+
+```
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams --export_serving_model=True
+```
+
+以上命令会在`output_inference/`文件夹下生成一个`yolov3_darknet53_270e_coco`文件夹:
+```
+output_inference
+│   ├── yolov3_darknet53_270e_coco
+│   │   ├── infer_cfg.yml
+│   │   ├── model.pdiparams
+│   │   ├── model.pdiparams.info
+│   │   ├── model.pdmodel
+│   │   ├── serving_client
+│   │   │   ├── serving_client_conf.prototxt
+│   │   │   ├── serving_client_conf.stream.prototxt
+│   │   ├── serving_server
+│   │   │   ├── __model__
+│   │   │   ├── __params__
+│   │   │   ├── serving_server_conf.prototxt
+│   │   │   ├── serving_server_conf.stream.prototxt
+│   │   │   ├── ...
+```
+
+`serving_client`文件夹下`serving_client_conf.prototxt`详细说明了模型输入输出信息。
+`serving_client_conf.prototxt`文件内容为:
+```
+feed_var {
+  name: "im_shape"
+  alias_name: "im_shape"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 2
+}
+feed_var {
+  name: "image"
+  alias_name: "image"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 608
+  shape: 608
+}
+feed_var {
+  name: "scale_factor"
+  alias_name: "scale_factor"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 2
+}
+fetch_var {
+  name: "multiclass_nms3_0.tmp_0"
+  alias_name: "multiclass_nms3_0.tmp_0"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+fetch_var {
+  name: "multiclass_nms3_0.tmp_2"
+  alias_name: "multiclass_nms3_0.tmp_2"
+  is_lod_tensor: false
+  fetch_type: 2
+}
+```
+
+## 4. 启动PaddleServing服务
+
+```
+cd output_inference/yolov3_darknet53_270e_coco/
+
+# GPU
+python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0
+
+# CPU
+python -m paddle_serving_server.serve --model serving_server --port 9393
+```
+
+## 5.
测试部署的服务 +准备`label_list.txt`文件,示例`label_list.txt`文件内容为 +``` +person +bicycle +car +motorcycle +airplane +bus +train +truck +boat +traffic light +fire hydrant +stop sign +parking meter +bench +bird +cat +dog +horse +sheep +cow +elephant +bear +zebra +giraffe +backpack +umbrella +handbag +tie +suitcase +frisbee +skis +snowboard +sports ball +kite +baseball bat +baseball glove +skateboard +surfboard +tennis racket +bottle +wine glass +cup +fork +knife +spoon +bowl +banana +apple +sandwich +orange +broccoli +carrot +hot dog +pizza +donut +cake +chair +couch +potted plant +bed +dining table +toilet +tv +laptop +mouse +remote +keyboard +cell phone +microwave +oven +toaster +sink +refrigerator +book +clock +vase +scissors +teddy bear +hair drier +toothbrush +``` + +设置`prototxt`文件路径为`serving_client/serving_client_conf.prototxt` +设置`fetch`为`fetch=["multiclass_nms3_0.tmp_0"])` + +测试 +``` +# 进入目录 +cd output_inference/yolov3_darknet53_270e_coco/ + +# 测试代码 test_client.py 会自动创建output文件夹,并在output下生成`bbox.json`和`000000014439.jpg`两个文件 +python ../../deploy/serving/test_client.py ../../deploy/serving/label_list.txt ../../demo/000000014439.jpg +``` diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/README.md b/PaddleDetection-release-2.6/deploy/serving/cpp/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2e6d4af0c01c0c295bc384dca76de490cc222fba --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/cpp/README.md @@ -0,0 +1,74 @@ +# C++ Serving预测部署 + +## 1. 简介 +Paddle Serving是飞桨开源的服务化部署框架,提供了C++ Serving和Python Pipeline两套框架, +C++ Serving框架更倾向于追求极致性能,Python Pipeline框架倾向于二次开发的便捷性。 +旨在帮助深度学习开发者和企业提供高性能、灵活易用的工业级在线推理服务,助力人工智能落地应用。 + +更多关于Paddle Serving的介绍,可以参考[Paddle Serving官网repo](https://github.com/PaddlePaddle/Serving)。 + +本文档主要介绍利用C++ Serving框架实现模型(以yolov3_darknet53_270e_coco为例)的服务化部署。 + +## 2. C++ Serving预测部署 + +#### 2.1 C++ 服务化部署样例程序介绍 +服务化部署的样例程序的目录地址为:`deploy/serving/cpp` +```shell +deploy/ +├── serving/ +│ ├── python/ # Python 服务化部署样例程序目录 +│ │ ├──config.yml # 服务端模型预测相关配置文件 +│ │ ├──pipeline_http_client.py # 客户端代码 +│ │ ├──postprocess_ops.py # 用户自定义后处理代码 +│ │ ├──preprocess_ops.py # 用户自定义预处理代码 +│ │ ├──README.md # 说明文档 +│ │ ├──web_service.py # 服务端代码 +│ ├── cpp/ # C++ 服务化部署样例程序目录 +│ │ ├──preprocess/ # C++ 自定义OP +│ │ ├──build_server.sh # C++ Serving 编译脚本 +│ │ ├──serving_client.py # 客户端代码 +│ │ └── ... +│ └── ... +└── ... 
+``` + +### 2.2 环境准备 +安装Paddle Serving三个安装包的最新版本, +分别是:paddle-serving-client, paddle-serving-app和paddlepaddle(CPU/GPU版本二选一)。 +```commandline +pip install paddle-serving-client +# pip install paddle-serving-server # CPU +pip install paddle-serving-server-gpu # GPU 默认 CUDA10.2 + TensorRT6,其他环境需手动指定版本号 +pip install paddle-serving-app +# pip install paddlepaddle # CPU +pip install paddlepaddle-gpu +``` +您可能需要使用国内镜像源(例如百度源, 在pip命令中添加`-i https://mirror.baidu.com/pypi/simple`)来加速下载。 +Paddle Serving Server更多不同运行环境的whl包下载地址,请参考:[下载页面](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md) +PaddlePaddle更多版本请参考[官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) + +### 2.3 服务化部署模型导出 +导出步骤参考文档[PaddleDetection部署模型导出教程](../../EXPORT_MODEL.md), +导出服务化部署模型需要添加`--export_serving_model True`参数,导出示例如下: +```commandline +python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \ + --export_serving_model True \ + -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +``` + +### 2.4 编译C++ Serving & 启动服务端模型预测服务 +可使用一键编译脚本`deploy/serving/cpp/build_server.sh`进行编译 +```commandline +bash deploy/serving/cpp/build_server.sh +``` +当完成以上编译安装和模型导出后,可以按如下命令启动模型预测服务: +```commandline +python -m paddle_serving_server.serve --model output_inference/yolov3_darknet53_270e_coco/serving_server --op yolov3_darknet53_270e_coco --port 9997 & +``` +如果需要自定义开发OP,请参考[文档](https://github.com/PaddlePaddle/Serving/blob/v0.8.3/doc/C%2B%2B_Serving/2%2B_model.md)进行开发 + +### 2.5 启动客户端访问 +当成功启动了模型预测服务,可以按如下命令启动客户端访问服务: +```commandline +python deploy/serving/python/serving_client.py --serving_client output_inference/yolov3_darknet53_270e_coco/serving_client --image_file demo/000000014439.jpg --http_port 9997 +``` diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/build_server.sh b/PaddleDetection-release-2.6/deploy/serving/cpp/build_server.sh new file mode 100644 index 0000000000000000000000000000000000000000..803dce07c1cdb9c6a77f063b7b01391f3109667c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/cpp/build_server.sh @@ -0,0 +1,70 @@ +#使用镜像: +#registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 + +#编译Serving Server: + +#client和app可以直接使用release版本 + +#server因为加入了自定义OP,需要重新编译 + +apt-get update +apt install -y libcurl4-openssl-dev libbz2-dev +wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && tar xf centos_ssl.tar && rm -rf centos_ssl.tar && mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k && mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k && ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 && ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 && ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so && ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so + +# 安装go依赖 +rm -rf /usr/local/go +wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | tar -xz -C /usr/local +export GOROOT=/usr/local/go +export GOPATH=/root/gopath +export PATH=$PATH:$GOPATH/bin:$GOROOT/bin +go env -w GO111MODULE=on +go env -w GOPROXY=https://goproxy.cn,direct +go install github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway@v1.15.2 +go install github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger@v1.15.2 +go install github.com/golang/protobuf/protoc-gen-go@v1.4.3 +go install google.golang.org/grpc@v1.33.0 +go env -w GO111MODULE=auto + +# 下载opencv库 +wget https://paddle-qa.bj.bcebos.com/PaddleServing/opencv3.tar.gz && tar -xvf opencv3.tar.gz && rm 
-rf opencv3.tar.gz +export OPENCV_DIR=$PWD/opencv3 + +# clone Serving +git clone https://github.com/PaddlePaddle/Serving.git -b develop --depth=1 +cd Serving +export Serving_repo_path=$PWD +git submodule update --init --recursive +python -m pip install -r python/requirements.txt + +# set env +export PYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") +export PYTHON_LIBRARIES=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))") +export PYTHON_EXECUTABLE=`which python` + +export CUDA_PATH='/usr/local/cuda' +export CUDNN_LIBRARY='/usr/local/cuda/lib64/' +export CUDA_CUDART_LIBRARY='/usr/local/cuda/lib64/' +export TENSORRT_LIBRARY_PATH='/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/' + +# cp 自定义OP代码 +\cp ../deploy/serving/cpp/preprocess/*.h ${Serving_repo_path}/core/general-server/op +\cp ../deploy/serving/cpp/preprocess/*.cpp ${Serving_repo_path}/core/general-server/op + +# 编译Server, export SERVING_BIN +mkdir server-build-gpu-opencv && cd server-build-gpu-opencv +cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \ + -DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \ + -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \ + -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \ + -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \ + -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \ + -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \ + -DOPENCV_DIR=${OPENCV_DIR} \ + -DWITH_OPENCV=ON \ + -DSERVER=ON \ + -DWITH_GPU=ON .. +make -j32 + +python -m pip install python/dist/paddle* +export SERVING_BIN=$PWD/core/general-server/serving +cd ../../ diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.cpp b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b60eb24ce288e349eec73a4bf6c6b7ce8983e7fe --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.cpp @@ -0,0 +1,309 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
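+// This op mirrors the Python deploy preprocessing: it decodes a base64
+// image, resizes it keeping aspect ratio, normalizes with ImageNet
+// mean/std, pads to a multiple of 32 and permutes HWC->CHW, then feeds the
+// `im_shape`, `image` and `scale_factor` tensors to the predictor.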
+ +#include "core/general-server/op/mask_rcnn_r50_fpn_1x_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int mask_rcnn_r50_fpn_1x_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + Resize(&img, scale_factor_h, scale_factor_w, im_shape_h, im_shape_w); + Normalize(&img, mean_, scale_, is_scale_); + PadStride(&img, 32); + int input_shape_h = img.rows; + int input_shape_w = img.cols; + std::vector input(1 * 3 * input_shape_h * input_shape_w, 0.0f); + Permute(img, input.data()); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // im_shape + std::vector im_shape{static_cast(im_shape_h), + static_cast(im_shape_w)}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, im_shape.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_0(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_0; + tensor_in_0.name = "im_shape"; + tensor_in_0.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_0.shape = {1, 2}; + tensor_in_0.lod = in->at(0).lod; + tensor_in_0.data = paddleBuf_0; + 
real_in->push_back(tensor_in_0); + + // image + in_num = 1 * 3 * input_shape_h * input_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_1(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_1; + tensor_in_1.name = "image"; + tensor_in_1.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_1.shape = {1, 3, input_shape_h, input_shape_w}; + tensor_in_1.lod = in->at(0).lod; + tensor_in_1.data = paddleBuf_1; + real_in->push_back(tensor_in_1); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void mask_rcnn_r50_fpn_1x_coco::Resize(cv::Mat *img, float &scale_factor_h, + float &scale_factor_w, int &im_shape_h, + int &im_shape_w) { + // keep_ratio + int im_size_max = std::max(img->rows, img->cols); + int im_size_min = std::min(img->rows, img->cols); + int target_size_max = std::max(im_shape_h, im_shape_w); + int target_size_min = std::min(im_shape_h, im_shape_w); + float scale_min = + static_cast(target_size_min) / static_cast(im_size_min); + float scale_max = + static_cast(target_size_max) / static_cast(im_size_max); + float scale_ratio = std::min(scale_min, scale_max); + + // scale_factor + scale_factor_h = scale_ratio; + scale_factor_w = scale_ratio; + + // Resize + cv::resize(*img, *img, cv::Size(), scale_ratio, scale_ratio, 2); + im_shape_h = img->rows; + im_shape_w = img->cols; +} + +void mask_rcnn_r50_fpn_1x_coco::Normalize(cv::Mat *img, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + (*img).convertTo(*img, CV_32FC3, e); + for (int h = 0; h < img->rows; h++) { + for (int w = 0; w < img->cols; w++) { + img->at(h, w)[0] = + (img->at(h, w)[0] - mean[0]) / scale[0]; + img->at(h, w)[1] = + (img->at(h, w)[1] - mean[1]) / scale[1]; + img->at(h, w)[2] = + (img->at(h, w)[2] - mean[2]) / scale[2]; + } + } +} + +void mask_rcnn_r50_fpn_1x_coco::PadStride(cv::Mat *img, int stride_) { + // PadStride + if (stride_ <= 0) + return; + int rh = img->rows; + int rw = img->cols; + int nh = (rh / stride_) * stride_ + (rh % stride_ != 0) * stride_; + int nw = (rw / stride_) * stride_ + (rw % stride_ != 0) * stride_; + cv::copyMakeBorder(*img, *img, 0, nh - rh, 0, nw - rw, 
cv::BORDER_CONSTANT, + cv::Scalar(0)); +} + +void mask_rcnn_r50_fpn_1x_coco::Permute(const cv::Mat &img, float *data) { + // Permute + int rh = img.rows; + int rw = img.cols; + int rc = img.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), i); + } +} + +cv::Mat mask_rcnn_r50_fpn_1x_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string mask_rcnn_r50_fpn_1x_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // 回车换行,跳过 + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(mask_rcnn_r50_fpn_1x_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.h b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..5b2b8377a88b0cbcc313a3dd8a96c35dd9f57f91 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.h @@ -0,0 +1,72 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
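+// Preprocessing op for mask_rcnn_r50_fpn_1x_coco: the target shape defaults
+// to 1333x800 with ImageNet mean/std normalization; helper methods cover
+// resize, normalize, stride padding, CHW permute and base64 decoding.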
+
+#pragma once
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h" // NOLINT
+#include <string>
+#include <vector>
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iostream>
+#include <map>
+#include <memory>
+#include <sstream>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class mask_rcnn_r50_fpn_1x_coco
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(mask_rcnn_r50_fpn_1x_coco);
+
+  int inference();
+
+private:
+  // preprocess
+  std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
+  std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
+  bool is_scale_ = true;
+  int im_shape_h = 1333;
+  int im_shape_w = 800;
+  float scale_factor_h = 1.0f;
+  float scale_factor_w = 1.0f;
+
+  void Resize(cv::Mat *img, float &scale_factor_h, float &scale_factor_w,
+              int &im_shape_h, int &im_shape_w);
+  void Normalize(cv::Mat *img, const std::vector<float> &mean,
+                 const std::vector<float> &scale, const bool is_scale);
+  void PadStride(cv::Mat *img, int stride_ = -1);
+  void Permute(const cv::Mat &img, float *data);
+
+  // read pics
+  cv::Mat Base2Mat(std::string &base64_data);
+  std::string base64Decode(const char *Data, int DataByte);
+};
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.cpp b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..66bfeaef21189e395c2f15d716468723465c24b6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.cpp
@@ -0,0 +1,258 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
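+// Same structure as the mask_rcnn op, but PicoDet uses a fixed 416x416
+// input, so preprocessing is a single direct resize (no stride padding) and
+// only the `image` and `scale_factor` tensors are fed to the predictor.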
+ +#include "core/general-server/op/picodet_lcnet_1_5x_416_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int picodet_lcnet_1_5x_416_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in; + tensor_in.name = "image"; + tensor_in.dtype = paddle::PaddleDType::FLOAT32; + tensor_in.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in.lod = in->at(0).lod; + tensor_in.data = paddleBuf; + real_in->push_back(tensor_in); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = 
+      MempoolWrapper::instance().malloc(databuf_size);
+  if (!databuf_data) {
+    LOG(ERROR) << "Malloc failed, size: " << databuf_size;
+    return -1;
+  }
+
+  memcpy(databuf_data, scale_factor.data(), databuf_size);
+  databuf_char = reinterpret_cast<char *>(databuf_data);
+  paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size);
+  paddle::PaddleTensor tensor_in_2;
+  tensor_in_2.name = "scale_factor";
+  tensor_in_2.dtype = paddle::PaddleDType::FLOAT32;
+  tensor_in_2.shape = {1, 2};
+  tensor_in_2.lod = in->at(0).lod;
+  tensor_in_2.data = paddleBuf_2;
+  real_in->push_back(tensor_in_2);
+
+  if (InferManager::instance().infer(engine_name().c_str(), real_in, out,
+                                     batch_size)) {
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed do infer in fluid model: " << engine_name().c_str();
+    return -1;
+  }
+
+  int64_t end = timeline.TimeStampUS();
+  CopyBlobInfo(input_blob, output_blob);
+  AddBlobInfo(output_blob, start);
+  AddBlobInfo(output_blob, end);
+  return 0;
+}
+
+void picodet_lcnet_1_5x_416_coco::preprocess_det(
+    const cv::Mat &img, float *data, float &scale_factor_h,
+    float &scale_factor_w, int im_shape_h, int im_shape_w,
+    const std::vector<float> &mean, const std::vector<float> &scale,
+    const bool is_scale) {
+  // scale_factor
+  scale_factor_h =
+      static_cast<float>(im_shape_h) / static_cast<float>(img.rows);
+  scale_factor_w =
+      static_cast<float>(im_shape_w) / static_cast<float>(img.cols);
+
+  // Resize (interpolation flag 2 == cv::INTER_CUBIC)
+  cv::Mat resize_img;
+  cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2);
+
+  // Normalize
+  double e = 1.0;
+  if (is_scale) {
+    e /= 255.0;
+  }
+  cv::Mat img_fp;
+  (resize_img).convertTo(img_fp, CV_32FC3, e);
+  for (int h = 0; h < im_shape_h; h++) {
+    for (int w = 0; w < im_shape_w; w++) {
+      img_fp.at<cv::Vec3f>(h, w)[0] =
+          (img_fp.at<cv::Vec3f>(h, w)[0] - mean[0]) / scale[0];
+      img_fp.at<cv::Vec3f>(h, w)[1] =
+          (img_fp.at<cv::Vec3f>(h, w)[1] - mean[1]) / scale[1];
+      img_fp.at<cv::Vec3f>(h, w)[2] =
+          (img_fp.at<cv::Vec3f>(h, w)[2] - mean[2]) / scale[2];
+    }
+  }
+
+  // Permute HWC -> CHW
+  int rh = img_fp.rows;
+  int rw = img_fp.cols;
+  int rc = img_fp.channels();
+  for (int i = 0; i < rc; ++i) {
+    cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw),
+                       i);
+  }
+}
+
+cv::Mat picodet_lcnet_1_5x_416_coco::Base2Mat(std::string &base64_data) {
+  cv::Mat img;
+  std::string s_mat;
+  s_mat = base64Decode(base64_data.data(), base64_data.size());
+  std::vector<char> base64_img(s_mat.begin(), s_mat.end());
+  img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR
+  return img;
+}
+
+std::string picodet_lcnet_1_5x_416_coco::base64Decode(const char *Data,
+                                                      int DataByte) {
+  const char DecodeTable[] = {
+      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+      0,  0,  0,  0,  0,  0,  0,  0,  0,
+      62, // '+'
+      0,  0,  0,
+      63, // '/'
+      52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9'
+      0,  0,  0,  0,  0,  0,  0,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
+      10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z'
+      0,  0,  0,  0,  0,  0,  26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
+      37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z'
+  };
+
+  std::string strDecode;
+  int nValue;
+  int i = 0;
+  while (i < DataByte) {
+    if (*Data != '\r' && *Data != '\n') {
+      nValue = DecodeTable[*Data++] << 18;
+      nValue += DecodeTable[*Data++] << 12;
+      strDecode += (nValue & 0x00FF0000) >> 16;
+      if (*Data != '=') {
+        nValue += DecodeTable[*Data++] << 6;
+        strDecode += (nValue & 0x0000FF00) >> 8;
+        if (*Data != '=') {
+          nValue += DecodeTable[*Data++];
+          strDecode += nValue & 0x000000FF;
+        }
+      }
+      i += 4;
+    } else // skip CR / LF characters
+    {
+      Data++;
+      i++;
+    }
+  }
+  return strDecode;
+}
+
+DEFINE_OP(picodet_lcnet_1_5x_416_coco);
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.h b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.h
new file mode 100644
index 0000000000000000000000000000000000000000..4db649a27b2dbd408b1984511cbb184c112bf1fe
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.h
@@ -0,0 +1,69 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h" // NOLINT
+#include <string>
+#include <vector>
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class picodet_lcnet_1_5x_416_coco
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(picodet_lcnet_1_5x_416_coco);
+
+  int inference();
+
+private:
+  // preprocess
+  std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
+  std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
+  bool is_scale_ = true;
+  int im_shape_h = 416;
+  int im_shape_w = 416;
+  float scale_factor_h = 1.0f;
+  float scale_factor_w = 1.0f;
+  void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h,
+                      float &scale_factor_w, int im_shape_h, int im_shape_w,
+                      const std::vector<float> &mean,
+                      const std::vector<float> &scale, const bool is_scale);
+
+  // read pics
+  cv::Mat Base2Mat(std::string &base64_data);
+  std::string base64Decode(const char *Data, int DataByte);
+};
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.cpp b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..2d2d62cd321bf1d2d5055b827552337e86b4aa15
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.cpp
@@ -0,0 +1,282 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and +// limitations under the License. + +#include "core/general-server/op/ppyolo_mbv3_large_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int ppyolo_mbv3_large_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // im_shape + std::vector im_shape{static_cast(im_shape_h), + static_cast(im_shape_w)}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, im_shape.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_0(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_0; + tensor_in_0.name = "im_shape"; + tensor_in_0.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_0.shape = {1, 2}; + tensor_in_0.lod = in->at(0).lod; + tensor_in_0.data = paddleBuf_0; + real_in->push_back(tensor_in_0); + + // 
image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_1(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_1; + tensor_in_1.name = "image"; + tensor_in_1.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_1.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in_1.lod = in->at(0).lod; + tensor_in_1.data = paddleBuf_1; + real_in->push_back(tensor_in_1); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void ppyolo_mbv3_large_coco::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, + int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // scale_factor + scale_factor_h = + static_cast(im_shape_h) / static_cast(img.rows); + scale_factor_w = + static_cast(im_shape_w) / static_cast(img.cols); + + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat ppyolo_mbv3_large_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string ppyolo_mbv3_large_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // 回车换行,跳过 + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(ppyolo_mbv3_large_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.h b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..5f55e18f51eae4c3f5588594b2db05773d529987 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
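The preprocess ops in this patch differ only in their fixed input resolution and in which feed tensors they push to the engine. Collected here for reference (values read from the headers and `inference()` bodies in this patch; this summary table itself is not part of the patch):

```python
# Input contract of the custom serving ops added by this patch.
MODEL_INPUTS = {
    "picodet_lcnet_1_5x_416_coco": {"hw": (416, 416), "feeds": ["image", "scale_factor"]},
    "ppyolo_mbv3_large_coco":      {"hw": (320, 320), "feeds": ["im_shape", "image", "scale_factor"]},
    "ppyoloe_crn_s_300e_coco":     {"hw": (640, 640), "feeds": ["image", "scale_factor"]},
    "tinypose_128x96":             {"hw": (128, 96),  "feeds": ["image"]},
    "yolov3_darknet53_270e_coco":  {"hw": (608, 608), "feeds": ["im_shape", "image", "scale_factor"]},
}
```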
+
+#pragma once
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h" // NOLINT
+#include <string>
+#include <vector>
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class ppyolo_mbv3_large_coco
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(ppyolo_mbv3_large_coco);
+
+  int inference();
+
+private:
+  // preprocess
+  std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
+  std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
+  bool is_scale_ = true;
+  int im_shape_h = 320;
+  int im_shape_w = 320;
+  float scale_factor_h = 1.0f;
+  float scale_factor_w = 1.0f;
+  void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h,
+                      float &scale_factor_w, int im_shape_h, int im_shape_w,
+                      const std::vector<float> &mean,
+                      const std::vector<float> &scale, const bool is_scale);
+
+  // read pics
+  cv::Mat Base2Mat(std::string &base64_data);
+  std::string base64Decode(const char *Data, int DataByte);
+};
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.cpp b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..f59c4f341539db3a7b777051c49da6d6f6919166
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.cpp
@@ -0,0 +1,260 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
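Each op carries its own copy of `base64Decode`, a table-driven decoder that processes 4-character groups, handles `=` padding, and skips CR/LF line breaks; it assumes otherwise well-formed input. For well-formed payloads it is equivalent to the standard-library decoder, which a client or test can use to cross-check (a small sketch, not part of the patch):

```python
import base64

def decode_image_payload(b64_text: str) -> bytes:
    # With the default validate=False, b64decode discards characters outside
    # the base64 alphabet (e.g. CR/LF), matching the C++ loop's skip branch.
    return base64.b64decode(b64_text)

# Round-trip check of the encoding the serving clients produce.
assert decode_image_payload(base64.b64encode(b"hello").decode("utf8")) == b"hello"
```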
+ +#include "core/general-server/op/ppyoloe_crn_s_300e_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int ppyoloe_crn_s_300e_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in; + tensor_in.name = "image"; + tensor_in.dtype = paddle::PaddleDType::FLOAT32; + tensor_in.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in.lod = in->at(0).lod; + tensor_in.data = paddleBuf; + real_in->push_back(tensor_in); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = 
MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void ppyoloe_crn_s_300e_coco::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, + int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // scale_factor + scale_factor_h = + static_cast(im_shape_h) / static_cast(img.rows); + scale_factor_w = + static_cast(im_shape_w) / static_cast(img.cols); + + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat ppyoloe_crn_s_300e_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string ppyoloe_crn_s_300e_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // 回车换行,跳过 + { + Data++; + i++; + } + } 
+  return strDecode;
+}
+
+DEFINE_OP(ppyoloe_crn_s_300e_coco);
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.h b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.h
new file mode 100644
index 0000000000000000000000000000000000000000..cb3e68476998d7fadaafba8e2bc9282c4479a5f8
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.h
@@ -0,0 +1,69 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h" // NOLINT
+#include <string>
+#include <vector>
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class ppyoloe_crn_s_300e_coco
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(ppyoloe_crn_s_300e_coco);
+
+  int inference();
+
+private:
+  // preprocess
+  std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
+  std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
+  bool is_scale_ = true;
+  int im_shape_h = 640;
+  int im_shape_w = 640;
+  float scale_factor_h = 1.0f;
+  float scale_factor_w = 1.0f;
+  void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h,
+                      float &scale_factor_w, int im_shape_h, int im_shape_w,
+                      const std::vector<float> &mean,
+                      const std::vector<float> &scale, const bool is_scale);
+
+  // read pics
+  cv::Mat Base2Mat(std::string &base64_data);
+  std::string base64Decode(const char *Data, int DataByte);
+};
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/tinypose_128x96.cpp b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/tinypose_128x96.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..ccc94d2c4a35ed9f47f65fab6e74301e35c801d6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/tinypose_128x96.cpp
@@ -0,0 +1,232 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
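The keypoint op below differs from the detector ops in two ways: it feeds only the `image` tensor (no `scale_factor`), and its `cv::resize` call passes interpolation flag `1` instead of `2`. These integers map to named OpenCV constants, which a Python port would use directly (a sketch for reference, not part of the patch):

```python
import cv2

# The bare integers in the C++ cv::resize calls correspond to:
assert cv2.INTER_LINEAR == 1  # tinypose_128x96:        cv::resize(..., 0, 0, 1)
assert cv2.INTER_CUBIC == 2   # the detector ops above: cv::resize(..., 0, 0, 2)
```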
+ +#include "core/general-server/op/tinypose_128x96.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int tinypose_128x96::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in; + tensor_in.name = "image"; + tensor_in.dtype = paddle::PaddleDType::FLOAT32; + tensor_in.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in.lod = in->at(0).lod; + tensor_in.data = paddleBuf; + real_in->push_back(tensor_in); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in 
fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void tinypose_128x96::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, int im_shape_h, + int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 1); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat tinypose_128x96::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string tinypose_128x96::base64Decode(const char *Data, int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // 回车换行,跳过 + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(tinypose_128x96); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/tinypose_128x96.h b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/tinypose_128x96.h new file mode 100644 index 0000000000000000000000000000000000000000..83bf9bf7d17de5fd03407f73bf7e96b512a6fe3e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/tinypose_128x96.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h" // NOLINT
+#include <string>
+#include <vector>
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class tinypose_128x96
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(tinypose_128x96);
+
+  int inference();
+
+private:
+  // preprocess
+  std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
+  std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
+  bool is_scale_ = true;
+  int im_shape_h = 128;
+  int im_shape_w = 96;
+  float scale_factor_h = 1.0f;
+  float scale_factor_w = 1.0f;
+  void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h,
+                      float &scale_factor_w, int im_shape_h, int im_shape_w,
+                      const std::vector<float> &mean,
+                      const std::vector<float> &scale, const bool is_scale);
+
+  // read pics
+  cv::Mat Base2Mat(std::string &base64_data);
+  std::string base64Decode(const char *Data, int DataByte);
+};
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.cpp b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..5937be46c0ffffe07651e7c8ed13be369d03bf7c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.cpp
@@ -0,0 +1,282 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
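The yolov3 and ppyolo exported models expect three feed tensors rather than two; the op below builds them in the order `im_shape`, `image`, `scale_factor`. A sketch of the same shape contract (illustrative helper, not part of the patch; real `scale_factor` values are the resize ratios computed during preprocessing, shown here as placeholders):

```python
import numpy as np

def build_yolo_feeds(chw: np.ndarray, scale_hw=(1.0, 1.0), target_hw=(608, 608)):
    """Shapes mirror the three PaddleTensors pushed by the C++ op (batch 1)."""
    h, w = target_hw
    assert chw.shape == (1, 3, h, w)
    return {
        "im_shape":     np.array([[h, w]], np.float32),       # 1x2
        "image":        chw.astype(np.float32),               # 1x3xHxW
        "scale_factor": np.array([scale_hw], np.float32),     # 1x2 (h, w ratios)
    }
```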
+ +#include "core/general-server/op/yolov3_darknet53_270e_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int yolov3_darknet53_270e_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // im_shape + std::vector im_shape{static_cast(im_shape_h), + static_cast(im_shape_w)}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, im_shape.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_0(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_0; + tensor_in_0.name = "im_shape"; + tensor_in_0.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_0.shape = {1, 2}; + tensor_in_0.lod = in->at(0).lod; + tensor_in_0.data = paddleBuf_0; + real_in->push_back(tensor_in_0); + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + 
databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_1(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_1; + tensor_in_1.name = "image"; + tensor_in_1.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_1.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in_1.lod = in->at(0).lod; + tensor_in_1.data = paddleBuf_1; + real_in->push_back(tensor_in_1); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void yolov3_darknet53_270e_coco::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, + int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // scale_factor + scale_factor_h = + static_cast(im_shape_h) / static_cast(img.rows); + scale_factor_w = + static_cast(im_shape_w) / static_cast(img.cols); + + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat yolov3_darknet53_270e_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string yolov3_darknet53_270e_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 
3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // 回车换行,跳过 + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(yolov3_darknet53_270e_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.h b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..67593040eadd664d49981c66f37d4e689807ec8f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+
+#pragma once
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h" // NOLINT
+#include <string>
+#include <vector>
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class yolov3_darknet53_270e_coco
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(yolov3_darknet53_270e_coco);
+
+  int inference();
+
+private:
+  // preprocess
+  std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
+  std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
+  bool is_scale_ = true;
+  int im_shape_h = 608;
+  int im_shape_w = 608;
+  float scale_factor_h = 1.0f;
+  float scale_factor_w = 1.0f;
+  void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h,
+                      float &scale_factor_w, int im_shape_h, int im_shape_w,
+                      const std::vector<float> &mean,
+                      const std::vector<float> &scale, const bool is_scale);
+
+  // read pics
+  cv::Mat Base2Mat(std::string &base64_data);
+  std::string base64Decode(const char *Data, int DataByte);
+};
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/serving_client.py b/PaddleDetection-release-2.6/deploy/serving/cpp/serving_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..49134c30569d60533b131b8a25d6584ab782329c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/serving_client.py
@@ -0,0 +1,125 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
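The client below rewrites `serving_client_conf.prototxt` so that the model's original feed vars are replaced by a single string-typed var named `input` (`feed_type: 20`); the whole image is sent as one base64 string and all decoding and preprocessing happens inside the server-side custom op. Stripped to its essentials, the call pattern looks like this (paths and the fetch name are assumptions taken from the files in this patch):

```python
import base64
from paddle_serving_client import Client

client = Client()
# serving_client_conf_cpp.prototxt is the rewritten config produced below.
client.load_client_config("serving_client/serving_client_conf_cpp.prototxt")
client.connect(["127.0.0.1:9997"])  # default --http_port of this script

with open("demo/000000014439.jpg", "rb") as f:
    payload = base64.b64encode(f.read()).decode("utf8")

# One string feed; the custom op decodes, preprocesses, and runs the model.
fetch_dict = client.predict(
    feed={"input": payload}, fetch=["multiclass_nms3_0.tmp_0"])
```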
+ +import os +import glob +import base64 +import argparse +from paddle_serving_client import Client +from paddle_serving_client.proto import general_model_config_pb2 as m_config +import google.protobuf.text_format + +parser = argparse.ArgumentParser(description="args for paddleserving") +parser.add_argument( + "--serving_client", type=str, help="the directory of serving_client") +parser.add_argument("--image_dir", type=str) +parser.add_argument("--image_file", type=str) +parser.add_argument("--http_port", type=int, default=9997) +parser.add_argument( + "--threshold", type=float, default=0.5, help="Threshold of score.") +args = parser.parse_args() + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +def postprocess(fetch_dict, fetch_vars, draw_threshold=0.5): + result = [] + if "conv2d_441.tmp_1" in fetch_dict: + heatmap = fetch_dict["conv2d_441.tmp_1"] + print(heatmap) + result.append(heatmap) + else: + bboxes = fetch_dict[fetch_vars[0]] + for bbox in bboxes: + if bbox[0] > -1 and bbox[1] > draw_threshold: + print(f"{int(bbox[0])} {bbox[1]} " + f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}") + result.append(f"{int(bbox[0])} {bbox[1]} " + f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}") + return result + + +def get_model_vars(client_config_dir): + # read original serving_client_conf.prototxt + client_config_file = os.path.join(client_config_dir, + "serving_client_conf.prototxt") + with open(client_config_file, 'r') as f: + model_var = google.protobuf.text_format.Merge( + str(f.read()), m_config.GeneralModelConfig()) + # modify feed_var to run core/general-server/op/ + [model_var.feed_var.pop() for _ in range(len(model_var.feed_var))] + feed_var = m_config.FeedVar() + feed_var.name = "input" + feed_var.alias_name = "input" + feed_var.is_lod_tensor = False + feed_var.feed_type = 20 + feed_var.shape.extend([1]) + model_var.feed_var.extend([feed_var]) + with open( + os.path.join(client_config_dir, "serving_client_conf_cpp.prototxt"), + "w") as f: + f.write(str(model_var)) + # get feed_vars/fetch_vars + feed_vars = [var.name for var in model_var.feed_var] + fetch_vars = [var.name for var in model_var.fetch_var] + return feed_vars, fetch_vars + + +if __name__ == '__main__': + url = f"127.0.0.1:{args.http_port}" + logid = 10000 + img_list = get_test_images(args.image_dir, args.image_file) + feed_vars, fetch_vars = get_model_vars(args.serving_client) + + client = Client() + client.load_client_config( + os.path.join(args.serving_client, "serving_client_conf_cpp.prototxt")) + client.connect([url]) + + for img_file in img_list: + with open(img_file, 'rb') as file: + image_data = 
file.read()
+        image = base64.b64encode(image_data).decode('utf8')
+        fetch_dict = client.predict(
+            feed={feed_vars[0]: image}, fetch=fetch_vars)
+        result = postprocess(fetch_dict, fetch_vars, args.threshold)
diff --git a/PaddleDetection-release-2.6/deploy/serving/cpp/serving_client_conf.prototxt b/PaddleDetection-release-2.6/deploy/serving/cpp/serving_client_conf.prototxt
new file mode 100644
index 0000000000000000000000000000000000000000..fb069003ab8a6b8163d7e06d7760b1c6c42b196a
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/cpp/serving_client_conf.prototxt
@@ -0,0 +1,20 @@
+feed_var {
+  name: "input"
+  alias_name: "input"
+  is_lod_tensor: false
+  feed_type: 20
+  shape: 1
+}
+fetch_var {
+  name: "multiclass_nms3_0.tmp_0"
+  alias_name: "multiclass_nms3_0.tmp_0"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+fetch_var {
+  name: "multiclass_nms3_0.tmp_2"
+  alias_name: "multiclass_nms3_0.tmp_2"
+  is_lod_tensor: false
+  fetch_type: 2
+}
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/deploy/serving/label_list.txt b/PaddleDetection-release-2.6/deploy/serving/label_list.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1f42c8eb44628f95b2f4067de928a7f5c1e9c8dc
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/label_list.txt
@@ -0,0 +1,80 @@
+person
+bicycle
+car
+motorcycle
+airplane
+bus
+train
+truck
+boat
+traffic light
+fire hydrant
+stop sign
+parking meter
+bench
+bird
+cat
+dog
+horse
+sheep
+cow
+elephant
+bear
+zebra
+giraffe
+backpack
+umbrella
+handbag
+tie
+suitcase
+frisbee
+skis
+snowboard
+sports ball
+kite
+baseball bat
+baseball glove
+skateboard
+surfboard
+tennis racket
+bottle
+wine glass
+cup
+fork
+knife
+spoon
+bowl
+banana
+apple
+sandwich
+orange
+broccoli
+carrot
+hot dog
+pizza
+donut
+cake
+chair
+couch
+potted plant
+bed
+dining table
+toilet
+tv
+laptop
+mouse
+remote
+keyboard
+cell phone
+microwave
+oven
+toaster
+sink
+refrigerator
+book
+clock
+vase
+scissors
+teddy bear
+hair drier
+toothbrush
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/deploy/serving/python/README.md b/PaddleDetection-release-2.6/deploy/serving/python/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..40130d043b0db0bc7e7077088d9eb4c1fc1e6cb6
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/python/README.md
@@ -0,0 +1,72 @@
+# Python Serving Deployment
+
+## 1. Introduction
+Paddle Serving is PaddlePaddle's open-source framework for model serving. It provides two stacks: C++ Serving, which pursues maximum performance, and Python Pipeline, which favors convenient secondary development.
+Both aim to give deep-learning developers and enterprises a high-performance, flexible, easy-to-use, industrial-grade online inference service and to help put AI into production.
+
+For more about Paddle Serving, see the [official Paddle Serving repo](https://github.com/PaddlePaddle/Serving).
+
+This document describes how to serve a model (yolov3_darknet53_270e_coco as the example) with the Python Pipeline framework.
+
+## 2. Deployment with Python Serving
+
+### 2.1 Sample program layout
+The sample programs for serving live under `deploy/serving/python`:
+```shell
+deploy/
+├── serving/
+│   ├── python/                     # Python serving examples
+│   │   ├──config.yml               # server-side inference configuration
+│   │   ├──pipeline_http_client.py  # client code
+│   │   ├──postprocess_ops.py       # custom postprocessing code
+│   │   ├──preprocess_ops.py        # custom preprocessing code
+│   │   ├──README.md                # this document
+│   │   ├──web_service.py           # server code
+│   ├── cpp/                        # C++ serving examples
+│   │   ├──preprocess/              # custom C++ ops
+│   │   ├──build_server.sh          # C++ Serving build script
+│   │   ├──serving_client.py        # client code
+│   │   └── ...
+│   └── ...
+└── ...
+```
+
+### 2.2 Environment preparation
+Install the latest versions of the four Paddle Serving packages:
+paddle-serving-server (choose the CPU or GPU build),
+paddle-serving-client, paddle-serving-app and paddlepaddle (choose the CPU or GPU build).
+```commandline
+pip install paddle-serving-client
+# pip install paddle-serving-server # CPU
+pip install paddle-serving-server-gpu # GPU; defaults to CUDA 10.2 + TensorRT 6, specify the version tag explicitly for other environments
+pip install paddle-serving-app
+# pip install paddlepaddle # CPU
+pip install paddlepaddle-gpu
+```
+You may need a domestic mirror (for example the Baidu mirror: add `-i https://mirror.baidu.com/pypi/simple` to the pip command) to speed up the downloads.
+For Paddle Serving Server wheels built for other runtime environments, see the [download page](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md).
+For other PaddlePaddle versions, see the [official site](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html).
+
+### 2.3 Export the model for serving
+Follow the [PaddleDetection model export tutorial](../../EXPORT_MODEL.md); exporting a serving model additionally requires the `--export_serving_model True` flag, for example:
+```commandline
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \
+                             --export_serving_model True \
+                             -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams
+```
+
+### 2.4 Start the server-side prediction service
+With the environment prepared and the model exported, start the prediction service with:
+```commandline
+python deploy/serving/python/web_service.py --model_dir output_inference/yolov3_darknet53_270e_coco &
+```
+Server-side settings can be changed in [config.yml](./config.yml);
+the options you usually need to touch are http_port (the service's HTTP port), device_type (the compute device type) and devices (the compute device IDs).
+
+### 2.5 Query the service from the client
+Once the prediction service is up, query it with:
+```commandline
+python deploy/serving/python/pipeline_http_client.py --image_file demo/000000014439.jpg
+```
diff --git a/PaddleDetection-release-2.6/deploy/serving/python/config.yml b/PaddleDetection-release-2.6/deploy/serving/python/config.yml
new file mode 100644
index 0000000000000000000000000000000000000000..5ec4285257d618f6c5a7ed02aab5c34dae9a96e1
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/python/config.yml
@@ -0,0 +1,31 @@
+# worker_num: maximum concurrency. When build_dag_each_worker=True the framework creates worker_num processes, each building its own grpc server and DAG
+# When build_dag_each_worker=False the framework sets max_workers=worker_num on the main thread's grpc thread pool
+worker_num: 20
+
+# HTTP port; rpc_port and http_port must not both be empty. When rpc_port is valid and http_port is empty, no http_port is generated automatically
+http_port: 18093
+rpc_port: 9993
+
+dag:
+    # OP resource type: True for the thread model, False for the process model
+    is_thread_op: False
+op:
+    # OP name; must match the name used to initialize the service in web_service.py
+    ppdet:
+        # concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
+        concurrency: 1
+
+        # when the op config has no server_endpoints, the local service settings are read from local_service_conf
+        local_service_conf:
+
+            # model directory
+            model_config: "./serving_server"
+
+            # compute device type: if empty, decided by devices (CPU/GPU); 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
+            device_type:
+
+            # compute device IDs: "" or unset means CPU inference; "0" or "0,1,2" means GPU inference on those cards
+            devices: "0" # "0,1"
+
+            # client type: brpc, grpc or local_predictor. local_predictor does not start a Serving service; inference runs in-process
+            client_type: local_predictor
diff --git a/PaddleDetection-release-2.6/deploy/serving/python/pipeline_http_client.py b/PaddleDetection-release-2.6/deploy/serving/python/pipeline_http_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..fa9b30c0d79bf5a7e0d5da7a2538580e7452f8bb
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/serving/python/pipeline_http_client.py
@@ -0,0 +1,76 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import glob +import requests +import json +import base64 +import os +import argparse + +parser = argparse.ArgumentParser(description="args for paddleserving") +parser.add_argument("--image_dir", type=str) +parser.add_argument("--image_file", type=str) +parser.add_argument("--http_port", type=int, default=18093) +parser.add_argument("--service_name", type=str, default="ppdet") +args = parser.parse_args() + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +if __name__ == "__main__": + url = f"http://127.0.0.1:{args.http_port}/{args.service_name}/prediction" + logid = 10000 + img_list = get_test_images(args.image_dir, args.image_file) + + for img_file in img_list: + with open(img_file, 'rb') as file: + image_data = file.read() + + # base64 encode + image = base64.b64encode(image_data).decode('utf8') + + data = {"key": ["image_0"], "value": [image], "logid": logid} + # send requests + r = requests.post(url=url, data=json.dumps(data)) + print(r.json()) diff --git a/PaddleDetection-release-2.6/deploy/serving/python/postprocess_ops.py b/PaddleDetection-release-2.6/deploy/serving/python/postprocess_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..1836f7de776921c4dae97d42e927834a3d2d8613 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/python/postprocess_ops.py @@ -0,0 +1,171 @@ +import cv2 +import math +import numpy as np +from preprocess_ops import get_affine_transform + + +class HRNetPostProcess(object): + def __init__(self, use_dark=True): + self.use_dark = use_dark + + def flip_back(self, output_flipped, matched_parts): + assert output_flipped.ndim == 4,\ + 'output_flipped should be [batch_size, num_joints, height, width]' + + output_flipped = output_flipped[:, :, :, ::-1] + + for pair in matched_parts: + tmp = output_flipped[:, pair[0], :, :].copy() + output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] + output_flipped[:, pair[1], :, :] = tmp + + return output_flipped + + def get_max_preds(self, heatmaps): + """get predictions from score maps + + Args: + heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) + + Returns: + preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints 
coords + maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints + """ + assert isinstance(heatmaps, + np.ndarray), 'heatmaps should be numpy.ndarray' + assert heatmaps.ndim == 4, 'batch_images should be 4-ndim' + + batch_size = heatmaps.shape[0] + num_joints = heatmaps.shape[1] + width = heatmaps.shape[3] + heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1)) + idx = np.argmax(heatmaps_reshaped, 2) + maxvals = np.amax(heatmaps_reshaped, 2) + + maxvals = maxvals.reshape((batch_size, num_joints, 1)) + idx = idx.reshape((batch_size, num_joints, 1)) + + preds = np.tile(idx, (1, 1, 2)).astype(np.float32) + + preds[:, :, 0] = (preds[:, :, 0]) % width + preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) + + pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) + pred_mask = pred_mask.astype(np.float32) + + preds *= pred_mask + + return preds, maxvals + + def gaussian_blur(self, heatmap, kernel): + border = (kernel - 1) // 2 + batch_size = heatmap.shape[0] + num_joints = heatmap.shape[1] + height = heatmap.shape[2] + width = heatmap.shape[3] + for i in range(batch_size): + for j in range(num_joints): + origin_max = np.max(heatmap[i, j]) + dr = np.zeros((height + 2 * border, width + 2 * border)) + dr[border:-border, border:-border] = heatmap[i, j].copy() + dr = cv2.GaussianBlur(dr, (kernel, kernel), 0) + heatmap[i, j] = dr[border:-border, border:-border].copy() + heatmap[i, j] *= origin_max / np.max(heatmap[i, j]) + return heatmap + + def dark_parse(self, hm, coord): + heatmap_height = hm.shape[0] + heatmap_width = hm.shape[1] + px = int(coord[0]) + py = int(coord[1]) + if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2: + dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1]) + dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px]) + dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2]) + dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \ + + hm[py-1][px-1]) + dyy = 0.25 * ( + hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px]) + derivative = np.matrix([[dx], [dy]]) + hessian = np.matrix([[dxx, dxy], [dxy, dyy]]) + if dxx * dyy - dxy**2 != 0: + hessianinv = hessian.I + offset = -hessianinv * derivative + offset = np.squeeze(np.array(offset.T), axis=0) + coord += offset + return coord + + def dark_postprocess(self, hm, coords, kernelsize): + """ + refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py + + """ + hm = self.gaussian_blur(hm, kernelsize) + hm = np.maximum(hm, 1e-10) + hm = np.log(hm) + for n in range(coords.shape[0]): + for p in range(coords.shape[1]): + coords[n, p] = self.dark_parse(hm[n][p], coords[n][p]) + return coords + + def get_final_preds(self, heatmaps, center, scale, kernelsize=3): + """the highest heatvalue location with a quarter offset in the + direction from the highest response to the second highest response. 
+ + Args: + heatmaps (numpy.ndarray): The predicted heatmaps + center (numpy.ndarray): The boxes center + scale (numpy.ndarray): The scale factor + + Returns: + preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords + maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints + """ + + coords, maxvals = self.get_max_preds(heatmaps) + + heatmap_height = heatmaps.shape[2] + heatmap_width = heatmaps.shape[3] + + if self.use_dark: + coords = self.dark_postprocess(heatmaps, coords, kernelsize) + else: + for n in range(coords.shape[0]): + for p in range(coords.shape[1]): + hm = heatmaps[n][p] + px = int(math.floor(coords[n][p][0] + 0.5)) + py = int(math.floor(coords[n][p][1] + 0.5)) + if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1: + diff = np.array([ + hm[py][px + 1] - hm[py][px - 1], + hm[py + 1][px] - hm[py - 1][px] + ]) + coords[n][p] += np.sign(diff) * .25 + preds = coords.copy() + + # Transform back + for i in range(coords.shape[0]): + preds[i] = transform_preds(coords[i], center[i], scale[i], + [heatmap_width, heatmap_height]) + + return preds, maxvals + + def __call__(self, output, center, scale): + preds, maxvals = self.get_final_preds(output, center, scale) + return np.concatenate( + (preds, maxvals), axis=-1), np.mean( + maxvals, axis=1) + + +def transform_preds(coords, center, scale, output_size): + target_coords = np.zeros(coords.shape) + trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1) + for p in range(coords.shape[0]): + target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) + return target_coords + + +def affine_transform(pt, t): + new_pt = np.array([pt[0], pt[1], 1.]).T + new_pt = np.dot(t, new_pt) + return new_pt[:2] diff --git a/PaddleDetection-release-2.6/deploy/serving/python/preprocess_ops.py b/PaddleDetection-release-2.6/deploy/serving/python/preprocess_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..15f76159818a159f3967c7778eda7dc53a8a40a4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/python/preprocess_ops.py @@ -0,0 +1,490 @@ +import numpy as np +import cv2 +import copy + + +def decode_image(im): + im = np.array(im) + img_info = { + "im_shape": np.array( + im.shape[:2], dtype=np.float32), + "scale_factor": np.array( + [1., 1.], dtype=np.float32) + } + return im, img_info + + +class Resize(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + 
""" + Args: + im (np.ndarray): image (np.ndarray) + Returns: + im_scale_x: the resize ratio of X + im_scale_y: the resize ratio of Y + """ + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + return im_scale_y, im_scale_x + + +class NormalizeImage(object): + """normalize image + Args: + mean (list): im - mean + std (list): im / std + is_scale (bool): whether need im / 255 + norm_type (str): type in ['mean_std', 'none'] + """ + + def __init__(self, mean, std, is_scale=True, norm_type='mean_std'): + self.mean = mean + self.std = std + self.is_scale = is_scale + self.norm_type = norm_type + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.astype(np.float32, copy=False) + if self.is_scale: + scale = 1.0 / 255.0 + im *= scale + + if self.norm_type == 'mean_std': + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im -= mean + im /= std + return im, im_info + + +class Permute(object): + """permute image + Args: + to_bgr (bool): whether convert RGB to BGR + channel_first (bool): whether convert HWC to CHW + """ + + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.transpose((2, 0, 1)).copy() + return im, im_info + + +class PadStride(object): + """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config + Args: + stride (bool): model with FPN need image shape % stride == 0 + """ + + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class LetterBoxResize(object): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. 
+ """ + super(LetterBoxResize, self).__init__() + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + + def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + height, width = self.target_size + h, w = im.shape[:2] + im, ratio, padw, padh = self.letterbox(im, height=height, width=width) + + new_shape = [round(h * ratio), round(w * ratio)] + im_info['im_shape'] = np.array(new_shape, dtype=np.float32) + im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32) + return im, im_info + + +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. + Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + +def rotate_point(pt, angle_rad): + """Rotate a point by an angle. + + Args: + pt (list[float]): 2 dimensional point to be rotated + angle_rad (float): rotation angle by radian + + Returns: + list[float]: Rotated point. + """ + assert len(pt) == 2 + sn, cs = np.sin(angle_rad), np.cos(angle_rad) + new_x = pt[0] * cs - pt[1] * sn + new_y = pt[0] * sn + pt[1] * cs + rotated_pt = [new_x, new_y] + + return rotated_pt + + +def _get_3rd_point(a, b): + """To calculate the affine matrix, three pairs of points are required. This + function is used to get the 3rd point, given 2D points a & b. + + The 3rd point is defined by rotating vector `a - b` by 90 degrees + anticlockwise, using b as the rotation center. + + Args: + a (np.ndarray): point(x,y) + b (np.ndarray): point(x,y) + + Returns: + np.ndarray: The 3rd point. + """ + assert len(a) == 2 + assert len(b) == 2 + direction = a - b + third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32) + + return third_pt + + +def get_affine_transform(center, + input_size, + rot, + output_size, + shift=(0., 0.), + inv=False): + """Get the affine transform matrix, given the center/scale/rot/output_size. 
+ + Args: + center (np.ndarray[2, ]): Center of the bounding box (x, y). + scale (np.ndarray[2, ]): Scale of the bounding box + wrt [width, height]. + rot (float): Rotation angle (degree). + output_size (np.ndarray[2, ]): Size of the destination heatmaps. + shift (0-100%): Shift translation ratio wrt the width/height. + Default (0., 0.). + inv (bool): Option to inverse the affine transform direction. + (inv=False: src->dst or inv=True: dst->src) + + Returns: + np.ndarray: The transform matrix. + """ + assert len(center) == 2 + assert len(output_size) == 2 + assert len(shift) == 2 + if not isinstance(input_size, (np.ndarray, list)): + input_size = np.array([input_size, input_size], dtype=np.float32) + scale_tmp = input_size + + shift = np.array(shift) + src_w = scale_tmp[0] + dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = rotate_point([0., src_w * -0.5], rot_rad) + dst_dir = np.array([0., dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + src[2, :] = _get_3rd_point(src[0, :], src[1, :]) + + dst = np.zeros((3, 2), dtype=np.float32) + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +class WarpAffine(object): + """Warp affine the image + """ + + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + input_w=512, + scale=0.4, + shift=0.1): + self.keep_res = keep_res + self.pad = pad + self.input_h = input_h + self.input_w = input_w + self.scale = scale + self.shift = shift + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + + h, w = img.shape[:2] + + if self.keep_res: + input_h = (h | self.pad) + 1 + input_w = (w | self.pad) + 1 + s = np.array([input_w, input_h], dtype=np.float32) + c = np.array([w // 2, h // 2], dtype=np.float32) + + else: + s = max(h, w) * 1.0 + input_h, input_w = self.input_h, self.input_w + c = np.array([w / 2., h / 2.], dtype=np.float32) + + trans_input = get_affine_transform(c, s, 0, [input_w, input_h]) + img = cv2.resize(img, (w, h)) + inp = cv2.warpAffine( + img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR) + return inp, im_info + + +# keypoint preprocess +def get_warp_matrix(theta, size_input, size_dst, size_target): + """This code is based on + https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py + + Calculate the transformation matrix under the constraint of unbiased. + Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased + Data Processing for Human Pose Estimation (CVPR 2020). + + Args: + theta (float): Rotation angle in degrees. + size_input (np.ndarray): Size of input image [w, h]. + size_dst (np.ndarray): Size of output image [w, h]. + size_target (np.ndarray): Size of ROI in input plane [w, h]. + + Returns: + matrix (np.ndarray): A matrix for transformation. 
+ """ + theta = np.deg2rad(theta) + matrix = np.zeros((2, 3), dtype=np.float32) + scale_x = size_dst[0] / size_target[0] + scale_y = size_dst[1] / size_target[1] + matrix[0, 0] = np.cos(theta) * scale_x + matrix[0, 1] = -np.sin(theta) * scale_x + matrix[0, 2] = scale_x * ( + -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] * + np.sin(theta) + 0.5 * size_target[0]) + matrix[1, 0] = np.sin(theta) * scale_y + matrix[1, 1] = np.cos(theta) * scale_y + matrix[1, 2] = scale_y * ( + -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] * + np.cos(theta) + 0.5 * size_target[1]) + return matrix + + +class TopDownEvalAffine(object): + """apply affine transform to image and coords + + Args: + trainsize (list): [w, h], the standard size used to train + use_udp (bool): whether to use Unbiased Data Processing. + records(dict): the dict contained the image and coords + + Returns: + records (dict): contain the image and coords after tranformed + + """ + + def __init__(self, trainsize, use_udp=False): + self.trainsize = trainsize + self.use_udp = use_udp + + def __call__(self, image, im_info): + rot = 0 + imshape = im_info['im_shape'][::-1] + center = im_info['center'] if 'center' in im_info else imshape / 2. + scale = im_info['scale'] if 'scale' in im_info else imshape + if self.use_udp: + trans = get_warp_matrix( + rot, center * 2.0, + [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + else: + trans = get_affine_transform(center, scale, rot, self.trainsize) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + + return image, im_info + + +class Compose: + def __init__(self, transforms): + self.transforms = [] + for op_info in transforms: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + self.transforms.append(eval(op_type)(**new_op_info)) + + def __call__(self, img): + img, im_info = decode_image(img) + for t in self.transforms: + img, im_info = t(img, im_info) + inputs = copy.deepcopy(im_info) + inputs['image'] = img + return inputs diff --git a/PaddleDetection-release-2.6/deploy/serving/python/web_service.py b/PaddleDetection-release-2.6/deploy/serving/python/web_service.py new file mode 100644 index 0000000000000000000000000000000000000000..08be7d2c61904b3d3d2e21643c7eaddd3581f48b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/python/web_service.py @@ -0,0 +1,261 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
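+
+# Service layout (a summary of the code below, not part of the upstream docs):
+# ArgsParser merges config.yml with any -o key=value overrides, PredictConfig
+# reads infer_cfg.yml from the exported model directory, and DetectorOp decodes
+# base64 images, runs the Compose preprocess pipeline, then parses either
+# detection boxes or HRNet keypoints. A typical launch (paths are examples):
+#   python deploy/serving/python/web_service.py \
+#       --model_dir output_inference/yolov3_darknet53_270e_coco \
+#       -o http_port=18093 devices=0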
+import copy + +from paddle_serving_server.web_service import WebService, Op +from paddle_serving_server.proto import general_model_config_pb2 as m_config +import google.protobuf.text_format + +import os +import numpy as np +import base64 +from PIL import Image +import io +from preprocess_ops import Compose +from postprocess_ops import HRNetPostProcess + +from argparse import ArgumentParser, RawDescriptionHelpFormatter +import yaml + +# Global dictionary +SUPPORT_MODELS = { + 'YOLO', 'PPYOLOE', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet', + 'S2ANet', 'JDE', 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', + 'TOOD', 'RetinaNet', 'StrongBaseline', 'STGCN', 'YOLOX', 'HRNet' +} + +GLOBAL_VAR = {} + + +class ArgsParser(ArgumentParser): + def __init__(self): + super(ArgsParser, self).__init__( + formatter_class=RawDescriptionHelpFormatter) + self.add_argument( + "-c", + "--config", + default="deploy/serving/python/config.yml", + help="configuration file to use") + self.add_argument( + "--model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + self.add_argument( + "-o", "--opt", nargs='+', help="set configuration options") + + def parse_args(self, argv=None): + args = super(ArgsParser, self).parse_args(argv) + assert args.config is not None, \ + "Please specify --config=configure_file_path." + args.service_config = self._parse_opt(args.opt, args.config) + print("args config:", args.service_config) + args.model_config = PredictConfig(args.model_dir) + return args + + def _parse_helper(self, v): + if v.isnumeric(): + if "." in v: + v = float(v) + else: + v = int(v) + elif v == "True" or v == "False": + v = (v == "True") + return v + + def _parse_opt(self, opts, conf_path): + f = open(conf_path) + config = yaml.load(f, Loader=yaml.Loader) + if not opts: + return config + for s in opts: + s = s.strip() + k, v = s.split('=') + v = self._parse_helper(v) + if "devices" in k: + v = str(v) + print(k, v, type(v)) + cur = config + parent = cur + for kk in k.split("."): + if kk not in cur: + cur[kk] = {} + parent = cur + cur = cur[kk] + else: + parent = cur + cur = cur[kk] + parent[k.split(".")[-1]] = v + return config + + +class PredictConfig(object): + """set config of preprocess, postprocess and visualize + Args: + model_dir (str): root path of infer_cfg.yml + """ + + def __init__(self, model_dir): + # parsing Yaml config for Preprocess + deploy_file = os.path.join(model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.label_list = yml_conf['label_list'] + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + self.draw_threshold = yml_conf.get("draw_threshold", 0.5) + self.mask = yml_conf.get("mask", False) + self.tracker = yml_conf.get("tracker", None) + self.nms = yml_conf.get("NMS", None) + self.fpn_stride = yml_conf.get("fpn_stride", None) + if self.arch == 'RCNN' and yml_conf.get('export_onnx', False): + print( + 'The RCNN export model is used for ONNX and it only supports batch_size = 1' + ) + self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, 
expect {}".format(yml_conf[ + 'arch'], SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +class DetectorOp(Op): + def init_op(self): + self.preprocess_pipeline = Compose(GLOBAL_VAR['preprocess_ops']) + + def preprocess(self, input_dicts, data_id, log_id): + (_, input_dict), = input_dicts.items() + inputs = [] + for key, data in input_dict.items(): + data = base64.b64decode(data.encode('utf8')) + byte_stream = io.BytesIO(data) + img = Image.open(byte_stream).convert("RGB") + inputs.append(self.preprocess_pipeline(img)) + inputs = self.collate_inputs(inputs) + return inputs, False, None, "" + + def postprocess(self, input_dicts, fetch_dict, data_id, log_id): + (_, input_dict), = input_dicts.items() + if GLOBAL_VAR['model_config'].arch in ["HRNet"]: + result = self.parse_keypoint_result(input_dict, fetch_dict) + else: + result = self.parse_detection_result(input_dict, fetch_dict) + return result, None, "" + + def collate_inputs(self, inputs): + collate_inputs = {k: [] for k in inputs[0].keys()} + for info in inputs: + for k in collate_inputs.keys(): + collate_inputs[k].append(info[k]) + return { + k: np.stack(v) + for k, v in collate_inputs.items() if k in GLOBAL_VAR['feed_vars'] + } + + def parse_detection_result(self, input_dict, fetch_dict): + bboxes = fetch_dict[GLOBAL_VAR['fetch_vars'][0]] + bboxes_num = fetch_dict[GLOBAL_VAR['fetch_vars'][1]] + if GLOBAL_VAR['model_config'].mask: + masks = fetch_dict[GLOBAL_VAR['fetch_vars'][2]] + idx = 0 + results = {} + for img_name, num in zip(input_dict.keys(), bboxes_num): + if num == 0: + results[img_name] = 'No object detected!' + else: + result = [] + bbox = bboxes[idx:idx + num] + for line in bbox: + if line[0] > -1 and line[1] > GLOBAL_VAR[ + 'model_config'].draw_threshold: + result.append( + f"{int(line[0])} {line[1]} " + f"{line[2]} {line[3]} {line[4]} {line[5]}") + if len(result) == 0: + result = 'No object detected!' + results[img_name] = result + idx += num + return results + + def parse_keypoint_result(self, input_dict, fetch_dict): + heatmap = fetch_dict["conv2d_441.tmp_1"] + keypoint_postprocess = HRNetPostProcess() + im_shape = [] + for key, data in input_dict.items(): + data = base64.b64decode(data.encode('utf8')) + byte_stream = io.BytesIO(data) + img = Image.open(byte_stream).convert("RGB") + im_shape.append([img.width, img.height]) + im_shape = np.array(im_shape) + center = np.round(im_shape / 2.) + scale = im_shape / 200. 
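+        # scale here is expressed in units of 200 pixels (HRNet convention);
+        # transform_preds in postprocess_ops.py multiplies it back by 200 via
+        # get_affine_transform when mapping heatmap coords to the image.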
+ kpts, scores = keypoint_postprocess(heatmap, center, scale) + results = {"keypoint": kpts, "scores": scores} + return results + + +class DetectorService(WebService): + def get_pipeline_response(self, read_op): + return DetectorOp(name="ppdet", input_ops=[read_op]) + + +def get_model_vars(model_dir, service_config): + serving_server_dir = os.path.join(model_dir, "serving_server") + # rewrite model_config + service_config['op']['ppdet']['local_service_conf'][ + 'model_config'] = serving_server_dir + serving_server_conf = os.path.join(serving_server_dir, + "serving_server_conf.prototxt") + with open(serving_server_conf, 'r') as f: + model_var = google.protobuf.text_format.Merge( + str(f.read()), m_config.GeneralModelConfig()) + feed_vars = [var.name for var in model_var.feed_var] + fetch_vars = [var.name for var in model_var.fetch_var] + return feed_vars, fetch_vars + + +if __name__ == '__main__': + # load config and prepare the service + FLAGS = ArgsParser().parse_args() + feed_vars, fetch_vars = get_model_vars(FLAGS.model_dir, + FLAGS.service_config) + GLOBAL_VAR['feed_vars'] = feed_vars + GLOBAL_VAR['fetch_vars'] = fetch_vars + GLOBAL_VAR['preprocess_ops'] = FLAGS.model_config.preprocess_infos + GLOBAL_VAR['model_config'] = FLAGS.model_config + # define the service + uci_service = DetectorService(name="ppdet") + uci_service.prepare_pipeline_config(yml_dict=FLAGS.service_config) + # start the service + uci_service.run_service() diff --git a/PaddleDetection-release-2.6/deploy/serving/test_client.py b/PaddleDetection-release-2.6/deploy/serving/test_client.py new file mode 100644 index 0000000000000000000000000000000000000000..d66d52b1c5708a8f7f36fe969841970cfb1d9cf8 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/serving/test_client.py @@ -0,0 +1,43 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
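+#
+# Usage sketch (argument order taken from the sys.argv reads below; the server
+# address 127.0.0.1:9393 is hard-coded in this script):
+#   python deploy/serving/test_client.py deploy/serving/label_list.txt demo/000000014439.jpg
+# sys.argv[1] is the label list file passed to RCNNPostprocess and sys.argv[2]
+# is the image to predict.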
+ +import sys +import numpy as np +from paddle_serving_client import Client +from paddle_serving_app.reader import * +import cv2 +preprocess = Sequential([ + File2Image(), BGR2RGB(), Resize( + (608, 608), interpolation=cv2.INTER_LINEAR), Div(255.0), Transpose( + (2, 0, 1)) +]) + +postprocess = RCNNPostprocess(sys.argv[1], "output", [608, 608]) +client = Client() + +client.load_client_config("serving_client/serving_client_conf.prototxt") +client.connect(['127.0.0.1:9393']) + +im = preprocess(sys.argv[2]) +fetch_map = client.predict( + feed={ + "image": im, + "im_shape": np.array(list(im.shape[1:])).reshape(-1), + "scale_factor": np.array([1.0, 1.0]).reshape(-1), + }, + fetch=["multiclass_nms3_0.tmp_0"], + batch=False) +print(fetch_map) +fetch_map["image"] = sys.argv[2] +postprocess(fetch_map) diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/.gitignore b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..faeba235a6894f0bd28aab23dea5f4f559071846 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/.gitignore @@ -0,0 +1,5 @@ +include/inputs.h +include/outputs.h + +__pycache__/ +build/ \ No newline at end of file diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/Makefile b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..cf7d375b7e54c7781768db39274d9b3f7128812b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/Makefile @@ -0,0 +1,129 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
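+
+# CMSIS_PATH and ETHOSU_PLATFORM_PATH below use ?= assignments, so the
+# /opt/arm/ethosu defaults can be overridden per invocation, for example
+# (paths are illustrative):
+#   make CMSIS_PATH=/home/user/cmsis ETHOSU_PLATFORM_PATH=/home/user/core_platform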
+
+# Makefile to build demo
+
+# Setup build environment
+BUILD_DIR := build
+
+ARM_CPU = ARMCM55
+ETHOSU_PATH = /opt/arm/ethosu
+CMSIS_PATH ?= ${ETHOSU_PATH}/cmsis
+ETHOSU_PLATFORM_PATH ?= ${ETHOSU_PATH}/core_platform
+STANDALONE_CRT_PATH := $(abspath $(BUILD_DIR))/runtime
+CORSTONE_300_PATH = ${ETHOSU_PLATFORM_PATH}/targets/corstone-300
+PKG_COMPILE_OPTS = -g -Wall -O2 -Wno-incompatible-pointer-types -Wno-format -mcpu=cortex-m55 -mthumb -mfloat-abi=hard -std=gnu99
+CMAKE ?= cmake
+CC = arm-none-eabi-gcc
+AR = arm-none-eabi-ar
+RANLIB = arm-none-eabi-ranlib
+PKG_CFLAGS = ${PKG_COMPILE_OPTS} \
+	-I${STANDALONE_CRT_PATH}/include \
+	-I${STANDALONE_CRT_PATH}/src/runtime/crt/include \
+	-I${PWD}/include \
+	-I${CORSTONE_300_PATH} \
+	-I${CMSIS_PATH}/Device/ARM/${ARM_CPU}/Include/ \
+	-I${CMSIS_PATH}/CMSIS/Core/Include \
+	-I${CMSIS_PATH}/CMSIS/NN/Include \
+	-I${CMSIS_PATH}/CMSIS/DSP/Include \
+	-I$(abspath $(BUILD_DIR))/codegen/host/include
+CMSIS_NN_CMAKE_FLAGS = -DCMAKE_TOOLCHAIN_FILE=$(abspath $(BUILD_DIR))/../arm-none-eabi-gcc.cmake \
+	-DTARGET_CPU=cortex-m55 \
+	-DBUILD_CMSIS_NN_FUNCTIONS=YES
+PKG_LDFLAGS = -lm -specs=nosys.specs -static -T corstone300.ld
+
+ifeq ($(VERBOSE),1)
+QUIET ?=
+else
+QUIET ?= @
+endif
+
+DEMO_MAIN = src/demo_bare_metal.c
+CODEGEN_SRCS = $(wildcard $(abspath $(BUILD_DIR))/codegen/host/src/*.c)
+CODEGEN_OBJS = $(subst .c,.o,$(CODEGEN_SRCS))
+CMSIS_STARTUP_SRCS = $(wildcard ${CMSIS_PATH}/Device/ARM/${ARM_CPU}/Source/*.c)
+UART_SRCS = $(wildcard ${CORSTONE_300_PATH}/*.c)
+
+demo: $(BUILD_DIR)/demo
+
+$(BUILD_DIR)/stack_allocator.o: $(STANDALONE_CRT_PATH)/src/runtime/crt/memory/stack_allocator.c
+	$(QUIET)mkdir -p $(@D)
+	$(QUIET)$(CC) -c $(PKG_CFLAGS) -o $@ $^
+
+$(BUILD_DIR)/crt_backend_api.o: $(STANDALONE_CRT_PATH)/src/runtime/crt/common/crt_backend_api.c
+	$(QUIET)mkdir -p $(@D)
+	$(QUIET)$(CC) -c $(PKG_CFLAGS) -o $@ $^
+
+# Build generated code
+$(BUILD_DIR)/libcodegen.a: $(CODEGEN_SRCS)
+	$(QUIET)cd $(abspath $(BUILD_DIR)/codegen/host/src) && $(CC) -c $(PKG_CFLAGS) $(CODEGEN_SRCS)
+	$(QUIET)$(AR) -cr $(abspath $(BUILD_DIR)/libcodegen.a) $(CODEGEN_OBJS)
+	$(QUIET)$(RANLIB) $(abspath $(BUILD_DIR)/libcodegen.a)
+
+# Build CMSIS startup code
+${BUILD_DIR}/libcmsis_startup.a: $(CMSIS_STARTUP_SRCS)
+	$(QUIET)mkdir -p $(abspath $(BUILD_DIR)/libcmsis_startup)
+	$(QUIET)cd $(abspath $(BUILD_DIR)/libcmsis_startup) && $(CC) -c $(PKG_CFLAGS) -D${ARM_CPU} $^
+	$(QUIET)$(AR) -cr $(abspath $(BUILD_DIR)/libcmsis_startup.a) $(abspath $(BUILD_DIR))/libcmsis_startup/*.o
+	$(QUIET)$(RANLIB) $(abspath $(BUILD_DIR)/libcmsis_startup.a)
+
+CMSIS_SHA_FILE=${CMSIS_PATH}/977abe9849781a2e788b02282986480ff4e25ea6.sha
+ifneq ("$(wildcard $(CMSIS_SHA_FILE))","")
+${BUILD_DIR}/cmsis_nn/Source/libcmsis-nn.a:
+	$(QUIET)mkdir -p $(@D)
+	$(QUIET)cd $(CMSIS_PATH)/CMSIS/NN && $(CMAKE) -B $(abspath $(BUILD_DIR)/cmsis_nn) $(CMSIS_NN_CMAKE_FLAGS)
+	$(QUIET)cd $(abspath $(BUILD_DIR)/cmsis_nn) && $(MAKE) all
+else
+# Build CMSIS-NN
+${BUILD_DIR}/cmsis_nn/Source/SoftmaxFunctions/libCMSISNNSoftmax.a:
+	$(QUIET)mkdir -p $(@D)
+	$(QUIET)cd $(CMSIS_PATH)/CMSIS/NN && $(CMAKE) -B $(abspath $(BUILD_DIR)/cmsis_nn) $(CMSIS_NN_CMAKE_FLAGS)
+	$(QUIET)cd $(abspath $(BUILD_DIR)/cmsis_nn) && $(MAKE) all
+endif
+
+# Build demo application
+ifneq ("$(wildcard $(CMSIS_SHA_FILE))","")
+$(BUILD_DIR)/demo: $(DEMO_MAIN) $(UART_SRCS) $(BUILD_DIR)/stack_allocator.o $(BUILD_DIR)/crt_backend_api.o \
+	${BUILD_DIR}/libcodegen.a ${BUILD_DIR}/libcmsis_startup.a ${BUILD_DIR}/cmsis_nn/Source/libcmsis-nn.a
+	$(QUIET)mkdir -p
$(@D) + $(QUIET)$(CC) $(PKG_CFLAGS) $(FREERTOS_FLAGS) -o $@ -Wl,--whole-archive $^ -Wl,--no-whole-archive $(PKG_LDFLAGS) +else +$(BUILD_DIR)/demo: $(DEMO_MAIN) $(UART_SRCS) $(BUILD_DIR)/stack_allocator.o $(BUILD_DIR)/crt_backend_api.o \ + ${BUILD_DIR}/libcodegen.a ${BUILD_DIR}/libcmsis_startup.a \ + ${BUILD_DIR}/cmsis_nn/Source/SoftmaxFunctions/libCMSISNNSoftmax.a \ + ${BUILD_DIR}/cmsis_nn/Source/FullyConnectedFunctions/libCMSISNNFullyConnected.a \ + ${BUILD_DIR}/cmsis_nn/Source/SVDFunctions/libCMSISNNSVDF.a \ + ${BUILD_DIR}/cmsis_nn/Source/ReshapeFunctions/libCMSISNNReshape.a \ + ${BUILD_DIR}/cmsis_nn/Source/ActivationFunctions/libCMSISNNActivation.a \ + ${BUILD_DIR}/cmsis_nn/Source/NNSupportFunctions/libCMSISNNSupport.a \ + ${BUILD_DIR}/cmsis_nn/Source/ConcatenationFunctions/libCMSISNNConcatenation.a \ + ${BUILD_DIR}/cmsis_nn/Source/BasicMathFunctions/libCMSISNNBasicMaths.a \ + ${BUILD_DIR}/cmsis_nn/Source/ConvolutionFunctions/libCMSISNNConvolutions.a \ + ${BUILD_DIR}/cmsis_nn/Source/PoolingFunctions/libCMSISNNPooling.a + $(QUIET)mkdir -p $(@D) + $(QUIET)$(CC) $(PKG_CFLAGS) $(FREERTOS_FLAGS) -o $@ -Wl,--whole-archive $^ -Wl,--no-whole-archive $(PKG_LDFLAGS) +endif + +clean: + $(QUIET)rm -rf $(BUILD_DIR)/codegen + +cleanall: + $(QUIET)rm -rf $(BUILD_DIR) + +.SUFFIXES: + +.DEFAULT: demo diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6c62e2cb6adb634d4f6008decf12d3e04dded87f --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/README.md @@ -0,0 +1,107 @@ + + + + + + + + + + + + + + + + + +Running PP-PicoDet object detection model on bare metal Arm(R) Cortex(R)-M55 CPU using Arm Virtual Hardware +====================================================================== + +This folder contains an example of how to run a PP-PicoDet model on bare metal [Cortex(R)-M55 CPU](https://www.arm.com/products/silicon-ip-cpu/cortex-m/cortex-m55) using [Arm Virtual Hardware](https://www.arm.com/products/development-tools/simulation/virtual-hardware). + + +Running environment and prerequisites +------------- +Case 1: If the demo is run in Arm Virtual Hardware Amazon Machine Image(AMI) instance hosted by [AWS](https://aws.amazon.com/marketplace/pp/prodview-urbpq7yo5va7g?sr=0-1&ref_=beagle&applicationId=AWSMPContessa)/[AWS China](https://awsmarketplace.amazonaws.cn/marketplace/pp/prodview-2y7nefntbmybu), the following software will be installed through [configure_avh.sh](./configure_avh.sh) script. It will install automatically when you run the application through [run_demo.sh](./run_demo.sh) script. +You can refer to this [guide](https://arm-software.github.io/AVH/main/examples/html/MicroSpeech.html#amilaunch) to launch an Arm Virtual Hardware AMI instance. + +Case 2: If the demo is run in the [ci_cpu Docker container](https://github.com/apache/tvm/blob/main/docker/Dockerfile.ci_cpu) provided with [TVM](https://github.com/apache/tvm), then the following software will already be installed. + +Case 3: If the demo is not run in the ci_cpu Docker container, then you will need the following: +- Software required to build and run the demo (These can all be installed by running + tvm/docker/install/ubuntu_install_ethosu_driver_stack.sh.) 
+  - [Fixed Virtual Platform (FVP) based on Arm(R) Corstone(TM)-300 software](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps)
+  - [cmake 3.19.5](https://github.com/Kitware/CMake/releases/)
+  - [GCC toolchain from Arm(R)](https://developer.arm.com/-/media/Files/downloads/gnu-rm/10-2020q4/gcc-arm-none-eabi-10-2020-q4-major-x86_64-linux.tar.bz2)
+  - [Arm(R) Ethos(TM)-U NPU driver stack](https://review.mlplatform.org)
+  - [CMSIS](https://github.com/ARM-software/CMSIS_5)
+- The python libraries listed in the requirements.txt of this directory
+  - These can be installed by running the following from the current directory:
+    ```bash
+    pip install -r ./requirements.txt
+    ```
+
+In case 2 and case 3:
+
+You will need to update your PATH environment variable to include the path to cmake 3.19.5 and the FVP.
+For example if you've installed these in ```/opt/arm```, then you would do the following:
+```bash
+export PATH=/opt/arm/FVP_Corstone_SSE-300/models/Linux64_GCC-6.4:/opt/arm/cmake/bin:$PATH
+```
+
+You will also need TVM which can either be:
+  - Installed from TLCPack (see [TLCPack](https://tlcpack.ai/))
+  - Built from source (see [Install from Source](https://tvm.apache.org/docs/install/from_source.html))
+    - When building from source, the following need to be set in config.cmake:
+      - set(USE_CMSISNN ON)
+      - set(USE_MICRO ON)
+      - set(USE_LLVM ON)
+
+
+Running the demo application
+----------------------------
+Type the following command to run the bare metal object detection application ([src/demo_bare_metal.c](./src/demo_bare_metal.c)):
+
+```bash
+./run_demo.sh
+```
+
+If you are not able to use an Arm Virtual Hardware Amazon Machine Image (AMI) instance hosted by AWS/AWS China, set the --enable_FVP argument to 1 so that the application runs on local Fixed Virtual Platform (FVP) executables.
+
+```bash
+./run_demo.sh --enable_FVP 1
+```
+
+If the Ethos(TM)-U platform and/or CMSIS have not been installed in /opt/arm/ethosu then
+the locations for these can be specified as arguments to run_demo.sh, for example:
+
+```bash
+./run_demo.sh --cmsis_path /home/tvm-user/cmsis \
+--ethosu_platform_path /home/tvm-user/ethosu/core_platform
+```
+
+Running the demo application through [run_demo.sh](./run_demo.sh) will:
+- Set up the running environment by installing the required prerequisites automatically if running in an Arm Virtual Hardware Amazon AMI instance (i.e. --enable_FVP is not set to 1)
+- Download a PP-PicoDet model
+- Use tvmc to compile the object detection model for the Cortex(R)-M55 CPU and CMSIS-NN
+- Create a C header file inputs.h containing the image data as a C array
+- Create a C header file outputs.h containing a C array where the output of inference will be stored
+- Build the demo application
+- Run the demo application on Arm Virtual Hardware based on Arm(R) Corstone(TM)-300 software
+- The application will report the detected objects and the corresponding scores
+
+Using your own image
+--------------------
+The convert_image.py script takes a single command-line argument, the path of the
+image to be converted into an array of bytes for consumption by the model.
+
+The demo can be modified to use an image of your choice by changing the following line in run_demo.sh:
+
+```bash
+python3 ./convert_image.py path/to/image
+```
+
+Model description
+-----------------
+In this demo, the model we used is based on [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/picodet).
Because of its excellent performance, PP-PicoDet is well suited to deployment on mobile devices and CPUs. It is released by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection).
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/arm-none-eabi-gcc.cmake b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/arm-none-eabi-gcc.cmake
new file mode 100644
index 0000000000000000000000000000000000000000..415b3139be1b7f891c017dff0dc299b67f7ef2fe
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/arm-none-eabi-gcc.cmake
@@ -0,0 +1,79 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if (__TOOLCHAIN_LOADED)
+  return()
+endif()
+set(__TOOLCHAIN_LOADED TRUE)
+
+set(CMAKE_SYSTEM_NAME Generic)
+set(CMAKE_C_COMPILER "arm-none-eabi-gcc")
+set(CMAKE_CXX_COMPILER "arm-none-eabi-g++")
+set(CMAKE_SYSTEM_PROCESSOR "cortex-m55" CACHE STRING "Select Arm(R) Cortex(R)-M architecture. (cortex-m0, cortex-m3, cortex-m33, cortex-m4, cortex-m55, cortex-m7, etc)")
+
+set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)
+
+SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
+SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
+SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
+
+set(CMAKE_C_STANDARD 99)
+set(CMAKE_CXX_STANDARD 14)
+
+# The system processor could for example be set to cortex-m33+nodsp+nofp.
+set(__CPU_COMPILE_TARGET ${CMAKE_SYSTEM_PROCESSOR})
+string(REPLACE "+" ";" __CPU_FEATURES ${__CPU_COMPILE_TARGET})
+list(POP_FRONT __CPU_FEATURES CMAKE_SYSTEM_PROCESSOR)
+
+string(FIND ${__CPU_COMPILE_TARGET} "+" __OFFSET)
+if(__OFFSET GREATER_EQUAL 0)
+  string(SUBSTRING ${__CPU_COMPILE_TARGET} ${__OFFSET} -1 CPU_FEATURES)
+endif()
+
+# Add -mcpu to the compile options to override the -mcpu the CMake toolchain adds
+add_compile_options(-mcpu=${__CPU_COMPILE_TARGET})
+
+# Set floating point unit
+if("${__CPU_COMPILE_TARGET}" MATCHES "\\+fp")
+  set(FLOAT hard)
+elseif("${__CPU_COMPILE_TARGET}" MATCHES "\\+nofp")
+  set(FLOAT soft)
+elseif("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "cortex-m33" OR
+       "${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "cortex-m55")
+  set(FLOAT hard)
+else()
+  set(FLOAT soft)
+endif()
+
+add_compile_options(-mfloat-abi=${FLOAT})
+add_link_options(-mfloat-abi=${FLOAT})
+
+# Link target
+add_link_options(-mcpu=${__CPU_COMPILE_TARGET})
+add_link_options(-Xlinker -Map=output.map)
+
+#
+# Compile options
+#
+set(cxx_flags "-fno-unwind-tables;-fno-rtti;-fno-exceptions")
+
+add_compile_options("-Wall;-Wextra;-Wsign-compare;-Wunused;-Wswitch-default;\
+-Wdouble-promotion;-Wredundant-decls;-Wshadow;-Wnull-dereference;\
+-Wno-format-extra-args;-Wno-unused-function;-Wno-unused-label;\
+-Wno-missing-field-initializers;-Wno-return-type;-Wno-format;-Wno-int-conversion"
+  "$<$<COMPILE_LANGUAGE:CXX>:${cxx_flags}>"
+)
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/configure_avh.sh b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/configure_avh.sh
new file mode 100644
index 0000000000000000000000000000000000000000..8042fd81d2379c6f7489d90372dffd2dc10e145e
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/configure_avh.sh
@@ -0,0 +1,79 @@
+#!/bin/bash
+# Copyright (c) 2022 Arm Limited and Contributors. All rights reserved.
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
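+
+# The option parsing below mirrors the flags documented in the README:
+# --cmsis_path, --ethosu_platform_path, --fvp_path, --cmake_path and
+# --enable_FVP. Each flag exports the matching environment variable (or
+# prepends to PATH) before the build and run steps execute, e.g. the README's
+# example invocation (paths are illustrative):
+#   ./run_demo.sh --cmsis_path /home/tvm-user/cmsis \
+#       --ethosu_platform_path /home/tvm-user/ethosu/core_platform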
+ +set -e +set -u +set -o pipefail + +# Show usage +function show_usage() { + cat <&2 + show_usage >&2 + exit 1 + fi + ;; + + --ethosu_platform_path) + if [ $# -gt 1 ] + then + export ETHOSU_PLATFORM_PATH="$2" + shift 2 + else + echo 'ERROR: --ethosu_platform_path requires a non-empty argument' >&2 + show_usage >&2 + exit 1 + fi + ;; + + --fvp_path) + if [ $# -gt 1 ] + then + export PATH="$2/models/Linux64_GCC-6.4:$PATH" + shift 2 + else + echo 'ERROR: --fvp_path requires a non-empty argument' >&2 + show_usage >&2 + exit 1 + fi + ;; + + --cmake_path) + if [ $# -gt 1 ] + then + export CMAKE="$2" + shift 2 + else + echo 'ERROR: --cmake_path requires a non-empty argument' >&2 + show_usage >&2 + exit 1 + fi + ;; + + --enable_FVP) + if [ $# -gt 1 ] && [ "$2" == "1" -o "$2" == "0" ]; + then + FVP_enable="$2" + shift 2 + else + echo 'ERROR: --enable_FVP requires a right argument 1 or 0' >&2 + show_usage >&2 + exit 1 + fi + ;; + + -*|--*) + echo "Error: Unknown flag: $1" >&2 + show_usage >&2 + exit 1 + ;; + esac +done + +# Choose running environment: cloud(default) or local environment +Platform="VHT_Corstone_SSE-300_Ethos-U55" +if [ $FVP_enable == "1" ]; then + Platform="FVP_Corstone_SSE-300_Ethos-U55" + echo -e "\e[36mRun application on local Fixed Virtual Platforms (FVPs)\e[0m" +else + if [ ! -d "/opt/arm/" ]; then + sudo ./configure_avh.sh + fi +fi + +# Directories +script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" + +# Make build directory +make cleanall +mkdir -p build +cd build + +# Get PaddlePaddle inference model +echo -e "\e[36mDownload PaddlePaddle inference model\e[0m" +wget https://bj.bcebos.com/v1/paddledet/deploy/Inference/picodet_s_320_coco_lcnet_no_nms.tar +tar -xf picodet_s_320_coco_lcnet_no_nms.tar + +# Compile model for Arm(R) Cortex(R)-M55 CPU and CMSIS-NN +# An alternative to using "python3 -m tvm.driver.tvmc" is to call +# "tvmc" directly once TVM has been pip installed. +python3 -m tvm.driver.tvmc compile --target=cmsis-nn,c \ + --target-cmsis-nn-mcpu=cortex-m55 \ + --target-c-mcpu=cortex-m55 \ + --runtime=crt \ + --executor=aot \ + --executor-aot-interface-api=c \ + --executor-aot-unpacked-api=1 \ + --pass-config tir.usmp.enable=1 \ + --pass-config tir.usmp.algorithm=hill_climb \ + --pass-config tir.disable_storage_rewrite=1 \ + --pass-config tir.disable_vectorize=1 picodet_s_320_coco_lcnet_no_nms/model.pdmodel \ + --output-format=mlf \ + --model-format=paddle \ + --module-name=picodet \ + --input-shapes image:[1,3,320,320] \ + --output=picodet.tar +tar -xf picodet.tar + +# Create C header files +cd .. 
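+# convert_image.py writes the image bytes into include/inputs.h and the
+# placeholder output buffers into include/outputs.h (both are listed in this
+# demo's .gitignore); the demo binary includes them at compile time.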
+python3 ./convert_image.py ./image/000000014439_640x640.jpg + +# Build demo executable +cd ${script_dir} +echo ${script_dir} +make + +# Run demo executable on the AVH +$Platform -C cpu0.CFGDTCMSZ=15 \ +-C cpu0.CFGITCMSZ=15 -C mps3_board.uart0.out_file=\"-\" -C mps3_board.uart0.shutdown_tag=\"EXITTHESIM\" \ +-C mps3_board.visualisation.disable-visualisation=1 -C mps3_board.telnetterminal0.start_telnet=0 \ +-C mps3_board.telnetterminal1.start_telnet=0 -C mps3_board.telnetterminal2.start_telnet=0 -C mps3_board.telnetterminal5.start_telnet=0 \ +./build/demo --stat diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/src/demo_bare_metal.c b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/src/demo_bare_metal.c new file mode 100644 index 0000000000000000000000000000000000000000..07ed5bebe2c266bde5b59b101c1df1a54ba2ef28 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_avh/src/demo_bare_metal.c @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
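+ *
+ * Note on the post-processing loop below (layout inferred from this demo's
+ * 320x320 PicoDet model, not from upstream docs): output0 holds 4 box
+ * coordinates per candidate and output1 holds 80 class scores per candidate;
+ * 2125 = 40*40 + 20*20 + 10*10 + 5*5 candidates for a 320x320 input across
+ * strides 8/16/32/64.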
+ */
+
+#include <stdio.h>
+#include <tvm_runtime.h>
+#include <tvmgen_picodet.h>
+
+#include "uart.h"
+
+// Header files generated by convert_image.py
+#include "inputs.h"
+#include "outputs.h"
+
+int main(int argc, char **argv) {
+  uart_init();
+  printf("Starting PicoDet inference:\n");
+  struct tvmgen_picodet_outputs rec_outputs = {
+      .output0 = output0, .output1 = output1,
+  };
+  struct tvmgen_picodet_inputs rec_inputs = {
+      .image = input,
+  };
+
+  tvmgen_picodet_run(&rec_inputs, &rec_outputs);
+
+  // Post process: pick the best class per candidate box. output1 holds
+  // per-class scores over 2125 boxes (40*40+20*20+10*10+5*5 for 320x320).
+  for (int i = 0; i < output0_len / 4; i++) {
+    float score = 0;
+    int32_t class = 0;
+    for (int j = 0; j < 80; j++) {
+      if (output1[i + j * 2125] > score) {
+        score = output1[i + j * 2125];
+        class = j;
+      }
+    }
+    if (score > 0.1 && output0[i * 4] > 0 && output0[i * 4 + 1] > 0) {
+      printf("box: %f, %f, %f, %f, class: %d, score: %f\n", output0[i * 4] * 2,
+             output0[i * 4 + 1] * 2, output0[i * 4 + 2] * 2,
+             output0[i * 4 + 3] * 2, class, score);
+    }
+  }
+  return 0;
+}
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/CMakeLists.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..9afa8cfc011587977b4ef3ed13bb0b050e990fa0
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/CMakeLists.txt
@@ -0,0 +1,23 @@
+cmake_minimum_required(VERSION 3.9)
+project(picodet-mnn)
+
+set(CMAKE_CXX_STANDARD 17)
+set(MNN_DIR "./mnn")
+
+# find_package(OpenCV REQUIRED PATHS "/work/dependence/opencv/opencv-3.4.3/build")
+find_package(OpenCV REQUIRED)
+include_directories(
+  ${MNN_DIR}/include
+  ${MNN_DIR}/include/MNN
+  ${CMAKE_SOURCE_DIR}
+)
+link_directories(mnn/lib)
+
+add_library(libMNN SHARED IMPORTED)
+set_target_properties(
+  libMNN
+  PROPERTIES IMPORTED_LOCATION
+  ${CMAKE_SOURCE_DIR}/mnn/lib/libMNN.so
+)
+add_executable(picodet-mnn main.cpp picodet_mnn.cpp)
+target_link_libraries(picodet-mnn libMNN ${OpenCV_LIBS})
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ac11a8e18fdc53aa7eebb57fa1ba2d4680a9dcf3
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/README.md
@@ -0,0 +1,89 @@
+# PicoDet MNN Demo
+
+This demo provides PicoDet inference code based on [Alibaba's MNN framework](https://github.com/alibaba/MNN).
+
+## C++ Demo
+
+- Step 1: Build the MNN inference library following the [official MNN build guide](https://www.yuque.com/mnn/en/build_linux).
+- Step 2: Build or download the OpenCV library (see the OpenCV website). For convenience, if your environment is gcc 8.2 on x86, you can directly download the prebuilt package below:
+```shell
+wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
+tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
+```
+
+- Step 3: Prepare the model
+  ```shell
+  modelName=picodet_s_320_coco_lcnet
+  # Export the inference model
+  python tools/export_model.py \
+        -c configs/picodet/${modelName}.yml \
+        -o weights=${modelName}.pdparams \
+        --output_dir=inference_model
+  # Convert to ONNX
+  paddle2onnx --model_dir inference_model/${modelName} \
+  --model_filename model.pdmodel \
+  --params_filename model.pdiparams \
+  --opset_version 11 \
+  --save_file ${modelName}.onnx
+  # Simplify the model
+  python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx
+  # Convert the model to MNN format
+  python -m MNN.tools.mnnconvert -f ONNX --modelFile picodet_s_320_lcnet_processed.onnx --MNNModel picodet_s_320_lcnet.mnn
+  ```
+For a quick test, you can directly download [picodet_s_320_lcnet.mnn](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet.mnn) (without post-processing).
+
+**Note:** because MNN's MatMul operator computes incorrectly when the input shapes are inconsistent, the demo with post-processing is still being upgraded and will be released soon.
+
+## Build the executable
+
+- Step 1: Import the library files
+```
+mkdir mnn && cd mnn && mkdir lib
+cp /path/to/MNN/build/libMNN.so .
+cd ..
+cp -r /path/to/MNN/include .
+```
+- Step 2: Modify the OpenCV and MNN paths in CMakeLists.txt
+- Step 3: Build
+``` shell
+mkdir build && cd build
+cmake ..
+make
+```
+If the `picodet-mnn` executable appears in the build directory, the build succeeded.
+
+## Run
+
+First create a directory for the prediction results:
+```shell
+cp -r ../demo_onnxruntime/imgs .
+cd build
+mkdir ../results
+```
+
+- Predict a single image
+``` shell
+./picodet-mnn 0 ../picodet_s_320_lcnet.mnn 320 320 ../imgs/dog.jpg
+```
+
+- Speed benchmark
+
+``` shell
+./picodet-mnn 1 ../picodet_s_320_lcnet.mnn 320 320
+```
+
+## FAQ
+
+- Prediction results are inaccurate:
+First check that the model input shape matches, and that the model output names match. The output names of the enhanced PicoDet model without post-processing are:
+```shell
+# classification branch | detection branch
+{"transpose_0.tmp_0", "transpose_1.tmp_0"},
+{"transpose_2.tmp_0", "transpose_3.tmp_0"},
+{"transpose_4.tmp_0", "transpose_5.tmp_0"},
+{"transpose_6.tmp_0", "transpose_7.tmp_0"},
+```
+You can inspect the exact names with [netron](https://netron.app) and update the `non_postprocess_heads_info` array in `picodet_mnn.hpp` accordingly.
+
+## Reference
+[MNN](https://github.com/alibaba/MNN)
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/main.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/main.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..5737368d5473a75ced391ad2e28883427a942795
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/main.cpp
@@ -0,0 +1,203 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
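+//
+// Demo entry point: argv[1] selects the mode (0 = image inference, 1 =
+// benchmark), argv[2] is the .mnn model path, argv[3]/argv[4] are the input
+// height and width, and argv[5] is the image path in image mode.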
+
+#include "picodet_mnn.hpp"
+#include <cfloat>
+#include <chrono>
+#include <iostream>
+#include <opencv2/opencv.hpp>
+
+#define __SAVE_RESULT__ // if defined, save drawn results to ../results; else
+                        // show them in a window
+
+struct object_rect {
+  int x;
+  int y;
+  int width;
+  int height;
+};
+
+std::vector<int> GenerateColorMap(int num_class) {
+  auto colormap = std::vector<int>(3 * num_class, 0);
+  for (int i = 0; i < num_class; ++i) {
+    int j = 0;
+    int lab = i;
+    while (lab) {
+      colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j));
+      colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j));
+      colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j));
+      ++j;
+      lab >>= 3;
+    }
+  }
+  return colormap;
+}
+
+void draw_bboxes(const cv::Mat &im, const std::vector<BoxInfo> &bboxes,
+                 std::string save_path = "None") {
+  static const char *class_names[] = {
+      "person", "bicycle", "car",
+      "motorcycle", "airplane", "bus",
+      "train", "truck", "boat",
+      "traffic light", "fire hydrant", "stop sign",
+      "parking meter", "bench", "bird",
+      "cat", "dog", "horse",
+      "sheep", "cow", "elephant",
+      "bear", "zebra", "giraffe",
+      "backpack", "umbrella", "handbag",
+      "tie", "suitcase", "frisbee",
+      "skis", "snowboard", "sports ball",
+      "kite", "baseball bat", "baseball glove",
+      "skateboard", "surfboard", "tennis racket",
+      "bottle", "wine glass", "cup",
+      "fork", "knife", "spoon",
+      "bowl", "banana", "apple",
+      "sandwich", "orange", "broccoli",
+      "carrot", "hot dog", "pizza",
+      "donut", "cake", "chair",
+      "couch", "potted plant", "bed",
+      "dining table", "toilet", "tv",
+      "laptop", "mouse", "remote",
+      "keyboard", "cell phone", "microwave",
+      "oven", "toaster", "sink",
+      "refrigerator", "book", "clock",
+      "vase", "scissors", "teddy bear",
+      "hair drier", "toothbrush"};
+
+  cv::Mat image = im.clone();
+  int src_w = image.cols;
+  int src_h = image.rows;
+  int thickness = 2;
+  // Pass the number of classes, not the byte size of the array.
+  auto colormap =
+      GenerateColorMap(sizeof(class_names) / sizeof(class_names[0]));
+
+  for (size_t i = 0; i < bboxes.size(); i++) {
+    const BoxInfo &bbox = bboxes[i];
+    std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". "
+              << bbox.y2 << ". 
" << std::endl; + int c1 = colormap[3 * bbox.label + 0]; + int c2 = colormap[3 * bbox.label + 1]; + int c3 = colormap[3 * bbox.label + 2]; + cv::Scalar color = cv::Scalar(c1, c2, c3); + // cv::Scalar color = cv::Scalar(0, 0, 255); + cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1), + cv::Point(bbox.x2, bbox.y2)), + color, 1, cv::LINE_AA); + + char text[256]; + sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); + + int baseLine = 0; + cv::Size label_size = + cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); + + int x = bbox.x1; + int y = bbox.y1 - label_size.height - baseLine; + if (y < 0) + y = 0; + if (x + label_size.width > image.cols) + x = image.cols - label_size.width; + + cv::rectangle(image, cv::Rect(cv::Point(x, y), + cv::Size(label_size.width, + label_size.height + baseLine)), + color, -1); + + cv::putText(image, text, cv::Point(x, y + label_size.height), + cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1, + cv::LINE_AA); + } + + if (save_path == "None") { + cv::imshow("image", image); + } else { + cv::imwrite(save_path, image); + std::cout << save_path << std::endl; + } +} + +int image_demo(PicoDet &detector, const char *imagepath) { + std::vector filenames; + cv::glob(imagepath, filenames, false); + + for (auto img_name : filenames) { + cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR); + if (image.empty()) { + fprintf(stderr, "cv::imread %s failed\n", img_name.c_str()); + return -1; + } + std::vector results; + detector.detect(image, results, false); + std::cout << "detect done." << std::endl; + +#ifdef __SAVE_RESULT__ + std::string save_path = img_name; + draw_bboxes(image, results, save_path.replace(3, 4, "results")); +#else + draw_bboxes(image, results); + cv::waitKey(0); +#endif + } + return 0; +} + +int benchmark(PicoDet &detector, int width, int height) { + int loop_num = 100; + int warm_up = 8; + + double time_min = DBL_MAX; + double time_max = -DBL_MAX; + double time_avg = 0; + cv::Mat image(width, height, CV_8UC3, cv::Scalar(1, 1, 1)); + for (int i = 0; i < warm_up + loop_num; i++) { + auto start = std::chrono::steady_clock::now(); + std::vector results; + detector.detect(image, results, false); + auto end = std::chrono::steady_clock::now(); + + std::chrono::duration elapsed = end - start; + double time = elapsed.count(); + if (i >= warm_up) { + time_min = (std::min)(time_min, time); + time_max = (std::max)(time_max, time); + time_avg += time; + } + } + time_avg /= loop_num; + fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet", + time_min, time_max, time_avg); + return 0; +} + +int main(int argc, char **argv) { + int mode = atoi(argv[1]); + std::string model_path = argv[2]; + int height = 320; + int width = 320; + if (argc == 4) { + height = atoi(argv[3]); + width = atoi(argv[4]); + } + PicoDet detector = PicoDet(model_path, width, height, 4, 0.45, 0.3); + if (mode == 1) { + benchmark(detector, width, height); + } else { + if (argc != 5) { + std::cout << "Must set image file, such as ./picodet-mnn 0 " + "../picodet_s_320_lcnet.mnn 320 320 img.jpg" + << std::endl; + } + const char *images = argv[5]; + image_demo(detector, images); + } +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/picodet_mnn.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/picodet_mnn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a315f14a9e29f0958a2707a4e09fcdb78bd12b6c --- /dev/null +++ 
b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/picodet_mnn.cpp
@@ -0,0 +1,253 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn
+
+#include "picodet_mnn.hpp"
+
+using namespace std;
+
+PicoDet::PicoDet(const std::string &mnn_path, int input_width, int input_length,
+                 int num_thread_, float score_threshold_,
+                 float nms_threshold_) {
+  num_thread = num_thread_;
+  in_w = input_width;
+  in_h = input_length;
+  score_threshold = score_threshold_;
+  nms_threshold = nms_threshold_;
+
+  PicoDet_interpreter = std::shared_ptr<MNN::Interpreter>(
+      MNN::Interpreter::createFromFile(mnn_path.c_str()));
+  MNN::ScheduleConfig config;
+  config.numThread = num_thread;
+  MNN::BackendConfig backendConfig;
+  backendConfig.precision = (MNN::BackendConfig::PrecisionMode)2;
+  config.backendConfig = &backendConfig;
+
+  PicoDet_session = PicoDet_interpreter->createSession(config);
+
+  input_tensor = PicoDet_interpreter->getSessionInput(PicoDet_session, nullptr);
+}
+
+PicoDet::~PicoDet() {
+  PicoDet_interpreter->releaseModel();
+  PicoDet_interpreter->releaseSession(PicoDet_session);
+}
+
+int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list,
+                    bool has_postprocess) {
+  if (raw_image.empty()) {
+    std::cout << "image is empty, please check!"
<< std::endl;
+    return -1;
+  }
+
+  image_h = raw_image.rows;
+  image_w = raw_image.cols;
+  cv::Mat image;
+  cv::resize(raw_image, image, cv::Size(in_w, in_h));
+
+  PicoDet_interpreter->resizeTensor(input_tensor, {1, 3, in_h, in_w});
+  PicoDet_interpreter->resizeSession(PicoDet_session);
+  std::shared_ptr<MNN::CV::ImageProcess> pretreat(MNN::CV::ImageProcess::create(
+      MNN::CV::BGR, MNN::CV::BGR, mean_vals, 3, norm_vals, 3));
+  pretreat->convert(image.data, in_w, in_h, image.step[0], input_tensor);
+
+  auto start = chrono::steady_clock::now();
+
+  // run network
+  PicoDet_interpreter->runSession(PicoDet_session);
+
+  // get output data
+  std::vector<std::vector<BoxInfo>> results;
+  results.resize(num_class);
+
+  if (has_postprocess) {
+    auto bbox_out_tensor = PicoDet_interpreter->getSessionOutput(
+        PicoDet_session, nms_heads_info[0].c_str());
+    auto class_out_tensor = PicoDet_interpreter->getSessionOutput(
+        PicoDet_session, nms_heads_info[1].c_str());
+    // bbox branch
+    auto tensor_bbox_host =
+        new MNN::Tensor(bbox_out_tensor, MNN::Tensor::CAFFE);
+    bbox_out_tensor->copyToHostTensor(tensor_bbox_host);
+    auto bbox_output_shape = tensor_bbox_host->shape();
+    int output_size = 1;
+    for (int j = 0; j < bbox_output_shape.size(); ++j) {
+      output_size *= bbox_output_shape[j];
+    }
+    std::cout << "output_size:" << output_size << std::endl;
+    bbox_output_data_.resize(output_size);
+    std::copy_n(tensor_bbox_host->host<float>(), output_size,
+                bbox_output_data_.data());
+    delete tensor_bbox_host;
+    // class branch
+    auto tensor_class_host =
+        new MNN::Tensor(class_out_tensor, MNN::Tensor::CAFFE);
+    class_out_tensor->copyToHostTensor(tensor_class_host);
+    auto class_output_shape = tensor_class_host->shape();
+    output_size = 1;
+    for (int j = 0; j < class_output_shape.size(); ++j) {
+      output_size *= class_output_shape[j];
+    }
+    std::cout << "output_size:" << output_size << std::endl;
+    class_output_data_.resize(output_size);
+    std::copy_n(tensor_class_host->host<float>(), output_size,
+                class_output_data_.data());
+    delete tensor_class_host;
+  } else {
+    for (const auto &head_info : non_postprocess_heads_info) {
+      MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput(
+          PicoDet_session, head_info.cls_layer.c_str());
+      MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput(
+          PicoDet_session, head_info.dis_layer.c_str());
+
+      MNN::Tensor tensor_scores_host(tensor_scores,
+                                     tensor_scores->getDimensionType());
+      tensor_scores->copyToHostTensor(&tensor_scores_host);
+
+      MNN::Tensor tensor_boxes_host(tensor_boxes,
+                                    tensor_boxes->getDimensionType());
+      tensor_boxes->copyToHostTensor(&tensor_boxes_host);
+
+      decode_infer(&tensor_scores_host, &tensor_boxes_host, head_info.stride,
+                   score_threshold, results);
+    }
+  }
+
+  auto end = chrono::steady_clock::now();
+  chrono::duration<double> elapsed = end - start;
+  cout << "inference time:" << elapsed.count() << " s, ";
+
+  for (int i = 0; i < (int)results.size(); i++) {
+    nms(results[i], nms_threshold);
+
+    for (auto box : results[i]) {
+      box.x1 = box.x1 / in_w * image_w;
+      box.x2 = box.x2 / in_w * image_w;
+      box.y1 = box.y1 / in_h * image_h;
+      box.y2 = box.y2 / in_h * image_h;
+      result_list.push_back(box);
+    }
+  }
+  cout << "detect " << result_list.size() << " objects" << endl;
+
+  return 0;
+}
+
+void PicoDet::decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred,
+                           int stride, float threshold,
+                           std::vector<std::vector<BoxInfo>> &results) {
+  int feature_h = ceil((float)in_h / stride);
+  int feature_w = ceil((float)in_w / stride);
+
+  for (int idx = 0; idx < feature_h * feature_w; idx++) {
+    const float *scores =
cls_pred->host<float>() + (idx * num_class);
+    int row = idx / feature_w;
+    int col = idx % feature_w;
+    float score = 0;
+    int cur_label = 0;
+    for (int label = 0; label < num_class; label++) {
+      if (scores[label] > score) {
+        score = scores[label];
+        cur_label = label;
+      }
+    }
+    if (score > threshold) {
+      const float *bbox_pred =
+          dis_pred->host<float>() + (idx * 4 * (reg_max + 1));
+      results[cur_label].push_back(
+          disPred2Bbox(bbox_pred, cur_label, score, col, row, stride));
+    }
+  }
+}
+
+BoxInfo PicoDet::disPred2Bbox(const float *&dfl_det, int label, float score,
+                              int x, int y, int stride) {
+  float ct_x = (x + 0.5) * stride;
+  float ct_y = (y + 0.5) * stride;
+  std::vector<float> dis_pred;
+  dis_pred.resize(4);
+  for (int i = 0; i < 4; i++) {
+    float dis = 0;
+    float *dis_after_sm = new float[reg_max + 1];
+    activation_function_softmax(dfl_det + i * (reg_max + 1), dis_after_sm,
+                                reg_max + 1);
+    for (int j = 0; j < reg_max + 1; j++) {
+      dis += j * dis_after_sm[j];
+    }
+    dis *= stride;
+    dis_pred[i] = dis;
+    delete[] dis_after_sm;
+  }
+  float xmin = (std::max)(ct_x - dis_pred[0], .0f);
+  float ymin = (std::max)(ct_y - dis_pred[1], .0f);
+  float xmax = (std::min)(ct_x + dis_pred[2], (float)in_w);
+  float ymax = (std::min)(ct_y + dis_pred[3], (float)in_h);
+  return BoxInfo{xmin, ymin, xmax, ymax, score, label};
+}
+
+void PicoDet::nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH) {
+  std::sort(input_boxes.begin(), input_boxes.end(),
+            [](BoxInfo a, BoxInfo b) { return a.score > b.score; });
+  std::vector<float> vArea(input_boxes.size());
+  for (int i = 0; i < int(input_boxes.size()); ++i) {
+    vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) *
+               (input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1);
+  }
+  for (int i = 0; i < int(input_boxes.size()); ++i) {
+    for (int j = i + 1; j < int(input_boxes.size());) {
+      float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1);
+      float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1);
+      float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2);
+      float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2);
+      float w = (std::max)(float(0), xx2 - xx1 + 1);
+      float h = (std::max)(float(0), yy2 - yy1 + 1);
+      float inter = w * h;
+      float ovr = inter / (vArea[i] + vArea[j] - inter);
+      if (ovr >= NMS_THRESH) {
+        input_boxes.erase(input_boxes.begin() + j);
+        vArea.erase(vArea.begin() + j);
+      } else {
+        j++;
+      }
+    }
+  }
+}
+
+inline float fast_exp(float x) {
+  union {
+    uint32_t i;
+    float f;
+  } v{};
+  v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
+  return v.f;
+}
+
+inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); }
+
+template <typename _Tp>
+int activation_function_softmax(const _Tp *src, _Tp *dst, int length) {
+  const _Tp alpha = *std::max_element(src, src + length);
+  _Tp denominator{0};
+
+  for (int i = 0; i < length; ++i) {
+    dst[i] = fast_exp(src[i] - alpha);
+    denominator += dst[i];
+  }
+
+  for (int i = 0; i < length; ++i) {
+    dst[i] /= denominator;
+  }
+
+  return 0;
+}
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/picodet_mnn.hpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/picodet_mnn.hpp
new file mode 100644
index 0000000000000000000000000000000000000000..4744040e258498afd70ee587ffc0ae0b39d24faa
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn/picodet_mnn.hpp
@@ -0,0 +1,108 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#ifndef __PicoDet_H__
+#define __PicoDet_H__
+
+#pragma once
+
+#include "Interpreter.hpp"
+
+#include "ImageProcess.hpp"
+#include "MNNDefine.h"
+#include "Tensor.hpp"
+#include <algorithm>
+#include <chrono>
+#include <iostream>
+#include <memory>
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+typedef struct NonPostProcessHeadInfo_ {
+  std::string cls_layer;
+  std::string dis_layer;
+  int stride;
+} NonPostProcessHeadInfo;
+
+typedef struct BoxInfo_ {
+  float x1;
+  float y1;
+  float x2;
+  float y2;
+  float score;
+  int label;
+} BoxInfo;
+
+class PicoDet {
+public:
+  PicoDet(const std::string &mnn_path, int input_width, int input_length,
+          int num_thread_ = 4, float score_threshold_ = 0.5,
+          float nms_threshold_ = 0.3);
+
+  ~PicoDet();
+
+  int detect(cv::Mat &img, std::vector<BoxInfo> &result_list,
+             bool has_postprocess);
+
+private:
+  void decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred, int stride,
+                    float threshold,
+                    std::vector<std::vector<BoxInfo>> &results);
+  BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x,
+                       int y, int stride);
+  void nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH);
+
+private:
+  std::shared_ptr<MNN::Interpreter> PicoDet_interpreter;
+  MNN::Session *PicoDet_session = nullptr;
+  MNN::Tensor *input_tensor = nullptr;
+
+  int num_thread;
+  int image_w;
+  int image_h;
+
+  int in_w = 320;
+  int in_h = 320;
+
+  float score_threshold;
+  float nms_threshold;
+
+  const float mean_vals[3] = {103.53f, 116.28f, 123.675f};
+  const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f};
+
+  const int num_class = 80;
+  const int reg_max = 7;
+
+  std::vector<float> bbox_output_data_;
+  std::vector<float> class_output_data_;
+
+  std::vector<std::string> nms_heads_info{"tmp_16", "concat_4.tmp_0"};
+  // If the post-process is not exported, non_postprocess_heads_info is used
+  std::vector<NonPostProcessHeadInfo> non_postprocess_heads_info{
+      // cls_pred|dis_pred|stride
+      {"transpose_0.tmp_0", "transpose_1.tmp_0", 8},
+      {"transpose_2.tmp_0", "transpose_3.tmp_0", 16},
+      {"transpose_4.tmp_0", "transpose_5.tmp_0", 32},
+      {"transpose_6.tmp_0", "transpose_7.tmp_0", 64},
+  };
+};
+
+template <typename _Tp>
+int activation_function_softmax(const _Tp *src, _Tp *dst, int length);
+
+inline float fast_exp(float x);
+inline float sigmoid(float x);
+
+#endif
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/CMakeLists.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..84bf51a93e17295669e3509a12a00a3cf2fea19c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/CMakeLists.txt
@@ -0,0 +1,26 @@
+cmake_minimum_required(VERSION 3.9)
+
+project(tinypose-mnn)
+
+set(CMAKE_CXX_STANDARD 17)
+set(MNN_DIR {YOUR_MNN_DIR})
+
+find_package(OpenCV REQUIRED)
+
+include_directories(
+  ${MNN_DIR}/include
+  ${MNN_DIR}/include/MNN
+  ${CMAKE_SOURCE_DIR}
+)
+link_directories(mnn/lib)
+
+add_library(libMNN SHARED IMPORTED)
+set_target_properties(
+  libMNN
+  PROPERTIES IMPORTED_LOCATION
+  ${CMAKE_SOURCE_DIR}/mnn/lib/libMNN.so
+)
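+
+# NOTE: {YOUR_MNN_DIR} above is a placeholder -- replace it with the path to
+# your local MNN checkout before running cmake, e.g. set(MNN_DIR /path/to/MNN).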
+add_executable(tinypose-mnn main.cpp picodet_mnn.cpp keypoint_detector.cpp keypoint_postprocess.cpp)
+
+target_link_libraries(tinypose-mnn libMNN ${OpenCV_LIBS})
+
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/CMakeLists_armv8.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/CMakeLists_armv8.txt
new file mode 100644
index 0000000000000000000000000000000000000000..027f0dd970f0c9d07f20ab026b3d90d9f1af3ddc
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/CMakeLists_armv8.txt
@@ -0,0 +1,47 @@
+cmake_minimum_required(VERSION 3.9)
+
+project(tinypose-mnn)
+
+set(CMAKE_CXX_STANDARD 17)
+set(MNN_DIR {YOUR_MNN_DIR})
+set(NDK_ROOT {YOUR_ANDROID_NDK_PATH})
+set(LDFLAGS -latomic -pthread -ldl -llog -lz -static-libstdc++)
+
+set(OpenCV_DIR ${CMAKE_SOURCE_DIR}/third/opencv4.1.0/arm64-v8a)
+
+set(OpenCV_DEPS ${OpenCV_DIR}/libs/libopencv_imgcodecs.a
+    ${OpenCV_DIR}/libs/libopencv_imgproc.a
+    ${OpenCV_DIR}/libs/libopencv_core.a
+    ${OpenCV_DIR}/3rdparty/libs/libtegra_hal.a
+    ${OpenCV_DIR}/3rdparty/libs/liblibjpeg-turbo.a
+    ${OpenCV_DIR}/3rdparty/libs/liblibwebp.a
+    ${OpenCV_DIR}/3rdparty/libs/liblibpng.a
+    ${OpenCV_DIR}/3rdparty/libs/liblibjasper.a
+    ${OpenCV_DIR}/3rdparty/libs/liblibtiff.a
+    ${OpenCV_DIR}/3rdparty/libs/libIlmImf.a
+    ${OpenCV_DIR}/3rdparty/libs/libtbb.a
+    ${OpenCV_DIR}/3rdparty/libs/libcpufeatures.a)
+
+set(FLAGS "-pie -Wl,--gc-sections -funwind-tables -no-canonical-prefixes -D__ANDROID_API__=21 -fexceptions -frtti -std=c++11 -O3 -DNDEBUG -fPIE -fopenmp")
+set(CMAKE_CXX_FLAGS "--sysroot=${NDK_ROOT}/sysroot ${FLAGS}")
+
+set(STDCXX ${NDK_ROOT}/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++_static.a
+    ${NDK_ROOT}/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++abi.a
+    ${NDK_ROOT}/platforms/android-21/arch-arm64/usr/lib/libstdc++.a)
+set(SYS_INCS ${NDK_ROOT}/sysroot/usr/include/aarch64-linux-android/ ${NDK_ROOT}/sources/cxx-stl/llvm-libc++/include/ ${NDK_ROOT}/sources/cxx-stl/llvm-libc++abi/include/ ${NDK_ROOT}/sources/android/support/include/ ${NDK_ROOT}/sysroot/usr/include/)
+
+include_directories(
+  ${SYS_INCS}
+  ${OpenCV_DIR}/include
+  ${MNN_DIR}/include
+  ${MNN_DIR}/include/MNN
+  ${CMAKE_SOURCE_DIR}
+)
+
+link_directories(${NDK_ROOT}/platforms/android-21/arch-arm64)
+link_directories(${MNN_DIR}/project/android/build_64)
+
+add_executable(tinypose-mnn picodet_mnn.cpp keypoint_postprocess.cpp keypoint_detector.cpp main.cpp)
+
+target_link_libraries(tinypose-mnn -lMNN ${OpenCV_DEPS} ${STDCXX} ${LDFLAGS})
+
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..6445e603ad87f87f1a435a4d9822716f90a8ebb9
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/README.md
@@ -0,0 +1,116 @@
+# TinyPose MNN Demo
+
+This folder provides PicoDet+TinyPose inference code using
+[Alibaba's MNN framework](https://github.com/alibaba/MNN). Most of the
+implementation in this folder is the same as *demo_ncnn*.
+
+## Install MNN
+
+### Python library
+
+Just run:
+
+``` shell
+pip install MNN
+```
+
+### C++ library
+
+Please follow the [official document](https://www.yuque.com/mnn/en/build_linux) to build the MNN engine.
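+
+For reference, a typical Linux build of the C++ library looks like the sketch
+below (paths and options are illustrative; consult the official document for
+your platform and the full set of build flags):
+
+``` shell
+# fetch MNN and generate the flatbuffer schema headers
+git clone https://github.com/alibaba/MNN.git
+cd MNN
+./schema/generate.sh
+# build the shared library with default options
+mkdir build && cd build
+cmake ..
+make -j4
+```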
+
+- Create picodet_m_416_coco.onnx and tinypose256.onnx. For example:
+  ```shell
+  modelName=picodet_m_416_coco
+  # export model
+  python tools/export_model.py \
+        -c configs/picodet/${modelName}.yml \
+        -o weights=${modelName}.pdparams \
+        --output_dir=inference_model
+  # convert to onnx
+  paddle2onnx --model_dir inference_model/${modelName} \
+  --model_filename model.pdmodel \
+  --params_filename model.pdiparams \
+  --opset_version 11 \
+  --save_file ${modelName}.onnx
+  # onnxsim
+  python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx
+  ```
+
+- Convert the simplified ONNX model to MNN format. For example:
+  ``` shell
+  python -m MNN.tools.mnnconvert -f ONNX --modelFile picodet-416.onnx --MNNModel picodet-416.mnn
+  ```
+Converted models are available here:
+[picodet_m_416](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416.mnn),
+[tinypose256](https://paddledet.bj.bcebos.com/deploy/third_engine/tinypose256.mnn).
+
+## Build
+
+For the C++ code, replace `libMNN.so` under *./mnn/lib* with the one you just compiled, modify the OpenCV and MNN paths in the CMake file,
+and run
+
+``` shell
+mkdir build && cd build
+cmake ..
+make
+```
+
+Note that a flag in `main.cpp` controls whether to show the detection result or save it into a folder.
+
+``` c++
+#define __SAVE_RESULT__ // if defined save drawed results to ../results, else show it in windows
+```
+
+#### ARM Build
+
+Prepare the OpenCV library [OpenCV_4_1](https://paddle-inference-dist.bj.bcebos.com/opencv4.1.0.tar.gz).
+
+``` shell
+mkdir third && cd third
+wget https://paddle-inference-dist.bj.bcebos.com/opencv4.1.0.tar.gz
+tar -zxvf opencv4.1.0.tar.gz
+cd ..
+
+mkdir build && cd build
+cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI="arm64-v8a" -DANDROID_PLATFORM=android-21 -DANDROID_TOOLCHAIN=gcc ..
+make
+```
+
+## Run
+
+To detect images in a folder, run:
+``` shell
+./tinypose-mnn [mode] [image_file]
+```
+| param | detail |
+| ---- | ---- |
+| mode | input mode: 0 = camera; 1 = image; 2 = video; 3 = benchmark |
+| image_file | input image path (for mode 0 the camera id; for mode 3 pass 0) |
+
+for example:
+
+``` shell
+./tinypose-mnn "1" "../imgs/test.jpg"
+```
+
+For the speed benchmark:
+
+``` shell
+./tinypose-mnn "3" "0"
+```
+
+## Benchmark
+Platform: Kirin 980
+Model: [tinypose256](https://paddledet.bj.bcebos.com/deploy/third_engine/tinypose256.mnn)
+
+| config | Min(s) | Max(s) | Avg(s) |
+| -------- | ------ | ------ | ------ |
+| Thread=4 | 0.018 | 0.021 | 0.019 |
+| Thread=1 | 0.031 | 0.041 | 0.032 |
+
+
+
+## Reference
+[MNN](https://github.com/alibaba/MNN)
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_detector.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_detector.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..05fad66d853d721cbdfc5a9c839cdba9a75c7813
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_detector.cpp
@@ -0,0 +1,179 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and +// limitations under the License. +#include +// for setprecision +#include +#include +#include "keypoint_detector.h" + +namespace PaddleDetection { + +// Visualiztion MaskDetector results +cv::Mat VisualizeKptsResult(const cv::Mat& img, + const std::vector& results, + const std::vector& colormap, + float threshold) { + const int edge[][2] = {{0, 1}, + {0, 2}, + {1, 3}, + {2, 4}, + {3, 5}, + {4, 6}, + {5, 7}, + {6, 8}, + {7, 9}, + {8, 10}, + {5, 11}, + {6, 12}, + {11, 13}, + {12, 14}, + {13, 15}, + {14, 16}, + {11, 12}}; + cv::Mat vis_img = img.clone(); + for (int batchid = 0; batchid < results.size(); batchid++) { + for (int i = 0; i < results[batchid].num_joints; i++) { + if (results[batchid].keypoints[i * 3] > threshold) { + int x_coord = int(results[batchid].keypoints[i * 3 + 1]); + int y_coord = int(results[batchid].keypoints[i * 3 + 2]); + cv::circle(vis_img, + cv::Point2d(x_coord, y_coord), + 1, + cv::Scalar(0, 0, 255), + 2); + } + } + for (int i = 0; i < results[batchid].num_joints; i++) { + if (results[batchid].keypoints[edge[i][0] * 3] > threshold && + results[batchid].keypoints[edge[i][1] * 3] > threshold) { + int x_start = int(results[batchid].keypoints[edge[i][0] * 3 + 1]); + int y_start = int(results[batchid].keypoints[edge[i][0] * 3 + 2]); + int x_end = int(results[batchid].keypoints[edge[i][1] * 3 + 1]); + int y_end = int(results[batchid].keypoints[edge[i][1] * 3 + 2]); + cv::line(vis_img, + cv::Point2d(x_start, y_start), + cv::Point2d(x_end, y_end), + colormap[i], + 1); + } + } + } + return vis_img; +} + +void KeyPointDetector::Postprocess(std::vector& output, + std::vector& output_shape, + std::vector& idxout, + std::vector& idx_shape, + std::vector* result, + std::vector>& center_bs, + std::vector>& scale_bs) { + std::vector preds(output_shape[1] * 3, 0); + for (int batchid = 0; batchid < output_shape[0]; batchid++) { + get_final_preds(output, + output_shape, + idxout, + idx_shape, + center_bs[batchid], + scale_bs[batchid], + preds, + batchid, + this->use_dark()); + KeyPointResult result_item; + result_item.num_joints = output_shape[1]; + result_item.keypoints.clear(); + for (int i = 0; i < output_shape[1]; i++) { + result_item.keypoints.emplace_back(preds[i * 3]); + result_item.keypoints.emplace_back(preds[i * 3 + 1]); + result_item.keypoints.emplace_back(preds[i * 3 + 2]); + } + result->push_back(result_item); + } +} + +void KeyPointDetector::Predict(const std::vector imgs, + std::vector>& center_bs, + std::vector>& scale_bs, + std::vector* result) { + int batch_size = imgs.size(); + KeyPointDet_interpreter->resizeTensor(input_tensor, + {batch_size, 3, in_h, in_w}); + KeyPointDet_interpreter->resizeSession(KeyPointDet_session); + auto insize = 3 * in_h * in_w; + + // Preprocess image + cv::Mat resized_im; + for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) { + cv::Mat im = imgs.at(bs_idx); + + cv::resize(im, resized_im, cv::Size(in_w, in_h)); + std::shared_ptr pretreat( + MNN::CV::ImageProcess::create( + MNN::CV::BGR, MNN::CV::RGB, mean_vals, 3, norm_vals, 3)); + pretreat->convert( + resized_im.data, in_w, in_h, resized_im.step[0], input_tensor); + } + + // Run predictor + auto inference_start = std::chrono::steady_clock::now(); + + KeyPointDet_interpreter->runSession(KeyPointDet_session); + // Get output tensor + auto out_tensor = KeyPointDet_interpreter->getSessionOutput( + KeyPointDet_session, "conv2d_441.tmp_1"); + auto nchwoutTensor = new Tensor(out_tensor, Tensor::CAFFE); + 
out_tensor->copyToHostTensor(nchwoutTensor); + + auto output_shape = nchwoutTensor->shape(); + // Calculate output length + int output_size = 1; + for (int j = 0; j < output_shape.size(); ++j) { + output_size *= output_shape[j]; + } + output_data_.resize(output_size); + std::copy_n(nchwoutTensor->host(), output_size, output_data_.data()); + delete nchwoutTensor; + + auto idx_tensor = KeyPointDet_interpreter->getSessionOutput( + KeyPointDet_session, "argmax_0.tmp_0"); + + auto idxhostTensor = new Tensor(idx_tensor, Tensor::CAFFE); + idx_tensor->copyToHostTensor(idxhostTensor); + + auto idx_shape = idxhostTensor->shape(); + // Calculate output length + output_size = 1; + for (int j = 0; j < idx_shape.size(); ++j) { + output_size *= idx_shape[j]; + } + + idx_data_.resize(output_size); + std::copy_n(idxhostTensor->host(), output_size, idx_data_.data()); + delete idxhostTensor; + + auto inference_end = std::chrono::steady_clock::now(); + std::chrono::duration elapsed = inference_end - inference_start; + printf("keypoint inference time: %f s\n", elapsed.count()); + + // Postprocessing result + Postprocess(output_data_, + output_shape, + idx_data_, + idx_shape, + result, + center_bs, + scale_bs); +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_detector.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_detector.h new file mode 100644 index 0000000000000000000000000000000000000000..1c7af8921a0e3a0f649fd36a0f1d08a763a21256 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_detector.h @@ -0,0 +1,131 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
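+//
+// Overview: KeyPointDetector wraps an MNN session around a TinyPose .mnn
+// model. Predict() resizes each cropped person image, runs the session, and
+// reads two outputs (the joint heatmap tensor and an argmax index tensor);
+// Postprocess() then maps heatmap peaks back to source-image coordinates via
+// get_final_preds().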
+ +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "Interpreter.hpp" + +#include "ImageProcess.hpp" +#include "MNNDefine.h" +#include "Tensor.hpp" + +#include "keypoint_postprocess.h" + +using namespace MNN; + +namespace PaddleDetection { +// Object KeyPoint Result +struct KeyPointResult { + // Keypoints: shape(N x 3); N: number of Joints; 3: x,y,conf + std::vector keypoints; + int num_joints = -1; +}; + +// Visualiztion KeyPoint Result +cv::Mat VisualizeKptsResult(const cv::Mat& img, + const std::vector& results, + const std::vector& colormap, + float threshold = 0.2); + +class KeyPointDetector { + public: + explicit KeyPointDetector(const std::string& model_path, + int num_thread = 4, + int input_height = 256, + int input_width = 192, + float score_threshold = 0.3, + const int batch_size = 1, + bool use_dark = true) { + printf("config path: %s", + model_path.substr(0, model_path.find_last_of('/') + 1).c_str()); + use_dark_ = use_dark; + + in_w = input_width; + in_h = input_height; + threshold_ = score_threshold; + + KeyPointDet_interpreter = std::shared_ptr( + MNN::Interpreter::createFromFile(model_path.c_str())); + MNN::ScheduleConfig config; + config.type = MNN_FORWARD_CPU; + /*modeNum means gpuMode for GPU usage, Or means numThread for CPU usage.*/ + config.numThread = num_thread; + // If type not fount, let it failed + config.backupType = MNN_FORWARD_CPU; + BackendConfig backendConfig; + backendConfig.precision = static_cast(1); + config.backendConfig = &backendConfig; + + KeyPointDet_session = KeyPointDet_interpreter->createSession(config); + + input_tensor = + KeyPointDet_interpreter->getSessionInput(KeyPointDet_session, nullptr); + } + + ~KeyPointDetector() { + KeyPointDet_interpreter->releaseModel(); + KeyPointDet_interpreter->releaseSession(KeyPointDet_session); + } + + // Load Paddle inference model + void LoadModel(std::string model_file, int num_theads); + + // Run predictor + void Predict(const std::vector imgs, + std::vector>& center, + std::vector>& scale, + std::vector* result = nullptr); + + bool use_dark() { return this->use_dark_; } + + inline float get_threshold() { return threshold_; }; + + // const float mean_vals[3] = { 103.53f, 116.28f, 123.675f }; + // const float norm_vals[3] = { 0.017429f, 0.017507f, 0.017125f }; + const float mean_vals[3] = {0.f, 0.f, 0.f}; + const float norm_vals[3] = {1.f, 1.f, 1.f}; + int in_w = 128; + int in_h = 256; + + private: + // Postprocess result + void Postprocess(std::vector& output, + std::vector& output_shape, + std::vector& idxout, + std::vector& idx_shape, + std::vector* result, + std::vector>& center, + std::vector>& scale); + + std::vector output_data_; + std::vector idx_data_; + float threshold_; + bool use_dark_; + + std::shared_ptr KeyPointDet_interpreter; + MNN::Session* KeyPointDet_session = nullptr; + MNN::Tensor* input_tensor = nullptr; +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_postprocess.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_postprocess.cpp new file mode 100644 index 0000000000000000000000000000000000000000..fe6e8298d01be7e76ec4792a6b33f5c8d96ba518 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_postprocess.cpp @@ -0,0 +1,258 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "keypoint_postprocess.h" +#define PI 3.1415926535 +#define HALF_CIRCLE_DEGREE 180 + +cv::Point2f get_3rd_point(cv::Point2f& a, cv::Point2f& b) { + cv::Point2f direct{a.x - b.x, a.y - b.y}; + return cv::Point2f(a.x - direct.y, a.y + direct.x); +} + +std::vector get_dir(float src_point_x, + float src_point_y, + float rot_rad) { + float sn = sin(rot_rad); + float cs = cos(rot_rad); + std::vector src_result{0.0, 0.0}; + src_result[0] = src_point_x * cs - src_point_y * sn; + src_result[1] = src_point_x * sn + src_point_y * cs; + return src_result; +} + +void affine_tranform( + float pt_x, float pt_y, cv::Mat& trans, std::vector& preds, int p) { + double new1[3] = {pt_x, pt_y, 1.0}; + cv::Mat new_pt(3, 1, trans.type(), new1); + cv::Mat w = trans * new_pt; + preds[p * 3 + 1] = static_cast(w.at(0, 0)); + preds[p * 3 + 2] = static_cast(w.at(1, 0)); +} + +void get_affine_transform(std::vector& center, + std::vector& scale, + float rot, + std::vector& output_size, + cv::Mat& trans, + int inv) { + float src_w = scale[0]; + float dst_w = static_cast(output_size[0]); + float dst_h = static_cast(output_size[1]); + float rot_rad = rot * PI / HALF_CIRCLE_DEGREE; + std::vector src_dir = get_dir(-0.5 * src_w, 0, rot_rad); + std::vector dst_dir{static_cast(-0.5) * dst_w, 0.0}; + cv::Point2f srcPoint2f[3], dstPoint2f[3]; + srcPoint2f[0] = cv::Point2f(center[0], center[1]); + srcPoint2f[1] = cv::Point2f(center[0] + src_dir[0], center[1] + src_dir[1]); + srcPoint2f[2] = get_3rd_point(srcPoint2f[0], srcPoint2f[1]); + + dstPoint2f[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5); + dstPoint2f[1] = + cv::Point2f(dst_w * 0.5 + dst_dir[0], dst_h * 0.5 + dst_dir[1]); + dstPoint2f[2] = get_3rd_point(dstPoint2f[0], dstPoint2f[1]); + if (inv == 0) { + trans = cv::getAffineTransform(srcPoint2f, dstPoint2f); + } else { + trans = cv::getAffineTransform(dstPoint2f, srcPoint2f); + } +} + +void transform_preds(std::vector& coords, + std::vector& center, + std::vector& scale, + std::vector& output_size, + std::vector& dim, + std::vector& target_coords) { + cv::Mat trans(2, 3, CV_64FC1); + get_affine_transform(center, scale, 0, output_size, trans, 1); + for (int p = 0; p < dim[1]; ++p) { + affine_tranform(coords[p * 2], coords[p * 2 + 1], trans, target_coords, p); + } +} + +// only for batchsize == 1 +void get_max_preds(std::vector& heatmap, + std::vector& dim, + std::vector& preds, + std::vector& maxvals, + int batchid, + int joint_idx) { + int num_joints = dim[1]; + int width = dim[3]; + std::vector idx; + idx.resize(num_joints * 2); + + for (int j = 0; j < dim[1]; j++) { + float* index = &( + heatmap[batchid * num_joints * dim[2] * dim[3] + j * dim[2] * dim[3]]); + float* end = index + dim[2] * dim[3]; + float* max_dis = std::max_element(index, end); + auto max_id = std::distance(index, max_dis); + maxvals[j] = *max_dis; + if (*max_dis > 0) { + preds[j * 2] = static_cast(max_id % width); + preds[j * 2 + 1] = static_cast(max_id / width); + } + } +} + +void 
dark_parse(std::vector& heatmap, + std::vector& dim, + std::vector& coords, + int px, + int py, + int index, + int ch) { + /*DARK postpocessing, Zhang et al. Distribution-Aware Coordinate + Representation for Human Pose Estimation (CVPR 2020). + 1) offset = - hassian.inv() * derivative + 2) dx = (heatmap[x+1] - heatmap[x-1])/2. + 3) dxx = (dx[x+1] - dx[x-1])/2. + 4) derivative = Mat([dx, dy]) + 5) hassian = Mat([[dxx, dxy], [dxy, dyy]]) + */ + std::vector::const_iterator first1 = heatmap.begin() + index; + std::vector::const_iterator last1 = + heatmap.begin() + index + dim[2] * dim[3]; + std::vector heatmap_ch(first1, last1); + cv::Mat heatmap_mat = cv::Mat(heatmap_ch).reshape(0, dim[2]); + heatmap_mat.convertTo(heatmap_mat, CV_32FC1); + cv::GaussianBlur(heatmap_mat, heatmap_mat, cv::Size(3, 3), 0, 0); + heatmap_mat = heatmap_mat.reshape(1, 1); + heatmap_ch = std::vector(heatmap_mat.reshape(1, 1)); + + float epsilon = 1e-10; + // sample heatmap to get values in around target location + float xy = log(fmax(heatmap_ch[py * dim[3] + px], epsilon)); + float xr = log(fmax(heatmap_ch[py * dim[3] + px + 1], epsilon)); + float xl = log(fmax(heatmap_ch[py * dim[3] + px - 1], epsilon)); + + float xr2 = log(fmax(heatmap_ch[py * dim[3] + px + 2], epsilon)); + float xl2 = log(fmax(heatmap_ch[py * dim[3] + px - 2], epsilon)); + float yu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px], epsilon)); + float yd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px], epsilon)); + float yu2 = log(fmax(heatmap_ch[(py + 2) * dim[3] + px], epsilon)); + float yd2 = log(fmax(heatmap_ch[(py - 2) * dim[3] + px], epsilon)); + float xryu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px + 1], epsilon)); + float xryd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px + 1], epsilon)); + float xlyu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px - 1], epsilon)); + float xlyd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px - 1], epsilon)); + + // compute dx/dy and dxx/dyy with sampled values + float dx = 0.5 * (xr - xl); + float dy = 0.5 * (yu - yd); + float dxx = 0.25 * (xr2 - 2 * xy + xl2); + float dxy = 0.25 * (xryu - xryd - xlyu + xlyd); + float dyy = 0.25 * (yu2 - 2 * xy + yd2); + + // finally get offset by derivative and hassian, which combined by dx/dy and + // dxx/dyy + if (dxx * dyy - dxy * dxy != 0) { + float M[2][2] = {dxx, dxy, dxy, dyy}; + float D[2] = {dx, dy}; + cv::Mat hassian(2, 2, CV_32F, M); + cv::Mat derivative(2, 1, CV_32F, D); + cv::Mat offset = -hassian.inv() * derivative; + coords[ch * 2] += offset.at(0, 0); + coords[ch * 2 + 1] += offset.at(1, 0); + } +} + +void get_final_preds(std::vector& heatmap, + std::vector& dim, + std::vector& idxout, + std::vector& idxdim, + std::vector& center, + std::vector scale, + std::vector& preds, + int batchid, + bool DARK) { + std::vector coords; + coords.resize(dim[1] * 2); + int heatmap_height = dim[2]; + int heatmap_width = dim[3]; + + for (int j = 0; j < dim[1]; ++j) { + int index = (batchid * dim[1] + j) * dim[2] * dim[3]; + + int idx = idxout[batchid * dim[1] + j]; + preds[j * 3] = heatmap[index + idx]; + coords[j * 2] = idx % heatmap_width; + coords[j * 2 + 1] = idx / heatmap_width; + + int px = int(coords[j * 2] + 0.5); + int py = int(coords[j * 2 + 1] + 0.5); + + if (DARK && px > 1 && px < heatmap_width - 2 && py > 1 && + py < heatmap_height - 2) { + dark_parse(heatmap, dim, coords, px, py, index, j); + } else { + if (px > 0 && px < heatmap_width - 1) { + float diff_x = heatmap[index + py * dim[3] + px + 1] - + heatmap[index + py * dim[3] + px - 1]; + coords[j * 2] += diff_x > 0 ? 
1 : -1 * 0.25; + } + if (py > 0 && py < heatmap_height - 1) { + float diff_y = heatmap[index + (py + 1) * dim[3] + px] - + heatmap[index + (py - 1) * dim[3] + px]; + coords[j * 2 + 1] += diff_y > 0 ? 1 : -1 * 0.25; + } + } + } + + std::vector img_size{heatmap_width, heatmap_height}; + transform_preds(coords, center, scale, img_size, dim, preds); +} + +void CropImg(cv::Mat& img, + cv::Mat& crop_img, + std::vector& area, + std::vector& center, + std::vector& scale, + float expandratio) { + int crop_x1 = std::max(0, area[0]); + int crop_y1 = std::max(0, area[1]); + int crop_x2 = std::min(img.cols - 1, area[2]); + int crop_y2 = std::min(img.rows - 1, area[3]); + + int center_x = (crop_x1 + crop_x2) / 2.; + int center_y = (crop_y1 + crop_y2) / 2.; + int half_h = (crop_y2 - crop_y1) / 2.; + int half_w = (crop_x2 - crop_x1) / 2.; + + if (half_h * 3 > half_w * 4) { + half_w = static_cast(half_h * 0.75); + } else { + half_h = static_cast(half_w * 4 / 3); + } + + crop_x1 = + std::max(0, center_x - static_cast(half_w * (1 + expandratio))); + crop_y1 = + std::max(0, center_y - static_cast(half_h * (1 + expandratio))); + crop_x2 = std::min(img.cols - 1, + static_cast(center_x + half_w * (1 + expandratio))); + crop_y2 = std::min(img.rows - 1, + static_cast(center_y + half_h * (1 + expandratio))); + crop_img = + img(cv::Range(crop_y1, crop_y2 + 1), cv::Range(crop_x1, crop_x2 + 1)); + + center.clear(); + center.emplace_back((crop_x1 + crop_x2) / 2); + center.emplace_back((crop_y1 + crop_y2) / 2); + scale.clear(); + scale.emplace_back((crop_x2 - crop_x1)); + scale.emplace_back((crop_y2 - crop_y1)); +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_postprocess.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_postprocess.h new file mode 100644 index 0000000000000000000000000000000000000000..928a57bb0be897e4e368bb676f0de3268ffd4c7e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/keypoint_postprocess.h @@ -0,0 +1,67 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
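+//
+// Overview: heatmap decoding helpers for TinyPose. get_final_preds() locates
+// each joint's heatmap peak, optionally refines it with the DARK
+// distribution-aware method (dark_parse), and transform_preds() maps the
+// result back to the original image via an affine transform built from the
+// crop's center and scale. CropImg() expands a detection box to the model's
+// 3:4 aspect ratio before cropping.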
+ +#pragma once + +#include +#include +#include +#include + +std::vector get_3rd_point(std::vector& a, std::vector& b); +std::vector get_dir(float src_point_x, float src_point_y, float rot_rad); +void affine_tranform(float pt_x, + float pt_y, + cv::Mat& trans, + std::vector& x, + int p, + int num); +cv::Mat get_affine_transform(std::vector& center, + std::vector& scale, + float rot, + std::vector& output_size, + int inv); +void transform_preds(std::vector& coords, + std::vector& center, + std::vector& scale, + std::vector& output_size, + std::vector& dim, + std::vector& target_coords); +void box_to_center_scale(std::vector& box, + int width, + int height, + std::vector& center, + std::vector& scale); +void get_max_preds(std::vector& heatmap, + std::vector& dim, + std::vector& preds, + std::vector& maxvals, + int batchid, + int joint_idx); +void get_final_preds(std::vector& heatmap, + std::vector& dim, + std::vector& idxout, + std::vector& idxdim, + std::vector& center, + std::vector scale, + std::vector& preds, + int batchid, + bool DARK = true); + +void CropImg(cv::Mat& img, + cv::Mat& crop_img, + std::vector& area, + std::vector& center, + std::vector& scale, + float expandratio = 0.25); diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/main.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/main.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f03e983c6271b6804cde2829604d4f3be369fdd4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/main.cpp @@ -0,0 +1,424 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn + +#include <cfloat> +#include <chrono> +#include <cmath> +#include <cstdio> +#include <cstring> +#include <iostream> +#include <vector> +#include <opencv2/opencv.hpp> +#include "keypoint_detector.h" +#include "picodet_mnn.h" + +#define __SAVE_RESULT__ // if defined save drawn results to ../results, else + // show it in windows + +using namespace PaddleDetection; + +struct object_rect { + int x; + int y; + int width; + int height; +}; + +int resize_uniform(cv::Mat& src, + cv::Mat& dst, + cv::Size dst_size, + object_rect& effect_area) { + int w = src.cols; + int h = src.rows; + int dst_w = dst_size.width; + int dst_h = dst_size.height; + dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0)); + + float ratio_src = w * 1.0 / h; + float ratio_dst = dst_w * 1.0 / dst_h; + + int tmp_w = 0; + int tmp_h = 0; + if (ratio_src > ratio_dst) { + tmp_w = dst_w; + tmp_h = floor((dst_w * 1.0 / w) * h); + } else if (ratio_src < ratio_dst) { + tmp_h = dst_h; + tmp_w = floor((dst_h * 1.0 / h) * w); + } else { + cv::resize(src, dst, dst_size); + effect_area.x = 0; + effect_area.y = 0; + effect_area.width = dst_w; + effect_area.height = dst_h; + return 0; + } + cv::Mat tmp; + cv::resize(src, tmp, cv::Size(tmp_w, tmp_h)); + + if (tmp_w != dst_w) { + int index_w = floor((dst_w - tmp_w) / 2.0); + for (int i = 0; i < dst_h; i++) { + memcpy(dst.data + i * dst_w * 3 + index_w * 3, + tmp.data + i * tmp_w * 3, + tmp_w * 3); + } + effect_area.x = index_w; + effect_area.y = 0; + effect_area.width = tmp_w; + effect_area.height = tmp_h; + } else if (tmp_h != dst_h) { + int index_h = floor((dst_h - tmp_h) / 2.0); + memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3); + effect_area.x = 0; + effect_area.y = index_h; + effect_area.width = tmp_w; + effect_area.height = tmp_h; + } else { + printf("error\n"); + } + return 0; +} + +const int color_list[80][3] = { + {216, 82, 24}, {236, 176, 31}, {125, 46, 141}, {118, 171, 47}, + {76, 189, 237}, {238, 19, 46}, {76, 76, 76}, {153, 153, 153}, + {255, 0, 0}, {255, 127, 0}, {190, 190, 0}, {0, 255, 0}, + {0, 0, 255}, {170, 0, 255}, {84, 84, 0}, {84, 170, 0}, + {84, 255, 0}, {170, 84, 0}, {170, 170, 0}, {170, 255, 0}, + {255, 84, 0}, {255, 170, 0}, {255, 255, 0}, {0, 84, 127}, + {0, 170, 127}, {0, 255, 127}, {84, 0, 127}, {84, 84, 127}, + {84, 170, 127}, {84, 255, 127}, {170, 0, 127}, {170, 84, 127}, + {170, 170, 127}, {170, 255, 127}, {255, 0, 127}, {255, 84, 127}, + {255, 170, 127}, {255, 255, 127}, {0, 84, 255}, {0, 170, 255}, + {0, 255, 255}, {84, 0, 255}, {84, 84, 255}, {84, 170, 255}, + {84, 255, 255}, {170, 0, 255}, {170, 84, 255}, {170, 170, 255}, + {170, 255, 255}, {255, 0, 255}, {255, 84, 255}, {255, 170, 255}, + {42, 0, 0}, {84, 0, 0}, {127, 0, 0}, {170, 0, 0}, + {212, 0, 0}, {255, 0, 0}, {0, 42, 0}, {0, 84, 0}, + {0, 127, 0}, {0, 170, 0}, {0, 212, 0}, {0, 255, 0}, + {0, 0, 42}, {0, 0, 84}, {0, 0, 127}, {0, 0, 170}, + {0, 0, 212}, {0, 0, 255}, {0, 0, 0}, {36, 36, 36}, + {72, 72, 72}, {109, 109, 109}, {145, 145, 145}, {182, 182, 182}, + {218, 218, 218}, {0, 113, 188}, {80, 182, 188}, {127, 127, 0}, +}; + +void draw_bboxes(const cv::Mat& bgr, + const std::vector<BoxInfo>& bboxes, + object_rect effect_roi, + std::string save_path = "None") { + static const char* class_names[] = { + "person", "bicycle", "car", + "motorcycle", "airplane", "bus", + "train", "truck", "boat", + "traffic light", "fire hydrant", "stop sign", + "parking meter", "bench", "bird", + "cat", "dog", "horse", + "sheep", "cow", "elephant", + "bear", "zebra", "giraffe", + "backpack", "umbrella", "handbag", + "tie", "suitcase", "frisbee", + "skis", "snowboard", "sports ball", + "kite", "baseball bat", "baseball glove", + "skateboard", "surfboard", "tennis racket", + "bottle", "wine glass", "cup", + "fork", "knife", "spoon", + "bowl", "banana", "apple", + "sandwich", "orange", "broccoli", + "carrot", "hot dog", "pizza", + "donut", "cake", "chair", + "couch", "potted plant", "bed", + "dining table", "toilet", "tv", + "laptop", "mouse", "remote", + "keyboard", "cell phone", "microwave", + "oven", "toaster", "sink", + "refrigerator", "book", "clock", + "vase", "scissors", "teddy bear", + "hair drier", "toothbrush"}; + + cv::Mat image = bgr.clone(); + int src_w = image.cols; + int src_h = image.rows; + int dst_w = effect_roi.width; + int dst_h = effect_roi.height; + float width_ratio = (float)src_w / (float)dst_w; + float height_ratio = (float)src_h / (float)dst_h; + + for (size_t i = 0; i < bboxes.size(); i++) { + const BoxInfo& bbox = bboxes[i]; + cv::Scalar color = cv::Scalar(color_list[bbox.label][0], + color_list[bbox.label][1], + color_list[bbox.label][2]); + cv::rectangle(image, + cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio, + (bbox.y1 - effect_roi.y) * height_ratio), + cv::Point((bbox.x2 - effect_roi.x) * width_ratio, + (bbox.y2 - effect_roi.y) * height_ratio)), + color); + + char text[256]; + sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); + + int baseLine = 0; + cv::Size label_size = + cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); + + int x = (bbox.x1 - effect_roi.x) * width_ratio; + int y = + (bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine; + if (y < 0) y = 0; + if (x + label_size.width > image.cols) x = image.cols - label_size.width; + + cv::rectangle( + image, + cv::Rect(cv::Point(x, y), + cv::Size(label_size.width, label_size.height + baseLine)), + color, + -1); + + cv::putText(image, + text, + cv::Point(x, y + label_size.height), + cv::FONT_HERSHEY_SIMPLEX, + 0.4, + cv::Scalar(255, 255, 255)); + } + + if (save_path == "None") { + cv::imshow("image", image); + } else { + cv::imwrite(save_path, image); + std::cout << save_path << std::endl; + } +} + +std::vector<BoxInfo> coordsback(const cv::Mat image, + const object_rect effect_roi, + const std::vector<BoxInfo>& bboxes) { + int src_w = image.cols; + int src_h = image.rows; + int dst_w = effect_roi.width; + int dst_h = effect_roi.height; + float width_ratio = (float)src_w / (float)dst_w; + float height_ratio = (float)src_h / (float)dst_h; + + std::vector<BoxInfo> bboxes_oimg; + + for (int i = 0; i < bboxes.size(); i++) { + auto bbox = bboxes[i]; + bbox.x1 = (bbox.x1 - effect_roi.x) * width_ratio; + bbox.y1 = (bbox.y1 - effect_roi.y) * height_ratio; + bbox.x2 = (bbox.x2 - effect_roi.x) * width_ratio; + bbox.y2 = (bbox.y2 - effect_roi.y) * height_ratio; + bboxes_oimg.emplace_back(bbox); + } + return bboxes_oimg; +} + +void image_infer_kpts(KeyPointDetector* kpts_detector, + cv::Mat image, + const object_rect effect_roi, + const std::vector<BoxInfo>& results, + std::string img_name = "kpts_vis", + bool save_img = true) { + std::vector<cv::Mat> cropimgs; + std::vector<std::vector<float>> center_bs; + std::vector<std::vector<float>> scale_bs; + std::vector<KeyPointResult> kpts_results; + auto results_oimg = coordsback(image, effect_roi, results); + + for (int i = 0; i < results_oimg.size(); i++) { + auto rect = results_oimg[i]; + if (rect.label == 0) { + cv::Mat cropimg; + std::vector<float> center, scale; + std::vector<int> area = {static_cast<int>(rect.x1), + static_cast<int>(rect.y1), + static_cast<int>(rect.x2), + static_cast<int>(rect.y2)}; + CropImg(image, cropimg, area, center, scale); + // cv::imwrite("./test_crop_"+std::to_string(i)+".jpg", cropimg); + cropimgs.emplace_back(cropimg); + center_bs.emplace_back(center); + scale_bs.emplace_back(scale); + } + if (cropimgs.size() == 1 || + (cropimgs.size() > 0 && i == results_oimg.size() - 1)) { + kpts_detector->Predict(cropimgs, center_bs, scale_bs, &kpts_results); + cropimgs.clear(); + center_bs.clear(); + scale_bs.clear(); + } + } + std::vector<int> compression_params; + compression_params.push_back(cv::IMWRITE_JPEG_QUALITY); + compression_params.push_back(95); + std::string kpts_savepath = + "keypoint_" + img_name.substr(img_name.find_last_of('/') + 1); + cv::Mat kpts_vis_img = + VisualizeKptsResult(image, kpts_results, {0, 255, 0}, 0.3); + if (save_img) { + cv::imwrite(kpts_savepath, kpts_vis_img, compression_params); + printf("Visualized output saved as %s\n", kpts_savepath.c_str()); + } else { + cv::imshow("image", kpts_vis_img); + } +} + +int image_demo(PicoDet& detector, + KeyPointDetector* kpts_detector, + const char* imagepath) { + std::vector<cv::String> filenames; + cv::glob(imagepath, filenames, false); + + for (auto img_name : filenames) { + cv::Mat image = cv::imread(img_name); + if (image.empty()) { + fprintf(stderr, "cv::imread %s failed\n", img_name.c_str()); + return -1; + } + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); + std::vector<BoxInfo> results; + detector.detect(resized_img, results); + if (kpts_detector) { + image_infer_kpts(kpts_detector, image, effect_roi, results, img_name); + } + } + return 0; +} + +int webcam_demo(PicoDet& detector, + KeyPointDetector* kpts_detector, + int cam_id) { + cv::Mat image; + cv::VideoCapture cap(cam_id); + + while (true) { + cap >> image; + if (image.empty()) break; // stop when the stream ends + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); + std::vector<BoxInfo> results; + detector.detect(resized_img, results); + if (kpts_detector) { + image_infer_kpts(kpts_detector, image, effect_roi, results, "", false); + } + } + return 0; +} + +int video_demo(PicoDet& detector, + KeyPointDetector* kpts_detector, + const char* path) { + cv::Mat image; + cv::VideoCapture cap(path); + + while (true) { + cap >> image; + if (image.empty()) break; // stop when the stream ends + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); + std::vector<BoxInfo> results; + detector.detect(resized_img, results); + if (kpts_detector) { + image_infer_kpts(kpts_detector, image, effect_roi, results, "", false); + } + } + return 0; +} + +int benchmark(KeyPointDetector* kpts_detector) { + int loop_num = 100; + int warm_up = 8; + + double time_min = DBL_MAX; + double time_max = -DBL_MAX; + double time_avg = 0; + cv::Mat image(256, 192, CV_8UC3, cv::Scalar(1, 1, 1)); + std::vector<float> center = {128, 96}; + std::vector<float> scale = {256, 192}; + std::vector<cv::Mat> cropimgs = {image}; + std::vector<std::vector<float>> center_bs = {center}; + std::vector<std::vector<float>> scale_bs = {scale}; + std::vector<KeyPointResult> kpts_results; + + for (int i = 0; i < warm_up + loop_num; i++) { + auto start = std::chrono::steady_clock::now(); + std::vector<BoxInfo> results; + kpts_detector->Predict(cropimgs, center_bs, scale_bs, &kpts_results); + auto end = std::chrono::steady_clock::now(); + + std::chrono::duration<double> elapsed = end - start; + double time = elapsed.count(); + if (i >= warm_up) { + time_min = (std::min)(time_min, time); + time_max = (std::max)(time_max, time); + time_avg += time; + } + } + time_avg /= loop_num; + fprintf(stderr, + "%20s min = %7.2f max = %7.2f avg = %7.2f\n", + "tinypose", + time_min, + time_max, + time_avg); + 
return 0; +} + +int main(int argc, char** argv) { + if (argc != 3) { + fprintf(stderr, + "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n " + "For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; " + "\n For benchmark, mode=3 path=0.\n", + argv[0]); + return -1; + } + PicoDet detector = + PicoDet("../weight/picodet_m_416.mnn", 416, 416, 4, 0.45, 0.3); + KeyPointDetector* kpts_detector = + new KeyPointDetector("../weight/tinypose256.mnn", 4, 256, 192); + int mode = atoi(argv[1]); + switch (mode) { + case 0: { + int cam_id = atoi(argv[2]); + webcam_demo(detector, kpts_detector, cam_id); + break; + } + case 1: { + const char* images = argv[2]; + image_demo(detector, kpts_detector, images); + break; + } + case 2: { + const char* path = argv[2]; + video_demo(detector, kpts_detector, path); + break; + } + case 3: { + benchmark(kpts_detector); + break; + } + default: { + fprintf(stderr, + "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; " + "\n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, " + "mode=2; \n For benchmark, mode=3 path=0.\n", + argv[0]); + break; + } + } + delete kpts_detector; + kpts_detector = nullptr; +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/picodet_mnn.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/picodet_mnn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9f38e68c16317e7bc2019a6e533e550bd6607f93 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/picodet_mnn.cpp @@ -0,0 +1,229 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn + +#include "picodet_mnn.h" + +using namespace std; + +PicoDet::PicoDet(const std::string &mnn_path, + int input_width, + int input_length, + int num_thread_, + float score_threshold_, + float nms_threshold_) { + num_thread = num_thread_; + in_w = input_width; + in_h = input_length; + score_threshold = score_threshold_; + nms_threshold = nms_threshold_; + + PicoDet_interpreter = std::shared_ptr( + MNN::Interpreter::createFromFile(mnn_path.c_str())); + MNN::ScheduleConfig config; + config.numThread = num_thread; + MNN::BackendConfig backendConfig; + backendConfig.precision = (MNN::BackendConfig::PrecisionMode)2; + config.backendConfig = &backendConfig; + + PicoDet_session = PicoDet_interpreter->createSession(config); + + input_tensor = PicoDet_interpreter->getSessionInput(PicoDet_session, nullptr); +} + +PicoDet::~PicoDet() { + PicoDet_interpreter->releaseModel(); + PicoDet_interpreter->releaseSession(PicoDet_session); +} + +int PicoDet::detect(cv::Mat &raw_image, std::vector &result_list) { + if (raw_image.empty()) { + std::cout << "image is empty ,please check!" 
<< std::endl; + return -1; + } + + image_h = raw_image.rows; + image_w = raw_image.cols; + cv::Mat image; + cv::resize(raw_image, image, cv::Size(in_w, in_h)); + + PicoDet_interpreter->resizeTensor(input_tensor, {1, 3, in_h, in_w}); + PicoDet_interpreter->resizeSession(PicoDet_session); + std::shared_ptr pretreat(MNN::CV::ImageProcess::create( + MNN::CV::BGR, MNN::CV::BGR, mean_vals, 3, norm_vals, 3)); + pretreat->convert(image.data, in_w, in_h, image.step[0], input_tensor); + + auto start = chrono::steady_clock::now(); + + // run network + PicoDet_interpreter->runSession(PicoDet_session); + + // get output data + std::vector> results; + results.resize(num_class); + + for (const auto &head_info : heads_info) { + MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput( + PicoDet_session, head_info.cls_layer.c_str()); + MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput( + PicoDet_session, head_info.dis_layer.c_str()); + + MNN::Tensor tensor_scores_host(tensor_scores, + tensor_scores->getDimensionType()); + tensor_scores->copyToHostTensor(&tensor_scores_host); + + MNN::Tensor tensor_boxes_host(tensor_boxes, + tensor_boxes->getDimensionType()); + tensor_boxes->copyToHostTensor(&tensor_boxes_host); + + decode_infer(&tensor_scores_host, + &tensor_boxes_host, + head_info.stride, + score_threshold, + results); + } + + auto end = chrono::steady_clock::now(); + chrono::duration elapsed = end - start; + cout << "inference time:" << elapsed.count() << " s, "; + + for (int i = 0; i < (int)results.size(); i++) { + nms(results[i], nms_threshold); + + for (auto box : results[i]) { + box.x1 = box.x1 / in_w * image_w; + box.x2 = box.x2 / in_w * image_w; + box.y1 = box.y1 / in_h * image_h; + box.y2 = box.y2 / in_h * image_h; + result_list.push_back(box); + } + } + cout << "detect " << result_list.size() << " objects." 
<< std::endl; + ; + + return 0; +} + +void PicoDet::decode_infer(MNN::Tensor *cls_pred, + MNN::Tensor *dis_pred, + int stride, + float threshold, + std::vector> &results) { + int feature_h = in_h / stride; + int feature_w = in_w / stride; + + for (int idx = 0; idx < feature_h * feature_w; idx++) { + const float *scores = cls_pred->host() + (idx * num_class); + int row = idx / feature_w; + int col = idx % feature_w; + float score = 0; + int cur_label = 0; + for (int label = 0; label < num_class; label++) { + if (scores[label] > score) { + score = scores[label]; + cur_label = label; + } + } + if (score > threshold) { + const float *bbox_pred = + dis_pred->host() + (idx * 4 * (reg_max + 1)); + results[cur_label].push_back( + disPred2Bbox(bbox_pred, cur_label, score, col, row, stride)); + } + } +} + +BoxInfo PicoDet::disPred2Bbox( + const float *&dfl_det, int label, float score, int x, int y, int stride) { + float ct_x = (x + 0.5) * stride; + float ct_y = (y + 0.5) * stride; + std::vector dis_pred; + dis_pred.resize(4); + for (int i = 0; i < 4; i++) { + float dis = 0; + float *dis_after_sm = new float[reg_max + 1]; + activation_function_softmax( + dfl_det + i * (reg_max + 1), dis_after_sm, reg_max + 1); + for (int j = 0; j < reg_max + 1; j++) { + dis += j * dis_after_sm[j]; + } + dis *= stride; + dis_pred[i] = dis; + delete[] dis_after_sm; + } + float xmin = (std::max)(ct_x - dis_pred[0], .0f); + float ymin = (std::max)(ct_y - dis_pred[1], .0f); + float xmax = (std::min)(ct_x + dis_pred[2], (float)in_w); + float ymax = (std::min)(ct_y + dis_pred[3], (float)in_h); + return BoxInfo{xmin, ymin, xmax, ymax, score, label}; +} + +void PicoDet::nms(std::vector &input_boxes, float NMS_THRESH) { + std::sort(input_boxes.begin(), input_boxes.end(), [](BoxInfo a, BoxInfo b) { + return a.score > b.score; + }); + std::vector vArea(input_boxes.size()); + for (int i = 0; i < int(input_boxes.size()); ++i) { + vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) * + (input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1); + } + for (int i = 0; i < int(input_boxes.size()); ++i) { + for (int j = i + 1; j < int(input_boxes.size());) { + float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1); + float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1); + float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2); + float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2); + float w = (std::max)(float(0), xx2 - xx1 + 1); + float h = (std::max)(float(0), yy2 - yy1 + 1); + float inter = w * h; + float ovr = inter / (vArea[i] + vArea[j] - inter); + if (ovr >= NMS_THRESH) { + input_boxes.erase(input_boxes.begin() + j); + vArea.erase(vArea.begin() + j); + } else { + j++; + } + } + } +} + +string PicoDet::get_label_str(int label) { return labels[label]; } + +inline float fast_exp(float x) { + union { + uint32_t i; + float f; + } v{}; + v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f); + return v.f; +} + +inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); } + +template +int activation_function_softmax(const _Tp *src, _Tp *dst, int length) { + const _Tp alpha = *std::max_element(src, src + length); + _Tp denominator{0}; + + for (int i = 0; i < length; ++i) { + dst[i] = fast_exp(src[i] - alpha); + denominator += dst[i]; + } + + for (int i = 0; i < length; ++i) { + dst[i] /= denominator; + } + + return 0; +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/picodet_mnn.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/picodet_mnn.h new file mode 
100644 index 0000000000000000000000000000000000000000..8686f5cf69cdb49a3aa09eecaf02f3514b1d05c1 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_mnn_kpts/picodet_mnn.h @@ -0,0 +1,138 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn + +#ifndef __PicoDet_H__ +#define __PicoDet_H__ + +#pragma once + +#include "Interpreter.hpp" + +#include +#include +#include +#include +#include +#include +#include +#include "ImageProcess.hpp" +#include "MNNDefine.h" +#include "Tensor.hpp" + +typedef struct HeadInfo_ { + std::string cls_layer; + std::string dis_layer; + int stride; +} HeadInfo; + +typedef struct BoxInfo_ { + float x1; + float y1; + float x2; + float y2; + float score; + int label; +} BoxInfo; + +class PicoDet { + public: + PicoDet(const std::string &mnn_path, + int input_width, + int input_length, + int num_thread_ = 4, + float score_threshold_ = 0.5, + float nms_threshold_ = 0.3); + + ~PicoDet(); + + int detect(cv::Mat &img, std::vector &result_list); + std::string get_label_str(int label); + + private: + void decode_infer(MNN::Tensor *cls_pred, + MNN::Tensor *dis_pred, + int stride, + float threshold, + std::vector> &results); + BoxInfo disPred2Bbox( + const float *&dfl_det, int label, float score, int x, int y, int stride); + void nms(std::vector &input_boxes, float NMS_THRESH); + + private: + std::shared_ptr PicoDet_interpreter; + MNN::Session *PicoDet_session = nullptr; + MNN::Tensor *input_tensor = nullptr; + + int num_thread; + int image_w; + int image_h; + + int in_w = 320; + int in_h = 320; + + float score_threshold; + float nms_threshold; + + const float mean_vals[3] = {103.53f, 116.28f, 123.675f}; + const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f}; + + const int num_class = 80; + const int reg_max = 7; + + std::vector heads_info{ + // cls_pred|dis_pred|stride + {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8}, + {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16}, + {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32}, + {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64}, + }; + + std::vector labels{ + "person", "bicycle", "car", + "motorcycle", "airplane", "bus", + "train", "truck", "boat", + "traffic light", "fire hydrant", "stop sign", + "parking meter", "bench", "bird", + "cat", "dog", "horse", + "sheep", "cow", "elephant", + "bear", "zebra", "giraffe", + "backpack", "umbrella", "handbag", + "tie", "suitcase", "frisbee", + "skis", "snowboard", "sports ball", + "kite", "baseball bat", "baseball glove", + "skateboard", "surfboard", "tennis racket", + "bottle", "wine glass", "cup", + "fork", "knife", "spoon", + "bowl", "banana", "apple", + "sandwich", "orange", "broccoli", + "carrot", "hot dog", "pizza", + "donut", "cake", "chair", + "couch", "potted plant", "bed", + "dining table", "toilet", "tv", + "laptop", 
"mouse", "remote", + "keyboard", "cell phone", "microwave", + "oven", "toaster", "sink", + "refrigerator", "book", "clock", + "vase", "scissors", "teddy bear", + "hair drier", "toothbrush"}; +}; + +template +int activation_function_softmax(const _Tp *src, _Tp *dst, int length); + +inline float fast_exp(float x); +inline float sigmoid(float x); + +#endif diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/CMakeLists.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..0d4344c699d58082eb37ebe6089e16ad120bc87e --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/CMakeLists.txt @@ -0,0 +1,38 @@ +cmake_minimum_required(VERSION 3.9) +set(CMAKE_CXX_STANDARD 17) + +project(picodet_demo) + +find_package(OpenMP REQUIRED) +if(OPENMP_FOUND) + message("OPENMP FOUND") + set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}") + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}") + set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}") +endif() + +# find_package(OpenCV REQUIRED) +find_package(OpenCV REQUIRED PATHS "/path/to/opencv-3.4.16_gcc8.2_ffmpeg") + +# find_package(ncnn REQUIRED) +find_package(ncnn REQUIRED PATHS "/path/to/ncnn/build/install/lib/cmake/ncnn") +if(NOT TARGET ncnn) + message(WARNING "ncnn NOT FOUND! Please set ncnn_DIR environment variable") +else() + message("ncnn FOUND ") +endif() + +include_directories( + ${OpenCV_INCLUDE_DIRS} + ${CMAKE_CURRENT_SOURCE_DIR} + ${CMAKE_CURRENT_BINARY_DIR} +) + + +add_executable(picodet_demo main.cpp picodet.cpp) + +target_link_libraries( + picodet_demo + ncnn + ${OpenCV_LIBS} +) diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f9867b8acc9652e22bfd891671da4f5429436c3c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/README.md @@ -0,0 +1,129 @@ +# PicoDet NCNN Demo + +该Demo提供的预测代码是根据[Tencent's NCNN framework](https://github.com/Tencent/ncnn)推理库预测的。 + +# 第一步:编译 +## Windows +### Step1. +Download and Install Visual Studio from https://visualstudio.microsoft.com/vs/community/ + +### Step2. +Download and install OpenCV from https://github.com/opencv/opencv/releases + +为了方便,如果环境是gcc8.2 x86环境,可直接下载以下库: +```shell +wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz +tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz +``` + +### Step3(可选). +Download and install Vulkan SDK from https://vulkan.lunarg.com/sdk/home + +### Step4:编译NCNN + +``` shell script +git clone --recursive https://github.com/Tencent/ncnn.git +``` +Build NCNN following this tutorial: [Build for Windows x64 using VS2017](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-windows-x64-using-visual-studio-community-2017) + +### Step5. + +增加 `ncnn_DIR` = `YOUR_NCNN_PATH/build/install/lib/cmake/ncnn` 到系统变量中 + +Build project: Open x64 Native Tools Command Prompt for VS 2019 or 2017 + +``` cmd +cd +mkdir -p build +cd build +cmake .. +msbuild picodet_demo.vcxproj /p:configuration=release /p:platform=x64 +``` + +## Linux + +### Step1. +Build and install OpenCV from https://github.com/opencv/opencv + +### Step2(可选). 
+Download Vulkan SDK from https://vulkan.lunarg.com/sdk/home + +### Step3:编译NCNN +Clone NCNN repository + +``` shell script +git clone --recursive https://github.com/Tencent/ncnn.git +``` + +Build NCNN following this tutorial: [Build for Linux / NVIDIA Jetson / Raspberry Pi](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-linux) + +### Step4:编译可执行文件 + +``` shell script +cd +mkdir build +cd build +cmake .. +make +``` +# Run demo + +- 准备模型 + ```shell + modelName=picodet_s_320_coco_lcnet + # 导出Inference model + python tools/export_model.py \ + -c configs/picodet/${modelName}.yml \ + -o weights=${modelName}.pdparams \ + --output_dir=inference_model + # 转换到ONNX + paddle2onnx --model_dir inference_model/${modelName} \ + --model_filename model.pdmodel \ + --params_filename model.pdiparams \ + --opset_version 11 \ + --save_file ${modelName}.onnx + # 简化模型 + python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx + # 将模型转换至NCNN格式 + # Run onnx2ncnn in ncnn tools to generate the ncnn .param and .bin files. + ``` +转NCNN模型可以利用在线转换工具 [https://convertmodel.com](https://convertmodel.com/) + +为了快速测试,可直接下载:[picodet_s_320_coco_lcnet-opt.bin](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.bin)/ [picodet_s_320_coco_lcnet-opt.param](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.param)(不带后处理)。 + +**注意:** 由于带后处理的模型在NCNN预测时会出现NAN,暂时请使用不带后处理的Demo;带后处理的Demo正在升级中,很快发布。 + + +## 开始运行 + +首先新建预测结果存放目录: +```shell +cp -r ../demo_onnxruntime/imgs . +cd build +mkdir ../results +``` + +- 预测一张图片 +``` shell +./picodet_demo 0 ../picodet_s_320_coco_lcnet.bin ../picodet_s_320_coco_lcnet.param 320 320 ../imgs/dog.jpg 0 +``` +具体参数解析可参考`main.cpp`。 + +- 测试速度Benchmark + +``` shell +./picodet_demo 1 ../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param 320 320 0 +``` + +## FAQ + +- 预测结果精度不对: +请先确认模型输入shape是否对齐,并且模型输出name是否对齐,不带后处理的PicoDet增强版模型输出name如下: +```shell +# 分类分支 | 检测分支 +{"transpose_0.tmp_0", "transpose_1.tmp_0"}, +{"transpose_2.tmp_0", "transpose_3.tmp_0"}, +{"transpose_4.tmp_0", "transpose_5.tmp_0"}, +{"transpose_6.tmp_0", "transpose_7.tmp_0"}, +``` +可使用[netron](https://netron.app)查看具体name,并修改本Demo `picodet.h`中相应`non_postprocess_heads_info`数组。 diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/main.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/main.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8f69af93b2de7f9404fb86d5112ce62056d936b4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/main.cpp @@ -0,0 +1,210 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
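The `main.cpp` that follows derives its per-class colors on the fly with `GenerateColorMap`, a PASCAL-VOC-style trick that spreads the low bits of each class id across the R/G/B channels. For readers who want to sanity-check the palette, here is the same idea as a rough NumPy sketch (the function name `voc_colormap` is ours, not part of the demo):

```python
import numpy as np

def voc_colormap(num_classes: int) -> np.ndarray:
    """VOC-style palette: bit 3k of the class id feeds bit (7 - k) of R,
    bit 3k+1 feeds G, bit 3k+2 feeds B, mirroring GenerateColorMap below."""
    cmap = np.zeros((num_classes, 3), dtype=np.uint8)
    for i in range(num_classes):
        lab, j = i, 0
        while lab:
            cmap[i, 0] |= ((lab >> 0) & 1) << (7 - j)
            cmap[i, 1] |= ((lab >> 1) & 1) << (7 - j)
            cmap[i, 2] |= ((lab >> 2) & 1) << (7 - j)
            j += 1
            lab >>= 3
    return cmap

print(voc_colormap(80)[:4])  # first four class colors
```

Because the id's lowest bits land in the palette's highest bits, adjacent class ids come out visually distinct.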
+// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn + +#include "picodet.h" +#include <benchmark.h> +#include <cfloat> +#include <iostream> +#include <vector> +#include <opencv2/core/core.hpp> +#include <opencv2/highgui/highgui.hpp> +#include <opencv2/imgproc/imgproc.hpp> + +#define __SAVE_RESULT__ // if defined save drawn results to ../results, else + // show it in windows +struct object_rect { + int x; + int y; + int width; + int height; +}; + +std::vector<int> GenerateColorMap(int num_class) { + auto colormap = std::vector<int>(3 * num_class, 0); + for (int i = 0; i < num_class; ++i) { + int j = 0; + int lab = i; + while (lab) { + colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j)); + colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)); + colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)); + ++j; + lab >>= 3; + } + } + return colormap; +} + +void draw_bboxes(const cv::Mat &im, const std::vector<BoxInfo> &bboxes, + std::string save_path = "None") { + static const char *class_names[] = { + "person", "bicycle", "car", + "motorcycle", "airplane", "bus", + "train", "truck", "boat", + "traffic light", "fire hydrant", "stop sign", + "parking meter", "bench", "bird", + "cat", "dog", "horse", + "sheep", "cow", "elephant", + "bear", "zebra", "giraffe", + "backpack", "umbrella", "handbag", + "tie", "suitcase", "frisbee", + "skis", "snowboard", "sports ball", + "kite", "baseball bat", "baseball glove", + "skateboard", "surfboard", "tennis racket", + "bottle", "wine glass", "cup", + "fork", "knife", "spoon", + "bowl", "banana", "apple", + "sandwich", "orange", "broccoli", + "carrot", "hot dog", "pizza", + "donut", "cake", "chair", + "couch", "potted plant", "bed", + "dining table", "toilet", "tv", + "laptop", "mouse", "remote", + "keyboard", "cell phone", "microwave", + "oven", "toaster", "sink", + "refrigerator", "book", "clock", + "vase", "scissors", "teddy bear", + "hair drier", "toothbrush"}; + + cv::Mat image = im.clone(); + int src_w = image.cols; + int src_h = image.rows; + int thickness = 2; + auto colormap = GenerateColorMap(sizeof(class_names) / sizeof(class_names[0])); + + for (size_t i = 0; i < bboxes.size(); i++) { + const BoxInfo &bbox = bboxes[i]; + std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". " + << bbox.y2 << ". " << std::endl; + int c1 = colormap[3 * bbox.label + 0]; + int c2 = colormap[3 * bbox.label + 1]; + int c3 = colormap[3 * bbox.label + 2]; + cv::Scalar color = cv::Scalar(c1, c2, c3); + // cv::Scalar color = cv::Scalar(0, 0, 255); + cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1), + cv::Point(bbox.x2, bbox.y2)), + color, 1); + + char text[256]; + sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); + + int baseLine = 0; + cv::Size label_size = + cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); + + int x = bbox.x1; + int y = bbox.y1 - label_size.height - baseLine; + if (y < 0) + y = 0; + if (x + label_size.width > image.cols) + x = image.cols - label_size.width; + + cv::rectangle(image, cv::Rect(cv::Point(x, y), + cv::Size(label_size.width, + label_size.height + baseLine)), + color, -1); + + cv::putText(image, text, cv::Point(x, y + label_size.height), + cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1); + } + + if (save_path == "None") { + cv::imshow("image", image); + } else { + cv::imwrite(save_path, image); + std::cout << "Result saved in: " << save_path << std::endl; + } +} + +int image_demo(PicoDet &detector, const char *imagepath, + int has_postprocess = 0) { + std::vector<cv::String> filenames; + cv::glob(imagepath, filenames, false); + bool is_postprocess = has_postprocess > 0 ? 
true : false; + for (auto img_name : filenames) { + cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR); + if (image.empty()) { + fprintf(stderr, "cv::imread %s failed\n", img_name.c_str()); + return -1; + } + std::vector<BoxInfo> results; + detector.detect(image, results, is_postprocess); + std::cout << "detect done." << std::endl; + +#ifdef __SAVE_RESULT__ + std::string save_path = img_name; + draw_bboxes(image, results, save_path.replace(3, 4, "results")); +#else + draw_bboxes(image, results); + cv::waitKey(0); +#endif + } + return 0; +} + +int benchmark(PicoDet &detector, int width, int height, + int has_postprocess = 0) { + int loop_num = 100; + int warm_up = 8; + + double time_min = DBL_MAX; + double time_max = -DBL_MAX; + double time_avg = 0; + cv::Mat image(width, height, CV_8UC3, cv::Scalar(1, 1, 1)); + bool is_postprocess = has_postprocess > 0 ? true : false; + for (int i = 0; i < warm_up + loop_num; i++) { + double start = ncnn::get_current_time(); + std::vector<BoxInfo> results; + detector.detect(image, results, is_postprocess); + double end = ncnn::get_current_time(); + + double time = end - start; + if (i >= warm_up) { + time_min = (std::min)(time_min, time); + time_max = (std::max)(time_max, time); + time_avg += time; + } + } + time_avg /= loop_num; + fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet", + time_min, time_max, time_avg); + return 0; +} + +int main(int argc, char **argv) { + if (argc < 4) { + std::cout << "Usage: ./picodet_demo [mode] [bin_model] [param_model] " + "[height] [width] [image_path/cam_id] [has_postprocess]" + << std::endl; + return -1; + } + int mode = atoi(argv[1]); + char *bin_model_path = argv[2]; + char *param_model_path = argv[3]; + int height = 320; + int width = 320; + if (argc >= 6) { // both sizes must be given before argv[5] is read + height = atoi(argv[4]); + width = atoi(argv[5]); + } + PicoDet detector = + PicoDet(param_model_path, bin_model_path, width, height, true, 0.45, 0.3); + if (mode == 1) { + benchmark(detector, width, height, argc > 6 ? atoi(argv[6]) : 0); + } else { + if (argc < 7) { + std::cout << "Must set image file, such as ./picodet_demo 0 " + "../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param " + "320 320 img.jpg" + << std::endl; + return -1; + } + const char *images = argv[6]; + image_demo(detector, images, argc > 7 ? atoi(argv[7]) : 0); + } +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/picodet.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/picodet.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d5f0ba3c788b0813f85dc61e35ac543661212d1c --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/picodet.cpp @@ -0,0 +1,236 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
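The `picodet.cpp` below opens with a `fast_exp` helper: a Schraudolph-style approximation that computes `exp(x)` by writing `(1 << 23) * (x * log2(e) + 126.93...)` straight into the bit pattern of an IEEE-754 float32, trading a few percent of accuracy for speed inside `sigmoid` and `softmax`. A small NumPy sketch of the same bit trick, added here only for illustration:

```python
import numpy as np

def fast_exp(x):
    """Approximate exp(x) via the float32 bit layout (Schraudolph's trick)."""
    x = np.asarray(x, dtype=np.float32)
    # scale + shift the exponent field, then reinterpret the bits as float32
    bits = ((1 << 23) * (1.4426950409 * x + 126.93490512)).astype(np.int32)
    return bits.view(np.float32)

xs = np.linspace(-4.0, 4.0, 9)
rel_err = np.abs(fast_exp(xs) - np.exp(xs)) / np.exp(xs)
print(rel_err.max())  # on the order of a few percent
```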
+// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn + +#include "picodet.h" +#include +#include + +inline float fast_exp(float x) { + union { + uint32_t i; + float f; + } v{}; + v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f); + return v.f; +} + +inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); } + +template +int activation_function_softmax(const _Tp *src, _Tp *dst, int length) { + const _Tp alpha = *std::max_element(src, src + length); + _Tp denominator{0}; + + for (int i = 0; i < length; ++i) { + dst[i] = fast_exp(src[i] - alpha); + denominator += dst[i]; + } + + for (int i = 0; i < length; ++i) { + dst[i] /= denominator; + } + + return 0; +} + +bool PicoDet::hasGPU = false; +PicoDet *PicoDet::detector = nullptr; + +PicoDet::PicoDet(const char *param, const char *bin, int input_width, + int input_hight, bool useGPU, float score_threshold_ = 0.5, + float nms_threshold_ = 0.3) { + this->Net = new ncnn::Net(); +#if NCNN_VULKAN + this->hasGPU = ncnn::get_gpu_count() > 0; +#endif + this->Net->opt.use_vulkan_compute = this->hasGPU && useGPU; + this->Net->opt.use_fp16_arithmetic = true; + this->Net->load_param(param); + this->Net->load_model(bin); + this->in_w = input_width; + this->in_h = input_hight; + this->score_threshold = score_threshold_; + this->nms_threshold = nms_threshold_; +} + +PicoDet::~PicoDet() { delete this->Net; } + +void PicoDet::preprocess(cv::Mat &image, ncnn::Mat &in) { + // cv::resize(image, image, cv::Size(this->in_w, this->in_h), 0.f, 0.f); + int img_w = image.cols; + int img_h = image.rows; + in = ncnn::Mat::from_pixels_resize(image.data, ncnn::Mat::PIXEL_BGR, img_w, + img_h, this->in_w, this->in_h); + const float mean_vals[3] = {103.53f, 116.28f, 123.675f}; + const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f}; + in.substract_mean_normalize(mean_vals, norm_vals); +} + +int PicoDet::detect(cv::Mat image, std::vector &result_list, + bool has_postprocess) { + + ncnn::Mat input; + preprocess(image, input); + auto ex = this->Net->create_extractor(); + ex.set_light_mode(false); + ex.set_num_threads(4); +#if NCNN_VULKAN + ex.set_vulkan_compute(this->hasGPU); +#endif + ex.input("image", input); // picodet + + this->image_h = image.rows; + this->image_w = image.cols; + + std::vector> results; + results.resize(this->num_class); + + if (has_postprocess) { + ncnn::Mat dis_pred; + ncnn::Mat cls_pred; + ex.extract(this->nms_heads_info[0].c_str(), dis_pred); + ex.extract(this->nms_heads_info[1].c_str(), cls_pred); + std::cout << dis_pred.h << " " << dis_pred.w << std::endl; + std::cout << cls_pred.h << " " << cls_pred.w << std::endl; + this->nms_boxes(cls_pred, dis_pred, this->score_threshold, results); + } else { + for (const auto &head_info : this->non_postprocess_heads_info) { + ncnn::Mat dis_pred; + ncnn::Mat cls_pred; + ex.extract(head_info.dis_layer.c_str(), dis_pred); + ex.extract(head_info.cls_layer.c_str(), cls_pred); + this->decode_infer(cls_pred, dis_pred, head_info.stride, + this->score_threshold, results); + } + } + + for (int i = 0; i < (int)results.size(); i++) { + this->nms(results[i], this->nms_threshold); + + for (auto box : results[i]) { + box.x1 = box.x1 / this->in_w * this->image_w; + box.x2 = box.x2 / this->in_w * this->image_w; + box.y1 = box.y1 / this->in_h * this->image_h; + box.y2 = box.y2 / this->in_h * this->image_h; + result_list.push_back(box); + } + } + return 0; +} + +void PicoDet::nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, + float score_threshold, + std::vector> &result_list) { + 
BoxInfo bbox; + int i, j; + for (i = 0; i < dis_pred.h; i++) { + bbox.x1 = dis_pred.row(i)[0]; + bbox.y1 = dis_pred.row(i)[1]; + bbox.x2 = dis_pred.row(i)[2]; + bbox.y2 = dis_pred.row(i)[3]; + const float *scores = cls_pred.row(i); + float score = 0; + int cur_label = 0; + for (int label = 0; label < this->num_class; label++) { + float score_ = cls_pred.row(label)[i]; + if (score_ > score) { + score = score_; + cur_label = label; + } + } + bbox.score = score; + bbox.label = cur_label; + result_list[cur_label].push_back(bbox); + } +} + +void PicoDet::decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride, + float threshold, + std::vector> &results) { + int feature_h = ceil((float)this->in_w / stride); + int feature_w = ceil((float)this->in_h / stride); + + for (int idx = 0; idx < feature_h * feature_w; idx++) { + const float *scores = cls_pred.row(idx); + int row = idx / feature_w; + int col = idx % feature_w; + float score = 0; + int cur_label = 0; + for (int label = 0; label < this->num_class; label++) { + if (scores[label] > score) { + score = scores[label]; + cur_label = label; + } + } + if (score > threshold) { + const float *bbox_pred = dis_pred.row(idx); + results[cur_label].push_back( + this->disPred2Bbox(bbox_pred, cur_label, score, col, row, stride)); + } + } +} + +BoxInfo PicoDet::disPred2Bbox(const float *&dfl_det, int label, float score, + int x, int y, int stride) { + float ct_x = (x + 0.5) * stride; + float ct_y = (y + 0.5) * stride; + std::vector dis_pred; + dis_pred.resize(4); + for (int i = 0; i < 4; i++) { + float dis = 0; + float *dis_after_sm = new float[this->reg_max + 1]; + activation_function_softmax(dfl_det + i * (this->reg_max + 1), dis_after_sm, + this->reg_max + 1); + for (int j = 0; j < this->reg_max + 1; j++) { + dis += j * dis_after_sm[j]; + } + dis *= stride; + dis_pred[i] = dis; + delete[] dis_after_sm; + } + float xmin = (std::max)(ct_x - dis_pred[0], .0f); + float ymin = (std::max)(ct_y - dis_pred[1], .0f); + float xmax = (std::min)(ct_x + dis_pred[2], (float)this->in_w); + float ymax = (std::min)(ct_y + dis_pred[3], (float)this->in_w); + return BoxInfo{xmin, ymin, xmax, ymax, score, label}; +} + +void PicoDet::nms(std::vector &input_boxes, float NMS_THRESH) { + std::sort(input_boxes.begin(), input_boxes.end(), + [](BoxInfo a, BoxInfo b) { return a.score > b.score; }); + std::vector vArea(input_boxes.size()); + for (int i = 0; i < int(input_boxes.size()); ++i) { + vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) * + (input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1); + } + for (int i = 0; i < int(input_boxes.size()); ++i) { + for (int j = i + 1; j < int(input_boxes.size());) { + float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1); + float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1); + float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2); + float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2); + float w = (std::max)(float(0), xx2 - xx1 + 1); + float h = (std::max)(float(0), yy2 - yy1 + 1); + float inter = w * h; + float ovr = inter / (vArea[i] + vArea[j] - inter); + if (ovr >= NMS_THRESH) { + input_boxes.erase(input_boxes.begin() + j); + vArea.erase(vArea.begin() + j); + } else { + j++; + } + } + } +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/picodet.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/picodet.h new file mode 100644 index 0000000000000000000000000000000000000000..dd8c8f5af96aed9393e207b6e920259d95befbe7 --- /dev/null +++ 
b/PaddleDetection-release-2.6/deploy/third_engine/demo_ncnn/picodet.h @@ -0,0 +1,87 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn + +#ifndef PICODET_H +#define PICODET_H + +#include +#include + +typedef struct NonPostProcessHeadInfo { + std::string cls_layer; + std::string dis_layer; + int stride; +} NonPostProcessHeadInfo; + +typedef struct BoxInfo { + float x1; + float y1; + float x2; + float y2; + float score; + int label; +} BoxInfo; + +class PicoDet { +public: + PicoDet(const char *param, const char *bin, int input_width, int input_hight, + bool useGPU, float score_threshold_, float nms_threshold_); + + ~PicoDet(); + + static PicoDet *detector; + ncnn::Net *Net; + static bool hasGPU; + + int detect(cv::Mat image, std::vector &result_list, + bool has_postprocess); + +private: + void preprocess(cv::Mat &image, ncnn::Mat &in); + void decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride, + float threshold, + std::vector> &results); + BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x, + int y, int stride); + static void nms(std::vector &result, float nms_threshold); + void nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, + float score_threshold, + std::vector> &result_list); + + int image_w; + int image_h; + int in_w = 320; + int in_h = 320; + int num_class = 80; + int reg_max = 7; + + float score_threshold; + float nms_threshold; + + std::vector bbox_output_data_; + std::vector class_output_data_; + + std::vector nms_heads_info{"tmp_16", "concat_4.tmp_0"}; + // If not export post-process, will use non_postprocess_heads_info + std::vector non_postprocess_heads_info{ + // cls_pred|dis_pred|stride + {"transpose_0.tmp_0", "transpose_1.tmp_0", 8}, + {"transpose_2.tmp_0", "transpose_3.tmp_0", 16}, + {"transpose_4.tmp_0", "transpose_5.tmp_0", 32}, + {"transpose_6.tmp_0", "transpose_7.tmp_0", 64}, + }; +}; + +#endif diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cdd5b603f2f4a3f46fa85f4d4739974d9079a775 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/README.md @@ -0,0 +1,33 @@ +# PP-YOLOE 转ONNX-TRT教程 + +本教程内容为:使用PP-YOLOE模型导出转换为ONNX格式,并定制化修改网络,使用[EfficientNMS_TRT](https://github.com/NVIDIA/TensorRT/tree/main/plugin/efficientNMSPlugin) OP, +可成功运行在[TensorRT](https://github.com/NVIDIA/TensorRT)上,示例仅供参考 + +## 1. 环境依赖 +CUDA 10.2 + [cudnn 8.2.1](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html) + [TensorRT 8.2](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-821/install-guide/index.htm) +```commandline +onnx +onnxruntime +paddle2onnx +``` + +## 2. 
Paddle模型导出 +```commandline +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True exclude_nms=True +``` + +## 3. ONNX模型转换 + 定制化修改EfficientNMS_TRT +```commandline +python deploy/third_engine/demo_onnx_trt/onnx_custom.py --onnx_file=output_inference/ppyoloe_crn_l_300e_coco/ppyoloe_crn_l_300e_coco.onnx --model_dir=output_inference/ppyoloe_crn_l_300e_coco/ --opset_version=11 +``` + +## 4. TensorRT Engine +```commandline +trtexec --onnx=output_inference/ppyoloe_crn_l_300e_coco/ppyoloe_crn_l_300e_coco.onnx --saveEngine=ppyoloe_crn_l_300e_coco.engine +``` +**注意**:若运行报错,可尝试添加`--tacticSources=-cublasLt,+cublas`参数解决 + +## 5. 运行TensorRT推理 +```commandline +python deploy/third_engine/demo_onnx_trt/trt_infer.py --infer_cfg=output_inference/ppyoloe_crn_l_300e_coco/infer_cfg.yml --trt_engine=ppyoloe_crn_l_300e_coco.engine --image_file=demo/000000014439.jpg +``` diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/onnx_custom.py b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/onnx_custom.py new file mode 100644 index 0000000000000000000000000000000000000000..5d6ae82869413c8d6e10c4ad123ca5e64073afc8 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/onnx_custom.py @@ -0,0 +1,104 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
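`onnx_custom.py` below grafts a `Transpose` plus an `EfficientNMS_TRT` node onto the exported graph and rewires the graph outputs to `num_dets` / `det_boxes` / `det_scores` / `det_classes`. A quick, optional sanity check after running it is to inspect the rewritten outputs with the `onnx` package (a check we add for illustration; the path matches the README commands above):

```python
import onnx

model = onnx.load(
    "output_inference/ppyoloe_crn_l_300e_coco/ppyoloe_crn_l_300e_coco.onnx")
for out in model.graph.output:
    dims = [d.dim_value if d.dim_value else d.dim_param
            for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)  # expect num_dets, det_boxes, det_scores, det_classes
```

Note that plain onnxruntime cannot execute the modified model: `EfficientNMS_TRT` exists only as a TensorRT plugin, which is why inference goes through trtexec and the TensorRT runtime.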
+ +import argparse +import os +import onnx +import onnx_graphsurgeon +import numpy as np +from collections import OrderedDict +from paddle2onnx.command import program2onnx + +parser = argparse.ArgumentParser(description=__doc__) +parser.add_argument( + '--onnx_file', required=True, type=str, help='onnx model path') +parser.add_argument( + '--model_dir', + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py.")) +parser.add_argument( + "--opset_version", + type=int, + default=11, + help="set onnx opset version to export") +parser.add_argument( + '--topk_all', type=int, default=300, help='topk objects for every images') +parser.add_argument( + '--iou_thres', type=float, default=0.7, help='iou threshold for NMS') +parser.add_argument( + '--conf_thres', type=float, default=0.01, help='conf threshold for NMS') + + +def main(FLAGS): + assert os.path.exists(FLAGS.onnx_file) + onnx_model = onnx.load(FLAGS.onnx_file) + graph = onnx_graphsurgeon.import_onnx(onnx_model) + graph.toposort() + graph.fold_constants() + graph.cleanup() + + num_anchors = graph.outputs[1].shape[2] + num_classes = graph.outputs[1].shape[1] + scores = onnx_graphsurgeon.Variable( + name='scores', shape=[-1, num_anchors, num_classes], dtype=np.float32) + graph.layer( + op='Transpose', + name='lastTranspose', + inputs=[graph.outputs[1]], + outputs=[scores], + attrs=OrderedDict(perm=[0, 2, 1])) + + attrs = OrderedDict( + plugin_version="1", + background_class=-1, + max_output_boxes=FLAGS.topk_all, + score_threshold=FLAGS.conf_thres, + iou_threshold=FLAGS.iou_thres, + score_activation=False, + box_coding=0, ) + outputs = [ + onnx_graphsurgeon.Variable("num_dets", np.int32, [-1, 1]), + onnx_graphsurgeon.Variable("det_boxes", np.float32, + [-1, FLAGS.topk_all, 4]), + onnx_graphsurgeon.Variable("det_scores", np.float32, + [-1, FLAGS.topk_all]), + onnx_graphsurgeon.Variable("det_classes", np.int32, + [-1, FLAGS.topk_all]) + ] + graph.layer( + op='EfficientNMS_TRT', + name="batched_nms", + inputs=[graph.outputs[0], scores], + outputs=outputs, + attrs=attrs) + graph.outputs = outputs + graph.cleanup().toposort() + onnx.save(onnx_graphsurgeon.export_onnx(graph), FLAGS.onnx_file) + print(f"The modified onnx model is saved in {FLAGS.onnx_file}") + + +if __name__ == '__main__': + FLAGS = parser.parse_args() + if FLAGS.model_dir is not None: + assert os.path.exists(FLAGS.model_dir) + program2onnx( + model_dir=FLAGS.model_dir, + save_file=FLAGS.onnx_file, + model_filename="model.pdmodel", + params_filename="model.pdiparams", + opset_version=FLAGS.opset_version, + enable_onnx_checker=True) + main(FLAGS) diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/preprocess.py b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..504762db89c867b12b9e9196a4b6abc182795cc3 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/preprocess.py @@ -0,0 +1,565 @@ +import numpy as np +import cv2 +import copy + + +def decode_image(img_path): + with open(img_path, 'rb') as f: + im_read = f.read() + data = np.frombuffer(im_read, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + img_info = { + "im_shape": np.array( + im.shape[:2], dtype=np.float32), + "scale_factor": np.array( + [1., 1.], dtype=np.float32) + } + return im, img_info + + +class Resize(object): + 
"""resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + """ + Args: + im (np.ndarray): image (np.ndarray) + Returns: + im_scale_x: the resize ratio of X + im_scale_y: the resize ratio of Y + """ + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + return im_scale_y, im_scale_x + + +class NormalizeImage(object): + """normalize image + Args: + mean (list): im - mean + std (list): im / std + is_scale (bool): whether need im / 255 + norm_type (str): type in ['mean_std', 'none'] + """ + + def __init__(self, mean, std, is_scale=True, norm_type='mean_std'): + self.mean = mean + self.std = std + self.is_scale = is_scale + self.norm_type = norm_type + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.astype(np.float32, copy=False) + if self.is_scale: + scale = 1.0 / 255.0 + im *= scale + + if self.norm_type == 'mean_std': + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im -= mean + im /= std + return im, im_info + + +class Permute(object): + """permute image + Args: + to_bgr (bool): whether convert RGB to BGR + channel_first (bool): whether convert HWC to CHW + """ + + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.transpose((2, 0, 1)).copy() + return im, im_info + + +class PadStride(object): + """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config + Args: + stride (bool): model with FPN need image shape % stride == 0 + """ + + def __init__(self, stride=0): + 
self.coarsest_stride = stride + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class LetterBoxResize(object): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. + """ + super(LetterBoxResize, self).__init__() + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + + def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + height, width = self.target_size + h, w = im.shape[:2] + im, ratio, padw, padh = self.letterbox(im, height=height, width=width) + + new_shape = [round(h * ratio), round(w * ratio)] + im_info['im_shape'] = np.array(new_shape, dtype=np.float32) + im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32) + return im, im_info + + +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. + Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + +def rotate_point(pt, angle_rad): + """Rotate a point by an angle. + Args: + pt (list[float]): 2 dimensional point to be rotated + angle_rad (float): rotation angle by radian + Returns: + list[float]: Rotated point. 
+ """ + assert len(pt) == 2 + sn, cs = np.sin(angle_rad), np.cos(angle_rad) + new_x = pt[0] * cs - pt[1] * sn + new_y = pt[0] * sn + pt[1] * cs + rotated_pt = [new_x, new_y] + + return rotated_pt + + +def _get_3rd_point(a, b): + """To calculate the affine matrix, three pairs of points are required. This + function is used to get the 3rd point, given 2D points a & b. + The 3rd point is defined by rotating vector `a - b` by 90 degrees + anticlockwise, using b as the rotation center. + Args: + a (np.ndarray): point(x,y) + b (np.ndarray): point(x,y) + Returns: + np.ndarray: The 3rd point. + """ + assert len(a) == 2 + assert len(b) == 2 + direction = a - b + third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32) + + return third_pt + + +def get_affine_transform(center, + input_size, + rot, + output_size, + shift=(0., 0.), + inv=False): + """Get the affine transform matrix, given the center/scale/rot/output_size. + Args: + center (np.ndarray[2, ]): Center of the bounding box (x, y). + scale (np.ndarray[2, ]): Scale of the bounding box + wrt [width, height]. + rot (float): Rotation angle (degree). + output_size (np.ndarray[2, ]): Size of the destination heatmaps. + shift (0-100%): Shift translation ratio wrt the width/height. + Default (0., 0.). + inv (bool): Option to inverse the affine transform direction. + (inv=False: src->dst or inv=True: dst->src) + Returns: + np.ndarray: The transform matrix. + """ + assert len(center) == 2 + assert len(output_size) == 2 + assert len(shift) == 2 + if not isinstance(input_size, (np.ndarray, list)): + input_size = np.array([input_size, input_size], dtype=np.float32) + scale_tmp = input_size + + shift = np.array(shift) + src_w = scale_tmp[0] + dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = rotate_point([0., src_w * -0.5], rot_rad) + dst_dir = np.array([0., dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + src[2, :] = _get_3rd_point(src[0, :], src[1, :]) + + dst = np.zeros((3, 2), dtype=np.float32) + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +class WarpAffine(object): + """Warp affine the image + """ + + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + input_w=512, + scale=0.4, + shift=0.1): + self.keep_res = keep_res + self.pad = pad + self.input_h = input_h + self.input_w = input_w + self.scale = scale + self.shift = shift + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + + h, w = img.shape[:2] + + if self.keep_res: + input_h = (h | self.pad) + 1 + input_w = (w | self.pad) + 1 + s = np.array([input_w, input_h], dtype=np.float32) + c = np.array([w // 2, h // 2], dtype=np.float32) + + else: + s = max(h, w) * 1.0 + input_h, input_w = self.input_h, self.input_w + c = np.array([w / 2., h / 2.], dtype=np.float32) + + trans_input = get_affine_transform(c, s, 0, [input_w, input_h]) + img = cv2.resize(img, (w, h)) + inp = cv2.warpAffine( + img, trans_input, (input_w, input_h), 
flags=cv2.INTER_LINEAR) + return inp, im_info + + +# keypoint preprocess +def get_warp_matrix(theta, size_input, size_dst, size_target): + """This code is based on + https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py + Calculate the transformation matrix under the constraint of unbiased. + Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased + Data Processing for Human Pose Estimation (CVPR 2020). + Args: + theta (float): Rotation angle in degrees. + size_input (np.ndarray): Size of input image [w, h]. + size_dst (np.ndarray): Size of output image [w, h]. + size_target (np.ndarray): Size of ROI in input plane [w, h]. + Returns: + matrix (np.ndarray): A matrix for transformation. + """ + theta = np.deg2rad(theta) + matrix = np.zeros((2, 3), dtype=np.float32) + scale_x = size_dst[0] / size_target[0] + scale_y = size_dst[1] / size_target[1] + matrix[0, 0] = np.cos(theta) * scale_x + matrix[0, 1] = -np.sin(theta) * scale_x + matrix[0, 2] = scale_x * ( + -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] * + np.sin(theta) + 0.5 * size_target[0]) + matrix[1, 0] = np.sin(theta) * scale_y + matrix[1, 1] = np.cos(theta) * scale_y + matrix[1, 2] = scale_y * ( + -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] * + np.cos(theta) + 0.5 * size_target[1]) + return matrix + + +class TopDownEvalAffine(object): + """apply affine transform to image and coords + Args: + trainsize (list): [w, h], the standard size used to train + use_udp (bool): whether to use Unbiased Data Processing. + records(dict): the dict contained the image and coords + Returns: + records (dict): contain the image and coords after tranformed + """ + + def __init__(self, trainsize, use_udp=False): + self.trainsize = trainsize + self.use_udp = use_udp + + def __call__(self, image, im_info): + rot = 0 + imshape = im_info['im_shape'][::-1] + center = im_info['center'] if 'center' in im_info else imshape / 2. 
+ scale = im_info['scale'] if 'scale' in im_info else imshape + if self.use_udp: + trans = get_warp_matrix( + rot, center * 2.0, + [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + else: + trans = get_affine_transform(center, scale, rot, self.trainsize) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + + return image, im_info + + +class Compose: + def __init__(self, transforms): + self.transforms = [] + for op_info in transforms: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + self.transforms.append(eval(op_type)(**new_op_info)) + + def __call__(self, img_path): + img, im_info = decode_image(img_path) + for t in self.transforms: + img, im_info = t(img, im_info) + inputs = copy.deepcopy(im_info) + inputs['image'] = np.ascontiguousarray(img.astype('float32')) + return inputs + + +coco_clsid2catid = { + 0: 1, + 1: 2, + 2: 3, + 3: 4, + 4: 5, + 5: 6, + 6: 7, + 7: 8, + 8: 9, + 9: 10, + 10: 11, + 11: 13, + 12: 14, + 13: 15, + 14: 16, + 15: 17, + 16: 18, + 17: 19, + 18: 20, + 19: 21, + 20: 22, + 21: 23, + 22: 24, + 23: 25, + 24: 27, + 25: 28, + 26: 31, + 27: 32, + 28: 33, + 29: 34, + 30: 35, + 31: 36, + 32: 37, + 33: 38, + 34: 39, + 35: 40, + 36: 41, + 37: 42, + 38: 43, + 39: 44, + 40: 46, + 41: 47, + 42: 48, + 43: 49, + 44: 50, + 45: 51, + 46: 52, + 47: 53, + 48: 54, + 49: 55, + 50: 56, + 51: 57, + 52: 58, + 53: 59, + 54: 60, + 55: 61, + 56: 62, + 57: 63, + 58: 64, + 59: 65, + 60: 67, + 61: 70, + 62: 72, + 63: 73, + 64: 74, + 65: 75, + 66: 76, + 67: 77, + 68: 78, + 69: 79, + 70: 80, + 71: 81, + 72: 82, + 73: 84, + 74: 85, + 75: 86, + 76: 87, + 77: 88, + 78: 89, + 79: 90 +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/trt_infer.py b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/trt_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..2d1381f2553c8fed9b702295f1db0ab83a68c71a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnx_trt/trt_infer.py @@ -0,0 +1,282 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
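+
+# Usage sketch (all paths are placeholders, adjust to your setup):
+#   python trt_infer.py --infer_cfg infer_cfg.yml --trt_engine model.engine \
+#       --image_file test.jpg --save_coco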
+ +import time +import numpy as np +import pycuda.autoinit +import pycuda.driver as cuda + +import tensorrt as trt +from collections import OrderedDict +import os +import yaml +import json +import glob +import argparse + +from preprocess import Compose +from preprocess import coco_clsid2catid + +parser = argparse.ArgumentParser(description=__doc__) +parser.add_argument("--infer_cfg", type=str, help="infer_cfg.yml") +parser.add_argument( + "--trt_engine", required=True, type=str, help="trt engine path") +parser.add_argument("--image_dir", type=str) +parser.add_argument("--image_file", type=str) +parser.add_argument( + "--repeats", + type=int, + default=1, + help="Repeat the running test `repeats` times in benchmark") +parser.add_argument( + "--save_coco", + action='store_true', + default=False, + help="Whether to save coco results") +parser.add_argument( + "--coco_file", type=str, default="results.json", help="coco results path") + +TRT_LOGGER = trt.Logger() +trt.init_libnvinfer_plugins(TRT_LOGGER, namespace="") +# Global dictionary +SUPPORT_MODELS = { + 'YOLO', 'PPYOLOE', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet', + 'S2ANet', 'JDE', 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', + 'TOOD', 'RetinaNet', 'StrongBaseline', 'STGCN', 'YOLOX', 'HRNet' +} + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +class PredictConfig(object): + """set config of preprocess, postprocess and visualize + Args: + infer_config (str): path of infer_cfg.yml + """ + + def __init__(self, infer_config): + # parsing Yaml config for Preprocess + with open(infer_config) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.label_list = yml_conf['label_list'] + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + self.draw_threshold = yml_conf.get("draw_threshold", 0.5) + self.mask = yml_conf.get("mask", False) + self.tracker = yml_conf.get("tracker", None) + self.nms = yml_conf.get("NMS", None) + self.fpn_stride = yml_conf.get("fpn_stride", None) + if self.arch == 'RCNN' and yml_conf.get('export_onnx', False): + print( + 'The RCNN export model is used for ONNX and it only supports batch_size = 1' + ) + self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[ + 'arch'], 
SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +def load_trt_engine(engine_path): + assert os.path.exists(engine_path) + print("Reading engine from file {}".format(engine_path)) + with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime: + return runtime.deserialize_cuda_engine(f.read()) + + +def predict_image(infer_config, engine, img_list, save_coco=False, repeats=1): + # load preprocess transforms + transforms = Compose(infer_config.preprocess_infos) + + stream = cuda.Stream() + coco_results = [] + num_data = len(img_list) + avg_time = [] + with engine.create_execution_context() as context: + # Allocate host and device buffers + bindings = create_trt_bindings(engine, context) + # warmup + run_trt_context(context, bindings, stream, repeats=10) + # predict image + for i, img_path in enumerate(img_list): + inputs = transforms(img_path) + inputs_name = [k for k, v in bindings.items() if v['is_input']] + inputs = { + k: inputs[k][None, ] + for k in inputs.keys() if k in inputs_name + } + # run infer + for k, v in inputs.items(): + bindings[k]['cpu_data'][...] = v + output = run_trt_context(context, bindings, stream, repeats=repeats) + print(f"{i + 1}/{num_data} infer time: {output['infer_time']} ms.") + avg_time.append(output['infer_time']) + # get output + for k, v in output.items(): + if k in bindings.keys(): + output[k] = np.reshape(v, bindings[k]['shape']) + if save_coco: + coco_results.extend( + format_coco_results(os.path.split(img_path)[-1], output)) + avg_time = np.mean(avg_time) + print( + f"Run on {num_data} data, repeats {repeats} times, avg time: {avg_time} ms." + ) + if save_coco: + with open(FLAGS.coco_file, 'w') as f: + json.dump(coco_results, f) + print(f"save coco json to {FLAGS.coco_file}") + + +def create_trt_bindings(engine, context): + bindings = OrderedDict() + for name in engine: + binding_idx = engine.get_binding_index(name) + size = trt.volume(context.get_binding_shape(binding_idx)) + dtype = trt.nptype(engine.get_binding_dtype(name)) + shape = list(engine.get_binding_shape(binding_idx)) + if shape[0] == -1: + shape[0] = 1 + bindings[name] = { + "idx": binding_idx, + "size": size, + "dtype": dtype, + "shape": shape, + "cpu_data": None, + "cuda_ptr": None, + "is_input": True if engine.binding_is_input(name) else False + } + if engine.binding_is_input(name): + bindings[name]['cpu_data'] = np.random.randn(*shape).astype( + np.float32) + bindings[name]['cuda_ptr'] = cuda.mem_alloc(bindings[name][ + 'cpu_data'].nbytes) + else: + bindings[name]['cpu_data'] = cuda.pagelocked_empty(size, dtype) + bindings[name]['cuda_ptr'] = cuda.mem_alloc(bindings[name][ + 'cpu_data'].nbytes) + return bindings + + +def run_trt_context(context, bindings, stream, repeats=1): + # Transfer input data to the GPU. + for k, v in bindings.items(): + if v['is_input']: + cuda.memcpy_htod_async(v['cuda_ptr'], v['cpu_data'], stream) + in_bindings = [int(v['cuda_ptr']) for k, v in bindings.items()] + output_data = {} + avg_time = [] + for _ in range(repeats): + # Run inference + t1 = time.time() + context.execute_async_v2( + bindings=in_bindings, stream_handle=stream.handle) + # Transfer prediction output from the GPU. 
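+        # (the dtoh copies below are enqueued on the same CUDA stream as
+        # execute_async_v2, so stream.synchronize() guarantees the host
+        # buffers are valid before the timer stops)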
+ for k, v in bindings.items(): + if not v['is_input']: + cuda.memcpy_dtoh_async(v['cpu_data'], v['cuda_ptr'], stream) + output_data[k] = v['cpu_data'] + # Synchronize the stream + stream.synchronize() + t2 = time.time() + avg_time.append(t2 - t1) + output_data['infer_time'] = np.mean(avg_time) * 1000 + return output_data + + +def format_coco_results(file_name, result): + try: + image_id = int(os.path.splitext(file_name)[0]) + except: + image_id = file_name + num_dets = result['num_dets'].tolist() + det_classes = result['det_classes'].tolist() + det_scores = result['det_scores'].tolist() + det_boxes = result['det_boxes'].tolist() + per_result = [ + { + 'image_id': image_id, + 'category_id': coco_clsid2catid[int(det_classes[0][idx])], + 'file_name': file_name, + 'bbox': [ + det_boxes[0][idx][0], det_boxes[0][idx][1], + det_boxes[0][idx][2] - det_boxes[0][idx][0], + det_boxes[0][idx][3] - det_boxes[0][idx][1] + ], # xyxy -> xywh + 'score': det_scores[0][idx] + } for idx in range(num_dets[0][0]) + ] + + return per_result + + +if __name__ == '__main__': + FLAGS = parser.parse_args() + # load image list + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + # load trt engine + engine = load_trt_engine(FLAGS.trt_engine) + # load infer config + infer_config = PredictConfig(FLAGS.infer_cfg) + + predict_image(infer_config, engine, img_list, FLAGS.save_coco, + FLAGS.repeats) + print('Done!') diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8e56ed528855ee84772f43ed8bc75b5a0219ca4b --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/README.md @@ -0,0 +1,43 @@ +# PicoDet ONNX Runtime Demo + +本文件夹提供利用[ONNX Runtime](https://onnxruntime.ai/docs/)进行 PicoDet 部署与Inference images 的 Demo。 + +## 安装 ONNX Runtime + +本demo采用的是 ONNX Runtime 1.10.0,可直接运行如下指令安装: +```shell +pip install onnxruntime +``` + +详细安装步骤,可参考 [Install ONNX Runtime](https://onnxruntime.ai/docs/install/)。 + +## Inference images + +- 准备测试模型:根据[PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet)中【导出及转换模型】步骤,采用包含后处理的方式导出模型(`-o export.benchmark=False` ),并生成待测试模型简化后的onnx模型(可在下文链接中直接下载)。同时在本目录下新建```onnx_file```文件夹,将导出的onnx模型放在该目录下。 + +- 准备测试所用图片:将待测试图片放在```./imgs```文件夹下,本demo已提供了两张测试图片。 + +- 在本目录下直接运行: + ```shell + python infer_demo.py --modelpath ./onnx_file/picodet_s_320_lcnet_postprocessed.onnx + ``` + 将会对```./imgs```文件夹下所有图片进行识别,并将识别结果保存在```./results```文件夹下。 + +- 结果: +
+    (result images: detection visualizations of the two sample images)
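+
+Before running the demo, a quick sanity check that a downloaded model loads correctly — a minimal sketch, assuming the default model path from the command above:
+
+```python
+import onnxruntime as ort
+
+# assumed location; use whichever model you placed under ./onnx_file
+sess = ort.InferenceSession("onnx_file/picodet_s_320_lcnet_postprocessed.onnx")
+# the exported PicoDet graph reports its input names here
+print([inp.name for inp in sess.get_inputs()])
+```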
    + +## 模型下载 + +| 模型 | 输入尺寸 | ONNX( w/ 后处理) | +| :-------- | :--------: | :---------------------: | +| PicoDet-XS | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postprocessed.onnx) | +| PicoDet-XS | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postprocessed.onnx) | +| PicoDet-S | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postprocessed.onnx) | +| PicoDet-S | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postprocessed.onnx) | +| PicoDet-M | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postprocessed.onnx) | +| PicoDet-M | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postprocessed.onnx) | +| PicoDet-L | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postprocessed.onnx) | +| PicoDet-L | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postprocessed.onnx) | +| PicoDet-L | 640*640 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postprocessed.onnx) | diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/coco_label.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/coco_label.txt new file mode 100644 index 0000000000000000000000000000000000000000..ca76c80b5b2cd0b25047f75736656cfebc9da7aa --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/coco_label.txt @@ -0,0 +1,80 @@ +person +bicycle +car +motorbike +aeroplane +bus +train +truck +boat +traffic light +fire hydrant +stop sign +parking meter +bench +bird +cat +dog +horse +sheep +cow +elephant +bear +zebra +giraffe +backpack +umbrella +handbag +tie +suitcase +frisbee +skis +snowboard +sports ball +kite +baseball bat +baseball glove +skateboard +surfboard +tennis racket +bottle +wine glass +cup +fork +knife +spoon +bowl +banana +apple +sandwich +orange +broccoli +carrot +hot dog +pizza +donut +cake +chair +sofa +pottedplant +bed +diningtable +toilet +tvmonitor +laptop +mouse +remote +keyboard +cell phone +microwave +oven +toaster +sink +refrigerator +book +clock +vase +scissors +teddy bear +hair drier +toothbrush diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b43e311165c785f000eb7493ff8fb662d06a3f83 Binary files /dev/null and b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg differ diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/imgs/dog.jpg b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/imgs/dog.jpg new file mode 100644 index 0000000000000000000000000000000000000000..77b0381222eaed50867643f4166092c781e56d5b Binary files /dev/null and b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/imgs/dog.jpg differ diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/infer_demo.py b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/infer_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..a2b6de5286af6c51ae607201355a30167e894cbd --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_onnxruntime/infer_demo.py @@ -0,0 +1,209 @@ +# 
Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import numpy as np +import argparse +import onnxruntime as ort +from pathlib import Path +from tqdm import tqdm + + +class PicoDet(): + def __init__(self, + model_pb_path, + label_path, + prob_threshold=0.4, + iou_threshold=0.3): + self.classes = list( + map(lambda x: x.strip(), open(label_path, 'r').readlines())) + self.num_classes = len(self.classes) + self.prob_threshold = prob_threshold + self.iou_threshold = iou_threshold + self.mean = np.array( + [103.53, 116.28, 123.675], dtype=np.float32).reshape(1, 1, 3) + self.std = np.array( + [57.375, 57.12, 58.395], dtype=np.float32).reshape(1, 1, 3) + so = ort.SessionOptions() + so.log_severity_level = 3 + self.net = ort.InferenceSession(model_pb_path, so) + inputs_name = [a.name for a in self.net.get_inputs()] + inputs_shape = { + k: v.shape + for k, v in zip(inputs_name, self.net.get_inputs()) + } + self.input_shape = inputs_shape['image'][2:] + + def _normalize(self, img): + img = img.astype(np.float32) + img = (img / 255.0 - self.mean / 255.0) / (self.std / 255.0) + return img + + def resize_image(self, srcimg, keep_ratio=False): + top, left, newh, neww = 0, 0, self.input_shape[0], self.input_shape[1] + origin_shape = srcimg.shape[:2] + im_scale_y = newh / float(origin_shape[0]) + im_scale_x = neww / float(origin_shape[1]) + img_shape = np.array([ + [float(self.input_shape[0]), float(self.input_shape[1])] + ]).astype('float32') + scale_factor = np.array([[im_scale_y, im_scale_x]]).astype('float32') + + if keep_ratio and srcimg.shape[0] != srcimg.shape[1]: + hw_scale = srcimg.shape[0] / srcimg.shape[1] + if hw_scale > 1: + newh, neww = self.input_shape[0], int(self.input_shape[1] / + hw_scale) + img = cv2.resize( + srcimg, (neww, newh), interpolation=cv2.INTER_AREA) + left = int((self.input_shape[1] - neww) * 0.5) + img = cv2.copyMakeBorder( + img, + 0, + 0, + left, + self.input_shape[1] - neww - left, + cv2.BORDER_CONSTANT, + value=0) # add border + else: + newh, neww = int(self.input_shape[0] * + hw_scale), self.input_shape[1] + img = cv2.resize( + srcimg, (neww, newh), interpolation=cv2.INTER_AREA) + top = int((self.input_shape[0] - newh) * 0.5) + img = cv2.copyMakeBorder( + img, + top, + self.input_shape[0] - newh - top, + 0, + 0, + cv2.BORDER_CONSTANT, + value=0) + else: + img = cv2.resize( + srcimg, self.input_shape, interpolation=cv2.INTER_LINEAR) + + return img, img_shape, scale_factor + + def get_color_map_list(self, num_classes): + color_map = num_classes * [0, 0, 0] + for i in range(0, num_classes): + j = 0 + lab = i + while lab: + color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j)) + color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)) + color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)) + j += 1 + lab >>= 3 + color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)] + return color_map + + def detect(self, srcimg): + img, im_shape, scale_factor = self.resize_image(srcimg) + img = 
self._normalize(img) + + blob = np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0) + + inputs_dict = { + 'im_shape': im_shape, + 'image': blob, + 'scale_factor': scale_factor + } + inputs_name = [a.name for a in self.net.get_inputs()] + net_inputs = {k: inputs_dict[k] for k in inputs_name} + + outs = self.net.run(None, net_inputs) + + outs = np.array(outs[0]) + expect_boxes = (outs[:, 1] > 0.5) & (outs[:, 0] > -1) + np_boxes = outs[expect_boxes, :] + + color_list = self.get_color_map_list(self.num_classes) + clsid2color = {} + + for i in range(np_boxes.shape[0]): + classid, conf = int(np_boxes[i, 0]), np_boxes[i, 1] + xmin, ymin, xmax, ymax = int(np_boxes[i, 2]), int(np_boxes[ + i, 3]), int(np_boxes[i, 4]), int(np_boxes[i, 5]) + + if classid not in clsid2color: + clsid2color[classid] = color_list[classid] + color = tuple(clsid2color[classid]) + + cv2.rectangle( + srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2) + print(self.classes[classid] + ': ' + str(round(conf, 3))) + cv2.putText( + srcimg, + self.classes[classid] + ':' + str(round(conf, 3)), (xmin, + ymin - 10), + cv2.FONT_HERSHEY_SIMPLEX, + 0.8, (0, 255, 0), + thickness=2) + + return srcimg + + def detect_folder(self, img_fold, result_path): + img_fold = Path(img_fold) + result_path = Path(result_path) + result_path.mkdir(parents=True, exist_ok=True) + + img_name_list = filter( + lambda x: str(x).endswith(".png") or str(x).endswith(".jpg"), + img_fold.iterdir(), ) + img_name_list = list(img_name_list) + print(f"find {len(img_name_list)} images") + + for img_path in tqdm(img_name_list): + img = cv2.imread(str(img_path), 1) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + + srcimg = net.detect(img) + save_path = str(result_path / img_path.name.replace(".png", ".jpg")) + cv2.imwrite(save_path, srcimg) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument( + '--modelpath', + type=str, + default='onnx_file/picodet_s_320_lcnet_postprocessed.onnx', + help="onnx filepath") + parser.add_argument( + '--classfile', + type=str, + default='coco_label.txt', + help="classname filepath") + parser.add_argument( + '--confThreshold', default=0.5, type=float, help='class confidence') + parser.add_argument( + '--nmsThreshold', default=0.6, type=float, help='nms iou thresh') + parser.add_argument( + "--img_fold", dest="img_fold", type=str, default="./imgs") + parser.add_argument( + "--result_fold", dest="result_fold", type=str, default="results") + args = parser.parse_args() + + net = PicoDet( + args.modelpath, + args.classfile, + prob_threshold=args.confThreshold, + iou_threshold=args.nmsThreshold) + + net.detect_folder(args.img_fold, args.result_fold) + print( + f'infer results in ./deploy/third_engine/demo_onnxruntime/{args.result_fold}' + ) diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/CMakeLists.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..8ee82513f414779ba8c7d4ff97ffa90051e8fc35 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/CMakeLists.txt @@ -0,0 +1,23 @@ +cmake_minimum_required(VERSION 3.4.1) +set(CMAKE_CXX_STANDARD 14) + +project(picodet_demo) + +find_package(OpenCV REQUIRED) +find_package(InferenceEngine REQUIRED) +find_package(ngraph REQUIRED) + +include_directories( + ${OpenCV_INCLUDE_DIRS} + ${CMAKE_CURRENT_SOURCE_DIR} + ${CMAKE_CURRENT_BINARY_DIR} +) + +add_executable(picodet_demo main.cpp picodet_openvino.cpp) + 
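+# link the demo against the OpenVINO Inference Engine, nGraph and OpenCV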
+target_link_libraries( + picodet_demo + ${InferenceEngine_LIBRARIES} + ${NGRAPH_LIBRARIES} + ${OpenCV_LIBS} +) diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/README.md new file mode 100644 index 0000000000000000000000000000000000000000..99a3e0f27c519308f915627e66118c965b600e6d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/README.md @@ -0,0 +1,143 @@ +# PicoDet OpenVINO Demo + +This fold provides PicoDet inference code using +[Intel's OpenVINO Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html). Most of the implements in this fold are same as *demo_ncnn*. +**Recommand** to use the xxx.tar.gz file to install instead of github method, [link](https://registrationcenter-download.intel.com/akdlm/irc_nas/18096/l_openvino_toolkit_p_2021.4.689.tgz). + + +## Install OpenVINO Toolkit + +Go to [OpenVINO HomePage](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) + +Download a suitable version and install. + +Follow the official Get Started Guides: https://docs.openvinotoolkit.org/latest/get_started_guides.html + +## Set the Environment Variables + +### Windows: + +Run this command in cmd. (Every time before using OpenVINO) +```cmd +\openvino_2021\bin\setupvars.bat +``` + +Or set the system environment variables once for all: + +Name |Value +:--------------------:|:--------: +INTEL_OPENVINO_DIR | \openvino_2021 +INTEL_CVSDK_DIR | %INTEL_OPENVINO_DIR% +InferenceEngine_DIR | %INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\share +HDDL_INSTALL_DIR | %INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\external\hddl +ngraph_DIR | %INTEL_OPENVINO_DIR%\deployment_tools\ngraph\cmake + +And add this to ```Path``` +``` +%INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\bin\intel64\Debug;%INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\bin\intel64\Release;%HDDL_INSTALL_DIR%\bin;%INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\external\tbb\bin;%INTEL_OPENVINO_DIR%\deployment_tools\ngraph\lib +``` + +### Linux + +Run this command in shell. (Every time before using OpenVINO) + +```shell +source /opt/intel/openvino_2021/bin/setupvars.sh +``` + +Or edit .bashrc + +```shell +vi ~/.bashrc +``` + +Add this line to the end of the file + +```shell +source /opt/intel/openvino_2021/bin/setupvars.sh +``` + +## Convert model + + Convert to OpenVINO + + ``` shell + cd /openvino_2021/deployment_tools/model_optimizer + ``` + + Install requirements for convert tool + + ```shell + cd ./install_prerequisites + sudo install_prerequisites_onnx.sh + + ``` + + Then convert model. Notice: mean_values and scale_values should be the same with your training settings in YAML config file. + ```shell + python3 mo_onnx.py --input_model --mean_values [103.53,116.28,123.675] --scale_values [57.375,57.12,58.395] + ``` + +## Build + +### Windows + +```cmd +\openvino_2021\bin\setupvars.bat +mkdir -p build +cd build +cmake .. +msbuild picodet_demo.vcxproj /p:configuration=release /p:platform=x64 +``` + +### Linux +```shell +source /opt/intel/openvino_2021/bin/setupvars.sh +mkdir build +cd build +cmake .. +make +``` + + +## Run demo +Download PicoDet openvino model [PicoDet openvino model download link](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_openvino.zip). + +move picodet openvino model files to the demo's weight folder. + +### Edit file +``` +step1: +main.cpp +#define image_size 416 +... 
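+// note: image_size must match the input size of the converted model
+// (416 here, matching picodet_m_416)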
+auto detector = PicoDet("../weight/picodet_m_416.xml"); +... +step2: +picodet_openvino.h +#define image_size 416 +``` + +### Webcam + +```shell +picodet_demo 0 0 +``` + +### Inference images + +```shell +picodet_demo 1 IMAGE_FOLDER/*.jpg +``` + +### Inference video + +```shell +picodet_demo 2 VIDEO_PATH +``` + +### Benchmark + +```shell +picodet_demo 3 0 +``` diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/main.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/main.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e24b6070fbcbe9b95a02a7cd07c68bea8afc165d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/main.cpp @@ -0,0 +1,302 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// reference from https://github.com/RangiLyu/nanodet + +#include "picodet_openvino.h" +#include +#include +#include +#include +#define image_size 416 + +struct object_rect { + int x; + int y; + int width; + int height; +}; + +int resize_uniform(cv::Mat &src, cv::Mat &dst, cv::Size dst_size, + object_rect &effect_area) { + int w = src.cols; + int h = src.rows; + int dst_w = dst_size.width; + int dst_h = dst_size.height; + dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0)); + + float ratio_src = w * 1.0 / h; + float ratio_dst = dst_w * 1.0 / dst_h; + + int tmp_w = 0; + int tmp_h = 0; + if (ratio_src > ratio_dst) { + tmp_w = dst_w; + tmp_h = floor((dst_w * 1.0 / w) * h); + } else if (ratio_src < ratio_dst) { + tmp_h = dst_h; + tmp_w = floor((dst_h * 1.0 / h) * w); + } else { + cv::resize(src, dst, dst_size); + effect_area.x = 0; + effect_area.y = 0; + effect_area.width = dst_w; + effect_area.height = dst_h; + return 0; + } + cv::Mat tmp; + cv::resize(src, tmp, cv::Size(tmp_w, tmp_h)); + + if (tmp_w != dst_w) { + int index_w = floor((dst_w - tmp_w) / 2.0); + for (int i = 0; i < dst_h; i++) { + memcpy(dst.data + i * dst_w * 3 + index_w * 3, tmp.data + i * tmp_w * 3, + tmp_w * 3); + } + effect_area.x = index_w; + effect_area.y = 0; + effect_area.width = tmp_w; + effect_area.height = tmp_h; + } else if (tmp_h != dst_h) { + int index_h = floor((dst_h - tmp_h) / 2.0); + memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3); + effect_area.x = 0; + effect_area.y = index_h; + effect_area.width = tmp_w; + effect_area.height = tmp_h; + } else { + printf("error\n"); + } + return 0; +} + +const int color_list[80][3] = { + {216, 82, 24}, {236, 176, 31}, {125, 46, 141}, {118, 171, 47}, + {76, 189, 237}, {238, 19, 46}, {76, 76, 76}, {153, 153, 153}, + {255, 0, 0}, {255, 127, 0}, {190, 190, 0}, {0, 255, 0}, + {0, 0, 255}, {170, 0, 255}, {84, 84, 0}, {84, 170, 0}, + {84, 255, 0}, {170, 84, 0}, {170, 170, 0}, {170, 255, 0}, + {255, 84, 0}, {255, 170, 0}, {255, 255, 0}, {0, 84, 127}, + {0, 170, 127}, {0, 255, 127}, {84, 0, 127}, {84, 84, 127}, + {84, 170, 127}, {84, 255, 127}, {170, 0, 127}, {170, 84, 127}, + {170, 170, 127}, {170, 255, 
127}, {255, 0, 127}, {255, 84, 127}, + {255, 170, 127}, {255, 255, 127}, {0, 84, 255}, {0, 170, 255}, + {0, 255, 255}, {84, 0, 255}, {84, 84, 255}, {84, 170, 255}, + {84, 255, 255}, {170, 0, 255}, {170, 84, 255}, {170, 170, 255}, + {170, 255, 255}, {255, 0, 255}, {255, 84, 255}, {255, 170, 255}, + {42, 0, 0}, {84, 0, 0}, {127, 0, 0}, {170, 0, 0}, + {212, 0, 0}, {255, 0, 0}, {0, 42, 0}, {0, 84, 0}, + {0, 127, 0}, {0, 170, 0}, {0, 212, 0}, {0, 255, 0}, + {0, 0, 42}, {0, 0, 84}, {0, 0, 127}, {0, 0, 170}, + {0, 0, 212}, {0, 0, 255}, {0, 0, 0}, {36, 36, 36}, + {72, 72, 72}, {109, 109, 109}, {145, 145, 145}, {182, 182, 182}, + {218, 218, 218}, {0, 113, 188}, {80, 182, 188}, {127, 127, 0}, +}; + +void draw_bboxes(const cv::Mat &bgr, const std::vector &bboxes, + object_rect effect_roi) { + static const char *class_names[] = { + "person", "bicycle", "car", + "motorcycle", "airplane", "bus", + "train", "truck", "boat", + "traffic light", "fire hydrant", "stop sign", + "parking meter", "bench", "bird", + "cat", "dog", "horse", + "sheep", "cow", "elephant", + "bear", "zebra", "giraffe", + "backpack", "umbrella", "handbag", + "tie", "suitcase", "frisbee", + "skis", "snowboard", "sports ball", + "kite", "baseball bat", "baseball glove", + "skateboard", "surfboard", "tennis racket", + "bottle", "wine glass", "cup", + "fork", "knife", "spoon", + "bowl", "banana", "apple", + "sandwich", "orange", "broccoli", + "carrot", "hot dog", "pizza", + "donut", "cake", "chair", + "couch", "potted plant", "bed", + "dining table", "toilet", "tv", + "laptop", "mouse", "remote", + "keyboard", "cell phone", "microwave", + "oven", "toaster", "sink", + "refrigerator", "book", "clock", + "vase", "scissors", "teddy bear", + "hair drier", "toothbrush"}; + + cv::Mat image = bgr.clone(); + int src_w = image.cols; + int src_h = image.rows; + int dst_w = effect_roi.width; + int dst_h = effect_roi.height; + float width_ratio = (float)src_w / (float)dst_w; + float height_ratio = (float)src_h / (float)dst_h; + + for (size_t i = 0; i < bboxes.size(); i++) { + const BoxInfo &bbox = bboxes[i]; + cv::Scalar color = + cv::Scalar(color_list[bbox.label][0], color_list[bbox.label][1], + color_list[bbox.label][2]); + cv::rectangle(image, + cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio, + (bbox.y1 - effect_roi.y) * height_ratio), + cv::Point((bbox.x2 - effect_roi.x) * width_ratio, + (bbox.y2 - effect_roi.y) * height_ratio)), + color); + + char text[256]; + sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); + int baseLine = 0; + cv::Size label_size = + cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); + int x = (bbox.x1 - effect_roi.x) * width_ratio; + int y = + (bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine; + if (y < 0) + y = 0; + if (x + label_size.width > image.cols) + x = image.cols - label_size.width; + + cv::rectangle(image, cv::Rect(cv::Point(x, y), + cv::Size(label_size.width, + label_size.height + baseLine)), + color, -1); + cv::putText(image, text, cv::Point(x, y + label_size.height), + cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255)); + } + + cv::imwrite("../predict.jpg", image); +} + +int image_demo(PicoDet &detector, const char *imagepath) { + std::vector filenames; + cv::glob(imagepath, filenames, false); + + for (auto img_name : filenames) { + cv::Mat image = cv::imread(img_name); + if (image.empty()) { + return -1; + } + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform(image, resized_img, cv::Size(image_size, image_size), + 
effect_roi); + auto results = detector.detect(resized_img, 0.4, 0.5); + draw_bboxes(image, results, effect_roi); + } + return 0; +} + +int webcam_demo(PicoDet &detector, int cam_id) { + cv::Mat image; + cv::VideoCapture cap(cam_id); + while (true) { + cap >> image; + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform(image, resized_img, cv::Size(image_size, image_size), + effect_roi); + auto results = detector.detect(resized_img, 0.4, 0.5); + draw_bboxes(image, results, effect_roi); + cv::waitKey(1); + } + return 0; +} + +int video_demo(PicoDet &detector, const char *path) { + cv::Mat image; + cv::VideoCapture cap(path); + + while (true) { + cap >> image; + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform(image, resized_img, cv::Size(image_size, image_size), + effect_roi); + auto results = detector.detect(resized_img, 0.4, 0.5); + draw_bboxes(image, results, effect_roi); + cv::waitKey(1); + } + return 0; +} + +int benchmark(PicoDet &detector) { + int loop_num = 100; + int warm_up = 8; + + double time_min = DBL_MAX; + double time_max = -DBL_MAX; + double time_avg = 0; + cv::Mat image(image_size, image_size, CV_8UC3, cv::Scalar(1, 1, 1)); + + for (int i = 0; i < warm_up + loop_num; i++) { + auto start = std::chrono::steady_clock::now(); + std::vector results; + results = detector.detect(image, 0.4, 0.5); + auto end = std::chrono::steady_clock::now(); + double time = + std::chrono::duration(end - start).count(); + if (i >= warm_up) { + time_min = (std::min)(time_min, time); + time_max = (std::max)(time_max, time); + time_avg += time; + } + } + time_avg /= loop_num; + fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet", + time_min, time_max, time_avg); + return 0; +} + +int main(int argc, char **argv) { + if (argc != 3) { + fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is " + "cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n " + "For video, mode=2; \n For benchmark, mode=3 path=0.\n", + argv[0]); + return -1; + } + std::cout << "start init model" << std::endl; + auto detector = PicoDet("../weight/picodet_m_416.xml"); + std::cout << "success" << std::endl; + + int mode = atoi(argv[1]); + switch (mode) { + case 0: { + int cam_id = atoi(argv[2]); + webcam_demo(detector, cam_id); + break; + } + case 1: { + const char *images = argv[2]; + image_demo(detector, images); + break; + } + case 2: { + const char *path = argv[2]; + video_demo(detector, path); + break; + } + case 3: { + benchmark(detector); + break; + } + default: { + fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is " + "cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n " + "For video, mode=2; \n For benchmark, mode=3 path=0.\n", + argv[0]); + break; + } + } +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/picodet_openvino.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/picodet_openvino.cpp new file mode 100644 index 0000000000000000000000000000000000000000..04b0c482d5738ffb97428efb0faf68f3d6a03e1a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/picodet_openvino.cpp @@ -0,0 +1,209 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_openvino + +#include "picodet_openvino.h" + +inline float fast_exp(float x) { + union { + uint32_t i; + float f; + } v{}; + v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f); + return v.f; +} + +inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); } + +template +int activation_function_softmax(const _Tp *src, _Tp *dst, int length) { + const _Tp alpha = *std::max_element(src, src + length); + _Tp denominator{0}; + + for (int i = 0; i < length; ++i) { + dst[i] = fast_exp(src[i] - alpha); + denominator += dst[i]; + } + + for (int i = 0; i < length; ++i) { + dst[i] /= denominator; + } + + return 0; +} + +PicoDet::PicoDet(const char *model_path) { + InferenceEngine::Core ie; + InferenceEngine::CNNNetwork model = ie.ReadNetwork(model_path); + // prepare input settings + InferenceEngine::InputsDataMap inputs_map(model.getInputsInfo()); + input_name_ = inputs_map.begin()->first; + InferenceEngine::InputInfo::Ptr input_info = inputs_map.begin()->second; + // prepare output settings + InferenceEngine::OutputsDataMap outputs_map(model.getOutputsInfo()); + for (auto &output_info : outputs_map) { + output_info.second->setPrecision(InferenceEngine::Precision::FP32); + } + + // get network + network_ = ie.LoadNetwork(model, "CPU"); + infer_request_ = network_.CreateInferRequest(); +} + +PicoDet::~PicoDet() {} + +void PicoDet::preprocess(cv::Mat &image, InferenceEngine::Blob::Ptr &blob) { + int img_w = image.cols; + int img_h = image.rows; + int channels = 3; + + InferenceEngine::MemoryBlob::Ptr mblob = + InferenceEngine::as(blob); + if (!mblob) { + THROW_IE_EXCEPTION + << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, " + << "but by fact we were not able to cast inputBlob to MemoryBlob"; + } + auto mblobHolder = mblob->wmap(); + float *blob_data = mblobHolder.as(); + + for (size_t c = 0; c < channels; c++) { + for (size_t h = 0; h < img_h; h++) { + for (size_t w = 0; w < img_w; w++) { + blob_data[c * img_w * img_h + h * img_w + w] = + (float)image.at(h, w)[c]; + } + } + } +} + +std::vector PicoDet::detect(cv::Mat image, float score_threshold, + float nms_threshold) { + InferenceEngine::Blob::Ptr input_blob = infer_request_.GetBlob(input_name_); + preprocess(image, input_blob); + + // do inference + infer_request_.Infer(); + + // get output + std::vector> results; + results.resize(this->num_class_); + + for (const auto &head_info : this->heads_info_) { + const InferenceEngine::Blob::Ptr dis_pred_blob = + infer_request_.GetBlob(head_info.dis_layer); + const InferenceEngine::Blob::Ptr cls_pred_blob = + infer_request_.GetBlob(head_info.cls_layer); + + auto mdis_pred = + InferenceEngine::as(dis_pred_blob); + auto mdis_pred_holder = mdis_pred->rmap(); + const float *dis_pred = mdis_pred_holder.as(); + + auto mcls_pred = + InferenceEngine::as(cls_pred_blob); + auto mcls_pred_holder = mcls_pred->rmap(); + const float *cls_pred = mcls_pred_holder.as(); + this->decode_infer(cls_pred, dis_pred, head_info.stride, score_threshold, + results); + } + + std::vector dets; + for (int i = 0; i < 
(int)results.size(); i++) { + this->nms(results[i], nms_threshold); + + for (auto &box : results[i]) { + dets.push_back(box); + } + } + return dets; +} + +void PicoDet::decode_infer(const float *&cls_pred, const float *&dis_pred, + int stride, float threshold, + std::vector> &results) { + int feature_h = ceil((float)input_size_ / stride); + int feature_w = ceil((float)input_size_ / stride); + for (int idx = 0; idx < feature_h * feature_w; idx++) { + int row = idx / feature_w; + int col = idx % feature_w; + float score = 0; + int cur_label = 0; + + for (int label = 0; label < num_class_; label++) { + if (cls_pred[idx * num_class_ + label] > score) { + score = cls_pred[idx * num_class_ + label]; + cur_label = label; + } + } + if (score > threshold) { + const float *bbox_pred = dis_pred + idx * (reg_max_ + 1) * 4; + results[cur_label].push_back( + this->disPred2Bbox(bbox_pred, cur_label, score, col, row, stride)); + } + } +} + +BoxInfo PicoDet::disPred2Bbox(const float *&dfl_det, int label, float score, + int x, int y, int stride) { + float ct_x = (x + 0.5) * stride; + float ct_y = (y + 0.5) * stride; + std::vector dis_pred; + dis_pred.resize(4); + for (int i = 0; i < 4; i++) { + float dis = 0; + float *dis_after_sm = new float[reg_max_ + 1]; + activation_function_softmax(dfl_det + i * (reg_max_ + 1), dis_after_sm, + reg_max_ + 1); + for (int j = 0; j < reg_max_ + 1; j++) { + dis += j * dis_after_sm[j]; + } + dis *= stride; + dis_pred[i] = dis; + delete[] dis_after_sm; + } + float xmin = (std::max)(ct_x - dis_pred[0], .0f); + float ymin = (std::max)(ct_y - dis_pred[1], .0f); + float xmax = (std::min)(ct_x + dis_pred[2], (float)this->input_size_); + float ymax = (std::min)(ct_y + dis_pred[3], (float)this->input_size_); + return BoxInfo{xmin, ymin, xmax, ymax, score, label}; +} + +void PicoDet::nms(std::vector &input_boxes, float NMS_THRESH) { + std::sort(input_boxes.begin(), input_boxes.end(), + [](BoxInfo a, BoxInfo b) { return a.score > b.score; }); + std::vector vArea(input_boxes.size()); + for (int i = 0; i < int(input_boxes.size()); ++i) { + vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) * + (input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1); + } + for (int i = 0; i < int(input_boxes.size()); ++i) { + for (int j = i + 1; j < int(input_boxes.size());) { + float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1); + float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1); + float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2); + float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2); + float w = (std::max)(float(0), xx2 - xx1 + 1); + float h = (std::max)(float(0), yy2 - yy1 + 1); + float inter = w * h; + float ovr = inter / (vArea[i] + vArea[j] - inter); + if (ovr >= NMS_THRESH) { + input_boxes.erase(input_boxes.begin() + j); + vArea.erase(vArea.begin() + j); + } else { + j++; + } + } + } +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/picodet_openvino.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/picodet_openvino.h new file mode 100644 index 0000000000000000000000000000000000000000..2a5bced16a3c3d57096adbdfa263b634c74377db --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/picodet_openvino.h @@ -0,0 +1,75 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_openvino + +#ifndef _PICODET_OPENVINO_H_ +#define _PICODET_OPENVINO_H_ + +#include +#include +#include + +#define image_size 416 + +typedef struct HeadInfo { + std::string cls_layer; + std::string dis_layer; + int stride; +} HeadInfo; + +typedef struct BoxInfo { + float x1; + float y1; + float x2; + float y2; + float score; + int label; +} BoxInfo; + +class PicoDet { +public: + PicoDet(const char *param); + + ~PicoDet(); + + InferenceEngine::ExecutableNetwork network_; + InferenceEngine::InferRequest infer_request_; + // static bool hasGPU; + + std::vector heads_info_{ + // cls_pred|dis_pred|stride + {"transpose_0.tmp_0", "transpose_1.tmp_0", 8}, + {"transpose_2.tmp_0", "transpose_3.tmp_0", 16}, + {"transpose_4.tmp_0", "transpose_5.tmp_0", 32}, + {"transpose_6.tmp_0", "transpose_7.tmp_0", 64}, + }; + + std::vector detect(cv::Mat image, float score_threshold, + float nms_threshold); + +private: + void preprocess(cv::Mat &image, InferenceEngine::Blob::Ptr &blob); + void decode_infer(const float *&cls_pred, const float *&dis_pred, int stride, + float threshold, + std::vector> &results); + BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x, + int y, int stride); + static void nms(std::vector &result, float nms_threshold); + std::string input_name_; + int input_size_ = image_size; + int num_class_ = 80; + int reg_max_ = 7; +}; + +#endif diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/README.md new file mode 100644 index 0000000000000000000000000000000000000000..52a8adb10af61594d6b6205cf09881e735ba2aaf --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/README.md @@ -0,0 +1,75 @@ +# PicoDet OpenVINO Benchmark Demo + +本文件夹提供利用[Intel's OpenVINO Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html)进行PicoDet测速的Benchmark Demo与带后处理的模型Inference Demo。 + +## 安装 OpenVINO Toolkit + +前往 [OpenVINO HomePage](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html),下载对应版本并安装。 + +本demo安装的是 OpenVINO 2022.1.0,可直接运行如下指令安装: +```shell +pip install openvino==2022.1.0 +``` + +详细安装步骤,可参考[OpenVINO官网](https://docs.openvinotoolkit.org/latest/get_started_guides.html) + +## Benchmark测试 + +- 准备测试模型:根据[PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet)中【导出及转换模型】步骤,采用不包含后处理的方式导出模型(`-o export.benchmark=True` ),并生成待测试模型简化后的onnx模型(可在下文链接中直接下载)。同时在本目录下新建```out_onnxsim```文件夹,将导出的onnx模型放在该目录下。 + +- 准备测试所用图片:本demo默认利用PaddleDetection/demo/[000000014439.jpg](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/demo/000000014439.jpg) + +- 在本目录下直接运行: + +```shell +# Linux +python openvino_benchmark.py --img_path ../../../../demo/000000014439.jpg --onnx_path out_onnxsim/picodet_s_320_coco_lcnet.onnx --in_shape 320 +# Windows +python openvino_benchmark.py --img_path ..\..\..\..\demo\000000014439.jpg --onnx_path out_onnxsim\picodet_s_320_coco_lcnet.onnx --in_shape 320 +``` 
+- 注意:```--in_shape```为对应模型输入size,默认为320
+
+## 真实图片测试(网络包含后处理,但不包含NMS)
+
+- 准备测试模型:根据[PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet)中【导出及转换模型】步骤,采用**包含后处理**但**不包含NMS**的方式导出模型(`-o export.benchmark=False export.nms=False` ),并生成待测试模型简化后的onnx模型(可在下文链接中直接下载)。同时在本目录下新建```out_onnxsim_infer```文件夹,将导出的onnx模型放在该目录下。
+
+- 准备测试所用图片:默认利用../../demo_onnxruntime/imgs/[bus.jpg](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg)
+
+```shell
+# Linux
+python openvino_infer.py --img_path ../../demo_onnxruntime/imgs/bus.jpg --onnx_path out_onnxsim_infer/picodet_s_320_postproccesed_woNMS.onnx --in_shape 320
+# Windows
+python openvino_infer.py --img_path ..\..\demo_onnxruntime\imgs\bus.jpg --onnx_path out_onnxsim_infer\picodet_s_320_postproccesed_woNMS.onnx --in_shape 320
+```
+
+## 真实图片测试(网络不包含后处理)
+
+```shell
+# Linux
+python openvino_benchmark.py --benchmark 0 --img_path ../../../../demo/000000014439.jpg --onnx_path out_onnxsim/picodet_s_320_coco_lcnet.onnx --in_shape 320
+# Windows
+python openvino_benchmark.py --benchmark 0 --img_path ..\..\..\..\demo\000000014439.jpg --onnx_path out_onnxsim\picodet_s_320_coco_lcnet.onnx --in_shape 320
+```
+
+- 结果:
+    (result images omitted: OpenVINO inference visualizations)
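+
+For a self-contained latency check, independent of `openvino_benchmark.py`, a minimal sketch — the model path, input size, and CPU device are assumptions matching the commands above:
+
+```python
+import time
+import numpy as np
+from openvino.runtime import Core
+
+core = Core()
+# assumed path/shape; match them to the model you exported
+model = core.read_model("out_onnxsim/picodet_s_320_coco_lcnet.onnx")
+compiled = core.compile_model(model, "CPU")
+dummy = np.zeros((1, 3, 320, 320), dtype=np.float32)
+
+compiled.infer_new_request({0: dummy})  # warm up once
+t0 = time.time()
+for _ in range(100):
+    compiled.infer_new_request({0: dummy})
+print("avg latency: {:.1f} ms".format((time.time() - t0) / 100 * 1000))
+```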
    + +## Benchmark结果 + +- 测速结果如下: + +| 模型 | 输入尺寸 | ONNX | 预测时延[CPU](#latency)| +| :-------- | :--------: | :---------------------: | :----------------: | +| PicoDet-XS | 320*320 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_coco_lcnet.onnx) | 3.9ms | +| PicoDet-XS | 416*416 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_coco_lcnet.onnx) | 6.1ms | +| PicoDet-S | 320*320 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet.onnx) | 4.8ms | +| PicoDet-S | 416*416 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco_lcnet.onnx) | 6.6ms | +| PicoDet-M | 320*320 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco_lcnet.onnx) | 8.2ms | +| PicoDet-M | 416*416 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco_lcnet.onnx) | 12.7ms | +| PicoDet-L | 320*320 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco_lcnet.onnx) | 11.5ms | +| PicoDet-L | 416*416 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco_lcnet.onnx) | 20.7ms | +| PicoDet-L | 640*640 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco_lcnet.onnx) | 62.5ms | + +- 测试环境: 英特尔酷睿i7 10750H CPU。 diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/coco_label.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/coco_label.txt new file mode 100644 index 0000000000000000000000000000000000000000..ca76c80b5b2cd0b25047f75736656cfebc9da7aa --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/coco_label.txt @@ -0,0 +1,80 @@ +person +bicycle +car +motorbike +aeroplane +bus +train +truck +boat +traffic light +fire hydrant +stop sign +parking meter +bench +bird +cat +dog +horse +sheep +cow +elephant +bear +zebra +giraffe +backpack +umbrella +handbag +tie +suitcase +frisbee +skis +snowboard +sports ball +kite +baseball bat +baseball glove +skateboard +surfboard +tennis racket +bottle +wine glass +cup +fork +knife +spoon +bowl +banana +apple +sandwich +orange +broccoli +carrot +hot dog +pizza +donut +cake +chair +sofa +pottedplant +bed +diningtable +toilet +tvmonitor +laptop +mouse +remote +keyboard +cell phone +microwave +oven +toaster +sink +refrigerator +book +clock +vase +scissors +teddy bear 
+hair drier +toothbrush diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/openvino_benchmark.py b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/openvino_benchmark.py new file mode 100644 index 0000000000000000000000000000000000000000..f21a8d5d1ed83c159818d2b405d1b5c9e5daa927 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/openvino_benchmark.py @@ -0,0 +1,365 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import numpy as np +import time +import argparse +from scipy.special import softmax +from openvino.runtime import Core + + +def image_preprocess(img_path, re_shape): + img = cv2.imread(img_path) + img = cv2.resize( + img, (re_shape, re_shape), interpolation=cv2.INTER_LANCZOS4) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + img = np.transpose(img, [2, 0, 1]) / 255 + img = np.expand_dims(img, 0) + img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) + img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) + img -= img_mean + img /= img_std + return img.astype(np.float32) + + +def draw_box(img, results, class_label, scale_x, scale_y): + + label_list = list( + map(lambda x: x.strip(), open(class_label, 'r').readlines())) + + for i in range(len(results)): + print(label_list[int(results[i][0])], ':', results[i][1]) + bbox = results[i, 2:] + label_id = int(results[i, 0]) + score = results[i, 1] + if (score > 0.20): + xmin, ymin, xmax, ymax = [ + int(bbox[0] * scale_x), int(bbox[1] * scale_y), + int(bbox[2] * scale_x), int(bbox[3] * scale_y) + ] + cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 3) + font = cv2.FONT_HERSHEY_SIMPLEX + label_text = label_list[label_id] + cv2.rectangle(img, (xmin, ymin), (xmax, ymin - 60), (0, 255, 0), -1) + cv2.putText(img, "#" + label_text, (xmin, ymin - 10), font, 1, + (255, 255, 255), 2, cv2.LINE_AA) + cv2.putText(img, + str(round(score, 3)), (xmin, ymin - 40), font, 0.8, + (255, 255, 255), 2, cv2.LINE_AA) + return img + + +def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200): + """ + Args: + box_scores (N, 5): boxes in corner-form and probabilities. + iou_threshold: intersection over union threshold. + top_k: keep top_k results. If k <= 0, keep all the results. + candidate_size: only consider the candidates with the highest scores. 
+ Returns: + picked: a list of indexes of the kept boxes + """ + scores = box_scores[:, -1] + boxes = box_scores[:, :-1] + picked = [] + indexes = np.argsort(scores) + indexes = indexes[-candidate_size:] + while len(indexes) > 0: + current = indexes[-1] + picked.append(current) + if 0 < top_k == len(picked) or len(indexes) == 1: + break + current_box = boxes[current, :] + indexes = indexes[:-1] + rest_boxes = boxes[indexes, :] + iou = iou_of( + rest_boxes, + np.expand_dims( + current_box, axis=0), ) + indexes = indexes[iou <= iou_threshold] + + return box_scores[picked, :] + + +def iou_of(boxes0, boxes1, eps=1e-5): + """Return intersection-over-union (Jaccard index) of boxes. + Args: + boxes0 (N, 4): ground truth boxes. + boxes1 (N or 1, 4): predicted boxes. + eps: a small number to avoid 0 as denominator. + Returns: + iou (N): IoU values. + """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. + right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. + """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PicoDetPostProcess(object): + """ + Args: + input_shape (int): network input image size + ori_shape (int): ori image shape of before padding + scale_factor (float): scale factor of ori image + enable_mkldnn (bool): whether to open MKLDNN + """ + + def __init__(self, + input_shape, + ori_shape, + scale_factor, + strides=[8, 16, 32, 64], + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=1000, + keep_top_k=100): + self.ori_shape = ori_shape + self.input_shape = input_shape + self.scale_factor = scale_factor + self.strides = strides + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def warp_boxes(self, boxes, ori_shape): + """Apply transform to boxes + """ + width, height = ori_shape[1], ori_shape[0] + n = len(boxes) + if n: + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + n * 4, 2) # x1y1, x2y2, x1y2, x2y1 + # xy = xy @ M.T # transform + xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate( + (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + # clip boxes + xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) + xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) + return xy.astype(np.float32) + else: + return boxes + + def __call__(self, scores, raw_boxes): + batch_size = raw_boxes[0].shape[0] + reg_max = int(raw_boxes[0].shape[-1] / 4 - 1) + out_boxes_num = [] + out_boxes_list = [] + for batch_id in range(batch_size): + # generate centers + decode_boxes = [] + select_scores = [] + for stride, box_distribute, score in zip(self.strides, raw_boxes, + scores): + box_distribute = box_distribute[batch_id] + score = score[batch_id] + # centers + fm_h = self.input_shape[0] / stride + fm_w = self.input_shape[1] / stride + h_range = np.arange(fm_h) + w_range = np.arange(fm_w) + ww, hh = np.meshgrid(w_range, h_range) + ct_row = (hh.flatten() 
+                ct_col = (ww.flatten() + 0.5) * stride
+                center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1)
+
+                # box distribution to distance: softmax over the reg_max + 1
+                # bins, then take the expectation (Distribution Focal Loss)
+                reg_range = np.arange(reg_max + 1)
+                box_distance = box_distribute.reshape((-1, reg_max + 1))
+                box_distance = softmax(box_distance, axis=1)
+                box_distance = box_distance * np.expand_dims(reg_range, axis=0)
+                box_distance = np.sum(box_distance, axis=1).reshape((-1, 4))
+                box_distance = box_distance * stride
+
+                # top K candidates per head
+                topk_idx = np.argsort(score.max(axis=1))[::-1]
+                topk_idx = topk_idx[:self.nms_top_k]
+                center = center[topk_idx]
+                score = score[topk_idx]
+                box_distance = box_distance[topk_idx]
+
+                # decode distances (left, top, right, bottom) into corner boxes
+                decode_box = center + [-1, -1, 1, 1] * box_distance
+
+                select_scores.append(score)
+                decode_boxes.append(decode_box)
+
+            # class-wise hard NMS
+            bboxes = np.concatenate(decode_boxes, axis=0)
+            confidences = np.concatenate(select_scores, axis=0)
+            picked_box_probs = []
+            picked_labels = []
+            for class_index in range(0, confidences.shape[1]):
+                probs = confidences[:, class_index]
+                mask = probs > self.score_threshold
+                probs = probs[mask]
+                if probs.shape[0] == 0:
+                    continue
+                subset_boxes = bboxes[mask, :]
+                box_probs = np.concatenate(
+                    [subset_boxes, probs.reshape(-1, 1)], axis=1)
+                box_probs = hard_nms(
+                    box_probs,
+                    iou_threshold=self.nms_threshold,
+                    top_k=self.keep_top_k)
+                picked_box_probs.append(box_probs)
+                picked_labels.extend([class_index] * box_probs.shape[0])
+
+            if len(picked_box_probs) == 0:
+                out_boxes_list.append(np.empty((0, 4)))
+                out_boxes_num.append(0)
+            else:
+                picked_box_probs = np.concatenate(picked_box_probs)
+
+                # resize output boxes back to the original image
+                picked_box_probs[:, :4] = self.warp_boxes(
+                    picked_box_probs[:, :4], self.ori_shape[batch_id])
+                im_scale = np.concatenate([
+                    self.scale_factor[batch_id][::-1],
+                    self.scale_factor[batch_id][::-1]
+                ])
+                picked_box_probs[:, :4] /= im_scale
+                # each output row is [class_id, score, x1, y1, x2, y2]
+                out_boxes_list.append(
+                    np.concatenate(
+                        [
+                            np.expand_dims(
+                                np.array(picked_labels), axis=-1),
+                            np.expand_dims(
+                                picked_box_probs[:, 4], axis=-1),
+                            picked_box_probs[:, :4]
+                        ],
+                        axis=1))
+                out_boxes_num.append(len(picked_labels))
+
+        out_boxes_list = np.concatenate(out_boxes_list, axis=0)
+        out_boxes_num = np.asarray(out_boxes_num).astype(np.int32)
+        return out_boxes_list, out_boxes_num
+
+
+def detect(img_file, compiled_model, re_shape, class_label):
+    # note: uses the module-level `test_image` preprocessed in __main__
+    output = compiled_model.infer_new_request({0: test_image})
+    result_ie = list(output.values())
+
+    test_im_shape = np.array([[re_shape, re_shape]]).astype('float32')
+    test_scale_factor = np.array([[1, 1]]).astype('float32')
+
+    np_score_list = []
+    np_boxes_list = []
+
+    # the first half of the outputs are per-head scores, the second half boxes
+    num_outs = int(len(result_ie) / 2)
+    for out_idx in range(num_outs):
+        np_score_list.append(result_ie[out_idx])
+        np_boxes_list.append(result_ie[out_idx + num_outs])
+
+    postprocess = PicoDetPostProcess(test_image.shape[2:], test_im_shape,
+                                     test_scale_factor)
+
+    np_boxes, np_boxes_num = postprocess(np_score_list, np_boxes_list)
+
+    image = cv2.imread(img_file, 1)
+    scale_x = image.shape[1] / test_image.shape[3]
+    scale_y = image.shape[0] / test_image.shape[2]
+    res_image = draw_box(image, np_boxes, class_label, scale_x, scale_y)
+
+    cv2.imwrite('res.jpg', res_image)
+    cv2.imshow("res", res_image)
+    cv2.waitKey()
+
+
+def benchmark(test_image, compiled_model):
+    loop_num = 100
+    warm_up = 8
+    timeall = 0
+    time_min = float("inf")
+    time_max = float('-inf')
+
+    for i in range(loop_num + warm_up):
+        time0 = time.time()
+        # perform the inference step
+        output = compiled_model.infer_new_request({0: test_image})
+        time1 = time.time()
+        timed = time1 - time0
+
+        if i >= warm_up:
+            timeall = timeall + timed
+            time_min = min(time_min, timed)
+            time_max = max(time_max, timed)
+
+    time_avg = timeall / loop_num
+
+    print('inference_time(ms): min={}, max={}, avg={}'.format(
+        round(time_min * 1000, 2),
+        round(time_max * 1000, 2), round(time_avg * 1000, 2)))
+
+
+if __name__ == '__main__':
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--benchmark', type=int, default=1, help="0:detect; 1:benchmark")
+    parser.add_argument(
+        '--img_path',
+        type=str,
+        default='../../../../demo/000000014439.jpg',
+        help="image path")
+    parser.add_argument(
+        '--onnx_path',
+        type=str,
+        default='out_onnxsim/picodet_s_320_processed.onnx',
+        help="onnx filepath")
+    parser.add_argument('--in_shape', type=int, default=320, help="input_size")
+    parser.add_argument(
+        '--class_label',
+        type=str,
+        default='coco_label.txt',
+        help="class label file")
+    args = parser.parse_args()
+
+    ie = Core()
+    net = ie.read_model(args.onnx_path)
+    test_image = image_preprocess(args.img_path, args.in_shape)
+    compiled_model = ie.compile_model(net, 'CPU')
+
+    if args.benchmark == 0:
+        detect(args.img_path, compiled_model, args.in_shape, args.class_label)
+    if args.benchmark == 1:
+        benchmark(test_image, compiled_model)
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/openvino_infer.py b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/openvino_infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..0ad51022b1793e7b6430025a7c71cc0de7658c8c
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino/python/openvino_infer.py
@@ -0,0 +1,267 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
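+
+# Runs a PicoDet ONNX model (exported with box decoding but without NMS,
+# e.g. picodet_s_320_postproccesed_woNMS.onnx) through the OpenVINO runtime,
+# then applies class-wise hard NMS in NumPy (PicoDetNMS) and visualizes the
+# detections.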
+ +import cv2 +import numpy as np +import argparse +from scipy.special import softmax +from openvino.runtime import Core + + +def image_preprocess(img_path, re_shape): + img = cv2.imread(img_path) + img = cv2.resize( + img, (re_shape, re_shape), interpolation=cv2.INTER_LANCZOS4) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + img = np.transpose(img, [2, 0, 1]) / 255 + img = np.expand_dims(img, 0) + img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) + img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) + img -= img_mean + img /= img_std + return img.astype(np.float32) + + +def get_color_map_list(num_classes): + color_map = num_classes * [0, 0, 0] + for i in range(0, num_classes): + j = 0 + lab = i + while lab: + color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j)) + color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)) + color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)) + j += 1 + lab >>= 3 + color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)] + return color_map + + +def draw_box(srcimg, results, class_label): + label_list = list( + map(lambda x: x.strip(), open(class_label, 'r').readlines())) + for i in range(len(results)): + color_list = get_color_map_list(len(label_list)) + clsid2color = {} + classid, conf = int(results[i, 0]), results[i, 1] + xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int( + results[i, 4]), int(results[i, 5]) + + if classid not in clsid2color: + clsid2color[classid] = color_list[classid] + color = tuple(clsid2color[classid]) + + cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2) + print(label_list[classid] + ': ' + str(round(conf, 3))) + cv2.putText( + srcimg, + label_list[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10), + cv2.FONT_HERSHEY_SIMPLEX, + 0.8, (0, 255, 0), + thickness=2) + return srcimg + + +def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200): + """ + Args: + box_scores (N, 5): boxes in corner-form and probabilities. + iou_threshold: intersection over union threshold. + top_k: keep top_k results. If k <= 0, keep all the results. + candidate_size: only consider the candidates with the highest scores. + Returns: + picked: a list of indexes of the kept boxes + """ + scores = box_scores[:, -1] + boxes = box_scores[:, :-1] + picked = [] + indexes = np.argsort(scores) + indexes = indexes[-candidate_size:] + while len(indexes) > 0: + current = indexes[-1] + picked.append(current) + if 0 < top_k == len(picked) or len(indexes) == 1: + break + current_box = boxes[current, :] + indexes = indexes[:-1] + rest_boxes = boxes[indexes, :] + iou = iou_of( + rest_boxes, + np.expand_dims( + current_box, axis=0), ) + indexes = indexes[iou <= iou_threshold] + + return box_scores[picked, :] + + +def iou_of(boxes0, boxes1, eps=1e-5): + """Return intersection-over-union (Jaccard index) of boxes. + Args: + boxes0 (N, 4): ground truth boxes. + boxes1 (N or 1, 4): predicted boxes. + eps: a small number to avoid 0 as denominator. + Returns: + iou (N): IoU values. + """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. 
+ right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. + """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PicoDetNMS(object): + """ + Args: + input_shape (int): network input image size + scale_factor (float): scale factor of ori image + """ + + def __init__(self, + input_shape, + scale_x, + scale_y, + strides=[8, 16, 32, 64], + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=1000, + keep_top_k=100): + self.input_shape = input_shape + self.scale_x = scale_x + self.scale_y = scale_y + self.strides = strides + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def __call__(self, decode_boxes, select_scores): + batch_size = 1 + out_boxes_list = [] + for batch_id in range(batch_size): + # nms + bboxes = np.concatenate(decode_boxes, axis=0) + confidences = np.concatenate(select_scores, axis=0) + picked_box_probs = [] + picked_labels = [] + for class_index in range(0, confidences.shape[1]): + probs = confidences[:, class_index] + mask = probs > self.score_threshold + probs = probs[mask] + if probs.shape[0] == 0: + continue + subset_boxes = bboxes[mask, :] + box_probs = np.concatenate( + [subset_boxes, probs.reshape(-1, 1)], axis=1) + box_probs = hard_nms( + box_probs, + iou_threshold=self.nms_threshold, + top_k=self.keep_top_k, ) + picked_box_probs.append(box_probs) + picked_labels.extend([class_index] * box_probs.shape[0]) + + if len(picked_box_probs) == 0: + out_boxes_list.append(np.empty((0, 4))) + + else: + picked_box_probs = np.concatenate(picked_box_probs) + + # resize output boxes + picked_box_probs[:, 0] *= self.scale_x + picked_box_probs[:, 2] *= self.scale_x + picked_box_probs[:, 1] *= self.scale_y + picked_box_probs[:, 3] *= self.scale_y + + # clas score box + out_boxes_list.append( + np.concatenate( + [ + np.expand_dims( + np.array(picked_labels), + axis=-1), np.expand_dims( + picked_box_probs[:, 4], axis=-1), + picked_box_probs[:, :4] + ], + axis=1)) + + out_boxes_list = np.concatenate(out_boxes_list, axis=0) + return out_boxes_list + + +def detect(img_file, compiled_model, class_label): + output = compiled_model.infer_new_request({0: test_image}) + result_ie = list(output.values()) + + decode_boxes = [] + select_scores = [] + num_outs = int(len(result_ie) / 2) + for out_idx in range(num_outs): + decode_boxes.append(result_ie[out_idx]) + select_scores.append(result_ie[out_idx + num_outs]) + + image = cv2.imread(img_file, 1) + scale_x = image.shape[1] / test_image.shape[3] + scale_y = image.shape[0] / test_image.shape[2] + + nms = PicoDetNMS(test_image.shape[2:], scale_x, scale_y) + np_boxes = nms(decode_boxes, select_scores) + + res_image = draw_box(image, np_boxes, class_label) + + cv2.imwrite('res.jpg', res_image) + cv2.imshow("res", res_image) + cv2.waitKey() + + +if __name__ == '__main__': + + parser = argparse.ArgumentParser() + parser.add_argument( + '--img_path', + type=str, + default='../../demo_onnxruntime/imgs/bus.jpg', + help="image path") + parser.add_argument( + '--onnx_path', + type=str, + default='out_onnxsim_infer/picodet_s_320_postproccesed_woNMS.onnx', + help="onnx filepath") + parser.add_argument('--in_shape', type=int, default=320, help="input_size") + parser.add_argument( + '--class_label', + type=str, + default='coco_label.txt', + help="class label file") + args = parser.parse_args() + + ie = Core() + net = ie.read_model(args.onnx_path) + test_image = image_preprocess(args.img_path, 
args.in_shape)
+    compiled_model = ie.compile_model(net, 'CPU')
+
+    detect(args.img_path, compiled_model, args.class_label)
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/CMakeLists.txt b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4692f1cca12dcf544dcaa375b740e356135bac4a
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/CMakeLists.txt
@@ -0,0 +1,23 @@
+cmake_minimum_required(VERSION 3.4.1)
+set(CMAKE_CXX_STANDARD 14)
+
+project(tinypose_demo)
+
+find_package(OpenCV REQUIRED)
+find_package(InferenceEngine REQUIRED)
+find_package(ngraph REQUIRED)
+
+include_directories(
+    ${OpenCV_INCLUDE_DIRS}
+    ${CMAKE_CURRENT_SOURCE_DIR}
+    ${CMAKE_CURRENT_BINARY_DIR}
+)
+
+add_executable(tinypose_demo main.cpp picodet_openvino.cpp keypoint_detector.cpp keypoint_postprocess.cpp)
+
+target_link_libraries(
+    tinypose_demo
+    ${InferenceEngine_LIBRARIES}
+    ${NGRAPH_LIBRARIES}
+    ${OpenCV_LIBS}
+)
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/README.md b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d7d7ce0de80eabcfeffd580d920a25e1341f575b
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/README.md
@@ -0,0 +1,227 @@
+# TinyPose OpenVINO Demo
+
+This folder provides TinyPose inference code using
+[Intel's OpenVINO Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html). Most of the implementation here is the same as in *demo_ncnn*.
+
+**Recommended**
+1. Install from the xxx.tar.gz archive instead of building from GitHub: [link](https://registrationcenter-download.intel.com/akdlm/irc_nas/18096/l_openvino_toolkit_p_2021.4.689.tgz).
+2. You can also deploy OpenVINO with Docker:
+```
+docker pull openvino/ubuntu18_dev:2021.4.1
+```
+
+## Install OpenVINO Toolkit
+
+Go to the [OpenVINO HomePage](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html).
+
+Download a suitable version and install it.
+
+Follow the official Get Started guides: https://docs.openvinotoolkit.org/latest/get_started_guides.html
+
+## Set the Environment Variables
+
+### Windows
+
+Run this command in cmd (every time before using OpenVINO):
+```cmd
+<INSTALL_DIR>\openvino_2021\bin\setupvars.bat
+```
+
+Or set the system environment variables once and for all:
+
+Name                  |Value
+:--------------------:|:--------:
+INTEL_OPENVINO_DIR | <INSTALL_DIR>\openvino_2021
+INTEL_CVSDK_DIR | %INTEL_OPENVINO_DIR%
+InferenceEngine_DIR | %INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\share
+HDDL_INSTALL_DIR | %INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\external\hddl
+ngraph_DIR | %INTEL_OPENVINO_DIR%\deployment_tools\ngraph\cmake
+
+And add this to ```Path```
+```
+%INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\bin\intel64\Debug;%INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\bin\intel64\Release;%HDDL_INSTALL_DIR%\bin;%INTEL_OPENVINO_DIR%\deployment_tools\inference_engine\external\tbb\bin;%INTEL_OPENVINO_DIR%\deployment_tools\ngraph\lib
+```
+
+### Linux
+
+Run this command in the shell (every time before using OpenVINO):
+
+```shell
+source /opt/intel/openvino_2021/bin/setupvars.sh
+```
+
+Or edit .bashrc:
+
+```shell
+vi ~/.bashrc
+```
+
+Add this line to the end of the file:
+
+```shell
+source /opt/intel/openvino_2021/bin/setupvars.sh
+```
+
+## Convert model
+
+**1. Convert to ONNX**
+
+Create `picodet_m_416_coco.onnx` and `tinypose256.onnx`. For example:
+
+```shell
+modelName=picodet_m_416_coco
+# export model
+python tools/export_model.py \
+       -c configs/picodet/${modelName}.yml \
+       -o weights=${modelName}.pdparams \
+       --output_dir=inference_model
+# convert to onnx
+paddle2onnx --model_dir inference_model/${modelName} \
+       --model_filename model.pdmodel \
+       --params_filename model.pdiparams \
+       --opset_version 11 \
+       --save_file ${modelName}.onnx
+# onnxsim
+python -m onnxsim ${modelName}.onnx ${modelName}_sim.onnx
+```
+
+**2. Convert to OpenVINO**
+
+```shell
+cd <INSTALL_DIR>/openvino_2021/deployment_tools/model_optimizer
+```
+
+Install the requirements for the convert tool:
+
+```shell
+cd ./install_prerequisites
+sudo ./install_prerequisites_onnx.sh
+```
+
+Then convert the model. Note: `mean_values` and `scale_values` must match the preprocessing settings in your training YAML config file.
+
+```shell
+mo_onnx.py --input_model <your_model>.onnx --mean_values [103.53,116.28,123.675] --scale_values [57.375,57.12,58.395] --input_shape [1,3,256,192]
+```
+
+**Note: newer versions of the OpenVINO conversion tools may raise an error on the Resize op. If you run into this, try version openvino_2021.4.689.**
+
+## Build
+
+### Windows
+
+```cmd
+<INSTALL_DIR>\openvino_2021\bin\setupvars.bat
+mkdir -p build
+cd build
+cmake ..
+msbuild tinypose_demo.vcxproj /p:configuration=release /p:platform=x64
+```
+
+### Linux
+
+```shell
+source /opt/intel/openvino_2021/bin/setupvars.sh
+mkdir build
+cd build
+cmake ..
+make
+```
+
+
+## Run demo
+
+Download the PicoDet OpenVINO model: [PicoDet openvino model download link](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_openvino.zip).
+
+Download the TinyPose OpenVINO model: [TinyPose openvino model download link](https://bj.bcebos.com/v1/paddledet/deploy/third_engine/demo_openvino_kpts.tar.gz); the original PaddlePaddle model is [Tinypose256](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192.pdparams).
+
+Move the PicoDet and TinyPose OpenVINO model files into the demo's `weight` folder.
+
+Note:
+1. The model output node names may change with newer versions of paddle/paddle2onnx/onnxsim/openvino. Check your own model's output nodes when the code cannot find "conv2d_441.tmp_1" or "argmax_0.tmp_0" (see the snippet after this list).
+2. If you run into the error "Cannot find blob with name: transpose_1.tmp_0", your PicoDet model is an old version; modify the code below to fix it.
+
+```
+#picodet_openvino.h line 50-54
+
+  std::vector<HeadInfo> heads_info_{
+      // cls_pred|dis_pred|stride
+      {"transpose_0.tmp_0", "transpose_1.tmp_0", 8},
+      {"transpose_2.tmp_0", "transpose_3.tmp_0", 16},
+      {"transpose_4.tmp_0", "transpose_5.tmp_0", 32},
+      {"transpose_6.tmp_0", "transpose_7.tmp_0", 64},
+  };
+
+modify to:
+
+  std::vector<HeadInfo> heads_info_{
+      // cls_pred|dis_pred|stride
+      {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8},
+      {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16},
+      {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32},
+      {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64},
+  };
+```
+
+3. You can view your ONNX model with [Netron](https://netron.app/).
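+
+As a quick way to find those names, here is a minimal sketch (assuming an OpenVINO build recent enough to ship the `openvino.runtime` Python API, the same import the Python demos in this repo use; the model path is a placeholder) that prints every output tensor name of a converted model:
+
+```python
+from openvino.runtime import Core
+
+core = Core()
+# works for the .onnx file as well as the converted IR .xml file
+model = core.read_model("tinypose256.onnx")
+for output in model.outputs:
+    # an output may carry several alias names; print them all
+    print(output.get_names())
+```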
+
+### Edit file
+
+```
+step1:
+main.cpp
+#define image_size 416
+...
+  cv::Mat image(256, 192, CV_8UC3, cv::Scalar(1, 1, 1));
+  std::vector<float> center = {128, 96};
+  std::vector<float> scale = {256, 192};
+...
+  auto detector = PicoDet("../weight/picodet_m_416.xml");
+  auto kpts_detector = new KeyPointDetector("../weight/tinypose256.xml", 256, 192);
+...
+step2:
+picodet_openvino.h
+#define image_size 416
+```
+
+### Run
+
+Run command:
+
+```shell
+./tinypose_demo [mode] [image_file]
+```
+
+| param | detail |
+| ---- | ---- |
+| mode | input mode: 0 = camera, 1 = image, 2 = video, 3 = benchmark |
+| image_file | input image path |
+
+#### Webcam
+
+```shell
+tinypose_demo 0 0
+```
+
+#### Inference images
+
+```shell
+tinypose_demo 1 IMAGE_FOLDER/*.jpg
+```
+
+#### Inference video
+
+```shell
+tinypose_demo 2 VIDEO_PATH
+```
+
+### Benchmark
+
+```shell
+tinypose_demo 3 0
+```
+
+Platform: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz x 24 (cores)
+Model: [Tinypose256_Openvino](https://paddledet.bj.bcebos.com/deploy/third_engine/tinypose_256_openvino.zip)
+
+| param | Min | Max | Avg |
+| ------------- | ----- | ----- | ----- |
+| infer time(s) | 0.018 | 0.062 | 0.028 |
+
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_detector.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_detector.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..4200dd93b3375fa1d9c511aaacd4ebf4e0903189
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_detector.cpp
@@ -0,0 +1,207 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
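+
+// keypoint_detector.cpp: TinyPose keypoint inference on the OpenVINO
+// Inference Engine API. Predict() copies the cropped person images into the
+// input blob, runs the network, reads the heatmap output
+// ("conv2d_441.tmp_1") and the argmax output ("argmax_0.tmp_0"), and
+// decodes them into keypoints in Postprocess().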
+#include +// for setprecision +#include +#include +#include "keypoint_detector.h" + +namespace PaddleDetection { + +// Visualiztion MaskDetector results +cv::Mat VisualizeKptsResult(const cv::Mat& img, + const std::vector& results, + const std::vector& colormap, + float threshold) { + const int edge[][2] = {{0, 1}, + {0, 2}, + {1, 3}, + {2, 4}, + {3, 5}, + {4, 6}, + {5, 7}, + {6, 8}, + {7, 9}, + {8, 10}, + {5, 11}, + {6, 12}, + {11, 13}, + {12, 14}, + {13, 15}, + {14, 16}, + {11, 12}}; + cv::Mat vis_img = img.clone(); + for (int batchid = 0; batchid < results.size(); batchid++) { + for (int i = 0; i < results[batchid].num_joints; i++) { + if (results[batchid].keypoints[i * 3] > threshold) { + int x_coord = int(results[batchid].keypoints[i * 3 + 1]); + int y_coord = int(results[batchid].keypoints[i * 3 + 2]); + cv::circle(vis_img, + cv::Point2d(x_coord, y_coord), + 1, + cv::Scalar(0, 0, 255), + 2); + } + } + for (int i = 0; i < results[batchid].num_joints; i++) { + if (results[batchid].keypoints[edge[i][0] * 3] > threshold && + results[batchid].keypoints[edge[i][1] * 3] > threshold) { + int x_start = int(results[batchid].keypoints[edge[i][0] * 3 + 1]); + int y_start = int(results[batchid].keypoints[edge[i][0] * 3 + 2]); + int x_end = int(results[batchid].keypoints[edge[i][1] * 3 + 1]); + int y_end = int(results[batchid].keypoints[edge[i][1] * 3 + 2]); + cv::line(vis_img, + cv::Point2d(x_start, y_start), + cv::Point2d(x_end, y_end), + colormap[i], + 1); + } + } + } + return vis_img; +} + +void KeyPointDetector::Postprocess(std::vector& output, + std::vector& output_shape, + std::vector& idxout, + std::vector& idx_shape, + std::vector* result, + std::vector>& center_bs, + std::vector>& scale_bs) { + std::vector preds(output_shape[1] * 3, 0); + for (int batchid = 0; batchid < output_shape[0]; batchid++) { + get_final_preds(output, + output_shape, + idxout, + idx_shape, + center_bs[batchid], + scale_bs[batchid], + preds, + batchid, + this->use_dark()); + KeyPointResult result_item; + result_item.num_joints = output_shape[1]; + result_item.keypoints.clear(); + for (int i = 0; i < output_shape[1]; i++) { + result_item.keypoints.emplace_back(preds[i * 3]); + result_item.keypoints.emplace_back(preds[i * 3 + 1]); + result_item.keypoints.emplace_back(preds[i * 3 + 2]); + } + result->push_back(result_item); + } +} + +void KeyPointDetector::Predict(const std::vector imgs, + std::vector>& center_bs, + std::vector>& scale_bs, + std::vector* result) { + int batch_size = imgs.size(); + auto insize = 3 * in_h * in_w; + + InferenceEngine::Blob::Ptr input_blob = infer_request_.GetBlob(input_name_); + // Preprocess image + InferenceEngine::MemoryBlob::Ptr mblob = + InferenceEngine::as(input_blob); + if (!mblob) { + THROW_IE_EXCEPTION + << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, " + << "but by fact we were not able to cast inputBlob to MemoryBlob"; + } + auto mblobHolder = mblob->wmap(); + float* blob_data = mblobHolder.as(); + + cv::Mat resized_im; + for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) { + cv::Mat im = imgs.at(bs_idx); + + cv::resize(im, resized_im, cv::Size(in_w, in_h)); + for (size_t c = 0; c < 3; c++) { + for (size_t h = 0; h < in_h; h++) { + for (size_t w = 0; w < in_w; w++) { + blob_data[c * in_w * in_h + h * in_w + w] = + (float)resized_im.at(h, w)[c]; + } + } + } + } + // Run predictor + auto inference_start = std::chrono::steady_clock::now(); + // do inference + infer_request_.Infer(); + + InferenceEngine::Blob::Ptr output_blob = + 
infer_request_.GetBlob("conv2d_441.tmp_1"); + auto output_shape = output_blob->getTensorDesc().getDims(); + InferenceEngine::MemoryBlob::Ptr moutput = + InferenceEngine::as(output_blob); + + if (moutput) { + // locked memory holder should be alive all time while access to its + // buffer happens + auto minputHolder = moutput->rmap(); + + auto data = minputHolder.as::value_type*>(); + + // Calculate output length + int output_size = 1; + for (int j = 0; j < output_shape.size(); ++j) { + output_size *= output_shape[j]; + } + + output_data_.resize(output_size); + std::copy_n(data, output_size, output_data_.data()); + + } + + + InferenceEngine::Blob::Ptr output_blob2 = + infer_request_.GetBlob("argmax_0.tmp_0"); + auto idx_shape = output_blob2->getTensorDesc().getDims(); + InferenceEngine::MemoryBlob::Ptr moutput2 = + InferenceEngine::as(output_blob2); + + if (moutput2) { + // locked memory holder should be alive all time while access to its + // buffer happens + auto minputHolder = moutput2->rmap(); + // Original I64 precision was converted to I32 + auto data = minputHolder.as::value_type*>(); + + // Calculate output length + int output_size = 1; + for (int j = 0; j < idx_shape.size(); ++j) { + output_size *= idx_shape[j]; + } + + idx_data_.resize(output_size); + std::copy_n(data, output_size, idx_data_.data()); + } + + auto inference_end = std::chrono::steady_clock::now(); + std::chrono::duration elapsed = inference_end - inference_start; + printf("keypoint inference time: %f s\n", elapsed.count()); + + // Postprocessing result + Postprocess(output_data_, + output_shape, + idx_data_, + idx_shape, + result, + center_bs, + scale_bs); +} + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_detector.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_detector.h new file mode 100644 index 0000000000000000000000000000000000000000..e72e63dcc30bfacff21181b383ecbc23a580438d --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_detector.h @@ -0,0 +1,118 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
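+
+// keypoint_detector.h: declares KeyPointResult, VisualizeKptsResult() and
+// the KeyPointDetector class, which loads a TinyPose model (default input
+// 256x192) through the OpenVINO Inference Engine and exposes Predict().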
+ +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include + +#include + +#include "keypoint_postprocess.h" + +namespace PaddleDetection { +// Object KeyPoint Result +struct KeyPointResult { + // Keypoints: shape(N x 3); N: number of Joints; 3: x,y,conf + std::vector keypoints; + int num_joints = -1; +}; + +// Visualiztion KeyPoint Result +cv::Mat VisualizeKptsResult(const cv::Mat& img, + const std::vector& results, + const std::vector& colormap, + float threshold = 0.2); + +class KeyPointDetector { + public: + explicit KeyPointDetector(const std::string& model_path, + int input_height = 256, + int input_width = 192, + float score_threshold = 0.3, + const int batch_size = 1, + bool use_dark = true) { + use_dark_ = use_dark; + + in_w = input_width; + in_h = input_height; + threshold_ = score_threshold; + + InferenceEngine::Core ie; + auto model = ie.ReadNetwork(model_path); + // prepare input settings + InferenceEngine::InputsDataMap inputs_map(model.getInputsInfo()); + input_name_ = inputs_map.begin()->first; + InferenceEngine::InputInfo::Ptr input_info = inputs_map.begin()->second; + // prepare output settings + InferenceEngine::OutputsDataMap outputs_map(model.getOutputsInfo()); + int idx = 0; + for (auto& output_info : outputs_map) { + if (idx == 0) { + output_info.second->setPrecision(InferenceEngine::Precision::FP32); + } else { + output_info.second->setPrecision(InferenceEngine::Precision::FP32); + } + idx++; + } + + // get network + network_ = ie.LoadNetwork(model, "CPU"); + infer_request_ = network_.CreateInferRequest(); + } + + // Load Paddle inference model + void LoadModel(std::string model_file, int num_theads); + + // Run predictor + void Predict(const std::vector imgs, + std::vector>& center, + std::vector>& scale, + std::vector* result = nullptr); + + bool use_dark() { return this->use_dark_; } + + inline float get_threshold() { return threshold_; }; + + int in_w = 128; + int in_h = 256; + + private: + // Postprocess result + void Postprocess(std::vector& output, + std::vector& output_shape, + std::vector& idxout, + std::vector& idx_shape, + std::vector* result, + std::vector>& center, + std::vector>& scale); + + std::vector output_data_; + std::vector idx_data_; + float threshold_; + bool use_dark_; + + InferenceEngine::ExecutableNetwork network_; + InferenceEngine::InferRequest infer_request_; + std::string input_name_; +}; + +} // namespace PaddleDetection diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_postprocess.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_postprocess.cpp new file mode 100644 index 0000000000000000000000000000000000000000..65430ab1f07c0690aad8a26d5d3abda52badd9c4 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_postprocess.cpp @@ -0,0 +1,273 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
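+
+// keypoint_postprocess.cpp: decodes TinyPose heatmaps into image-space
+// keypoints. transform_preds() maps heatmap coordinates back to the
+// original image, and dark_parse() applies the DARK sub-pixel refinement
+// from "Distribution-Aware Coordinate Representation for Human Pose
+// Estimation" (Zhang et al., CVPR 2020).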
+ +#include "keypoint_postprocess.h" +#define PI 3.1415926535 +#define HALF_CIRCLE_DEGREE 180 + +cv::Point2f get_3rd_point(cv::Point2f& a, cv::Point2f& b) { + cv::Point2f direct{a.x - b.x, a.y - b.y}; + return cv::Point2f(a.x - direct.y, a.y + direct.x); +} + +std::vector get_dir(float src_point_x, + float src_point_y, + float rot_rad) { + float sn = sin(rot_rad); + float cs = cos(rot_rad); + std::vector src_result{0.0, 0.0}; + src_result[0] = src_point_x * cs - src_point_y * sn; + src_result[1] = src_point_x * sn + src_point_y * cs; + return src_result; +} + +void affine_tranform( + float pt_x, float pt_y, cv::Mat& trans, std::vector& preds, int p) { + double new1[3] = {pt_x, pt_y, 1.0}; + cv::Mat new_pt(3, 1, trans.type(), new1); + cv::Mat w = trans * new_pt; + preds[p * 3 + 1] = static_cast(w.at(0, 0)); + preds[p * 3 + 2] = static_cast(w.at(1, 0)); +} + +void get_affine_transform(std::vector& center, + std::vector& scale, + float rot, + std::vector& output_size, + cv::Mat& trans, + int inv) { + float src_w = scale[0]; + float dst_w = static_cast(output_size[0]); + float dst_h = static_cast(output_size[1]); + float rot_rad = rot * PI / HALF_CIRCLE_DEGREE; + std::vector src_dir = get_dir(-0.5 * src_w, 0, rot_rad); + std::vector dst_dir{static_cast(-0.5) * dst_w, 0.0}; + cv::Point2f srcPoint2f[3], dstPoint2f[3]; + srcPoint2f[0] = cv::Point2f(center[0], center[1]); + srcPoint2f[1] = cv::Point2f(center[0] + src_dir[0], center[1] + src_dir[1]); + srcPoint2f[2] = get_3rd_point(srcPoint2f[0], srcPoint2f[1]); + + dstPoint2f[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5); + dstPoint2f[1] = + cv::Point2f(dst_w * 0.5 + dst_dir[0], dst_h * 0.5 + dst_dir[1]); + dstPoint2f[2] = get_3rd_point(dstPoint2f[0], dstPoint2f[1]); + if (inv == 0) { + trans = cv::getAffineTransform(srcPoint2f, dstPoint2f); + } else { + trans = cv::getAffineTransform(dstPoint2f, srcPoint2f); + } +} + +void transform_preds(std::vector& coords, + std::vector& center, + std::vector& scale, + std::vector& output_size, + std::vector& dim, + std::vector& target_coords, + bool affine=false) { + if (affine) { + cv::Mat trans(2, 3, CV_64FC1); + get_affine_transform(center, scale, 0, output_size, trans, 1); + for (int p = 0; p < dim[1]; ++p) { + affine_tranform( + coords[p * 2], coords[p * 2 + 1], trans, target_coords, p); + } + } else { + float heat_w = static_cast(output_size[0]); + float heat_h = static_cast(output_size[1]); + float x_scale = scale[0] / heat_w; + float y_scale = scale[1] / heat_h; + float offset_x = center[0] - scale[0] / 2.; + float offset_y = center[1] - scale[1] / 2.; + for (int i = 0; i < dim[1]; i++) { + target_coords[i * 3 + 1] = x_scale * coords[i * 2] + offset_x; + target_coords[i * 3 + 2] = y_scale * coords[i * 2 + 1] + offset_y; + } + } +} + +// only for batchsize == 1 +void get_max_preds(std::vector& heatmap, + std::vector& dim, + std::vector& preds, + std::vector& maxvals, + int batchid, + int joint_idx) { + int num_joints = dim[1]; + int width = dim[3]; + std::vector idx; + idx.resize(num_joints * 2); + + for (int j = 0; j < dim[1]; j++) { + float* index = &( + heatmap[batchid * num_joints * dim[2] * dim[3] + j * dim[2] * dim[3]]); + float* end = index + dim[2] * dim[3]; + float* max_dis = std::max_element(index, end); + auto max_id = std::distance(index, max_dis); + maxvals[j] = *max_dis; + if (*max_dis > 0) { + preds[j * 2] = static_cast(max_id % width); + preds[j * 2 + 1] = static_cast(max_id / width); + } + } +} + +void dark_parse(std::vector& heatmap, + std::vector& dim, + std::vector& coords, + int px, + 
int py, + int index, + int ch) { + /*DARK postpocessing, Zhang et al. Distribution-Aware Coordinate + Representation for Human Pose Estimation (CVPR 2020). + 1) offset = - hassian.inv() * derivative + 2) dx = (heatmap[x+1] - heatmap[x-1])/2. + 3) dxx = (dx[x+1] - dx[x-1])/2. + 4) derivative = Mat([dx, dy]) + 5) hassian = Mat([[dxx, dxy], [dxy, dyy]]) + */ + std::vector::const_iterator first1 = heatmap.begin() + index; + std::vector::const_iterator last1 = + heatmap.begin() + index + dim[2] * dim[3]; + std::vector heatmap_ch(first1, last1); + cv::Mat heatmap_mat = cv::Mat(heatmap_ch).reshape(0, dim[2]); + heatmap_mat.convertTo(heatmap_mat, CV_32FC1); + cv::GaussianBlur(heatmap_mat, heatmap_mat, cv::Size(3, 3), 0, 0); + heatmap_mat = heatmap_mat.reshape(1, 1); + heatmap_ch = std::vector(heatmap_mat.reshape(1, 1)); + + float epsilon = 1e-10; + // sample heatmap to get values in around target location + float xy = log(fmax(heatmap_ch[py * dim[3] + px], epsilon)); + float xr = log(fmax(heatmap_ch[py * dim[3] + px + 1], epsilon)); + float xl = log(fmax(heatmap_ch[py * dim[3] + px - 1], epsilon)); + + float xr2 = log(fmax(heatmap_ch[py * dim[3] + px + 2], epsilon)); + float xl2 = log(fmax(heatmap_ch[py * dim[3] + px - 2], epsilon)); + float yu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px], epsilon)); + float yd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px], epsilon)); + float yu2 = log(fmax(heatmap_ch[(py + 2) * dim[3] + px], epsilon)); + float yd2 = log(fmax(heatmap_ch[(py - 2) * dim[3] + px], epsilon)); + float xryu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px + 1], epsilon)); + float xryd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px + 1], epsilon)); + float xlyu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px - 1], epsilon)); + float xlyd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px - 1], epsilon)); + + // compute dx/dy and dxx/dyy with sampled values + float dx = 0.5 * (xr - xl); + float dy = 0.5 * (yu - yd); + float dxx = 0.25 * (xr2 - 2 * xy + xl2); + float dxy = 0.25 * (xryu - xryd - xlyu + xlyd); + float dyy = 0.25 * (yu2 - 2 * xy + yd2); + + // finally get offset by derivative and hassian, which combined by dx/dy and + // dxx/dyy + if (dxx * dyy - dxy * dxy != 0) { + float M[2][2] = {dxx, dxy, dxy, dyy}; + float D[2] = {dx, dy}; + cv::Mat hassian(2, 2, CV_32F, M); + cv::Mat derivative(2, 1, CV_32F, D); + cv::Mat offset = -hassian.inv() * derivative; + coords[ch * 2] += offset.at(0, 0); + coords[ch * 2 + 1] += offset.at(1, 0); + } +} + +void get_final_preds(std::vector& heatmap, + std::vector& dim, + std::vector& idxout, + std::vector& idxdim, + std::vector& center, + std::vector scale, + std::vector& preds, + int batchid, + bool DARK) { + std::vector coords; + coords.resize(dim[1] * 2); + int heatmap_height = dim[2]; + int heatmap_width = dim[3]; + + for (int j = 0; j < dim[1]; ++j) { + int index = (batchid * dim[1] + j) * dim[2] * dim[3]; + + int idx = int(idxout[batchid * dim[1] + j]); + preds[j * 3] = heatmap[index + idx]; + coords[j * 2] = idx % heatmap_width; + coords[j * 2 + 1] = idx / heatmap_width; + + int px = int(coords[j * 2] + 0.5); + int py = int(coords[j * 2 + 1] + 0.5); + + if (DARK && px > 1 && px < heatmap_width - 2 && py > 1 && + py < heatmap_height - 2) { + dark_parse(heatmap, dim, coords, px, py, index, j); + } else { + if (px > 0 && px < heatmap_width - 1) { + float diff_x = heatmap[index + py * dim[3] + px + 1] - + heatmap[index + py * dim[3] + px - 1]; + coords[j * 2] += diff_x > 0 ? 
1 : -1 * 0.25; + } + if (py > 0 && py < heatmap_height - 1) { + float diff_y = heatmap[index + (py + 1) * dim[3] + px] - + heatmap[index + (py - 1) * dim[3] + px]; + coords[j * 2 + 1] += diff_y > 0 ? 1 : -1 * 0.25; + } + } + } + + std::vector img_size{heatmap_width, heatmap_height}; + transform_preds(coords, center, scale, img_size, dim, preds); +} + +void CropImg(cv::Mat& img, + cv::Mat& crop_img, + std::vector& area, + std::vector& center, + std::vector& scale, + float expandratio) { + int crop_x1 = std::max(0, area[0]); + int crop_y1 = std::max(0, area[1]); + int crop_x2 = std::min(img.cols - 1, area[2]); + int crop_y2 = std::min(img.rows - 1, area[3]); + + int center_x = (crop_x1 + crop_x2) / 2.; + int center_y = (crop_y1 + crop_y2) / 2.; + int half_h = (crop_y2 - crop_y1) / 2.; + int half_w = (crop_x2 - crop_x1) / 2.; + + if (half_h * 3 > half_w * 4) { + half_w = static_cast(half_h * 0.75); + } else { + half_h = static_cast(half_w * 4 / 3); + } + + crop_x1 = + std::max(0, center_x - static_cast(half_w * (1 + expandratio))); + crop_y1 = + std::max(0, center_y - static_cast(half_h * (1 + expandratio))); + crop_x2 = std::min(img.cols - 1, + static_cast(center_x + half_w * (1 + expandratio))); + crop_y2 = std::min(img.rows - 1, + static_cast(center_y + half_h * (1 + expandratio))); + crop_img = + img(cv::Range(crop_y1, crop_y2 + 1), cv::Range(crop_x1, crop_x2 + 1)); + + center.clear(); + center.emplace_back((crop_x1 + crop_x2) / 2); + center.emplace_back((crop_y1 + crop_y2) / 2); + scale.clear(); + scale.emplace_back((crop_x2 - crop_x1)); + scale.emplace_back((crop_y2 - crop_y1)); +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_postprocess.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_postprocess.h new file mode 100644 index 0000000000000000000000000000000000000000..b9bd743b772d226b4b02c4f411e8492fda220571 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/keypoint_postprocess.h @@ -0,0 +1,68 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
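+
+// keypoint_postprocess.h: declarations for the heatmap decoding helpers
+// (get_max_preds, get_final_preds with its DARK option, transform_preds)
+// and CropImg(), which expands a detected person box to the 3:4 aspect
+// ratio that TinyPose expects before cropping.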
+ +#pragma once + +#include +#include +#include +#include + +std::vector get_3rd_point(std::vector& a, std::vector& b); +std::vector get_dir(float src_point_x, float src_point_y, float rot_rad); +void affine_tranform(float pt_x, + float pt_y, + cv::Mat& trans, + std::vector& x, + int p, + int num); +cv::Mat get_affine_transform(std::vector& center, + std::vector& scale, + float rot, + std::vector& output_size, + int inv); +void transform_preds(std::vector& coords, + std::vector& center, + std::vector& scale, + std::vector& output_size, + std::vector& dim, + std::vector& target_coords, + bool affine); +void box_to_center_scale(std::vector& box, + int width, + int height, + std::vector& center, + std::vector& scale); +void get_max_preds(std::vector& heatmap, + std::vector& dim, + std::vector& preds, + std::vector& maxvals, + int batchid, + int joint_idx); +void get_final_preds(std::vector& heatmap, + std::vector& dim, + std::vector& idxout, + std::vector& idxdim, + std::vector& center, + std::vector scale, + std::vector& preds, + int batchid, + bool DARK = true); + +void CropImg(cv::Mat& img, + cv::Mat& crop_img, + std::vector& area, + std::vector& center, + std::vector& scale, + float expandratio = 0.25); diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/main.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/main.cpp new file mode 100644 index 0000000000000000000000000000000000000000..cc580e41db18fbd1a5f61302f1b633eb65254f8a --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/main.cpp @@ -0,0 +1,415 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
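+
+// main.cpp: demo entry point. Each frame goes through PicoDet person
+// detection, the detected persons are cropped with CropImg(), and the
+// crops are fed to the TinyPose KeyPointDetector; modes: 0 = webcam,
+// 1 = images, 2 = video, 3 = benchmark.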
+// reference from https://github.com/RangiLyu/nanodet + +#include +#include +#include +#include +#define image_size 416 + +#include "keypoint_detector.h" +#include "picodet_openvino.h" + +using namespace PaddleDetection; + +struct object_rect { + int x; + int y; + int width; + int height; +}; + +int resize_uniform(cv::Mat& src, + cv::Mat& dst, + cv::Size dst_size, + object_rect& effect_area) { + int w = src.cols; + int h = src.rows; + int dst_w = dst_size.width; + int dst_h = dst_size.height; + dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0)); + + float ratio_src = w * 1.0 / h; + float ratio_dst = dst_w * 1.0 / dst_h; + + int tmp_w = 0; + int tmp_h = 0; + if (ratio_src > ratio_dst) { + tmp_w = dst_w; + tmp_h = floor((dst_w * 1.0 / w) * h); + } else if (ratio_src < ratio_dst) { + tmp_h = dst_h; + tmp_w = floor((dst_h * 1.0 / h) * w); + } else { + cv::resize(src, dst, dst_size); + effect_area.x = 0; + effect_area.y = 0; + effect_area.width = dst_w; + effect_area.height = dst_h; + return 0; + } + cv::Mat tmp; + cv::resize(src, tmp, cv::Size(tmp_w, tmp_h)); + + if (tmp_w != dst_w) { + int index_w = floor((dst_w - tmp_w) / 2.0); + for (int i = 0; i < dst_h; i++) { + memcpy(dst.data + i * dst_w * 3 + index_w * 3, + tmp.data + i * tmp_w * 3, + tmp_w * 3); + } + effect_area.x = index_w; + effect_area.y = 0; + effect_area.width = tmp_w; + effect_area.height = tmp_h; + } else if (tmp_h != dst_h) { + int index_h = floor((dst_h - tmp_h) / 2.0); + memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3); + effect_area.x = 0; + effect_area.y = index_h; + effect_area.width = tmp_w; + effect_area.height = tmp_h; + } else { + printf("error\n"); + } + return 0; +} + +const int color_list[80][3] = { + {216, 82, 24}, {236, 176, 31}, {125, 46, 141}, {118, 171, 47}, + {76, 189, 237}, {238, 19, 46}, {76, 76, 76}, {153, 153, 153}, + {255, 0, 0}, {255, 127, 0}, {190, 190, 0}, {0, 255, 0}, + {0, 0, 255}, {170, 0, 255}, {84, 84, 0}, {84, 170, 0}, + {84, 255, 0}, {170, 84, 0}, {170, 170, 0}, {170, 255, 0}, + {255, 84, 0}, {255, 170, 0}, {255, 255, 0}, {0, 84, 127}, + {0, 170, 127}, {0, 255, 127}, {84, 0, 127}, {84, 84, 127}, + {84, 170, 127}, {84, 255, 127}, {170, 0, 127}, {170, 84, 127}, + {170, 170, 127}, {170, 255, 127}, {255, 0, 127}, {255, 84, 127}, + {255, 170, 127}, {255, 255, 127}, {0, 84, 255}, {0, 170, 255}, + {0, 255, 255}, {84, 0, 255}, {84, 84, 255}, {84, 170, 255}, + {84, 255, 255}, {170, 0, 255}, {170, 84, 255}, {170, 170, 255}, + {170, 255, 255}, {255, 0, 255}, {255, 84, 255}, {255, 170, 255}, + {42, 0, 0}, {84, 0, 0}, {127, 0, 0}, {170, 0, 0}, + {212, 0, 0}, {255, 0, 0}, {0, 42, 0}, {0, 84, 0}, + {0, 127, 0}, {0, 170, 0}, {0, 212, 0}, {0, 255, 0}, + {0, 0, 42}, {0, 0, 84}, {0, 0, 127}, {0, 0, 170}, + {0, 0, 212}, {0, 0, 255}, {0, 0, 0}, {36, 36, 36}, + {72, 72, 72}, {109, 109, 109}, {145, 145, 145}, {182, 182, 182}, + {218, 218, 218}, {0, 113, 188}, {80, 182, 188}, {127, 127, 0}, +}; + +void draw_bboxes(const cv::Mat& bgr, + const std::vector& bboxes, + object_rect effect_roi) { + static const char* class_names[] = { + "person", "bicycle", "car", + "motorcycle", "airplane", "bus", + "train", "truck", "boat", + "traffic light", "fire hydrant", "stop sign", + "parking meter", "bench", "bird", + "cat", "dog", "horse", + "sheep", "cow", "elephant", + "bear", "zebra", "giraffe", + "backpack", "umbrella", "handbag", + "tie", "suitcase", "frisbee", + "skis", "snowboard", "sports ball", + "kite", "baseball bat", "baseball glove", + "skateboard", "surfboard", "tennis racket", + 
"bottle", "wine glass", "cup", + "fork", "knife", "spoon", + "bowl", "banana", "apple", + "sandwich", "orange", "broccoli", + "carrot", "hot dog", "pizza", + "donut", "cake", "chair", + "couch", "potted plant", "bed", + "dining table", "toilet", "tv", + "laptop", "mouse", "remote", + "keyboard", "cell phone", "microwave", + "oven", "toaster", "sink", + "refrigerator", "book", "clock", + "vase", "scissors", "teddy bear", + "hair drier", "toothbrush"}; + + cv::Mat image = bgr.clone(); + int src_w = image.cols; + int src_h = image.rows; + int dst_w = effect_roi.width; + int dst_h = effect_roi.height; + float width_ratio = (float)src_w / (float)dst_w; + float height_ratio = (float)src_h / (float)dst_h; + + for (size_t i = 0; i < bboxes.size(); i++) { + const BoxInfo& bbox = bboxes[i]; + cv::Scalar color = cv::Scalar(color_list[bbox.label][0], + color_list[bbox.label][1], + color_list[bbox.label][2]); + cv::rectangle(image, + cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio, + (bbox.y1 - effect_roi.y) * height_ratio), + cv::Point((bbox.x2 - effect_roi.x) * width_ratio, + (bbox.y2 - effect_roi.y) * height_ratio)), + color); + + char text[256]; + sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); + int baseLine = 0; + cv::Size label_size = + cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); + int x = (bbox.x1 - effect_roi.x) * width_ratio; + int y = + (bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine; + if (y < 0) y = 0; + if (x + label_size.width > image.cols) x = image.cols - label_size.width; + + cv::rectangle( + image, + cv::Rect(cv::Point(x, y), + cv::Size(label_size.width, label_size.height + baseLine)), + color, + -1); + + cv::putText(image, + text, + cv::Point(x, y + label_size.height), + cv::FONT_HERSHEY_SIMPLEX, + 0.4, + cv::Scalar(255, 255, 255)); + } + + cv::imwrite("../predict.jpg", image); +} + +std::vector coordsback(const cv::Mat image, + const object_rect effect_roi, + const std::vector& bboxes) { + int src_w = image.cols; + int src_h = image.rows; + int dst_w = effect_roi.width; + int dst_h = effect_roi.height; + float width_ratio = (float)src_w / (float)dst_w; + float height_ratio = (float)src_h / (float)dst_h; + + std::vector bboxes_oimg; + + for (int i = 0; i < bboxes.size(); i++) { + auto bbox = bboxes[i]; + bbox.x1 = (bbox.x1 - effect_roi.x) * width_ratio; + bbox.y1 = (bbox.y1 - effect_roi.y) * height_ratio; + bbox.x2 = (bbox.x2 - effect_roi.x) * width_ratio; + bbox.y2 = (bbox.y2 - effect_roi.y) * height_ratio; + bboxes_oimg.emplace_back(bbox); + } + return bboxes_oimg; +} + +void image_infer_kpts(KeyPointDetector* kpts_detector, + cv::Mat image, + const object_rect effect_roi, + const std::vector& results, + std::string img_name = "kpts_vis", + bool save_img = true) { + std::vector cropimgs; + std::vector> center_bs; + std::vector> scale_bs; + std::vector kpts_results; + auto results_oimg = coordsback(image, effect_roi, results); + + for (int i = 0; i < results_oimg.size(); i++) { + auto rect = results_oimg[i]; + if (rect.label == 0) { + cv::Mat cropimg; + std::vector center, scale; + std::vector area = {static_cast(rect.x1), + static_cast(rect.y1), + static_cast(rect.x2), + static_cast(rect.y2)}; + CropImg(image, cropimg, area, center, scale); + cropimgs.emplace_back(cropimg); + center_bs.emplace_back(center); + scale_bs.emplace_back(scale); + } + if (cropimgs.size() == 1 || + (cropimgs.size() > 0 && i == results_oimg.size() - 1)) { + kpts_detector->Predict(cropimgs, center_bs, scale_bs, &kpts_results); + 
cropimgs.clear(); + center_bs.clear(); + scale_bs.clear(); + } + } + std::vector compression_params; + compression_params.push_back(cv::IMWRITE_JPEG_QUALITY); + compression_params.push_back(95); + std::string kpts_savepath = + "keypoint_" + img_name.substr(img_name.find_last_of('/') + 1); + cv::Mat kpts_vis_img = + VisualizeKptsResult(image, kpts_results, {0, 255, 0}, 0.1); + if (save_img) { + cv::imwrite(kpts_savepath, kpts_vis_img, compression_params); + printf("Visualized output saved as %s\n", kpts_savepath.c_str()); + } else { + cv::imshow("image", kpts_vis_img); + } +} + +int image_demo(PicoDet& detector, + KeyPointDetector* kpts_detector, + const char* imagepath) { + std::vector filenames; + cv::glob(imagepath, filenames, false); + + for (auto img_name : filenames) { + cv::Mat image = cv::imread(img_name); + if (image.empty()) { + return -1; + } + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform( + image, resized_img, cv::Size(image_size, image_size), effect_roi); + auto results = detector.detect(resized_img, 0.4, 0.5); + if (kpts_detector) { + image_infer_kpts(kpts_detector, image, effect_roi, results, img_name); + } + } + return 0; +} + +int webcam_demo(PicoDet& detector, + KeyPointDetector* kpts_detector, + int cam_id) { + cv::Mat image; + cv::VideoCapture cap(cam_id); + + while (true) { + cap >> image; + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform( + image, resized_img, cv::Size(image_size, image_size), effect_roi); + auto results = detector.detect(resized_img, 0.4, 0.5); + if (kpts_detector) { + image_infer_kpts(kpts_detector, image, effect_roi, results, "", false); + } + } + return 0; +} + +int video_demo(PicoDet& detector, + KeyPointDetector* kpts_detector, + const char* path) { + cv::Mat image; + cv::VideoCapture cap(path); + + while (true) { + cap >> image; + object_rect effect_roi; + cv::Mat resized_img; + resize_uniform( + image, resized_img, cv::Size(image_size, image_size), effect_roi); + auto results = detector.detect(resized_img, 0.4, 0.5); + if (kpts_detector) { + image_infer_kpts(kpts_detector, image, effect_roi, results, "", false); + } + } + return 0; +} + +int benchmark(KeyPointDetector* kpts_detector) { + int loop_num = 100; + int warm_up = 8; + + double time_min = DBL_MAX; + double time_max = -DBL_MAX; + double time_avg = 0; + cv::Mat image(256, 192, CV_8UC3, cv::Scalar(1, 1, 1)); + std::vector center = {128, 96}; + std::vector scale = {256, 192}; + std::vector cropimgs = {image}; + std::vector> center_bs = {center}; + std::vector> scale_bs = {scale}; + std::vector kpts_results; + + for (int i = 0; i < warm_up + loop_num; i++) { + auto start = std::chrono::steady_clock::now(); + std::vector results; + kpts_detector->Predict(cropimgs, center_bs, scale_bs, &kpts_results); + auto end = std::chrono::steady_clock::now(); + + std::chrono::duration elapsed = end - start; + double time = elapsed.count(); + if (i >= warm_up) { + time_min = (std::min)(time_min, time); + time_max = (std::max)(time_max, time); + time_avg += time; + } + } + time_avg /= loop_num; + fprintf(stderr, + "%20s min = %7.4f max = %7.4f avg = %7.4f\n", + "tinypose", + time_min, + time_max, + time_avg); + return 0; +} + +int main(int argc, char** argv) { + if (argc != 3) { + fprintf(stderr, + "usage: %s [mode] [path]. 
\n For webcam mode=0, path is cam id; \n " + "For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; " + "\n For benchmark, mode=3 path=0.\n", + argv[0]); + return -1; + } + std::cout << "start init model" << std::endl; + auto detector = PicoDet("./weight/picodet_m_416.xml"); + auto kpts_detector = + new KeyPointDetector("./weight/tinypose256_git2-sim.xml", 256, 192); + std::cout << "success" << std::endl; + + int mode = atoi(argv[1]); + switch (mode) { + case 0: { + int cam_id = atoi(argv[2]); + webcam_demo(detector, kpts_detector, cam_id); + break; + } + case 1: { + const char* images = argv[2]; + image_demo(detector, kpts_detector, images); + break; + } + case 2: { + const char* path = argv[2]; + video_demo(detector, kpts_detector, path); + break; + } + case 3: { + benchmark(kpts_detector); + break; + } + default: { + fprintf(stderr, + "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; " + "\n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, " + "mode=2; \n For benchmark, mode=3 path=0.\n", + argv[0]); + break; + } + } + delete kpts_detector; + kpts_detector = nullptr; +} diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/picodet_openvino.cpp b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/picodet_openvino.cpp new file mode 100644 index 0000000000000000000000000000000000000000..14ddab3baf1bf059e30d82c415d0c9e5da0034fc --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/picodet_openvino.cpp @@ -0,0 +1,213 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
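+
+// picodet_openvino.cpp: PicoDet object detection on the OpenVINO Inference
+// Engine API. decode_infer() scans the per-head class scores,
+// disPred2Bbox() converts the DFL box distributions into distances via a
+// softmax expectation, and nms() prunes overlapping boxes.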
+// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_openvino + +#include "picodet_openvino.h" + +inline float fast_exp(float x) { + union { + uint32_t i; + float f; + } v{}; + v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f); + return v.f; +} + +inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); } + +template +int activation_function_softmax(const _Tp* src, _Tp* dst, int length) { + const _Tp alpha = *std::max_element(src, src + length); + _Tp denominator{0}; + + for (int i = 0; i < length; ++i) { + dst[i] = fast_exp(src[i] - alpha); + denominator += dst[i]; + } + + for (int i = 0; i < length; ++i) { + dst[i] /= denominator; + } + + return 0; +} + +PicoDet::PicoDet(const char* model_path) { + InferenceEngine::Core ie; + InferenceEngine::CNNNetwork model = ie.ReadNetwork(model_path); + // prepare input settings + InferenceEngine::InputsDataMap inputs_map(model.getInputsInfo()); + input_name_ = inputs_map.begin()->first; + InferenceEngine::InputInfo::Ptr input_info = inputs_map.begin()->second; + // prepare output settings + InferenceEngine::OutputsDataMap outputs_map(model.getOutputsInfo()); + for (auto& output_info : outputs_map) { + output_info.second->setPrecision(InferenceEngine::Precision::FP32); + } + + // get network + network_ = ie.LoadNetwork(model, "CPU"); + infer_request_ = network_.CreateInferRequest(); +} + +PicoDet::~PicoDet() {} + +void PicoDet::preprocess(cv::Mat& image, InferenceEngine::Blob::Ptr& blob) { + int img_w = image.cols; + int img_h = image.rows; + int channels = 3; + + InferenceEngine::MemoryBlob::Ptr mblob = + InferenceEngine::as(blob); + if (!mblob) { + THROW_IE_EXCEPTION + << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, " + << "but by fact we were not able to cast inputBlob to MemoryBlob"; + } + auto mblobHolder = mblob->wmap(); + float* blob_data = mblobHolder.as(); + + for (size_t c = 0; c < channels; c++) { + for (size_t h = 0; h < img_h; h++) { + for (size_t w = 0; w < img_w; w++) { + blob_data[c * img_w * img_h + h * img_w + w] = + (float)image.at(h, w)[c]; + } + } + } +} + +std::vector PicoDet::detect(cv::Mat image, + float score_threshold, + float nms_threshold) { + InferenceEngine::Blob::Ptr input_blob = infer_request_.GetBlob(input_name_); + preprocess(image, input_blob); + + // do inference + infer_request_.Infer(); + + // get output + std::vector> results; + results.resize(this->num_class_); + + for (const auto& head_info : this->heads_info_) { + const InferenceEngine::Blob::Ptr dis_pred_blob = + infer_request_.GetBlob(head_info.dis_layer); + const InferenceEngine::Blob::Ptr cls_pred_blob = + infer_request_.GetBlob(head_info.cls_layer); + + auto mdis_pred = + InferenceEngine::as(dis_pred_blob); + auto mdis_pred_holder = mdis_pred->rmap(); + const float* dis_pred = mdis_pred_holder.as(); + + auto mcls_pred = + InferenceEngine::as(cls_pred_blob); + auto mcls_pred_holder = mcls_pred->rmap(); + const float* cls_pred = mcls_pred_holder.as(); + this->decode_infer( + cls_pred, dis_pred, head_info.stride, score_threshold, results); + } + + std::vector dets; + for (int i = 0; i < (int)results.size(); i++) { + this->nms(results[i], nms_threshold); + + for (auto& box : results[i]) { + dets.push_back(box); + } + } + return dets; +} + +void PicoDet::decode_infer(const float*& cls_pred, + const float*& dis_pred, + int stride, + float threshold, + std::vector>& results) { + int feature_h = input_size_ / stride; + int feature_w = input_size_ / stride; + for (int idx = 0; idx < feature_h * feature_w; idx++) 
+    int row = idx / feature_w;
+    int col = idx % feature_w;
+    float score = 0;
+    int cur_label = 0;
+
+    for (int label = 0; label < num_class_; label++) {
+      if (cls_pred[idx * num_class_ + label] > score) {
+        score = cls_pred[idx * num_class_ + label];
+        cur_label = label;
+      }
+    }
+    if (score > threshold) {
+      const float* bbox_pred = dis_pred + idx * (reg_max_ + 1) * 4;
+      results[cur_label].push_back(
+          this->disPred2Bbox(bbox_pred, cur_label, score, col, row, stride));
+    }
+  }
+}
+
+BoxInfo PicoDet::disPred2Bbox(
+    const float*& dfl_det, int label, float score, int x, int y, int stride) {
+  float ct_x = (x + 0.5) * stride;
+  float ct_y = (y + 0.5) * stride;
+  std::vector<float> dis_pred;
+  dis_pred.resize(4);
+  for (int i = 0; i < 4; i++) {
+    float dis = 0;
+    float* dis_after_sm = new float[reg_max_ + 1];
+    activation_function_softmax(
+        dfl_det + i * (reg_max_ + 1), dis_after_sm, reg_max_ + 1);
+    for (int j = 0; j < reg_max_ + 1; j++) {
+      dis += j * dis_after_sm[j];
+    }
+    dis *= stride;
+    dis_pred[i] = dis;
+    delete[] dis_after_sm;
+  }
+  float xmin = (std::max)(ct_x - dis_pred[0], .0f);
+  float ymin = (std::max)(ct_y - dis_pred[1], .0f);
+  float xmax = (std::min)(ct_x + dis_pred[2], (float)this->input_size_);
+  float ymax = (std::min)(ct_y + dis_pred[3], (float)this->input_size_);
+  return BoxInfo{xmin, ymin, xmax, ymax, score, label};
+}
+
+void PicoDet::nms(std::vector<BoxInfo>& input_boxes, float NMS_THRESH) {
+  std::sort(input_boxes.begin(), input_boxes.end(), [](BoxInfo a, BoxInfo b) {
+    return a.score > b.score;
+  });
+  std::vector<float> vArea(input_boxes.size());
+  for (int i = 0; i < int(input_boxes.size()); ++i) {
+    vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) *
+               (input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1);
+  }
+  for (int i = 0; i < int(input_boxes.size()); ++i) {
+    for (int j = i + 1; j < int(input_boxes.size());) {
+      float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1);
+      float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1);
+      float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2);
+      float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2);
+      float w = (std::max)(float(0), xx2 - xx1 + 1);
+      float h = (std::max)(float(0), yy2 - yy1 + 1);
+      float inter = w * h;
+      float ovr = inter / (vArea[i] + vArea[j] - inter);
+      if (ovr >= NMS_THRESH) {
+        input_boxes.erase(input_boxes.begin() + j);
+        vArea.erase(vArea.begin() + j);
+      } else {
+        j++;
+      }
+    }
+  }
+}
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/picodet_openvino.h b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/picodet_openvino.h
new file mode 100644
index 0000000000000000000000000000000000000000..7bd3d79c44a2f6ae62eaba82bcafcae45a84254f
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/demo_openvino_kpts/picodet_openvino.h
@@ -0,0 +1,74 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
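+// A minimal usage sketch for the PicoDet wrapper declared below. The model
+// and image paths are placeholders (not files shipped with this demo), and
+// the input is assumed to be resized to image_size x image_size beforehand:
+//
+//   PicoDet detector("picodet.xml");
+//   cv::Mat img = cv::imread("demo.jpg");
+//   cv::resize(img, img, cv::Size(image_size, image_size));
+//   std::vector<BoxInfo> dets = detector.detect(img, 0.4f, 0.5f);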
+// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_openvino
+
+#ifndef _PICODET_OPENVINO_H_
+#define _PICODET_OPENVINO_H_
+
+#include <inference_engine.hpp>
+#include <opencv2/core.hpp>
+#include <string>
+
+#define image_size 416
+
+typedef struct HeadInfo {
+  std::string cls_layer;
+  std::string dis_layer;
+  int stride;
+} HeadInfo;
+
+typedef struct BoxInfo {
+  float x1;
+  float y1;
+  float x2;
+  float y2;
+  float score;
+  int label;
+} BoxInfo;
+
+class PicoDet {
+public:
+  PicoDet(const char *param);
+
+  ~PicoDet();
+
+  InferenceEngine::ExecutableNetwork network_;
+  InferenceEngine::InferRequest infer_request_;
+
+  std::vector<HeadInfo> heads_info_{
+      // cls_pred|dis_pred|stride
+      {"transpose_0.tmp_0", "transpose_1.tmp_0", 8},
+      {"transpose_2.tmp_0", "transpose_3.tmp_0", 16},
+      {"transpose_4.tmp_0", "transpose_5.tmp_0", 32},
+      {"transpose_6.tmp_0", "transpose_7.tmp_0", 64},
+  };
+
+  std::vector<BoxInfo> detect(cv::Mat image, float score_threshold,
+                              float nms_threshold);
+
+private:
+  void preprocess(cv::Mat &image, InferenceEngine::Blob::Ptr &blob);
+  void decode_infer(const float *&cls_pred, const float *&dis_pred, int stride,
+                    float threshold,
+                    std::vector<std::vector<BoxInfo>> &results);
+  BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x,
+                       int y, int stride);
+  static void nms(std::vector<BoxInfo> &result, float nms_threshold);
+  std::string input_name_;
+  int input_size_ = image_size;
+  int num_class_ = 80;
+  int reg_max_ = 7;
+};
+
+#endif
diff --git a/PaddleDetection-release-2.6/deploy/third_engine/onnx/infer.py b/PaddleDetection-release-2.6/deploy/third_engine/onnx/infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..e916af9d016c72b7964f980a1a623d5d7e8ab8a1
--- /dev/null
+++ b/PaddleDetection-release-2.6/deploy/third_engine/onnx/infer.py
@@ -0,0 +1,148 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
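+# A minimal example invocation (file names below are placeholders for your
+# own exported model, config and test image, not files shipped with this repo):
+#
+#   python infer.py --infer_cfg infer_cfg.yml --onnx_file model.onnx \
+#       --image_file demo.jpg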
+ +import os +import yaml +import argparse +import numpy as np +import glob +from onnxruntime import InferenceSession + +from preprocess import Compose + +# Global dictionary +SUPPORT_MODELS = { + 'YOLO', 'PPYOLOE', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet', + 'S2ANet', 'JDE', 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', + 'TOOD', 'RetinaNet', 'StrongBaseline', 'STGCN', 'YOLOX', 'HRNet' +} + +parser = argparse.ArgumentParser(description=__doc__) +parser.add_argument("--infer_cfg", type=str, help="infer_cfg.yml") +parser.add_argument( + '--onnx_file', type=str, default="model.onnx", help="onnx model file path") +parser.add_argument("--image_dir", type=str) +parser.add_argument("--image_file", type=str) + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +class PredictConfig(object): + """set config of preprocess, postprocess and visualize + Args: + infer_config (str): path of infer_cfg.yml + """ + + def __init__(self, infer_config): + # parsing Yaml config for Preprocess + with open(infer_config) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.label_list = yml_conf['label_list'] + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + self.draw_threshold = yml_conf.get("draw_threshold", 0.5) + self.mask = yml_conf.get("mask", False) + self.tracker = yml_conf.get("tracker", None) + self.nms = yml_conf.get("NMS", None) + self.fpn_stride = yml_conf.get("fpn_stride", None) + if self.arch == 'RCNN' and yml_conf.get('export_onnx', False): + print( + 'The RCNN export model is used for ONNX and it only supports batch_size = 1' + ) + self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[ + 'arch'], SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +def predict_image(infer_config, predictor, img_list): + # load preprocess transforms + transforms = Compose(infer_config.preprocess_infos) + # predict image + for img_path in img_list: + inputs = transforms(img_path) + 
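+        # keep only the tensors that the exported ONNX graph declares as
+        # inputs, and add a leading batch dimension of 1 to each of them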
inputs_name = [var.name for var in predictor.get_inputs()] + inputs = {k: inputs[k][None, ] for k in inputs_name} + + outputs = predictor.run(output_names=None, input_feed=inputs) + + print("ONNXRuntime predict: ") + if infer_config.arch in ["HRNet"]: + print(np.array(outputs[0])) + else: + bboxes = np.array(outputs[0]) + for bbox in bboxes: + if bbox[0] > -1 and bbox[1] > infer_config.draw_threshold: + print(f"{int(bbox[0])} {bbox[1]} " + f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}") + + +if __name__ == '__main__': + FLAGS = parser.parse_args() + # load image list + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + # load predictor + predictor = InferenceSession(FLAGS.onnx_file) + # load infer config + infer_config = PredictConfig(FLAGS.infer_cfg) + + predict_image(infer_config, predictor, img_list) diff --git a/PaddleDetection-release-2.6/deploy/third_engine/onnx/preprocess.py b/PaddleDetection-release-2.6/deploy/third_engine/onnx/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..3554b7f81250bdffbbd78d1ec4905bcf722943e6 --- /dev/null +++ b/PaddleDetection-release-2.6/deploy/third_engine/onnx/preprocess.py @@ -0,0 +1,494 @@ +import numpy as np +import cv2 +import copy + + +def decode_image(img_path): + with open(img_path, 'rb') as f: + im_read = f.read() + data = np.frombuffer(im_read, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + img_info = { + "im_shape": np.array( + im.shape[:2], dtype=np.float32), + "scale_factor": np.array( + [1., 1.], dtype=np.float32) + } + return im, img_info + + +class Resize(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + """ + Args: + im (np.ndarray): image (np.ndarray) + Returns: + im_scale_x: the resize ratio of X + im_scale_y: the resize ratio of Y + """ + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + 
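+        # when keep_ratio is False, height and width are scaled independently,
+        # so scale_factor can differ between the two axes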
return im_scale_y, im_scale_x + + +class NormalizeImage(object): + """normalize image + Args: + mean (list): im - mean + std (list): im / std + is_scale (bool): whether need im / 255 + norm_type (str): type in ['mean_std', 'none'] + """ + + def __init__(self, mean, std, is_scale=True, norm_type='mean_std'): + self.mean = mean + self.std = std + self.is_scale = is_scale + self.norm_type = norm_type + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.astype(np.float32, copy=False) + if self.is_scale: + scale = 1.0 / 255.0 + im *= scale + + if self.norm_type == 'mean_std': + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im -= mean + im /= std + return im, im_info + + +class Permute(object): + """permute image + Args: + to_bgr (bool): whether convert RGB to BGR + channel_first (bool): whether convert HWC to CHW + """ + + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.transpose((2, 0, 1)).copy() + return im, im_info + + +class PadStride(object): + """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config + Args: + stride (bool): model with FPN need image shape % stride == 0 + """ + + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class LetterBoxResize(object): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. 
+ """ + super(LetterBoxResize, self).__init__() + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + + def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + height, width = self.target_size + h, w = im.shape[:2] + im, ratio, padw, padh = self.letterbox(im, height=height, width=width) + + new_shape = [round(h * ratio), round(w * ratio)] + im_info['im_shape'] = np.array(new_shape, dtype=np.float32) + im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32) + return im, im_info + + +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. + Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + +def rotate_point(pt, angle_rad): + """Rotate a point by an angle. + + Args: + pt (list[float]): 2 dimensional point to be rotated + angle_rad (float): rotation angle by radian + + Returns: + list[float]: Rotated point. + """ + assert len(pt) == 2 + sn, cs = np.sin(angle_rad), np.cos(angle_rad) + new_x = pt[0] * cs - pt[1] * sn + new_y = pt[0] * sn + pt[1] * cs + rotated_pt = [new_x, new_y] + + return rotated_pt + + +def _get_3rd_point(a, b): + """To calculate the affine matrix, three pairs of points are required. This + function is used to get the 3rd point, given 2D points a & b. + + The 3rd point is defined by rotating vector `a - b` by 90 degrees + anticlockwise, using b as the rotation center. + + Args: + a (np.ndarray): point(x,y) + b (np.ndarray): point(x,y) + + Returns: + np.ndarray: The 3rd point. + """ + assert len(a) == 2 + assert len(b) == 2 + direction = a - b + third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32) + + return third_pt + + +def get_affine_transform(center, + input_size, + rot, + output_size, + shift=(0., 0.), + inv=False): + """Get the affine transform matrix, given the center/scale/rot/output_size. 
+ + Args: + center (np.ndarray[2, ]): Center of the bounding box (x, y). + scale (np.ndarray[2, ]): Scale of the bounding box + wrt [width, height]. + rot (float): Rotation angle (degree). + output_size (np.ndarray[2, ]): Size of the destination heatmaps. + shift (0-100%): Shift translation ratio wrt the width/height. + Default (0., 0.). + inv (bool): Option to inverse the affine transform direction. + (inv=False: src->dst or inv=True: dst->src) + + Returns: + np.ndarray: The transform matrix. + """ + assert len(center) == 2 + assert len(output_size) == 2 + assert len(shift) == 2 + if not isinstance(input_size, (np.ndarray, list)): + input_size = np.array([input_size, input_size], dtype=np.float32) + scale_tmp = input_size + + shift = np.array(shift) + src_w = scale_tmp[0] + dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = rotate_point([0., src_w * -0.5], rot_rad) + dst_dir = np.array([0., dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + src[2, :] = _get_3rd_point(src[0, :], src[1, :]) + + dst = np.zeros((3, 2), dtype=np.float32) + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +class WarpAffine(object): + """Warp affine the image + """ + + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + input_w=512, + scale=0.4, + shift=0.1): + self.keep_res = keep_res + self.pad = pad + self.input_h = input_h + self.input_w = input_w + self.scale = scale + self.shift = shift + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + + h, w = img.shape[:2] + + if self.keep_res: + input_h = (h | self.pad) + 1 + input_w = (w | self.pad) + 1 + s = np.array([input_w, input_h], dtype=np.float32) + c = np.array([w // 2, h // 2], dtype=np.float32) + + else: + s = max(h, w) * 1.0 + input_h, input_w = self.input_h, self.input_w + c = np.array([w / 2., h / 2.], dtype=np.float32) + + trans_input = get_affine_transform(c, s, 0, [input_w, input_h]) + img = cv2.resize(img, (w, h)) + inp = cv2.warpAffine( + img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR) + return inp, im_info + + +# keypoint preprocess +def get_warp_matrix(theta, size_input, size_dst, size_target): + """This code is based on + https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py + + Calculate the transformation matrix under the constraint of unbiased. + Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased + Data Processing for Human Pose Estimation (CVPR 2020). + + Args: + theta (float): Rotation angle in degrees. + size_input (np.ndarray): Size of input image [w, h]. + size_dst (np.ndarray): Size of output image [w, h]. + size_target (np.ndarray): Size of ROI in input plane [w, h]. + + Returns: + matrix (np.ndarray): A matrix for transformation. 
+ """ + theta = np.deg2rad(theta) + matrix = np.zeros((2, 3), dtype=np.float32) + scale_x = size_dst[0] / size_target[0] + scale_y = size_dst[1] / size_target[1] + matrix[0, 0] = np.cos(theta) * scale_x + matrix[0, 1] = -np.sin(theta) * scale_x + matrix[0, 2] = scale_x * ( + -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] * + np.sin(theta) + 0.5 * size_target[0]) + matrix[1, 0] = np.sin(theta) * scale_y + matrix[1, 1] = np.cos(theta) * scale_y + matrix[1, 2] = scale_y * ( + -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] * + np.cos(theta) + 0.5 * size_target[1]) + return matrix + + +class TopDownEvalAffine(object): + """apply affine transform to image and coords + + Args: + trainsize (list): [w, h], the standard size used to train + use_udp (bool): whether to use Unbiased Data Processing. + records(dict): the dict contained the image and coords + + Returns: + records (dict): contain the image and coords after tranformed + + """ + + def __init__(self, trainsize, use_udp=False): + self.trainsize = trainsize + self.use_udp = use_udp + + def __call__(self, image, im_info): + rot = 0 + imshape = im_info['im_shape'][::-1] + center = im_info['center'] if 'center' in im_info else imshape / 2. + scale = im_info['scale'] if 'scale' in im_info else imshape + if self.use_udp: + trans = get_warp_matrix( + rot, center * 2.0, + [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + else: + trans = get_affine_transform(center, scale, rot, self.trainsize) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + + return image, im_info + + +class Compose: + def __init__(self, transforms): + self.transforms = [] + for op_info in transforms: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + self.transforms.append(eval(op_type)(**new_op_info)) + + def __call__(self, img_path): + img, im_info = decode_image(img_path) + for t in self.transforms: + img, im_info = t(img, im_info) + inputs = copy.deepcopy(im_info) + inputs['image'] = img + return inputs diff --git a/PaddleDetection-release-2.6/docs/CHANGELOG.md b/PaddleDetection-release-2.6/docs/CHANGELOG.md new file mode 100644 index 0000000000000000000000000000000000000000..0253bdd7c3d85a567fc10751cab95f240c73921d --- /dev/null +++ b/PaddleDetection-release-2.6/docs/CHANGELOG.md @@ -0,0 +1,405 @@ +简体中文 | [English](./CHANGELOG_en.md) + +# 版本更新信息 + +## 最新版本信息 + +### 2.6(02.15/2023) + +- 特色模型 + - 发布旋转框检测模型PP-YOLOE-R:Anchor-free旋转框检测SOTA模型,精度速度双高、云边一体,s/m/l/x四个模型适配不用算力硬件、部署友好,避免使用特殊算子,能够轻松使用TensorRT加速; + - 发布小目标检测模型PP-YOLOE-SOD:基于切图的端到端检测方案、基于原图的检测模型,精度达VisDrone开源最优; + - 发布密集检测模型:基于PP-YOLOE+的密集检测算法,SKU数据集检测精度60.3,达到开源最优 +- 前沿算法 + - YOLO家族新增前沿算法YOLOv8,更新YOLOv6-v3.0 + - 新增目标检测算法DINO,YOLOF + - 新增ViTDet系列检测模型,PP-YOLOE+ViT_base, Mask RCNN + ViT_base, Mask RCNN + ViT_large + - 新增多目标跟踪算法CenterTrack + - 新增旋转框检测算法FCOSR + - 新增实例分割算法QueryInst + - 新增3D关键点检测算法Metro3d + - 新增模型蒸馏算法FGD,LD,CWD,新增PP-YOLOE+模型蒸馏,精度提升1.1 mAP + - 新增半监督检测算法 DenseTeacher,并适配PP-YOLOE+ + - 新增少样本迁移学习方案,包含Co-tuning,Contrastive learning两类算法 +- 场景能力 + - PP-Human v2开源边缘端实时检测模型,精度45.7,Jetson AGX速度80FPS + - PP-Vehicle开源边缘端实时检测模型,精度53.5,Jetson AGX速度80FPS + - PP-Human v2,PP-Vehicle支持多路视频流部署能力,实现Jetson AGX 4路视频流端到端20FPS实时部署 + - PP-Vehicle新增车辆压线检测和车辆逆行检测能力 +- 框架能力 + - 功能新增 + - 新增检测热力图可视化能力,适配FasterRCNN/MaskRCNN系列, PP-YOLOE系列, BlazeFace, SSD, RetinaNet + - 功能完善/Bug修复 + - 支持python3.10版本 + - 
EMA支持过滤不更新参数 + - 简化PP-YOLOE architecture架构代码 + - AdamW适配paddle2.4.1版本 + + +### 2.5(08.26/2022) + +- 特色模型 + - PP-YOLOE+: + - 发布PP-YOLOE+模型,COCO test2017数据集精度提升0.7%-2.4% mAP,模型训练收敛速度提升3.75倍,端到端预测速度提升1.73-2.3倍 + - 发布智慧农业,夜间安防检测,工业质检场景预训练模型,精度提升1.3%-8.1% mAP + - 支持分布式训练、在线量化、serving部署等10大高性能训练部署能力,新增C++/Python Serving、TRT原生推理、ONNX Runtime等5+部署demo教程 + - PP-PicoDet: + - 发布PicoDet-NPU模型,支持模型全量化部署 + - 新增PicoDet版面分析模型,基于FGD蒸馏算法精度提升0.5% mAP + - PP-TinyPose + - 发布PP-TinyPose增强版,在健身、舞蹈等场景的业务数据集端到端AP提升9.1% AP + - 覆盖侧身、卧躺、跳跃、高抬腿等非常规动作 + - 新增滤波稳定模块,关键点稳定性显著增强 + +- 场景能力 + - PP-Human v2 + - 发布PP-Human v2,支持四大产业特色功能:多方案行为识别案例库、人体属性识别、人流检测与轨迹留存以及高精度跨镜跟踪 + - 底层算法能力升级,行人检测精度提升1.5% mAP;行人跟踪精度提升10.2% MOTA,轻量级模型速度提升34%;属性识别精度提升0.6% ma,轻量级模型速度提升62.5% + - 提供全流程教程,覆盖数据采集标注,模型训练优化和预测部署,及pipeline中后处理代码修改 + - 新增在线视频流输入支持 + - 易用性提升,一行代码执行功能,执行流程判断、模型下载背后自动完成。 + - PP-Vehicle + - 全新发布PP-Vehicle,支持四大交通场景核心功能:车牌识别、属性识别、车流量统计、违章检测 + - 车牌识别支持基于PP-OCR v3的轻量级车牌识别模型 + - 车辆属性识别支持基于PP-LCNet多标签分类模型 + - 兼容图片、视频、在线视频流等各类数据输入格式 + - 易用性提升,一行代码执行功能,执行流程判断、模型下载背后自动完成。 + +- 前沿算法 + - YOLO家族全系列模型 + - 发布YOLO家族全系列模型,覆盖前沿检测算法YOLOv5、YOLOv6及YOLOv7 + - 基于ConvNext骨干网络,YOLO各算法训练周期缩5-8倍,精度普遍提升1%-5% mAP;使用模型压缩策略实现精度无损的同时速度提升30%以上 + - 新增基于ViT骨干网络高精度检测模型,COCO数据集精度达到55.7% mAP + - 新增OC-SORT多目标跟踪模型 + - 新增ConvNeXt骨干网络 + +- 产业实践范例教程 + - 基于PP-TinyPose增强版的智能健身动作识别 + - 基于PP-Human的打架识别 + - 基于PP-Human的营业厅来客分析 + - 基于PP-Vehicle的车辆结构化分析 + - 基于PP-YOLOE+的PCB电路板缺陷检测 + +- 框架能力 + - 功能新增 + - 新增自动压缩工具支持并提供demo,PP-YOLOE l版本精度损失0.3% mAP,V100速度提升13% + - 新增PaddleServing python/C++和ONNXRuntime部署demo + - 新增PP-YOLOE 端到端TensorRT部署demo + - 新增FGC蒸馏算法,RetinaNet精度提升3.3% + - 新增分布式训练文档 + - 功能完善/Bug修复 + - 修复Windows c++部署编译问题 + - 修复VOC格式数据预测时保存结果问题 + - 修复FairMOT c++部署检测框输出 + - 旋转框检测模型S2ANet支持batch size>1部署 + +### 2.4(03.24/2022) + +- PP-YOLOE: + - 发布PP-YOLOE特色模型,l版本COCO test2017数据集精度51.6%,V100预测速度78.1 FPS,精度速度服务器端SOTA + - 发布s/m/l/x系列模型,打通TensorRT、ONNX部署能力 + - 支持混合精度训练,训练较PP-YOLOv2加速33% + +- PP-PicoDet: + - 发布PP-PicoDet优化模型,精度提升2%左右,CPU预测速度提升63%。 + - 新增参数量0.7M的PicoDet-XS模型 + - 后处理集成到网络中,优化端到端部署成本 + +- 行人分析Pipeline: + - 发布PP-Human行人分析Pipeline,覆盖行人检测、属性识别、行人跟踪、跨镜跟踪、人流量统计、动作识别多种功能,打通TensorRT部署 + - 属性识别支持StrongBaseline模型 + - ReID支持Centroid模型 + - 动作识别支持ST-GCN摔倒检测 + +- 模型丰富度: + - 发布YOLOX,支持nano/tiny/s/m/l/x版本,x版本COCO val2017数据集精度51.8% + +- 框架功能优化: + - EMA训练速度优化20%,优化EMA训练模型保存方式 + - 支持infer预测结果保存为COCO格式 + +- 部署优化: + - RCNN全系列模型支持Paddle2ONNX导出ONNX模型 + - SSD模型支持导出时融合解码OP,优化边缘端部署速度 + - 支持NMS导出TensorRT,TensorRT部署端到端速度提升 + +### 2.3(11.03/2021) + +- 特色模型: + - 检测: 轻量级移动端检测模型PP-PicoDet,精度速度达到移动端SOTA + - 关键点: 轻量级移动端关键点模型PP-TinyPose + +- 模型丰富度: + - 检测: + - 新增Swin-Transformer目标检测模型 + - 新增TOOD(Task-aligned One-stage Object Detection)模型 + - 新增GFL(Generalized Focal Loss)目标检测模型 + - 发布Sniper小目标检测优化方法,支持Faster RCNN及PP-YOLO系列模型 + - 发布针对EdgeBoard优化的PP-YOLO-EB模型 + + - 跟踪 + - 发布实时跟踪系统PP-Tracking + - 发布FairMot高精度模型、小尺度模型和轻量级模型 + - 发布行人、人头和车辆实跟踪垂类模型库,覆盖航拍监控、自动驾驶、密集人群、极小目标等场景 + - DeepSORT模型适配PP-YOLO, PP-PicoDet等更多检测器 + + - 关键点 + - 新增Lite HRNet模型 + +- 预测部署: + - YOLOv3系列模型支持NPU预测部署 + - FairMot模型C++预测部署打通 + - 关键点系列模型C++预测部署打通, Paddle Lite预测部署打通 + +- 文档: + - 新增各系列模型英文文档 + +### 2.2(08.10/2021) + +- 模型丰富度: + - 发布Transformer检测模型:DETR、Deformable DETR、Sparse RCNN + - 关键点检测新增Dark模型,发布Dark HRNet模型 + - 发布MPII数据集HRNet关键点检测模型 + - 发布人头、车辆跟踪垂类模型 + +- 模型优化: + - 旋转框检测模型S2ANet发布Align Conv优化模型,DOTA数据集mAP优化至74.0 + +- 预测部署 + - 主流模型支持batch size>1预测部署,包含YOLOv3,PP-YOLO,Faster RCNN,SSD,TTFNet,FCOS + - 新增多目标跟踪模型(JDE, FairMot, DeepSort) Python端预测部署支持,并支持TensorRT预测 + - 新增多目标跟踪模型FairMot联合关键点检测模型部署Python端预测部署支持 + - 新增关键点检测模型联合PP-YOLO预测部署支持 + +- 文档: 
+ - Windows预测部署文档新增TensorRT版本说明 + - FAQ文档更新发布 + +- 问题修复: + - 修复PP-YOLO系列模型训练收敛性问题 + - 修复batch size>1时无标签数据训练问题 + + +### 2.1(05.20/2021) +- 模型丰富度提升: + - 发布关键点模型HRNet,HigherHRNet + - 发布多目标跟踪模型DeepSort, FairMot, JDE + +- 框架基础能力: + - 支持无标注框训练 + +- 预测部署: + - Paddle Inference YOLOv3系列模型支持batch size>1预测 + - 旋转框检测S2ANet模型预测部署打通 + - 增加量化模型Benchmark + - 增加动态图模型与静态图模型Paddle-Lite demo + +- 检测模型压缩: + - 发布PPYOLO系列模型压缩模型 + +- 文档: + - 更新快速开始,预测部署等教程文档 + - 新增ONNX模型导出教程 + - 新增移动端部署文档 + + +### 2.0(04.15/2021) + + **说明:** 自2.0版本开始,动态图作为PaddleDetection默认版本,原`dygraph`目录切换为根目录,原静态图实现移动到`static`目录下。 + + - 动态图模型丰富度提升: + - 发布PP-YOLOv2及PP-YOLO tiny模型,PP-YOLOv2 COCO test数据集精度达到49.5%,V100预测速度达到68.9 FPS + - 发布旋转框检测模型S2ANet + - 发布两阶段实用模型PSS-Det + - 发布人脸检测模型Blazeface + + - 新增基础模块: + - 新增SENet,GhostNet,Res2Net骨干网络 + - 新增VisualDL训练可视化支持 + - 新增单类别精度计算及PR曲线绘制功能 + - YOLO系列模型支持NHWC数据格式 + + - 预测部署: + - 发布主要模型的预测benchmark数据 + - 适配TensorRT6,支持TensorRT动态尺寸输入,支持TensorRT int8量化预测 + - PP-YOLO, YOLOv3, SSD, TTFNet, FCOS, Faster RCNN等7类模型在Linux、Windows、NV Jetson平台下python/cpp/TRT预测部署打通: + + - 检测模型压缩: + - 蒸馏:新增动态图蒸馏支持,并发布YOLOv3-MobileNetV1蒸馏模型 + - 联合策略:新增动态图剪裁+蒸馏联合策略压缩方案,并发布YOLOv3-MobileNetV1的剪裁+蒸馏压缩模型 + - 问题修复:修复动态图量化模型导出问题 + + - 文档: + - 新增动态图英文文档:包含首页文档,入门使用,快速开始,模型算法、新增数据集等 + - 新增动态图中英文安装文档 + - 新增动态图RCNN系列和YOLO系列配置文件模板及配置项说明文档 + + +## 历史版本信息 + +### 2.0-rc(02.23/2021) + - 动态图模型丰富度提升: + - 优化RCNN模型组网及训练方式,RCNN系列模型精度提升(依赖Paddle develop或2.0.1版本) + - 新增支持SSDLite,FCOS,TTFNet,SOLOv2系列模型 + - 新增行人和车辆垂类目标检测模型 + + - 新增动态图基础模块: + - 新增MobileNetV3,HRNet骨干网络 + - 优化RoIAlign计算逻辑,RCNN系列模型精度提升(依赖Paddle develop或2.0.1版本) + - 新增支持Synchronized Batch Norm + - 新增支持Modulated Deformable Convolution + + - 预测部署: + - 发布动态图python、C++、Serving部署解决方案及文档,支持Faster RCNN,Mask RCNN,YOLOv3,PP-YOLO,SSD,TTFNet,FCOS,SOLOv2等系列模型预测部署 + - 动态图预测部署支持TensorRT模式FP32,FP16推理加速 + + - 检测模型压缩: + - 裁剪:新增动态图裁剪支持,并发布YOLOv3-MobileNetV1裁剪模型 + - 量化:新增动态图量化支持,并发布YOLOv3-MobileNetV1和YOLOv3-MobileNetV3量化模型 + + - 文档: + - 新增动态图入门教程文档:包含安装说明,快速开始,准备数据,训练/评估/预测流程文档 + - 新增动态图进阶教程文档:包含模型压缩、推理部署文档 + - 新增动态图模型库文档 + +### v2.0-beta(12.20/2020) + - 动态图支持: + - 支持Faster-RCNN, Mask-RCNN, FPN, Cascade Faster/Mask RCNN, YOLOv3和SSD模型,试用版本。 + - 模型提升: + - 更新PP-YOLO MobileNetv3 large和small模型,精度提升,并新增裁剪和蒸馏后的模型。 + - 新功能: + - 支持VisualDL可视化数据预处理图片。 + + - Bug修复: + - 修复BlazeFace人脸关键点预测bug。 + + +### v0.5.0(11/2020) + - 模型丰富度提升: + - 发布SOLOv2系列模型,其中SOLOv2-Light-R50-VD-DCN-FPN 模型在单卡V100上达到 38.6 FPS,加速24% ,COCO验证集精度达到38.8%, 提升2.4绝对百分点。 + - 新增Android移动端检测demo,包括SSD、YOLO系列模型,可直接扫码安装体验。 + + - 移动端模型优化: + - 新增PACT新量化策略,YOLOv3-Mobilenetv3在COCO数据集上比普通量化相比提升0.7%。 + + - 易用性提升及功能组件: + - 增强generate_proposal_labels算子功能,规避模型出nan风险。 + - 修复deploy下python与C++预测若干问题。 + - 统一COCO与VOC数据集下评估流程,支持输出单类AP和P-R曲线。 + - PP-YOLO支持矩形输入图像。 + + - 文档: + - 新增目标检测全流程教程,新增Jetson平台部署教程。 + + +### v0.4.0(07/2020) + - 模型丰富度提升: + - 发布PPYOLO模型,COCO数据集精度达到45.2%,单卡V100预测速度达到72.9 FPS,精度和预测速度优于YOLOv4模型。 + - 新增TTFNet模型,base版本对齐竞品,COCO数据集精度达到32.9%。 + - 新增HTC模型,base版本对齐竞品,COCO数据集精度达到42.2%。 + - 新增BlazeFace人脸关键点检测模型,在Wider-Face数据集的Easy-Set精度达到85.2%。 + - 新增ACFPN模型, COCO数据集精度达到39.6%。 + - 发布服务器端通用目标检测模型(包含676类),相同策略在COCO数据集上,V100为19.5FPS时,COCO mAP可以达到49.4%。 + + - 移动端模型优化: + - 新增SSDLite系列优化模型,包括新增GhostNet的Backbone,新增FPN组件等,精度提升0.5%-1.5%。 + + - 易用性提升及功能组件: + - 新增GridMask, RandomErasing数据增强方法。 + - 新增Matrix NMS支持。 + - 新增EMA(Exponential Moving Average)训练支持。 + - 新增多机训练方法,两机相对于单机平均加速比80%,多机训练支持待进一步验证。 + +### v0.3.0(05/2020) + - 模型丰富度提升: + - 添加Efficientdet-D0模型,速度与精度优于竞品。 + - 新增YOLOv4预测模型,精度对齐竞品;新增YOLOv4在Pascal VOC数据集上微调训练,精度达到85.5%。 + - YOLOv3新增MobileNetV3骨干网络,COCO数据集精度达到31.6%。 + - 
添加Anchor-free模型FCOS,精度优于竞品。 + - 添加Anchor-free模型CornernetSqueeze,精度优于竞品,优化模型的COCO数据集精度38.2%, +3.7%,速度较YOLOv3-Darknet53快5%。 + - 添加服务器端实用目标检测模型CascadeRCNN-ResNet50vd模型,速度与精度优于竞品EfficientDet。 + + - 移动端推出3种模型: + - SSDLite系列模型:SSDLite-Mobilenetv3 small/large模型,精度优于竞品。 + - YOLOv3移动端方案: YOLOv3-MobileNetv3模型压缩后加速3.5倍,速度和精度均领先于竞品的SSDLite模型。 + - RCNN移动端方案:CascadeRCNN-MobileNetv3经过系列优化, 推出输入图像分别为320x320和640x640的模型,速度与精度具有较高性价比。 + + - 预测部署重构: + - 新增Python预测部署流程,支持RCNN,YOLO,SSD,RetinaNet,人脸系列模型,支持视频预测。 + - 重构C++预测部署,提高易用性。 + + - 易用性提升及功能组件: + - 增加AutoAugment数据增强。 + - 升级检测库文档结构。 + - 支持迁移学习自动进行shape匹配。 + - 优化mask分支评估阶段内存占用。 + +### v0.2.0(02/2020) + - 新增模型: + - 新增基于CBResNet模型。 + - 新增LibraRCNN模型。 + - 进一步提升YOLOv3模型精度,基于COCO数据精度达到43.2%,相比上个版本提升1.4%。 + - 新增基础模块: + - 主干网络: 新增CBResNet。 + - loss模块: YOLOv3的loss支持细粒度op组合。 + - 正则模块: 新增DropBlock模块。 + - 功能优化和改进: + - 加速YOLOv3数据预处理,整体训练提速40%。 + - 优化数据预处理逻辑,提升易用性。 + - 增加人脸检测预测benchmark数据。 + - 增加C++预测引擎Python API预测示例。 + - 检测模型压缩 : + - 裁剪: 发布MobileNet-YOLOv3裁剪方案和模型,基于VOC数据FLOPs - 69.6%, mAP + 1.4%,基于COCO数据FLOPS-28.8%, mAP + 0.9%; 发布ResNet50vd-dcn-YOLOv3裁剪方案和模型,基于COCO数据集FLOPS - 18.4%, mAP + 0.8%。 + - 蒸馏: 发布MobileNet-YOLOv3蒸馏方案和模型,基于VOC数据mAP + 2.8%,基于COCO数据mAP + 2.1%。 + - 量化: 发布YOLOv3-MobileNet和BlazeFace的量化模型。 + - 裁剪+蒸馏: 发布MobileNet-YOLOv3裁剪+蒸馏方案和模型,基于COCO数据FLOPS - 69.6%,基于TensorRT预测加速64.5%,mAP - 0.3 %; 发布ResNet50vd-dcn-YOLOv3裁剪+蒸馏方案和模型,基于COCO数据FLOPS - 43.7%,基于TensorRT预测加速24.0%,mAP + 0.6 %。 + - 搜索: 开源BlazeFace-Nas的完成搜索方案。 + - 预测部署: + - 集成 TensorRT,支持FP16、FP32、INT8量化推理加速。 + - 文档: + - 增加详细的数据预处理模块介绍文档以及实现自定义数据Reader文档。 + - 增加如何新增算法模型的文档。 + - 文档部署到网站: https://paddledetection.readthedocs.io + +### 12/2019 +- 增加Res2Net模型。 +- 增加HRNet模型。 +- 增加GIOU loss和DIOU loss。 + + +### 21/11/2019 +- 增加CascadeClsAware RCNN模型。 +- 增加CBNet,ResNet200和Non-local模型。 +- 增加SoftNMS。 +- 增加Open Image V5数据集和Objects365数据集模型。 + +### 10/2019 +- 增加增强版YOLOv3模型,精度高达41.4%。 +- 增加人脸检测模型BlazeFace、Faceboxes。 +- 丰富基于COCO的模型,精度高达51.9%。 +- 增加Objects365 2019 Challenge上夺冠的最佳单模型之一CACascade-RCNN。 +- 增加行人检测和车辆检测预训练模型。 +- 支持FP16训练。 +- 增加跨平台的C++推理部署方案。 +- 增加模型压缩示例。 + + +### 2/9/2019 +- 增加GroupNorm模型。 +- 增加CascadeRCNN+Mask模型。 + +### 5/8/2019 +- 增加Modulated Deformable Convolution系列模型。 + +### 29/7/2019 + +- 增加检测库中文文档 +- 修复R-CNN系列模型训练同时进行评估的问题 +- 新增ResNext101-vd + Mask R-CNN + FPN模型 +- 新增基于VOC数据集的YOLOv3模型 + +### 3/7/2019 + +- 首次发布PaddleDetection检测库和检测模型库 +- 模型包括:Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask + R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, 和SSD. diff --git a/PaddleDetection-release-2.6/docs/CHANGELOG_en.md b/PaddleDetection-release-2.6/docs/CHANGELOG_en.md new file mode 100644 index 0000000000000000000000000000000000000000..ac374b5d619ceeb3c7445279afb9587367092502 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/CHANGELOG_en.md @@ -0,0 +1,415 @@ +English | [简体中文](./CHANGELOG.md) + +# Version Update Information + +## Last Version Information + +### 2.6(02.15/2023) + +- Featured model + + - Release rotated object detector PP-YOLOE-R:SOTA Anchor-free rotated object detection model with high accuracy and efficiency. It has a series of models, named s/m/l/x, for cloud and edge devices and avoids using special operators to be deployed friendly with TensorRT. + - Release small object detector PP-YOLOE-SOD: End-to-end detection pipeline based on sliced images and SOTA model on VisDrone based on original images. + - Release crowded object detector: Crowded object detection model with top accuracy on SKU dataset. 
+ +- Functions in different scenarios + + - Release real-time object detection model on edge device in PP-Human v2. The model reaches 45.7mAP and 80FPS on Jetson AGX + - Release real-time object detection model on edge device in PP-Vehicle. The model reaches 53.5mAP and 80FPS on Jetson AGX + - Support multi-stream deployment in PP-Human v2 and PP-Vehicle. Achieved 20FPS in 4-stream deployment on Jetson AGX + - Support retrograde and press line detection in PP-Vehicle + +- Cutting-edge algorithms + + - Release YOLOv8 and YOLOv6 3.0 in YOLO Family + - Release object detection algorithm DINO, YOLOF + - Rich ViTDet series including PP-YOLOE+ViT_base, Mask RCNN + ViT_base, Mask RCNN + ViT_large + - Release MOT algorithm CenterTrack + - Release oriented object detection algorithm FCOSR + - Release instance segmentation algorithm QueryInst + - Release 3D keypoint detection algorithm Metro3d + - Release distillation algorithm FGD,LD,CWD and PP-YOLOE+ distillation with improvement of 1.1+ mAP + - Release SSOD algorithm DenseTeacher and adapt for PP-YOLOE+ + - Release few shot finetuning algorithm, including Co-tuning and Contrastive learning + +- Framework capabilities + + - New functions + - Release Grad-CAM for heatmap visualization. Support Faster RCNN, Mask RCNN, PP-YOLOE, BlazeFace, SSD, RetinaNet. + - Improvement and fixes + - Support python 3.10 + - Fix EMA for no-grad parameters + - Simplify PP-YOLOE architecture + - Support AdamW for Paddle 2.4.1 + +### 2.5(08.26/2022) + +- Featured model + + - PP-YOLOE+: + - Released PP-YOLOE+ model, with a 0.7%-2.4% mAP improvement on COCO test2017. 3.75 times faster model training convergence rate and 1.73-2.3 times faster end-to-end inference speed + - Released pre-trained models for smart agriculture, night security detection, and industrial quality inspection with 1.3%-8.1% mAP accuracy improvement + - supports 10 high-performance training deployment capabilities, including distributed training, online quantization, and serving deployment. 
We also provide more than five new deployment demos, such as C++/Python Serving, TRT native inference, and ONNX Runtime + - PP-PicoDet: + - Release the PicoDet-NPU model to support full quantization of model deployment + - Add PicoDet layout analysis model with 0.5% mAP accuracy improvement due to FGD distillation algorithm + - PP-TinyPose + - Release PP-TinyPose Plus with 9.1% end-to-end AP improvement for business data sets such as physical exercises, dance, and other scenarios + - Covers unconventional movements such as turning to one side, lying down, jumping, high lift + - Add stabilization module (via filter) to significantly improve the stability at key points + +- Functions in different scenarios + + - PP-Human v2 + - Release PP-Human v2, which supports four industrial features: behavioral recognition case zoo for multiple solutions, human attribute recognition, human traffic detection and trajectory retention, as well as high precision multi-camera tracking + - Upgraded underlying algorithm capabilities: 1.5% mAP improvement in pedestrian detection accuracy; 10.2% MOTA improvement in pedestrian tracking accuracy, 34% speed improvement in the lightweight model; 0.6% ma improvement in attribute recognition accuracy, 62.5% speed improvement in the lightweight model + - Provides comprehensive tutorials covering data collection and annotation, model training optimization and prediction deployment, and post-processing code modification in the pipeline + - Supports online video streaming input + - Become more user-friendly with a one-line code execution function that automates the process determination and model download + - PP-Vehicle + - Launch PP-Vehicle, which supports four core functions for traffic application: license plate recognition, attribute recognition, traffic flow statistics, and violation detection + - License plate recognition supports a lightweight model based on PP-OCR v3 + - Vehicle attribute recognition supports a multi-label classification model based on PP-LCNet + - Compatible with various data input formats such as pictures, videos and online video streaming + - Become more user-friendly with a one-line code execution function that automates the process determination and model download + +- Cutting-edge algorithms + + - YOLO Family + - Release the full range of YOLO family models covering the cutting-edge detection algorithms YOLOv5, YOLOv6 and YOLOv7 + - Based on the ConvNext backbone network, YOLO's algorithm training periods are reduced by 5-8 times with accuracy generally improving by 1%-5% mAP; Thanks to the model compression strategy, its speed increased by over 30% with no loss of precision. + - Newly add high precision detection model based on [ViT](configs/vitdet) backbone network, with a 55.7% mAP accuracy on the COCO dataset + - Newly add multi-object tracking model [OC-SORT](configs/mot/ocsort) + - Newly add [ConvNeXt](configs/convnext) backbone network. 
+
+- Industrial application
+
+  - Intelligent physical exercise recognition based on PP-TinyPose Plus
+  - Fighting recognition based on PP-Human
+  - Business hall visitor analysis based on PP-Human
+  - Vehicle structuring analysis based on PP-Vehicle
+  - PCB board defect detection based on PP-YOLOE+
+
+- Framework capabilities
+
+  - New functions
+    - Release auto-compression tools and demos: 0.3% mAP accuracy loss for the PP-YOLOE l version, with a 13% speed increase on V100
+    - Release PaddleServing python/C++ and ONNXRuntime deployment demos
+    - Release PP-YOLOE end-to-end TensorRT deployment demo
+    - Release FGD distillation algorithm with RetinaNet accuracy improved by 3.3%
+    - Release distributed training documentation
+  - Improvement and fixes
+    - Fix compilation problem with Windows C++ deployment
+    - Fix problems when saving inference results on VOC-format data
+    - Fix the detection box output of the FairMOT C++ deployment
+    - Support batch size > 1 deployment for the rotated object detection model S2ANet
+
+### 2.4(03.24/2022)
+
+- PP-YOLOE:
+  - Release PP-YOLOE object detection models; PP-YOLOE-l achieves 51.6% mAP on the COCO test dataset and 78.1 FPS on an NVIDIA V100, reaching SOTA performance for object detection on GPU
+  - Release series models s/m/l/x, and support deployment based on TensorRT & ONNX
+  - Support AMP training, with training 33% faster than PP-YOLOv2
+- PP-PicoDet:
+  - Release enhanced models of PP-PicoDet, with mAP improved by ~2% on COCO and CPU inference speed accelerated by 63%
+  - Release the PP-PicoDet-XS model with 0.7M parameters
+  - Integrate post-processing into the network to simplify the deployment pipeline
+- PP-Human:
+  - Release the PP-Human analysis pipeline, including pedestrian detection, attribute recognition, human tracking, multi-camera tracking, human statistics and action recognition, supporting deployment with TensorRT
+  - Release StrongBaseline model for attribute recognition
+  - Release Centroid model for ReID
+  - Release ST-GCN model for fall-down action recognition
+
+- Model richness:
+  - Publish the YOLOX object detection model with series models nano/tiny/s/m/l/x; YOLOX-x achieves 51.8% mAP on the COCO val2017 dataset
+
+- Function optimization:
+  - Optimize EMA training speed by 20% and improve the saving method of EMA weights
+  - Support saving inference results in COCO format
+
+- Deployment optimization:
+  - Support exporting ONNX models via Paddle2ONNX for all RCNN models
+  - Support exporting SSD models with a fused decode OP to enhance inference speed on edge devices
+  - Support exporting NMS to TensorRT to optimize end-to-end TensorRT inference speed
+
+### 2.3(11.03/2021)
+
+- Featured models:
+  - Object detection: the lightweight object detection model PP-PicoDet, whose accuracy and inference speed reach SOTA on the mobile side
+  - Keypoint detection: the lightweight keypoint detection model PP-TinyPose for the mobile side
+
+- Model richness:
+  - Object detection:
+    - Publish the Swin-Transformer object detection model
+    - Publish the TOOD (Task-aligned One-stage Object Detection) model
+    - Publish the GFL (Generalized Focal Loss) object detection model
+    - Publish the Sniper optimization method for tiny object detection, supporting Faster RCNN and PP-YOLO series models
+    - Publish PP-YOLO-EB, a PP-YOLO model optimized for EdgeBoard
+  - Multi-object tracking:
+    - Publish the real-time tracking system PP-Tracking
+    - Publish high-precision, small-scale and lightweight models based on FairMOT
+    - Publish a real-time tracking model zoo for pedestrian, head and vehicle tracking, covering scenarios such as aerial surveillance, autonomous driving, dense crowds and tiny objects
+    - DeepSORT supports PP-YOLO and PP-PicoDet as object detectors
+  - Keypoint detection:
+    - Publish the Lite HRNet model
+
+- Inference deployment:
+  - Support NPU deployment for the YOLOv3 series
+  - Support C++ deployment for FairMOT
+  - Support C++ and Paddle Lite deployment for the keypoint detection series models
+
+- Documents:
+  - Add English documents for the model series
+
+
+### 2.2(08.10/2021)
+
+- Model richness:
+  - Publish Transformer detection models: DETR, Deformable DETR, Sparse RCNN
+  - Add the Dark method for keypoint detection and release the Dark HRNet model
+  - Publish the HRNet keypoint detection model trained on the MPII dataset
+  - Release head and vehicle tracking vertical models
+
+- Model optimization:
+  - Release the AlignConv optimized model for S2ANet, improving DOTA dataset mAP to 74.0
+
+- Inference deployment:
+  - Mainstream models support batch size > 1 deployment, including YOLOv3, PP-YOLO, Faster RCNN, SSD, TTFNet, FCOS
+  - Add Python deployment for multi-object tracking models (JDE, FairMOT, DeepSORT), with TensorRT support
+  - Add Python deployment for FairMOT combined with keypoint detection
+  - Add deployment support for keypoint detection models combined with PP-YOLO
+
+- Documents:
+  - Add TensorRT version notes to the Windows deployment documentation
+  - Update the FAQ documents
+
+- Bug fixes:
+  - Fix the training convergence problem of the PP-YOLO series models
+  - Fix the problem of training on unlabeled data when batch_size > 1
+
+
+### 2.1(05.20/2021)
+- Model richness:
+  - Publish keypoint detection models: HRNet, HigherHRNet
+  - Publish multi-object tracking models: DeepSORT, FairMOT, JDE
+
+- Basic framework capabilities:
+  - Supports training
without labels + +- Forecast deployment: + - Paddle Inference YOLOv3 series model support batch_size>1 prediction + - Rotating frame detection S2ANet model prediction deployment is open + - Incremental quantization model benchmark + - Add dynamic graph model and static graph model: Paddle-Lite demo + +- Detection model compression: + - Release PP-YOLO series model compression model + +- Documents: + - Update quick start, forecast deployment and other tutorial documentation + - Added ONNX model export tutorial + - Added the mobile deployment document + + +### 2.0(04.15/2021) + + **Description:** Since version 2.0, dynamic graphs are used as the default version of Paddle Detection, the original `dygraph` directory is switched to the root directory, and the original static graph implementation is moved to the `static` directory. + + - Enhancement of dynamic graph model richness: + - PP-YOLOv2 and PP-YOLO tiny models were published. The accuracy of PP-YOLOv2 COCO Test dataset reached 49.5%, and the prediction speed of V100 reached 68.9 FPS + - Release the rotary frame detection model S2ANet + - Release the two-phase utility model PSS-Det + - Publish the face detection model Blazeface + + - New basic module: + - Added SENet, GhostNet, and Res2Net backbone networks + - Added VisualDL training visualization support + - Added single precision calculation and PR curve drawing function + - The YOLO models support THE NHWC data format + + - Forecast deployment: + - Publish forecast benchmark data for major models + - Adaptive to TensorRT6, support TensorRT dynamic size input, support TensorRT int8 quantitative prediction + - 7 types of models including PP-YOLO, YOLOv3, SSD, TTFNet, FCOS, Faster RCNN are deployed in Python/CPP/TRT prediction on Linux, Windows and NV Jetson platforms + + - Detection model compression: + - Distillation: Added dynamic map distillation support and released YOLOv3-MobileNetV1 distillation model + - Joint strategy: new dynamic graph prunning + distillation joint strategy compression scheme, and release YOLOv3-MobileNetV1 prunning + distillation compression model + - Problem fix: Fixed dynamic graph quantization model export problem + + - Documents: + - New English document of dynamic graph: including homepage document, getting started, quick start, model algorithm, new dataset, etc + - Added both English and Chinese installation documents of dynamic diagrams + - Added configuration file templates and description documents of dynamic graph RCNN series and YOLO series + + +## Historical Version Information + +### 2.0-rc(02.23/2021) + - Enhancement of dynamic graph model richness: + - Optimize networking and training mode of RCNN models, and improve accuracy of RCNN series models (depending on Paddle Develop or version 2.0.1) + - Added support for SSDLite, FCOS, TTFNet, SOLOv2 series models + - Added pedestrian and vehicle vertical object detection models + + - New dynamic graph basic module: + - Added MobileNetV3 and HRNet backbone networks + - Improved roi-align calculation logic for RCNN series models (depending on Paddle Develop or version 2.0.1) + - Added support for Synchronized Batch Norm + - Added support for Modulated Deformable Convolution + + - Forecast deployment: + - Publish dynamic diagrams in python, C++, and Serving deployment solution and documentation. 
Support Faster RCNN, Mask RCNN, YOLOv3, PPYOLO, SSD, TTFNet, FCOS, SOLOv2 and other models to predict deployment + - Dynamic graph prediction deployment supports TensorRT mode FP32, FP16 inference acceleration + + - Detection model compression: + - Prunning: Added dynamic graph prunning support, and released YOLOv3-MobileNetV1 prunning model + - Quantization: Added quantization support of dynamic graph, and released quantization models of YOLOv3-MobileNetV1 and YOLOv3-MobileNetV3 + + - Documents: + - New Dynamic Diagram tutorial documentation: includes installation instructions, quick start, data preparation, and training/evaluation/prediction process documentation + - New advanced tutorial documentation for dynamic diagrams: includes documentation for model compression and inference deployment + - Added dynamic graph model library documentation + +### v2.0-beta(12.20/2020) + - Dynamic graph support: + - Support for Faster-RCNN, Mask-RCNN, FPN, Cascade Faster/Mask RCNN, YOLOv3 and SSD models, trial version. + - Model upgrade: + - Updated PP-YOLO Mobile-Netv3 large and small models with improved accuracy, and added prunning and distillation models. + - New features: + - Support VisualDL visual data preprocessing pictures. + + - Bug fix: + - Fix Blaze Face keypoint prediction bug. + + +### v0.5.0(11/2020) + - Model richness enhancement: + - SOLOv2 series models were released, in which the SOLOv2-Light-R50-VD-DCN-FPN model achieved 38.6 FPS on a single gpu V100, accelerating by 24%, and the accuracy of COCO verification set reached 38.8%, improving by 2.4 absolute percentage points. + - Added Android mobile terminal detection demo, including SSD, YOLO series model, can directly scan code installation experience. + + - Mobile terminal model optimization: + - Added to PACT's new quantization strategy, YOLOv3 Mobilenetv3 is 0.7% better than normal quantization on COCO datasets. + + - Ease of use and functional components: + - Enhance the function of generate_proposal_labels operator to avoid nan risk of the model. + - Fixed several problems with deploy python and C++ prediction. + - Unified COCO and VOC datasets under the evaluation process, support the output of a single class of AP and P-R curves. + - PP-YOLO supports rectangular input images. + + - Documents: + - Added object detection whole process tutorial, added Jetson platform deployment tutorial. + + +### v0.4.0(07/2020) + - Model richness enhancement: + - The PPYOLO model was released. The accuracy of COCO dataset reached 45.2%, and the prediction speed of single gpu V100 reached 72.9 FPS, which was better than that of YOL Ov4 model. + - New TTFNet model, base version aligned with competing products, COCO dataset accuracy up to 32.9%. + - New HTC model, base version aligned with competing products, COCO dataset accuracy up to 42.2%. + - BlazeFace key point detection model was added, with an accuracy of 85.2% in Wider-Face's Easy-Set. + - ACFPN model was added, and the accuracy of COCO dataset reached 39.6%. + - General object detection model (including 676 classes) on the publisher side. On the COCO dataset with the same strategy, when V100 is 19.5FPS, the COCO mAP can reach 49.4%. + + - Mobile terminal model optimization: + - Added SSD Lite series optimization models, including Ghost Net Backbone, FPN components, etc., with accuracy improved by 0.5% and 1.5%. + + - Ease of use and functional components: + - Add GridMask, Random Erasing data enhancement method. + - Added support for Matrix NMS. 
+ - EMA(Exponential Moving Average) training support. + - The new multi-machine training method, the average acceleration ratio of two machines to single machine is 80%, multi-machine training support needs to be further verified. + +### v0.3.0(05/2020) + - Model richness enhancement: + - Efficientdet-D0 model added, speed and accuracy is better than competing products. + - Added YOLOv4 prediction model, precision aligned with competing products; Added YOLOv4 fine tuning training on Pascal VOC datasets with accuracy of 85.5%. + - YOLOv3 added MobileNetV3 backbone network, COCO dataset accuracy reached 31.6%. + - Add Anchor-free model FCOS, the accuracy is better than competing products. + - Anchor-free model Cornernet Squeeze was added, the accuracy was better than competing products, and the accuracy of COCO dataset of optimized model was 38.2% and +3.7%, 5% faster than YOL Ov3 Darknet53. + - The CascadeRCNN-ResNet50vd model, which is a practical object detection model on the server side, is added, and its speed and accuracy are better than that of the competitive EfficientDet. + + - Mobile terminal launched three models: + - SSSDLite model: SSDLite-Mobilenetv3 small/large model, with better accuracy than competitors. + - YOLOv3 Mobile solution: The YOLOv3-MobileNetv3 model accelerates 3.5 times after compression, which is faster and more accurate than the SSD Lite model of competing products. + - RCNN Mobile terminal scheme: CascadeRCNN-MobileNetv3, after series optimization, launched models with input images of 320x320 and 640x640 respectively, with high cost performance for speed and accuracy. + + - Anticipate deployment refactoring: + - New Python prediction deployment process, support for RCNN, YOLO, SSD, Retina Net, face models, support for video prediction. + - Refactoring C++ predictive deployment to improve ease of use. + + - Ease of use and functional components: + - Added Auto Augment data enhancement. + - Upgrade the detection library document structure. + - Support shape matching automatically by transfer learning. + - Optimize memory footprint during mask branch evaluation. + +### v0.2.0(02/2020) + - The new model: + - Added CBResNet model. + - Added LibraRCNN model. + - The accuracy of YOLOv3 model was further improved, and the accuracy based on COCO data reached 43.2%, 1.4% higher than the previous version. + - New Basic module: + - Trunk network: CBResNet is added. + - Loss module: Loss of YOLOv3 supports fine-grained OP combinations. + - Regular module: Added the Drop Block module. + - Function optimization and improvement: + - Accelerate YOLOv3 data preprocessing and increase the overall training speed by 40%. + - Optimize data preprocessing logic to improve ease of use. + - dd face detection prediction benchmark data. + - Added C++ prediction engine Python API prediction example. + - Detection model compression: + - prunning: Release MobileNet-YOLOv3 prunning scheme and model, based on VOC data FLOPs 69.6%, mAP + 1.4%, based on COCO DATA FLOPS 28.8%, mAP + 0.9%; Release ResNet50vd-DCN-YOLOv3 clipped solution and model based on COCO datasets 18.4%, mAP + 0.8%. + - Distillation: Release MobileNet-YOLOv3 distillation scheme and model, based on VOC data mAP + 2.8%, COCO data mAP + 2.1%. + - Quantification: Release quantification models of YOLOv3 Mobile Net and Blaze Face. 
+ - Pruning + distillation: Released the MobileNet-YOLOv3 pruning + distillation solution and model, with FLOPs reduced by 69.6% on COCO data, a 64.5% TensorRT inference speedup, and mAP -0.3%; released the ResNet50vd-DCN-YOLOv3 pruning + distillation solution and model, with FLOPs reduced by 43.7% on COCO data, a 24.0% TensorRT inference speedup, and mAP +0.6%.
+ - Search: Open-sourced the complete BlazeFace-NAS search solution.
+ - Inference deployment:
+ - Integrated TensorRT, supporting FP16, FP32 and INT8 quantized inference acceleration.
+ - Documents:
+ - Added a detailed introduction to the data preprocessing module and documentation on implementing a custom data Reader.
+ - Added documentation on how to add new algorithm models.
+ - Documentation deployed to the website: https://paddledetection.readthedocs.io
+
+### 12/2019
+- Add Res2Net model.
+- Add HRNet model.
+- Add GIoU loss and DIoU loss.
+
+
+### 21/11/2019
+- Add CascadeClsAware RCNN model.
+- Add CBNet, ResNet200 and Non-local models.
+- Add SoftNMS.
+- Add Open Images V5 dataset and Objects365 dataset models.
+
+### 10/2019
+- Added enhanced YOLOv3 models with accuracy up to 41.4%.
+- Added face detection models BlazeFace and FaceBoxes.
+- Enriched COCO-based models, with accuracy up to 51.9%.
+- Added CA-Cascade-RCNN, one of the best single models in the Objects365 2019 Challenge.
+- Add pedestrian detection and vehicle detection pre-trained models.
+- Support FP16 training.
+- Added a cross-platform C++ inference deployment scheme.
+- Add model compression examples.
+
+
+### 2/9/2019
+- Add GroupNorm model.
+- Add CascadeRCNN+Mask model.
+
+### 5/8/2019
+- Add Modulated Deformable Convolution series models.
+
+### 29/7/2019
+
+- Add Chinese documentation for the detection library.
+- Fixed an issue with simultaneous evaluation during R-CNN series model training.
+- Add ResNeXt101-vd + Mask R-CNN + FPN models.
+- Added YOLOv3 models based on the VOC dataset.
+
+### 3/7/2019
+
+- First release of the PaddleDetection library and detection model zoo.
+- Models: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
diff --git a/PaddleDetection-release-2.6/docs/MODEL_ZOO_cn.md b/PaddleDetection-release-2.6/docs/MODEL_ZOO_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..73590b56ad3c68778253dfba01cb83efbca12a1e --- /dev/null +++ b/PaddleDetection-release-2.6/docs/MODEL_ZOO_cn.md @@ -0,0 +1,276 @@ +# 模型库和基线 + +# 内容 +- [基础设置](#基础设置) + - [测试环境](#测试环境) + - [通用设置](#通用设置) + - [训练策略](#训练策略) + - [ImageNet预训练模型](#ImageNet预训练模型) +- [基线](#基线) + - [目标检测](#目标检测) + - [实例分割](#实例分割) + - [PaddleYOLO](#PaddleYOLO) + - [人脸检测](#人脸检测) + - [旋转框检测](#旋转框检测) + - [关键点检测](#关键点检测) + - [多目标跟踪](#多目标跟踪) + +# 基础设置 + +## 测试环境 + +- Python 3.7 +- PaddlePaddle 每日版本 +- CUDA 10.1 +- cuDNN 7.5 +- NCCL 2.4.8 + +## 通用设置 + +- 所有模型均在COCO17数据集中训练和测试。 +- [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5)、[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6)、[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7)和[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8)这几类模型的代码在[PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO)中,**PaddleYOLO库开源协议为GPL 3.0**。 +- 除非特殊说明,所有ResNet骨干网络采用[ResNet-B](https://arxiv.org/pdf/1812.01187)结构。 +- **推理时间(fps)**: 推理时间是在一张Tesla V100的GPU上通过'tools/eval.py'测试所有验证集得到,单位是fps(图片数/秒), cuDNN版本是7.5,包括数据加载、网络前向执行和后处理, batch size是1。 + +## 训练策略 + +- 我们采用和[Detectron](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#training-schedules)相同的训练策略。 +- 1x 策略表示:在总batch size为8时,初始学习率为0.01,在8 epoch和11 epoch后学习率分别下降10倍,最终训练12 epoch。 +- 2x 策略为1x策略的两倍,同时学习率调整的epoch数位置也为1x的两倍。 + +## ImageNet预训练模型 + +Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型均通过标准的Imagenet-1k数据集训练得到,ResNet和MobileNet等是采用余弦学习率调整策略或SSLD知识蒸馏训练得到的高精度预训练模型,可在[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)查看模型细节。 + + +# 基线 + +## 目标检测 + +### Faster R-CNN + +请参考[Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/faster_rcnn/) + +### YOLOv3 + +请参考[YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/) + +### PP-YOLOE/PP-YOLOE+ + +请参考[PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe/) + +### PP-YOLO/PP-YOLOv2 + +请参考[PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/) + +### PicoDet + +请参考[PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet) + +### RetinaNet + +请参考[RetinaNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/retinanet/) + +### Cascade R-CNN + +请参考[Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn) + +### SSD/SSDLite + +请参考[SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/) + +### FCOS + +请参考[FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/fcos/) + +### CenterNet + +请参考[CenterNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/centernet/) + +### TTFNet/PAFNet + +请参考[TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/) + +### Group Normalization + +请参考[Group Normalization](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gn/) + +### Deformable ConvNets v2 + +请参考[Deformable ConvNets v2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/) + +### HRNets + +请参考[HRNets](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/hrnet/) + +### Res2Net + 
+请参考[Res2Net](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/res2net/) + +### ConvNeXt + +请参考[ConvNeXt](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/convnext/) + +### GFL + +请参考[GFL](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl) + +### TOOD + +请参考[TOOD](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/tood) + +### PSS-DET(RCNN-Enhance) + +请参考[PSS-DET](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rcnn_enhance) + +### DETR + +请参考[DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/detr) + +### Deformable DETR + +请参考[Deformable DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/deformable_detr) + +### Sparse R-CNN + +请参考[Sparse R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/sparse_rcnn) + +### Vision Transformer + +请参考[Vision Transformer](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/vitdet) + +### DINO + +请参考[DINO](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dino) + +### YOLOX + +请参考[YOLOX](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolox) + +### YOLOF + +请参考[YOLOF](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolof) + + +## 实例分割 + +### Mask R-CNN + +请参考[Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/) + +### Cascade R-CNN + +请参考[Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn) + +### SOLOv2 + +请参考[SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/solov2/) + +### QueryInst + +请参考[QueryInst](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/queryinst) + + +## [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO) + +请参考[PaddleYOLO模型库](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/docs/MODEL_ZOO_cn.md) + +### YOLOv5 + +请参考[YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5) + +### YOLOv6(v3.0) + +请参考[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6) + +### YOLOv7 + +请参考[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7) + +### YOLOv8 + +请参考[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8) + +### RTMDet + +请参考[RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/rtmdet) + + +## 人脸检测 + +请参考[人脸检测模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection) + +### BlazeFace + +请参考[BlazeFace](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection/) + + +## 旋转框检测 + +请参考[旋转框检测模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate) + +### PP-YOLOE-R + +请参考[PP-YOLOE-R](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r) + +### FCOSR + +请参考[FCOSR](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/fcosr) + +### S2ANet + +请参考[S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet) + + +## 关键点检测 + +请参考[关键点检测模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint) + +### PP-TinyPose + +请参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/tiny_pose) + +### HRNet + 
+请参考[HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/hrnet)
+
+### Lite-HRNet
+
+请参考[Lite-HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/lite_hrnet)
+
+### HigherHRNet
+
+请参考[HigherHRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/higherhrnet)
+
+
+## 多目标跟踪
+
+请参考[多目标跟踪模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot)
+
+### DeepSORT
+
+请参考[DeepSORT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/deepsort)
+
+### ByteTrack
+
+请参考[ByteTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/bytetrack)
+
+### OC-SORT
+
+请参考[OC-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/ocsort)
+
+### BoT-SORT
+
+请参考[BoT-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/botsort)
+
+### CenterTrack
+
+请参考[CenterTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/centertrack)
+
+### FairMOT/MC-FairMOT
+
+请参考[FairMOT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/fairmot)
+
+### JDE
+
+请参考[JDE](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde)
diff --git a/PaddleDetection-release-2.6/docs/MODEL_ZOO_en.md b/PaddleDetection-release-2.6/docs/MODEL_ZOO_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..8af559146637bba7bd42a57c286bf415eb19f943
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/MODEL_ZOO_en.md
@@ -0,0 +1,275 @@
+# Model Zoos and Baselines
+
+# Content
+- [Basic Settings](#Basic-Settings)
+  - [Test Environment](#Test-Environment)
+  - [General Settings](#General-Settings)
+  - [Training strategy](#Training-strategy)
+  - [ImageNet pretraining model](#ImageNet-pretraining-model)
+- [Baseline](#Baseline)
+  - [Object Detection](#Object-Detection)
+  - [Instance Segmentation](#Instance-Segmentation)
+  - [PaddleYOLO](#PaddleYOLO)
+  - [Face Detection](#Face-Detection)
+  - [Rotated Object Detection](#Rotated-Object-Detection)
+  - [KeyPoint Detection](#KeyPoint-Detection)
+  - [Multi Object Tracking](#Multi-Object-Tracking)
+
+# Basic Settings
+
+## Test Environment
+
+- Python 3.7
+- PaddlePaddle daily version
+- CUDA 10.1
+- cuDNN 7.5
+- NCCL 2.4.8
+
+## General Settings
+
+- All models were trained and tested on the COCO17 dataset.
+- The code for [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5), [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6), [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7) and [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8) can be found in [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO). Note that **the LICENSE of PaddleYOLO is GPL 3.0**.
+- Unless otherwise specified, all ResNet backbones use the [ResNet-B](https://arxiv.org/pdf/1812.01187) structure.
+- **Inference time (FPS)**: Inference time was measured on a Tesla V100 GPU by running `tools/eval.py` over the full validation set, reported in FPS (images/second). The cuDNN version is 7.5; the measurement includes data loading, network forward pass and post-processing, with batch size 1.
+
+## Training strategy
+
+- We adopt the same training schedules as [Detectron](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#training-schedules).
+- The 1x schedule means: with a total batch size of 8, the initial learning rate is 0.01 and is divided by 10 after epoch 8 and epoch 11 respectively, for 12 training epochs in total.
+- The 2x schedule doubles the total training epochs of the 1x schedule, with the learning rate decay epochs scaled accordingly.
+
+## ImageNet pretraining model
+Paddle provides backbone network pretraining models based on ImageNet. All pre-trained models were trained on the standard ImageNet-1k dataset. ResNet, MobileNet and others are high-accuracy pre-trained models obtained with a cosine learning rate schedule or SSLD knowledge distillation. Model details are available at [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
+
+
+# Baseline
+
+## Object Detection
+
+### Faster R-CNN
+
+Please refer to [Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/faster_rcnn/)
+
+### YOLOv3
+
+Please refer to [YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/)
+
+### PP-YOLOE/PP-YOLOE+
+
+Please refer to [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe/)
+
+### PP-YOLO/PP-YOLOv2
+
+Please refer to [PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyolo/)
+
+### PicoDet
+
+Please refer to [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/picodet)
+
+### RetinaNet
+
+Please refer to [RetinaNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/retinanet/)
+
+### Cascade R-CNN
+
+Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn)
+
+### SSD/SSDLite
+
+Please refer to [SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ssd/)
+
+### FCOS
+
+Please refer to [FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/fcos/)
+
+### CenterNet
+
+Please refer to [CenterNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/centernet/)
+
+### TTFNet/PAFNet
+
+Please refer to [TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ttfnet/)
+
+### Group Normalization
+
+Please refer to [Group Normalization](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gn/)
+
+### Deformable ConvNets v2
+
+Please refer to [Deformable ConvNets v2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dcn/)
+
+### HRNets
+
+Please refer to [HRNets](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/hrnet/)
+
+### Res2Net
+
+Please refer to [Res2Net](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/res2net/)
+
+### ConvNeXt
+
+Please refer to [ConvNeXt](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/convnext/)
+
+### GFL
+
+Please refer to [GFL](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/gfl)
+
+### TOOD
+
+Please refer to [TOOD](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/tood)
+
+### PSS-DET(RCNN-Enhance)
+
+Please refer to [PSS-DET](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rcnn_enhance)
+
+### DETR
+
+Please refer to [DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/detr)
+
+### Deformable DETR
+
+Please refer to [Deformable DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/deformable_detr)
+
+### Sparse R-CNN
+
+Please refer to [Sparse R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/sparse_rcnn)
+
+### Vision Transformer
+
+Please refer to [Vision Transformer](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/vitdet)
+
+### DINO
+
+Please refer to [DINO](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/dino)
+
+### YOLOX
+
+Please refer to [YOLOX](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolox)
+
+### YOLOF
+
+Please refer to [YOLOF](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolof)
+
+
+## Instance Segmentation
+
+### Mask R-CNN
+
+Please refer to [Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/)
+
+### Cascade R-CNN
+
+Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn)
+
+### SOLOv2
+
+Please refer to [SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/solov2/)
+
+### QueryInst
+
+Please refer to [QueryInst](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/queryinst)
+
+
+## [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO)
+
+Please refer to [Model Zoo for PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/docs/MODEL_ZOO_en.md)
+
+### YOLOv5
+
+Please refer to [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5)
+
+### YOLOv6(v3.0)
+
+Please refer to [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6)
+
+### YOLOv7
+
+Please refer to [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7)
+
+### YOLOv8
+
+Please refer to [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8)
+
+### RTMDet
+
+Please refer to [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/rtmdet)
+
+
+## Face Detection
+
+Please refer to [Model Zoo for Face Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection)
+
+### BlazeFace
+
+Please refer to [BlazeFace](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/face_detection/)
+
+
+## Rotated Object Detection
+
+Please refer to [Model Zoo for Rotated Object Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate)
+
+### PP-YOLOE-R
+
+Please refer to [PP-YOLOE-R](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/ppyoloe_r)
+
+### FCOSR
+
+Please refer to [FCOSR](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/fcosr)
+
+### S2ANet
+
+Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/rotate/s2anet)
+
+
+## KeyPoint Detection
+
+Please refer to [Model Zoo for KeyPoint Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint)
+
+### PP-TinyPose
+
+Please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/tiny_pose)
+
+### HRNet
+
+Please refer to [HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/hrnet)
+
+### Lite-HRNet
+
+Please refer to [Lite-HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/lite_hrnet)
+
+### HigherHRNet
+
+Please refer to [HigherHRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/keypoint/higherhrnet)
+
+
+## Multi-Object Tracking
+
+Please refer to [Model Zoo for Multi-Object Tracking](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot)
+
+### DeepSORT
+
+Please refer to [DeepSORT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/deepsort)
+
+### ByteTrack
+
+Please refer to [ByteTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/bytetrack)
+
+### OC-SORT
+
+Please refer to [OC-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/ocsort)
+
+### BoT-SORT
+
+Please refer to [BoT-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/botsort)
+
+### CenterTrack
+
+Please refer to [CenterTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/centertrack)
+
+### FairMOT/MC-FairMOT
+
+Please refer to [FairMOT](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/fairmot)
+
+### JDE
+
+Please refer to [JDE](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mot/jde)
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/MODEL_TECHNICAL.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/MODEL_TECHNICAL.md
new file mode 100644
index 0000000000000000000000000000000000000000..1d3e58d909c9e5ae48028c4bc0beb71ad08bf363
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/MODEL_TECHNICAL.md
@@ -0,0 +1,407 @@
+# 新增模型算法
+为了让用户更好地使用PaddleDetection,本文档将介绍PaddleDetection的主要模型技术细节及应用。
+
+## 目录
+- [1.简介](#1.简介)
+- [2.新增模型](#2.新增模型)
+  - [2.1新增网络结构](#2.1新增网络结构)
+    - [2.1.1新增Backbone](#2.1.1新增Backbone)
+    - [2.1.2新增Neck](#2.1.2新增Neck)
+    - [2.1.3新增Head](#2.1.3新增Head)
+    - [2.1.4新增Loss](#2.1.4新增Loss)
+    - [2.1.5新增后处理模块](#2.1.5新增后处理模块)
+    - [2.1.6新增Architecture](#2.1.6新增Architecture)
+  - [2.2新增配置文件](#2.2新增配置文件)
+    - [2.2.1网络结构配置文件](#2.2.1网络结构配置文件)
+    - [2.2.2优化器配置文件](#2.2.2优化器配置文件)
+    - [2.2.3Reader配置文件](#2.2.3Reader配置文件)
+
+### 1.简介
+PaddleDetection中的每一种模型对应一个文件夹。以yolov3为例,yolov3系列的模型对应于`configs/yolov3`文件夹,其中yolov3_darknet的总配置文件`configs/yolov3/yolov3_darknet53_270e_coco.yml`的内容如下:
+```
+_BASE_: [
+  '../datasets/coco_detection.yml', # 数据集配置文件,所有模型共用
+  '../runtime.yml', # 运行时相关配置
+  '_base_/optimizer_270e.yml', # 优化器相关配置
+  '_base_/yolov3_darknet53.yml', # yolov3网络结构配置文件
+  '_base_/yolov3_reader.yml', # yolov3 Reader模块配置
+]
+
+# 定义在此处的相关配置可以覆盖上述文件中的同名配置
+snapshot_epoch: 5
+weights: output/yolov3_darknet53_270e_coco/model_final
+```
+可以看到,配置文件中的模块进行了清晰的划分,除了公共的数据集配置以及运行时配置,其他配置被划分为优化器、网络结构以及Reader模块。PaddleDetection中支持丰富的优化器、学习率调整策略、预处理算子等,因此大多数情况下不需要编写优化器以及Reader相关的代码,而只需要在配置文件中配置即可。因此,新增一个模型的主要工作在于搭建网络结构。
+
+PaddleDetection网络结构的代码在`ppdet/modeling/`中,所有网络结构以组件的形式进行定义与组合,网络结构的主要构成如下所示:
+```
+ ppdet/modeling/
+ ├── architectures
+ │ ├── faster_rcnn.py # Faster R-CNN模型
+ │ ├── ssd.py # SSD模型
+ │ ├── yolo.py # YOLOv3模型
+ │ │ ...
+ ├── heads # 检测头模块
+ │ ├── xxx_head.py # 定义各类检测头
+ │ ├── roi_extractor.py # 检测感兴趣区域提取
+ ├── backbones # 骨干网络模块
+ │ ├── resnet.py # ResNet网络
+ │ ├── mobilenet.py # MobileNet网络
+ │ │ ...
+ ├── losses # 损失函数模块 + │ ├── xxx_loss.py # 定义注册各类loss函数 + ├── necks # 特征融合模块 + │ ├── xxx_fpn.py # 定义各种FPN模块 + ├── proposal_generator # anchor & proposal生成与匹配模块 + │ ├── anchor_generator.py # anchor生成模块 + │ ├── proposal_generator.py # proposal生成模块 + │ ├── target.py # anchor & proposal的匹配函数 + │ ├── target_layer.py # anchor & proposal的匹配模块 + ├── tests # 单元测试模块 + │ ├── test_xxx.py # 对网络中的算子以及模块结构进行单元测试 + ├── ops.py # 封装各类PaddlePaddle物体检测相关公共检测组件/算子 + ├── layers.py # 封装及注册各类PaddlePaddle物体检测相关公共检测组件/算子 + ├── bbox_utils.py # 封装检测框相关的函数 + ├── post_process.py # 封装及注册后处理相关模块 + ├── shape_spec.py # 定义模块输出shape的类 +``` + +![](../images/model_figure.png) + +### 2.新增模型 +接下来,以单阶段检测器YOLOv3为例,对建立模型过程进行详细描述,按照此思路您可以快速搭建新的模型。 + +#### 2.1新增网络结构 + +##### 2.1.1新增Backbone + +PaddleDetection中现有所有Backbone网络代码都放置在`ppdet/modeling/backbones`目录下,所以我们在其中新建`darknet.py`如下: +```python +import paddle.nn as nn +from ppdet.core.workspace import register, serializable + +@register +@serializable +class DarkNet(nn.Layer): + + __shared__ = ['norm_type'] + + def __init__(self, + depth=53, + return_idx=[2, 3, 4], + norm_type='bn', + norm_decay=0.): + super(DarkNet, self).__init__() + # 省略内容 + + def forward(self, inputs): + # 省略处理逻辑 + pass + + @property + def out_shape(self): + # 省略内容 + pass +``` +然后在`backbones/__init__.py`中加入引用: +```python +from . import darknet +from .darknet import * +``` +**几点说明:** +- 为了在yaml配置文件中灵活配置网络,所有Backbone需要利用`ppdet.core.workspace`里的`register`进行注册,形式请参考如上示例。此外,可以使用`serializable`以使backbone支持序列化; +- 所有的Backbone需继承`paddle.nn.Layer`类,并实现forward函数。此外,还需实现out_shape属性定义输出的feature map的channel信息,具体可参见源码; +- `__shared__`为了实现一些参数的配置全局共享,这些参数可以被backbone, neck,head,loss等所有注册模块共享。 + +##### 2.1.2新增Neck +特征融合模块放置在`ppdet/modeling/necks`目录下,我们在其中新建`yolo_fpn.py`如下: + +``` python +import paddle.nn as nn +from ppdet.core.workspace import register, serializable + +@register +@serializable +class YOLOv3FPN(nn.Layer): + __shared__ = ['norm_type'] + + def __init__(self, + in_channels=[256, 512, 1024], + norm_type='bn'): + super(YOLOv3FPN, self).__init__() + # 省略内容 + + def forward(self, blocks): + # 省略内容 + pass + + @classmethod + def from_config(cls, cfg, input_shape): + # 省略内容 + pass + + @property + def out_shape(self): + # 省略内容 + pass +``` +然后在`necks/__init__.py`中加入引用: +```python +from . import yolo_fpn +from .yolo_fpn import * +``` +**几点说明:** +- neck模块需要使用`register`进行注册,可以使用`serializable`进行序列化; +- neck模块需要继承`paddle.nn.Layer`类,并实现forward函数。除此之外,还需要实现`out_shape`属性,用于定义输出的feature map的channel信息,还需要实现类函数`from_config`用于在配置文件中推理出输入channel,并用于`YOLOv3FPN`的初始化; +- neck模块可以使用`__shared__`实现一些参数的配置全局共享。 + +##### 2.1.3新增Head +Head模块全部存放在`ppdet/modeling/heads`目录下,我们在其中新建`yolo_head.py`如下 +``` python +import paddle.nn as nn +from ppdet.core.workspace import register + +@register +class YOLOv3Head(nn.Layer): + __shared__ = ['num_classes'] + __inject__ = ['loss'] + + def __init__(self, + anchors=[[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45],[59, 119], + [116, 90], [156, 198], [373, 326]], + anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]], + num_classes=80, + loss='YOLOv3Loss', + iou_aware=False, + iou_aware_factor=0.4): + super(YOLOv3Head, self).__init__() + # 省略内容 + + def forward(self, feats, targets=None): + # 省略内容 + pass +``` +然后在`heads/__init__.py`中加入引用: +```python +from . 
import yolo_head +from .yolo_head import * +``` +**几点说明:** +- Head模块需要使用`register`进行注册; +- Head模块需要继承`paddle.nn.Layer`类,并实现forward函数。 +- `__inject__`表示引入全局字典中已经封装好的模块。如loss等。 + +##### 2.1.4新增Loss +Loss模块全部存放在`ppdet/modeling/losses`目录下,我们在其中新建`yolo_loss.py`下 +```python +import paddle.nn as nn +from ppdet.core.workspace import register + +@register +class YOLOv3Loss(nn.Layer): + + __inject__ = ['iou_loss', 'iou_aware_loss'] + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + ignore_thresh=0.7, + label_smooth=False, + downsample=[32, 16, 8], + scale_x_y=1., + iou_loss=None, + iou_aware_loss=None): + super(YOLOv3Loss, self).__init__() + # 省略内容 + + def forward(self, inputs, targets, anchors): + # 省略内容 + pass +``` +然后在`losses/__init__.py`中加入引用: +```python +from . import yolo_loss +from .yolo_loss import * +``` +**几点说明:** +- loss模块需要使用`register`进行注册; +- loss模块需要继承`paddle.nn.Layer`类,并实现forward函数。 +- 可以使用`__inject__`表示引入全局字典中已经封装好的模块,使用`__shared__`可以实现一些参数的配置全局共享。 + +##### 2.1.5新增后处理模块 +后处理模块定义在`ppdet/modeling/post_process.py`中,其中定义了`BBoxPostProcess`类来进行后处理操作,如下所示: +``` python +from ppdet.core.workspace import register + +@register +class BBoxPostProcess(object): + __shared__ = ['num_classes'] + __inject__ = ['decode', 'nms'] + + def __init__(self, num_classes=80, decode=None, nms=None): + # 省略内容 + pass + + def __call__(self, head_out, rois, im_shape, scale_factor): + # 省略内容 + pass +``` +**几点说明:** +- 后处理模块需要使用`register`进行注册 +- `__inject__`注入了全局字典中封装好的模块,如decode和nms等。decode和nms定义在`ppdet/modeling/layers.py`中。 + +##### 2.1.6新增Architecture + +所有architecture网络代码都放置在`ppdet/modeling/architectures`目录下,`meta_arch.py`中定义了`BaseArch`类,代码如下: +``` python +import paddle.nn as nn +from ppdet.core.workspace import register + +@register +class BaseArch(nn.Layer): + def __init__(self): + super(BaseArch, self).__init__() + + def forward(self, inputs): + self.inputs = inputs + self.model_arch() + + if self.training: + out = self.get_loss() + else: + out = self.get_pred() + return out + + def model_arch(self, ): + pass + + def get_loss(self, ): + raise NotImplementedError("Should implement get_loss method!") + + def get_pred(self, ): + raise NotImplementedError("Should implement get_pred method!") +``` +所有的architecture需要继承`BaseArch`类,如`yolo.py`中的`YOLOv3`定义如下: +``` python +@register +class YOLOv3(BaseArch): + __category__ = 'architecture' + __inject__ = ['post_process'] + + def __init__(self, + backbone='DarkNet', + neck='YOLOv3FPN', + yolo_head='YOLOv3Head', + post_process='BBoxPostProcess'): + super(YOLOv3, self).__init__() + self.backbone = backbone + self.neck = neck + self.yolo_head = yolo_head + self.post_process = post_process + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # 省略内容 + pass + + def get_loss(self): + # 省略内容 + pass + + def get_pred(self): + # 省略内容 + pass +``` + +**几点说明:** +- 所有的architecture需要使用`register`进行注册 +- 在组建一个完整的网络时必须要设定`__category__ = 'architecture'`来表示一个完整的物体检测模型; +- backbone, neck, yolo_head以及post_process等检测组件传入到architecture中组成最终的网络。像这样将检测模块化,提升了检测模型的复用性,可以通过组合不同的检测组件得到多个模型。 +- from_config类函数实现了模块间组合时channel的自动配置。 + +#### 2.2新增配置文件 + +##### 2.2.1网络结构配置文件 +上面详细地介绍了如何新增一个architecture,接下来演示如何配置一个模型,yolov3关于网络结构的配置在`configs/yolov3/_base_/`文件夹中定义,如`yolov3_darknet53.yml`定义了yolov3_darknet的网络结构,其定义如下: +``` +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +DarkNet: + 
+  depth: 53
+  return_idx: [2, 3, 4]
+
+# use default config
+# YOLOv3FPN:
+
+YOLOv3Head:
+  anchors: [[10, 13], [16, 30], [33, 23],
+            [30, 61], [62, 45], [59, 119],
+            [116, 90], [156, 198], [373, 326]]
+  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+  loss: YOLOv3Loss
+
+YOLOv3Loss:
+  ignore_thresh: 0.7
+  downsample: [32, 16, 8]
+  label_smooth: false
+
+BBoxPostProcess:
+  decode:
+    name: YOLOBox
+    conf_thresh: 0.005
+    downsample_ratio: 32
+    clip_bbox: true
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.45
+    nms_top_k: 1000
+
+```
+可以看到,在配置文件中,首先需要指定网络的architecture,pretrain_weights指定预训练模型的url或者路径,norm_type等可以作为全局参数共享。模型的定义在文件中自上而下依次展开,与上节中的模型组件一一对应。对于一些模型组件,如果采用默认的参数,可以不用配置,如上文中的`yolo_fpn`。通过改变相关配置,我们可以轻易地组合出另一个模型,比如`configs/yolov3/_base_/yolov3_mobilenet_v1.yml`将backbone从Darknet切换成MobileNet。
+
+##### 2.2.2优化器配置文件
+优化器配置文件定义模型使用的优化器以及学习率的调度策略,目前PaddleDetection中已经集成了多种多样的优化器和学习率策略,具体可参见代码`ppdet/optimizer.py`。比如,yolov3的优化器配置文件定义在`configs/yolov3/_base_/optimizer_270e.yml`,其定义如下:
+```
+epoch: 270
+
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones:
+    # epoch数目
+    - 216
+    - 243
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 4000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
+```
+**几点说明:**
+- 可以通过OptimizerBuilder.optimizer指定优化器的类型及参数,目前支持的优化器可以参考[PaddlePaddle官方文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)
+- 可以通过LearningRate.schedulers设置不同学习率调整策略的组合,PaddlePaddle目前支持多种学习率调整策略,具体也可参考[PaddlePaddle官方文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)。需要注意的是,你需要对PaddlePaddle中的学习率调整策略进行简单的封装,具体可参考源码`ppdet/optimizer.py`。
+
+##### 2.2.3Reader配置文件
+关于Reader的配置可以参考[Reader配置文档](./READER.md#5.配置及运行)。
+
+> 看过此文档,您应该对PaddleDetection中的模型搭建与配置有了一定经验,结合源码会理解得更加透彻。关于模型技术,如您有其他问题或建议,请给我们提issue,我们非常欢迎您的反馈。
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/MODEL_TECHNICAL_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/MODEL_TECHNICAL_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..927a08596cd5086c3c610779f12b4b99421cce87
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/MODEL_TECHNICAL_en.md
@@ -0,0 +1,409 @@
+# How to Create Model Algorithm
+In order to make better use of PaddleDetection, this document introduces the main technical details of PaddleDetection models and their application.
+
+## Directory
+- [How to Create Model Algorithm](#how-to-create-model-algorithm)
+  - [Directory](#directory)
+  - [1. Introduction](#1-introduction)
+  - [2. Create Model](#2-create-model)
+    - [2.1 Create Model Structure](#21-create-model-structure)
+      - [2.1.1 Create Backbone](#211-create-backbone)
+      - [2.1.2 Create Neck](#212-create-neck)
+      - [2.1.3 Create Head](#213-create-head)
+      - [2.1.4 Create Loss](#214-create-loss)
+      - [2.1.5 Create Post-processing Module](#215-create-post-processing-module)
+      - [2.1.6 Create Architecture](#216-create-architecture)
+    - [2.2 Create Configuration File](#22-create-configuration-file)
+      - [2.2.1 Network Structure Configuration File](#221-network-structure-configuration-file)
+      - [2.2.2 Optimizer Configuration File](#222-optimizer-configuration-file)
+      - [2.2.3 Reader Configuration File](#223-reader-configuration-file)
+
+### 1. Introduction
+Each model in PaddleDetection corresponds to a folder. In the case of YOLOv3, the YOLOv3 family of models corresponds to the `configs/yolov3` folder, and the overall configuration file of yolov3_darknet, `configs/yolov3/yolov3_darknet53_270e_coco.yml`, is as follows:
+```
+_BASE_: [
+  '../datasets/coco_detection.yml', # Dataset configuration file shared by all models
+  '../runtime.yml', # Runtime configuration
+  '_base_/optimizer_270e.yml', # Optimizer related configuration
+  '_base_/yolov3_darknet53.yml', # yolov3 network structure configuration file
+  '_base_/yolov3_reader.yml', # yolov3 Reader module configuration
+]
+
+# The configuration defined here can override the configuration of the same name in the above files
+snapshot_epoch: 5
+weights: output/yolov3_darknet53_270e_coco/model_final
+```
+As you can see, the modules in the configuration file are clearly divided: apart from the common dataset configuration and runtime configuration, the remaining configuration is split into optimizer, network structure and Reader modules. PaddleDetection supports a rich set of optimizers, learning rate schedules, preprocessing operators and so on, so most of the time you don't need to write optimizer or Reader-related code — just configure them in the configuration file. Therefore, the main work in adding a new model is building the network structure.
+
+In `ppdet/modeling/`, all PaddleDetection network structures are defined and combined in the form of components. The main components of the network structure are as follows:
+```
+ ppdet/modeling/
+ ├── architectures
+ │ ├── faster_rcnn.py # Faster R-CNN model
+ │ ├── ssd.py # SSD model
+ │ ├── yolo.py # YOLOv3 model
+ │ │ ...
+ ├── heads # detection head module
+ │ ├── xxx_head.py # defines various detection heads
+ │ ├── roi_extractor.py # region-of-interest feature extraction
+ ├── backbones # backbone network module
+ │ ├── resnet.py # ResNet network
+ │ ├── mobilenet.py # MobileNet network
+ │ │ ...
+ ├── losses # loss function module
+ │ ├── xxx_loss.py # defines and registers various loss functions
+ ├── necks # feature fusion module
+ │ ├── xxx_fpn.py # defines various FPN modules
+ ├── proposal_generator # anchor & proposal generation and matching modules
+ │ ├── anchor_generator.py # anchor generation module
+ │ ├── proposal_generator.py # proposal generation module
+ │ ├── target.py # anchor & proposal matching functions
+ │ ├── target_layer.py # anchor & proposal matching modules
+ ├── tests # unit test module
+ │ ├── test_xxx.py # unit tests for the operators and module structures in the network
+ ├── ops.py # wraps common PaddlePaddle object detection components/operators
+ ├── layers.py # wraps and registers common PaddlePaddle object detection components/operators
+ ├── bbox_utils.py # wraps bounding-box related functions
+ ├── post_process.py # wraps and registers post-processing modules
+ ├── shape_spec.py # defines the class describing a module's output shape
+```
+
+![](../images/model_figure.png)
+
+### 2. Create Model
+Next, the model building process is described in detail by taking the single-stage detector YOLOv3 as an example, so that you can quickly build a new model following the same idea.
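+
+The following minimal sketch (assuming a working PaddleDetection install) shows how the registered components and the configuration files fit together at runtime, using the `load_config` and `create` helpers from `ppdet.core.workspace` as `tools/train.py` does:
+```python
+from ppdet.core.workspace import load_config, create
+
+# Parse the full config, including everything pulled in via _BASE_
+cfg = load_config('configs/yolov3/yolov3_darknet53_270e_coco.yml')
+
+# Instantiate the registered architecture by name; the workspace
+# recursively builds the backbone, neck, head and post-process modules
+model = create(cfg.architecture)
+print(type(model).__name__)  # YOLOv3
+```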
+
+#### 2.1 Create Model Structure
+
+##### 2.1.1 Create Backbone
+
+All existing backbone network code in PaddleDetection is placed under the `ppdet/modeling/backbones` directory, so we create `darknet.py` there as follows:
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register, serializable
+
+@register
+@serializable
+class DarkNet(nn.Layer):
+
+    __shared__ = ['norm_type']
+
+    def __init__(self,
+                 depth=53,
+                 return_idx=[2, 3, 4],
+                 norm_type='bn',
+                 norm_decay=0.):
+        super(DarkNet, self).__init__()
+        # Omit the content
+
+    def forward(self, inputs):
+        # Omit the processing logic
+        pass
+
+    @property
+    def out_shape(self):
+        # Omit the content
+        pass
+```
+Then add a reference to `backbones/__init__.py`:
+```python
+from . import darknet
+from .darknet import *
+```
+**A few notes:**
+- To allow networks to be configured flexibly in the YAML configuration file, every backbone needs to be registered using `register` from `ppdet.core.workspace`, as shown in the example above. In addition, `serializable` can be used so that the backbone supports serialization;
+- Every backbone needs to inherit from the `paddle.nn.Layer` class and implement the forward function. In addition, the `out_shape` property must be implemented to define the channel information of the output feature maps; see the source code for details;
+- `__shared__` enables global sharing of certain configuration parameters, which can then be shared by all registered modules such as backbone, neck, head and loss.
+
+##### 2.1.2 Create Neck
+The feature fusion modules are placed under the `ppdet/modeling/necks` directory, and we create `yolo_fpn.py` there as follows:
+
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register, serializable
+
+@register
+@serializable
+class YOLOv3FPN(nn.Layer):
+    __shared__ = ['norm_type']
+
+    def __init__(self,
+                 in_channels=[256, 512, 1024],
+                 norm_type='bn'):
+        super(YOLOv3FPN, self).__init__()
+        # Omit the content
+
+    def forward(self, blocks):
+        # Omit the content
+        pass
+
+    @classmethod
+    def from_config(cls, cfg, input_shape):
+        # Omit the content
+        pass
+
+    @property
+    def out_shape(self):
+        # Omit the content
+        pass
+```
+Then add a reference to `necks/__init__.py`:
+```python
+from . import yolo_fpn
+from .yolo_fpn import *
+```
+**A few notes:**
+- The neck module needs to be registered with `register` and can be serialized with `serializable`;
+- The neck module needs to inherit from the `paddle.nn.Layer` class and implement the forward function. In addition, the `out_shape` property needs to be implemented to define the channel information of the output feature maps, and the class method `from_config` needs to be implemented to deduce the input channels from the configuration and initialize `YOLOv3FPN`;
+- The neck module can use `__shared__` to implement global sharing of configuration parameters.
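+
+As a concrete illustration of the `from_config` note above, here is a minimal sketch of how input channels can be deduced. It assumes `input_shape` is a list of `ShapeSpec` objects (defined in `ppdet/modeling/shape_spec.py`) produced by the upstream module's `out_shape` property; the real `YOLOv3FPN` implementation may differ in detail:
+```python
+from ppdet.modeling.shape_spec import ShapeSpec
+
+class ExampleFPN:
+    @classmethod
+    def from_config(cls, cfg, input_shape):
+        # in_channels is filled in automatically from the upstream shapes,
+        # so the config file does not need to repeat them
+        return {'in_channels': [s.channels for s in input_shape]}
+
+# e.g. a DarkNet53 backbone with return_idx=[2, 3, 4] reports these channels:
+shapes = [ShapeSpec(channels=c) for c in (256, 512, 1024)]
+print(ExampleFPN.from_config(None, shapes))  # {'in_channels': [256, 512, 1024]}
+```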
+
+##### 2.1.3 Create Head
+The head modules are all stored in the `ppdet/modeling/heads` directory, where we create `yolo_head.py` as follows:
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+@register
+class YOLOv3Head(nn.Layer):
+    __shared__ = ['num_classes']
+    __inject__ = ['loss']
+
+    def __init__(self,
+                 anchors=[[10, 13], [16, 30], [33, 23],
+                          [30, 61], [62, 45], [59, 119],
+                          [116, 90], [156, 198], [373, 326]],
+                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
+                 num_classes=80,
+                 loss='YOLOv3Loss',
+                 iou_aware=False,
+                 iou_aware_factor=0.4):
+        super(YOLOv3Head, self).__init__()
+        # Omit the content
+
+    def forward(self, feats, targets=None):
+        # Omit the content
+        pass
+```
+Then add a reference to `heads/__init__.py`:
+```python
+from . import yolo_head
+from .yolo_head import *
+```
+**A few notes:**
+- The head module needs to be registered with `register`;
+- The head module needs to inherit from the `paddle.nn.Layer` class and implement the forward function;
+- `__inject__` imports modules that are already encapsulated in the global dictionary, such as the loss module.
+
+##### 2.1.4 Create Loss
+The loss modules are all stored under the `ppdet/modeling/losses` directory, where we create `yolo_loss.py` as follows:
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+@register
+class YOLOv3Loss(nn.Layer):
+
+    __inject__ = ['iou_loss', 'iou_aware_loss']
+    __shared__ = ['num_classes']
+
+    def __init__(self,
+                 num_classes=80,
+                 ignore_thresh=0.7,
+                 label_smooth=False,
+                 downsample=[32, 16, 8],
+                 scale_x_y=1.,
+                 iou_loss=None,
+                 iou_aware_loss=None):
+        super(YOLOv3Loss, self).__init__()
+        # Omit the content
+
+    def forward(self, inputs, targets, anchors):
+        # Omit the content
+        pass
+```
+Then add a reference to `losses/__init__.py`:
+```python
+from . import yolo_loss
+from .yolo_loss import *
+```
+**A few notes:**
+- The loss module needs to be registered with `register`;
+- The loss module needs to inherit from the `paddle.nn.Layer` class and implement the forward function;
+- `__inject__` makes modules already encapsulated in the global dictionary available, and `__shared__` enables global sharing of certain configuration parameters.
+
+##### 2.1.5 Create Post-processing Module
+The post-processing module is defined in `ppdet/modeling/post_process.py`, where the `BBoxPostProcess` class is defined for post-processing operations, as follows:
+```python
+from ppdet.core.workspace import register
+
+@register
+class BBoxPostProcess(object):
+    __shared__ = ['num_classes']
+    __inject__ = ['decode', 'nms']
+
+    def __init__(self, num_classes=80, decode=None, nms=None):
+        # Omit the content
+        pass
+
+    def __call__(self, head_out, rois, im_shape, scale_factor):
+        # Omit the content
+        pass
+```
+**A few notes:**
+- Post-processing modules need to be registered with `register`;
+- `__inject__` injects modules encapsulated in the global dictionary, such as decode and nms, which are defined in `ppdet/modeling/layers.py`.
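+
+The injection mechanism can be illustrated with a self-contained toy example (plain Python, not ppdet code): the framework keeps a global registry of named classes, and `__inject__` fields are resolved to constructed instances before the host module is built:
+```python
+REGISTRY = {}
+
+def register(cls):
+    """Record a class in the global registry under its own name."""
+    REGISTRY[cls.__name__] = cls
+    return cls
+
+@register
+class YOLOBox:          # stands in for the decode module
+    pass
+
+@register
+class MultiClassNMS:    # stands in for the nms module
+    pass
+
+def build(name, **kwargs):
+    return REGISTRY[name](**kwargs)
+
+# Roughly what resolving __inject__ = ['decode', 'nms'] amounts to:
+kwargs = {'decode': build('YOLOBox'), 'nms': build('MultiClassNMS')}
+print({k: type(v).__name__ for k, v in kwargs.items()})
+```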
+
+##### 2.1.6 Create Architecture
+
+All architecture network code is placed in the `ppdet/modeling/architectures` directory. `meta_arch.py` defines the `BaseArch` class, whose code is as follows:
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+@register
+class BaseArch(nn.Layer):
+    def __init__(self):
+        super(BaseArch, self).__init__()
+
+    def forward(self, inputs):
+        self.inputs = inputs
+        self.model_arch()
+
+        if self.training:
+            out = self.get_loss()
+        else:
+            out = self.get_pred()
+        return out
+
+    def model_arch(self, ):
+        pass
+
+    def get_loss(self, ):
+        raise NotImplementedError("Should implement get_loss method!")
+
+    def get_pred(self, ):
+        raise NotImplementedError("Should implement get_pred method!")
+```
+Every architecture needs to inherit from the `BaseArch` class, as the `YOLOv3` class defined in `yolo.py` shows:
+```python
+@register
+class YOLOv3(BaseArch):
+    __category__ = 'architecture'
+    __inject__ = ['post_process']
+
+    def __init__(self,
+                 backbone='DarkNet',
+                 neck='YOLOv3FPN',
+                 yolo_head='YOLOv3Head',
+                 post_process='BBoxPostProcess'):
+        super(YOLOv3, self).__init__()
+        self.backbone = backbone
+        self.neck = neck
+        self.yolo_head = yolo_head
+        self.post_process = post_process
+
+    @classmethod
+    def from_config(cls, cfg, *args, **kwargs):
+        # Omit the content
+        pass
+
+    def get_loss(self):
+        # Omit the content
+        pass
+
+    def get_pred(self):
+        # Omit the content
+        pass
+```
+
+**A few notes:**
+- Every architecture needs to be registered using `register`;
+- When building a complete network, `__category__ = 'architecture'` must be set to mark it as a complete object detection model;
+- Detection components such as the backbone, neck, yolo_head and post_process are passed into the architecture to form the final network. Modularizing detection in this way improves the reusability of detection models, and multiple models can be obtained by combining different detection components;
+- The `from_config` class method implements automatic channel configuration when modules are combined.
+
+#### 2.2 Create Configuration File
+
+##### 2.2.1 Network Structure Configuration File
+The network structure configuration of yolov3 is defined in the `configs/yolov3/_base_/` folder. For example, `yolov3_darknet53.yml` defines the network structure of yolov3_darknet as follows:
+```
+architecture: YOLOv3
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
+norm_type: sync_bn
+
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: BBoxPostProcess
+
+DarkNet:
+  depth: 53
+  return_idx: [2, 3, 4]
+
+# use default config
+# YOLOv3FPN:
+
+YOLOv3Head:
+  anchors: [[10, 13], [16, 30], [33, 23],
+            [30, 61], [62, 45], [59, 119],
+            [116, 90], [156, 198], [373, 326]]
+  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+  loss: YOLOv3Loss
+
+YOLOv3Loss:
+  ignore_thresh: 0.7
+  downsample: [32, 16, 8]
+  label_smooth: false
+
+BBoxPostProcess:
+  decode:
+    name: YOLOBox
+    conf_thresh: 0.005
+    downsample_ratio: 32
+    clip_bbox: true
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.45
+    nms_top_k: 1000
+
+```
+In the configuration file, `architecture` specifies the network, `pretrain_weights` specifies the URL or path of the pretrained model, and options such as `norm_type` can be shared as global parameters. The model is defined from top to bottom in the file, and the entries correspond one-to-one to the model components in the previous section.
For some model components, if the default parameters are used, you do not need to configure them, such as `yolo_fpn` above. By changing the related configuration, we can easily combine another model; for example, `configs/yolov3/_base_/yolov3_mobilenet_v1.yml` switches the backbone from DarkNet to MobileNet.
+
+##### 2.2.2 Optimizer Configuration File
+The optimizer configuration file defines the optimizer used by the model and the learning rate scheduling strategy. A variety of optimizers and learning rate strategies are already integrated in PaddleDetection; see the code in `ppdet/optimizer.py` for details. For example, the optimizer configuration file for yolov3 is defined in `configs/yolov3/_base_/optimizer_270e.yml` as follows:
+```
+epoch: 270
+
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones:
+    # epoch number
+    - 216
+    - 243
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 4000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
+```
+**A few notes:**
+- `OptimizerBuilder.optimizer` specifies the type and parameters of the optimizer. For the currently supported optimizers, see the [PaddlePaddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html).
+- `LearningRate.schedulers` sets the combination of different learning rate adjustment strategies. Paddle currently supports a variety of learning rate adjustment strategies; see the [PaddlePaddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html) for details. Note that the learning rate adjustment strategies in Paddle need a thin wrapper, which can be found in the source code `ppdet/optimizer.py`.
+
+
+##### 2.2.3 Reader Configuration File
+For the Reader configuration, see the [Reader configuration documentation](./READER_en.md#5.Configuration-and-Operation).
+
+> After reading this document, you should have some experience with model construction and configuration in PaddleDetection, and you will understand it more thoroughly together with the source code. If you have other questions or suggestions about model techniques, please open an issue. We welcome your feedback.
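+
+To make the scheduler-wrapping note in section 2.2.2 concrete, below is a minimal, hypothetical sketch of such a wrapper; the real wrappers in `ppdet/optimizer.py` also handle warmup composition and other details:
+```python
+import paddle
+from ppdet.core.workspace import serializable
+
+@serializable
+class ExampleCosineDecay(object):
+    def __init__(self, max_epochs=270):
+        self.max_epochs = max_epochs
+
+    def __call__(self, base_lr, step_per_epoch):
+        # Turn the epoch-based config into an iteration-based
+        # paddle.optimizer.lr scheduler
+        t_max = self.max_epochs * step_per_epoch
+        return paddle.optimizer.lr.CosineAnnealingDecay(
+            learning_rate=base_lr, T_max=t_max)
+```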
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/READER.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/READER.md
new file mode 100644
index 0000000000000000000000000000000000000000..21b7865dfd7a9bb5af41c9902ce2a89e87b29a69
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/READER.md
@@ -0,0 +1,336 @@
+# 数据处理模块
+
+## 目录
+- [1.简介](#1.简介)
+- [2.数据集](#2.数据集)
+  - [2.1COCO数据集](#2.1COCO数据集)
+  - [2.2Pascal VOC数据集](#2.2Pascal-VOC数据集)
+  - [2.3自定义数据集](#2.3自定义数据集)
+- [3.数据预处理](#3.数据预处理)
+  - [3.1数据增强算子](#3.1数据增强算子)
+  - [3.2自定义数据增强算子](#3.2自定义数据增强算子)
+- [4.Reader](#4.Reader)
+- [5.配置及运行](#5.配置及运行)
+  - [5.1配置](#5.1配置)
+  - [5.2运行](#5.2运行)
+
+### 1.简介
+PaddleDetection的数据处理模块的所有代码逻辑在`ppdet/data/`中,数据处理模块用于加载数据并将其转换成适用于物体检测模型的训练、评估、推理所需要的格式。
+数据处理模块的主要构成如下架构所示:
+```bash
+ ppdet/data/
+ ├── reader.py # 基于Dataloader封装的Reader模块
+ ├── source # 数据源管理模块
+ │ ├── dataset.py # 定义数据源基类,各类数据集继承于此
+ │ ├── coco.py # COCO数据集解析与格式化数据
+ │ ├── voc.py # Pascal VOC数据集解析与格式化数据
+ │ ├── widerface.py # WIDER-FACE数据集解析与格式化数据
+ │ ├── category.py # 相关数据集的类别信息
+ ├── transform # 数据预处理模块
+ │ ├── batch_operators.py # 定义各类基于批量数据的预处理算子
+ │ ├── op_helper.py # 预处理算子的辅助函数
+ │ ├── operators.py # 定义各类基于单张图片的预处理算子
+ │ ├── gridmask_utils.py # GridMask数据增强函数
+ │ ├── autoaugment_utils.py # AutoAugment辅助函数
+ ├── shm_utils.py # 用于使用共享内存的辅助函数
+ ```
+
+
+### 2.数据集
+数据集定义在`source`目录下,其中`dataset.py`中定义了数据集的基类`DetDataSet`,所有的数据集均继承于该基类,`DetDataSet`基类里定义了如下方法:
+
+| 方法 | 输入 | 输出 | 备注 |
+| :------------------------: | :----: | :------------: | :--------------: |
+| \_\_len\_\_ | 无 | int, 数据集中样本的数量 | 过滤掉了无标注的样本 |
+| \_\_getitem\_\_ | int, 样本的索引idx | dict, 索引idx对应的样本roidb | 得到transform之后的样本roidb |
+| check_or_download_dataset | 无 | 无 | 检查数据集是否存在,如果不存在则下载,目前支持COCO, VOC,widerface等数据集 |
+| set_kwargs | 可选参数,以键值对的形式给出 | 无 | 目前用于支持接收mixup, cutmix等参数的设置 |
+| set_transform | 一系列的transform函数 | 无 | 设置数据集的transform函数 |
+| set_epoch | int, 当前的epoch | 无 | 用于dataset与训练过程的交互 |
+| parse_dataset | 无 | 无 | 用于从数据中读取所有的样本 |
+| get_anno | 无 | 无 | 用于获取标注文件的路径 |
+
+当一个数据集类继承自`DetDataSet`时,它只需要实现parse_dataset函数即可。parse_dataset根据数据集设置的数据集根路径dataset_dir,图片文件夹image_dir, 标注文件路径anno_path取出所有的样本,并将其保存在一个列表roidbs中,列表中的每一个元素为一个样本xxx_rec(比如coco_rec或者voc_rec),用dict表示,dict中包含样本的image, gt_bbox, gt_class等字段。COCO和Pascal-VOC数据集中的xxx_rec的数据结构定义如下:
+ ```python
+ xxx_rec = {
+ 'im_file': im_fname, # 一张图像的完整路径
+ 'im_id': np.array([img_id]), # 一张图像的ID序号
+ 'h': im_h, # 图像高度
+ 'w': im_w, # 图像宽度
+ 'is_crowd': is_crowd, # 是否是群落对象, 默认为0 (VOC中无此字段)
+ 'gt_class': gt_class, # 标注框标签名称的ID序号
+ 'gt_bbox': gt_bbox, # 标注框坐标(xmin, ymin, xmax, ymax)
+ 'gt_poly': gt_poly, # 分割掩码,此字段只在coco_rec中出现,默认为None
+ 'difficult': difficult # 是否是困难样本,此字段只在voc_rec中出现,默认为0
+ }
+ ```
+
+xxx_rec中的内容也可以通过`DetDataSet`的data_fields参数来控制,即可以过滤掉一些不需要的字段,但大多数情况下不需要修改,按照`configs/datasets`中的默认配置即可。
+
+此外,在parse_dataset函数中,保存了类别名到id的映射的一个字典`cname2cid`。在coco数据集中,会利用[COCO API](https://github.com/cocodataset/cocoapi)从标注文件中加载数据集的类别名,并设置此字典。在voc数据集中,如果设置`use_default_label=False`,将从`label_list.txt`中读取类别列表,反之将使用voc默认的类别列表。
+
+#### 2.1COCO数据集
+COCO数据集目前分为COCO2014和COCO2017,主要由json文件和image文件组成,其组织结构如下所示:
+
+ ```
+ dataset/coco/
+ ├── annotations
+ │ ├── instances_train2014.json
+ │ ├── instances_train2017.json
+ │ ├── instances_val2014.json
+ │ ├── instances_val2017.json
+ │ │ ...
+ ├── train2017
+ │ ├── 000000000009.jpg
+ │ ├── 000000580008.jpg
+ │ │ ...
+ ├── val2017
+ │ ├── 000000000139.jpg
+ │ ├── 000000000285.jpg
+ │ │ ...
+ ```
+
+在`source/coco.py`中定义并注册了`COCODataSet`数据集类,其继承自`DetDataSet`,并实现了parse_dataset方法,调用[COCO API](https://github.com/cocodataset/cocoapi)加载并解析COCO格式数据源,得到`roidbs`和`cname2cid`,具体可参见`source/coco.py`源码。将其他数据集转换成COCO格式可以参考[用户数据转成COCO数据](../tutorials/data/PrepareDetDataSet.md#用户数据转成COCO数据)
+
+#### 2.2Pascal VOC数据集
+该数据集目前分为VOC2007和VOC2012,主要由xml文件和image文件组成,其组织结构如下所示:
+```
+ dataset/voc/
+ ├── trainval.txt
+ ├── test.txt
+ ├── label_list.txt (optional)
+ ├── VOCdevkit/VOC2007
+ │ ├── Annotations
+ │ ├── 001789.xml
+ │ │ ...
+ │ ├── JPEGImages
+ │ ├── 001789.jpg
+ │ │ ...
+ │ ├── ImageSets
+ │ | ...
+ ├── VOCdevkit/VOC2012
+ │ ├── Annotations
+ │ ├── 2011_003876.xml
+ │ │ ...
+ │ ├── JPEGImages
+ │ ├── 2011_003876.jpg
+ │ │ ...
+ │ ├── ImageSets
+ │ │ ...
+ ```
+在`source/voc.py`中定义并注册了`VOCDataSet`数据集,它继承自`DetDataSet`基类,并重写了`parse_dataset`方法,解析VOC数据集中xml格式标注文件,更新`roidbs`和`cname2cid`。将其他数据集转换成VOC格式可以参考[用户数据转成VOC数据](../tutorials/data/PrepareDetDataSet.md#用户数据转成VOC数据)
+
+#### 2.3自定义数据集
+如果COCODataSet和VOCDataSet不能满足你的需求,可以通过自定义数据集的方式来加载你的数据集。只需要以下两步即可实现自定义数据集:
+
+1. 新建`source/xxx.py`,定义类`XXXDataSet`继承自`DetDataSet`基类,完成注册与序列化,并重写`parse_dataset`方法对`roidbs`与`cname2cid`更新:
+ ```python
+ from ppdet.core.workspace import register, serializable
+
+ # 注册并序列化
+ @register
+ @serializable
+ class XXXDataSet(DetDataSet):
+ def __init__(self,
+ dataset_dir=None,
+ image_dir=None,
+ anno_path=None,
+ ...
+ ):
+ self.roidbs = None
+ self.cname2cid = None
+ ...
+
+ def parse_dataset(self):
+ ...
+ 省略具体解析数据逻辑
+ ...
+ self.roidbs, self.cname2cid = records, cname2cid
+ ```
+
+2. 在`source/__init__.py`中添加引用:
+ ```python
+ from . import xxx
+ from .xxx import *
+ ```
+完成以上两步就将新的数据源`XXXDataSet`添加好了,你可以参考[配置及运行](#5.配置及运行)实现自定义数据集的使用。
+
+### 3.数据预处理
+
+#### 3.1数据增强算子
+PaddleDetection中支持了种类丰富的数据增强算子,有单图像数据增强算子与批数据增强算子两种方式,您可选取合适的算子组合使用。单图像数据增强算子定义在`transform/operators.py`中,已支持的单图像数据增强算子详见下表:
+
+| 名称 | 作用 |
+| :---------------------: | :--------------: |
+| Decode | 从图像文件或内存buffer中加载图像,格式为RGB格式 |
+| Permute | 如果输入是HWC顺序,则转换为CHW |
+| RandomErasingImage | 对图像进行随机擦除 |
+| NormalizeImage | 对图像像素值进行归一化,如果设置is_scale=True,则先将像素值除以255.0, 再进行归一化。 |
+| GridMask | GridMask数据增广 |
+| RandomDistort | 随机扰动图片亮度、对比度、饱和度和色相 |
+| AutoAugment | AutoAugment数据增广,包含一系列数据增强方法 |
+| RandomFlip | 随机水平翻转图像 |
+| Resize | 对于图像进行resize,并对标注进行相应的变换 |
+| MultiscaleTestResize | 将图像重新缩放为多尺度list的每个尺寸 |
+| RandomResize | 对于图像进行随机Resize,可以Resize到不同的尺寸以及使用不同的插值策略 |
+| RandomExpand | 将原始图片放入用像素均值填充的扩张图中,对此图进行裁剪、缩放和翻转 |
+| CropWithSampling | 根据缩放比例、长宽比例生成若干候选框,再依据这些候选框和标注框的面积交并比(IoU)挑选出符合要求的裁剪结果 |
+| CropImageWithDataAchorSampling | 基于CropImage,在人脸检测中,随机将图片尺度变换到一定范围的尺度,大大增强人脸的尺度变化 |
+| RandomCrop | 原理同CropImage,以随机比例与IoU阈值进行处理 |
+| RandomScaledCrop | 根据长边对图像进行随机裁剪,并对标注做相应的变换 |
+| Cutmix | Cutmix数据增强,对两张图片做拼接 |
+| Mixup | Mixup数据增强,按比例叠加两张图像 |
+| NormalizeBox | 对bounding box进行归一化 |
+| PadBox | 如果bounding box的数量少于num_max_boxes,则将零填充到bbox |
+| BboxXYXY2XYWH | 将bounding box从(xmin,ymin,xmax,ymax)形式转换为(xmin,ymin,width,height)格式 |
+| Pad | 将图片Pad到某一个数的整数倍或者指定的size,并支持指定Pad的方式 |
+| Poly2Mask | Poly2Mask数据增强 |
+
+批数据增强算子定义在`transform/batch_operators.py`中, 目前支持的算子列表如下:
+| 名称 | 作用 |
+| :---------------------: | :--------------: |
+| PadBatch | 随机对每个batch的数据图片进行Pad操作,使得batch中的图片具有相同的shape |
+| BatchRandomResize | 对一个batch的图片进行resize,使得batch中的图片随机缩放到相同的尺寸 |
+| Gt2YoloTarget | 通过gt数据生成YOLO系列模型的目标 |
+| Gt2FCOSTarget | 通过gt数据生成FCOS模型的目标 |
+| Gt2TTFTarget | 通过gt数据生成TTFNet模型的目标 |
+| Gt2Solov2Target | 通过gt数据生成SOLOv2模型的目标 |
+
+**几点说明:**
+- 
数据增强算子的输入为sample或者samples,每一个sample对应上文所说的`DetDataSet`输出的roidbs中的一个样本,如coco_rec或者voc_rec
+- 单图像数据增强算子(Mixup, Cutmix等除外)也可用于批数据处理中。但是,单图像处理算子和批图像处理算子仍有一些差异,以RandomResize和BatchRandomResize为例,RandomResize会将一个Batch中的每张图片进行随机缩放,但是每一张图像Resize之后的形状不尽相同,BatchRandomResize则会将一个Batch中的所有图片随机缩放到相同的形状。
+- 除BatchRandomResize外,定义在`transform/batch_operators.py`的批数据增强算子接收的输入图像均为CHW形式,所以使用这些批数据增强算子前请先使用Permute进行处理。如果用到Gt2xxxTarget算子,需要将其放置在靠后的位置。NormalizeBox算子建议放置在Gt2xxxTarget之前。将这些限制条件总结下来,推荐的预处理算子顺序为:
+ ```
+ - XXX: {}
+ - ...
+ - BatchRandomResize: {...} # 如果不需要,可以移除;如果需要,放置在Permute之前
+ - Permute: {} # 必须项
+ - NormalizeBox: {} # 如果需要,建议放在Gt2XXXTarget之前
+ - PadBatch: {...} # 如果不需要可移除;如果需要,建议放置在Permute之后
+ - Gt2XXXTarget: {...} # 建议与PadBatch放置在最后的位置
+ ```
+
+#### 3.2自定义数据增强算子
+如果需要自定义数据增强算子,那么您需要了解数据增强算子的相关逻辑。数据增强算子的基类为定义在`transform/operators.py`中的`BaseOperator`类,单图像数据增强算子与批数据增强算子均继承自这个基类。完整定义可参考源码,以下代码展示了`BaseOperator`类的关键函数:`apply`和`__call__`方法:
+ ``` python
+ class BaseOperator(object):
+
+ ...
+
+ def apply(self, sample, context=None):
+ return sample
+
+ def __call__(self, sample, context=None):
+ if isinstance(sample, Sequence):
+ for i in range(len(sample)):
+ sample[i] = self.apply(sample[i], context)
+ else:
+ sample = self.apply(sample, context)
+ return sample
+ ```
+`__call__`方法为`BaseOperator`的调用入口,接收一个sample(单图像)或者多个sample(多图像)作为输入,并调用apply函数对一个或者多个sample进行处理。大多数情况下,你只需要继承`BaseOperator`重写apply方法或者重写`__call__`方法即可。如下所示,定义了一个XXXOp继承自BaseOperator,并注册:
+ ```python
+ @register_op
+ class XXXOp(BaseOperator):
+ def __init__(self,...):
+
+ super(XXXOp, self).__init__()
+ ...
+
+ # 大多数情况下只需要重写apply方法
+ def apply(self, sample, context=None):
+ ...
+ 省略对输入的sample具体操作
+ ...
+ return sample
+
+ # 如果有需要,可以重写__call__方法,如Mixup, Gt2XXXTarget等
+ # def __call__(self, sample, context=None):
+ # ...
+ # 省略对输入的sample具体操作
+ # ...
+ # return sample + ``` +大多数情况下,只需要重写apply方法即可,如`transform/operators.py`中除Mixup和Cutmix外的预处理算子。对于批处理的情况一般需要重写__call__方法,如`transform/batch_operators.py`的预处理算子。 + +### 4.Reader +Reader相关的类定义在`reader.py`, 其中定义了`BaseDataLoader`类。`BaseDataLoader`在`paddle.io.DataLoader`的基础上封装了一层,其具备`paddle.io.DataLoader`的所有功能,并能够实现不同模型对于`DetDataset`的不同需求,如可以通过对Reader进行设置,以控制`DetDataset`支持Mixup, Cutmix等操作。除此之外,数据预处理算子通过`Compose`类和`BatchCompose`类组合起来分别传入`DetDataset`和`paddle.io.DataLoader`中。 +所有的Reader类都继承自`BaseDataLoader`类,具体可参见源码。 + +### 5.配置及运行 + +#### 5.1 配置 +与数据预处理相关的模块的配置文件包含所有模型公用的Dataset的配置文件,以及不同模型专用的Reader的配置文件。 + +##### 5.1.1 Dataset配置 +关于Dataset的配置文件存在于`configs/datasets`文件夹。比如COCO数据集的配置文件如下: +``` +metric: COCO # 目前支持COCO, VOC, OID, WiderFace等评估标准 +num_classes: 80 # num_classes数据集的类别数,不包含背景类 + +TrainDataset: + !COCODataSet + image_dir: train2017 # 训练集的图片所在文件相对于dataset_dir的路径 + anno_path: annotations/instances_train2017.json # 训练集的标注文件相对于dataset_dir的路径 + dataset_dir: dataset/coco #数据集所在路径,相对于PaddleDetection路径 + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # 控制dataset输出的sample所包含的字段,注意此为TrainDataset独有的且必须配置的字段 + +EvalDataset: + !COCODataSet + image_dir: val2017 # 验证集的图片所在文件夹相对于dataset_dir的路径 + anno_path: annotations/instances_val2017.json # 验证集的标注文件相对于dataset_dir的路径 + dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径 + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # 标注文件所在路径,仅用于读取数据集的类别信息,支持json和txt格式 + dataset_dir: dataset/coco # 数据集所在路径,若添加了此行,则`anno_path`路径为`dataset_dir/anno_path`,若此行不设置或去掉此行,则`anno_path`路径即为`anno_path` +``` +在PaddleDetection的yml配置文件中,使用`!`直接序列化模块实例(可以是函数,实例等),上述的配置文件均使用Dataset进行了序列化。 + +**注意:** +请运行前自行仔细检查数据集的配置路径,在训练或验证时如果TrainDataset和EvalDataset的路径配置有误,会提示自动下载数据集。若使用自定义数据集,在推理时如果TestDataset路径配置有误,会提示使用默认COCO数据集的类别信息。 + + +##### 5.1.2 Reader配置 +不同模型专用的Reader定义在每一个模型的文件夹下,如yolov3的Reader配置文件定义在`configs/yolov3/_base_/yolov3_reader.yml`。一个Reader的示例配置如下: +``` +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + ... + batch_transforms: + ... + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + ... + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - Decode: {} + ... 
+  batch_size: 1
+```
+你可以在Reader中定义不同的预处理算子,每张卡的batch_size以及DataLoader的worker_num等。
+
+#### 5.2运行
+在PaddleDetection的训练、评估和测试运行程序中,都需要通过创建Reader迭代器来加载数据。Reader在`ppdet/engine/trainer.py`中创建,下面的代码展示了如何创建训练时的Reader:
+``` python
+from ppdet.core.workspace import create
+# build data loader
+self.dataset = cfg['TrainDataset']
+self.loader = create('TrainReader')(self.dataset, cfg.worker_num)
+```
+相应的预测以及评估时的Reader与之类似,具体可参考`ppdet/engine/trainer.py`源码。
+
+> 关于数据处理模块,如您有其他问题或建议,请给我们提issue,我们非常欢迎您的反馈。
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/READER_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/READER_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d1a0d9ee8d7bcd93021a2b762320f662d0a35592
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/READER_en.md
@@ -0,0 +1,336 @@
+# Data Processing Module
+
+## Directory
+- [Data Processing Module](#data-processing-module)
+  - [Directory](#directory)
+  - [1.Introduction](#1introduction)
+  - [2.Dataset](#2dataset)
+    - [2.1COCO Dataset](#21coco-dataset)
+    - [2.2Pascal VOC dataset](#22pascal-voc-dataset)
+    - [2.3Customize Dataset](#23customize-dataset)
+  - [3.Data preprocessing](#3data-preprocessing)
+    - [3.1Data Enhancement Operator](#31data-enhancement-operator)
+    - [3.2Custom data enhancement operator](#32custom-data-enhancement-operator)
+  - [4.Reader](#4reader)
+  - [5.Configuration and Operation](#5configuration-and-operation)
+    - [5.1 Configuration](#51-configuration)
+    - [5.2 Run](#52-run)
+
+### 1.Introduction
+All the code logic of PaddleDetection's data processing module is in `ppdet/data/`. The data processing module is used to load data and convert it into the format required for training, evaluation and inference of object detection models. The main components of the data processing module are as follows:
+```bash
+  ppdet/data/
+  ├── reader.py # Reader module based on Dataloader encapsulation
+  ├── source  # Data source management module
+  │   ├── dataset.py      # Defines the data source base class, from which various datasets inherit
+  │   ├── coco.py         # Parses and formats COCO dataset data
+  │   ├── voc.py          # Parses and formats Pascal VOC dataset data
+  │   ├── widerface.py    # Parses and formats WIDER-FACE dataset data
+  │   ├── category.py     # Category information of the relevant datasets
+  ├── transform  # Data preprocessing module
+  │   ├── batch_operators.py   # Defines all kinds of preprocessing operators based on batch data
+  │   ├── op_helper.py         # Auxiliary functions for the preprocessing operators
+  │   ├── operators.py         # Defines all kinds of preprocessing operators based on a single image
+  │   ├── gridmask_utils.py    # GridMask data augmentation functions
+  │   ├── autoaugment_utils.py # AutoAugment auxiliary functions
+  ├── shm_utils.py # Auxiliary functions for using shared memory
+  ```
+
+
+### 2.Dataset
+The dataset is defined in the `source` directory, where `dataset.py` defines the dataset base class `DetDataSet`. All datasets inherit from this base class, and `DetDataSet` defines the following methods:
+
+| Method | Input | Output | Note |
+| :-----------------------: | :------------------------------------------: | :---------------------------------------: | :-------------------------------------------------------------------------------------------------------------: |
+| \_\_len\_\_ | no | int, the number of samples in the dataset | Unlabeled samples are filtered out |
+| \_\_getitem\_\_ | int, the index of the sample | dict, the roidb of the sample at index idx | Gets the sample roidb after transform |
+| check_or_download_dataset | no | no | Checks whether the dataset exists; if not, downloads it. Currently supports COCO, VOC, WiderFace and other datasets |
+| set_kwargs | optional arguments, given as key-value pairs | no | Currently used to support receiving mixup, cutmix and other parameters |
+| set_transform | a series of transform functions | no | Sets the transform functions of the dataset |
+| set_epoch | int, current epoch | no | Interaction between the dataset and the training process |
+| parse_dataset | no | no | Reads all samples from the data source |
+| get_anno | no | no | Gets the path of the annotation file |
+
+When a dataset class inherits from `DetDataSet`, it only needs to implement the `parse_dataset` method. `parse_dataset` reads all samples from the dataset root path `dataset_dir`, the image folder `image_dir` and the annotation file path `anno_path`, and saves them in a list `roidbs`. Each element in the list is a sample `xxx_rec` (such as `coco_rec` or `voc_rec`), represented by a dict which contains fields such as the sample's image, gt_bbox and gt_class. The data structure of `xxx_rec` in the COCO and Pascal VOC datasets is defined as follows:
+  ```python
+  xxx_rec = {
+      'im_file': im_fname,          # The full path of the image
+      'im_id': np.array([img_id]),  # The ID number of the image
+      'h': im_h,                    # Height of the image
+      'w': im_w,                    # Width of the image
+      'is_crowd': is_crowd,         # Crowd flag, default is 0 (VOC does not have this field)
+      'gt_class': gt_class,         # Category ID of the ground-truth box
+      'gt_bbox': gt_bbox,           # Ground-truth box coordinates (xmin, ymin, xmax, ymax)
+      'gt_poly': gt_poly,           # Segmentation mask, this field only appears in coco_rec and defaults to None
+      'difficult': difficult        # Whether it is a difficult sample, this field only appears in voc_rec and defaults to 0
+  }
+  ```
+
+The contents of `xxx_rec` can also be controlled by the `data_fields` parameter of `DetDataSet`, that is, unwanted fields can be filtered out, but in most cases you do not need to change it. The default configuration in `configs/datasets` will do.
+
+In addition, a dictionary `cname2cid` holding the mapping from category names to IDs is built in `parse_dataset`. For the COCO dataset, the category names are loaded from the annotation file with the [COCO API](https://github.com/cocodataset/cocoapi) and the dictionary is set up accordingly. For the VOC dataset, if `use_default_label=False` is set, the category list is read from `label_list.txt`; otherwise the default VOC category list is used.
+
+#### 2.1COCO Dataset
+COCO datasets are currently divided into COCO2014 and COCO2017, which are mainly composed of JSON files and image files, and their organizational structure is as follows:
+  ```
+  dataset/coco/
+  ├── annotations
+  │   ├── instances_train2014.json
+  │   ├── instances_train2017.json
+  │   ├── instances_val2014.json
+  │   ├── instances_val2017.json
+  │   │   ...
+  ├── train2017
+  │   ├── 000000000009.jpg
+  │   ├── 000000580008.jpg
+  │   │   ...
+  ├── val2017
+  │   ├── 000000000139.jpg
+  │   ├── 000000000285.jpg
+  │   │   ...
+  ```
+The `COCODataSet` class is defined and registered in `source/coco.py`. It implements the `parse_dataset` method, which calls the [COCO API](https://github.com/cocodataset/cocoapi) to load and parse a COCO-format data source into `roidbs` and `cname2cid`; see the `source/coco.py` source code for details. To convert other datasets to COCO format, refer to [Convert User Data to COCO Data](../tutorials/data/PrepareDetDataSet_en.md#convert-user-data-to-coco-data).
+
+
+#### 2.2Pascal VOC dataset
+The dataset is currently divided into VOC2007 and VOC2012, mainly composed of XML files and image files, and its organizational structure is as follows:
+```
+  dataset/voc/
+  ├── trainval.txt
+  ├── test.txt
+  ├── label_list.txt (optional)
+  ├── VOCdevkit/VOC2007
+  │   ├── Annotations
+  │   ├── 001789.xml
+  │   │   ...
+  │   ├── JPEGImages
+  │   ├── 001789.jpg
+  │   │   ...
+  │   ├── ImageSets
+  │   |   ...
+  ├── VOCdevkit/VOC2012
+  │   ├── Annotations
+  │   ├── 2011_003876.xml
+  │   │   ...
+  │   ├── JPEGImages
+  │   ├── 2011_003876.jpg
+  │   │   ...
+  │   ├── ImageSets
+  │   │   ...
+  ```
+The `VOCDataSet` dataset is defined and registered in `source/voc.py`. It inherits from the `DetDataSet` base class and rewrites the `parse_dataset` method to parse the XML annotation files of the VOC dataset and update `roidbs` and `cname2cid`. To convert other datasets to VOC format, refer to [Convert User Data to VOC Data](../tutorials/data/PrepareDetDataSet_en.md#convert-user-data-to-voc-data).
+
+
+#### 2.3Customize Dataset
+If `COCODataSet` and `VOCDataSet` do not meet your requirements, you can load your data with a customized dataset. Only two steps are needed to implement a custom dataset:
+
+1. Create `source/xxx.py` and define a class `XXXDataSet` that inherits from the `DetDataSet` base class, complete the registration and serialization, and rewrite the `parse_dataset` method to update `roidbs` and `cname2cid`:
+  ```python
+  from ppdet.core.workspace import register, serializable
+
+  # Register and serialize
+  @register
+  @serializable
+  class XXXDataSet(DetDataSet):
+      def __init__(self,
+                   dataset_dir=None,
+                   image_dir=None,
+                   anno_path=None,
+                   ...
+                   ):
+          self.roidbs = None
+          self.cname2cid = None
+          ...
+
+      def parse_dataset(self):
+          ...
+          # The concrete parsing logic is omitted
+          ...
+          self.roidbs, self.cname2cid = records, cname2cid
+  ```
+
+2. Add a reference in `source/__init__.py`:
+  ```python
+  from . import xxx
+  from .xxx import *
+  ```
+After these two steps, the new data source `XXXDataSet` is ready; a toy `parse_dataset` sketch follows below, and you can refer to [Configuration and Operation](#5configuration-and-operation) to use the custom dataset.
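+
+For illustration, here is a minimal, hedged sketch of what a `parse_dataset` implementation might look like for a toy annotation format (a hypothetical `list.txt` whose lines are `image_path class_id xmin ymin xmax ymax`; this format and the class mapping are assumptions for the example, not an official PaddleDetection format):
+  ```python
+  import os
+  import numpy as np
+
+  # Hypothetical parse_dataset body for XXXDataSet: one object per line.
+  def parse_dataset(self):
+      records = []
+      cname2cid = {'my_class': 0}  # assumed single-category mapping
+      anno_file = os.path.join(self.dataset_dir, self.anno_path)
+      with open(anno_file) as f:
+          for img_id, line in enumerate(f):
+              path, cid, x1, y1, x2, y2 = line.split()
+              records.append({
+                  'im_file': os.path.join(self.dataset_dir, self.image_dir, path),
+                  'im_id': np.array([img_id]),
+                  'gt_class': np.array([[int(cid)]]),
+                  'gt_bbox': np.array([[float(x1), float(y1),
+                                        float(x2), float(y2)]], dtype=np.float32),
+              })
+      self.roidbs, self.cname2cid = records, cname2cid
+  ```
+Each record follows the `xxx_rec` structure described above; fields that a model does not need can be filtered out with `data_fields`.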
+### 3.Data preprocessing
+
+#### 3.1Data Enhancement Operator
+A rich variety of data enhancement operators is supported in PaddleDetection, including single-image data enhancement operators and batch data enhancement operators. You can choose suitable operators and use them in combination. Single-image data enhancement operators are defined in `transform/operators.py`. The supported single-image data enhancement operators are shown in the following table:
+| Name | Function |
+| :----------------------------: | :----------------------------------------------------------------------------------------------------------------------------: |
+| Decode | Loads an image from an image file or a memory buffer, in RGB format |
+| Permute | Changes the input from HWC order to CHW order |
+| RandomErasingImage | Randomly erases parts of the image |
+| NormalizeImage | Normalizes the pixel values of the image; if is_scale=True is set, the pixel values are divided by 255.0 before normalization |
+| GridMask | GridMask data augmentation |
+| RandomDistort | Randomly disturbs the brightness, contrast, saturation and hue of the image |
+| AutoAugment | AutoAugment data augmentation, which contains a series of augmentation methods |
+| RandomFlip | Randomly flips the image horizontally |
+| Resize | Resizes the image and transforms the annotations accordingly |
+| MultiscaleTestResize | Rescales the image to each size in the multi-scale list |
+| RandomResize | Randomly resizes the image; images can be resized to different sizes and different interpolation strategies can be used |
+| RandomExpand | Places the original image into an expanded image filled with the pixel mean, and crops, scales and flips it |
+| CropWithSampling | Generates several candidate boxes according to scaling ratios and aspect ratios, then selects the croppings that meet the requirements according to the IoU between these candidate boxes and the ground-truth boxes |
+| CropImageWithDataAchorSampling | Based on CropImage; in face detection, randomly transforms the image scale to a certain range of scales, which greatly enhances the scale variation of faces |
+| RandomCrop | Same principle as CropImage, processed with random ratios and IoU thresholds |
+| RandomScaledCrop | Randomly crops the image according to the long edge and transforms the annotations accordingly |
+| Cutmix | Cutmix data augmentation, which stitches two images together |
+| Mixup | Mixup data augmentation, which overlays two images proportionally |
+| NormalizeBox | Normalizes the bounding boxes |
+| PadBox | If the number of bounding boxes is less than num_max_boxes, pads the bboxes with zeros |
+| BboxXYXY2XYWH | Converts bounding boxes from (xmin,ymin,xmax,ymax) form to (xmin,ymin,width,height) form |
+| Pad | Pads the image to an integer multiple of a certain number or to a specified size, and supports specifying the padding mode |
+| Poly2Mask | Poly2Mask data augmentation |
+
+Batch data enhancement operators are defined in `transform/batch_operators.py`. The list of currently supported operators is as follows:
+| Name | Function |
+| :---------------: | :------------------------------------------------------------------------------------------------------------------: |
+| PadBatch | Pads each batch of images so that the images in the batch have the same shape |
+| BatchRandomResize | Resizes a batch of images so that the images in the batch are randomly scaled to the same size |
+| Gt2YoloTarget | Generates the targets of YOLO-series models from ground-truth data |
+| Gt2FCOSTarget | Generates the targets of the FCOS model from ground-truth data |
+| Gt2TTFTarget | Generates the targets of the TTFNet model from ground-truth data |
+| Gt2Solov2Target | Generates the targets of the SOLOv2 model from ground-truth data |
+
+**A few notes:**
+- The input of a data enhancement operator is a sample or samples; each sample corresponds to one sample in the `roidbs` output by the `DetDataSet` mentioned above, such as coco_rec or voc_rec
+- Single-image data enhancement operators (except Mixup, Cutmix, etc.) can also be used in batch data processing. However, there are still some differences between single-image operators and batch operators. Taking RandomResize and BatchRandomResize as an example, RandomResize randomly scales each image in a batch so that the shapes of the resized images differ from each other, while BatchRandomResize randomly scales all images in a batch to the same shape.
+- Except for BatchRandomResize, the batch data enhancement operators defined in `transform/batch_operators.py` receive input images in CHW format, so please apply Permute before using these batch operators. If a Gt2XXXTarget operator is used, it needs to be placed near the end. The NormalizeBox operator is recommended to be placed before Gt2XXXTarget. Summarizing these constraints, the recommended order of the preprocessing operators is:
+  ```
+  - XXX: {}
+  - ...
+  - BatchRandomResize: {...} # Remove it if not needed; if needed, place it before Permute
+  - Permute: {} # Required
+  - NormalizeBox: {} # If needed, it is recommended to place it before Gt2XXXTarget
+  - PadBatch: {...} # Remove it if not needed; if needed, it is recommended to place it after Permute
+  - Gt2XXXTarget: {...} # It is recommended to place it together with PadBatch at the end
+  ```
+
+#### 3.2Custom data enhancement operator
+If you need to customize a data enhancement operator, you need to understand the logic of the data enhancement operators. The base class of the data enhancement operators is the `BaseOperator` class defined in `transform/operators.py`, from which both the single-image and the batch data enhancement operators inherit. Refer to the source code for the complete definition; the following code shows the key functions of the `BaseOperator` class, the apply and __call__ methods:
+  ``` python
+  class BaseOperator(object):
+
+      ...
+
+      def apply(self, sample, context=None):
+          return sample
+
+      def __call__(self, sample, context=None):
+          if isinstance(sample, Sequence):
+              for i in range(len(sample)):
+                  sample[i] = self.apply(sample[i], context)
+          else:
+              sample = self.apply(sample, context)
+          return sample
+  ```
+The __call__ method is the call entry of `BaseOperator`. It receives one sample (a single image) or multiple samples (multiple images) as input and calls the apply function to process one or more samples.
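+
+For intuition, here is a minimal, hedged sketch (the `FlipDemo` operator and the toy sample dict are illustrative only, not part of PaddleDetection) of how `__call__` dispatches to `apply` for a single sample and for a list of samples:
+  ```python
+  import numpy as np
+  from ppdet.data.transform.operators import BaseOperator, register_op
+
+  @register_op
+  class FlipDemo(BaseOperator):
+      """Illustrative operator: horizontally flips the image in a sample dict."""
+
+      def apply(self, sample, context=None):
+          sample['image'] = sample['image'][:, ::-1, :]  # flip along width (HWC)
+          return sample
+
+  op = FlipDemo()
+  sample = {'image': np.zeros((8, 8, 3), dtype=np.uint8)}
+  out = op(sample)             # a single sample: apply() is called once
+  outs = op([sample, sample])  # a list of samples: apply() is called per sample
+  ```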
+In most cases, you only need to inherit from `BaseOperator` and override the apply method, or override the __call__ method. As shown below, an XXXOp is defined that inherits from `BaseOperator` and is registered:
+  ```python
+  @register_op
+  class XXXOp(BaseOperator):
+      def __init__(self, ...):
+          super(XXXOp, self).__init__()
+          ...
+
+      # In most cases, you just need to override the apply method
+      def apply(self, sample, context=None):
+          ...
+          # The concrete operations on the input sample are omitted
+          ...
+          return sample
+
+      # If necessary, override the __call__ method instead, as in Mixup, Gt2XXXTarget, etc.
+      # def __call__(self, sample, context=None):
+      #     ...
+      #     The concrete operations on the input sample are omitted
+      #     ...
+      #     return sample
+  ```
+In most cases, you only need to override the apply method, as in the preprocessing operators in `transform/operators.py` other than Mixup and Cutmix. For batch processing, it is generally necessary to override the __call__ method, as in the preprocessing operators in `transform/batch_operators.py`.
+
+### 4.Reader
+The Reader-related classes are defined in `reader.py`, where the `BaseDataLoader` class is defined. `BaseDataLoader` adds a layer of encapsulation on top of `paddle.io.DataLoader`: it has all the functions of `paddle.io.DataLoader` and can meet the different requirements of different models on `DetDataset`. For example, the Reader can be configured to control whether `DetDataset` supports Mixup, Cutmix and other operations. In addition, the data preprocessing operators are combined by the `Compose` and `BatchCompose` classes and passed into `DetDataset` and `paddle.io.DataLoader`, respectively. All Reader classes inherit from the `BaseDataLoader` class; see the source code for details.
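+
+Conceptually, this composition behaves like the following simplified sketch (for illustration only; the actual `Compose`/`BatchCompose` classes in `reader.py` carry additional logic such as error handling):
+  ```python
+  # Simplified sketch of a Compose-style wrapper that chains transforms in order.
+  class ComposeSketch(object):
+      def __init__(self, transforms):
+          self.transforms = transforms  # a list of instantiated operators
+
+      def __call__(self, data):
+          for t in self.transforms:
+              data = t(data)  # each operator consumes and returns the sample(s)
+          return data
+  ```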
+
+### 5.Configuration and Operation
+
+#### 5.1 Configuration
+The configuration files related to data preprocessing include the Dataset configuration files shared by all models and the model-specific Reader configuration files.
+
+##### 5.1.1 Dataset Configuration
+The Dataset configuration files are in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows:
+```
+metric: COCO # Currently supports COCO, VOC, OID, WiderFace and other evaluation standards
+num_classes: 80 # The number of classes in the dataset, excluding the background class
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017 # The path of the training set images relative to dataset_dir
+    anno_path: annotations/instances_train2017.json # The path of the training set annotation file relative to dataset_dir
+    dataset_dir: dataset/coco # The path of the dataset relative to the PaddleDetection path
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the samples output by the dataset; note that this field is unique to TrainDataset and must be configured
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017 # The path of the validation set images relative to dataset_dir
+    anno_path: annotations/instances_val2017.json # The path of the validation set annotation file relative to dataset_dir
+    dataset_dir: dataset/coco # The path of the dataset relative to the PaddleDetection path
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/instances_val2017.json # The path of the annotation file; it is only used to read the category information of the dataset, and JSON and TXT formats are supported
+    dataset_dir: dataset/coco # The path of the dataset; note that if this line is set, `anno_path` is interpreted as `dataset_dir/anno_path`, and if it is not set or removed, `anno_path` is used as-is
+```
+In PaddleDetection's YAML configuration files, `!` directly serializes a module instance (which can be a function, an instance, etc.); the configuration above serializes the Datasets.
+
+**Note:**
+Please carefully check the dataset paths in the configuration before running. During training or evaluation, if the path of TrainDataset or EvalDataset is misconfigured, you will be prompted to download the dataset automatically. When using a custom dataset, if the TestDataset path is misconfigured during inference, the category information of the default COCO dataset will be used.
+
+
+##### 5.1.2 Reader Configuration
+The model-specific Readers are defined in each model's folder; for example, the Reader configuration file of yolov3 is defined in `configs/yolov3/_base_/yolov3_reader.yml`. An example Reader configuration is as follows:
+```
+worker_num: 2
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    ...
+  batch_transforms:
+    ...
+  batch_size: 8
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    ...
+  batch_size: 1
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 608, 608]
+  sample_transforms:
+    - Decode: {}
+    ...
+  batch_size: 1
+```
+You can define different preprocessing operators, the batch_size per GPU, the worker_num of the DataLoader and so on in the Reader.
+
+#### 5.2 Run
+Reader iterators are created in PaddleDetection's training, evaluation and test programs. The Reader is created in `ppdet/engine/trainer.py`. The following code shows how to create the Reader for training:
+``` python
+from ppdet.core.workspace import create
+# build data loader
+self.dataset = cfg['TrainDataset']
+self.loader = create('TrainReader')(self.dataset, cfg.worker_num)
+```
+The Readers for prediction and evaluation are created similarly; see the `ppdet/engine/trainer.py` source code for details.
+
+> If you have any other questions or suggestions about the data processing module, please open an issue; your feedback is very welcome.
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/README.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d9adf2184592e37c343d743e3ce93c8a4dccb493
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/README.md
@@ -0,0 +1,54 @@
+简体中文 | [English](./README_en.md)
+
+# 行为识别任务二次开发
+
+在产业落地过程中应用行为识别算法,不可避免地会出现希望自定义类型的行为识别的需求,或是对已有行为识别模型的优化,以提升在特定场景下模型的效果。鉴于行为的多样性,PP-Human支持抽烟、打电话、摔倒、打架、人员闯入五种异常行为识别,并根据行为的不同,集成了基于视频分类、基于检测、基于图像分类、基于跟踪以及基于骨骼点的五种行为识别技术方案,可覆盖90%+动作类型的识别,满足各类开发需求。我们在本文档中通过案例来介绍如何根据期望识别的行为来进行行为识别方案的选择,以及使用PaddleDetection进行行为识别算法二次开发的工作,包括:方案选择、数据准备、模型优化思路和新增行为的开发流程。
+
+
+## 方案选择
+
+在PaddleDetection的PP-Human中,我们为行为识别提供了多种方案:基于视频分类、基于图像分类、基于检测、基于跟踪以及基于骨骼点的行为识别方案,以期望满足不同场景、不同目标行为的需求。对于二次开发,首先我们需要确定要采用何种方案来实现行为识别的需求,其核心是要通过对场景和具体行为的分析、并考虑数据采集成本等因素,综合选择一个合适的识别方案。我们在这里简要列举了当前PaddleDetection中所支持的方案的优劣势和适用场景,供大家参考。
+
+image
+
+下面以PaddleDetection目前已经支持的几个具体动作为例,介绍每个动作方案的选型依据:
+
+### 吸烟
+
+方案选择:基于人体id检测的行为识别
+
+原因:吸烟动作中具有香烟这个明显特征目标,因此我们可以认为当在某个人物的对应图像中检测到香烟时,该人物即在吸烟动作中。相比于基于视频或基于骨骼点的识别方案,训练检测模型需要采集的是图片级别而非视频级别的数据,可以明显减轻数据收集与标注的难度。此外,目标检测任务具有丰富的预训练模型资源,整体模型的效果会更有保障。
+
+### 打电话
+
+方案选择:基于人体id分类的行为识别
+
+原因:打电话动作中虽然有手机这个特征目标,但为了区分看手机等动作,以及考虑到在安防场景下打电话动作中会出现较多对手机的遮挡(如手对手机的遮挡、人头对手机的遮挡等等),不利于检测模型正确检测到目标。同时打电话通常持续的时间较长,且人物本身的动作不会发生太大变化,因此可以采用帧级别图像分类的策略。此外,打电话这个动作主要可以通过上半身判别,可以采用半身图片,去除冗余信息以降低模型训练的难度。
+
+### 摔倒
+
+方案选择:基于人体骨骼点的行为识别
+
+原因:摔倒是一个明显的时序行为的动作,可由一个人物本身进行区分,具有场景无关的特性。由于PP-Human的场景定位偏向安防监控场景,背景变化较为复杂,且部署上需要考虑到实时性,因此采用了基于骨骼点的行为识别方案,以获得更好的泛化性及运行速度。
+
+### 闯入
+
+方案选择:基于人体id跟踪的行为识别
+
+原因:闯入识别只需判断行人的路径或所在位置是否在某区域内即可,与人体自身动作无关,因此只需要跟踪人体,并根据跟踪结果分析是否存在闯入行为。
+
+### 打架
+
+方案选择:基于视频分类的行为识别
+
+原因:与上面的动作不同,打架是一个典型的多人组成的行为。因此不再通过检测与跟踪模型来提取行人及其ID,而是对整体视频片段进行处理。此外,打架场景下各个目标间的互相遮挡极为严重,关键点识别的准确性不高,采用基于骨骼点的方案难以保证精度。
+
+
+下面详细展开五大类方案的数据准备、模型优化和新增行为识别方法:
+
+1. [基于人体id检测的行为识别](./idbased_det.md)
+2. [基于人体id分类的行为识别](./idbased_clas.md)
+3. [基于人体骨骼点的行为识别](./skeletonbased_rec.md)
+4. [基于人体id跟踪的行为识别](../pphuman_mot.md)
+5. [基于视频分类的行为识别](./videobased_rec.md)
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/README_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d04d426b7076abdd38a7317117f5daab6eeff0ad
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/README_en.md
@@ -0,0 +1,55 @@
+[简体中文](./README.md) | English
+
+# Secondary Development for Action Recognition Task
+
+In the process of industrial implementation, applying action recognition algorithms will inevitably lead to the need for customized types of action, or to the optimization of existing action recognition models to improve their performance in specific scenarios. In view of the diversity of behaviors, PP-Human supports the identification of five abnormal behaviors: smoking, making phone calls, falling, fighting, and people intrusion. According to the different behaviors, PP-Human integrates five technical solutions for action recognition, based on video classification, detection, image classification, tracking, and skeleton points, which can cover 90%+ of action type recognition and meet various development needs. In this document, we use cases to introduce how to select an action recognition solution according to the expected behavior, and how to use PaddleDetection to carry out the secondary development of action recognition algorithms, including: solution selection, data preparation, model optimization, and the development process for adding new actions.
+
+
+## Solution Selection
+In PaddleDetection's PP-Human, we provide a variety of solutions for action recognition: video-classification-based, image-classification-based, detection-based, tracking-based, and skeleton-point-based solutions, in order to meet the needs of different scenes and different target behaviors.
+
+image
+
+The following takes several specific actions that PaddleDetection currently supports as examples to introduce the selection basis of each action:
+
+### Smoking
+
+Solution selection: action recognition based on detection with human id.
+
+Reason: The smoking action has an obvious characteristic object, the cigarette. So we can consider that when a cigarette is detected in the image corresponding to a person, that person is performing the smoking action. Compared with video-based or skeleton-based recognition schemes, training a detection model requires collecting image-level rather than video-level data, which can significantly reduce the difficulty of data collection and labeling. In addition, the detection task has abundant pre-training model resources, so the performance of the overall model is better guaranteed.
+
+### Making Phone Calls
+
+Solution selection: action recognition based on classification with human id.
+
+Reason: Although there is a characteristic object, the mobile phone, in the phone-call action, in order to distinguish it from actions such as looking at the phone, and considering that in security scenes the phone is often heavily occluded during a call (e.g., occluded by the hand or the head), detection models struggle to detect the target correctly. At the same time, calls usually last a long time and the person's posture does not change much, so a frame-level image classification strategy can be employed. In addition, the action of making a phone call can mainly be judged from the upper body, so half-body images can be used to remove redundant information and reduce the difficulty of model training.
+
+
+### Falling
+
+Solution selection: action recognition based on skeleton.
+
+Reason: Falling is an obvious temporal action, which is distinguishable from a single person alone and is scene-independent. Since PP-Human is oriented towards security monitoring scenes, where background changes are complicated and real-time inference needs to be considered in deployment, action recognition based on skeleton points is adopted to obtain better generalization and running speed.
+
+
+### People Intrusion
+
+Solution selection: action recognition based on tracking with human id.
+
+Reason: Intrusion recognition only needs to judge whether the pedestrian's path or location is within a selected area, and it is unrelated to the pedestrian's own body action. Therefore, it is only necessary to track the human and use the coordinate results to analyze whether there is intrusion behavior.
+
+### Fighting
+
+Solution selection: action recognition based on video classification.
+
+Reason: Unlike the actions above, fighting is a typical multi-person action.
Therefore, the detection and tracking models are no longer used to extract pedestrians and their IDs; instead, the entire video clip is processed. In addition, the mutual occlusion between targets in fighting scenes is extremely serious, leading to poor keypoint recognition accuracy, so a skeleton-based solution can hardly guarantee precision.
+
+
+
+The following are detailed descriptions of the five major categories of solutions, including data preparation, model optimization and adding new actions.
+
+1. [action recognition based on detection with human id.](./idbased_det_en.md)
+2. [action recognition based on classification with human id.](./idbased_clas_en.md)
+3. [action recognition based on skeleton.](./skeletonbased_rec_en.md)
+4. [action recognition based on tracking with human id](../pphuman_mot_en.md)
+5. [action recognition based on video classification](./videobased_rec_en.md)
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md
new file mode 100644
index 0000000000000000000000000000000000000000..51f281835ab0b842a0718d726ae73a533587e82a
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md
@@ -0,0 +1,223 @@
+简体中文 | [English](./idbased_clas_en.md)
+
+# 基于人体id的分类模型开发
+
+## 环境准备
+
+基于人体id的分类方案是使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md)完成环境安装,以进行后续的模型训练及使用流程。
+
+## 数据准备
+
+基于图像分类的行为识别方案直接对视频中的图像帧结果进行识别,因此模型训练流程与通常的图像分类模型一致。
+
+### 数据集下载
+打电话的行为识别是基于公开数据集[UAV-Human](https://github.com/SUTDCV/UAV-Human)进行训练的。请通过该链接填写相关数据集申请材料后获取下载链接。
+
+在`UAVHuman/ActionRecognition/RGBVideos`路径下包含了该数据集中RGB视频数据集,每个视频的文件名即为其标注信息。
+
+### 训练及测试图像处理
+根据视频文件名,其中与行为识别相关的是`A`开头的字段(即action),我们可以找到期望识别的动作类型数据。
+- 正样本视频:以打电话为例,我们只需找到包含`A024`的文件。
+- 负样本视频:除目标动作以外所有的视频。
+
+鉴于视频数据转化为图像会有较多冗余,对于正样本视频,我们间隔8帧进行采样,并使用行人检测模型处理为半身图像(取检测框的上半部分,即`img = img[:H/2, :, :]`)。正样本视频中采样得到的图像即视为正样本,负样本视频中采样得到的图像即为负样本。
+
+**注意**: 正样本视频中并不完全符合打电话这一动作,在视频开头结尾部分会出现部分冗余动作,需要移除。
+
+### 标注文件准备
+
+基于图像分类的行为识别方案是借助[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)进行模型训练的。使用该方案训练的模型,需要准备期望识别的图像数据及对应标注文件。根据[PaddleClas数据集格式说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/classification_dataset.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%AF%B4%E6%98%8E)准备对应的数据即可。标注文件样例如下,其中`0`,`1`分别是图片对应所属的类别:
+```
+  # 每一行采用"空格"分隔图像路径与标注
+  train/000001.jpg 0
+  train/000002.jpg 0
+  train/000003.jpg 1
+  ...
+```
+
+此外,还需准备标签文件`phone_label_list.txt`,用于将分类序号映射到具体的类型名称:
+```
+0 make_a_phone_call # 类型0
+1 normal # 类型1
+```
+
+完成上述内容后,放置于`dataset`目录下,文件结构如下:
+```
+data/
+├── images # 放置所有图片
+├── phone_label_list.txt # 标签文件
+├── phone_train_list.txt # 训练列表,包含图片及其对应类型
+└── phone_val_list.txt # 测试列表,包含图片及其对应类型
+```
+
+## 模型优化
+
+### 检测-跟踪模型优化
+基于分类的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。
+
+
+### 半身图预测
+在打电话这一动作中,实际是通过上半身就能实现动作的区分的,因此在训练和预测过程中,将图像由行人全身图换为半身图。
+
+## 新增行为
+
+### 数据准备
+参考前述介绍的内容,完成数据准备的部分,放置于`{root of PaddleClas}/dataset`下:
+```
+data/
+├── images # 放置所有图片
+├── label_list.txt # 标签文件
+├── train_list.txt # 训练列表,包含图片及其对应类型
+└── val_list.txt # 测试列表,包含图片及其对应类型
+```
+其中,训练及测试列表如下:
+```
+  # 每一行采用"空格"分隔图像路径与标注
+  train/000001.jpg 0
+  train/000002.jpg 0
+  train/000003.jpg 1
+  train/000004.jpg 2 # 新增的类别直接填写对应类别号即可
+  ...
+```
+`label_list.txt`中需要同样对应扩展类型的名称:
+```
+0 make_a_phone_call # 类型0
+1 Your New Action # 类型1
+  ...
+n normal # 类型n
+```
+
+### 配置文件设置
+在PaddleClas中已经集成了[训练配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml),需要重点关注的设置项如下:
+
+```yaml
+# model architecture
+Arch:
+  name: PPHGNet_tiny
+  class_num: 2 # 对应新增后的数量
+
+  ...
+
+# 正确设置image_root与cls_label_path,保证image_root + cls_label_path中的图片路径能够正确访问图片路径
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/
+      cls_label_path: ./dataset/phone_train_list_halfbody.txt
+
+  ...
+
+Infer:
+  infer_imgs: docs/images/inference_deployment/whl_demo.jpg
+  batch_size: 1
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 2 # 显示topk的数量,不要超过类别总数
+    class_id_map_file: dataset/phone_label_list.txt # 修改后的label_list.txt路径
+```
+
+### 模型训练及评估
+#### 模型训练
+通过如下命令启动训练:
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
+        -o Arch.pretrained=True
+```
+其中 `Arch.pretrained` 为 `True` 表示使用预训练权重帮助训练。
+
+#### 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
+    -o Global.pretrained_model=output/PPHGNet_tiny/best_model
+```
+
+其中 `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+### 模型导出
+模型导出的详细介绍请参考[这里](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model)。
+可以参考以下步骤实现:
+```bash
+python tools/export_model.py \
+    -c ./PPHGNet_tiny_calling_halfbody.yaml \
+    -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
+    -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
+```
+然后将导出的模型重命名,并加入配置文件,以适配PP-Human的使用。
+```bash
+cd ./output_inference/PPHGNet_tiny_calling_halfbody
+
+mv inference.pdiparams model.pdiparams
+mv inference.pdiparams.info model.pdiparams.info
+mv inference.pdmodel model.pdmodel
+
+# 下载预测配置文件
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
+```
+
+至此,即可使用PP-Human进行实际预测了。
+
+
+### 自定义行为输出
+基于人体id的分类的行为识别方案中,将任务转化为对应人物的图像进行图片级别的分类。对应分类的类型最终即视为当前阶段的行为。因此在完成自定义模型的训练及部署的基础上,还需要将分类模型结果转化为最终的行为识别结果作为输出,并修改可视化的显示结果。 + +#### 转换为行为识别结果 +请对应修改[后处理函数](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509)。 + +核心代码为: +```python +# 确定分类模型的最高分数输出结果 +cls_id_res = 1 +cls_score_res = -1.0 +for cls_id in range(len(cls_result[idx])): + score = cls_result[idx][cls_id] + if score > cls_score_res: + cls_id_res = cls_id + cls_score_res = score + +# Current now, class 0 is positive, class 1 is negative. +if cls_id_res == 1 or (cls_id_res == 0 and + cls_score_res < self.threshold): + # 如果分类结果不是目标行为或是置信度未达到阈值,则根据历史结果确定当前帧的行为 + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, self.frame_life, -1.0]) + cls_id_res = history_cls + cls_score_res = 1 - cls_score_res + life_remain -= 1 + if life_remain <= 0 and tracker_id in self.result_history: + del (self.result_history[tracker_id]) + elif tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + else: + self.result_history[ + tracker_id] = [cls_id_res, life_remain, cls_score_res] +else: + # 分类结果属于目标行为,则使用将该结果,并记录到历史结果中 + self.result_history[ + tracker_id] = [cls_id_res, self.frame_life, cls_score_res] + + ... +``` + +#### 修改可视化输出 +目前基于ID的行为识别,是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。 diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_clas_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_clas_en.md new file mode 100644 index 0000000000000000000000000000000000000000..fc28ccc7029c7ea7f1e63d0ee4f97962747e7ad3 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_clas_en.md @@ -0,0 +1,224 @@ +[简体中文](./idbased_clas.md) | English + +# Development for Action Recognition Based on Classification with Human ID + +## Environmental Preparation +The model of action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Please refer to [Install PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md) to complete the environment installation for subsequent model training and usage processes. + +## Data Preparation + +The model of action recognition based on classification with human id directly recognizes the image frames of video, so the model training process is same with the usual image classification model. + +### Dataset Download + +The action recognition of making phone calls is trained on the public dataset [UAV-Human](https://github.com/SUTDCV/UAV-Human). Please fill in the relevant application materials through this link to obtain the download link. + +The RGB video in this dataset is included in the `UAVHuman/ActionRecognition/RGBVideos` path, and the file name of each video is its annotation information. + +### Image Processing for Training and Validation +According to the video file name, in which the `A` field (i.e. action) related to action recognition, we can find the action type of the video data that we expect to recognize. +- Positive sample video: Taking phone calls as an example, we just need to find the file containing `A024`. +- Negative sample video: All videos except the target action. 
+
+In view of the fact that there will be much redundancy when converting video data into images, for positive sample videos, we sample at intervals of 8 frames, and use a pedestrian detection model to crop each sampled frame into a half-body image (take the upper half of the detection box, that is, `img = img[:H/2, :, :]`). The images sampled from the positive sample videos are regarded as positive samples, and the images sampled from the negative sample videos are regarded as negative samples.
+
+**Note**: The positive sample videos do not consist entirely of the phone-call action; there are some redundant actions at the beginning and end of each video, which need to be removed.
+
+
+### Preparation for Annotation File
+The model of action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Therefore, to train a model with this scheme, you need to prepare the desired image data and the corresponding annotation files. Please refer to [Image Classification Datasets](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/data_preparation/classification_dataset_en.md) to prepare the data. An example of an annotation file is as follows, where `0` and `1` are the corresponding categories of the images:
+
+```
+  # Each line uses "space" to separate the image path and label
+  train/000001.jpg 0
+  train/000002.jpg 0
+  train/000003.jpg 1
+  ...
+```
+
+Additionally, the label file `phone_label_list.txt` helps map category numbers to specific type names:
+```
+0 make_a_phone_call # type 0
+1 normal # type 1
+```
+
+After the above steps are finished, place the data in the `dataset` directory; the file structure is as follows:
+```
+data/
+├── images # All images
+├── phone_label_list.txt # Label file
+├── phone_train_list.txt # Training list, including pictures and their corresponding types
+└── phone_val_list.txt # Validation list, including pictures and their corresponding types
+```
+
+## Model Optimization
+
+### Detection-Tracking Model Optimization
+The performance of action recognition based on classification with human id depends on the upstream detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign the person ID between different frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/tracking model optimization.
+
+
+### Half-Body Prediction
+In the action of making a phone call, the action classification can be achieved through the upper-body image. Therefore, during the training and prediction process, the image is changed from the pedestrian full-body image to a half-body image, as illustrated by the sketch below.
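+
+As a reference, here is a minimal, hedged sketch of the frame sampling and half-body cropping described above (the video file name and output paths are hypothetical, and a real pipeline would crop the detected person box rather than the whole frame):
+```python
+import cv2
+
+# Sample every 8th frame from a positive video and keep the upper half as the
+# half-body image. Assumes 'A024_sample.avi' exists and 'train/' is writable.
+cap = cv2.VideoCapture('A024_sample.avi')
+idx, saved = 0, 0
+while True:
+    ok, frame = cap.read()
+    if not ok:
+        break
+    if idx % 8 == 0:  # sample at intervals of 8 frames
+        h = frame.shape[0]
+        half_body = frame[:h // 2, :, :]  # i.e. img = img[:H/2, :, :]
+        cv2.imwrite('train/%06d.jpg' % saved, half_body)
+        saved += 1
+    idx += 1
+cap.release()
+```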
+
+## Add New Action
+
+### Data Preparation
+Referring to the previous introduction, complete the data preparation part and place it under `{root of PaddleClas}/dataset`:
+
+```
+data/
+├── images # All images
+├── label_list.txt # Label file
+├── train_list.txt # Training list, including pictures and their corresponding types
+└── val_list.txt # Validation list, including pictures and their corresponding types
+```
+The training list and validation list files are as follows:
+```
+  # Each line uses "space" to separate the image path and label
+  train/000001.jpg 0
+  train/000002.jpg 0
+  train/000003.jpg 1
+  train/000004.jpg 2 # For the newly added categories, simply fill in the corresponding category number.
+  ...
+```
+`label_list.txt` should likewise contain the names of the extended types:
+```
+0 make_a_phone_call # class 0
+1 Your New Action # class 1
+  ...
+n normal # class n
+```
+
+### Configuration File Settings
+The [training configuration file](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml) has been integrated into PaddleClas. The settings that need special attention are as follows:
+
+```yaml
+# model architecture
+Arch:
+  name: PPHGNet_tiny
+  class_num: 2 # Corresponding to the number of action categories
+
+  ...
+
+# Please correctly set image_root and cls_label_path to ensure that image_root + the image path in cls_label_path can access the images correctly
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/
+      cls_label_path: ./dataset/phone_train_list_halfbody.txt
+
+  ...
+
+Infer:
+  infer_imgs: docs/images/inference_deployment/whl_demo.jpg
+  batch_size: 1
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 2 # The number of topk results to display; do not exceed the total number of categories
+    class_id_map_file: dataset/phone_label_list.txt # path of the modified label_list.txt
+```
+
+### Model Training And Evaluation
+#### Model Training
+Start training with the following command:
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
+        -o Arch.pretrained=True
+```
+where `Arch.pretrained=True` means using pretrained weights to help with training.
+
+#### Model Evaluation
+After training the model, use the following command to evaluate the model metrics.
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
+    -o Global.pretrained_model=output/PPHGNet_tiny/best_model
+```
+Where `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` specifies the path where the current best weights are located. If other weights are needed, just replace the corresponding path.
+
+### Model Export
+For a detailed introduction to model export, please refer to [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model).
+You can refer to the following steps:
+
+```bash
+python tools/export_model.py \
+    -c ./PPHGNet_tiny_calling_halfbody.yaml \
+    -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
+    -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
+```
+
+Then rename the exported model and add the configuration file to suit the usage of PP-Human.
+```bash
+cd ./output_inference/PPHGNet_tiny_calling_halfbody
+
+mv inference.pdiparams model.pdiparams
+mv inference.pdiparams.info model.pdiparams.info
+mv inference.pdmodel model.pdmodel
+
+# Download configuration file for inference
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
+```
+
+At this point, this model can be used in PP-Human.
+
+### Custom Action Output
+In the model of action recognition based on classification with human id, the task is defined as an image-level classification task for the corresponding person. The type of the corresponding classification is finally regarded as the action type of the current stage. Therefore, on the basis of completing the training and deployment of the custom model, it is also necessary to convert the classification model results into the final action recognition results as output, and to modify the displayed result of the visualization.
+
+#### Convert to Action Recognition Result
+Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509).
+
+The core code is:
+```python
+# Get the highest-score output of the classification model
+cls_id_res = 1
+cls_score_res = -1.0
+for cls_id in range(len(cls_result[idx])):
+    score = cls_result[idx][cls_id]
+    if score > cls_score_res:
+        cls_id_res = cls_id
+        cls_score_res = score
+
+# Currently, class 0 is positive, class 1 is negative.
+if cls_id_res == 1 or (cls_id_res == 0 and
+                       cls_score_res < self.threshold):
+    # If the classification result is not the target action or its confidence does not reach the threshold,
+    # determine the action type of the current frame according to the historical results
+    history_cls, life_remain, history_score = self.result_history.get(
+        tracker_id, [1, self.frame_life, -1.0])
+    cls_id_res = history_cls
+    cls_score_res = 1 - cls_score_res
+    life_remain -= 1
+    if life_remain <= 0 and tracker_id in self.result_history:
+        del (self.result_history[tracker_id])
+    elif tracker_id in self.result_history:
+        self.result_history[tracker_id][1] = life_remain
+    else:
+        self.result_history[
+            tracker_id] = [cls_id_res, life_remain, cls_score_res]
+else:
+    # If the classification result belongs to the target action, use the result and record it in the historical results
+    self.result_history[
+        tracker_id] = [cls_id_res, self.frame_life, cls_score_res]
+
+    ...
+```
+
+#### Modify Visual Output
+At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For details, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be displayed under another name, please modify it accordingly to output the corresponding result.
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md new file mode 100644 index 0000000000000000000000000000000000000000..4d8495690ae0b6b10fbb113e56fe72f41ea5d73d --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md @@ -0,0 +1,202 @@ +简体中文 | [English](./idbased_det_en.md) + +# 基于人体id的检测模型开发 + +## 环境准备 + +基于人体id的检测方案是直接使用[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/INSTALL_cn.md)完成环境安装,以进行后续的模型训练及使用流程。 + +## 数据准备 +基于检测的行为识别方案中,数据准备的流程与一般的检测模型一致,详情可参考[目标检测数据准备](../../../tutorials/data/PrepareDetDataSet.md)。将图像和标注数据组织成PaddleDetection中支持的格式之一即可。 + +**注意** : 在实际使用的预测过程中,使用的是单人图像进行预测,因此在训练过程中建议将图像裁剪为单人图像,再进行烟头检测框的标注,以提升准确率。 + + +## 模型优化 + +### 检测-跟踪模型优化 +基于检测的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以正确在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。 + + +### 更大的分辨率 +烟头的检测在监控视角下是一个典型的小目标检测问题,使用更大的分辨率有助于提升模型整体的识别率 + +### 预训练模型 +加入小目标场景数据集VisDrone下的预训练模型进行训练,模型mAP由38.1提升到39.7。 + +## 新增行为 +### 数据准备 +参考[目标检测数据准备](../../../tutorials/data/PrepareDetDataSet.md)完成训练数据准备。 + +准备完成后,数据路径为 +``` +dataset/smoking +├── smoking # 存放所有的图片 +│   ├── 1.jpg +│   ├── 2.jpg +├── smoking_test_cocoformat.json # 测试标注文件 +├── smoking_train_cocoformat.json # 训练标注文件 +``` + +以`COCO`格式为例,完成后的json标注文件内容如下: + +```json +# images字段下包含了图像的路径,id及对应宽高信息 + "images": [ + { + "file_name": "smoking/1.jpg", + "id": 0, # 此处id为图片id序号,不要重复 + "height": 437, + "width": 212 + }, + { + "file_name": "smoking/2.jpg", + "id": 1, + "height": 655, + "width": 365 + }, + + ... + +# categories 字段下包含所有类别信息,如果希望新增更多的检测类别,请在这里增加, 示例如下。 + "categories": [ + { + "supercategory": "cigarette", + "id": 1, + "name": "cigarette" + }, + { + "supercategory": "Class_Defined_by_Yourself", + "id": 2, + "name": "Class_Defined_by_Yourself" + }, + + ... 
+ +# annotations 字段下包含了所有目标实例的信息,包括类别,检测框坐标, id, 所属图像id等信息 + "annotations": [ + { + "category_id": 1, # 对应定义的类别,在这里1代表cigarette + "bbox": [ + 97.0181345931, + 332.7033243081, + 7.5943999555, + 16.4545332369 + ], + "id": 0, # 此处id为实例的id序号,不要重复 + "image_id": 0, # 此处为实例所在图片的id序号,可能重复,此时即一张图片上有多个实例对象 + "iscrowd": 0, + "area": 124.96230648208665 + }, + { + "category_id": 2, # 对应定义的类别,在这里2代表Class_Defined_by_Yourself + "bbox": [ + 114.3895698372, + 221.9131122343, + 25.9530363697, + 50.5401234568 + ], + "id": 1, + "image_id": 1, + "iscrowd": 0, + "area": 1311.6696622034585 +``` + +### 配置文件设置 +参考[配置文件](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml), 其中需要关注重点如下: + +```yaml +metric: COCO +num_classes: 1 # 如果新增了更多的类别,请对应修改此处 + +# 正确设置image_dir,anno_path,dataset_dir +# 保证dataset_dir + anno_path 能正确对应标注文件的路径 +# 保证dataset_dir + image_dir + 标注文件中的图片路径可以正确对应到图片路径 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_train_cocoformat.json + dataset_dir: dataset/smoking + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking + +TestDataset: + !ImageFolder + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking +``` + +### 模型训练及评估 +#### 模型训练 + +参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),执行下列步骤实现 +```bash +# At Root of PaddleDetection + +python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval +``` + +#### 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估 +```bash +# At Root of PaddleDetection + +python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml +``` + +### 模型导出 +注意:如果在Tensor-RT环境下预测, 请开启`-o trt=True`以获得更好的性能 +```bash +# At Root of PaddleDetection + +python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True +``` + +导出模型后,可以得到: +``` +ppyoloe_crn_s_80e_smoking_visdrone/ +├── infer_cfg.yml +├── model.pdiparams +├── model.pdiparams.info +└── model.pdmodel +``` + +至此,即可使用PP-Human进行实际预测了。 + + +### 自定义行为输出 +基于人体id的检测的行为识别方案中,将任务转化为在对应人物的图像中检测目标特征对象。当目标特征对象被检测到时,则视为行为正在发生。因此在完成自定义模型的训练及部署的基础上,还需要将检测模型结果转化为最终的行为识别结果作为输出,并修改可视化的显示结果。 + +#### 转换为行为识别结果 +请对应修改[后处理函数](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/pphuman/action_infer.py#L338)。 +核心代码为: +```python +# 解析检测模型输出,并筛选出置信度高于阈值的有效检测框。 +# Current now, class 0 is positive, class 1 is negative. +action_ret = {'class': 1.0, 'score': -1.0} +box_num = np_boxes_num[idx] +boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num] +cur_box_idx += box_num +isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0) +valid_boxes = boxes[isvalid, :] + +if valid_boxes.shape[0] >= 1: + # 存在有效检测框时,行为识别结果的类别和分数对应修改 + action_ret['class'] = valid_boxes[0, 0] + action_ret['score'] = valid_boxes[0, 1] + # 由于动作的持续性,有效检测结果可复用一定帧数 + self.result_history[ + tracker_id] = [0, self.frame_life, valid_boxes[0, 1]] +else: + # 不存在有效检测框,则根据历史检测数据确定当前帧的结果 + ... 
+``` + +#### 修改可视化输出 +目前基于ID的行为识别,是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。 diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_det_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_det_en.md new file mode 100644 index 0000000000000000000000000000000000000000..9a5a907986c6d7c0a9151411469980b1ff668b52 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/idbased_det_en.md @@ -0,0 +1,199 @@ +[简体中文](./idbased_det.md) | English + +# Development for Action Recognition Based on Detection with Human ID + +## Environmental Preparation +The model of action recognition based on detection with human id is trained with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Please refer to [Installation](../../../tutorials/INSTALL.md) to complete the environment installation for subsequent model training and usage processes. + +## Data Preparation + +The model of action recognition based on detection with human id directly recognizes the image frames of video, so the model training process is same with preparation process of general detection model. For details, please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md). Please process image and annotation of data into one of the formats PaddleDetection supports. + +**Note**: In the actual prediction process, a single person image is used for prediction. So it is recommended to crop the image into a single person image during the training process, and label the cigarette detection bounding box to improve the accuracy. + + +## Model Optimization +### Detection-Tracking Model Optimization +The performance of action recognition based on detection with human id depends on the pre-order detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign the person ID between different frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/track model optimization. + + +### Larger resolution +The detection of cigarette is a typical small target detection problem from the monitoring perspective. Using a larger resolution can help improve the overall performance of the model. + +### Pretrained model +The pretrained model under the small target scene dataset VisDrone is used for training, and the mAP of the model is increased from 38.1 to 39.7. + +## Add New Action +### Data Preparation +please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md) to complete the data preparation part. + +When finish this step, the path will look like: +``` +dataset/smoking +├── smoking # all images +│   ├── 1.jpg +│   ├── 2.jpg +├── smoking_test_cocoformat.json # Validation file +├── smoking_train_cocoformat.json # Training file +``` + +Taking the `COCO` format as an example, the content of the completed json annotation file is as follows: + +```json +# The "images" field contains the path, id and corresponding width and height information of the images. 
+    "images": [
+        {
+            "file_name": "smoking/1.jpg",
+            "id": 0, # Here id is the image id; it must not be duplicated
+            "height": 437,
+            "width": 212
+        },
+        {
+            "file_name": "smoking/2.jpg",
+            "id": 1,
+            "height": 655,
+            "width": 365
+        },
+
+        ...
+
+# The "categories" field contains all category information. If you want to add more detection categories, please add them here. The example is as follows.
+    "categories": [
+        {
+            "supercategory": "cigarette",
+            "id": 1,
+            "name": "cigarette"
+        },
+        {
+            "supercategory": "Class_Defined_by_Yourself",
+            "id": 2,
+            "name": "Class_Defined_by_Yourself"
+        },
+
+        ...
+
+# The "annotations" field contains information about all instances, including category, bounding box coordinates, id, image id and other information
+    "annotations": [
+        {
+            "category_id": 1, # Corresponding to the defined category, where 1 represents cigarette
+            "bbox": [
+                97.0181345931,
+                332.7033243081,
+                7.5943999555,
+                16.4545332369
+            ],
+            "id": 0, # Here id is the instance id; it must not be duplicated
+            "image_id": 0, # Here is the id of the image where the instance is located; it may be duplicated, meaning that one image contains multiple instances
+            "iscrowd": 0,
+            "area": 124.96230648208665
+        },
+        {
+            "category_id": 2, # Corresponding to the defined category, where 2 represents Class_Defined_by_Yourself
+            "bbox": [
+                114.3895698372,
+                221.9131122343,
+                25.9530363697,
+                50.5401234568
+            ],
+            "id": 1,
+            "image_id": 1,
+            "iscrowd": 0,
+            "area": 1311.6696622034585
+```
+
+### Configuration File Settings
+Refer to the [Configuration File](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml); the key items to pay attention to are as follows:
+```yaml
+metric: COCO
+num_classes: 1 # If more categories are added, please modify here accordingly
+
+# Set image_dir, anno_path and dataset_dir correctly
+# Ensure that dataset_dir + anno_path can correctly locate the annotation file
+# Ensure that dataset_dir + image_dir + the image path in the annotation file can correctly locate the images
+TrainDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: smoking_train_cocoformat.json
+    dataset_dir: dataset/smoking
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: smoking_test_cocoformat.json
+    dataset_dir: dataset/smoking
+
+TestDataset:
+  !ImageFolder
+    anno_path: smoking_test_cocoformat.json
+    dataset_dir: dataset/smoking
+```
+
+### Model Training And Evaluation
+#### Model Training
+Referring to [PP-YOLOE](../../../../configs/ppyoloe/README.md), start training with the following command:
+```bash
+# At Root of PaddleDetection
+
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
+```
+
+#### Model Evaluation
+After training the model, use the following command to evaluate the model metrics.
+
+```bash
+# At Root of PaddleDetection
+
+python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
+```
+
+#### Model Export
+Note: If you run inference in a TensorRT environment, please enable `-o trt=True` for better performance.
+```bash
+# At Root of PaddleDetection
+
+python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
+```
+
+After exporting the model, you can get:
+```
+ppyoloe_crn_s_80e_smoking_visdrone/
+├── infer_cfg.yml
+├── model.pdiparams
+├── model.pdiparams.info
+└── model.pdmodel
+```
+
+At this point, this model can be used in PP-Human.
+
+### Custom Action Output
+In action recognition based on detection with human id, the task is converted into detecting target feature objects in the image of the corresponding person. When the target object is detected, the action is considered to be occurring. Therefore, on the basis of completing the training and deployment of the custom model, it is also necessary to convert the detection results into the final action recognition result for output, and the displayed result of the visualization should be modified accordingly.
+
+#### Convert to Action Recognition Result
+Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/pphuman/action_infer.py#L338).
+
+The core code is:
+```python
+# Parse the detection model output and filter out valid detection boxes with confidence higher than a threshold.
+# Currently, class 0 is positive and class 1 is negative.
+action_ret = {'class': 1.0, 'score': -1.0}
+box_num = np_boxes_num[idx]
+boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num]
+cur_box_idx += box_num
+isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0)
+valid_boxes = boxes[isvalid, :]
+
+if valid_boxes.shape[0] >= 1:
+    # When there is a valid detection box, the category and score of the action recognition result are modified accordingly.
+    action_ret['class'] = valid_boxes[0, 0]
+    action_ret['score'] = valid_boxes[0, 1]
+    # Due to the continuity of the action, valid detection results can be reused for a certain number of frames.
+    self.result_history[
+        tracker_id] = [0, self.frame_life, valid_boxes[0, 1]]
+else:
+    # If there is no valid detection box, the result of the current frame is determined from the historical detection results.
+    ...
+```
+
+#### Modify Visual Output
+At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For details, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be shown under another display name, please modify it accordingly to output the corresponding result.
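+
+For reference, the mapping from a recognized class id to the displayed text can be kept in a small lookup table. The sketch below is illustrative only: `action_to_display_text` and `get_action_text` are hypothetical names rather than part of the pipeline, and the result dict with `'class'` and `'score'` keys follows the postprocessing code above.
+
+```python
+# Hypothetical sketch: map a custom action class id to a display name.
+# Adapt this to the actual display logic in pipeline.py#L1024-L1043.
+action_to_display_text = {0: "Smoking"}  # extend with your own class ids/names
+
+
+def get_action_text(action_res, default_text=""):
+    """Return the label to draw for one tracked person.
+
+    `action_res` is assumed to be the dict produced by the postprocessing
+    step above, e.g. {'class': 0.0, 'score': 0.87}; class 0 is positive.
+    """
+    if action_res is None:
+        return default_text
+    class_id = int(action_res["class"])
+    return action_to_display_text.get(class_id, default_text)
+```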
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md
new file mode 100644
index 0000000000000000000000000000000000000000..e777180fbd373829ca002f3d3d210c64ac090ae9
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md
@@ -0,0 +1,205 @@
+简体中文 | [English](./skeletonbased_rec_en.md)
+
+# 基于人体骨骼点的行为识别
+
+## 环境准备
+
+基于骨骼点的行为识别方案是借助[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/install.md)完成PaddleVideo的环境安装,以进行后续的模型训练及使用流程。
+
+## 数据准备
+使用该方案训练模型前,可以参考[此文档](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE)准备训练数据,以适配PaddleVideo进行训练,其主要流程包含以下步骤:
+
+
+### 数据格式说明
+STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo中,训练数据为采用`.npy`格式存储的`Numpy`数据,标签则可以是`.npy`或`.pkl`格式存储的文件。对于序列数据的维度要求为`(N,C,T,V,M)`,当前方案仅支持单人构成的行为(但视频中可以存在多人,每个人独自进行行为识别判断),即`M=1`。
+
+| 维度 | 大小 | 说明 |
+| ---- | ---- | ---------- |
+| N | 不定 | 数据集序列个数 |
+| C | 2 | 关键点坐标维度,即(x, y) |
+| T | 50 | 动作序列的时序维度(即持续帧数)|
+| V | 17 | 每个人物关键点的个数,这里我们使用了`COCO`数据集的定义,具体可见[这里](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_cn.md#COCO%E6%95%B0%E6%8D%AE%E9%9B%86) |
+| M | 1 | 人物个数,这里我们每个动作序列只针对单人预测 |
+
+### 获取序列的骨骼点坐标
+对于一个待标注的序列(这里序列指一个动作片段,可以是视频或有顺序的图片集合),可以通过模型预测或人工标注的方式获取骨骼点(也称为关键点)坐标。
+- 模型预测:可以直接选用[PaddleDetection KeyPoint模型系列](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/keypoint) 模型库中的模型,并根据`3、训练与测试 - 部署预测 - 检测+keypoint top-down模型联合部署`中的步骤获取目标序列的17个关键点坐标。
+- 人工标注:若对关键点的数量或是定义有其他需求,也可以直接人工标注各个关键点的坐标位置,注意对于被遮挡或较难标注的点,仍需要标注一个大致坐标,否则后续网络学习过程会受到影响。
+
+
+当使用模型预测获取时,可以参考如下步骤进行,请注意此时在PaddleDetection中进行操作。
+
+```bash
+# current path is under root of PaddleDetection
+
+# Step 1: download pretrained inference models.
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
+unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
+unzip -d output_inference/ dark_hrnet_w32_256x192.zip
+
+# Step 2: Get the keypoint coordinates
+
+# if your data is image sequence
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
+
+# if your data is video
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
+```
+这样我们会得到一个`det_keypoint_unite_image_results.json`的检测结果文件。内容的具体含义请见[这里](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108)。
+
+
+### 统一序列的时序长度
+由于实际数据中每个动作的长度不一,首先需要根据您的数据和实际场景预定时序长度(在PP-Human中我们采用50帧为一个动作序列),并对数据做以下处理:
+- 实际长度超过预定长度的数据:随机截取一个50帧的片段
+- 实际长度不足预定长度的数据:补0,直到满足50帧
+- 恰好等于预定长度的数据:无需处理
+
+注意:在这一步完成后,请严格确认处理后的数据仍然包含了一个完整的行为动作,不会产生预测上的歧义,建议通过可视化数据的方式进行确认。
+
+### 保存为PaddleVideo可用的文件格式
+在经过前两步处理后,我们得到了每个人物动作片段的标注,此时我们已有一个列表`all_kpts`,这个列表中包含多个关键点序列片段,其中每一个片段形状为(T, V, C)(在我们的例子中即(50, 17, 2)),下面进一步将其转化为PaddleVideo可用的格式。
+- 调整维度顺序:可通过`np.transpose`和`np.expand_dims`将每一个片段的维度转化为(C, T, V, M)的格式。
+- 将所有片段组合并保存为一个文件
+
+注意:这里的`class_id`是`int`类型,与其他分类任务类似。例如`0:摔倒, 1:其他`。
+
+
+我们提供了执行该步骤的[脚本文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py),可以直接处理生成的`det_keypoint_unite_image_results.json`文件,该脚本执行的内容包括解析json文件内容、前述步骤中介绍的整理训练数据及保存数据文件。
+
+```bash
+mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
+
+mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
+
+cd {root of PaddleVideo}/applications/PPHuman/datasets/
+
+python prepare_dataset.py
+```
+
+至此,我们得到了可用的训练数据(`.npy`)和对应的标注文件(`.pkl`)。
+
+
+## 模型优化
+
+### 检测-跟踪模型优化
+基于骨骼点的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。
+
+### 关键点模型优化
+骨骼点作为该方案的核心特征,行人骨骼点的定位效果也决定了行为识别的整体效果。若发现在实际场景中对关键点坐标的识别结果有明显错误,从关键点组成的骨架图像看,已经难以辨别具体动作,可以参考[关键点检测任务二次开发](../keypoint_detection.md)对关键点模型进行优化。
+
+### 坐标归一化处理
+在完成骨骼点坐标的获取后,建议根据各人物的检测框进行归一化处理,以消除人物位置、尺度的差异给网络带来的收敛难度。
+
+
+## 新增行为
+
+基于关键点的行为识别方案中,行为识别模型使用的是[ST-GCN](https://arxiv.org/abs/1801.07455),并在[PaddleVideo训练流程](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md)的基础上修改适配,完成模型训练及导出使用流程。
+
+
+### 数据准备与配置文件修改
+- 按照`数据准备`,准备训练数据(`.npy`)和对应的标注文件(`.pkl`),对应放置在`{root of PaddleVideo}/applications/PPHuman/datasets/`下。
+
+- 参考[配置文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml),需要重点关注的内容如下:
+
+```yaml
+MODEL: #MODEL field
+  framework:
+  backbone:
+    name: "STGCN"
+    in_channels: 2 # 此处对应数据说明中的C维,表示二维坐标。
+    dropout: 0.5
+    layout: 'coco_keypoint'
+    data_bn: True
+  head:
+    name: "STGCNHead"
+    num_classes: 2 # 如果数据中有多种行为类型,需要修改此处使其与预测类型数目一致。
+    if_top5: False # 行为类型数量不足5时请设置为False,否则会报错
+
+...
+
+
+# 请根据数据路径正确设置train/valid/test部分的数据及label路径
+DATASET: #DATASET field
+  batch_size: 64
+  num_workers: 4
+  test_batch_size: 1
+  test_num_workers: 0
+  train:
+    format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+    file_path: "./applications/PPHuman/datasets/train_data.npy" #Mandatory, train data index file path
+    label_path: "./applications/PPHuman/datasets/train_label.pkl"
+
+  valid:
+    format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
+    label_path: "./applications/PPHuman/datasets/val_label.pkl"
+
+    test_mode: True
+  test:
+    format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
+    label_path: "./applications/PPHuman/datasets/val_label.pkl"
+
+    test_mode: True
+```
+
+### 模型训练与测试
+- 在PaddleVideo中,使用以下命令即可开始训练:
+
+```bash
+# current path is under root of PaddleVideo
+python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
+
+# 由于整个任务可能过拟合,建议同时开启验证以保存最佳模型
+python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
+```
+
+- 在训练完成后,采用以下命令进行预测:
+```bash
+python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams
+```
+
+### 模型导出
+- 在PaddleVideo中,通过以下命令实现模型的导出,得到模型结构文件`STGCN.pdmodel`和模型权重文件`STGCN.pdiparams`,并增加配置文件:
+```bash
+# current path is under root of PaddleVideo
+python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \
+                                -p output/STGCN/STGCN_best.pdparams \
+                                -o output_inference/STGCN
+
+cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN
+
+# 重命名模型文件,适配PP-Human的调用
+cd output_inference/STGCN
+mv STGCN.pdiparams model.pdiparams
+mv STGCN.pdiparams.info model.pdiparams.info
+mv STGCN.pdmodel model.pdmodel
+```
+完成后的导出模型目录结构如下:
+```
+STGCN
+├── infer_cfg.yml
+├── model.pdiparams
+├── model.pdiparams.info
+├── model.pdmodel
+```
+
+至此,就可以使用PP-Human进行行为识别的推理了。
+
+**注意**:如果在训练时调整了视频序列的长度或关键点的数量,在此处需要对应修改配置文件中`INFERENCE`字段内容,以实现正确预测。
+```yaml
+# 序列数据的维度为(N,C,T,V,M)
+INFERENCE:
+  name: 'STGCN_Inference_helper'
+  num_channels: 2 # 对应C维
+  window_size: 50 # 对应T维,请对应调整为数据长度
+  vertex_nums: 17 # 对应V维,请对应调整为关键点数目
+  person_nums: 1 # 对应M维
+```
+
+### 自定义行为输出
+基于人体骨骼点的行为识别方案中,模型输出的分类结果即代表了该人物在一定时间段内的行为类型。对应分类的类型最终即视为当前阶段的行为。因此在完成自定义模型的训练及部署的基础上,使用模型输出作为最终结果,修改可视化的显示结果即可。
+
+#### 修改可视化输出
+目前基于ID的行为识别,是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..754fb3fa19e36a03ea78d6b4ccc8790eecbb2bcc
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec_en.md
@@ -0,0 +1,200 @@
+[简体中文](./skeletonbased_rec.md) | English
+
+# Skeleton-based action recognition
+
+## Environmental Preparation
+The skeleton-based action recognition is trained with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo).
Please refer to [Installation](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/install.md) to complete the environment installation for the subsequent model training and usage processes.
+
+## Data Preparation
+To train the skeleton-based model, you can refer to [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) to prepare the training data adapted to PaddleVideo. The main process includes the following steps:
+
+
+### Data Format Description
+STGCN is a model based on the sequence of skeleton point coordinates. In PaddleVideo, the training data is `Numpy` data stored in `.npy` format, and labels can be files stored in `.npy` or `.pkl` format. The dimension requirement for sequence data is `(N,C,T,V,M)`; the current solution only supports actions performed by a single person (but there can be multiple people in the video, and each person performs action recognition separately), that is, `M=1`.
+
+| Dim | Size | Description |
+| ---- | ---- | ---------- |
+| N | Not Fixed | The number of sequences in the dataset |
+| C | 2 | Keypoint coordinate, i.e. (x, y) |
+| T | 50 | The temporal dimension of the action sequence (i.e. the number of continuous frames)|
+| V | 17 | The number of keypoints of each person, here we use the definition of the `COCO` dataset, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_en.md#description-for-coco-datasetkeypoint) |
+| M | 1 | The number of persons, here we only predict a single person for each action sequence |
+
+### Get The Skeleton Point Coordinates of The Sequence
+For a sequence to be labeled (here a sequence refers to an action segment, which can be a video or an ordered collection of pictures), the coordinates of skeletal points (also known as keypoints) can be obtained through model prediction or manual annotation.
+- Model prediction: You can directly select a model from the [PaddleDetection KeyPoint Models](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/README_en.md) and follow the steps in `3. Training and Testing - Deployment Prediction - Detection + keypoint top-down model joint deployment` to get the 17 keypoint coordinates of the target sequence.
+- Manual annotation: If you have other requirements for the number or definition of the keypoints, you can also manually annotate the coordinates of each keypoint. Note that for occluded or hard-to-annotate points, an approximate coordinate still needs to be annotated; otherwise the subsequent network training will be affected.
+
+When using model prediction to obtain the coordinates, you can refer to the following steps; please note that these operations are performed in PaddleDetection.
+
+```bash
+# current path is under root of PaddleDetection
+
+# Step 1: download pretrained inference models.
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
+unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
+unzip -d output_inference/ dark_hrnet_w32_256x192.zip
+
+# Step 2: Get the keypoint coordinates
+
+# if your data is image sequence
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
+
+# if your data is video
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
+```
+We can get a detection result file named `det_keypoint_unite_image_results.json`. The detailed meaning of its content can be seen at [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108).
+
+
+### Uniform Sequence Length
+Since the length of each action in the actual data is different, the first step is to pre-determine the temporal sequence length according to your data and the actual scene (in PP-Human, we use 50 frames for an action sequence), and do the following processing to the data:
+- Data whose actual length exceeds the predetermined length: randomly crop a 50-frame segment
+- Data whose actual length is less than the predetermined length: fill with 0 until 50 frames are met
+- Data whose length is exactly equal to the predetermined length: no processing required
+
+Note: After this step is completed, please strictly confirm that the processed data still contains a complete action, and that there will be no ambiguity in prediction. It is recommended to confirm this by visualizing the data.
+
+### Save to PaddleVideo usable formats
+After the first two steps of processing, we get the annotation of each person's action fragments. At this time, we have a list `all_kpts`, which contains multiple keypoint sequence fragments, each with a shape of (T, V, C) (in our case (50, 17, 2)); these are further converted into a format usable by PaddleVideo.
+- Adjust dimension order: `np.transpose` and `np.expand_dims` can be used to convert the dimension of each fragment into the (C, T, V, M) format.
+- Combine and save all clips as one file
+
+Note: `class_id` is an `int` type variable, similar to other classification tasks. For example, `0: falling, 1: other`.
+
+We provide a [script file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py) to do this step, which can directly process the generated `det_keypoint_unite_image_results.json` file. The script parses the content of the json file, arranges the training data, and saves the data file as described in the preceding steps.
+
+```bash
+mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
+
+mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
+
+cd {root of PaddleVideo}/applications/PPHuman/datasets/
+
+python prepare_dataset.py
+```
+
+Now, we have available training data (`.npy`) and corresponding annotation files (`.pkl`).
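+
+To make the two steps above concrete, here is a minimal sketch (ours, not the `prepare_dataset.py` script itself) of cropping or padding a clip to 50 frames and converting the `(T, V, C)` fragments in `all_kpts` into the `(N, C, T, V, M)` layout; `all_labels` below is a hypothetical list of int class ids, and the exact label file format expected by PaddleVideo should be taken from the script above.
+
+```python
+import numpy as np
+
+
+def to_fixed_length(kpts, t=50):
+    """Randomly crop a longer clip, or zero-pad a shorter one, to t frames."""
+    if len(kpts) > t:
+        start = np.random.randint(0, len(kpts) - t + 1)
+        return kpts[start:start + t]
+    if len(kpts) < t:
+        pad = np.zeros((t - len(kpts),) + kpts.shape[1:], dtype=kpts.dtype)
+        return np.concatenate([kpts, pad], axis=0)
+    return kpts
+
+
+# all_kpts: list of (T, V, C) arrays, e.g. (50, 17, 2), as described above.
+clips = []
+for kpts in all_kpts:
+    clip = to_fixed_length(np.asarray(kpts, dtype=np.float32))
+    clip = np.transpose(clip, (2, 0, 1))   # (T, V, C) -> (C, T, V)
+    clip = np.expand_dims(clip, axis=-1)   # (C, T, V) -> (C, T, V, M=1)
+    clips.append(clip)
+
+data = np.stack(clips)  # (N, C, T, V, M)
+np.save("train_data.npy", data)
+# Save `all_labels` (e.g. 0: falling, 1: other) in the .pkl layout used by
+# prepare_dataset.py; see that script for the exact structure.
+```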
+
+## Model Optimization
+
+### Detection-Tracking Model Optimization
+The performance of skeleton-based action recognition depends on the pre-order detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign the person ID between different frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/tracking model optimization.
+
+### Keypoint Model Optimization
+As the core feature of the scheme, the skeleton point positioning performance also determines the overall effect of action recognition. If there are obvious errors in the recognized keypoint coordinates in the actual scene, and it is difficult to distinguish the specific action from the skeleton image composed of the keypoints, you can refer to [Secondary Development of Keypoint Detection Task](../keypoint_detection_en.md) to optimize the keypoint model.
+
+### Coordinate Normalization
+After getting the coordinates of the skeleton points, it is recommended to perform normalization processing according to the detection bounding box of each person to reduce the convergence difficulty brought by the difference in the position and scale of the persons.
+
+## Add New Action
+
+In skeleton-based action recognition, the model used is [ST-GCN](https://arxiv.org/abs/1801.07455), modified and adapted to PaddleVideo based on the [Training Step](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/model_zoo/recognition/stgcn.md) to complete the model training and export process.
+
+### Data Preparation And Configuration File Settings
+- Prepare the training data (`.npy`) and the corresponding annotation file (`.pkl`) according to `Data Preparation`, and place them under `{root of PaddleVideo}/applications/PPHuman/datasets/`.
+
+- Refer to the [Configuration File](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml); the things to focus on are as follows:
+
+```yaml
+MODEL: #MODEL field
+  framework:
+  backbone:
+    name: "STGCN"
+    in_channels: 2 # This corresponds to the C dimension in the data format description, representing two-dimensional coordinates.
+    dropout: 0.5
+    layout: 'coco_keypoint'
+    data_bn: True
+  head:
+    name: "STGCNHead"
+    num_classes: 2 # If there are multiple action types in the data, this needs to be modified to match the number of types.
+    if_top5: False # When the number of action types is less than 5, please set it to False, otherwise an error will be raised.
+
+...
+
+
+# Please set the data and label path of the train/valid/test part correctly according to the data path
+DATASET: #DATASET field
+  batch_size: 64
+  num_workers: 4
+  test_batch_size: 1
+  test_num_workers: 0
+  train:
+    format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+    file_path: "./applications/PPHuman/datasets/train_data.npy" #Mandatory, train data index file path
+    label_path: "./applications/PPHuman/datasets/train_label.pkl"
+
+  valid:
+    format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
+    label_path: "./applications/PPHuman/datasets/val_label.pkl"
+
+    test_mode: True
+  test:
+    format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
+    label_path: "./applications/PPHuman/datasets/val_label.pkl"
+
+    test_mode: True
+```
+
+### Model Training And Evaluation
+
+- In PaddleVideo, start training with the following command:
+```bash
+# current path is under root of PaddleVideo
+python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
+
+# Since the task may overfit, it is recommended to evaluate the model during training to save the best model.
+python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
+```
+
+- After training the model, use the following command to do inference:
+```bash
+python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams
+```
+
+### Model Export
+
+In PaddleVideo, use the following command to export the model, obtaining the model structure file `STGCN.pdmodel` and the weight file `STGCN.pdiparams`, and add the configuration file:
+```bash
+# current path is under root of PaddleVideo
+python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \
+                                -p output/STGCN/STGCN_best.pdparams \
+                                -o output_inference/STGCN
+
+cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN
+
+# Rename model files to adapt PP-Human
+cd output_inference/STGCN
+mv STGCN.pdiparams model.pdiparams
+mv STGCN.pdiparams.info model.pdiparams.info
+mv STGCN.pdmodel model.pdmodel
+```
+
+The directory structure will look like:
+```
+STGCN
+├── infer_cfg.yml
+├── model.pdiparams
+├── model.pdiparams.info
+├── model.pdmodel
+```
+At this point, this model can be used in PP-Human.
+
+**Note**: If the length of the video sequence or the number of keypoints is changed during training, the content of the `INFERENCE` field in the configuration file needs to be modified accordingly for correct prediction.
+
+```yaml
+# The dimension of the sequence data is (N,C,T,V,M)
+INFERENCE:
+  name: 'STGCN_Inference_helper'
+  num_channels: 2 # Corresponding to C dimension
+  window_size: 50 # Corresponding to T dimension, please set it according to the sequence length.
+  vertex_nums: 17 # Corresponding to V dimension, please set it according to the number of keypoints.
+  person_nums: 1 # Corresponding to M dimension
+```
+
+### Custom Action Output
+In the skeleton-based action recognition, the classification result of the model represents the behavior type of the character in a certain period of time. The type of the corresponding classification is regarded as the action of the current period.
Therefore, on the basis of completing the training and deployment of the custom model, the model output is directly used as the final result, and the displayed result of the visualization should be modified.
+
+#### Modify Visual Output
+At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For details, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be shown under another display name, please modify it accordingly to output the corresponding result.
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md
new file mode 100644
index 0000000000000000000000000000000000000000..cdf0a5b419d487d8d74da438e22d661085c7824c
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md
@@ -0,0 +1,159 @@
+# 基于视频分类的行为识别
+
+## 数据准备
+
+视频分类任务输入的视频格式一般为`.mp4`、`.avi`等格式视频或者是抽帧后的视频帧序列,标签则可以是`.txt`格式存储的文件。
+
+对于打架识别任务,具体数据准备流程如下:
+
+### 数据集下载
+
+打架识别基于6个公开的打架、暴力行为相关数据集合并后的数据进行模型训练。公开数据集具体信息如下:
+
+| 数据集 | 下载链接 | 简介 | 标注 | 数量 | 时长 |
+| ---- | ---- | ---------- | ---- | ---- | ---------- |
+| Surveillance Camera Fight Dataset| https://github.com/sayibet/fight-detection-surv-dataset | 裁剪视频,监控视角 | 视频级别 | 打架:150;非打架:150 | 2s |
+| A Dataset for Automatic Violence Detection in Videos | https://github.com/airtlab/A-Dataset-for-Automatic-Violence-Detection-in-Videos | 裁剪视频,室内自行录制 | 视频级别 | 暴力行为:115个场景,2个机位,共230;非暴力行为:60个场景,2个机位,共120 | 几秒钟 |
+| Hockey Fight Detection Dataset | https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes?resource=download | 裁剪视频,非真实场景 | 视频级别 | 打架:500;非打架:500 | 2s |
+| Video Fight Detection Dataset | https://www.kaggle.com/datasets/naveenk903/movies-fight-detection-dataset | 裁剪视频,非真实场景 | 视频级别 | 打架:100;非打架:101 | 2s |
+| Real Life Violence Situations Dataset | https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset | 裁剪视频,非真实场景 | 视频级别 | 暴力行为:1000;非暴力行为:1000 | 几秒钟 |
+| UBI Abnormal Event Detection Dataset| http://socia-lab.di.ubi.pt/EventDetection/ | 未裁剪视频,监控视角 | 帧级别 | 打架:216;非打架:784;裁剪后二次标注:打架1976,非打架1630 | 原视频几秒到几分钟不等,裁剪后2s |
+
+打架(暴力行为)视频3956个,非打架(非暴力行为)视频3501个,共7457个视频,每个视频几秒钟。
+
+本项目为大家整理了前5个数据集,下载链接:[https://aistudio.baidu.com/aistudio/datasetdetail/149085](https://aistudio.baidu.com/aistudio/datasetdetail/149085)。
+
+### 视频抽帧
+
+首先下载PaddleVideo代码:
+```bash
+git clone https://github.com/PaddlePaddle/PaddleVideo.git
+```
+
+假设PaddleVideo源码路径为PaddleVideo_root。
+
+为了加快训练速度,将视频进行抽帧。下面命令会根据视频的帧率FPS进行抽帧,如FPS=30,则每秒视频会抽取30帧图像。
+
+```bash
+cd ${PaddleVideo_root}
+python data/ucf101/extract_rawframes.py dataset/ rawframes/ --level 2 --ext mp4
+```
+其中,假设视频已经存放在了`dataset`目录下,如果是其他路径请对应修改。打架(暴力)视频存放在`dataset/fight`中;非打架(非暴力)视频存放在`dataset/nofight`中。`rawframes`目录存放抽取的视频帧。
+
+### 训练集和验证集划分
+
+打架识别验证集1500条,来自Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、UBI Abnormal Event Detection Dataset三个数据集。
+
+也可根据下面的命令将数据按照8:2的比例划分成训练集和测试集:
+
+```bash
+python split_fight_train_test_dataset.py "rawframes" 2 0.8
+```
+
+参数说明:“rawframes”为视频帧存放的文件夹;2表示目录结构为两级,第二级表示每个行为对应的子文件夹;0.8表示训练集比例。
+
+其中`split_fight_train_test_dataset.py`文件在PaddleDetection中的`deploy/pipeline/tools`路径下。
+
+执行完命令后会最终生成fight_train_list.txt和fight_val_list.txt两个文件。打架的标签为1,非打架的标签为0。
+
+### 视频裁剪
+对于未裁剪的视频,如UBI Abnormal Event Detection Dataset数据集,需要先进行裁剪才能用于模型训练,`deploy/pipeline/tools/clip_video.py`中给出了视频裁剪的函数`cut_video`,输入为视频路径,裁剪的起始帧和结束帧以及裁剪后的视频保存路径。
+
+
+## 模型优化
+
+### VideoMix
+[VideoMix](https://arxiv.org/abs/2012.03457)是视频数据增强的方法之一,是对图像数据增强CutMix的扩展,可以缓解模型的过拟合问题。
+
+与Mixup将两个视频片段的每个像素点按照一定比例融合不同的是,VideoMix是每个像素点要么属于片段A要么属于片段B。输出结果是两个片段原始标签的加权和,权重是两个片段各自的比例。
+
+在baseline的基础上加入VideoMix数据增强后,精度由87.53%提升至88.01%。
+
+### 更大的分辨率
+由于监控摄像头角度、距离等问题,存在监控画面下人比较小的情况,小目标行为的识别较困难,尝试增大输入图像的分辨率,模型精度由88.01%提升至89.06%。
+
+## 新增行为
+
+目前打架识别模型使用的是[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)套件中[PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md),并在PP-TSM视频分类模型训练流程的基础上修改适配,完成模型训练。
+
+请先参考[使用说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/usage.md)了解PaddleVideo模型库的使用。
+
+
+| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
+| ---- | ---- | ---------- | ---- | ---- | ---------- |
+| 打架识别 | PP-TSM | 准确率:89.06% | T4, 2s视频128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) |
+
+#### 模型训练
+下载预训练模型:
+```bash
+wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams
+```
+
+执行训练:
+```bash
+# 单卡训练
+cd ${PaddleVideo_root}
+python main.py --validate -c pptsm_fight_frames_dense.yaml
+```
+
+本方案针对的是视频的二分类问题,如果不是二分类,需要修改配置文件中`MODEL-->head-->num_classes`为具体的类别数目。
+
+
+```bash
+cd ${PaddleVideo_root}
+# 多卡训练
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python -B -m paddle.distributed.launch --gpus="0,1,2,3" \
+   --log_dir=log_pptsm_dense main.py --validate \
+   -c pptsm_fight_frames_dense.yaml
+```
+
+#### 模型评估
+训练好的模型下载:[https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams)
+
+模型评估:
+```bash
+cd ${PaddleVideo_root}
+python main.py --test -c pptsm_fight_frames_dense.yaml \
+   -w ppTSM_fight_best.pdparams
+```
+
+其中`ppTSM_fight_best.pdparams`为训练好的模型。
+
+#### 模型导出
+
+导出inference模型:
+
+```bash
+cd ${PaddleVideo_root}
+python tools/export_model.py -c pptsm_fight_frames_dense.yaml \
+                                -p ppTSM_fight_best.pdparams \
+                                -o inference/ppTSM
+```
+
+
+#### 推理可视化
+
+利用上一步骤导出的模型,基于PaddleDetection中推理pipeline可完成自定义行为识别及可视化。
+
+新增行为后,需要对现有的可视化代码进行修改,目前代码支持打架二分类可视化,新增类别后需要根据识别结果自适应可视化推理结果。
+
+具体修改PaddleDetection中/deploy/pipeline/pipeline.py路径下PipePredictor类中visualize_video成员函数。当结果中存在'video_action'数据时,会对行为进行可视化。目前的逻辑是如果推理的类别为1,则为打架行为,进行可视化;否则不进行显示,即"video_action_score"为None。用户新增行为后,可根据类别index和对应的行为设置"video_action_text"字段,目前index=1对应"Fight"。相关代码块如下:
+
+```python
+video_action_res = result.get('video_action')
+if video_action_res is not None:
+    video_action_score = None
+    if video_action_res and video_action_res["class"] == 1:
+        video_action_score = video_action_res["score"]
+    mot_boxes = None
+    if mot_res:
+        mot_boxes = mot_res['boxes']
+    image = visualize_action(
+        image,
+        mot_boxes,
+        action_visual_collector=None,
+        action_text="SkeletonAction",
+        video_action_score=video_action_score,
+        video_action_text="Fight")
+```
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/detection.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/detection.md
new file mode 100644
index 0000000000000000000000000000000000000000..4f20cf3c58e8908136bd336abc413536a06a3467
--- /dev/null
+++
b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/detection.md
@@ -0,0 +1,84 @@
+简体中文 | [English](./detection_en.md)
+
+# 目标检测任务二次开发
+
+在目标检测算法产业落地过程中,常常会出现需要额外训练以满足实际使用的要求,项目迭代过程中也会出现需要修改类别的情况。本文档详细介绍如何使用PaddleDetection进行目标检测算法二次开发,流程包括:数据准备、模型优化思路和修改类别开发流程。
+
+## 数据准备
+
+二次开发首先需要进行数据集的准备,针对场景特点采集合适的数据从而提升模型效果和泛化性能。然后使用Labelme、LabelImg等标注工具标注目标检测框,并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md)
+
+## 模型优化
+
+### 1. 使用自定义数据集训练
+
+基于准备的数据在数据配置文件中修改对应路径,例如`configs/datasets/coco_detection.yml`:
+
+```
+metric: COCO
+num_classes: 80
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017 # 训练集的图片所在文件夹相对于dataset_dir的路径
+    anno_path: annotations/instances_train2017.json # 训练集的标注文件相对于dataset_dir的路径
+    dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017 # 验证集的图片所在文件夹相对于dataset_dir的路径
+    anno_path: annotations/instances_val2017.json # 验证集的标注文件相对于dataset_dir的路径
+    dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # 标注文件相对于dataset_dir的路径
+    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # 数据集所在路径,相对于PaddleDetection路径
+```
+
+配置修改完成后,即可以启动训练评估,命令如下
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
+```
+
+更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)
+
+
+### 2. 加载COCO模型作为预训练
+
+目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重,加载到检测算法的骨干网络中,实际使用时,建议加载COCO数据集训练好的权重,通常能够对模型精度有较大提升,使用方法如下:
+
+#### 1) 设置预训练权重路径
+
+COCO数据集训练好的模型权重均在各算法配置文件夹下,例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重:[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
+
+#### 2) 修改超参数
+
+加载COCO预训练权重后,需要修改学习率超参数,例如`configs/ppyoloe/_base_/optimizer_300e.yml`中:
+
+```
+epoch: 120 # 原始配置为300epoch,加载COCO权重后可以适当减少迭代轮数
+
+LearningRate:
+  base_lr: 0.005 # 原始配置为0.025,加载COCO权重后需要降低学习率
+  schedulers:
+  - !CosineDecay
+    max_epochs: 144 # 依据epoch数进行修改
+  - !LinearWarmup
+    start_factor: 0.
+    epochs: 5
+```
+
+## 修改类别
+
+当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中:
+
+```
+metric: COCO
+num_classes: 10 # 原始类别80
+```
+
+配置修改完成后,同样可以加载COCO预训练权重,PaddleDetection支持自动加载shape匹配的权重,对于shape不匹配的权重会自动忽略,因此无需其他修改。
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/detection_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/detection_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..003ea152906b947473643b93cf1585b7f32d2155
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/detection_en.md
@@ -0,0 +1,89 @@
+[简体中文](./detection.md) | English
+
+# Customize Object Detection task
+
+In the practical application of object detection algorithms in a specific industry, additional training is often required for practical use. The project iteration process may also require modifying categories. This document details how to use PaddleDetection for customized development of an object detection algorithm. The process includes data preparation, the model optimization roadmap, and the workflow for modifying categories.
+
+## Data Preparation
+
+Customization starts with the preparation of the dataset.
We need to collect suitable data for the scenario features, so as to improve the model effect and generalization performance. Then labeling tools such as Labelme and LabelImg are used to label the object detection bounding boxes, and the labeling results are converted into the COCO or VOC data format. For details, please refer to [Data Preparation](../../tutorials/data/PrepareDetDataSet_en.md)
+
+## Model Optimization
+
+### 1. Use customized dataset for training
+
+Modify the corresponding path in the data configuration file based on the prepared data, for example `configs/datasets/coco_detection.yml`:
+
+```
+metric: COCO
+num_classes: 80
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017 # Path to the images of the training set relative to the dataset_dir
+    anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to the dataset_dir
+    dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017 # Path to the images of the eval dataset relative to the dataset_dir
+    anno_path: annotations/instances_val2017.json # Path to the annotation file of the eval dataset relative to the dataset_dir
+    dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # Path to the annotation files relative to dataset_dir
+    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # Path to the dataset relative to the PaddleDetection path
+```
+
+Once the configuration changes are completed, the training and evaluation can be started with the following command
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
+```
+
+For more details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)
+
+### 2. Load the COCO model as pre-training
+
+The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows.
+
+#### 1) Set pre-training weight path
+
+The trained model weights for the COCO dataset are saved in the configuration folder of each algorithm, for example, PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams). In the configuration file, set `pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
+
+#### 2) Modify hyperparameters
+
+After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example in `configs/ppyoloe/_base_/optimizer_300e.yml`:
+
+```
+epoch: 120 # The original configuration is 300 epochs; after loading COCO weights, the number of iterations can be reduced appropriately
+
+LearningRate:
+  base_lr: 0.005 # The original configuration is 0.025; after loading COCO weights, the learning rate should be reduced
+  schedulers:
+  - !CosineDecay
+    max_epochs: 144 # Modify based on the number of epochs
+  - !LinearWarmup
+    start_factor: 0.
+    epochs: 5
+```
+
+## Modify categories
+
+When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
+
+```
+metric: COCO
+num_classes: 10 # the original number of classes is 80
+```
+
+After the configuration changes are completed, the COCO pre-training weights can also be loaded. PaddleDetection supports automatic loading of shape-matching weights, and weights that do not match the shape are automatically ignored, so no other modifications are needed.
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/keypoint_detection.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/keypoint_detection.md
new file mode 100644
index 0000000000000000000000000000000000000000..c4afe4e1ec010670d07a7490f75ecb5c6b669e6a
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/keypoint_detection.md
@@ -0,0 +1,261 @@
+简体中文 | [English](./keypoint_detection_en.md)
+
+# 关键点检测任务二次开发
+
+在实际场景中应用关键点检测算法,不可避免地会出现需要二次开发的需求。包括对目前的预训练模型效果不满意,希望优化模型效果;或是目前的关键点点位定义不能满足实际场景需求,希望新增或是替换关键点点位的定义,训练新的关键点模型。本文档将介绍如何在PaddleDetection中,对关键点检测算法进行二次开发。
+
+## 数据准备
+
+### 基本流程说明
+在PaddleDetection中,目前支持的标注数据格式为`COCO`和`MPII`。这两个数据格式的详细说明,可以参考文档[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)。在这一步中,通过使用Labelme等标注工具,依照特征点序号标注对应坐标,并转化成对应可训练的标注格式。建议使用`COCO`格式进行。
+
+### 合并数据集
+为了扩展使用的训练数据,合并多个不同的数据集一起训练是一个很直观的解决手段,但不同的数据集往往对关键点的定义并不一致。合并数据集的第一步是需要统一不同数据集的点位定义,确定标杆点位,即最终模型学习的特征点类型,然后根据各个数据集的点位定义与标杆点位定义之间的关系进行调整。
+- 在标杆点位中的点:调整点位序号,使其与标杆点位一致
+- 未在标杆点位中的点:舍去
+- 数据集缺少标杆点位中的点:对应将标注的标志位记为“未标注”
+
+在[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)中,提供了如何合并`COCO`数据集和`AI Challenger`数据集,并统一为以`COCO`为标杆点位定义的案例说明,供参考。
+
+
+## 模型优化
+
+### 检测-跟踪模型优化
+在PaddleDetection中,关键点检测能力支持Top-Down、Bottom-Up两套方案,Top-Down先检测主体,再检测局部关键点,优点是精度较高,缺点是耗时会随着检测对象的个数增加而增长;Bottom-Up先检测关键点再组合到对应的部位上,优点是速度快,与检测对象个数无关,缺点是精度较低。关于两种方案的详情及对应模型,可参考[关键点检测系列模型](../../../configs/keypoint/README.md)
+
+当使用Top-Down方案时,模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,会使关键点检测部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](./detection.md)以及[多目标跟踪任务二次开发](./pphuman_mot.md)对检测/跟踪模型进行优化。
+
+### 使用符合场景的数据迭代
+目前发布的关键点检测算法模型主要在`COCO`/`AI Challenger`等开源数据集上迭代,这部分数据集中可能缺少与实际任务较为相似的监控场景(视角、光照等因素)、体育场景(存在较多非常规的姿态)。使用更符合实际任务场景的数据进行训练,有助于提升模型效果。
+
+### 使用预训练模型迭代
+关键点模型的数据标注复杂度较大,直接使用模型从零开始在业务数据集上训练,效果往往难以满足需求。在实际工程中使用时,建议加载已经训练好的权重,通常能够对模型精度有较大提升,以`HRNet`为例,使用方法如下:
+```bash
+python tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
+```
+在加载预训练模型后,可以适当减小初始学习率和最终迭代轮数,建议初始学习率取默认配置值的1/2至1/5,并可开启`--eval`观察迭代过程中AP值的变化。
+
+
+### 遮挡数据增强
+关键点任务中有较多遮挡问题,包括自身遮挡与不同目标之间的遮挡。
+
+1. 检测模型优化(仅针对Top-Down方案)
+
+参考[目标检测任务二次开发](./detection.md),提升检测模型在复杂场景下的效果。
+
+2. 关键点数据增强
+
+在关键点模型训练中增加遮挡的数据增强,参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/tinypose_256x192.yml#L100)。有助于模型提升这类场景下的表现。
+
+### 对视频预测进行平滑处理
+关键点模型是在图片级别的基础上进行训练和预测的,对于视频类型的输入也是将视频拆分为帧进行预测。帧与帧之间虽然内容大多相似,但微小的差异仍然可能导致模型的输出发生较大的变化,表现为虽然预测的坐标大体正确,但视觉效果上有较大的抖动问题。通过添加滤波平滑处理,将每一帧预测的结果与历史结果综合考虑,得到最终的输出结果,可以有效提升视频上的表现。该部分内容可参考[滤波平滑处理](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/python/det_keypoint_unite_infer.py#L206)。
+
+
+## 新增或修改关键点点位定义
+
+### 数据准备
+根据前述说明,完成数据的准备,放置于`{root of PaddleDetection}/dataset`下。
+
+<details>
+<summary>标注文件示例</summary>
+
+一个标注文件示例如下:
+
+```
+self_dataset/
+├── train_coco_joint.json # 训练集标注文件
+├── val_coco_joint.json # 验证集标注文件
+├── images/ # 存放图片文件
+    ├── 0.jpg
+    ├── 1.jpg
+    ├── 2.jpg
+```
+
+其中标注文件中需要注意的改动如下:
+```json
+{
+    "images": [
+        {
+            "file_name": "images/0.jpg",
+            "id": 0, # 图片id,注意不可重复
+            "height": 1080,
+            "width": 1920
+        },
+        {
+            "file_name": "images/1.jpg",
+            "id": 1,
+            "height": 1080,
+            "width": 1920
+        },
+        {
+            "file_name": "images/2.jpg",
+            "id": 2,
+            "height": 1080,
+            "width": 1920
+        },
+        ...
+
+    "categories": [
+        {
+            "supercategory": "person",
+            "id": 1,
+            "name": "person",
+            "keypoints": [ # 点位序号的名称
+                "point1",
+                "point2",
+                "point3",
+                "point4",
+                "point5",
+            ],
+            "skeleton": [ # 点位构成的骨骼, 训练中非必要
+                [
+                    1,
+                    2
+                ],
+                [
+                    1,
+                    3
+                ],
+                [
+                    2,
+                    4
+                ],
+                [
+                    3,
+                    5
+                ]
+            ]
+        ...
+
+    "annotations": [
+        {
+            "category_id": 1, # 实例所属类别
+            "num_keypoints": 3, # 该实例已标注点数量
+            "bbox": [ # 检测框位置,格式为x, y, w, h
+                799,
+                575,
+                55,
+                185
+            ],
+            # N*3 的列表,内容为x, y, v。
+            "keypoints": [
+                807.5899658203125,
+                597.5455322265625,
+                2,
+                0,
+                0,
+                0, # 未标注的点记为0,0,0
+                805.8563232421875,
+                592.3446655273438,
+                2,
+                816.258056640625,
+                594.0783081054688,
+                2,
+                0,
+                0,
+                0
+            ]
+            "id": 1, # 实例id,不可重复
+            "image_id": 8, # 实例所在图像的id,可重复。此时代表一张图像上存在多个目标
+            "iscrowd": 0, # 是否遮挡,为0时参与训练
+            "area": 10175 # 实例所占面积,可简单取为w * h。注意为0时会跳过,过小时在eval时会被忽略
+
+    ...
+```
+</details>
+
+### 配置文件设置
+
+在配置文件中,完整的含义参考[config yaml配置项说明](../../tutorials/KeyPointConfigGuide_cn.md)。以[HRNet模型配置](../../../configs/keypoint/hrnet/hrnet_w32_256x192.yml)为例,重点需要关注的内容如下:
+
+<details>
+<summary>配置文件示例</summary>
+
+一个配置文件的示例如下:
+
+```yaml
+use_gpu: true
+log_iter: 5
+save_dir: output
+snapshot_epoch: 10
+weights: output/hrnet_w32_256x192/model_final
+epoch: 210
+num_joints: &num_joints 5 # 预测的点数与定义点数量一致
+pixel_std: &pixel_std 200
+metric: KeyPointTopDownCOCOEval
+num_classes: 1
+train_height: &train_height 256
+train_width: &train_width 192
+trainsize: &trainsize [*train_width, *train_height]
+hmsize: &hmsize [48, 64]
+flip_perm: &flip_perm [[1, 2], [3, 4]] # 注意只有含义上镜像对称的点才写到这里
+
+...
+
+# 保证dataset_dir + anno_path 能正确定位到标注文件位置
+# 保证dataset_dir + image_dir + 标注文件中的图片路径能正确定位到图片
+TrainDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: images
+    anno_path: train_coco_joint.json
+    dataset_dir: dataset/self_dataset
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+
+
+EvalDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: images
+    anno_path: val_coco_joint.json
+    dataset_dir: dataset/self_dataset
+    bbox_file: bbox.json
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+    image_thre: 0.0
+```
+</details>
+
+### 模型训练及评估
+#### 模型训练
+通过如下命令启动训练:
+```bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
+```
+
+#### 模型评估
+训练好模型之后,可以通过以下命令实现对模型指标的评估:
+```bash
+python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
+```
+
+注意:由于测试依赖pycocotools工具,其默认为`COCO`数据集的17点,如果修改后的模型并非预测17点,直接使用评估命令会报错。
+需要修改以下内容以获得正确的评估结果:
+- [sigma列表](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/ppdet/modeling/keypoint_utils.py#L219),表示每个关键点的范围方差,越大则容忍度越高。其长度与预测点数一致。根据实际关键点可信区域设置,区域精确的一般0.25-0.5,例如眼睛。区域范围大的一般0.5-1.0,例如肩膀。若不确定建议0.75。
+- [pycocotools sigma列表](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L523),含义及内容同上,取值与sigma列表一致。
+
+### 模型导出及预测
+#### Top-Down模型联合部署
+```shell
+#导出关键点模型
+python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights}
+
+#detector 检测 + keypoint top-down模型联合部署(联合推理只支持top-down方式)
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
+```
+- 注意目前PP-Human中使用的为该方案
+
+#### Bottom-Up模型独立部署
+```shell
+#导出模型
+python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams
+
+#部署推理
+python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5
+```
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/keypoint_detection_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/keypoint_detection_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..25116777c1af90cb2b2de4b98cae58e6d6fa4ecf
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/keypoint_detection_en.md
@@ -0,0 +1,258 @@
+[简体中文](./keypoint_detection.md) | English
+
+# Customized Keypoint Detection
+
+When applying keypoint detection algorithms in real practice, inevitably, we may need customization: we may be dissatisfied with the current pre-trained model results, the current keypoint definitions may not meet the actual demand, or we may want to add or replace the definition of keypoints and train a new keypoint detection model. This document will introduce how to customize the keypoint detection algorithm in PaddleDetection.
+
+## Data Preparation
+
+### Basic Process Description
+
+PaddleDetection currently supports `COCO` and `MPII` annotation data formats. For detailed descriptions of these two data formats, please refer to the document [Keypoint Data Preparation](../../tutorials/data/PrepareKeypointDataSet.md). In this step, annotation tools such as Labelme are used to annotate the corresponding coordinates according to the keypoint serial numbers, and the results are then converted into the corresponding trainable annotation format. We recommend the `COCO` format.
+
+### Merging datasets
+
+To extend the training data, we can merge several different datasets together. But different datasets often have different definitions of key points.
Therefore, the first step in merging datasets is to unify the point definitions of the different datasets and determine the benchmark points, i.e., the types of feature points finally learned by the model, and then adjust them according to the relationship between the point definitions of each dataset and the benchmark point definitions.
+
+- Points in the benchmark point definition: adjust the point number to make it consistent with the benchmark point definition
+- Points that are not in the benchmark point definition: discard
+- Points in the benchmark definition that are missing from a dataset: mark the annotation flag of the corresponding points as "unannotated"
+
+In [Keypoint data preparation](../../tutorials/data/PrepareKeypointDataSet.md), we provide a case illustration of how to merge the `COCO` dataset and the `AI Challenger` dataset and unify them under a benchmark point definition based on `COCO` for your reference.
+
+## Model Optimization
+
+### Detection and tracking model optimization
+
+In PaddleDetection, the keypoint detection supports Top-Down and Bottom-Up solutions. Top-Down first detects the main body and then detects the local key points. It has higher accuracy but will take a longer time as the number of detected objects increases. The Bottom-Up plan first detects the keypoints and then combines them with the corresponding parts. It is fast and its speed is independent of the number of detected objects. Its disadvantage is that the accuracy is relatively low. For details of the two solutions and the corresponding models, please refer to [Keypoint Detection Series Models](../../../configs/keypoint/README.md)
+
+When using the Top-Down solution, the model's effects depend on the previous detection or tracking effect. If the pedestrian position cannot be accurately detected in actual practice, the performance of the keypoint detection will be limited. If you encounter the above problem in actual application, please refer to [Customized Object Detection](./detection_en.md) and [Customized Multi-target tracking](./pphuman_mot_en.md) for optimization of the detection and tracking model.
+
+### Iterate with scenario-compatible data
+
+The currently released keypoint detection algorithm models are mainly iterated on open source datasets such as `COCO`/`AI Challenger`, which may lack surveillance scenarios (angles, lighting and other factors) and sports scenarios (more unconventional poses) that are more similar to the actual task. Training with data that more closely matches the actual task scenario can help improve the model's results.
+
+### Iteration via pre-trained models
+
+The data annotation of the keypoint model is complex, and directly training the model from scratch on the business dataset often fails to meet the demand. When used in practical projects, it is recommended to load pre-trained weights, which usually improves the model accuracy significantly. Let's take `HRNet` as an example with the following method:
+
+```
+python tools/train.py \
+        -c configs/keypoint/hrnet/hrnet_w32_256x192.yml \
+        -o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
+```
+
+After loading the pre-trained model, the initial learning rate and the number of iteration rounds can be reduced appropriately. It is recommended that the initial learning rate be 1/2 to 1/5 of the default configuration, and you can enable `--eval` to observe the change of AP values during the iterations.
+
+### Data augmentation with occlusion
+
+Occlusion is common in keypoint tasks, including self-occlusion and occlusion between different objects.
+
+1. Detection model optimization (only for Top-Down solutions)
+
+Refer to [Customized Object Detection](./detection_en.md) to improve the detection model in complex scenarios.
+
+2. Keypoint data augmentation
+
+Add augmentation of occluded data in keypoint model training to improve model performance in such scenarios; please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/).
+
+### Smooth video prediction
+
+The keypoint model is trained and predicted at the image level, and video input is also predicted by splitting the video into frames. Although the content is mostly similar between frames, small differences may still lead to large changes in the output of the model. As a result, although the predicted coordinates are roughly correct, there may be jitters in the visual effect.
+
+By adding a smoothing filter process that combines the prediction of each frame with the historical results, the performance on video output can be effectively improved. For this part, please see [Filter Smoothing](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/python/det_keypoint_unite_infer.py#L206).
+
+## Add or modify keypoint definition
+
+### Data Preparation
+
+Complete the data preparation according to the previous instructions and place it under `{root of PaddleDetection}/dataset`.
+
+<details>
    + Examples of annotation file + +``` +self_dataset/ +├── train_coco_joint.json # training set annotation file +├── val_coco_joint.json # Validation set annotation file +├── images/ # Store the image files +    ├── 0.jpg +    ├── 1.jpg +    ├── 2.jpg +``` + +Notable changes as follows: + +``` +{ + "images": [ + { + "file_name": "images/0.jpg", + "id": 0, # image id, id cannotdo not repeat + "height": 1080, + "width": 1920 + }, + { + "file_name": "images/1.jpg", + "id": 1, + "height": 1080, + "width": 1920 + }, + { + "file_name": "images/2.jpg", + "id": 2, + "height": 1080, + "width": 1920 + }, + ... + + "categories": [ + { + "supercategory": "person", + "id": 1, + "name": "person", + "keypoints": [ # the name of the point serial number + "point1", + "point2", + "point3", + "point4", + "point5", + ], + "skeleton": [ # Skeleton composed of points, not necessary for training + [ + 1, + 2 + ], + [ + 1, + 3 + ], + [ + 2, + 4 + ], + [ + 3, + 5 + ] + ] + ... + + "annotations": [ + { + { + "category_id": 1, # The category to which the instance belongs + "num_keypoints": 3, # the number of marked points of the instance + "bbox": [ # location of detection box,format is x, y, w, h + 799, + 575, + 55, + 185 + ], + # N*3 list of x, y, v. + "keypoints": [ + 807.5899658203125, + 597.5455322265625, + 2, + 0, + 0, + 0, # unlabeled points noted as 0, 0, 0 + 805.8563232421875, + 592.3446655273438, + 2, + 816.258056640625, + 594.0783081054688, + 2, + 0, + 0, + 0 + ] + "id": 1, # the id of the instance, id cannot repeat + "image_id": 8, # The id of the image where the instance is located, repeatable. This represents the presence of multiple objects on a single image +"iscrowd": 0, # covered or not, when the value is 0, it will participate in training + "area": 10175 # the area occupied by the instance, can be simply taken as w * h. Note that when the value is 0, it will be skipped, and if it is too small, it will be ignored in eval + + ... +``` + +### Settings of configuration file + +In the configuration file, refer to [config yaml configuration](... /... /tutorials/KeyPointConfigGuide_cn.md) for more details . Take [HRNet model configuration](... /... /... /configs/keypoint/hrnet/hrnet_w32_256x192.yml) as an example, we need to focus on following contents: + +
    + Example of configuration
+
+```
+use_gpu: true
+log_iter: 5
+save_dir: output
+snapshot_epoch: 10
+weights: output/hrnet_w32_256x192/model_final
+epoch: 210
+num_joints: &num_joints 5 # The number of predicted points must match the number of defined points
+pixel_std: &pixel_std 200
+metric: KeyPointTopDownCOCOEval
+num_classes: 1
+train_height: &train_height 256
+train_width: &train_width 192
+trainsize: &trainsize [*train_width, *train_height]
+hmsize: &hmsize [48, 64]
+flip_perm: &flip_perm [[1, 2], [3, 4]] # Note that only mirror-symmetric point pairs are recorded here
+
+...
+
+# Ensure that dataset_dir + anno_path correctly locates the annotation file
+# Ensure that dataset_dir + image_dir + the image path in the annotation file correctly locates the image
+TrainDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: images
+    anno_path: train_coco_joint.json
+    dataset_dir: dataset/self_dataset
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_box: true
+
+
+EvalDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: images
+    anno_path: val_coco_joint.json
+    dataset_dir: dataset/self_dataset
+    bbox_file: bbox.json
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_box: true
+    image_thre: 0.0
+```
+
+### Model Training and Evaluation
+
+#### Model Training
+
+Run the following command to start training:
+
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
+```
+
+#### Model Evaluation
+
+After training the model, you can evaluate the model metrics by running the following command:
+
+```
+python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
+```
+
+### Model Export and Inference
+
+#### Top-down model deployment
+
+```
+# Export the keypoint model
+python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights}
+
+# Detector + keypoint top-down model co-deployment (for top-down solutions only)
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
+```
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_attribute.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_attribute.md
new file mode 100644
index 0000000000000000000000000000000000000000..75cdd4aebc22ddaae052c8f9be64e527d9f62fea
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_attribute.md
@@ -0,0 +1,295 @@
+简体中文 | [English](./pphuman_attribute_en.md)
+
+# 行人属性识别任务二次开发
+
+## 数据准备
+
+### 数据格式
+
+格式采用PA100K的属性标注格式,共有26位属性。
+
+这26位属性的名称、位置、种类数量见下表。
+
+| Attribute | index | length |
+|:----------|:----------|:----------|
+| 'Hat','Glasses' | [0, 1] | 2 |
+| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 |
+| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 |
+| 'boots' | [14, ] | 1 |
+| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 |
+| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 |
+| 'Female' | [22, ] | 1 |
+| 'Front','Side','Back' | [23, 24, 25] | 3 |
+
+
+举例:
+
+[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
+
+第一组,位置[0, 1]数值分别是[0, 1],表示'no hat'、'has glasses'。
+第二组,位置[22, ]数值分别是[0, ], 表示gender属性是'male', 否则是'female'。 + +第三组,位置[23, 24, 25]数值分别是[0, 1, 0], 表示方向属性是侧面'side'。 + +其他组依次类推 + +### 数据标注 + +理解了上面`属性标注`格式的含义后,就可以进行数据标注的工作。其本质是:每张单人图建立一组26个长度的标注项,分别与26个位置的属性值对应。 + +举例: + +对于一张原始图片, + +1) 使用检测框,标注图片中每一个人的位置。 + +2) 每一个检测框(对应每一个人),包含一组26位的属性值数组,数组的每一位以0或1表示。对应上述26个属性。例如,如果图片是'Female',则数组第22位为0,如果满足'Age18-60',则位置[19, 20, 21]对应的数值是[0, 1, 0], 或者满足'AgeOver60',则相应数值为[1, 0, 0]. + +标注完成后利用检测框将每一个人截取成单人图,其图片与26位属性标注建立对应关系。也可先截成单人图再进行标注,效果相同。 + + +## 模型训练 + +数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。 + +其主要有两步工作需要完成:1)将数据与标注数据整理成训练格式。2)修改配置文件开始训练。 + +### 训练数据格式 + +训练数据包括训练使用的图片和一个训练列表train.txt,其具体位置在训练配置中指定,其放置方式示例如下: +``` +Attribute/ +|-- data 训练图片文件夹 +| |-- 00001.jpg +| |-- 00002.jpg +| `-- 0000x.jpg +`-- train.txt 训练数据列表 + +``` + +train.txt文件内为所有训练图片名称(相对于根路径的文件路径)+ 26个标注值 + +其每一行表示一个人的图片和标注结果。其格式为: + +``` +00001.jpg 0,0,1,0,.... +``` + +注意:1)图片与标注值之间是以Tab[\t]符号隔开, 2)标注值之间是以逗号[,]隔开。该格式不能错,否则解析失败。 + +### 修改配置开始训练 + +首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)): + +```shell +git clone https://github.com/PaddlePaddle/PaddleClas +``` + +需要在配置文件`PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`中,修改的配置项如下: + +``` +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" #指定训练图片所在根路径 + cls_label_path: "dataset/pa100k/train_list.txt" #指定训练列表文件位置 + label_ratio: True + transform_ops: + + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" #指定评估图片所在根路径 + cls_label_path: "dataset/pa100k/val_list.txt" #指定评估列表文件位置 + label_ratio: True + transform_ops: +``` +注意: +1. 这里image_root路径+train.txt中图片相对路径,对应图片的完整路径位置。 +2. 如果有修改属性数量,则还需修改内容配置项中属性种类数量: +``` +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 26 #属性种类数量 +``` + +然后运行以下命令开始训练。 + +``` +#多卡训练 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml + +#单卡训练 +python3 tools/train.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml +``` + +训练完成后可以执行以下命令进行性能评估: +``` +#多卡评估 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model + +#单卡评估 +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model +``` + +### 模型导出 + +使用下述命令将训练好的模型导出为预测部署模型。 + +``` +python3 tools/export_model.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer +``` + +导出模型后,需要下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml)文件,并放置到导出的模型文件夹`PPLCNet_x1_0_person_attribute_infer`中。 + +使用时在PP-Human中的配置文件`./deploy/pipeline/config/infer_cfg_pphuman.yml`中修改新的模型路径`model_dir`项,并开启功能`enable: True`。 +``` +ATTR: + model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #新导出的模型路径位置 + enable: True #开启功能 +``` +然后可以使用-->至此即完成新增属性类别识别任务。 + +## 属性增减 + +上述是以26个属性为例的标注、训练过程。 + +如果需要增加、减少属性数量,则需要: + +1)标注时需增加新属性类别信息或删减属性类别信息; + +2)对应修改训练中train.txt所使用的属性数量和名称; + 
+
+3)修改训练配置,例如``PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml``文件中的属性数量,详细见上述`修改配置开始训练`部分。
+
+增加属性示例:
+
+1. 在标注数据时在26位后继续增加新的属性标注数值;
+2. 在train.txt文件的标注数值中也增加新的属性数值;
+3. 注意属性类型在train.txt中属性数值列表中的位置的对应关系需要是固定的,例如第[19, 20, 21]位表示年龄,所有图片都要使用[19, 20, 21]位置表示年龄,不再赘述。
+
    + +
    + +删减属性同理。 +例如,如果不需要年龄属性,则位置[19, 20, 21]的数值可以去掉。只需在train.txt中标注的26个数字中全部删除第19-21位数值即可,同时标注数据时也不再需要标注这3位属性值。 + +## 修改后处理代码 + +修改了属性定义后,pipeline后处理部分也需要做相应修改,主要影响结果可视化时的显示结果。 + +相应代码在路径`deploy/pipeline/pphuman/attr_infer.py`文件中`postprocess`函数。 + +其函数实现说明如下: + +``` +# 函数入口 + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + +# 1) 定义各组属性实际意义,其数量及位置与输出结果中占用位数一一对应。 + labels = self.pred_config.labels + age_list = ['AgeLess18', 'Age18-60', 'AgeOver60'] + direct_list = ['Front', 'Side', 'Back'] + bag_list = ['HandBag', 'ShoulderBag', 'Backpack'] + upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice'] + lower_list = [ + 'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts', + 'Skirt&Dress' + ] +# 2) 部分属性所用阈值与通用值有明显区别,单独设置 + glasses_threshold = 0.3 + hold_threshold = 0.6 + + batch_res = [] + for res in im_results: + res = res.tolist() + label_res = [] + # gender +# 3) 单个位置属性类别,判断该位置是否大于阈值,来分配二分类结果 + gender = 'Female' if res[22] > self.threshold else 'Male' + label_res.append(gender) + # age +# 4)多个位置属性类别,N选一形式,选择得分最高的属性 + age = age_list[np.argmax(res[19:22])] + label_res.append(age) + # direction + direction = direct_list[np.argmax(res[23:])] + label_res.append(direction) + # glasses + glasses = 'Glasses: ' + if res[1] > glasses_threshold: + glasses += 'True' + else: + glasses += 'False' + label_res.append(glasses) + # hat + hat = 'Hat: ' + if res[0] > self.threshold: + hat += 'True' + else: + hat += 'False' + label_res.append(hat) + # hold obj + hold_obj = 'HoldObjectsInFront: ' + if res[18] > hold_threshold: + hold_obj += 'True' + else: + hold_obj += 'False' + label_res.append(hold_obj) + # bag + bag = bag_list[np.argmax(res[15:18])] + bag_score = res[15 + np.argmax(res[15:18])] + bag_label = bag if bag_score > self.threshold else 'No bag' + label_res.append(bag_label) + # upper +# 5)同一类属性,分为两组(这里是款式和花色),每小组内单独选择,相当于两组不同属性。 + upper_label = 'Upper:' + sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve' + upper_label += ' {}'.format(sleeve) + upper_res = res[4:8] + if np.max(upper_res) > self.threshold: + upper_label += ' {}'.format(upper_list[np.argmax(upper_res)]) + label_res.append(upper_label) + # lower + lower_res = res[8:14] + lower_label = 'Lower: ' + has_lower = False + for i, l in enumerate(lower_res): + if l > self.threshold: + lower_label += ' {}'.format(lower_list[i]) + has_lower = True + if not has_lower: + lower_label += ' {}'.format(lower_list[np.argmax(lower_res)]) + + label_res.append(lower_label) + # shoe + shoe = 'Boots' if res[14] > self.threshold else 'No boots' + label_res.append(shoe) + + batch_res.append(label_res) + result = {'output': batch_res} + return result +``` diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_attribute_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_attribute_en.md new file mode 100644 index 0000000000000000000000000000000000000000..1dd3813ec3ff4c5e9ce66a7952b773e49555e307 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_attribute_en.md @@ -0,0 +1,223 @@ +[简体中文](pphuman_attribute.md) | English + +# Customized Pedestrian Attribute Recognition + +## Data Preparation + +### Data format + +We use the PA100K attribute annotation format, with a total of 26 attributes. + +The names, locations, and the number of these 26 attributes are shown in the table below. 
+
+| Attribute | index | length |
+|:------------------------------------------------------------------------------- |:---------------------- |:------ |
+| 'Hat','Glasses' | [0, 1] | 2 |
+| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 |
+| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 |
+| 'boots' | [14, ] | 1 |
+| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 |
+| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 |
+| 'Female' | [22, ] | 1 |
+| 'Front','Side','Back' | [23, 24, 25] | 3 |
+
+Example:
+
+[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
+
+The first group: the values at positions [0, 1] are [0, 1], meaning 'no hat' and 'has glasses'.
+
+The second group: the value at position [22, ] is [0, ], indicating that the gender attribute is 'male'; a value of 1 would mean 'female'.
+
+The third group: the values at positions [23, 24, 25] are [0, 1, 0], indicating that the direction attribute is 'side'.
+
+The other groups follow the same pattern.
+
+
+
+### Data Annotation
+
+After understanding the purpose of the above `attribute annotation` format, we can start to annotate data. The essence is that each single-person image gets a set of 26 annotation values, corresponding to the attributes at the 26 positions.
+
+Example:
+
+For an original image:
+
+1) Use bounding boxes to annotate the position of each person in the picture.
+
+2) Each detection box (corresponding to each person) contains a set of 26 attribute values, each represented by 0 or 1, corresponding to the above 26 attributes. For example, if the person is 'Female', the value at index 22 of the array is 1; if the person matches 'Age18-60', the values at positions [19, 20, 21] are [0, 1, 0]; if the person matches 'AgeOver60', the corresponding values are [1, 0, 0].
+
+After the annotation is completed, use the detection boxes to crop each person into a single-person image, which is then paired with its 26 annotation values. It is also possible to crop single-person images first and then annotate them. The results are the same.
+
+
+
+## Model Training
+
+Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
+
+There are two main steps: 1) Organize the data and annotations into the training format. 2) Modify the configuration file to start training.
+
+### Training data format
+
+The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration, with the following example layout:
+
+```
+Attribute/
+|-- data                Training images folder
+|   |-- 00001.jpg
+|   |-- 00002.jpg
+|   `-- 0000x.jpg
+`-- train.txt           Training data list
+```
+
+The train.txt file contains the names of all training images (file paths relative to the root path) followed by 26 annotation values.
+
+Each line of it represents one person's image and annotation result. The format is as follows:
+
+```
+00001.jpg 0,0,1,0,....
+```
+
+Note: 1) the image path and the annotation values are separated by a Tab [\t]; 2) the annotation values are separated by commas [,]. The format must be correct, otherwise parsing will fail.
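+
+Since a malformed line makes parsing fail, it can be worth validating train.txt before training. The following is a minimal, hypothetical sanity-check script (not part of PaddleClas); it only assumes the Tab/comma format described above:
+
+```python
+# check_train_list.py -- hypothetical helper to validate the train.txt format
+NUM_ATTRS = 26  # change this if you use a different number of attributes
+
+with open("train.txt") as f:
+    for line_no, line in enumerate(f, 1):
+        line = line.rstrip("\n")
+        if not line:
+            continue
+        parts = line.split("\t")  # image path and labels are Tab-separated
+        assert len(parts) == 2, f"line {line_no}: expected exactly one Tab"
+        _, labels = parts
+        values = labels.split(",")  # label values are comma-separated
+        assert len(values) == NUM_ATTRS, \
+            f"line {line_no}: expected {NUM_ATTRS} values, got {len(values)}"
+        assert all(v in ("0", "1") for v in values), \
+            f"line {line_no}: values must be 0 or 1"
+print("train.txt format looks consistent")
+```
+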
+
+
+### Modify the configuration to start training
+
+First run the following command to download the training code (for more environmental issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
+
+```
+git clone https://github.com/PaddlePaddle/PaddleClas
+```
+
+You need to modify the following configuration items in the configuration file `PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`:
+
+```
+DataLoader:
+  Train:
+    dataset:
+      name: MultiLabelDataset
+      image_root: "dataset/pa100k/" # Specify the root path of the training images
+      cls_label_path: "dataset/pa100k/train_list.txt" # Specify the location of the training list file
+      label_ratio: True
+      transform_ops:
+
+  Eval:
+    dataset:
+      name: MultiLabelDataset
+      image_root: "dataset/pa100k/" # Specify the root path of the evaluation images
+      cls_label_path: "dataset/pa100k/val_list.txt" # Specify the location of the evaluation list file
+      label_ratio: True
+      transform_ops:
+```
+
+Note:
+
+1. The image_root path concatenated with the relative image path in train.txt gives the full path of each image.
+2. If you modify the number of attributes, you should also modify the number of attribute classes in the model configuration accordingly:
+
+```
+# model architecture
+Arch:
+  name: "PPLCNet_x1_0"
+  pretrained: True
+  use_ssld: True
+  class_num: 26 # Number of attribute classes
+```
+
+Then run the following command to start training:
+
+```
+# Multi-card training
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+        --gpus="0,1,2,3" \
+        tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
+
+# Single-card training
+python3 tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
+```
+
+You can run the following commands for performance evaluation after the training is completed:
+
+```
+# Multi-card evaluation
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+        --gpus="0,1,2,3" \
+        tools/eval.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+        -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
+
+# Single-card evaluation
+python3 tools/eval.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+        -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
+```
+
+### Model Export
+
+Use the following command to export the trained model as an inference deployment model.
+
+```
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
+```
+
+After exporting the model, you need to download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml) file and put it into the exported model folder `PPLCNet_x1_0_person_attribute_infer`.
+
+When you use the model, modify the model path `model_dir` entry and set `enable: True` in the PP-Human configuration file `./deploy/pipeline/config/infer_cfg_pphuman.yml`:
+
+```
+ATTR:
+  model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ # The exported model location
+  enable: True # Whether to enable the function
+```
+
+The model is now ready to use. To this point, a new attribute category recognition task is completed.
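+
+For reference, decoding one 26-dimensional score vector into readable labels could look like the sketch below. It follows the index layout of the table above; the threshold is illustrative, and PP-Human's actual post-processing lives in `deploy/pipeline/pphuman/attr_infer.py`:
+
+```python
+import numpy as np
+
+def decode_attrs(scores, threshold=0.5):
+    """Sketch: map a 26-dim score vector to a few readable labels."""
+    labels = []
+    labels.append("Female" if scores[22] > threshold else "Male")
+    # positions [19, 20, 21] are AgeOver60 / Age18-60 / AgeLess18 per the table above
+    age_list = ["AgeOver60", "Age18-60", "AgeLess18"]
+    labels.append(age_list[int(np.argmax(scores[19:22]))])
+    labels.append("Glasses" if scores[1] > threshold else "No glasses")
+    labels.append("Hat" if scores[0] > threshold else "No hat")
+    return labels
+
+print(decode_attrs(np.random.rand(26)))
+```
+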
+
+
+
+## Adding or deleting attributes
+
+The above describes the annotation and training process with 26 attributes.
+
+If attributes need to be added or deleted, you need to:
+
+1) Add or remove the corresponding attribute category information when annotating the data.
+
+2) Modify the number and names of attributes used in train.txt accordingly.
+
+3) Modify the training configuration, for example, the number of attributes in the ``PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`` file; for details, please see the ``Modify the configuration to start training`` section above.
+
+Example of adding attributes:
+
+1. Continue to add new attribute annotation values after the 26 values when annotating the data.
+2. Add the new attribute values to the annotation values in the train.txt file as well.
+3. Note that the mapping between attribute types and positions in the train.txt value list must stay fixed; for example, positions [19, 20, 21] indicate age, so all images must use positions [19, 20, 21] to indicate age.
+
+The same applies to the deletion of attributes.
+
+For example, if the age attribute is not needed, the values at positions [19, 20, 21] can be removed: simply delete the values at positions 19-21 from the 26 annotated numbers in train.txt, and you no longer need to annotate these 3 attribute values.
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mot.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mot.md
new file mode 100644
index 0000000000000000000000000000000000000000..209c603267c6799d2ed3b8e096d977fa2ff5f7ab
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mot.md
@@ -0,0 +1,63 @@
+简体中文 | [English](./pphuman_mot_en.md)
+
+# 多目标跟踪任务二次开发
+
+在产业落地过程中应用多目标跟踪算法,不可避免地会出现希望自定义类型的多目标跟踪的需求,或是对已有多目标跟踪模型的优化,以提升在特定场景下模型的效果。我们在本文档通过案例来介绍如何根据期望识别的行为来进行多目标跟踪方案的选择,以及使用PaddleDetection进行多目标跟踪算法二次开发工作,包括:数据准备、模型优化思路和跟踪类别修改的开发流程。
+
+## 数据准备
+
+多目标跟踪模型方案采用[ByteTrack](https://arxiv.org/pdf/2110.06864.pdf),其中使用PP-YOLOE替换原文的YOLOX作为检测器,使用BYTETracker作为跟踪器,详细文档参考[ByteTrack](../../../configs/mot/bytetrack)。原文的ByteTrack只支持行人单类别,PaddleDetection中也支持多类别同时进行跟踪。训练ByteTrack也就是训练检测器的过程,只需要准备好检测标注即可,不需要ReID标注信息,即当成纯检测来做即可。数据集最好是连续视频中抽取出来的而不是无关联的图片集合。
+
+二次开发首先需要进行数据集的准备,针对场景特点采集合适的数据从而提升模型效果和泛化性能。然后使用Labelme、LabelImg等标注工具标注目标检测框,并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md)
+
+## 模型优化
+
+### 1. 使用自定义数据集训练
+
+ByteTrack跟踪方案采用的数据集只需要有检测标注即可。参照[MOT数据集准备](../../../configs/mot)和[MOT数据集教程](docs/tutorials/data/PrepareMOTDataSet.md)。
+
+```
+# 单卡训练
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
+
+# 多卡训练
+python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
+```
+
+更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)和[ByteTrack](../../../configs/mot/bytetrack/detector)
+
+
+### 2. 加载COCO模型作为预训练
+
+目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重,加载到检测算法的骨干网络中,实际使用时,建议加载COCO数据集训练好的权重,通常能够对模型精度有较大提升,使用方法如下:
+
+#### 1) 设置预训练权重路径
+
+COCO数据集训练好的模型权重均在各算法配置文件夹下,例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重:[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
+
+#### 2) 修改超参数
+
+加载COCO预训练权重后,需要修改学习率超参数,例如`configs/ppyoloe/_base_/optimizer_300e.yml`中:
+
+```
+epoch: 120 # 原始配置为300epoch,加载COCO权重后可以适当减少迭代轮数
+
+LearningRate:
+  base_lr: 0.005 # 原始配置为0.025,加载COCO权重后需要降低学习率
+  schedulers:
+  - !CosineDecay
+    max_epochs: 144 # 依据epoch数进行修改,一般为epoch数的1.2倍
+  - !LinearWarmup
+    start_factor: 0.
+    epochs: 5
+```
+
+## 跟踪类别修改
+
+当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中:
+
+```
+metric: COCO
+num_classes: 10 # 原始类别数为80
+```
+
+配置修改完成后,同样可以加载COCO预训练权重,PaddleDetection支持自动加载shape匹配的权重,对于shape不匹配的权重会自动忽略,因此无需其他修改。
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mot_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mot_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..0aaea495663666782a318ccbc945f11169598eff
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mot_en.md
@@ -0,0 +1,65 @@
+[简体中文](./pphuman_mot.md) | English
+
+# Customized multi-object tracking task
+
+When applying multi-object tracking algorithms in industrial applications, there will inevitably be demands for customized categories of multi-object tracking, or for optimization of existing multi-object tracking models to improve their effectiveness in specific scenarios. In this document, we present examples of how to choose a multi-object tracking solution based on the expected behavior to identify, and how to use PaddleDetection for further development of multi-object tracking algorithms, including data preparation, model optimization ideas, and the development process of tracking category modification.
+
+## Data Preparation
+
+The multi-object tracking solution uses [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf), which adopts PP-YOLOE to replace the original YOLOX as the detector and BYTETracker as the tracker; for details, please refer to [ByteTrack](../../../configs/mot/bytetrack). The original ByteTrack only supports the single pedestrian category, while PaddleDetection also supports tracking multiple categories at the same time. Training ByteTrack is the process of training the detector: only detection annotations need to be prepared, no ReID annotation information is required, i.e., it can be treated as pure detection. The dataset should preferably be extracted from continuous video rather than being a collection of unrelated images.
+
+Customization starts with the preparation of the dataset.
 We need to collect suitable data for the characteristics of the scenario to improve the model effect and generalization performance. Then use Labelme, LabelImg or other annotation tools to label the object detection boxes, and convert the annotation results into COCO or VOC data format. For details, please refer to [Data Preparation](../../tutorials/data/README.md)
+
+## Model Optimization
+
+### 1. Use a customized dataset for training
+
+The dataset used by the ByteTrack tracking solution only needs detection annotations. Refer to [MOT dataset preparation](../../../configs/mot) and the [MOT dataset tutorial](docs/tutorials/data/PrepareMOTDataSet.md).
+
+```
+# Single-card training
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
+
+# Multi-card training
+python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
+```
+
+For more details, please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md) and [ByteTrack](../../../configs/mot/bytetrack/detector)
+
+### 2. Load the COCO model as the pre-trained model
+
+The pre-trained models currently loaded by PaddleDetection's configuration files are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load weights trained on the COCO dataset, which usually provides a large improvement in model accuracy. The method is as follows.
+
+#### 1) Set the pre-training weight path
+
+The trained model weights for the COCO dataset are provided under each algorithm's configuration folder; for example, the PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams). In the configuration file, set `pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
+
+#### 2) Modify hyperparameters
+
+After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example in `configs/ppyoloe/_base_/optimizer_300e.yml`:
+
+```
+epoch: 120 # The original configuration is 300 epochs; after loading COCO weights, the number of iterations can be reduced appropriately
+
+LearningRate:
+  base_lr: 0.005 # The original configuration is 0.025; after loading COCO weights, the learning rate should be reduced
+  schedulers:
+  - !CosineDecay
+    max_epochs: 144 # Modified according to the number of epochs, usually 1.2 times the number of epochs
+  - !LinearWarmup
+    start_factor: 0.
+    epochs: 5
+```
+
+## Modify tracking categories
+
+When the categories of the actual application scenario change, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
+
+```
+metric: COCO
+num_classes: 10 # the original number of classes is 80
+```
+
+After the configuration changes are completed, the COCO pre-training weights can still be loaded. PaddleDetection supports automatic loading of shape-matched weights, and weights whose shapes do not match are automatically ignored, so no other modifications are needed.
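+
+The shape-matching behavior described above can be pictured with a short sketch. This is illustrative only (PaddleDetection performs this matching internally when loading `pretrain_weights`); the helper name is ours:
+
+```python
+import paddle
+
+def load_matched_weights(model, weights_path):
+    """Keep only pretrained parameters whose name and shape match the model."""
+    pretrained = paddle.load(weights_path)
+    model_state = model.state_dict()
+    matched = {
+        k: v for k, v in pretrained.items()
+        if k in model_state and list(v.shape) == list(model_state[k].shape)
+    }
+    # e.g. the detection head is skipped when num_classes changed from 80 to 10
+    model_state.update(matched)
+    model.set_state_dict(model_state)
+    print(f"loaded {len(matched)}/{len(model_state)} matching parameters")
+```
+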
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mtmct.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mtmct.md new file mode 100644 index 0000000000000000000000000000000000000000..0d784e243cb782c2a5152dff3bd3652138391344 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mtmct.md @@ -0,0 +1,159 @@ +简体中文 | [English](./pphuman_mtmct_en.md) + +# 跨镜跟踪任务二次开发 + +## 数据准备 + +### 数据格式 + +跨镜跟踪使用行人REID技术实现,其训练方式采用多分类模型训练,使用时取分类softmax头部前的特征作为检索特征向量。 + +因此其格式与多分类任务相同。每一个行人分配一个专属id,不同行人id不同,同一行人在不同图片中的id相同。 + +例如图片0001.jpg、0003.jpg是同一个人,0002.jpg、0004.jpg是不同的其他行人。则标注id为: + +``` +0001.jpg 00001 +0002.jpg 00002 +0003.jpg 00001 +0004.jpg 00003 +... +``` + +依次类推。 + +### 数据标注 + +理解了上面`标注`格式的含义后,就可以进行数据标注的工作。其本质是:每张单人图建立一个标注项,对应该行人分配的id。 + +举例: + +对于一张原始图片, + +1) 使用检测框,标注图片中每一个人的位置。 + +2) 每一个检测框(对应每一个人),包含一个int类型的id属性。例如,上述举例中的0001.jpg中的人,对应id:1. + +标注完成后利用检测框将每一个人截取成单人图,其图片与id属性标注建立对应关系。也可先截成单人图再进行标注,效果相同。 + +## 模型训练 + + +数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。 + +其主要有两步工作需要完成:1)将数据与标注数据整理成训练格式。2)修改配置文件开始训练。 + +### 训练数据格式 + +训练数据包括训练使用的图片和一个训练列表bounding_box_train.txt,其具体位置在训练配置中指定,其放置方式示例如下: + +``` +REID/ +|-- data 训练图片文件夹 +| |-- 00001.jpg +| |-- 00002.jpg +| `-- 0000x.jpg +`-- bounding_box_train.txt 训练数据列表 + +``` + +bounding_box_train.txt文件内为所有训练图片名称(相对于根路径的文件路径)+ 1个id标注值 + +其每一行表示一个人的图片和id标注结果。其格式为: + +``` +0001.jpg 00001 +0002.jpg 00002 +0003.jpg 00001 +0004.jpg 00003 +``` +注意:图片与标注值之间是以Tab[\t]符号隔开。该格式不能错,否则解析失败。 + +### 修改配置开始训练 + +首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)): + +```shell +git clone https://github.com/PaddlePaddle/PaddleClas +``` + + +需要在配置文件[softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml)中,修改的配置项如下: + +``` + Head: + name: "FC" + embedding_size: *feat_dim + class_num: &class_num 751 #行人id总数量 + +DataLoader: + Train: + dataset: + name: "Market1501" + image_root: "./dataset/" #训练图片根路径 + cls_label_path: "bounding_box_train" #训练文件列表 + + + Eval: + Query: + dataset: + name: "Market1501" + image_root: "./dataset/" #评估图片根路径 + cls_label_path: "query" #评估文件列表 + +``` +注意: + +1. 
 这里image_root路径+bounding_box_train.txt中图片相对路径,对应图片存放的完整路径。
+
+然后运行以下命令开始训练。
+
+```
+#多卡训练
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
+
+#单卡训练
+python3 tools/train.py \
+    -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
+```
+
+训练完成后可以执行以下命令进行性能评估:
+```
+#多卡评估
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/eval.py \
+        -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
+        -o Global.pretrained_model=./output/strong_baseline/best_model
+
+#单卡评估
+python3 tools/eval.py \
+    -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
+    -o Global.pretrained_model=./output/strong_baseline/best_model
+```
+
+### 模型导出
+
+使用下述命令将训练好的模型导出为预测部署模型。
+
+```
+python3 tools/export_model.py \
+    -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
+    -o Global.pretrained_model=./output/strong_baseline/best_model \
+    -o Global.save_inference_dir=deploy/models/strong_baseline_inference
+```
+
+导出模型后,下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml)文件到新导出的模型文件夹`strong_baseline_inference`中。
+
+使用时在PP-Human中的配置文件infer_cfg_pphuman.yml中修改模型路径`model_dir`并开启功能`enable`。
+```
+REID:
+  model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/
+  enable: True
+```
+然后可以使用。至此完成模型开发。
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mtmct_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mtmct_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..e3638c2700a4cd20d5bcb6f85c6c6d2b610a56ee
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/pphuman_mtmct_en.md
@@ -0,0 +1,165 @@
+[简体中文](./pphuman_mtmct.md) | English
+
+# Customized Multi-Target Multi-Camera Tracking Module of PP-Human
+
+## Data Preparation
+
+### Data Format
+
+Multi-target multi-camera tracking, or MTMCT, is achieved by the pedestrian REID technique. It is trained as a multi-class classification model and uses the features before the softmax classification head as the retrieval feature vector.
+
+Therefore its data format is the same as that of a multi-class classification task. Each pedestrian is assigned an exclusive id, which is different for different pedestrians, while the same pedestrian has the same id in different images.
+
+For example, images 0001.jpg and 0003.jpg are the same person, while 0002.jpg and 0004.jpg are different pedestrians. Then the labeled ids are:
+
+```
+0001.jpg 00001
+0002.jpg 00002
+0003.jpg 00001
+0004.jpg 00003
+...
+```
+
+### Data Annotation
+
+After understanding the meaning of the `annotation` format above, we can work on the data annotation. The essence is that each single-person image gets one annotation item, corresponding to the id assigned to that pedestrian.
+
+For example:
+
+For an original picture:
+
+1) Use bounding boxes to annotate the position of each person in the picture.
+
+2) Each bounding box (corresponding to each person) contains an int id attribute. For example, the person in 0001.jpg in the above example corresponds to id: 1.
+
+After the annotation is completed, use the detection boxes to crop each person into a single-person image; each image is then paired with its id annotation. You can also crop the single-person images first and then annotate them; the result is the same.
+
+
+
+## Model Training
+
+Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
+
+There are two main steps to implement: 1) organize the data and annotations into the training format. 2) modify the configuration file to start training.
+
+### Training data format
+
+The training data consists of the images used for training and a training list bounding_box_train.txt, the location of which is specified in the training configuration, with the following example layout:
+
+```
+REID/
+|-- data                        Training images folder
+|   |-- 00001.jpg
+|   |-- 00002.jpg
+|   `-- 0000x.jpg
+`-- bounding_box_train.txt      Training data list
+```
+
+The bounding_box_train.txt file contains the names of all training images (file paths relative to the root path) followed by one id annotation value.
+
+Each line represents one person's image and id annotation result. The format is as follows:
+
+```
+0001.jpg 00001
+0002.jpg 00002
+0003.jpg 00001
+0004.jpg 00003
+```
+
+Note: The image and the annotation value are separated by a Tab [\t] symbol. This format must be correct, otherwise parsing will fail.
+
+
+
+### Modify the configuration to start training
+
+First, execute the following command to download the training code (for more environment issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
+
+```
+git clone https://github.com/PaddlePaddle/PaddleClas
+```
+
+You need to change the following configuration items in the configuration file [softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml):
+
+```
+  Head:
+    name: "FC"
+    embedding_size: *feat_dim
+    class_num: &class_num 751 # Total number of pedestrian ids
+
+DataLoader:
+  Train:
+    dataset:
+      name: "Market1501"
+      image_root: "./dataset/" # Training image root path
+      cls_label_path: "bounding_box_train" # Training file list
+
+
+  Eval:
+    Query:
+      dataset:
+        name: "Market1501"
+        image_root: "./dataset/" # Evaluation image root path
+        cls_label_path: "query" # List of evaluation files
+```
+
+Note:
+
+1. Here the image_root path concatenated with the relative image path in bounding_box_train.txt corresponds to the full path where the image is stored.
+
+Then run the following command to start the training.
+
+```
+# Multi-card training
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
+
+# Single-card training
+python3 tools/train.py \
+    -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
+```
+
+After the training is completed, you may run the following commands for performance evaluation:
+
+```
+# Multi-card evaluation
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/eval.py \
+        -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
+        -o Global.pretrained_model=./output/strong_baseline/best_model
+
+# Single-card evaluation
+python3 tools/eval.py \
+    -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
+    -o Global.pretrained_model=./output/strong_baseline/best_model
+```
+
+### Model Export
+
+Use the following command to export the trained model as an inference deployment model.
+ +``` +python3 tools/export_model.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \ + -o Global.pretrained_model=./output/strong_baseline/best_model \ + -o Global.save_inference_dir=deploy/models/strong_baseline_inference +``` + +After exporting the model, download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml) file to the newly exported model folder 'strong_baseline_ inference'. + +Change the model path `model_dir` in the configuration file `infer_cfg_pphuman.yml` in PP-Human and set `enable`. + +``` +REID: + model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/ + enable: True +``` + +Now, the model is ready. diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_attribute.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_attribute.md new file mode 100644 index 0000000000000000000000000000000000000000..6cc2bf2c8ed109b0d8ef6e147278ede219a6d641 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_attribute.md @@ -0,0 +1,257 @@ +简体中文 | [English](./ppvehicle_attribute_en.md) + +# 车辆属性识别任务二次开发 + +## 数据准备 + +### 数据格式 + +车辆属性模型采用VeRi数据集的属性,共计10种车辆颜色及9种车型, 具体如下: +``` +# 车辆颜色 +- "yellow" +- "orange" +- "green" +- "gray" +- "red" +- "blue" +- "white" +- "golden" +- "brown" +- "black" + +# 车型 +- "sedan" +- "suv" +- "van" +- "hatchback" +- "mpv" +- "pickup" +- "bus" +- "truck" +- "estate" +``` + +在标注文件中使用长度为19的序列来表示上述属性。 + +举例: + +[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0] + +前10位中,位序号0的值为1,表示车辆颜色为`"yellow"`。 + +后9位中,位序号11的值为1,表示车型为`"suv"`。 + + +### 数据标注 + +理解了上面`数据格式`的含义后,就可以进行数据标注的工作。其本质是:每张车辆的图片,建立一组长度为19的标注项,分别对应各项属性值。 + +举例: + +对于一张原始图片, + +1) 使用检测框,标注图片中每台车辆的位置。 + +2) 每一个检测框(对应每辆车),包含一组19位的属性值数组,数组的每一位以0或1表示。对应上述19个属性分类。例如,如果颜色是'orange',则数组索引为1的位置值为1,如果车型是'sedan',则数组索引为10的位置值为1。 + +标注完成后利用检测框将每辆车截取成只包含单辆车的图片,则图片与19位属性标注建立了对应关系。也可先截取再进行标注,效果相同。 + + +## 模型训练 + +数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。 + +其主要有两步工作需要完成:1)将数据与标注数据整理成训练格式。2)修改配置文件开始训练。 + +### 训练数据格式 + +训练数据包括训练使用的图片和一个训练列表train.txt,其具体位置在训练配置中指定,其放置方式示例如下: +``` +Attribute/ +|-- data 训练图片文件夹 +| |-- 00001.jpg +| |-- 00002.jpg +| `-- 0000x.jpg +`-- train.txt 训练数据列表 + +``` + +train.txt文件内为所有训练图片名称(相对于根路径的文件路径)+ 19个标注值 + +其每一行表示一辆车的图片和标注结果。其格式为: + +``` +00001.jpg 0,0,1,0,.... +``` + +注意:1)图片与标注值之间是以Tab[\t]符号隔开, 2)标注值之间是以逗号[,]隔开。该格式不能错,否则解析失败。 + +### 修改配置开始训练 + +首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)): + +```shell +git clone https://github.com/PaddlePaddle/PaddleClas +``` + +需要在[配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml)中,修改的配置项如下: +```yaml +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" # the root path of training images + cls_label_path: "dataset/VeRi/train_list.txt" # the location of the training list file + label_ratio: True + transform_ops: + ... + + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" # the root path of evaluation images + cls_label_path: "dataset/VeRi/val_list.txt" # the location of the evaluation list file + label_ratio: True + transform_ops: + ... +``` + +注意: +1. 这里image_root路径+train.txt中图片相对路径,对应图片的完整路径位置。 +2. 
如果有修改属性数量,则还需修改内容配置项中属性种类数量: +```yaml +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 19 #属性种类数量 +``` + +然后运行以下命令开始训练。 + +``` +#多卡训练 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml + +#单卡训练 +python3 tools/train.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml +``` + +训练完成后可以执行以下命令进行性能评估: +``` +#多卡评估 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model + +#单卡评估 +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model +``` + + +### 模型导出 + +使用下述命令将训练好的模型导出为预测部署模型。 + +``` +python3 tools/export_model.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_model +``` + +导出模型后,如果希望在PP-Vehicle中使用,则需要下载[预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip),解压并将其中的配置文件`infer_cfg.yml`文件,放置到导出的模型文件夹`PPLCNet_x1_0_vehicle_attribute_model`中。 + +使用时在PP-Vehicle中的配置文件`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`中修改新的模型路径`model_dir`项,并开启功能`enable: True`。 +``` +VEHICLE_ATTR: + model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_vehicle_attribute_infer/ #新导出的模型路径位置 + enable: True #开启功能 +``` +然后可以使用-->至此即完成新增属性类别识别任务。 + +## 属性增减 + +该过程与行人属性的增减过程相似,如果需要增加、减少属性数量,则需要: + +1)标注时需增加新属性类别信息或删减属性类别信息; + +2)对应修改训练中train.txt所使用的属性数量和名称; + +3)修改训练配置,例如``PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml``文件中的属性数量,详细见上述`修改配置开始训练`部分。 + +增加属性示例: + +1. 在标注数据时在19位后继续增加新的属性标注数值; +2. 在train.txt文件的标注数值中也增加新的属性数值。 +3. 注意属性类型在train.txt中属性数值列表中的位置的对应关系需要固定。 + +
    + +
    + +删减属性同理。 + + +## 修改后处理代码 + +修改了属性定义后,pipeline后处理部分也需要做相应修改,主要影响结果可视化时的显示结果。 + +相应代码在[文件](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/ppvehicle/vehicle_attr.py#L108)中`postprocess`函数。 + +其函数实现说明如下: + +```python + # 在类的初始化函数中,定义了颜色/车型的名称 + self.color_list = [ + "yellow", "orange", "green", "gray", "red", "blue", "white", + "golden", "brown", "black" + ] + self.type_list = [ + "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck", + "estate" + ] + + ... + + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + batch_res = [] + for res in im_results: + res = res.tolist() + attr_res = [] + color_res_str = "Color: " + type_res_str = "Type: " + color_idx = np.argmax(res[:10]) # 前10项表示各项颜色得分,取得分最大项作为颜色结果 + type_idx = np.argmax(res[10:]) # 后9项表示各项车型得分,取得分最大项作为车型结果 + + # 颜色和车型的得分都需要超过对应阈值,否则视为'UnKnown' + if res[color_idx] >= self.color_threshold: + color_res_str += self.color_list[color_idx] + else: + color_res_str += "Unknown" + attr_res.append(color_res_str) + + if res[type_idx + 10] >= self.type_threshold: + type_res_str += self.type_list[type_idx] + else: + type_res_str += "Unknown" + attr_res.append(type_res_str) + + batch_res.append(attr_res) + result = {'output': batch_res} + return result +``` diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_attribute_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_attribute_en.md new file mode 100644 index 0000000000000000000000000000000000000000..900e92a585f64541590d026b5de22b7a5152f177 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_attribute_en.md @@ -0,0 +1,271 @@ +[简体中文](ppvehicle_attribute.md) | English + +# Customized Vehicle Attribute Recognition + +## Data Preparation + +### Data Format + +We use the VeRi attribute annotation format, with a total of 10 color and 9 model attributes shown as follows. + +``` +# colors +- "yellow" +- "orange" +- "green" +- "gray" +- "red" +- "blue" +- "white" +- "golden" +- "brown" +- "black" + +# models +- "sedan" +- "suv" +- "van" +- "hatchback" +- "mpv" +- "pickup" +- "bus" +- "truck" +- "estate" +``` + +A sequence of length 19 is used in the annotation file to represent the above attributes. + +Examples: + +[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0] + +In the first 10 bits, the value of bit index 0 is 1, indicating that the vehicle color is `"yellow"`. + +In the last 9 bits, the value of bit index 11 is 1, indicating that the model is `"suv"`. + + +### Data Annotation + +After knowing the purpose of the above `Data format`, we can start to annotate data. The essence is that each single-vehicle image creates a set of 19 annotation items, corresponding to the attribute values at 19 positions. + +Examples: + +For an original image: + +1) Using bounding boxes to annotate the position of each vehicle in the picture. + +2) Each detection box (corresponding to each vehicle) contains 19 attribute values which are represented by 0 or 1. It corresponds to the above 19 attributes. For example, if the color is 'orange', then the index 1 bit of the array is 1. If the model is 'sedan', then the index 10 bit of the array is 1. + +After the annotation is completed, the model will use the detection box to intercept each vehicle into a single-vehicle picture, and its picture establishes a corresponding relationship with the 19 attribute annotation. 
 It is also possible to crop the single-vehicle images first and then annotate them. The results are the same.
+
+
+
+## Model Training
+
+Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
+
+There are two main steps: 1) Organize the data and annotations into the training format. 2) Modify the configuration file to start training.
+
+### Training Data Format
+
+The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration, with the following example layout:
+
+```
+Attribute/
+|-- data                Training images folder
+|   |-- 00001.jpg
+|   |-- 00002.jpg
+|   `-- 0000x.jpg
+`-- train.txt           Training data list
+```
+
+The train.txt file contains the names of all training images (file paths relative to the root path) followed by 19 annotation values.
+
+Each line of it represents one vehicle's image and annotation result. The format is as follows:
+
+```
+00001.jpg 0,0,1,0,....
+```
+
+Note: 1) the image path and the annotation values are separated by a Tab [\t]; 2) the annotation values are separated by commas [,]. The format must be correct, otherwise parsing will fail.
+
+
+### Modify The Configuration To Start Training
+
+First run the following command to download the training code (for more environmental issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
+
+```
+git clone https://github.com/PaddlePaddle/PaddleClas
+```
+
+You need to modify the following configuration items in the configuration file `PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`:
+
+```yaml
+DataLoader:
+  Train:
+    dataset:
+      name: MultiLabelDataset
+      image_root: "dataset/VeRi/" # the root path of training images
+      cls_label_path: "dataset/VeRi/train_list.txt" # the location of the training list file
+      label_ratio: True
+      transform_ops:
+        ...
+
+  Eval:
+    dataset:
+      name: MultiLabelDataset
+      image_root: "dataset/VeRi/" # the root path of evaluation images
+      cls_label_path: "dataset/VeRi/val_list.txt" # the location of the evaluation list file
+      label_ratio: True
+      transform_ops:
+        ...
+```
+
+Note:
+
+1. The image_root path concatenated with the relative image path in train.txt gives the full path of each image.
+2. If you modify the number of attributes, you should also modify the number of attribute classes in the model configuration accordingly.
+
+```yaml
+# model architecture
+Arch:
+  name: "PPLCNet_x1_0"
+  pretrained: True
+  use_ssld: True
+  class_num: 19 # Number of attribute classes
+```
+
+Then run the following command to start training:
+
+```bash
+# Multi-card training
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+        --gpus="0,1,2,3" \
+        tools/train.py \
+        -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
+
+# Single-card training
+python3 tools/train.py \
+        -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
+```
+
+You can run the following commands for performance evaluation after the training is completed:
+
+```
+# Multi-card evaluation
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+        --gpus="0,1,2,3" \
+        tools/eval.py \
+        -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
+        -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
+
+# Single-card evaluation
+python3 tools/eval.py \
+        -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
+        -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
+```
+
+### Model Export
+
+Use the following command to export the trained model as an inference deployment model.
+
+```
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_model
+```
+
+After exporting the model, if you want to use it in PP-Vehicle, you need to download the [deploy infer model](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) and copy its `infer_cfg.yml` into the exported model folder `PPLCNet_x1_0_vehicle_attribute_model`.
+
+When you use the model, modify the model path `model_dir` entry and set `enable: True` in the PP-Vehicle configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`:
+
+```
+VEHICLE_ATTR:
+  model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_vehicle_attribute_model/ # The exported model location
+  enable: True # Whether to enable the function
+```
+
+To this point, a new attribute category recognition task is completed.
+
+
+
+## Adding or deleting attributes
+
+This is similar to the process of adding or deleting pedestrian attributes.
+
+If attributes need to be added or deleted, you need to:
+
+1) Add or remove the corresponding attribute category information when annotating the data.
+
+2) Modify the number and names of attributes used in train.txt accordingly.
+
+3) Modify the training configuration, for example, the number of attributes in the ``PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`` file; for details, please see the ``Modify The Configuration To Start Training`` section above.
+
+Example of adding attributes (see the sketch after this list):
+
+1. Continue to add new attribute annotation values after the 19 values when annotating the data.
+2. Add the new attribute values to the annotation values in the train.txt file as well.
+3. Note that the mapping between attribute types and their positions in the train.txt value list must stay fixed.
    + +
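+
+As an illustration of the steps above, the hypothetical snippet below appends a 20th attribute column to every line of train.txt, keeping the original 19 positions unchanged (the new value defaults to 0 here; fill in the real annotation per image):
+
+```python
+# extend_train_list.py -- hypothetical helper, not part of PaddleClas
+with open("train.txt") as src, open("train_20.txt", "w") as dst:
+    for line in src:
+        line = line.rstrip("\n")
+        if not line:
+            continue
+        img_path, labels = line.split("\t")  # Tab-separated
+        values = labels.split(",")           # comma-separated label values
+        assert len(values) == 19, f"unexpected label count in {img_path}"
+        values.append("0")  # placeholder for the new 20th attribute
+        dst.write(img_path + "\t" + ",".join(values) + "\n")
+```
+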
    + + +The same applies to the deletion of attributes. + + +## Modifications to post-processing code + +After modifying the attribute definition, the post-processing part of the pipeline also needs to be modified accordingly, which mainly affects the display results when the results are visualized. + + +The code is at [file](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/deploy/pipeline/ppvehicle/vehicle_attr.py#L108), that is, the `postprocess` function. + +The function implementation is described as follows: + +```python + # The name of the color/model is defined in the initialization function of the class + self.color_list = [ + "yellow", "orange", "green", "gray", "red", "blue", "white", + "golden", "brown", "black" + ] + self.type_list = [ + "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck", + "estate" + ] + + ... + + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + batch_res = [] + for res in im_results: + res = res.tolist() + attr_res = [] + color_res_str = "Color: " + type_res_str = "Type: " + color_idx = np.argmax(res[:10]) # The first 10 items represent the color scores, and the item with the largest score is used as the color result + type_idx = np.argmax(res[10:]) # The last 9 items represent the model scores, and the item with the largest score is used as the model result. + + # The score of color and model need to be larger than the corresponding threshold, otherwise it will be regarded as 'UnKnown' + if res[color_idx] >= self.color_threshold: + color_res_str += self.color_list[color_idx] + else: + color_res_str += "Unknown" + attr_res.append(color_res_str) + + if res[type_idx + 10] >= self.type_threshold: + type_res_str += self.type_list[type_idx] + else: + type_res_str += "Unknown" + attr_res.append(type_res_str) + + batch_res.append(attr_res) + result = {'output': batch_res} + return result +``` diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_plate.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_plate.md new file mode 100644 index 0000000000000000000000000000000000000000..37694ff8f3555bdcbb9a0dfb16281c025812d484 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_plate.md @@ -0,0 +1,215 @@ +简体中文 | [English](./ppvehicle_plate_en.md) + +# 车牌识别任务二次开发 + +车牌识别任务,采用PP-OCRv3模型在车牌数据集上进行fine-tune得到,过程参考[PaddleOCR车牌应用介绍](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/applications/%E8%BD%BB%E9%87%8F%E7%BA%A7%E8%BD%A6%E7%89%8C%E8%AF%86%E5%88%AB.md)在CCPD2019数据集上进行了拓展。 + +## 数据准备 + +1. 对于CCPD2019、CCPD2020数据集,我们提供了处理脚本[ccpd2ocr_all.py](../../../deploy/pipeline/tools/ccpd2ocr_all.py), 使用时跟CCPD2019、CCPD2020数据集文件夹放在同一目录下,然后执行脚本即可在CCPD2019/PPOCR、CCPD2020/PPOCR目录下得到检测、识别模型的训练标注文件。训练时可以整合到一起使用。 +2. 
对于其他来源数据或者自标注数据,可以按如下格式整理训练列表文件: + +- **车牌检测标注** + +标注文件格式如下,中间用'\t'分隔: + +``` +" 图像文件路径 标注框标注信息" +CCPD2020/xxx.jpg [{"transcription": "京AD88888", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}] +``` + +标注框标注信息是包含多个字典的list,有多少个标注框就有多少个字典对应,字典中的 `points` 表示车牌框的四个点的坐标(x, y),从左上角的点开始顺时针排列。 `transcription` 表示当前文本框的文字,***当其内容为“###”时,表示该文本框无效,在训练时会跳过。*** + +- **车牌字符识别标注** + +标注文件的格式如下,txt文件中默认请将图片路径和图片标签用'\t'分割,如用其他方式分割将造成训练报错。其中图片是对车牌字符的截图。 + +``` +" 图像文件名 字符标注信息 " +CCPD2020/crop_imgs/xxx.jpg 京AD88888 +``` + +## 模型训练 + +首先执行以下命令clone PaddleOCR库代码到训练机器: +``` +git clone git@github.com:PaddlePaddle/PaddleOCR.git +``` + +下载预训练模型: +``` +#检测预训练模型: +mkdir models +cd models +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar +tar -xf ch_PP-OCRv3_det_distill_train.tar + +#识别预训练模型: +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar +tar -xf ch_PP-OCRv3_rec_train.tar +cd .. +``` + +安装相关依赖环境: +``` +cd PaddleOCR +pip install -r requirements.txt +``` + +然后进行训练相关配置修改。 + +### 修改配置 + +**检测模型配置项** + +修改配置项包括以下3部分内容,可以在训练时以命令行修改,或者直接在配置文件`configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml`中修改: +1. 模型存储和训练相关: +- Global.pretrained_model: 指向前面下载的PP-OCRv3文本检测预训练模型地址 +- Global.eval_batch_step: 模型多少step评估一次,一般设置为一个epoch对应的step数,可以从训练开始的log中读取。此处以[0, 772]为例,第一个数字表示从第0各step开始算起。 +2. 优化器相关: +- Optimizer.lr.name: 学习率衰减器设为常量 Const +- Optimizer.lr.learning_rate: 做 fine-tune 实验,学习率需要设置的比较小,此处学习率设为配置文件中的0.05倍 +- Optimizer.lr.warmup_epoch: warmup_epoch设为0 +3. 数据集相关: +- Train.dataset.data_dir:指向训练集图片存放根目录 +- Train.dataset.label_file_list:指向训练集标注文件 +- Eval.dataset.data_dir:指向测试集图片存放根目录 +- Eval.dataset.label_file_list:指向测试集标注文件 + +**识别模型配置项** + +修改配置项包括以下3部分内容,可以在训练时以命令行修改,或者直接在配置文件`configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml`中修改: +1. 模型存储和训练相关: +- Global.pretrained_model: 指向PP-OCRv3文本识别预训练模型地址 +- Global.eval_batch_step: 模型多少step评估一次,一般设置为一个epoch对应的step数,可以从训练开始的log中读取。此处以[0, 90]为例,第一个数字表示从第0各step开始算起。 +2. 优化器相关 +- Optimizer.lr.name: 学习率衰减器设为常量 Const +- Optimizer.lr.learning_rate: 做 fine-tune 实验,学习率需要设置的比较小,此处学习率设为配置文件中的0.05倍 +- Optimizer.lr.warmup_epoch: warmup_epoch设为0 +3. 
数据集相关 +- Train.dataset.data_dir:指向训练集图片存放根目录 +- Train.dataset.label_file_list:指向训练集标注文件 +- Eval.dataset.data_dir:指向测试集图片存放根目录 +- Eval.dataset.label_file_list:指向测试集标注文件 + +### 执行训练 + +然后运行以下命令开始训练。如果在配置文件中已经做了修改,可以省略`-o`及其后面的内容。 + +**检测模型训练命令** + +``` +#单卡训练 +python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \ + Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \ + Global.save_model_dir=output/CCPD/det \ + Global.eval_batch_step="[0, 772]" \ + Optimizer.lr.name=Const \ + Optimizer.lr.learning_rate=0.0005 \ + Optimizer.lr.warmup_epoch=0 \ + Train.dataset.data_dir=/home/aistudio/ccpd_data/ \ + Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/det.txt] + +#多卡训练 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \ + Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \ + Global.save_model_dir=output/CCPD/det \ + Global.eval_batch_step="[0, 772]" \ + Optimizer.lr.name=Const \ + Optimizer.lr.learning_rate=0.0005 \ + Optimizer.lr.warmup_epoch=0 \ + Train.dataset.data_dir=/home/aistudio/ccpd_data/ \ + Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/det.txt] + +``` + +训练完成后可以执行以下命令进行性能评估: +``` +#单卡评估 +python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \ + Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \ + Eval.dataset.data_dir=/home/aistudio/ccpd_data/ \ + Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/det.txt] +``` + +**识别模型训练命令** + +``` +#单卡训练 +python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \ + Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \ + Global.save_model_dir=output/CCPD/rec/ \ + Global.eval_batch_step="[0, 90]" \ + Optimizer.lr.name=Const \ + Optimizer.lr.learning_rate=0.0005 \ + Optimizer.lr.warmup_epoch=0 \ + Train.dataset.data_dir=/home/aistudio/ccpd_data \ + Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/rec.txt] \ + Eval.dataset.data_dir=/home/aistudio/ccpd_data \ + Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt] + + +#多卡训练 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \ + Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \ + Global.save_model_dir=output/CCPD/rec/ \ + Global.eval_batch_step="[0, 90]" \ + Optimizer.lr.name=Const \ + Optimizer.lr.learning_rate=0.0005 \ + Optimizer.lr.warmup_epoch=0 \ + Train.dataset.data_dir=/home/aistudio/ccpd_data \ + Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/rec.txt] \ + Eval.dataset.data_dir=/home/aistudio/ccpd_data \ + Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt] + +``` + +训练完成后可以执行以下命令进行性能评估: +``` +#单卡评估 +python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \ + Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \ + Eval.dataset.data_dir=/home/aistudio/ccpd_data/ \ + Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt] +``` + + +### 模型导出 + +使用下述命令将训练好的模型导出为预测部署模型。 + +**检测模型导出** + +``` +python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \ + Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \ + Global.save_inference_dir=output/det/infer +``` + +**识别模型导出** + +``` +python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \ + 
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \ + Global.save_inference_dir=output/CCPD/rec/infer +``` + + +使用时在PP-Vehicle中的配置文件`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`中修改`VEHICLE_PLATE`模块中的`det_model_dir`、`rec_model_dir`项,并开启功能`enable: True`。 +``` +VEHICLE_PLATE: + det_model_dir: [YOUR_DET_INFERENCE_MODEL_PATH] #设置检测模型路径 + det_limit_side_len: 736 + det_limit_type: "max" + rec_model_dir: [YOUR_REC_INFERENCE_MODEL_PATH] #设置识别模型路径 + rec_image_shape: [3, 48, 320] + rec_batch_num: 6 + word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt + enable: True #开启功能 +``` + +然后可以使用-->至此即完成更新车牌识别模型任务。 diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_violation.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_violation.md new file mode 100644 index 0000000000000000000000000000000000000000..b82fe97d3334f636b15a1004ccbbaec8c36082a2 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_violation.md @@ -0,0 +1,235 @@ +简体中文 | [English](./ppvehicle_violation_en.md) + +# 车辆违章任务二次开发 + +车辆违章任务的二次开发,主要集中于车道线分割模型任务。采用PP-LiteSeg模型在车道线数据集bdd100k,上进行fine-tune得到,过程参考[PP-LiteSeg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/configs/pp_liteseg/README.md)。 + +## 数据准备 + +ppvehicle违法分析将车道线类别分为4类 +``` +0 背景 +1 双黄线 +2 实线 +3 虚线 + +``` + +1. 对于bdd100k数据集,可以结合我们的提供的处理脚本[lane_to_mask.py](../../../deploy/pipeline/tools/lane_to_mask.py)和bdd100k官方[repo](https://github.com/bdd100k/bdd100k)将数据处理成分割需要的数据格式. + +``` +#首先执行以下命令clone bdd100k库: +git clone https://github.com/bdd100k/bdd100k.git + +#拷贝lane_to_mask.py到bdd100k目录 +cp PaddleDetection/deploy/pipeline/tools/lane_to_mask.py bdd100k/ + +#准备bdd100k环境 +cd bdd100k && pip install -r requirements.txt + +#数据转换 +python lane_to_mask.py -i dataset/labels/lane/polygons/lane_train.json -o /output_path + +# -i bdd100k数据集label的json路径, +# -o 生成的mask图像路径 + +``` + +2. 整理数据,按如下格式存放数据 +``` +dataset_root + | + |--images + | |--train + | |--image1.jpg + | |--image2.jpg + | |--... + | |--val + | |--image3.jpg + | |--image4.jpg + | |--... + | |--test + | |--image5.jpg + | |--image6.jpg + | |--... + | + |--labels + | |--train + | |--label1.jpg + | |--label2.jpg + | |--... + | |--val + | |--label3.jpg + | |--label4.jpg + | |--... + | |--test + | |--label5.jpg + | |--label6.jpg + | |--... + | +``` +运行[create_dataset_list.py](../../../deploy/pipeline/tools/create_dataset_list.py)生成txt文件 +``` +python create_dataset_list.py #数据根目录 + --type custom #数据类型,支持cityscapes、custom + + +``` +其他数据以及数据标注,可参考PaddleSeg[准备自定义数据集](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/data/marker/marker_cn.md) + + +## 模型训练 + +首先执行以下命令clone PaddleSeg库代码到训练机器: +``` +git clone https://github.com/PaddlePaddle/PaddleSeg.git +``` + +安装相关依赖环境: +``` +cd PaddleSeg +pip install -r requirements.txt +``` + +### 准备配置文件 +详细可参考PaddleSeg[准备配置文件](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/config/pre_config_cn.md). 
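+
+在编写配置文件之前,可以先粗略校验一下前面生成的数据列表,避免因路径错误或标注取值越界导致训练中途报错。下面是一个示例脚本(非官方工具,仅作演示),假设列表每行为`图片路径 标注路径`、以空格分隔,标注为单通道mask且像素值应落在0~3(4类)或255(忽略值)之内:
+
+```
+# 示例脚本(非官方工具):粗略校验分割数据列表
+# 假设:每行为 "图片相对路径 标注相对路径",4类(0~3),255为忽略值
+import os
+import sys
+
+import cv2
+import numpy as np
+
+
+def check_list(dataset_root, list_file, num_classes=4, ignore_index=255):
+    valid_values = set(range(num_classes)) | {ignore_index}
+    for i, line in enumerate(open(list_file), start=1):
+        parts = line.split()
+        if not parts:
+            continue  # 跳过空行
+        if len(parts) != 2:
+            print("第{}行格式错误: {}".format(i, line.strip()))
+            continue
+        img_path, label_path = [os.path.join(dataset_root, p) for p in parts]
+        if not (os.path.exists(img_path) and os.path.exists(label_path)):
+            print("第{}行文件缺失: {}".format(i, line.strip()))
+            continue
+        # 以灰度方式读取标注mask,检查像素取值范围
+        mask = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE)
+        if mask is None:
+            print("第{}行标注无法读取: {}".format(i, label_path))
+            continue
+        extra = set(np.unique(mask).tolist()) - valid_values
+        if extra:
+            print("第{}行标注值越界: {}".format(i, sorted(extra)))
+
+
+if __name__ == "__main__":
+    # 用法: python check_list.py 数据根目录 train.txt
+    check_list(sys.argv[1], sys.argv[2])
+```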
+本例用pp_liteseg_stdc2_bdd100k_1024x512.yml示例 + +``` +batch_size: 16 +iters: 50000 + +train_dataset: + type: Dataset + dataset_root: data/bdd100k #数据集路径 + train_path: data/bdd100k/train.txt #数据集训练txt文件 + num_classes: 4 #ppvehicle将道路分为4类 + mode: train + transforms: + - type: ResizeStepScaling + min_scale_factor: 0.5 + max_scale_factor: 2.0 + scale_step_size: 0.25 + - type: RandomPaddingCrop + crop_size: [512, 1024] + - type: RandomHorizontalFlip + - type: RandomAffine + - type: RandomDistort + brightness_range: 0.5 + contrast_range: 0.5 + saturation_range: 0.5 + - type: Normalize + +val_dataset: + type: Dataset + dataset_root: data/bdd100k #数据集路径 + val_path: data/bdd100k/val.txt #数据集验证集txt文件 + num_classes: 4 + mode: val + transforms: + - type: Normalize + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 4.0e-5 + +lr_scheduler: + type: PolynomialDecay + learning_rate: 0.01 #0.01 + end_lr: 0 + power: 0.9 + +loss: + types: + - type: MixedLoss + losses: + - type: CrossEntropyLoss + - type: LovaszSoftmaxLoss + coef: [0.6, 0.4] + - type: MixedLoss + losses: + - type: CrossEntropyLoss + - type: LovaszSoftmaxLoss + coef: [0.6, 0.4] + - type: MixedLoss + losses: + - type: CrossEntropyLoss + - type: LovaszSoftmaxLoss + coef: [0.6, 0.4] + coef: [1, 1,1] + + +model: + type: PPLiteSeg + backbone: + type: STDC2 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/PP_STDCNet2.tar.gz #预训练模型 +``` + +### 执行训练 + +``` +#单卡训练 +export CUDA_VISIBLE_DEVICES=0 # Linux上设置1张可用的卡 +# set CUDA_VISIBLE_DEVICES=0 # Windows上设置1张可用的卡 + +python train.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --do_eval \ + --use_vdl \ + --save_interval 500 \ + --save_dir output + +``` +### 训练参数解释 +``` +--do_eval 是否在保存模型时启动评估, 启动时将会根据mIoU保存最佳模型至best_model +--use_vdl 是否开启visualdl记录训练数据 +--save_interval 500 模型保存的间隔步数 +--save_dir output 模型输出路径 +``` + +## 2、多卡训练 +如果想要使用多卡训练的话,需要将环境变量CUDA_VISIBLE_DEVICES指定为多卡(不指定时默认使用所有的gpu),并使用paddle.distributed.launch启动训练脚本(windows下由于不支持nccl,无法使用多卡训练): + +``` +export CUDA_VISIBLE_DEVICES=0,1,2,3 # 设置4张可用的卡 +python -m paddle.distributed.launch train.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --do_eval \ + --use_vdl \ + --save_interval 500 \ + --save_dir output +``` + + +训练完成后可以执行以下命令进行性能评估: +``` +#单卡评估 +python val.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --model_path output/iter_1000/model.pdparams +``` + + +### 模型导出 + +使用下述命令将训练好的模型导出为预测部署模型。 + +``` +python export.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --model_path output/iter_1000/model.pdparams \ + --save_dir output/inference_model +``` + + +使用时在PP-Vehicle中的配置文件`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`中修改`LANE_SEG`模块中的`model_dir`项. +``` +LANE_SEG: + lane_seg_config: deploy/pipeline/config/lane_seg_config.yml + model_dir: output/inference_model +``` + +然后可以使用-->至此即完成更新车道线分割模型任务。 diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_violation_en.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_violation_en.md new file mode 100644 index 0000000000000000000000000000000000000000..9b96e8a60eaf2a24ee33fc70a330295d3e01add6 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/customization/ppvehicle_violation_en.md @@ -0,0 +1,240 @@ +English | [简体中文](./ppvehicle_violation.md) + +# Customized Vehicle Violation + +The secondary development of vehicle violation task mainly focuses on the task of lane line segmentation model. 
The PP-LiteSeg model is obtained by fine-tuning on the lane line dataset bdd100k. The process follows [PP-LiteSeg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/configs/pp_liteseg/README.md).
+
+## Data preparation
+
+ppvehicle violation analysis divides lane lines into 4 categories:
+```
+0 Background
+
+1 double yellow line
+
+2 Solid line
+
+3 Dashed line
+
+```
+
+1. For the bdd100k dataset, we can combine the processing script [lane_to_mask.py](../../../deploy/pipeline/tools/lane_to_mask.py) we provide with the official bdd100k [repo](https://github.com/bdd100k/bdd100k) to process the data into the format required for segmentation.
+
+
+```
+# clone bdd100k:
+git clone https://github.com/bdd100k/bdd100k.git
+
+# copy lane_to_mask.py to bdd100k/
+cp PaddleDetection/deploy/pipeline/tools/lane_to_mask.py bdd100k/
+
+# prepare the bdd100k env
+cd bdd100k && pip install -r requirements.txt
+
+# bdd100k to mask
+python lane_to_mask.py -i dataset/labels/lane/polygons/lane_train.json -o /output_path
+
+# -i means the input path of the bdd100k label json,
+# -o the output path for the generated masks
+
+```
+
+2. Organize the data and store it in the following format:
+```
+dataset_root
+    |
+    |--images
+    |  |--train
+    |  |  |--image1.jpg
+    |  |  |--image2.jpg
+    |  |  |--...
+    |  |--val
+    |  |  |--image3.jpg
+    |  |  |--image4.jpg
+    |  |  |--...
+    |  |--test
+    |  |  |--image5.jpg
+    |  |  |--image6.jpg
+    |  |  |--...
+    |
+    |--labels
+    |  |--train
+    |  |  |--label1.jpg
+    |  |  |--label2.jpg
+    |  |  |--...
+    |  |--val
+    |  |  |--label3.jpg
+    |  |  |--label4.jpg
+    |  |  |--...
+    |  |--test
+    |  |  |--label5.jpg
+    |  |  |--label6.jpg
+    |  |  |--...
+    |
+```
+
+Run [create_dataset_list.py](../../../deploy/pipeline/tools/create_dataset_list.py) to create the txt files:
+
+```
+python create_dataset_list.py #dataset path
+    --type custom #dataset type, supports cityscapes and custom
+
+```
+
+For other data and data annotation, please refer to PaddleSeg [Prepare Custom Datasets](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/data/marker/marker_cn.md).
+
+
+## Model training
+
+Clone PaddleSeg:
+```
+git clone https://github.com/PaddlePaddle/PaddleSeg.git
+```
+
+Prepare the environment:
+```
+cd PaddleSeg
+pip install -r requirements.txt
+```
+
+### Prepare configuration file
+For details, please refer to PaddleSeg [prepare configuration file](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/config/pre_config_cn.md).
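+
+Before writing the configuration file, it can help to roughly validate the generated file lists, so that wrong paths or out-of-range label values do not interrupt training later. Below is an illustrative script (not an official tool, only a sketch); it assumes each line of the txt file is `image_path label_path` separated by a space, and that labels are single-channel masks whose pixel values fall in 0-3 (4 classes) or 255 (ignore):
+
+```
+# illustrative helper (not an official tool): roughly validate a segmentation file list
+# assumptions: each line is "image_path label_path", 4 classes (0-3), 255 means ignore
+import os
+import sys
+
+import cv2
+import numpy as np
+
+
+def check_list(dataset_root, list_file, num_classes=4, ignore_index=255):
+    valid_values = set(range(num_classes)) | {ignore_index}
+    for i, line in enumerate(open(list_file), start=1):
+        parts = line.split()
+        if not parts:
+            continue  # skip empty lines
+        if len(parts) != 2:
+            print("line {}: malformed entry: {}".format(i, line.strip()))
+            continue
+        img_path, label_path = [os.path.join(dataset_root, p) for p in parts]
+        if not (os.path.exists(img_path) and os.path.exists(label_path)):
+            print("line {}: missing file: {}".format(i, line.strip()))
+            continue
+        # read the label as a single-channel mask and check its value range
+        mask = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE)
+        if mask is None:
+            print("line {}: unreadable label: {}".format(i, label_path))
+            continue
+        extra = set(np.unique(mask).tolist()) - valid_values
+        if extra:
+            print("line {}: unexpected label values: {}".format(i, sorted(extra)))
+
+
+if __name__ == "__main__":
+    # usage: python check_list.py dataset_root train.txt
+    check_list(sys.argv[1], sys.argv[2])
+```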
+ +exp: pp_liteseg_stdc2_bdd100k_1024x512.yml + +``` +batch_size: 16 +iters: 50000 + +train_dataset: + type: Dataset + dataset_root: data/bdd100k #dataset path + train_path: data/bdd100k/train.txt #dataset train txt + num_classes: 4 #lane classes + mode: train + transforms: + - type: ResizeStepScaling + min_scale_factor: 0.5 + max_scale_factor: 2.0 + scale_step_size: 0.25 + - type: RandomPaddingCrop + crop_size: [512, 1024] + - type: RandomHorizontalFlip + - type: RandomAffine + - type: RandomDistort + brightness_range: 0.5 + contrast_range: 0.5 + saturation_range: 0.5 + - type: Normalize + +val_dataset: + type: Dataset + dataset_root: data/bdd100k #dataset path + val_path: data/bdd100k/val.txt #dataset val txt + num_classes: 4 + mode: val + transforms: + - type: Normalize + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 4.0e-5 + +lr_scheduler: + type: PolynomialDecay + learning_rate: 0.01 #0.01 + end_lr: 0 + power: 0.9 + +loss: + types: + - type: MixedLoss + losses: + - type: CrossEntropyLoss + - type: LovaszSoftmaxLoss + coef: [0.6, 0.4] + - type: MixedLoss + losses: + - type: CrossEntropyLoss + - type: LovaszSoftmaxLoss + coef: [0.6, 0.4] + - type: MixedLoss + losses: + - type: CrossEntropyLoss + - type: LovaszSoftmaxLoss + coef: [0.6, 0.4] + coef: [1, 1,1] + + +model: + type: PPLiteSeg + backbone: + type: STDC2 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/PP_STDCNet2.tar.gz #Pre-training model +``` + +### training model + +``` +#Single GPU training +export CUDA_VISIBLE_DEVICES=0 # Linux +# set CUDA_VISIBLE_DEVICES=0 # Windows +python train.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --do_eval \ + --use_vdl \ + --save_interval 500 \ + --save_dir output + +``` +### Explanation of training parameters +``` +--do_eval Whether to start the evaluation when saving the model. When starting, the best model will be saved to best according to mIoU model +--use_vdl Whether to enable visualdl to record training data +--save_interval 500 Number of steps between model saving +--save_dir output Model output path +``` + +## 2、Multiple GPUs training +if you want to use multiple gpus training, you need to set the environment variable CUDA_VISIBLE_DEVICES is specified as multiple gpus (if not specified, all gpus will be used by default), and the training script will be started using paddle.distributed.launch (because nccl is not supported under windows, multi-card training cannot be used): + +``` +export CUDA_VISIBLE_DEVICES=0,1,2,3 # 4 gpus +python -m paddle.distributed.launch train.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --do_eval \ + --use_vdl \ + --save_interval 500 \ + --save_dir output +``` + + +After training, you can execute the following commands for performance evaluation: +``` +python val.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --model_path output/iter_1000/model.pdparams +``` + + +### Model export + +Use the following command to export the trained model as a prediction deployment model. + +``` +python export.py \ + --config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \ + --model_path output/iter_1000/model.pdparams \ + --save_dir output/inference_model +``` + + +Profile in PP-Vehicle when used `./deploy/pipeline/config/infer_cfg_ppvehicle.yml` set `model_dir` in `LANE_SEG`. 
+```
+LANE_SEG:
+  lane_seg_config: deploy/pipeline/config/lane_seg_config.yml
+  model_dir: output/inference_model
+```
+
+Then you can run the PP-Vehicle pipeline with the new model. This completes the task of updating the lane line segmentation model.
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/README.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ac372eaa30d1f02a5016b42b928e9c56bfea547a
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/README.md
@@ -0,0 +1,159 @@
+# Using OpenVINO for Inference
+
+## Introduction
+PaddleDetection has been a vibrant open-source project and has a large number of contributors and maintainers around it. It is an AI toolkit which enables developers to quickly integrate AI capabilities into their own projects and applications.
+
+Intel OpenVINO is a widely used free toolkit. It facilitates the optimization of a deep learning model from a framework and its deployment onto Intel hardware using an inference engine.
+
+Clearly, the upstream (Paddle) and the downstream (Intel OpenVINO) can work together to streamline and simplify the process of developing an AI model and deploying it onto hardware, which, in turn, makes our lives easier.
+
+This article will show you how to take the [FairMOT](../../../configs/mot/fairmot/README.md) model from the PaddleDetection Model Zoo and use it with OpenVINO to do the inference.
+
+------------
+
+## Prerequisites
+
+This article is not an entry-level introduction that helps you set everything up; in order to focus on its main purpose, the environment-setup instructions are kept to a minimum, and the respective official installation guides are linked instead.
+
+Before we can do anything, please make sure you have a PaddlePaddle environment set up.
+
+```
+conda install paddlepaddle==2.2.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+```
+
+Please also download the converted [ONNX format of FairMOT](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_576_320_v3.onnx).
+
+## Export the PaddleDetection Model to ONNX format
+
+1. Download the [FairMOT Inference Model](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar)
+
+2. Use Paddle2ONNX to convert the model
+
+Make sure you have [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) installed.
+
+```
+paddle2onnx --model_dir . --model_filename model.pdmodel \
+--params_filename model.pdiparams \
+--input_shape_dict "{'image': [1, 3, 320, 576], 'scale_factor': [1, 2], 'im_shape': [1, 2]}" \
+--save_file fairmot_576_320_v2.onnx \
+--opset_version 12 \
+--enable_onnx_checker True
+```
+
+For more details about how to convert Paddle models to ONNX, please see [Export ONNX Model](../../../deploy/EXPORT_ONNX_MODEL_en.md).
+
+## Use the ONNX model for inference
+
+Once the Paddle model has been converted to ONNX format, we can use it with the OpenVINO inference engine to do the prediction.
+
+*Please make sure you have OpenVINO installed; here are the [installation instructions](https://docs.openvino.ai/cn/latest/openvino_docs_install_guides_installing_openvino_linux.html).*
+
+1. ### Get the execution network
+
+The first thing to do here is to get an execution network, which can be used later to do the inference.
+
+Here is the code.
+
+```
+def get_net():
+    ie = IECore()
+    model_path = root_path / "PaddleDetection/FairMot/fairmot_576_320_v3.onnx"
+    net = ie.read_network(model= str(model_path))
+    exec_net = ie.load_network(network=net, device_name="CPU")
+    return net, exec_net
+```
+
+2. ### Preprocessing
+
+Every AI model has its own preprocessing steps; let's have a look at how to do them for the FairMOT model:
+
+```
+def prepare_input():
+    transforms = [
+        T.Resize(target_size=(target_width, target_height)),
+        T.Normalize(mean=(0,0,0), std=(1,1,1))
+    ]
+    img_file = root_path / "images/street.jpeg"
+    img = cv2.imread(str(img_file))
+    normalized_img, _ = T.Compose(transforms)(img)
+    # add a new axis in front
+    img_input = normalized_img[np.newaxis, :]
+    # scale_factor is calculated as: im_shape / original_im_shape
+    h_scale = target_height / img.shape[0]
+    w_scale = target_width / img.shape[1]
+    input = {"image": img_input, "im_shape": [target_height, target_width], "scale_factor": [h_scale, w_scale]}
+    return input, img
+```
+
+3. ### Prediction
+
+After we have loaded the network and preprocessed the input, it finally comes to the stage of prediction.
+
+
+```
+def predict(exec_net, input):
+    result = exec_net.infer(input)
+    return result
+```
+
+You might be surprised to see that this very exciting stage is so small. Hang on there, the next stage is actually big again.
+
+4. ### Post-processing
+
+MOT (Multi-Object Tracking) is special: unlike other AI models, which only require a few generic post-processing steps, FairMOT requires a special object called a tracker to handle the prediction results. The prediction results are predicted detections and predicted embeddings.
+
+Luckily, PaddleDetection has made this procedure easy for us: it has exported the JDETracker from `ppdet`, so that we do not need to write much code to handle it.
+
+```
+def postprocess(pred_dets, pred_embs, threshold = 0.5):
+    tracker = JDETracker()
+    online_targets_dict = tracker.update(pred_dets, pred_embs)
+    online_tlwhs = defaultdict(list)
+    online_scores = defaultdict(list)
+    online_ids = defaultdict(list)
+    for cls_id in range(1):
+        online_targets = online_targets_dict[cls_id]
+        for t in online_targets:
+            tlwh = t.tlwh
+            tid = t.track_id
+            tscore = t.score
+            # make sure the tscore is no less than the threshold.
+            if tscore < threshold: continue
+            # make sure the target area is not less than the min_box_area.
+            if tlwh[2] * tlwh[3] <= tracker.min_box_area:
+                continue
+            # make sure the vertical ratio of a found target is within the range (1.6 as default ratio).
+            if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[3] > tracker.vertical_ratio:
+                continue
+            online_tlwhs[cls_id].append(tlwh)
+            online_ids[cls_id].append(tid)
+            online_scores[cls_id].append(tscore)
+    online_im = plot_tracking_dict(
+        img,
+        1,
+        online_tlwhs,
+        online_ids,
+        online_scores,
+        frame_id=0)
+    return online_im
+```
+
+5. ### Plot the detections (Optional)
+
+This step is optional. For demo purposes, I just use the `plot_tracking_dict()` method to draw all bounding boxes on the image. But you do not need to do this if you don't have the same requirement.
+
+```
+online_im = plot_tracking_dict(
+    img,
+    1,
+    online_tlwhs,
+    online_ids,
+    online_scores,
+    frame_id=0)
+```
+
+So these are all the steps which you need to follow in order to run FairMOT on your machine.
+
+A companion article which explains this procedure in detail will be released soon, and a link to it will be added here.
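+
+For reference, the steps above can be wired together in a minimal driver. This is a condensed sketch following the companion script `fairmot_onnx_openvino.py`; the small `get_output_names()` helper, taken from that script, simply lists the network's output layer names (detections and embeddings).
+
+```
+def get_output_names(net):
+    # the ONNX model has two outputs: detections and embeddings
+    return [key for key in net.outputs]
+
+net, exec_net = get_net()
+output_names = get_output_names(net)
+
+input, img = prepare_input()
+result = predict(exec_net, input)
+
+pred_dets = result[output_names[0]]  # predicted detections
+pred_embs = result[output_names[1]]  # predicted embeddings
+
+processed_img = postprocess(pred_dets, pred_embs)
+cv2.imwrite("tracked.jpg", processed_img)
+```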
+
+To see the full code, please take a look at [Paddle OpenVINO Prediction](./fairmot_onnx_openvino.py).
diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/README_cn.md b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..aaaf84eb05c26359fcc48cb14a3f6104bd834d5d
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/README_cn.md
@@ -0,0 +1,157 @@
+# 将FairMOT模型转换为ONNX格式,并用OpenVINO做推理
+
+## 简介
+
+PaddleDetection是一个充满活力的开源项目,拥有大量的贡献者和维护者。 PaddleDetection是PaddlePaddle框架下的一个物体检测工具集,能够帮助开发人员快速地将人工智能能力集成到自己的项目和应用程序中。
+Intel OpenVINO 是一个广泛使用的免费工具包。 它能帮助优化深度学习模型,并使用推理引擎将其部署到英特尔硬件上。
+很显然,当上下游(PaddlePaddle、OpenVINO)协同工作时,可以极大地简化工作流程,帮助我们实现AI模型从开发到部署的流水线工作模式,这也让我们的生活更轻松。
+
+本文将向您展示如何使用 PaddleDetection Model Zoo 中的 [FairMOT](../../../configs/mot/fairmot/README.md) 模型,并用OpenVINO来实现推理过程。
+
+------------
+
+## 前提要求
+
+为了专注于介绍如何在OpenVINO中使用飞桨的模型这一主题,本文不是一篇入门级文章,不会帮助您从零搭建开发环境;本文只涉及最核心组件的安装,并为每个需要用到的组件提供相应的官方链接。
+
+在开始之前, 请确保您已经安装了 PaddlePaddle。
+
+```
+conda install paddlepaddle==2.2.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+```
+
+为了运行演示程序, 您还需要下载已经转换好了的[ONNX格式的FairMOT模型](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_576_320_v3.onnx)。
+
+## 将FairMOT模型转换为ONNX格式
+
+1. 下载[FairMOT推理模型](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar)。
+
+2. 使用Paddle2ONNX来转换FairMOT模型。
+
+请确保您已经安装了[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)。
+
+```
+paddle2onnx --model_dir . --model_filename model.pdmodel \
+--params_filename model.pdiparams \
+--input_shape_dict "{'image': [1, 3, 320, 576], 'scale_factor': [1, 2], 'im_shape': [1, 2]}" \
+--save_file fairmot_576_320_v2.onnx \
+--opset_version 12 \
+--enable_onnx_checker True
+```
+
+更多关于如何使用Paddle2ONNX的详细信息, 请参考: [ONNX模型导出](../../../deploy/EXPORT_ONNX_MODEL_en.md)。
+
+## 使用ONNX模型以及OpenVINO进行推理
+
+当我们把Paddle模型转换成ONNX模型之后, 就可以直接使用OpenVINO读取模型并进行推理。
+
+*请确保您已经安装了OpenVINO, 这里是[OpenVINO的安装指南](https://docs.openvino.ai/cn/latest/openvino_docs_install_guides_installing_openvino_linux.html)。*
+
+1. ### 创建一个execution network
+
+这里要做的第一件事是获得一个执行网络,以后可以使用它来进行推理。
+代码如下:
+
+```
+def get_net():
+    ie = IECore()
+    model_path = root_path / "PaddleDetection/FairMot/fairmot_576_320_v3.onnx"
+    net = ie.read_network(model= str(model_path))
+    exec_net = ie.load_network(network=net, device_name="CPU")
+    return net, exec_net
+```
+
+2. ### 预处理
+
+每个 AI 模型都有自己不同的预处理步骤,让我们看看 FairMOT 模型是如何做的:
+
+```
+def prepare_input():
+    transforms = [
+        T.Resize(target_size=(target_width, target_height)),
+        T.Normalize(mean=(0,0,0), std=(1,1,1))
+    ]
+    img_file = root_path / "images/street.jpeg"
+    img = cv2.imread(str(img_file))
+    normalized_img, _ = T.Compose(transforms)(img)
+    # add a new axis in front
+    img_input = normalized_img[np.newaxis, :]
+    # scale_factor is calculated as: im_shape / original_im_shape
+    h_scale = target_height / img.shape[0]
+    w_scale = target_width / img.shape[1]
+    input = {"image": img_input, "im_shape": [target_height, target_width], "scale_factor": [h_scale, w_scale]}
+    return input, img
+```
+
+3. ### 预测
+
+在我们完成了网络加载和预处理之后,终于来到了预测阶段。
+
+```
+def predict(exec_net, input):
+    result = exec_net.infer(input)
+    return result
+```
+
+您可能会惊讶地看到, 最激动人心的步骤居然如此简单。 不过下一个阶段会更加复杂。
+
+4. ### 后处理
+
+相较于大多数其他类型的AI推理, MOT(Multi-Object Tracking)显然是特殊的。
FairMOT 需要一个称为跟踪器的特殊对象来处理预测结果。 这个预测结果则包括预测检测和预测的行人特征向量。 + +幸运的是,PaddleDetection 为我们简化了这个过程,我们可以从`ppdet`导出JDETracker,然后用这个tracker挑选出来符合条件的检测框,而且我们不需要编写太多代码来处理它。 + + +``` +def postprocess(pred_dets, pred_embs, threshold = 0.5): + tracker = JDETracker() + online_targets_dict = tracker.update(pred_dets, pred_embs) + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + for cls_id in range(1): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + # make sure the tscore is no less then the threshold. + if tscore < threshold: continue + # make sure the target area is not less than the min_box_area. + if tlwh[2] * tlwh[3] <= tracker.min_box_area: + continue + # make sure the vertical ratio of a found target is within the range (1.6 as default ratio). + if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[3] > tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + online_im = plot_tracking_dict( + img, + 1, + online_tlwhs, + online_ids, + online_scores, + frame_id=0) + return online_im +``` + +5. ### 画出检测框(可选) + +这一步是可选的。出于演示目的,我只使用 `plot_tracking_dict()` 方法在图像上绘制所有边界框。 但是,如果您没有相同的要求,则不需要这样做。 + +``` +online_im = plot_tracking_dict( + img, + 1, + online_tlwhs, + online_ids, + online_scores, + frame_id=0) +``` + +这些就是在您的硬件上运行 FairMOT 所需要遵循的所有步骤。 + +之后会有一篇详细解释此过程的配套文章将会发布,并且该文章的链接将很快在此处更新。 + +完整代码请查看 [Paddle OpenVINO 预测](./fairmot_onnx_openvino.py). diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/fairmot_onnx_openvino.py b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/fairmot_onnx_openvino.py new file mode 100644 index 0000000000000000000000000000000000000000..f0ed7d7a9ae1ab8c3d48e6f6145e15a22d33b16c --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/fairmot_onnx_openvino.py @@ -0,0 +1,104 @@ +from collections import defaultdict +from pathlib import Path + +import cv2 +import numpy as np +import paddle.vision.transforms as T +from openvino.inference_engine import IECore +from ppdet.modeling.mot.tracker import JDETracker +from ppdet.modeling.mot.visualization import plot_tracking_dict + +root_path = Path(__file__).parent +target_height = 320 +target_width = 576 + +# ------------------------------- +def get_net(): + ie = IECore() + model_path = root_path / "fairmot_576_320_v3.onnx" + net = ie.read_network(model= str(model_path)) + exec_net = ie.load_network(network=net, device_name="CPU") + return net, exec_net + +def get_output_names(net): + output_names = [key for key in net.outputs] + return output_names + +def prepare_input(): + transforms = [ + T.Resize(size=(target_height, target_width)), + T.Normalize(mean=(0,0,0), std=(1,1,1), data_format='HWC', to_rgb= True), + T.Transpose() + ] + + img_file = root_path / "street.jpeg" + img = cv2.imread(str(img_file)) + normalized_img = T.Compose(transforms)(img) + normalized_img = normalized_img.astype(np.float32, copy=False) / 255.0 + + # add an new axis in front + img_input = normalized_img[np.newaxis, :] + # scale_factor is calculated as: im_shape / original_im_shape + h_scale = target_height / img.shape[0] + w_scale = target_width / img.shape[1] + input = {"image": img_input, "im_shape": [target_height, target_width], "scale_factor": [h_scale, w_scale]} + return input, img + +def predict(exec_net, input): + result = exec_net.infer(input) + return 
result + +def postprocess(pred_dets, pred_embs, threshold = 0.5): + + tracker = JDETracker() + + online_targets_dict = tracker.update(pred_dets, pred_embs) + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + + for cls_id in range(1): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + + # make sure the tscore is no less then the threshold. + if tscore < threshold: continue + + # make sure the target area is not less than the min_box_area. + if tlwh[2] * tlwh[3] <= tracker.min_box_area: + continue + + # make sure the vertical ratio of a found target is within the range (1.6 as default ratio). + if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[3] > tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + + online_im = plot_tracking_dict( + img, + 1, + online_tlwhs, + online_ids, + online_scores, + frame_id=0) + + return online_im + +# ------------------------------- +net, exec_net = get_net() +output_names = get_output_names(net) +del net + +input, img = prepare_input() +result = predict(exec_net, input) + +pred_dets = result[output_names[0]] +pred_embs = result[output_names[1]] + +processed_img = postprocess(pred_dets, pred_embs) +tracked_img_file_path = root_path / "tracked.jpg" +cv2.imwrite(str(tracked_img_file_path), processed_img) diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/requirements.txt b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..986abdd66b8c6dead295db65c578de2996fb814f --- /dev/null +++ b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/requirements.txt @@ -0,0 +1,4 @@ +numpy +opencv-python +openvino == 2021.4.0 +paddledet==2.3.0 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/street.jpeg b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/street.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..80112cfd917a4dd92f359c27193c1b9ef0a3592b Binary files /dev/null and b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/street.jpeg differ diff --git a/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/tracked.jpg b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/tracked.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b122ab6693682a8e73f37d5cde813d4154334c0b Binary files /dev/null and b/PaddleDetection-release-2.6/docs/advanced_tutorials/openvino_inference/tracked.jpg differ diff --git a/PaddleDetection-release-2.6/docs/contribution/README.md b/PaddleDetection-release-2.6/docs/contribution/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e8adc2a4847ceb8367531b164d8eaf6a5a184fa0 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/contribution/README.md @@ -0,0 +1,33 @@ +# Contributing to PaddleDetection + +PaddleDetection非常欢迎你加入到飞桨社区的开源建设中,你可以通过以下方式参与贡献: + +- 新建一个 ISSUE 来反馈 bug + +- 新建一个 ISSUE 来提出新功能需求、建议、疑问 + +- 提 PR 来修复一个 bug + +- 提 PR 来实现一个新功能 + +同时我们也会组织专项活动,引导大家参与到PaddleDetection的开发中: + +- [Yes, PP-YOLOE! 
基于PP-YOLOE的算法开发](https://github.com/PaddlePaddle/PaddleDetection/issues/7345) + +## 贡献指南 + +提ISSUE、PR的步骤请参考[飞桨官网-贡献指南-代码贡献流程](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/code_contributing_path_cn.html) + +## 开发者 + +我们非常欢迎你可以为PaddleDetection提供代码,也十分感谢你的反馈。 + +- 感谢[Mandroide](https://github.com/Mandroide)清理代码并且统一部分函数接口。 +- 感谢[FL77N](https://github.com/FL77N/)贡献`Sparse-RCNN`模型。 +- 感谢[Chen-Song](https://github.com/Chen-Song)贡献`Swin Faster-RCNN`模型。 +- 感谢[yangyudong](https://github.com/yangyudong2020), [hchhtc123](https://github.com/hchhtc123) 开发PP-Tracking GUI界面 +- 感谢Shigure19 开发PP-TinyPose健身APP +- 感谢[manangoel99](https://github.com/manangoel99)贡献Wandb可视化方式 + + +非常感谢大家为飞桨贡献!共建飞桨繁荣社区! diff --git a/PaddleDetection-release-2.6/docs/contribution/Yes_PP-YOLOE.md b/PaddleDetection-release-2.6/docs/contribution/Yes_PP-YOLOE.md new file mode 100644 index 0000000000000000000000000000000000000000..06d96da3cb017e5f5f25ca7cdb24f3a6b4a71ef1 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/contribution/Yes_PP-YOLOE.md @@ -0,0 +1,80 @@ +# [Contribute to PaddleDetection] Yes, PP-YOLOE! 基于PP-YOLOE的算法开发 + +本期活动联系人:[thinkthinking](https://github.com/thinkthinking) + +## 建设目标 +[PP-YOLOE+](../../configs/ppyoloe)是百度飞桨团队开源的最新SOTA通用检测模型,COCO数据集精度达54.7mAP,其L版本相比YOLOv7精度提升1.9%,V100端到端(包含前后处理)推理速度达42.2FPS。 + +我们鼓励大家基于PP-YOLOE去做新的算法开发,比如: + +- 改造PP-YOLOE适用于旋转框、小目标、关键点检测、实例分割等场景; +- 精调PP-YOLOE用于工业质检、火灾检测、垃圾检测等垂类场景; +- 将PP-YOLOE用于PP-Human、PP-Vehicle等Pipeline中,提升pipeline的检测效果。 + +相信通过这些活动,大家可以对PP-YOLOE的细节有更深刻的理解,对业务场景的应用也可以做更细节的适配。 + +## 参与方式 + +- **方式一**:**列表选题**,见招募列表(提供了选题方向、题目、优秀的对标项目、文章和代码,以供学习)。 +- **方式二**:**自选题目**,对于非参考列表内的题目,可自主命题,需要与负责人 [thinkthinking](https://github.com/thinkthinking)讨论后决定题目。 + +## 题目认领 + +为避免重复选题、知晓任务状态、方便统计管理,请根据如下操作认领您的题目。 + +在本issue提交题目:[issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345) + +* 方式一(列表选题):在“招募列表”中选择题目,并在[issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)中,回复下列信息: +``` + +【列表选题】 +编号:XX +题目:XXXX +认领人:XX +``` + +* 方式二(自选题目):自主命题,直接在 [issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345) 中,回复下列信息: + +``` +【自选题目】 +题目:XXXX +认领人:XX +``` + +## 招募列表 + +| 序号 | 类型 | 题目 | 难度 | 参考 | 认领人 | +| :--- | :------- | :-------------------------- | :--- | :-------------------------------------------------------------------------------- | :----- | +| 01 | 模型改造 | PP-YOLOE用于旋转框检测 | 高 | https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/rotate | ---- | +| 02 | 模型改造 | PP-YOLOE用于小目标检测 | 高 | https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/smalldet | ---- | +| 03 | 模型改造 | PP-YOLOE用于关键点检测 | 高 | https://github.com/WongKinYiu/yolov7/tree/pose | ---- | +| 04 | 模型改造 | PP-YOLOE用于实例分割 | 高 | https://github.com/WongKinYiu/yolov7/tree/mask | ---- | +| 05 | 垂类应用 | 基于PP-YOLOE的缺陷检测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/2367089 | ---- | +| 06 | 垂类应用 | 基于PP-YOLOE的行为检测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/2500639 | ---- | +| 07 | 垂类应用 | 基于PP-YOLOE的异物检测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0 | ---- | +| 08 | 垂类应用 | 基于PP-YOLOE的安全监测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/2503301?channelType=0&channel=0 | ---- | +| 09 | Pipeline | PP-YOLOE-->PP-Human大升级 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/4606001 | ---- | +| 10 | Pipeline | PP-YOLOE-->PP-Vehicle大升级 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/4512254 | ---- | + + + 
【注意】招募列表外的,欢迎开发者联系活动负责人[thinkthinking](https://github.com/thinkthinking)提交贡献👏 + +## 贡献指南 + +1. 提ISSUE、PR的步骤请参考[飞桨官网-贡献指南-代码贡献流程](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/code_contributing_path_cn.html) +2. AI-Studio使用指南请参考[AI-Studio新手指南](https://ai.baidu.com/ai-doc/AISTUDIO/Tk39ty6ho) + +## 原则及注意事项 +1. 使用PaddlePaddle框架, 建议复用PaddleDetection代码。 +2. 建议使用[Paddle框架最新版本](https://www.paddlepaddle.org.cn/). +3. PR需提到[PaddleDetection-develop](https://github.com/PaddlePaddle/PaddleDetection/tree/develop)分支。 +4. 模型改造类的任务建议以PR形式提交 +5. 垂类应用以及Pipeline类的任务建议以AI-Studio项目形式提交,项目会同步到[产业范例页面](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/industrial_tutorial/README.md) + +## 还有不清楚的问题 + +欢迎大家随时在本[issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)下提问,飞桨会有专门的管理员进行疑问解答。 + +有任何问题,请联系本期活动联系人 [thinkthinking](https://github.com/thinkthinking) + +非常感谢大家为飞桨贡献!共建飞桨繁荣社区! diff --git a/PaddleDetection-release-2.6/docs/feature_models/PaddleYOLO_MODEL.md b/PaddleDetection-release-2.6/docs/feature_models/PaddleYOLO_MODEL.md new file mode 100644 index 0000000000000000000000000000000000000000..c9eb3e47886d7480d2fed523d5e51bba827e5c35 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/feature_models/PaddleYOLO_MODEL.md @@ -0,0 +1,408 @@ +简体中文 | [English](PaddleYOLO_MODEL_en.md) + +# [**PaddleYOLO**](https://github.com/PaddlePaddle/PaddleYOLO) + +## 内容 +- [**PaddleYOLO**](#paddleyolo) + - [内容](#内容) + - [简介](#简介) + - [更新日志](#更新日志) + - [模型库](#模型库) + - [PP-YOLOE](#pp-yoloe) + - [YOLOX](#yolox) + - [YOLOv5](#yolov5) + - [YOLOv6](#yolov6) + - [YOLOv7](#yolov7) + - [YOLOv8](#yolov8) + - [RTMDet](#rtmdet) + - [**注意:**](#注意) + - [VOC](#voc) + - [使用指南](#使用指南) + - [**一键运行全流程**](#一键运行全流程) + - [自定义数据集](#自定义数据集) + - [数据集准备:](#数据集准备) + - [fintune训练:](#fintune训练) + - [预测和导出:](#预测和导出) + +## 简介 + +**PaddleYOLO**是基于[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)的YOLO系列模型库,**只包含YOLO系列模型的相关代码**,支持`YOLOv3`,`PP-YOLO`,`PP-YOLOv2`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6`,`YOLOv7`,`YOLOv8`,`RTMDet`等模型,欢迎一起使用和建设! 
+
+## 更新日志
+* 【2023/01/10】支持[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8)预测和部署;
+* 【2022/09/29】支持[RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet)预测和部署;
+* 【2022/09/26】发布[`PaddleYOLO`](https://github.com/PaddlePaddle/PaddleYOLO)模型套件;
+* 【2022/09/19】支持[`YOLOv6`](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6)新版,包括n/t/s/m/l模型;
+* 【2022/08/23】发布`YOLOSeries`代码库: 支持`YOLOv3`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6`,`YOLOv7`等YOLO模型,支持`ConvNeXt`骨干网络高精度版`PP-YOLOE`,`YOLOX`和`YOLOv5`等模型,支持PaddleSlim无损加速量化训练`PP-YOLOE`,`YOLOv5`,`YOLOv6`和`YOLOv7`等模型,详情可阅读[此文章](https://mp.weixin.qq.com/s/Hki01Zs2lQgvLSLWS0btrA);
+
+
+**注意:**
+ - **PaddleYOLO**代码库协议为**GPL 3.0**,[YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5),[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6),[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7)和[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8)这几类模型代码不合入[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection),其余YOLO模型推荐在[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)中使用,**会最先发布PP-YOLO系列特色检测模型的最新进展**;
 - **PaddleYOLO**代码库**推荐使用paddlepaddle-2.3.2以上的版本**,请参考[官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载对应的合适版本,**Windows平台请安装paddle develop版本**;
 - PaddleYOLO 的[Roadmap](https://github.com/PaddlePaddle/PaddleYOLO/issues/44) issue用于收集用户的需求,欢迎提出您的建议和需求。
 - 训练**自定义数据集**请参照[文档](#自定义数据集)和[issue](https://github.com/PaddlePaddle/PaddleYOLO/issues/43)。请首先**确保加载了COCO权重作为预训练**,YOLO检测模型建议**总`batch_size`至少大于`64`**进行训练,如果资源不够请**换小模型**或**减小模型的输入尺度**,为了保障较高检测精度,**尽量不要尝试单卡训练或总`batch_size`小于`32`的训练**;


+## 模型库

+### [PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe)

+
    + 基础模型 + +| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAPval
    0.5:0.95 | mAPval
    0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| PP-YOLOE-s | 640 | 32 | 400e | 2.9 | 43.4 | 60.0 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml) | +| PP-YOLOE-s | 640 | 32 | 300e | 2.9 | 43.0 | 59.6 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | +| PP-YOLOE-m | 640 | 28 | 300e | 6.0 | 49.0 | 65.9 | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | +| PP-YOLOE-l | 640 | 20 | 300e | 8.7 | 51.4 | 68.6 | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | +| PP-YOLOE-x | 640 | 16 | 300e | 14.9 | 52.3 | 69.5 | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | +| PP-YOLOE-tiny ConvNeXt| 640 | 16 | 36e | - | 44.6 | 63.3 | 33.04 | 13.87 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_convnext_tiny_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml) | +| **PP-YOLOE+_s** | 640 | 8 | 80e | 2.9 | **43.7** | **60.6** | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml) | +| **PP-YOLOE+_m** | 640 | 8 | 80e | 6.0 | **49.8** | **67.1** | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml) | +| **PP-YOLOE+_l** | 640 | 8 | 80e | 8.7 | **52.9** | **70.1** | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | +| **PP-YOLOE+_x** | 640 | 8 | 80e | 14.9 | **54.7** | **72.0** | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml) | + +
    + +
    + 部署模型 + +| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| PP-YOLOE-s(400epoch) | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.onnx) | +| PP-YOLOE-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.onnx) | +| PP-YOLOE-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.onnx) | +| PP-YOLOE-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.onnx) | +| PP-YOLOE-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.onnx) | +| **PP-YOLOE+_s** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.onnx) | +| **PP-YOLOE+_m** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.onnx) | +| **PP-YOLOE+_l** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.zip) | [( w/o 
nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.onnx) | +| **PP-YOLOE+_x** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.onnx) | + +
    + +### [YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox) + +
    + 基础模型 + +| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAPval
    0.5:0.95 | mAPval
    0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| YOLOX-nano | 416 | 8 | 300e | 2.3 | 26.1 | 42.0 | 0.91 | 1.08 | [model](https://paddledet.bj.bcebos.com/models/yolox_nano_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_nano_300e_coco.yml) | +| YOLOX-tiny | 416 | 8 | 300e | 2.8 | 32.9 | 50.4 | 5.06 | 6.45 | [model](https://paddledet.bj.bcebos.com/models/yolox_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_tiny_300e_coco.yml) | +| YOLOX-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 9.0 | 26.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_s_300e_coco.yml) | +| YOLOX-m | 640 | 8 | 300e | 5.8 | 46.9 | 65.7 | 25.3 | 73.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_m_300e_coco.yml) | +| YOLOX-l | 640 | 8 | 300e | 9.3 | 50.1 | 68.8 | 54.2 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_l_300e_coco.yml) | +| YOLOX-x | 640 | 8 | 300e | 16.6 | **51.8** | **70.6** | 99.1 | 281.9 | [model](https://paddledet.bj.bcebos.com/models/yolox_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_x_300e_coco.yml) | + YOLOX-cdn-tiny | 416 | 8 | 300e | 1.9 | 32.4 | 50.2 | 5.03 | 6.33 | [model](https://paddledet.bj.bcebos.com/models/yolox_cdn_tiny_300e_coco.pdparams) | [config](c../../onfigs/yolox/yolox_cdn_tiny_300e_coco.yml) | +| YOLOX-crn-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 7.7 | 24.69 | [model](https://paddledet.bj.bcebos.com/models/yolox_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_crn_s_300e_coco.yml) | +| YOLOX-s ConvNeXt| 640 | 8 | 36e | - | 44.6 | 65.3 | 36.2 | 27.52 | [model](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/yolox_convnext_s_36e_coco.yml) | + +
    + +
    + 部署模型 + +| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| YOLOx-nano | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.onnx) | +| YOLOx-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.onnx) | +| YOLOx-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.onnx) | +| YOLOx-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.onnx) | +| YOLOx-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.onnx) | +| YOLOx-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.onnx) | + +
    + +### [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5) + +
    + 基础模型 + +| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAPval
    0.5:0.95 | mAPval
    0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| YOLOv5-n | 640 | 16 | 300e | 2.6 | 28.0 | 45.7 | 1.87 | 4.52 | [model](https://paddledet.bj.bcebos.com/models/yolov5_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_n_300e_coco.yml) | +| YOLOv5-s | 640 | 16 | 300e | 3.2 | 37.6 | 56.7 | 7.24 | 16.54 | [model](https://paddledet.bj.bcebos.com/models/yolov5_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_s_300e_coco.yml) | +| YOLOv5-m | 640 | 16 | 300e | 5.2 | 45.4 | 64.1 | 21.19 | 49.08 | [model](https://paddledet.bj.bcebos.com/models/yolov5_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_m_300e_coco.yml) | +| YOLOv5-l | 640 | 16 | 300e | 7.9 | 48.9 | 67.1 | 46.56 | 109.32 | [model](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_l_300e_coco.yml) | +| YOLOv5-x | 640 | 16 | 300e | 13.7 | 50.6 | 68.7 | 86.75 | 205.92 | [model](https://paddledet.bj.bcebos.com/models/yolov5_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_x_300e_coco.yml) | +| YOLOv5-s ConvNeXt| 640 | 8 | 36e | - | 42.4 | 65.3 | 34.54 | 17.96 | [model](https://paddledet.bj.bcebos.com/models/yolov5_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_convnext_s_36e_coco.yml) | +| *YOLOv5p6-n | 1280 | 16 | 300e | - | 35.9 | 54.2 | 3.25 | 9.23 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_n_300e_coco.yml) | +| *YOLOv5p6-s | 1280 | 16 | 300e | - | 44.5 | 63.3 | 12.63 | 33.81 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_s_300e_coco.yml) | +| *YOLOv5p6-m | 1280 | 16 | 300e | - | 51.1 | 69.0 | 35.73 | 100.21 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_m_300e_coco.yml) | +| *YOLOv5p6-l | 1280 | 8 | 300e | - | 53.4 | 71.0 | 76.77 | 223.09 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_l_300e_coco.yml) | +| *YOLOv5p6-x | 1280 | 8 | 300e | - | 54.7 | 72.4 | 140.80 | 420.03 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_x_300e_coco.yml) | + +
    + +
    + 部署模型 + +| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| YOLOv5-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.onnx) | +| YOLOv5-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.onnx) | +| YOLOv5-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.onnx) | +| YOLOv5-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.onnx) | +| YOLOv5-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.onnx) | + +
    + +### [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6) + +
    + 基础模型 + +| 网络网络 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP | AP50 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :---------: | :-----: |:-----: | :-----: |:-----: | :-------------: | :-----: | +| *YOLOv6-n | 640 | 16 | 300e(+300e) | 2.0 | 37.5 | 53.1 | 5.07 | 12.49 |[model](https://paddledet.bj.bcebos.com/models/yolov6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_n_300e_coco.yml) | +| *YOLOv6-s | 640 | 32 | 300e(+300e) | 2.7 | 44.8 | 61.7 | 20.18 | 49.36 |[model](https://paddledet.bj.bcebos.com/models/yolov6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_s_300e_coco.yml) | +| *YOLOv6-m | 640 | 32 | 300e(+300e) | - | 49.5 | 66.9 | 37.74 | 92.47 |[model](https://paddledet.bj.bcebos.com/models/yolov6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_m_300e_coco.yml) | +| *YOLOv6-l(silu) | 640 | 32 | 300e(+300e) | - | 52.2 | 70.2 | 59.66 | 149.4 |[model](https://paddledet.bj.bcebos.com/models/yolov6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_l_300e_coco.yml) | + +
    + +
    + 部署模型 + +| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| yolov6-n | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.onnx) | +| yolov6-s | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.onnx) | +| yolov6-m | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.onnx) | +| yolov6-l(silu) | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.onnx) | + +
    + +### [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7) + +
+ 基础模型
+
+| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val</sup><br>0.5:0.95 | mAP<sup>val</sup><br>
    0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| YOLOv7-L | 640 | 32 | 300e | 7.4 | 51.0 | 70.2 | 37.62 | 106.08 |[model](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_l_300e_coco.yml) | +| *YOLOv7-X | 640 | 32 | 300e | 12.2 | 53.0 | 70.8 | 71.34 | 190.08 | [model](https://paddledet.bj.bcebos.com/models/yolov7_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_x_300e_coco.yml) | +| *YOLOv7P6-W6 | 1280 | 16 | 300e | 25.5 | 54.4 | 71.8 | 70.43 | 360.26 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_w6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_w6_300e_coco.yml) | +| *YOLOv7P6-E6 | 1280 | 10 | 300e | 31.1 | 55.7 | 73.0 | 97.25 | 515.4 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6_300e_coco.yml) | +| *YOLOv7P6-D6 | 1280 | 8 | 300e | 37.4 | 56.1 | 73.3 | 133.81 | 702.92 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_d6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_d6_300e_coco.yml) | +| *YOLOv7P6-E6E | 1280 | 6 | 300e | 48.7 | 56.5 | 73.7 | 151.76 | 843.52 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6e_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6e_300e_coco.yml) | +| YOLOv7-tiny | 640 | 32 | 300e | - | 37.3 | 54.5 | 6.23 | 6.90 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_300e_coco.yml) | +| YOLOv7-tiny | 416 | 32 | 300e | - | 33.3 | 49.5 | 6.23 | 2.91 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_416_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_416_300e_coco.yml) | +| YOLOv7-tiny | 320 | 32 | 300e | - | 29.1 | 43.8 | 6.23 | 1.73 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_320_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_320_300e_coco.yml) | + +
    + +
    + 部署模型 + +| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| YOLOv7-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.onnx) | +| YOLOv7-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.onnx) | +| YOLOv7P6-W6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.onnx) | +| YOLOv7P6-E6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.onnx) | +| YOLOv7P6-D6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.onnx) | +| YOLOv7P6-E6E | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.onnx) | +| YOLOv7-tiny | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.onnx) | +| YOLOv7-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.zip) | [( w/ 
nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.onnx) | +| YOLOv7-tiny | 320 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.onnx) | + +
    + + +### [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) + +
+ 基础模型
+
+| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val</sup><br>0.5:0.95 | mAP<sup>val</sup><br>
    0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| *YOLOv8-n | 640 | 16 | 500e | 2.4 | 37.3 | 53.0 | 3.16 | 8.7 | [model](https://paddledet.bj.bcebos.com/models/yolov8_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_n_300e_coco.yml) | +| *YOLOv8-s | 640 | 16 | 500e | 3.4 | 44.9 | 61.8 | 11.17 | 28.6 | [model](https://paddledet.bj.bcebos.com/models/yolov8_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_s_300e_coco.yml) | +| *YOLOv8-m | 640 | 16 | 500e | 6.5 | 50.2 | 67.3 | 25.90 | 78.9 | [model](https://paddledet.bj.bcebos.com/models/yolov8_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_m_300e_coco.yml) | +| *YOLOv8-l | 640 | 16 | 500e | 10.0 | 52.8 | 69.6 | 43.69 | 165.2 | [model](https://paddledet.bj.bcebos.com/models/yolov8_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_l_300e_coco.yml) | +| *YOLOv8-x | 640 | 16 | 500e | 15.1 | 53.8 | 70.6 | 68.23 | 257.8 | [model](https://paddledet.bj.bcebos.com/models/yolov8_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_x_300e_coco.yml) | +| *YOLOv8-P6-x | 1280 | 16 | 500e | 55.0 | - | - | 97.42 | 522.93 | [model](https://paddledet.bj.bcebos.com/models/yolov8p6_x_500e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8p6_x_500e_coco.yml) | + +
    + +
    + 部署模型 + +| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| YOLOv8-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.onnx) | +| YOLOv8-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.onnx) | +| YOLOv8-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.onnx) | +| YOLOv8-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.onnx) | +| YOLOv8-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.onnx) | + +
    + + +### [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet) + +
+ 基础模型
+
+| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP | AP50 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
+| :------------- | :------- | :-------: | :------: | :---------: | :-----: |:-----: | :-----: |:-----: | :-------------: | :-----: |
+| *RTMDet-t | 640 | 32 | 300e | 2.8 | 40.9 | 57.9 | 4.90 | 16.21 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_t_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_t_300e_coco.yml) |
+| *RTMDet-s | 640 | 32 | 300e | 3.3 | 44.5 | 62.0 | 8.89 | 29.71 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_s_300e_coco.yml) |
+| *RTMDet-m | 640 | 32 | 300e | 6.4 | 49.1 | 66.8 | 24.71 | 78.47 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_m_300e_coco.yml) |
+| *RTMDet-l | 640 | 32 | 300e | 10.2 | 51.2 | 68.8 | 52.31 | 160.32 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_l_300e_coco.yml) |
+| *RTMDet-x | 640 | 32 | 300e | 18.0 | 52.6 | 70.4 | 94.86 | 283.12 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_x_300e_coco.yml) |
+
    + +
    + 部署模型 + +| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| RTMDet-t | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.onnx) | +| RTMDet-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.onnx) | +| RTMDet-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.onnx) | +| RTMDet-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.onnx) | +| RTMDet-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.onnx) | + +
+
+
+### **注意:**
+- 所有模型均使用COCO train2017作为训练集,在COCO val2017上验证精度,模型名前带*表示该模型仍在训练更新中。
+- 具体精度和速度细节请查看[PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe)、[YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox)、[YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5)、[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6)、[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7),**其中YOLOv5、YOLOv6、YOLOv7评估并未采用`multi_label`形式**。
+- 模型推理耗时(ms)为TensorRT-FP16下测试的耗时,**不包含数据预处理和模型输出后处理(NMS)的耗时**。测试采用**单卡Tesla T4 GPU,batch size=1**,测试环境为**paddlepaddle-2.3.2**、**CUDA 11.2**、**CUDNN 8.2**、**GCC-8.2**、**TensorRT 8.0.3.4**,具体请参考各自模型主页。
+- **统计FLOPs(G)和Params(M)**:首先安装[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)(`pip install paddleslim`),然后设置[runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml)里的`print_flops: True`和`print_params: True`,并注意确保在**单尺度**(如640x640)下统计,**打印的是MACs,FLOPs=2*MACs**。
+- 各模型导出后的权重以及ONNX均分为**带(w)**和**不带(wo)**NMS后处理两种,都提供了下载链接,请到各自模型主页下载。`w_nms`表示**带NMS后处理**,可以直接预测出最终检测框结果,如`python deploy/python/infer.py --model_dir=ppyoloe_crn_l_300e_coco_w_nms/ --image_file=demo/000000014439.jpg --device=GPU`;`wo_nms`表示**不带NMS后处理**,用于**测速**,如需预测出检测框结果,需要找到**对应head中的后处理相关代码**并修改为如下:
+  ```python
+  if self.exclude_nms:
+      # `exclude_nms=True` is only used in benchmark for speed test
+      # return pred_bboxes.sum(), pred_scores.sum()  # 原先是这行,现在注释掉
+      return pred_bboxes, pred_scores  # 新加这行,表示保留进NMS前的原始结果
+  else:
+      bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores)
+      return bbox_pred, bbox_num
+  ```
+  并重新导出,使用时再**另接自己写的NMS后处理**(下方给出一个简易示例)。
+- 基于[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)对YOLO系列模型进行量化训练,可以实现精度基本无损,速度普遍提升30%以上,具体请参照[模型自动化压缩工具ACT](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression)。
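+上面提到`wo_nms`模型导出后需要另接自己写的NMS后处理,下面给出一个简易示例(仅为示意:假设`boxes`为`[N, 4]`的`x1y1x2y2`框、`scores`为某一类别的`[N]`得分,阈值均为假设值,请按实际部署情况调整;多类别时对每个类别分别调用即可):
+
+```python
+import numpy as np
+
+def hard_nms(boxes, scores, iou_thr=0.65, score_thr=0.01, topk=100):
+    """对单个类别做标准hard-NMS:boxes为[N,4](x1,y1,x2,y2),scores为[N]。"""
+    keep_mask = scores > score_thr          # 先按得分阈值过滤
+    boxes, scores = boxes[keep_mask], scores[keep_mask]
+    order = scores.argsort()[::-1]          # 按得分从高到低排序
+    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
+    keep = []
+    while order.size > 0 and len(keep) < topk:
+        i = order[0]
+        keep.append(i)
+        # 计算当前最高分框与其余候选框的IoU
+        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
+        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
+        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
+        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
+        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
+        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
+        order = order[1:][iou <= iou_thr]   # 丢弃与已保留框重叠过大的候选框
+    return boxes[keep], scores[keep]
+```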
+
+
+### [VOC](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc)
+
+ 基础模型
+
+| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP(0.50,11point) | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
+| :-----------: | :-------: | :-------: | :------: | :------------: | :---------------: | :------------------: |:-----------------: | :------: | :------: |
+| YOLOv5-s | 640 | 16 | 60e | 3.2 | 80.3 | 7.24 | 16.54 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov5_s_60e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov5_s_60e_voc.yml) |
+| YOLOv7-tiny | 640 | 32 | 60e | 2.6 | 80.2 | 6.23 | 6.90 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov7_tiny_60e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov7_tiny_60e_voc.yml) |
+| YOLOX-s | 640 | 8 | 40e | 3.0 | 82.9 | 9.0 | 26.8 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_s_40e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolox_s_40e_voc.yml) |
+| PP-YOLOE+_s | 640 | 8 | 30e | 2.9 | 86.7 | 7.93 | 17.36 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_30e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/ppyoloe_plus_crn_s_30e_voc.yml) |
+
+
+**注意:**
+- VOC数据集训练的mAP为`mAP(IoU=0.5)`的结果,且评估未使用`multi_label`等trick;
+- 所有YOLO VOC模型均加载各自模型的COCO预训练权重,各配置文件默认使用8卡GPU,可作为自定义数据集设置的参考,具体精度会因数据集而异;
+- YOLO检测模型建议**总`batch_size`至少大于`64`**去训练,如果资源不够请**换小模型**或**减小模型的输入尺度**,为了保障较高检测精度,**尽量不要尝试单卡训练或总`batch_size`小于`64`的训练**;
+- Params(M)和FLOPs(G)均为训练时所测,YOLOv7没有s模型,故选用tiny模型;
+- TRT-FP16-Latency(ms)测速相关说明请查看各YOLO模型config目录的主页;
+
+
+## 使用指南
+
+下载MS-COCO数据集,[官网](https://cocodataset.org)下载地址为:[annotations](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)、[train2017](http://images.cocodataset.org/zips/train2017.zip)、[val2017](http://images.cocodataset.org/zips/val2017.zip)、[test2017](http://images.cocodataset.org/zips/test2017.zip)。
+PaddleDetection团队提供的下载链接为:[coco](https://bj.bcebos.com/v1/paddledet/data/coco.tar)(共约22G)和[test2017](https://bj.bcebos.com/v1/paddledet/data/cocotest2017.zip)。注意test2017可不下载,评估使用的是val2017。
+
+
+### **一键运行全流程**
+
+将以下命令写到一个脚本文件(如`run.sh`)里,一键运行命令为`sh run.sh`,也可以在命令行中逐条运行。
+
+```bash
+model_name=ppyoloe # 可修改,如 yolov7
+job_name=ppyoloe_plus_crn_l_300e_coco # 可修改,如 yolov7_tiny_300e_coco
+
+config=configs/${model_name}/${job_name}.yml
+log_dir=log_dir/${job_name}
+# weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
+weights=output/${job_name}/model_final.pdparams
+
+# 1.训练(单卡/多卡)
+# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp
+python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
+
+# 2.评估
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights} --classwise
+
+# 3.直接预测
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
+
+# 4.导出模型
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} # exclude_nms=True trt=True
+
+# 5.部署预测
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU
+
+# 6.部署测速,加 “--run_mode=trt_fp16” 表示在TensorRT FP16模式下测速
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16
+
+# 7.onnx导出
+paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx
+
+# 8.onnx测速
+/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16
+```
+
+- 如果想切换模型,只要修改开头两行即可,如:
+  ```
+  model_name=yolov7
+  job_name=yolov7_l_300e_coco
+  ```
+- 导出**onnx**前,需先安装[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX):`pip install paddle2onnx`;
+- **统计FLOPs(G)和Params(M)**:首先安装[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)(`pip install paddleslim`),然后设置[runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml)里的`print_flops: True`和`print_params: True`,并注意确保在**单尺度**(如640x640)下统计,**打印的是MACs,FLOPs=2*MACs**(换算小例子见下方)。
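+关于`FLOPs=2*MACs`的换算,下面给出一个手工估算卷积层计算量的小例子(仅为示意,卷积层的各项参数均为假设值,与任何具体模型无关):
+
+```python
+def conv2d_macs(cin, cout, k, h_out, w_out):
+    """标准卷积的乘加次数(MACs):每个输出元素需要 cin*k*k 次乘加。"""
+    return cin * k * k * cout * h_out * w_out
+
+# 例:3x3卷积,输入64通道,输出128通道,输出特征图为160x160
+macs = conv2d_macs(64, 128, 3, 160, 160)
+flops = 2 * macs  # 一次乘加 = 1次乘法 + 1次加法,故 FLOPs = 2 * MACs
+print(f"MACs: {macs / 1e9:.2f} G, FLOPs: {flops / 1e9:.2f} G")
+```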
+
+
+### 自定义数据集
+
+#### 数据集准备:
+
+1.自定义数据集的标注制作,请参考[DetAnnoTools](../tutorials/data/DetAnnoTools.md);
+
+2.自定义数据集的训练准备,请参考[PrepareDataSet](../tutorials/PrepareDataSet.md)。
+
+
+#### finetune训练:
+
+除了更改数据集的路径外,训练时一般推荐加载**对应模型的COCO预训练权重**去finetune,这样收敛更快、精度更高,如:
+
+```bash
+# 单卡finetune训练:
+# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
+
+# 多卡finetune训练:
+python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
+```
+
+**注意:**
+- finetune训练一般会提示head分类分支最后一层卷积的通道数没有对应上,属于正常情况,这是由于自定义数据集的类别数一般与COCO数据集不一致;
+- finetune训练的epoch数一般可以设置得更少,学习率也相应设置得更小些(如原来的1/10),最高精度可能出现在中间某个epoch;
+
+#### 预测和导出:
+
+使用自定义数据集预测和导出模型时,如果TestDataset数据集路径设置不正确,会默认使用COCO的80类。
+除了正确设置TestDataset数据集路径外,也可以自行修改和添加对应的label_list.txt文件(一行记录一个对应种类),TestDataset中的anno_path也可设置为绝对路径,如:
+```
+TestDataset:
+  !ImageFolder
+    anno_path: label_list.txt # 如不使用dataset_dir,则anno_path即为相对于PaddleDetection主目录的相对路径
+    # dataset_dir: dataset/my_coco # 如使用dataset_dir,则dataset_dir/anno_path作为新的anno_path
+```
+label_list.txt里的一行记录一个对应种类,如下所示:
+```
+person
+vehicle
+```
diff --git a/PaddleDetection-release-2.6/docs/feature_models/PaddleYOLO_MODEL_en.md b/PaddleDetection-release-2.6/docs/feature_models/PaddleYOLO_MODEL_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..beea3c27fbdadc81f6de2f565da80151f5865ccb
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/feature_models/PaddleYOLO_MODEL_en.md
@@ -0,0 +1,399 @@
+[简体中文](PaddleYOLO_MODEL.md) | English
+
+# [**PaddleYOLO**](https://github.com/PaddlePaddle/PaddleYOLO)
+
+## Introduction
+- [**PaddleYOLO**](#paddleyolo)
+  - [Introduction](#introduction)
+  - [Introduction](#introduction-1)
+  - [Updates](#updates)
+  - [ModelZoo](#modelzoo)
+    - [PP-YOLOE](#pp-yoloe)
+    - [YOLOX](#yolox)
+    - [YOLOv5](#yolov5)
+    - [YOLOv6](#yolov6)
+    - [YOLOv7](#yolov7)
+    - [YOLOv8](#yolov8)
+    - [RTMDet](#rtmdet)
+    - [**Notes:**](#notes)
+    - [VOC](#voc)
+  - [UserGuide](#userguide)
+    - [**Pipeline**](#pipeline)
+    - [CustomDataset](#customdataset)
+      - [preparation:](#preparation)
+      - [fintune:](#fintune)
+      - [Predict and export:](#predict-and-export)
+
+## Introduction
+
+**PaddleYOLO** is a YOLO series toolbox based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection); **only the code relevant to YOLO series models is included**. It supports `YOLOv3`,`PP-YOLO`,`PP-YOLOv2`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6`,`YOLOv7`,`YOLOv8`,`RTMDet` and so on. Welcome to use it and build it together!
+
+## Updates
+
+* 【2023/01/10】Support [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) inference and deploy;
+* 【2022/09/29】Support [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet) inference and deploy;
+* 【2022/09/26】Release [`PaddleYOLO`](https://github.com/PaddlePaddle/PaddleYOLO);
+* 【2022/09/19】Support the new version of [`YOLOv6`](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6), including n/t/s/m/l models;
+* 【2022/08/23】Release the `YOLOSeries` codebase: support `YOLOv3`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6` and `YOLOv7`; support using a `ConvNeXt` backbone to get high-precision versions of `PP-YOLOE`,`YOLOX` and `YOLOv5`; support PaddleSlim-accelerated quantization training for `PP-YOLOE`,`YOLOv5`,`YOLOv6` and `YOLOv7`. For details, please read this [article](https://mp.weixin.qq.com/s/Hki01Zs2lQgvLSLWS0btrA);
+
+
+**Notes:**
+- The license of **PaddleYOLO** is **GPL 3.0**; the codes of [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5), [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6), [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7) and [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) will not be merged into [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Except for these four YOLO models, other YOLO models are recommended to be used in [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), **which will be the first to release the latest progress of the PP-YOLO series of detection models**;
+- To use **PaddleYOLO**, **PaddlePaddle-2.3.2 or above is recommended**; please refer to the [official website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) to download the appropriate version. **For Windows platforms, please install the paddle develop version** (a quick installation self-check follows this list);
+- For training on a **custom dataset**, please refer to the [doc](#CustomDataset) and this [issue](https://github.com/PaddlePaddle/PaddleYOLO/issues/43). Please **ensure the COCO-trained weights are loaded as pre-training** first. We recommend training YOLO detection models **with a total `batch_size` of at least `64`**. If the resources are insufficient, please **use a smaller model** or **reduce the input size of the model**. To ensure high detection accuracy, **avoid training with a single GPU or with a total `batch_size` of less than `32`**;
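+As a quick way to confirm the version recommendation above, PaddlePaddle ships a built-in self-check. A minimal sketch; run it in the Python environment you plan to train in:
+
+```python
+# Verify that the installed PaddlePaddle meets the recommendation above
+# (2.3.2 or above) and that the installation itself is functional.
+import paddle
+
+print(paddle.__version__)   # expect >= 2.3.2
+paddle.utils.run_check()    # runs a small program to validate the install (and GPU visibility, if any)
+```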
+## ModelZoo
+
+### [PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe)
+
+ Baseline
+
+| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val</sup><br>0.5:0.95 | mAP<sup>val</sup><br>
    0.5 | Params(M) | FLOPs(G) | download | config | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| PP-YOLOE-s | 640 | 32 | 400e | 2.9 | 43.4 | 60.0 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml) | +| PP-YOLOE-s | 640 | 32 | 300e | 2.9 | 43.0 | 59.6 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | +| PP-YOLOE-m | 640 | 28 | 300e | 6.0 | 49.0 | 65.9 | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | +| PP-YOLOE-l | 640 | 20 | 300e | 8.7 | 51.4 | 68.6 | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | +| PP-YOLOE-x | 640 | 16 | 300e | 14.9 | 52.3 | 69.5 | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | +| PP-YOLOE-tiny ConvNeXt| 640 | 16 | 36e | - | 44.6 | 63.3 | 33.04 | 13.87 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_convnext_tiny_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml) | +| **PP-YOLOE+_s** | 640 | 8 | 80e | 2.9 | **43.7** | **60.6** | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml) | +| **PP-YOLOE+_m** | 640 | 8 | 80e | 6.0 | **49.8** | **67.1** | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml) | +| **PP-YOLOE+_l** | 640 | 8 | 80e | 8.7 | **52.9** | **70.1** | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | +| **PP-YOLOE+_x** | 640 | 8 | 80e | 14.9 | **54.7** | **72.0** | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml) | + +
    + +
    + Deploy Models + +| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| PP-YOLOE-s(400epoch) | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.onnx) | +| PP-YOLOE-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.onnx) | +| PP-YOLOE-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.onnx) | +| PP-YOLOE-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.onnx) | +| PP-YOLOE-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.onnx) | +| **PP-YOLOE+_s** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.onnx) | +| **PP-YOLOE+_m** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.onnx) | +| **PP-YOLOE+_l** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.zip) | [( w/o 
nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.onnx) | +| **PP-YOLOE+_x** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.onnx) | + +
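+Before wiring any of the ONNX files above into a deployment pipeline, it can help to inspect their input/output signatures, which also makes the `w_nms` / `wo_nms` difference visible. A minimal sketch with ONNX Runtime; the local file name is an assumption (use any ONNX link from the table above):
+
+```python
+import onnxruntime as ort
+
+# Hypothetical local copy of one of the exported models listed above.
+sess = ort.InferenceSession("ppyoloe_plus_crn_l_80e_coco_wo_nms.onnx")
+
+for inp in sess.get_inputs():
+    print("input :", inp.name, inp.shape, inp.type)
+for out in sess.get_outputs():
+    print("output:", out.name, out.shape, out.type)
+# A `w_nms` export ends with the NMS operator and returns final detections,
+# while a `wo_nms` export returns raw pre-NMS boxes and scores for benchmarking.
+```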
    + +### [YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox) + +
+ Baseline
+
+| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val</sup><br>0.5:0.95 | mAP<sup>val</sup><br>0.5 | Params(M) | FLOPs(G) | download | config |
+| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
+| YOLOX-nano | 416 | 8 | 300e | 2.3 | 26.1 | 42.0 | 0.91 | 1.08 | [model](https://paddledet.bj.bcebos.com/models/yolox_nano_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_nano_300e_coco.yml) |
+| YOLOX-tiny | 416 | 8 | 300e | 2.8 | 32.9 | 50.4 | 5.06 | 6.45 | [model](https://paddledet.bj.bcebos.com/models/yolox_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_tiny_300e_coco.yml) |
+| YOLOX-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 9.0 | 26.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_s_300e_coco.yml) |
+| YOLOX-m | 640 | 8 | 300e | 5.8 | 46.9 | 65.7 | 25.3 | 73.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_m_300e_coco.yml) |
+| YOLOX-l | 640 | 8 | 300e | 9.3 | 50.1 | 68.8 | 54.2 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_l_300e_coco.yml) |
+| YOLOX-x | 640 | 8 | 300e | 16.6 | **51.8** | **70.6** | 99.1 | 281.9 | [model](https://paddledet.bj.bcebos.com/models/yolox_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_x_300e_coco.yml) |
+| YOLOX-cdn-tiny | 416 | 8 | 300e | 1.9 | 32.4 | 50.2 | 5.03 | 6.33 | [model](https://paddledet.bj.bcebos.com/models/yolox_cdn_tiny_300e_coco.pdparams) | [config](../../configs/yolox/yolox_cdn_tiny_300e_coco.yml) |
+| YOLOX-crn-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 7.7 | 24.69 | [model](https://paddledet.bj.bcebos.com/models/yolox_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_crn_s_300e_coco.yml) |
+| YOLOX-s ConvNeXt| 640 | 8 | 36e | - | 44.6 | 65.3 | 36.2 | 27.52 | [model](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/yolox_convnext_s_36e_coco.yml) |
+
    + +
    + Deploy Models + +| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| YOLOx-nano | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.onnx) | +| YOLOx-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.onnx) | +| YOLOx-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.onnx) | +| YOLOx-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.onnx) | +| YOLOx-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.onnx) | +| YOLOx-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.onnx) | + +
    + + +### [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5) + +
+ Baseline
+
+| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val</sup><br>0.5:0.95 | mAP<sup>val</sup><br>
    0.5 | Params(M) | FLOPs(G) | download | config | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| YOLOv5-n | 640 | 16 | 300e | 2.6 | 28.0 | 45.7 | 1.87 | 4.52 | [model](https://paddledet.bj.bcebos.com/models/yolov5_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_n_300e_coco.yml) | +| YOLOv5-s | 640 | 16 | 300e | 3.2 | 37.6 | 56.7 | 7.24 | 16.54 | [model](https://paddledet.bj.bcebos.com/models/yolov5_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_s_300e_coco.yml) | +| YOLOv5-m | 640 | 16 | 300e | 5.2 | 45.4 | 64.1 | 21.19 | 49.08 | [model](https://paddledet.bj.bcebos.com/models/yolov5_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_m_300e_coco.yml) | +| YOLOv5-l | 640 | 16 | 300e | 7.9 | 48.9 | 67.1 | 46.56 | 109.32 | [model](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_l_300e_coco.yml) | +| YOLOv5-x | 640 | 16 | 300e | 13.7 | 50.6 | 68.7 | 86.75 | 205.92 | [model](https://paddledet.bj.bcebos.com/models/yolov5_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_x_300e_coco.yml) | +| YOLOv5-s ConvNeXt| 640 | 8 | 36e | - | 42.4 | 65.3 | 34.54 | 17.96 | [model](https://paddledet.bj.bcebos.com/models/yolov5_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_convnext_s_36e_coco.yml) | +| *YOLOv5p6-n | 1280 | 16 | 300e | - | 35.9 | 54.2 | 3.25 | 9.23 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_n_300e_coco.yml) | +| *YOLOv5p6-s | 1280 | 16 | 300e | - | 44.5 | 63.3 | 12.63 | 33.81 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_s_300e_coco.yml) | +| *YOLOv5p6-m | 1280 | 16 | 300e | - | 51.1 | 69.0 | 35.73 | 100.21 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_m_300e_coco.yml) | +| *YOLOv5p6-l | 1280 | 8 | 300e | - | 53.4 | 71.0 | 76.77 | 223.09 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_l_300e_coco.yml) | +| *YOLOv5p6-x | 1280 | 8 | 300e | - | 54.7 | 72.4 | 140.80 | 420.03 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_x_300e_coco.yml) | + +
    + +
    + Deploy Models + +| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| YOLOv5-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.onnx) | +| YOLOv5-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.onnx) | +| YOLOv5-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.onnx) | +| YOLOv5-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.onnx) | +| YOLOv5-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.onnx) | + +
    + +### [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6) + +
+ Baseline
+
+| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val</sup><br>0.5:0.95 | mAP<sup>val</sup><br>
    0.5 | Params(M) | FLOPs(G) | download | config | +| :------------- | :------- | :-------: | :------: | :---------: | :-----: |:-----: | :-----: |:-----: | :-------------: | :-----: | +| *YOLOv6-n | 640 | 16 | 300e(+300e) | 2.0 | 37.5 | 53.1 | 5.07 | 12.49 |[model](https://paddledet.bj.bcebos.com/models/yolov6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_n_300e_coco.yml) | +| *YOLOv6-s | 640 | 32 | 300e(+300e) | 2.7 | 44.8 | 61.7 | 20.18 | 49.36 |[model](https://paddledet.bj.bcebos.com/models/yolov6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_s_300e_coco.yml) | +| *YOLOv6-m | 640 | 32 | 300e(+300e) | - | 49.5 | 66.9 | 37.74 | 92.47 |[model](https://paddledet.bj.bcebos.com/models/yolov6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_m_300e_coco.yml) | +| *YOLOv6-l(silu) | 640 | 32 | 300e(+300e) | - | 52.2 | 70.2 | 59.66 | 149.4 |[model](https://paddledet.bj.bcebos.com/models/yolov6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_l_300e_coco.yml) | + +
    + +
    + Deploy Models + +| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) | +| :-------- | :--------: | :---------------------: | :----------------: | +| yolov6-n | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.onnx) | +| yolov6-s | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.onnx) | +| yolov6-m | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.onnx) | +| yolov6-l(silu) | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.zip) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.onnx) | [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.onnx) | + +
    + +### [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7) + +
+ Baseline
+
+| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val</sup><br>0.5:0.95 | mAP<sup>val</sup><br>
    0.5 | Params(M) | FLOPs(G) | download | config | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| YOLOv7-L | 640 | 32 | 300e | 7.4 | 51.0 | 70.2 | 37.62 | 106.08 |[model](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_l_300e_coco.yml) | +| *YOLOv7-X | 640 | 32 | 300e | 12.2 | 53.0 | 70.8 | 71.34 | 190.08 | [model](https://paddledet.bj.bcebos.com/models/yolov7_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_x_300e_coco.yml) | +| *YOLOv7P6-W6 | 1280 | 16 | 300e | 25.5 | 54.4 | 71.8 | 70.43 | 360.26 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_w6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_w6_300e_coco.yml) | +| *YOLOv7P6-E6 | 1280 | 10 | 300e | 31.1 | 55.7 | 73.0 | 97.25 | 515.4 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6_300e_coco.yml) | +| *YOLOv7P6-D6 | 1280 | 8 | 300e | 37.4 | 56.1 | 73.3 | 133.81 | 702.92 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_d6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_d6_300e_coco.yml) | +| *YOLOv7P6-E6E | 1280 | 6 | 300e | 48.7 | 56.5 | 73.7 | 151.76 | 843.52 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6e_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6e_300e_coco.yml) | +| YOLOv7-tiny | 640 | 32 | 300e | - | 37.3 | 54.5 | 6.23 | 6.90 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_300e_coco.yml) | +| YOLOv7-tiny | 416 | 32 | 300e | - | 33.3 | 49.5 | 6.23 | 2.91 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_416_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_416_300e_coco.yml) | +| YOLOv7-tiny | 320 | 32 | 300e | - | 29.1 | 43.8 | 6.23 | 1.73 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_320_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_320_300e_coco.yml) | + +
    + +
+Deploy Models
+
+| Model | Input Size | Exported weights(w/ NMS) | Exported weights(w/o NMS) | ONNX(w/ NMS) | ONNX(w/o NMS) |
+| :-------- | :--------: | :---------------------: | :---------------------: | :----------------: | :----------------: |
+| YOLOv7-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.onnx) |
+| YOLOv7-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.onnx) |
+| YOLOv7P6-W6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.onnx) |
+| YOLOv7P6-E6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.onnx) |
+| YOLOv7P6-D6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.onnx) |
+| YOLOv7P6-E6E | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.onnx) |
+| YOLOv7-tiny | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.onnx) |
+| YOLOv7-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.onnx) |
+| YOLOv7-tiny | 320 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.onnx) |
+
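+The w/ NMS and w/o NMS variants above are produced by exporting the same training weights twice; a minimal sketch (YOLOv7-L as the example, using the `exclude_nms` switch from the pipeline below):
+
+```bash
+config=configs/yolov7/yolov7_l_300e_coco.yml
+weights=https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams
+
+# deploy model with NMS fused into the graph (w/ nms)
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights}
+
+# deploy model without NMS (w/o nms), e.g. when post-processing runs outside the model in TensorRT
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} exclude_nms=True
+```
+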
    + + +### [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) + +
    + Baseline + +| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAPval
    0.5:0.95 | mAPval
    0.5 | Params(M) | FLOPs(G) | download | config | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| *YOLOv8-n | 640 | 16 | 500e | 2.4 | 37.3 | 53.0 | 3.16 | 8.7 | [model](https://paddledet.bj.bcebos.com/models/yolov8_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_n_300e_coco.yml) | +| *YOLOv8-s | 640 | 16 | 500e | 3.4 | 44.9 | 61.8 | 11.17 | 28.6 | [model](https://paddledet.bj.bcebos.com/models/yolov8_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_s_300e_coco.yml) | +| *YOLOv8-m | 640 | 16 | 500e | 6.5 | 50.2 | 67.3 | 25.90 | 78.9 | [model](https://paddledet.bj.bcebos.com/models/yolov8_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_m_300e_coco.yml) | +| *YOLOv8-l | 640 | 16 | 500e | 10.0 | 52.8 | 69.6 | 43.69 | 165.2 | [model](https://paddledet.bj.bcebos.com/models/yolov8_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_l_300e_coco.yml) | +| *YOLOv8-x | 640 | 16 | 500e | 15.1 | 53.8 | 70.6 | 68.23 | 257.8 | [model](https://paddledet.bj.bcebos.com/models/yolov8_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_x_300e_coco.yml) | +| *YOLOv8-P6-x | 1280 | 16 | 500e | 55.0 | - | - | 97.42 | 522.93 | [model](https://paddledet.bj.bcebos.com/models/yolov8p6_x_500e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8p6_x_500e_coco.yml) | + +
    + +
+Deploy Models
+
+| Model | Input Size | Exported weights(w/ NMS) | Exported weights(w/o NMS) | ONNX(w/ NMS) | ONNX(w/o NMS) |
+| :-------- | :--------: | :---------------------: | :---------------------: | :----------------: | :----------------: |
+| YOLOv8-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.onnx) |
+| YOLOv8-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.onnx) |
+| YOLOv8-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.onnx) |
+| YOLOv8-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.onnx) |
+| YOLOv8-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.onnx) |
+
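+The pre-exported packages above can be used directly with the Python deployment script, skipping export. A hedged sketch, assuming the archive unpacks to a folder named after the zip:
+
+```bash
+# fetch a pre-exported YOLOv8-s package and run deploy inference on it
+wget https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.zip
+unzip yolov8_s_500e_coco_w_nms.zip -d output_inference/   # assumed folder name: yolov8_s_500e_coco_w_nms
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py \
+    --model_dir=output_inference/yolov8_s_500e_coco_w_nms \
+    --image_file=demo/000000014439_640x640.jpg --device=GPU
+```
+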
    + + +### [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet) + +
    + Baseline + +| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAPval
    0.5:0.95 | mAPval
    0.5 | Params(M) | FLOPs(G) | download | config | +| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: | +| *RTMDet-t | 640 | 32 | 300e | 2.8 | 40.9 | 57.9 | 4.90 | 16.21 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_t_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_t_300e_coco.yml) | +| *RTMDet-s | 640 | 32 | 300e | 3.3 | 44.5 | 62.0 | 8.89 | 29.71 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_s_300e_coco.yml) | +| *RTMDet-m | 640 | 32 | 300e | 6.4 | 49.1 | 66.8 | 24.71 | 78.47 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_m_300e_coco.yml) | +| *RTMDet-l | 640 | 32 | 300e | 10.2 | 51.2 | 68.8 | 52.31 | 160.32 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_l_300e_coco.yml) | +| *RTMDet-x | 640 | 32 | 300e | 18.0 | 52.6 | 70.4 | 94.86 | 283.12 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_x_300e_coco.yml) | + +
    + +
+Deploy Models
+
+| Model | Input Size | Exported weights(w/ NMS) | Exported weights(w/o NMS) | ONNX(w/ NMS) | ONNX(w/o NMS) |
+| :-------- | :--------: | :---------------------: | :---------------------: | :----------------: | :----------------: |
+| RTMDet-t | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.onnx) |
+| RTMDet-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.onnx) |
+| RTMDet-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.onnx) |
+| RTMDet-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.onnx) |
+| RTMDet-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.zip) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.onnx) | [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.onnx) |
+
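+The ONNX files above can be speed-tested the same way as step 8 of the pipeline below; a sketch for RTMDet-s (the TensorRT install path is an assumption):
+
+```bash
+wget https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.onnx
+/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=rtmdet_s_300e_coco_wo_nms.onnx \
+    --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16
+```
+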
+
+
+### **Notes:**
+- All models are trained on the COCO train2017 dataset and evaluated on val2017. A * in front of a model name indicates that its training results are still being updated.
+- Please check the specific accuracy and speed details in [PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe), [YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox), [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5), [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6), [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7). **Note that YOLOv5, YOLOv6 and YOLOv7 are evaluated without `multi_label`**.
+- TRT-FP16-Latency(ms) is the inference time under TensorRT-FP16, **excluding data preprocessing and model output post-processing (NMS)**. Tests use a single **Tesla T4 GPU** with **batch size = 1**, on **paddlepaddle-2.3.2**, **CUDA 11.2**, **CUDNN 8.2**, **GCC-8.2**, **TensorRT 8.0.3.4**. Please refer to the respective model homepages for details.
+- For **FLOPs(G) and Params(M)**, first install [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) via `pip install paddleslim`, then set `print_flops: True` and `print_params: True` in [runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml). Make sure to use a **single scale** such as 640x640; note that **MACs are printed, and FLOPs = 2 * MACs**.
+- Based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), quantization-aware training of the YOLO series models achieves essentially lossless accuracy and generally speeds inference up by more than 30%. For details, please refer to [auto_compression](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression).
+
+
+### [VOC](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc)
+
    + Baseline + +| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP(0.50,11point) | Params(M) | FLOPs(G) | download | config | +| :-----------: | :-------: | :-------: | :------: | :------------: | :---------------: | :------------------: |:-----------------: | :------: | :------: | +| YOLOv5-s | 640 | 16 | 60e | 3.2 | 80.3 | 7.24 | 16.54 | [model](https://paddledet.bj.bcebos.com/models/yolov5_s_60e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov5_s_60e_voc.yml) | +| YOLOv7-tiny | 640 | 32 | 60e | 2.6 | 80.2 | 6.23 | 6.90 | [model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_60e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov7_tiny_60e_voc.yml) | +| YOLOX-s | 640 | 8 | 40e | 3.0 | 82.9 | 9.0 | 26.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_s_40e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolox_s_40e_voc.yml) | +| PP-YOLOE+_s | 640 | 8 | 30e | 2.9 | 86.7 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_30e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/ppyoloe_plus_crn_s_30e_voc.yml) | + +
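+As the note below explains, these VOC models finetune from the corresponding COCO weights; a minimal training sketch for the PP-YOLOE+_s config from the table (8 GPUs assumed):
+
+```bash
+# 8-GPU fine-tuning on VOC; per the note below, the config loads COCO weights as pre-trained weights
+python -m paddle.distributed.launch --log_dir=log_dir/voc --gpus 0,1,2,3,4,5,6,7 \
+    tools/train.py -c configs/voc/ppyoloe_plus_crn_s_30e_voc.yml --eval --amp
+```
+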
+
+**Note:**
+- The VOC mAP is `mAP(IoU=0.5)`, and all models are **evaluated without `multi_label`**.
+- All YOLO VOC models load the COCO weights of their respective models as pre-trained weights. Each config file uses 8 GPUs by default and can serve as a reference when setting up custom datasets; the exact mAP will vary with the dataset.
+- We recommend training YOLO detection models **with a total `batch_size` of at least `64`**. If resources are insufficient, please **use a smaller model** or **reduce the input size**; to ensure high detection accuracy, **avoid training with a single GPU or a total `batch_size` below `64`**.
+- Params (M) and FLOPs (G) are measured during training. Since YOLOv7 has no s model, the tiny model is used instead.
+- For TRT-FP16 latency (ms) measurement, please refer to the config homepage of each YOLO model.
+
+
+## UserGuide
+
+Download the MS-COCO dataset from the [official website](https://cocodataset.org). The download links are: [annotations](http://images.cocodataset.org/annotations/annotations_trainval2017.zip), [train2017](http://images.cocodataset.org/zips/train2017.zip), [val2017](http://images.cocodataset.org/zips/val2017.zip), [test2017](http://images.cocodataset.org/zips/test2017.zip).
+The PaddleDetection team also provides mirrors: [coco](https://bj.bcebos.com/v1/paddledet/data/coco.tar) (about 22G) and [test2017](https://bj.bcebos.com/v1/paddledet/data/cocotest2017.zip). Note that test2017 is optional; evaluation uses val2017.
+
+
+### **Pipeline**
+
+First make sure the dataset is in place (a download sketch follows), then write the following commands into a script file such as `run.sh` and run it with `sh run.sh`; you can also run the commands one by one on the command line.
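+A minimal sketch (assuming the default `dataset/coco` layout) of fetching the dataset with the mirror links above:
+
+```bash
+mkdir -p dataset && cd dataset
+wget https://bj.bcebos.com/v1/paddledet/data/coco.tar   # ~22G mirror with annotations/train2017/val2017
+tar -xf coco.tar                                        # -> dataset/coco
+cd ..
+```
+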
+
+```bash
+model_name=ppyoloe # yolov7
+job_name=ppyoloe_plus_crn_l_80e_coco # yolov7_tiny_300e_coco
+
+config=configs/${model_name}/${job_name}.yml
+log_dir=log_dir/${job_name}
+# weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
+weights=output/${job_name}/model_final.pdparams
+
+# 1. training (single GPU / multi GPU)
+# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp
+python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
+
+# 2. eval
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights} --classwise
+
+# 3. infer
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
+
+# 4. export
+CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} # exclude_nms=True trt=True
+
+# 5. deploy infer
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU
+
+# 6. deploy speed, add '--run_mode=trt_fp16' to test in TensorRT FP16 mode
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16
+
+# 7. export onnx
+paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx
+
+# 8. onnx speed
+/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16
+```
+
+**Note:**
+- To switch models, just modify the first two lines, for example:
+  ```
+  model_name=yolov7
+  job_name=yolov7_tiny_300e_coco
+  ```
+- For **exporting ONNX**, first install [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) via `pip install paddle2onnx`.
+- For **FLOPs(G) and Params(M)**, first install [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) via `pip install paddleslim`, then set `print_flops: True` and `print_params: True` in [runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml). Make sure to use a **single scale** such as 640x640; note that **MACs are printed, and FLOPs = 2 * MACs**.
+
+
+### CustomDataset
+
+#### preparation:
+
+1. For annotating a custom dataset, please refer to [DetAnnoTools](../tutorials/data/DetAnnoTools.md);
+
+2. For the training preparation of a custom dataset, please refer to [PrepareDataSet](../tutorials/PrepareDataSet.md).
+
+
+#### finetune:
+
+In addition to changing the dataset path, it is generally recommended to load **the COCO pre-trained weights of the corresponding model** for finetuning, which converges faster and reaches higher accuracy, for example:
+
+```bash
+# finetune with single GPU:
+# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
+
+# finetune with multi GPU:
+python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
+```
+
+**Note:**
+- Finetune training will report that the channels of the last layer of the head's classification branch do not match; this is normal, because the number of classes in a custom dataset generally differs from COCO's;
+- In general, the number of epochs for finetune training can be set smaller, and the learning rate should be reduced as well, e.g. to 1/10 of the original. The highest accuracy may occur at one of the intermediate epochs;
+
+#### Predict and export:
+
+When predicting and exporting models on a custom dataset, if the TestDataset path is set incorrectly, the 80 COCO categories will be used by default.
+
+Besides setting the TestDataset path correctly, you can also add a corresponding `label_list.txt` file (one category per line), and `anno_path` in TestDataset can also be set as an absolute path, for example:
+```
+TestDataset:
+  !ImageFolder
+    anno_path: label_list.txt # if dataset_dir is not set, anno_path is relative to the PaddleDetection root directory
+    # dataset_dir: dataset/my_coco # if dataset_dir is set, anno_path will be dataset_dir/anno_path
+```
+Each line in `label_list.txt` records one category:
+```
+person
+vehicle
+```
diff --git a/PaddleDetection-release-2.6/docs/feature_models/SSLD_PRETRAINED_MODEL.md b/PaddleDetection-release-2.6/docs/feature_models/SSLD_PRETRAINED_MODEL.md
new file mode 100644
index 0000000000000000000000000000000000000000..f1fe33612372be462c4e63a63eeb5901e3a1faac
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/feature_models/SSLD_PRETRAINED_MODEL.md
@@ -0,0 +1,54 @@
+简体中文 | [English](SSLD_PRETRAINED_MODEL_en.md)
+
+### Simple semi-supervised label knowledge distillation solution (SSLD)
+
+### R-CNN on COCO
+
+| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 | 推理时间(fps) | Box AP | Mask AP | 下载 | 配置文件 |
+| :------------------- | :------------| :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: |
+| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 1x | ---- | 41.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 2x | ---- | 42.3 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 1x | ---- | 42.0 | 38.2 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 2x | ---- | 42.7 | 38.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+
+
+### YOLOv3 on COCO
+
+| 骨架网络 | 输入尺寸 | 每张GPU图片个数 | 学习率策略 | 推理时间(fps) | Box AP | 下载 | 配置文件 |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | ---- | 31.0 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | ---- | 30.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | ---- | 28.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+
+### YOLOv3 on Pascal VOC
+
+| 骨架网络 | 输入尺寸 | 每张GPU图片个数 | 学习率策略 | 推理时间(fps) | Box AP | 下载 | 配置文件 |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 320
| 8 | 270e | - | 77.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | + +**注意事项:** + +- [SSLD](https://arxiv.org/abs/2103.05959)是一种知识蒸馏方法,我们使用蒸馏后性能更强的backbone预训练模型,进一步提升检测精度,详细方案请参考[知识蒸馏教程](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/advanced_tutorials/distillation/distillation_en.md) + +![demo image](../images/ssld_model.png) + +## Citations +``` +@misc{cui2021selfsupervision, + title={Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones}, + author={Cheng Cui and Ruoyu Guo and Yuning Du and Dongliang He and Fu Li and Zewu Wu and Qiwen Liu and Shilei Wen and Jizhou Huang and Xiaoguang Hu and Dianhai Yu and Errui Ding and Yanjun Ma}, + year={2021}, + eprint={2103.05959}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/PaddleDetection-release-2.6/docs/feature_models/SSLD_PRETRAINED_MODEL_en.md b/PaddleDetection-release-2.6/docs/feature_models/SSLD_PRETRAINED_MODEL_en.md new file mode 100644 index 0000000000000000000000000000000000000000..b0e39b574a01c1fcc0cd51dda3ddeba718fc1501 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/feature_models/SSLD_PRETRAINED_MODEL_en.md @@ -0,0 +1,53 @@ +English | [简体中文](SSLD_PRETRAINED_MODEL.md) + +### Simple semi-supervised label knowledge distillation solution (SSLD) + +### R-CNN on COCO + +| Backbone | Model | Images/GPU | Lr schd | FPS | Box AP | Mask AP | Download | Config | +| :------------------- | :------------| :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: | +| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 1x | ---- | 41.4 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 2x | ---- | 42.3 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 1x | ---- | 42.0 | 38.2 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | +| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 2x | ---- | 42.7 
| 38.9 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+
+### YOLOv3 on COCO
+
+| Backbone | Input shape | Images/GPU | Lr schd | FPS | Box AP | Download | Config |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | ---- | 31.0 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | ---- | 30.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | ---- | 28.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+
+### YOLOv3 on Pascal VOC
+
+| Backbone | Input shape | Images/GPU | Lr schd | FPS | Box AP | Download | Config |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | +| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | + +**Notes:** + +- [SSLD](https://arxiv.org/abs/2103.05959) is a knowledge distillation method. We use the stronger backbone pretrained model after distillation to further improve the detection accuracy. Please refer to the [knowledge distillation tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/advanced_tutorials/distillation/distillation_en.md). + +![demo image](../images/ssld_model.png) + +## Citations +``` +@misc{cui2021selfsupervision, + title={Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones}, + author={Cheng Cui and Ruoyu Guo and Yuning Du and Dongliang He and Fu Li and Zewu Wu and Qiwen Liu and Shilei Wen and Jizhou Huang and Xiaoguang Hu and Dianhai Yu and Errui Ding and Yanjun Ma}, + year={2021}, + eprint={2103.05959}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/PaddleDetection-release-2.6/docs/images/000000014439.jpg b/PaddleDetection-release-2.6/docs/images/000000014439.jpg new file mode 100644 index 0000000000000000000000000000000000000000..56a4f66768c439adf0fadbde7b150b520c6d09e3 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/000000014439.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/12_Group_Group_12_Group_Group_12_935.jpg b/PaddleDetection-release-2.6/docs/images/12_Group_Group_12_Group_Group_12_935.jpg new file mode 100644 index 0000000000000000000000000000000000000000..2a563361ae03fbe079dba017374eee51ccbd17dd Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/12_Group_Group_12_Group_Group_12_935.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/PedestrianDetection_001.png b/PaddleDetection-release-2.6/docs/images/PedestrianDetection_001.png new file mode 100644 index 0000000000000000000000000000000000000000..5194d6ff891b9507fedfc53f36de4f00219c7f30 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/PedestrianDetection_001.png differ diff --git a/PaddleDetection-release-2.6/docs/images/PedestrianDetection_004.png b/PaddleDetection-release-2.6/docs/images/PedestrianDetection_004.png new file mode 100644 index 0000000000000000000000000000000000000000..7c62be5051f9a47c5f5e98ccd9f45c3fa5f30257 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/PedestrianDetection_004.png differ diff --git a/PaddleDetection-release-2.6/docs/images/VehicleDetection_001.jpeg b/PaddleDetection-release-2.6/docs/images/VehicleDetection_001.jpeg new file mode 100644 index 
0000000000000000000000000000000000000000..aa2b679d4d2a73487edd5f9c67323ab18df93893 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/VehicleDetection_001.jpeg differ diff --git a/PaddleDetection-release-2.6/docs/images/VehicleDetection_005.png b/PaddleDetection-release-2.6/docs/images/VehicleDetection_005.png new file mode 100644 index 0000000000000000000000000000000000000000..57f918a30fcc5bf7bda284c1a1a0304e8822d325 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/VehicleDetection_005.png differ diff --git a/PaddleDetection-release-2.6/docs/images/add_attribute.png b/PaddleDetection-release-2.6/docs/images/add_attribute.png new file mode 100644 index 0000000000000000000000000000000000000000..1d6092c4a3f778f08b0636875bdcb30a51d0655d Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/add_attribute.png differ diff --git a/PaddleDetection-release-2.6/docs/images/bus.jpg b/PaddleDetection-release-2.6/docs/images/bus.jpg new file mode 100644 index 0000000000000000000000000000000000000000..cdbbf8c9ba9990fb228360db590e37f078160767 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/bus.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/dog.jpg b/PaddleDetection-release-2.6/docs/images/dog.jpg new file mode 100644 index 0000000000000000000000000000000000000000..237c084d9b0dd5cf32e9ec5463ab027ebd148df8 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/dog.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/fps_map.png b/PaddleDetection-release-2.6/docs/images/fps_map.png new file mode 100644 index 0000000000000000000000000000000000000000..0fbafcb4fb55fb3659a09b9ff20b6f82a9fe2ffc Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/fps_map.png differ diff --git a/PaddleDetection-release-2.6/docs/images/grad_cam_ppyoloe_demo.jpg b/PaddleDetection-release-2.6/docs/images/grad_cam_ppyoloe_demo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..83631b0a0590ad1ecb2e67b6932507b1022a29bf Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/grad_cam_ppyoloe_demo.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/input_shape.png b/PaddleDetection-release-2.6/docs/images/input_shape.png new file mode 100644 index 0000000000000000000000000000000000000000..1148116f81ec78ae625f342fa51dcf778d1fb4ca Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/input_shape.png differ diff --git a/PaddleDetection-release-2.6/docs/images/instance_seg.png b/PaddleDetection-release-2.6/docs/images/instance_seg.png new file mode 100644 index 0000000000000000000000000000000000000000..7ba84009457640edc700805be5e48207ffa660ad Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/instance_seg.png differ diff --git a/PaddleDetection-release-2.6/docs/images/layout.jpg b/PaddleDetection-release-2.6/docs/images/layout.jpg new file mode 100644 index 0000000000000000000000000000000000000000..1c3ca618d30c4c04f062a7db382326ebb4d4e599 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/layout.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/lite_demo.jpg b/PaddleDetection-release-2.6/docs/images/lite_demo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0eee6e84c24ee44d422f314a92a3df5d7cf2dc81 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/lite_demo.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/mobile_fps_map.png 
b/PaddleDetection-release-2.6/docs/images/mobile_fps_map.png new file mode 100644 index 0000000000000000000000000000000000000000..2b31508332710042406ab046529148d82a0581e8 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/mobile_fps_map.png differ diff --git a/PaddleDetection-release-2.6/docs/images/model_figure.png b/PaddleDetection-release-2.6/docs/images/model_figure.png new file mode 100644 index 0000000000000000000000000000000000000000..72ec8cdad23a49e948f39fe3091c26f7a94d74a4 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/model_figure.png differ diff --git a/PaddleDetection-release-2.6/docs/images/picedet_demo.jpeg b/PaddleDetection-release-2.6/docs/images/picedet_demo.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..031f38ab484ac8f375d71029a2588a81ec23aa1e Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/picedet_demo.jpeg differ diff --git a/PaddleDetection-release-2.6/docs/images/picodet_android_demo1.jpg b/PaddleDetection-release-2.6/docs/images/picodet_android_demo1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6732f63246d1f693eba8a77f1fb4a201078a3b01 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/images/picodet_android_demo1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b2c7341bdeac0742ade539f5731b254a8aa411866fa181296662981db0438c2 +size 1005558 diff --git a/PaddleDetection-release-2.6/docs/images/picodet_android_demo2.jpg b/PaddleDetection-release-2.6/docs/images/picodet_android_demo2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b4b0d31ef44d2ed09674d00d5a60fb0f06f95f18 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/images/picodet_android_demo2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e0c705c07a56883f414f91d1d22e31cd8766c0c9806d4208988ca1bc5371f802 +size 1175030 diff --git a/PaddleDetection-release-2.6/docs/images/picodet_android_demo3.jpg b/PaddleDetection-release-2.6/docs/images/picodet_android_demo3.jpg new file mode 100644 index 0000000000000000000000000000000000000000..554dbbb288796b34cec6bc16123a3b841c76c535 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/images/picodet_android_demo3.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:da1ca8f1a4ff1330b5c2b99154613eef5514d2c61d17fa44f00faa86bfce2517 +size 1211006 diff --git a/PaddleDetection-release-2.6/docs/images/picodet_android_demo4.jpg b/PaddleDetection-release-2.6/docs/images/picodet_android_demo4.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c8c0dc128ed058c61434b7d7029d209ad64c3ac4 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/picodet_android_demo4.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/picodet_map.png b/PaddleDetection-release-2.6/docs/images/picodet_map.png new file mode 100644 index 0000000000000000000000000000000000000000..e66e480e9ddf178e164fb6648e3bfbd4fd6a2817 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/images/picodet_map.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43b8434ab5807dc7bef47e363e481a71de0e13ad0f2686140288c753f7863305 +size 2572722 diff --git a/PaddleDetection-release-2.6/docs/images/pphuman-tech.png b/PaddleDetection-release-2.6/docs/images/pphuman-tech.png new file mode 100644 index 0000000000000000000000000000000000000000..401690df5a28f2d9b89bf0599da167496da8bc82 Binary files /dev/null and 
b/PaddleDetection-release-2.6/docs/images/pphuman-tech.png differ diff --git a/PaddleDetection-release-2.6/docs/images/pphumanv2.png b/PaddleDetection-release-2.6/docs/images/pphumanv2.png new file mode 100644 index 0000000000000000000000000000000000000000..87b896ed0e5f973497aaedb7cf2042d004e080c3 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/pphumanv2.png differ diff --git a/PaddleDetection-release-2.6/docs/images/pptracking.png b/PaddleDetection-release-2.6/docs/images/pptracking.png new file mode 100644 index 0000000000000000000000000000000000000000..557584e666166eafe4e9659ae0fa951521575e95 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/pptracking.png differ diff --git a/PaddleDetection-release-2.6/docs/images/pptracking_en.png b/PaddleDetection-release-2.6/docs/images/pptracking_en.png new file mode 100644 index 0000000000000000000000000000000000000000..408ccc48cf4b0a0f86f1951b7f782539c58b6adf Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/pptracking_en.png differ diff --git a/PaddleDetection-release-2.6/docs/images/ppvehicle.png b/PaddleDetection-release-2.6/docs/images/ppvehicle.png new file mode 100644 index 0000000000000000000000000000000000000000..87176c313e7af184225e0f09001be53955b98f7a Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/ppvehicle.png differ diff --git a/PaddleDetection-release-2.6/docs/images/ppyolo_map_fps.png b/PaddleDetection-release-2.6/docs/images/ppyolo_map_fps.png new file mode 100644 index 0000000000000000000000000000000000000000..f860d220d1c831e42a23e38fc78732426c23e2cc Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/ppyolo_map_fps.png differ diff --git a/PaddleDetection-release-2.6/docs/images/ppyoloe_plus_map_fps.png b/PaddleDetection-release-2.6/docs/images/ppyoloe_plus_map_fps.png new file mode 100644 index 0000000000000000000000000000000000000000..dbc0e4cca60775103fd655c36c3c4092f57a24a5 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/ppyoloe_plus_map_fps.png differ diff --git a/PaddleDetection-release-2.6/docs/images/ppyoloe_r_map_fps.png b/PaddleDetection-release-2.6/docs/images/ppyoloe_r_map_fps.png new file mode 100644 index 0000000000000000000000000000000000000000..2d4553b97e96a63c428b08a2da9d0f8880e72be8 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/ppyoloe_r_map_fps.png differ diff --git a/PaddleDetection-release-2.6/docs/images/reader_figure.png b/PaddleDetection-release-2.6/docs/images/reader_figure.png new file mode 100644 index 0000000000000000000000000000000000000000..68441a20cd5bc14349bfea01a3ffa66a31ac1793 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/reader_figure.png differ diff --git a/PaddleDetection-release-2.6/docs/images/res.jpg b/PaddleDetection-release-2.6/docs/images/res.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6f281fa3be0053d5a919da4ee36c6005e0664daa Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/res.jpg differ diff --git a/PaddleDetection-release-2.6/docs/images/road554.png b/PaddleDetection-release-2.6/docs/images/road554.png new file mode 100644 index 0000000000000000000000000000000000000000..1ecd45d9403897aa048417a9b69ad06e7ce41016 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/road554.png differ diff --git a/PaddleDetection-release-2.6/docs/images/roadsign_yml.png b/PaddleDetection-release-2.6/docs/images/roadsign_yml.png new file mode 100644 index 
0000000000000000000000000000000000000000..242bab90bd75f7ab08c7477475222b0b37678c43 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/roadsign_yml.png differ diff --git a/PaddleDetection-release-2.6/docs/images/ssld_model.png b/PaddleDetection-release-2.6/docs/images/ssld_model.png new file mode 100644 index 0000000000000000000000000000000000000000..23508712be7e6b6787575a66ca4c65037c9015c8 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/ssld_model.png differ diff --git a/PaddleDetection-release-2.6/docs/images/tinypose_app.png b/PaddleDetection-release-2.6/docs/images/tinypose_app.png new file mode 100644 index 0000000000000000000000000000000000000000..fd43ebcdcaec7bda1c57378e7b82b9d103ee3cb2 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/tinypose_app.png differ diff --git a/PaddleDetection-release-2.6/docs/images/tinypose_demo.png b/PaddleDetection-release-2.6/docs/images/tinypose_demo.png new file mode 100644 index 0000000000000000000000000000000000000000..fc24c821420ec1c95f815579b1ee8c43a10fcdb1 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/images/tinypose_demo.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ceb1f4d4d463c3394312bab51249f394a88efb0f8a2d6b94c52336cc847124d +size 2520527 diff --git a/PaddleDetection-release-2.6/docs/images/tinypose_pipeline.png b/PaddleDetection-release-2.6/docs/images/tinypose_pipeline.png new file mode 100644 index 0000000000000000000000000000000000000000..67542c12b0f81a5e135aae02cae007668d56ca07 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/tinypose_pipeline.png differ diff --git a/PaddleDetection-release-2.6/docs/images/yaml_show.png b/PaddleDetection-release-2.6/docs/images/yaml_show.png new file mode 100644 index 0000000000000000000000000000000000000000..b6319752d4f13471f2edc4a357cb9ec51ec90c75 Binary files /dev/null and b/PaddleDetection-release-2.6/docs/images/yaml_show.png differ diff --git a/PaddleDetection-release-2.6/docs/tutorials/DistributedTraining_cn.md b/PaddleDetection-release-2.6/docs/tutorials/DistributedTraining_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..8581cc1b59f8393d185a913b6b723b8252ae26a7 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/DistributedTraining_cn.md @@ -0,0 +1,66 @@ +[English](DistributedTraining_en.md) | 简体中文 + + +# 分布式训练 + +## 1. 简介 + +* 分布式训练指的是将训练任务按照一定方法拆分到多个计算节点进行计算,再按照一定的方法对拆分后计算得到的梯度等信息进行聚合与更新。飞桨分布式训练技术源自百度的业务实践,在自然语言处理、计算机视觉、搜索和推荐等领域经过超大规模业务检验。分布式训练的高性能,是飞桨的核心优势技术之一,PaddleDetection同时支持单机训练与多机训练。更多关于分布式训练的方法与文档可以参考:[分布式训练快速开始教程](https://fleet-x.readthedocs.io/en/latest/paddle_fleet_rst/parameter_server/ps_quick_start.html)。 + +## 2. 
使用方法 + +### 2.1 单机训练 + +* 以PP-YOLOE-s为例,本地准备好数据之后,使用`paddle.distributed.launch`或者`fleetrun`的接口启动训练任务即可。下面为运行脚本示例。 + +```bash +fleetrun \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ +--eval &>logs.txt 2>&1 & +``` + +### 2.2 多机训练 + +* 相比单机训练,多机训练时,只需要添加`--ips`的参数,该参数表示需要参与分布式训练的机器的ip列表,不同机器的ip用逗号隔开。下面为运行代码示例。 + +```shell +ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" +fleetrun \ +--ips=${ip_list} \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ +--eval &>logs.txt 2>&1 & +``` + +**注:** +* 不同机器的ip信息需要用逗号隔开,可以通过`ifconfig`或者`ipconfig`查看。 +* 不同机器之间需要做免密设置,且可以直接ping通,否则无法完成通信。 +* 不同机器之间的代码、数据与运行命令或脚本需要保持一致,且所有的机器上都需要运行设置好的训练命令或者脚本。最终`ip_list`中的第一台机器的第一块设备是trainer0,以此类推。 +* 不同机器的起始端口可能不同,建议在启动多机任务前,在不同的机器中设置相同的多机运行起始端口,命令为`export FLAGS_START_PORT=17000`,端口值建议在`10000~20000`之间。 + + +## 3. 性能效果测试 + +* 在3机8卡V100的机器上进行模型训练,不同模型的精度、训练耗时、多机加速比情况如下所示。 + +| 模型 | 数据集 | 配置 | 单机8卡耗时/精度 | 3机8卡耗时/精度 | 加速比 | +|:---------:|:--------:|:--------:|:--------:|:--------:|:------:| +| PP-YOLOE-s | Objects365 | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 301h/- | 162h/17.7% | **1.85** | +| PP-YOLOE-l | Objects365 | [ppyoloe_crn_l_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | 401h/- | 178h/30.3% | **2.25** | + + +* 在4机8卡V100的机器上进行模型训练,不同模型的精度、训练耗时、多机加速比情况如下所示。 + + +| 模型 | 数据集 | 配置 | 单机8卡耗时/精度 | 4机8卡耗时/精度 | 加速比 | +|:---------:|:--------:|:--------:|:--------:|:--------:|:------:| +| PP-YOLOE-s | COCO | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 39h/42.7% | 13h/42.1% | **3.0** | +| PP-YOLOE-m | Objects365 | [ppyoloe_crn_m_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | 337h/- | 112h/24.6% | **3.0** | +| PP-YOLOE-x | Objects365 | [ppyoloe_crn_x_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | 464h/- | 125h/32.1% | **3.4** | + + +* **注意** + * 在训练的GPU卡数过多时,精度会稍微有所损失(1%左右),此时可以尝试通过添加warmup或者适当增加迭代轮数来弥补精度损失。 + * 这里的配置文件均提供的是COCO数据集的配置文件,如果需要训练其他的数据集,需要修改数据集路径。 + * 上面的`PP-YOLOE`系列模型在多机训练过程中,均设置单卡batch size为8,同时学习率相比于单机8卡保持不变。 diff --git a/PaddleDetection-release-2.6/docs/tutorials/DistributedTraining_en.md b/PaddleDetection-release-2.6/docs/tutorials/DistributedTraining_en.md new file mode 100644 index 0000000000000000000000000000000000000000..af69ab1c2d5f6714475a0b90e13bed4950831a3d --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/DistributedTraining_en.md @@ -0,0 +1,60 @@ +English | [简体中文](DistributedTraining_cn.md) + + +## 1. Usage + +### 1.1 Single-machine + +* Take PP-YOLOE-s as an example, after preparing the data locally, use the interface of `paddle.distributed.launch` or `fleetrun` to start the training task. Below is an example of running the script. + +```bash +fleetrun \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ +--eval &>logs.txt 2>&1 & +``` + +### 1.2 Multi-machine + +* Compared with single-machine training, when training on multiple machines, you only need to add the `--ips` parameter, which indicates the ip list of machines that need to participate in distributed training. The ips of different machines are separated by commas. Below is an example of running code. 
+
+```shell
+ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151"
+fleetrun \
+--ips=${ip_list} \
+--selected_gpu 0,1,2,3,4,5,6,7 \
+tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
+--eval &>logs.txt 2>&1 &
+```
+
+**Note:**
+* The IP addresses of the machines must be separated by commas; they can be looked up with `ifconfig` or `ipconfig`.
+* Passwordless SSH must be configured between the machines, and they must be able to ping each other directly; otherwise communication cannot be established.
+* The code, data, and run commands or scripts must be identical on all machines, and the training command or script must be started on every machine. The first device of the first machine in `ip_list` becomes trainer0, and so on.
+* The default starting port may differ across machines. Before launching a multi-machine job, set the same starting port on every machine with `export FLAGS_START_PORT=17000`; a value between `10000~20000` is recommended.
+
+
+## 2. Performance
+
+* We trained models on 3 machines with 8 V100 GPUs each. The accuracy, training time, and multi-machine speedup of different models are shown below.
+
+| Model | Dataset | Configuration | 8-GPU training time / accuracy | 3x8-GPU training time / accuracy | Speedup |
+|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
+| PP-YOLOE-s | Objects365 | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 301h/- | 162h/17.7% | **1.85** |
+| PP-YOLOE-l | Objects365 | [ppyoloe_crn_l_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | 401h/- | 178h/30.3% | **2.25** |
+
+
+* We trained models on 4 machines with 8 V100 GPUs each. The accuracy, training time, and multi-machine speedup of different models are shown below.
+
+
+| Model | Dataset | Configuration | 8-GPU training time / accuracy | 4x8-GPU training time / accuracy | Speedup |
+|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
+| PP-YOLOE-s | COCO | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 39h/42.7% | 13h/42.1% | **3.0** |
+| PP-YOLOE-m | Objects365 | [ppyoloe_crn_m_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | 337h/- | 112h/24.6% | **3.0** |
+| PP-YOLOE-x | Objects365 | [ppyoloe_crn_x_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | 464h/- | 125h/32.1% | **3.4** |
+
+
+* **Note**
+  * When training with a very large number of GPU cards, accuracy drops slightly (about 1%); adding warmup or a few extra training epochs can recover the lost accuracy.
+  * The configuration files here target the COCO dataset; to train on other datasets, modify the dataset path.
+  * For multi-machine training of the `PP-YOLOE` series, the per-card batch size is set to 8 and the learning rate is kept the same as for single-machine training.
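+As a pre-flight check for the multi-machine prerequisites above, a hedged sketch (the IPs are the example values from this doc; adjust to your cluster):
+
+```bash
+export FLAGS_START_PORT=17000            # use the same start port on every machine (10000~20000 suggested)
+for ip in 10.127.6.17 10.127.5.142; do
+    ping -c 1 ${ip} > /dev/null && echo "${ip} reachable"
+    ssh ${ip} 'echo password-free ok'    # must succeed without a password prompt
+done
+```
+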
diff --git "a/PaddleDetection-release-2.6/docs/tutorials/FAQ/FAQ\347\254\254\344\270\200\346\234\237.md" "b/PaddleDetection-release-2.6/docs/tutorials/FAQ/FAQ\347\254\254\344\270\200\346\234\237.md" new file mode 100644 index 0000000000000000000000000000000000000000..b7926d86d2135772af3e2f7a8c96f24b82d42e13 --- /dev/null +++ "b/PaddleDetection-release-2.6/docs/tutorials/FAQ/FAQ\347\254\254\344\270\200\346\234\237.md" @@ -0,0 +1,57 @@ +# FAQ:第一期 + +**Q:**SOLOv2训练mAP值宽幅震荡,无上升趋势,检测效果不好,检测置信度超过了1的原因是? + +**A:** SOLOv2训练不收敛的话,先更新PaddleDetection到release/2.2或者develop分支尝试。 + + + +**Q:** Optimizer中优化器支持哪几种? + +**A:** Paddle中支持的优化器[Optimizer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html )在PaddleDetection中均支持,需要手动修改下配置文件即可。 + + + +**Q:** 在tools/infer.py加入如下函数,得到FLOPs值为-1,请问原因? + +**A:** 更新PaddleDetection到release/2.2或者develop分支,`print_flops`设为True即可打印FLOPs。 + + + +**Q:** 使用官方的ReID模块时遇到了模块未注册的问题 + +**A:** 请尝试`pip uninstall paddledet`并重新安装,或者`python setup.py install`。 + + + +**Q:** 大规模实用目标检测模型有动态图版本吗,或者可以转换为动态图版本吗? + +**A:** 大规模实用模型的动态图版本正在整理,我们正在开发更大规模的通用预训练模型,预计在2.3版本中发布。 + + + +**Q:** Develop分支下FairMot预测视频问题:预测视频时不会完全运行完毕。比如用一个300frame的视频,代码会保存预测结果的每一帧图片,但只保存到299张就没了,并且也没有预测好的视频文件生成,该如何解决? + +**A:** 已经支持自己设置帧率infer视频,请使用develop分支或release/2.2分支,命令如下: + +``` +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --frame_rate=20 --save_videos +``` + + + +**Q:** 使用YOLOv3模型如何通过yml文件修改输入图片尺寸? + +**A:** 模型预测部署需要用到指定的尺寸时,首先在训练前需要修改`configs/_base_/yolov3_reader.yml`中的`TrainReader`的`BatchRandomResize`中`target_size`包含指定的尺寸,训练完成后,在评估或者预测时,需要将`EvalReader`和`TestReader`中的`Resize`的`target_size`修改成对应的尺寸,如果是需要模型导出(export_model),则需要将`TestReader`中的`image_shape`修改为对应的图片输入尺寸 。 + + + +**Q:** 以前的模型都是用静态图训练的,现在想用动态图训练,但想加载原来静态图的模型作为预训练模型,可以直接用加载静态图保存的模型断点吗?如不行,有其它方法吗? + +**A:** 静态图和动态图模型的权重的key做下映射一一对应转过去是可以的,可以参考[这个代码](https://github.com/nemonameless/weights_st2dy )。但是不保证所有静态图的权重的key映射都能对应上,静态图是把背景也训练了,动态图去背景类训的,而且现有动态图模型训出来的一般都比以前静态图更高,资源时间够的情况下建议还是直接训动态图版本。 + + + +**Q:** TTFNet训练过程中hm_loss异常 + +**A:** 如果是单卡的话学习率需要对应降低8倍。另外ttfnet模型因为自身设置的学习率比较大,可能会出现其他数据集训练出现不稳定的情况。建议pretrain_weights加载官方release出的coco数据集上训练好的模型,然后将学习率再调低一些。 diff --git "a/PaddleDetection-release-2.6/docs/tutorials/FAQ/FAQ\347\254\254\351\233\266\346\234\237.md" "b/PaddleDetection-release-2.6/docs/tutorials/FAQ/FAQ\347\254\254\351\233\266\346\234\237.md" new file mode 100644 index 0000000000000000000000000000000000000000..4478495bff8e52ed1377ad8e09ee63a49ce606da --- /dev/null +++ "b/PaddleDetection-release-2.6/docs/tutorials/FAQ/FAQ\347\254\254\351\233\266\346\234\237.md" @@ -0,0 +1,104 @@ +# FAQ:第零期 + +**Q:** 为什么我使用单GPU训练loss会出`NaN`?
    +**A:** 配置文件中原始学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整学习率(例如,除以8)。 + +以[faster_rcnn_r50](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml) 为例,在静态图下计算规则表如下所示,它们是等价的,表中变化节点即为`piecewise decay`里的`boundaries`:
    + + +| GPU数 |batch size/卡| 学习率 | 最大轮数 | 变化节点 | +| :---------: | :------------:|:------------: | :-------: | :--------------: | +| 2 | 1 | 0.0025 | 720000 | [480000, 640000] | +| 4 | 1 | 0.005 | 360000 | [240000, 320000] | +| 8 | 1| 0.01 | 180000 | [120000, 160000] | + +* 上述方式适用于静态图下。在动态图中,由于训练以epoch方式计数,因此调整GPU卡数后只需要修改学习率即可,修改方式和静态图相同. + + +**Q:** 自定义数据集时,配置文件里的`num_classes`应该如何设置?
+**A:** 动态图中,自定义数据集时将`num_classes`统一设置为自定义数据集的类别数即可;静态图中(static目录下),YOLO系列模型和anchor free系列模型将`num_classes`设置为自定义数据集类别数即可,其他模型如RCNN系列、SSD、RetinaNet、SOLOv2等,由于检测原理上分类时需要区分背景框和前景框,设置的`num_classes`须为自定义数据集类别数+1,即增加一类背景类。
+
+**Q:** PP-YOLOv2模型训练使用`--eval`做训练中验证,在第一次做eval的时候hang住,该如何处理?
    +**A:** PP-YOLO系列模型如果只加载backbone的预训练权重从头开始训练的话收敛会比较慢,当模型还没有较好收敛的时候做预测时,由于输出的预测框比较混乱,在NMS时做排序和滤除会非常耗时,就好像eval时hang住了一样,这种情况一般发生在使用自定义数据集并且自定义数据集样本数较少导致训练到第一次做eval的时候训练轮数较少,模型还没有较好收敛的情况下,可以通过如下三个方面排查解决。 + + + +* PaddleDetection中提供的默认配置一般是采用8卡训练的配置,配置文件中的`batch_size`数为每卡的batch size,若训练的时候不是使用8卡或者对`batch_size`有修改,需要等比例的调小初始`learning_rate`来获得较好的收敛效果 + +* 如果使用自定义数据集并且样本数比较少,建议增大`snapshot_epoch`数来增加第一次进行eval的时候的训练轮数来保证模型已经较好收敛 + +* 若使用自定义数据集训练,可以加载我们发布的COCO或VOC数据集上训练好的权重进行finetune训练来加快收敛速度,可以使用`-o pretrain_weights=xxx`的方式指定预训练权重,xxx可以是Model Zoo里发布的模型权重链接 + + + + +**Q:** 如何更好的理解reader和自定义修改reader文件 +``` +# 每张GPU reader进程个数 +worker_num: 2 +# 训练数据 +TrainReader: + inputs_def: + num_max_boxes: 50 + # 训练数据transforms + sample_transforms: + - Decode: {} # 图片解码,将图片数据从numpy格式转为rgb格式,是必须存在的一个OP + - Mixup: {alpha: 1.5, beta: 1.5} # Mixup数据增强,对两个样本的gt_bbbox/gt_score操作,构建虚拟的训练样本,可选的OP + - RandomDistort: {} # 随机颜色失真,可选的OP + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} # 随机Canvas填充,可选的OP + - RandomCrop: {} # 随机裁剪,可选的OP + - RandomFlip: {} # 随机左右翻转,默认概率0.5,可选的OP + # batch_transforms + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + # 训练时batch_size + batch_size: 24 + # 读取数据是否乱序 + shuffle: true + # 是否丢弃最后不能完整组成batch的数据 + drop_last: true + # mixup_epoch,大于最大epoch,表示训练过程一直使用mixup数据增广。默认值为-1,表示不使用Mixup。如果删去- Mixup: {alpha: 1.5, beta: 1.5}这行代码则必须也将mixup_epoch设置为-1或者删除 + mixup_epoch: 25000 + # 是否通过共享内存进行数据读取加速,需要保证共享内存大小(如/dev/shm)满足大于1G + use_shared_memory: true + + 如果需要单尺度训练,则去掉batch_transforms里的BatchRandomResize这一行,在sample_transforms最后一行添加- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + + Decode是必须保留的,如果想要去除数据增强,则可以注释或删除Mixup RandomDistort RandomExpand RandomCrop RandomFlip,注意如果注释或删除Mixup则必须也将mixup_epoch这一行注释或删除,或者设置为-1表示不使用Mixup + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + +``` +**Q:** 用户如何控制类别类别输出?即图中有多类目标只输出其中的某几类 + +**A:** 用户可自行在代码中进行修改,增加条件设置。 +``` +# filter by class_id +keep_class_id = [1, 2] +bbox_res = [e for e in bbox_res if int(e[0]) in keep_class_id] +``` +https://github.com/PaddlePaddle/PaddleDetection/blob/b87a1ea86fa18ce69e44a17ad1b49c1326f19ff9/ppdet/engine/trainer.py#L438 + +**Q:** 用户自定义数据集训练,预测结果标签错误 + +**A:** 此类情况往往是用户在设置数据集路径时候,并没有关注TestDataset中anno_path的路径问题。需要用户将anno_path设置成自己的路径。 +``` +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json +``` + +**Q:** 如何打印网络FLOPs? + +**A:** 在`configs/runtime.yml`中设置`print_flops: true`,同时需要安装PaddleSlim(比如:pip install paddleslim),即可打印模型的FLOPs。 + +**Q:** 如何使用无标注框进行训练? 
+ +**A:** 在`configs/dataset/coco.py` 或者`configs/dataset/voc.py`中的TrainDataset下设置`allow_empty: true`, 此时允许数据集加载无标注框进行训练。该功能支持coco,voc数据格式,RCNN系列和YOLO系列模型验证能够正常训练。另外,如果无标注框数据过多,会影响模型收敛,在TrainDataset下可以设置`empty_ratio: 0.1`对无标注框数据进行随机采样,控制无标注框的数据量占总数据量的比例,默认值为1.,即使用全部无标注框 diff --git a/PaddleDetection-release-2.6/docs/tutorials/FAQ/README.md b/PaddleDetection-release-2.6/docs/tutorials/FAQ/README.md new file mode 100644 index 0000000000000000000000000000000000000000..67d688600f1e93455f5ac700ff1b51fcc1bbb375 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/FAQ/README.md @@ -0,0 +1,6 @@ +# FAQ/常见问题 + +**PaddleDetection**非常感谢各位开发者提出任何使用问题或需求,我们根据大家的提问,总结**FAQ/常见问题**合集,并在**每周一**进行更新,以下是往期的FAQ,欢迎大家进行查阅。 + +- [FAQ:第零期](./FAQ第零期.md) +- [FAQ:第一期](./FAQ第一期.md) diff --git a/PaddleDetection-release-2.6/docs/tutorials/GETTING_STARTED.md b/PaddleDetection-release-2.6/docs/tutorials/GETTING_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..6ed4043a2f69fcbba19e757c6abdaa6b8507fc7b --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/GETTING_STARTED.md @@ -0,0 +1,146 @@ +English | [简体中文](GETTING_STARTED_cn.md) + +# Getting Started + +## Installation + +For setting up the running environment, please refer to [installation +instructions](INSTALL_cn.md). + + + +## Data preparation + +- Please refer to [PrepareDetDataSet](./data/PrepareDetDataSet_en.md) for data preparation +- Please set the data path for data configuration file in ```configs/datasets``` + +## Training & Evaluation & Inference + +PaddleDetection provides scripts for training, evalution and inference with various features according to different configure. And for more distribued training details see [DistributedTraining].(./DistributedTraining_en.md) + +```bash +# training on single-GPU +export CUDA_VISIBLE_DEVICES=0 +python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml +# training on multi-GPU +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml +# training on multi-machines and multi-GPUs +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +$fleetrun --ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" --selected_gpu 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml +# GPU evaluation +export CUDA_VISIBLE_DEVICES=0 +python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams +# Inference +python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --infer_img=demo/000000570688.jpg -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams +``` + +### Other argument list + +list below can be viewed by `--help` + +| FLAG | script supported | description | default | remark | +| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: | +| -c | ALL | Select config file | None | **required**, such as `-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml` | +| -o | ALL | Set parameters in configure file | None | `-o` has higher priority to file configured by `-c`. 
Such as `-o use_gpu=False` |
+| --eval | train | Whether to perform evaluation in training | False | set `--eval` if needed |
+| -r/--resume_checkpoint | train | Checkpoint path for resuming training | None | such as `-r output/faster_rcnn_r50_1x_coco/10000` |
+| --slim_config | ALL | Configure file of slim method | None | such as `--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
+| --use_vdl | train/infer | Whether to record the data with [VisualDL](https://github.com/paddlepaddle/visualdl), so as to display in VisualDL | False | VisualDL requires Python>=3.5 |
+| --vdl\_log_dir | train/infer | VisualDL logging directory for images | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL requires Python>=3.5 |
+| --output_eval | eval | Directory for storing the evaluation output | None | such as `--output_eval=eval_output`, default is current directory |
+| --json_eval | eval | Whether to evaluate with an already existing bbox.json or mask.json | False | set `--json_eval` if needed; the json path is set in `--output_eval` |
+| --classwise | eval | Whether to eval AP for each class and draw the PR curve | False | set `--classwise` if needed |
+| --output_dir | infer | Directory for storing the output visualization files | `./output` | such as `--output_dir output` |
+| --draw_threshold | infer | Threshold to reserve the result for visualization | 0.5 | such as `--draw_threshold 0.7` |
+| --infer_dir | infer | Directory for images to perform inference on | None | One of `infer_dir` and `infer_img` is required |
+| --infer_img | infer | Image path | None | One of `infer_dir` and `infer_img` is required; `infer_img` has higher priority over `infer_dir` |
+| --save_results | infer | Whether to save detection results to a file | False | Optional |
+
+
+
+## Examples
+
+### Training
+
+- Perform evaluation in training
+
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval
+  ```
+
+  Training and evaluation are performed alternately, with evaluation at the end of each epoch. Meanwhile, the best model with the highest mAP is saved under the same path as `model_final`.
+
+  If the evaluation dataset is large, we suggest modifying `snapshot_epoch` in `configs/runtime.yml` to reduce the number of evaluations, or evaluating after training.
+
+- Fine-tune other tasks
+
+  When using a pre-trained model to fine-tune another task, `pretrain_weights` can be used directly. Parameters with mismatched shapes will be ignored automatically. For example:
+
+
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  # If the shape of a parameter in the program differs from its shape in pretrain_weights,
+  # then PaddleDetection will not load that parameter.
+  python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+                         -o pretrain_weights=output/faster_rcnn_r50_1x_coco/model_final
+  ```
+
+##### NOTES
+
+- `CUDA_VISIBLE_DEVICES` can specify different GPU devices. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`.
+- Datasets will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not found locally.
+- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
+- Checkpoints are saved in `output` by default, and this can be changed via `save_dir` in `configs/runtime.yml` (see the resume sketch below).
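+
+For example, an interrupted run can be resumed from a saved checkpoint with the `-r` flag from the table above; a minimal sketch (the checkpoint path is illustrative):
+
+```bash
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+                      -r output/faster_rcnn_r50_fpn_1x_coco/10000
+```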
+ + +### Evaluation + +- Evaluate by specified weights path and dataset path + + ```bash + export CUDA_VISIBLE_DEVICES=0 + python -u tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams + ``` + + The path of model to be evaluted can be both local path and link in [MODEL_ZOO](../MODEL_ZOO_cn.md). + +- Evaluate with json + + ```bash + export CUDA_VISIBLE_DEVICES=0 + python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \ + --json_eval \ + -output_eval evaluation/ + ``` + + The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory. + + +### Inference + +- Output specified directory && Set up threshold + + ```bash + export CUDA_VISIBLE_DEVICES=0 + python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \ + --infer_img=demo/000000570688.jpg \ + --output_dir=infer_output/ \ + --draw_threshold=0.5 \ + -o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \ + --use_vdl=True + ``` + + `--draw_threshold` is an optional argument. Default is 0.5. + Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659). + + +## Deployment + +Please refer to [depolyment](../../deploy/README_en.md) + +## Model Compression + +Please refer to [slim](../../configs/slim/README_en.md) diff --git a/PaddleDetection-release-2.6/docs/tutorials/GETTING_STARTED_cn.md b/PaddleDetection-release-2.6/docs/tutorials/GETTING_STARTED_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..c0230f514746344b0d2ad1f1c36f8c68c5c4e45d --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/GETTING_STARTED_cn.md @@ -0,0 +1,266 @@ +[English](GETTING_STARTED.md) | 简体中文 + + +# 30分钟快速上手PaddleDetection + +PaddleDetection作为成熟的目标检测开发套件,提供了从数据准备、模型训练、模型评估、模型导出到模型部署的全流程。在这个章节里面,我们以路标检测数据集为例,提供快速上手PaddleDetection的流程。 + +## 1 安装 + +关于安装配置运行环境,请参考[安装指南](INSTALL_cn.md) +在本演示案例中,假定用户将PaddleDetection的代码克隆并放置在`/home/paddle`目录中。用户执行的命令操作均在`/home/paddle/PaddleDetection`目录下完成 + +## 2 准备数据 +目前PaddleDetection支持:COCO VOC WiderFace, MOT四种数据格式。 +- 首先按照[准备数据文档](./data/PrepareDetDataSet.md) 准备数据。 +- 然后设置`configs/datasets`中相应的coco或voc等数据配置文件中的数据路径。 +- 在本项目中,我们使用路标识别数据集 + ```bash +python dataset/roadsign_voc/download_roadsign_voc.py +``` +- 下载后的数据格式为 +``` + ├── download_roadsign_voc.py + ├── annotations + │ ├── road0.xml + │ ├── road1.xml + │ | ... + ├── images + │ ├── road0.png + │ ├── road1.png + │ | ... + ├── label_list.txt + ├── train.txt + ├── valid.txt +``` + +## 3 配置文件改动和说明 +我们使用`configs/yolov3/yolov3_mobilenet_v1_roadsign`配置进行训练。 +在静态图版本下,一个模型往往可以通过两个配置文件(一个主配置文件、一个reader的读取配置)实现,在PaddleDetection 2.0后续版本,采用了模块解耦设计,用户可以组合配置模块实现检测器,并可自由修改覆盖各模块配置,如下图所示 + + +
+（图:配置文件摘要）
+
+
+从上图看到`yolov3_mobilenet_v1_roadsign.yml`配置需要依赖其他的配置文件。在该例子中需要依赖:
+
+```bash
+  roadsign_voc.yml
+
+  runtime.yml
+
+  optimizer_40e.yml
+
+  yolov3_mobilenet_v1.yml
+
+  yolov3_reader.yml
+--------------------------------------
+
+
+yolov3_mobilenet_v1_roadsign.yml 文件入口
+
+roadsign_voc.yml 主要说明了训练数据和验证数据的路径
+
+runtime.yml 主要说明了公共的运行参数,比如是否使用GPU、每多少个epoch存储checkpoint等
+
+optimizer_40e.yml 主要说明了学习率和优化器的配置
+
+yolov3_mobilenet_v1.yml 主要说明模型和主干网络的情况
+
+yolov3_reader.yml 主要说明数据读取器配置,如batch size、并发加载子进程数等,同时包含读取后预处理操作,如resize、数据增强等
+
+
+```
+
+（图:配置文件结构说明）
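+
+作为示意(仅为组织方式示例,具体路径与字段以仓库中的实际文件为准),入口配置文件通过`_BASE_`列表组合上述依赖文件,入口文件中出现的字段会覆盖被依赖文件中的同名配置:
+
+```yaml
+# yolov3_mobilenet_v1_roadsign.yml 的组织方式示意(以实际文件为准)
+_BASE_: [
+  '../datasets/roadsign_voc.yml',
+  '../runtime.yml',
+  '_base_/optimizer_40e.yml',
+  '_base_/yolov3_mobilenet_v1.yml',
+  '_base_/yolov3_reader.yml',
+]
+snapshot_epoch: 5   # 覆盖被依赖文件中的同名配置项
+```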
    + +### 修改配置文件说明 +* 关于数据的路径修改说明 +在修改配置文件中,用户如何实现自定义数据集是非常关键的一步,如何定义数据集请参考[如何自定义数据集](https://aistudio.baidu.com/aistudio/projectdetail/1917140) +* 默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整学习率(例如,除以8) +* 更多使用问题,请参考[FAQ](FAQ) + +## 4 训练 + +PaddleDetection提供了单卡/多卡训练模式,满足用户多种训练需求 +* GPU单卡训练 +```bash +export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令 +python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml +``` + +* GPU多卡训练 +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 #windows和Mac下不需要执行该命令 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml +``` + +* [GPU多机多卡训练](./DistributedTraining_cn.md) +```bash +$fleetrun \ +--ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \ +``` + +* Fine-tune其他任务 + + 使用预训练模型fine-tune其他任务时,可以直接加载预训练模型,形状不匹配的参数将自动忽略,例如: + +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 + # 如果模型中参数形状与加载权重形状不同,将不会加载这类参数 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o pretrain_weights=output/model_final +``` + +* 模型恢复训练 + + 在日常训练过程中,有的用户由于一些原因导致训练中断,用户可以使用-r的命令恢复训练 + +```bash +export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令 +python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -r output/faster_rcnn_r50_1x_coco/10000 + ``` + +## 5 评估 +* 默认将训练生成的模型保存在当前`output`文件夹下 + ```bash +export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令 +python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams +``` +* 边训练,边评估 + +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 #windows和Mac下不需要执行该命令 +python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval +``` + + 在训练中交替执行评估, 评估在每个epoch训练结束后开始。每次评估后还会评出最佳mAP模型保存到`best_model`文件夹下。 + + 如果验证集很大,测试将会比较耗时,建议调整`configs/runtime.yml` 文件中的 `snapshot_epoch`配置以减少评估次数,或训练完成后再进行评估。 + +- 通过json文件评估 + +```bash +export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令 +python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \ + --json_eval \ + -output_eval evaluation/ +``` +* 上述命令中没有加载模型的选项,则使用配置文件中weights的默认配置,`weights`表示训练过程中保存的最后一轮模型文件 + +* json文件必须命名为bbox.json或者mask.json,放在`evaluation`目录下。 + +## 6 预测 + + ```bash + python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --infer_img=demo/000000570688.jpg -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams + ``` + * 设置参数预测 + + ```bash + export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令 + python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \ + --infer_img=demo/road554.png \ + --output_dir=infer_output/ \ + --draw_threshold=0.5 \ + -o weights=output/yolov3_mobilenet_v1_roadsign/model_final \ + --use_vdl=True + ``` + + `--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果 + `keep_top_k`表示设置输出目标的最大数量,默认值为100,用户可以根据自己的实际情况进行设定。 + +结果如下图: + +![road554 image](../images/road554.png) + +## 7 训练可视化 + +当打开`use_vdl`开关后,为了方便用户实时查看训练过程中状态,PaddleDetection集成了VisualDL可视化工具,当打开`use_vdl`开关后,记录的数据包括: +1. loss变化趋势 +2. 
mAP变化趋势
+
+```bash
+export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+  --use_vdl=true \
+  --vdl_log_dir=vdl_dir/scalar
+```
+
+使用如下命令启动VisualDL查看日志
+```shell
+# 下述命令会在127.0.0.1上启动一个服务,支持通过前端web页面查看,可以通过--host这个参数指定实际ip地址(示例见下文)
+visualdl --logdir vdl_dir/scalar/
+```
+
+在浏览器输入提示的网址,效果如下:
+
+（图:VDL效果演示）
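+
+例如(示意命令,`0.0.0.0`与`8040`为假设的监听地址和端口),在远程服务器上查看时可通过`--host`与`--port`指定:
+
+```shell
+visualdl --logdir vdl_dir/scalar/ --host 0.0.0.0 --port 8040
+```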
    + +**参数列表** + +以下列表可以通过`--help`查看 + +| FLAG | 支持脚本 | 用途 | 默认值 | 备注 | +| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: | +| -c | ALL | 指定配置文件 | None | **必选**,例如-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml | +| -o | ALL | 设置或更改配置文件里的参数内容 | None | 相较于`-c`设置的配置文件有更高优先级,例如:`-o use_gpu=False` | +| --eval | train | 是否边训练边测试 | False | 如需指定,直接`--eval`即可 | +| -r/--resume_checkpoint | train | 恢复训练加载的权重路径 | None | 例如:`-r output/faster_rcnn_r50_1x_coco/10000` | +| --slim_config | ALL | 模型压缩策略配置文件 | None | 例如`--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` | +| --use_vdl | train/infer | 是否使用[VisualDL](https://github.com/paddlepaddle/visualdl)记录数据,进而在VisualDL面板中显示 | False | VisualDL需Python>=3.5 | +| --vdl\_log_dir | train/infer | 指定 VisualDL 记录数据的存储路径 | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL需Python>=3.5 | +| --output_eval | eval | 评估阶段保存json路径 | None | 例如 `--output_eval=eval_output`, 默认为当前路径 | +| --json_eval | eval | 是否通过已存在的bbox.json或者mask.json进行评估 | False | 如需指定,直接`--json_eval`即可, json文件路径在`--output_eval`中设置 | +| --classwise | eval | 是否评估单类AP和绘制单类PR曲线 | False | 如需指定,直接`--classwise`即可 | +| --output_dir | infer/export_model | 预测后结果或导出模型保存路径 | `./output` | 例如`--output_dir=output` | +| --draw_threshold | infer | 可视化时分数阈值 | 0.5 | 例如`--draw_threshold=0.7` | +| --infer_dir | infer | 用于预测的图片文件夹路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个 | +| --infer_img | infer | 用于预测的图片路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个,`infer_img`具有更高优先级 | +| --save_results | infer | 是否在文件夹下将图片的预测结果保存到文件中 | False | 可选 | + + +## 8 模型导出 + +在模型训练过程中保存的模型文件是包含前向预测和反向传播的过程,在实际的工业部署则不需要反向传播,因此需要将模型进行导成部署需要的模型格式。 +在PaddleDetection中提供了 `tools/export_model.py`脚本来导出模型 + +```bash +python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --output_dir=./inference_model \ + -o weights=output/yolov3_mobilenet_v1_roadsign/best_model +``` +预测模型会导出到`inference_model/yolov3_mobilenet_v1_roadsign`目录下,分别为`infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`,`model.pdmodel` 如果不指定文件夹,模型则会导出在`output_inference` + +* 更多关于模型导出的文档,请参考[模型导出文档](../../deploy/EXPORT_MODEL.md) + +## 9 模型压缩 + +为了进一步对模型进行优化,PaddleDetection提供了基于PaddleSlim进行模型压缩的完整教程和benchmark。目前支持的方案: +* 裁剪 +* 量化 +* 蒸馏 +* 联合策略 +* 更多关于模型压缩的文档,请参考[模型压缩文档](../../configs/slim/README.md)。 +## 10 预测部署 +PaddleDetection提供了PaddleInference、PaddleServing、PaddleLite多种部署形式,支持服务端、移动端、嵌入式等多种平台,提供了完善的Python和C++部署方案。 +* 在这里,我们以Python为例,说明如何使用PaddleInference进行模型部署 +```bash +python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1_roadsign --image_file=demo/road554.png --device=GPU +``` +* 同时`infer.py`提供了丰富的接口,用户进行接入视频文件、摄像头进行预测,更多内容请参考[Python端预测部署](../../deploy/python) +### PaddleDetection支持的部署形式说明 +|形式|语言|教程|设备/平台| +|-|-|-|-| +|PaddleInference|Python|已完善|Linux(arm X86)、Windows +|PaddleInference|C++|已完善|Linux(arm X86)、Windows| +|PaddleServing|Python|已完善|Linux(arm X86)、Windows| +|PaddleLite|C++|已完善|Android、IOS、FPGA、RK... 
+ +* 更多关于预测部署的文档,请参考[预测部署文档](../../deploy/README.md)。 diff --git a/PaddleDetection-release-2.6/docs/tutorials/GradCAM_cn.md b/PaddleDetection-release-2.6/docs/tutorials/GradCAM_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..e214e48b18c982b0e9e3752e14bacd910616d36f --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/GradCAM_cn.md @@ -0,0 +1,69 @@ +# 目标检测热力图 + +## 1.简介 + +基于backbone/roi特征图计算物体预测框的cam(类激活图), 目前支持基于FasterRCNN/MaskRCNN系列, PPYOLOE系列, 以及BlazeFace, SSD, Retinanet网络。 + +## 2.使用方法 +* 以PP-YOLOE为例,准备好数据之后,指定网络配置文件、模型权重地址和图片路径以及输出文件夹路径,使用脚本调用tools/cam_ppdet.py计算图片中物体预测框的grad_cam热力图。下面为运行脚本示例。 +```shell +python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +``` + +* **参数** + +| FLAG | 用途 | +|:--------------------------:|:--------------------------------------------------------------------------------------------------------------------------:| +| -c | 指定配置文件 | +| --infer_img | 用于预测的图片路径 | +| --cam_out | 指定输出路径 | +| --target_feature_layer_name | 计算cam的特征图位置, 如model.backbone、 model.bbox_head.roi_extractor | +| -o | 设置或更改配置文件里的参数内容, 如 -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams | + +* 运行效果 + +
+（示例热力图输出:cam_ppyoloe/225.jpg）
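+
+* 热力图计算原理示意(仅为算法说明,并非`tools/cam_ppdet.py`的实际实现):对预测框得分关于所选特征图求梯度,梯度的全局平均作为各通道权重,对特征图加权求和并取ReLU后归一化,即得到可叠加到原图的热力图:
+
+```python
+import numpy as np
+
+def grad_cam(feature_map, gradient):
+    """最简Grad-CAM示意。
+    feature_map: (C, H, W) 所选层的前向特征
+    gradient:    (C, H, W) 预测框得分对该层特征的梯度
+    """
+    weights = gradient.mean(axis=(1, 2))                       # 每通道权重:梯度的全局平均
+    cam = (weights[:, None, None] * feature_map).sum(axis=0)   # 按权重对通道加权求和
+    cam = np.maximum(cam, 0)                                   # ReLU,仅保留正向贡献
+    return cam / (cam.max() + 1e-8)                            # 归一化到[0,1],便于可视化叠加
+```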
    + +## 3. 目前支持基于FasterRCNN/MaskRCNN系列, PPYOLOE系列以及BlazeFace, SSD, Retinanet网络。 +* PPYOLOE网络热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +``` + +* MaskRCNN网络roi特征热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams +``` + +* MaskRCNN网络backbone特征的热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams +``` + +* FasterRCNN网络基于roi特征的热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams +``` + +* FasterRCNN网络基于backbone特征的热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams +``` + +* BlaczeFace网络backbone特征热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/face_detection/blazeface_1000e.yml --infer_img demo/hrnet_demo.jpg --cam_out cam_blazeface --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams +``` + +* SSD网络backbone特征热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml --infer_img demo/000000014439.jpg --cam_out cam_ssd --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams +``` + +* Retinanet网络backbone特征热图可视化脚本 +```bash +python tools/cam_ppdet.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_retinanet --target_feature_layer_name model.backbone -o weights=https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_2x_coco.pdparams +``` diff --git a/PaddleDetection-release-2.6/docs/tutorials/GradCAM_en.md b/PaddleDetection-release-2.6/docs/tutorials/GradCAM_en.md new file mode 100644 index 0000000000000000000000000000000000000000..e4a5fd40ad2efec25d116460f57bcde32b3c1aa1 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/GradCAM_en.md @@ -0,0 +1,71 @@ +# Object detection grad_cam heatmap + +## 1.Introduction +Calculate the cam (class activation map) of the object predict bbox based on the backbone/roi feature map, currently supports networks based on FasterRCNN/MaskRCNN series, PPYOLOE series and BlazeFace, SSD, Retinanet. + +## 2.Usage +* Taking PP-YOLOE as an example, after preparing the data, specify the network configuration file, model weight address, image path and output folder path, and then use the script to call tools/cam_ppdet.py to calculate the grad_cam heat map of the prediction box. 
Below is an example run script. +```shell +python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +``` + +* **Arguments** + +| FLAG | description | +| :----------------------: |:---------------------------------------------------------------------------------------------------------------------------------:| +| -c | Select config file | +| --infer_img | Image path | +| --cam_out | Directory for output | +| --target_feature_layer_name | The position of featuremap to do gradcam, for example:model.backbone, model.bbox_head.roi_extractor | +| -o | Set parameters in configure file, for example: -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams | + +* result + +
+(Example heatmap output: cam_ppyoloe/225.jpg)
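+
+* A sketch of how such a heatmap is computed (for illustration only; not the actual implementation in `tools/cam_ppdet.py`): take the gradient of the predicted box score w.r.t. the chosen feature map, average it per channel to obtain channel weights, then apply ReLU to the weighted channel sum and normalize:
+
+```python
+import numpy as np
+
+def grad_cam(feature_map, gradient):
+    """Minimal Grad-CAM sketch.
+    feature_map: (C, H, W) forward features of the chosen layer
+    gradient:    (C, H, W) gradient of the predicted box score w.r.t. these features
+    """
+    weights = gradient.mean(axis=(1, 2))                      # per-channel weight: global average of gradients
+    cam = (weights[:, None, None] * feature_map).sum(axis=0)  # weighted sum over channels
+    cam = np.maximum(cam, 0)                                  # ReLU keeps positive contributions only
+    return cam / (cam.max() + 1e-8)                           # normalize to [0, 1] for overlaying
+```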
    + + +## 3.Currently supports networks based on FasterRCNN/MaskRCNN series, PPYOLOE series and BlazeFace, SSD, Retinanet. +* PPYOLOE bbox heat map visualization script (with backbone featuremap) +```bash +python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +``` + +* MaskRCNN bbox heat map visualization script (with roi featuremap) +```bash +python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams +``` + +* MaskRCNN bbox heat map visualization script (with backbone featuremap) +```bash +python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams +``` + +* FasterRCNN bbox heat map visualization script (with roi featuremap) +```bash +python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams +``` + +* FasterRCNN bbox heat map visualization script (with backbone featuremap) +```bash +python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams +``` + +* BlaczeFace bbox heat map visualization script (with backbone featuremap) +```bash +python tools/cam_ppdet.py -c configs/face_detection/blazeface_1000e.yml --infer_img demo/hrnet_demo.jpg --cam_out cam_blazeface --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams +``` + +* SSD bbox heat map visualization script (with backbone featuremap) +```bash +python tools/cam_ppdet.py -c configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml --infer_img demo/000000014439.jpg --cam_out cam_ssd --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams +``` + +* Retinanet bbox heat map visualization script (with backbone featuremap) +```bash +python tools/cam_ppdet.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_retinanet --target_feature_layer_name model.backbone -o weights=https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_2x_coco.pdparams +``` + + diff --git a/PaddleDetection-release-2.6/docs/tutorials/INSTALL.md b/PaddleDetection-release-2.6/docs/tutorials/INSTALL.md new file mode 100644 index 0000000000000000000000000000000000000000..4f9ebd00d8f6420c4bd60d7131795866b8e9887f --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/INSTALL.md @@ -0,0 +1,138 @@ +English | [简体中文](INSTALL_cn.md) + +# Installation + + +This document covers how to install PaddleDetection and its dependencies +(including PaddlePaddle), together with COCO and Pascal VOC dataset. 
+ +For general information about PaddleDetection, please see [README.md](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6). + +## Requirements: + +- PaddlePaddle 2.2 +- OS 64 bit +- Python 3(3.5.1+/3.6/3.7/3.8/3.9/3.10),64 bit +- pip/pip3(9.0.1+), 64 bit +- CUDA >= 10.2 +- cuDNN >= 7.6 + + +Dependency of PaddleDetection and PaddlePaddle: + +| PaddleDetection version | PaddlePaddle version | tips | +| :----------------: | :---------------: | :-------: | +| develop | >= 2.3.2 | Dygraph mode is set as default | +| release/2.6 | >= 2.3.2 | Dygraph mode is set as default | +| release/2.5 | >= 2.2.2 | Dygraph mode is set as default | +| release/2.4 | >= 2.2.2 | Dygraph mode is set as default | +| release/2.3 | >= 2.2.0rc | Dygraph mode is set as default | +| release/2.2 | >= 2.1.2 | Dygraph mode is set as default | +| release/2.1 | >= 2.1.0 | Dygraph mode is set as default | +| release/2.0 | >= 2.0.1 | Dygraph mode is set as default | +| release/2.0-rc | >= 2.0.1 | -- | +| release/0.5 | >= 1.8.4 | Cascade R-CNN and SOLOv2 depends on 2.0.0.rc | +| release/0.4 | >= 1.8.4 | PP-YOLO depends on 1.8.4 | +| release/0.3 | >=1.7 | -- | + + +## Instruction + +### 1. Install PaddlePaddle + +``` + +# CUDA10.2 +python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple + +# CPU +python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple +``` + +- For more CUDA version or environment to quick install, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick) +- For more installation methods such as conda or compile with source code, please refer to the [installation document](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html) + +Please make sure that your PaddlePaddle is installed successfully and the version is not lower than the required version. Use the following command to verify. + +``` +# check +>>> import paddle +>>> paddle.utils.run_check() + +# confirm the paddle's version +python -c "import paddle; print(paddle.__version__)" +``` + +**Note** + +1. If you want to use PaddleDetection on multi-GPU, please install NCCL at first. + + +### 2. Install PaddleDetection + + + +**Note:** Installing via pip only supports Python3 + +``` + +# Clone PaddleDetection repository +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# Install other dependencies +cd PaddleDetection +pip install -r requirements.txt + +# Compile and install paddledet +python setup.py install + +``` + +**Note** + +1. If you are working on Windows OS, `pycocotools` installing may failed because of the origin version of cocoapi does not support windows, another version can be used used which only supports Python3: + + ```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI``` + +2. If you are using Python <= 3.6, `pycocotools` installing may failed with error like `distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`, please install `cython` firstly, for example `pip install cython` + +After installation, make sure the tests pass: + +```shell +python ppdet/modeling/tests/test_architectures.py +``` + +If the tests are passed, the following information will be prompted: + +``` +....... 
+---------------------------------------------------------------------- +Ran 7 tests in 12.816s +OK +``` + +## Use built Docker images + +> If you do not have a Docker environment, please refer to [Docker](https://www.docker.com/). + +We provide docker images containing the latest PaddleDetection code, and all environment and package dependencies are pre-installed. All you have to do is to **pull and run the docker image**. Then you can enjoy PaddleDetection without any extra steps. + +Get these images and guidance in [docker hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection), including CPU, GPU, ROCm environment versions. + +If you have some customized requirements about automatic building docker images, you can get it in github repo [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton). + +## Inference demo + +**Congratulation!** Now you have installed PaddleDetection successfully and try our inference demo: + +``` +# Predict an image by GPU +export CUDA_VISIBLE_DEVICES=0 +python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg +``` + +An image of the same name with the predicted result will be generated under the `output` folder. +The result is as shown below: + +![](../images/000000014439.jpg) diff --git a/PaddleDetection-release-2.6/docs/tutorials/INSTALL_cn.md b/PaddleDetection-release-2.6/docs/tutorials/INSTALL_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..9b0313c5254ba4fa143f892fd060f6fdb4fc3a22 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/INSTALL_cn.md @@ -0,0 +1,130 @@ +[English](INSTALL.md) | 简体中文 + + +# 安装文档 + + + +## 环境要求 + +- PaddlePaddle 2.3.2 +- OS 64位操作系统 +- Python 3(3.5.1+/3.6/3.7/3.8/3.9/3.10),64位版本 +- pip/pip3(9.0.1+),64位版本 +- CUDA >= 10.2 +- cuDNN >= 7.6 + +PaddleDetection 依赖 PaddlePaddle 版本关系: + +| PaddleDetection版本 | PaddlePaddle版本 | 备注 | +| :------------------: | :---------------: | :-------: | +| develop | >=2.3.2 | 默认使用动态图模式 | +| release/2.6 | >=2.3.2 | 默认使用动态图模式 | +| release/2.5 | >= 2.2.2 | 默认使用动态图模式 | +| release/2.4 | >= 2.2.2 | 默认使用动态图模式 | +| release/2.3 | >= 2.2.0rc | 默认使用动态图模式 | +| release/2.2 | >= 2.1.2 | 默认使用动态图模式 | +| release/2.1 | >= 2.1.0 | 默认使用动态图模式 | +| release/2.0 | >= 2.0.1 | 默认使用动态图模式 | +| release/2.0-rc | >= 2.0.1 | -- | +| release/0.5 | >= 1.8.4 | 大部分模型>=1.8.4即可运行,Cascade R-CNN系列模型与SOLOv2依赖2.0.0.rc版本 | +| release/0.4 | >= 1.8.4 | PP-YOLO依赖1.8.4 | +| release/0.3 | >=1.7 | -- | + +## 安装说明 + +### 1. 安装PaddlePaddle + +``` +# CUDA10.2 +python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple + +# CPU +python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple +``` +- 更多CUDA版本或环境快速安装,请参考[PaddlePaddle快速安装文档](https://www.paddlepaddle.org.cn/install/quick) +- 更多安装方式例如conda或源码编译安装方法,请参考[PaddlePaddle安装文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/index_cn.html) + +请确保您的PaddlePaddle安装成功并且版本不低于需求版本。使用以下命令进行验证。 + +``` +# 在您的Python解释器中确认PaddlePaddle安装成功 +>>> import paddle +>>> paddle.utils.run_check() + +# 确认PaddlePaddle版本 +python -c "import paddle; print(paddle.__version__)" +``` +**注意** +1. 如果您希望在多卡环境下使用PaddleDetection,请首先安装NCCL + +### 2. 
安装PaddleDetection + + + + +**注意:** pip安装方式只支持Python3 + + + +``` +# 克隆PaddleDetection仓库 +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# 安装其他依赖 +cd PaddleDetection +pip install -r requirements.txt + +# 编译安装paddledet +python setup.py install +``` + +**注意** +1. 如果github下载代码较慢,可尝试使用[gitee](https://gitee.com/PaddlePaddle/PaddleDetection.git)或者[代理加速](https://doc.fastgit.org/zh-cn/guide.html)。 + +1. 若您使用的是Windows系统,由于原版cocoapi不支持Windows,`pycocotools`依赖可能安装失败,可采用第三方实现版本,该版本仅支持Python3 + + ```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI``` + +2. 若您使用的是Python <= 3.6的版本,安装`pycocotools`可能会报错`distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`, 您可通过先安装`cython`如`pip install cython`解决该问题 + + +安装后确认测试通过: + +``` +python ppdet/modeling/tests/test_architectures.py +``` + +测试通过后会提示如下信息: + +``` +....... +---------------------------------------------------------------------- +Ran 7 tests in 12.816s +OK +``` + +## 使用Docker镜像 +> 如果您没有Docker运行环境,请参考[Docker官网](https://www.docker.com/)进行安装。 + +我们提供了包含最新 PaddleDetection 代码的docker镜像,并预先安装好了所有的环境和库依赖,您只需要**拉取docker镜像**,然后**运行docker镜像**,无需其他任何额外操作,即可开始使用PaddleDetection的所有功能。 + +在[Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection)中获取这些镜像及相应的使用指南,包括CPU、GPU、ROCm版本。 +如果您对自动化制作docker镜像感兴趣,或有自定义需求,请访问[PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton)做进一步了解。 + +## 快速体验 + +**恭喜!** 您已经成功安装了PaddleDetection,接下来快速体验目标检测效果 + +``` +# 在GPU上预测一张图片 +export CUDA_VISIBLE_DEVICES=0 +python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg +``` + +会在`output`文件夹下生成一个画有预测结果的同名图像。 + +结果如下图: + +![](../images/000000014439.jpg) diff --git a/PaddleDetection-release-2.6/docs/tutorials/KeyPointConfigGuide_cn.md b/PaddleDetection-release-2.6/docs/tutorials/KeyPointConfigGuide_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..9675849c8aef7564df4a944690ac39df7296b4c4 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/KeyPointConfigGuide_cn.md @@ -0,0 +1,299 @@ +**# config yaml配置项说明** + +KeyPoint 使用时config文件配置项说明,以[tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml)为例 + +```yaml +use_gpu: true #是否使用gpu训练 + +log_iter: 5 #打印log的iter间隔 + +save_dir: output #模型保存目录 + +snapshot_epoch: 10 #保存模型epoch间隔 + +weights: output/tinypose_256x192/model_final #测试加载模型路径(不含后缀“.pdparams”) + +epoch: 420 #总训练epoch数量 + +num_joints: &num_joints 17 #关键点数量 + +pixel_std: &pixel_std 200 #变换时相对比率像素(无需关注,不动就行) + +metric: KeyPointTopDownCOCOEval #metric评估函数 + +num_classes: 1 #种类数(检测模型用,不需关注) + +train_height: &train_height 256 #模型输入尺度高度变量设置 + +train_width: &train_width 192 #模型输入尺度宽度变量设置 + +trainsize: &trainsize [*train_width, *train_height] #模型输入尺寸,使用已定义变量 + +hmsize: &hmsize [48, 64] #输出热力图尺寸(宽,高) + +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] #左右关键点经图像翻转时对应关系,例如:图像翻转后,左手腕变成了右手腕,右手腕变成了左手腕 + + + + + +\#####model + +architecture: TopDownHRNet #模型框架结构类选择 + + + +TopDownHRNet: #TopDownHRNet相关配置 + + backbone: LiteHRNet #模型主干网络 + + post_process: HRNetPostProcess #模型后处理类 + + flip_perm: *flip_perm #同上flip_perm + + num_joints: *num_joints #关键点数量(输出通道数量) + + width: &width 40 #backbone输出通道数 + + loss: KeyPointMSELoss #loss函数选择 + + use_dark: true #是否使用DarkPose后处理 + + + +LiteHRNet: #LiteHRNet相关配置 + + network_type: 
wider_naive #网络结构类型选择 + + freeze_at: -1 #梯度截断branch id,截断则该branch梯度不会反传 + + freeze_norm: false #是否固定normalize层参数 + + return_idx: [0] #返回feature的branch id + + + +KeyPointMSELoss: #Loss相关配置 + + use_target_weight: true #是否使用关键点权重 + + loss_scale: 1.0 #loss比率调整,1.0表示不变 + + + +\#####optimizer + +LearningRate: #学习率相关配置 + + base_lr: 0.002 #初始基础学习率 + + schedulers: + + \- !PiecewiseDecay #衰减策略 + +​ milestones: [380, 410] #衰减时间对应epoch次数 + +​ gamma: 0.1 #衰减率 + + \- !LinearWarmup #Warmup策略 + +​ start_factor: 0.001 #warmup初始学习率比率 + +​ steps: 500 #warmup所用iter次数 + + + +OptimizerBuilder: #学习策略设置 + + optimizer: + +​ type: Adam #学习策略Adam + + regularizer: + +​ factor: 0.0 #正则项权重 + +​ type: L2 #正则类型L2/L1 + + + + + +\#####data + +TrainDataset: #训练数据集设置 + + !KeypointTopDownCocoDataset #数据加载类 + +​ image_dir: "" #图片文件夹,对应dataset_dir/image_dir + +​ anno_path: aic_coco_train_cocoformat.json #训练数据Json文件,coco格式 + +​ dataset_dir: dataset #训练数据集所在路径,image_dir、anno_path路径基于此目录 + +​ num_joints: *num_joints #关键点数量,使用已定义变量 + +​ trainsize: *trainsize #训练使用尺寸,使用已定义变量 + +​ pixel_std: *pixel_std #同上pixel_std + +​ use_gt_bbox: True #是否使用gt框 + + + + + +EvalDataset: #评估数据集设置 + + !KeypointTopDownCocoDataset #数据加载类 + +​ image_dir: val2017 #图片文件夹 + +​ anno_path: annotations/person_keypoints_val2017.json #评估数据Json文件,coco格式 + +​ dataset_dir: dataset/coco #数据集路径,image_dir、anno_path路径基于此目录 + +​ num_joints: *num_joints #关键点数量,使用已定义变量 + +​ trainsize: *trainsize #训练使用尺寸,使用已定义变量 + +​ pixel_std: *pixel_std #同上pixel_std + +​ use_gt_bbox: True #是否使用gt框,一般测试时用 + +​ image_thre: 0.5 #检测框阈值设置,测试时使用非gt_bbox时用 + + + +TestDataset: #纯测试数据集设置,无label + + !ImageFolder #数据加载类,图片文件夹类型 + +​ anno_path: dataset/coco/keypoint_imagelist.txt #测试图片列表文件 + + + +worker_num: 2 #数据加载worker数量,一般2-4,太多可能堵塞 + +global_mean: &global_mean [0.485, 0.456, 0.406] #全局均值变量设置 + +global_std: &global_std [0.229, 0.224, 0.225] #全局方差变量设置 + +TrainReader: #训练数据加载类设置 + + sample_transforms: #数据预处理变换设置 + +​ \- RandomFlipHalfBodyTransform: #随机翻转&随机半身变换类 + +​ scale: 0.25 #最大缩放尺度比例 + +​ rot: 30 #最大旋转角度 + +​ num_joints_half_body: 8 #关键点小于此数不做半身变换 + +​ prob_half_body: 0.3 #半身变换执行概率(满足关键点数量前提下) + +​ pixel_std: *pixel_std #同上pixel_std + +​ trainsize: *trainsize #训练尺度,同上trainsize + +​ upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #上半身关键点id + +​ flip_pairs: *flip_perm #左右关键点对应关系,同上flip_perm + +​ \- AugmentationbyInformantionDropping: + +​ prob_cutout: 0.5 #随机擦除变换概率 + +​ offset_factor: 0.05 #擦除位置中心点随机波动范围相对图片宽度比例 + +​ num_patch: 1 #擦除位置数量 + +​ trainsize: *trainsize #同上trainsize + +​ \- TopDownAffine: + +​ trainsize: *trainsize #同上trainsize + +​ use_udp: true #是否使用udp_unbias(flip测试使用) + +​ \- ToHeatmapsTopDown_DARK: #生成热力图gt类 + +​ hmsize: *hmsize #热力图尺寸 + +​ sigma: 2 #生成高斯核sigma值设置 + + batch_transforms: + +​ \- NormalizeImage: #图像归一化类 + +​ mean: *global_mean #均值设置,使用已有变量 + +​ std: *global_std #方差设置,使用已有变量 + +​ is_scale: true #图像元素是否除255.,即[0,255]到[0,1] + +​ \- Permute: {} #通道变换HWC->CHW,一般都需要 + + batch_size: 128 #训练时batchsize + + shuffle: true #数据集是否shuffle + + drop_last: false #数据集对batchsize取余数量是否丢弃 + + + +EvalReader: + + sample_transforms: #数据预处理变换设置,意义同TrainReader + +​ \- TopDownAffine: #Affine变换设置 + +​ trainsize: *trainsize #训练尺寸同上trainsize,使用已有变量 + +​ use_udp: true #是否使用udp_unbias,与训练需对应 + + batch_transforms: + +​ \- NormalizeImage: #图片归一化,与训练需对应 + +​ mean: *global_mean + +​ std: *global_std + +​ is_scale: true + +​ \- Permute: {} #通道变换HWC->CHW + + batch_size: 16 #测试时batchsize + + + +TestReader: + + inputs_def: + +​ image_shape: [3, *train_height, *train_width] #输入数据维度设置,CHW + + 
sample_transforms: + +​ \- Decode: {} #图片加载 + +​ \- TopDownEvalAffine: #Affine类,Eval时用 + +​ trainsize: *trainsize #输入图片尺度 + +​ \- NormalizeImage: #输入图像归一化 + +​ mean: *global_mean #均值 + +​ std: *global_std #方差 + +​ is_scale: true #图像元素是否除255.,即[0,255]到[0,1] + +​ \- Permute: {} #通道变换HWC->CHW + + batch_size: 1 #Test batchsize + + fuse_normalize: false #导出模型时是否内融合归一化操作(若是,预处理中可省略normalize,可以加快pipeline速度) +``` diff --git a/PaddleDetection-release-2.6/docs/tutorials/KeyPointConfigGuide_en.md b/PaddleDetection-release-2.6/docs/tutorials/KeyPointConfigGuide_en.md new file mode 100644 index 0000000000000000000000000000000000000000..8ad8218810ec6fee170ba17dce8dd874edd32155 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/KeyPointConfigGuide_en.md @@ -0,0 +1,299 @@ +**# config yaml guide** + +KeyPoint config guide,Take an example of [tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml) + +```yaml +use_gpu: true #train with gpu or not + +log_iter: 5 #print log every 5 iter + +save_dir: output #the directory to save model + +snapshot_epoch: 10 #save model every 10 epochs + +weights: output/tinypose_256x192/model_final #the weight to load(without postfix “.pdparams”) + +epoch: 420 #the total epoch number to train + +num_joints: &num_joints 17 #number of joints + +pixel_std: &pixel_std 200 #the standard pixel length(don't care) + +metric: KeyPointTopDownCOCOEval #metric function + +num_classes: 1 #number of classes(just for object detection, don't care) + +train_height: &train_height 256 #the height of model input + +train_width: &train_width 192 #the width of model input + +trainsize: &trainsize [*train_width, *train_height] #the shape of model input + +hmsize: &hmsize [48, 64] #the shape of model output + +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] #the correspondence between left and right keypoint id, for example: left wrist become right wrist after image flip, and also the right wrist becomes left wrist + + + + + +\#####model + +architecture: TopDownHRNet #the model architecture + + + +TopDownHRNet: #TopDownHRNet configs + + backbone: LiteHRNet #which backbone to use + + post_process: HRNetPostProcess #the post_process to use + + flip_perm: *flip_perm #same to the upper "flip_perm" + + num_joints: *num_joints #the joint number(the number of output channels) + + width: &width 40 #backbone output channels + + loss: KeyPointMSELoss #loss funciton + + use_dark: true #whther to use DarkPose in postprocess + + + +LiteHRNet: #LiteHRNet configs + + network_type: wider_naive #the network type of backbone + + freeze_at: -1 #the branch match this id doesn't backward,-1 means all branch backward + + freeze_norm: false #whether to freeze normalize weights + + return_idx: [0] #the branch id to fetch features + + + +KeyPointMSELoss: #Loss configs + + use_target_weight: true #whether to use target weights + + loss_scale: 1.0 #loss weights,finalloss = loss*loss_scale + + + +\#####optimizer + +LearningRate: #LearningRate configs + + base_lr: 0.002 #the original base learning rate + + schedulers: + + \- !PiecewiseDecay #the scheduler to adjust learning rate + +​ milestones: [380, 410] #the milestones(epochs) to adjust learning rate + +​ gamma: 0.1 #the ratio to adjust learning rate, new_lr = lr*gamma + + \- !LinearWarmup #Warmup configs + +​ start_factor: 0.001 #the original ratio with respect to base_lr + +​ steps: 500 #iters used to warmup + + + +OptimizerBuilder: #Optimizer type configs + + optimizer: + +​ type: Adam #optimizer 
type: Adam + + regularizer: + +​ factor: 0.0 #the regularizer weight + +​ type: L2 #regularizer type: L2/L1 + + + + + +\#####data + +TrainDataset: #Train Dataset configs + + !KeypointTopDownCocoDataset #the dataset class to load data + +​ image_dir: "" #the image directory, relative to dataset_dir + +​ anno_path: aic_coco_train_cocoformat.json #the train datalist,coco format, relative to dataset_dir + +​ dataset_dir: dataset #the dataset directory, the image_dir and anno_path based on this directory + +​ num_joints: *num_joints #joint numbers + +​ trainsize: *trainsize #the input size of model + +​ pixel_std: *pixel_std #same to the upper "pixel_std" + +​ use_gt_bbox: True #whether to use gt bbox, commonly used in eval + + + + + +EvalDataset: #Eval Dataset configs + + !KeypointTopDownCocoDataset #the dataset class to load data + +​ image_dir: val2017 #the image directory, relative to dataset_dir + +​ anno_path: annotations/person_keypoints_val2017.json #the eval datalist,coco format, relative to dataset_dir + +​ dataset_dir: dataset/coco #the dataset directory, the image_dir and anno_path based on this directory + +​ num_joints: *num_joints #joint numbers + +​ trainsize: *trainsize #the input size of model + +​ pixel_std: *pixel_std #same to the upper "pixel_std" + +​ use_gt_bbox: True #whether to use gt bbox, commonly used in eval + +​ image_thre: 0.5 #the threshold of detected rect, used while use_gt_bbox is False + + + +TestDataset: #the test dataset without label + + !ImageFolder #the class to load data, find images by folder + +​ anno_path: dataset/coco/keypoint_imagelist.txt #the image list file + + + +worker_num: 2 #the workers to load Dataset + +global_mean: &global_mean [0.485, 0.456, 0.406] #means used to normalize image + +global_std: &global_std [0.229, 0.224, 0.225] #stds used to normalize image + +TrainReader: #TrainReader configs + + sample_transforms: #transform configs + +​ \- RandomFlipHalfBodyTransform: #random flip & random HalfBodyTransform + +​ scale: 0.25 #the maximum scale for size transform + +​ rot: 30 #the maximum rotation to transoform + +​ num_joints_half_body: 8 #the HalfBodyTransform is skiped while joints found is less than this number + +​ prob_half_body: 0.3 #the ratio of halfbody transform + +​ pixel_std: *pixel_std #same to upper "pixel_std" + +​ trainsize: *trainsize #the input size of model + +​ upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #the joint id which is belong to upper body + +​ flip_pairs: *flip_perm #same to the upper "flip_perm" + +​ \- AugmentationbyInformantionDropping: + +​ prob_cutout: 0.5 #the probability to cutout keypoint + +​ offset_factor: 0.05 #the jitter offset of cutout position, expressed as a percentage of trainwidth + +​ num_patch: 1 #the numbers of area to cutout + +​ trainsize: *trainsize #same to upper "trainsize" + +​ \- TopDownAffine: + +​ trainsize: *trainsize #same to upper "trainsize" + +​ use_udp: true #whether to use udp_unbias(just for flip eval) + +​ \- ToHeatmapsTopDown_DARK: #generate gt heatmaps + +​ hmsize: *hmsize #the size of output heatmaps + +​ sigma: 2 #the sigma of gaussin kernel which used to generate gt heatmaps + + batch_transforms: + +​ \- NormalizeImage: #image normalize class + +​ mean: *global_mean #mean of normalize + +​ std: *global_std #std of normalize + +​ is_scale: true #whether scale by 1/255 to every image pixels,transform pixel from [0,255] to [0,1] + +​ \- Permute: {} #channel transform from HWC to CHW + + batch_size: 128 #batchsize used for train + + shuffle: true #whether to 
shuffle the images before train + + drop_last: false #whether drop the last images which is not enogh for batchsize + + + +EvalReader: + + sample_transforms: #transform configs + +​ \- TopDownAffine: #Affine configs + +​ trainsize: *trainsize #same to upper "trainsize" + +​ use_udp: true #whether to use udp_unbias(just for flip eval) + + batch_transforms: + +​ \- NormalizeImage: #image normalize, the values should be same to values in TrainReader + +​ mean: *global_mean + +​ std: *global_std + +​ is_scale: true + +​ \- Permute: {} #channel transform from HWC to CHW + + batch_size: 16 #batchsize used for test + + + +TestReader: + + inputs_def: + +​ image_shape: [3, *train_height, *train_width] #the input dimensions used in model,CHW + + sample_transforms: + +​ \- Decode: {} #load image + +​ \- TopDownEvalAffine: #Affine class used in Eval + +​ trainsize: *trainsize #the input size of model + +​ \- NormalizeImage: #image normalize, the values should be same to values in TrainReader + +​ mean: *global_mean #mean of normalize + +​ std: *global_std #std of normalize + +​ is_scale: true #whether scale by 1/255 to every image pixels,transform pixel from [0,255] to [0,1] + +​ \- Permute: {} #channel transform from HWC to CHW + + batch_size: 1 #Test batchsize + + fuse_normalize: false #whether fuse the normalize into model while export model, this speedup the model infer +``` diff --git a/PaddleDetection-release-2.6/docs/tutorials/QUICK_STARTED.md b/PaddleDetection-release-2.6/docs/tutorials/QUICK_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..b4e7ae9491d38ca07154b67e9523b7ae8d77ca7a --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/QUICK_STARTED.md @@ -0,0 +1,91 @@ +English | [简体中文](QUICK_STARTED_cn.md) + +# Quick Start +In order to enable users to experience PaddleDetection and produce models in a short time, this tutorial introduces the pipeline to get a decent object detection model by finetuning on a small dataset in 10 minutes only. In practical applications, it is recommended that users select a suitable model configuration file for their specific demand. + +- **Set GPU** + + +```bash +export CUDA_VISIBLE_DEVICES=0 +``` + +## Inference Demo with Pre-trained Models + +``` +# predict an image using PP-YOLO +python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg +``` + +the result: + +![](../images/000000014439.jpg) + + +## Data preparation +The Dataset is [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) ,including 877 images and 4 data categories: crosswalk, speedlimit, stop, trafficlight. The dataset is divided into training set (701 images) and test set (176 images),[download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar). + +``` +# Note: this command could skip and +# the dataset will be dowloaded automatically at the stage of training. 
+python dataset/roadsign_voc/download_roadsign_voc.py +``` + +## Training & Evaluation & Inference +### 1、Training +``` +# It will takes about 10 minutes on 1080Ti and 1 hour on CPU +# -c set configuration file +# -o overwrite the settings in the configuration file +# --eval Evaluate while training, and a model named best_model.pdmodel with the most evaluation results will be automatically saved + + +python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true +``` + +If you want to observe the loss change curve in real time through VisualDL, add --use_vdl=true to the training command, and set the log save path through --vdl_log_dir. + +**Note: VisualDL need Python>=3.5** + +Please install [VisualDL](https://github.com/PaddlePaddle/VisualDL) first + +``` +python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple +``` + +``` +python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \ + --use_vdl=true \ + --vdl_log_dir=vdl_dir/scalar \ + --eval +``` +View the change curve in real time through the visualdl command: +``` +visualdl --logdir vdl_dir/scalar/ --host --port +``` + +### 2、Evaluation +``` +# Evaluate best_model by default +# -c set config file +# -o overwrite the settings in the configuration file + +python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true +``` + +The final mAP should be around 0.85. The dataset is small so the precision may vary a little after each training. + + +### 3、Inference +``` +# -c set config file +# -o overwrite the settings in the configuration file +# --infer_img image path +# After the prediction is over, an image of the same name with the prediction result will be generated in the output folder + +python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png +``` + +The result is as shown below: + +![](../images/road554.png) diff --git a/PaddleDetection-release-2.6/docs/tutorials/QUICK_STARTED_cn.md b/PaddleDetection-release-2.6/docs/tutorials/QUICK_STARTED_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..d201e7fc901c2aef8b1ce95ebc9ba087ed3a1d9e --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/QUICK_STARTED_cn.md @@ -0,0 +1,88 @@ +[English](QUICK_STARTED.md) | 简体中文 + +# 快速开始 +为了使得用户能够在很短时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在较短时间内即可产出一个效果不错的模型。实际业务中,建议用户根据需要选择合适模型配置文件进行适配。 + +- **设置显卡** +```bash +export CUDA_VISIBLE_DEVICES=0 +``` + +## 一、快速体验 +``` +# 用PP-YOLO算法在COCO数据集上预训练模型预测一张图片 +python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg +``` + +结果如下图: + +![demo image](../images/000000014439.jpg) + + +## 二、准备数据 +数据集参考[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) ,包含877张图像,数据类别4类:crosswalk,speedlimit,stop,trafficlight。 +将数据划分为训练集701张图和测试集176张图,[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar). 
+ +``` +# 注意:可跳过这步下载,后面训练会自动下载 +python dataset/roadsign_voc/download_roadsign_voc.py +``` + + +## 三、训练、评估、预测 +### 1、训练 +``` +# 边训练边测试 CPU需要约1小时(use_gpu=false),1080Ti GPU需要约10分钟 +# -c 参数表示指定使用哪个配置文件 +# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置),这里设置使用gpu +# --eval 参数表示边训练边评估,最后会自动保存一个名为model_final.pdparams的模型 + +python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true +``` + +如果想通过VisualDL实时观察loss变化曲线,在训练命令中添加--use_vdl=true,以及通过--vdl_log_dir设置日志保存路径。 + +**但注意VisualDL需Python>=3.5** + +首先安装[VisualDL](https://github.com/PaddlePaddle/VisualDL) +``` +python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple +``` + +``` +python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \ + --use_vdl=true \ + --vdl_log_dir=vdl_dir/scalar \ + --eval +``` +通过visualdl命令实时查看变化曲线: +``` +visualdl --logdir vdl_dir/scalar/ --host --port +``` + + +### 2、评估 +``` +# 评估 默认使用训练过程中保存的model_final.pdparams +# -c 参数表示指定使用哪个配置文件 +# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置) +# 目前只支持单卡评估 + +python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true +``` +最终模型精度在mAP=0.85左右,由于数据集较小因此每次训练结束后精度会有一定波动 + + +### 3、预测 +``` +# -c 参数表示指定使用哪个配置文件 +# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置) +# --infer_img 参数指定预测图像路径 +# 预测结束后会在output文件夹中生成一张画有预测结果的同名图像 + +python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png +``` + +结果如下图: + +![road554 image](../images/road554.png) diff --git a/PaddleDetection-release-2.6/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md new file mode 100644 index 0000000000000000000000000000000000000000..becfaa5862ba9d294fa677de1cd5e84f60f3df50 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md @@ -0,0 +1,261 @@ +# RCNN系列模型参数配置教程 + +标签: 模型参数配置 + +以`faster_rcnn_r50_fpn_1x_coco.yml`为例,这个模型由五个子配置文件组成: + +- 数据配置文件 `coco_detection.yml` + +```yaml +# 数据评估类型 +metric: COCO +# 数据集的类别数 +num_classes: 80 + +# TrainDataset +TrainDataset: + !COCODataSet + # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir) + image_dir: train2017 + # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_train2017.json + # 数据文件夹 + dataset_dir: dataset/coco + # data_fields + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir) + image_dir: val2017 + # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json + # 数据文件夹 + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json +``` + +- 优化器配置文件 `optimizer_1x.yml` + +```yaml +# 总训练轮数 +epoch: 12 + +# 学习率设置 +LearningRate: + # 默认为8卡训学习率 + base_lr: 0.01 + # 学习率调整策略 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + # 学习率变化位置(轮数) + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +# 优化器 +OptimizerBuilder: + # 优化器 + optimizer: + momentum: 0.9 + type: Momentum + # 正则化 + regularizer: + factor: 0.0001 + type: L2 +``` + +- 数据读取配置文件 `faster_fpn_reader.yml` + +```yaml +# 每张GPU reader进程个数 +worker_num: 2 +# 训练数据 +TrainReader: + # 训练数据transforms + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], 
[736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # 由于模型存在FPN结构,输入图片需要padding为32的倍数 + - PadBatch: {pad_to_stride: 32} + # 训练时batch_size + batch_size: 1 + # 读取数据是否乱序 + shuffle: true + # 是否丢弃最后不能完整组成batch的数据 + drop_last: true + # 表示reader是否对gt进行组batch的操作,在rcnn系列算法中设置为false,得到的gt格式为list[Tensor] + collate_batch: false + +# 评估数据 +EvalReader: + # 评估数据transforms + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # 由于模型存在FPN结构,输入图片需要padding为32的倍数 + - PadBatch: {pad_to_stride: 32} + # 评估时batch_size + batch_size: 1 + # 读取数据是否乱序 + shuffle: false + # 是否丢弃最后不能完整组成batch的数据 + drop_last: false + +# 测试数据 +TestReader: + # 测试数据transforms + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # 由于模型存在FPN结构,输入图片需要padding为32的倍数 + - PadBatch: {pad_to_stride: 32} + # 测试时batch_size + batch_size: 1 + # 读取数据是否乱序 + shuffle: false + # 是否丢弃最后不能完整组成batch的数据 + drop_last: false +``` + +- 模型配置文件 `faster_rcnn_r50_fpn.yml` + +```yaml +# 模型结构类型 +architecture: FasterRCNN +# 预训练模型地址 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +# FasterRCNN +FasterRCNN: + # backbone + backbone: ResNet + # neck + neck: FPN + # rpn_head + rpn_head: RPNHead + # bbox_head + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +# backbone +ResNet: + # index 0 stands for res2 + depth: 50 + # norm_type,可设置参数:bn 或 sync_bn + norm_type: bn + # freeze_at index, 0 represent res2 + freeze_at: 0 + # return_idx + return_idx: [0,1,2,3] + # num_stages + num_stages: 4 + +# FPN +FPN: + # channel of FPN + out_channel: 256 + +# RPNHead +RPNHead: + # anchor generator + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + # rpn_target_assign + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + # 训练时生成proposal的参数 + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + # 评估时生成proposal的参数 + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + +# BBoxHead +BBoxHead: + # TwoFCHead as BBoxHead + head: TwoFCHead + # roi align + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + # bbox_assigner + bbox_assigner: BBoxAssigner + +# BBoxAssigner +BBoxAssigner: + # batch_size_per_im + batch_size_per_im: 512 + # 背景阈值 + bg_thresh: 0.5 + # 前景阈值 + fg_thresh: 0.5 + # 前景比例 + fg_fraction: 0.25 + # 是否随机采样 + use_random: True + +# TwoFCHead +TwoFCHead: + # TwoFCHead特征维度 + out_channel: 1024 + + +# BBoxPostProcess +BBoxPostProcess: + # 解码 + decode: RCNNBox + # nms + nms: + # 使用MultiClassNMS + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 + +``` + +- 运行时置文件 `runtime.yml` + +```yaml +# 是否使用gpu +use_gpu: true +# 日志打印间隔 +log_iter: 20 +# save_dir +save_dir: output +# 模型保存间隔时间 +snapshot_epoch: 1 +``` diff --git 
a/PaddleDetection-release-2.6/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md new file mode 100644 index 0000000000000000000000000000000000000000..15090f842ca771ebe225d5f2876358fa3e60a493 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md @@ -0,0 +1,261 @@ +# RCNN series model parameter configuration tutorial + +Tag: Model parameter configuration + +Take `faster_rcnn_r50_fpn_1x_coco.yml` as an example. The model consists of five sub-profiles: + +- Data profile `coco_detection.yml` + +```yaml +# Data evaluation type +metric: COCO +# The number of categories in the dataset +num_classes: 80 + +# TrainDataset +TrainDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: train2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_train2017.json + # data file + dataset_dir: dataset/coco + # data_fields + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: val2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json + # data file file os.path.join(dataset_dir, anno_path) + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json +``` + +- Optimizer configuration file `optimizer_1x.yml` + +```yaml +# Total training epoches +epoch: 12 + +# learning rate setting +LearningRate: + # Default is 8 Gpus training learning rate + base_lr: 0.01 + # Learning rate adjustment strategy + schedulers: + - !PiecewiseDecay + gamma: 0.1 + # Position of change in learning rate (number of epoches) + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +# Optimizer +OptimizerBuilder: + # Optimizer + optimizer: + momentum: 0.9 + type: Momentum + # Regularization + regularizer: + factor: 0.0001 + type: L2 +``` + +- Data reads configuration files `faster_fpn_reader.yml` + +```yaml +# Number of PROCESSES per GPU Reader +worker_num: 2 +# training data +TrainReader: + # Training data transforms + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # Since the model has FPN structure, the input image needs a multiple of 32 padding + - PadBatch: {pad_to_stride: 32} + # Batch_size during training + batch_size: 1 + # Read data is out of order + shuffle: true + # Whether to discard data that does not complete the batch + drop_last: true + # Set it to false. 
Then you have a sequence of values for GT: List [Tensor] + collate_batch: false + +# Evaluate data +EvalReader: + # Evaluate data transforms + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # Since the model has FPN structure, the input image needs a multiple of 32 padding + - PadBatch: {pad_to_stride: 32} + # batch_size of evaluation + batch_size: 1 + # Read data is out of order + shuffle: false + # Whether to discard data that does not complete the batch + drop_last: false + +# test data +TestReader: + # test data transforms + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # Since the model has FPN structure, the input image needs a multiple of 32 padding + - PadBatch: {pad_to_stride: 32} + # batch_size of test + batch_size: 1 + # Read data is out of order + shuffle: false + # Whether to discard data that does not complete the batch + drop_last: false +``` + +- Model profile `faster_rcnn_r50_fpn.yml` + +```yaml +# Model structure type +architecture: FasterRCNN +# Pretrain model address +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +# FasterRCNN +FasterRCNN: + # backbone + backbone: ResNet + # neck + neck: FPN + # rpn_head + rpn_head: RPNHead + # bbox_head + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +# backbone +ResNet: + # index 0 stands for res2 + depth: 50 + # norm_type, Configurable parameter: bn or sync_bn + norm_type: bn + # freeze_at index, 0 represent res2 + freeze_at: 0 + # return_idx + return_idx: [0,1,2,3] + # num_stages + num_stages: 4 + +# FPN +FPN: + # channel of FPN + out_channel: 256 + +# RPNHead +RPNHead: + # anchor generator + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + # rpn_target_assign + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + # The parameters of the proposal are generated during training + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + # The parameters of the proposal are generated during evaluation + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + +# BBoxHead +BBoxHead: + # TwoFCHead as BBoxHead + head: TwoFCHead + # roi align + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + # bbox_assigner + bbox_assigner: BBoxAssigner + +# BBoxAssigner +BBoxAssigner: + # batch_size_per_im + batch_size_per_im: 512 + # Background the threshold + bg_thresh: 0.5 + # Prospects for threshold + fg_thresh: 0.5 + # Prospects of proportion + fg_fraction: 0.25 + # Random sampling + use_random: True + +# TwoFCHead +TwoFCHead: + # TwoFCHead feature dimension + out_channel: 1024 + + +# BBoxPostProcess +BBoxPostProcess: + # decode + decode: RCNNBox + # nms + nms: + # use MultiClassNMS + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 + +``` + +- runtime configuration file `runtime.yml` + +```yaml +# Whether to use gpu +use_gpu: true +# Log Printing interval +log_iter: 20 +# save_dir 
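+# (the directory where model checkpoints are saved)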
+save_dir: output
+# Model save interval
+snapshot_epoch: 1
+```
diff --git a/PaddleDetection-release-2.6/docs/tutorials/config_annotation/multi_scale_test_config.md b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/multi_scale_test_config.md
new file mode 100644
index 0000000000000000000000000000000000000000..1b6b6bb1fd4d08e696ad8d0d729e18207f3220d8
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/multi_scale_test_config.md
@@ -0,0 +1,45 @@
+# Multi Scale Test Configuration
+
+Tags: Configuration
+
+---
+```yaml
+
+##################################### Multi scale test configuration #####################################
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
+    - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+    - Permute: {}
+
+TestReader:
+  sample_transforms:
+    - Decode: {}
+    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
+    - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+    - Permute: {}
+```
+
+---
+
+Multi scale test is a TTA (test-time augmentation) method that can improve object detection performance.
+
+The input image is scaled to several different sizes, the model generates predictions (bboxes) at each scale, and finally all predictions are combined into the final result (**NMS** is used to aggregate them).
+
+## _MultiscaleTestResize_ option
+
+The `MultiscaleTestResize` option enables multi scale test prediction.
+
+`origin_target_size: [800, 1333]` means the input image is first resized so that its short edge is 800 pixels while the longest edge does not exceed 1333 pixels.
+
+The `target_size: [700, 900]` property specifies the additional scales.
+
+It can be plugged into the evaluation or test (inference) process by adding a `MultiscaleTestResize` entry to `EvalReader.sample_transforms` or `TestReader.sample_transforms`.
+
+---
+
+### Note
+
+Currently only CascadeRCNN, FasterRCNN and MaskRCNN support multi scale testing, and the batch size must be 1.
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/docs/tutorials/config_annotation/multi_scale_test_config_cn.md b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/multi_scale_test_config_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..36851def51e7ae3a414b78df656100b5072685c0
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/multi_scale_test_config_cn.md
@@ -0,0 +1,45 @@
+# 多尺度测试的配置
+
+标签: 配置
+
+---
+```yaml
+
+##################################### 多尺度测试的配置 #####################################
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
+    - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+    - Permute: {}
+
+TestReader:
+  sample_transforms:
+    - Decode: {}
+    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
+    - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+    - Permute: {}
+```
+
+---
+
+多尺度测试是一种TTA方法(测试时增强),可以用于提高目标检测的准确率。
+
+输入图像首先被缩放为不同尺度的图像,然后模型对这些不同尺度的图像进行预测,最后将这些不同尺度上的预测结果整合为最终预测结果。(这里使用了**NMS**来整合不同尺度的预测结果)
+
+## _MultiscaleTestResize_ 选项
+
+`MultiscaleTestResize` 选项用于开启多尺度测试。
+
+`origin_target_size: [800, 1333]` 项代表输入图像首先缩放为短边为800,最长边不超过1333。
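+
+这个"短边优先、长边限制"的缩放规则可以用下面几行示意代码说明(仅为帮助理解的示例,并非 PaddleDetection 的实际实现,函数名为假设):
+
+```python
+# 示意:按短边缩放到 target_short,同时保证长边不超过 max_size
+def resize_scale(h, w, target_short=800, max_size=1333):
+    scale = target_short / min(h, w)   # 先把短边缩放到 target_short
+    if scale * max(h, w) > max_size:   # 若此时长边超过 max_size
+        scale = max_size / max(h, w)   # 则改为按长边等比缩放
+    return scale
+
+print(resize_scale(720, 1280))  # 短边比例 800/720 会让长边超限,实际取 1333/1280 ≈ 1.041
+```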
+ +`target_size: [700 , 900]` 项设置不同的预测尺度。 + +通过在`EvalReader.sample_transforms`或`TestReader.sample_transforms`中设置`MultiscaleTestResize`项,可以在评估过程或预测过程中开启多尺度测试。 + +--- + +###注意 + +目前多尺度测试只支持CascadeRCNN, FasterRCNN and MaskRCNN网络, 并且batch size需要是1. \ No newline at end of file diff --git a/PaddleDetection-release-2.6/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md new file mode 100644 index 0000000000000000000000000000000000000000..9369053227e248971a996e035fb1cef9745384eb --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md @@ -0,0 +1,264 @@ +# YOLO系列模型参数配置教程 + +标签: 模型参数配置 + +以`ppyolo_r50vd_dcn_1x_coco.yml`为例,这个模型由五个子配置文件组成: + +- 数据配置文件 `coco_detection.yml` + +```yaml +# 数据评估类型 +metric: COCO +# 数据集的类别数 +num_classes: 80 + +# TrainDataset +TrainDataset: + !COCODataSet + # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir) + image_dir: train2017 + # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_train2017.json + # 数据文件夹 + dataset_dir: dataset/coco + # data_fields + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir) + image_dir: val2017 + # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json + # 数据文件夹,os.path.join(dataset_dir, anno_path) + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + # 标注文件路径,相对 dataset_dir 路径 + anno_path: annotations/instances_val2017.json +``` + +- 优化器配置文件 `optimizer_1x.yml` + +```yaml +# 总训练轮数 +epoch: 405 + +# 学习率设置 +LearningRate: + # 默认为8卡训学习率 + base_lr: 0.01 + # 学习率调整策略 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + # 学习率变化位置(轮数) + milestones: + - 243 + - 324 + # Warmup + - !LinearWarmup + start_factor: 0. 
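+      # 补充注释(非原始配置):预热阶段第 i 个迭代的学习率约为
+      # base_lr * (start_factor + (1 - start_factor) * i / steps),即从 0 线性升至 base_lr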
+ steps: 4000 + +# 优化器 +OptimizerBuilder: + # 优化器 + optimizer: + momentum: 0.9 + type: Momentum + # 正则化 + regularizer: + factor: 0.0005 + type: L2 +``` + +- 数据读取配置文件 `ppyolo_reader.yml` + +```yaml +# 每张GPU reader进程个数 +worker_num: 2 +# 训练数据 +TrainReader: + inputs_def: + num_max_boxes: 50 + # 训练数据transforms + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + # batch_transforms + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + # 训练时batch_size + batch_size: 24 + # 读取数据是否乱序 + shuffle: true + # 是否丢弃最后不能完整组成batch的数据 + drop_last: true + # mixup_epoch,大于最大epoch,表示训练过程一直使用mixup数据增广 + mixup_epoch: 25000 + # 是否通过共享内存进行数据读取加速,需要保证共享内存大小(如/dev/shm)满足大于1G + use_shared_memory: true + +# 评估数据 +EvalReader: + # 评估数据transforms + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + # 评估时batch_size + batch_size: 8 + +# 测试数据 +TestReader: + inputs_def: + image_shape: [3, 608, 608] + # 测试数据transforms + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + # 测试时batch_size + batch_size: 1 +``` + +- 模型配置文件 `ppyolo_r50vd_dcn.yml` + +```yaml +# 模型结构类型 +architecture: YOLOv3 +# 预训练模型地址 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +# norm_type +norm_type: sync_bn +# 是否使用ema +use_ema: true +# ema_decay +ema_decay: 0.9998 + +# YOLOv3 +YOLOv3: + # backbone + backbone: ResNet + # neck + neck: PPYOLOFPN + # yolo_head + yolo_head: YOLOv3Head + # post_process + post_process: BBoxPostProcess + + +# backbone +ResNet: + # depth + depth: 50 + # variant + variant: d + # return_idx, 0 represent res2 + return_idx: [1, 2, 3] + # dcn_v2_stages + dcn_v2_stages: [3] + # freeze_at + freeze_at: -1 + # freeze_norm + freeze_norm: false + # norm_decay + norm_decay: 0. 
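+  # 补充注释(非原始配置):return_idx [1, 2, 3] 对应 C3/C4/C5 特征图(分别下采样 8/16/32 倍),
+  # 与 YOLOv3Loss 中的 downsample [32, 16, 8] 相对应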
+ +# PPYOLOFPN +PPYOLOFPN: + # 是否coord_conv + coord_conv: true + # 是否drop_block + drop_block: true + # block_size + block_size: 3 + # keep_prob + keep_prob: 0.9 + # 是否spp + spp: true + +# YOLOv3Head +YOLOv3Head: + # anchors + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + # anchor_masks + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + # loss + loss: YOLOv3Loss + # 是否使用iou_aware + iou_aware: true + # iou_aware_factor + iou_aware_factor: 0.4 + +# YOLOv3Loss +YOLOv3Loss: + # ignore_thresh + ignore_thresh: 0.7 + # downsample + downsample: [32, 16, 8] + # 是否label_smooth + label_smooth: false + # scale_x_y + scale_x_y: 1.05 + # iou_loss + iou_loss: IouLoss + # iou_aware_loss + iou_aware_loss: IouAwareLoss + +# IouLoss +IouLoss: + loss_weight: 2.5 + loss_square: true + +# IouAwareLoss +IouAwareLoss: + loss_weight: 1.0 + +# BBoxPostProcess +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + # nms 配置 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 + +``` + +- 运行时置文件 `runtime.yml` + +```yaml +# 是否使用gpu +use_gpu: true +# 日志打印间隔 +log_iter: 20 +# save_dir +save_dir: output +# 模型保存间隔时间 +snapshot_epoch: 1 +``` diff --git a/PaddleDetection-release-2.6/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md new file mode 100644 index 0000000000000000000000000000000000000000..f6f3452e9100d9352cd58e76b329f605b664ffc1 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md @@ -0,0 +1,264 @@ +# YOLO series model parameter configuration tutorial + +Tag: Model parameter configuration + +Take `ppyolo_r50vd_dcn_1x_coco.yml` as an example, The model consists of five sub-profiles: + +- Data profile `coco_detection.yml` + +```yaml +# Data evaluation type +metric: COCO +# The number of categories in the dataset +num_classes: 80 + +# TrainDataset +TrainDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: train2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_train2017.json + # data file + dataset_dir: dataset/coco + # data_fields + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: val2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json + # data file os.path.join(dataset_dir, anno_path) + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json +``` + +- Optimizer configuration file `optimizer_1x.yml` + +```yaml +# Total training epoches +epoch: 405 + +# learning rate setting +LearningRate: + # Default is 8 Gpus training learning rate + base_lr: 0.01 + # Learning rate adjustment strategy + schedulers: + - !PiecewiseDecay + gamma: 0.1 + # Position of change in learning rate (number of epoches) + milestones: + - 243 + - 324 + # Warmup + - !LinearWarmup + start_factor: 0. 
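+      # Added note (not part of the original config): during warmup, the learning rate at
+      # iteration i is roughly base_lr * (start_factor + (1 - start_factor) * i / steps)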
+ steps: 4000 + +# Optimizer +OptimizerBuilder: + # Optimizer + optimizer: + momentum: 0.9 + type: Momentum + # Regularization + regularizer: + factor: 0.0005 + type: L2 +``` + +- Data reads configuration files `ppyolo_reader.yml` + +```yaml +# Number of PROCESSES per GPU Reader +worker_num: 2 +# training data +TrainReader: + inputs_def: + num_max_boxes: 50 + # Training data transforms + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + # batch_transforms + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + # Batch size during training + batch_size: 24 + # Read data is out of order + shuffle: true + # Whether to discard data that does not complete the batch + drop_last: true + # mixup_epoch,Greater than maximum epoch, Indicates that the training process has been augmented with mixup data + mixup_epoch: 25000 + # Whether to use the shared memory to accelerate data reading, ensure that the shared memory size (such as /dev/shm) is greater than 1 GB + use_shared_memory: true + +# Evaluate data +EvalReader: + # Evaluating data transforms + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + # Batch_size during evaluation + batch_size: 8 + +# test data +TestReader: + inputs_def: + image_shape: [3, 608, 608] + # test data transforms + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + # batch_size during training + batch_size: 1 +``` + +- Model profile `ppyolo_r50vd_dcn.yml` + +```yaml +# Model structure type +architecture: YOLOv3 +# Pretrain model address +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +# norm_type +norm_type: sync_bn +# Whether to use EMA +use_ema: true +# ema_decay +ema_decay: 0.9998 + +# YOLOv3 +YOLOv3: + # backbone + backbone: ResNet + # neck + neck: PPYOLOFPN + # yolo_head + yolo_head: YOLOv3Head + # post_process + post_process: BBoxPostProcess + + +# backbone +ResNet: + # depth + depth: 50 + # variant + variant: d + # return_idx, 0 represent res2 + return_idx: [1, 2, 3] + # dcn_v2_stages + dcn_v2_stages: [3] + # freeze_at + freeze_at: -1 + # freeze_norm + freeze_norm: false + # norm_decay + norm_decay: 0. 
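+  # Added note (not part of the original config): return_idx [1, 2, 3] selects the C3/C4/C5
+  # feature maps (downsampled by 8/16/32), matching the downsample ratios [32, 16, 8] in YOLOv3Loss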
+ +# PPYOLOFPN +PPYOLOFPN: + # whether coord_conv or not + coord_conv: true + # whether drop_block or not + drop_block: true + # block_size + block_size: 3 + # keep_prob + keep_prob: 0.9 + # whether spp or not + spp: true + +# YOLOv3Head +YOLOv3Head: + # anchors + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + # anchor_masks + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + # loss + loss: YOLOv3Loss + # whether to use iou_aware + iou_aware: true + # iou_aware_factor + iou_aware_factor: 0.4 + +# YOLOv3Loss +YOLOv3Loss: + # ignore_thresh + ignore_thresh: 0.7 + # downsample + downsample: [32, 16, 8] + # whether label_smooth or not + label_smooth: false + # scale_x_y + scale_x_y: 1.05 + # iou_loss + iou_loss: IouLoss + # iou_aware_loss + iou_aware_loss: IouAwareLoss + +# IouLoss +IouLoss: + loss_weight: 2.5 + loss_square: true + +# IouAwareLoss +IouAwareLoss: + loss_weight: 1.0 + +# BBoxPostProcess +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + # nms setting + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 + +``` + +- Runtime file `runtime.yml` + +```yaml +# Whether to use gpu +use_gpu: true +# Log Printing interval +log_iter: 20 +# save_dir +save_dir: output +# Model save interval +snapshot_epoch: 1 +``` diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/DetAnnoTools.md b/PaddleDetection-release-2.6/docs/tutorials/data/DetAnnoTools.md new file mode 100644 index 0000000000000000000000000000000000000000..f74d1efc141ea3892c4567fd8bfdfd6e1dfd5c4b --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/data/DetAnnoTools.md @@ -0,0 +1,278 @@ +简体中文 | [English](DetAnnoTools_en.md) + + + +# 目标检测标注工具 + +## 目录 + +[LabelMe](#LabelMe) + +* [使用说明](#使用说明) + * [安装](#LabelMe安装) + * [图片标注过程](#LabelMe图片标注过程) +* [标注格式](#LabelMe标注格式) + * [导出数据格式](#LabelMe导出数据格式) + * [格式转化总结](#格式转化总结) + * [标注文件(json)-->VOC](#标注文件(json)-->VOC数据集) + * [标注文件(json)-->COCO](#标注文件(json)-->COCO数据集) + +[LabelImg](#LabelImg) + +* [使用说明](#使用说明) + * [LabelImg安装](#LabelImg安装) + * [安装注意事项](#安装注意事项) + * [图片标注过程](#LabelImg图片标注过程) +* [标注格式](#LabelImg标注格式) + * [导出数据格式](#LabelImg导出数据格式) + * [格式转换注意事项](#格式转换注意事项) + + + +## [LabelMe](https://github.com/wkentaro/labelme) + +### 使用说明 + +#### LabelMe安装 + +具体安装操作请参考[LabelMe官方教程](https://github.com/wkentaro/labelme)中的Installation + +
+**Ubuntu**
+
+```
+sudo apt-get install labelme
+
+# or
+sudo pip3 install labelme
+
+# or install standalone executable from:
+# https://github.com/wkentaro/labelme/releases
+```
+
+**macOS**
+
+```
+brew install pyqt  # maybe pyqt5
+pip install labelme
+
+# or
+brew install wkentaro/labelme/labelme  # command line interface
+# brew install --cask wkentaro/labelme/labelme  # app
+
+# or install standalone executable/app from:
+# https://github.com/wkentaro/labelme/releases
+```
+
    + + + +推荐使用Anaconda的安装方式 + +``` +conda create –name=labelme python=3 +conda activate labelme +pip install pyqt5 +pip install labelme +``` + + + + + +#### LabelMe图片标注过程 + +启动labelme后,选择图片文件或者图片所在文件夹 + +左侧编辑栏选择`create polygons` 绘制标注区域如下图所示(右击图像区域可以选择不同的标注形状),绘制好区域后按下回车,弹出新的框填入标注区域对应的标签,如:people + +左侧菜单栏点击保存,生成`json`形式的**标注文件** + +![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g) + + + +### LabelMe标注格式 + +#### LabelMe导出数据格式 + +``` +#生成标注文件 +png/jpeg/jpg-->labelme标注-->json +``` + + + + + +#### 格式转化总结 + +``` +#标注文件转化为VOC数据集格式 +json-->labelme2voc.py-->VOC数据集 + +#标注文件转化为COCO数据集格式 +json-->labelme2coco.py-->COCO数据集 +``` + + + + + +#### 标注文件(json)-->VOC数据集 + +使用[官方给出的labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py)这份脚本 + +下载该脚本,在命令行中使用 + +```Te +python labelme2voc.py data_annotated(标注文件所在文件夹) data_dataset_voc(输出文件夹) --labels labels.txt +``` + +运行后,在指定的输出文件夹中会如下的目录 + +``` +# It generates: +# - data_dataset_voc/JPEGImages +# - data_dataset_voc/Annotations +# - data_dataset_voc/AnnotationsVisualization + +``` + + + + + +#### 标注文件(json)-->COCO数据集 + +使用[PaddleDetection提供的x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/tools/x2coco.py) 将labelme标注的数据转换为COCO数据集形式 + +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +用户数据集转成COCO数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # coco数据的标注文件 +│ ├── valid.json # coco数据的标注文件 +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` + + + + + +## [LabelImg](https://github.com/tzutalin/labelImg) + +### 使用说明 + +#### LabelImg安装 + +安装操作请参考[LabelImg官方教程](https://github.com/tzutalin/labelImg) + +
+**Ubuntu**
+
+```
+sudo apt-get install pyqt5-dev-tools
+sudo pip3 install -r requirements/requirements-linux-python3.txt
+make qt5py3
+python3 labelImg.py
+python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
+```
+
+**macOS**
+
+```
+brew install qt  # Install qt-5.x.x by Homebrew
+brew install libxml2
+
+# or using pip
+pip3 install pyqt5 lxml  # Install qt and lxml by pip
+
+make qt5py3
+python3 labelImg.py
+python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
+```
+
    + + + +推荐使用Anaconda的安装方式 + + 首先下载并进入 [labelImg](https://github.com/tzutalin/labelImg#labelimg) 的目录 + +``` +conda install pyqt=5 +conda install -c anaconda lxml +pyrcc5 -o libs/resources.py resources.qrc +python labelImg.py +python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE] +``` + + + + + +#### 安装注意事项 + +以Anaconda安装方式为例,比Labelme配置要麻烦一些 + +启动方式是通过python运行脚本`python labelImg.py <图片路径>` + + + +#### LabelImg图片标注过程 + +启动labelImg后,选择图片文件或者图片所在文件夹 + +左侧编辑栏选择`创建区块` 绘制标注区,在弹出新的框选择对应的标签 + +左侧菜单栏点击保存,可以选择VOC/YOLO/CreateML三种类型的标注文件 + + + +![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif) + + + + + +### LabelImg标注格式 + +#### LabelImg导出数据格式 + +``` +#生成标注文件 +png/jpeg/jpg-->labelImg标注-->xml/txt/json +``` + + + +#### 格式转换注意事项 + +**PaddleDetection支持VOC或COCO格式的数据**,经LabelImg标注导出后的标注文件,需要修改为**VOC或COCO格式**,调整说明可以参考[准备训练数据](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/DetAnnoTools_en.md b/PaddleDetection-release-2.6/docs/tutorials/data/DetAnnoTools_en.md new file mode 100644 index 0000000000000000000000000000000000000000..ab08db491b1a3496cdcfdce53b6650008e08edf7 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/data/DetAnnoTools_en.md @@ -0,0 +1,270 @@ +[简体中文](DetAnnoTools.md) | English + + + +# Object Detection Annotation Tools + +## Concents + +[LabelMe](#LabelMe) + +* [Instruction](#Instruction-of-LabelMe) + * [Installation](#Installation) + * [Annotation of Images](#Annotation-of-images-in-LabelMe) +* [Annotation Format](#Annotation-Format-of-LabelMe) + * [Export Format](#Export-Format-of-LabelMe) + * [Summary of Format Conversion](#Summary-of-Format-Conversion) + * [Annotation file(json)—>VOC Dataset](#annotation-filejsonvoc-dataset) + * [Annotation file(json)—>COCO Dataset](#annotation-filejsoncoco-dataset) + +[LabelImg](#LabelImg) + +* [Instruction](#Instruction-of-LabelImg) + * [Installation](#Installation-of-LabelImg) + * [Installation Notes](#Installation-Notes) + * [Annotation of images](#Annotation-of-images-in-LabelImg) +* [Annotation Format](#Annotation-Format-of-LabelImg) + * [Export Format](#Export-Format-of-LabelImg) + * [Notes of Format Conversion](#Notes-of-Format-Conversion) + + + +## [LabelMe](https://github.com/wkentaro/labelme) + +### Instruction of LabelMe + +#### Installation + +Please refer to [The github of LabelMe](https://github.com/wkentaro/labelme) for installation details. + +
+**Ubuntu**
+
+```
+sudo apt-get install labelme

+# or
+sudo pip3 install labelme
+
+# or install standalone executable from:
+# https://github.com/wkentaro/labelme/releases
+```
+
+**macOS**
+
+```
+brew install pyqt  # maybe pyqt5
+pip install labelme
+
+# or
+brew install wkentaro/labelme/labelme  # command line interface
+# brew install --cask wkentaro/labelme/labelme  # app
+
+# or install standalone executable/app from:
+# https://github.com/wkentaro/labelme/releases
+```
+
    + + + +We recommend installing by Anoncanda. + +``` +conda create –name=labelme python=3 +conda activate labelme +pip install pyqt5 +pip install labelme +``` + + + + + +#### Annotation of Images in LabelMe + +After starting labelme, select an image or an folder with images. + +Select `create polygons` in the formula bar. Draw an annotation area as shown in the following GIF. You can right-click on the image to select different shape. When finished, press the Enter/Return key, then fill the corresponding label in the popup box, such as, people. + +Click the save button in the formula bar,it will generate an annotation file in json. + +![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g) + + + +### Annotation Format of LabelMe + +#### Export Format of LabelMe + +``` +#generate an annotation file +png/jpeg/jpg-->labelme-->json +``` + + + + + +#### Summary of Format Conversion + +``` +#convert annotation file to VOC dataset format +json-->labelme2voc.py-->VOC dataset + +#convert annotation file to COCO dataset format +json-->labelme2coco.py-->COCO dataset +``` + + + + + +#### Annotation file(json)—>VOC Dataset + +Use this script [labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py) in command line. + +```Te +python labelme2voc.py data_annotated(annotation folder) data_dataset_voc(output folder) --labels labels.txt +``` + +Then, it will generate following contents: + +``` +# It generates: +# - data_dataset_voc/JPEGImages +# - data_dataset_voc/Annotations +# - data_dataset_voc/AnnotationsVisualization + +``` + + + + + +#### Annotation file(json)—>COCO Dataset + +Convert the data annotated by LabelMe to COCO dataset by the script [x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/tools/x2coco.py) provided by PaddleDetection. + +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +After the user dataset is converted to COCO data, the directory structure is as follows (Try to avoid use Chinese for the path name in case of errors caused by Chinese coding problems): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # Annotation file of coco data +│ ├── valid.json # Annotation file of coco data +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` + + + + + +## [LabelImg](https://github.com/tzutalin/labelImg) + +### Instruction + +#### Installation of LabelImg + +Please refer to [The github of LabelImg](https://github.com/tzutalin/labelImg) for installation details. + +
+**Ubuntu**
+
+```
+sudo apt-get install pyqt5-dev-tools
+sudo pip3 install -r requirements/requirements-linux-python3.txt
+make qt5py3
+python3 labelImg.py
+python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
+```
+
+**macOS**
+
+```
+brew install qt  # Install qt-5.x.x by Homebrew
+brew install libxml2
+
+# or using pip
+pip3 install pyqt5 lxml  # Install qt and lxml by pip
+
+make qt5py3
+python3 labelImg.py
+python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
+```
+
    + + + +We recommend installing by Anoncanda. + +Download and go to the folder of [labelImg](https://github.com/tzutalin/labelImg#labelimg) + +``` +conda install pyqt=5 +conda install -c anaconda lxml +pyrcc5 -o libs/resources.py resources.qrc +python labelImg.py +python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE] +``` + + + + + +#### Installation Notes + +Use python scripts to startup LabelImg: `python labelImg.py ` + +#### Annotation of images in LabelImg + +After the startup of LabelImg, select an image or a folder with images. + +Select `Create RectBox` in the formula bar. Draw an annotation area as shown in the following GIF. When finished, select corresponding label in the popup box. Then save the annotated file in three forms: VOC/YOLO/CreateML. + + + +![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif) + + + + + +### Annotation Format of LabelImg + +#### Export Format of LabelImg + +``` +#generate annotation files +png/jpeg/jpg-->labelImg-->xml/txt/json +``` + + + +#### Notes of Format Conversion + +**PaddleDetection supports the format of VOC or COCO.** The annotation file generated by LabelImg needs to be converted by VOC or COCO. You can refer to [PrepareDataSet](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE). diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/KeyPointAnnoTools.md b/PaddleDetection-release-2.6/docs/tutorials/data/KeyPointAnnoTools.md new file mode 100644 index 0000000000000000000000000000000000000000..8d85d33521864b45f96b67fda59daef947ebad24 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/data/KeyPointAnnoTools.md @@ -0,0 +1,164 @@ +简体中文 | [English](KeyPointAnnoTools_en.md) + +# 关键点检测标注工具 + +## 目录 + +[LabelMe](#LabelMe) + +- [使用说明](#使用说明) + - [安装](#安装) + - [关键点数据说明](#关键点数据说明) + - [图片标注过程](#图片标注过程) +- [标注格式](#标注格式) + - [导出数据格式](#导出数据格式) + - [格式转化总结](#格式转化总结) + - [标注文件(json)-->COCO](#标注文件(json)-->COCO数据集) + + + +## [LabelMe](https://github.com/wkentaro/labelme) + +### 使用说明 + +#### 安装 + +具体安装操作请参考[LabelMe官方教程](https://github.com/wkentaro/labelme)中的Installation + +
+**Ubuntu**
+
+```
+sudo apt-get install labelme
+
+# or
+sudo pip3 install labelme
+
+# or install standalone executable from:
+# https://github.com/wkentaro/labelme/releases
+```
+
+**macOS**
+
+```
+brew install pyqt  # maybe pyqt5
+pip install labelme
+
+# or
+brew install wkentaro/labelme/labelme  # command line interface
+# brew install --cask wkentaro/labelme/labelme  # app
+
+# or install standalone executable/app from:
+# https://github.com/wkentaro/labelme/releases
+```
+
    + + + +推荐使用Anaconda的安装方式 + +``` +conda create –name=labelme python=3 +conda activate labelme +pip install pyqt5 +pip install labelme +``` + + + +#### 关键点数据说明 + +以COCO数据集为例,共需采集17个关键点 + +``` +keypoint indexes: + 0: 'nose', + 1: 'left_eye', + 2: 'right_eye', + 3: 'left_ear', + 4: 'right_ear', + 5: 'left_shoulder', + 6: 'right_shoulder', + 7: 'left_elbow', + 8: 'right_elbow', + 9: 'left_wrist', + 10: 'right_wrist', + 11: 'left_hip', + 12: 'right_hip', + 13: 'left_knee', + 14: 'right_knee', + 15: 'left_ankle', + 16: 'right_ankle' +``` + + + + + +#### 图片标注过程 + +启动labelme后,选择图片文件或者图片所在文件夹 + +左侧编辑栏选择`create polygons` ,右击图像区域选择标注形状,绘制好关键点后按下回车,弹出新的框填入标注关键点对应的标签 + +左侧菜单栏点击保存,生成`json`形式的**标注文件** + +![操作说明](https://user-images.githubusercontent.com/34162360/178250648-29ee781a-676b-419c-83b1-de1e4e490526.gif) + + + +### 标注格式 + +#### 导出数据格式 + +``` +#生成标注文件 +png/jpeg/jpg-->labelme标注-->json +``` + + + +#### 格式转化总结 + +``` +#标注文件转化为COCO数据集格式 +json-->labelme2coco.py-->COCO数据集 +``` + + + + + +#### 标注文件(json)-->COCO数据集 + +使用[PaddleDetection提供的x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/tools/x2coco.py) 将labelme标注的数据转换为COCO数据集形式 + +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +用户数据集转成COCO数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # coco数据的标注文件 +│ ├── valid.json # coco数据的标注文件 +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/KeyPointAnnoTools_en.md b/PaddleDetection-release-2.6/docs/tutorials/data/KeyPointAnnoTools_en.md new file mode 100644 index 0000000000000000000000000000000000000000..70cfdd0103e78e4913649886b3d259ce946b5498 --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/data/KeyPointAnnoTools_en.md @@ -0,0 +1,164 @@ +[简体中文](KeyPointAnnoTools.md) | English + +# Key Points Detection Annotation Tool + +## Concents + +[LabelMe](#LabelMe) + +- [Instruction](#Instruction) + - [Installation](#Installation) + - [Notes of Key Points Data](#Notes-of-Key-Points-Data) + - [Annotation of LabelMe](#Annotation-of-LabelMe) +- [Annotation Format](#Annotation-Format) + - [Data Export Format](#Data-Export-Format) + - [Summary of Format Conversion](#Summary-of-Format-Conversion) + - [Annotation file(json)—>COCO Dataset](#annotation-filejsoncoco-dataset) + + + +## [LabelMe](https://github.com/wkentaro/labelme) + +### Instruction + +#### Installation + +Please refer to [The github of LabelMe](https://github.com/wkentaro/labelme) for installation details. + +
+**Ubuntu**
+
+```
+sudo apt-get install labelme
+
+# or
+sudo pip3 install labelme
+
+# or install standalone executable from:
+# https://github.com/wkentaro/labelme/releases
+```
+
+**macOS**
+
+```
+brew install pyqt  # maybe pyqt5
+pip install labelme
+
+# or
+brew install wkentaro/labelme/labelme  # command line interface
+# brew install --cask wkentaro/labelme/labelme  # app
+
+# or install standalone executable/app from:
+# https://github.com/wkentaro/labelme/releases
+```
+
    + + + +We recommend installing by Anoncanda. + +``` +conda create –name=labelme python=3 +conda activate labelme +pip install pyqt5 +pip install labelme +``` + + + +#### Notes of Key Points Data + +COCO dataset needs to collect 17 key points. + +``` +keypoint indexes: + 0: 'nose', + 1: 'left_eye', + 2: 'right_eye', + 3: 'left_ear', + 4: 'right_ear', + 5: 'left_shoulder', + 6: 'right_shoulder', + 7: 'left_elbow', + 8: 'right_elbow', + 9: 'left_wrist', + 10: 'right_wrist', + 11: 'left_hip', + 12: 'right_hip', + 13: 'left_knee', + 14: 'right_knee', + 15: 'left_ankle', + 16: 'right_ankle' +``` + + + + + +#### Annotation of LabelMe + +After starting labelme, select an image or an folder with images. + +Select `create polygons` in the formula bar. Draw an annotation area as shown in the following GIF. You can right-click on the image to select different shape. When finished, press the Enter/Return key, then fill the corresponding label in the popup box, such as, people. + +Click the save button in the formula bar,it will generate an annotation file in json. + +![操作说明](https://user-images.githubusercontent.com/34162360/178250648-29ee781a-676b-419c-83b1-de1e4e490526.gif) + + + +### Annotation Format + +#### Data Export Format + +``` +#generate an annotation file +png/jpeg/jpg-->labelme-->json +``` + + + +#### Summary of Format Conversion + +``` +#convert annotation file to COCO dataset format +json-->labelme2coco.py-->COCO dataset +``` + + + + + +#### Annotation file(json)—>COCO Dataset + +Convert the data annotated by LabelMe to COCO dataset by this script [x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/tools/x2coco.py). + +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +After the user dataset is converted to COCO data, the directory structure is as follows (note that the path name and file name in the dataset should not use Chinese as far as possible to avoid errors caused by Chinese coding problems): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # Annotation file of coco data +│ ├── valid.json # Annotation file of coco data +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/MOTAnnoTools.md b/PaddleDetection-release-2.6/docs/tutorials/data/MOTAnnoTools.md new file mode 100644 index 0000000000000000000000000000000000000000..433a1a2808cf05cecbffe80e5d8297f2224d3bfb --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/data/MOTAnnoTools.md @@ -0,0 +1,75 @@ +# 多目标跟踪标注工具 + + + +## 目录 + +* [前期准备](#前期准备) +* [SDE数据集](#SDE数据集) + * [LabelMe](#LabelMe) + * [LabelImg](#LabelImg) +* [JDE数据集](#JDE数据集) + * [DarkLabel](#DarkLabel) + * [标注格式](#标注格式) + + +### 前期准备 + +请先查看[多目标跟踪数据集准备](PrepareMOTDataSet.md)确定MOT模型选型和MOT数据集的类型。 +通常综合数据标注成本和模型精度速度平衡考虑,更推荐使用SDE系列数据集,和SDE系列模型的ByteTrack或OC-SORT。SDE系列数据集的标注工具与目标检测任务是一致的。 + +### SDE数据集 +SDE数据集是纯检测标注的数据集,用户自定义数据集可以参照[DET数据准备文档](./PrepareDetDataSet.md)准备。 + +#### LabelMe +LabelMe的使用可以参考[DetAnnoTools](DetAnnoTools.md) + +#### LabelImg +LabelImg的使用可以参考[DetAnnoTools](DetAnnoTools.md) + + +### JDE数据集 +JDE数据集是同时有检测和ReID标注的数据集,标注成本比SDE数据集更高。 + +#### [DarkLabel](https://github.com/darkpgmr/DarkLabel) + +#### 使用说明 + +##### 安装 + +从官方给出的下载[链接](https://github.com/darkpgmr/DarkLabel/releases)中下载想要的版本,Windows环境解压后能够直接使用 + +**视频/图片标注过程** + +1. 
启动应用程序后,能看到左侧的工具栏 +2. 选择视频/图像文件后,按需选择标注形式: + * Box仅绘制标注框 + * Box+Label绘制标注框&标签 + * Box+Label+AutoID绘制标注框&标签&ID号 + * Popup LabelSelect可以自行定义标签 +3. 在视频帧/图像上进行拖动鼠标,进行标注框的绘制 +4. 绘制完成后,在上数第六行里选择保存标注文件的形式,默认.txt + +![1](https://user-images.githubusercontent.com/34162360/179673519-511b4167-97ed-4228-8869-db9c69a68b6b.mov) + + + +##### 注意事项 + +1. 如果标注的是视频文件,需要在工具栏上数第五行的下拉框里选择`[fn,cname,id,x1,y1,w,h]` (DarkLabel2.4版本) +2. 鼠标移动到标注框所在区域,右键可以删除标注框 +3. 按下shift,可以选中标注框,进行框的移动和对某条边的编辑 +4. 按住enter回车,可以自动跟踪标注目标 +5. 自动跟踪标注目标过程中可以暂停(松开enter),按需修改标注框 + + + +##### 其他使用参考视频 + +* [DarkLabel (Video/Image Annotation Tool) - Ver.2.0](https://www.youtube.com/watch?v=lok30aIZgUw) +* [DarkLabel (Image/Video Annotation Tool)](https://www.youtube.com/watch?v=vbydG78Al8s&t=11s) + + + +#### 标注格式 +标注文件需要转化为MOT JDE数据集格式,包含`images`和`labels_with_ids`文件夹,具体参照[用户自定义数据集准备](PrepareMOTDataSet.md#用户自定义数据集准备)。 diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/PrepareDetDataSet.md b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareDetDataSet.md new file mode 100644 index 0000000000000000000000000000000000000000..a282d4220f05ff07314295bdbb108bd99d738f7a --- /dev/null +++ b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareDetDataSet.md @@ -0,0 +1,497 @@ +# 目标检测数据准备 +## 目录 +- [目标检测数据说明](#目标检测数据说明) +- [准备训练数据](#准备训练数据) + - [VOC数据](#VOC数据) + - [VOC数据集下载](#VOC数据集下载) + - [VOC数据标注文件介绍](#VOC数据标注文件介绍) + - [COCO数据](#COCO数据) + - [COCO数据集下载](#COCO数据下载) + - [COCO数据标注文件介绍](#COCO数据标注文件介绍) + - [用户数据准备](#用户数据准备) + - [用户数据转成VOC数据](#用户数据转成VOC数据) + - [用户数据转成COCO数据](#用户数据转成COCO数据) + - [用户数据自定义reader](#用户数据自定义reader) + - [用户数据使用示例](#用户数据使用示例) + - [数据格式转换](#数据格式转换) + - [自定义数据训练](#自定义数据训练) +- [(可选)生成Anchor](#(可选)生成Anchor) + +### 目标检测数据说明 + +目标检测的数据比分类复杂,一张图像中,需要标记出各个目标区域的位置和类别。 + +一般的目标区域位置用一个矩形框来表示,一般用以下3种方式表达: + +| 表达方式 | 说明 | +| :----------------: | :--------------------------------: | +| x1,y1,x2,y2 | (x1,y1)为左上角坐标,(x2,y2)为右下角坐标 | +| x1,y1,w,h | (x1,y1)为左上角坐标,w为目标区域宽度,h为目标区域高度 | +| xc,yc,w,h | (xc,yc)为目标区域中心坐标,w为目标区域宽度,h为目标区域高度 | + +常见的目标检测数据集如Pascal VOC采用的`[x1,y1,x2,y2]` 表示物体的bounding box, [COCO](https://cocodataset.org/#format-data)采用的`[x1,y1,w,h]` 表示物体的bounding box. + +### 准备训练数据 + +PaddleDetection默认支持[COCO](http://cocodataset.org)和[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 和[WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) 数据源。 +同时还支持自定义数据源,包括: + +(1) 自定义数据转换成VOC数据; +(2) 自定义数据转换成COCO数据; +(3) 自定义新的数据源,增加自定义的reader。 + + +首先进入到`PaddleDetection`根目录下 +``` +cd PaddleDetection/ +ppdet_root=$(pwd) +``` + +#### VOC数据 + +VOC数据是[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 比赛使用的数据。Pascal VOC比赛不仅包含图像分类分类任务,还包含图像目标检测、图像分割等任务,其标注文件中包含多个任务的标注内容。 +VOC数据集指的是Pascal VOC比赛使用的数据。用户自定义的VOC数据,xml文件中的非必须字段,请根据实际情况选择是否标注或是否使用默认值。 + +##### VOC数据集下载 + +- 通过代码自动化下载VOC数据集,数据集较大,下载需要较长时间 + + ``` + # 执行代码自动化下载VOC数据集 + python dataset/voc/download_voc.py + ``` + + 代码执行完成后VOC数据集文件组织结构为: + ``` + >>cd dataset/voc/ + >>tree + ├── create_list.py + ├── download_voc.py + ├── generic_det_label_list.txt + ├── generic_det_label_list_zh.txt + ├── label_list.txt + ├── VOCdevkit/VOC2007 + │ ├── annotations + │ ├── 001789.xml + │ | ... + │ ├── JPEGImages + │ ├── 001789.jpg + │ | ... + │ ├── ImageSets + │ | ... + ├── VOCdevkit/VOC2012 + │ ├── Annotations + │ ├── 2011_003876.xml + │ | ... + │ ├── JPEGImages + │ ├── 2011_003876.jpg + │ | ... + │ ├── ImageSets + │ | ... + | ... + ``` + + 各个文件说明 + ``` + # label_list.txt 是类别名称列表,文件名必须是 label_list.txt。若使用VOC数据集,config文件中use_default_label为true时不需要这个文件 + >>cat label_list.txt + aeroplane + bicycle + ... 
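+# 补充:label_list.txt 每行一个类别名(Pascal VOC 共 20 类)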
+ + # trainval.txt 是训练数据集文件列表 + >>cat trainval.txt + VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml + VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml + ... + + # test.txt 是测试数据集文件列表 + >>cat test.txt + VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml + ... + + # label_list.txt voc 类别名称列表 + >>cat label_list.txt + + aeroplane + bicycle + ... + ``` +- 已下载VOC数据集 + 按照如上数据文件组织结构组织文件即可。 + +##### VOC数据标注文件介绍 + +VOC数据是每个图像文件对应一个同名的xml文件,xml文件中标记物体框的坐标和类别等信息。例如图像`2007_002055.jpg`: +![](../images/2007_002055.jpg) + +图片对应的xml文件内包含对应图片的基本信息,比如文件名、来源、图像尺寸以及图像中包含的物体区域信息和类别信息等。 + +xml文件中包含以下字段: +- filename,表示图像名称。 +- size,表示图像尺寸。包括:图像宽度、图像高度、图像深度。 + ``` + + 500 + 375 + 3 + + ``` +- object字段,表示每个物体。包括: + + | 标签 | 说明 | + | :--------: | :-----------: | + | name | 物体类别名称 | + | pose | 关于目标物体姿态描述(非必须字段) | + | truncated | 如果物体的遮挡超过15-20%并且位于边界框之外,请标记为`truncated`(非必须字段) | + | difficult | 难以识别的物体标记为`difficult`(非必须字段) | + | bndbox子标签 | (xmin,ymin) 左上角坐标,(xmax,ymax) 右下角坐标, | + + +#### COCO数据 +COCO数据是[COCO](http://cocodataset.org) 比赛使用的数据。同样的,COCO比赛数也包含多个比赛任务,其标注文件中包含多个任务的标注内容。 +COCO数据集指的是COCO比赛使用的数据。用户自定义的COCO数据,json文件中的一些字段,请根据实际情况选择是否标注或是否使用默认值。 + + +##### COCO数据下载 +- 通过代码自动化下载COCO数据集,数据集较大,下载需要较长时间 + + ``` + # 执行代码自动化下载COCO数据集 + python dataset/coco/download_coco.py + ``` + + 代码执行完成后COCO数据集文件组织结构为: + ``` + >>cd dataset/coco/ + >>tree + ├── annotations + │ ├── instances_train2017.json + │ ├── instances_val2017.json + │ | ... + ├── train2017 + │ ├── 000000000009.jpg + │ ├── 000000580008.jpg + │ | ... + ├── val2017 + │ ├── 000000000139.jpg + │ ├── 000000000285.jpg + │ | ... + | ... + ``` +- 已下载COCO数据集 + 按照如上数据文件组织结构组织文件即可。 + +##### COCO数据标注介绍 +COCO数据标注是将所有训练图像的标注都存放到一个json文件中。数据以字典嵌套的形式存放。 + +json文件中包含以下key: +- info,表示标注文件info。 +- licenses,表示标注文件licenses。 +- images,表示标注文件中图像信息列表,每个元素是一张图像的信息。如下为其中一张图像的信息: + ``` + { + 'license': 3, # license + 'file_name': '000000391895.jpg', # file_name + # coco_url + 'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg', + 'height': 360, # image height + 'width': 640, # image width + 'date_captured': '2013-11-14 11:18:45', # date_captured + # flickr_url + 'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg', + 'id': 391895 # image id + } + ``` +- annotations,表示标注文件中目标物体的标注信息列表,每个元素是一个目标物体的标注信息。如下为其中一个目标物体的标注信息: + ``` + { + + 'segmentation': # 物体的分割标注 + 'area': 2765.1486500000005, # 物体的区域面积 + 'iscrowd': 0, # iscrowd + 'image_id': 558840, # image id + 'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h] + 'category_id': 58, # category_id + 'id': 156 # image id + } + ``` + + ``` + # 查看COCO标注文件 + import json + coco_anno = json.load(open('./annotations/instances_train2017.json')) + + # coco_anno.keys + print('\nkeys:', coco_anno.keys()) + + # 查看类别信息 + print('\n物体类别:', coco_anno['categories']) + + # 查看一共多少张图 + print('\n图像数量:', len(coco_anno['images'])) + + # 查看一共多少个目标物体 + print('\n标注物体数量:', len(coco_anno['annotations'])) + + # 查看一条目标物体标注信息 + print('\n查看一条目标物体标注信息:', coco_anno['annotations'][0]) + ``` + +#### 用户数据准备 +对于用户数据有3种处理方法: +(1) 将用户数据转成VOC数据(根据需要仅包含物体检测所必须的标签即可) +(2) 将用户数据转成COCO数据(根据需要仅包含物体检测所必须的标签即可) +(3) 自定义一个用户数据的reader(较复杂数据,需要自定义reader) + +##### 用户数据转成VOC数据 +用户数据集转成VOC数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): + +``` +dataset/xxx/ +├── annotations +│ ├── xxx1.xml +│ ├── xxx2.xml +│ ├── xxx3.xml +│ | ... +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... 
+├── label_list.txt (必须提供,且文件名称必须是label_list.txt ) +├── train.txt (训练数据集文件列表, ./images/xxx1.jpg ./annotations/xxx1.xml) +└── valid.txt (测试数据集文件列表) +``` + +各个文件说明 +``` +# label_list.txt 是类别名称列表,改文件名必须是这个 +>>cat label_list.txt +classname1 +classname2 +... + +# train.txt 是训练数据文件列表 +>>cat train.txt +./images/xxx1.jpg ./annotations/xxx1.xml +./images/xxx2.jpg ./annotations/xxx2.xml +... + +# valid.txt 是验证数据文件列表 +>>cat valid.txt +./images/xxx3.jpg ./annotations/xxx3.xml +... +``` + +##### 用户数据转成COCO数据 +在`./tools/`中提供了`x2coco.py`用于将VOC数据集、labelme标注的数据集或cityscape数据集转换为COCO数据,例如: + +(1)labelme数据转换为COCO数据: +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` +(2)voc数据转换为COCO数据: +```bash +python tools/x2coco.py \ + --dataset_type voc \ + --voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \ + --voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \ + --voc_label_list dataset/voc/label_list.txt \ + --voc_out_name voc_train.json +``` + +用户数据集转成COCO数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): +``` +dataset/xxx/ +├── annotations +│ ├── train.json # coco数据的标注文件 +│ ├── valid.json # coco数据的标注文件 +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` + +##### 用户数据自定义reader +如果数据集有新的数据需要添加进PaddleDetection中,您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#2.3自定义数据集)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md)。 + + +#### 用户数据使用示例 + +以[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据为例,说明如何准备自定义数据。 +Kaggle上的 [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据包含877张图像,数据类别4类:crosswalk,speedlimit,stop,trafficlight。 +可从Kaggle上下载,也可以从[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) 下载。 +路标数据集示例图: +![](../images/road554.png) + +``` +# 下载解压数据 +>>cd $(ppdet_root)/dataset +# 下载kaggle数据集并解压,当前文件组织结构如下 + +├── annotations +│ ├── road0.xml +│ ├── road1.xml +│ ├── road10.xml +│ | ... +├── images +│ ├── road0.jpg +│ ├── road1.jpg +│ ├── road2.jpg +│ | ... +``` + +#### 数据格式转换 + +将数据划分为训练集和测试集 +``` +# 生成 label_list.txt 文件 +>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt + +# 生成 train.txt、valid.txt和test.txt列表文件 +>>ls images/*.png | shuf > all_image_list.txt +>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt + +# 训练集、验证集、测试集比例分别约80%、10%、10%。 +>>head -n 88 all_list.txt > test.txt +>>head -n 176 all_list.txt | tail -n 88 > valid.txt +>>tail -n 701 all_list.txt > train.txt + +# 删除不用文件 +>>rm -rf all_image_list.txt all_list.txt + +最终数据集文件组织结构为: + +├── annotations +│ ├── road0.xml +│ ├── road1.xml +│ ├── road10.xml +│ | ... +├── images +│ ├── road0.jpg +│ ├── road1.jpg +│ ├── road2.jpg +│ | ... +├── label_list.txt +├── test.txt +├── train.txt +└── valid.txt + +# label_list.txt 是类别名称列表,文件名必须是 label_list.txt +>>cat label_list.txt +crosswalk +speedlimit +stop +trafficlight + +# train.txt 是训练数据集文件列表,每一行是一张图像路径和对应标注文件路径,以空格分开。注意这里的路径是数据集文件夹内的相对路径。 +>>cat train.txt +./images/road839.png ./annotations/road839.xml +./images/road363.png ./annotations/road363.xml +... 
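+# 补充:train.txt 共 701 行,每行对应一张训练图像及其标注文件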
+
+# valid.txt is the file list of the validation set; each line is an image path and the corresponding annotation file path, separated by a space. Note that the paths are relative paths inside the dataset folder.
+>>cat valid.txt
+./images/road218.png ./annotations/road218.xml
+./images/road681.png ./annotations/road681.xml
+```
+
+You can also download the prepared data from this [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) and extract it into the `dataset/roadsign_voc/` folder.
+Once the data is ready, it is generally good practice to get familiar with it, such as the number of images, image sizes, the number of object regions per class, and object region sizes. Clean the data if necessary.
+Statistics of the roadsign dataset:
+
+| data | number of images |
+| :--------: | :-----------: |
+| train | 701 |
+| valid | 176 |
+
+**Notes:**
+(1) For user data, check it carefully before training to avoid crashes during training caused by wrong annotation formats or corrupt/incomplete image files
+(2) If the images are very large and the input size is not limited, memory usage will be high and may cause memory/GPU memory overflow; please set batch_size reasonably, trying small values first and then increasing
+
+#### Training on Custom Data
+
+After the data is ready, modify the dataset configuration file of PaddleDetection, located in the `configs/datasets` folder. For example, the configuration file of the roadsign dataset is:
+```
+metric: VOC  # evaluation metric; COCO, VOC, WiderFace, etc. are currently supported
+num_classes: 4  # number of classes in the dataset, excluding the background class; 4 for roadsign, change it to your own number of classes
+
+TrainDataset:
+  !VOCDataSet
+    dataset_dir: dataset/roadsign_voc  # dataset path, relative to the PaddleDetection path
+    anno_path: train.txt  # annotation file of the training set, relative to dataset_dir
+    label_list: label_list.txt  # label file, relative to dataset_dir
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']  # fields contained in the samples output by the dataset; note this field is specific to the training Reader and must be configured
+
+EvalDataset:
+  !VOCDataSet
+    dataset_dir: dataset/roadsign_voc  # dataset path, relative to the PaddleDetection path
+    anno_path: valid.txt  # annotation file of the validation set, relative to dataset_dir
+    label_list: label_list.txt  # label file, relative to dataset_dir
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
+
+TestDataset:
+  !ImageFolder
+    anno_path: label_list.txt  # annotation file path, used only to read the class information of the dataset; JSON and TXT formats are supported
+    dataset_dir: dataset/roadsign_voc  # dataset path; if this line is present, `anno_path` is relative to `dataset_dir`, otherwise it is relative to the PaddleDetection path
+```
+
+Then point the dataset configuration in the corresponding model configuration file to the new path, taking `configs/yolov3/yolov3_mobilenet_v1_roadsign.yml` as an example
+
+```
+_BASE_: [
+  '../datasets/roadsign_voc.yml',  # path of the custom dataset configuration
+  '../runtime.yml',
+  '_base_/optimizer_40e.yml',
+  '_base_/yolov3_mobilenet_v1.yml',
+  '_base_/yolov3_reader.yml',
+]
+pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams
+weights: output/yolov3_mobilenet_v1_roadsign/model_final
+
+YOLOv3Loss:
+  ignore_thresh: 0.7
+  label_smooth: true
+```
+
+
+In PaddleDetection YAML configuration files, `!` directly serializes a module instance (which can be a function, an instance, etc.); the configuration files above all use it to serialize Dataset objects.
+
+Once the configuration is done, training and evaluation can be launched with the following commands
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval
+```
+
+For more detailed commands, see [Getting started with PaddleDetection in 30 minutes](../GETTING_STARTED_cn.md)
+
+**Note:**
+Please double-check the dataset paths in the configuration before running. During training or evaluation, if the TrainDataset or EvalDataset path is misconfigured, the dataset will be downloaded automatically. With a custom dataset, if the TestDataset path is misconfigured at inference time, the class information of the default COCO dataset will be used.
+
+
+
+### (Optional) Generating Anchors
+For YOLO-series models, the default anchor settings work in most cases. You can also run `tools/anchor_cluster.py` to obtain anchors tailored to your dataset (usage below; a sketch of applying the result follows the table):
+```bash
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
+```
+The main arguments currently supported by `tools/anchor_cluster.py` are listed below:
+
+| argument | purpose | default | notes |
+|:------:|:------:|:------:|:------:|
+| -c/--config | model configuration file | none | required |
+| -n/--n | number of clusters | 9 | number of anchors |
+| -s/--size | input image size | None | if specified, this size is used; otherwise the image size is read from the configuration file |
+| -m/--method | anchor clustering method | v2 | only the YOLOv2 clustering algorithm is supported at present |
+| -i/--iters | number of k-means iterations | 1000 | k-means stops on convergence or after this many iterations |
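+
+The clustered anchors are reported when the script finishes. A minimal illustrative sketch of applying them (the anchor values below are placeholders, not a clustering result; the exact config key depends on the model, e.g. `YOLOv3Head` for YOLOv3-style heads):
+
+```yaml
+# Illustrative only: replace the anchors in the model config with the clustered output
+YOLOv3Head:
+  anchors: [[10, 13], [16, 30], [33, 23],
+            [30, 61], [62, 45], [59, 119],
+            [116, 90], [156, 198], [373, 326]]
+```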
diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/PrepareDetDataSet_en.md b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareDetDataSet_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..dbbe90d049c4239e5fa5b075df84685d98cc91ab
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareDetDataSet_en.md
@@ -0,0 +1,450 @@
+# How to Prepare Training Data
+## Directory
+- [How to Prepare Training Data](#how-to-prepare-training-data)
+  - [Directory](#directory)
+  - [Description of Object Detection Data](#description-of-object-detection-data)
+  - [Prepare Training Data](#prepare-training-data)
+    - [VOC Data](#voc-data)
+      - [VOC Dataset Download](#voc-dataset-download)
+      - [Introduction to VOC Data Annotation Files](#introduction-to-voc-data-annotation-files)
+    - [COCO Data](#coco-data)
+      - [COCO Data Download](#coco-data-download)
+      - [Description of COCO Data Annotation](#description-of-coco-data-annotation)
+    - [User Data](#user-data)
+      - [Convert User Data to VOC Data](#convert-user-data-to-voc-data)
+      - [Convert User Data to COCO Data](#convert-user-data-to-coco-data)
+      - [Custom Reader for User-Defined Data](#custom-reader-for-user-defined-data)
+      - [Example of User Data Conversion](#example-of-user-data-conversion)
+
+### Description of Object Detection Data
+Object detection data is more complex than classification data: in an image, it is necessary to mark the position and category of every object.
+
+The position of an object is generally represented by a rectangular box, which is commonly expressed in the following three ways
+
+| Expression | Explanation |
+| :---------: | :----------------------------------------------------------------------------: |
+| x1,y1,x2,y2 | (x1,y1) is the top-left coordinate, (x2,y2) is the bottom-right coordinate |
+| x1,y1,w,h | (x1,y1) is the top-left coordinate, w is the object width, h is the object height |
+| xc,yc,w,h | (xc,yc) is the object center, w is the object width, h is the object height |
+
+Common object detection datasets such as Pascal VOC adopt `[x1,y1,x2,y2]` to express the bounding box of an object, while COCO uses `[x1,y1,w,h]` (see the [COCO format](https://cocodataset.org/#format-data)).
+
+### Prepare Training Data
+PaddleDetection supports the [COCO](http://cocodataset.org), [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) datasets by default.
+
+It also supports custom data sources, including:
+
+(1) Convert custom data to VOC format;
+(2) Convert custom data to COCO format;
+(3) Customize a new data source and add a custom reader;
+
+First, enter the `PaddleDetection` root directory
+
+```
+cd PaddleDetection/
+ppdet_root=$(pwd)
+```
+
+#### VOC Data
+
+VOC data is the data used in the [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) competition. The Pascal VOC competition contains not only an image classification task but also object detection, object segmentation and other tasks, and the annotation file contains the ground truth of multiple tasks.
+The VOC dataset refers to the data used in the Pascal VOC competition. When customizing VOC data, decide for each non-mandatory field in the XML file whether to annotate it or use the default value, according to your actual situation.
+
+##### VOC Dataset Download
+
+- Download the VOC datasets through code automation. The datasets are large and take a long time to download
+
+  ```
+  # Run the script to download the VOC dataset automatically
+  python dataset/voc/download_voc.py
+  ```
+
+  After the script finishes, the VOC dataset files are organized as:
+  ```
+  >>cd dataset/voc/
+  >>tree
+  ├── create_list.py
+  ├── download_voc.py
+  ├── generic_det_label_list.txt
+  ├── generic_det_label_list_zh.txt
+  ├── label_list.txt
+  ├── VOCdevkit/VOC2007
+  │   ├── Annotations
+  │       ├── 001789.xml
+  │       |   ...
+  │   ├── JPEGImages
+  │       ├── 001789.jpg
+  │       |   ...
+  │   ├── ImageSets
+  │   |   ...
+  ├── VOCdevkit/VOC2012
+  │   ├── Annotations
+  │       ├── 2011_003876.xml
+  │       |   ...
+  │   ├── JPEGImages
+  │       ├── 2011_003876.jpg
+  │       |   ...
+  │   ├── ImageSets
+  │   |   ...
+  |   ...
+  ```
+
+  Description of each file
+  ```
+  # label_list.txt is the list of class names; the file name must be label_list.txt. For the VOC dataset, if `use_default_label=true` is set in the config file, this file is not required.
+
+  >>cat label_list.txt
+  aeroplane
+  bicycle
+  ...
+
+  # trainval.txt is the file list of the training set
+  >>cat trainval.txt
+  VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
+  VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
+  ...
+
+  # test.txt is the file list of the test set
+  >>cat test.txt
+  VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
+  ...
+
+  # label_list.txt is the list of VOC class names
+  >>cat label_list.txt
+  aeroplane
+  bicycle
+  ...
+  ```
+- If the VOC dataset has already been downloaded
+  You can simply organize the files according to the directory structure above.
+
+##### Introduction to VOC Data Annotation Files
+
+In the VOC dataset, each image file corresponds to an XML file with the same name; the XML file records the coordinates, categories and other information of the marked object boxes. Take `2007_002055.jpg` as an example:
+![](../images/2007_002055.jpg)
+
+The XML file corresponding to the image contains its basic information, such as the file name, source, image size, and the object regions and categories contained in the image.
+
+The XML file contains the following fields:
+- filename, indicating the image name.
+- size, indicating the image size, including image width, image height and image depth:
+  ```
+  <size>
+      <width>500</width>
+      <height>375</height>
+      <depth>3</depth>
+  </size>
+  ```
+- object field, one entry per object, including:
+
+  | Label | Explanation |
+  | :--------------: | :------------------------------------------------------------------------------------------------------------------------: |
+  | name | name of the object class |
+  | pose | pose description of the target object (optional field) |
+  | truncated | if more than 15-20% of the object is occluded and lies outside the bounding box, mark it as `truncated` (optional field) |
+  | difficult | objects that are difficult to recognize are marked as `difficult` (optional field) |
+  | bndbox sub-tags | (xmin,ymin): top-left coordinate, (xmax,ymax): bottom-right coordinate |
+
+
+#### COCO Data
+COCO data is the data used in the [COCO](http://cocodataset.org) competition. Likewise, the COCO competition contains multiple tasks, and its annotation files contain the annotations of multiple tasks.
+The COCO dataset refers to the data used in the COCO competition. When customizing COCO data, decide for each field in the JSON file whether to annotate it or use the default value, according to your actual situation.
+
+
+##### COCO Data Download
+- The COCO dataset is downloaded automatically through code. The dataset is large and takes a long time to download
+
+  ```
+  # Run the script to download the COCO dataset automatically
+  python dataset/coco/download_coco.py
+  ```
+
+  After the script finishes, the COCO dataset files are organized as:
+  ```
+  >>cd dataset/coco/
+  >>tree
+  ├── annotations
+  │   ├── instances_train2017.json
+  │   ├── instances_val2017.json
+  │   |   ...
+  ├── train2017
+  │   ├── 000000000009.jpg
+  │   ├── 000000580008.jpg
+  │   |   ...
+  ├── val2017
+  │   ├── 000000000139.jpg
+  │   ├── 000000000285.jpg
+  │   |   ...
+  |   ...
+  ```
+- If the COCO dataset has already been downloaded
+  You can simply organize the files according to the directory structure above.
+
+##### Description of COCO Data Annotation
+COCO annotations store the annotations of all training images in a single JSON file, organized as nested dictionaries.
+
+The JSON file contains the following keys:
+- info, indicating the info of the annotation file.
+- licenses, indicating the licenses of the annotation file.
+- images, indicating the list of image information in the annotation file; each element is the information of one image. The following is the information of one of the images:
+  ```
+  {
+      'license': 3,                    # license
+      'file_name': '000000391895.jpg', # file_name
+      # coco_url
+      'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg',
+      'height': 360,                   # image height
+      'width': 640,                    # image width
+      'date_captured': '2013-11-14 11:18:45', # date_captured
+      # flickr_url
+      'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
+      'id': 391895                     # image id
+  }
+  ```
+- annotations, indicating the list of object annotations in the annotation file; each element is the annotation of one object. The following is the annotation of one of the objects:
+  ```
+  {
+
+      'segmentation':                  # object segmentation annotation
+      'area': 2765.1486500000005,      # object area
+      'iscrowd': 0,                    # iscrowd
+      'image_id': 558840,              # image id
+      'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h]
+      'category_id': 58,               # category_id
+      'id': 156                        # annotation id
+  }
+  ```
+
+  ```
+  # Viewing the COCO annotation file
+  import json
+  coco_anno = json.load(open('./annotations/instances_train2017.json'))
+
+  # coco_anno.keys
+  print('\nkeys:', coco_anno.keys())
+
+  # Viewing the category information
+  print('\ncategories:', coco_anno['categories'])
+
+  # Viewing the number of images
+  print('\nthe number of images:', len(coco_anno['images']))
+
+  # Viewing the number of objects
+  print('\nthe number of annotations:', len(coco_anno['annotations']))
+
+  # Viewing one object annotation
+  print('\nobject annotation information: ', coco_anno['annotations'][0])
+  ```
+
+  The COCO data can also be prepared manually; the initial file organization of `dataset/coco/` is:
+  ```
+  >>cd dataset/coco/
+  >>tree
+  ├── download_coco.py
+  ```
+
+#### User Data
+There are three ways to process user data:
+(1) Convert the user data into VOC data (keep only the labels required for object detection, as needed)
+(2) Convert the user data into COCO data (keep only the labels required for object detection, as needed)
+(3) Customize a reader for the user data (for complex data that needs a custom reader)
+
+##### Convert User Data to VOC Data
+After the user dataset is converted to VOC data, the directory structure is as follows (avoid Chinese characters in path and file names in the dataset, to prevent errors caused by encoding issues):
+
+```
+dataset/xxx/
+├── annotations
+│   ├── xxx1.xml
+│   ├── xxx2.xml
+│   ├── xxx3.xml
+│   |   ...
+├── images
+│   ├── xxx1.jpg
+│   ├── xxx2.jpg
+│   ├── xxx3.jpg
+│   |   ...
+├── label_list.txt (must be provided, and the file name must be label_list.txt)
+├── train.txt (file list of the training set: ./images/xxx1.jpg ./annotations/xxx1.xml)
+└── valid.txt (file list of the validation set)
+```
+
+Description of each file
+```
+# label_list.txt is the list of class names; the file name must be exactly this
+>>cat label_list.txt
+classname1
+classname2
+...
+
+# train.txt is the file list of the training set
+>>cat train.txt
+./images/xxx1.jpg ./annotations/xxx1.xml
+./images/xxx2.jpg ./annotations/xxx2.xml
+...
+
+# valid.txt is the file list of the validation set
+>>cat valid.txt
+./images/xxx3.jpg ./annotations/xxx3.xml
+...
+```
+
+##### Convert User Data to COCO Data
+`x2coco.py` is provided in `./tools/` to convert VOC datasets, labelme-annotated datasets or Cityscapes datasets into COCO data, for example:
+
+(1) Conversion of labelme data to COCO data:
+```bash
+python tools/x2coco.py \
+        --dataset_type labelme \
+        --json_input_dir ./labelme_annos/ \
+        --image_input_dir ./labelme_imgs/ \
+        --output_dir ./cocome/ \
+        --train_proportion 0.8 \
+        --val_proportion 0.2 \
+        --test_proportion 0.0
+```
+(2) Convert VOC data to COCO data:
+```bash
+python tools/x2coco.py \
+        --dataset_type voc \
+        --voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \
+        --voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \
+        --voc_label_list dataset/voc/label_list.txt \
+        --voc_out_name voc_train.json
+```
+
+After the user dataset is converted to COCO data, the directory structure is as follows (avoid Chinese characters in path and file names in the dataset, to prevent errors caused by encoding issues):
+```
+dataset/xxx/
+├── annotations
+│   ├── train.json # COCO annotation file
+│   ├── valid.json # COCO annotation file
+├── images
+│   ├── xxx1.jpg
+│   ├── xxx2.jpg
+│   ├── xxx3.jpg
+│   |   ...
+...
+```
+
+##### Custom Reader for User-Defined Data
+If a new data source needs to be added to PaddleDetection, you can refer to the [add new data source](../advanced_tutorials/READER.md#2.3_Customizing_Dataset) section of the data processing documentation and develop the corresponding code to support the new data source. For a detailed walkthrough of the data processing code, see the [data processing documentation](../advanced_tutorials/READER.md).
+
+The configuration file for the Dataset lives in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows:
+```
+metric: COCO  # currently supports COCO, VOC, OID, WiderFace and other evaluation standards
+num_classes: 80  # number of classes in the dataset, excluding the background class
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017  # path of the training set images, relative to dataset_dir
+    anno_path: annotations/instances_train2017.json  # path of the training set annotation file, relative to dataset_dir
+    dataset_dir: dataset/coco  # path of the dataset, relative to the PaddleDetection path
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']  # controls the fields contained in the samples output by the dataset; note data_fields is specific to the training Reader and must be configured
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017  # path of the validation set images, relative to dataset_dir
+    anno_path: annotations/instances_val2017.json  # path of the validation set annotation file, relative to dataset_dir
+    dataset_dir: dataset/coco  # path of the dataset, relative to the PaddleDetection path
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/instances_val2017.json  # path of the annotation file, used only to read the class information of the dataset; JSON and TXT formats are supported
+    dataset_dir: dataset/coco  # path of the dataset; if this line is present, `anno_path` is interpreted as `dataset_dir/anno_path`; if it is not set or removed, `anno_path` is used as-is
+```
+In PaddleDetection YAML configuration files, `!` directly serializes a module instance (functions, instances, etc.); the configuration files above all use it to serialize Dataset objects.
+
+**Note:**
+Please carefully check the dataset paths in the configuration before running. During training or validation, if the TrainDataset or EvalDataset path is wrong, the dataset will be downloaded automatically. When using a user-defined dataset, if the TestDataset path is incorrectly configured at inference time, the class information of the default COCO dataset will be used.
+
+
+#### Example of User Data Conversion
+Take the [Kaggle Dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example of how to prepare custom data. The Kaggle [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data contains 877 images in four categories: crosswalk, speedlimit, stop, trafficlight. It is available from Kaggle, and also from this [link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
+Example image of the road sign dataset:
+![](../images/road554.png)
+
+```
+# Downloading and unzipping the data
+>>cd $(ppdet_root)/dataset
+# Download and unzip the Kaggle dataset; the current file organization is as follows
+
+├── annotations
+│   ├── road0.xml
+│   ├── road1.xml
+│   ├── road10.xml
+│   |   ...
+├── images
+│   ├── road0.jpg
+│   ├── road1.jpg
+│   ├── road2.jpg
+│   |   ...
+```
+
+Split the data into a training set and a test set
+```
+# Generating label_list.txt
+>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt
+
+# Generating train.txt, valid.txt and test.txt
+>>ls images/*.png | shuf > all_image_list.txt
+>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt
+
+# The training, validation and test sets take roughly 80%, 10% and 10% respectively.
+>>head -n 88 all_list.txt > test.txt
+>>head -n 176 all_list.txt | tail -n 88 > valid.txt
+>>tail -n 701 all_list.txt > train.txt
+
+# Deleting unused files
+>>rm -rf all_image_list.txt all_list.txt
+
+The organization structure of the final dataset is:
+
+├── annotations
+│   ├── road0.xml
+│   ├── road1.xml
+│   ├── road10.xml
+│   |   ...
+├── images
+│   ├── road0.jpg
+│   ├── road1.jpg
+│   ├── road2.jpg
+│   |   ...
+├── label_list.txt
+├── test.txt
+├── train.txt
+└── valid.txt
+
+# label_list.txt is the list of class names; the file name must be label_list.txt
+>>cat label_list.txt
+speedlimit
+crosswalk
+trafficlight
+stop
+
+# train.txt is the file list of the training set; each line is an image path and the corresponding annotation file path, separated by a space. Note that the paths are relative paths inside the dataset folder.
+>>cat train.txt
+./images/road839.png ./annotations/road839.xml
+./images/road363.png ./annotations/road363.xml
+...
+
+# valid.txt is the file list of the validation set; each line is an image path and the corresponding annotation file path, separated by a space. Note that the paths are relative paths inside the dataset folder.
+>>cat valid.txt
+./images/road218.png ./annotations/road218.xml
+./images/road681.png ./annotations/road681.xml
+```
+
+You can also download [the prepared data](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) and unzip it to `dataset/roadsign_voc/`.
+After preparing the data, it is good practice to get familiar with it, such as the number of images, image sizes, the number of target regions per class, and target region sizes. If necessary, clean the data.
+
+Roadsign dataset statistics:
+
+| data | number of images |
+| :---: | :--------------: |
+| train | 701 |
+| valid | 176 |
+
+**Explanation:**
+(1) For user data, it is recommended to check the data carefully before training, to avoid crashes during training caused by wrong annotation formats or corrupt/incomplete image data
+(2) If the images are very large and the input size is not limited, memory usage will be high and may cause memory/GPU memory overflow; please set `batch_size` reasonably, trying small values first and then increasing
diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/PrepareKeypointDataSet.md b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareKeypointDataSet.md
new file mode 100644
index 0000000000000000000000000000000000000000..27d844c03482047dfa47db1985b10fecef9ee74b
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareKeypointDataSet.md
@@ -0,0 +1,176 @@
+Simplified Chinese | [English](PrepareKeypointDataSet_en.md)
+
+# Keypoint Data Preparation
+## Contents
+- [COCO Dataset](#coco-dataset)
+- [MPII Dataset](#mpii-dataset)
+- [Preparing User Data](#preparing-user-data)
+  - [Data Format Conversion](#data-format-conversion)
+  - [Training on Custom Data](#training-on-custom-data)
+
+## COCO Dataset
+### Preparing the COCO Dataset
+We provide a one-click script that automatically downloads and prepares the COCO2017 dataset; see [COCO dataset download](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDetDataSet.md#COCO%E6%95%B0%E6%8D%AE).
+
+### Description of the COCO Dataset (Keypoint)
+In COCO, the keypoint indexes correspond to body parts as follows:
+```
+COCO keypoint indexes:
+    0: 'nose',
+    1: 'left_eye',
+    2: 'right_eye',
+    3: 'left_ear',
+    4: 'right_ear',
+    5: 'left_shoulder',
+    6: 'right_shoulder',
+    7: 'left_elbow',
+    8: 'right_elbow',
+    9: 'left_wrist',
+    10: 'right_wrist',
+    11: 'left_hip',
+    12: 'right_hip',
+    13: 'left_knee',
+    14: 'right_knee',
+    15: 'left_ankle',
+    16: 'right_ankle'
+```
+Unlike the detection task, the annotation files of the keypoint task are the two JSON files `person_keypoints_train2017.json` and `person_keypoints_val2017.json`. The `info`, `licenses` and `images` fields in the JSON files have the same meaning as in detection, while `annotations` and `categories` are different.
+The `categories` field gives, in addition to the category, the names of the keypoints and the connectivity between them.
+The `annotations` field annotates the ID and image of each instance, together with segmentation and keypoint information. The fields most relevant to keypoints are:
+- `keypoints`: `[x1,y1,v1 ...]`, a list of length 17*3=51, where each triple gives the coordinates and visibility of one keypoint; `v=0, x=0, y=0` means the point is not labeled (and not visible), `v=1` means the point is labeled but not visible, and `v=2` means the point is labeled and visible.
+- `bbox`: `[x1,y1,w,h]`, the detection box of the instance.
+- `num_keypoints`: the number of labeled keypoints of the instance.
+
+
+## MPII Dataset
+### Preparing the MPII Dataset
+Please download the MPII images and the corresponding annotation files from the [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/#download) and place them under `dataset/mpii`. For the annotations you can use [mpii_annotations](https://download.openmmlab.com/mmpose/datasets/mpii_annotations.tar), which have already been converted to JSON format. The resulting directory structure is:
+```
+mpii
+|── annotations
+|   |── mpii_gt_val.mat
+|   |── mpii_test.json
+|   |── mpii_train.json
+|   |── mpii_trainval.json
+|   `── mpii_val.json
+`── images
+    |── 000001163.jpg
+    |── 000003072.jpg
+```
+### Description of the MPII Dataset
+In MPII, the keypoint indexes correspond to body parts as follows:
+```
+MPII keypoint indexes:
+    0: 'right_ankle',
+    1: 'right_knee',
+    2: 'right_hip',
+    3: 'left_hip',
+    4: 'left_knee',
+    5: 'left_ankle',
+    6: 'pelvis',
+    7: 'thorax',
+    8: 'upper_neck',
+    9: 'head_top',
+    10: 'right_wrist',
+    11: 'right_elbow',
+    12: 'right_shoulder',
+    13: 'left_shoulder',
+    14: 'left_elbow',
+    15: 'left_wrist',
+```
+The following parsed annotation record illustrates the annotation content; each record annotates one person instance:
+```
+{
+    'joints_vis': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
+    'gt_joints': [
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [1232.0, 288.0],
+        [1236.1271, 311.7755],
+        [1181.8729, -0.77553],
+        [692.0, 464.0],
+        [902.0, 417.0],
+        [1059.0, 247.0],
+        [1405.0, 329.0],
+        [1498.0, 613.0],
+        [1303.0, 562.0]
+    ],
+    'image': '077096718.jpg',
+    'scale': 9.516749,
+    'center': [1257.0, 297.0]
+}
+```
+- `joints_vis`: whether each of the 16 keypoints is labeled; if 0, the coordinates of the corresponding index are also `[-1.0, -1.0]`.
+- `gt_joints`: the coordinates of the 16 keypoints.
+- `image`: the corresponding image file.
+- `center`: the approximate coordinates of the person, used to locate the person in the image.
+- `scale`: the scale of the person, with respect to 200px.
+
+
+## Preparing User Data
+
+### Data Format Conversion
+
+Here we take the `AI Challenger` dataset as an example to show how to align another dataset with the COCO format and add it to keypoint model training.
+
+
+The annotation format of `AI Challenger` is:
+```
+AI Challenger Description:
+    0: 'Right Shoulder',
+    1: 'Right Elbow',
+    2: 'Right Wrist',
+    3: 'Left Shoulder',
+    4: 'Left Elbow',
+    5: 'Left Wrist',
+    6: 'Right Hip',
+    7: 'Right Knee',
+    8: 'Right Ankle',
+    9: 'Left Hip',
+    10: 'Left Knee',
+    11: 'Left Ankle',
+    12: 'Head top',
+    13: 'Neck'
+```
+1. Adjust the `AI Challenger` keypoint indexes to be consistent with the `COCO` dataset (e.g. the index of `Right Shoulder` changes from `0` to `6`).
+2. Unify the labeled/visible flags, e.g. `labeled and visible` in `AI Challenger` changes from `1` to `2`.
+3. In this process, discard the keypoints specific to this dataset (such as `Neck`); for COCO keypoints missing from this dataset (such as `left_eye`), set `v=0, x=0, y=0` to indicate that the point is not labeled.
+4. To avoid duplicate IDs across datasets, the `image_id` and `annotation id` need to be rearranged.
+5. Rewrite the image path `file_name` so that the images can be accessed correctly.
+
+We provide an [annotation file](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json) that merges the `COCO` training set and the `AI Challenger` dataset, as a reference for the result after conversion.
+
+### Training on Custom Data
+
+Take [tinypose_256x192](../../../configs/keypoint/tiny_pose/README.md) as an example of what to modify for custom data:
+
+#### 1. Configuration file [tinypose_256x192.yml](../../../configs/keypoint/tiny_pose/tinypose_256x192.yml)
+
+The basic modifications and their meanings are:
+
+```
+num_joints: &num_joints 17    # number of keypoints in the custom data
+train_height: &train_height 256    # training image size: height h
+train_width: &train_width 192    # training image size: width w
+hmsize: &hmsize [48, 64]    # output size for the training size, here 1/4 of the input [w, h]
+flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]    # left-right symmetric keypoint pairs, used for flip augmentation. If there is no symmetric structure, add a line "flip: False" after flip_pairs in the RandomFlipHalfBodyTransform entry of TrainReader (mind the indentation)
+num_joints_half_body: 8    # number of half-body keypoints, used for half-body augmentation
+prob_half_body: 0.3    # probability of half-body augmentation; set it to 0 if not needed
+upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # keypoint ids of the upper body, used in half-body augmentation to fetch the upper-body keypoints
+```
+
+The above covers the modifications needed for custom data; for the complete configuration and its meaning, see the [keypoint configuration file guide](../KeyPointConfigGuide_cn.md).
+
+#### 2. Other code changes (affecting testing and visualization)
+- `sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0` in keypoint_utils.py gives the uncertainty variance of each keypoint; set it according to the reliable region of each keypoint: roughly 0.25-0.5 for precisely localized parts such as the eyes, 0.5-1.0 for parts with large regions such as the shoulders, and 0.75 if unsure.
+- `EDGES` in the `draw_pose` function of visualizer.py defines the connections between keypoints for visualization.
+- `sigmas` in pycocotools: the same as the setting in keypoint_utils.py above, used when computing the COCO evaluation metrics.
+
+#### 3. Notes on data preparation
+- Process the training data in the COCO data format; it must include keypoint [Nx3] and detection box [N] annotations.
+- Make sure area>0; samples with area=0 are filtered out during training. Moreover, due to the COCO evaluation protocol, samples with small area are also filtered out during evaluation; for custom data we recommend setting `area = bbox_w * bbox_h` (see the sketch below).
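+
+As a minimal illustration of the last point (a sketch, not part of the official tools), setting `area` from the bounding box when building a COCO-style annotation dict:
+
+```python
+# ann is one COCO-style annotation dict with 'bbox' = [x, y, w, h]
+x, y, w, h = ann['bbox']
+ann['area'] = w * h  # area = bbox_w * bbox_h; keep area > 0, otherwise the sample is filtered out
+```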
diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/PrepareKeypointDataSet_en.md b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareKeypointDataSet_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..6ed566d171a9fa6888ff2caaa3a4df521a97ebfa
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareKeypointDataSet_en.md
@@ -0,0 +1,142 @@
+[简体中文](PrepareKeypointDataSet.md) | English
+
+# How to Prepare the Keypoint Dataset
+## Table of Contents
+- [COCO](#COCO)
+- [MPII](#MPII)
+- [Training for other dataset](#Training_for_other_dataset)
+
+## COCO
+### Preparation for the COCO dataset
+We provide a one-click script to automatically complete the download and preparation of the COCO2017 dataset. Please refer to [COCO Download](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDetDataSet_en.md#COCO%E6%95%B0%E6%8D%AE).
+
+### Description of the COCO dataset (Keypoint)
+In COCO, the indexes and corresponding keypoint names are:
+```
+COCO keypoint indexes:
+    0: 'nose',
+    1: 'left_eye',
+    2: 'right_eye',
+    3: 'left_ear',
+    4: 'right_ear',
+    5: 'left_shoulder',
+    6: 'right_shoulder',
+    7: 'left_elbow',
+    8: 'right_elbow',
+    9: 'left_wrist',
+    10: 'right_wrist',
+    11: 'left_hip',
+    12: 'right_hip',
+    13: 'left_knee',
+    14: 'right_knee',
+    15: 'left_ankle',
+    16: 'right_ankle'
+```
+Unlike the detection task, the annotation files for the keypoint task are `person_keypoints_train2017.json` and `person_keypoints_val2017.json`. In these two JSON files, the terms `info`, `licenses` and `images` are the same as in the detection task, while `annotations` and `categories` are different.
+
+In `categories`, in addition to the category, there are also the names of the keypoints and the connectivity among them.
+
+In `annotations`, the ID and image of each instance are annotated, as well as segmentation information and keypoint information. Among them, the terms related to the keypoints are:
+- `keypoints`: `[x1,y1,v1 ...]`, a `List` of length 17*3=51, where each triple represents the coordinates and visibility of one keypoint. `v=0, x=0, y=0` indicates this keypoint is not labeled (and not visible), `v=1` indicates this keypoint is labeled but not visible, and `v=2` indicates this keypoint is labeled and visible.
+- `bbox`: `[x1,y1,w,h]`, the bounding box of this instance.
+- `num_keypoints`: the number of labeled keypoints of this instance.
+
+
+## MPII
+### Preparation for the MPII dataset
+Please download the MPII dataset images and corresponding annotation files from the [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/#download), and save them to `dataset/mpii`. You can use [mpii_annotations](https://download.openmmlab.com/mmpose/datasets/mpii_annotations.tar), which are already converted to `.json`.
+The directory structure will be:
+```
+mpii
+|── annotations
+|   |── mpii_gt_val.mat
+|   |── mpii_test.json
+|   |── mpii_train.json
+|   |── mpii_trainval.json
+|   `── mpii_val.json
+`── images
+    |── 000001163.jpg
+    |── 000003072.jpg
+```
+### Description of the MPII dataset
+In MPII, the indexes and corresponding keypoint names are:
+```
+MPII keypoint indexes:
+    0: 'right_ankle',
+    1: 'right_knee',
+    2: 'right_hip',
+    3: 'left_hip',
+    4: 'left_knee',
+    5: 'left_ankle',
+    6: 'pelvis',
+    7: 'thorax',
+    8: 'upper_neck',
+    9: 'head_top',
+    10: 'right_wrist',
+    11: 'right_elbow',
+    12: 'right_shoulder',
+    13: 'left_shoulder',
+    14: 'left_elbow',
+    15: 'left_wrist',
+```
+The following parsed annotation record illustrates the annotation content; each record represents one person instance:
+```
+{
+    'joints_vis': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
+    'gt_joints': [
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [1232.0, 288.0],
+        [1236.1271, 311.7755],
+        [1181.8729, -0.77553],
+        [692.0, 464.0],
+        [902.0, 417.0],
+        [1059.0, 247.0],
+        [1405.0, 329.0],
+        [1498.0, 613.0],
+        [1303.0, 562.0]
+    ],
+    'image': '077096718.jpg',
+    'scale': 9.516749,
+    'center': [1257.0, 297.0]
+}
+```
+- `joints_vis`: indicates whether each of the 16 keypoints is labeled; if it is 0, the corresponding coordinates are `[-1.0, -1.0]`.
+- `gt_joints`: the coordinates of the 16 keypoints.
+- `image`: the image file this instance belongs to.
+- `center`: the coordinate of the person instance center, used to locate the instance in the image.
+- `scale`: the scale of the instance, with respect to 200px.
+
+
+## Training for other dataset
+Here, we take the `AI Challenger` dataset as an example to show how to align other datasets with `COCO` and add them to the training of keypoint models.
+
+In `AI Challenger`, the indexes and corresponding keypoint names are:
+```
+AI Challenger Description:
+    0: 'Right Shoulder',
+    1: 'Right Elbow',
+    2: 'Right Wrist',
+    3: 'Left Shoulder',
+    4: 'Left Elbow',
+    5: 'Left Wrist',
+    6: 'Right Hip',
+    7: 'Right Knee',
+    8: 'Right Ankle',
+    9: 'Left Hip',
+    10: 'Left Knee',
+    11: 'Left Ankle',
+    12: 'Head top',
+    13: 'Neck'
+```
+1. Align the indexes of the `AI Challenger` keypoints to be consistent with `COCO`. For example, the index of `Right Shoulder` should be adjusted from `0` to `6`.
+2. Unify the flags for whether a keypoint is labeled/visible. For example, `labeled and visible` in `AI Challenger` needs to be adjusted from `1` to `2`.
+3. In this preprocessing, we discard the keypoints unique to this dataset (like `Neck`). For keypoints not in this dataset but present in `COCO` (like `left_eye`), we set `v=0, x=0, y=0` to indicate that these keypoints are not labeled.
+4. To avoid the problem of ID duplication across different datasets, the `image_id` and `annotation id` need to be rearranged.
+5. Rewrite the image path `file_name` to make sure images can be accessed correctly.
+
+We also provide an [annotation file](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json) combining the `COCO` trainset and the `AI Challenger` trainset.
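+
+For illustration, here is a minimal Python sketch of steps 1-3 above (index remapping and visibility-flag conversion). It is not the official conversion script; the mapping table simply encodes the two index lists shown above, and the AI Challenger visibility convention (`1`: labeled and visible, `2`: labeled but not visible, `3`: not labeled) is assumed:
+
+```python
+import numpy as np
+
+# AI Challenger index -> COCO index; 'Head top' (12) and 'Neck' (13) have no
+# COCO counterpart and are dropped (step 3).
+AIC_TO_COCO = {0: 6, 1: 8, 2: 10, 3: 5, 4: 7, 5: 9,
+               6: 12, 7: 14, 8: 16, 9: 11, 10: 13, 11: 15}
+
+def aic_to_coco_keypoints(aic_kpts):
+    """aic_kpts: flat list [x1, y1, v1, ...] of length 14*3=42."""
+    coco = np.zeros((17, 3), dtype=np.float32)  # unmapped COCO points keep v=0, x=0, y=0
+    pts = np.asarray(aic_kpts, dtype=np.float32).reshape(14, 3)
+    for aic_idx, coco_idx in AIC_TO_COCO.items():
+        x, y, v = pts[aic_idx]
+        if v == 1:    # labeled and visible -> COCO v=2
+            coco[coco_idx] = [x, y, 2]
+        elif v == 2:  # labeled but not visible -> COCO v=1
+            coco[coco_idx] = [x, y, 1]
+        # v == 3 (not labeled) keeps the all-zero entry
+    return coco.reshape(-1).tolist()
+```
+
+Rearranging `image_id`/`annotation id` and rewriting `file_name` (steps 4 and 5) can then be done while merging the converted records into the COCO-format JSON.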
diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/PrepareMOTDataSet.md b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareMOTDataSet.md
new file mode 100644
index 0000000000000000000000000000000000000000..37b031f1a85b35f0e8beb774f7520f65215de124
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareMOTDataSet.md
@@ -0,0 +1,302 @@
+Simplified Chinese | [English](PrepareMOTDataSet_en.md)
+
+# Multi-Object Tracking Dataset Preparation
+## Contents
+- [Introduction and Model Selection](#introduction-and-model-selection)
+- [MOT Dataset Preparation](#mot-dataset-preparation)
+  - [SDE Datasets](#sde-datasets)
+  - [JDE Datasets](#jde-datasets)
+- [Custom Dataset Preparation](#custom-dataset-preparation)
+  - [SDE Datasets](#sde-datasets)
+  - [JDE Datasets](#jde-datasets)
+- [Citations](#citations)
+
+## Introduction and Model Selection
+PaddleDetection provides implementations of multiple algorithms in the SDE and JDE families:
+- SDE (Separate Detection and Embedding)
+  - [ByteTrack](../../../configs/mot/bytetrack)
+  - [DeepSORT](../../../configs/mot/deepsort)
+
+- JDE (Joint Detection and Embedding)
+  - [JDE](../../../configs/mot/jde)
+  - [FairMOT](../../../configs/mot/fairmot)
+  - [MCFairMOT](../../../configs/mot/mcfairmot)
+
+**Notes:**
+  - The original papers of the algorithms above all address single-class multi-object tracking; the PaddleDetection team also supports multi-class multi-object tracking with [ByteTrack](./bytetrack) and FairMOT ([MCFairMOT](./mcfairmot));
+  - [DeepSORT](../../../configs/mot/deepsort) and [JDE](../../../configs/mot/jde) only support single-class multi-object tracking;
+  - [DeepSORT](../../../configs/mot/deepsort) requires additional ReID weights to run, while for [ByteTrack](../../../configs/mot/bytetrack) the ReID weights are optional and disabled by default;
+
+
+On model selection, the PaddleDetection team's advice is summarized as follows:
+
+| MOT approach | Representative algorithms | Pipeline | Dataset requirements | Other characteristics |
+| :--------------| :--------------| :------- | :----: | :----: |
+| SDE family | DeepSORT, ByteTrack | Separate: two independent model weights, detection first then ReID (ReID optional) | Detection and ReID data are relatively independent; without ReID, a pure detection dataset suffices | Detection and ReID can be tuned separately; robust; common in AI competitions |
+| JDE family | FairMOT | Joint: a single set of weights performs detection and ReID end to end | Must have both detection and ReID annotations | Detection and ReID are trained jointly; harder to tune; weaker generalization |
+
+**Notes:**
+  - Since data annotation is costly, consider the **dataset requirements** first when selecting a model: if the dataset only has detection box annotations and no ReID annotations, the JDE family cannot be trained, and the SDE family is recommended;
+  - When the detector is accurate enough, the SDE family can also run without ReID weights for long-term association between objects; see [ByteTrack](bytetrack);
+  - Runtime is related to the size and FLOPs of the model weights; in theory, the time cost is `SDE without ReID < JDE < SDE with ReID`;
+
+
+## MOT Dataset Preparation
+The PaddleDetection team provides download links for many public or curated datasets; see the [dataset download summary](../../../configs/mot/DataDownload.md). Users can download and use them on their own.
+
+Following the model selection summary, MOT datasets fall into two categories: datasets with pure detection box annotations, usable only by the SDE family; and datasets with both detection and ReID annotations, usable by both the SDE and JDE families.
+
+### SDE Datasets
+SDE datasets carry pure detection annotations. Custom datasets can be prepared by following the [detection data preparation doc](./PrepareDetDataSet.md).
+
+Take the MOT17 dataset as an example; download it and extract it under the `PaddleDetection/dataset/mot` directory:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
+
+```
+and modify the dataset part of the configuration file as follows:
+```
+num_classes: 1
+
+TrainDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/train_half.json
+    image_dir: images/train
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+    image_dir: images/train
+
+TestDataset:
+  !ImageFolder
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+```
+
+The dataset directory is:
+```
+dataset/mot
+  |——————MOT17
+           |——————annotations
+           |——————images
+```
+
+### JDE Datasets
+JDE datasets carry both detection and ReID annotations. First, download `image_lists.zip` with the following command and extract it under the `PaddleDetection/dataset/mot` directory:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip
+```
+
+Then the public datasets can be downloaded quickly with the following commands, also extracted under the `PaddleDetection/dataset/mot` directory:
+```
+# MIX data, the dataset used by the JDE and FairMOT papers
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip
+```
+The dataset directory is:
+```
+dataset/mot
+  |——————image_lists
+  |        |——————caltech.all
+  |        |——————citypersons.train
+  |        |——————cuhksysu.train
+  |        |——————eth.train
+  |        |——————mot16.train
+  |        |——————mot17.train
+  |        |——————prw.train
+  |——————Caltech
+  |——————Cityscapes
+  |——————CUHKSYSU
+  |——————ETHZ
+  |——————MOT16
+  |——————MOT17
+  |——————PRW
+```
+
+#### JDE Dataset Format
+These related datasets all follow the structure below:
+```
+MOT17
+  |——————images
+  |        └——————train
+  |        └——————test
+  └——————labels_with_ids
+           └——————train
+```
+Annotations of all datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`. In the annotation text, each line describes one bounding box in the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+  - `class` is the class id, starting from `0`; both single class and multiple classes are supported, and for a single class it is always `0`.
+  - `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if the box has no `identity` annotation.
+  - `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the image width/height, so they are floating point numbers between 0 and 1.
+
+
+**Notes:**
+  - The MIX dataset is the dataset used by the original [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) papers, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used jointly for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
+  - The MIX dataset and its sub-datasets are all single-class pedestrian tracking datasets; they can be viewed as pedestrian detection datasets with additional identity annotations.
+  - Vertical-domain models for more scenes, such as vehicle, pedestrian and head tracking, also require the vertical-domain datasets to be processed into the same format as the MIX dataset; see the [dataset download summary](DataDownload.md), [vehicle tracking](vehicle/README_cn.md), [head tracking](headtracking21/README_cn.md) and the more general [pedestrian tracking](pedestrian/README_cn.md).
+  - Custom datasets can be prepared by following the [MOT dataset preparation tutorial](../../docs/tutorials/PrepareMOTDataSet_cn.md).
+
+
+## Custom Dataset Preparation
+
+### SDE Datasets
+If you choose an SDE-family solution, i.e. a custom dataset with pure detection annotations, it can be prepared by following the [detection data preparation doc](./PrepareDetDataSet.md).
+
+### JDE Datasets
+If you choose a JDE-family solution, both detection and ReID annotations are required, in the format of the MOT-17 dataset.
+For standardized training and evaluation, user data needs to be converted into the same directory layout and format as the MOT-17 dataset:
+```
+custom_data
+  |——————images
+  |        └——————test
+  |        └——————train
+  |                 └——————seq1
+  |                 |        └——————gt
+  |                 |        |       └——————gt.txt
+  |                 |        └——————img1
+  |                 |        |       └——————000001.jpg
+  |                 |        |       |——————000002.jpg
+  |                 |        |       └—————— ...
+  |                 |        └——————seqinfo.ini
+  |                 └——————seq2
+  |                 └——————...
+  └——————labels_with_ids
+           └——————train
+                    └——————seq1
+                    |        └——————000001.txt
+                    |        |——————000002.txt
+                    |        └—————— ...
+                    └——————seq2
+                             └—————— ...
+```
+
+##### The images folder
+  - `gt.txt` is the original annotation file; the annotations used for training are in the `labels_with_ids` folder.
+  - `gt.txt` contains the original annotations of all images in the current video; each line describes one bounding box in the following format:
+    ```
+    [frame_id],[identity],[bb_left],[bb_top],[width],[height],[score],[label],[vis_ratio]
+    ```
+  - The `img1` folder contains the images extracted from the video at a certain frame rate.
+  - `seqinfo.ini` is the video information description file, with the following format:
+    ```
+    [Sequence]
+    name=MOT17-02
+    imDir=img1
+    frameRate=30
+    seqLength=600
+    imWidth=1920
+    imHeight=1080
+    imExt=.jpg
+    ```
+
+**Notes:**
+  - `frame_id` is the frame index of the current image
+  - `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances in **the current video or image sequence**), or `-1` if the box has no `identity` annotation.
+  - `bb_left` is the x coordinate of the left edge of the target box
+  - `bb_top` is the y coordinate of the top edge of the target box
+  - `width,height` are the width and height in actual pixels
+  - `score` is a flag for whether the target is taken into account (a value of 0 means the target is ignored in the computation, while 1 marks it as an active instance); `1` by default
+  - `label` is the class label of the target; since only single-class tracking is supported at present, it defaults to `1` (MOT-16 contains other class labels, but they are all treated as the ignore class)
+  - `vis_ratio` is the visibility ratio of the target after being contained or occluded by other targets, a float from 0 to 1; `1` by default
+
+
+##### The labels_with_ids folder
+Annotations of all datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`. In the annotation text, each line describes one bounding box in the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+**Notes:**
+  - `class` is the class id, starting from `0`; both single class and multiple classes are supported, and for a single class it is always `0`.
+  - `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if the box has no `identity` annotation.
+  - `[x_center] [y_center] [width] [height]` are the center coordinates, width and height, normalized by the image width/height, so they are floats between 0 and 1.
+
+The corresponding `labels_with_ids` can be generated with the following script:
+```
+cd dataset/mot
+python gen_labels_MOT.py
+```
+
+
+### Citations
+Caltech:
+```
+@inproceedings{ dollarCVPR09peds,
+       author = "P. Doll\'ar and C. Wojek and B. Schiele and  P. Perona",
+       title = "Pedestrian Detection: A Benchmark",
+       booktitle = "CVPR",
+       month = "June",
+       year = "2009",
+       city = "Miami",
+}
+```
+Citypersons:
+```
+@INPROCEEDINGS{Shanshan2017CVPR,
+  Author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
+  Title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
+  Booktitle = {CVPR},
+  Year = {2017}
+ }
+
+@INPROCEEDINGS{Cordts2016Cityscapes,
+title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
+author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
+booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+year={2016}
+}
+```
+CUHK-SYSU:
+```
+@inproceedings{xiaoli2017joint,
+  title={Joint Detection and Identification Feature Learning for Person Search},
+  author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
+  booktitle={CVPR},
+  year={2017}
+}
+```
+PRW:
+```
+@inproceedings{zheng2017person,
+  title={Person re-identification in the wild},
+  author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={1367--1376},
+  year={2017}
+}
+```
+ETHZ:
+```
+@InProceedings{eth_biwi_00534,
+author = {A. Ess and B. Leibe and K. Schindler and L. van Gool},
+title = {A Mobile Vision System for Robust Multi-Person Tracking},
+booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
+year = {2008},
+month = {June},
+publisher = {IEEE Press},
+keywords = {}
+}
+```
+MOT-16&17:
+```
+@article{milan2016mot16,
+  title={MOT16: A benchmark for multi-object tracking},
+  author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
+  journal={arXiv preprint arXiv:1603.00831},
+  year={2016}
+}
+```
diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/PrepareMOTDataSet_en.md b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareMOTDataSet_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..fc957c636f3ac095e933b02612df00d7ba077262
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/data/PrepareMOTDataSet_en.md
@@ -0,0 +1,229 @@
+English | [简体中文](PrepareMOTDataSet.md)
+
+# Contents
+## Multi-Object Tracking Dataset Preparation
+- [MOT Dataset](#MOT_Dataset)
+- [Dataset Directory](#Dataset_Directory)
+- [Data Format](#Data_Format)
+- [Custom Dataset Preparation](#Custom_Dataset_Preparation)
+- [Citations](#Citations)
+
+### MOT Dataset
+PaddleDetection implements [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) and uses the same training data as they do, named 'MIX', including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used as the mixed dataset for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
+
+**Notes:**
+- Multi-object tracking (MOT) datasets are usually built for single-class tracking. DeepSORT, JDE and FairMOT are single-class MOT models. The 'MIX' dataset and its sub-datasets are also single-class pedestrian tracking datasets; they can be viewed as detection datasets with additional identity ground truth.
+- In order to train models for more scenes, more datasets have been processed into the same format as the MIX dataset. The PaddleDetection team also provides datasets and models for [vehicle tracking](../../configs/mot/vehicle/readme.md), [head tracking](../../configs/mot/headtracking21/readme.md) and the more general [pedestrian tracking](../../configs/mot/pedestrian/readme.md). User-defined datasets can also be prepared by referring to this data preparation doc.
+- The multi-class MOT model is [MCFairMOT](../../configs/mot/mcfairmot/readme_cn.md), and the multi-class dataset is an integrated version of the VisDrone dataset. Please refer to the doc of [MCFairMOT](../../configs/mot/mcfairmot/README.md).
+- The Multi-Target Multi-Camera Tracking (MTMCT) model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) multi-camera vehicle tracking dataset. For the dataset and model, refer to the doc of [MTMCT](../../configs/mot/mtmct/README.md).
+
+### Dataset Directory
+First, download image_lists.zip using the following command, and unzip it into `PaddleDetection/dataset/mot`:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip
+```
+
+Then, download the MIX datasets using the following commands, and unzip them into `PaddleDetection/dataset/mot`:
+```
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip
+wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip
+```
+
+The final directory is:
+```
+dataset/mot
+  |——————image_lists
+  |        |——————caltech.10k.val
+  |        |——————caltech.all
+  |        |——————caltech.train
+  |        |——————caltech.val
+  |        |——————citypersons.train
+  |        |——————citypersons.val
+  |        |——————cuhksysu.train
+  |        |——————cuhksysu.val
+  |        |——————eth.train
+  |        |——————mot16.train
+  |        |——————mot17.train
+  |        |——————prw.train
+  |        |——————prw.val
+  |——————Caltech
+  |——————Cityscapes
+  |——————CUHKSYSU
+  |——————ETHZ
+  |——————MOT16
+  |——————MOT17
+  |——————PRW
+```
+
+### Data Format
+These related datasets have the following structure:
+```
+MOT17
+  |——————images
+  |        └——————train
+  |        └——————test
+  └——————labels_with_ids
+           └——————train
+```
+Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
+
+In the annotation text, each line describes a bounding box in the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+**Notes:**
+- `class` is the class id, starting from `0`; both single class and multi-class are supported, and for a single class it is always `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
+- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the image width/height, so they are floating point numbers ranging from 0 to 1.
+
+
+### Custom Dataset Preparation
+
+In order to standardize training and evaluation, custom data needs to be converted into the same directory layout and format as the MOT-16 dataset:
+```
+custom_data
+  |——————images
+  |        └——————test
+  |        └——————train
+  |                 └——————seq1
+  |                 |        └——————gt
+  |                 |        |       └——————gt.txt
+  |                 |        └——————img1
+  |                 |        |       └——————000001.jpg
+  |                 |        |       |——————000002.jpg
+  |                 |        |       └—————— ...
+  |                 |        └——————seqinfo.ini
+  |                 └——————seq2
+  |                 └——————...
+  └——————labels_with_ids
+           └——————train
+                    └——————seq1
+                    |        └——————000001.txt
+                    |        |——————000002.txt
+                    |        └—————— ...
+                    └——————seq2
+                             └—————— ...
+```
+
+#### images
+- `gt.txt` is the original annotation file of all images extracted from the video.
+- `img1` is the folder of images extracted from the video at a certain frame rate.
+- `seqinfo.ini` is a video information description file in the following format:
+```
+[Sequence]
+name=MOT16-02
+imDir=img1
+frameRate=30
+seqLength=600
+imWidth=1920
+imHeight=1080
+imExt=.jpg
+```
+
+Each line in `gt.txt` describes a bounding box, with the format as follows:
+```
+[frame_id],[identity],[bb_left],[bb_top],[width],[height],[score],[label],[vis_ratio]
+```
+**Notes:**
+- `frame_id` is the current frame id.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances in **this video or image sequence**), or `-1` if this box has no identity annotation.
+- `bb_left` is the x coordinate of the left edge of the target box.
+- `bb_top` is the y coordinate of the top edge of the target box.
+- `width, height` are the pixel width and height.
+- `score` acts as a flag for whether the entry is to be considered. A value of 0 means that this particular instance is ignored in the evaluation, while a value of 1 marks it as active. `1` by default.
+- `label` is the type of the annotated object; use `1` as the default because only single-class multi-object tracking is supported now. There are other classes of objects in MOT-16, but they are treated as ignore.
+- `vis_ratio` is the visibility ratio of each bounding box, e.g. due to occlusion by another static or moving object, or due to image border cropping. `1` by default.
+
+#### labels_with_ids
+Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
+
+In the annotation text, each line describes a bounding box in the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+**Notes:**
+- `class` is the class id, starting from `0`; both single class and multi-class are supported, and for a single class it is always `0`.
+- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation.
+- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height, normalized by the image width/height, so they are floating point numbers ranging from 0 to 1.
+
+Generate the corresponding `labels_with_ids` with the following command:
+```
+cd dataset/mot
+python gen_labels_MOT.py
+```
+
+
+### Citation
+Caltech:
+```
+@inproceedings{ dollarCVPR09peds,
+       author = "P. Doll\'ar and C. Wojek and B. Schiele and  P. Perona",
+       title = "Pedestrian Detection: A Benchmark",
+       booktitle = "CVPR",
+       month = "June",
+       year = "2009",
+       city = "Miami",
+}
+```
+Citypersons:
+```
+@INPROCEEDINGS{Shanshan2017CVPR,
+  Author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
+  Title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
+  Booktitle = {CVPR},
+  Year = {2017}
+ }
+
+@INPROCEEDINGS{Cordts2016Cityscapes,
+title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
+author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
+booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+year={2016}
+}
+```
+CUHK-SYSU:
+```
+@inproceedings{xiaoli2017joint,
+  title={Joint Detection and Identification Feature Learning for Person Search},
+  author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
+  booktitle={CVPR},
+  year={2017}
+}
+```
+PRW:
+```
+@inproceedings{zheng2017person,
+  title={Person re-identification in the wild},
+  author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={1367--1376},
+  year={2017}
+}
+```
+ETHZ:
+```
+@InProceedings{eth_biwi_00534,
+author = {A. Ess and B. Leibe and K. Schindler and L. van Gool},
+title = {A Mobile Vision System for Robust Multi-Person Tracking},
+booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
+year = {2008},
+month = {June},
+publisher = {IEEE Press},
+keywords = {}
+}
+```
+MOT-16&17:
+```
+@article{milan2016mot16,
+  title={MOT16: A benchmark for multi-object tracking},
+  author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
+  journal={arXiv preprint arXiv:1603.00831},
+  year={2016}
+}
+```
diff --git a/PaddleDetection-release-2.6/docs/tutorials/data/README.md b/PaddleDetection-release-2.6/docs/tutorials/data/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..947b650e18cbc9cf9bb57c8b6600588ed0a6501f
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/data/README.md
@@ -0,0 +1,27 @@
+# Data Preparation
+
+Data plays a vital role in deep learning development, and the quality of data collection and annotation is an important factor in improving the performance of production models. This document introduces how to prepare data in PaddleDetection, covering methods for collecting high-quality data across multiple scene types to improve model generalization, as well as the annotation tools and methods for each task type and how to use the results in PaddleDetection
+
+## Data Collection
+In real-world deep learning projects, data collection often determines the quality of the final model. A few suggestions on data collection:
+
+### Define the Goal
+The task type, the data categories and the target scenes determine what data to collect; first set the overall direction of data collection based on these factors.
+
+### Open-Source Datasets
+In practice, data collection is expensive; collecting everything yourself costs a lot of time and money. Open-source datasets are an important way to enlarge the training data, so data from similar open-source tasks is often added. When using them, please comply with the usage conditions specified by the license of each open-source dataset.
+
+### Add Scene-Specific Data
+Open-source data generally does not cover the actual target scenes. Evaluate the gap between the scenes covered by the open-source datasets and your target scenes, and supplement target-scene data accordingly, so that the training data matches the deployment data as closely as possible.
+
+### Class Balance
+During collection, also try to keep the classes balanced, which helps the model learn the target features correctly.
+
+
+## Data Annotation and Format Description
+
+| Task type | Data annotation | Data format description |
+|:--------:| :--------:|:--------:|
+| Object detection | [doc](DetAnnoTools.md) | [doc](PrepareDetDataSet.md) |
+| Keypoint detection | [doc](KeyPointAnnoTools.md) | [doc](PrepareKeypointDataSet.md) |
+| Multi-object tracking | [doc](MOTAnnoTools.md) | [doc](PrepareMOTDataSet.md) |
diff --git a/PaddleDetection-release-2.6/docs/tutorials/logging_en.md b/PaddleDetection-release-2.6/docs/tutorials/logging_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b45ceba69d39098f70d0b8825d372529ce40cd0b
--- /dev/null
+++ b/PaddleDetection-release-2.6/docs/tutorials/logging_en.md
@@ -0,0 +1,46 @@
+# Logging
+
+This document talks about how to track metrics and visualize model performance during training. The library currently supports [VisualDL](https://www.paddlepaddle.org.cn/documentation/docs/en/guides/03_VisualDL/visualdl_usage_en.html) and [Weights & Biases](https://docs.wandb.ai).
+
+## VisualDL
+Logging to VisualDL is supported only in Python >= 3.5. To install VisualDL
+
+```
+pip install visualdl
+```
+
+PaddleDetection uses a callback to log the training metrics at the end of every step, and the metrics from the validation step at the end of every epoch.
+To use VisualDL for visualization, add the `--use_vdl` flag to the training command, and `--vdl_log_dir` to set the directory that stores the records.
+
+For example
+
+```
+python tools/train.py -c config.yml --use_vdl --vdl_log_dir ./logs
+```
+
+Another possible way to do this is to add the aforementioned flags to the `config.yml` file.
+
+## Weights & Biases
+W&B is an MLOps tool that can be used for experiment tracking, dataset/model versioning, visualizing results and collaborating with colleagues. A W&B logger is integrated directly into PaddleDetection; to use it, first install the wandb sdk and log in to your wandb account.
+
+```
+pip install wandb
+wandb login
+```
+
+To use wandb to log metrics while training, add the `--use_wandb` flag to the training command; any other arguments for the W&B logger can be provided like this -
+
+```
+python tools/train.py -c config.yml --use_wandb -o wandb-project=MyDetector wandb-entity=MyTeam wandb-save_dir=./logs
+```
+
+The arguments to the W&B logger must be preceded by `-o`, and each individual argument must contain the prefix "wandb-".
+
+If this is too tedious, an alternative way is to add the arguments to the `config.yml` file under the `wandb` header. For example
+
+```
+use_wandb: True
+wandb:
+  project: MyProject
+  entity: MyTeam
+  save_dir: ./logs
+```
diff --git a/PaddleDetection-release-2.6/industrial_tutorial/README.md b/PaddleDetection-release-2.6/industrial_tutorial/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..aa68046657d298c6883488b49d5cfd26255c4bcc
--- /dev/null
+++ b/PaddleDetection-release-2.6/industrial_tutorial/README.md
@@ -0,0 +1,45 @@
+# Industrial Practice Examples
+
+The PaddleDetection industrial applications cover the main detection verticals in the general, manufacturing, city and transportation industries. Building on the capabilities of PP-YOLOE, PP-PicoDet, PP-Human and PP-Vehicle, they demonstrate, in notebook form, fine-tuning with scene data, model optimization methods, data augmentation and more, offering developers templates and inspiration for quickly deploying object detection applications.
+
diff --git a/PaddleDetection-release-2.6/industrial_tutorial/README.md b/PaddleDetection-release-2.6/industrial_tutorial/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..aa68046657d298c6883488b49d5cfd26255c4bcc
--- /dev/null
+++ b/PaddleDetection-release-2.6/industrial_tutorial/README.md
@@ -0,0 +1,45 @@
+# Industrial Practice Examples
+
+PaddleDetection's industry applications cover the main detection verticals in the general, manufacturing, smart-city and transportation domains. Built on the capabilities of PP-YOLOE, PP-PicoDet, PP-Human and PP-Vehicle, the notebooks demonstrate fine-tuning on scenario data, model optimization methods, data augmentation and more, providing templates and inspiration for developers to quickly land object detection applications.
+
+Welcome: scan the QR code to join the user discussion and Q&A group
+
+
+## Example List
+
+- [Rotated object detection with PP-YOLOE-R](https://aistudio.baidu.com/aistudio/projectdetail/5058293)
+
+- [Small object detection on drone aerial images with PP-YOLOE-SOD](https://aistudio.baidu.com/aistudio/projectdetail/5036782)
+
+- [Fall detection with PP-Human v2](https://aistudio.baidu.com/aistudio/projectdetail/4606001)
+
+- [Smart fitness action recognition with the enhanced PP-TinyPose](https://aistudio.baidu.com/aistudio/projectdetail/4385813)
+
+- [Fight recognition with PP-Human](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1)
+
+- [Communication tower detection and Android deployment with PP-PicoDet](https://aistudio.baidu.com/aistudio/projectdetail/3561097)
+
+- [Tile surface defect detection with Faster-RCNN](https://aistudio.baidu.com/aistudio/projectdetail/2571419)
+
+- [PCB defect detection with PaddleDetection](https://aistudio.baidu.com/aistudio/projectdetail/2367089)
+
+- [Pedestrian flow counting with FairMOT](https://aistudio.baidu.com/aistudio/projectdetail/2421822)
+
+- [Fall detection with YOLOv3](https://aistudio.baidu.com/aistudio/projectdetail/2500639)
+
+- [Road litter detection with PP-PicoDet v2](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0)
+
+- [Compliance checking based on human keypoint detection](https://aistudio.baidu.com/aistudio/projectdetail/4061642?contributionType=1)
+
+- [Visitor analysis with PP-Human](https://aistudio.baidu.com/aistudio/projectdetail/4537344)
+
+  *More examples are continuously being added.
diff --git a/PaddleDetection-release-2.6/ppdet/__init__.py b/PaddleDetection-release-2.6/ppdet/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..6fcc982fb60c796e6b9b6e23026d50ef0e9611ae
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/__init__.py
@@ -0,0 +1,26 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .
import (core, data, engine, modeling, model_zoo, optimizer, metrics,
+               utils, slim)
+
+
+try:
+    from .version import full_version as __version__
+    from .version import commit as __git_commit__
+except ImportError:
+    import sys
+    sys.stderr.write("Warning: importing ppdet from the source directory " \
+                     "without installing it; run 'python setup.py install' " \
+                     "to install ppdet first\n")
diff --git a/PaddleDetection-release-2.6/ppdet/__pycache__/__init__.cpython-310.pyc b/PaddleDetection-release-2.6/ppdet/__pycache__/__init__.cpython-310.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..c187dc3196855cc36456724f18d2c82f4c7e8689
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/__pycache__/__init__.cpython-310.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/__pycache__/__init__.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..fb5dff90e95b1e90e00047b6d05089c74a928635
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/__pycache__/__init__.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/core/__init__.py b/PaddleDetection-release-2.6/ppdet/core/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..d0427717715c9af31dfa57f1b69f8369fc9178a2
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/core/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .
import config diff --git a/PaddleDetection-release-2.6/ppdet/core/__pycache__/__init__.cpython-310.pyc b/PaddleDetection-release-2.6/ppdet/core/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..875a7e1cf4b8563ccbd9824ff1dbb1932bb06b6a Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/__pycache__/__init__.cpython-310.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/core/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3bd5ef4f557dd2369e1a791c61551e13ecb2249d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/__pycache__/workspace.cpython-310.pyc b/PaddleDetection-release-2.6/ppdet/core/__pycache__/workspace.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..55489cfb525596ff9b1c55718bce4a05397910d2 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/__pycache__/workspace.cpython-310.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/__pycache__/workspace.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/core/__pycache__/workspace.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1973a84655c3930c7e21dffb384dba607d5e3496 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/__pycache__/workspace.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/config/__init__.py b/PaddleDetection-release-2.6/ppdet/core/config/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d0c32e26092f6ea25771279418582a24ea449ab2 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/core/config/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
diff --git a/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/__init__.cpython-310.pyc b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d2c0c00fe4c9d2dfb412b54102194249f1084afe Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/__init__.cpython-310.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..851b50a3753ff7d7c391034d98cfc24e4e00486d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/schema.cpython-310.pyc b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/schema.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..186d3f52dab0149f0cbc2bf53e1ca27d6828af23 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/schema.cpython-310.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/schema.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/schema.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2c1cbd48818ce251149a35fa11433e9816ef4c9a Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/schema.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/yaml_helpers.cpython-310.pyc b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/yaml_helpers.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e8fa9e9942994e3d154f56377b670387f1c23a22 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/yaml_helpers.cpython-310.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/yaml_helpers.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/yaml_helpers.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5c18864704f1d334661a33b087662acaef2de01f Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/core/config/__pycache__/yaml_helpers.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/core/config/schema.py b/PaddleDetection-release-2.6/ppdet/core/config/schema.py new file mode 100644 index 0000000000000000000000000000000000000000..2e41b5c34693a709fa61d47489f6934ead0c17e0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/core/config/schema.py @@ -0,0 +1,248 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
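+
+# This module infers a validation schema for every registered module class:
+# extract_schema() inspects a class's constructor signature, type
+# annotations and docstring, and SchemaDict.validate() checks user-provided
+# config values against the extracted schema.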
+
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import inspect
+import importlib
+import re
+
+try:
+    from docstring_parser import parse as doc_parse
+except Exception:
+
+    def doc_parse(*args):
+        pass
+
+
+try:
+    from typeguard import check_type
+except Exception:
+
+    def check_type(*args):
+        pass
+
+
+__all__ = ['SchemaValue', 'SchemaDict', 'SharedConfig', 'extract_schema']
+
+
+class SchemaValue(object):
+    def __init__(self, name, doc='', type=None):
+        super(SchemaValue, self).__init__()
+        self.name = name
+        self.doc = doc
+        self.type = type
+
+    def set_default(self, value):
+        self.default = value
+
+    def has_default(self):
+        return hasattr(self, 'default')
+
+
+class SchemaDict(dict):
+    def __init__(self, **kwargs):
+        super(SchemaDict, self).__init__()
+        self.schema = {}
+        self.strict = False
+        self.doc = ""
+        self.update(kwargs)
+
+    def __setitem__(self, key, value):
+        # XXX also update regular dict to SchemaDict??
+        if isinstance(value, dict) and key in self and isinstance(self[key],
+                                                                  SchemaDict):
+            self[key].update(value)
+        else:
+            super(SchemaDict, self).__setitem__(key, value)
+
+    def __missing__(self, key):
+        if self.has_default(key):
+            return self.schema[key].default
+        elif key in self.schema:
+            return self.schema[key]
+        else:
+            raise KeyError(key)
+
+    def copy(self):
+        newone = SchemaDict()
+        newone.__dict__.update(self.__dict__)
+        newone.update(self)
+        return newone
+
+    def set_schema(self, key, value):
+        assert isinstance(value, SchemaValue)
+        self.schema[key] = value
+
+    def set_strict(self, strict):
+        self.strict = strict
+
+    def has_default(self, key):
+        return key in self.schema and self.schema[key].has_default()
+
+    def is_default(self, key):
+        if not self.has_default(key):
+            return False
+        if hasattr(self[key], '__dict__'):
+            return True
+        else:
+            return key not in self or self[key] == self.schema[key].default
+
+    def find_default_keys(self):
+        return [
+            k for k in list(self.keys()) + list(self.schema.keys())
+            if self.is_default(k)
+        ]
+
+    def mandatory(self):
+        return any([k for k in self.schema.keys() if not self.has_default(k)])
+
+    def find_missing_keys(self):
+        missing = [
+            k for k in self.schema.keys()
+            if k not in self and not self.has_default(k)
+        ]
+        # '<missing>' and '<value>' are literal placeholder strings that a
+        # config file may still contain
+        placeholders = [k for k in self if self[k] in ('<missing>', '<value>')]
+        return missing + placeholders
+
+    def find_extra_keys(self):
+        return list(set(self.keys()) - set(self.schema.keys()))
+
+    def find_mismatch_keys(self):
+        mismatch_keys = []
+        for arg in self.schema.values():
+            if arg.type is not None:
+                try:
+                    check_type("{}.{}".format(self.name, arg.name),
+                               self[arg.name], arg.type)
+                except Exception:
+                    mismatch_keys.append(arg.name)
+        return mismatch_keys
+
+    def validate(self):
+        missing_keys = self.find_missing_keys()
+        if missing_keys:
+            raise ValueError("Missing param for class<{}>: {}".format(
+                self.name, ", ".join(missing_keys)))
+        extra_keys = self.find_extra_keys()
+        if extra_keys and self.strict:
+            raise ValueError("Extraneous param for class<{}>: {}".format(
+                self.name, ", ".join(extra_keys)))
+        mismatch_keys = self.find_mismatch_keys()
+        if mismatch_keys:
+            raise TypeError("Wrong param type for class<{}>: {}".format(
+                self.name, ", ".join(mismatch_keys)))
+
+
+class SharedConfig(object):
+    """
+    Representation class for `__shared__` annotations, which work as follows:
+
+    - if `key` is set for the module in config file, its value will take
+      precedence
+    - if `key` is not set for the module but present in the config file, its
+      value
will be used + - otherwise, use the provided `default_value` as fallback + + Args: + key: config[key] will be injected + default_value: fallback value + """ + + def __init__(self, key, default_value=None): + super(SharedConfig, self).__init__() + self.key = key + self.default_value = default_value + + +def extract_schema(cls): + """ + Extract schema from a given class + + Args: + cls (type): Class from which to extract. + + Returns: + schema (SchemaDict): Extracted schema. + """ + ctor = cls.__init__ + # python 2 compatibility + if hasattr(inspect, 'getfullargspec'): + argspec = inspect.getfullargspec(ctor) + annotations = argspec.annotations + has_kwargs = argspec.varkw is not None + else: + argspec = inspect.getfullargspec(ctor) + # python 2 type hinting workaround, see pep-3107 + # however, since `typeguard` does not support python 2, type checking + # is still python 3 only for now + annotations = getattr(ctor, '__annotations__', {}) + has_kwargs = argspec.varkw is not None + + names = [arg for arg in argspec.args if arg != 'self'] + defaults = argspec.defaults + num_defaults = argspec.defaults is not None and len(argspec.defaults) or 0 + num_required = len(names) - num_defaults + + docs = cls.__doc__ + if docs is None and getattr(cls, '__category__', None) == 'op': + docs = cls.__call__.__doc__ + try: + docstring = doc_parse(docs) + except Exception: + docstring = None + + if docstring is None: + comments = {} + else: + comments = {} + for p in docstring.params: + match_obj = re.match('^([a-zA-Z_]+[a-zA-Z_0-9]*).*', p.arg_name) + if match_obj is not None: + comments[match_obj.group(1)] = p.description + + schema = SchemaDict() + schema.name = cls.__name__ + schema.doc = "" + if docs is not None: + start_pos = docs[0] == '\n' and 1 or 0 + schema.doc = docs[start_pos:].split("\n")[0].strip() + # XXX handle paddle's weird doc convention + if '**' == schema.doc[:2] and '**' == schema.doc[-2:]: + schema.doc = schema.doc[2:-2].strip() + schema.category = hasattr(cls, '__category__') and getattr( + cls, '__category__') or 'module' + schema.strict = not has_kwargs + schema.pymodule = importlib.import_module(cls.__module__) + schema.inject = getattr(cls, '__inject__', []) + schema.shared = getattr(cls, '__shared__', []) + for idx, name in enumerate(names): + comment = name in comments and comments[name] or name + if name in schema.inject: + type_ = None + else: + type_ = name in annotations and annotations[name] or None + value_schema = SchemaValue(name, comment, type_) + if name in schema.shared: + assert idx >= num_required, "shared config must have default value" + default = defaults[idx - num_required] + value_schema.set_default(SharedConfig(name, default)) + elif idx >= num_required: + default = defaults[idx - num_required] + value_schema.set_default(default) + schema.set_schema(name, value_schema) + + return schema diff --git a/PaddleDetection-release-2.6/ppdet/core/config/yaml_helpers.py b/PaddleDetection-release-2.6/ppdet/core/config/yaml_helpers.py new file mode 100644 index 0000000000000000000000000000000000000000..181cfe6fcd7368c6cadb32d1021a8c55a1d98aa5 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/core/config/yaml_helpers.py @@ -0,0 +1,118 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import importlib +import inspect + +import yaml +from .schema import SharedConfig + +__all__ = ['serializable', 'Callable'] + + +def represent_dictionary_order(self, dict_data): + return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items()) + + +def setup_orderdict(): + from collections import OrderedDict + yaml.add_representer(OrderedDict, represent_dictionary_order) + + +def _make_python_constructor(cls): + def python_constructor(loader, node): + if isinstance(node, yaml.SequenceNode): + args = loader.construct_sequence(node, deep=True) + return cls(*args) + else: + kwargs = loader.construct_mapping(node, deep=True) + try: + return cls(**kwargs) + except Exception as ex: + print("Error when construct {} instance from yaml config". + format(cls.__name__)) + raise ex + + return python_constructor + + +def _make_python_representer(cls): + # python 2 compatibility + if hasattr(inspect, 'getfullargspec'): + argspec = inspect.getfullargspec(cls) + else: + argspec = inspect.getfullargspec(cls.__init__) + argnames = [arg for arg in argspec.args if arg != 'self'] + + def python_representer(dumper, obj): + if argnames: + data = {name: getattr(obj, name) for name in argnames} + else: + data = obj.__dict__ + if '_id' in data: + del data['_id'] + return dumper.represent_mapping(u'!{}'.format(cls.__name__), data) + + return python_representer + + +def serializable(cls): + """ + Add loader and dumper for given class, which must be + "trivially serializable" + + Args: + cls: class to be serialized + + Returns: cls + """ + yaml.add_constructor(u'!{}'.format(cls.__name__), + _make_python_constructor(cls)) + yaml.add_representer(cls, _make_python_representer(cls)) + return cls + + +yaml.add_representer(SharedConfig, + lambda d, o: d.represent_data(o.default_value)) + + +@serializable +class Callable(object): + """ + Helper to be used in Yaml for creating arbitrary class objects + + Args: + full_type (str): the full module path to target function + """ + + def __init__(self, full_type, args=[], kwargs={}): + super(Callable, self).__init__() + self.full_type = full_type + self.args = args + self.kwargs = kwargs + + def __call__(self): + if '.' in self.full_type: + idx = self.full_type.rfind('.') + module = importlib.import_module(self.full_type[:idx]) + func_name = self.full_type[idx + 1:] + else: + try: + module = importlib.import_module('builtins') + except Exception: + module = importlib.import_module('__builtin__') + func_name = self.full_type + + func = getattr(module, func_name) + return func(*self.args, **self.kwargs) diff --git a/PaddleDetection-release-2.6/ppdet/core/workspace.py b/PaddleDetection-release-2.6/ppdet/core/workspace.py new file mode 100644 index 0000000000000000000000000000000000000000..6735bcfc26d426565bf0c4cef50dd100f4c5fd30 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/core/workspace.py @@ -0,0 +1,292 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import print_function +from __future__ import division + +import importlib +import os +import sys + +import yaml +import collections + +try: + collectionsAbc = collections.abc +except AttributeError: + collectionsAbc = collections + +from .config.schema import SchemaDict, SharedConfig, extract_schema +from .config.yaml_helpers import serializable + +__all__ = [ + 'global_config', + 'load_config', + 'merge_config', + 'get_registered_modules', + 'create', + 'register', + 'serializable', + 'dump_value', +] + + +def dump_value(value): + # XXX this is hackish, but collections.abc is not available in python 2 + if hasattr(value, '__dict__') or isinstance(value, (dict, tuple, list)): + value = yaml.dump(value, default_flow_style=True) + value = value.replace('\n', '') + value = value.replace('...', '') + return "'{}'".format(value) + else: + # primitive types + return str(value) + + +class AttrDict(dict): + """Single level attribute dict, NOT recursive""" + + def __init__(self, **kwargs): + super(AttrDict, self).__init__() + super(AttrDict, self).update(kwargs) + + def __getattr__(self, key): + if key in self: + return self[key] + raise AttributeError("object has no attribute '{}'".format(key)) + + def __setattr__(self, key, value): + self[key] = value + + def copy(self): + new_dict = AttrDict() + for k, v in self.items(): + new_dict.update({k: v}) + return new_dict + + +global_config = AttrDict() + +BASE_KEY = '_BASE_' + + +# parse and load _BASE_ recursively +def _load_config_with_base(file_path): + with open(file_path) as f: + file_cfg = yaml.load(f, Loader=yaml.Loader) + + # NOTE: cfgs outside have higher priority than cfgs in _BASE_ + if BASE_KEY in file_cfg: + all_base_cfg = AttrDict() + base_ymls = list(file_cfg[BASE_KEY]) + for base_yml in base_ymls: + if base_yml.startswith("~"): + base_yml = os.path.expanduser(base_yml) + if not base_yml.startswith('/'): + base_yml = os.path.join(os.path.dirname(file_path), base_yml) + + with open(base_yml) as f: + base_cfg = _load_config_with_base(base_yml) + all_base_cfg = merge_config(base_cfg, all_base_cfg) + + del file_cfg[BASE_KEY] + return merge_config(file_cfg, all_base_cfg) + + return file_cfg + + +def load_config(file_path): + """ + Load config from file. + + Args: + file_path (str): Path of the config file to be loaded. + + Returns: global config + """ + _, ext = os.path.splitext(file_path) + assert ext in ['.yml', '.yaml'], "only support yaml files for now" + + # load config from file and merge into global config + cfg = _load_config_with_base(file_path) + cfg['filename'] = os.path.splitext(os.path.split(file_path)[-1])[0] + merge_config(cfg) + + return global_config + + +def dict_merge(dct, merge_dct): + """ Recursive dict merge. Inspired by :meth:``dict.update()``, instead of + updating only top-level keys, dict_merge recurses down into dicts nested + to an arbitrary depth, updating keys. The ``merge_dct`` is merged into + ``dct``. 
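+
+    For example (hypothetical values)::
+
+        dict_merge({'a': {'x': 1}}, {'a': {'y': 2}})
+        # -> {'a': {'x': 1, 'y': 2}}: 'y' is merged in, 'x' is kept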
+ + Args: + dct: dict onto which the merge is executed + merge_dct: dct merged into dct + + Returns: dct + """ + for k, v in merge_dct.items(): + if (k in dct and isinstance(dct[k], dict) and + isinstance(merge_dct[k], collectionsAbc.Mapping)): + dict_merge(dct[k], merge_dct[k]) + else: + dct[k] = merge_dct[k] + return dct + + +def merge_config(config, another_cfg=None): + """ + Merge config into global config or another_cfg. + + Args: + config (dict): Config to be merged. + + Returns: global config + """ + global global_config + dct = another_cfg or global_config + return dict_merge(dct, config) + + +def get_registered_modules(): + return {k: v for k, v in global_config.items() if isinstance(v, SchemaDict)} + + +def make_partial(cls): + op_module = importlib.import_module(cls.__op__.__module__) + op = getattr(op_module, cls.__op__.__name__) + cls.__category__ = getattr(cls, '__category__', None) or 'op' + + def partial_apply(self, *args, **kwargs): + kwargs_ = self.__dict__.copy() + kwargs_.update(kwargs) + return op(*args, **kwargs_) + + if getattr(cls, '__append_doc__', True): # XXX should default to True? + if sys.version_info[0] > 2: + cls.__doc__ = "Wrapper for `{}` OP".format(op.__name__) + cls.__init__.__doc__ = op.__doc__ + cls.__call__ = partial_apply + cls.__call__.__doc__ = op.__doc__ + else: + # XXX work around for python 2 + partial_apply.__doc__ = op.__doc__ + cls.__call__ = partial_apply + return cls + + +def register(cls): + """ + Register a given module class. + + Args: + cls (type): Module class to be registered. + + Returns: cls + """ + if cls.__name__ in global_config: + raise ValueError("Module class already registered: {}".format( + cls.__name__)) + if hasattr(cls, '__op__'): + cls = make_partial(cls) + global_config[cls.__name__] = extract_schema(cls) + return cls + + +def create(cls_or_name, **kwargs): + """ + Create an instance of given module class. + + Args: + cls_or_name (type or str): Class of which to create instance. 
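+        **kwargs: extra keyword arguments; as the function body shows, these
+            are forwarded to the class's `from_config` hook when the class
+            defines one.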
+ + Returns: instance of type `cls_or_name` + """ + assert type(cls_or_name) in [type, str + ], "should be a class or name of a class" + name = type(cls_or_name) == str and cls_or_name or cls_or_name.__name__ + if name in global_config: + if isinstance(global_config[name], SchemaDict): + pass + elif hasattr(global_config[name], "__dict__"): + # support instance return directly + return global_config[name] + else: + raise ValueError("The module {} is not registered".format(name)) + else: + raise ValueError("The module {} is not registered".format(name)) + + config = global_config[name] + cls = getattr(config.pymodule, name) + cls_kwargs = {} + cls_kwargs.update(global_config[name]) + + # parse `shared` annoation of registered modules + if getattr(config, 'shared', None): + for k in config.shared: + target_key = config[k] + shared_conf = config.schema[k].default + assert isinstance(shared_conf, SharedConfig) + if target_key is not None and not isinstance(target_key, + SharedConfig): + continue # value is given for the module + elif shared_conf.key in global_config: + # `key` is present in config + cls_kwargs[k] = global_config[shared_conf.key] + else: + cls_kwargs[k] = shared_conf.default_value + + # parse `inject` annoation of registered modules + if getattr(cls, 'from_config', None): + cls_kwargs.update(cls.from_config(config, **kwargs)) + + if getattr(config, 'inject', None): + for k in config.inject: + target_key = config[k] + # optional dependency + if target_key is None: + continue + + if isinstance(target_key, dict) or hasattr(target_key, '__dict__'): + if 'name' not in target_key.keys(): + continue + inject_name = str(target_key['name']) + if inject_name not in global_config: + raise ValueError( + "Missing injection name {} and check it's name in cfg file". + format(k)) + target = global_config[inject_name] + for i, v in target_key.items(): + if i == 'name': + continue + target[i] = v + if isinstance(target, SchemaDict): + cls_kwargs[k] = create(inject_name) + elif isinstance(target_key, str): + if target_key not in global_config: + raise ValueError("Missing injection config:", target_key) + target = global_config[target_key] + if isinstance(target, SchemaDict): + cls_kwargs[k] = create(target_key) + elif hasattr(target, '__dict__'): # serialized object + cls_kwargs[k] = target + else: + raise ValueError("Unsupported injection type:", target_key) + # prevent modification of global config values of reference types + # (e.g., list, dict) from within the created module instances + #kwargs = copy.deepcopy(kwargs) + return cls(**cls_kwargs) diff --git a/PaddleDetection-release-2.6/ppdet/data/__init__.py b/PaddleDetection-release-2.6/ppdet/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a12aa323e7350d13e9b02ff7816ae6d69ab9044e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/__init__.py @@ -0,0 +1,21 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import source +from . 
import transform +from . import reader + +from .source import * +from .transform import * +from .reader import * diff --git a/PaddleDetection-release-2.6/ppdet/data/__pycache__/__init__.cpython-310.pyc b/PaddleDetection-release-2.6/ppdet/data/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..45141ccc38bc902a2547330e235c4d9d8f33be00 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/__pycache__/__init__.cpython-310.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c271f098e91e3e6552c7d04c3da57bb5f61b4605 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/__pycache__/reader.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/__pycache__/reader.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..cdd84e53b31894bd50e88e1111dc154c8fd0aa7e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/__pycache__/reader.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/__pycache__/shm_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/__pycache__/shm_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..015ef985fb833745945f88fb9dfb969d1095715c Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/__pycache__/shm_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/__pycache__/utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/__pycache__/utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..89b9c0c2af9932e158b0ed1eb9e8122940ddd99d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/__pycache__/utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/crop_utils/__init__.py b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..61d5aa213694a29c4820ead6e2a74123c2df44e8 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
\ No newline at end of file diff --git a/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ca1cc1a01241a2c053f2ba2261d3e3ac8ab4e0e8 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/annotation_cropper.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/annotation_cropper.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..37fac662f971531361a8238e4a6c96e8f3236c97 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/annotation_cropper.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/chip_box_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/chip_box_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..434d7ee10c1af24bd92343f5ed9235ef154fce29 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/crop_utils/__pycache__/chip_box_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/crop_utils/annotation_cropper.py b/PaddleDetection-release-2.6/ppdet/data/crop_utils/annotation_cropper.py new file mode 100644 index 0000000000000000000000000000000000000000..e288fabed4bf372186d37637681197bfbd507b87 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/crop_utils/annotation_cropper.py @@ -0,0 +1,580 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import math +import random +import numpy as np +from copy import deepcopy +from typing import List, Tuple +from collections import defaultdict + +from .chip_box_utils import nms, transform_chip_boxes2image_boxes +from .chip_box_utils import find_chips_to_cover_overlaped_boxes +from .chip_box_utils import transform_chip_box +from .chip_box_utils import intersection_over_box + + +class AnnoCropper(object): + def __init__(self, + image_target_sizes: List[int], + valid_box_ratio_ranges: List[List[float]], + chip_target_size: int, + chip_target_stride: int, + use_neg_chip: bool=False, + max_neg_num_per_im: int=8, + max_per_img: int=-1, + nms_thresh: int=0.5): + """ + Generate chips by chip_target_size and chip_target_stride. + These two parameters just like kernel_size and stride in cnn. + + Each image has its raw size. After resizing, then get its target size. + The resizing scale = target_size / raw_size. + So are chips of the image. + box_ratio = box_raw_size / image_raw_size = box_target_size / image_target_size + The 'size' above mentioned is the size of long-side of image, box or chip. 
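+
+        A worked example with hypothetical numbers: for a raw image whose
+        long side is 4000px and a target size of 2000, scale = 2000 / 4000
+        = 0.5; a gt box with raw long side 400px then has box_ratio =
+        400 / 4000 = 0.1 both before and after resizing.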
+ + :param image_target_sizes: [2000, 1000] + :param valid_box_ratio_ranges: [[-1, 0.1],[0.08, -1]] + :param chip_target_size: 500 + :param chip_target_stride: 200 + """ + self.target_sizes = image_target_sizes + self.valid_box_ratio_ranges = valid_box_ratio_ranges + assert len(self.target_sizes) == len(self.valid_box_ratio_ranges) + self.scale_num = len(self.target_sizes) + self.chip_target_size = chip_target_size # is target size + self.chip_target_stride = chip_target_stride # is target stride + self.use_neg_chip = use_neg_chip + self.max_neg_num_per_im = max_neg_num_per_im + self.max_per_img = max_per_img + self.nms_thresh = nms_thresh + + def crop_anno_records(self, records: List[dict]): + """ + The main logic: + # foreach record(image): + # foreach scale: + # 1 generate chips by chip size and stride for each scale + # 2 get pos chips + # - validate boxes: current scale; h,w >= 1 + # - find pos chips greedily by valid gt boxes in each scale + # - for every valid gt box, find its corresponding pos chips in each scale + # 3 get neg chips + # - If given proposals, find neg boxes in them which are not in pos chips + # - If got neg boxes in last step, we find neg chips and assign neg boxes to neg chips such as 2. + # 4 sample neg chips if too much each image + # transform this image-scale annotations to chips(pos chips&neg chips) annotations + + :param records, standard coco_record but with extra key `proposals`(Px4), which are predicted by stage1 + model and maybe have neg boxes in them. + :return: new_records, list of dict like + { + 'im_file': 'fake_image1.jpg', + 'im_id': np.array([1]), # new _global_chip_id as im_id + 'h': h, # chip height + 'w': w, # chip width + 'is_crowd': is_crowd, # Nx1 -> Mx1 + 'gt_class': gt_class, # Nx1 -> Mx1 + 'gt_bbox': gt_bbox, # Nx4 -> Mx4, 4 represents [x1,y1,x2,y2] + 'gt_poly': gt_poly, # [None]xN -> [None]xM + 'chip': [x1, y1, x2, y2] # added + } + + Attention: + ------------------------------>x + | + | (x1,y1)------ + | | | + | | | + | | | + | | | + | | | + | ---------- + | (x2,y2) + | + ↓ + y + + If we use [x1, y1, x2, y2] to represent boxes or chips, + (x1,y1) is the left-top point which is in the box, + but (x2,y2) is the right-bottom point which is not in the box. + So x1 in [0, w-1], x2 in [1, w], y1 in [0, h-1], y2 in [1,h]. + And you can use x2-x1 to get width, and you can use image[y1:y2, x1:x2] to get the box area. + """ + + self.chip_records = [] + self._global_chip_id = 1 + for r in records: + self._cur_im_pos_chips = [ + ] # element: (chip, boxes_idx), chip is [x1, y1, x2, y2], boxes_ids is List[int] + self._cur_im_neg_chips = [] # element: (chip, neg_box_num) + for scale_i in range(self.scale_num): + self._get_current_scale_parameters(scale_i, r) + + # Cx4 + chips = self._create_chips(r['h'], r['w'], self._cur_scale) + + # # dict: chipid->[box_id, ...] 
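+                # e.g. {0: [2, 5], 3: [7]} (hypothetical values): chip 0
+                # covers valid gt boxes 2 and 5, chip 3 covers box 7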
+ pos_chip2boxes_idx = self._get_valid_boxes_and_pos_chips( + r['gt_bbox'], chips) + + # dict: chipid->neg_box_num + neg_chip2box_num = self._get_neg_boxes_and_chips( + chips, + list(pos_chip2boxes_idx.keys()), r.get('proposals', None)) + + self._add_to_cur_im_chips(chips, pos_chip2boxes_idx, + neg_chip2box_num) + + cur_image_records = self._trans_all_chips2annotations(r) + self.chip_records.extend(cur_image_records) + return self.chip_records + + def _add_to_cur_im_chips(self, chips, pos_chip2boxes_idx, neg_chip2box_num): + for pos_chipid, boxes_idx in pos_chip2boxes_idx.items(): + chip = np.array(chips[pos_chipid]) # copy chips slice + self._cur_im_pos_chips.append((chip, boxes_idx)) + + if neg_chip2box_num is None: + return + + for neg_chipid, neg_box_num in neg_chip2box_num.items(): + chip = np.array(chips[neg_chipid]) + self._cur_im_neg_chips.append((chip, neg_box_num)) + + def _trans_all_chips2annotations(self, r): + gt_bbox = r['gt_bbox'] + im_file = r['im_file'] + is_crowd = r['is_crowd'] + gt_class = r['gt_class'] + # gt_poly = r['gt_poly'] # [None]xN + # remaining keys: im_id, h, w + chip_records = self._trans_pos_chips2annotations(im_file, gt_bbox, + is_crowd, gt_class) + + if not self.use_neg_chip: + return chip_records + + sampled_neg_chips = self._sample_neg_chips() + neg_chip_records = self._trans_neg_chips2annotations(im_file, + sampled_neg_chips) + chip_records.extend(neg_chip_records) + return chip_records + + def _trans_pos_chips2annotations(self, im_file, gt_bbox, is_crowd, + gt_class): + chip_records = [] + for chip, boxes_idx in self._cur_im_pos_chips: + chip_bbox, final_boxes_idx = transform_chip_box(gt_bbox, boxes_idx, + chip) + x1, y1, x2, y2 = chip + chip_h = y2 - y1 + chip_w = x2 - x1 + rec = { + 'im_file': im_file, + 'im_id': np.array([self._global_chip_id]), + 'h': chip_h, + 'w': chip_w, + 'gt_bbox': chip_bbox, + 'is_crowd': is_crowd[final_boxes_idx].copy(), + 'gt_class': gt_class[final_boxes_idx].copy(), + # 'gt_poly': [None] * len(final_boxes_idx), + 'chip': chip + } + self._global_chip_id += 1 + chip_records.append(rec) + return chip_records + + def _sample_neg_chips(self): + pos_num = len(self._cur_im_pos_chips) + neg_num = len(self._cur_im_neg_chips) + sample_num = min(pos_num + 2, self.max_neg_num_per_im) + assert sample_num >= 1 + if neg_num <= sample_num: + return self._cur_im_neg_chips + + candidate_num = int(sample_num * 1.5) + candidate_neg_chips = sorted( + self._cur_im_neg_chips, key=lambda x: -x[1])[:candidate_num] + random.shuffle(candidate_neg_chips) + sampled_neg_chips = candidate_neg_chips[:sample_num] + return sampled_neg_chips + + def _trans_neg_chips2annotations(self, + im_file: str, + sampled_neg_chips: List[Tuple]): + chip_records = [] + for chip, neg_box_num in sampled_neg_chips: + x1, y1, x2, y2 = chip + chip_h = y2 - y1 + chip_w = x2 - x1 + rec = { + 'im_file': im_file, + 'im_id': np.array([self._global_chip_id]), + 'h': chip_h, + 'w': chip_w, + 'gt_bbox': np.zeros( + (0, 4), dtype=np.float32), + 'is_crowd': np.zeros( + (0, 1), dtype=np.int32), + 'gt_class': np.zeros( + (0, 1), dtype=np.int32), + # 'gt_poly': [], + 'chip': chip + } + self._global_chip_id += 1 + chip_records.append(rec) + return chip_records + + def _get_current_scale_parameters(self, scale_i, r): + im_size = max(r['h'], r['w']) + im_target_size = self.target_sizes[scale_i] + self._cur_im_size, self._cur_im_target_size = im_size, im_target_size + self._cur_scale = self._get_current_scale(im_target_size, im_size) + self._cur_valid_ratio_range = 
self.valid_box_ratio_ranges[scale_i] + + def _get_current_scale(self, im_target_size, im_size): + return im_target_size / im_size + + def _create_chips(self, h: int, w: int, scale: float): + """ + Generate chips by chip_target_size and chip_target_stride. + These two parameters just like kernel_size and stride in cnn. + :return: chips, Cx4, xy in raw size dimension + """ + chip_size = self.chip_target_size # omit target for simplicity + stride = self.chip_target_stride + width = int(scale * w) + height = int(scale * h) + min_chip_location_diff = 20 # in target size + + assert chip_size >= stride + chip_overlap = chip_size - stride + if (width - chip_overlap + ) % stride > min_chip_location_diff: # 不能被stride整除的部分比较大,则保留 + w_steps = max(1, int(math.ceil((width - chip_overlap) / stride))) + else: # 不能被stride整除的部分比较小,则丢弃 + w_steps = max(1, int(math.floor((width - chip_overlap) / stride))) + if (height - chip_overlap) % stride > min_chip_location_diff: + h_steps = max(1, int(math.ceil((height - chip_overlap) / stride))) + else: + h_steps = max(1, int(math.floor((height - chip_overlap) / stride))) + + chips = list() + for j in range(h_steps): + for i in range(w_steps): + x1 = i * stride + y1 = j * stride + x2 = min(x1 + chip_size, width) + y2 = min(y1 + chip_size, height) + chips.append([x1, y1, x2, y2]) + + # check chip size + for item in chips: + if item[2] - item[0] > chip_size * 1.1 or item[3] - item[ + 1] > chip_size * 1.1: + raise ValueError(item) + chips = np.array(chips, dtype=np.float32) + + raw_size_chips = chips / scale + return raw_size_chips + + def _get_valid_boxes_and_pos_chips(self, gt_bbox, chips): + valid_ratio_range = self._cur_valid_ratio_range + im_size = self._cur_im_size + scale = self._cur_scale + # Nx4 N + valid_boxes, valid_boxes_idx = self._validate_boxes( + valid_ratio_range, im_size, gt_bbox, scale) + # dict: chipid->[box_id, ...] + pos_chip2boxes_idx = self._find_pos_chips(chips, valid_boxes, + valid_boxes_idx) + return pos_chip2boxes_idx + + def _validate_boxes(self, + valid_ratio_range: List[float], + im_size: int, + gt_boxes: 'np.array of Nx4', + scale: float): + """ + :return: valid_boxes: Nx4, valid_boxes_idx: N + """ + ws = (gt_boxes[:, 2] - gt_boxes[:, 0]).astype(np.int32) + hs = (gt_boxes[:, 3] - gt_boxes[:, 1]).astype(np.int32) + maxs = np.maximum(ws, hs) + box_ratio = maxs / im_size + mins = np.minimum(ws, hs) + target_mins = mins * scale + + low = valid_ratio_range[0] if valid_ratio_range[0] > 0 else 0 + high = valid_ratio_range[1] if valid_ratio_range[1] > 0 else np.finfo( + np.float32).max + + valid_boxes_idx = np.nonzero((low <= box_ratio) & (box_ratio < high) & ( + target_mins >= 2))[0] + valid_boxes = gt_boxes[valid_boxes_idx] + return valid_boxes, valid_boxes_idx + + def _find_pos_chips(self, + chips: 'Cx4', + valid_boxes: 'Bx4', + valid_boxes_idx: 'B'): + """ + :return: pos_chip2boxes_idx, dict: chipid->[box_id, ...] + """ + iob = intersection_over_box(chips, valid_boxes) # overlap, CxB + + iob_threshold_to_find_chips = 1. 
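+        # an iob threshold of 1. means a chip "covers" a box only when the
+        # box lies entirely inside the chip; partially covered boxes are
+        # assigned afterwards with the looser 0.5 threshold below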
+ pos_chip_ids, _ = self._find_chips_to_cover_overlaped_boxes( + iob, iob_threshold_to_find_chips) + pos_chip_ids = set(pos_chip_ids) + + iob_threshold_to_assign_box = 0.5 + pos_chip2boxes_idx = self._assign_boxes_to_pos_chips( + iob, iob_threshold_to_assign_box, pos_chip_ids, valid_boxes_idx) + return pos_chip2boxes_idx + + def _find_chips_to_cover_overlaped_boxes(self, iob, overlap_threshold): + return find_chips_to_cover_overlaped_boxes(iob, overlap_threshold) + + def _assign_boxes_to_pos_chips(self, iob, overlap_threshold, pos_chip_ids, + valid_boxes_idx): + chip_ids, box_ids = np.nonzero(iob >= overlap_threshold) + pos_chip2boxes_idx = defaultdict(list) + for chip_id, box_id in zip(chip_ids, box_ids): + if chip_id not in pos_chip_ids: + continue + raw_gt_box_idx = valid_boxes_idx[box_id] + pos_chip2boxes_idx[chip_id].append(raw_gt_box_idx) + return pos_chip2boxes_idx + + def _get_neg_boxes_and_chips(self, + chips: 'Cx4', + pos_chip_ids: 'D', + proposals: 'Px4'): + """ + :param chips: + :param pos_chip_ids: + :param proposals: + :return: neg_chip2box_num, None or dict: chipid->neg_box_num + """ + if not self.use_neg_chip: + return None + + # train proposals maybe None + if proposals is None or len(proposals) < 1: + return None + + valid_ratio_range = self._cur_valid_ratio_range + im_size = self._cur_im_size + scale = self._cur_scale + + valid_props, _ = self._validate_boxes(valid_ratio_range, im_size, + proposals, scale) + neg_boxes = self._find_neg_boxes(chips, pos_chip_ids, valid_props) + neg_chip2box_num = self._find_neg_chips(chips, pos_chip_ids, neg_boxes) + return neg_chip2box_num + + def _find_neg_boxes(self, + chips: 'Cx4', + pos_chip_ids: 'D', + valid_props: 'Px4'): + """ + :return: neg_boxes: Nx4 + """ + if len(pos_chip_ids) == 0: + return valid_props + + pos_chips = chips[pos_chip_ids] + iob = intersection_over_box(pos_chips, valid_props) + overlap_per_prop = np.max(iob, axis=0) + non_overlap_props_idx = overlap_per_prop < 0.5 + neg_boxes = valid_props[non_overlap_props_idx] + return neg_boxes + + def _find_neg_chips(self, chips: 'Cx4', pos_chip_ids: 'D', + neg_boxes: 'Nx4'): + """ + :return: neg_chip2box_num, dict: chipid->neg_box_num + """ + neg_chip_ids = np.setdiff1d(np.arange(len(chips)), pos_chip_ids) + neg_chips = chips[neg_chip_ids] + + iob = intersection_over_box(neg_chips, neg_boxes) + iob_threshold_to_find_chips = 0.7 + chosen_neg_chip_ids, chip_id2overlap_box_num = \ + self._find_chips_to_cover_overlaped_boxes(iob, iob_threshold_to_find_chips) + + neg_chipid2box_num = {} + for cid in chosen_neg_chip_ids: + box_num = chip_id2overlap_box_num[cid] + raw_chip_id = neg_chip_ids[cid] + neg_chipid2box_num[raw_chip_id] = box_num + return neg_chipid2box_num + + def crop_infer_anno_records(self, records: List[dict]): + """ + transform image record to chips record + :param records: + :return: new_records, list of dict like + { + 'im_file': 'fake_image1.jpg', + 'im_id': np.array([1]), # new _global_chip_id as im_id + 'h': h, # chip height + 'w': w, # chip width + 'chip': [x1, y1, x2, y2] # added + 'ori_im_h': ori_im_h # added, origin image height + 'ori_im_w': ori_im_w # added, origin image width + 'scale_i': 0 # added, + } + """ + self.chip_records = [] + self._global_chip_id = 1 # im_id start from 1 + self._global_chip_id2img_id = {} + + for r in records: + for scale_i in range(self.scale_num): + self._get_current_scale_parameters(scale_i, r) + # Cx4 + chips = self._create_chips(r['h'], r['w'], self._cur_scale) + cur_img_chip_record = self._get_chips_records(r, chips, 
scale_i) + self.chip_records.extend(cur_img_chip_record) + + return self.chip_records + + def _get_chips_records(self, rec, chips, scale_i): + cur_img_chip_records = [] + ori_im_h = rec["h"] + ori_im_w = rec["w"] + im_file = rec["im_file"] + ori_im_id = rec["im_id"] + for id, chip in enumerate(chips): + chip_rec = {} + x1, y1, x2, y2 = chip + chip_h = y2 - y1 + chip_w = x2 - x1 + chip_rec["im_file"] = im_file + chip_rec["im_id"] = self._global_chip_id + chip_rec["h"] = chip_h + chip_rec["w"] = chip_w + chip_rec["chip"] = chip + chip_rec["ori_im_h"] = ori_im_h + chip_rec["ori_im_w"] = ori_im_w + chip_rec["scale_i"] = scale_i + + self._global_chip_id2img_id[self._global_chip_id] = int(ori_im_id) + self._global_chip_id += 1 + cur_img_chip_records.append(chip_rec) + + return cur_img_chip_records + + def aggregate_chips_detections(self, results, records=None): + """ + # 1. transform chip dets to image dets + # 2. nms boxes per image; + # 3. format output results + :param results: + :param roidb: + :return: + """ + results = deepcopy(results) + records = records if records else self.chip_records + img_id2bbox = self._transform_chip2image_bboxes(results, records) + nms_img_id2bbox = self._nms_dets(img_id2bbox) + aggregate_results = self._reformat_results(nms_img_id2bbox) + return aggregate_results + + def _transform_chip2image_bboxes(self, results, records): + # 1. Transform chip dets to image dets; + # 2. Filter valid range; + # 3. Reformat and Aggregate chip dets to Get scale_cls_dets + img_id2bbox = defaultdict(list) + for result in results: + bbox_locs = result['bbox'] + bbox_nums = result['bbox_num'] + if len(bbox_locs) == 1 and bbox_locs[0][ + 0] == -1: # current batch has no detections + # bbox_locs = array([[-1.]], dtype=float32); bbox_nums = [[1]] + # MultiClassNMS output: If there is no detected boxes for all images, lod will be set to {1} and Out only contains one value which is -1. + continue + im_ids = result['im_id'] # replace with range(len(bbox_nums)) + + last_bbox_num = 0 + for idx, im_id in enumerate(im_ids): + + cur_bbox_len = bbox_nums[idx] + bboxes = bbox_locs[last_bbox_num:last_bbox_num + cur_bbox_len] + last_bbox_num += cur_bbox_len + # box: [num_id, score, xmin, ymin, xmax, ymax] + if len(bboxes) == 0: # current image has no detections + continue + + chip_rec = records[int(im_id) - + 1] # im_id starts from 1, type is np.int64 + image_size = max(chip_rec["ori_im_h"], chip_rec["ori_im_w"]) + + bboxes = transform_chip_boxes2image_boxes( + bboxes, chip_rec["chip"], chip_rec["ori_im_h"], + chip_rec["ori_im_w"]) + + scale_i = chip_rec["scale_i"] + cur_scale = self._get_current_scale(self.target_sizes[scale_i], + image_size) + _, valid_boxes_idx = self._validate_boxes( + self.valid_box_ratio_ranges[scale_i], image_size, + bboxes[:, 2:], cur_scale) + ori_img_id = self._global_chip_id2img_id[int(im_id)] + + img_id2bbox[ori_img_id].append(bboxes[valid_boxes_idx]) + + return img_id2bbox + + def _nms_dets(self, img_id2bbox): + # 1. NMS on each image-class + # 2. 
Limit number of detections to MAX_PER_IMAGE if requested + max_per_img = self.max_per_img + nms_thresh = self.nms_thresh + + for img_id in img_id2bbox: + box = img_id2bbox[ + img_id] # list of np.array of shape [N, 6], 6 is [label, score, x1, y1, x2, y2] + box = np.concatenate(box, axis=0) + nms_dets = nms(box, nms_thresh) + if max_per_img > 0: + if len(nms_dets) > max_per_img: + keep = np.argsort(-nms_dets[:, 1])[:max_per_img] + nms_dets = nms_dets[keep] + + img_id2bbox[img_id] = nms_dets + + return img_id2bbox + + def _reformat_results(self, img_id2bbox): + """reformat results""" + im_ids = img_id2bbox.keys() + results = [] + for img_id in im_ids: # output by original im_id order + if len(img_id2bbox[img_id]) == 0: + bbox = np.array( + [[-1., 0., 0., 0., 0., 0.]]) # edge case: no detections + bbox_num = np.array([0]) + else: + # np.array of shape [N, 6], 6 is [label, score, x1, y1, x2, y2] + bbox = img_id2bbox[img_id] + bbox_num = np.array([len(bbox)]) + res = dict(im_id=np.array([[img_id]]), bbox=bbox, bbox_num=bbox_num) + results.append(res) + return results diff --git a/PaddleDetection-release-2.6/ppdet/data/crop_utils/chip_box_utils.py b/PaddleDetection-release-2.6/ppdet/data/crop_utils/chip_box_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..cfa1e39e9058a3f13ef4f972b3b98acd17bc5080 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/crop_utils/chip_box_utils.py @@ -0,0 +1,170 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np + + +def bbox_area(boxes): + return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) + + +def intersection_over_box(chips, boxes): + """ + intersection area over box area + :param chips: C + :param boxes: B + :return: iob, CxB + """ + M = chips.shape[0] + N = boxes.shape[0] + if M * N == 0: + return np.zeros([M, N], dtype='float32') + + box_area = bbox_area(boxes) # B + + inter_x2y2 = np.minimum(np.expand_dims(chips, 1)[:, :, 2:], + boxes[:, 2:]) # CxBX2 + inter_x1y1 = np.maximum(np.expand_dims(chips, 1)[:, :, :2], + boxes[:, :2]) # CxBx2 + inter_wh = inter_x2y2 - inter_x1y1 + inter_wh = np.clip(inter_wh, a_min=0, a_max=None) + inter_area = inter_wh[:, :, 0] * inter_wh[:, :, 1] # CxB + + iob = inter_area / np.expand_dims(box_area, 0) + return iob + + +def clip_boxes(boxes, im_shape): + """ + Clip boxes to image boundaries. 
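+
+    Per the corner convention described in AnnoCropper.crop_anno_records,
+    x1/y1 are clipped to [0, w-1] and [0, h-1] (inclusive corner) while
+    x2/y2 are clipped to [1, w] and [1, h] (exclusive corner).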
+ :param boxes: [N, 4] + :param im_shape: tuple of 2, [h, w] + :return: [N, 4] + """ + # x1 >= 0 + boxes[:, 0] = np.clip(boxes[:, 0], 0, im_shape[1] - 1) + # y1 >= 0 + boxes[:, 1] = np.clip(boxes[:, 1], 0, im_shape[0] - 1) + # x2 < im_shape[1] + boxes[:, 2] = np.clip(boxes[:, 2], 1, im_shape[1]) + # y2 < im_shape[0] + boxes[:, 3] = np.clip(boxes[:, 3], 1, im_shape[0]) + return boxes + + +def transform_chip_box(gt_bbox: 'Gx4', boxes_idx: 'B', chip: '4'): + boxes_idx = np.array(boxes_idx) + cur_gt_bbox = gt_bbox[boxes_idx].copy() # Bx4 + x1, y1, x2, y2 = chip + cur_gt_bbox[:, 0] -= x1 + cur_gt_bbox[:, 1] -= y1 + cur_gt_bbox[:, 2] -= x1 + cur_gt_bbox[:, 3] -= y1 + h = y2 - y1 + w = x2 - x1 + cur_gt_bbox = clip_boxes(cur_gt_bbox, (h, w)) + ws = (cur_gt_bbox[:, 2] - cur_gt_bbox[:, 0]).astype(np.int32) + hs = (cur_gt_bbox[:, 3] - cur_gt_bbox[:, 1]).astype(np.int32) + valid_idx = (ws >= 2) & (hs >= 2) + return cur_gt_bbox[valid_idx], boxes_idx[valid_idx] + + +def find_chips_to_cover_overlaped_boxes(iob, overlap_threshold): + chip_ids, box_ids = np.nonzero(iob >= overlap_threshold) + chip_id2overlap_box_num = np.bincount(chip_ids) # 1d array + chip_id2overlap_box_num = np.pad( + chip_id2overlap_box_num, (0, len(iob) - len(chip_id2overlap_box_num)), + constant_values=0) + + chosen_chip_ids = [] + while len(box_ids) > 0: + value_counts = np.bincount(chip_ids) # 1d array + max_count_chip_id = np.argmax(value_counts) + assert max_count_chip_id not in chosen_chip_ids + chosen_chip_ids.append(max_count_chip_id) + + box_ids_in_cur_chip = box_ids[chip_ids == max_count_chip_id] + ids_not_in_cur_boxes_mask = np.logical_not( + np.isin(box_ids, box_ids_in_cur_chip)) + chip_ids = chip_ids[ids_not_in_cur_boxes_mask] + box_ids = box_ids[ids_not_in_cur_boxes_mask] + return chosen_chip_ids, chip_id2overlap_box_num + + +def transform_chip_boxes2image_boxes(chip_boxes, chip, img_h, img_w): + chip_boxes = np.array(sorted(chip_boxes, key=lambda item: -item[1])) + xmin, ymin, _, _ = chip + # Transform to origin image loc + chip_boxes[:, 2] += xmin + chip_boxes[:, 4] += xmin + chip_boxes[:, 3] += ymin + chip_boxes[:, 5] += ymin + chip_boxes = clip_boxes(chip_boxes, (img_h, img_w)) + return chip_boxes + + +def nms(dets, thresh): + """Apply classic DPM-style greedy NMS.""" + if dets.shape[0] == 0: + return dets[[], :] + scores = dets[:, 1] + x1 = dets[:, 2] + y1 = dets[:, 3] + x2 = dets[:, 4] + y2 = dets[:, 5] + + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + order = scores.argsort()[::-1] + + ndets = dets.shape[0] + suppressed = np.zeros((ndets), dtype=np.int32) + + # nominal indices + # _i, _j + # sorted indices + # i, j + # temp variables for box i's (the box currently under consideration) + # ix1, iy1, ix2, iy2, iarea + + # variables for computing overlap with box j (lower scoring box) + # xx1, yy1, xx2, yy2 + # w, h + # inter, ovr + + for _i in range(ndets): + i = order[_i] + if suppressed[i] == 1: + continue + ix1 = x1[i] + iy1 = y1[i] + ix2 = x2[i] + iy2 = y2[i] + iarea = areas[i] + for _j in range(_i + 1, ndets): + j = order[_j] + if suppressed[j] == 1: + continue + xx1 = max(ix1, x1[j]) + yy1 = max(iy1, y1[j]) + xx2 = min(ix2, x2[j]) + yy2 = min(iy2, y2[j]) + w = max(0.0, xx2 - xx1 + 1) + h = max(0.0, yy2 - yy1 + 1) + inter = w * h + ovr = inter / (iarea + areas[j] - inter) + if ovr >= thresh: + suppressed[j] = 1 + keep = np.where(suppressed == 0)[0] + dets = dets[keep, :] + return dets diff --git a/PaddleDetection-release-2.6/ppdet/data/reader.py b/PaddleDetection-release-2.6/ppdet/data/reader.py new file mode 100644 
index 0000000000000000000000000000000000000000..227fabca6dce8ef76dabe88864a69000d38468dd
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/reader.py
@@ -0,0 +1,611 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import copy
+import os
+import traceback
+import six
+import sys
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+
+from copy import deepcopy
+
+from paddle.io import DataLoader, DistributedBatchSampler
+from .utils import default_collate_fn
+
+from ppdet.core.workspace import register
+from . import transform
+from .shm_utils import _get_shared_memory_size_in_M
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger('reader')
+
+MAIN_PID = os.getpid()
+
+
+class Compose(object):
+    def __init__(self, transforms, num_classes=80):
+        self.transforms = transforms
+        self.transforms_cls = []
+        for t in self.transforms:
+            for k, v in t.items():
+                op_cls = getattr(transform, k)
+                f = op_cls(**v)
+                if hasattr(f, 'num_classes'):
+                    f.num_classes = num_classes
+
+                self.transforms_cls.append(f)
+
+    def __call__(self, data):
+        for f in self.transforms_cls:
+            try:
+                data = f(data)
+            except Exception as e:
+                stack_info = traceback.format_exc()
+                logger.warning("fail to map sample transform [{}] "
+                               "with error: {} and stack:\n{}".format(
+                                   f, e, str(stack_info)))
+                raise e
+
+        return data
+
+
+class BatchCompose(Compose):
+    def __init__(self, transforms, num_classes=80, collate_batch=True):
+        super(BatchCompose, self).__init__(transforms, num_classes)
+        self.collate_batch = collate_batch
+
+    def __call__(self, data):
+        for f in self.transforms_cls:
+            try:
+                data = f(data)
+            except Exception as e:
+                stack_info = traceback.format_exc()
+                logger.warning("fail to map batch transform [{}] "
+                               "with error: {} and stack:\n{}".format(
+                                   f, e, str(stack_info)))
+                raise e
+
+        # remove keys that are not needed by the model
+        extra_key = ['h', 'w', 'flipped']
+        for k in extra_key:
+            for sample in data:
+                if k in sample:
+                    sample.pop(k)
+
+        # collate the batch data; a user-defined batch function,
+        # if configured, is applied here
+        if self.collate_batch:
+            batch_data = default_collate_fn(data)
+        else:
+            batch_data = {}
+            for k in data[0].keys():
+                tmp_data = []
+                for i in range(len(data)):
+                    tmp_data.append(data[i][k])
+                if 'gt_' not in k and 'is_crowd' not in k and 'difficult' not in k:
+                    tmp_data = np.stack(tmp_data, axis=0)
+                batch_data[k] = tmp_data
+        return batch_data
+
+
+class BaseDataLoader(object):
+    """
+    Base DataLoader implementation for detection models
+
+    Args:
+        sample_transforms (list): a list of transforms to perform
+                                  on each sample
+        batch_transforms (list): a list of transforms to perform
+                                 on batch
+        batch_size (int): batch size for batch collating, default 1.
+        shuffle (bool): whether to shuffle samples
+        drop_last (bool): whether to drop the last incomplete batch,
+            default False
+        num_classes (int): class number of dataset, default 80
+        collate_batch (bool): whether to collate batch in dataloader.
+            If set to True, the samples will be collated into a batch
+            according to the batch size. Otherwise, the ground-truth
+            fields will not be collated, which is used when the number
+            of ground-truth boxes differs across samples.
+        use_shared_memory (bool): whether to use shared memory to
+                accelerate data loading, enable this only if you
+                are sure that the shared memory size of your OS
+                is larger than the memory cost of the model's input
+                data. Note that shared memory will be automatically
+                disabled if the shared memory of the OS is less than
+                1G, which is not enough for detection models.
+                Default False.
+    """
+
+    def __init__(self,
+                 sample_transforms=[],
+                 batch_transforms=[],
+                 batch_size=1,
+                 shuffle=False,
+                 drop_last=False,
+                 num_classes=80,
+                 collate_batch=True,
+                 use_shared_memory=False,
+                 **kwargs):
+        # sample transform
+        self._sample_transforms = Compose(
+            sample_transforms, num_classes=num_classes)
+
+        # batch transform
+        self._batch_transforms = BatchCompose(batch_transforms, num_classes,
+                                              collate_batch)
+        self.batch_size = batch_size
+        self.shuffle = shuffle
+        self.drop_last = drop_last
+        self.use_shared_memory = use_shared_memory
+        self.kwargs = kwargs
+
+    def __call__(self,
+                 dataset,
+                 worker_num,
+                 batch_sampler=None,
+                 return_list=False):
+        self.dataset = dataset
+        self.dataset.check_or_download_dataset()
+        self.dataset.parse_dataset()
+        # get data
+        self.dataset.set_transform(self._sample_transforms)
+        # set kwargs
+        self.dataset.set_kwargs(**self.kwargs)
+        # batch sampler
+        if batch_sampler is None:
+            self._batch_sampler = DistributedBatchSampler(
+                self.dataset,
+                batch_size=self.batch_size,
+                shuffle=self.shuffle,
+                drop_last=self.drop_last)
+        else:
+            self._batch_sampler = batch_sampler
+
+        # DataLoader does not start worker subprocesses on Windows and
+        # macOS, so shared memory is not needed there
+        use_shared_memory = self.use_shared_memory and \
+            sys.platform not in ['win32', 'darwin']
+        # check whether the shared memory size is larger than 1G (1024M)
+        if use_shared_memory:
+            shm_size = _get_shared_memory_size_in_M()
+            if shm_size is not None and shm_size < 1024.:
+                logger.warning("Shared memory size is less than 1G, "
+                               "disable shared_memory in DataLoader")
+                use_shared_memory = False
+
+        self.dataloader = DataLoader(
+            dataset=self.dataset,
+            batch_sampler=self._batch_sampler,
+            collate_fn=self._batch_transforms,
+            num_workers=worker_num,
+            return_list=return_list,
+            use_shared_memory=use_shared_memory)
+        self.loader = iter(self.dataloader)
+
+        return self
+
+    def __len__(self):
+        return len(self._batch_sampler)
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        try:
+            return next(self.loader)
+        except StopIteration:
+            self.loader = iter(self.dataloader)
+            six.reraise(*sys.exc_info())
+
+    def next(self):
+        # python2 compatibility
+        return self.__next__()
+
+
+@register
+class TrainReader(BaseDataLoader):
+    __shared__ = ['num_classes']
+
+    def __init__(self,
+                 sample_transforms=[],
+                 batch_transforms=[],
+                 batch_size=1,
+                 shuffle=True,
+                 drop_last=True,
+                 num_classes=80,
+                 collate_batch=True,
+                 **kwargs):
+        super(TrainReader, self).__init__(sample_transforms, batch_transforms,
+                                          batch_size, shuffle, drop_last,
+                                          num_classes, collate_batch, **kwargs)
+
+
+@register
+class EvalReader(BaseDataLoader):
+    __shared__ = ['num_classes']
+
+    def __init__(self,
+ sample_transforms=[], + batch_transforms=[], + batch_size=1, + shuffle=False, + drop_last=True, + num_classes=80, + **kwargs): + super(EvalReader, self).__init__(sample_transforms, batch_transforms, + batch_size, shuffle, drop_last, + num_classes, **kwargs) + + +@register +class TestReader(BaseDataLoader): + __shared__ = ['num_classes'] + + def __init__(self, + sample_transforms=[], + batch_transforms=[], + batch_size=1, + shuffle=False, + drop_last=False, + num_classes=80, + **kwargs): + super(TestReader, self).__init__(sample_transforms, batch_transforms, + batch_size, shuffle, drop_last, + num_classes, **kwargs) + + +@register +class EvalMOTReader(BaseDataLoader): + __shared__ = ['num_classes'] + + def __init__(self, + sample_transforms=[], + batch_transforms=[], + batch_size=1, + shuffle=False, + drop_last=False, + num_classes=1, + **kwargs): + super(EvalMOTReader, self).__init__(sample_transforms, batch_transforms, + batch_size, shuffle, drop_last, + num_classes, **kwargs) + + +@register +class TestMOTReader(BaseDataLoader): + __shared__ = ['num_classes'] + + def __init__(self, + sample_transforms=[], + batch_transforms=[], + batch_size=1, + shuffle=False, + drop_last=False, + num_classes=1, + **kwargs): + super(TestMOTReader, self).__init__(sample_transforms, batch_transforms, + batch_size, shuffle, drop_last, + num_classes, **kwargs) + + +# For Semi-Supervised Object Detection (SSOD) +class Compose_SSOD(object): + def __init__(self, base_transforms, weak_aug, strong_aug, num_classes=80): + self.base_transforms = base_transforms + self.base_transforms_cls = [] + for t in self.base_transforms: + for k, v in t.items(): + op_cls = getattr(transform, k) + f = op_cls(**v) + if hasattr(f, 'num_classes'): + f.num_classes = num_classes + self.base_transforms_cls.append(f) + + self.weak_augs = weak_aug + self.weak_augs_cls = [] + for t in self.weak_augs: + for k, v in t.items(): + op_cls = getattr(transform, k) + f = op_cls(**v) + if hasattr(f, 'num_classes'): + f.num_classes = num_classes + self.weak_augs_cls.append(f) + + self.strong_augs = strong_aug + self.strong_augs_cls = [] + for t in self.strong_augs: + for k, v in t.items(): + op_cls = getattr(transform, k) + f = op_cls(**v) + if hasattr(f, 'num_classes'): + f.num_classes = num_classes + self.strong_augs_cls.append(f) + + def __call__(self, data): + for f in self.base_transforms_cls: + try: + data = f(data) + except Exception as e: + stack_info = traceback.format_exc() + logger.warning("fail to map sample transform [{}] " + "with error: {} and stack:\n{}".format( + f, e, str(stack_info))) + raise e + + weak_data = deepcopy(data) + strong_data = deepcopy(data) + for f in self.weak_augs_cls: + try: + weak_data = f(weak_data) + except Exception as e: + stack_info = traceback.format_exc() + logger.warning("fail to map weak aug [{}] " + "with error: {} and stack:\n{}".format( + f, e, str(stack_info))) + raise e + + for f in self.strong_augs_cls: + try: + strong_data = f(strong_data) + except Exception as e: + stack_info = traceback.format_exc() + logger.warning("fail to map strong aug [{}] " + "with error: {} and stack:\n{}".format( + f, e, str(stack_info))) + raise e + + weak_data['strong_aug'] = strong_data + return weak_data + + +class BatchCompose_SSOD(Compose): + def __init__(self, transforms, num_classes=80, collate_batch=True): + super(BatchCompose_SSOD, self).__init__(transforms, num_classes) + self.collate_batch = collate_batch + + def __call__(self, data): + # split strong_data from data(weak_data) + strong_data = [] + for 
sample in data:
+            strong_data.append(sample['strong_aug'])
+            sample.pop('strong_aug')
+
+        for f in self.transforms_cls:
+            try:
+                data = f(data)
+                strong_data = f(strong_data)
+            except Exception as e:
+                stack_info = traceback.format_exc()
+                logger.warning("fail to map batch transform [{}] "
+                               "with error: {} and stack:\n{}".format(
+                                   f, e, str(stack_info)))
+                raise e
+
+        # remove keys that are not needed by the model
+        extra_key = ['h', 'w', 'flipped']
+        for k in extra_key:
+            for sample in data:
+                if k in sample:
+                    sample.pop(k)
+            for sample in strong_data:
+                if k in sample:
+                    sample.pop(k)
+
+        # collate the batch data; a user-defined batch function,
+        # if configured, is applied here
+        if self.collate_batch:
+            batch_data = default_collate_fn(data)
+            strong_batch_data = default_collate_fn(strong_data)
+            return batch_data, strong_batch_data
+        else:
+            batch_data = {}
+            for k in data[0].keys():
+                tmp_data = []
+                for i in range(len(data)):
+                    tmp_data.append(data[i][k])
+                if 'gt_' not in k and 'is_crowd' not in k and 'difficult' not in k:
+                    tmp_data = np.stack(tmp_data, axis=0)
+                batch_data[k] = tmp_data
+
+            strong_batch_data = {}
+            for k in strong_data[0].keys():
+                tmp_data = []
+                for i in range(len(strong_data)):
+                    tmp_data.append(strong_data[i][k])
+                if 'gt_' not in k and 'is_crowd' not in k and 'difficult' not in k:
+                    tmp_data = np.stack(tmp_data, axis=0)
+                strong_batch_data[k] = tmp_data
+
+            return batch_data, strong_batch_data
+
+
+class CombineSSODLoader(object):
+    def __init__(self, label_loader, unlabel_loader):
+        self.label_loader = label_loader
+        self.unlabel_loader = unlabel_loader
+
+    def __iter__(self):
+        while True:
+            # (re)create the iterators on first use or when exhausted
+            try:
+                label_samples = next(self.label_loader_iter)
+            except (AttributeError, StopIteration):
+                self.label_loader_iter = iter(self.label_loader)
+                label_samples = next(self.label_loader_iter)
+
+            try:
+                unlabel_samples = next(self.unlabel_loader_iter)
+            except (AttributeError, StopIteration):
+                self.unlabel_loader_iter = iter(self.unlabel_loader)
+                unlabel_samples = next(self.unlabel_loader_iter)
+
+            yield (
+                label_samples[0],  # sup weak
+                label_samples[1],  # sup strong
+                unlabel_samples[0],  # unsup weak
+                unlabel_samples[1]  # unsup strong
+            )
+
+    def __call__(self):
+        return self.__iter__()
+
+
+class BaseSemiDataLoader(object):
+    def __init__(self,
+                 sample_transforms=[],
+                 weak_aug=[],
+                 strong_aug=[],
+                 sup_batch_transforms=[],
+                 unsup_batch_transforms=[],
+                 sup_batch_size=1,
+                 unsup_batch_size=1,
+                 shuffle=True,
+                 drop_last=True,
+                 num_classes=80,
+                 collate_batch=True,
+                 use_shared_memory=False,
+                 **kwargs):
+        # sup transforms
+        self._sample_transforms_label = Compose_SSOD(
+            sample_transforms, weak_aug, strong_aug, num_classes=num_classes)
+        self._batch_transforms_label = BatchCompose_SSOD(
+            sup_batch_transforms, num_classes, collate_batch)
+        self.batch_size_label = sup_batch_size
+
+        # unsup transforms
+        self._sample_transforms_unlabel = Compose_SSOD(
+            sample_transforms, weak_aug, strong_aug, num_classes=num_classes)
+        self._batch_transforms_unlabel = BatchCompose_SSOD(
+            unsup_batch_transforms, num_classes, collate_batch)
+        self.batch_size_unlabel = unsup_batch_size
+
+        # common
+        self.shuffle = shuffle
+        self.drop_last = drop_last
+        self.use_shared_memory = use_shared_memory
+        self.kwargs = kwargs
+
+    def __call__(self,
+                 dataset_label,
+                 dataset_unlabel,
+                 worker_num,
+                 batch_sampler_label=None,
+                 batch_sampler_unlabel=None,
+                 return_list=False):
+        # sup dataset
+        self.dataset_label = dataset_label
+        self.dataset_label.check_or_download_dataset()
+        self.dataset_label.parse_dataset()
+
self.dataset_label.set_transform(self._sample_transforms_label) + self.dataset_label.set_kwargs(**self.kwargs) + if batch_sampler_label is None: + self._batch_sampler_label = DistributedBatchSampler( + self.dataset_label, + batch_size=self.batch_size_label, + shuffle=self.shuffle, + drop_last=self.drop_last) + else: + self._batch_sampler_label = batch_sampler_label + + # unsup dataset + self.dataset_unlabel = dataset_unlabel + self.dataset_unlabel.length = self.dataset_label.__len__() + self.dataset_unlabel.check_or_download_dataset() + self.dataset_unlabel.parse_dataset() + self.dataset_unlabel.set_transform(self._sample_transforms_unlabel) + self.dataset_unlabel.set_kwargs(**self.kwargs) + if batch_sampler_unlabel is None: + self._batch_sampler_unlabel = DistributedBatchSampler( + self.dataset_unlabel, + batch_size=self.batch_size_unlabel, + shuffle=self.shuffle, + drop_last=self.drop_last) + else: + self._batch_sampler_unlabel = batch_sampler_unlabel + + # DataLoader do not start sub-process in Windows and Mac + # system, do not need to use shared memory + use_shared_memory = self.use_shared_memory and \ + sys.platform not in ['win32', 'darwin'] + # check whether shared memory size is bigger than 1G(1024M) + if use_shared_memory: + shm_size = _get_shared_memory_size_in_M() + if shm_size is not None and shm_size < 1024.: + logger.warning("Shared memory size is less than 1G, " + "disable shared_memory in DataLoader") + use_shared_memory = False + + self.dataloader_label = DataLoader( + dataset=self.dataset_label, + batch_sampler=self._batch_sampler_label, + collate_fn=self._batch_transforms_label, + num_workers=worker_num, + return_list=return_list, + use_shared_memory=use_shared_memory) + + self.dataloader_unlabel = DataLoader( + dataset=self.dataset_unlabel, + batch_sampler=self._batch_sampler_unlabel, + collate_fn=self._batch_transforms_unlabel, + num_workers=worker_num, + return_list=return_list, + use_shared_memory=use_shared_memory) + + self.dataloader = CombineSSODLoader(self.dataloader_label, + self.dataloader_unlabel) + self.loader = iter(self.dataloader) + return self + + def __len__(self): + return len(self._batch_sampler_label) + + def __iter__(self): + return self + + def __next__(self): + return next(self.loader) + + def next(self): + # python2 compatibility + return self.__next__() + + +@register +class SemiTrainReader(BaseSemiDataLoader): + __shared__ = ['num_classes'] + + def __init__(self, + sample_transforms=[], + weak_aug=[], + strong_aug=[], + sup_batch_transforms=[], + unsup_batch_transforms=[], + sup_batch_size=1, + unsup_batch_size=1, + shuffle=True, + drop_last=True, + num_classes=80, + collate_batch=True, + **kwargs): + super(SemiTrainReader, self).__init__( + sample_transforms, weak_aug, strong_aug, sup_batch_transforms, + unsup_batch_transforms, sup_batch_size, unsup_batch_size, shuffle, + drop_last, num_classes, collate_batch, **kwargs) diff --git a/PaddleDetection-release-2.6/ppdet/data/shm_utils.py b/PaddleDetection-release-2.6/ppdet/data/shm_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a929a809cec9bc1e6b1dd335faa0ba4f2e44ff87 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/shm_utils.py @@ -0,0 +1,70 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+SIZE_UNIT = ['K', 'M', 'G', 'T']
+SHM_QUERY_CMD = 'df -h'
+SHM_KEY = 'shm'
+SHM_DEFAULT_MOUNT = '/dev/shm'
+
+# [ shared memory size check ]
+# In detection models, image/target data occupies a lot of memory and
+# can consume lots of shared memory in a multi-process DataLoader, so
+# the following code gets the shared memory size and performs a size
+# check to disable shared memory use if it is not large enough.
+# The shared memory size is obtained as follows:
+# 1. use `df -h` to get all mount info
+# 2. pick the spaces whose mount info contains 'shm'
+# 3. if only one 'shm' space is found, return its size
+# 4. if there are multiple 'shm' spaces, try to find the default mount
+#    directory '/dev/shm' used on Linux-like systems, otherwise return
+#    the biggest space size.
+
+
+def _parse_size_in_M(size_str):
+    # convert a `df -h` size string such as '64G' or '64GB' to megabytes
+    if size_str[-1] == 'B':
+        num, unit = size_str[:-2], size_str[-2]
+    else:
+        num, unit = size_str[:-1], size_str[-1]
+    assert unit in SIZE_UNIT, \
+        "unknown shm size unit {}".format(unit)
+    return float(num) * \
+        (1024 ** (SIZE_UNIT.index(unit) - 1))
+
+
+def _get_shared_memory_size_in_M():
+    try:
+        df_infos = os.popen(SHM_QUERY_CMD).readlines()
+    except Exception:
+        return None
+    else:
+        shm_infos = []
+        for df_info in df_infos:
+            info = df_info.strip()
+            if info.find(SHM_KEY) >= 0:
+                shm_infos.append(info.split())
+
+        if len(shm_infos) == 0:
+            return None
+        elif len(shm_infos) == 1:
+            return _parse_size_in_M(shm_infos[0][3])
+        else:
+            default_mount_infos = [
+                si for si in shm_infos if si[-1] == SHM_DEFAULT_MOUNT
+            ]
+            if default_mount_infos:
+                return _parse_size_in_M(default_mount_infos[0][3])
+            else:
+                return max([_parse_size_in_M(si[3]) for si in shm_infos])
diff --git a/PaddleDetection-release-2.6/ppdet/data/source/__init__.py b/PaddleDetection-release-2.6/ppdet/data/source/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..f4fef334ee2a87791a9838dabc19097486cb46ea
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/source/__init__.py
@@ -0,0 +1,31 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import coco
+from . import voc
+from . import widerface
+from . import category
+from . import keypoint_coco
+from . import mot
+from . import sniper_coco
+
+from .coco import *
+from .voc import *
+from .widerface import *
+from .category import *
+from .keypoint_coco import *
+from .mot import *
+from .sniper_coco import SniperCOCODataSet
+from .dataset import ImageFolder
+from .pose3d_cmb import *
diff --git a/PaddleDetection-release-2.6/ppdet/data/source/category.py b/PaddleDetection-release-2.6/ppdet/data/source/category.py
new file mode 100644
index 0000000000000000000000000000000000000000..4da25a2d2f52a0caf28b4623d563d0fa315c4ac6
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/source/category.py
@@ -0,0 +1,940 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+
+from ppdet.data.source.voc import pascalvoc_label
+from ppdet.data.source.widerface import widerface_label
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+__all__ = ['get_categories']
+
+
+def get_categories(metric_type, anno_file=None, arch=None):
+    """
+    Get class id to category id map and category id
+    to category name map from annotation file.
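+    The first returned dict maps contiguous training class ids
+    (0, 1, 2, ...) to dataset category ids, and the second maps
+    category ids to human-readable category names.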
+ + Args: + metric_type (str): metric type, currently support 'coco', 'voc', 'oid' + and 'widerface'. + anno_file (str): annotation file path + """ + if arch == 'keypoint_arch': + return (None, {'id': 'keypoint'}) + + if anno_file == None or (not os.path.isfile(anno_file)): + logger.warning( + "anno_file '{}' is None or not set or not exist, " + "please recheck TrainDataset/EvalDataset/TestDataset.anno_path, " + "otherwise the default categories will be used by metric_type.". + format(anno_file)) + + if metric_type.lower() == 'coco' or metric_type.lower( + ) == 'rbox' or metric_type.lower() == 'snipercoco': + if anno_file and os.path.isfile(anno_file): + if anno_file.endswith('json'): + # lazy import pycocotools here + from pycocotools.coco import COCO + coco = COCO(anno_file) + cats = coco.loadCats(coco.getCatIds()) + + clsid2catid = {i: cat['id'] for i, cat in enumerate(cats)} + catid2name = {cat['id']: cat['name'] for cat in cats} + + elif anno_file.endswith('txt'): + cats = [] + with open(anno_file) as f: + for line in f.readlines(): + cats.append(line.strip()) + if cats[0] == 'background': cats = cats[1:] + + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + + else: + raise ValueError("anno_file {} should be json or txt.".format( + anno_file)) + return clsid2catid, catid2name + + # anno file not exist, load default categories of COCO17 + else: + if metric_type.lower() == 'rbox': + logger.warning( + "metric_type: {}, load default categories of DOTA.".format( + metric_type)) + return _dota_category() + logger.warning("metric_type: {}, load default categories of COCO.". + format(metric_type)) + return _coco17_category() + + elif metric_type.lower() == 'voc': + if anno_file and os.path.isfile(anno_file): + cats = [] + with open(anno_file) as f: + for line in f.readlines(): + cats.append(line.strip()) + + if cats[0] == 'background': + cats = cats[1:] + + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + + return clsid2catid, catid2name + + # anno file not exist, load default categories of + # VOC all 20 categories + else: + logger.warning("metric_type: {}, load default categories of VOC.". + format(metric_type)) + return _vocall_category() + + elif metric_type.lower() == 'oid': + if anno_file and os.path.isfile(anno_file): + logger.warning("only default categories support for OID19") + return _oid19_category() + + elif metric_type.lower() == 'widerface': + return _widerface_category() + + elif metric_type.lower() == 'keypointtopdowncocoeval' or metric_type.lower( + ) == 'keypointtopdownmpiieval': + return (None, {'id': 'keypoint'}) + + elif metric_type.lower() == 'pose3deval': + return (None, {'id': 'pose3d'}) + + elif metric_type.lower() in ['mot', 'motdet', 'reid']: + if anno_file and os.path.isfile(anno_file): + cats = [] + with open(anno_file) as f: + for line in f.readlines(): + cats.append(line.strip()) + if cats[0] == 'background': + cats = cats[1:] + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + return clsid2catid, catid2name + # anno file not exist, load default category 'pedestrian'. + else: + logger.warning( + "metric_type: {}, load default categories of pedestrian MOT.". 
+ format(metric_type)) + return _mot_category(category='pedestrian') + + elif metric_type.lower() in ['kitti', 'bdd100kmot']: + return _mot_category(category='vehicle') + + elif metric_type.lower() in ['mcmot']: + if anno_file and os.path.isfile(anno_file): + cats = [] + with open(anno_file) as f: + for line in f.readlines(): + cats.append(line.strip()) + if cats[0] == 'background': + cats = cats[1:] + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + return clsid2catid, catid2name + # anno file not exist, load default categories of visdrone all 10 categories + else: + logger.warning( + "metric_type: {}, load default categories of VisDrone.".format( + metric_type)) + return _visdrone_category() + + else: + raise ValueError("unknown metric type {}".format(metric_type)) + + +def _mot_category(category='pedestrian'): + """ + Get class id to category id map and category id + to category name map of mot dataset + """ + label_map = {category: 0} + label_map = sorted(label_map.items(), key=lambda x: x[1]) + cats = [l[0] for l in label_map] + + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + + return clsid2catid, catid2name + + +def _coco17_category(): + """ + Get class id to category id map and category id + to category name map of COCO2017 dataset + + """ + clsid2catid = { + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 13, + 13: 14, + 14: 15, + 15: 16, + 16: 17, + 17: 18, + 18: 19, + 19: 20, + 20: 21, + 21: 22, + 22: 23, + 23: 24, + 24: 25, + 25: 27, + 26: 28, + 27: 31, + 28: 32, + 29: 33, + 30: 34, + 31: 35, + 32: 36, + 33: 37, + 34: 38, + 35: 39, + 36: 40, + 37: 41, + 38: 42, + 39: 43, + 40: 44, + 41: 46, + 42: 47, + 43: 48, + 44: 49, + 45: 50, + 46: 51, + 47: 52, + 48: 53, + 49: 54, + 50: 55, + 51: 56, + 52: 57, + 53: 58, + 54: 59, + 55: 60, + 56: 61, + 57: 62, + 58: 63, + 59: 64, + 60: 65, + 61: 67, + 62: 70, + 63: 72, + 64: 73, + 65: 74, + 66: 75, + 67: 76, + 68: 77, + 69: 78, + 70: 79, + 71: 80, + 72: 81, + 73: 82, + 74: 84, + 75: 85, + 76: 86, + 77: 87, + 78: 88, + 79: 89, + 80: 90 + } + + catid2name = { + 0: 'background', + 1: 'person', + 2: 'bicycle', + 3: 'car', + 4: 'motorcycle', + 5: 'airplane', + 6: 'bus', + 7: 'train', + 8: 'truck', + 9: 'boat', + 10: 'traffic light', + 11: 'fire hydrant', + 13: 'stop sign', + 14: 'parking meter', + 15: 'bench', + 16: 'bird', + 17: 'cat', + 18: 'dog', + 19: 'horse', + 20: 'sheep', + 21: 'cow', + 22: 'elephant', + 23: 'bear', + 24: 'zebra', + 25: 'giraffe', + 27: 'backpack', + 28: 'umbrella', + 31: 'handbag', + 32: 'tie', + 33: 'suitcase', + 34: 'frisbee', + 35: 'skis', + 36: 'snowboard', + 37: 'sports ball', + 38: 'kite', + 39: 'baseball bat', + 40: 'baseball glove', + 41: 'skateboard', + 42: 'surfboard', + 43: 'tennis racket', + 44: 'bottle', + 46: 'wine glass', + 47: 'cup', + 48: 'fork', + 49: 'knife', + 50: 'spoon', + 51: 'bowl', + 52: 'banana', + 53: 'apple', + 54: 'sandwich', + 55: 'orange', + 56: 'broccoli', + 57: 'carrot', + 58: 'hot dog', + 59: 'pizza', + 60: 'donut', + 61: 'cake', + 62: 'chair', + 63: 'couch', + 64: 'potted plant', + 65: 'bed', + 67: 'dining table', + 70: 'toilet', + 72: 'tv', + 73: 'laptop', + 74: 'mouse', + 75: 'remote', + 76: 'keyboard', + 77: 'cell phone', + 78: 'microwave', + 79: 'oven', + 80: 'toaster', + 81: 'sink', + 82: 'refrigerator', + 84: 'book', + 85: 'clock', + 86: 'vase', + 87: 'scissors', + 88: 'teddy bear', + 89: 'hair drier', + 90: 'toothbrush' 
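+        # note: the ids missing above (12, 26, 29, 30, 45, 66, 68,
+        # 69, 71 and 83) are unused in COCO2017, hence the gaps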
+ } + + clsid2catid = {k - 1: v for k, v in clsid2catid.items()} + catid2name.pop(0) + + return clsid2catid, catid2name + + +def _dota_category(): + """ + Get class id to category id map and category id + to category name map of dota dataset + """ + catid2name = { + 0: 'background', + 1: 'plane', + 2: 'baseball-diamond', + 3: 'bridge', + 4: 'ground-track-field', + 5: 'small-vehicle', + 6: 'large-vehicle', + 7: 'ship', + 8: 'tennis-court', + 9: 'basketball-court', + 10: 'storage-tank', + 11: 'soccer-ball-field', + 12: 'roundabout', + 13: 'harbor', + 14: 'swimming-pool', + 15: 'helicopter' + } + catid2name.pop(0) + clsid2catid = {i: i + 1 for i in range(len(catid2name))} + return clsid2catid, catid2name + + +def _vocall_category(): + """ + Get class id to category id map and category id + to category name map of mixup voc dataset + + """ + label_map = pascalvoc_label() + label_map = sorted(label_map.items(), key=lambda x: x[1]) + cats = [l[0] for l in label_map] + + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + + return clsid2catid, catid2name + + +def _widerface_category(): + label_map = widerface_label() + label_map = sorted(label_map.items(), key=lambda x: x[1]) + cats = [l[0] for l in label_map] + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + + return clsid2catid, catid2name + + +def _oid19_category(): + clsid2catid = {k: k + 1 for k in range(500)} + + catid2name = { + 0: "background", + 1: "Infant bed", + 2: "Rose", + 3: "Flag", + 4: "Flashlight", + 5: "Sea turtle", + 6: "Camera", + 7: "Animal", + 8: "Glove", + 9: "Crocodile", + 10: "Cattle", + 11: "House", + 12: "Guacamole", + 13: "Penguin", + 14: "Vehicle registration plate", + 15: "Bench", + 16: "Ladybug", + 17: "Human nose", + 18: "Watermelon", + 19: "Flute", + 20: "Butterfly", + 21: "Washing machine", + 22: "Raccoon", + 23: "Segway", + 24: "Taco", + 25: "Jellyfish", + 26: "Cake", + 27: "Pen", + 28: "Cannon", + 29: "Bread", + 30: "Tree", + 31: "Shellfish", + 32: "Bed", + 33: "Hamster", + 34: "Hat", + 35: "Toaster", + 36: "Sombrero", + 37: "Tiara", + 38: "Bowl", + 39: "Dragonfly", + 40: "Moths and butterflies", + 41: "Antelope", + 42: "Vegetable", + 43: "Torch", + 44: "Building", + 45: "Power plugs and sockets", + 46: "Blender", + 47: "Billiard table", + 48: "Cutting board", + 49: "Bronze sculpture", + 50: "Turtle", + 51: "Broccoli", + 52: "Tiger", + 53: "Mirror", + 54: "Bear", + 55: "Zucchini", + 56: "Dress", + 57: "Volleyball", + 58: "Guitar", + 59: "Reptile", + 60: "Golf cart", + 61: "Tart", + 62: "Fedora", + 63: "Carnivore", + 64: "Car", + 65: "Lighthouse", + 66: "Coffeemaker", + 67: "Food processor", + 68: "Truck", + 69: "Bookcase", + 70: "Surfboard", + 71: "Footwear", + 72: "Bench", + 73: "Necklace", + 74: "Flower", + 75: "Radish", + 76: "Marine mammal", + 77: "Frying pan", + 78: "Tap", + 79: "Peach", + 80: "Knife", + 81: "Handbag", + 82: "Laptop", + 83: "Tent", + 84: "Ambulance", + 85: "Christmas tree", + 86: "Eagle", + 87: "Limousine", + 88: "Kitchen & dining room table", + 89: "Polar bear", + 90: "Tower", + 91: "Football", + 92: "Willow", + 93: "Human head", + 94: "Stop sign", + 95: "Banana", + 96: "Mixer", + 97: "Binoculars", + 98: "Dessert", + 99: "Bee", + 100: "Chair", + 101: "Wood-burning stove", + 102: "Flowerpot", + 103: "Beaker", + 104: "Oyster", + 105: "Woodpecker", + 106: "Harp", + 107: "Bathtub", + 108: "Wall clock", + 109: "Sports uniform", + 110: "Rhinoceros", + 111: "Beehive", + 112: "Cupboard", 
+ 113: "Chicken", + 114: "Man", + 115: "Blue jay", + 116: "Cucumber", + 117: "Balloon", + 118: "Kite", + 119: "Fireplace", + 120: "Lantern", + 121: "Missile", + 122: "Book", + 123: "Spoon", + 124: "Grapefruit", + 125: "Squirrel", + 126: "Orange", + 127: "Coat", + 128: "Punching bag", + 129: "Zebra", + 130: "Billboard", + 131: "Bicycle", + 132: "Door handle", + 133: "Mechanical fan", + 134: "Ring binder", + 135: "Table", + 136: "Parrot", + 137: "Sock", + 138: "Vase", + 139: "Weapon", + 140: "Shotgun", + 141: "Glasses", + 142: "Seahorse", + 143: "Belt", + 144: "Watercraft", + 145: "Window", + 146: "Giraffe", + 147: "Lion", + 148: "Tire", + 149: "Vehicle", + 150: "Canoe", + 151: "Tie", + 152: "Shelf", + 153: "Picture frame", + 154: "Printer", + 155: "Human leg", + 156: "Boat", + 157: "Slow cooker", + 158: "Croissant", + 159: "Candle", + 160: "Pancake", + 161: "Pillow", + 162: "Coin", + 163: "Stretcher", + 164: "Sandal", + 165: "Woman", + 166: "Stairs", + 167: "Harpsichord", + 168: "Stool", + 169: "Bus", + 170: "Suitcase", + 171: "Human mouth", + 172: "Juice", + 173: "Skull", + 174: "Door", + 175: "Violin", + 176: "Chopsticks", + 177: "Digital clock", + 178: "Sunflower", + 179: "Leopard", + 180: "Bell pepper", + 181: "Harbor seal", + 182: "Snake", + 183: "Sewing machine", + 184: "Goose", + 185: "Helicopter", + 186: "Seat belt", + 187: "Coffee cup", + 188: "Microwave oven", + 189: "Hot dog", + 190: "Countertop", + 191: "Serving tray", + 192: "Dog bed", + 193: "Beer", + 194: "Sunglasses", + 195: "Golf ball", + 196: "Waffle", + 197: "Palm tree", + 198: "Trumpet", + 199: "Ruler", + 200: "Helmet", + 201: "Ladder", + 202: "Office building", + 203: "Tablet computer", + 204: "Toilet paper", + 205: "Pomegranate", + 206: "Skirt", + 207: "Gas stove", + 208: "Cookie", + 209: "Cart", + 210: "Raven", + 211: "Egg", + 212: "Burrito", + 213: "Goat", + 214: "Kitchen knife", + 215: "Skateboard", + 216: "Salt and pepper shakers", + 217: "Lynx", + 218: "Boot", + 219: "Platter", + 220: "Ski", + 221: "Swimwear", + 222: "Swimming pool", + 223: "Drinking straw", + 224: "Wrench", + 225: "Drum", + 226: "Ant", + 227: "Human ear", + 228: "Headphones", + 229: "Fountain", + 230: "Bird", + 231: "Jeans", + 232: "Television", + 233: "Crab", + 234: "Microphone", + 235: "Home appliance", + 236: "Snowplow", + 237: "Beetle", + 238: "Artichoke", + 239: "Jet ski", + 240: "Stationary bicycle", + 241: "Human hair", + 242: "Brown bear", + 243: "Starfish", + 244: "Fork", + 245: "Lobster", + 246: "Corded phone", + 247: "Drink", + 248: "Saucer", + 249: "Carrot", + 250: "Insect", + 251: "Clock", + 252: "Castle", + 253: "Tennis racket", + 254: "Ceiling fan", + 255: "Asparagus", + 256: "Jaguar", + 257: "Musical instrument", + 258: "Train", + 259: "Cat", + 260: "Rifle", + 261: "Dumbbell", + 262: "Mobile phone", + 263: "Taxi", + 264: "Shower", + 265: "Pitcher", + 266: "Lemon", + 267: "Invertebrate", + 268: "Turkey", + 269: "High heels", + 270: "Bust", + 271: "Elephant", + 272: "Scarf", + 273: "Barrel", + 274: "Trombone", + 275: "Pumpkin", + 276: "Box", + 277: "Tomato", + 278: "Frog", + 279: "Bidet", + 280: "Human face", + 281: "Houseplant", + 282: "Van", + 283: "Shark", + 284: "Ice cream", + 285: "Swim cap", + 286: "Falcon", + 287: "Ostrich", + 288: "Handgun", + 289: "Whiteboard", + 290: "Lizard", + 291: "Pasta", + 292: "Snowmobile", + 293: "Light bulb", + 294: "Window blind", + 295: "Muffin", + 296: "Pretzel", + 297: "Computer monitor", + 298: "Horn", + 299: "Furniture", + 300: "Sandwich", + 301: "Fox", + 302: "Convenience store", + 303: 
"Fish", + 304: "Fruit", + 305: "Earrings", + 306: "Curtain", + 307: "Grape", + 308: "Sofa bed", + 309: "Horse", + 310: "Luggage and bags", + 311: "Desk", + 312: "Crutch", + 313: "Bicycle helmet", + 314: "Tick", + 315: "Airplane", + 316: "Canary", + 317: "Spatula", + 318: "Watch", + 319: "Lily", + 320: "Kitchen appliance", + 321: "Filing cabinet", + 322: "Aircraft", + 323: "Cake stand", + 324: "Candy", + 325: "Sink", + 326: "Mouse", + 327: "Wine", + 328: "Wheelchair", + 329: "Goldfish", + 330: "Refrigerator", + 331: "French fries", + 332: "Drawer", + 333: "Treadmill", + 334: "Picnic basket", + 335: "Dice", + 336: "Cabbage", + 337: "Football helmet", + 338: "Pig", + 339: "Person", + 340: "Shorts", + 341: "Gondola", + 342: "Honeycomb", + 343: "Doughnut", + 344: "Chest of drawers", + 345: "Land vehicle", + 346: "Bat", + 347: "Monkey", + 348: "Dagger", + 349: "Tableware", + 350: "Human foot", + 351: "Mug", + 352: "Alarm clock", + 353: "Pressure cooker", + 354: "Human hand", + 355: "Tortoise", + 356: "Baseball glove", + 357: "Sword", + 358: "Pear", + 359: "Miniskirt", + 360: "Traffic sign", + 361: "Girl", + 362: "Roller skates", + 363: "Dinosaur", + 364: "Porch", + 365: "Human beard", + 366: "Submarine sandwich", + 367: "Screwdriver", + 368: "Strawberry", + 369: "Wine glass", + 370: "Seafood", + 371: "Racket", + 372: "Wheel", + 373: "Sea lion", + 374: "Toy", + 375: "Tea", + 376: "Tennis ball", + 377: "Waste container", + 378: "Mule", + 379: "Cricket ball", + 380: "Pineapple", + 381: "Coconut", + 382: "Doll", + 383: "Coffee table", + 384: "Snowman", + 385: "Lavender", + 386: "Shrimp", + 387: "Maple", + 388: "Cowboy hat", + 389: "Goggles", + 390: "Rugby ball", + 391: "Caterpillar", + 392: "Poster", + 393: "Rocket", + 394: "Organ", + 395: "Saxophone", + 396: "Traffic light", + 397: "Cocktail", + 398: "Plastic bag", + 399: "Squash", + 400: "Mushroom", + 401: "Hamburger", + 402: "Light switch", + 403: "Parachute", + 404: "Teddy bear", + 405: "Winter melon", + 406: "Deer", + 407: "Musical keyboard", + 408: "Plumbing fixture", + 409: "Scoreboard", + 410: "Baseball bat", + 411: "Envelope", + 412: "Adhesive tape", + 413: "Briefcase", + 414: "Paddle", + 415: "Bow and arrow", + 416: "Telephone", + 417: "Sheep", + 418: "Jacket", + 419: "Boy", + 420: "Pizza", + 421: "Otter", + 422: "Office supplies", + 423: "Couch", + 424: "Cello", + 425: "Bull", + 426: "Camel", + 427: "Ball", + 428: "Duck", + 429: "Whale", + 430: "Shirt", + 431: "Tank", + 432: "Motorcycle", + 433: "Accordion", + 434: "Owl", + 435: "Porcupine", + 436: "Sun hat", + 437: "Nail", + 438: "Scissors", + 439: "Swan", + 440: "Lamp", + 441: "Crown", + 442: "Piano", + 443: "Sculpture", + 444: "Cheetah", + 445: "Oboe", + 446: "Tin can", + 447: "Mango", + 448: "Tripod", + 449: "Oven", + 450: "Mouse", + 451: "Barge", + 452: "Coffee", + 453: "Snowboard", + 454: "Common fig", + 455: "Salad", + 456: "Marine invertebrates", + 457: "Umbrella", + 458: "Kangaroo", + 459: "Human arm", + 460: "Measuring cup", + 461: "Snail", + 462: "Loveseat", + 463: "Suit", + 464: "Teapot", + 465: "Bottle", + 466: "Alpaca", + 467: "Kettle", + 468: "Trousers", + 469: "Popcorn", + 470: "Centipede", + 471: "Spider", + 472: "Sparrow", + 473: "Plate", + 474: "Bagel", + 475: "Personal care", + 476: "Apple", + 477: "Brassiere", + 478: "Bathroom cabinet", + 479: "studio couch", + 480: "Computer keyboard", + 481: "Table tennis racket", + 482: "Sushi", + 483: "Cabinetry", + 484: "Street light", + 485: "Towel", + 486: "Nightstand", + 487: "Rabbit", + 488: "Dolphin", + 489: "Dog", + 490: 
"Jug", + 491: "Wok", + 492: "Fire hydrant", + 493: "Human eye", + 494: "Skyscraper", + 495: "Backpack", + 496: "Potato", + 497: "Paper towel", + 498: "Lifejacket", + 499: "Bicycle wheel", + 500: "Toilet", + } + + return clsid2catid, catid2name + + +def _visdrone_category(): + clsid2catid = {i: i for i in range(10)} + + catid2name = { + 0: 'pedestrian', + 1: 'people', + 2: 'bicycle', + 3: 'car', + 4: 'van', + 5: 'truck', + 6: 'tricycle', + 7: 'awning-tricycle', + 8: 'bus', + 9: 'motor' + } + return clsid2catid, catid2name diff --git a/PaddleDetection-release-2.6/ppdet/data/source/coco.py b/PaddleDetection-release-2.6/ppdet/data/source/coco.py new file mode 100644 index 0000000000000000000000000000000000000000..330dae6775115bb4401e5adcdc30471b7099f3e8 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/source/coco.py @@ -0,0 +1,587 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import copy +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence +import numpy as np +from ppdet.core.workspace import register, serializable +from .dataset import DetDataset + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['COCODataSet', 'SlicedCOCODataSet', 'SemiCOCODataSet'] + + +@register +@serializable +class COCODataSet(DetDataset): + """ + Load dataset with COCO format. + + Args: + dataset_dir (str): root directory for dataset. + image_dir (str): directory for images. + anno_path (str): coco annotation file path. + data_fields (list): key name of data dictionary, at least have 'image'. + sample_num (int): number of samples to load, -1 means all. + load_crowd (bool): whether to load crowded ground-truth. + False as default + allow_empty (bool): whether to load empty entry. False as default + empty_ratio (float): the ratio of empty record number to total + record's, if empty_ratio is out of [0. ,1.), do not sample the + records and use all the empty entries. 1. as default + repeat (int): repeat times for dataset, use in benchmark. + """ + + def __init__(self, + dataset_dir=None, + image_dir=None, + anno_path=None, + data_fields=['image'], + sample_num=-1, + load_crowd=False, + allow_empty=False, + empty_ratio=1., + repeat=1): + super(COCODataSet, self).__init__( + dataset_dir, + image_dir, + anno_path, + data_fields, + sample_num, + repeat=repeat) + self.load_image_only = False + self.load_semantic = False + self.load_crowd = load_crowd + self.allow_empty = allow_empty + self.empty_ratio = empty_ratio + + def _sample_empty(self, records, num): + # if empty_ratio is out of [0. ,1.), do not sample the records + if self.empty_ratio < 0. 
or self.empty_ratio >= 1.:
+            return records
+        import random
+        sample_num = min(
+            int(num * self.empty_ratio / (1 - self.empty_ratio)), len(records))
+        records = random.sample(records, sample_num)
+        return records
+
+    def parse_dataset(self):
+        anno_path = os.path.join(self.dataset_dir, self.anno_path)
+        image_dir = os.path.join(self.dataset_dir, self.image_dir)
+
+        assert anno_path.endswith('.json'), \
+            'invalid coco annotation file: ' + anno_path
+        from pycocotools.coco import COCO
+        coco = COCO(anno_path)
+        img_ids = coco.getImgIds()
+        img_ids.sort()
+        cat_ids = coco.getCatIds()
+        records = []
+        empty_records = []
+        ct = 0
+
+        self.catid2clsid = dict({catid: i for i, catid in enumerate(cat_ids)})
+        self.cname2cid = dict({
+            coco.loadCats(catid)[0]['name']: clsid
+            for catid, clsid in self.catid2clsid.items()
+        })
+
+        if 'annotations' not in coco.dataset:
+            self.load_image_only = True
+            logger.warning('Annotation file: {} does not contain ground '
+                           'truth, loading image information only.'.format(
+                               anno_path))
+
+        for img_id in img_ids:
+            img_anno = coco.loadImgs([img_id])[0]
+            im_fname = img_anno['file_name']
+            im_w = float(img_anno['width'])
+            im_h = float(img_anno['height'])
+
+            im_path = os.path.join(image_dir,
+                                   im_fname) if image_dir else im_fname
+            is_empty = False
+            if not os.path.exists(im_path):
+                logger.warning('Illegal image file: {}, and it will be '
+                               'ignored'.format(im_path))
+                continue
+
+            if im_w < 0 or im_h < 0:
+                logger.warning('Illegal width: {} or height: {} in annotation, '
+                               'and im_id: {} will be ignored'.format(
+                                   im_w, im_h, img_id))
+                continue
+
+            coco_rec = {
+                'im_file': im_path,
+                'im_id': np.array([img_id]),
+                'h': im_h,
+                'w': im_w,
+            } if 'image' in self.data_fields else {}
+
+            if not self.load_image_only:
+                ins_anno_ids = coco.getAnnIds(
+                    imgIds=[img_id], iscrowd=None if self.load_crowd else False)
+                instances = coco.loadAnns(ins_anno_ids)
+
+                bboxes = []
+                is_rbox_anno = False
+                for inst in instances:
+                    # check gt bbox
+                    if inst.get('ignore', False):
+                        continue
+                    if 'bbox' not in inst.keys():
+                        continue
+                    else:
+                        if not any(np.array(inst['bbox'])):
+                            continue
+
+                    x1, y1, box_w, box_h = inst['bbox']
+                    x2 = x1 + box_w
+                    y2 = y1 + box_h
+                    eps = 1e-5
+                    if inst['area'] > 0 and x2 - x1 > eps and y2 - y1 > eps:
+                        inst['clean_bbox'] = [
+                            round(float(x), 3) for x in [x1, y1, x2, y2]
+                        ]
+                        bboxes.append(inst)
+                    else:
+                        logger.warning(
+                            'Found an invalid bbox in annotations: im_id: {}, '
+                            'area: {} x1: {}, y1: {}, x2: {}, y2: {}.'.format(
+                                img_id, float(inst['area']), x1, y1, x2, y2))
+
+                num_bbox = len(bboxes)
+                if num_bbox <= 0 and not self.allow_empty:
+                    continue
+                elif num_bbox <= 0:
+                    is_empty = True
+
+                gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
+                gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
+                is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
+                gt_poly = [None] * num_bbox
+                gt_track_id = -np.ones((num_bbox, 1), dtype=np.int32)
+
+                has_segmentation = False
+                has_track_id = False
+                for i, box in enumerate(bboxes):
+                    catid = box['category_id']
+                    gt_class[i][0] = self.catid2clsid[catid]
+                    gt_bbox[i, :] = box['clean_bbox']
+                    is_crowd[i][0] = box['iscrowd']
+                    # check RLE format
+                    if 'segmentation' in box and box['iscrowd'] == 1:
+                        gt_poly[i] = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
+                    elif 'segmentation' in box and box['segmentation']:
+                        if not np.array(
+                                box['segmentation'],
+                                dtype=object).size > 0 and not self.allow_empty:
+                            bboxes.pop(i)
+                            gt_poly.pop(i)
+                            np.delete(is_crowd, i)
+                            np.delete(gt_class, i)
+                            np.delete(gt_bbox, i)
+                        else:
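+                            # a non-empty polygon list: keep it as-is
+                            # and record that this image carries
+                            # segmentation annotations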
+                            gt_poly[i] = box['segmentation']
+                            has_segmentation = True
+
+                    if 'track_id' in box:
+                        gt_track_id[i][0] = box['track_id']
+                        has_track_id = True
+
+                if has_segmentation and not any(
+                        gt_poly) and not self.allow_empty:
+                    continue
+
+                gt_rec = {
+                    'is_crowd': is_crowd,
+                    'gt_class': gt_class,
+                    'gt_bbox': gt_bbox,
+                    'gt_poly': gt_poly,
+                }
+                if has_track_id:
+                    gt_rec.update({'gt_track_id': gt_track_id})
+
+                for k, v in gt_rec.items():
+                    if k in self.data_fields:
+                        coco_rec[k] = v
+
+            # TODO: remove load_semantic
+            if self.load_semantic and 'semantic' in self.data_fields:
+                seg_path = os.path.join(self.dataset_dir, 'stuffthingmaps',
+                                        'train2017', im_fname[:-3] + 'png')
+                coco_rec.update({'semantic': seg_path})
+
+            logger.debug('Load file: {}, im_id: {}, h: {}, w: {}.'.format(
+                im_path, img_id, im_h, im_w))
+            if is_empty:
+                empty_records.append(coco_rec)
+            else:
+                records.append(coco_rec)
+            ct += 1
+            if self.sample_num > 0 and ct >= self.sample_num:
+                break
+        assert ct > 0, 'not found any coco record in %s' % (anno_path)
+        logger.info('Load [{} samples valid, {} samples invalid] in file {}.'.
+                    format(ct, len(img_ids) - ct, anno_path))
+        if self.allow_empty and len(empty_records) > 0:
+            empty_records = self._sample_empty(empty_records, len(records))
+            records += empty_records
+        self.roidbs = records
+
+
+@register
+@serializable
+class SlicedCOCODataSet(COCODataSet):
+    """Sliced COCODataSet"""
+
+    def __init__(
+            self,
+            dataset_dir=None,
+            image_dir=None,
+            anno_path=None,
+            data_fields=['image'],
+            sample_num=-1,
+            load_crowd=False,
+            allow_empty=False,
+            empty_ratio=1.,
+            repeat=1,
+            sliced_size=[640, 640],
+            overlap_ratio=[0.25, 0.25], ):
+        super(SlicedCOCODataSet, self).__init__(
+            dataset_dir=dataset_dir,
+            image_dir=image_dir,
+            anno_path=anno_path,
+            data_fields=data_fields,
+            sample_num=sample_num,
+            load_crowd=load_crowd,
+            allow_empty=allow_empty,
+            empty_ratio=empty_ratio,
+            repeat=repeat, )
+        self.sliced_size = sliced_size
+        self.overlap_ratio = overlap_ratio
+
+    def parse_dataset(self):
+        anno_path = os.path.join(self.dataset_dir, self.anno_path)
+        image_dir = os.path.join(self.dataset_dir, self.image_dir)
+
+        assert anno_path.endswith('.json'), \
+            'invalid coco annotation file: ' + anno_path
+        from pycocotools.coco import COCO
+        coco = COCO(anno_path)
+        img_ids = coco.getImgIds()
+        img_ids.sort()
+        cat_ids = coco.getCatIds()
+        records = []
+        empty_records = []
+        ct = 0
+        ct_sub = 0
+
+        self.catid2clsid = dict({catid: i for i, catid in enumerate(cat_ids)})
+        self.cname2cid = dict({
+            coco.loadCats(catid)[0]['name']: clsid
+            for catid, clsid in self.catid2clsid.items()
+        })
+
+        if 'annotations' not in coco.dataset:
+            self.load_image_only = True
+            logger.warning('Annotation file: {} does not contain ground '
+                           'truth, loading image information only.'.format(
+                               anno_path))
+        try:
+            import sahi
+            from sahi.slicing import slice_image
+        except Exception as e:
+            logger.error(
+                'sahi not found, please install sahi, '
+                'for example: `pip install sahi`, see https://github.com/obss/sahi.'
+            )
+            raise e
+
+        sub_img_ids = 0
+        for img_id in img_ids:
+            img_anno = coco.loadImgs([img_id])[0]
+            im_fname = img_anno['file_name']
+            im_w = float(img_anno['width'])
+            im_h = float(img_anno['height'])
+
+            im_path = os.path.join(image_dir,
+                                   im_fname) if image_dir else im_fname
+            is_empty = False
+            if not os.path.exists(im_path):
+                logger.warning('Illegal image file: {}, and it will be '
+                               'ignored'.format(im_path))
+                continue
+
+            if im_w < 0 or im_h < 0:
+                logger.warning('Illegal width: {} or height: {} in annotation, '
+                               'and im_id: {} will be ignored'.format(
+                                   im_w, im_h, img_id))
+                continue
+
+            slice_image_result = sahi.slicing.slice_image(
+                image=im_path,
+                slice_height=self.sliced_size[0],
+                slice_width=self.sliced_size[1],
+                overlap_height_ratio=self.overlap_ratio[0],
+                overlap_width_ratio=self.overlap_ratio[1])
+
+            sub_img_num = len(slice_image_result)
+            for _ind in range(sub_img_num):
+                im = slice_image_result.images[_ind]
+                coco_rec = {
+                    'image': im,
+                    'im_id': np.array([sub_img_ids + _ind]),
+                    'h': im.shape[0],
+                    'w': im.shape[1],
+                    'ori_im_id': np.array([img_id]),
+                    'st_pix': np.array(
+                        slice_image_result.starting_pixels[_ind],
+                        dtype=np.float32),
+                    'is_last': 1 if _ind == sub_img_num - 1 else 0,
+                } if 'image' in self.data_fields else {}
+                records.append(coco_rec)
+            ct_sub += sub_img_num
+            ct += 1
+            if self.sample_num > 0 and ct >= self.sample_num:
+                break
+        assert ct > 0, 'not found any coco record in %s' % (anno_path)
+        logger.info('{} samples sliced into {} sub_samples in file {}'.format(
+            ct, ct_sub, anno_path))
+        if self.allow_empty and len(empty_records) > 0:
+            empty_records = self._sample_empty(empty_records, len(records))
+            records += empty_records
+        self.roidbs = records
+
+
+@register
+@serializable
+class SemiCOCODataSet(COCODataSet):
+    """Semi-COCODataSet used for supervised and unsupervised datasets"""
+
+    def __init__(self,
+                 dataset_dir=None,
+                 image_dir=None,
+                 anno_path=None,
+                 data_fields=['image'],
+                 sample_num=-1,
+                 load_crowd=False,
+                 allow_empty=False,
+                 empty_ratio=1.,
+                 repeat=1,
+                 supervised=True):
+        super(SemiCOCODataSet, self).__init__(
+            dataset_dir, image_dir, anno_path, data_fields, sample_num,
+            load_crowd, allow_empty, empty_ratio, repeat)
+        self.supervised = supervised
+        self.length = -1  # default: -1 means use all samples
+
+    def parse_dataset(self):
+        anno_path = os.path.join(self.dataset_dir, self.anno_path)
+        image_dir = os.path.join(self.dataset_dir, self.image_dir)
+
+        assert anno_path.endswith('.json'), \
+            'invalid coco annotation file: ' + anno_path
+        from pycocotools.coco import COCO
+        coco = COCO(anno_path)
+        img_ids = coco.getImgIds()
+        img_ids.sort()
+        cat_ids = coco.getCatIds()
+        records = []
+        empty_records = []
+        ct = 0
+
+        self.catid2clsid = dict({catid: i for i, catid in enumerate(cat_ids)})
+        self.cname2cid = dict({
+            coco.loadCats(catid)[0]['name']: clsid
+            for catid, clsid in self.catid2clsid.items()
+        })
+
+        if 'annotations' not in coco.dataset or not self.supervised:
+            self.load_image_only = True
+            logger.warning('Annotation file: {} does not contain ground '
+                           'truth, loading image information only.'.format(
+                               anno_path))
+
+        for img_id in img_ids:
+            img_anno = coco.loadImgs([img_id])[0]
+            im_fname = img_anno['file_name']
+            im_w = float(img_anno['width'])
+            im_h = float(img_anno['height'])
+
+            im_path = os.path.join(image_dir,
+                                   im_fname) if image_dir else im_fname
+            is_empty = False
+            if not os.path.exists(im_path):
+                logger.warning('Illegal image file: {}, and it will be '
+                               'ignored'.format(im_path))
+                continue
+
+            if
im_w < 0 or im_h < 0: + logger.warning('Illegal width: {} or height: {} in annotation, ' + 'and im_id: {} will be ignored'.format( + im_w, im_h, img_id)) + continue + + coco_rec = { + 'im_file': im_path, + 'im_id': np.array([img_id]), + 'h': im_h, + 'w': im_w, + } if 'image' in self.data_fields else {} + + if not self.load_image_only: + ins_anno_ids = coco.getAnnIds( + imgIds=[img_id], iscrowd=None if self.load_crowd else False) + instances = coco.loadAnns(ins_anno_ids) + + bboxes = [] + is_rbox_anno = False + for inst in instances: + # check gt bbox + if inst.get('ignore', False): + continue + if 'bbox' not in inst.keys(): + continue + else: + if not any(np.array(inst['bbox'])): + continue + + x1, y1, box_w, box_h = inst['bbox'] + x2 = x1 + box_w + y2 = y1 + box_h + eps = 1e-5 + if inst['area'] > 0 and x2 - x1 > eps and y2 - y1 > eps: + inst['clean_bbox'] = [ + round(float(x), 3) for x in [x1, y1, x2, y2] + ] + bboxes.append(inst) + else: + logger.warning( + 'Found an invalid bbox in annotations: im_id: {}, ' + 'area: {} x1: {}, y1: {}, x2: {}, y2: {}.'.format( + img_id, float(inst['area']), x1, y1, x2, y2)) + + num_bbox = len(bboxes) + if num_bbox <= 0 and not self.allow_empty: + continue + elif num_bbox <= 0: + is_empty = True + + gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32) + gt_class = np.zeros((num_bbox, 1), dtype=np.int32) + is_crowd = np.zeros((num_bbox, 1), dtype=np.int32) + gt_poly = [None] * num_bbox + + has_segmentation = False + for i, box in enumerate(bboxes): + catid = box['category_id'] + gt_class[i][0] = self.catid2clsid[catid] + gt_bbox[i, :] = box['clean_bbox'] + is_crowd[i][0] = box['iscrowd'] + # check RLE format + if 'segmentation' in box and box['iscrowd'] == 1: + gt_poly[i] = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] + elif 'segmentation' in box and box['segmentation']: + if not np.array(box['segmentation'] + ).size > 0 and not self.allow_empty: + bboxes.pop(i) + gt_poly.pop(i) + np.delete(is_crowd, i) + np.delete(gt_class, i) + np.delete(gt_bbox, i) + else: + gt_poly[i] = box['segmentation'] + has_segmentation = True + + if has_segmentation and not any( + gt_poly) and not self.allow_empty: + continue + + gt_rec = { + 'is_crowd': is_crowd, + 'gt_class': gt_class, + 'gt_bbox': gt_bbox, + 'gt_poly': gt_poly, + } + + for k, v in gt_rec.items(): + if k in self.data_fields: + coco_rec[k] = v + + # TODO: remove load_semantic + if self.load_semantic and 'semantic' in self.data_fields: + seg_path = os.path.join(self.dataset_dir, 'stuffthingmaps', + 'train2017', im_fname[:-3] + 'png') + coco_rec.update({'semantic': seg_path}) + + logger.debug('Load file: {}, im_id: {}, h: {}, w: {}.'.format( + im_path, img_id, im_h, im_w)) + if is_empty: + empty_records.append(coco_rec) + else: + records.append(coco_rec) + ct += 1 + if self.sample_num > 0 and ct >= self.sample_num: + break + assert ct > 0, 'not found any coco record in %s' % (anno_path) + logger.info('Load [{} samples valid, {} samples invalid] in file {}.'. 
+ format(ct, len(img_ids) - ct, anno_path)) + if self.allow_empty and len(empty_records) > 0: + empty_records = self._sample_empty(empty_records, len(records)) + records += empty_records + self.roidbs = records + + if self.supervised: + logger.info(f'Use {len(self.roidbs)} sup_samples data as LABELED') + else: + if self.length > 0: # unsup length will be decide by sup length + all_roidbs = self.roidbs.copy() + selected_idxs = [ + np.random.choice(len(all_roidbs)) + for _ in range(self.length) + ] + self.roidbs = [all_roidbs[i] for i in selected_idxs] + logger.info( + f'Use {len(self.roidbs)} unsup_samples data as UNLABELED') + + def __getitem__(self, idx): + n = len(self.roidbs) + if self.repeat > 1: + idx %= n + # data batch + roidb = copy.deepcopy(self.roidbs[idx]) + if self.mixup_epoch == 0 or self._epoch < self.mixup_epoch: + idx = np.random.randint(n) + roidb = [roidb, copy.deepcopy(self.roidbs[idx])] + elif self.cutmix_epoch == 0 or self._epoch < self.cutmix_epoch: + idx = np.random.randint(n) + roidb = [roidb, copy.deepcopy(self.roidbs[idx])] + elif self.mosaic_epoch == 0 or self._epoch < self.mosaic_epoch: + roidb = [roidb, ] + [ + copy.deepcopy(self.roidbs[np.random.randint(n)]) + for _ in range(4) + ] + if isinstance(roidb, Sequence): + for r in roidb: + r['curr_iter'] = self._curr_iter + else: + roidb['curr_iter'] = self._curr_iter + self._curr_iter += 1 + + return self.transform(roidb) diff --git a/PaddleDetection-release-2.6/ppdet/data/source/dataset.py b/PaddleDetection-release-2.6/ppdet/data/source/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..4f22b222aa1a99bf1239db5c379cc4bd1a6632e0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/source/dataset.py @@ -0,0 +1,307 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import copy +import numpy as np +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence +from paddle.io import Dataset +from ppdet.core.workspace import register, serializable +from ppdet.utils.download import get_dataset_path +from ppdet.data import source + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +@serializable +class DetDataset(Dataset): + """ + Load detection dataset. + + Args: + dataset_dir (str): root directory for dataset. + image_dir (str): directory for images. + anno_path (str): annotation file path. + data_fields (list): key name of data dictionary, at least have 'image'. + sample_num (int): number of samples to load, -1 means all. + use_default_label (bool): whether to load default label list. + repeat (int): repeat times for dataset, use in benchmark. 
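A note on the epoch gating used in `SemiCOCODataSet.__getitem__` above: a threshold of 0 keeps the augmentation permanently on, -1 (the default) disables it, and a positive value enables it only while the current epoch is below the threshold. A minimal standalone sketch of that selection logic, with stub records and a hypothetical `sample_batch` helper (not part of the codebase):

```python
import copy
import numpy as np

def sample_batch(roidbs, idx, epoch, mixup_epoch=-1, mosaic_epoch=-1):
    # Same gate as the dataset: 0 means always on, -1 means off,
    # a positive value means on only for the first epochs.
    n = len(roidbs)
    roidb = copy.deepcopy(roidbs[idx])
    if mixup_epoch == 0 or epoch < mixup_epoch:
        # mixup needs one extra randomly drawn companion sample
        return [roidb, copy.deepcopy(roidbs[np.random.randint(n)])]
    if mosaic_epoch == 0 or epoch < mosaic_epoch:
        # mosaic draws four extra samples
        return [roidb] + [
            copy.deepcopy(roidbs[np.random.randint(n)]) for _ in range(4)
        ]
    return roidb

stubs = [{'im_id': np.array([i])} for i in range(10)]  # toy roidbs
print(len(sample_batch(stubs, 0, epoch=1, mosaic_epoch=5)))  # -> 5
```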
+ """ + + def __init__(self, + dataset_dir=None, + image_dir=None, + anno_path=None, + data_fields=['image'], + sample_num=-1, + use_default_label=None, + repeat=1, + **kwargs): + super(DetDataset, self).__init__() + self.dataset_dir = dataset_dir if dataset_dir is not None else '' + self.anno_path = anno_path + self.image_dir = image_dir if image_dir is not None else '' + self.data_fields = data_fields + self.sample_num = sample_num + self.use_default_label = use_default_label + self.repeat = repeat + self._epoch = 0 + self._curr_iter = 0 + + def __len__(self, ): + return len(self.roidbs) * self.repeat + + def __call__(self, *args, **kwargs): + return self + + def __getitem__(self, idx): + n = len(self.roidbs) + if self.repeat > 1: + idx %= n + # data batch + roidb = copy.deepcopy(self.roidbs[idx]) + if self.mixup_epoch == 0 or self._epoch < self.mixup_epoch: + idx = np.random.randint(n) + roidb = [roidb, copy.deepcopy(self.roidbs[idx])] + elif self.cutmix_epoch == 0 or self._epoch < self.cutmix_epoch: + idx = np.random.randint(n) + roidb = [roidb, copy.deepcopy(self.roidbs[idx])] + elif self.mosaic_epoch == 0 or self._epoch < self.mosaic_epoch: + roidb = [roidb, ] + [ + copy.deepcopy(self.roidbs[np.random.randint(n)]) + for _ in range(4) + ] + elif self.pre_img_epoch == 0 or self._epoch < self.pre_img_epoch: + # Add previous image as input, only used in CenterTrack + idx_pre_img = idx - 1 + if idx_pre_img < 0: + idx_pre_img = idx + 1 + roidb = [roidb, ] + [copy.deepcopy(self.roidbs[idx_pre_img])] + if isinstance(roidb, Sequence): + for r in roidb: + r['curr_iter'] = self._curr_iter + else: + roidb['curr_iter'] = self._curr_iter + self._curr_iter += 1 + + return self.transform(roidb) + + def check_or_download_dataset(self): + self.dataset_dir = get_dataset_path(self.dataset_dir, self.anno_path, + self.image_dir) + + def set_kwargs(self, **kwargs): + self.mixup_epoch = kwargs.get('mixup_epoch', -1) + self.cutmix_epoch = kwargs.get('cutmix_epoch', -1) + self.mosaic_epoch = kwargs.get('mosaic_epoch', -1) + self.pre_img_epoch = kwargs.get('pre_img_epoch', -1) + + def set_transform(self, transform): + self.transform = transform + + def set_epoch(self, epoch_id): + self._epoch = epoch_id + + def parse_dataset(self, ): + raise NotImplementedError( + "Need to implement parse_dataset method of Dataset") + + def get_anno(self): + if self.anno_path is None: + return + return os.path.join(self.dataset_dir, self.anno_path) + + +def _is_valid_file(f, extensions=('.jpg', '.jpeg', '.png', '.bmp')): + return f.lower().endswith(extensions) + + +def _make_dataset(dir): + dir = os.path.expanduser(dir) + if not os.path.isdir(dir): + raise ('{} should be a dir'.format(dir)) + images = [] + for root, _, fnames in sorted(os.walk(dir, followlinks=True)): + for fname in sorted(fnames): + path = os.path.join(root, fname) + if _is_valid_file(path): + images.append(path) + return images + + +@register +@serializable +class ImageFolder(DetDataset): + def __init__(self, + dataset_dir=None, + image_dir=None, + anno_path=None, + sample_num=-1, + use_default_label=None, + **kwargs): + super(ImageFolder, self).__init__( + dataset_dir, + image_dir, + anno_path, + sample_num=sample_num, + use_default_label=use_default_label) + self._imid2path = {} + self.roidbs = None + self.sample_num = sample_num + + def check_or_download_dataset(self): + return + + def get_anno(self): + if self.anno_path is None: + return + if self.dataset_dir: + return os.path.join(self.dataset_dir, self.anno_path) + else: + return self.anno_path + + def 
parse_dataset(self, ): + if not self.roidbs: + self.roidbs = self._load_images() + + def _parse(self): + image_dir = self.image_dir + if not isinstance(image_dir, Sequence): + image_dir = [image_dir] + images = [] + for im_dir in image_dir: + if os.path.isdir(im_dir): + im_dir = os.path.join(self.dataset_dir, im_dir) + images.extend(_make_dataset(im_dir)) + elif os.path.isfile(im_dir) and _is_valid_file(im_dir): + images.append(im_dir) + return images + + def _load_images(self): + images = self._parse() + ct = 0 + records = [] + for image in images: + assert image != '' and os.path.isfile(image), \ + "Image {} not found".format(image) + if self.sample_num > 0 and ct >= self.sample_num: + break + rec = {'im_id': np.array([ct]), 'im_file': image} + self._imid2path[ct] = image + ct += 1 + records.append(rec) + assert len(records) > 0, "No image file found" + return records + + def get_imid2path(self): + return self._imid2path + + def set_images(self, images): + self.image_dir = images + self.roidbs = self._load_images() + + def set_slice_images(self, + images, + slice_size=[640, 640], + overlap_ratio=[0.25, 0.25]): + self.image_dir = images + ori_records = self._load_images() + try: + import sahi + from sahi.slicing import slice_image + except Exception as e: + logger.error( + 'sahi not found, plaese install sahi. ' + 'for example: `pip install sahi`, see https://github.com/obss/sahi.' + ) + raise e + + sub_img_ids = 0 + ct = 0 + ct_sub = 0 + records = [] + for i, ori_rec in enumerate(ori_records): + im_path = ori_rec['im_file'] + slice_image_result = sahi.slicing.slice_image( + image=im_path, + slice_height=slice_size[0], + slice_width=slice_size[1], + overlap_height_ratio=overlap_ratio[0], + overlap_width_ratio=overlap_ratio[1]) + + sub_img_num = len(slice_image_result) + for _ind in range(sub_img_num): + im = slice_image_result.images[_ind] + rec = { + 'image': im, + 'im_id': np.array([sub_img_ids + _ind]), + 'h': im.shape[0], + 'w': im.shape[1], + 'ori_im_id': np.array([ori_rec['im_id'][0]]), + 'st_pix': np.array( + slice_image_result.starting_pixels[_ind], + dtype=np.float32), + 'is_last': 1 if _ind == sub_img_num - 1 else 0, + } if 'image' in self.data_fields else {} + records.append(rec) + ct_sub += sub_img_num + ct += 1 + logger.info('{} samples and slice to {} sub_samples.'.format(ct, + ct_sub)) + self.roidbs = records + + def get_label_list(self): + # Only VOC dataset needs label list in ImageFold + return self.anno_path + + +@register +class CommonDataset(object): + def __init__(self, **dataset_args): + super(CommonDataset, self).__init__() + dataset_args = copy.deepcopy(dataset_args) + type = dataset_args.pop("name") + self.dataset = getattr(source, type)(**dataset_args) + + def __call__(self): + return self.dataset + + +@register +class TrainDataset(CommonDataset): + pass + + +@register +class EvalMOTDataset(CommonDataset): + pass + + +@register +class TestMOTDataset(CommonDataset): + pass + + +@register +class EvalDataset(CommonDataset): + pass + + +@register +class TestDataset(CommonDataset): + pass diff --git a/PaddleDetection-release-2.6/ppdet/data/source/keypoint_coco.py b/PaddleDetection-release-2.6/ppdet/data/source/keypoint_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..11ecea538404bf498c66a90f4e8293824edbf317 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/source/keypoint_coco.py @@ -0,0 +1,724 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
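`ImageFolder.set_slice_images` above delegates the actual tiling to sahi. A minimal sketch of that call, assuming sahi is installed and `demo.jpg` is a placeholder image path; the dataset code consumes exactly the `images` and `starting_pixels` fields of the result:

```python
from sahi.slicing import slice_image  # pip install sahi

result = slice_image(
    image='demo.jpg',  # placeholder; any local image path works
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.25,
    overlap_width_ratio=0.25)

# one sub-image array plus its top-left pixel offset per slice
for im, st_pix in zip(result.images, result.starting_pixels):
    print(im.shape, st_pix)
```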
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +this code is base on https://github.com/open-mmlab/mmpose +""" +import os +import cv2 +import numpy as np +import json +import copy +import pycocotools +from pycocotools.coco import COCO +from .dataset import DetDataset +from ppdet.core.workspace import register, serializable + + +@serializable +class KeypointBottomUpBaseDataset(DetDataset): + """Base class for bottom-up datasets. + + All datasets should subclass it. + All subclasses should overwrite: + Methods:`_get_imganno` + + Args: + dataset_dir (str): Root path to the dataset. + anno_path (str): Relative path to the annotation file. + image_dir (str): Path to a directory where images are held. + Default: None. + num_joints (int): keypoint numbers + transform (composed(operators)): A sequence of data transforms. + shard (list): [rank, worldsize], the distributed env params + test_mode (bool): Store True when building test or + validation dataset. Default: False. + """ + + def __init__(self, + dataset_dir, + image_dir, + anno_path, + num_joints, + transform=[], + shard=[0, 1], + test_mode=False): + super().__init__(dataset_dir, image_dir, anno_path) + self.image_info = {} + self.ann_info = {} + + self.img_prefix = os.path.join(dataset_dir, image_dir) + self.transform = transform + self.test_mode = test_mode + + self.ann_info['num_joints'] = num_joints + self.img_ids = [] + + def parse_dataset(self): + pass + + def __len__(self): + """Get dataset length.""" + return len(self.img_ids) + + def _get_imganno(self, idx): + """Get anno for a single image.""" + raise NotImplementedError + + def __getitem__(self, idx): + """Prepare image for training given the index.""" + records = copy.deepcopy(self._get_imganno(idx)) + records['image'] = cv2.imread(records['image_file']) + records['image'] = cv2.cvtColor(records['image'], cv2.COLOR_BGR2RGB) + if 'mask' in records: + records['mask'] = (records['mask'] + 0).astype('uint8') + records = self.transform(records) + return records + + def parse_dataset(self): + return + + +@register +@serializable +class KeypointBottomUpCocoDataset(KeypointBottomUpBaseDataset): + """COCO dataset for bottom-up pose estimation. + + The dataset loads raw features and apply specified transforms + to return a dict containing the image tensors and other information. + + COCO keypoint indexes:: + + 0: 'nose', + 1: 'left_eye', + 2: 'right_eye', + 3: 'left_ear', + 4: 'right_ear', + 5: 'left_shoulder', + 6: 'right_shoulder', + 7: 'left_elbow', + 8: 'right_elbow', + 9: 'left_wrist', + 10: 'right_wrist', + 11: 'left_hip', + 12: 'right_hip', + 13: 'left_knee', + 14: 'right_knee', + 15: 'left_ankle', + 16: 'right_ankle' + + Args: + dataset_dir (str): Root path to the dataset. + anno_path (str): Relative path to the annotation file. + image_dir (str): Path to a directory where images are held. + Default: None. + num_joints (int): keypoint numbers + transform (composed(operators)): A sequence of data transforms. 
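The `transform` object applied in `__getitem__` above is simply a callable that takes a record dict and returns a record dict. An illustrative stand-in for that calling convention (this `Compose`/`normalize` pair is a sketch, not PaddleDetection's actual operators):

```python
import numpy as np

class Compose(object):
    """Chain record-level operators; each op maps dict -> dict."""

    def __init__(self, ops):
        self.ops = ops

    def __call__(self, records):
        for op in self.ops:
            records = op(records)
        return records

def normalize(records):
    records['image'] = records['image'].astype('float32') / 255.0
    return records

transform = Compose([normalize])
rec = {'image': np.zeros((4, 4, 3), dtype='uint8')}
print(transform(rec)['image'].dtype)  # float32
```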
+ shard (list): [rank, worldsize], the distributed env params + test_mode (bool): Store True when building test or + validation dataset. Default: False. + """ + + def __init__(self, + dataset_dir, + image_dir, + anno_path, + num_joints, + transform=[], + shard=[0, 1], + test_mode=False, + return_mask=True, + return_bbox=True, + return_area=True, + return_class=True): + super().__init__(dataset_dir, image_dir, anno_path, num_joints, + transform, shard, test_mode) + + self.ann_file = os.path.join(dataset_dir, anno_path) + self.shard = shard + self.test_mode = test_mode + self.return_mask = return_mask + self.return_bbox = return_bbox + self.return_area = return_area + self.return_class = return_class + + def parse_dataset(self): + self.coco = COCO(self.ann_file) + + self.img_ids = self.coco.getImgIds() + if not self.test_mode: + self.img_ids_tmp = [] + for img_id in self.img_ids: + ann_ids = self.coco.getAnnIds(imgIds=img_id) + anno = self.coco.loadAnns(ann_ids) + anno = [obj for obj in anno if obj['iscrowd'] == 0] + if len(anno) == 0: + continue + self.img_ids_tmp.append(img_id) + self.img_ids = self.img_ids_tmp + + blocknum = int(len(self.img_ids) / self.shard[1]) + self.img_ids = self.img_ids[(blocknum * self.shard[0]):(blocknum * ( + self.shard[0] + 1))] + self.num_images = len(self.img_ids) + self.id2name, self.name2id = self._get_mapping_id_name(self.coco.imgs) + self.dataset_name = 'coco' + + cat_ids = self.coco.getCatIds() + self.catid2clsid = dict({catid: i for i, catid in enumerate(cat_ids)}) + print('=> num_images: {}'.format(self.num_images)) + + @staticmethod + def _get_mapping_id_name(imgs): + """ + Args: + imgs (dict): dict of image info. + + Returns: + tuple: Image name & id mapping dicts. + + - id2name (dict): Mapping image id to name. + - name2id (dict): Mapping image name to id. + """ + id2name = {} + name2id = {} + for image_id, image in imgs.items(): + file_name = image['file_name'] + id2name[image_id] = file_name + name2id[file_name] = image_id + + return id2name, name2id + + def _get_imganno(self, idx): + """Get anno for a single image. 
+ + Args: + idx (int): image idx + + Returns: + dict: info for model training + """ + coco = self.coco + img_id = self.img_ids[idx] + ann_ids = coco.getAnnIds(imgIds=img_id) + anno = coco.loadAnns(ann_ids) + + anno = [ + obj for obj in anno + if obj['iscrowd'] == 0 and obj['num_keypoints'] > 0 + ] + + db_rec = {} + joints, orgsize = self._get_joints(anno, idx) + db_rec['gt_joints'] = joints + db_rec['im_shape'] = orgsize + + if self.return_bbox: + db_rec['gt_bbox'] = self._get_bboxs(anno, idx) + + if self.return_class: + db_rec['gt_class'] = self._get_labels(anno, idx) + + if self.return_area: + db_rec['gt_areas'] = self._get_areas(anno, idx) + + if self.return_mask: + db_rec['mask'] = self._get_mask(anno, idx) + + db_rec['im_id'] = img_id + db_rec['image_file'] = os.path.join(self.img_prefix, + self.id2name[img_id]) + + return db_rec + + def _get_joints(self, anno, idx): + """Get joints for all people in an image.""" + num_people = len(anno) + + joints = np.zeros( + (num_people, self.ann_info['num_joints'], 3), dtype=np.float32) + + for i, obj in enumerate(anno): + joints[i, :self.ann_info['num_joints'], :3] = \ + np.array(obj['keypoints']).reshape([-1, 3]) + + img_info = self.coco.loadImgs(self.img_ids[idx])[0] + orgsize = np.array([img_info['height'], img_info['width'], 1]) + + return joints, orgsize + + def _get_bboxs(self, anno, idx): + num_people = len(anno) + gt_bboxes = np.zeros((num_people, 4), dtype=np.float32) + + for idx, obj in enumerate(anno): + if 'bbox' in obj: + gt_bboxes[idx, :] = obj['bbox'] + + gt_bboxes[:, 2] += gt_bboxes[:, 0] + gt_bboxes[:, 3] += gt_bboxes[:, 1] + return gt_bboxes + + def _get_labels(self, anno, idx): + num_people = len(anno) + gt_labels = np.zeros((num_people, 1), dtype=np.float32) + + for idx, obj in enumerate(anno): + if 'category_id' in obj: + catid = obj['category_id'] + gt_labels[idx, 0] = self.catid2clsid[catid] + return gt_labels + + def _get_areas(self, anno, idx): + num_people = len(anno) + gt_areas = np.zeros((num_people, ), dtype=np.float32) + + for idx, obj in enumerate(anno): + if 'area' in obj: + gt_areas[idx, ] = obj['area'] + return gt_areas + + def _get_mask(self, anno, idx): + """Get ignore masks to mask out losses.""" + coco = self.coco + img_info = coco.loadImgs(self.img_ids[idx])[0] + + m = np.zeros((img_info['height'], img_info['width']), dtype=np.float32) + + for obj in anno: + if 'segmentation' in obj: + if obj['iscrowd']: + rle = pycocotools.mask.frPyObjects(obj['segmentation'], + img_info['height'], + img_info['width']) + m += pycocotools.mask.decode(rle) + elif obj['num_keypoints'] == 0: + rles = pycocotools.mask.frPyObjects(obj['segmentation'], + img_info['height'], + img_info['width']) + for rle in rles: + m += pycocotools.mask.decode(rle) + + return m < 0.5 + + +@register +@serializable +class KeypointBottomUpCrowdPoseDataset(KeypointBottomUpCocoDataset): + """CrowdPose dataset for bottom-up pose estimation. + + The dataset loads raw features and apply specified transforms + to return a dict containing the image tensors and other information. + + CrowdPose keypoint indexes:: + + 0: 'left_shoulder', + 1: 'right_shoulder', + 2: 'left_elbow', + 3: 'right_elbow', + 4: 'left_wrist', + 5: 'right_wrist', + 6: 'left_hip', + 7: 'right_hip', + 8: 'left_knee', + 9: 'right_knee', + 10: 'left_ankle', + 11: 'right_ankle', + 12: 'top_head', + 13: 'neck' + + Args: + dataset_dir (str): Root path to the dataset. + anno_path (str): Relative path to the annotation file. + image_dir (str): Path to a directory where images are held. 
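The RLE handling in `_get_mask` above relies on pycocotools; a self-contained sketch of the same decode-and-threshold pattern on a hypothetical polygon:

```python
import numpy as np
import pycocotools.mask as mask_util

# a made-up square polygon on a 100x100 image
h, w = 100, 100
segmentation = [[10.0, 10.0, 60.0, 10.0, 60.0, 60.0, 10.0, 60.0]]

m = np.zeros((h, w), dtype=np.float32)
for rle in mask_util.frPyObjects(segmentation, h, w):
    m += mask_util.decode(rle)

# as in _get_mask: True where the loss is kept, False in ignore regions
keep = m < 0.5
print(keep.shape, int(keep.sum()))
```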
+ Default: None. + num_joints (int): keypoint numbers + transform (composed(operators)): A sequence of data transforms. + shard (list): [rank, worldsize], the distributed env params + test_mode (bool): Store True when building test or + validation dataset. Default: False. + """ + + def __init__(self, + dataset_dir, + image_dir, + anno_path, + num_joints, + transform=[], + shard=[0, 1], + test_mode=False): + super().__init__(dataset_dir, image_dir, anno_path, num_joints, + transform, shard, test_mode) + + self.ann_file = os.path.join(dataset_dir, anno_path) + self.shard = shard + self.test_mode = test_mode + + def parse_dataset(self): + self.coco = COCO(self.ann_file) + + self.img_ids = self.coco.getImgIds() + if not self.test_mode: + self.img_ids = [ + img_id for img_id in self.img_ids + if len(self.coco.getAnnIds( + imgIds=img_id, iscrowd=None)) > 0 + ] + blocknum = int(len(self.img_ids) / self.shard[1]) + self.img_ids = self.img_ids[(blocknum * self.shard[0]):(blocknum * ( + self.shard[0] + 1))] + self.num_images = len(self.img_ids) + self.id2name, self.name2id = self._get_mapping_id_name(self.coco.imgs) + + self.dataset_name = 'crowdpose' + print('=> num_images: {}'.format(self.num_images)) + + +@serializable +class KeypointTopDownBaseDataset(DetDataset): + """Base class for top_down datasets. + + All datasets should subclass it. + All subclasses should overwrite: + Methods:`_get_db` + + Args: + dataset_dir (str): Root path to the dataset. + image_dir (str): Path to a directory where images are held. + anno_path (str): Relative path to the annotation file. + num_joints (int): keypoint numbers + transform (composed(operators)): A sequence of data transforms. + """ + + def __init__(self, + dataset_dir, + image_dir, + anno_path, + num_joints, + transform=[]): + super().__init__(dataset_dir, image_dir, anno_path) + self.image_info = {} + self.ann_info = {} + + self.img_prefix = os.path.join(dataset_dir, image_dir) + self.transform = transform + + self.ann_info['num_joints'] = num_joints + self.db = [] + + def __len__(self): + """Get dataset length.""" + return len(self.db) + + def _get_db(self): + """Get a sample""" + raise NotImplementedError + + def __getitem__(self, idx): + """Prepare sample for training given the index.""" + records = copy.deepcopy(self.db[idx]) + records['image'] = cv2.imread(records['image_file'], cv2.IMREAD_COLOR | + cv2.IMREAD_IGNORE_ORIENTATION) + records['image'] = cv2.cvtColor(records['image'], cv2.COLOR_BGR2RGB) + records['score'] = records['score'] if 'score' in records else 1 + records = self.transform(records) + # print('records', records) + return records + + +@register +@serializable +class KeypointTopDownCocoDataset(KeypointTopDownBaseDataset): + """COCO dataset for top-down pose estimation. + + The dataset loads raw features and apply specified transforms + to return a dict containing the image tensors and other information. + + COCO keypoint indexes: + + 0: 'nose', + 1: 'left_eye', + 2: 'right_eye', + 3: 'left_ear', + 4: 'right_ear', + 5: 'left_shoulder', + 6: 'right_shoulder', + 7: 'left_elbow', + 8: 'right_elbow', + 9: 'left_wrist', + 10: 'right_wrist', + 11: 'left_hip', + 12: 'right_hip', + 13: 'left_knee', + 14: 'right_knee', + 15: 'left_ankle', + 16: 'right_ankle' + + Args: + dataset_dir (str): Root path to the dataset. + image_dir (str): Path to a directory where images are held. + anno_path (str): Relative path to the annotation file. 
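The `shard` handling in `parse_dataset` above slices the image ids into equal contiguous blocks, one per rank. A sketch of the arithmetic; note that remainder images (`len % world_size`) are silently dropped:

```python
def shard_slice(img_ids, rank, world_size):
    # same arithmetic as parse_dataset: equal-sized contiguous blocks
    blocknum = int(len(img_ids) / world_size)
    return img_ids[blocknum * rank:blocknum * (rank + 1)]

ids = list(range(10))
print(shard_slice(ids, 0, 3))  # [0, 1, 2]
print(shard_slice(ids, 2, 3))  # [6, 7, 8] -- id 9 is dropped
```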
+ num_joints (int): Keypoint numbers + trainsize (list):[w, h] Image target size + transform (composed(operators)): A sequence of data transforms. + bbox_file (str): Path to a detection bbox file + Default: None. + use_gt_bbox (bool): Whether to use ground truth bbox + Default: True. + pixel_std (int): The pixel std of the scale + Default: 200. + image_thre (float): The threshold to filter the detection box + Default: 0.0. + """ + + def __init__(self, + dataset_dir, + image_dir, + anno_path, + num_joints, + trainsize, + transform=[], + bbox_file=None, + use_gt_bbox=True, + pixel_std=200, + image_thre=0.0): + super().__init__(dataset_dir, image_dir, anno_path, num_joints, + transform) + + self.bbox_file = bbox_file + self.use_gt_bbox = use_gt_bbox + self.trainsize = trainsize + self.pixel_std = pixel_std + self.image_thre = image_thre + self.dataset_name = 'coco' + + def parse_dataset(self): + if self.use_gt_bbox: + self.db = self._load_coco_keypoint_annotations() + else: + self.db = self._load_coco_person_detection_results() + + def _load_coco_keypoint_annotations(self): + coco = COCO(self.get_anno()) + img_ids = coco.getImgIds() + gt_db = [] + for index in img_ids: + im_ann = coco.loadImgs(index)[0] + width = im_ann['width'] + height = im_ann['height'] + file_name = im_ann['file_name'] + im_id = int(im_ann["id"]) + + annIds = coco.getAnnIds(imgIds=index, iscrowd=False) + objs = coco.loadAnns(annIds) + + valid_objs = [] + for obj in objs: + x, y, w, h = obj['bbox'] + x1 = np.max((0, x)) + y1 = np.max((0, y)) + x2 = np.min((width - 1, x1 + np.max((0, w - 1)))) + y2 = np.min((height - 1, y1 + np.max((0, h - 1)))) + if obj['area'] > 0 and x2 >= x1 and y2 >= y1: + obj['clean_bbox'] = [x1, y1, x2 - x1, y2 - y1] + valid_objs.append(obj) + objs = valid_objs + + rec = [] + for obj in objs: + if max(obj['keypoints']) == 0: + continue + + joints = np.zeros( + (self.ann_info['num_joints'], 3), dtype=np.float32) + joints_vis = np.zeros( + (self.ann_info['num_joints'], 3), dtype=np.float32) + for ipt in range(self.ann_info['num_joints']): + joints[ipt, 0] = obj['keypoints'][ipt * 3 + 0] + joints[ipt, 1] = obj['keypoints'][ipt * 3 + 1] + joints[ipt, 2] = 0 + t_vis = obj['keypoints'][ipt * 3 + 2] + if t_vis > 1: + t_vis = 1 + joints_vis[ipt, 0] = t_vis + joints_vis[ipt, 1] = t_vis + joints_vis[ipt, 2] = 0 + + center, scale = self._box2cs(obj['clean_bbox'][:4]) + rec.append({ + 'image_file': os.path.join(self.img_prefix, file_name), + 'center': center, + 'scale': scale, + 'gt_joints': joints, + 'joints_vis': joints_vis, + 'im_id': im_id, + }) + gt_db.extend(rec) + + return gt_db + + def _box2cs(self, box): + x, y, w, h = box[:4] + center = np.zeros((2), dtype=np.float32) + center[0] = x + w * 0.5 + center[1] = y + h * 0.5 + aspect_ratio = self.trainsize[0] * 1.0 / self.trainsize[1] + + if w > aspect_ratio * h: + h = w * 1.0 / aspect_ratio + elif w < aspect_ratio * h: + w = h * aspect_ratio + scale = np.array( + [w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std], + dtype=np.float32) + if center[0] != -1: + scale = scale * 1.25 + + return center, scale + + def _load_coco_person_detection_results(self): + all_boxes = None + bbox_file_path = os.path.join(self.dataset_dir, self.bbox_file) + with open(bbox_file_path, 'r') as f: + all_boxes = json.load(f) + + if not all_boxes: + print('=> Load %s fail!' 
% bbox_file_path) + return None + + kpt_db = [] + for n_img in range(0, len(all_boxes)): + det_res = all_boxes[n_img] + if det_res['category_id'] != 1: + continue + file_name = det_res[ + 'filename'] if 'filename' in det_res else '%012d.jpg' % det_res[ + 'image_id'] + img_name = os.path.join(self.img_prefix, file_name) + box = det_res['bbox'] + score = det_res['score'] + im_id = int(det_res['image_id']) + + if score < self.image_thre: + continue + + center, scale = self._box2cs(box) + joints = np.zeros( + (self.ann_info['num_joints'], 3), dtype=np.float32) + joints_vis = np.ones( + (self.ann_info['num_joints'], 3), dtype=np.float32) + kpt_db.append({ + 'image_file': img_name, + 'im_id': im_id, + 'center': center, + 'scale': scale, + 'score': score, + 'gt_joints': joints, + 'joints_vis': joints_vis, + }) + + return kpt_db + + +@register +@serializable +class KeypointTopDownMPIIDataset(KeypointTopDownBaseDataset): + """MPII dataset for topdown pose estimation. + + The dataset loads raw features and apply specified transforms + to return a dict containing the image tensors and other information. + + MPII keypoint indexes:: + + 0: 'right_ankle', + 1: 'right_knee', + 2: 'right_hip', + 3: 'left_hip', + 4: 'left_knee', + 5: 'left_ankle', + 6: 'pelvis', + 7: 'thorax', + 8: 'upper_neck', + 9: 'head_top', + 10: 'right_wrist', + 11: 'right_elbow', + 12: 'right_shoulder', + 13: 'left_shoulder', + 14: 'left_elbow', + 15: 'left_wrist', + + Args: + dataset_dir (str): Root path to the dataset. + image_dir (str): Path to a directory where images are held. + anno_path (str): Relative path to the annotation file. + num_joints (int): Keypoint numbers + trainsize (list):[w, h] Image target size + transform (composed(operators)): A sequence of data transforms. + """ + + def __init__(self, + dataset_dir, + image_dir, + anno_path, + num_joints, + transform=[]): + super().__init__(dataset_dir, image_dir, anno_path, num_joints, + transform) + + self.dataset_name = 'mpii' + + def parse_dataset(self): + with open(self.get_anno()) as anno_file: + anno = json.load(anno_file) + + gt_db = [] + for a in anno: + image_name = a['image'] + im_id = a['image_id'] if 'image_id' in a else int( + os.path.splitext(image_name)[0]) + + c = np.array(a['center'], dtype=np.float32) + s = np.array([a['scale'], a['scale']], dtype=np.float32) + + # Adjust center/scale slightly to avoid cropping limbs + if c[0] != -1: + c[1] = c[1] + 15 * s[1] + s = s * 1.25 + c = c - 1 + + joints = np.zeros( + (self.ann_info['num_joints'], 3), dtype=np.float32) + joints_vis = np.zeros( + (self.ann_info['num_joints'], 3), dtype=np.float32) + if 'gt_joints' in a: + joints_ = np.array(a['gt_joints']) + joints_[:, 0:2] = joints_[:, 0:2] - 1 + joints_vis_ = np.array(a['joints_vis']) + assert len(joints_) == self.ann_info[ + 'num_joints'], 'joint num diff: {} vs {}'.format( + len(joints_), self.ann_info['num_joints']) + + joints[:, 0:2] = joints_[:, 0:2] + joints_vis[:, 0] = joints_vis_[:] + joints_vis[:, 1] = joints_vis_[:] + + gt_db.append({ + 'image_file': os.path.join(self.img_prefix, image_name), + 'im_id': im_id, + 'center': c, + 'scale': s, + 'gt_joints': joints, + 'joints_vis': joints_vis + }) + print("number length: {}".format(len(gt_db))) + self.db = gt_db diff --git a/PaddleDetection-release-2.6/ppdet/data/source/mot.py b/PaddleDetection-release-2.6/ppdet/data/source/mot.py new file mode 100644 index 0000000000000000000000000000000000000000..90a8a1fe88d70e1627623c1cc721f2c6eb9781e4 --- /dev/null +++ 
b/PaddleDetection-release-2.6/ppdet/data/source/mot.py @@ -0,0 +1,638 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import sys +import cv2 +import glob +import numpy as np +from collections import OrderedDict, defaultdict +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence +from .dataset import DetDataset, _make_dataset, _is_valid_file +from ppdet.core.workspace import register, serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +@register +@serializable +class MOTDataSet(DetDataset): + """ + Load dataset with MOT format, only support single class MOT. + + Args: + dataset_dir (str): root directory for dataset. + image_lists (str|list): mot data image lists, muiti-source mot dataset. + data_fields (list): key name of data dictionary, at least have 'image'. + sample_num (int): number of samples to load, -1 means all. + repeat (int): repeat times for dataset, use in benchmark. + + Notes: + MOT datasets root directory following this: + dataset/mot + |——————image_lists + | |——————caltech.train + | |——————caltech.val + | |——————mot16.train + | |——————mot17.train + | ...... + |——————Caltech + |——————MOT17 + |——————...... + + All the MOT datasets have the following structure: + Caltech + |——————images + | └——————00001.jpg + | |—————— ... + | └——————0000N.jpg + └——————labels_with_ids + └——————00001.txt + |—————— ... + └——————0000N.txt + or + + MOT17 + |——————images + | └——————train + | └——————test + └——————labels_with_ids + └——————train + """ + + def __init__(self, + dataset_dir=None, + image_lists=[], + data_fields=['image'], + sample_num=-1, + repeat=1): + super(MOTDataSet, self).__init__( + dataset_dir=dataset_dir, + data_fields=data_fields, + sample_num=sample_num, + repeat=repeat) + self.dataset_dir = dataset_dir + self.image_lists = image_lists + if isinstance(self.image_lists, str): + self.image_lists = [self.image_lists] + self.roidbs = None + self.cname2cid = None + + def get_anno(self): + if self.image_lists == []: + return + # only used to get categories and metric + # only check first data, but the label_list of all data should be same. 
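Each `labels_with_ids` text file referenced above holds one row per object; as the parsing code below shows, a row is `[gt_class, gt_identity, cx, cy, w, h]`. A self-contained sketch of that row parsing, with made-up values in place of a real label file:

```python
import io
import numpy as np

# a hypothetical two-object label file, one row per object:
# [gt_class, gt_identity, center_x, center_y, width, height]
txt = "0 1 0.50 0.50 0.20 0.40\n0 2 0.25 0.60 0.10 0.30"
labels = np.loadtxt(io.StringIO(txt), dtype=np.float32).reshape(-1, 6)

gt_bbox = labels[:, 2:6]                  # normalized cx, cy, w, h
gt_class = labels[:, 0:1].astype('int32')
gt_ide = labels[:, 1:2].astype('int32')   # per-sequence track ids
print(gt_bbox.shape, gt_ide.ravel())
```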
+ first_mot_data = self.image_lists[0].split('.')[0] + anno_file = os.path.join(self.dataset_dir, first_mot_data, + 'label_list.txt') + return anno_file + + def parse_dataset(self): + self.img_files = OrderedDict() + self.img_start_index = OrderedDict() + self.label_files = OrderedDict() + self.tid_num = OrderedDict() + self.tid_start_index = OrderedDict() + + img_index = 0 + for data_name in self.image_lists: + # check every data image list + image_lists_dir = os.path.join(self.dataset_dir, 'image_lists') + assert os.path.isdir(image_lists_dir), \ + "The {} is not a directory.".format(image_lists_dir) + + list_path = os.path.join(image_lists_dir, data_name) + assert os.path.exists(list_path), \ + "The list path {} does not exist.".format(list_path) + + # record img_files, filter out empty ones + with open(list_path, 'r') as file: + self.img_files[data_name] = file.readlines() + self.img_files[data_name] = [ + os.path.join(self.dataset_dir, x.strip()) + for x in self.img_files[data_name] + ] + self.img_files[data_name] = list( + filter(lambda x: len(x) > 0, self.img_files[data_name])) + + self.img_start_index[data_name] = img_index + img_index += len(self.img_files[data_name]) + + # record label_files + self.label_files[data_name] = [ + x.replace('images', 'labels_with_ids').replace( + '.png', '.txt').replace('.jpg', '.txt') + for x in self.img_files[data_name] + ] + + for data_name, label_paths in self.label_files.items(): + max_index = -1 + for lp in label_paths: + lb = np.loadtxt(lp) + if len(lb) < 1: + continue + if len(lb.shape) < 2: + img_max = lb[1] + else: + img_max = np.max(lb[:, 1]) + if img_max > max_index: + max_index = img_max + self.tid_num[data_name] = int(max_index + 1) + + last_index = 0 + for i, (k, v) in enumerate(self.tid_num.items()): + self.tid_start_index[k] = last_index + last_index += v + + self.num_identities_dict = defaultdict(int) + self.num_identities_dict[0] = int(last_index + 1) # single class + self.num_imgs_each_data = [len(x) for x in self.img_files.values()] + self.total_imgs = sum(self.num_imgs_each_data) + + logger.info('MOT dataset summary: ') + logger.info(self.tid_num) + logger.info('Total images: {}'.format(self.total_imgs)) + logger.info('Image start index: {}'.format(self.img_start_index)) + logger.info('Total identities: {}'.format(self.num_identities_dict[0])) + logger.info('Identity start index: {}'.format(self.tid_start_index)) + + records = [] + cname2cid = mot_label() + + for img_index in range(self.total_imgs): + for i, (k, v) in enumerate(self.img_start_index.items()): + if img_index >= v: + data_name = list(self.label_files.keys())[i] + start_index = v + img_file = self.img_files[data_name][img_index - start_index] + lbl_file = self.label_files[data_name][img_index - start_index] + + if not os.path.exists(img_file): + logger.warning('Illegal image file: {}, and it will be ignored'. + format(img_file)) + continue + if not os.path.isfile(lbl_file): + logger.warning('Illegal label file: {}, and it will be ignored'. 
+ format(lbl_file)) + continue + + labels = np.loadtxt(lbl_file, dtype=np.float32).reshape(-1, 6) + # each row in labels (N, 6) is [gt_class, gt_identity, cx, cy, w, h] + + cx, cy = labels[:, 2], labels[:, 3] + w, h = labels[:, 4], labels[:, 5] + gt_bbox = np.stack((cx, cy, w, h)).T.astype('float32') + gt_class = labels[:, 0:1].astype('int32') + gt_score = np.ones((len(labels), 1)).astype('float32') + gt_ide = labels[:, 1:2].astype('int32') + for i, _ in enumerate(gt_ide): + if gt_ide[i] > -1: + gt_ide[i] += self.tid_start_index[data_name] + + mot_rec = { + 'im_file': img_file, + 'im_id': img_index, + } if 'image' in self.data_fields else {} + + gt_rec = { + 'gt_class': gt_class, + 'gt_score': gt_score, + 'gt_bbox': gt_bbox, + 'gt_ide': gt_ide, + } + + for k, v in gt_rec.items(): + if k in self.data_fields: + mot_rec[k] = v + + records.append(mot_rec) + if self.sample_num > 0 and img_index >= self.sample_num: + break + assert len(records) > 0, 'not found any mot record in %s' % ( + self.image_lists) + self.roidbs, self.cname2cid = records, cname2cid + + +@register +@serializable +class MCMOTDataSet(DetDataset): + """ + Load dataset with MOT format, support multi-class MOT. + + Args: + dataset_dir (str): root directory for dataset. + image_lists (list(str)): mcmot data image lists, muiti-source mcmot dataset. + data_fields (list): key name of data dictionary, at least have 'image'. + label_list (str): if use_default_label is False, will load + mapping between category and class index. + sample_num (int): number of samples to load, -1 means all. + + Notes: + MCMOT datasets root directory following this: + dataset/mot + |——————image_lists + | |——————visdrone_mcmot.train + | |——————visdrone_mcmot.val + visdrone_mcmot + |——————images + | └——————train + | └——————val + └——————labels_with_ids + └——————train + """ + + def __init__(self, + dataset_dir=None, + image_lists=[], + data_fields=['image'], + label_list=None, + sample_num=-1): + super(MCMOTDataSet, self).__init__( + dataset_dir=dataset_dir, + data_fields=data_fields, + sample_num=sample_num) + self.dataset_dir = dataset_dir + self.image_lists = image_lists + if isinstance(self.image_lists, str): + self.image_lists = [self.image_lists] + self.label_list = label_list + self.roidbs = None + self.cname2cid = None + + def get_anno(self): + if self.image_lists == []: + return + # only used to get categories and metric + # only check first data, but the label_list of all data should be same. 
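The `tid_start_index` offsets applied to `gt_ide` above keep track ids unique when several MOT sub-datasets are mixed: each sub-dataset's local ids are shifted by the total id count of the sub-datasets before it. A sketch of that bookkeeping with made-up counts:

```python
# hypothetical per-sub-dataset track id counts
tid_num = {'caltech.train': 30, 'mot17.train': 120}

tid_start_index, last_index = {}, 0
for name, id_count in tid_num.items():
    tid_start_index[name] = last_index
    last_index += id_count

# local track id 5 in mot17.train becomes the globally unique id 35
print(tid_start_index, 5 + tid_start_index['mot17.train'])
```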
+ first_mot_data = self.image_lists[0].split('.')[0] + anno_file = os.path.join(self.dataset_dir, first_mot_data, + 'label_list.txt') + return anno_file + + def parse_dataset(self): + self.img_files = OrderedDict() + self.img_start_index = OrderedDict() + self.label_files = OrderedDict() + self.tid_num = OrderedDict() + self.tid_start_idx_of_cls_ids = defaultdict(dict) # for MCMOT + + img_index = 0 + for data_name in self.image_lists: + # check every data image list + image_lists_dir = os.path.join(self.dataset_dir, 'image_lists') + assert os.path.isdir(image_lists_dir), \ + "The {} is not a directory.".format(image_lists_dir) + + list_path = os.path.join(image_lists_dir, data_name) + assert os.path.exists(list_path), \ + "The list path {} does not exist.".format(list_path) + + # record img_files, filter out empty ones + with open(list_path, 'r') as file: + self.img_files[data_name] = file.readlines() + self.img_files[data_name] = [ + os.path.join(self.dataset_dir, x.strip()) + for x in self.img_files[data_name] + ] + self.img_files[data_name] = list( + filter(lambda x: len(x) > 0, self.img_files[data_name])) + + self.img_start_index[data_name] = img_index + img_index += len(self.img_files[data_name]) + + # record label_files + self.label_files[data_name] = [ + x.replace('images', 'labels_with_ids').replace( + '.png', '.txt').replace('.jpg', '.txt') + for x in self.img_files[data_name] + ] + + for data_name, label_paths in self.label_files.items(): + # using max_ids_dict rather than max_index + max_ids_dict = defaultdict(int) + for lp in label_paths: + lb = np.loadtxt(lp) + if len(lb) < 1: + continue + lb = lb.reshape(-1, 6) + for item in lb: + if item[1] > max_ids_dict[int(item[0])]: + # item[0]: cls_id + # item[1]: track id + max_ids_dict[int(item[0])] = int(item[1]) + # track id number + self.tid_num[data_name] = max_ids_dict + + last_idx_dict = defaultdict(int) + for i, (k, v) in enumerate(self.tid_num.items()): # each sub dataset + for cls_id, id_num in v.items(): # v is a max_ids_dict + self.tid_start_idx_of_cls_ids[k][cls_id] = last_idx_dict[cls_id] + last_idx_dict[cls_id] += id_num + + self.num_identities_dict = defaultdict(int) + for k, v in last_idx_dict.items(): + self.num_identities_dict[k] = int(v) # total ids of each category + + self.num_imgs_each_data = [len(x) for x in self.img_files.values()] + self.total_imgs = sum(self.num_imgs_each_data) + + # cname2cid and cid2cname + cname2cid = {} + if self.label_list is not None: + # if use label_list for multi source mix dataset, + # please make sure label_list in the first sub_dataset at least. + sub_dataset = self.image_lists[0].split('.')[0] + label_path = os.path.join(self.dataset_dir, sub_dataset, + self.label_list) + if not os.path.exists(label_path): + logger.info( + "Note: label_list {} does not exists, use VisDrone 10 classes labels as default.". 
+ format(label_path)) + cname2cid = visdrone_mcmot_label() + else: + with open(label_path, 'r') as fr: + label_id = 0 + for line in fr.readlines(): + cname2cid[line.strip()] = label_id + label_id += 1 + else: + cname2cid = visdrone_mcmot_label() + + cid2cname = dict([(v, k) for (k, v) in cname2cid.items()]) + + logger.info('MCMOT dataset summary: ') + logger.info(self.tid_num) + logger.info('Total images: {}'.format(self.total_imgs)) + logger.info('Image start index: {}'.format(self.img_start_index)) + + logger.info('Total identities of each category: ') + num_identities_dict = sorted( + self.num_identities_dict.items(), key=lambda x: x[0]) + total_IDs_all_cats = 0 + for (k, v) in num_identities_dict: + logger.info('Category {} [{}] has {} IDs.'.format(k, cid2cname[k], + v)) + total_IDs_all_cats += v + logger.info('Total identities of all categories: {}'.format( + total_IDs_all_cats)) + + logger.info('Identity start index of each category: ') + for k, v in self.tid_start_idx_of_cls_ids.items(): + sorted_v = sorted(v.items(), key=lambda x: x[0]) + for (cls_id, start_idx) in sorted_v: + logger.info('Start index of dataset {} category {:d} is {:d}' + .format(k, cls_id, start_idx)) + + records = [] + for img_index in range(self.total_imgs): + for i, (k, v) in enumerate(self.img_start_index.items()): + if img_index >= v: + data_name = list(self.label_files.keys())[i] + start_index = v + img_file = self.img_files[data_name][img_index - start_index] + lbl_file = self.label_files[data_name][img_index - start_index] + + if not os.path.exists(img_file): + logger.warning('Illegal image file: {}, and it will be ignored'. + format(img_file)) + continue + if not os.path.isfile(lbl_file): + logger.warning('Illegal label file: {}, and it will be ignored'. + format(lbl_file)) + continue + + labels = np.loadtxt(lbl_file, dtype=np.float32).reshape(-1, 6) + # each row in labels (N, 6) is [gt_class, gt_identity, cx, cy, w, h] + + cx, cy = labels[:, 2], labels[:, 3] + w, h = labels[:, 4], labels[:, 5] + gt_bbox = np.stack((cx, cy, w, h)).T.astype('float32') + gt_class = labels[:, 0:1].astype('int32') + gt_score = np.ones((len(labels), 1)).astype('float32') + gt_ide = labels[:, 1:2].astype('int32') + for i, _ in enumerate(gt_ide): + if gt_ide[i] > -1: + cls_id = int(gt_class[i]) + start_idx = self.tid_start_idx_of_cls_ids[data_name][cls_id] + gt_ide[i] += start_idx + + mot_rec = { + 'im_file': img_file, + 'im_id': img_index, + } if 'image' in self.data_fields else {} + + gt_rec = { + 'gt_class': gt_class, + 'gt_score': gt_score, + 'gt_bbox': gt_bbox, + 'gt_ide': gt_ide, + } + + for k, v in gt_rec.items(): + if k in self.data_fields: + mot_rec[k] = v + + records.append(mot_rec) + if self.sample_num > 0 and img_index >= self.sample_num: + break + assert len(records) > 0, 'not found any mot record in %s' % ( + self.image_lists) + self.roidbs, self.cname2cid = records, cname2cid + + +@register +@serializable +class MOTImageFolder(DetDataset): + """ + Load MOT dataset with MOT format from image folder or video . + Args: + video_file (str): path of the video file, default ''. + frame_rate (int): frame rate of the video, use cv2 VideoCapture if not set. + dataset_dir (str): root directory for dataset. + keep_ori_im (bool): whether to keep original image, default False. + Set True when used during MOT model inference while saving + images or video, or used in DeepSORT. 
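The `frame_rate` fallback mentioned above (used when the value is left at -1) queries the container via OpenCV. A minimal sketch, with `input.mp4` as a placeholder path:

```python
import cv2

cap = cv2.VideoCapture('input.mp4')  # placeholder path
frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
cap.release()
print(frame_rate)
```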
+ """ + + def __init__(self, + video_file=None, + frame_rate=-1, + dataset_dir=None, + data_root=None, + image_dir=None, + sample_num=-1, + keep_ori_im=False, + anno_path=None, + **kwargs): + super(MOTImageFolder, self).__init__( + dataset_dir, image_dir, sample_num=sample_num) + self.video_file = video_file + self.data_root = data_root + self.keep_ori_im = keep_ori_im + self._imid2path = {} + self.roidbs = None + self.frame_rate = frame_rate + self.anno_path = anno_path + + def check_or_download_dataset(self): + return + + def parse_dataset(self, ): + if not self.roidbs: + if self.video_file is None: + self.frame_rate = 30 # set as default if infer image folder + self.roidbs = self._load_images() + else: + self.roidbs = self._load_video_images() + + def _load_video_images(self): + if self.frame_rate == -1: + # if frame_rate is not set for video, use cv2.VideoCapture + cap = cv2.VideoCapture(self.video_file) + self.frame_rate = int(cap.get(cv2.CAP_PROP_FPS)) + + extension = self.video_file.split('.')[-1] + output_path = self.video_file.replace('.{}'.format(extension), '') + frames_path = video2frames(self.video_file, output_path, + self.frame_rate) + self.video_frames = sorted( + glob.glob(os.path.join(frames_path, '*.png'))) + + self.video_length = len(self.video_frames) + logger.info('Length of the video: {:d} frames.'.format( + self.video_length)) + ct = 0 + records = [] + for image in self.video_frames: + assert image != '' and os.path.isfile(image), \ + "Image {} not found".format(image) + if self.sample_num > 0 and ct >= self.sample_num: + break + rec = {'im_id': np.array([ct]), 'im_file': image} + if self.keep_ori_im: + rec.update({'keep_ori_im': 1}) + self._imid2path[ct] = image + ct += 1 + records.append(rec) + assert len(records) > 0, "No image file found" + return records + + def _find_images(self): + image_dir = self.image_dir + if not isinstance(image_dir, Sequence): + image_dir = [image_dir] + images = [] + for im_dir in image_dir: + if os.path.isdir(im_dir): + im_dir = os.path.join(self.dataset_dir, im_dir) + images.extend(_make_dataset(im_dir)) + elif os.path.isfile(im_dir) and _is_valid_file(im_dir): + images.append(im_dir) + return images + + def _load_images(self): + images = self._find_images() + ct = 0 + records = [] + for image in images: + assert image != '' and os.path.isfile(image), \ + "Image {} not found".format(image) + if self.sample_num > 0 and ct >= self.sample_num: + break + rec = {'im_id': np.array([ct]), 'im_file': image} + if self.keep_ori_im: + rec.update({'keep_ori_im': 1}) + self._imid2path[ct] = image + ct += 1 + records.append(rec) + assert len(records) > 0, "No image file found" + return records + + def get_imid2path(self): + return self._imid2path + + def set_images(self, images): + self.image_dir = images + self.roidbs = self._load_images() + + def set_video(self, video_file, frame_rate): + # update video_file and frame_rate by command line of tools/infer_mot.py + self.video_file = video_file + self.frame_rate = frame_rate + assert os.path.isfile(self.video_file) and _is_valid_video(self.video_file), \ + "wrong or unsupported file format: {}".format(self.video_file) + self.roidbs = self._load_video_images() + + def get_anno(self): + return self.anno_path + + +def _is_valid_video(f, extensions=('.mp4', '.avi', '.mov', '.rmvb', 'flv')): + return f.lower().endswith(extensions) + + +def video2frames(video_path, outpath, frame_rate, **kargs): + def _dict2str(kargs): + cmd_str = '' + for k, v in kargs.items(): + cmd_str += (' ' + str(k) + ' ' + str(v)) 
+        return cmd_str
+
+    ffmpeg = ['ffmpeg ', ' -y -loglevel ', ' error ']
+    vid_name = os.path.basename(video_path).split('.')[0]
+    out_full_path = os.path.join(outpath, vid_name)
+
+    if not os.path.exists(out_full_path):
+        os.makedirs(out_full_path)
+
+    # output frame file name pattern
+    outformat = os.path.join(out_full_path, '%08d.png')
+
+    cmd = ffmpeg + [
+        ' -i ', video_path, ' -r ', str(frame_rate), ' -f image2 ', outformat
+    ]
+    cmd = ''.join(cmd) + _dict2str(kargs)
+
+    if os.system(cmd) != 0:
+        raise RuntimeError('ffmpeg process video: {} error'.format(video_path))
+
+    sys.stdout.flush()
+    return out_full_path
+
+
+def mot_label():
+    labels_map = {'person': 0}
+    return labels_map
+
+
+def visdrone_mcmot_label():
+    labels_map = {
+        'pedestrian': 0,
+        'people': 1,
+        'bicycle': 2,
+        'car': 3,
+        'van': 4,
+        'truck': 5,
+        'tricycle': 6,
+        'awning-tricycle': 7,
+        'bus': 8,
+        'motor': 9,
+    }
+    return labels_map
diff --git a/PaddleDetection-release-2.6/ppdet/data/source/pose3d_cmb.py b/PaddleDetection-release-2.6/ppdet/data/source/pose3d_cmb.py
new file mode 100644
index 0000000000000000000000000000000000000000..06dbdd9e9abaf597112ea905c5d6e708caa3b132
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/source/pose3d_cmb.py
@@ -0,0 +1,380 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import numpy as np
+import json
+import copy
+import pycocotools
+from pycocotools.coco import COCO
+from .dataset import DetDataset
+from ppdet.core.workspace import register, serializable
+from paddle.io import Dataset
+
+
+@serializable
+class Pose3DDataset(DetDataset):
+    """Pose3D Dataset class.
+
+    Args:
+        dataset_dir (str): Root path to the dataset.
+        anno_list (list of str): each element is a relative path to an annotation file.
+        image_dirs (list of str): each element is a relative path where images are held.
+        transform (composed(operators)): A sequence of data transforms.
+        test_mode (bool): Store True when building test or
+            validation dataset. Default: False.
+
+    24 joints order:
+        0-2: 'R_Ankle', 'R_Knee', 'R_Hip',
+        3-5: 'L_Hip', 'L_Knee', 'L_Ankle',
+        6-8: 'R_Wrist', 'R_Elbow', 'R_Shoulder',
+        9-11: 'L_Shoulder', 'L_Elbow', 'L_Wrist',
+        12-14: 'Neck', 'Top_of_Head', 'Pelvis',
+        15-18: 'Thorax', 'Spine', 'Jaw', 'Head',
+        19-23: 'Nose', 'L_Eye', 'R_Eye', 'L_Ear', 'R_Ear'
+    """
+
+    def __init__(self,
+                 dataset_dir,
+                 image_dirs,
+                 anno_list,
+                 transform=[],
+                 num_joints=24,
+                 test_mode=False):
+        super().__init__(dataset_dir, image_dirs, anno_list)
+        self.image_info = {}
+        self.ann_info = {}
+        self.num_joints = num_joints
+
+        self.transform = transform
+        self.test_mode = test_mode
+
+        self.img_ids = []
+        self.dataset_dir = dataset_dir
+        self.image_dirs = image_dirs
+        self.anno_list = anno_list
+
+    def get_mask(self, mvm_percent=0.3):
+        num_joints = self.num_joints
+        mjm_mask = np.ones((num_joints, 1)).astype(np.float32)
+        if not self.test_mode:
+            pb = np.random.random_sample()
+            masked_num = int(
+                pb * mvm_percent *
+                num_joints)  # at most mvm_percent of the joints may be masked
+            indices = np.random.choice(
+                np.arange(num_joints), replace=False, size=masked_num)
+            mjm_mask[indices, :] = 0.0
+
+        num_joints = 10
+        mvm_mask = np.ones((num_joints, 1)).astype(np.float32)
+        if not self.test_mode:
+            num_vertices = num_joints
+            pb = np.random.random_sample()
+            masked_num = int(
+                pb * mvm_percent *
+                num_vertices)  # at most mvm_percent of the vertices may be masked
+            indices = np.random.choice(
+                np.arange(num_vertices), replace=False, size=masked_num)
+            mvm_mask[indices, :] = 0.0
+
+        mjm_mask = np.concatenate([mjm_mask, mvm_mask], axis=0)
+        return mjm_mask
+
+    def filterjoints(self, x):
+        if self.num_joints == 24:
+            return x
+        elif self.num_joints == 14:
+            return x[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18], :]
+        elif self.num_joints == 17:
+            return x[
+                [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 18, 19], :]
+        else:
+            raise ValueError(
+                "unsupported joint number, only 24, 17 or 14 joints are supported!")
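The random masking performed by `get_mask` above zeros out at most `mvm_percent` of the joint entries, then appends a second mask of the same form for 10 coarse vertices. A self-contained sketch of that math, with a hypothetical `random_mask` helper:

```python
import numpy as np

def random_mask(n, mvm_percent=0.3):
    # zero out at most mvm_percent of the n entries, chosen at random
    mask = np.ones((n, 1), dtype=np.float32)
    masked_num = int(np.random.random_sample() * mvm_percent * n)
    idx = np.random.choice(np.arange(n), replace=False, size=masked_num)
    mask[idx, :] = 0.0
    return mask

# 24 joint entries followed by 10 vertex entries, as in get_mask
m = np.concatenate([random_mask(24), random_mask(10)], axis=0)
print(m.shape)  # (34, 1)
```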
+                    # https://github.com/nkolot/SPIN/blob/master/constants.py
+                    new_anno['joints_2d'] = new_anno['joints_2d'][25:]
+                new_anno['joints_3d'] = np.array(anno[
+                    'pose3d'])[:, :3].astype(np.float32)
+                new_anno['mjm_mask'] = self.get_mask()
+                if not 'has_3d_joints' in anno:
+                    new_anno['has_3d_joints'] = int(1)
+                    new_anno['has_2d_joints'] = int(1)
+                else:
+                    new_anno['has_3d_joints'] = int(anno['has_3d_joints'])
+                    new_anno['has_2d_joints'] = int(anno['has_2d_joints'])
+                new_anno['joints_2d'] = self.filterjoints(new_anno[
+                    'joints_2d'])
+                self.annos.append(new_anno)
+            del annos
+
+    def get_temp_num(self):
+        """get temporal data number, like human3.6m"""
+        return self.human36m_num
+
+    def __len__(self):
+        """Get dataset length."""
+        return len(self.annos)
+
+    def _get_imganno(self, idx):
+        """Get anno for a single image."""
+        return self.annos[idx]
+
+    def __getitem__(self, idx):
+        """Prepare image for training given the index."""
+        records = copy.deepcopy(self._get_imganno(idx))
+        imgpath = records['imageName']
+        assert os.path.exists(imgpath), "cannot find image {}".format(imgpath)
+        records['image'] = cv2.imread(imgpath)
+        records['image'] = cv2.cvtColor(records['image'], cv2.COLOR_BGR2RGB)
+        records = self.transform(records)
+        return records
+
+    def check_or_download_dataset(self):
+        alldatafind = True
+        for image_dir in self.image_dirs:
+            image_dir = os.path.join(self.dataset_dir, image_dir)
+            if not os.path.isdir(image_dir):
+                print("dataset [{}] is not found".format(image_dir))
+                alldatafind = False
+        if not alldatafind:
+            raise ValueError(
+                "Some datasets are missing and cannot be downloaded automatically now, please prepare the dataset first"
+            )
+
+
+@register
+@serializable
+class Keypoint3DMultiFramesDataset(Dataset):
+    """24 keypoints 3D dataset for pose estimation.
+
+    Each item is a list of images.
+
+    The dataset loads raw features and applies specified transforms
+    to return a dict containing the image tensors and other information.
+
+    Args:
+        dataset_dir (str): Root path to the dataset.
+        image_dir (str): Path to a directory where images are held.
+    """
+
+    def __init__(
+            self,
+            dataset_dir,  # root directory of the dataset
+            image_dir,  # image directory
+            p3d_dir,  # 3D keypoint directory
+            json_path,
+            img_size,  # target size the images are resized to
+            num_frames,  # length of one frame sequence
+            anno_path=None, ):
+
+        self.dataset_dir = dataset_dir
+        self.image_dir = image_dir
+        self.p3d_dir = p3d_dir
+        self.json_path = json_path
+        self.img_size = img_size
+        self.num_frames = num_frames
+        self.anno_path = anno_path
+
+        self.data_labels, self.mf_inds = self._generate_multi_frames_list()
+
+    def _generate_multi_frames_list(self):
+        act_list = os.listdir(self.dataset_dir)  # list of action folders
+        count = 0
+        mf_list = []
+        annos_dict = {'images': [], 'annotations': [], 'act_inds': []}
+        for act in act_list:  # build a frame sequence for every action
+            if '.' in act:
+                continue
+
+            json_path = os.path.join(self.dataset_dir, act, self.json_path)
+            with open(json_path, 'r') as j:
+                annos = json.load(j)
+            length = len(annos['images'])
+            for k, v in annos.items():
+                if k in annos_dict:
+                    annos_dict[k].extend(v)
+            annos_dict['act_inds'].extend([act] * length)
+
+            mf = [[i + j + count for j in range(self.num_frames)]
+                  for i in range(0, length - self.num_frames + 1)]
+            mf_list.extend(mf)
+            count += length
+
+        print("total data number:", len(mf_list))
+        return annos_dict, mf_list
+
+    def __call__(self, *args, **kwargs):
+        return self
+
+    def __getitem__(self, index):  # fetch one consecutive frame sequence
+        inds = self.mf_inds[
+            index]  # e.g. [568, 569, 570, 571, 572, 573], of length num_frames
+
+        images = self.data_labels['images']  # all images
+        annots = self.data_labels['annotations']  # all annots
+
+        act = self.data_labels['act_inds'][inds[0]]  # action name (folder name)
+
+        kps3d_list = []
+        kps3d_vis_list = []
+        names = []
+
+        h, w = 0, 0
+        for ind in inds:  # one image
+            height = float(images[ind]['height'])
+            width = float(images[ind]['width'])
+            name = images[ind]['file_name']  # image file name, with extension
+
+            kps3d_name = name.split('.')[0] + '.obj'
+            kps3d_path = os.path.join(self.dataset_dir, act, self.p3d_dir,
+                                      kps3d_name)
+
+            joints, joints_vis = self.kps3d_process(kps3d_path)
+            joints_vis = np.array(joints_vis, dtype=np.float32)
+
+            kps3d_list.append(joints)
+            kps3d_vis_list.append(joints_vis)
+            names.append(name)
+
+        kps3d = np.array(kps3d_list)  # (6, 24, 3), (num_frames, joints_num, 3)
+        kps3d_vis = np.array(kps3d_vis_list)
+
+        # read image
+        imgs = []
+        for name in names:
+            img_path = os.path.join(self.dataset_dir, act, self.image_dir, name)
+
+            image = cv2.imread(img_path, cv2.IMREAD_COLOR |
+                               cv2.IMREAD_IGNORE_ORIENTATION)
+            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+
+            imgs.append(np.expand_dims(image, axis=0))
+
+        imgs = np.concatenate(imgs, axis=0)
+        imgs = imgs.astype(
+            np.float32)  # (6, 1080, 1920, 3), (num_frames, h, w, c)
+
+        # note: at this point the images and annotations are mirrored
+        records = {
+            'kps3d': kps3d,
+            'kps3d_vis': kps3d_vis,
+            "image": imgs,
+            'act': act,
+            'names': names,
+            'im_id': index
+        }
+
+        return self.transform(records)
+
+    def kps3d_process(self, kps3d_path):
+        count = 0
+        kps = []
+        kps_vis = []
+
+        with open(kps3d_path, 'r') as f:
+            lines = f.readlines()
+            for line in lines:
+                if line[0] == 'v':
+                    kps.append([])
+                    line = line.strip('\n').split(' ')[1:]
+                    for kp in line:
+                        kps[-1].append(float(kp))
+                    count += 1
+
+                    kps_vis.append([1, 1, 1])
+
+        kps = np.array(kps)  # 52, 3
+        kps_vis = np.array(kps_vis)
+
+        kps *= 10  # scale points
+        kps -= kps[[0], :]  # set root point to zero
+
+        kps = np.concatenate((kps[0:23], kps[[37]]), axis=0)  # 24, 3
+
+        kps *= 10
+
+        kps_vis = np.concatenate((kps_vis[0:23], kps_vis[[37]]), axis=0)  # 24, 3
+
+        return kps, kps_vis
+
+    def __len__(self):
+        return len(self.mf_inds)
+
+    def get_anno(self):
+        if self.anno_path is None:
+            return
+        return os.path.join(self.dataset_dir, self.anno_path)
+
+    def check_or_download_dataset(self):
+        return
+
+    def parse_dataset(self, ):
+        return
+
+    def set_transform(self, transform):
+        self.transform = transform
+
+    def set_epoch(self, epoch_id):
+        self._epoch = epoch_id
+
+    def set_kwargs(self, **kwargs):
+        self.mixup_epoch = kwargs.get('mixup_epoch', -1)
+        self.cutmix_epoch = kwargs.get('cutmix_epoch', -1)
+        self.mosaic_epoch = kwargs.get('mosaic_epoch', -1)
diff --git a/PaddleDetection-release-2.6/ppdet/data/source/sniper_coco.py b/PaddleDetection-release-2.6/ppdet/data/source/sniper_coco.py
new file mode 100644
index 
0000000000000000000000000000000000000000..1b07e7a31d999d137965c4860a4d8085d0b91465 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/source/sniper_coco.py @@ -0,0 +1,194 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import cv2 +import json +import copy +import numpy as np + +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +from ppdet.core.workspace import register, serializable +from ppdet.data.crop_utils.annotation_cropper import AnnoCropper +from .coco import COCODataSet +from .dataset import _make_dataset, _is_valid_file +from ppdet.utils.logger import setup_logger + +logger = setup_logger('sniper_coco_dataset') + + +@register +@serializable +class SniperCOCODataSet(COCODataSet): + """SniperCOCODataSet""" + + def __init__(self, + dataset_dir=None, + image_dir=None, + anno_path=None, + proposals_file=None, + data_fields=['image'], + sample_num=-1, + load_crowd=False, + allow_empty=True, + empty_ratio=1., + is_trainset=True, + image_target_sizes=[2000, 1000], + valid_box_ratio_ranges=[[-1, 0.1],[0.08, -1]], + chip_target_size=500, + chip_target_stride=200, + use_neg_chip=False, + max_neg_num_per_im=8, + max_per_img=-1, + nms_thresh=0.5): + super(SniperCOCODataSet, self).__init__( + dataset_dir=dataset_dir, + image_dir=image_dir, + anno_path=anno_path, + data_fields=data_fields, + sample_num=sample_num, + load_crowd=load_crowd, + allow_empty=allow_empty, + empty_ratio=empty_ratio + ) + self.proposals_file = proposals_file + self.proposals = None + self.anno_cropper = None + self.is_trainset = is_trainset + self.image_target_sizes = image_target_sizes + self.valid_box_ratio_ranges = valid_box_ratio_ranges + self.chip_target_size = chip_target_size + self.chip_target_stride = chip_target_stride + self.use_neg_chip = use_neg_chip + self.max_neg_num_per_im = max_neg_num_per_im + self.max_per_img = max_per_img + self.nms_thresh = nms_thresh + + + def parse_dataset(self): + if not hasattr(self, "roidbs"): + super(SniperCOCODataSet, self).parse_dataset() + if self.is_trainset: + self._parse_proposals() + self._merge_anno_proposals() + self.ori_roidbs = copy.deepcopy(self.roidbs) + self.init_anno_cropper() + self.roidbs = self.generate_chips_roidbs(self.roidbs, self.is_trainset) + + def set_proposals_file(self, file_path): + self.proposals_file = file_path + + def init_anno_cropper(self): + logger.info("Init AnnoCropper...") + self.anno_cropper = AnnoCropper( + image_target_sizes=self.image_target_sizes, + valid_box_ratio_ranges=self.valid_box_ratio_ranges, + chip_target_size=self.chip_target_size, + chip_target_stride=self.chip_target_stride, + use_neg_chip=self.use_neg_chip, + max_neg_num_per_im=self.max_neg_num_per_im, + max_per_img=self.max_per_img, + nms_thresh=self.nms_thresh + ) + + def generate_chips_roidbs(self, roidbs, is_trainset): + if is_trainset: + roidbs = self.anno_cropper.crop_anno_records(roidbs) + else: + roidbs = 
self.anno_cropper.crop_infer_anno_records(roidbs)
+        return roidbs
+
+    def _parse_proposals(self):
+        if self.proposals_file:
+            self.proposals = {}
+            logger.info("Parse proposals file: {}".format(self.proposals_file))
+            with open(self.proposals_file, 'r') as f:
+                proposals = json.load(f)
+            for prop in proposals:
+                image_id = prop["image_id"]
+                if image_id not in self.proposals:
+                    self.proposals[image_id] = []
+                x, y, w, h = prop["bbox"]
+                self.proposals[image_id].append([x, y, x + w, y + h])
+
+    def _merge_anno_proposals(self):
+        assert self.roidbs
+        if self.proposals and len(self.proposals.keys()) > 0:
+            logger.info("merge proposals to annos")
+            for idx, record in enumerate(self.roidbs):
+                image_id = int(record["im_id"])
+                if image_id not in self.proposals.keys():
+                    logger.info("image id: {} has no proposals".format(image_id))
+                record["proposals"] = np.array(self.proposals.get(image_id, []), dtype=np.float32)
+                self.roidbs[idx] = record
+
+    def get_ori_roidbs(self):
+        if not hasattr(self, "ori_roidbs"):
+            return None
+        return self.ori_roidbs
+
+    def get_roidbs(self):
+        if not hasattr(self, "roidbs"):
+            self.parse_dataset()
+        return self.roidbs
+
+    def set_roidbs(self, roidbs):
+        self.roidbs = roidbs
+
+    def check_or_download_dataset(self):
+        return
+
+    def _parse(self):
+        image_dir = self.image_dir
+        if not isinstance(image_dir, Sequence):
+            image_dir = [image_dir]
+        images = []
+        for im_dir in image_dir:
+            if os.path.isdir(im_dir):
+                im_dir = os.path.join(self.dataset_dir, im_dir)
+                images.extend(_make_dataset(im_dir))
+            elif os.path.isfile(im_dir) and _is_valid_file(im_dir):
+                images.append(im_dir)
+        return images
+
+    def _load_images(self):
+        images = self._parse()
+        ct = 0
+        records = []
+        for image in images:
+            assert image != '' and os.path.isfile(image), \
+                "Image {} not found".format(image)
+            if self.sample_num > 0 and ct >= self.sample_num:
+                break
+            im = cv2.imread(image)
+            h, w, c = im.shape
+            rec = {'im_id': np.array([ct]), 'im_file': image, "h": h, "w": w}
+            self._imid2path[ct] = image
+            ct += 1
+            records.append(rec)
+        assert len(records) > 0, "No image file found"
+        return records
+
+    def get_imid2path(self):
+        return self._imid2path
+
+    def set_images(self, images):
+        self._imid2path = {}
+        self.image_dir = images
+        self.roidbs = self._load_images()
+
diff --git a/PaddleDetection-release-2.6/ppdet/data/source/voc.py b/PaddleDetection-release-2.6/ppdet/data/source/voc.py
new file mode 100644
index 0000000000000000000000000000000000000000..2f103588537c5499ef83133fe3f8d4ba7303e685
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/source/voc.py
@@ -0,0 +1,234 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
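+
+# Note: each line of the annotation list file (`anno_path`) consumed by
+# VOCDataSet below pairs an image path with its xml path, e.g.
+# "JPEGImages/001.jpg Annotations/001.xml" (hypothetical paths).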
+
+import os
+import numpy as np
+
+import xml.etree.ElementTree as ET
+
+from ppdet.core.workspace import register, serializable
+
+from .dataset import DetDataset
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+
+@register
+@serializable
+class VOCDataSet(DetDataset):
+    """
+    Load dataset with PascalVOC format.
+
+    Notes:
+    `anno_path` must contain the xml file path and the image file path of each annotation.
+
+    Args:
+        dataset_dir (str): root directory for dataset.
+        image_dir (str): directory for images.
+        anno_path (str): voc annotation file path.
+        data_fields (list): key name of data dictionary, at least have 'image'.
+        sample_num (int): number of samples to load, -1 means all.
+        label_list (str): if use_default_label is False, will load
+            mapping between category and class index.
+        allow_empty (bool): whether to load empty entries. False by default.
+        empty_ratio (float): the ratio of empty record number to total
+            records; if empty_ratio is out of [0., 1.), do not sample the
+            records and use all the empty entries. 1. by default.
+        repeat (int): repeat times for dataset, use in benchmark.
+    """
+
+    def __init__(self,
+                 dataset_dir=None,
+                 image_dir=None,
+                 anno_path=None,
+                 data_fields=['image'],
+                 sample_num=-1,
+                 label_list=None,
+                 allow_empty=False,
+                 empty_ratio=1.,
+                 repeat=1):
+        super(VOCDataSet, self).__init__(
+            dataset_dir=dataset_dir,
+            image_dir=image_dir,
+            anno_path=anno_path,
+            data_fields=data_fields,
+            sample_num=sample_num,
+            repeat=repeat)
+        self.label_list = label_list
+        self.allow_empty = allow_empty
+        self.empty_ratio = empty_ratio
+
+    def _sample_empty(self, records, num):
+        # if empty_ratio is out of [0., 1.), do not sample the records
+        if self.empty_ratio < 0. or self.empty_ratio >= 1.:
+            return records
+        import random
+        sample_num = min(
+            int(num * self.empty_ratio / (1 - self.empty_ratio)), len(records))
+        records = random.sample(records, sample_num)
+        return records
+
+    def parse_dataset(self, ):
+        anno_path = os.path.join(self.dataset_dir, self.anno_path)
+        image_dir = os.path.join(self.dataset_dir, self.image_dir)
+
+        # mapping category name to class id
+        # first_class:0, second_class:1, ...
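+        # For example (hypothetical labels): a label_list file with the lines
+        # "person" and "car" yields cname2cid = {'person': 0, 'car': 1};
+        # without label_list, the default pascalvoc_label() mapping is used.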
+ records = [] + empty_records = [] + ct = 0 + cname2cid = {} + if self.label_list: + label_path = os.path.join(self.dataset_dir, self.label_list) + if not os.path.exists(label_path): + raise ValueError("label_list {} does not exists".format( + label_path)) + with open(label_path, 'r') as fr: + label_id = 0 + for line in fr.readlines(): + cname2cid[line.strip()] = label_id + label_id += 1 + else: + cname2cid = pascalvoc_label() + + with open(anno_path, 'r') as fr: + while True: + line = fr.readline() + if not line: + break + img_file, xml_file = [os.path.join(image_dir, x) \ + for x in line.strip().split()[:2]] + if not os.path.exists(img_file): + logger.warning( + 'Illegal image file: {}, and it will be ignored'.format( + img_file)) + continue + if not os.path.isfile(xml_file): + logger.warning( + 'Illegal xml file: {}, and it will be ignored'.format( + xml_file)) + continue + tree = ET.parse(xml_file) + if tree.find('id') is None: + im_id = np.array([ct]) + else: + im_id = np.array([int(tree.find('id').text)]) + + objs = tree.findall('object') + im_w = float(tree.find('size').find('width').text) + im_h = float(tree.find('size').find('height').text) + if im_w < 0 or im_h < 0: + logger.warning( + 'Illegal width: {} or height: {} in annotation, ' + 'and {} will be ignored'.format(im_w, im_h, xml_file)) + continue + + num_bbox, i = len(objs), 0 + gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32) + gt_class = np.zeros((num_bbox, 1), dtype=np.int32) + gt_score = np.zeros((num_bbox, 1), dtype=np.float32) + difficult = np.zeros((num_bbox, 1), dtype=np.int32) + for obj in objs: + cname = obj.find('name').text + + # user dataset may not contain difficult field + _difficult = obj.find('difficult') + _difficult = int( + _difficult.text) if _difficult is not None else 0 + + x1 = float(obj.find('bndbox').find('xmin').text) + y1 = float(obj.find('bndbox').find('ymin').text) + x2 = float(obj.find('bndbox').find('xmax').text) + y2 = float(obj.find('bndbox').find('ymax').text) + x1 = max(0, x1) + y1 = max(0, y1) + x2 = min(im_w - 1, x2) + y2 = min(im_h - 1, y2) + if x2 > x1 and y2 > y1: + gt_bbox[i, :] = [x1, y1, x2, y2] + gt_class[i, 0] = cname2cid[cname] + gt_score[i, 0] = 1. 
+ difficult[i, 0] = _difficult + i += 1 + else: + logger.warning( + 'Found an invalid bbox in annotations: xml_file: {}' + ', x1: {}, y1: {}, x2: {}, y2: {}.'.format( + xml_file, x1, y1, x2, y2)) + gt_bbox = gt_bbox[:i, :] + gt_class = gt_class[:i, :] + gt_score = gt_score[:i, :] + difficult = difficult[:i, :] + + voc_rec = { + 'im_file': img_file, + 'im_id': im_id, + 'h': im_h, + 'w': im_w + } if 'image' in self.data_fields else {} + + gt_rec = { + 'gt_class': gt_class, + 'gt_score': gt_score, + 'gt_bbox': gt_bbox, + 'difficult': difficult + } + for k, v in gt_rec.items(): + if k in self.data_fields: + voc_rec[k] = v + + if len(objs) == 0: + empty_records.append(voc_rec) + else: + records.append(voc_rec) + + ct += 1 + if self.sample_num > 0 and ct >= self.sample_num: + break + assert ct > 0, 'not found any voc record in %s' % (self.anno_path) + logger.debug('{} samples in file {}'.format(ct, anno_path)) + if self.allow_empty and len(empty_records) > 0: + empty_records = self._sample_empty(empty_records, len(records)) + records += empty_records + self.roidbs, self.cname2cid = records, cname2cid + + def get_label_list(self): + return os.path.join(self.dataset_dir, self.label_list) + + +def pascalvoc_label(): + labels_map = { + 'aeroplane': 0, + 'bicycle': 1, + 'bird': 2, + 'boat': 3, + 'bottle': 4, + 'bus': 5, + 'car': 6, + 'cat': 7, + 'chair': 8, + 'cow': 9, + 'diningtable': 10, + 'dog': 11, + 'horse': 12, + 'motorbike': 13, + 'person': 14, + 'pottedplant': 15, + 'sheep': 16, + 'sofa': 17, + 'train': 18, + 'tvmonitor': 19 + } + return labels_map diff --git a/PaddleDetection-release-2.6/ppdet/data/source/widerface.py b/PaddleDetection-release-2.6/ppdet/data/source/widerface.py new file mode 100644 index 0000000000000000000000000000000000000000..a17c2aaf8a20c0218f8833891f1f858715dce4b0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/source/widerface.py @@ -0,0 +1,180 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import numpy as np + +from ppdet.core.workspace import register, serializable +from .dataset import DetDataset + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +@register +@serializable +class WIDERFaceDataSet(DetDataset): + """ + Load WiderFace records with 'anno_path' + + Args: + dataset_dir (str): root directory for dataset. + image_dir (str): directory for images. + anno_path (str): WiderFace annotation data. + data_fields (list): key name of data dictionary, at least have 'image'. + sample_num (int): number of samples to load, -1 means all. + with_lmk (bool): whether to load face landmark keypoint labels. 
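+
+    Example (a minimal usage sketch; the paths below are hypothetical
+    placeholders for a locally prepared WIDER FACE layout):
+
+        dataset = WIDERFaceDataSet(
+            dataset_dir='dataset/wider_face',
+            image_dir='WIDER_train/images',
+            anno_path='wider_face_split/wider_face_train_bbx_gt.txt')
+        dataset.parse_dataset()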
+    """
+
+    def __init__(self,
+                 dataset_dir=None,
+                 image_dir=None,
+                 anno_path=None,
+                 data_fields=['image'],
+                 sample_num=-1,
+                 with_lmk=False):
+        super(WIDERFaceDataSet, self).__init__(
+            dataset_dir=dataset_dir,
+            image_dir=image_dir,
+            anno_path=anno_path,
+            data_fields=data_fields,
+            sample_num=sample_num,
+            with_lmk=with_lmk)
+        self.anno_path = anno_path
+        self.sample_num = sample_num
+        self.roidbs = None
+        self.cname2cid = None
+        self.with_lmk = with_lmk
+
+    def parse_dataset(self):
+        anno_path = os.path.join(self.dataset_dir, self.anno_path)
+        image_dir = os.path.join(self.dataset_dir, self.image_dir)
+
+        txt_file = anno_path
+
+        records = []
+        ct = 0
+        file_lists = self._load_file_list(txt_file)
+        cname2cid = widerface_label()
+
+        for item in file_lists:
+            im_fname = item[0]
+            im_id = np.array([ct])
+            gt_bbox = np.zeros((len(item) - 1, 4), dtype=np.float32)
+            gt_class = np.zeros((len(item) - 1, 1), dtype=np.int32)
+            gt_lmk_labels = np.zeros((len(item) - 1, 10), dtype=np.float32)
+            lmk_ignore_flag = np.zeros((len(item) - 1, 1), dtype=np.int32)
+            for index_box in range(len(item)):
+                if index_box < 1:
+                    continue
+                gt_bbox[index_box - 1] = item[index_box][0]
+                if self.with_lmk:
+                    gt_lmk_labels[index_box - 1] = item[index_box][1]
+                    lmk_ignore_flag[index_box - 1] = item[index_box][2]
+            im_fname = os.path.join(image_dir,
+                                    im_fname) if image_dir else im_fname
+            widerface_rec = {
+                'im_file': im_fname,
+                'im_id': im_id,
+            } if 'image' in self.data_fields else {}
+            gt_rec = {
+                'gt_bbox': gt_bbox,
+                'gt_class': gt_class,
+            }
+            for k, v in gt_rec.items():
+                if k in self.data_fields:
+                    widerface_rec[k] = v
+            if self.with_lmk:
+                widerface_rec['gt_keypoint'] = gt_lmk_labels
+                widerface_rec['keypoint_ignore'] = lmk_ignore_flag
+
+            if len(item) != 0:
+                records.append(widerface_rec)
+
+            ct += 1
+            if self.sample_num > 0 and ct >= self.sample_num:
+                break
+        assert len(records) > 0, 'not found any widerface record in %s' % (anno_path)
+        logger.debug('{} samples in file {}'.format(ct, anno_path))
+        self.roidbs, self.cname2cid = records, cname2cid
+
+    def _load_file_list(self, input_txt):
+        with open(input_txt, 'r') as f_dir:
+            lines_input_txt = f_dir.readlines()
+
+        file_dict = {}
+        num_class = 0
+        exts = ['jpg', 'jpeg', 'png', 'bmp']
+        exts += [ext.upper() for ext in exts]
+        for i in range(len(lines_input_txt)):
+            line_txt = lines_input_txt[i].strip('\n\t\r')
+            split_str = line_txt.split(' ')
+            if len(split_str) == 1:
+                img_file_name = os.path.split(split_str[0])[1]
+                split_txt = img_file_name.split('.')
+                if len(split_txt) < 2:
+                    continue
+                elif split_txt[-1] in exts:
+                    if i != 0:
+                        num_class += 1
+                    file_dict[num_class] = [line_txt]
+            else:
+                if len(line_txt) <= 6:
+                    continue
+                result_boxs = []
+                xmin = float(split_str[0])
+                ymin = float(split_str[1])
+                w = float(split_str[2])
+                h = float(split_str[3])
+                # Filter out wrong labels
+                if w < 0 or h < 0:
+                    logger.warning('Illegal box with w: {}, h: {} in '
+                                   'img: {}, and it will be ignored'.format(
+                                       w, h, file_dict[num_class][0]))
+                    continue
+                xmin = max(0, xmin)
+                ymin = max(0, ymin)
+                xmax = xmin + w
+                ymax = ymin + h
+                gt_bbox = [xmin, ymin, xmax, ymax]
+                result_boxs.append(gt_bbox)
+                if self.with_lmk:
+                    assert len(split_str) > 18, 'When `with_lmk=True`, the number ' \
+                        'of fields per line in the annotation file should ' \
+                        'exceed 18.'
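+                    # Landmark layout, as consumed below (inferred from the
+                    # indices used here): fields 5/6, 8/9, 11/12, 14/15 and
+                    # 17/18 carry the x/y pairs of the five landmarks, with
+                    # one field skipped between consecutive landmarks;
+                    # lmk0_x == -1 marks a face whose landmarks are ignored.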
+ lmk0_x = float(split_str[5]) + lmk0_y = float(split_str[6]) + lmk1_x = float(split_str[8]) + lmk1_y = float(split_str[9]) + lmk2_x = float(split_str[11]) + lmk2_y = float(split_str[12]) + lmk3_x = float(split_str[14]) + lmk3_y = float(split_str[15]) + lmk4_x = float(split_str[17]) + lmk4_y = float(split_str[18]) + lmk_ignore_flag = 0 if lmk0_x == -1 else 1 + gt_lmk_label = [ + lmk0_x, lmk0_y, lmk1_x, lmk1_y, lmk2_x, lmk2_y, lmk3_x, + lmk3_y, lmk4_x, lmk4_y + ] + result_boxs.append(gt_lmk_label) + result_boxs.append(lmk_ignore_flag) + file_dict[num_class].append(result_boxs) + + return list(file_dict.values()) + + +def widerface_label(): + labels_map = {'face': 0} + return labels_map diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__init__.py b/PaddleDetection-release-2.6/ppdet/data/transform/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..08d7f64d9e906a4b7aa47496e32cd787244a8c9f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/__init__.py @@ -0,0 +1,32 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import operators +from . import batch_operators +from . import keypoint_operators +from . import mot_operators +from . import rotated_operators +from . 
import keypoints_3d_operators + +from .operators import * +from .batch_operators import * +from .keypoint_operators import * +from .mot_operators import * +from .rotated_operators import * +from .keypoints_3d_operators import * + +__all__ = [] +__all__ += registered_ops +__all__ += keypoint_operators.__all__ +__all__ += mot_operators.__all__ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9bab4a8de009921928e63f2f98a8ca5fd90e428f Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/atss_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/atss_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e90f6c692327c30409f2801994ece7d0c9a4db54 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/atss_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/batch_operators.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/batch_operators.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..caf6273c623cf9cfb1e8f6303556b600d114eeba Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/batch_operators.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/keypoint_operators.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/keypoint_operators.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7b6cd776169ecd31b5a8ded65da592369393a164 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/keypoint_operators.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/keypoints_3d_operators.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/keypoints_3d_operators.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0fed3bbcf25d4f53d57fce548d0a2d5282fd7aa4 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/keypoints_3d_operators.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/mot_operators.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/mot_operators.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..69d52e0ec325a67b1110605b9f627a1a64a1ab27 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/mot_operators.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/op_helper.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/op_helper.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..78d7c9c35ae9981060a89e6a26cfc2b20b342873 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/op_helper.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/operators.cpython-37.pyc 
b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/operators.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..03917f437099bccf9a692a170eab9626edbbce9d
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/operators.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/rotated_operators.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/rotated_operators.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..d4bd3596afa83b29f5f2cda084b5581381c3ee96
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/data/transform/__pycache__/rotated_operators.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/atss_assigner.py b/PaddleDetection-release-2.6/ppdet/data/transform/atss_assigner.py
new file mode 100644
index 0000000000000000000000000000000000000000..686b1407bfc30871fa7847424a41e75dcf0596c5
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/transform/atss_assigner.py
@@ -0,0 +1,421 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# The code is based on:
+# https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/assigners/atss_assigner.py
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+
+def bbox_overlaps(bboxes1, bboxes2, mode='iou', is_aligned=False, eps=1e-6):
+    """Calculate overlap between two sets of bboxes.
+    If ``is_aligned`` is ``False``, then calculate the overlaps between each
+    bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned
+    pair of bboxes1 and bboxes2.
+    Args:
+        bboxes1 (Tensor): shape (B, m, 4) in <x1, y1, x2, y2> format or empty.
+        bboxes2 (Tensor): shape (B, n, 4) in <x1, y1, x2, y2> format or empty.
+            B indicates the batch dim, in shape (B1, B2, ..., Bn).
+            If ``is_aligned`` is ``True``, then m and n must be equal.
+        mode (str): "iou" (intersection over union), "iof" (intersection over
+            foreground), "giou" (generalized IoU) or "diou" (distance IoU).
+        is_aligned (bool, optional): If True, then m and n must be equal.
+            Default False.
+        eps (float, optional): A value added to the denominator for numerical
+            stability. Default 1e-6.
+    Returns:
+        Tensor: shape (m, n) if ``is_aligned`` is False else shape (m,)
+    """
+    assert mode in ['iou', 'iof', 'giou', 'diou'], 'Unsupported mode {}'.format(
+        mode)
+    # Either the boxes are empty or the length of the boxes' last dimension is 4
+    assert (bboxes1.shape[-1] == 4 or bboxes1.shape[0] == 0)
+    assert (bboxes2.shape[-1] == 4 or bboxes2.shape[0] == 0)
+
+    # Batch dim must be the same
+    # Batch dim: (B1, B2, ... 
Bn) + assert bboxes1.shape[:-2] == bboxes2.shape[:-2] + batch_shape = bboxes1.shape[:-2] + + rows = bboxes1.shape[-2] if bboxes1.shape[0] > 0 else 0 + cols = bboxes2.shape[-2] if bboxes2.shape[0] > 0 else 0 + if is_aligned: + assert rows == cols + + if rows * cols == 0: + if is_aligned: + return np.random.random(batch_shape + (rows, )) + else: + return np.random.random(batch_shape + (rows, cols)) + + area1 = (bboxes1[..., 2] - bboxes1[..., 0]) * ( + bboxes1[..., 3] - bboxes1[..., 1]) + area2 = (bboxes2[..., 2] - bboxes2[..., 0]) * ( + bboxes2[..., 3] - bboxes2[..., 1]) + + if is_aligned: + lt = np.maximum(bboxes1[..., :2], bboxes2[..., :2]) # [B, rows, 2] + rb = np.minimum(bboxes1[..., 2:], bboxes2[..., 2:]) # [B, rows, 2] + + wh = (rb - lt).clip(min=0) # [B, rows, 2] + overlap = wh[..., 0] * wh[..., 1] + + if mode in ['iou', 'giou']: + union = area1 + area2 - overlap + else: + union = area1 + if mode == 'giou': + enclosed_lt = np.minimum(bboxes1[..., :2], bboxes2[..., :2]) + enclosed_rb = np.maximum(bboxes1[..., 2:], bboxes2[..., 2:]) + if mode == 'diou': + enclosed_lt = np.minimum(bboxes1[..., :2], bboxes2[..., :2]) + enclosed_rb = np.maximum(bboxes1[..., 2:], bboxes2[..., 2:]) + b1_x1, b1_y1 = bboxes1[..., 0], bboxes1[..., 1] + b1_x2, b1_y2 = bboxes1[..., 2], bboxes1[..., 3] + b2_x1, b2_y1 = bboxes2[..., 0], bboxes2[..., 1] + b2_x2, b2_y2 = bboxes2[..., 2], bboxes2[..., 3] + else: + lt = np.maximum(bboxes1[..., :, None, :2], + bboxes2[..., None, :, :2]) # [B, rows, cols, 2] + rb = np.minimum(bboxes1[..., :, None, 2:], + bboxes2[..., None, :, 2:]) # [B, rows, cols, 2] + + wh = (rb - lt).clip(min=0) # [B, rows, cols, 2] + overlap = wh[..., 0] * wh[..., 1] + + if mode in ['iou', 'giou']: + union = area1[..., None] + area2[..., None, :] - overlap + else: + union = area1[..., None] + if mode == 'giou': + enclosed_lt = np.minimum(bboxes1[..., :, None, :2], + bboxes2[..., None, :, :2]) + enclosed_rb = np.maximum(bboxes1[..., :, None, 2:], + bboxes2[..., None, :, 2:]) + if mode == 'diou': + enclosed_lt = np.minimum(bboxes1[..., :, None, :2], + bboxes2[..., None, :, :2]) + enclosed_rb = np.maximum(bboxes1[..., :, None, 2:], + bboxes2[..., None, :, 2:]) + b1_x1, b1_y1 = bboxes1[..., :, None, 0], bboxes1[..., :, None, 1] + b1_x2, b1_y2 = bboxes1[..., :, None, 2], bboxes1[..., :, None, 3] + b2_x1, b2_y1 = bboxes2[..., None, :, 0], bboxes2[..., None, :, 1] + b2_x2, b2_y2 = bboxes2[..., None, :, 2], bboxes2[..., None, :, 3] + + eps = np.array([eps]) + union = np.maximum(union, eps) + ious = overlap / union + if mode in ['iou', 'iof']: + return ious + # calculate gious + if mode in ['giou']: + enclose_wh = (enclosed_rb - enclosed_lt).clip(min=0) + enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1] + enclose_area = np.maximum(enclose_area, eps) + gious = ious - (enclose_area - union) / enclose_area + return gious + if mode in ['diou']: + left = ((b2_x1 + b2_x2) - (b1_x1 + b1_x2))**2 / 4 + right = ((b2_y1 + b2_y2) - (b1_y1 + b1_y2))**2 / 4 + rho2 = left + right + enclose_wh = (enclosed_rb - enclosed_lt).clip(min=0) + enclose_c = enclose_wh[..., 0]**2 + enclose_wh[..., 1]**2 + enclose_c = np.maximum(enclose_c, eps) + dious = ious - rho2 / enclose_c + return dious + + +def topk_(input, k, axis=1, largest=True): + x = -input if largest else input + if axis == 0: + row_index = np.arange(input.shape[1 - axis]) + if k == x.shape[0]: # argpartition requires index < len(input) + topk_index = np.argpartition(x, k - 1, axis=axis)[0:k, :] + else: + topk_index = np.argpartition(x, k, axis=axis)[0:k, :] + + 
topk_data = x[topk_index, row_index]
+
+        topk_index_sort = np.argsort(topk_data, axis=axis)
+        topk_data_sort = topk_data[topk_index_sort, row_index]
+        topk_index_sort = topk_index[0:k, :][topk_index_sort, row_index]
+    else:
+        column_index = np.arange(x.shape[1 - axis])[:, None]
+        topk_index = np.argpartition(x, k, axis=axis)[:, 0:k]
+        topk_data = x[column_index, topk_index]
+        topk_data = -topk_data if largest else topk_data
+        topk_index_sort = np.argsort(topk_data, axis=axis)
+        topk_data_sort = topk_data[column_index, topk_index_sort]
+        topk_index_sort = topk_index[:, 0:k][column_index, topk_index_sort]
+
+    return topk_data_sort, topk_index_sort
+
+
+class ATSSAssigner(object):
+    """Assign a corresponding gt bbox or background to each bbox.
+
+    Each proposal will be assigned `0` or a positive integer
+    indicating the ground truth index.
+
+    - 0: negative sample, no assigned gt
+    - positive integer: positive sample, index (1-based) of assigned gt
+
+    Args:
+        topk (int): number of bboxes selected on each level
+    """
+
+    def __init__(self, topk=9):
+        self.topk = topk
+
+    def __call__(self,
+                 bboxes,
+                 num_level_bboxes,
+                 gt_bboxes,
+                 gt_bboxes_ignore=None,
+                 gt_labels=None):
+        """Assign gt to bboxes.
+        The assignment is done in the following steps
+        1. compute iou between all bboxes (bboxes of all pyramid levels) and gts
+        2. compute center distance between all bboxes and gts
+        3. on each pyramid level, for each gt, select k bboxes whose centers
+           are closest to the gt center, so we select k*l bboxes in total as
+           candidates for each gt
+        4. get the corresponding iou for these candidates, and compute the
+           mean and std, set mean + std as the iou threshold
+        5. select candidates whose iou is greater than or equal to
+           the threshold as positive
+        6. limit the positive samples' centers in gt
+        Args:
+            bboxes (np.array): Bounding boxes to be assigned, shape(n, 4).
+            num_level_bboxes (List): num of bboxes in each level
+            gt_bboxes (np.array): Groundtruth boxes, shape (k, 4).
+            gt_bboxes_ignore (np.array, optional): Ground truth bboxes that are
+                labelled as `ignored`, e.g., crowd boxes in COCO.
+            gt_labels (np.array, optional): Label of gt_bboxes, shape (k, ).
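+
+        Example (a minimal sketch with two made-up anchors on a single level
+        and one gt box; the numbers are hypothetical and only illustrate the
+        call signature):
+
+            assigner = ATSSAssigner(topk=9)
+            anchors = np.array([[0., 0., 10., 10.], [5., 5., 15., 15.]])
+            gts = np.array([[0., 0., 12., 12.]])
+            assigned_gt_inds, max_overlaps = assigner(anchors, [2], gts)
+            # assigned_gt_inds[i] == 1 means anchor i was matched to gts[0]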
+        """
+        bboxes = bboxes[:, :4]
+        num_gt, num_bboxes = gt_bboxes.shape[0], bboxes.shape[0]
+
+        # assign 0 by default
+        assigned_gt_inds = np.zeros((num_bboxes, ), dtype=np.int64)
+
+        if num_gt == 0 or num_bboxes == 0:
+            # No ground truth or boxes, return empty assignment
+            max_overlaps = np.zeros((num_bboxes, ))
+            if num_gt == 0:
+                # No truth, assign everything to background
+                assigned_gt_inds[:] = 0
+            if not np.any(gt_labels):
+                assigned_labels = None
+            else:
+                assigned_labels = -np.ones((num_bboxes, ), dtype=np.int64)
+            return assigned_gt_inds, max_overlaps
+
+        # compute iou between all bboxes and gts
+        overlaps = bbox_overlaps(bboxes, gt_bboxes)
+        # compute center distance between all bboxes and gts
+        gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
+        gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
+        gt_points = np.stack((gt_cx, gt_cy), axis=1)
+
+        bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
+        bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
+        bboxes_points = np.stack((bboxes_cx, bboxes_cy), axis=1)
+
+        distances = np.sqrt(
+            np.power((bboxes_points[:, None, :] - gt_points[None, :, :]), 2)
+            .sum(-1))
+
+        # Selecting candidates based on the center distance
+        candidate_idxs = []
+        start_idx = 0
+        for bboxes_per_level in num_level_bboxes:
+            # on each pyramid level, for each gt,
+            # select k bboxes whose centers are closest to the gt center
+            end_idx = start_idx + bboxes_per_level
+            distances_per_level = distances[start_idx:end_idx, :]
+            selectable_k = min(self.topk, bboxes_per_level)
+            _, topk_idxs_per_level = topk_(
+                distances_per_level, selectable_k, axis=0, largest=False)
+            candidate_idxs.append(topk_idxs_per_level + start_idx)
+            start_idx = end_idx
+        candidate_idxs = np.concatenate(candidate_idxs, axis=0)
+
+        # get the corresponding iou for these candidates, and compute the
+        # mean and std, set mean + std as the iou threshold
+        candidate_overlaps = overlaps[candidate_idxs, np.arange(num_gt)]
+        overlaps_mean_per_gt = candidate_overlaps.mean(0)
+        overlaps_std_per_gt = candidate_overlaps.std(0)
+        overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt
+
+        is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]
+
+        # limit the positive samples' centers in gt
+        for gt_idx in range(num_gt):
+            candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
+        ep_bboxes_cx = np.broadcast_to(
+            bboxes_cx.reshape(1, -1), [num_gt, num_bboxes]).reshape(-1)
+        ep_bboxes_cy = np.broadcast_to(
+            bboxes_cy.reshape(1, -1), [num_gt, num_bboxes]).reshape(-1)
+        candidate_idxs = candidate_idxs.reshape(-1)
+
+        # calculate the left, top, right, bottom distance between positive
+        # bbox center and gt side
+        l_ = ep_bboxes_cx[candidate_idxs].reshape(-1, num_gt) - gt_bboxes[:, 0]
+        t_ = ep_bboxes_cy[candidate_idxs].reshape(-1, num_gt) - gt_bboxes[:, 1]
+        r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].reshape(-1, num_gt)
+        b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].reshape(-1, num_gt)
+        is_in_gts = np.stack([l_, t_, r_, b_], axis=1).min(axis=1) > 0.01
+        is_pos = is_pos & is_in_gts
+
+        # if an anchor box is assigned to multiple gts,
+        # the one with the highest IoU will be selected.
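+        # Implementation note: the overlaps matrix is flattened so that only
+        # positive candidates keep their IoU (everything else stays -inf);
+        # the per-anchor argmax below then keeps at most one gt per anchor.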
+        overlaps_inf = -np.inf * np.ones_like(overlaps).T.reshape(-1)
+        index = candidate_idxs.reshape(-1)[is_pos.reshape(-1)]
+        overlaps_inf[index] = overlaps.T.reshape(-1)[index]
+        overlaps_inf = overlaps_inf.reshape(num_gt, -1).T
+
+        max_overlaps = overlaps_inf.max(axis=1)
+        argmax_overlaps = overlaps_inf.argmax(axis=1)
+        assigned_gt_inds[max_overlaps !=
+                         -np.inf] = argmax_overlaps[max_overlaps != -np.inf] + 1
+
+        return assigned_gt_inds, max_overlaps
+
+    def get_vlr_region(self,
+                       bboxes,
+                       num_level_bboxes,
+                       gt_bboxes,
+                       gt_bboxes_ignore=None,
+                       gt_labels=None):
+        """get vlr region for ld distillation.
+        Args:
+            bboxes (np.array): Bounding boxes to be assigned, shape(n, 4).
+            num_level_bboxes (List): num of bboxes in each level
+            gt_bboxes (np.array): Groundtruth boxes, shape (k, 4).
+            gt_bboxes_ignore (np.array, optional): Ground truth bboxes that are
+                labelled as `ignored`, e.g., crowd boxes in COCO.
+            gt_labels (np.array, optional): Label of gt_bboxes, shape (k, ).
+        """
+        bboxes = bboxes[:, :4]
+
+        num_gt, num_bboxes = gt_bboxes.shape[0], bboxes.shape[0]
+
+        # compute iou between all bboxes and gts
+        overlaps = bbox_overlaps(bboxes, gt_bboxes)
+
+        # compute diou between all bboxes and gts
+        diou = bbox_overlaps(bboxes, gt_bboxes, mode='diou')
+
+        # assign 0 by default
+        assigned_gt_inds = np.zeros((num_bboxes, ), dtype=np.int64)
+
+        vlr_region_iou = (assigned_gt_inds + 0).astype(np.float32)
+
+        if num_gt == 0 or num_bboxes == 0:
+            # No ground truth or boxes, return empty assignment
+            max_overlaps = np.zeros((num_bboxes, ))
+            if num_gt == 0:
+                # No truth, assign everything to background
+                assigned_gt_inds[:] = 0
+            if not np.any(gt_labels):
+                assigned_labels = None
+            else:
+                assigned_labels = -np.ones((num_bboxes, ), dtype=np.int64)
+            return assigned_gt_inds, max_overlaps
+
+        # compute center distance between all bboxes and gts
+        gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
+        gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
+        gt_points = np.stack((gt_cx, gt_cy), axis=1)
+
+        bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
+        bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
+        bboxes_points = np.stack((bboxes_cx, bboxes_cy), axis=1)
+
+        distances = np.sqrt(
+            np.power((bboxes_points[:, None, :] - gt_points[None, :, :]), 2)
+            .sum(-1))
+
+        # Selecting candidates based on the center distance
+        candidate_idxs = []
+        candidate_idxs_t = []
+        start_idx = 0
+        for bboxes_per_level in num_level_bboxes:
+            # on each pyramid level, for each gt,
+            # select k bboxes whose centers are closest to the gt center
+            end_idx = start_idx + bboxes_per_level
+            distances_per_level = distances[start_idx:end_idx, :]
+            selectable_t = min(self.topk, bboxes_per_level)
+            selectable_k = bboxes_per_level  # k for all
+            _, topt_idxs_per_level = topk_(
+                distances_per_level, selectable_t, axis=0, largest=False)
+            _, topk_idxs_per_level = topk_(
+                distances_per_level, selectable_k, axis=0, largest=False)
+            candidate_idxs_t.append(topt_idxs_per_level + start_idx)
+            candidate_idxs.append(topk_idxs_per_level + start_idx)
+            start_idx = end_idx
+
+        candidate_idxs_t = np.concatenate(candidate_idxs_t, axis=0)
+        candidate_idxs = np.concatenate(candidate_idxs, axis=0)
+
+        # get the corresponding iou for these candidates, and compute the
+        # mean and std, set mean + std as the iou threshold
+        candidate_overlaps_t = overlaps[candidate_idxs_t, np.arange(num_gt)]
+
+        # compute tdiou
+        t_diou = diou[candidate_idxs, np.arange(num_gt)]
+
+        overlaps_mean_per_gt = candidate_overlaps_t.mean(0)
+        overlaps_std_per_gt = candidate_overlaps_t.std(
+            0, 
ddof=1) # NOTE: use Bessel correction + overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt + + # compute region + is_pos = (t_diou < overlaps_thr_per_gt[None, :]) & ( + t_diou >= 0.25 * overlaps_thr_per_gt[None, :]) + + # limit the positive sample's center in gt + for gt_idx in range(num_gt): + candidate_idxs[:, gt_idx] += gt_idx * num_bboxes + + candidate_idxs = candidate_idxs.reshape(-1) + + # if an anchor box is assigned to multiple gts, + # the one with the highest IoU will be selected. + overlaps_inf = -np.inf * np.ones_like(overlaps).T.reshape(-1) + index = candidate_idxs.reshape(-1)[is_pos.reshape(-1)] + + overlaps_inf[index] = overlaps.T.reshape(-1)[index] + overlaps_inf = overlaps_inf.reshape(num_gt, -1).T + + max_overlaps = overlaps_inf.max(axis=1) + argmax_overlaps = overlaps_inf.argmax(axis=1) + + overlaps_inf = -np.inf * np.ones_like(overlaps).T.reshape(-1) + overlaps_inf = overlaps_inf.reshape(num_gt, -1).T + + assigned_gt_inds[max_overlaps != + -np.inf] = argmax_overlaps[max_overlaps != -np.inf] + 1 + + vlr_region_iou[max_overlaps != + -np.inf] = max_overlaps[max_overlaps != -np.inf] + 0 + + return vlr_region_iou diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/autoaugment_utils.py b/PaddleDetection-release-2.6/ppdet/data/transform/autoaugment_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..cfa89d374d94260c881566c12ef6a6afd5e823b9 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/autoaugment_utils.py @@ -0,0 +1,1586 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# Reference: +# https://github.com/tensorflow/tpu/blob/master/models/official/detection/utils/autoaugment_utils.py +"""AutoAugment util file.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import inspect +import math +from PIL import Image, ImageEnhance +import numpy as np +import cv2 +from copy import deepcopy + +# This signifies the max integer that the controller RNN could predict for the +# augmentation scheme. +_MAX_LEVEL = 10. + +# Represents an invalid bounding box that is used for checking for padding +# lists of bounding box coordinates for a few augmentation operations +_INVALID_BOX = [[-1.0, -1.0, -1.0, -1.0]] + + +def policy_v0(): + """Autoaugment policy that was used in AutoAugment Detection Paper.""" + # Each tuple is an augmentation operation of the form + # (operation, probability, magnitude). Each element in policy is a + # sub-policy that will be applied sequentially on the image. 
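+    # For example, ('TranslateX_BBox', 0.6, 4) applies TranslateX_BBox with
+    # probability 0.6 at magnitude 4; magnitudes range from 0 to _MAX_LEVEL
+    # (10).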
+ policy = [ + [('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)], + [('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)], + [('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)], + [('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)], + [('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)], + ] + return policy + + +def policy_v1(): + """Autoaugment policy that was used in AutoAugment Detection Paper.""" + # Each tuple is an augmentation operation of the form + # (operation, probability, magnitude). Each element in policy is a + # sub-policy that will be applied sequentially on the image. + policy = [ + [('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)], + [('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)], + [('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)], + [('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)], + [('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)], + [('Color', 0.0, 0), ('ShearX_Only_BBoxes', 0.8, 4)], + [('ShearY_Only_BBoxes', 0.8, 2), ('Flip_Only_BBoxes', 0.0, 10)], + [('Equalize', 0.6, 10), ('TranslateX_BBox', 0.2, 2)], + [('Color', 1.0, 10), ('TranslateY_Only_BBoxes', 0.4, 6)], + [('Rotate_BBox', 0.8, 10), ('Contrast', 0.0, 10)], # , + [('Cutout', 0.2, 2), ('Brightness', 0.8, 10)], + [('Color', 1.0, 6), ('Equalize', 1.0, 2)], + [('Cutout_Only_BBoxes', 0.4, 6), ('TranslateY_Only_BBoxes', 0.8, 2)], + [('Color', 0.2, 8), ('Rotate_BBox', 0.8, 10)], + [('Sharpness', 0.4, 4), ('TranslateY_Only_BBoxes', 0.0, 4)], + [('Sharpness', 1.0, 4), ('SolarizeAdd', 0.4, 4)], + [('Rotate_BBox', 1.0, 8), ('Sharpness', 0.2, 8)], + [('ShearY_BBox', 0.6, 10), ('Equalize_Only_BBoxes', 0.6, 8)], + [('ShearX_BBox', 0.2, 6), ('TranslateY_Only_BBoxes', 0.2, 10)], + [('SolarizeAdd', 0.6, 8), ('Brightness', 0.8, 10)], + ] + return policy + + +def policy_vtest(): + """Autoaugment test policy for debugging.""" + # Each tuple is an augmentation operation of the form + # (operation, probability, magnitude). Each element in policy is a + # sub-policy that will be applied sequentially on the image. + policy = [[('TranslateX_BBox', 1.0, 4), ('Equalize', 1.0, 10)], ] + return policy + + +def policy_v2(): + """Additional policy that performs well on object detection.""" + # Each tuple is an augmentation operation of the form + # (operation, probability, magnitude). Each element in policy is a + # sub-policy that will be applied sequentially on the image. 
+    policy = [
+        [('Color', 0.0, 6), ('Cutout', 0.6, 8), ('Sharpness', 0.4, 8)],
+        [('Rotate_BBox', 0.4, 8), ('Sharpness', 0.4, 2),
+         ('Rotate_BBox', 0.8, 10)],
+        [('TranslateY_BBox', 1.0, 8), ('AutoContrast', 0.8, 2)],
+        [('AutoContrast', 0.4, 6), ('ShearX_BBox', 0.8, 8),
+         ('Brightness', 0.0, 10)],
+        [('SolarizeAdd', 0.2, 6), ('Contrast', 0.0, 10),
+         ('AutoContrast', 0.6, 0)],
+        [('Cutout', 0.2, 0), ('Solarize', 0.8, 8), ('Color', 1.0, 4)],
+        [('TranslateY_BBox', 0.0, 4), ('Equalize', 0.6, 8),
+         ('Solarize', 0.0, 10)],
+        [('TranslateY_BBox', 0.2, 2), ('ShearY_BBox', 0.8, 8),
+         ('Rotate_BBox', 0.8, 8)],
+        [('Cutout', 0.8, 8), ('Brightness', 0.8, 8), ('Cutout', 0.2, 2)],
+        [('Color', 0.8, 4), ('TranslateY_BBox', 1.0, 6),
+         ('Rotate_BBox', 0.6, 6)],
+        [('Rotate_BBox', 0.6, 10), ('BBox_Cutout', 1.0, 4), ('Cutout', 0.2, 8)],
+        [('Rotate_BBox', 0.0, 0), ('Equalize', 0.6, 6),
+         ('ShearY_BBox', 0.6, 8)],
+        [('Brightness', 0.8, 8), ('AutoContrast', 0.4, 2),
+         ('Brightness', 0.2, 2)],
+        [('TranslateY_BBox', 0.4, 8), ('Solarize', 0.4, 6),
+         ('SolarizeAdd', 0.2, 10)],
+        [('Contrast', 1.0, 10), ('SolarizeAdd', 0.2, 8), ('Equalize', 0.2, 4)],
+    ]
+    return policy
+
+
+def policy_v3():
+    """Additional policy that performs well on object detection."""
+    # Each tuple is an augmentation operation of the form
+    # (operation, probability, magnitude). Each element in policy is a
+    # sub-policy that will be applied sequentially on the image.
+    policy = [
+        [('Posterize', 0.8, 2), ('TranslateX_BBox', 1.0, 8)],
+        [('BBox_Cutout', 0.2, 10), ('Sharpness', 1.0, 8)],
+        [('Rotate_BBox', 0.6, 8), ('Rotate_BBox', 0.8, 10)],
+        [('Equalize', 0.8, 10), ('AutoContrast', 0.2, 10)],
+        [('SolarizeAdd', 0.2, 2), ('TranslateY_BBox', 0.2, 8)],
+        [('Sharpness', 0.0, 2), ('Color', 0.4, 8)],
+        [('Equalize', 1.0, 8), ('TranslateY_BBox', 1.0, 8)],
+        [('Posterize', 0.6, 2), ('Rotate_BBox', 0.0, 10)],
+        [('AutoContrast', 0.6, 0), ('Rotate_BBox', 1.0, 6)],
+        [('Equalize', 0.0, 4), ('Cutout', 0.8, 10)],
+        [('Brightness', 1.0, 2), ('TranslateY_BBox', 1.0, 6)],
+        [('Contrast', 0.0, 2), ('ShearY_BBox', 0.8, 0)],
+        [('AutoContrast', 0.8, 10), ('Contrast', 0.2, 10)],
+        [('Rotate_BBox', 1.0, 10), ('Cutout', 1.0, 10)],
+        [('SolarizeAdd', 0.8, 6), ('Equalize', 0.8, 8)],
+    ]
+    return policy
+
+
+def _equal(val1, val2, eps=1e-8):
+    return abs(val1 - val2) <= eps
+
+
+def blend(image1, image2, factor):
+    """Blend image1 and image2 using 'factor'.
+
+    Factor can be above 0.0. A value of 0.0 means only image1 is used.
+    A value of 1.0 means only image2 is used. A value between 0.0 and
+    1.0 means we linearly interpolate the pixel values between the two
+    images. A value greater than 1.0 "extrapolates" the difference
+    between the two pixel values, and we clip the results to values
+    between 0 and 255.
+
+    Args:
+        image1: An image Tensor of type uint8.
+        image2: An image Tensor of type uint8.
+        factor: A floating point value above 0.0.
+
+    Returns:
+        A blended image Tensor of type uint8.
+    """
+    if factor == 0.0:
+        return image1
+    if factor == 1.0:
+        return image2
+
+    image1 = image1.astype(np.float32)
+    image2 = image2.astype(np.float32)
+
+    difference = image2 - image1
+    scaled = factor * difference
+
+    # Do addition in float.
+    temp = image1 + scaled
+
+    # Interpolate
+    if factor > 0.0 and factor < 1.0:
+        # Interpolation means we always stay within 0 and 255.
+        return temp.astype(np.uint8)
+
+    # Extrapolate:
+    #
+    # We need to clip and then cast.
+    return np.clip(temp, a_min=0, a_max=255).astype(np.uint8)
+
+
+def cutout(image, pad_size, replace=0):
+    """Apply cutout (https://arxiv.org/abs/1708.04552) to image.
+
+    This operation applies a (2*pad_size x 2*pad_size) mask of zeros to
+    a random location within `img`. The pixel values filled in will be of the
+    value `replace`. The location where the mask is applied is chosen
+    uniformly at random over the whole image.
+
+    Args:
+        image: An image Tensor of type uint8.
+        pad_size: Specifies how big the zero mask applied to the image will
+            be; the generated mask is of size (2*pad_size x 2*pad_size).
+        replace: What pixel value to fill in the image in the area that has
+            the cutout mask applied to it.
+
+    Returns:
+        An image Tensor that is of type uint8.
+    Example:
+        img = cv2.imread( "/home/vis/gry/train/img_data/test.jpg", cv2.COLOR_BGR2RGB )
+        new_img = cutout(img, pad_size=50, replace=0)
+    """
+    image_height, image_width = image.shape[0], image.shape[1]
+
+    cutout_center_height = np.random.randint(low=0, high=image_height)
+    cutout_center_width = np.random.randint(low=0, high=image_width)
+
+    lower_pad = np.maximum(0, cutout_center_height - pad_size)
+    upper_pad = np.maximum(0, image_height - cutout_center_height - pad_size)
+    left_pad = np.maximum(0, cutout_center_width - pad_size)
+    right_pad = np.maximum(0, image_width - cutout_center_width - pad_size)
+
+    cutout_shape = [
+        image_height - (lower_pad + upper_pad),
+        image_width - (left_pad + right_pad)
+    ]
+    padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]]
+    mask = np.pad(np.zeros(
+        cutout_shape, dtype=image.dtype),
+                  padding_dims,
+                  'constant',
+                  constant_values=1)
+    mask = np.expand_dims(mask, -1)
+    mask = np.tile(mask, [1, 1, 3])
+    image = np.where(
+        np.equal(mask, 0),
+        np.ones_like(
+            image, dtype=image.dtype) * replace,
+        image)
+    return image.astype(np.uint8)
+
+
+def solarize(image, threshold=128):
+    # For each pixel in the image, select the pixel
+    # if the value is less than the threshold.
+    # Otherwise, subtract 255 from the pixel.
+    return np.where(image < threshold, image, 255 - image)
+
+
+def solarize_add(image, addition=0, threshold=128):
+    # For each pixel in the image less than threshold
+    # we add 'addition' amount to it and then clip the
+    # pixel value to be between 0 and 255. The value
+    # of 'addition' is between -128 and 128.
+    added_image = image.astype(np.int64) + addition
+    added_image = np.clip(added_image, a_min=0, a_max=255).astype(np.uint8)
+    return np.where(image < threshold, added_image, image)
+
+
+def color(image, factor):
+    """Use cv2 to build a grayscale copy, then blend it with the original."""
+    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+    degenerate = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
+    return blend(degenerate, image, factor)
+
+
+# refer to https://github.com/4uiiurz1/pytorch-auto-augment/blob/024b2eac4140c38df8342f09998e307234cafc80/auto_augment.py#L197
+def contrast(img, factor):
+    img = ImageEnhance.Contrast(Image.fromarray(img)).enhance(factor)
+    return np.array(img)
+
+
+def brightness(image, factor):
+    """Equivalent of PIL Brightness."""
+    degenerate = np.zeros_like(image)
+    return blend(degenerate, image, factor)
+
+
+def posterize(image, bits):
+    """Equivalent of PIL Posterize."""
+    shift = 8 - bits
+    return np.left_shift(np.right_shift(image, shift), shift)
+
+
+def rotate(image, degrees, replace):
+    """Rotates the image by degrees either clockwise or counterclockwise.
+
+    Args:
+        image: An image Tensor of type uint8.
+        degrees: Float, a scalar angle in degrees to rotate all images by.
+            If degrees is positive the image will be rotated clockwise,
+            otherwise it will be rotated counterclockwise.
+        replace: A one or three value 1D tensor to fill empty pixels caused
+            by the rotate operation.
+
+    Returns:
+        The rotated version of image.
+    """
+    image = wrap(image)
+    image = Image.fromarray(image)
+    image = image.rotate(degrees)
+    image = np.array(image, dtype=np.uint8)
+    return unwrap(image, replace)
+
+
+def random_shift_bbox(image,
+                      bbox,
+                      pixel_scaling,
+                      replace,
+                      new_min_bbox_coords=None):
+    """Move the bbox and the image content to a slightly new random location.
+
+    Args:
+        image: 3D uint8 Tensor.
+        bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
+            of type float that represents the normalized coordinates between
+            0 and 1. The potential values for the new min corner of the bbox
+            will be between
+            [old_min - pixel_scaling * bbox_height/2,
+             old_min + pixel_scaling * bbox_height/2].
+        pixel_scaling: A float between 0 and 1 that specifies the pixel range
+            that the new bbox location will be sampled from.
+        replace: A one or three value 1D tensor to fill empty pixels.
+        new_min_bbox_coords: If not None, then this is a tuple that specifies
+            the (min_y, min_x) coordinates of the new bbox. Normally this is
+            randomly specified, but this allows it to be manually set. The
+            coordinates are the absolute coordinates between 0 and image
+            height/width and are int32.
+
+    Returns:
+        The new image that will have the shifted bbox location in it along
+        with the new bbox that contains the new coordinates.
+    """
+    # Obtain image height and width and create helper clip functions.
+    image_height, image_width = image.shape[0], image.shape[1]
+    image_height = float(image_height)
+    image_width = float(image_width)
+
+    def clip_y(val):
+        return np.clip(val, a_min=0, a_max=image_height - 1).astype(np.int32)
+
+    def clip_x(val):
+        return np.clip(val, a_min=0, a_max=image_width - 1).astype(np.int32)
+
+    # Convert bbox to pixel coordinates.
+    min_y = int(image_height * bbox[0])
+    min_x = int(image_width * bbox[1])
+    max_y = clip_y(image_height * bbox[2])
+    max_x = clip_x(image_width * bbox[3])
+
+    bbox_height, bbox_width = (max_y - min_y + 1, max_x - min_x + 1)
+    image_height = int(image_height)
+    image_width = int(image_width)
+
+    # Select the new min/max bbox ranges that are used for sampling the
+    # new min x/y coordinates of the shifted bbox.
+    minval_y = clip_y(
+        min_y - np.int32(pixel_scaling * float(bbox_height) / 2.0))
+    maxval_y = clip_y(
+        min_y + np.int32(pixel_scaling * float(bbox_height) / 2.0))
+    minval_x = clip_x(
+        min_x - np.int32(pixel_scaling * float(bbox_width) / 2.0))
+    maxval_x = clip_x(
+        min_x + np.int32(pixel_scaling * float(bbox_width) / 2.0))
+
+    # Sample and calculate the new unclipped min/max coordinates of the new
+    # bbox.
+    if new_min_bbox_coords is None:
+        unclipped_new_min_y = np.random.randint(
+            low=minval_y, high=maxval_y, dtype=np.int32)
+        unclipped_new_min_x = np.random.randint(
+            low=minval_x, high=maxval_x, dtype=np.int32)
+    else:
+        unclipped_new_min_y, unclipped_new_min_x = (
+            clip_y(new_min_bbox_coords[0]), clip_x(new_min_bbox_coords[1]))
+    unclipped_new_max_y = unclipped_new_min_y + bbox_height - 1
+    unclipped_new_max_x = unclipped_new_min_x + bbox_width - 1
+
+    # Determine if any of the new bbox was shifted outside the current image.
+    # This is used for determining if any of the original bbox content
+    # should be discarded.
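+    # For example, if the sampled corner pushes the box past the right edge
+    # of the image, the clipped shifted_* bounds computed below drop the
+    # overflowing columns of the original bbox content.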
+ new_min_y, new_min_x, new_max_y, new_max_x = ( + clip_y(unclipped_new_min_y), clip_x(unclipped_new_min_x), + clip_y(unclipped_new_max_y), clip_x(unclipped_new_max_x)) + shifted_min_y = (new_min_y - unclipped_new_min_y) + min_y + shifted_max_y = max_y - (unclipped_new_max_y - new_max_y) + shifted_min_x = (new_min_x - unclipped_new_min_x) + min_x + shifted_max_x = max_x - (unclipped_new_max_x - new_max_x) + + # Create the new bbox tensor by converting pixel integer values to floats. + new_bbox = np.stack([ + float(new_min_y) / float(image_height), float(new_min_x) / + float(image_width), float(new_max_y) / float(image_height), + float(new_max_x) / float(image_width) + ]) + + # Copy the contents in the bbox and fill the old bbox location + # with gray (128). + bbox_content = image[shifted_min_y:shifted_max_y + 1, shifted_min_x: + shifted_max_x + 1, :] + + def mask_and_add_image(min_y_, min_x_, max_y_, max_x_, mask, content_tensor, + image_): + """Applies mask to bbox region in image then adds content_tensor to it.""" + mask = np.pad(mask, [[min_y_, (image_height - 1) - max_y_], + [min_x_, (image_width - 1) - max_x_], [0, 0]], + 'constant', + constant_values=1) + + content_tensor = np.pad(content_tensor, + [[min_y_, (image_height - 1) - max_y_], + [min_x_, (image_width - 1) - max_x_], [0, 0]], + 'constant', + constant_values=0) + return image_ * mask + content_tensor + + # Zero out original bbox location. + mask = np.zeros_like(image)[min_y:max_y + 1, min_x:max_x + 1, :] + grey_tensor = np.zeros_like(mask) + replace[0] + image = mask_and_add_image(min_y, min_x, max_y, max_x, mask, grey_tensor, + image) + + # Fill in bbox content to new bbox location. + mask = np.zeros_like(bbox_content) + image = mask_and_add_image(new_min_y, new_min_x, new_max_y, new_max_x, mask, + bbox_content, image) + + return image.astype(np.uint8), new_bbox + + +def _clip_bbox(min_y, min_x, max_y, max_x): + """Clip bounding box coordinates between 0 and 1. + + Args: + min_y: Normalized bbox coordinate of type float between 0 and 1. + min_x: Normalized bbox coordinate of type float between 0 and 1. + max_y: Normalized bbox coordinate of type float between 0 and 1. + max_x: Normalized bbox coordinate of type float between 0 and 1. + + Returns: + Clipped coordinate values between 0 and 1. + """ + min_y = np.clip(min_y, a_min=0, a_max=1.0) + min_x = np.clip(min_x, a_min=0, a_max=1.0) + max_y = np.clip(max_y, a_min=0, a_max=1.0) + max_x = np.clip(max_x, a_min=0, a_max=1.0) + return min_y, min_x, max_y, max_x + + +def _check_bbox_area(min_y, min_x, max_y, max_x, delta=0.05): + """Adjusts bbox coordinates to make sure the area is > 0. + + Args: + min_y: Normalized bbox coordinate of type float between 0 and 1. + min_x: Normalized bbox coordinate of type float between 0 and 1. + max_y: Normalized bbox coordinate of type float between 0 and 1. + max_x: Normalized bbox coordinate of type float between 0 and 1. + delta: Float, this is used to create a gap of size 2 * delta between + bbox min/max coordinates that are the same on the boundary. + This prevents the bbox from having an area of zero. + + Returns: + Tuple of new bbox coordinates between 0 and 1 that will now have a + guaranteed area > 0. + """ + height = max_y - min_y + width = max_x - min_x + + def _adjust_bbox_boundaries(min_coord, max_coord): + # Make sure max is never 0 and min is never 1. 
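+        # E.g. a box collapsed onto the top edge (min_y == max_y == 0.0)
+        # becomes [0.0, delta], restoring a strictly positive height.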
+        max_coord = np.maximum(max_coord, 0.0 + delta)
+        min_coord = np.minimum(min_coord, 1.0 - delta)
+        return min_coord, max_coord
+
+    if _equal(height, 0):
+        min_y, max_y = _adjust_bbox_boundaries(min_y, max_y)
+
+    if _equal(width, 0):
+        min_x, max_x = _adjust_bbox_boundaries(min_x, max_x)
+
+    return min_y, min_x, max_y, max_x
+
+
+def _scale_bbox_only_op_probability(prob):
+    """Reduce the probability of the bbox-only operation.
+
+    Probability is reduced so that we do not distort the content of too many
+    bounding boxes that are close to each other. The value of 3.0 was a
+    hyperparameter chosen when designing the autoaugment algorithm that was
+    found empirically to work well.
+
+    Args:
+        prob: Float that is the probability of applying the bbox-only
+            operation.
+
+    Returns:
+        Reduced probability.
+    """
+    return prob / 3.0
+
+
+def _apply_bbox_augmentation(image, bbox, augmentation_func, *args):
+    """Applies augmentation_func to the subsection of image indicated by bbox.
+
+    Args:
+        image: 3D uint8 Tensor.
+        bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
+            of type float that represents the normalized coordinates between
+            0 and 1.
+        augmentation_func: Augmentation function that will be applied to the
+            subsection of image.
+        *args: Additional parameters that will be passed into
+            augmentation_func when it is called.
+
+    Returns:
+        A modified version of image, where the bbox location in the image
+        will have `augmentation_func` applied to it.
+    """
+    image_height = image.shape[0]
+    image_width = image.shape[1]
+
+    min_y = int(image_height * bbox[0])
+    min_x = int(image_width * bbox[1])
+    max_y = int(image_height * bbox[2])
+    max_x = int(image_width * bbox[3])
+
+    # Clip to be sure the max values do not fall out of range.
+    max_y = np.minimum(max_y, image_height - 1)
+    max_x = np.minimum(max_x, image_width - 1)
+
+    # Get the sub-tensor that is the image within the bounding box region.
+    bbox_content = image[min_y:max_y + 1, min_x:max_x + 1, :]
+
+    # Apply the augmentation function to the bbox portion of the image.
+    augmented_bbox_content = augmentation_func(bbox_content, *args)
+
+    # Pad the augmented_bbox_content and the mask to match the shape of the
+    # original image.
+    augmented_bbox_content = np.pad(
+        augmented_bbox_content, [[min_y, (image_height - 1) - max_y],
+                                 [min_x, (image_width - 1) - max_x], [0, 0]],
+        'constant',
+        constant_values=1)
+
+    # Create a mask that will be used to zero out a part of the original
+    # image.
+    mask_tensor = np.zeros_like(bbox_content)
+
+    mask_tensor = np.pad(mask_tensor,
+                         [[min_y, (image_height - 1) - max_y],
+                          [min_x, (image_width - 1) - max_x], [0, 0]],
+                         'constant',
+                         constant_values=1)
+    # Replace the old bbox content with the new augmented content.
+    image = image * mask_tensor + augmented_bbox_content
+    return image.astype(np.uint8)
+
+
+def _concat_bbox(bbox, bboxes):
+    """Helper function that concatenates bbox to bboxes along the first
+    dimension."""
+
+    # Note if all elements in bboxes are -1 (_INVALID_BOX), then this means
+    # we discard bboxes and start the bboxes Tensor with the current bbox.
+    bboxes_sum_check = np.sum(bboxes)
+    bbox = np.expand_dims(bbox, 0)
+    # This check will be true when it is an _INVALID_BOX
+    if _equal(bboxes_sum_check, -4):
+        bboxes = bbox
+    else:
+        bboxes = np.concatenate([bboxes, bbox], 0)
+    return bboxes
+
+
+def _apply_bbox_augmentation_wrapper(image, bbox, new_bboxes, prob,
+                                     augmentation_func, func_changes_bbox,
+                                     *args):
+    """Applies _apply_bbox_augmentation with probability prob.
+ + Args: + image: 3D uint8 Tensor. + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + new_bboxes: 2D Tensor that is a list of the bboxes in the image after they + have been altered by aug_func. These will only be changed when + func_changes_bbox is set to true. Each bbox has 4 elements + (min_y, min_x, max_y, max_x) of type float that are the normalized + bbox coordinates between 0 and 1. + prob: Float that is the probability of applying _apply_bbox_augmentation. + augmentation_func: Augmentation function that will be applied to the + subsection of image. + func_changes_bbox: Boolean. Does augmentation_func return bbox in addition + to image. + *args: Additional parameters that will be passed into augmentation_func + when it is called. + + Returns: + A tuple. Fist element is a modified version of image, where the bbox + location in the image will have augmentation_func applied to it if it is + chosen to be called with probability `prob`. The second element is a + Tensor of Tensors of length 4 that will contain the altered bbox after + applying augmentation_func. + """ + should_apply_op = (np.random.rand() + prob >= 1) + if func_changes_bbox: + if should_apply_op: + augmented_image, bbox = augmentation_func(image, bbox, *args) + else: + augmented_image, bbox = (image, bbox) + else: + if should_apply_op: + augmented_image = _apply_bbox_augmentation(image, bbox, + augmentation_func, *args) + else: + augmented_image = image + new_bboxes = _concat_bbox(bbox, new_bboxes) + return augmented_image.astype(np.uint8), new_bboxes + + +def _apply_multi_bbox_augmentation(image, bboxes, prob, aug_func, + func_changes_bbox, *args): + """Applies aug_func to the image for each bbox in bboxes. + + Args: + image: 3D uint8 Tensor. + bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox + has 4 elements (min_y, min_x, max_y, max_x) of type float. + prob: Float that is the probability of applying aug_func to a specific + bounding box within the image. + aug_func: Augmentation function that will be applied to the + subsections of image indicated by the bbox values in bboxes. + func_changes_bbox: Boolean. Does augmentation_func return bbox in addition + to image. + *args: Additional parameters that will be passed into augmentation_func + when it is called. + + Returns: + A modified version of image, where each bbox location in the image will + have augmentation_func applied to it if it is chosen to be called with + probability prob independently across all bboxes. Also the final + bboxes are returned that will be unchanged if func_changes_bbox is set to + false and if true, the new altered ones will be returned. + """ + # Will keep track of the new altered bboxes after aug_func is repeatedly + # applied. The -1 values are a dummy value and this first Tensor will be + # removed upon appending the first real bbox. + new_bboxes = np.array(_INVALID_BOX) + + # If the bboxes are empty, then just give it _INVALID_BOX. The result + # will be thrown away. + bboxes = np.array((_INVALID_BOX)) if bboxes.size == 0 else bboxes + + assert bboxes.shape[1] == 4, "bboxes.shape[1] must be 4!!!!" + + # pylint:disable=g-long-lambda + # pylint:disable=line-too-long + wrapped_aug_func = lambda _image, bbox, _new_bboxes: _apply_bbox_augmentation_wrapper(_image, bbox, _new_bboxes, prob, aug_func, func_changes_bbox, *args) + # pylint:enable=g-long-lambda + # pylint:enable=line-too-long + + # Setup the while_loop. 
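+    # Note: the while_loop of the original TF implementation is emulated
+    # below with a plain Python loop over the bbox indices.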
+ num_bboxes = bboxes.shape[0] # We loop until we go over all bboxes. + idx = 0 # Counter for the while loop. + + # Conditional function when to end the loop once we go over all bboxes + # images_and_bboxes contain (_image, _new_bboxes) + def cond(_idx, _images_and_bboxes): + return _idx < num_bboxes + + # Shuffle the bboxes so that the augmentation order is not deterministic if + # we are not changing the bboxes with aug_func. + # if not func_changes_bbox: + # print(bboxes) + # loop_bboxes = np.take(bboxes,np.random.permutation(bboxes.shape[0]),axis=0) + # print(loop_bboxes) + # else: + # loop_bboxes = bboxes + # we can not shuffle the bbox because it does not contain class information here + loop_bboxes = deepcopy(bboxes) + + # Main function of while_loop where we repeatedly apply augmentation on the + # bboxes in the image. + # pylint:disable=g-long-lambda + body = lambda _idx, _images_and_bboxes: [ + _idx + 1, wrapped_aug_func(_images_and_bboxes[0], + loop_bboxes[_idx], + _images_and_bboxes[1])] + while (cond(idx, (image, new_bboxes))): + idx, (image, new_bboxes) = body(idx, (image, new_bboxes)) + + # Either return the altered bboxes or the original ones depending on if + # we altered them in anyway. + if func_changes_bbox: + final_bboxes = new_bboxes + else: + final_bboxes = bboxes + return image, final_bboxes + + +def _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob, aug_func, + func_changes_bbox, *args): + """Checks to be sure num bboxes > 0 before calling inner function.""" + num_bboxes = len(bboxes) + new_image = deepcopy(image) + new_bboxes = deepcopy(bboxes) + if num_bboxes != 0: + new_image, new_bboxes = _apply_multi_bbox_augmentation( + new_image, new_bboxes, prob, aug_func, func_changes_bbox, *args) + return new_image, new_bboxes + + +def rotate_only_bboxes(image, bboxes, prob, degrees, replace): + """Apply rotate to each bbox in the image with probability prob.""" + func_changes_bbox = False + prob = _scale_bbox_only_op_probability(prob) + return _apply_multi_bbox_augmentation_wrapper( + image, bboxes, prob, rotate, func_changes_bbox, degrees, replace) + + +def shear_x_only_bboxes(image, bboxes, prob, level, replace): + """Apply shear_x to each bbox in the image with probability prob.""" + func_changes_bbox = False + prob = _scale_bbox_only_op_probability(prob) + return _apply_multi_bbox_augmentation_wrapper( + image, bboxes, prob, shear_x, func_changes_bbox, level, replace) + + +def shear_y_only_bboxes(image, bboxes, prob, level, replace): + """Apply shear_y to each bbox in the image with probability prob.""" + func_changes_bbox = False + prob = _scale_bbox_only_op_probability(prob) + return _apply_multi_bbox_augmentation_wrapper( + image, bboxes, prob, shear_y, func_changes_bbox, level, replace) + + +def translate_x_only_bboxes(image, bboxes, prob, pixels, replace): + """Apply translate_x to each bbox in the image with probability prob.""" + func_changes_bbox = False + prob = _scale_bbox_only_op_probability(prob) + return _apply_multi_bbox_augmentation_wrapper( + image, bboxes, prob, translate_x, func_changes_bbox, pixels, replace) + + +def translate_y_only_bboxes(image, bboxes, prob, pixels, replace): + """Apply translate_y to each bbox in the image with probability prob.""" + func_changes_bbox = False + prob = _scale_bbox_only_op_probability(prob) + return _apply_multi_bbox_augmentation_wrapper( + image, bboxes, prob, translate_y, func_changes_bbox, pixels, replace) + + +def flip_only_bboxes(image, bboxes, prob): + """Apply flip_lr to each bbox in the image 
with probability prob."""
+    func_changes_bbox = False
+    prob = _scale_bbox_only_op_probability(prob)
+    return _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob,
+                                                  np.fliplr, func_changes_bbox)
+
+
+def solarize_only_bboxes(image, bboxes, prob, threshold):
+    """Apply solarize to each bbox in the image with probability prob."""
+    func_changes_bbox = False
+    prob = _scale_bbox_only_op_probability(prob)
+    return _apply_multi_bbox_augmentation_wrapper(
+        image, bboxes, prob, solarize, func_changes_bbox, threshold)
+
+
+def equalize_only_bboxes(image, bboxes, prob):
+    """Apply equalize to each bbox in the image with probability prob."""
+    func_changes_bbox = False
+    prob = _scale_bbox_only_op_probability(prob)
+    return _apply_multi_bbox_augmentation_wrapper(
+        image, bboxes, prob, equalize, func_changes_bbox)
+
+
+def cutout_only_bboxes(image, bboxes, prob, pad_size, replace):
+    """Apply cutout to each bbox in the image with probability prob."""
+    func_changes_bbox = False
+    prob = _scale_bbox_only_op_probability(prob)
+    return _apply_multi_bbox_augmentation_wrapper(
+        image, bboxes, prob, cutout, func_changes_bbox, pad_size, replace)
+
+
+def _rotate_bbox(bbox, image_height, image_width, degrees):
+    """Rotates the bbox coordinates by degrees.
+
+    Args:
+        bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
+            of type float that represents the normalized coordinates between
+            0 and 1.
+        image_height: Int, height of the image.
+        image_width: Int, width of the image.
+        degrees: Float, a scalar angle in degrees to rotate all images by.
+            If degrees is positive the image will be rotated clockwise,
+            otherwise it will be rotated counterclockwise.
+
+    Returns:
+        A tensor of the same shape as bbox, but now with the rotated
+        coordinates.
+    """
+    image_height, image_width = (float(image_height), float(image_width))
+
+    # Convert from degrees to radians.
+    degrees_to_radians = math.pi / 180.0
+    radians = degrees * degrees_to_radians
+
+    # Translate the bbox to the center of the image and turn the normalized
+    # 0-1 coordinates to absolute pixel locations.
+    # Y coordinates are made negative as the y axis of images goes down with
+    # increasing pixel values, so we negate to make sure the x axis and
+    # y axis point in the traditionally positive direction.
+    min_y = -int(image_height * (bbox[0] - 0.5))
+    min_x = int(image_width * (bbox[1] - 0.5))
+    max_y = -int(image_height * (bbox[2] - 0.5))
+    max_x = int(image_width * (bbox[3] - 0.5))
+    coordinates = np.stack([[min_y, min_x], [min_y, max_x], [max_y, min_x],
+                            [max_y, max_x]]).astype(np.float32)
+    # Rotate the coordinates according to the rotation matrix, clockwise if
+    # radians is positive, else counterclockwise.
+    rotation_matrix = np.stack([[math.cos(radians), math.sin(radians)],
+                                [-math.sin(radians), math.cos(radians)]])
+    new_coords = np.matmul(rotation_matrix,
+                           np.transpose(coordinates)).astype(np.int32)
+
+    # Find min/max values and convert them back to normalized 0-1 floats.
+    min_y = -(float(np.max(new_coords[0, :])) / image_height - 0.5)
+    min_x = float(np.min(new_coords[1, :])) / image_width + 0.5
+    max_y = -(float(np.min(new_coords[0, :])) / image_height - 0.5)
+    max_x = float(np.max(new_coords[1, :])) / image_width + 0.5
+
+    # Clip the bboxes to be sure they fall between [0, 1].
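+    # Rotation can move box corners outside the image, so the box is first
+    # clipped to [0, 1] and then, if it collapsed onto a boundary,
+    # re-expanded by _check_bbox_area to keep a positive area.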
+ min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) + min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) + return np.stack([min_y, min_x, max_y, max_x]) + + +def rotate_with_bboxes(image, bboxes, degrees, replace): + # Rotate the image. + image = rotate(image, degrees, replace) + + # Convert bbox coordinates to pixel values. + image_height, image_width = image.shape[:2] + # pylint:disable=g-long-lambda + wrapped_rotate_bbox = lambda bbox: _rotate_bbox(bbox, image_height, image_width, degrees) + # pylint:enable=g-long-lambda + new_bboxes = np.zeros_like(bboxes) + for idx in range(len(bboxes)): + new_bboxes[idx] = wrapped_rotate_bbox(bboxes[idx]) + return image, new_bboxes + + +def translate_x(image, pixels, replace): + """Equivalent of PIL Translate in X dimension.""" + image = Image.fromarray(wrap(image)) + image = image.transform(image.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0)) + return unwrap(np.array(image), replace) + + +def translate_y(image, pixels, replace): + """Equivalent of PIL Translate in Y dimension.""" + image = Image.fromarray(wrap(image)) + image = image.transform(image.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels)) + return unwrap(np.array(image), replace) + + +def _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal): + """Shifts the bbox coordinates by pixels. + + Args: + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + image_height: Int, height of the image. + image_width: Int, width of the image. + pixels: An int. How many pixels to shift the bbox. + shift_horizontal: Boolean. If true then shift in X dimension else shift in + Y dimension. + + Returns: + A tensor of the same shape as bbox, but now with the shifted coordinates. + """ + pixels = int(pixels) + # Convert bbox to integer pixel locations. + min_y = int(float(image_height) * bbox[0]) + min_x = int(float(image_width) * bbox[1]) + max_y = int(float(image_height) * bbox[2]) + max_x = int(float(image_width) * bbox[3]) + + if shift_horizontal: + min_x = np.maximum(0, min_x - pixels) + max_x = np.minimum(image_width, max_x - pixels) + else: + min_y = np.maximum(0, min_y - pixels) + max_y = np.minimum(image_height, max_y - pixels) + + # Convert bbox back to floats. + min_y = float(min_y) / float(image_height) + min_x = float(min_x) / float(image_width) + max_y = float(max_y) / float(image_height) + max_x = float(max_x) / float(image_width) + + # Clip the bboxes to be sure the fall between [0, 1]. + min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) + min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) + return np.stack([min_y, min_x, max_y, max_x]) + + +def translate_bbox(image, bboxes, pixels, replace, shift_horizontal): + """Equivalent of PIL Translate in X/Y dimension that shifts image and bbox. + + Args: + image: 3D uint8 Tensor. + bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox + has 4 elements (min_y, min_x, max_y, max_x) of type float with values + between [0, 1]. + pixels: An int. How many pixels to shift the image and bboxes + replace: A one or three value 1D tensor to fill empty pixels. + shift_horizontal: Boolean. If true then shift in X dimension else shift in + Y dimension. + + Returns: + A tuple containing a 3D uint8 Tensor that will be the result of translating + image by pixels. 
The second element of the tuple is bboxes, where now + the coordinates will be shifted to reflect the shifted image. + """ + if shift_horizontal: + image = translate_x(image, pixels, replace) + else: + image = translate_y(image, pixels, replace) + + # Convert bbox coordinates to pixel values. + image_height, image_width = image.shape[0], image.shape[1] + # pylint:disable=g-long-lambda + wrapped_shift_bbox = lambda bbox: _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal) + # pylint:enable=g-long-lambda + new_bboxes = deepcopy(bboxes) + num_bboxes = len(bboxes) + for idx in range(num_bboxes): + new_bboxes[idx] = wrapped_shift_bbox(bboxes[idx]) + return image.astype(np.uint8), new_bboxes + + +def shear_x(image, level, replace): + """Equivalent of PIL Shearing in X dimension.""" + # Shear parallel to x axis is a projective transform + # with a matrix form of: + # [1 level + # 0 1]. + image = Image.fromarray(wrap(image)) + image = image.transform(image.size, Image.AFFINE, (1, level, 0, 0, 1, 0)) + return unwrap(np.array(image), replace) + + +def shear_y(image, level, replace): + """Equivalent of PIL Shearing in Y dimension.""" + # Shear parallel to y axis is a projective transform + # with a matrix form of: + # [1 0 + # level 1]. + image = Image.fromarray(wrap(image)) + image = image.transform(image.size, Image.AFFINE, (1, 0, 0, level, 1, 0)) + return unwrap(np.array(image), replace) + + +def _shear_bbox(bbox, image_height, image_width, level, shear_horizontal): + """Shifts the bbox according to how the image was sheared. + + Args: + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + image_height: Int, height of the image. + image_width: Int, height of the image. + level: Float. How much to shear the image. + shear_horizontal: If true then shear in X dimension else shear in + the Y dimension. + + Returns: + A tensor of the same shape as bbox, but now with the shifted coordinates. + """ + image_height, image_width = (float(image_height), float(image_width)) + + # Change bbox coordinates to be pixels. + min_y = int(image_height * bbox[0]) + min_x = int(image_width * bbox[1]) + max_y = int(image_height * bbox[2]) + max_x = int(image_width * bbox[3]) + coordinates = np.stack( + [[min_y, min_x], [min_y, max_x], [max_y, min_x], [max_y, max_x]]) + coordinates = coordinates.astype(np.float32) + + # Shear the coordinates according to the translation matrix. + if shear_horizontal: + translation_matrix = np.stack([[1, 0], [-level, 1]]) + else: + translation_matrix = np.stack([[1, -level], [0, 1]]) + translation_matrix = translation_matrix.astype(np.float32) + new_coords = np.matmul(translation_matrix, + np.transpose(coordinates)).astype(np.int32) + + # Find min/max values and convert them back to floats. + min_y = float(np.min(new_coords[0, :])) / image_height + min_x = float(np.min(new_coords[1, :])) / image_width + max_y = float(np.max(new_coords[0, :])) / image_height + max_x = float(np.max(new_coords[1, :])) / image_width + + # Clip the bboxes to be sure the fall between [0, 1]. + min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) + min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) + return np.stack([min_y, min_x, max_y, max_x]) + + +def shear_with_bboxes(image, bboxes, level, replace, shear_horizontal): + """Applies Shear Transformation to the image and shifts the bboxes. + + Args: + image: 3D uint8 Tensor. 
+        bboxes: 2D Tensor that is a list of the bboxes in the image. Each
+            bbox has 4 elements (min_y, min_x, max_y, max_x) of type float
+            with values between [0, 1].
+        level: Float. How much to shear the image. This value will be
+            between -0.3 and 0.3.
+        replace: A one or three value 1D tensor to fill empty pixels.
+        shear_horizontal: Boolean. If true then shear in the X dimension,
+            else shear in the Y dimension.
+
+    Returns:
+        A tuple containing a 3D uint8 Tensor that will be the result of
+        shearing image by level. The second element of the tuple is bboxes,
+        where now the coordinates will be shifted to reflect the sheared
+        image.
+    """
+    if shear_horizontal:
+        image = shear_x(image, level, replace)
+    else:
+        image = shear_y(image, level, replace)
+
+    # Convert bbox coordinates to pixel values.
+    image_height, image_width = image.shape[:2]
+    # pylint:disable=g-long-lambda
+    wrapped_shear_bbox = lambda bbox: _shear_bbox(bbox, image_height, image_width, level, shear_horizontal)
+    # pylint:enable=g-long-lambda
+    new_bboxes = deepcopy(bboxes)
+    num_bboxes = len(bboxes)
+    for idx in range(num_bboxes):
+        new_bboxes[idx] = wrapped_shear_bbox(bboxes[idx])
+    return image.astype(np.uint8), new_bboxes
+
+
+def autocontrast(image):
+    """Implements the Autocontrast function from PIL.
+
+    Args:
+        image: A 3D uint8 tensor.
+
+    Returns:
+        The image after it has had autocontrast applied to it and will be of
+        type uint8.
+    """
+
+    def scale_channel(image):
+        """Scale the 2D image using the autocontrast rule."""
+        # A possibly cheaper version can be done using
+        # cumsum/unique_with_counts over the histogram values, rather than
+        # iterating over the entire image, to compute mins and maxes.
+        lo = float(np.min(image))
+        hi = float(np.max(image))
+
+        # Scale the image, making the lowest value 0 and the highest
+        # value 255.
+        def scale_values(im):
+            scale = 255.0 / (hi - lo)
+            offset = -lo * scale
+            im = im.astype(np.float32) * scale + offset
+            im = np.clip(im, a_min=0, a_max=255.0)
+            return im.astype(np.uint8)
+
+        result = scale_values(image) if hi > lo else image
+        return result
+
+    # Assumes RGB for now. Scales each channel independently
+    # and then stacks the result.
+    s1 = scale_channel(image[:, :, 0])
+    s2 = scale_channel(image[:, :, 1])
+    s3 = scale_channel(image[:, :, 2])
+    image = np.stack([s1, s2, s3], 2)
+    return image
+
+
+def sharpness(image, factor):
+    """Implements the Sharpness function from PIL."""
+    orig_image = image
+    image = image.astype(np.float32)
+    # SMOOTH PIL kernel, applied with a 2D convolution.
+    kernel = np.array([[1, 1, 1], [1, 5, 1], [1, 1, 1]], dtype=np.float32) / 13.
+    result = cv2.filter2D(image, -1, kernel).astype(np.uint8)
+
+    # Blend the final result.
+    return blend(result, orig_image, factor)
+
+
+def equalize(image):
+    """Implements the Equalize function from PIL."""
+
+    def scale_channel(im, c):
+        """Scale the data in the channel to implement equalize."""
+        im = im[:, :, c].astype(np.int32)
+        # Compute the histogram of the image channel.
+        histo, _ = np.histogram(im, range=[0, 255], bins=256)
+
+        # For the purposes of computing the step, filter out the zeros.
+        nonzero = np.where(np.not_equal(histo, 0))
+        nonzero_histo = np.reshape(np.take(histo, nonzero), [-1])
+        step = (np.sum(nonzero_histo) - nonzero_histo[-1]) // 255
+
+        def build_lut(histo, step):
+            # Compute the cumulative sum, shifting by step // 2
+            # and then normalizing by step.
+            lut = (np.cumsum(histo) + (step // 2)) // step
+            # Shift lut, prepending with 0.
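+            # This matches PIL's ImageOps.equalize lookup table, i.e.
+            # lut[i] = (step // 2 + cumsum(histo[:i])) // step.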
+ lut = np.concatenate([[0], lut[:-1]], 0) + # Clip the counts to be in range. This is done + # in the C code for image.point. + return np.clip(lut, a_min=0, a_max=255).astype(np.uint8) + + # If step is zero, return the original image. Otherwise, build + # lut from the full histogram and step and then index from it. + if step == 0: + result = im + else: + result = np.take(build_lut(histo, step), im) + + return result.astype(np.uint8) + + # Assumes RGB for now. Scales each channel independently + # and then stacks the result. + s1 = scale_channel(image, 0) + s2 = scale_channel(image, 1) + s3 = scale_channel(image, 2) + image = np.stack([s1, s2, s3], 2) + return image + + +def wrap(image): + """Returns 'image' with an extra channel set to all 1s.""" + shape = image.shape + extended_channel = 255 * np.ones([shape[0], shape[1], 1], image.dtype) + extended = np.concatenate([image, extended_channel], 2).astype(image.dtype) + return extended + + +def unwrap(image, replace): + """Unwraps an image produced by wrap. + + Where there is a 0 in the last channel for every spatial position, + the rest of the three channels in that spatial dimension are grayed + (set to 128). Operations like translate and shear on a wrapped + Tensor will leave 0s in empty locations. Some transformations look + at the intensity of values to do preprocessing, and we want these + empty pixels to assume the 'average' value, rather than pure black. + + + Args: + image: A 3D Image Tensor with 4 channels. + replace: A one or three value 1D tensor to fill empty pixels. + + Returns: + image: A 3D image Tensor with 3 channels. + """ + image_shape = image.shape + # Flatten the spatial dimensions. + flattened_image = np.reshape(image, [-1, image_shape[2]]) + + # Find all pixels where the last channel is zero. + alpha_channel = flattened_image[:, 3] + + replace = np.concatenate([replace, np.ones([1], image.dtype)], 0) + + # Where they are zero, fill them in with 'replace'. + alpha_channel = np.reshape(alpha_channel, (-1, 1)) + alpha_channel = np.tile(alpha_channel, reps=(1, flattened_image.shape[1])) + + flattened_image = np.where( + np.equal(alpha_channel, 0), + np.ones_like( + flattened_image, dtype=image.dtype) * replace, + flattened_image) + + image = np.reshape(flattened_image, image_shape) + image = image[:, :, :3] + return image.astype(np.uint8) + + +def _cutout_inside_bbox(image, bbox, pad_fraction): + """Generates cutout mask and the mean pixel value of the bbox. + + First a location is randomly chosen within the image as the center where the + cutout mask will be applied. Note this can be towards the boundaries of the + image, so the full cutout mask may not be applied. + + Args: + image: 3D uint8 Tensor. + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + pad_fraction: Float that specifies how large the cutout mask should be in + in reference to the size of the original bbox. If pad_fraction is 0.25, + then the cutout mask will be of shape + (0.25 * bbox height, 0.25 * bbox width). + + Returns: + A tuple. Fist element is a tensor of the same shape as image where each + element is either a 1 or 0 that is used to determine where the image + will have cutout applied. The second element is the mean of the pixels + in the image where the bbox is located. + mask value: [0,1] + """ + image_height, image_width = image.shape[0], image.shape[1] + # Transform from shape [1, 4] to [4]. 
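+    # np.squeeze is a no-op for an already-1D bbox, so both [4] and [1, 4]
+    # inputs are accepted here.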
+ bbox = np.squeeze(bbox) + + min_y = int(float(image_height) * bbox[0]) + min_x = int(float(image_width) * bbox[1]) + max_y = int(float(image_height) * bbox[2]) + max_x = int(float(image_width) * bbox[3]) + + # Calculate the mean pixel values in the bounding box, which will be used + # to fill the cutout region. + mean = np.mean(image[min_y:max_y + 1, min_x:max_x + 1], axis=(0, 1)) + # Cutout mask will be size pad_size_heigh * 2 by pad_size_width * 2 if the + # region lies entirely within the bbox. + box_height = max_y - min_y + 1 + box_width = max_x - min_x + 1 + pad_size_height = int(pad_fraction * (box_height / 2)) + pad_size_width = int(pad_fraction * (box_width / 2)) + + # Sample the center location in the image where the zero mask will be applied. + cutout_center_height = np.random.randint(min_y, max_y + 1, dtype=np.int32) + cutout_center_width = np.random.randint(min_x, max_x + 1, dtype=np.int32) + + lower_pad = np.maximum(0, cutout_center_height - pad_size_height) + upper_pad = np.maximum( + 0, image_height - cutout_center_height - pad_size_height) + left_pad = np.maximum(0, cutout_center_width - pad_size_width) + right_pad = np.maximum(0, + image_width - cutout_center_width - pad_size_width) + + cutout_shape = [ + image_height - (lower_pad + upper_pad), + image_width - (left_pad + right_pad) + ] + padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]] + + mask = np.pad(np.zeros( + cutout_shape, dtype=image.dtype), + padding_dims, + 'constant', + constant_values=1) + + mask = np.expand_dims(mask, 2) + mask = np.tile(mask, [1, 1, 3]) + return mask, mean + + +def bbox_cutout(image, bboxes, pad_fraction, replace_with_mean): + """Applies cutout to the image according to bbox information. + + This is a cutout variant that using bbox information to make more informed + decisions on where to place the cutout mask. + + Args: + image: 3D uint8 Tensor. + bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox + has 4 elements (min_y, min_x, max_y, max_x) of type float with values + between [0, 1]. + pad_fraction: Float that specifies how large the cutout mask should be in + in reference to the size of the original bbox. If pad_fraction is 0.25, + then the cutout mask will be of shape + (0.25 * bbox height, 0.25 * bbox width). + replace_with_mean: Boolean that specified what value should be filled in + where the cutout mask is applied. Since the incoming image will be of + uint8 and will not have had any mean normalization applied, by default + we set the value to be 128. If replace_with_mean is True then we find + the mean pixel values across the channel dimension and use those to fill + in where the cutout mask is applied. + + Returns: + A tuple. First element is a tensor of the same shape as image that has + cutout applied to it. Second element is the bboxes that were passed in + that will be unchanged. + """ + + def apply_bbox_cutout(image, bboxes, pad_fraction): + """Applies cutout to a single bounding box within image.""" + # Choose a single bounding box to apply cutout to. + random_index = np.random.randint(0, bboxes.shape[0], dtype=np.int32) + # Select the corresponding bbox and apply cutout. + chosen_bbox = np.take(bboxes, random_index, axis=0) + mask, mean = _cutout_inside_bbox(image, chosen_bbox, pad_fraction) + + # When applying cutout we either set the pixel value to 128 or to the mean + # value inside the bbox. + replace = mean if replace_with_mean else [128] * 3 + + # Apply the cutout mask to the image. Where the mask is 0 we fill it with + # `replace`. 
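+        # `replace` is a length-3 sequence (the per-channel bbox mean or
+        # [128] * 3), so it broadcasts over the channel axis of the image.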
+ image = np.where( + np.equal(mask, 0), + np.ones_like( + image, dtype=image.dtype) * replace, + image).astype(image.dtype) + return image + + # Check to see if there are boxes, if so then apply boxcutout. + if len(bboxes) != 0: + image = apply_bbox_cutout(image, bboxes, pad_fraction) + + return image, bboxes + + +NAME_TO_FUNC = { + 'AutoContrast': autocontrast, + 'Equalize': equalize, + 'Posterize': posterize, + 'Solarize': solarize, + 'SolarizeAdd': solarize_add, + 'Color': color, + 'Contrast': contrast, + 'Brightness': brightness, + 'Sharpness': sharpness, + 'Cutout': cutout, + 'BBox_Cutout': bbox_cutout, + 'Rotate_BBox': rotate_with_bboxes, + # pylint:disable=g-long-lambda + 'TranslateX_BBox': lambda image, bboxes, pixels, replace: translate_bbox( + image, bboxes, pixels, replace, shift_horizontal=True), + 'TranslateY_BBox': lambda image, bboxes, pixels, replace: translate_bbox( + image, bboxes, pixels, replace, shift_horizontal=False), + 'ShearX_BBox': lambda image, bboxes, level, replace: shear_with_bboxes( + image, bboxes, level, replace, shear_horizontal=True), + 'ShearY_BBox': lambda image, bboxes, level, replace: shear_with_bboxes( + image, bboxes, level, replace, shear_horizontal=False), + # pylint:enable=g-long-lambda + 'Rotate_Only_BBoxes': rotate_only_bboxes, + 'ShearX_Only_BBoxes': shear_x_only_bboxes, + 'ShearY_Only_BBoxes': shear_y_only_bboxes, + 'TranslateX_Only_BBoxes': translate_x_only_bboxes, + 'TranslateY_Only_BBoxes': translate_y_only_bboxes, + 'Flip_Only_BBoxes': flip_only_bboxes, + 'Solarize_Only_BBoxes': solarize_only_bboxes, + 'Equalize_Only_BBoxes': equalize_only_bboxes, + 'Cutout_Only_BBoxes': cutout_only_bboxes, +} + + +def _randomly_negate_tensor(tensor): + """With 50% prob turn the tensor negative.""" + should_flip = np.floor(np.random.rand() + 0.5) >= 1 + final_tensor = tensor if should_flip else -tensor + return final_tensor + + +def _rotate_level_to_arg(level): + level = (level / _MAX_LEVEL) * 30. + level = _randomly_negate_tensor(level) + return (level, ) + + +def _shrink_level_to_arg(level): + """Converts level to ratio by which we shrink the image content.""" + if level == 0: + return (1.0, ) # if level is zero, do not shrink the image + # Maximum shrinking ratio is 2.9. + level = 2. / (_MAX_LEVEL / level) + 0.9 + return (level, ) + + +def _enhance_level_to_arg(level): + return ((level / _MAX_LEVEL) * 1.8 + 0.1, ) + + +def _shear_level_to_arg(level): + level = (level / _MAX_LEVEL) * 0.3 + # Flip level to negative with 50% chance. + level = _randomly_negate_tensor(level) + return (level, ) + + +def _translate_level_to_arg(level, translate_const): + level = (level / _MAX_LEVEL) * float(translate_const) + # Flip level to negative with 50% chance. 
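+    # E.g. at the maximum level this yields a shift of +/- translate_const
+    # pixels, each sign with probability 0.5.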
+ level = _randomly_negate_tensor(level) + return (level, ) + + +def _bbox_cutout_level_to_arg(level, hparams): + cutout_pad_fraction = (level / + _MAX_LEVEL) * 0.75 # hparams.cutout_max_pad_fraction + return (cutout_pad_fraction, False) # hparams.cutout_bbox_replace_with_mean + + +def level_to_arg(hparams): + return { + 'AutoContrast': lambda level: (), + 'Equalize': lambda level: (), + 'Posterize': lambda level: (int((level / _MAX_LEVEL) * 4), ), + 'Solarize': lambda level: (int((level / _MAX_LEVEL) * 256), ), + 'SolarizeAdd': lambda level: (int((level / _MAX_LEVEL) * 110), ), + 'Color': _enhance_level_to_arg, + 'Contrast': _enhance_level_to_arg, + 'Brightness': _enhance_level_to_arg, + 'Sharpness': _enhance_level_to_arg, + 'Cutout': + lambda level: (int((level / _MAX_LEVEL) * 100), ), # hparams.cutout_const=100 + # pylint:disable=g-long-lambda + 'BBox_Cutout': lambda level: _bbox_cutout_level_to_arg(level, hparams), + 'TranslateX_BBox': + lambda level: _translate_level_to_arg(level, 250), # hparams.translate_const=250 + 'TranslateY_BBox': + lambda level: _translate_level_to_arg(level, 250), # hparams.translate_cons + # pylint:enable=g-long-lambda + 'ShearX_BBox': _shear_level_to_arg, + 'ShearY_BBox': _shear_level_to_arg, + 'Rotate_BBox': _rotate_level_to_arg, + 'Rotate_Only_BBoxes': _rotate_level_to_arg, + 'ShearX_Only_BBoxes': _shear_level_to_arg, + 'ShearY_Only_BBoxes': _shear_level_to_arg, + # pylint:disable=g-long-lambda + 'TranslateX_Only_BBoxes': + lambda level: _translate_level_to_arg(level, 120), # hparams.translate_bbox_const + 'TranslateY_Only_BBoxes': + lambda level: _translate_level_to_arg(level, 120), # hparams.translate_bbox_const + # pylint:enable=g-long-lambda + 'Flip_Only_BBoxes': lambda level: (), + 'Solarize_Only_BBoxes': + lambda level: (int((level / _MAX_LEVEL) * 256), ), + 'Equalize_Only_BBoxes': lambda level: (), + # pylint:disable=g-long-lambda + 'Cutout_Only_BBoxes': + lambda level: (int((level / _MAX_LEVEL) * 50), ), # hparams.cutout_bbox_const + # pylint:enable=g-long-lambda + } + + +def bbox_wrapper(func): + """Adds a bboxes function argument to func and returns unchanged bboxes.""" + + def wrapper(images, bboxes, *args, **kwargs): + return (func(images, *args, **kwargs), bboxes) + + return wrapper + + +def _parse_policy_info(name, prob, level, replace_value, augmentation_hparams): + """Return the function that corresponds to `name` and update `level` param.""" + func = NAME_TO_FUNC[name] + args = level_to_arg(augmentation_hparams)[name](level) + + # Check to see if prob is passed into function. This is used for operations + # where we alter bboxes independently. + # pytype:disable=wrong-arg-types + if 'prob' in inspect.getfullargspec(func)[0]: + args = tuple([prob] + list(args)) + # pytype:enable=wrong-arg-types + + # Add in replace arg if it is required for the function that is being called. + if 'replace' in inspect.getfullargspec(func)[0]: + # Make sure replace is the final argument + assert 'replace' == inspect.getfullargspec(func)[0][-1] + args = tuple(list(args) + [replace_value]) + + # Add bboxes as the second positional argument for the function if it does + # not already exist. 
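+    # Image-only ops such as Equalize or Posterize are wrapped so that every
+    # function exposes the uniform (image, bboxes, *args) signature.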
+ if 'bboxes' not in inspect.getfullargspec(func)[0]: + func = bbox_wrapper(func) + return (func, prob, args) + + +def _apply_func_with_prob(func, image, args, prob, bboxes): + """Apply `func` to image w/ `args` as input with probability `prob`.""" + assert isinstance(args, tuple) + assert 'bboxes' == inspect.getfullargspec(func)[0][1] + + # If prob is a function argument, then this randomness is being handled + # inside the function, so make sure it is always called. + if 'prob' in inspect.getfullargspec(func)[0]: + prob = 1.0 + + # Apply the function with probability `prob`. + should_apply_op = np.floor(np.random.rand() + 0.5) >= 1 + if should_apply_op: + augmented_image, augmented_bboxes = func(image, bboxes, *args) + else: + augmented_image, augmented_bboxes = (image, bboxes) + return augmented_image, augmented_bboxes + + +def select_and_apply_random_policy(policies, image, bboxes): + """Select a random policy from `policies` and apply it to `image`.""" + policy_to_select = np.random.randint(0, len(policies), dtype=np.int32) + # policy_to_select = 6 # for test + for (i, policy) in enumerate(policies): + if i == policy_to_select: + image, bboxes = policy(image, bboxes) + return (image, bboxes) + + +def build_and_apply_nas_policy(policies, image, bboxes, augmentation_hparams): + """Build a policy from the given policies passed in and apply to image. + + Args: + policies: list of lists of tuples in the form `(func, prob, level)`, `func` + is a string name of the augmentation function, `prob` is the probability + of applying the `func` operation, `level` is the input argument for + `func`. + image: numpy array that the resulting policy will be applied to. + bboxes: + augmentation_hparams: Hparams associated with the NAS learned policy. + + Returns: + A version of image that now has data augmentation applied to it based on + the `policies` pass into the function. Additionally, returns bboxes if + a value for them is passed in that is not None + """ + replace_value = [128, 128, 128] + + # func is the string name of the augmentation function, prob is the + # probability of applying the operation and level is the parameter associated + + # tf_policies are functions that take in an image and return an augmented + # image. + tf_policies = [] + for policy in policies: + tf_policy = [] + # Link string name to the correct python function and make sure the correct + # argument is passed into that function. + for policy_info in policy: + policy_info = list( + policy_info) + [replace_value, augmentation_hparams] + + tf_policy.append(_parse_policy_info(*policy_info)) + # Now build the tf policy that will apply the augmentation procedue + # on image. + def make_final_policy(tf_policy_): + def final_policy(image_, bboxes_): + for func, prob, args in tf_policy_: + image_, bboxes_ = _apply_func_with_prob(func, image_, args, + prob, bboxes_) + return image_, bboxes_ + + return final_policy + + tf_policies.append(make_final_policy(tf_policy)) + + augmented_images, augmented_bboxes = select_and_apply_random_policy( + tf_policies, image, bboxes) + # If no bounding boxes were specified, then just return the images. + return (augmented_images, augmented_bboxes) + + +# TODO(barretzoph): Add in ArXiv link once paper is out. +def distort_image_with_autoaugment(image, bboxes, augmentation_name): + """Applies the AutoAugment policy to `image` and `bboxes`. + + Args: + image: `Tensor` of shape [height, width, 3] representing an image. 
+ bboxes: `Tensor` of shape [N, 4] representing ground truth boxes that are + normalized between [0, 1]. + augmentation_name: The name of the AutoAugment policy to use. The available + options are `v0`, `v1`, `v2`, `v3` and `test`. `v0` is the policy used for + all of the results in the paper and was found to achieve the best results + on the COCO dataset. `v1`, `v2` and `v3` are additional good policies + found on the COCO dataset that have slight variation in what operations + were used during the search procedure along with how many operations are + applied in parallel to a single image (2 vs 3). + + Returns: + A tuple containing the augmented versions of `image` and `bboxes`. + """ + available_policies = { + 'v0': policy_v0, + 'v1': policy_v1, + 'v2': policy_v2, + 'v3': policy_v3, + 'test': policy_vtest + } + if augmentation_name not in available_policies: + raise ValueError('Invalid augmentation_name: {}'.format( + augmentation_name)) + + policy = available_policies[augmentation_name]() + augmentation_hparams = {} + return build_and_apply_nas_policy(policy, image, bboxes, + augmentation_hparams) diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/batch_operators.py b/PaddleDetection-release-2.6/ppdet/data/transform/batch_operators.py new file mode 100644 index 0000000000000000000000000000000000000000..2637db43d217e5b9bcbc7900f396f03bf4f5319e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/batch_operators.py @@ -0,0 +1,1484 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import typing + +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +import cv2 +import copy +import math +import numpy as np +from .operators import register_op, BaseOperator, Resize +from .op_helper import jaccard_overlap, gaussian2D, gaussian_radius, draw_umich_gaussian +from .atss_assigner import ATSSAssigner +from scipy import ndimage + +from ppdet.modeling import bbox_utils +from ppdet.utils.logger import setup_logger +from ppdet.modeling.keypoint_utils import get_affine_transform, affine_transform +logger = setup_logger(__name__) + +__all__ = [ + 'PadBatch', + 'BatchRandomResize', + 'Gt2YoloTarget', + 'Gt2FCOSTarget', + 'Gt2TTFTarget', + 'Gt2Solov2Target', + 'Gt2SparseTarget', + 'PadMaskBatch', + 'Gt2GFLTarget', + 'Gt2CenterNetTarget', + 'Gt2CenterTrackTarget', + 'PadGT', + 'PadRGT', +] + + +@register_op +class PadBatch(BaseOperator): + """ + Pad a batch of samples so they can be divisible by a stride. + The layout of each image should be 'CHW'. + Args: + pad_to_stride (int): If `pad_to_stride > 0`, pad zeros to ensure + height and width is divisible by `pad_to_stride`. 
+ """ + + def __init__(self, pad_to_stride=0): + super(PadBatch, self).__init__() + self.pad_to_stride = pad_to_stride + + def __call__(self, samples, context=None): + """ + Args: + samples (list): a batch of sample, each is dict. + """ + coarsest_stride = self.pad_to_stride + + # multi scale input is nested list + if isinstance(samples, + typing.Sequence) and len(samples) > 0 and isinstance( + samples[0], typing.Sequence): + inner_samples = samples[0] + else: + inner_samples = samples + + max_shape = np.array( + [data['image'].shape for data in inner_samples]).max(axis=0) + if coarsest_stride > 0: + max_shape[1] = int( + np.ceil(max_shape[1] / coarsest_stride) * coarsest_stride) + max_shape[2] = int( + np.ceil(max_shape[2] / coarsest_stride) * coarsest_stride) + + for data in inner_samples: + im = data['image'] + im_c, im_h, im_w = im.shape[:] + padding_im = np.zeros( + (im_c, max_shape[1], max_shape[2]), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + data['image'] = padding_im + if 'semantic' in data and data['semantic'] is not None: + semantic = data['semantic'] + padding_sem = np.zeros( + (1, max_shape[1], max_shape[2]), dtype=np.float32) + padding_sem[:, :im_h, :im_w] = semantic + data['semantic'] = padding_sem + if 'gt_segm' in data and data['gt_segm'] is not None: + gt_segm = data['gt_segm'] + padding_segm = np.zeros( + (gt_segm.shape[0], max_shape[1], max_shape[2]), + dtype=np.uint8) + padding_segm[:, :im_h, :im_w] = gt_segm + data['gt_segm'] = padding_segm + + return samples + + +@register_op +class BatchRandomResize(BaseOperator): + """ + Resize image to target size randomly. random target_size and interpolation method + Args: + target_size (int, list, tuple): image target size, if random size is True, must be list or tuple + keep_ratio (bool): whether keep_raio or not, default true + interp (int): the interpolation method + random_size (bool): whether random select target size of image + random_interp (bool): whether random select interpolation method + """ + + def __init__(self, + target_size, + keep_ratio, + interp=cv2.INTER_NEAREST, + random_size=True, + random_interp=False): + super(BatchRandomResize, self).__init__() + self.keep_ratio = keep_ratio + self.interps = [ + cv2.INTER_NEAREST, + cv2.INTER_LINEAR, + cv2.INTER_AREA, + cv2.INTER_CUBIC, + cv2.INTER_LANCZOS4, + ] + self.interp = interp + assert isinstance(target_size, ( + int, Sequence)), "target_size must be int, list or tuple" + if random_size and not isinstance(target_size, list): + raise TypeError( + "Type of target_size is invalid when random_size is True. Must be List, now is {}". 
+                format(type(target_size)))
+        self.target_size = target_size
+        self.random_size = random_size
+        self.random_interp = random_interp
+
+    def __call__(self, samples, context=None):
+        if self.random_size:
+            index = np.random.choice(len(self.target_size))
+            target_size = self.target_size[index]
+        else:
+            target_size = self.target_size
+
+        if self.random_interp:
+            interp = np.random.choice(self.interps)
+        else:
+            interp = self.interp
+
+        resizer = Resize(target_size, keep_ratio=self.keep_ratio, interp=interp)
+        return resizer(samples, context=context)
+
+
+@register_op
+class Gt2YoloTarget(BaseOperator):
+    __shared__ = ['num_classes']
+    """
+    Generate YOLOv3 targets from ground truth data; this operator is only
+    used in the fine-grained YOLOv3 loss mode
+    """
+
+    def __init__(self,
+                 anchors,
+                 anchor_masks,
+                 downsample_ratios,
+                 num_classes=80,
+                 iou_thresh=1.):
+        super(Gt2YoloTarget, self).__init__()
+        self.anchors = anchors
+        self.anchor_masks = anchor_masks
+        self.downsample_ratios = downsample_ratios
+        self.num_classes = num_classes
+        self.iou_thresh = iou_thresh
+
+    def __call__(self, samples, context=None):
+        assert len(self.anchor_masks) == len(self.downsample_ratios), \
+            "'anchor_masks' and 'downsample_ratios' should have same length."
+
+        h, w = samples[0]['image'].shape[1:3]
+        an_hw = np.array(self.anchors) / np.array([[w, h]])
+        for sample in samples:
+            gt_bbox = sample['gt_bbox']
+            gt_class = sample['gt_class']
+            if 'gt_score' not in sample:
+                sample['gt_score'] = np.ones(
+                    (gt_bbox.shape[0], 1), dtype=np.float32)
+            gt_score = sample['gt_score']
+            for i, (
+                    mask, downsample_ratio
+            ) in enumerate(zip(self.anchor_masks, self.downsample_ratios)):
+                grid_h = int(h / downsample_ratio)
+                grid_w = int(w / downsample_ratio)
+                target = np.zeros(
+                    (len(mask), 6 + self.num_classes, grid_h, grid_w),
+                    dtype=np.float32)
+                for b in range(gt_bbox.shape[0]):
+                    gx, gy, gw, gh = gt_bbox[b, :]
+                    cls = gt_class[b]
+                    score = gt_score[b]
+                    if gw <= 0. or gh <= 0. or score <= 0.:
+                        continue
+
+                    # find the best matching anchor index
+                    best_iou = 0.
+                    best_idx = -1
+                    for an_idx in range(an_hw.shape[0]):
+                        iou = jaccard_overlap(
+                            [0., 0., gw, gh],
+                            [0., 0., an_hw[an_idx, 0], an_hw[an_idx, 1]])
+                        if iou > best_iou:
+                            best_iou = iou
+                            best_idx = an_idx
+
+                    gi = int(gx * grid_w)
+                    gj = int(gy * grid_h)
+
+                    # the gt box should be regressed in this layer if the
+                    # best matching anchor index is in this layer's anchor
+                    # mask
+                    if best_idx in mask:
+                        best_n = mask.index(best_idx)
+
+                        # x, y, w, h, scale
+                        target[best_n, 0, gj, gi] = gx * grid_w - gi
+                        target[best_n, 1, gj, gi] = gy * grid_h - gj
+                        target[best_n, 2, gj, gi] = np.log(
+                            gw * w / self.anchors[best_idx][0])
+                        target[best_n, 3, gj, gi] = np.log(
+                            gh * h / self.anchors[best_idx][1])
+                        target[best_n, 4, gj, gi] = 2.0 - gw * gh
+
+                        # objectness records gt_score
+                        target[best_n, 5, gj, gi] = score
+
+                        # classification
+                        target[best_n, 6 + cls, gj, gi] = 1.
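+                        # Along axis 1 each matched anchor cell stores
+                        # [tx, ty, tw, th, scale_weight, objectness,
+                        #  one-hot class scores].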
+ + # For non-matched anchors, calculate the target if the iou + # between anchor and gt is larger than iou_thresh + if self.iou_thresh < 1: + for idx, mask_i in enumerate(mask): + if mask_i == best_idx: continue + iou = jaccard_overlap( + [0., 0., gw, gh], + [0., 0., an_hw[mask_i, 0], an_hw[mask_i, 1]]) + if iou > self.iou_thresh and target[idx, 5, gj, + gi] == 0.: + # x, y, w, h, scale + target[idx, 0, gj, gi] = gx * grid_w - gi + target[idx, 1, gj, gi] = gy * grid_h - gj + target[idx, 2, gj, gi] = np.log( + gw * w / self.anchors[mask_i][0]) + target[idx, 3, gj, gi] = np.log( + gh * h / self.anchors[mask_i][1]) + target[idx, 4, gj, gi] = 2.0 - gw * gh + + # objectness record gt_score + target[idx, 5, gj, gi] = score + + # classification + target[idx, 6 + cls, gj, gi] = 1. + sample['target{}'.format(i)] = target + + # remove useless gt_class and gt_score after target calculated + sample.pop('gt_class') + sample.pop('gt_score') + + return samples + + +@register_op +class Gt2FCOSTarget(BaseOperator): + """ + Generate FCOS targets by groud truth data + """ + + def __init__(self, + object_sizes_boundary, + center_sampling_radius, + downsample_ratios, + num_shift=0.5, + multiply_strides_reg_targets=False, + norm_reg_targets=True): + super(Gt2FCOSTarget, self).__init__() + self.center_sampling_radius = center_sampling_radius + self.downsample_ratios = downsample_ratios + self.INF = np.inf + self.object_sizes_boundary = [-1] + object_sizes_boundary + [self.INF] + object_sizes_of_interest = [] + for i in range(len(self.object_sizes_boundary) - 1): + object_sizes_of_interest.append([ + self.object_sizes_boundary[i], self.object_sizes_boundary[i + 1] + ]) + self.object_sizes_of_interest = object_sizes_of_interest + self.num_shift = num_shift + self.multiply_strides_reg_targets = multiply_strides_reg_targets + self.norm_reg_targets = norm_reg_targets + + def _compute_points(self, w, h): + """ + compute the corresponding points in each feature map + :param h: image height + :param w: image width + :return: points from all feature map + """ + locations = [] + for stride in self.downsample_ratios: + shift_x = np.arange(0, w, stride).astype(np.float32) + shift_y = np.arange(0, h, stride).astype(np.float32) + shift_x, shift_y = np.meshgrid(shift_x, shift_y) + shift_x = shift_x.flatten() + shift_y = shift_y.flatten() + location = np.stack( + [shift_x, shift_y], axis=1) + stride * self.num_shift + locations.append(location) + num_points_each_level = [len(location) for location in locations] + locations = np.concatenate(locations, axis=0) + return locations, num_points_each_level + + def _convert_xywh2xyxy(self, gt_bbox, w, h): + """ + convert the bounding box from style xywh to xyxy + :param gt_bbox: bounding boxes normalized into [0, 1] + :param w: image width + :param h: image height + :return: bounding boxes in xyxy style + """ + bboxes = gt_bbox.copy() + bboxes[:, [0, 2]] = bboxes[:, [0, 2]] * w + bboxes[:, [1, 3]] = bboxes[:, [1, 3]] * h + bboxes[:, 2] = bboxes[:, 0] + bboxes[:, 2] + bboxes[:, 3] = bboxes[:, 1] + bboxes[:, 3] + return bboxes + + def _check_inside_boxes_limited(self, gt_bbox, xs, ys, + num_points_each_level): + """ + check if points is within the clipped boxes + :param gt_bbox: bounding boxes + :param xs: horizontal coordinate of points + :param ys: vertical coordinate of points + :return: the mask of points is within gt_box or not + """ + bboxes = np.reshape( + gt_bbox, newshape=[1, gt_bbox.shape[0], gt_bbox.shape[1]]) + bboxes = np.tile(bboxes, reps=[xs.shape[0], 1, 1]) + ct_x = 
(bboxes[:, :, 0] + bboxes[:, :, 2]) / 2 + ct_y = (bboxes[:, :, 1] + bboxes[:, :, 3]) / 2 + beg = 0 + clipped_box = bboxes.copy() + for lvl, stride in enumerate(self.downsample_ratios): + end = beg + num_points_each_level[lvl] + stride_exp = self.center_sampling_radius * stride + clipped_box[beg:end, :, 0] = np.maximum( + bboxes[beg:end, :, 0], ct_x[beg:end, :] - stride_exp) + clipped_box[beg:end, :, 1] = np.maximum( + bboxes[beg:end, :, 1], ct_y[beg:end, :] - stride_exp) + clipped_box[beg:end, :, 2] = np.minimum( + bboxes[beg:end, :, 2], ct_x[beg:end, :] + stride_exp) + clipped_box[beg:end, :, 3] = np.minimum( + bboxes[beg:end, :, 3], ct_y[beg:end, :] + stride_exp) + beg = end + l_res = xs - clipped_box[:, :, 0] + r_res = clipped_box[:, :, 2] - xs + t_res = ys - clipped_box[:, :, 1] + b_res = clipped_box[:, :, 3] - ys + clipped_box_reg_targets = np.stack([l_res, t_res, r_res, b_res], axis=2) + inside_gt_box = np.min(clipped_box_reg_targets, axis=2) > 0 + return inside_gt_box + + def __call__(self, samples, context=None): + assert len(self.object_sizes_of_interest) == len(self.downsample_ratios), \ + "object_sizes_of_interest', and 'downsample_ratios' should have same length." + + for sample in samples: + im = sample['image'] + bboxes = sample['gt_bbox'] + gt_class = sample['gt_class'] + # calculate the locations + h, w = im.shape[1:3] + points, num_points_each_level = self._compute_points(w, h) + object_scale_exp = [] + for i, num_pts in enumerate(num_points_each_level): + object_scale_exp.append( + np.tile( + np.array([self.object_sizes_of_interest[i]]), + reps=[num_pts, 1])) + object_scale_exp = np.concatenate(object_scale_exp, axis=0) + + gt_area = (bboxes[:, 2] - bboxes[:, 0]) * ( + bboxes[:, 3] - bboxes[:, 1]) + xs, ys = points[:, 0], points[:, 1] + xs = np.reshape(xs, newshape=[xs.shape[0], 1]) + xs = np.tile(xs, reps=[1, bboxes.shape[0]]) + ys = np.reshape(ys, newshape=[ys.shape[0], 1]) + ys = np.tile(ys, reps=[1, bboxes.shape[0]]) + + l_res = xs - bboxes[:, 0] + r_res = bboxes[:, 2] - xs + t_res = ys - bboxes[:, 1] + b_res = bboxes[:, 3] - ys + reg_targets = np.stack([l_res, t_res, r_res, b_res], axis=2) + if self.center_sampling_radius > 0: + is_inside_box = self._check_inside_boxes_limited( + bboxes, xs, ys, num_points_each_level) + else: + is_inside_box = np.min(reg_targets, axis=2) > 0 + # check if the targets is inside the corresponding level + max_reg_targets = np.max(reg_targets, axis=2) + lower_bound = np.tile( + np.expand_dims( + object_scale_exp[:, 0], axis=1), + reps=[1, max_reg_targets.shape[1]]) + high_bound = np.tile( + np.expand_dims( + object_scale_exp[:, 1], axis=1), + reps=[1, max_reg_targets.shape[1]]) + is_match_current_level = \ + (max_reg_targets > lower_bound) & \ + (max_reg_targets < high_bound) + points2gtarea = np.tile( + np.expand_dims( + gt_area, axis=0), reps=[xs.shape[0], 1]) + points2gtarea[is_inside_box == 0] = self.INF + points2gtarea[is_match_current_level == 0] = self.INF + points2min_area = points2gtarea.min(axis=1) + points2min_area_ind = points2gtarea.argmin(axis=1) + labels = gt_class[points2min_area_ind] + 1 + labels[points2min_area == self.INF] = 0 + reg_targets = reg_targets[range(xs.shape[0]), points2min_area_ind] + ctn_targets = np.sqrt((reg_targets[:, [0, 2]].min(axis=1) / \ + reg_targets[:, [0, 2]].max(axis=1)) * \ + (reg_targets[:, [1, 3]].min(axis=1) / \ + reg_targets[:, [1, 3]].max(axis=1))).astype(np.float32) + ctn_targets = np.reshape( + ctn_targets, newshape=[ctn_targets.shape[0], 1]) + ctn_targets[labels <= 0] = 0 + pos_ind = 
np.nonzero(labels != 0) + reg_targets_pos = reg_targets[pos_ind[0], :] + split_sections = [] + beg = 0 + for lvl in range(len(num_points_each_level)): + end = beg + num_points_each_level[lvl] + split_sections.append(end) + beg = end + labels_by_level = np.split(labels, split_sections, axis=0) + reg_targets_by_level = np.split(reg_targets, split_sections, axis=0) + ctn_targets_by_level = np.split(ctn_targets, split_sections, axis=0) + for lvl in range(len(self.downsample_ratios)): + grid_w = int(np.ceil(w / self.downsample_ratios[lvl])) + grid_h = int(np.ceil(h / self.downsample_ratios[lvl])) + if self.norm_reg_targets: + if self.multiply_strides_reg_targets: + sample['reg_target{}'.format(lvl)] = np.reshape( + reg_targets_by_level[lvl], + newshape=[grid_h, grid_w, 4]) + else: + sample['reg_target{}'.format(lvl)] = \ + np.reshape( + reg_targets_by_level[lvl] / \ + self.downsample_ratios[lvl], + newshape=[grid_h, grid_w, 4]) + else: + sample['reg_target{}'.format(lvl)] = np.reshape( + reg_targets_by_level[lvl], + newshape=[grid_h, grid_w, 4]) + sample['labels{}'.format(lvl)] = np.reshape( + labels_by_level[lvl], newshape=[grid_h, grid_w, 1]) + sample['centerness{}'.format(lvl)] = np.reshape( + ctn_targets_by_level[lvl], newshape=[grid_h, grid_w, 1]) + + sample.pop('is_crowd', None) + sample.pop('difficult', None) + sample.pop('gt_class', None) + sample.pop('gt_bbox', None) + return samples + + +@register_op +class Gt2GFLTarget(BaseOperator): + __shared__ = ['num_classes'] + """ + Generate GFocal loss targets by groud truth data + """ + + def __init__(self, + num_classes=80, + downsample_ratios=[8, 16, 32, 64, 128], + grid_cell_scale=4, + cell_offset=0, + compute_vlr_region=False): + super(Gt2GFLTarget, self).__init__() + self.num_classes = num_classes + self.downsample_ratios = downsample_ratios + self.grid_cell_scale = grid_cell_scale + self.cell_offset = cell_offset + self.compute_vlr_region = compute_vlr_region + + self.assigner = ATSSAssigner() + + def get_grid_cells(self, featmap_size, scale, stride, offset=0): + """ + Generate grid cells of a feature map for target assignment. + Args: + featmap_size: Size of a single level feature map. + scale: Grid cell scale. + stride: Down sample stride of the feature map. + offset: Offset of grid cells. + return: + Grid_cells xyxy position. 
Size should be [feat_w * feat_h, 4] + """ + cell_size = stride * scale + h, w = featmap_size + x_range = (np.arange(w, dtype=np.float32) + offset) * stride + y_range = (np.arange(h, dtype=np.float32) + offset) * stride + x, y = np.meshgrid(x_range, y_range) + y = y.flatten() + x = x.flatten() + grid_cells = np.stack( + [ + x - 0.5 * cell_size, y - 0.5 * cell_size, x + 0.5 * cell_size, + y + 0.5 * cell_size + ], + axis=-1) + return grid_cells + + def get_sample(self, assign_gt_inds, gt_bboxes): + pos_inds = np.unique(np.nonzero(assign_gt_inds > 0)[0]) + neg_inds = np.unique(np.nonzero(assign_gt_inds == 0)[0]) + pos_assigned_gt_inds = assign_gt_inds[pos_inds] - 1 + + if gt_bboxes.size == 0: + # hack for index error case + assert pos_assigned_gt_inds.size == 0 + pos_gt_bboxes = np.empty_like(gt_bboxes).reshape(-1, 4) + else: + if len(gt_bboxes.shape) < 2: + gt_bboxes = gt_bboxes.resize(-1, 4) + pos_gt_bboxes = gt_bboxes[pos_assigned_gt_inds, :] + return pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds + + def __call__(self, samples, context=None): + assert len(samples) > 0 + batch_size = len(samples) + # get grid cells of image + h, w = samples[0]['image'].shape[1:3] + multi_level_grid_cells = [] + for stride in self.downsample_ratios: + featmap_size = (int(math.ceil(h / stride)), + int(math.ceil(w / stride))) + multi_level_grid_cells.append( + self.get_grid_cells(featmap_size, self.grid_cell_scale, stride, + self.cell_offset)) + mlvl_grid_cells_list = [ + multi_level_grid_cells for i in range(batch_size) + ] + # pixel cell number of multi-level feature maps + num_level_cells = [ + grid_cells.shape[0] for grid_cells in mlvl_grid_cells_list[0] + ] + num_level_cells_list = [num_level_cells] * batch_size + # concat all level cells and to a single array + for i in range(batch_size): + mlvl_grid_cells_list[i] = np.concatenate(mlvl_grid_cells_list[i]) + # target assign on all images + for sample, grid_cells, num_level_cells in zip( + samples, mlvl_grid_cells_list, num_level_cells_list): + gt_bboxes = sample['gt_bbox'] + gt_labels = sample['gt_class'].squeeze() + if gt_labels.size == 1: + gt_labels = np.array([gt_labels]).astype(np.int32) + gt_bboxes_ignore = None + assign_gt_inds, _ = self.assigner(grid_cells, num_level_cells, + gt_bboxes, gt_bboxes_ignore, + gt_labels) + + if self.compute_vlr_region: + vlr_region = self.assigner.get_vlr_region( + grid_cells, num_level_cells, gt_bboxes, gt_bboxes_ignore, + gt_labels) + sample['vlr_regions'] = vlr_region + + pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds = self.get_sample( + assign_gt_inds, gt_bboxes) + + num_cells = grid_cells.shape[0] + bbox_targets = np.zeros_like(grid_cells) + bbox_weights = np.zeros_like(grid_cells) + labels = np.ones([num_cells], dtype=np.int64) * self.num_classes + label_weights = np.zeros([num_cells], dtype=np.float32) + + if len(pos_inds) > 0: + pos_bbox_targets = pos_gt_bboxes + bbox_targets[pos_inds, :] = pos_bbox_targets + bbox_weights[pos_inds, :] = 1.0 + if not np.any(gt_labels): + labels[pos_inds] = 0 + else: + labels[pos_inds] = gt_labels[pos_assigned_gt_inds] + + label_weights[pos_inds] = 1.0 + if len(neg_inds) > 0: + label_weights[neg_inds] = 1.0 + sample['grid_cells'] = grid_cells + sample['labels'] = labels + sample['label_weights'] = label_weights + sample['bbox_targets'] = bbox_targets + sample['pos_num'] = max(pos_inds.size, 1) + sample.pop('is_crowd', None) + sample.pop('difficult', None) + sample.pop('gt_class', None) + sample.pop('gt_bbox', None) + sample.pop('gt_score', None) + return 
samples + + +@register_op +class Gt2TTFTarget(BaseOperator): + __shared__ = ['num_classes'] + """ + Gt2TTFTarget + Generate TTFNet targets by ground truth data + + Args: + num_classes(int): the number of classes. + down_ratio(int): the down ratio from images to heatmap, 4 by default. + alpha(float): the alpha parameter to generate gaussian target. + 0.54 by default. + """ + + def __init__(self, num_classes=80, down_ratio=4, alpha=0.54): + super(Gt2TTFTarget, self).__init__() + self.down_ratio = down_ratio + self.num_classes = num_classes + self.alpha = alpha + + def __call__(self, samples, context=None): + output_size = samples[0]['image'].shape[1] + feat_size = output_size // self.down_ratio + for sample in samples: + heatmap = np.zeros( + (self.num_classes, feat_size, feat_size), dtype='float32') + box_target = np.ones( + (4, feat_size, feat_size), dtype='float32') * -1 + reg_weight = np.zeros((1, feat_size, feat_size), dtype='float32') + + gt_bbox = sample['gt_bbox'] + gt_class = sample['gt_class'] + + bbox_w = gt_bbox[:, 2] - gt_bbox[:, 0] + 1 + bbox_h = gt_bbox[:, 3] - gt_bbox[:, 1] + 1 + area = bbox_w * bbox_h + boxes_areas_log = np.log(area) + boxes_ind = np.argsort(boxes_areas_log, axis=0)[::-1] + boxes_area_topk_log = boxes_areas_log[boxes_ind] + gt_bbox = gt_bbox[boxes_ind] + gt_class = gt_class[boxes_ind] + + feat_gt_bbox = gt_bbox / self.down_ratio + feat_gt_bbox = np.clip(feat_gt_bbox, 0, feat_size - 1) + feat_hs, feat_ws = (feat_gt_bbox[:, 3] - feat_gt_bbox[:, 1], + feat_gt_bbox[:, 2] - feat_gt_bbox[:, 0]) + + ct_inds = np.stack( + [(gt_bbox[:, 0] + gt_bbox[:, 2]) / 2, + (gt_bbox[:, 1] + gt_bbox[:, 3]) / 2], + axis=1) / self.down_ratio + + h_radiuses_alpha = (feat_hs / 2. * self.alpha).astype('int32') + w_radiuses_alpha = (feat_ws / 2. * self.alpha).astype('int32') + + for k in range(len(gt_bbox)): + cls_id = gt_class[k] + fake_heatmap = np.zeros((feat_size, feat_size), dtype='float32') + self.draw_truncate_gaussian(fake_heatmap, ct_inds[k], + h_radiuses_alpha[k], + w_radiuses_alpha[k]) + + heatmap[cls_id] = np.maximum(heatmap[cls_id], fake_heatmap) + box_target_inds = fake_heatmap > 0 + box_target[:, box_target_inds] = gt_bbox[k][:, None] + + local_heatmap = fake_heatmap[box_target_inds] + ct_div = np.sum(local_heatmap) + local_heatmap *= boxes_area_topk_log[k] + reg_weight[0, box_target_inds] = local_heatmap / ct_div + sample['ttf_heatmap'] = heatmap + sample['ttf_box_target'] = box_target + sample['ttf_reg_weight'] = reg_weight + sample.pop('is_crowd', None) + sample.pop('difficult', None) + sample.pop('gt_class', None) + sample.pop('gt_bbox', None) + sample.pop('gt_score', None) + return samples + + def draw_truncate_gaussian(self, heatmap, center, h_radius, w_radius): + h, w = 2 * h_radius + 1, 2 * w_radius + 1 + sigma_x = w / 6 + sigma_y = h / 6 + gaussian = gaussian2D((h, w), sigma_x, sigma_y) + + x, y = int(center[0]), int(center[1]) + + height, width = heatmap.shape[0:2] + + left, right = min(x, w_radius), min(width - x, w_radius + 1) + top, bottom = min(y, h_radius), min(height - y, h_radius + 1) + + masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] + masked_gaussian = gaussian[h_radius - top:h_radius + bottom, w_radius - + left:w_radius + right] + if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: + heatmap[y - top:y + bottom, x - left:x + right] = np.maximum( + masked_heatmap, masked_gaussian) + return heatmap + + +@register_op +class Gt2Solov2Target(BaseOperator): + """Assign mask target and labels in SOLOv2 network. 
+ The code of this function is based on: + https://github.com/WXinlong/SOLO/blob/master/mmdet/models/anchor_heads/solov2_head.py#L271 + Args: + num_grids (list): The list of feature map grids size. + scale_ranges (list): The list of mask boundary range. + coord_sigma (float): The coefficient of coordinate area length. + sampling_ratio (float): The ratio of down sampling. + """ + + def __init__(self, + num_grids=[40, 36, 24, 16, 12], + scale_ranges=[[1, 96], [48, 192], [96, 384], [192, 768], + [384, 2048]], + coord_sigma=0.2, + sampling_ratio=4.0): + super(Gt2Solov2Target, self).__init__() + self.num_grids = num_grids + self.scale_ranges = scale_ranges + self.coord_sigma = coord_sigma + self.sampling_ratio = sampling_ratio + + def _scale_size(self, im, scale): + h, w = im.shape[:2] + new_size = (int(w * float(scale) + 0.5), int(h * float(scale) + 0.5)) + resized_img = cv2.resize( + im, None, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR) + return resized_img + + def __call__(self, samples, context=None): + sample_id = 0 + max_ins_num = [0] * len(self.num_grids) + for sample in samples: + gt_bboxes_raw = sample['gt_bbox'] + gt_labels_raw = sample['gt_class'] + 1 + im_c, im_h, im_w = sample['image'].shape[:] + gt_masks_raw = sample['gt_segm'].astype(np.uint8) + mask_feat_size = [ + int(im_h / self.sampling_ratio), int(im_w / self.sampling_ratio) + ] + gt_areas = np.sqrt((gt_bboxes_raw[:, 2] - gt_bboxes_raw[:, 0]) * + (gt_bboxes_raw[:, 3] - gt_bboxes_raw[:, 1])) + ins_ind_label_list = [] + idx = 0 + for (lower_bound, upper_bound), num_grid \ + in zip(self.scale_ranges, self.num_grids): + + hit_indices = ((gt_areas >= lower_bound) & + (gt_areas <= upper_bound)).nonzero()[0] + num_ins = len(hit_indices) + + ins_label = [] + grid_order = [] + cate_label = np.zeros([num_grid, num_grid], dtype=np.int64) + ins_ind_label = np.zeros([num_grid**2], dtype=np.bool_) + + if num_ins == 0: + ins_label = np.zeros( + [1, mask_feat_size[0], mask_feat_size[1]], + dtype=np.uint8) + ins_ind_label_list.append(ins_ind_label) + sample['cate_label{}'.format(idx)] = cate_label.flatten() + sample['ins_label{}'.format(idx)] = ins_label + sample['grid_order{}'.format(idx)] = np.asarray( + [sample_id * num_grid * num_grid + 0], dtype=np.int32) + idx += 1 + continue + gt_bboxes = gt_bboxes_raw[hit_indices] + gt_labels = gt_labels_raw[hit_indices] + gt_masks = gt_masks_raw[hit_indices, ...] + + half_ws = 0.5 * ( + gt_bboxes[:, 2] - gt_bboxes[:, 0]) * self.coord_sigma + half_hs = 0.5 * ( + gt_bboxes[:, 3] - gt_bboxes[:, 1]) * self.coord_sigma + + for seg_mask, gt_label, half_h, half_w in zip( + gt_masks, gt_labels, half_hs, half_ws): + if seg_mask.sum() == 0: + continue + # mass center + upsampled_size = (mask_feat_size[0] * 4, + mask_feat_size[1] * 4) + center_h, center_w = ndimage.measurements.center_of_mass( + seg_mask) + coord_w = int( + (center_w / upsampled_size[1]) // (1. / num_grid)) + coord_h = int( + (center_h / upsampled_size[0]) // (1. / num_grid)) + + # left, top, right, down + top_box = max(0, + int(((center_h - half_h) / upsampled_size[0]) + // (1. / num_grid))) + down_box = min(num_grid - 1, + int(((center_h + half_h) / upsampled_size[0]) + // (1. / num_grid))) + left_box = max(0, + int(((center_w - half_w) / upsampled_size[1]) + // (1. / num_grid))) + right_box = min(num_grid - 1, + int(((center_w + half_w) / + upsampled_size[1]) // (1. 
/ num_grid))) + + top = max(top_box, coord_h - 1) + down = min(down_box, coord_h + 1) + left = max(coord_w - 1, left_box) + right = min(right_box, coord_w + 1) + + cate_label[top:(down + 1), left:(right + 1)] = gt_label + seg_mask = self._scale_size( + seg_mask, scale=1. / self.sampling_ratio) + for i in range(top, down + 1): + for j in range(left, right + 1): + label = int(i * num_grid + j) + cur_ins_label = np.zeros( + [mask_feat_size[0], mask_feat_size[1]], + dtype=np.uint8) + cur_ins_label[:seg_mask.shape[0], :seg_mask.shape[ + 1]] = seg_mask + ins_label.append(cur_ins_label) + ins_ind_label[label] = True + grid_order.append(sample_id * num_grid * num_grid + + label) + if ins_label == []: + ins_label = np.zeros( + [1, mask_feat_size[0], mask_feat_size[1]], + dtype=np.uint8) + ins_ind_label_list.append(ins_ind_label) + sample['cate_label{}'.format(idx)] = cate_label.flatten() + sample['ins_label{}'.format(idx)] = ins_label + sample['grid_order{}'.format(idx)] = np.asarray( + [sample_id * num_grid * num_grid + 0], dtype=np.int32) + else: + ins_label = np.stack(ins_label, axis=0) + ins_ind_label_list.append(ins_ind_label) + sample['cate_label{}'.format(idx)] = cate_label.flatten() + sample['ins_label{}'.format(idx)] = ins_label + sample['grid_order{}'.format(idx)] = np.asarray( + grid_order, dtype=np.int32) + assert len(grid_order) > 0 + max_ins_num[idx] = max( + max_ins_num[idx], + sample['ins_label{}'.format(idx)].shape[0]) + idx += 1 + ins_ind_labels = np.concatenate([ + ins_ind_labels_level_img + for ins_ind_labels_level_img in ins_ind_label_list + ]) + fg_num = np.sum(ins_ind_labels) + sample['fg_num'] = fg_num + sample_id += 1 + + sample.pop('is_crowd') + sample.pop('gt_class') + sample.pop('gt_bbox') + sample.pop('gt_poly') + sample.pop('gt_segm') + + # padding batch + for data in samples: + for idx in range(len(self.num_grids)): + gt_ins_data = np.zeros( + [ + max_ins_num[idx], + data['ins_label{}'.format(idx)].shape[1], + data['ins_label{}'.format(idx)].shape[2] + ], + dtype=np.uint8) + gt_ins_data[0:data['ins_label{}'.format(idx)].shape[ + 0], :, :] = data['ins_label{}'.format(idx)] + gt_grid_order = np.zeros([max_ins_num[idx]], dtype=np.int32) + gt_grid_order[0:data['grid_order{}'.format(idx)].shape[ + 0]] = data['grid_order{}'.format(idx)] + data['ins_label{}'.format(idx)] = gt_ins_data + data['grid_order{}'.format(idx)] = gt_grid_order + + return samples + + +@register_op +class Gt2SparseTarget(BaseOperator): + def __init__(self, use_padding_shape=False): + super(Gt2SparseTarget, self).__init__() + self.use_padding_shape = use_padding_shape + + def __call__(self, samples, context=None): + for sample in samples: + ori_h, ori_w = sample['h'], sample['w'] + if self.use_padding_shape: + h, w = sample["image"].shape[1:3] + if "scale_factor" in sample: + sf_w, sf_h = sample["scale_factor"][1], sample[ + "scale_factor"][0] + sample["scale_factor_whwh"] = np.array( + [sf_w, sf_h, sf_w, sf_h], dtype=np.float32) + else: + sample["scale_factor_whwh"] = np.array( + [1.0, 1.0, 1.0, 1.0], dtype=np.float32) + else: + h, w = round(sample['im_shape'][0]), round(sample['im_shape'][ + 1]) + sample["scale_factor_whwh"] = np.array( + [w / ori_w, h / ori_h, w / ori_w, h / ori_h], + dtype=np.float32) + + sample["img_whwh"] = np.array([w, h, w, h], dtype=np.float32) + sample["ori_shape"] = np.array([ori_h, ori_w], dtype=np.int32) + + return samples + + +@register_op +class PadMaskBatch(BaseOperator): + """ + Pad a batch of samples so they can be divisible by a stride. 
+ The layout of each image should be 'CHW'. + Args: + pad_to_stride (int): If `pad_to_stride > 0`, pad zeros to ensure + height and width is divisible by `pad_to_stride`. + return_pad_mask (bool): If `return_pad_mask = True`, return + `pad_mask` for transformer. + """ + + def __init__(self, pad_to_stride=0, return_pad_mask=False): + super(PadMaskBatch, self).__init__() + self.pad_to_stride = pad_to_stride + self.return_pad_mask = return_pad_mask + + def __call__(self, samples, context=None): + """ + Args: + samples (list): a batch of sample, each is dict. + """ + coarsest_stride = self.pad_to_stride + + max_shape = np.array([data['image'].shape for data in samples]).max( + axis=0) + if coarsest_stride > 0: + max_shape[1] = int( + np.ceil(max_shape[1] / coarsest_stride) * coarsest_stride) + max_shape[2] = int( + np.ceil(max_shape[2] / coarsest_stride) * coarsest_stride) + + for data in samples: + im = data['image'] + im_c, im_h, im_w = im.shape[:] + padding_im = np.zeros( + (im_c, max_shape[1], max_shape[2]), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + data['image'] = padding_im + if 'semantic' in data and data['semantic'] is not None: + semantic = data['semantic'] + padding_sem = np.zeros( + (1, max_shape[1], max_shape[2]), dtype=np.float32) + padding_sem[:, :im_h, :im_w] = semantic + data['semantic'] = padding_sem + if 'gt_segm' in data and data['gt_segm'] is not None: + gt_segm = data['gt_segm'] + padding_segm = np.zeros( + (gt_segm.shape[0], max_shape[1], max_shape[2]), + dtype=np.uint8) + padding_segm[:, :im_h, :im_w] = gt_segm + data['gt_segm'] = padding_segm + if self.return_pad_mask: + padding_mask = np.zeros( + (max_shape[1], max_shape[2]), dtype=np.float32) + padding_mask[:im_h, :im_w] = 1. + data['pad_mask'] = padding_mask + + return samples + + +@register_op +class Gt2CenterNetTarget(BaseOperator): + __shared__ = ['num_classes'] + """Gt2CenterNetTarget + Genterate CenterNet targets by ground-truth + Args: + down_ratio (int): The down sample ratio between output feature and + input image. + num_classes (int): The number of classes, 80 by default. + max_objs (int): The maximum objects detected, 128 by default. 
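+
+    Outputs written into each sample: 'heatmap' (class-wise gaussian peaks at
+        object centers), 'size' (box width/height per object), 'offset'
+        (sub-pixel center offset), and 'index'/'index_mask' (flattened center
+        positions used to gather predictions during training).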
+ """ + + def __init__(self, num_classes=80, down_ratio=4, max_objs=128): + super(Gt2CenterNetTarget, self).__init__() + self.nc = num_classes + self.down_ratio = down_ratio + self.max_objs = max_objs + + def __call__(self, sample, context=None): + input_h, input_w = sample['image'].shape[1:] + output_h = input_h // self.down_ratio + output_w = input_w // self.down_ratio + gt_bbox = sample['gt_bbox'] + gt_class = sample['gt_class'] + + hm = np.zeros((self.nc, output_h, output_w), dtype=np.float32) + wh = np.zeros((self.max_objs, 2), dtype=np.float32) + reg = np.zeros((self.max_objs, 2), dtype=np.float32) + ind = np.zeros((self.max_objs), dtype=np.int64) + reg_mask = np.zeros((self.max_objs), dtype=np.int32) + cat_spec_wh = np.zeros((self.max_objs, self.nc * 2), dtype=np.float32) + cat_spec_mask = np.zeros((self.max_objs, self.nc * 2), dtype=np.int32) + + trans_output = get_affine_transform( + center=sample['center'], + input_size=[sample['scale'], sample['scale']], + rot=0, + output_size=[output_w, output_h]) + + gt_det = [] + for i, (bbox, cls) in enumerate(zip(gt_bbox, gt_class)): + cls = int(cls) + bbox[:2] = affine_transform(bbox[:2], trans_output) + bbox[2:] = affine_transform(bbox[2:], trans_output) + bbox_amodal = copy.deepcopy(bbox) + bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, output_w - 1) + bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, output_h - 1) + h, w = bbox[3] - bbox[1], bbox[2] - bbox[0] + if h > 0 and w > 0: + radius = gaussian_radius((math.ceil(h), math.ceil(w)), 0.7) + radius = max(0, int(radius)) + ct = np.array( + [(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], + dtype=np.float32) + ct_int = ct.astype(np.int32) + + # get hm,wh,reg,ind,ind_mask + draw_umich_gaussian(hm[cls], ct_int, radius) + wh[i] = 1. * w, 1. * h + reg[i] = ct - ct_int + ind[i] = ct_int[1] * output_w + ct_int[0] + reg_mask[i] = 1 + cat_spec_wh[i, cls * 2:cls * 2 + 2] = wh[i] + cat_spec_mask[i, cls * 2:cls * 2 + 2] = 1 + gt_det.append([ + ct[0] - w / 2, ct[1] - h / 2, ct[0] + w / 2, ct[1] + h / 2, + 1, cls + ]) + + sample.pop('gt_bbox', None) + sample.pop('gt_class', None) + sample.pop('center', None) + sample.pop('scale', None) + sample.pop('is_crowd', None) + sample.pop('difficult', None) + + sample['index'] = ind + sample['index_mask'] = reg_mask + sample['heatmap'] = hm + sample['size'] = wh + sample['offset'] = reg + return sample + + +@register_op +class PadGT(BaseOperator): + """ + Pad 0 to `gt_class`, `gt_bbox`, `gt_score`... + The num_max_boxes is the largest for batch. + Args: + return_gt_mask (bool): If true, return `pad_gt_mask`, + 1 means bbox, 0 means no bbox. + """ + + def __init__(self, return_gt_mask=True, pad_img=False, minimum_gtnum=0): + super(PadGT, self).__init__() + self.return_gt_mask = return_gt_mask + self.pad_img = pad_img + self.minimum_gtnum = minimum_gtnum + + def _impad(self, img: np.ndarray, + *, + shape = None, + padding = None, + pad_val = 0, + padding_mode = 'constant') -> np.ndarray: + """Pad the given image to a certain shape or pad on all sides with + specified padding mode and padding value. + + Args: + img (ndarray): Image to be padded. + shape (tuple[int]): Expected padding shape (h, w). Default: None. + padding (int or tuple[int]): Padding on each border. If a single int is + provided this is used to pad all borders. If tuple of length 2 is + provided this is the padding on left/right and top/bottom + respectively. If a tuple of length 4 is provided this is the + padding for the left, top, right and bottom borders respectively. + Default: None. 
Note that `shape` and `padding` can not be both + set. + pad_val (Number | Sequence[Number]): Values to be filled in padding + areas when padding_mode is 'constant'. Default: 0. + padding_mode (str): Type of padding. Should be: constant, edge, + reflect or symmetric. Default: constant. + - constant: pads with a constant value, this value is specified + with pad_val. + - edge: pads with the last value at the edge of the image. + - reflect: pads with reflection of image without repeating the last + value on the edge. For example, padding [1, 2, 3, 4] with 2 + elements on both sides in reflect mode will result in + [3, 2, 1, 2, 3, 4, 3, 2]. + - symmetric: pads with reflection of image repeating the last value + on the edge. For example, padding [1, 2, 3, 4] with 2 elements on + both sides in symmetric mode will result in + [2, 1, 1, 2, 3, 4, 4, 3] + + Returns: + ndarray: The padded image. + """ + + assert (shape is not None) ^ (padding is not None) + if shape is not None: + width = max(shape[1] - img.shape[1], 0) + height = max(shape[0] - img.shape[0], 0) + padding = (0, 0, int(width), int(height)) + + # check pad_val + import numbers + if isinstance(pad_val, tuple): + assert len(pad_val) == img.shape[-1] + elif not isinstance(pad_val, numbers.Number): + raise TypeError('pad_val must be a int or a tuple. ' + f'But received {type(pad_val)}') + + # check padding + if isinstance(padding, tuple) and len(padding) in [2, 4]: + if len(padding) == 2: + padding = (padding[0], padding[1], padding[0], padding[1]) + elif isinstance(padding, numbers.Number): + padding = (padding, padding, padding, padding) + else: + raise ValueError('Padding must be a int or a 2, or 4 element tuple.' + f'But received {padding}') + + # check padding mode + assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric'] + + border_type = { + 'constant': cv2.BORDER_CONSTANT, + 'edge': cv2.BORDER_REPLICATE, + 'reflect': cv2.BORDER_REFLECT_101, + 'symmetric': cv2.BORDER_REFLECT + } + img = cv2.copyMakeBorder( + img, + padding[1], + padding[3], + padding[0], + padding[2], + border_type[padding_mode], + value=pad_val) + + return img + + def checkmaxshape(self, samples): + maxh, maxw = 0, 0 + for sample in samples: + h,w = sample['im_shape'] + if h>maxh: + maxh = h + if w>maxw: + maxw = w + return (maxh, maxw) + + def __call__(self, samples, context=None): + num_max_boxes = max([len(s['gt_bbox']) for s in samples]) + num_max_boxes = max(self.minimum_gtnum, num_max_boxes) + if self.pad_img: + maxshape = self.checkmaxshape(samples) + for sample in samples: + if self.pad_img: + img = sample['image'] + padimg = self._impad(img, shape=maxshape) + sample['image'] = padimg + if self.return_gt_mask: + sample['pad_gt_mask'] = np.zeros( + (num_max_boxes, 1), dtype=np.float32) + if num_max_boxes == 0: + continue + + num_gt = len(sample['gt_bbox']) + pad_gt_class = np.zeros((num_max_boxes, 1), dtype=np.int32) + pad_gt_bbox = np.zeros((num_max_boxes, 4), dtype=np.float32) + if num_gt > 0: + pad_gt_class[:num_gt] = sample['gt_class'] + pad_gt_bbox[:num_gt] = sample['gt_bbox'] + sample['gt_class'] = pad_gt_class + sample['gt_bbox'] = pad_gt_bbox + # pad_gt_mask + if 'pad_gt_mask' in sample: + sample['pad_gt_mask'][:num_gt] = 1 + # gt_score + if 'gt_score' in sample: + pad_gt_score = np.zeros((num_max_boxes, 1), dtype=np.float32) + if num_gt > 0: + pad_gt_score[:num_gt] = sample['gt_score'] + sample['gt_score'] = pad_gt_score + if 'is_crowd' in sample: + pad_is_crowd = np.zeros((num_max_boxes, 1), dtype=np.int32) + if num_gt > 0: + 
pad_is_crowd[:num_gt] = sample['is_crowd'] + sample['is_crowd'] = pad_is_crowd + if 'difficult' in sample: + pad_diff = np.zeros((num_max_boxes, 1), dtype=np.int32) + if num_gt > 0: + pad_diff[:num_gt] = sample['difficult'] + sample['difficult'] = pad_diff + if 'gt_joints' in sample: + num_joints = sample['gt_joints'].shape[1] + pad_gt_joints = np.zeros((num_max_boxes, num_joints, 3), dtype=np.float32) + if num_gt > 0: + pad_gt_joints[:num_gt] = sample['gt_joints'] + sample['gt_joints'] = pad_gt_joints + if 'gt_areas' in sample: + pad_gt_areas = np.zeros((num_max_boxes, 1), dtype=np.float32) + if num_gt > 0: + pad_gt_areas[:num_gt, 0] = sample['gt_areas'] + sample['gt_areas'] = pad_gt_areas + return samples + + +@register_op +class PadRGT(BaseOperator): + """ + Pad 0 to `gt_class`, `gt_bbox`, `gt_score`... + The num_max_boxes is the largest for batch. + Args: + return_gt_mask (bool): If true, return `pad_gt_mask`, + 1 means bbox, 0 means no bbox. + """ + + def __init__(self, return_gt_mask=True): + super(PadRGT, self).__init__() + self.return_gt_mask = return_gt_mask + + def pad_field(self, sample, field, num_gt): + name, shape, dtype = field + if name in sample: + pad_v = np.zeros(shape, dtype=dtype) + if num_gt > 0: + pad_v[:num_gt] = sample[name] + sample[name] = pad_v + + def __call__(self, samples, context=None): + num_max_boxes = max([len(s['gt_bbox']) for s in samples]) + for sample in samples: + if self.return_gt_mask: + sample['pad_gt_mask'] = np.zeros( + (num_max_boxes, 1), dtype=np.float32) + if num_max_boxes == 0: + continue + + num_gt = len(sample['gt_bbox']) + pad_gt_class = np.zeros((num_max_boxes, 1), dtype=np.int32) + pad_gt_bbox = np.zeros((num_max_boxes, 4), dtype=np.float32) + if num_gt > 0: + pad_gt_class[:num_gt] = sample['gt_class'] + pad_gt_bbox[:num_gt] = sample['gt_bbox'] + sample['gt_class'] = pad_gt_class + sample['gt_bbox'] = pad_gt_bbox + # pad_gt_mask + if 'pad_gt_mask' in sample: + sample['pad_gt_mask'][:num_gt] = 1 + # gt_score + names = ['gt_score', 'is_crowd', 'difficult', 'gt_poly', 'gt_rbox'] + dims = [1, 1, 1, 8, 5] + dtypes = [np.float32, np.int32, np.int32, np.float32, np.float32] + + for name, dim, dtype in zip(names, dims, dtypes): + self.pad_field(sample, [name, (num_max_boxes, dim), dtype], + num_gt) + + return samples + + +@register_op +class Gt2CenterTrackTarget(BaseOperator): + __shared__ = ['num_classes'] + """Gt2CenterTrackTarget + Genterate CenterTrack targets by ground-truth + Args: + num_classes (int): The number of classes, 1 by default. + down_ratio (int): The down sample ratio between output feature and + input image. + max_objs (int): The maximum objects detected, 256 by default. 
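+        hm_disturb (float): Std of the random jitter applied to centers when
+            rendering the previous-frame heatmap, 0.05 by default.
+        lost_disturb (float): Probability of simulating a lost (dropped)
+            detection in the previous-frame heatmap, 0.4 by default.
+        fp_disturb (float): Probability of injecting a false-positive center
+            into the previous-frame heatmap, 0.1 by default.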
+ """ + + def __init__(self, + num_classes=1, + down_ratio=4, + max_objs=256, + hm_disturb=0.05, + lost_disturb=0.4, + fp_disturb=0.1, + pre_hm=True, + add_tracking=True, + add_ltrb_amodal=True): + super(Gt2CenterTrackTarget, self).__init__() + self.nc = num_classes + self.down_ratio = down_ratio + self.max_objs = max_objs + + self.hm_disturb = hm_disturb + self.lost_disturb = lost_disturb + self.fp_disturb = fp_disturb + self.pre_hm = pre_hm + self.add_tracking = add_tracking + self.add_ltrb_amodal = add_ltrb_amodal + + def _get_pre_dets(self, input_h, input_w, trans_input_pre, gt_bbox_pre, + gt_class_pre, gt_track_id_pre): + hm_h, hm_w = input_h, input_w + reutrn_hm = self.pre_hm + pre_hm = np.zeros( + (1, hm_h, hm_w), dtype=np.float32) if reutrn_hm else None + pre_cts, track_ids = [], [] + + for i, ( + bbox, cls, track_id + ) in enumerate(zip(gt_bbox_pre, gt_class_pre, gt_track_id_pre)): + cls = int(cls) + bbox[:2] = affine_transform(bbox[:2], trans_input_pre) + bbox[2:] = affine_transform(bbox[2:], trans_input_pre) + bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, hm_w - 1) + bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, hm_h - 1) + h, w = bbox[3] - bbox[1], bbox[2] - bbox[0] + max_rad = 1 + if (h > 0 and w > 0): + radius = gaussian_radius((math.ceil(h), math.ceil(w)), 0.7) + radius = max(0, int(radius)) + max_rad = max(max_rad, radius) + ct = np.array( + [(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], + dtype=np.float32) + ct0 = ct.copy() + conf = 1 + + ct[0] = ct[0] + np.random.randn() * self.hm_disturb * w + ct[1] = ct[1] + np.random.randn() * self.hm_disturb * h + conf = 1 if np.random.rand() > self.lost_disturb else 0 + + ct_int = ct.astype(np.int32) + if conf == 0: + pre_cts.append(ct / self.down_ratio) + else: + pre_cts.append(ct0 / self.down_ratio) + + track_ids.append(track_id) + if reutrn_hm: + draw_umich_gaussian(pre_hm[0], ct_int, radius, k=conf) + + if np.random.rand() < self.fp_disturb and reutrn_hm: + ct2 = ct0.copy() + # Hard code heatmap disturb ratio, haven't tried other numbers. 
+ ct2[0] = ct2[0] + np.random.randn() * 0.05 * w + ct2[1] = ct2[1] + np.random.randn() * 0.05 * h + ct2_int = ct2.astype(np.int32) + draw_umich_gaussian(pre_hm[0], ct2_int, radius, k=conf) + return pre_hm, pre_cts, track_ids + + def __call__(self, sample, context=None): + input_h, input_w = sample['image'].shape[1:] + output_h = input_h // self.down_ratio + output_w = input_w // self.down_ratio + gt_bbox = sample['gt_bbox'] + gt_class = sample['gt_class'] + + # init + hm = np.zeros((self.nc, output_h, output_w), dtype=np.float32) + wh = np.zeros((self.max_objs, 2), dtype=np.float32) + reg = np.zeros((self.max_objs, 2), dtype=np.float32) + ind = np.zeros((self.max_objs), dtype=np.int64) + reg_mask = np.zeros((self.max_objs), dtype=np.int32) + if self.add_tracking: + tr = np.zeros((self.max_objs, 2), dtype=np.float32) + if self.add_ltrb_amodal: + ltrb_amodal = np.zeros((self.max_objs, 4), dtype=np.float32) + + trans_output = get_affine_transform( + center=sample['center'], + input_size=[sample['scale'], sample['scale']], + rot=0, + output_size=[output_w, output_h]) + + pre_hm, pre_cts, track_ids = self._get_pre_dets( + input_h, input_w, sample['trans_input'], sample['pre_gt_bbox'], + sample['pre_gt_class'], sample['pre_gt_track_id']) + + for i, (bbox, cls) in enumerate(zip(gt_bbox, gt_class)): + cls = int(cls) + rect = np.array( + [[bbox[0], bbox[1]], [bbox[0], bbox[3]], [bbox[2], bbox[3]], + [bbox[2], bbox[1]]], + dtype=np.float32) + for t in range(4): + rect[t] = affine_transform(rect[t], trans_output) + bbox[:2] = rect[:, 0].min(), rect[:, 1].min() + bbox[2:] = rect[:, 0].max(), rect[:, 1].max() + + bbox_amodal = copy.deepcopy(bbox) + bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, output_w - 1) + bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, output_h - 1) + + h, w = bbox[3] - bbox[1], bbox[2] - bbox[0] + if h > 0 and w > 0: + radius = gaussian_radius((math.ceil(h), math.ceil(w)), 0.7) + radius = max(0, int(radius)) + ct = np.array( + [(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], + dtype=np.float32) + ct_int = ct.astype(np.int32) + + # get hm,wh,reg,ind,ind_mask + draw_umich_gaussian(hm[cls], ct_int, radius) + wh[i] = 1. * w, 1. * h + reg[i] = ct - ct_int + ind[i] = ct_int[1] * output_w + ct_int[0] + reg_mask[i] = 1 + if self.add_tracking: + if sample['gt_track_id'][i] in track_ids: + pre_ct = pre_cts[track_ids.index(sample['gt_track_id'][ + i])] + tr[i] = pre_ct - ct_int + + if self.add_ltrb_amodal: + ltrb_amodal[i] = \ + bbox_amodal[0] - ct_int[0], bbox_amodal[1] - ct_int[1], \ + bbox_amodal[2] - ct_int[0], bbox_amodal[3] - ct_int[1] + + new_sample = {'image': sample['image']} + new_sample['index'] = ind + new_sample['index_mask'] = reg_mask + new_sample['heatmap'] = hm + new_sample['size'] = wh + new_sample['offset'] = reg + if self.add_tracking: + new_sample['tracking'] = tr + if self.add_ltrb_amodal: + new_sample['ltrb_amodal'] = ltrb_amodal + + new_sample['pre_image'] = sample['pre_image'] + new_sample['pre_hm'] = pre_hm + + del sample + return new_sample diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/gridmask_utils.py b/PaddleDetection-release-2.6/ppdet/data/transform/gridmask_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..c18701556efa793a2d9bbced5f333059b4ab6236 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/gridmask_utils.py @@ -0,0 +1,86 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The code is based on: +# https://github.com/dvlab-research/GridMask/blob/master/detection_grid/maskrcnn_benchmark/data/transforms/grid.py + +from __future__ import absolute_import +from __future__ import print_function +from __future__ import division + +import numpy as np +from PIL import Image + + +class Gridmask(object): + def __init__(self, + use_h=True, + use_w=True, + rotate=1, + offset=False, + ratio=0.5, + mode=1, + prob=0.7, + upper_iter=360000): + super(Gridmask, self).__init__() + self.use_h = use_h + self.use_w = use_w + self.rotate = rotate + self.offset = offset + self.ratio = ratio + self.mode = mode + self.prob = prob + self.st_prob = prob + self.upper_iter = upper_iter + + def __call__(self, x, curr_iter): + self.prob = self.st_prob * min(1, 1.0 * curr_iter / self.upper_iter) + if np.random.rand() > self.prob: + return x + h, w, _ = x.shape + hh = int(1.5 * h) + ww = int(1.5 * w) + d = np.random.randint(2, h) + self.l = min(max(int(d * self.ratio + 0.5), 1), d - 1) + mask = np.ones((hh, ww), np.float32) + st_h = np.random.randint(d) + st_w = np.random.randint(d) + if self.use_h: + for i in range(hh // d): + s = d * i + st_h + t = min(s + self.l, hh) + mask[s:t, :] *= 0 + if self.use_w: + for i in range(ww // d): + s = d * i + st_w + t = min(s + self.l, ww) + mask[:, s:t] *= 0 + + r = np.random.randint(self.rotate) + mask = Image.fromarray(np.uint8(mask)) + mask = mask.rotate(r) + mask = np.asarray(mask) + mask = mask[(hh - h) // 2:(hh - h) // 2 + h, (ww - w) // 2:(ww - w) // 2 + + w].astype(np.float32) + + if self.mode == 1: + mask = 1 - mask + mask = np.expand_dims(mask, axis=-1) + if self.offset: + offset = (2 * (np.random.rand(h, w) - 0.5)).astype(np.float32) + x = (x * mask + offset * (1 - mask)).astype(x.dtype) + else: + x = (x * mask).astype(x.dtype) + + return x diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/keypoint_operators.py b/PaddleDetection-release-2.6/ppdet/data/transform/keypoint_operators.py new file mode 100644 index 0000000000000000000000000000000000000000..fea23d696c27ab27ba1b70a06856b8d27e7c2a58 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/keypoint_operators.py @@ -0,0 +1,1613 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# function: +# operators to process sample, +# eg: decode/resize/crop image + +from __future__ import absolute_import + +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +import cv2 +import numpy as np +import math +import copy + +from ...modeling.keypoint_utils import get_affine_mat_kernel, warp_affine_joints, get_affine_transform, affine_transform, get_warp_matrix +from ppdet.core.workspace import serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +registered_ops = [] + +__all__ = [ + 'RandomAffine', 'KeyPointFlip', 'TagGenerate', 'ToHeatmaps', + 'NormalizePermute', 'EvalAffine', 'RandomFlipHalfBodyTransform', + 'TopDownAffine', 'ToHeatmapsTopDown', 'ToHeatmapsTopDown_DARK', + 'ToHeatmapsTopDown_UDP', 'TopDownEvalAffine', + 'AugmentationbyInformantionDropping', 'SinglePoseAffine', 'NoiseJitter', + 'FlipPose', 'PETR_Resize' +] + + +def register_keypointop(cls): + return serializable(cls) + + +@register_keypointop +class KeyPointFlip(object): + """Get the fliped image by flip_prob. flip the coords also + the left coords and right coords should exchange while flip, for the right keypoint will be left keypoint after image fliped + + Args: + flip_permutation (list[17]): the left-right exchange order list corresponding to [0,1,2,...,16] + hmsize (list[2]): output heatmap's shape list of different scale outputs of higherhrnet + flip_prob (float): the ratio whether to flip the image + records(dict): the dict contained the image, mask and coords + + Returns: + records(dict): contain the image, mask and coords after tranformed + + """ + + def __init__(self, flip_permutation, hmsize=None, flip_prob=0.5): + super(KeyPointFlip, self).__init__() + assert isinstance(flip_permutation, Sequence) + self.flip_permutation = flip_permutation + self.flip_prob = flip_prob + self.hmsize = hmsize + + def _flipjoints(self, records, sizelst): + ''' + records['gt_joints'] is Sequence in higherhrnet + ''' + if not ('gt_joints' in records and len(records['gt_joints']) > 0): + return records + + kpts_lst = records['gt_joints'] + if isinstance(kpts_lst, Sequence): + for idx, hmsize in enumerate(sizelst): + if kpts_lst[idx].ndim == 3: + kpts_lst[idx] = kpts_lst[idx][:, self.flip_permutation] + else: + kpts_lst[idx] = kpts_lst[idx][self.flip_permutation] + kpts_lst[idx][..., 0] = hmsize - kpts_lst[idx][..., 0] + else: + hmsize = sizelst[0] + if kpts_lst.ndim == 3: + kpts_lst = kpts_lst[:, self.flip_permutation] + else: + kpts_lst = kpts_lst[self.flip_permutation] + kpts_lst[..., 0] = hmsize - kpts_lst[..., 0] + + records['gt_joints'] = kpts_lst + return records + + def _flipmask(self, records, sizelst): + if not 'mask' in records: + return records + + mask_lst = records['mask'] + for idx, hmsize in enumerate(sizelst): + if len(mask_lst) > idx: + mask_lst[idx] = mask_lst[idx][:, ::-1] + records['mask'] = mask_lst + return records + + def _flipbbox(self, records, sizelst): + if not 'gt_bbox' in records: + return records + + bboxes = records['gt_bbox'] + hmsize = sizelst[0] + bboxes[:, 0::2] = hmsize - bboxes[:, 0::2][:, ::-1] + bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, hmsize) + records['gt_bbox'] = bboxes + return records + + def __call__(self, records): + flip = np.random.random() < self.flip_prob + if flip: + image = records['image'] + image = image[:, ::-1] + records['image'] = image + if self.hmsize is None: + sizelst = [image.shape[1]] + else: + sizelst = self.hmsize + self._flipjoints(records, sizelst) + 
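+            # masks and boxes must be mirrored as well so that every
+            # annotation stays consistent with the flipped image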
self._flipmask(records, sizelst) + self._flipbbox(records, sizelst) + + return records + + +@register_keypointop +class RandomAffine(object): + """apply affine transform to image, mask and coords + to achieve the rotate, scale and shift effect for training image + + Args: + max_degree (float): the max abslute rotate degree to apply, transform range is [-max_degree, max_degree] + max_scale (list[2]): the scale range to apply, transform range is [min, max] + max_shift (float): the max abslute shift ratio to apply, transform range is [-max_shift*imagesize, max_shift*imagesize] + hmsize (list[2]): output heatmap's shape list of different scale outputs of higherhrnet + trainsize (list[2]): the standard length used to train, the 'scale_type' of [h,w] will be resize to trainsize for standard + scale_type (str): the length of [h,w] to used for trainsize, chosed between 'short' and 'long' + records(dict): the dict contained the image, mask and coords + + Returns: + records(dict): contain the image, mask and coords after tranformed + + """ + + def __init__(self, + max_degree=30, + scale=[0.75, 1.5], + max_shift=0.2, + hmsize=None, + trainsize=[512, 512], + scale_type='short', + boldervalue=[114, 114, 114]): + super(RandomAffine, self).__init__() + self.max_degree = max_degree + self.min_scale = scale[0] + self.max_scale = scale[1] + self.max_shift = max_shift + self.hmsize = hmsize + self.trainsize = trainsize + self.scale_type = scale_type + self.boldervalue = boldervalue + + def _get_affine_matrix_old(self, center, scale, res, rot=0): + """Generate transformation matrix.""" + h = scale + t = np.zeros((3, 3), dtype=np.float32) + t[0, 0] = float(res[1]) / h + t[1, 1] = float(res[0]) / h + t[0, 2] = res[1] * (-float(center[0]) / h + .5) + t[1, 2] = res[0] * (-float(center[1]) / h + .5) + t[2, 2] = 1 + if rot != 0: + rot = -rot # To match direction of rotation from cropping + rot_mat = np.zeros((3, 3), dtype=np.float32) + rot_rad = rot * np.pi / 180 + sn, cs = np.sin(rot_rad), np.cos(rot_rad) + rot_mat[0, :2] = [cs, -sn] + rot_mat[1, :2] = [sn, cs] + rot_mat[2, 2] = 1 + # Need to rotate around center + t_mat = np.eye(3) + t_mat[0, 2] = -res[1] / 2 + t_mat[1, 2] = -res[0] / 2 + t_inv = t_mat.copy() + t_inv[:2, 2] *= -1 + t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t))) + return t + + def _get_affine_matrix(self, center, scale, res, rot=0): + """Generate transformation matrix.""" + w, h = scale + t = np.zeros((3, 3), dtype=np.float32) + t[0, 0] = float(res[0]) / w + t[1, 1] = float(res[1]) / h + t[0, 2] = res[0] * (-float(center[0]) / w + .5) + t[1, 2] = res[1] * (-float(center[1]) / h + .5) + t[2, 2] = 1 + if rot != 0: + rot = -rot # To match direction of rotation from cropping + rot_mat = np.zeros((3, 3), dtype=np.float32) + rot_rad = rot * np.pi / 180 + sn, cs = np.sin(rot_rad), np.cos(rot_rad) + rot_mat[0, :2] = [cs, -sn] + rot_mat[1, :2] = [sn, cs] + rot_mat[2, 2] = 1 + # Need to rotate around center + t_mat = np.eye(3) + t_mat[0, 2] = -res[0] / 2 + t_mat[1, 2] = -res[1] / 2 + t_inv = t_mat.copy() + t_inv[:2, 2] *= -1 + t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t))) + return t + + def _affine_joints_mask(self, + degree, + center, + roi_size, + dsize, + keypoints=None, + heatmap_mask=None, + gt_bbox=None): + kpts = None + mask = None + bbox = None + mask_affine_mat = self._get_affine_matrix(center, roi_size, dsize, + degree)[:2] + if heatmap_mask is not None: + mask = cv2.warpAffine(heatmap_mask, mask_affine_mat, dsize) + mask = ((mask / 255) > 0.5).astype(np.float32) + if keypoints is not 
None: + kpts = copy.deepcopy(keypoints) + kpts[..., 0:2] = warp_affine_joints(kpts[..., 0:2].copy(), + mask_affine_mat) + kpts[(kpts[..., 0]) > dsize[0], :] = 0 + kpts[(kpts[..., 1]) > dsize[1], :] = 0 + kpts[(kpts[..., 0]) < 0, :] = 0 + kpts[(kpts[..., 1]) < 0, :] = 0 + if gt_bbox is not None: + temp_bbox = gt_bbox[:, [0, 3, 2, 1]] + cat_bbox = np.concatenate((gt_bbox, temp_bbox), axis=-1) + gt_bbox_warped = warp_affine_joints(cat_bbox, mask_affine_mat) + bbox = np.zeros_like(gt_bbox) + bbox[:, 0] = gt_bbox_warped[:, 0::2].min(1).clip(0, dsize[0]) + bbox[:, 2] = gt_bbox_warped[:, 0::2].max(1).clip(0, dsize[0]) + bbox[:, 1] = gt_bbox_warped[:, 1::2].min(1).clip(0, dsize[1]) + bbox[:, 3] = gt_bbox_warped[:, 1::2].max(1).clip(0, dsize[1]) + return kpts, mask, bbox + + def __call__(self, records): + image = records['image'] + shape = np.array(image.shape[:2][::-1]) + keypoints = None + heatmap_mask = None + gt_bbox = None + if 'gt_joints' in records: + keypoints = records['gt_joints'] + + if 'mask' in records: + heatmap_mask = records['mask'] + heatmap_mask *= 255 + + if 'gt_bbox' in records: + gt_bbox = records['gt_bbox'] + + degree = (np.random.random() * 2 - 1) * self.max_degree + center = center = np.array((np.array(shape) / 2)) + + aug_scale = np.random.random() * (self.max_scale - self.min_scale + ) + self.min_scale + if self.scale_type == 'long': + scale = np.array([max(shape[0], shape[1]) / 1.0] * 2) + elif self.scale_type == 'short': + scale = np.array([min(shape[0], shape[1]) / 1.0] * 2) + elif self.scale_type == 'wh': + scale = shape + else: + raise ValueError('Unknown scale type: {}'.format(self.scale_type)) + roi_size = aug_scale * scale + dx = int(0) + dy = int(0) + if self.max_shift > 0: + + dx = np.random.randint(-self.max_shift * roi_size[0], + self.max_shift * roi_size[0]) + dy = np.random.randint(-self.max_shift * roi_size[0], + self.max_shift * roi_size[1]) + + center += np.array([dx, dy]) + input_size = 2 * center + if self.trainsize != -1: + dsize = self.trainsize + imgshape = (dsize) + else: + dsize = scale + imgshape = (shape.tolist()) + + image_affine_mat = self._get_affine_matrix(center, roi_size, dsize, + degree)[:2] + image = cv2.warpAffine( + image, + image_affine_mat, + imgshape, + flags=cv2.INTER_LINEAR, + borderValue=self.boldervalue) + + if self.hmsize is None: + kpts, mask, gt_bbox = self._affine_joints_mask( + degree, center, roi_size, dsize, keypoints, heatmap_mask, + gt_bbox) + records['image'] = image + if kpts is not None: records['gt_joints'] = kpts + if mask is not None: records['mask'] = mask + if gt_bbox is not None: records['gt_bbox'] = gt_bbox + return records + + kpts_lst = [] + mask_lst = [] + for hmsize in self.hmsize: + kpts, mask, gt_bbox = self._affine_joints_mask( + degree, center, roi_size, [hmsize, hmsize], keypoints, + heatmap_mask, gt_bbox) + kpts_lst.append(kpts) + mask_lst.append(mask) + records['image'] = image + + if 'gt_joints' in records: + records['gt_joints'] = kpts_lst + if 'mask' in records: + records['mask'] = mask_lst + if 'gt_bbox' in records: + records['gt_bbox'] = gt_bbox + return records + + +@register_keypointop +class EvalAffine(object): + """apply affine transform to image + resize the short of [h,w] to standard size for eval + + Args: + size (int): the standard length used to train, the 'short' of [h,w] will be resize to trainsize for standard + records(dict): the dict contained the image, mask and coords + + Returns: + records(dict): contain the image, mask and coords after tranformed + + """ + + def __init__(self, 
size, stride=64): + super(EvalAffine, self).__init__() + self.size = size + self.stride = stride + + def __call__(self, records): + image = records['image'] + mask = records['mask'] if 'mask' in records else None + s = self.size + h, w, _ = image.shape + trans, size_resized = get_affine_mat_kernel(h, w, s, inv=False) + image_resized = cv2.warpAffine(image, trans, size_resized) + if mask is not None: + mask = cv2.warpAffine(mask, trans, size_resized) + records['mask'] = mask + if 'gt_joints' in records: + del records['gt_joints'] + records['image'] = image_resized + records['scale_factor'] = self.size / min(h, w) + return records + + +@register_keypointop +class NormalizePermute(object): + def __init__(self, + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.120, 57.375], + is_scale=True): + super(NormalizePermute, self).__init__() + self.mean = mean + self.std = std + self.is_scale = is_scale + + def __call__(self, records): + image = records['image'] + image = image.astype(np.float32) + if self.is_scale: + image /= 255. + image = image.transpose((2, 0, 1)) + mean = np.array(self.mean, dtype=np.float32) + std = np.array(self.std, dtype=np.float32) + invstd = 1. / std + for v, m, s in zip(image, mean, invstd): + v.__isub__(m).__imul__(s) + records['image'] = image + return records + + +@register_keypointop +class TagGenerate(object): + """record gt coords for aeloss to sample coords value in tagmaps + + Args: + num_joints (int): the keypoint numbers of dataset to train + num_people (int): maxmum people to support for sample aeloss + records(dict): the dict contained the image, mask and coords + + Returns: + records(dict): contain the gt coords used in tagmap + + """ + + def __init__(self, num_joints, max_people=30): + super(TagGenerate, self).__init__() + self.max_people = max_people + self.num_joints = num_joints + + def __call__(self, records): + kpts_lst = records['gt_joints'] + kpts = kpts_lst[0] + tagmap = np.zeros((self.max_people, self.num_joints, 4), dtype=np.int64) + inds = np.where(kpts[..., 2] > 0) + p, j = inds[0], inds[1] + visible = kpts[inds] + # tagmap is [p, j, 3], where last dim is j, y, x + tagmap[p, j, 0] = j + tagmap[p, j, 1] = visible[..., 1] # y + tagmap[p, j, 2] = visible[..., 0] # x + tagmap[p, j, 3] = 1 + records['tagmap'] = tagmap + del records['gt_joints'] + return records + + +@register_keypointop +class ToHeatmaps(object): + """to generate the gaussin heatmaps of keypoint for heatmap loss + + Args: + num_joints (int): the keypoint numbers of dataset to train + hmsize (list[2]): output heatmap's shape list of different scale outputs of higherhrnet + sigma (float): the std of gaussin kernel genereted + records(dict): the dict contained the image, mask and coords + + Returns: + records(dict): contain the heatmaps used to heatmaploss + + """ + + def __init__(self, num_joints, hmsize, sigma=None): + super(ToHeatmaps, self).__init__() + self.num_joints = num_joints + self.hmsize = np.array(hmsize) + if sigma is None: + sigma = hmsize[0] // 64 + self.sigma = sigma + + r = 6 * sigma + 3 + x = np.arange(0, r, 1, np.float32) + y = x[:, None] + x0, y0 = 3 * sigma + 1, 3 * sigma + 1 + self.gaussian = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)) + + def __call__(self, records): + kpts_lst = records['gt_joints'] + mask_lst = records['mask'] + for idx, hmsize in enumerate(self.hmsize): + mask = mask_lst[idx] + kpts = kpts_lst[idx] + heatmaps = np.zeros((self.num_joints, hmsize, hmsize)) + inds = np.where(kpts[..., 2] > 0) + visible = kpts[inds].astype(np.int64)[..., 
:2] + ul = np.round(visible - 3 * self.sigma - 1) + br = np.round(visible + 3 * self.sigma + 2) + sul = np.maximum(0, -ul) + sbr = np.minimum(hmsize, br) - ul + dul = np.clip(ul, 0, hmsize - 1) + dbr = np.clip(br, 0, hmsize) + for i in range(len(visible)): + if visible[i][0] < 0 or visible[i][1] < 0 or visible[i][ + 0] >= hmsize or visible[i][1] >= hmsize: + continue + dx1, dy1 = dul[i] + dx2, dy2 = dbr[i] + sx1, sy1 = sul[i] + sx2, sy2 = sbr[i] + heatmaps[inds[1][i], dy1:dy2, dx1:dx2] = np.maximum( + self.gaussian[sy1:sy2, sx1:sx2], + heatmaps[inds[1][i], dy1:dy2, dx1:dx2]) + records['heatmap_gt{}x'.format(idx + 1)] = heatmaps + records['mask_{}x'.format(idx + 1)] = mask + del records['mask'] + return records + + +@register_keypointop +class RandomFlipHalfBodyTransform(object): + """apply data augment to image and coords + to achieve the flip, scale, rotate and half body transform effect for training image + + Args: + trainsize (list):[w, h], Image target size + upper_body_ids (list): The upper body joint ids + flip_pairs (list): The left-right joints exchange order list + pixel_std (int): The pixel std of the scale + scale (float): The scale factor to transform the image + rot (int): The rotate factor to transform the image + num_joints_half_body (int): The joints threshold of the half body transform + prob_half_body (float): The threshold of the half body transform + flip (bool): Whether to flip the image + + Returns: + records(dict): contain the image and coords after tranformed + + """ + + def __init__(self, + trainsize, + upper_body_ids, + flip_pairs, + pixel_std, + scale=0.35, + rot=40, + num_joints_half_body=8, + prob_half_body=0.3, + flip=True, + rot_prob=0.6): + super(RandomFlipHalfBodyTransform, self).__init__() + self.trainsize = trainsize + self.upper_body_ids = upper_body_ids + self.flip_pairs = flip_pairs + self.pixel_std = pixel_std + self.scale = scale + self.rot = rot + self.num_joints_half_body = num_joints_half_body + self.prob_half_body = prob_half_body + self.flip = flip + self.aspect_ratio = trainsize[0] * 1.0 / trainsize[1] + self.rot_prob = rot_prob + + def halfbody_transform(self, joints, joints_vis): + upper_joints = [] + lower_joints = [] + for joint_id in range(joints.shape[0]): + if joints_vis[joint_id][0] > 0: + if joint_id in self.upper_body_ids: + upper_joints.append(joints[joint_id]) + else: + lower_joints.append(joints[joint_id]) + if np.random.randn() < 0.5 and len(upper_joints) > 2: + selected_joints = upper_joints + else: + selected_joints = lower_joints if len( + lower_joints) > 2 else upper_joints + if len(selected_joints) < 2: + return None, None + selected_joints = np.array(selected_joints, dtype=np.float32) + center = selected_joints.mean(axis=0)[:2] + left_top = np.amin(selected_joints, axis=0) + right_bottom = np.amax(selected_joints, axis=0) + w = right_bottom[0] - left_top[0] + h = right_bottom[1] - left_top[1] + if w > self.aspect_ratio * h: + h = w * 1.0 / self.aspect_ratio + elif w < self.aspect_ratio * h: + w = h * self.aspect_ratio + scale = np.array( + [w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std], + dtype=np.float32) + scale = scale * 1.5 + + return center, scale + + def flip_joints(self, joints, joints_vis, width, matched_parts): + joints[:, 0] = width - joints[:, 0] - 1 + for pair in matched_parts: + joints[pair[0], :], joints[pair[1], :] = \ + joints[pair[1], :], joints[pair[0], :].copy() + joints_vis[pair[0], :], joints_vis[pair[1], :] = \ + joints_vis[pair[1], :], joints_vis[pair[0], :].copy() + + return joints * joints_vis, 
joints_vis
+
+    def __call__(self, records):
+        image = records['image']
+        joints = records['gt_joints']
+        joints_vis = records['joints_vis']
+        c = records['center']
+        s = records['scale']
+        r = 0
+        if (np.sum(joints_vis[:, 0]) > self.num_joints_half_body and
+                np.random.rand() < self.prob_half_body):
+            c_half_body, s_half_body = self.halfbody_transform(joints,
+                                                               joints_vis)
+            if c_half_body is not None and s_half_body is not None:
+                c, s = c_half_body, s_half_body
+        sf = self.scale
+        rf = self.rot
+        s = s * np.clip(np.random.randn() * sf + 1, 1 - sf, 1 + sf)
+        r = np.clip(np.random.randn() * rf, -rf * 2,
+                    rf * 2) if np.random.random() <= self.rot_prob else 0
+
+        if self.flip and np.random.random() <= 0.5:
+            image = image[:, ::-1, :]
+            joints, joints_vis = self.flip_joints(
+                joints, joints_vis, image.shape[1], self.flip_pairs)
+            c[0] = image.shape[1] - c[0] - 1
+        records['image'] = image
+        records['gt_joints'] = joints
+        records['joints_vis'] = joints_vis
+        records['center'] = c
+        records['scale'] = s
+        records['rotate'] = r
+
+        return records
+
+
+@register_keypointop
+class AugmentationbyInformantionDropping(object):
+    """AID: Augmentation by Information Dropping. Please refer
+    to https://arxiv.org/abs/2008.07139
+
+    Args:
+        prob_cutout (float): The probability of the Cutout augmentation.
+        offset_factor (float): Offset factor of the cutout center.
+        num_patch (int): Number of patches to be cut out.
+        records (dict): the dict containing the image and coords
+
+    Returns:
+        records (dict): contain the image and coords after transformation
+
+    """
+
+    def __init__(self,
+                 trainsize,
+                 prob_cutout=0.0,
+                 offset_factor=0.2,
+                 num_patch=1):
+        self.prob_cutout = prob_cutout
+        self.offset_factor = offset_factor
+        self.num_patch = num_patch
+        self.trainsize = trainsize
+
+    def _cutout(self, img, joints, joints_vis):
+        height, width, _ = img.shape
+        img = img.reshape((height * width, -1))
+        feat_x_int = np.arange(0, width)
+        feat_y_int = np.arange(0, height)
+        feat_x_int, feat_y_int = np.meshgrid(feat_x_int, feat_y_int)
+        feat_x_int = feat_x_int.reshape((-1, ))
+        feat_y_int = feat_y_int.reshape((-1, ))
+        for _ in range(self.num_patch):
+            vis_idx, _ = np.where(joints_vis > 0)
+            occlusion_joint_id = np.random.choice(vis_idx)
+            center = joints[occlusion_joint_id, 0:2]
+            offset = np.random.randn(2) * self.trainsize[0] * self.offset_factor
+            center = center + offset
+            radius = np.random.uniform(0.1, 0.2) * self.trainsize[0]
+            x_offset = (center[0] - feat_x_int) / radius
+            y_offset = (center[1] - feat_y_int) / radius
+            dis = x_offset**2 + y_offset**2
+            keep_pos = np.where((dis <= 1) & (dis >= 0))[0]
+            img[keep_pos, :] = 0
+        img = img.reshape((height, width, -1))
+        return img
+
+    def __call__(self, records):
+        img = records['image']
+        joints = records['gt_joints']
+        joints_vis = records['joints_vis']
+        if np.random.rand() < self.prob_cutout:
+            img = self._cutout(img, joints, joints_vis)
+        records['image'] = img
+        return records
+
+
+@register_keypointop
+class TopDownAffine(object):
+    """apply affine transform to image and coords
+
+    Args:
+        trainsize (list): [w, h], the standard size used to train
+        use_udp (bool): whether to use Unbiased Data Processing.
+        records (dict): the dict containing the image and coords
+
+    Returns:
+        records (dict): contain the image and coords after transformation
+
+    """
+
+    def __init__(self, trainsize, use_udp=False):
+        self.trainsize = trainsize
+        self.use_udp = use_udp
+
+    def __call__(self, records):
+        image = records['image']
+        joints = records['gt_joints']
+        joints_vis = records['joints_vis']
+        rot = records['rotate'] if "rotate" in records else 0
+        if self.use_udp:
+            trans = get_warp_matrix(
+                rot, records['center'] * 2.0,
+                [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0],
+                records['scale'] * 200.0)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+            joints[:, 0:2] = warp_affine_joints(joints[:, 0:2].copy(), trans)
+        else:
+            trans = get_affine_transform(records['center'], records['scale'] *
+                                         200, rot, self.trainsize)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+            for i in range(joints.shape[0]):
+                if joints_vis[i, 0] > 0.0:
+                    joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)
+
+        records['image'] = image
+        records['gt_joints'] = joints
+
+        return records
+
+
+@register_keypointop
+class SinglePoseAffine(object):
+    """apply affine transform to image and coords
+
+    Args:
+        trainsize (list): [w, h], the standard size used to train
+        rotate (list): [rotate probability, rotate range in degrees]
+        scale (list): [scale probability, scale ratio]
+        use_udp (bool): whether to use Unbiased Data Processing.
+        records (dict): the dict containing the image and coords
+
+    Returns:
+        records (dict): contain the image and coords after transformation
+
+    """
+
+    def __init__(self,
+                 trainsize,
+                 rotate=[1.0, 30],
+                 scale=[1.0, 0.25],
+                 use_udp=False):
+        self.trainsize = trainsize
+        self.use_udp = use_udp
+        self.rot_prob = rotate[0]
+        self.rot_range = rotate[1]
+        self.scale_prob = scale[0]
+        self.scale_ratio = scale[1]
+
+    def __call__(self, records):
+        image = records['image']
+        if 'joints_2d' in records:
+            joints = records['joints_2d'] if 'joints_2d' in records else None
+            joints_vis = records[
+                'joints_vis'] if 'joints_vis' in records else np.ones(
+                    (len(joints), 1))
+        rot = 0
+        s = 1.
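+        # Sample the rotation (in degrees) and the scale factor from clipped
+        # Gaussian distributions; each augmentation fires independently with
+        # probability rot_prob / scale_prob.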
+        if np.random.random() < self.rot_prob:
+            rot = np.clip(np.random.randn() * self.rot_range,
+                          -self.rot_range * 2, self.rot_range * 2)
+        if np.random.random() < self.scale_prob:
+            s = np.clip(np.random.randn() * self.scale_ratio + 1,
+                        1 - self.scale_ratio, 1 + self.scale_ratio)
+
+        if self.use_udp:
+            trans = get_warp_matrix(
+                rot,
+                np.array(records['bbox_center']) * 2.0,
+                [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0],
+                records['bbox_scale'] * 200.0 * s)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+            if 'joints_2d' in records:
+                joints[:, 0:2] = warp_affine_joints(joints[:, 0:2].copy(),
+                                                    trans)
+        else:
+            trans = get_affine_transform(
+                np.array(records['bbox_center']),
+                records['bbox_scale'] * s * 200, rot, self.trainsize)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+            if 'joints_2d' in records:
+                for i in range(len(joints)):
+                    if joints_vis[i, 0] > 0.0:
+                        joints[i, 0:2] = affine_transform(joints[i, 0:2],
+                                                          trans)
+
+        if 'joints_3d' in records:
+            pose3d = records['joints_3d']
+            if not rot == 0:
+                trans_3djoints = np.eye(3)
+                rot_rad = -rot * np.pi / 180
+                sn, cs = np.sin(rot_rad), np.cos(rot_rad)
+                trans_3djoints[0, :2] = [cs, -sn]
+                trans_3djoints[1, :2] = [sn, cs]
+                pose3d[:, :3] = np.einsum('ij,kj->ki', trans_3djoints,
+                                          pose3d[:, :3])
+            records['joints_3d'] = pose3d
+
+        records['image'] = image
+        if 'joints_2d' in records:
+            records['joints_2d'] = joints
+
+        return records
+
+
+@register_keypointop
+class NoiseJitter(object):
+    """apply NoiseJitter to image
+
+    Args:
+        noise_factor (float): the noise factor ratio used to generate the jitter
+
+    Returns:
+        records (dict): contain the image and coords after transformation
+
+    """
+
+    def __init__(self, noise_factor=0.4):
+        self.noise_factor = noise_factor
+
+    def __call__(self, records):
+        self.pn = np.random.uniform(1 - self.noise_factor,
+                                    1 + self.noise_factor, 3)
+        rgb_img = records['image']
+        rgb_img[:, :, 0] = np.minimum(
+            255.0, np.maximum(0.0, rgb_img[:, :, 0] * self.pn[0]))
+        rgb_img[:, :, 1] = np.minimum(
+            255.0, np.maximum(0.0, rgb_img[:, :, 1] * self.pn[1]))
+        rgb_img[:, :, 2] = np.minimum(
+            255.0, np.maximum(0.0, rgb_img[:, :, 2] * self.pn[2]))
+        records['image'] = rgb_img
+        return records
+
+
+@register_keypointop
+class FlipPose(object):
+    """randomly apply a horizontal flip to the image and pose
+
+    Args:
+        flip_prob (float): the probability of applying the flip
+        img_res (int): image resolution, used to mirror the x coordinates
+        num_joints (int): number of joints (14 or 24), which selects the
+            left-right permutation order
+
+    Returns:
+        records (dict): contain the image and coords after transformation
+
+    """
+
+    def __init__(self, flip_prob=0.5, img_res=224, num_joints=14):
+        self.flip_prob = flip_prob
+        self.img_res = img_res
+        if num_joints == 24:
+            self.perm = [
+                5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6, 12, 13, 14, 15, 16, 17,
+                18, 19, 21, 20, 23, 22
+            ]
+        elif num_joints == 14:
+            self.perm = [5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6, 12, 13]
+        else:
+            raise ValueError(
+                "unsupported num_joints in flip: {}".format(num_joints))
+
+    def __call__(self, records):
+
+        if np.random.random() < self.flip_prob:
+            img = records['image']
+            img = np.fliplr(img)
+
+            if 'joints_2d' in records:
+                joints_2d = records['joints_2d']
+                joints_2d = joints_2d[self.perm]
+                joints_2d[:, 0] = self.img_res - joints_2d[:, 0]
+                records['joints_2d'] = joints_2d
+
+            if 'joints_3d' in records:
+                joints_3d = records['joints_3d']
+                joints_3d = joints_3d[self.perm]
+                joints_3d[:, 0] = -joints_3d[:, 0]
+                records['joints_3d'] = joints_3d
+
+            records['image'] = img
+        return records
+
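For orientation, the single-pose ops above all mutate and return the same `records` dict, so they compose by simple chaining. The following is a minimal sketch of that flow on dummy data; the keys and shapes (a 14-joint sample at 224x224) are illustrative assumptions, and the classes are assumed to be taken from this module:

```python
import numpy as np

# Hypothetical record for a 14-joint sample at 224x224 resolution.
record = {
    'image': np.random.uniform(0, 255, (224, 224, 3)).astype(np.float32),
    'joints_2d': np.random.uniform(0, 224, (14, 2)).astype(np.float32),
    'joints_3d': np.random.randn(14, 3).astype(np.float32),
}

# Each op takes and returns the record dict, so a pipeline is a plain loop.
for op in [NoiseJitter(noise_factor=0.4),
           FlipPose(flip_prob=0.5, img_res=224, num_joints=14)]:
    record = op(record)

print(record['image'].shape)  # still (224, 224, 3)
```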
+@register_keypointop
+class TopDownEvalAffine(object):
+    """apply affine transform to image and coords
+
+    Args:
+        trainsize (list): [w, h], the standard size used to train
+        use_udp (bool): whether to use Unbiased Data Processing.
+        records (dict): the dict containing the image and coords
+
+    Returns:
+        records (dict): contain the image and coords after transformation
+
+    """
+
+    def __init__(self, trainsize, use_udp=False):
+        self.trainsize = trainsize
+        self.use_udp = use_udp
+
+    def __call__(self, records):
+        image = records['image']
+        rot = 0
+        imshape = records['im_shape'][::-1]
+        center = imshape / 2.
+        scale = imshape
+
+        if self.use_udp:
+            trans = get_warp_matrix(
+                rot, center * 2.0,
+                [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+        else:
+            trans = get_affine_transform(center, scale, rot, self.trainsize)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+        records['image'] = image
+
+        return records
+
+
+@register_keypointop
+class ToHeatmapsTopDown(object):
+    """generate the gaussian heatmaps of keypoints for the heatmap loss
+
+    Args:
+        hmsize (list): [w, h] output heatmap's size
+        sigma (float): the std of the generated gaussian kernel
+        records (dict): the dict containing the image and coords
+
+    Returns:
+        records (dict): contain the heatmaps used by the heatmap loss
+
+    """
+
+    def __init__(self, hmsize, sigma):
+        super(ToHeatmapsTopDown, self).__init__()
+        self.hmsize = np.array(hmsize)
+        self.sigma = sigma
+
+    def __call__(self, records):
+        """refer to
+        https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
+        Copyright (c) Microsoft, under the MIT License.
+ """ + joints = records['gt_joints'] + joints_vis = records['joints_vis'] + num_joints = joints.shape[0] + image_size = np.array( + [records['image'].shape[1], records['image'].shape[0]]) + target_weight = np.ones((num_joints, 1), dtype=np.float32) + target_weight[:, 0] = joints_vis[:, 0] + target = np.zeros( + (num_joints, self.hmsize[1], self.hmsize[0]), dtype=np.float32) + tmp_size = self.sigma * 3 + feat_stride = image_size / self.hmsize + for joint_id in range(num_joints): + mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5) + mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5) + # Check that any part of the gaussian is in-bounds + ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)] + br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)] + if ul[0] >= self.hmsize[0] or ul[1] >= self.hmsize[1] or br[ + 0] < 0 or br[1] < 0: + # If not, just return the image as is + target_weight[joint_id] = 0 + continue + # # Generate gaussian + size = 2 * tmp_size + 1 + x = np.arange(0, size, 1, np.float32) + y = x[:, np.newaxis] + x0 = y0 = size // 2 + # The gaussian is not normalized, we want the center value to equal 1 + g = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * self.sigma**2)) + + # Usable gaussian range + g_x = max(0, -ul[0]), min(br[0], self.hmsize[0]) - ul[0] + g_y = max(0, -ul[1]), min(br[1], self.hmsize[1]) - ul[1] + # Image range + img_x = max(0, ul[0]), min(br[0], self.hmsize[0]) + img_y = max(0, ul[1]), min(br[1], self.hmsize[1]) + + v = target_weight[joint_id] + if v > 0.5: + target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = g[g_y[ + 0]:g_y[1], g_x[0]:g_x[1]] + records['target'] = target + records['target_weight'] = target_weight + del records['gt_joints'], records['joints_vis'] + + return records + + +@register_keypointop +class ToHeatmapsTopDown_DARK(object): + """to generate the gaussin heatmaps of keypoint for heatmap loss + + Args: + hmsize (list): [w, h] output heatmap's size + sigma (float): the std of gaussin kernel genereted + records(dict): the dict contained the image and coords + + Returns: + records (dict): contain the heatmaps used to heatmaploss + + """ + + def __init__(self, hmsize, sigma): + super(ToHeatmapsTopDown_DARK, self).__init__() + self.hmsize = np.array(hmsize) + self.sigma = sigma + + def __call__(self, records): + joints = records['gt_joints'] + joints_vis = records['joints_vis'] + num_joints = joints.shape[0] + image_size = np.array( + [records['image'].shape[1], records['image'].shape[0]]) + target_weight = np.ones((num_joints, 1), dtype=np.float32) + target_weight[:, 0] = joints_vis[:, 0] + target = np.zeros( + (num_joints, self.hmsize[1], self.hmsize[0]), dtype=np.float32) + tmp_size = self.sigma * 3 + feat_stride = image_size / self.hmsize + for joint_id in range(num_joints): + mu_x = joints[joint_id][0] / feat_stride[0] + mu_y = joints[joint_id][1] / feat_stride[1] + # Check that any part of the gaussian is in-bounds + ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)] + br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)] + if ul[0] >= self.hmsize[0] or ul[1] >= self.hmsize[1] or br[ + 0] < 0 or br[1] < 0: + # If not, just return the image as is + target_weight[joint_id] = 0 + continue + + x = np.arange(0, self.hmsize[0], 1, np.float32) + y = np.arange(0, self.hmsize[1], 1, np.float32) + y = y[:, np.newaxis] + + v = target_weight[joint_id] + if v > 0.5: + target[joint_id] = np.exp(-( + (x - mu_x)**2 + (y - mu_y)**2) / (2 * self.sigma**2)) + records['target'] = target + records['target_weight'] = target_weight + del 
records['gt_joints'], records['joints_vis'] + + return records + + +@register_keypointop +class ToHeatmapsTopDown_UDP(object): + """This code is based on: + https://github.com/HuangJunJie2017/UDP-Pose/blob/master/deep-high-resolution-net.pytorch/lib/dataset/JointsDataset.py + + to generate the gaussian heatmaps of keypoint for heatmap loss. + ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing + for Human Pose Estimation (CVPR 2020). + + Args: + hmsize (list): [w, h] output heatmap's size + sigma (float): the std of gaussin kernel genereted + records(dict): the dict contained the image and coords + + Returns: + records (dict): contain the heatmaps used to heatmaploss + """ + + def __init__(self, hmsize, sigma): + super(ToHeatmapsTopDown_UDP, self).__init__() + self.hmsize = np.array(hmsize) + self.sigma = sigma + + def __call__(self, records): + joints = records['gt_joints'] + joints_vis = records['joints_vis'] + num_joints = joints.shape[0] + image_size = np.array( + [records['image'].shape[1], records['image'].shape[0]]) + target_weight = np.ones((num_joints, 1), dtype=np.float32) + target_weight[:, 0] = joints_vis[:, 0] + target = np.zeros( + (num_joints, self.hmsize[1], self.hmsize[0]), dtype=np.float32) + tmp_size = self.sigma * 3 + size = 2 * tmp_size + 1 + x = np.arange(0, size, 1, np.float32) + y = x[:, None] + feat_stride = (image_size - 1.0) / (self.hmsize - 1.0) + for joint_id in range(num_joints): + mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5) + mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5) + # Check that any part of the gaussian is in-bounds + ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)] + br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)] + if ul[0] >= self.hmsize[0] or ul[1] >= self.hmsize[1] or br[ + 0] < 0 or br[1] < 0: + # If not, just return the image as is + target_weight[joint_id] = 0 + continue + + mu_x_ac = joints[joint_id][0] / feat_stride[0] + mu_y_ac = joints[joint_id][1] / feat_stride[1] + x0 = y0 = size // 2 + x0 += mu_x_ac - mu_x + y0 += mu_y_ac - mu_y + g = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * self.sigma**2)) + # Usable gaussian range + g_x = max(0, -ul[0]), min(br[0], self.hmsize[0]) - ul[0] + g_y = max(0, -ul[1]), min(br[1], self.hmsize[1]) - ul[1] + # Image range + img_x = max(0, ul[0]), min(br[0], self.hmsize[0]) + img_y = max(0, ul[1]), min(br[1], self.hmsize[1]) + + v = target_weight[joint_id] + if v > 0.5: + target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = g[g_y[ + 0]:g_y[1], g_x[0]:g_x[1]] + records['target'] = target + records['target_weight'] = target_weight + del records['gt_joints'], records['joints_vis'] + + return records + + +from typing import Optional, Tuple, Union, List +import numbers + + +def _scale_size( + size: Tuple[int, int], + scale: Union[float, int, tuple], ) -> Tuple[int, int]: + """Rescale a size by a ratio. + + Args: + size (tuple[int]): (w, h). + scale (float | tuple(float)): Scaling factor. + + Returns: + tuple[int]: scaled size. + """ + if isinstance(scale, (float, int)): + scale = (scale, scale) + w, h = size + return int(w * float(scale[0]) + 0.5), int(h * float(scale[1]) + 0.5) + + +def rescale_size(old_size: tuple, + scale: Union[float, int, tuple], + return_scale: bool=False) -> tuple: + """Calculate the new size to be rescaled to. + + Args: + old_size (tuple[int]): The old size (w, h) of image. + scale (float | tuple[int]): The scaling factor or maximum size. 
+ If it is a float number, then the image will be rescaled by this + factor, else if it is a tuple of 2 integers, then the image will + be rescaled as large as possible within the scale. + return_scale (bool): Whether to return the scaling factor besides the + rescaled image size. + + Returns: + tuple[int]: The new rescaled image size. + """ + w, h = old_size + if isinstance(scale, (float, int)): + if scale <= 0: + raise ValueError(f'Invalid scale {scale}, must be positive.') + scale_factor = scale + elif isinstance(scale, list): + max_long_edge = max(scale) + max_short_edge = min(scale) + scale_factor = min(max_long_edge / max(h, w), + max_short_edge / min(h, w)) + else: + raise TypeError( + f'Scale must be a number or tuple of int, but got {type(scale)}') + + new_size = _scale_size((w, h), scale_factor) + + if return_scale: + return new_size, scale_factor + else: + return new_size + + +def imrescale(img: np.ndarray, + scale: Union[float, Tuple[int, int]], + return_scale: bool=False, + interpolation: str='bilinear', + backend: Optional[str]=None) -> Union[np.ndarray, Tuple[ + np.ndarray, float]]: + """Resize image while keeping the aspect ratio. + + Args: + img (ndarray): The input image. + scale (float | tuple[int]): The scaling factor or maximum size. + If it is a float number, then the image will be rescaled by this + factor, else if it is a tuple of 2 integers, then the image will + be rescaled as large as possible within the scale. + return_scale (bool): Whether to return the scaling factor besides the + rescaled image. + interpolation (str): Same as :func:`resize`. + backend (str | None): Same as :func:`resize`. + + Returns: + ndarray: The rescaled image. + """ + h, w = img.shape[:2] + new_size, scale_factor = rescale_size((w, h), scale, return_scale=True) + rescaled_img = imresize( + img, new_size, interpolation=interpolation, backend=backend) + if return_scale: + return rescaled_img, scale_factor + else: + return rescaled_img + + +def imresize( + img: np.ndarray, + size: Tuple[int, int], + return_scale: bool=False, + interpolation: str='bilinear', + out: Optional[np.ndarray]=None, + backend: Optional[str]=None, + interp=cv2.INTER_LINEAR, ) -> Union[Tuple[np.ndarray, float, float], + np.ndarray]: + """Resize image to a given size. + + Args: + img (ndarray): The input image. + size (tuple[int]): Target size (w, h). + return_scale (bool): Whether to return `w_scale` and `h_scale`. + interpolation (str): Interpolation method, accepted values are + "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' + backend, "nearest", "bilinear" for 'pillow' backend. + out (ndarray): The output destination. + backend (str | None): The image resize backend type. Options are `cv2`, + `pillow`, `None`. If backend is None, the global imread_backend + specified by ``mmcv.use_backend()`` will be used. Default: None. + + Returns: + tuple | ndarray: (`resized_img`, `w_scale`, `h_scale`) or + `resized_img`. + """ + h, w = img.shape[:2] + if backend is None: + backend = imread_backend + if backend not in ['cv2', 'pillow']: + raise ValueError(f'backend: {backend} is not supported for resize.' 
+ f"Supported backends are 'cv2', 'pillow'") + + if backend == 'pillow': + assert img.dtype == np.uint8, 'Pillow backend only support uint8 type' + pil_image = Image.fromarray(img) + pil_image = pil_image.resize(size, pillow_interp_codes[interpolation]) + resized_img = np.array(pil_image) + else: + resized_img = cv2.resize(img, size, dst=out, interpolation=interp) + if not return_scale: + return resized_img + else: + w_scale = size[0] / w + h_scale = size[1] / h + return resized_img, w_scale, h_scale + + +class PETR_Resize: + """Resize images & bbox & mask. + + This transform resizes the input image to some scale. Bboxes and masks are + then resized with the same scale factor. If the input dict contains the key + "scale", then the scale in the input dict is used, otherwise the specified + scale in the init method is used. If the input dict contains the key + "scale_factor" (if MultiScaleFlipAug does not give img_scale but + scale_factor), the actual scale will be computed by image shape and + scale_factor. + + `img_scale` can either be a tuple (single-scale) or a list of tuple + (multi-scale). There are 3 multiscale modes: + + - ``ratio_range is not None``: randomly sample a ratio from the ratio \ + range and multiply it with the image scale. + - ``ratio_range is None`` and ``multiscale_mode == "range"``: randomly \ + sample a scale from the multiscale range. + - ``ratio_range is None`` and ``multiscale_mode == "value"``: randomly \ + sample a scale from multiple scales. + + Args: + img_scale (tuple or list[tuple]): Images scales for resizing. + multiscale_mode (str): Either "range" or "value". + ratio_range (tuple[float]): (min_ratio, max_ratio) + keep_ratio (bool): Whether to keep the aspect ratio when resizing the + image. + bbox_clip_border (bool, optional): Whether to clip the objects outside + the border of the image. In some dataset like MOT17, the gt bboxes + are allowed to cross the border of images. Therefore, we don't + need to clip the gt bboxes in these cases. Defaults to True. + backend (str): Image resize backend, choices are 'cv2' and 'pillow'. + These two backends generates slightly different results. Defaults + to 'cv2'. + interpolation (str): Interpolation method, accepted values are + "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' + backend, "nearest", "bilinear" for 'pillow' backend. + override (bool, optional): Whether to override `scale` and + `scale_factor` so as to call resize twice. Default False. If True, + after the first resizing, the existed `scale` and `scale_factor` + will be ignored so the second resizing can be allowed. + This option is a work-around for multiple times of resize in DETR. + Defaults to False. 
+ """ + + def __init__(self, + img_scale=None, + multiscale_mode='range', + ratio_range=None, + keep_ratio=True, + bbox_clip_border=True, + backend='cv2', + interpolation='bilinear', + override=False, + keypoint_clip_border=True): + if img_scale is None: + self.img_scale = None + else: + if isinstance(img_scale, list): + self.img_scale = img_scale + else: + self.img_scale = [img_scale] + assert isinstance(self.img_scale, list) + + if ratio_range is not None: + # mode 1: given a scale and a range of image ratio + assert len(self.img_scale) == 1 + else: + # mode 2: given multiple scales or a range of scales + assert multiscale_mode in ['value', 'range'] + + self.backend = backend + self.multiscale_mode = multiscale_mode + self.ratio_range = ratio_range + self.keep_ratio = keep_ratio + # TODO: refactor the override option in Resize + self.interpolation = interpolation + self.override = override + self.bbox_clip_border = bbox_clip_border + self.keypoint_clip_border = keypoint_clip_border + + @staticmethod + def random_select(img_scales): + """Randomly select an img_scale from given candidates. + + Args: + img_scales (list[tuple]): Images scales for selection. + + Returns: + (tuple, int): Returns a tuple ``(img_scale, scale_dix)``, \ + where ``img_scale`` is the selected image scale and \ + ``scale_idx`` is the selected index in the given candidates. + """ + + assert isinstance(img_scales, list) + scale_idx = np.random.randint(len(img_scales)) + img_scale = img_scales[scale_idx] + return img_scale, scale_idx + + @staticmethod + def random_sample(img_scales): + """Randomly sample an img_scale when ``multiscale_mode=='range'``. + + Args: + img_scales (list[tuple]): Images scale range for sampling. + There must be two tuples in img_scales, which specify the lower + and upper bound of image scales. + + Returns: + (tuple, None): Returns a tuple ``(img_scale, None)``, where \ + ``img_scale`` is sampled scale and None is just a placeholder \ + to be consistent with :func:`random_select`. + """ + + assert isinstance(img_scales, list) and len(img_scales) == 2 + img_scale_long = [max(s) for s in img_scales] + img_scale_short = [min(s) for s in img_scales] + long_edge = np.random.randint( + min(img_scale_long), max(img_scale_long) + 1) + short_edge = np.random.randint( + min(img_scale_short), max(img_scale_short) + 1) + img_scale = (long_edge, short_edge) + return img_scale, None + + @staticmethod + def random_sample_ratio(img_scale, ratio_range): + """Randomly sample an img_scale when ``ratio_range`` is specified. + + A ratio will be randomly sampled from the range specified by + ``ratio_range``. Then it would be multiplied with ``img_scale`` to + generate sampled scale. + + Args: + img_scale (list): Images scale base to multiply with ratio. + ratio_range (tuple[float]): The minimum and maximum ratio to scale + the ``img_scale``. + + Returns: + (tuple, None): Returns a tuple ``(scale, None)``, where \ + ``scale`` is sampled ratio multiplied with ``img_scale`` and \ + None is just a placeholder to be consistent with \ + :func:`random_select`. + """ + + assert isinstance(img_scale, list) and len(img_scale) == 2 + min_ratio, max_ratio = ratio_range + assert min_ratio <= max_ratio + ratio = np.random.random_sample() * (max_ratio - min_ratio) + min_ratio + scale = int(img_scale[0] * ratio), int(img_scale[1] * ratio) + return scale, None + + def _random_scale(self, results): + """Randomly sample an img_scale according to ``ratio_range`` and + ``multiscale_mode``. 
+ + If ``ratio_range`` is specified, a ratio will be sampled and be + multiplied with ``img_scale``. + If multiple scales are specified by ``img_scale``, a scale will be + sampled according to ``multiscale_mode``. + Otherwise, single scale will be used. + + Args: + results (dict): Result dict from :obj:`dataset`. + + Returns: + dict: Two new keys 'scale` and 'scale_idx` are added into \ + ``results``, which would be used by subsequent pipelines. + """ + + if self.ratio_range is not None: + scale, scale_idx = self.random_sample_ratio(self.img_scale[0], + self.ratio_range) + elif len(self.img_scale) == 1: + scale, scale_idx = self.img_scale[0], 0 + elif self.multiscale_mode == 'range': + scale, scale_idx = self.random_sample(self.img_scale) + elif self.multiscale_mode == 'value': + scale, scale_idx = self.random_select(self.img_scale) + else: + raise NotImplementedError + results['scale'] = scale + results['scale_idx'] = scale_idx + + def _resize_img(self, results): + """Resize images with ``results['scale']``.""" + for key in ['image'] if 'image' in results else []: + if self.keep_ratio: + img, scale_factor = imrescale( + results[key], + results['scale'], + return_scale=True, + interpolation=self.interpolation, + backend=self.backend) + # the w_scale and h_scale has minor difference + # a real fix should be done in the imrescale in the future + new_h, new_w = img.shape[:2] + h, w = results[key].shape[:2] + w_scale = new_w / w + h_scale = new_h / h + else: + img, w_scale, h_scale = imresize( + results[key], + results['scale'], + return_scale=True, + interpolation=self.interpolation, + backend=self.backend) + + scale_factor = np.array( + [w_scale, h_scale, w_scale, h_scale], dtype=np.float32) + results['im_shape'] = np.array(img.shape) + # in case that there is no padding + results['pad_shape'] = img.shape + results['scale_factor'] = scale_factor + results['keep_ratio'] = self.keep_ratio + # img_pad = self.impad(img, shape=results['scale']) + results[key] = img + + def _resize_bboxes(self, results): + """Resize bounding boxes with ``results['scale_factor']``.""" + for key in ['gt_bbox'] if 'gt_bbox' in results else []: + bboxes = results[key] * results['scale_factor'] + if self.bbox_clip_border: + img_shape = results['im_shape'] + bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, img_shape[1]) + bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, img_shape[0]) + results[key] = bboxes + + def _resize_masks(self, results): + """Resize masks with ``results['scale']``""" + for key in ['mask'] if 'mask' in results else []: + if results[key] is None: + continue + if self.keep_ratio: + results[key] = results[key].rescale(results['scale']) + else: + results[key] = results[key].resize(results['im_shape'][:2]) + + def _resize_seg(self, results): + """Resize semantic segmentation map with ``results['scale']``.""" + for key in ['seg'] if 'seg' in results else []: + if self.keep_ratio: + gt_seg = imrescale( + results[key], + results['scale'], + interpolation='nearest', + backend=self.backend) + else: + gt_seg = imresize( + results[key], + results['scale'], + interpolation='nearest', + backend=self.backend) + results[key] = gt_seg + + def _resize_keypoints(self, results): + """Resize keypoints with ``results['scale_factor']``.""" + for key in ['gt_joints'] if 'gt_joints' in results else []: + keypoints = results[key].copy() + keypoints[..., 0] = keypoints[..., 0] * results['scale_factor'][0] + keypoints[..., 1] = keypoints[..., 1] * results['scale_factor'][1] + if self.keypoint_clip_border: + img_shape = 
results['im_shape']
+                keypoints[..., 0] = np.clip(keypoints[..., 0], 0, img_shape[1])
+                keypoints[..., 1] = np.clip(keypoints[..., 1], 0, img_shape[0])
+            results[key] = keypoints
+
+    def _resize_areas(self, results):
+        """Resize mask areas with ``results['scale_factor']``."""
+        for key in ['gt_areas'] if 'gt_areas' in results else []:
+            areas = results[key].copy()
+            areas = areas * results['scale_factor'][0] * results[
+                'scale_factor'][1]
+            results[key] = areas
+
+    def __call__(self, results):
+        """Call function to resize images, bounding boxes, masks, semantic
+        segmentation map.
+
+        Args:
+            results (dict): Result dict from loading pipeline.
+
+        Returns:
+            dict: Resized results, 'im_shape', 'pad_shape', 'scale_factor', \
+                'keep_ratio' keys are added into result dict.
+        """
+        if 'scale' not in results:
+            if 'scale_factor' in results:
+                img_shape = results['image'].shape[:2]
+                scale_factor = results['scale_factor'][0]
+                # assert isinstance(scale_factor, float)
+                results['scale'] = [int(x * scale_factor)
+                                    for x in img_shape][::-1]
+            else:
+                self._random_scale(results)
+        else:
+            if not self.override:
+                assert 'scale_factor' not in results, (
+                    'scale and scale_factor cannot be both set.')
+            else:
+                results.pop('scale')
+                if 'scale_factor' in results:
+                    results.pop('scale_factor')
+                self._random_scale(results)
+
+        self._resize_img(results)
+        self._resize_bboxes(results)
+        self._resize_masks(results)
+        self._resize_seg(results)
+        self._resize_keypoints(results)
+        self._resize_areas(results)
+        return results
+
+    def __repr__(self):
+        repr_str = self.__class__.__name__
+        repr_str += f'(img_scale={self.img_scale}, '
+        repr_str += f'multiscale_mode={self.multiscale_mode}, '
+        repr_str += f'ratio_range={self.ratio_range}, '
+        repr_str += f'keep_ratio={self.keep_ratio}, '
+        repr_str += f'bbox_clip_border={self.bbox_clip_border}, '
+        repr_str += f'keypoint_clip_border={self.keypoint_clip_border})'
+        return repr_str
diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/keypoints_3d_operators.py b/PaddleDetection-release-2.6/ppdet/data/transform/keypoints_3d_operators.py
new file mode 100644
index 0000000000000000000000000000000000000000..13337bc320f7fe213cccc65fb78a4844bd1df6b1
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/transform/keypoints_3d_operators.py
@@ -0,0 +1,296 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
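+
+# This module implements data transforms for multi-frame 3D keypoint samples:
+# cropping and flipping image sequences (CropAndFlipImages), permuting them to
+# NCHW layout (PermuteImages), and random flip / half-body / occlusion
+# augmentation over 3D joints (RandomFlipHalfBody3DTransformImages).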
+ +from __future__ import absolute_import + +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence +import cv2 +import numpy as np +import math +import copy +import random +import uuid +from numbers import Number, Integral + +from ...modeling.keypoint_utils import get_affine_mat_kernel, warp_affine_joints, get_affine_transform, affine_transform, get_warp_matrix +from ppdet.core.workspace import serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +registered_ops = [] + +__all__ = [ + 'CropAndFlipImages', 'PermuteImages', 'RandomFlipHalfBody3DTransformImages' +] + +import matplotlib.pyplot as plt +from PIL import Image, ImageDraw +from mpl_toolkits.mplot3d import Axes3D + + +def register_keypointop(cls): + return serializable(cls) + + +def register_op(cls): + registered_ops.append(cls.__name__) + if not hasattr(BaseOperator, cls.__name__): + setattr(BaseOperator, cls.__name__, cls) + else: + raise KeyError("The {} class has been registered.".format(cls.__name__)) + return serializable(cls) + + +class BaseOperator(object): + def __init__(self, name=None): + if name is None: + name = self.__class__.__name__ + self._id = name + '_' + str(uuid.uuid4())[-6:] + + def apply(self, sample, context=None): + """ Process a sample. + Args: + sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx} + context (dict): info about this sample processing + Returns: + result (dict): a processed sample + """ + return sample + + def __call__(self, sample, context=None): + """ Process a sample. + Args: + sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx} + context (dict): info about this sample processing + Returns: + result (dict): a processed sample + """ + if isinstance(sample, Sequence): # for batch_size + for i in range(len(sample)): + sample[i] = self.apply(sample[i], context) + else: + # image.shape changed + sample = self.apply(sample, context) + return sample + + def __str__(self): + return str(self._id) + + +@register_keypointop +class CropAndFlipImages(object): + """Crop all images""" + + def __init__(self, crop_range, flip_pairs=None): + super(CropAndFlipImages, self).__init__() + self.crop_range = crop_range + self.flip_pairs = flip_pairs + + def __call__(self, records): # tuple + images = records["image"] + images = images[:, :, ::-1, :] + images = images[:, :, self.crop_range[0]:self.crop_range[1]] + records["image"] = images + + if "kps2d" in records.keys(): + kps2d = records["kps2d"] + + width, height = images.shape[2], images.shape[1] + kps2d = np.array(kps2d) + kps2d[:, :, 0] = kps2d[:, :, 0] - self.crop_range[0] + + for pair in self.flip_pairs: + kps2d[:, pair[0], :], kps2d[:,pair[1], :] = \ + kps2d[:,pair[1], :], kps2d[:,pair[0], :].copy() + + records["kps2d"] = kps2d + + return records + + +@register_op +class PermuteImages(BaseOperator): + def __init__(self): + """ + Change the channel to be (batch_size, C, H, W) #(6, 3, 1080, 1920) + """ + super(PermuteImages, self).__init__() + + def apply(self, sample, context=None): + images = sample["image"] + images = images.transpose((0, 3, 1, 2)) + + sample["image"] = images + + return sample + + +@register_keypointop +class RandomFlipHalfBody3DTransformImages(object): + """apply data augment to images and coords + to achieve the flip, scale, rotate and half body transform effect for training image + Args: + trainsize (list):[w, h], Image target size + upper_body_ids (list): The upper body joint ids + flip_pairs (list): The left-right joints 
exchange order list + pixel_std (int): The pixel std of the scale + scale (float): The scale factor to transform the image + rot (int): The rotate factor to transform the image + num_joints_half_body (int): The joints threshold of the half body transform + prob_half_body (float): The threshold of the half body transform + flip (bool): Whether to flip the image + Returns: + records(dict): contain the image and coords after tranformed + """ + + def __init__(self, + trainsize, + upper_body_ids, + flip_pairs, + pixel_std, + scale=0.35, + rot=40, + num_joints_half_body=8, + prob_half_body=0.3, + flip=True, + rot_prob=0.6, + do_occlusion=False): + super(RandomFlipHalfBody3DTransformImages, self).__init__() + self.trainsize = trainsize + self.upper_body_ids = upper_body_ids + self.flip_pairs = flip_pairs + self.pixel_std = pixel_std + self.scale = scale + self.rot = rot + self.num_joints_half_body = num_joints_half_body + self.prob_half_body = prob_half_body + self.flip = flip + self.aspect_ratio = trainsize[0] * 1.0 / trainsize[1] + self.rot_prob = rot_prob + self.do_occlusion = do_occlusion + + def halfbody_transform(self, joints, joints_vis): + upper_joints = [] + lower_joints = [] + for joint_id in range(joints.shape[0]): + if joints_vis[joint_id][0] > 0: + if joint_id in self.upper_body_ids: + upper_joints.append(joints[joint_id]) + else: + lower_joints.append(joints[joint_id]) + if np.random.randn() < 0.5 and len(upper_joints) > 2: + selected_joints = upper_joints + else: + selected_joints = lower_joints if len( + lower_joints) > 2 else upper_joints + if len(selected_joints) < 2: + return None, None + selected_joints = np.array(selected_joints, dtype=np.float32) + center = selected_joints.mean(axis=0)[:2] + left_top = np.amin(selected_joints, axis=0) + right_bottom = np.amax(selected_joints, axis=0) + w = right_bottom[0] - left_top[0] + h = right_bottom[1] - left_top[1] + if w > self.aspect_ratio * h: + h = w * 1.0 / self.aspect_ratio + elif w < self.aspect_ratio * h: + w = h * self.aspect_ratio + scale = np.array( + [w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std], + dtype=np.float32) + scale = scale * 1.5 + + return center, scale + + def flip_joints(self, joints, joints_vis, width, matched_parts, kps2d=None): + # joints: (6, 24, 3),(num_frames, num_joints, 3) + + joints[:, :, 0] = width - joints[:, :, 0] - 1 # x + if kps2d is not None: + kps2d[:, :, 0] = width - kps2d[:, :, 0] - 1 + + for pair in matched_parts: + joints[:, pair[0], :], joints[:,pair[1], :] = \ + joints[:,pair[1], :], joints[:,pair[0], :].copy() + + joints_vis[:,pair[0], :], joints_vis[:,pair[1], :] = \ + joints_vis[:,pair[1], :], joints_vis[:,pair[0], :].copy() + + if kps2d is not None: + kps2d[:, pair[0], :], kps2d[:,pair[1], :] = \ + kps2d[:,pair[1], :], kps2d[:,pair[0], :].copy() + + # move to zero + joints -= joints[:, [0], :] # (batch_size, 24, 3),numpy.ndarray + + return joints, joints_vis, kps2d + + def __call__(self, records): + images = records[ + 'image'] #kps3d, kps3d_vis, images. 
images.shape(num_frames, width, height, 3)
+
+        joints = records['kps3d']
+        joints_vis = records['kps3d_vis']
+
+        kps2d = None
+        if 'kps2d' in records.keys():
+            kps2d = records['kps2d']
+
+        if self.flip and np.random.random() <= 0.5:
+            # horizontally flip the images, e.g. (6, 1080, 810, 3)
+            images = images[:, :, ::-1, :]
+            # mirror the keypoints left-right to match
+            joints, joints_vis, kps2d = self.flip_joints(
+                joints, joints_vis, images.shape[2], self.flip_pairs,
+                kps2d)
+        occlusion = False
+        if self.do_occlusion and random.random() <= 0.5:  # random occlusion
+            height = images[0].shape[0]
+            width = images[0].shape[1]
+            occlusion = True
+            while True:
+                area_min = 0.0
+                area_max = 0.2
+                synth_area = (random.random() *
+                              (area_max - area_min) + area_min) * width * height
+
+                ratio_min = 0.3
+                ratio_max = 1 / 0.3
+                synth_ratio = (random.random() *
+                               (ratio_max - ratio_min) + ratio_min)
+
+                synth_h = math.sqrt(synth_area * synth_ratio)
+                synth_w = math.sqrt(synth_area / synth_ratio)
+                synth_xmin = random.random() * (width - synth_w - 1)
+                synth_ymin = random.random() * (height - synth_h - 1)
+
+                if synth_xmin >= 0 and synth_ymin >= 0 and synth_xmin + synth_w < width and synth_ymin + synth_h < height:
+                    xmin = int(synth_xmin)
+                    ymin = int(synth_ymin)
+                    w = int(synth_w)
+                    h = int(synth_h)
+
+                    mask = np.random.rand(h, w, 3) * 255
+                    images[:, ymin:ymin + h, xmin:xmin + w, :] = mask[
+                        None, :, :, :]
+                    break
+
+        records['image'] = images
+        records['kps3d'] = joints
+        records['kps3d_vis'] = joints_vis
+        if kps2d is not None:
+            records['kps2d'] = kps2d
+
+        return records
diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/mot_operators.py b/PaddleDetection-release-2.6/ppdet/data/transform/mot_operators.py
new file mode 100644
index 0000000000000000000000000000000000000000..e533ea3dc186a1b5cae4ee221920839848f387b6
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/data/transform/mot_operators.py
@@ -0,0 +1,627 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
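+
+# This module implements multi-object tracking (MOT) data transforms: RGB/BGR
+# reversal (RGBReverse), letterbox resizing (LetterBoxResize), random affine
+# augmentation (MOTRandomAffine), and training-target generation for the JDE
+# and FairMOT trackers (Gt2JDETargetThres, Gt2JDETargetMax, Gt2FairMOTTarget).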
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence +from numbers import Integral + +import cv2 +import copy +import numpy as np +import random +import math + +from .operators import BaseOperator, register_op +from .batch_operators import Gt2TTFTarget +from ppdet.modeling.bbox_utils import bbox_iou_np_expand +from ppdet.utils.logger import setup_logger +from .op_helper import gaussian_radius +logger = setup_logger(__name__) + +__all__ = [ + 'RGBReverse', 'LetterBoxResize', 'MOTRandomAffine', 'Gt2JDETargetThres', + 'Gt2JDETargetMax', 'Gt2FairMOTTarget' +] + + +@register_op +class RGBReverse(BaseOperator): + """RGB to BGR, or BGR to RGB, sensitive to MOTRandomAffine + """ + + def __init__(self): + super(RGBReverse, self).__init__() + + def apply(self, sample, context=None): + im = sample['image'] + sample['image'] = np.ascontiguousarray(im[:, :, ::-1]) + return sample + + +@register_op +class LetterBoxResize(BaseOperator): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. + """ + super(LetterBoxResize, self).__init__() + if not isinstance(target_size, (Integral, Sequence)): + raise TypeError( + "Type of target_size is invalid. Must be Integer or List or Tuple, now is {}". + format(type(target_size))) + if isinstance(target_size, Integral): + target_size = [target_size, target_size] + self.target_size = target_size + + def apply_image(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def apply_bbox(self, bbox0, h, w, ratio, padw, padh): + bboxes = bbox0.copy() + bboxes[:, 0] = ratio * w * (bbox0[:, 0] - bbox0[:, 2] / 2) + padw + bboxes[:, 1] = ratio * h * (bbox0[:, 1] - bbox0[:, 3] / 2) + padh + bboxes[:, 2] = ratio * w * (bbox0[:, 0] + bbox0[:, 2] / 2) + padw + bboxes[:, 3] = ratio * h * (bbox0[:, 1] + bbox0[:, 3] / 2) + padh + return bboxes + + def apply(self, sample, context=None): + """ Resize the image numpy. 
+ """ + im = sample['image'] + h, w = sample['im_shape'] + if not isinstance(im, np.ndarray): + raise TypeError("{}: image type is not numpy.".format(self)) + if len(im.shape) != 3: + from PIL import UnidentifiedImageError + raise UnidentifiedImageError( + '{}: image is not 3-dimensional.'.format(self)) + + # apply image + height, width = self.target_size + img, ratio, padw, padh = self.apply_image( + im, height=height, width=width) + + sample['image'] = img + new_shape = (round(h * ratio), round(w * ratio)) + sample['im_shape'] = np.asarray(new_shape, dtype=np.float32) + sample['scale_factor'] = np.asarray([ratio, ratio], dtype=np.float32) + + # apply bbox + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + sample['gt_bbox'] = self.apply_bbox(sample['gt_bbox'], h, w, ratio, + padw, padh) + return sample + + +@register_op +class MOTRandomAffine(BaseOperator): + """ + Affine transform to image and coords to achieve the rotate, scale and + shift effect for training image. + + Args: + degrees (list[2]): the rotate range to apply, transform range is [min, max] + translate (list[2]): the translate range to apply, transform range is [min, max] + scale (list[2]): the scale range to apply, transform range is [min, max] + shear (list[2]): the shear range to apply, transform range is [min, max] + borderValue (list[3]): value used in case of a constant border when appling + the perspective transformation + reject_outside (bool): reject warped bounding bboxes outside of image + + Returns: + records(dict): contain the image and coords after tranformed + + """ + + def __init__(self, + degrees=(-5, 5), + translate=(0.10, 0.10), + scale=(0.50, 1.20), + shear=(-2, 2), + borderValue=(127.5, 127.5, 127.5), + reject_outside=True): + super(MOTRandomAffine, self).__init__() + self.degrees = degrees + self.translate = translate + self.scale = scale + self.shear = shear + self.borderValue = borderValue + self.reject_outside = reject_outside + + def apply(self, sample, context=None): + # https://medium.com/uruvideo/dataset-augmentation-with-random-homographies-a8f4b44830d4 + border = 0 # width of added border (optional) + + img = sample['image'] + height, width = img.shape[0], img.shape[1] + + # Rotation and Scale + R = np.eye(3) + a = random.random() * (self.degrees[1] - self.degrees[0] + ) + self.degrees[0] + s = random.random() * (self.scale[1] - self.scale[0]) + self.scale[0] + R[:2] = cv2.getRotationMatrix2D( + angle=a, center=(width / 2, height / 2), scale=s) + + # Translation + T = np.eye(3) + T[0, 2] = ( + random.random() * 2 - 1 + ) * self.translate[0] * height + border # x translation (pixels) + T[1, 2] = ( + random.random() * 2 - 1 + ) * self.translate[1] * width + border # y translation (pixels) + + # Shear + S = np.eye(3) + S[0, 1] = math.tan((random.random() * + (self.shear[1] - self.shear[0]) + self.shear[0]) * + math.pi / 180) # x shear (deg) + S[1, 0] = math.tan((random.random() * + (self.shear[1] - self.shear[0]) + self.shear[0]) * + math.pi / 180) # y shear (deg) + + M = S @T @R # Combined rotation matrix. ORDER IS IMPORTANT HERE!! 
+ imw = cv2.warpPerspective( + img, + M, + dsize=(width, height), + flags=cv2.INTER_LINEAR, + borderValue=self.borderValue) # BGR order borderValue + + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + targets = sample['gt_bbox'] + n = targets.shape[0] + points = targets.copy() + area0 = (points[:, 2] - points[:, 0]) * ( + points[:, 3] - points[:, 1]) + + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = points[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + n * 4, 2) # x1y1, x2y2, x1y2, x2y1 + xy = (xy @M.T)[:, :2].reshape(n, 8) + + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate( + (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + + # apply angle-based reduction + radians = a * math.pi / 180 + reduction = max(abs(math.sin(radians)), abs(math.cos(radians)))**0.5 + x = (xy[:, 2] + xy[:, 0]) / 2 + y = (xy[:, 3] + xy[:, 1]) / 2 + w = (xy[:, 2] - xy[:, 0]) * reduction + h = (xy[:, 3] - xy[:, 1]) * reduction + xy = np.concatenate( + (x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T + + # reject warped points outside of image + if self.reject_outside: + np.clip(xy[:, 0], 0, width, out=xy[:, 0]) + np.clip(xy[:, 2], 0, width, out=xy[:, 2]) + np.clip(xy[:, 1], 0, height, out=xy[:, 1]) + np.clip(xy[:, 3], 0, height, out=xy[:, 3]) + w = xy[:, 2] - xy[:, 0] + h = xy[:, 3] - xy[:, 1] + area = w * h + ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16)) + i = (w > 4) & (h > 4) & (area / (area0 + 1e-16) > 0.1) & (ar < 10) + + if sum(i) > 0: + sample['gt_bbox'] = xy[i].astype(sample['gt_bbox'].dtype) + sample['gt_class'] = sample['gt_class'][i] + if 'difficult' in sample: + sample['difficult'] = sample['difficult'][i] + if 'gt_ide' in sample: + sample['gt_ide'] = sample['gt_ide'][i] + if 'is_crowd' in sample: + sample['is_crowd'] = sample['is_crowd'][i] + sample['image'] = imw + return sample + else: + return sample + + +@register_op +class Gt2JDETargetThres(BaseOperator): + __shared__ = ['num_classes'] + """ + Generate JDE targets by groud truth data when training + Args: + anchors (list): anchors of JDE model + anchor_masks (list): anchor_masks of JDE model + downsample_ratios (list): downsample ratios of JDE model + ide_thresh (float): thresh of identity, higher is groud truth + fg_thresh (float): thresh of foreground, higher is foreground + bg_thresh (float): thresh of background, lower is background + num_classes (int): number of classes + """ + + def __init__(self, + anchors, + anchor_masks, + downsample_ratios, + ide_thresh=0.5, + fg_thresh=0.5, + bg_thresh=0.4, + num_classes=1): + super(Gt2JDETargetThres, self).__init__() + self.anchors = anchors + self.anchor_masks = anchor_masks + self.downsample_ratios = downsample_ratios + self.ide_thresh = ide_thresh + self.fg_thresh = fg_thresh + self.bg_thresh = bg_thresh + self.num_classes = num_classes + + def generate_anchor(self, nGh, nGw, anchor_hw): + nA = len(anchor_hw) + yy, xx = np.meshgrid(np.arange(nGh), np.arange(nGw)) + + mesh = np.stack([xx.T, yy.T], axis=0) # [2, nGh, nGw] + mesh = np.repeat(mesh[None, :], nA, axis=0) # [nA, 2, nGh, nGw] + + anchor_offset_mesh = anchor_hw[:, :, None][:, :, :, None] + anchor_offset_mesh = np.repeat(anchor_offset_mesh, nGh, axis=-2) + anchor_offset_mesh = np.repeat(anchor_offset_mesh, nGw, axis=-1) + + anchor_mesh = np.concatenate( + [mesh, anchor_offset_mesh], axis=1) # [nA, 4, nGh, nGw] + return anchor_mesh + + def encode_delta(self, gt_box_list, fg_anchor_list): + px, py, pw, ph = fg_anchor_list[:, 0], fg_anchor_list[:,1], \ + 
fg_anchor_list[:, 2], fg_anchor_list[:,3] + gx, gy, gw, gh = gt_box_list[:, 0], gt_box_list[:, 1], \ + gt_box_list[:, 2], gt_box_list[:, 3] + dx = (gx - px) / pw + dy = (gy - py) / ph + dw = np.log(gw / pw) + dh = np.log(gh / ph) + return np.stack([dx, dy, dw, dh], axis=1) + + def pad_box(self, sample, num_max): + assert 'gt_bbox' in sample + bbox = sample['gt_bbox'] + gt_num = len(bbox) + pad_bbox = np.zeros((num_max, 4), dtype=np.float32) + if gt_num > 0: + pad_bbox[:gt_num, :] = bbox[:gt_num, :] + sample['gt_bbox'] = pad_bbox + if 'gt_score' in sample: + pad_score = np.zeros((num_max, ), dtype=np.float32) + if gt_num > 0: + pad_score[:gt_num] = sample['gt_score'][:gt_num, 0] + sample['gt_score'] = pad_score + if 'difficult' in sample: + pad_diff = np.zeros((num_max, ), dtype=np.int32) + if gt_num > 0: + pad_diff[:gt_num] = sample['difficult'][:gt_num, 0] + sample['difficult'] = pad_diff + if 'is_crowd' in sample: + pad_crowd = np.zeros((num_max, ), dtype=np.int32) + if gt_num > 0: + pad_crowd[:gt_num] = sample['is_crowd'][:gt_num, 0] + sample['is_crowd'] = pad_crowd + if 'gt_ide' in sample: + pad_ide = np.zeros((num_max, ), dtype=np.int32) + if gt_num > 0: + pad_ide[:gt_num] = sample['gt_ide'][:gt_num, 0] + sample['gt_ide'] = pad_ide + return sample + + def __call__(self, samples, context=None): + assert len(self.anchor_masks) == len(self.downsample_ratios), \ + "anchor_masks', and 'downsample_ratios' should have same length." + h, w = samples[0]['image'].shape[1:3] + + num_max = 0 + for sample in samples: + num_max = max(num_max, len(sample['gt_bbox'])) + + for sample in samples: + gt_bbox = sample['gt_bbox'] + gt_ide = sample['gt_ide'] + for i, (anchor_hw, downsample_ratio + ) in enumerate(zip(self.anchors, self.downsample_ratios)): + anchor_hw = np.array( + anchor_hw, dtype=np.float32) / downsample_ratio + nA = len(anchor_hw) + nGh, nGw = int(h / downsample_ratio), int(w / downsample_ratio) + tbox = np.zeros((nA, nGh, nGw, 4), dtype=np.float32) + tconf = np.zeros((nA, nGh, nGw), dtype=np.float32) + tid = -np.ones((nA, nGh, nGw, 1), dtype=np.float32) + + gxy, gwh = gt_bbox[:, 0:2].copy(), gt_bbox[:, 2:4].copy() + gxy[:, 0] = gxy[:, 0] * nGw + gxy[:, 1] = gxy[:, 1] * nGh + gwh[:, 0] = gwh[:, 0] * nGw + gwh[:, 1] = gwh[:, 1] * nGh + gxy[:, 0] = np.clip(gxy[:, 0], 0, nGw - 1) + gxy[:, 1] = np.clip(gxy[:, 1], 0, nGh - 1) + tboxes = np.concatenate([gxy, gwh], axis=1) + + anchor_mesh = self.generate_anchor(nGh, nGw, anchor_hw) + + anchor_list = np.transpose(anchor_mesh, + (0, 2, 3, 1)).reshape(-1, 4) + iou_pdist = bbox_iou_np_expand( + anchor_list, tboxes, x1y1x2y2=False) + + iou_max = np.max(iou_pdist, axis=1) + max_gt_index = np.argmax(iou_pdist, axis=1) + + iou_map = iou_max.reshape(nA, nGh, nGw) + gt_index_map = max_gt_index.reshape(nA, nGh, nGw) + + id_index = iou_map > self.ide_thresh + fg_index = iou_map > self.fg_thresh + bg_index = iou_map < self.bg_thresh + ign_index = (iou_map < self.fg_thresh) * ( + iou_map > self.bg_thresh) + tconf[fg_index] = 1 + tconf[bg_index] = 0 + tconf[ign_index] = -1 + + gt_index = gt_index_map[fg_index] + gt_box_list = tboxes[gt_index] + gt_id_list = gt_ide[gt_index_map[id_index]] + + if np.sum(fg_index) > 0: + tid[id_index] = gt_id_list + + fg_anchor_list = anchor_list.reshape(nA, nGh, nGw, + 4)[fg_index] + delta_target = self.encode_delta(gt_box_list, + fg_anchor_list) + tbox[fg_index] = delta_target + + sample['tbox{}'.format(i)] = tbox + sample['tconf{}'.format(i)] = tconf + sample['tide{}'.format(i)] = tid + sample.pop('gt_class') + sample = 
self.pad_box(sample, num_max)
+        return samples
+
+
+@register_op
+class Gt2JDETargetMax(BaseOperator):
+    __shared__ = ['num_classes']
+    """
+    Generate JDE targets by ground truth data when evaluating
+    Args:
+        anchors (list): anchors of JDE model
+        anchor_masks (list): anchor_masks of JDE model
+        downsample_ratios (list): downsample ratios of JDE model
+        max_iou_thresh (float): iou thresh for high quality anchor
+        num_classes (int): number of classes
+    """
+
+    def __init__(self,
+                 anchors,
+                 anchor_masks,
+                 downsample_ratios,
+                 max_iou_thresh=0.60,
+                 num_classes=1):
+        super(Gt2JDETargetMax, self).__init__()
+        self.anchors = anchors
+        self.anchor_masks = anchor_masks
+        self.downsample_ratios = downsample_ratios
+        self.max_iou_thresh = max_iou_thresh
+        self.num_classes = num_classes
+
+    def __call__(self, samples, context=None):
+        assert len(self.anchor_masks) == len(self.downsample_ratios), \
+            "'anchor_masks' and 'downsample_ratios' should have same length."
+        h, w = samples[0]['image'].shape[1:3]
+        for sample in samples:
+            gt_bbox = sample['gt_bbox']
+            gt_ide = sample['gt_ide']
+            for i, (anchor_hw, downsample_ratio
+                    ) in enumerate(zip(self.anchors, self.downsample_ratios)):
+                anchor_hw = np.array(
+                    anchor_hw, dtype=np.float32) / downsample_ratio
+                nA = len(anchor_hw)
+                nGh, nGw = int(h / downsample_ratio), int(w / downsample_ratio)
+                tbox = np.zeros((nA, nGh, nGw, 4), dtype=np.float32)
+                tconf = np.zeros((nA, nGh, nGw), dtype=np.float32)
+                tid = -np.ones((nA, nGh, nGw, 1), dtype=np.float32)
+
+                gxy, gwh = gt_bbox[:, 0:2].copy(), gt_bbox[:, 2:4].copy()
+                gxy[:, 0] = gxy[:, 0] * nGw
+                gxy[:, 1] = gxy[:, 1] * nGh
+                gwh[:, 0] = gwh[:, 0] * nGw
+                gwh[:, 1] = gwh[:, 1] * nGh
+                gi = np.clip(gxy[:, 0], 0, nGw - 1).astype(int)
+                gj = np.clip(gxy[:, 1], 0, nGh - 1).astype(int)
+
+                # iou of targets-anchors (using wh only)
+                box1 = gwh
+                box2 = anchor_hw[:, None, :]
+                inter_area = np.minimum(box1, box2).prod(2)
+                iou = inter_area / (
+                    box1.prod(1) + box2.prod(2) - inter_area + 1e-16)
+
+                # Select best iou_pred and anchor
+                iou_best = iou.max(0)  # best anchor [0-2] for each target
+                a = np.argmax(iou, axis=0)
+
+                # Select best unique target-anchor combinations
+                iou_order = np.argsort(-iou_best)  # best to worst
+
+                # Unique anchor selection
+                u = np.stack((gi, gj, a), 0)[:, iou_order]
+                _, first_unique = np.unique(u, axis=1, return_index=True)
+                mask = iou_order[first_unique]
+                # best anchor must share significant commonality (iou) with target
+                # TODO: examine arbitrary threshold
+                idx = mask[iou_best[mask] > self.max_iou_thresh]
+
+                if len(idx) > 0:
+                    a_i, gj_i, gi_i = a[idx], gj[idx], gi[idx]
+                    t_box = gt_bbox[idx]
+                    t_id = gt_ide[idx]
+                    if len(t_box.shape) == 1:
+                        t_box = t_box.reshape(1, 4)
+
+                    gxy, gwh = t_box[:, 0:2].copy(), t_box[:, 2:4].copy()
+                    gxy[:, 0] = gxy[:, 0] * nGw
+                    gxy[:, 1] = gxy[:, 1] * nGh
+                    gwh[:, 0] = gwh[:, 0] * nGw
+                    gwh[:, 1] = gwh[:, 1] * nGh
+
+                    # XY coordinates
+                    tbox[:, :, :, 0:2][a_i, gj_i, gi_i] = gxy - gxy.astype(int)
+                    # Width and height in yolo method
+                    tbox[:, :, :, 2:4][a_i, gj_i, gi_i] = np.log(
+                        gwh / anchor_hw[a_i])
+                    tconf[a_i, gj_i, gi_i] = 1
+                    tid[a_i, gj_i, gi_i] = t_id
+
+                sample['tbox{}'.format(i)] = tbox
+                sample['tconf{}'.format(i)] = tconf
+                sample['tide{}'.format(i)] = tid
+
+
+class Gt2FairMOTTarget(Gt2TTFTarget):
+    __shared__ = ['num_classes']
+    """
+    Generate FairMOT targets by ground truth data.
+    Differences between Gt2FairMOTTarget and Gt2TTFTarget are:
+        1. the gaussian kernel radius to generate a heatmap.
+        2.
+
+
+class Gt2FairMOTTarget(Gt2TTFTarget):
+    __shared__ = ['num_classes']
+    """
+    Generate FairMOT targets from ground truth data.
+    Differences between Gt2FairMOTTarget and Gt2TTFTarget are:
+        1. the gaussian kernel radius used to generate the heatmap.
+        2. the targets needed during training.
+
+    Args:
+        num_classes(int): the number of classes.
+        down_ratio(int): the downsample ratio from input image to heatmap, 4 by default.
+        max_objs(int): the maximum number of ground truth objects in an image, 500 by default.
+    """
+
+    def __init__(self, num_classes=1, down_ratio=4, max_objs=500):
+        super(Gt2TTFTarget, self).__init__()
+        self.down_ratio = down_ratio
+        self.num_classes = num_classes
+        self.max_objs = max_objs
+
+    def __call__(self, samples, context=None):
+        for b_id, sample in enumerate(samples):
+            output_h = sample['image'].shape[1] // self.down_ratio
+            output_w = sample['image'].shape[2] // self.down_ratio
+
+            heatmap = np.zeros(
+                (self.num_classes, output_h, output_w), dtype='float32')
+            bbox_size = np.zeros((self.max_objs, 4), dtype=np.float32)
+            center_offset = np.zeros((self.max_objs, 2), dtype=np.float32)
+            index = np.zeros((self.max_objs, ), dtype=np.int64)
+            index_mask = np.zeros((self.max_objs, ), dtype=np.int32)
+            reid = np.zeros((self.max_objs, ), dtype=np.int64)
+            bbox_xys = np.zeros((self.max_objs, 4), dtype=np.float32)
+            if self.num_classes > 1:
+                # each category corresponds to a set of track ids
+                cls_tr_ids = np.zeros(
+                    (self.num_classes, output_h, output_w), dtype=np.int64)
+                cls_id_map = np.full((output_h, output_w), -1, dtype=np.int64)
+
+            gt_bbox = sample['gt_bbox']
+            gt_class = sample['gt_class']
+            gt_ide = sample['gt_ide']
+
+            for k in range(len(gt_bbox)):
+                cls_id = gt_class[k][0]
+                bbox = gt_bbox[k]
+                ide = gt_ide[k][0]
+                bbox[[0, 2]] = bbox[[0, 2]] * output_w
+                bbox[[1, 3]] = bbox[[1, 3]] * output_h
+                bbox_amodal = copy.deepcopy(bbox)
+                bbox_amodal[0] = bbox_amodal[0] - bbox_amodal[2] / 2.
+                bbox_amodal[1] = bbox_amodal[1] - bbox_amodal[3] / 2.
+                bbox_amodal[2] = bbox_amodal[0] + bbox_amodal[2]
+                bbox_amodal[3] = bbox_amodal[1] + bbox_amodal[3]
+                bbox[0] = np.clip(bbox[0], 0, output_w - 1)
+                bbox[1] = np.clip(bbox[1], 0, output_h - 1)
+                h = bbox[3]
+                w = bbox[2]
+
+                bbox_xy = copy.deepcopy(bbox)
+                bbox_xy[0] = bbox_xy[0] - bbox_xy[2] / 2
+                bbox_xy[1] = bbox_xy[1] - bbox_xy[3] / 2
+                bbox_xy[2] = bbox_xy[0] + bbox_xy[2]
+                bbox_xy[3] = bbox_xy[1] + bbox_xy[3]
+
+                if h > 0 and w > 0:
+                    radius = gaussian_radius((math.ceil(h), math.ceil(w)), 0.7)
+                    radius = max(0, int(radius))
+                    ct = np.array([bbox[0], bbox[1]], dtype=np.float32)
+                    ct_int = ct.astype(np.int32)
+                    self.draw_truncate_gaussian(heatmap[cls_id], ct_int, radius,
+                                                radius)
+                    bbox_size[k] = ct[0] - bbox_amodal[0], ct[1] - bbox_amodal[1], \
+                        bbox_amodal[2] - ct[0], bbox_amodal[3] - ct[1]
+
+                    index[k] = ct_int[1] * output_w + ct_int[0]
+                    center_offset[k] = ct - ct_int
+                    index_mask[k] = 1
+                    reid[k] = ide
+                    bbox_xys[k] = bbox_xy
+                    if self.num_classes > 1:
+                        cls_id_map[ct_int[1], ct_int[0]] = cls_id
+                        cls_tr_ids[cls_id][ct_int[1]][ct_int[0]] = ide - 1
+                        # track ids start from 0
+
+            sample['heatmap'] = heatmap
+            sample['index'] = index
+            sample['offset'] = center_offset
+            sample['size'] = bbox_size
+            sample['index_mask'] = index_mask
+            sample['reid'] = reid
+            if self.num_classes > 1:
+                sample['cls_id_map'] = cls_id_map
+                sample['cls_tr_ids'] = cls_tr_ids
+            sample['bbox_xys'] = bbox_xys
+            sample.pop('is_crowd', None)
+            sample.pop('difficult', None)
+            sample.pop('gt_class', None)
+            sample.pop('gt_bbox', None)
+            sample.pop('gt_score', None)
+            sample.pop('gt_ide', None)
+        return samples
diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/op_helper.py b/PaddleDetection-release-2.6/ppdet/data/transform/op_helper.py
new file mode 100644
index
0000000000000000000000000000000000000000..6c400306da8ec3ff605c0efac3e725ffd2e267a3 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/op_helper.py @@ -0,0 +1,494 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# this file contains helper methods for BBOX processing + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import random +import math +import cv2 + + +def meet_emit_constraint(src_bbox, sample_bbox): + center_x = (src_bbox[2] + src_bbox[0]) / 2 + center_y = (src_bbox[3] + src_bbox[1]) / 2 + if center_x >= sample_bbox[0] and \ + center_x <= sample_bbox[2] and \ + center_y >= sample_bbox[1] and \ + center_y <= sample_bbox[3]: + return True + return False + + +def clip_bbox(src_bbox): + src_bbox[0] = max(min(src_bbox[0], 1.0), 0.0) + src_bbox[1] = max(min(src_bbox[1], 1.0), 0.0) + src_bbox[2] = max(min(src_bbox[2], 1.0), 0.0) + src_bbox[3] = max(min(src_bbox[3], 1.0), 0.0) + return src_bbox + + +def bbox_area(src_bbox): + if src_bbox[2] < src_bbox[0] or src_bbox[3] < src_bbox[1]: + return 0. + else: + width = src_bbox[2] - src_bbox[0] + height = src_bbox[3] - src_bbox[1] + return width * height + + +def is_overlap(object_bbox, sample_bbox): + if object_bbox[0] >= sample_bbox[2] or \ + object_bbox[2] <= sample_bbox[0] or \ + object_bbox[1] >= sample_bbox[3] or \ + object_bbox[3] <= sample_bbox[1]: + return False + else: + return True + + +def filter_and_process(sample_bbox, bboxes, labels, scores=None, + keypoints=None): + new_bboxes = [] + new_labels = [] + new_scores = [] + new_keypoints = [] + new_kp_ignore = [] + for i in range(len(bboxes)): + new_bbox = [0, 0, 0, 0] + obj_bbox = [bboxes[i][0], bboxes[i][1], bboxes[i][2], bboxes[i][3]] + if not meet_emit_constraint(obj_bbox, sample_bbox): + continue + if not is_overlap(obj_bbox, sample_bbox): + continue + sample_width = sample_bbox[2] - sample_bbox[0] + sample_height = sample_bbox[3] - sample_bbox[1] + new_bbox[0] = (obj_bbox[0] - sample_bbox[0]) / sample_width + new_bbox[1] = (obj_bbox[1] - sample_bbox[1]) / sample_height + new_bbox[2] = (obj_bbox[2] - sample_bbox[0]) / sample_width + new_bbox[3] = (obj_bbox[3] - sample_bbox[1]) / sample_height + new_bbox = clip_bbox(new_bbox) + if bbox_area(new_bbox) > 0: + new_bboxes.append(new_bbox) + new_labels.append([labels[i][0]]) + if scores is not None: + new_scores.append([scores[i][0]]) + if keypoints is not None: + sample_keypoint = keypoints[0][i] + for j in range(len(sample_keypoint)): + kp_len = sample_height if j % 2 else sample_width + sample_coord = sample_bbox[1] if j % 2 else sample_bbox[0] + sample_keypoint[j] = ( + sample_keypoint[j] - sample_coord) / kp_len + sample_keypoint[j] = max(min(sample_keypoint[j], 1.0), 0.0) + new_keypoints.append(sample_keypoint) + new_kp_ignore.append(keypoints[1][i]) + + bboxes = np.array(new_bboxes) + labels = np.array(new_labels) + scores = np.array(new_scores) + if keypoints is 
not None: + keypoints = np.array(new_keypoints) + new_kp_ignore = np.array(new_kp_ignore) + return bboxes, labels, scores, (keypoints, new_kp_ignore) + return bboxes, labels, scores + + +def bbox_area_sampling(bboxes, labels, scores, target_size, min_size): + new_bboxes = [] + new_labels = [] + new_scores = [] + for i, bbox in enumerate(bboxes): + w = float((bbox[2] - bbox[0]) * target_size) + h = float((bbox[3] - bbox[1]) * target_size) + if w * h < float(min_size * min_size): + continue + else: + new_bboxes.append(bbox) + new_labels.append(labels[i]) + if scores is not None and scores.size != 0: + new_scores.append(scores[i]) + bboxes = np.array(new_bboxes) + labels = np.array(new_labels) + scores = np.array(new_scores) + return bboxes, labels, scores + + +def generate_sample_bbox(sampler): + scale = np.random.uniform(sampler[2], sampler[3]) + aspect_ratio = np.random.uniform(sampler[4], sampler[5]) + aspect_ratio = max(aspect_ratio, (scale**2.0)) + aspect_ratio = min(aspect_ratio, 1 / (scale**2.0)) + bbox_width = scale * (aspect_ratio**0.5) + bbox_height = scale / (aspect_ratio**0.5) + xmin_bound = 1 - bbox_width + ymin_bound = 1 - bbox_height + xmin = np.random.uniform(0, xmin_bound) + ymin = np.random.uniform(0, ymin_bound) + xmax = xmin + bbox_width + ymax = ymin + bbox_height + sampled_bbox = [xmin, ymin, xmax, ymax] + return sampled_bbox + + +def generate_sample_bbox_square(sampler, image_width, image_height): + scale = np.random.uniform(sampler[2], sampler[3]) + aspect_ratio = np.random.uniform(sampler[4], sampler[5]) + aspect_ratio = max(aspect_ratio, (scale**2.0)) + aspect_ratio = min(aspect_ratio, 1 / (scale**2.0)) + bbox_width = scale * (aspect_ratio**0.5) + bbox_height = scale / (aspect_ratio**0.5) + if image_height < image_width: + bbox_width = bbox_height * image_height / image_width + else: + bbox_height = bbox_width * image_width / image_height + xmin_bound = 1 - bbox_width + ymin_bound = 1 - bbox_height + xmin = np.random.uniform(0, xmin_bound) + ymin = np.random.uniform(0, ymin_bound) + xmax = xmin + bbox_width + ymax = ymin + bbox_height + sampled_bbox = [xmin, ymin, xmax, ymax] + return sampled_bbox + + +def data_anchor_sampling(bbox_labels, image_width, image_height, scale_array, + resize_width): + num_gt = len(bbox_labels) + # np.random.randint range: [low, high) + rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0 + + if num_gt != 0: + norm_xmin = bbox_labels[rand_idx][0] + norm_ymin = bbox_labels[rand_idx][1] + norm_xmax = bbox_labels[rand_idx][2] + norm_ymax = bbox_labels[rand_idx][3] + + xmin = norm_xmin * image_width + ymin = norm_ymin * image_height + wid = image_width * (norm_xmax - norm_xmin) + hei = image_height * (norm_ymax - norm_ymin) + range_size = 0 + + area = wid * hei + for scale_ind in range(0, len(scale_array) - 1): + if area > scale_array[scale_ind] ** 2 and area < \ + scale_array[scale_ind + 1] ** 2: + range_size = scale_ind + 1 + break + + if area > scale_array[len(scale_array) - 2]**2: + range_size = len(scale_array) - 2 + + scale_choose = 0.0 + if range_size == 0: + rand_idx_size = 0 + else: + # np.random.randint range: [low, high) + rng_rand_size = np.random.randint(0, range_size + 1) + rand_idx_size = rng_rand_size % (range_size + 1) + + if rand_idx_size == range_size: + min_resize_val = scale_array[rand_idx_size] / 2.0 + max_resize_val = min(2.0 * scale_array[rand_idx_size], + 2 * math.sqrt(wid * hei)) + scale_choose = random.uniform(min_resize_val, max_resize_val) + else: + min_resize_val = scale_array[rand_idx_size] / 2.0 + 
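+            # the crop scale is drawn from [s / 2, 2 * s] around the chosen
+            # data-anchor scale s = scale_array[rand_idx_size], so the resized
+            # target lands near one of the detector's anchor scales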
max_resize_val = 2.0 * scale_array[rand_idx_size] + scale_choose = random.uniform(min_resize_val, max_resize_val) + + sample_bbox_size = wid * resize_width / scale_choose + + w_off_orig = 0.0 + h_off_orig = 0.0 + if sample_bbox_size < max(image_height, image_width): + if wid <= sample_bbox_size: + w_off_orig = np.random.uniform(xmin + wid - sample_bbox_size, + xmin) + else: + w_off_orig = np.random.uniform(xmin, + xmin + wid - sample_bbox_size) + + if hei <= sample_bbox_size: + h_off_orig = np.random.uniform(ymin + hei - sample_bbox_size, + ymin) + else: + h_off_orig = np.random.uniform(ymin, + ymin + hei - sample_bbox_size) + + else: + w_off_orig = np.random.uniform(image_width - sample_bbox_size, 0.0) + h_off_orig = np.random.uniform(image_height - sample_bbox_size, 0.0) + + w_off_orig = math.floor(w_off_orig) + h_off_orig = math.floor(h_off_orig) + + # Figure out top left coordinates. + w_off = float(w_off_orig / image_width) + h_off = float(h_off_orig / image_height) + + sampled_bbox = [ + w_off, h_off, w_off + float(sample_bbox_size / image_width), + h_off + float(sample_bbox_size / image_height) + ] + return sampled_bbox + else: + return 0 + + +def jaccard_overlap(sample_bbox, object_bbox): + if sample_bbox[0] >= object_bbox[2] or \ + sample_bbox[2] <= object_bbox[0] or \ + sample_bbox[1] >= object_bbox[3] or \ + sample_bbox[3] <= object_bbox[1]: + return 0 + intersect_xmin = max(sample_bbox[0], object_bbox[0]) + intersect_ymin = max(sample_bbox[1], object_bbox[1]) + intersect_xmax = min(sample_bbox[2], object_bbox[2]) + intersect_ymax = min(sample_bbox[3], object_bbox[3]) + intersect_size = (intersect_xmax - intersect_xmin) * ( + intersect_ymax - intersect_ymin) + sample_bbox_size = bbox_area(sample_bbox) + object_bbox_size = bbox_area(object_bbox) + overlap = intersect_size / ( + sample_bbox_size + object_bbox_size - intersect_size) + return overlap + + +def intersect_bbox(bbox1, bbox2): + if bbox2[0] > bbox1[2] or bbox2[2] < bbox1[0] or \ + bbox2[1] > bbox1[3] or bbox2[3] < bbox1[1]: + intersection_box = [0.0, 0.0, 0.0, 0.0] + else: + intersection_box = [ + max(bbox1[0], bbox2[0]), max(bbox1[1], bbox2[1]), + min(bbox1[2], bbox2[2]), min(bbox1[3], bbox2[3]) + ] + return intersection_box + + +def bbox_coverage(bbox1, bbox2): + inter_box = intersect_bbox(bbox1, bbox2) + intersect_size = bbox_area(inter_box) + + if intersect_size > 0: + bbox1_size = bbox_area(bbox1) + return intersect_size / bbox1_size + else: + return 0. 
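+
+
+# A worked sketch (the helper name and boxes are illustrative, not part of the
+# original file): both helpers above take [xmin, ymin, xmax, ymax] boxes in
+# normalized coordinates. jaccard_overlap is the symmetric IoU, while
+# bbox_coverage(a, b) is asymmetric: the fraction of a's area covered by b.
+def _overlap_vs_coverage_demo():
+    a = [0.00, 0.00, 0.50, 0.50]   # area 0.25
+    b = [0.25, 0.25, 0.75, 0.75]   # area 0.25, intersection 0.0625
+    iou = jaccard_overlap(a, b)    # 0.0625 / 0.4375 ~= 0.143
+    cov = bbox_coverage(a, b)      # 0.0625 / 0.25 = 0.25
+    return iou, cov
+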
+ + +def satisfy_sample_constraint(sampler, + sample_bbox, + gt_bboxes, + satisfy_all=False): + if sampler[6] == 0 and sampler[7] == 0: + return True + satisfied = [] + for i in range(len(gt_bboxes)): + object_bbox = [ + gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3] + ] + overlap = jaccard_overlap(sample_bbox, object_bbox) + if sampler[6] != 0 and \ + overlap < sampler[6]: + satisfied.append(False) + continue + if sampler[7] != 0 and \ + overlap > sampler[7]: + satisfied.append(False) + continue + satisfied.append(True) + if not satisfy_all: + return True + + if satisfy_all: + return np.all(satisfied) + else: + return False + + +def satisfy_sample_constraint_coverage(sampler, sample_bbox, gt_bboxes): + if sampler[6] == 0 and sampler[7] == 0: + has_jaccard_overlap = False + else: + has_jaccard_overlap = True + if sampler[8] == 0 and sampler[9] == 0: + has_object_coverage = False + else: + has_object_coverage = True + + if not has_jaccard_overlap and not has_object_coverage: + return True + found = False + for i in range(len(gt_bboxes)): + object_bbox = [ + gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3] + ] + if has_jaccard_overlap: + overlap = jaccard_overlap(sample_bbox, object_bbox) + if sampler[6] != 0 and \ + overlap < sampler[6]: + continue + if sampler[7] != 0 and \ + overlap > sampler[7]: + continue + found = True + if has_object_coverage: + object_coverage = bbox_coverage(object_bbox, sample_bbox) + if sampler[8] != 0 and \ + object_coverage < sampler[8]: + continue + if sampler[9] != 0 and \ + object_coverage > sampler[9]: + continue + found = True + if found: + return True + return found + + +def crop_image_sampling(img, sample_bbox, image_width, image_height, + target_size): + # no clipping here + xmin = int(sample_bbox[0] * image_width) + xmax = int(sample_bbox[2] * image_width) + ymin = int(sample_bbox[1] * image_height) + ymax = int(sample_bbox[3] * image_height) + + w_off = xmin + h_off = ymin + width = xmax - xmin + height = ymax - ymin + cross_xmin = max(0.0, float(w_off)) + cross_ymin = max(0.0, float(h_off)) + cross_xmax = min(float(w_off + width - 1.0), float(image_width)) + cross_ymax = min(float(h_off + height - 1.0), float(image_height)) + cross_width = cross_xmax - cross_xmin + cross_height = cross_ymax - cross_ymin + + roi_xmin = 0 if w_off >= 0 else abs(w_off) + roi_ymin = 0 if h_off >= 0 else abs(h_off) + roi_width = cross_width + roi_height = cross_height + + roi_y1 = int(roi_ymin) + roi_y2 = int(roi_ymin + roi_height) + roi_x1 = int(roi_xmin) + roi_x2 = int(roi_xmin + roi_width) + + cross_y1 = int(cross_ymin) + cross_y2 = int(cross_ymin + cross_height) + cross_x1 = int(cross_xmin) + cross_x2 = int(cross_xmin + cross_width) + + sample_img = np.zeros((height, width, 3)) + sample_img[roi_y1: roi_y2, roi_x1: roi_x2] = \ + img[cross_y1: cross_y2, cross_x1: cross_x2] + + sample_img = cv2.resize( + sample_img, (target_size, target_size), interpolation=cv2.INTER_AREA) + + return sample_img + + +def is_poly(segm): + assert isinstance(segm, (list, dict)), \ + "Invalid segm type: {}".format(type(segm)) + return isinstance(segm, list) + + +def gaussian_radius(bbox_size, min_overlap): + height, width = bbox_size + + a1 = 1 + b1 = (height + width) + c1 = width * height * (1 - min_overlap) / (1 + min_overlap) + sq1 = np.sqrt(b1**2 - 4 * a1 * c1) + radius1 = (b1 + sq1) / (2 * a1) + + a2 = 4 + b2 = 2 * (height + width) + c2 = (1 - min_overlap) * width * height + sq2 = np.sqrt(b2**2 - 4 * a2 * c2) + radius2 = (b2 + sq2) / 2 + + a3 = 4 * 
min_overlap + b3 = -2 * min_overlap * (height + width) + c3 = (min_overlap - 1) * width * height + sq3 = np.sqrt(b3**2 - 4 * a3 * c3) + radius3 = (b3 + sq3) / 2 + return min(radius1, radius2, radius3) + + +def draw_gaussian(heatmap, center, radius, k=1, delte=6): + diameter = 2 * radius + 1 + sigma = diameter / delte + gaussian = gaussian2D((diameter, diameter), sigma_x=sigma, sigma_y=sigma) + + x, y = center + + height, width = heatmap.shape[0:2] + + left, right = min(x, radius), min(width - x, radius + 1) + top, bottom = min(y, radius), min(height - y, radius + 1) + + masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] + masked_gaussian = gaussian[radius - top:radius + bottom, radius - left: + radius + right] + np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap) + + +def gaussian2D(shape, sigma_x=1, sigma_y=1): + m, n = [(ss - 1.) / 2. for ss in shape] + y, x = np.ogrid[-m:m + 1, -n:n + 1] + + h = np.exp(-(x * x / (2 * sigma_x * sigma_x) + y * y / (2 * sigma_y * + sigma_y))) + h[h < np.finfo(h.dtype).eps * h.max()] = 0 + return h + + +def draw_umich_gaussian(heatmap, center, radius, k=1): + """ + draw_umich_gaussian, refer to https://github.com/xingyizhou/CenterNet/blob/master/src/lib/utils/image.py#L126 + """ + diameter = 2 * radius + 1 + gaussian = gaussian2D( + (diameter, diameter), sigma_x=diameter / 6, sigma_y=diameter / 6) + + x, y = int(center[0]), int(center[1]) + + height, width = heatmap.shape[0:2] + + left, right = min(x, radius), min(width - x, radius + 1) + top, bottom = min(y, radius), min(height - y, radius + 1) + + masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] + masked_gaussian = gaussian[radius - top:radius + bottom, radius - left: + radius + right] + if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: + np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap) + return heatmap + + +def get_border(border, size): + i = 1 + while size - border // i <= border // i: + i *= 2 + return border // i diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/operators.py b/PaddleDetection-release-2.6/ppdet/data/transform/operators.py new file mode 100644 index 0000000000000000000000000000000000000000..61a4aacba024e7b81cfd832ae219d6cfa05af09e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/operators.py @@ -0,0 +1,4064 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# function: +# operators to process sample, +# eg: decode/resize/crop image + +from __future__ import absolute_import +from __future__ import print_function +from __future__ import division + +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +from numbers import Number, Integral + +import uuid +import random +import math +import numpy as np +import os +import copy +import logging +import cv2 +from PIL import Image, ImageDraw +import pickle +import threading +MUTEX = threading.Lock() + +import paddle +from ppdet.core.workspace import serializable +from ..reader import Compose + +from .op_helper import (satisfy_sample_constraint, filter_and_process, + generate_sample_bbox, clip_bbox, data_anchor_sampling, + satisfy_sample_constraint_coverage, crop_image_sampling, + generate_sample_bbox_square, bbox_area_sampling, + is_poly, get_border) + +from ppdet.utils.logger import setup_logger +from ppdet.modeling.keypoint_utils import get_affine_transform, affine_transform +logger = setup_logger(__name__) + +registered_ops = [] + + +def register_op(cls): + registered_ops.append(cls.__name__) + if not hasattr(BaseOperator, cls.__name__): + setattr(BaseOperator, cls.__name__, cls) + else: + raise KeyError("The {} class has been registered.".format(cls.__name__)) + return serializable(cls) + + +class BboxError(ValueError): + pass + + +class ImageError(ValueError): + pass + + +class BaseOperator(object): + def __init__(self, name=None): + if name is None: + name = self.__class__.__name__ + self._id = name + '_' + str(uuid.uuid4())[-6:] + + def apply(self, sample, context=None): + """ Process a sample. + Args: + sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx} + context (dict): info about this sample processing + Returns: + result (dict): a processed sample + """ + return sample + + def __call__(self, sample, context=None): + """ Process a sample. 
+ Args: + sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx} + context (dict): info about this sample processing + Returns: + result (dict): a processed sample + """ + if isinstance(sample, Sequence): + for i in range(len(sample)): + sample[i] = self.apply(sample[i], context) + else: + sample = self.apply(sample, context) + return sample + + def __str__(self): + return str(self._id) + + +@register_op +class Decode(BaseOperator): + def __init__(self): + """ Transform the image data to numpy format following the rgb format + """ + super(Decode, self).__init__() + + def apply(self, sample, context=None): + """ load image if 'im_file' field is not empty but 'image' is""" + if 'image' not in sample: + with open(sample['im_file'], 'rb') as f: + sample['image'] = f.read() + sample.pop('im_file') + + try: + im = sample['image'] + data = np.frombuffer(im, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + if 'keep_ori_im' in sample and sample['keep_ori_im']: + sample['ori_image'] = im + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + except: + im = sample['image'] + + sample['image'] = im + if 'h' not in sample: + sample['h'] = im.shape[0] + elif sample['h'] != im.shape[0]: + logger.warning( + "The actual image height: {} is not equal to the " + "height: {} in annotation, and update sample['h'] by actual " + "image height.".format(im.shape[0], sample['h'])) + sample['h'] = im.shape[0] + if 'w' not in sample: + sample['w'] = im.shape[1] + elif sample['w'] != im.shape[1]: + logger.warning( + "The actual image width: {} is not equal to the " + "width: {} in annotation, and update sample['w'] by actual " + "image width.".format(im.shape[1], sample['w'])) + sample['w'] = im.shape[1] + + sample['im_shape'] = np.array(im.shape[:2], dtype=np.float32) + sample['scale_factor'] = np.array([1., 1.], dtype=np.float32) + return sample + + +def _make_dirs(dirname): + try: + from pathlib import Path + except ImportError: + from pathlib2 import Path + Path(dirname).mkdir(exist_ok=True) + + +@register_op +class DecodeCache(BaseOperator): + def __init__(self, cache_root=None): + '''decode image and caching + ''' + super(DecodeCache, self).__init__() + + self.use_cache = False if cache_root is None else True + self.cache_root = cache_root + + if cache_root is not None: + _make_dirs(cache_root) + + def apply(self, sample, context=None): + + if self.use_cache and os.path.exists( + self.cache_path(self.cache_root, sample['im_file'])): + path = self.cache_path(self.cache_root, sample['im_file']) + im = self.load(path) + + else: + if 'image' not in sample: + with open(sample['im_file'], 'rb') as f: + sample['image'] = f.read() + + im = sample['image'] + data = np.frombuffer(im, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + if 'keep_ori_im' in sample and sample['keep_ori_im']: + sample['ori_image'] = im + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + + if self.use_cache and not os.path.exists( + self.cache_path(self.cache_root, sample['im_file'])): + path = self.cache_path(self.cache_root, sample['im_file']) + self.dump(im, path) + + sample['image'] = im + sample['h'] = im.shape[0] + sample['w'] = im.shape[1] + + sample['im_shape'] = np.array(im.shape[:2], dtype=np.float32) + sample['scale_factor'] = np.array([1., 1.], dtype=np.float32) + + sample.pop('im_file') + + return sample + + @staticmethod + def cache_path(dir_oot, im_file): + return os.path.join(dir_oot, os.path.basename(im_file) + '.pkl') + + @staticmethod + def load(path): + with open(path, 'rb') as f: + 
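+            # the cache holds the already-decoded (and RGB-converted) ndarray,
+            # so a cache hit skips cv2.imdecode and cv2.cvtColor entirely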
im = pickle.load(f) + return im + + @staticmethod + def dump(obj, path): + MUTEX.acquire() + try: + with open(path, 'wb') as f: + pickle.dump(obj, f) + + except Exception as e: + logger.warning('dump {} occurs exception {}'.format(path, str(e))) + + finally: + MUTEX.release() + + +@register_op +class SniperDecodeCrop(BaseOperator): + def __init__(self): + super(SniperDecodeCrop, self).__init__() + + def __call__(self, sample, context=None): + if 'image' not in sample: + with open(sample['im_file'], 'rb') as f: + sample['image'] = f.read() + sample.pop('im_file') + + im = sample['image'] + data = np.frombuffer(im, dtype='uint8') + im = cv2.imdecode(data, cv2.IMREAD_COLOR) # BGR mode, but need RGB mode + if 'keep_ori_im' in sample and sample['keep_ori_im']: + sample['ori_image'] = im + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + + chip = sample['chip'] + x1, y1, x2, y2 = [int(xi) for xi in chip] + im = im[max(y1, 0):min(y2, im.shape[0]), max(x1, 0):min(x2, im.shape[ + 1]), :] + + sample['image'] = im + h = im.shape[0] + w = im.shape[1] + # sample['im_info'] = [h, w, 1.0] + sample['h'] = h + sample['w'] = w + + sample['im_shape'] = np.array(im.shape[:2], dtype=np.float32) + sample['scale_factor'] = np.array([1., 1.], dtype=np.float32) + return sample + + +@register_op +class Permute(BaseOperator): + def __init__(self): + """ + Change the channel to be (C, H, W) + """ + super(Permute, self).__init__() + + def apply(self, sample, context=None): + im = sample['image'] + im = im.transpose((2, 0, 1)) + sample['image'] = im + + if 'pre_image' in sample: + pre_im = sample['pre_image'] + pre_im = pre_im.transpose((2, 0, 1)) + sample['pre_image'] = pre_im + return sample + + +@register_op +class Lighting(BaseOperator): + """ + Lighting the image by eigenvalues and eigenvectors + Args: + eigval (list): eigenvalues + eigvec (list): eigenvectors + alphastd (float): random weight of lighting, 0.1 by default + """ + + def __init__(self, eigval, eigvec, alphastd=0.1): + super(Lighting, self).__init__() + self.alphastd = alphastd + self.eigval = np.array(eigval).astype('float32') + self.eigvec = np.array(eigvec).astype('float32') + + def apply(self, sample, context=None): + alpha = np.random.normal(scale=self.alphastd, size=(3, )) + sample['image'] += np.dot(self.eigvec, self.eigval * alpha) + + if 'pre_image' in sample: + sample['pre_image'] += np.dot(self.eigvec, self.eigval * alpha) + return sample + + +@register_op +class RandomErasingImage(BaseOperator): + def __init__(self, prob=0.5, lower=0.02, higher=0.4, aspect_ratio=0.3): + """ + Random Erasing Data Augmentation, see https://arxiv.org/abs/1708.04896 + Args: + prob (float): probability to carry out random erasing + lower (float): lower limit of the erasing area ratio + higher (float): upper limit of the erasing area ratio + aspect_ratio (float): aspect ratio of the erasing region + """ + super(RandomErasingImage, self).__init__() + self.prob = prob + self.lower = lower + self.higher = higher + self.aspect_ratio = aspect_ratio + + def apply(self, sample, context=None): + gt_bbox = sample['gt_bbox'] + im = sample['image'] + if not isinstance(im, np.ndarray): + raise TypeError("{}: image is not a numpy array.".format(self)) + if len(im.shape) != 3: + raise ImageError("{}: image is not 3-dimensional.".format(self)) + + for idx in range(gt_bbox.shape[0]): + if self.prob <= np.random.rand(): + continue + + x1, y1, x2, y2 = gt_bbox[idx, :] + w_bbox = x2 - x1 + h_bbox = y2 - y1 + area = w_bbox * h_bbox + + target_area = random.uniform(self.lower, 
self.higher) * area
+            aspect_ratio = random.uniform(self.aspect_ratio,
+                                          1 / self.aspect_ratio)
+
+            h = int(round(math.sqrt(target_area * aspect_ratio)))
+            w = int(round(math.sqrt(target_area / aspect_ratio)))
+
+            if w < w_bbox and h < h_bbox:
+                off_y1 = random.randint(0, int(h_bbox - h))
+                off_x1 = random.randint(0, int(w_bbox - w))
+                im[int(y1 + off_y1):int(y1 + off_y1 + h), int(x1 + off_x1):int(
+                    x1 + off_x1 + w), :] = 0
+        sample['image'] = im
+        return sample
+
+
+@register_op
+class NormalizeImage(BaseOperator):
+    def __init__(self,
+                 mean=[0.485, 0.456, 0.406],
+                 std=[0.229, 0.224, 0.225],
+                 is_scale=True,
+                 norm_type='mean_std'):
+        """
+        Args:
+            mean (list): the pixel mean
+            std (list): the pixel standard deviation
+            is_scale (bool): scale the pixel to [0,1]
+            norm_type (str): type in ['mean_std', 'none']
+        """
+        super(NormalizeImage, self).__init__()
+        self.mean = mean
+        self.std = std
+        self.is_scale = is_scale
+        self.norm_type = norm_type
+        if not (isinstance(self.mean, list) and isinstance(self.std, list) and
+                isinstance(self.is_scale, bool) and
+                self.norm_type in ['mean_std', 'none']):
+            raise TypeError("{}: input type is invalid.".format(self))
+        from functools import reduce
+        if reduce(lambda x, y: x * y, self.std) == 0:
+            raise ValueError('{}: std is invalid!'.format(self))
+
+    def apply(self, sample, context=None):
+        """Normalize the image.
+        Operators:
+            1.(optional) Scale the pixel to [0,1]
+            2.(optional) Subtract the mean from each pixel and divide by std
+        """
+        im = sample['image']
+
+        im = im.astype(np.float32, copy=False)
+        if self.is_scale:
+            scale = 1.0 / 255.0
+            im *= scale
+
+        if self.norm_type == 'mean_std':
+            mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+            std = np.array(self.std)[np.newaxis, np.newaxis, :]
+            im -= mean
+            im /= std
+
+        sample['image'] = im
+
+        if 'pre_image' in sample:
+            pre_im = sample['pre_image']
+            pre_im = pre_im.astype(np.float32, copy=False)
+            if self.is_scale:
+                scale = 1.0 / 255.0
+                pre_im *= scale
+
+            if self.norm_type == 'mean_std':
+                mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+                std = np.array(self.std)[np.newaxis, np.newaxis, :]
+                pre_im -= mean
+                pre_im /= std
+            sample['pre_image'] = pre_im
+
+        return sample
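+
+
+# A hypothetical sketch (function name and pixel value added for illustration)
+# of what NormalizeImage computes with its defaults: scale to [0, 1], then
+# standardize each channel with the ImageNet mean/std above.
+def _normalize_image_demo():
+    im = np.full((2, 2, 3), 128, dtype=np.float32)
+    im *= 1.0 / 255.0
+    im -= np.array([0.485, 0.456, 0.406])[np.newaxis, np.newaxis, :]
+    im /= np.array([0.229, 0.224, 0.225])[np.newaxis, np.newaxis, :]
+    return im   # first channel: (128 / 255 - 0.485) / 0.229 ~= 0.074
+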
+@register_op
+class GridMask(BaseOperator):
+    def __init__(self,
+                 use_h=True,
+                 use_w=True,
+                 rotate=1,
+                 offset=False,
+                 ratio=0.5,
+                 mode=1,
+                 prob=0.7,
+                 upper_iter=360000):
+        """
+        GridMask Data Augmentation, see https://arxiv.org/abs/2001.04086
+        Args:
+            use_h (bool): whether to mask vertically
+            use_w (bool): whether to mask horizontally
+            rotate (float): angle for the mask to rotate
+            offset (float): mask offset
+            ratio (float): mask ratio
+            mode (int): gridmask mode
+            prob (float): max probability to carry out gridmask
+            upper_iter (int): suggested to be equal to global max_iter
+        """
+        super(GridMask, self).__init__()
+        self.use_h = use_h
+        self.use_w = use_w
+        self.rotate = rotate
+        self.offset = offset
+        self.ratio = ratio
+        self.mode = mode
+        self.prob = prob
+        self.upper_iter = upper_iter
+
+        from .gridmask_utils import Gridmask
+        self.gridmask_op = Gridmask(
+            use_h,
+            use_w,
+            rotate=rotate,
+            offset=offset,
+            ratio=ratio,
+            mode=mode,
+            prob=prob,
+            upper_iter=upper_iter)
+
+    def apply(self, sample, context=None):
+        sample['image'] = self.gridmask_op(sample['image'], sample['curr_iter'])
+        return sample
+
+
+@register_op
+class RandomDistort(BaseOperator):
+    """Random color distortion.
+    Args:
+        hue (list): hue settings. in [lower, upper, probability] format.
+        saturation (list): saturation settings. in [lower, upper, probability] format.
+        contrast (list): contrast settings. in [lower, upper, probability] format.
+        brightness (list): brightness settings. in [lower, upper, probability] format.
+        random_apply (bool): whether to apply in random (yolo) or fixed (SSD)
+            order.
+        count (int): the number of distortions to apply
+        random_channel (bool): whether to swap channels randomly
+    """
+
+    def __init__(self,
+                 hue=[-18, 18, 0.5],
+                 saturation=[0.5, 1.5, 0.5],
+                 contrast=[0.5, 1.5, 0.5],
+                 brightness=[0.5, 1.5, 0.5],
+                 random_apply=True,
+                 count=4,
+                 random_channel=False):
+        super(RandomDistort, self).__init__()
+        self.hue = hue
+        self.saturation = saturation
+        self.contrast = contrast
+        self.brightness = brightness
+        self.random_apply = random_apply
+        self.count = count
+        self.random_channel = random_channel
+
+    def apply_hue(self, img):
+        low, high, prob = self.hue
+        if np.random.uniform(0., 1.) < prob:
+            return img
+
+        img = img.astype(np.float32)
+        # it works, but the result differs from the HSV version
+        delta = np.random.uniform(low, high)
+        u = np.cos(delta * np.pi)
+        w = np.sin(delta * np.pi)
+        bt = np.array([[1.0, 0.0, 0.0], [0.0, u, -w], [0.0, w, u]])
+        tyiq = np.array([[0.299, 0.587, 0.114], [0.596, -0.274, -0.321],
+                         [0.211, -0.523, 0.311]])
+        ityiq = np.array([[1.0, 0.956, 0.621], [1.0, -0.272, -0.647],
+                          [1.0, -1.107, 1.705]])
+        t = np.dot(np.dot(ityiq, bt), tyiq).T
+        img = np.dot(img, t)
+        return img
+
+    def apply_saturation(self, img):
+        low, high, prob = self.saturation
+        if np.random.uniform(0., 1.) < prob:
+            return img
+        delta = np.random.uniform(low, high)
+        img = img.astype(np.float32)
+        # it works, but the result differs from the HSV version
+        gray = img * np.array([[[0.299, 0.587, 0.114]]], dtype=np.float32)
+        gray = gray.sum(axis=2, keepdims=True)
+        gray *= (1.0 - delta)
+        img *= delta
+        img += gray
+        return img
+
+    def apply_contrast(self, img):
+        low, high, prob = self.contrast
+        if np.random.uniform(0., 1.) < prob:
+            return img
+        delta = np.random.uniform(low, high)
+        img = img.astype(np.float32)
+        img *= delta
+        return img
+
+    def apply_brightness(self, img):
+        low, high, prob = self.brightness
+        if np.random.uniform(0., 1.) < prob:
+            return img
+        delta = np.random.uniform(low, high)
+        img = img.astype(np.float32)
+        img += delta
+        return img
+
+    def apply(self, sample, context=None):
+        img = sample['image']
+        if self.random_apply:
+            functions = [
+                self.apply_brightness, self.apply_contrast,
+                self.apply_saturation, self.apply_hue
+            ]
+            distortions = np.random.permutation(functions)[:self.count]
+            for func in distortions:
+                img = func(img)
+            sample['image'] = img
+            return sample
+
+        img = self.apply_brightness(img)
+        mode = np.random.randint(0, 2)
+
+        if mode:
+            img = self.apply_contrast(img)
+
+        img = self.apply_saturation(img)
+        img = self.apply_hue(img)
+
+        if not mode:
+            img = self.apply_contrast(img)
+
+        if self.random_channel:
+            if np.random.randint(0, 2):
+                img = img[..., np.random.permutation(3)]
+        sample['image'] = img
+        return sample
+
+
+@register_op
+class PhotoMetricDistortion(BaseOperator):
+    """Apply photometric distortion to image sequentially, every transformation
+    is applied with a probability of 0.5. The position of random contrast is in
+    second or second to last.
+
+    1. random brightness
+    2. random contrast (mode 0)
+    3. convert color from BGR to HSV
+    4. random saturation
+    5. random hue
+    6. convert color from HSV to BGR
+    7. random contrast (mode 1)
+    8. randomly swap channels
+
+    Args:
+        brightness_delta (int): delta of brightness.
+ contrast_range (tuple): range of contrast. + saturation_range (tuple): range of saturation. + hue_delta (int): delta of hue. + """ + + def __init__(self, + brightness_delta=32, + contrast_range=(0.5, 1.5), + saturation_range=(0.5, 1.5), + hue_delta=18): + super(PhotoMetricDistortion, self).__init__() + self.brightness_delta = brightness_delta + self.contrast_lower, self.contrast_upper = contrast_range + self.saturation_lower, self.saturation_upper = saturation_range + self.hue_delta = hue_delta + + def apply(self, results, context=None): + """Call function to perform photometric distortion on images. + + Args: + results (dict): Result dict from loading pipeline. + + Returns: + dict: Result dict with images distorted. + """ + + img = results['image'] + img = img.astype(np.float32) + # random brightness + if np.random.randint(2): + delta = np.random.uniform(-self.brightness_delta, + self.brightness_delta) + img += delta + + # mode == 0 --> do random contrast first + # mode == 1 --> do random contrast last + mode = np.random.randint(2) + if mode == 1: + if np.random.randint(2): + alpha = np.random.uniform(self.contrast_lower, + self.contrast_upper) + img *= alpha + + # convert color from BGR to HSV + img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) + + # random saturation + if np.random.randint(2): + img[..., 1] *= np.random.uniform(self.saturation_lower, + self.saturation_upper) + + # random hue + if np.random.randint(2): + img[..., 0] += np.random.uniform(-self.hue_delta, self.hue_delta) + img[..., 0][img[..., 0] > 360] -= 360 + img[..., 0][img[..., 0] < 0] += 360 + + # convert color from HSV to BGR + img = cv2.cvtColor(img, cv2.COLOR_HSV2BGR) + + # random contrast + if mode == 0: + if np.random.randint(2): + alpha = np.random.uniform(self.contrast_lower, + self.contrast_upper) + img *= alpha + + # randomly swap channels + if np.random.randint(2): + img = img[..., np.random.permutation(3)] + + results['image'] = img + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f'(\nbrightness_delta={self.brightness_delta},\n' + repr_str += 'contrast_range=' + repr_str += f'{(self.contrast_lower, self.contrast_upper)},\n' + repr_str += 'saturation_range=' + repr_str += f'{(self.saturation_lower, self.saturation_upper)},\n' + repr_str += f'hue_delta={self.hue_delta})' + return repr_str + + +@register_op +class AutoAugment(BaseOperator): + def __init__(self, autoaug_type="v1"): + """ + Args: + autoaug_type (str): autoaug type, support v0, v1, v2, v3, test + """ + super(AutoAugment, self).__init__() + self.autoaug_type = autoaug_type + + def apply(self, sample, context=None): + """ + Learning Data Augmentation Strategies for Object Detection, see https://arxiv.org/abs/1906.11172 + """ + im = sample['image'] + gt_bbox = sample['gt_bbox'] + if not isinstance(im, np.ndarray): + raise TypeError("{}: image is not a numpy array.".format(self)) + if len(im.shape) != 3: + raise ImageError("{}: image is not 3-dimensional.".format(self)) + if len(gt_bbox) == 0: + return sample + + height, width, _ = im.shape + norm_gt_bbox = np.ones_like(gt_bbox, dtype=np.float32) + norm_gt_bbox[:, 0] = gt_bbox[:, 1] / float(height) + norm_gt_bbox[:, 1] = gt_bbox[:, 0] / float(width) + norm_gt_bbox[:, 2] = gt_bbox[:, 3] / float(height) + norm_gt_bbox[:, 3] = gt_bbox[:, 2] / float(width) + + from .autoaugment_utils import distort_image_with_autoaugment + im, norm_gt_bbox = distort_image_with_autoaugment(im, norm_gt_bbox, + self.autoaug_type) + + gt_bbox[:, 0] = norm_gt_bbox[:, 1] * float(width) + 
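+        # distort_image_with_autoaugment works on normalized
+        # [ymin, xmin, ymax, xmax] boxes (see how norm_gt_bbox is built above),
+        # so the axes are swapped back to pixel [x1, y1, x2, y2] here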
gt_bbox[:, 1] = norm_gt_bbox[:, 0] * float(height) + gt_bbox[:, 2] = norm_gt_bbox[:, 3] * float(width) + gt_bbox[:, 3] = norm_gt_bbox[:, 2] * float(height) + + sample['image'] = im + sample['gt_bbox'] = gt_bbox + return sample + + +@register_op +class RandomFlip(BaseOperator): + def __init__(self, prob=0.5): + """ + Args: + prob (float): the probability of flipping image + """ + super(RandomFlip, self).__init__() + self.prob = prob + if not (isinstance(self.prob, float)): + raise TypeError("{}: input type is invalid.".format(self)) + + def apply_segm(self, segms, height, width): + def _flip_poly(poly, width): + flipped_poly = np.array(poly) + flipped_poly[0::2] = width - np.array(poly[0::2]) + return flipped_poly.tolist() + + def _flip_rle(rle, height, width): + if 'counts' in rle and type(rle['counts']) == list: + rle = mask_util.frPyObjects(rle, height, width) + mask = mask_util.decode(rle) + mask = mask[:, ::-1] + rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8)) + return rle + + flipped_segms = [] + for segm in segms: + if is_poly(segm): + # Polygon format + flipped_segms.append([_flip_poly(poly, width) for poly in segm]) + else: + # RLE format + import pycocotools.mask as mask_util + flipped_segms.append(_flip_rle(segm, height, width)) + return flipped_segms + + def apply_keypoint(self, gt_keypoint, width): + for i in range(gt_keypoint.shape[1]): + if i % 2 == 0: + old_x = gt_keypoint[:, i].copy() + gt_keypoint[:, i] = width - old_x + return gt_keypoint + + def apply_image(self, image): + return image[:, ::-1, :] + + def apply_bbox(self, bbox, width): + oldx1 = bbox[:, 0].copy() + oldx2 = bbox[:, 2].copy() + bbox[:, 0] = width - oldx2 + bbox[:, 2] = width - oldx1 + return bbox + + def apply(self, sample, context=None): + """Filp the image and bounding box. + Operators: + 1. Flip the image numpy. + 2. Transform the bboxes' x coordinates. + (Must judge whether the coordinates are normalized!) + 3. Transform the segmentations' x coordinates. + (Must judge whether the coordinates are normalized!) + Output: + sample: the image, bounding box and segmentation part + in sample are flipped. + """ + if np.random.uniform(0, 1) < self.prob: + im = sample['image'] + height, width = im.shape[:2] + im = self.apply_image(im) + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + sample['gt_bbox'] = self.apply_bbox(sample['gt_bbox'], width) + if 'gt_poly' in sample and len(sample['gt_poly']) > 0: + sample['gt_poly'] = self.apply_segm(sample['gt_poly'], height, + width) + if 'gt_keypoint' in sample and len(sample['gt_keypoint']) > 0: + sample['gt_keypoint'] = self.apply_keypoint( + sample['gt_keypoint'], width) + + if 'semantic' in sample and sample['semantic']: + sample['semantic'] = sample['semantic'][:, ::-1] + + if 'gt_segm' in sample and sample['gt_segm'].any(): + sample['gt_segm'] = sample['gt_segm'][:, :, ::-1] + + sample['flipped'] = True + sample['image'] = im + return sample + + +@register_op +class Resize(BaseOperator): + def __init__(self, target_size, keep_ratio, interp=cv2.INTER_LINEAR): + """ + Resize image to target size. 
if keep_ratio is True, + resize the image's long side to the maximum of target_size + if keep_ratio is False, resize the image to target size(h, w) + Args: + target_size (int|list): image target size + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): the interpolation method + """ + super(Resize, self).__init__() + self.keep_ratio = keep_ratio + self.interp = interp + if not isinstance(target_size, (Integral, Sequence)): + raise TypeError( + "Type of target_size is invalid. Must be Integer or List or Tuple, now is {}". + format(type(target_size))) + if isinstance(target_size, Integral): + target_size = [target_size, target_size] + self.target_size = target_size + + def apply_image(self, image, scale): + im_scale_x, im_scale_y = scale + + return cv2.resize( + image, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + + def apply_bbox(self, bbox, scale, size): + im_scale_x, im_scale_y = scale + resize_w, resize_h = size + bbox[:, 0::2] *= im_scale_x + bbox[:, 1::2] *= im_scale_y + bbox[:, 0::2] = np.clip(bbox[:, 0::2], 0, resize_w) + bbox[:, 1::2] = np.clip(bbox[:, 1::2], 0, resize_h) + return bbox + + def apply_area(self, area, scale): + im_scale_x, im_scale_y = scale + return area * im_scale_x * im_scale_y + + def apply_joints(self, joints, scale, size): + im_scale_x, im_scale_y = scale + resize_w, resize_h = size + joints[..., 0] *= im_scale_x + joints[..., 1] *= im_scale_y + joints[..., 0] = np.clip(joints[..., 0], 0, resize_w) + joints[..., 1] = np.clip(joints[..., 1], 0, resize_h) + return joints + + def apply_segm(self, segms, im_size, scale): + def _resize_poly(poly, im_scale_x, im_scale_y): + resized_poly = np.array(poly).astype('float32') + resized_poly[0::2] *= im_scale_x + resized_poly[1::2] *= im_scale_y + return resized_poly.tolist() + + def _resize_rle(rle, im_h, im_w, im_scale_x, im_scale_y): + if 'counts' in rle and type(rle['counts']) == list: + rle = mask_util.frPyObjects(rle, im_h, im_w) + + mask = mask_util.decode(rle) + mask = cv2.resize( + mask, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8)) + return rle + + im_h, im_w = im_size + im_scale_x, im_scale_y = scale + resized_segms = [] + for segm in segms: + if is_poly(segm): + # Polygon format + resized_segms.append([ + _resize_poly(poly, im_scale_x, im_scale_y) for poly in segm + ]) + else: + # RLE format + import pycocotools.mask as mask_util + resized_segms.append( + _resize_rle(segm, im_h, im_w, im_scale_x, im_scale_y)) + + return resized_segms + + def apply(self, sample, context=None): + """ Resize the image numpy. 
+ """ + im = sample['image'] + if not isinstance(im, np.ndarray): + raise TypeError("{}: image type is not numpy.".format(self)) + + # apply image + if len(im.shape) == 3: + im_shape = im.shape + else: + im_shape = im[0].shape + + if self.keep_ratio: + im_size_min = np.min(im_shape[0:2]) + im_size_max = np.max(im_shape[0:2]) + + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + + im_scale = min(target_size_min / im_size_min, + target_size_max / im_size_max) + + resize_h = int(im_scale * float(im_shape[0]) + 0.5) + resize_w = int(im_scale * float(im_shape[1]) + 0.5) + + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / im_shape[0] + im_scale_x = resize_w / im_shape[1] + + if len(im.shape) == 3: + im = self.apply_image(sample['image'], [im_scale_x, im_scale_y]) + sample['image'] = im.astype(np.float32) + else: + resized_images = [] + for one_im in im: + applied_im = self.apply_image(one_im, [im_scale_x, im_scale_y]) + resized_images.append(applied_im) + + sample['image'] = np.array(resized_images) + + # 2d keypoints resize + if 'kps2d' in sample.keys(): + kps2d = sample['kps2d'] + kps2d[:, :, 0] = kps2d[:, :, 0] * im_scale_x + kps2d[:, :, 1] = kps2d[:, :, 1] * im_scale_y + + sample['kps2d'] = kps2d + + sample['im_shape'] = np.asarray([resize_h, resize_w], dtype=np.float32) + if 'scale_factor' in sample: + scale_factor = sample['scale_factor'] + sample['scale_factor'] = np.asarray( + [scale_factor[0] * im_scale_y, scale_factor[1] * im_scale_x], + dtype=np.float32) + else: + sample['scale_factor'] = np.asarray( + [im_scale_y, im_scale_x], dtype=np.float32) + + # apply bbox + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + sample['gt_bbox'] = self.apply_bbox(sample['gt_bbox'], + [im_scale_x, im_scale_y], + [resize_w, resize_h]) + + # apply areas + if 'gt_areas' in sample: + sample['gt_areas'] = self.apply_area(sample['gt_areas'], + [im_scale_x, im_scale_y]) + + # apply polygon + if 'gt_poly' in sample and len(sample['gt_poly']) > 0: + sample['gt_poly'] = self.apply_segm(sample['gt_poly'], im_shape[:2], + [im_scale_x, im_scale_y]) + + # apply semantic + if 'semantic' in sample and sample['semantic']: + semantic = sample['semantic'] + semantic = cv2.resize( + semantic.astype('float32'), + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + semantic = np.asarray(semantic).astype('int32') + semantic = np.expand_dims(semantic, 0) + sample['semantic'] = semantic + + # apply gt_segm + if 'gt_segm' in sample and len(sample['gt_segm']) > 0: + masks = [ + cv2.resize( + gt_segm, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=cv2.INTER_NEAREST) + for gt_segm in sample['gt_segm'] + ] + sample['gt_segm'] = np.asarray(masks).astype(np.uint8) + + if 'gt_joints' in sample: + sample['gt_joints'] = self.apply_joints(sample['gt_joints'], + [im_scale_x, im_scale_y], + [resize_w, resize_h]) + + return sample + + +@register_op +class MultiscaleTestResize(BaseOperator): + def __init__(self, + origin_target_size=[800, 1333], + target_size=[], + interp=cv2.INTER_LINEAR, + use_flip=True): + """ + Rescale image to the each size in target size, and capped at max_size. + Args: + origin_target_size (list): origin target size of image + target_size (list): A list of target sizes of image. + interp (int): the interpolation method. + use_flip (bool): whether use flip augmentation. 
+ """ + super(MultiscaleTestResize, self).__init__() + self.interp = interp + self.use_flip = use_flip + + if not isinstance(target_size, Sequence): + raise TypeError( + "Type of target_size is invalid. Must be List or Tuple, now is {}". + format(type(target_size))) + self.target_size = target_size + + if not isinstance(origin_target_size, Sequence): + raise TypeError( + "Type of origin_target_size is invalid. Must be List or Tuple, now is {}". + format(type(origin_target_size))) + + self.origin_target_size = origin_target_size + + def apply(self, sample, context=None): + """ Resize the image numpy for multi-scale test. + """ + samples = [] + resizer = Resize( + self.origin_target_size, keep_ratio=True, interp=self.interp) + samples.append(resizer(sample.copy(), context)) + if self.use_flip: + flipper = RandomFlip(1.1) + samples.append(flipper(sample.copy(), context=context)) + + for size in self.target_size: + resizer = Resize(size, keep_ratio=True, interp=self.interp) + samples.append(resizer(sample.copy(), context)) + + return samples + + +@register_op +class RandomResize(BaseOperator): + def __init__(self, + target_size, + keep_ratio=True, + interp=cv2.INTER_LINEAR, + random_range=False, + random_size=True, + random_interp=False): + """ + Resize image to target size randomly. random target_size and interpolation method + Args: + target_size (int, list, tuple): image target size, if random size is True, must be list or tuple + keep_ratio (bool): whether keep_raio or not, default true + interp (int): the interpolation method + random_range (bool): whether random select target size of image, the target_size must be + a [[min_short_edge, long_edge], [max_short_edge, long_edge]] + random_size (bool): whether random select target size of image + random_interp (bool): whether random select interpolation method + """ + super(RandomResize, self).__init__() + self.keep_ratio = keep_ratio + self.interp = interp + self.interps = [ + cv2.INTER_NEAREST, + cv2.INTER_LINEAR, + cv2.INTER_AREA, + cv2.INTER_CUBIC, + cv2.INTER_LANCZOS4, + ] + assert isinstance(target_size, ( + Integral, Sequence)), "target_size must be Integer, List or Tuple" + if (random_range or random_size) and not isinstance(target_size, + Sequence): + raise TypeError( + "Type of target_size is invalid when random_size or random_range is True. Must be List or Tuple, now is {}". + format(type(target_size))) + if random_range and not len(target_size) == 2: + raise TypeError( + "target_size must be two list as [[min_short_edge, long_edge], [max_short_edge, long_edge]] when random_range is True." + ) + self.target_size = target_size + self.random_range = random_range + self.random_size = random_size + self.random_interp = random_interp + + def apply(self, sample, context=None): + """ Resize the image numpy. + """ + if self.random_range: + short_edge = np.random.randint(self.target_size[0][0], + self.target_size[1][0] + 1) + long_edge = max(self.target_size[0][1], self.target_size[1][1] + 1) + target_size = [short_edge, long_edge] + else: + if self.random_size: + target_size = random.choice(self.target_size) + else: + target_size = self.target_size + + if self.random_interp: + interp = random.choice(self.interps) + else: + interp = self.interp + + resizer = Resize(target_size, self.keep_ratio, interp) + return resizer(sample, context=context) + + +@register_op +class RandomExpand(BaseOperator): + """Random expand the canvas. + Args: + ratio (float): maximum expansion ratio. + prob (float): probability to expand. 
+ fill_value (list): color value used to fill the canvas. in RGB order. + """ + + def __init__(self, ratio=4., prob=0.5, fill_value=(127.5, 127.5, 127.5)): + super(RandomExpand, self).__init__() + assert ratio > 1.01, "expand ratio must be larger than 1.01" + self.ratio = ratio + self.prob = prob + assert isinstance(fill_value, (Number, Sequence)), \ + "fill value must be either float or sequence" + if isinstance(fill_value, Number): + fill_value = (fill_value, ) * 3 + if not isinstance(fill_value, tuple): + fill_value = tuple(fill_value) + self.fill_value = fill_value + + def apply(self, sample, context=None): + if np.random.uniform(0., 1.) < self.prob: + return sample + + im = sample['image'] + height, width = im.shape[:2] + ratio = np.random.uniform(1., self.ratio) + h = int(height * ratio) + w = int(width * ratio) + if not h > height or not w > width: + return sample + y = np.random.randint(0, h - height) + x = np.random.randint(0, w - width) + offsets, size = [x, y], [h, w] + + pad = Pad(size, + pad_mode=-1, + offsets=offsets, + fill_value=self.fill_value) + + return pad(sample, context=context) + + +@register_op +class CropWithSampling(BaseOperator): + def __init__(self, batch_sampler, satisfy_all=False, avoid_no_bbox=True): + """ + Args: + batch_sampler (list): Multiple sets of different + parameters for cropping. + satisfy_all (bool): whether all boxes must satisfy. + e.g.[[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0], + [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0], + [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0], + [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0], + [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0], + [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0], + [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]] + [max sample, max trial, min scale, max scale, + min aspect ratio, max aspect ratio, + min overlap, max overlap] + avoid_no_bbox (bool): whether to avoid the + situation where the box does not appear. + """ + super(CropWithSampling, self).__init__() + self.batch_sampler = batch_sampler + self.satisfy_all = satisfy_all + self.avoid_no_bbox = avoid_no_bbox + + def apply(self, sample, context): + """ + Crop the image and modify bounding box. + Operators: + 1. Scale the image width and height. + 2. Crop the image according to a radom sample. + 3. Rescale the bounding box. + 4. Determine if the new bbox is satisfied in the new image. + Returns: + sample: the image, bounding box are replaced. 
+ """ + assert 'image' in sample, "image data not found" + im = sample['image'] + gt_bbox = sample['gt_bbox'] + gt_class = sample['gt_class'] + im_height, im_width = im.shape[:2] + gt_score = None + if 'gt_score' in sample: + gt_score = sample['gt_score'] + sampled_bbox = [] + gt_bbox = gt_bbox.tolist() + for sampler in self.batch_sampler: + found = 0 + for i in range(sampler[1]): + if found >= sampler[0]: + break + sample_bbox = generate_sample_bbox(sampler) + if satisfy_sample_constraint(sampler, sample_bbox, gt_bbox, + self.satisfy_all): + sampled_bbox.append(sample_bbox) + found = found + 1 + im = np.array(im) + while sampled_bbox: + idx = int(np.random.uniform(0, len(sampled_bbox))) + sample_bbox = sampled_bbox.pop(idx) + sample_bbox = clip_bbox(sample_bbox) + crop_bbox, crop_class, crop_score = \ + filter_and_process(sample_bbox, gt_bbox, gt_class, scores=gt_score) + if self.avoid_no_bbox: + if len(crop_bbox) < 1: + continue + xmin = int(sample_bbox[0] * im_width) + xmax = int(sample_bbox[2] * im_width) + ymin = int(sample_bbox[1] * im_height) + ymax = int(sample_bbox[3] * im_height) + im = im[ymin:ymax, xmin:xmax] + sample['image'] = im + sample['gt_bbox'] = crop_bbox + sample['gt_class'] = crop_class + sample['gt_score'] = crop_score + return sample + return sample + + +@register_op +class CropWithDataAchorSampling(BaseOperator): + def __init__(self, + batch_sampler, + anchor_sampler=None, + target_size=None, + das_anchor_scales=[16, 32, 64, 128], + sampling_prob=0.5, + min_size=8., + avoid_no_bbox=True): + """ + Args: + anchor_sampler (list): anchor_sampling sets of different + parameters for cropping. + batch_sampler (list): Multiple sets of different + parameters for cropping. + e.g.[[1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0]] + [[1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0], + [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]] + [max sample, max trial, min scale, max scale, + min aspect ratio, max aspect ratio, + min overlap, max overlap, min coverage, max coverage] + target_size (int): target image size. + das_anchor_scales (list[float]): a list of anchor scales in data + anchor smapling. + min_size (float): minimum size of sampled bbox. + avoid_no_bbox (bool): whether to avoid the + situation where the box does not appear. + """ + super(CropWithDataAchorSampling, self).__init__() + self.anchor_sampler = anchor_sampler + self.batch_sampler = batch_sampler + self.target_size = target_size + self.sampling_prob = sampling_prob + self.min_size = min_size + self.avoid_no_bbox = avoid_no_bbox + self.das_anchor_scales = np.array(das_anchor_scales) + + def apply(self, sample, context): + """ + Crop the image and modify bounding box. + Operators: + 1. Scale the image width and height. + 2. Crop the image according to a radom sample. + 3. Rescale the bounding box. + 4. Determine if the new bbox is satisfied in the new image. + Returns: + sample: the image, bounding box are replaced. 
+ """ + assert 'image' in sample, "image data not found" + im = sample['image'] + gt_bbox = sample['gt_bbox'] + gt_class = sample['gt_class'] + image_height, image_width = im.shape[:2] + gt_bbox[:, 0] /= image_width + gt_bbox[:, 1] /= image_height + gt_bbox[:, 2] /= image_width + gt_bbox[:, 3] /= image_height + gt_score = None + if 'gt_score' in sample: + gt_score = sample['gt_score'] + sampled_bbox = [] + gt_bbox = gt_bbox.tolist() + + prob = np.random.uniform(0., 1.) + if prob > self.sampling_prob: # anchor sampling + assert self.anchor_sampler + for sampler in self.anchor_sampler: + found = 0 + for i in range(sampler[1]): + if found >= sampler[0]: + break + sample_bbox = data_anchor_sampling( + gt_bbox, image_width, image_height, + self.das_anchor_scales, self.target_size) + if sample_bbox == 0: + break + if satisfy_sample_constraint_coverage(sampler, sample_bbox, + gt_bbox): + sampled_bbox.append(sample_bbox) + found = found + 1 + im = np.array(im) + while sampled_bbox: + idx = int(np.random.uniform(0, len(sampled_bbox))) + sample_bbox = sampled_bbox.pop(idx) + + if 'gt_keypoint' in sample.keys(): + keypoints = (sample['gt_keypoint'], + sample['keypoint_ignore']) + crop_bbox, crop_class, crop_score, gt_keypoints = \ + filter_and_process(sample_bbox, gt_bbox, gt_class, + scores=gt_score, + keypoints=keypoints) + else: + crop_bbox, crop_class, crop_score = filter_and_process( + sample_bbox, gt_bbox, gt_class, scores=gt_score) + crop_bbox, crop_class, crop_score = bbox_area_sampling( + crop_bbox, crop_class, crop_score, self.target_size, + self.min_size) + + if self.avoid_no_bbox: + if len(crop_bbox) < 1: + continue + im = crop_image_sampling(im, sample_bbox, image_width, + image_height, self.target_size) + height, width = im.shape[:2] + crop_bbox[:, 0] *= width + crop_bbox[:, 1] *= height + crop_bbox[:, 2] *= width + crop_bbox[:, 3] *= height + sample['image'] = im + sample['gt_bbox'] = crop_bbox + sample['gt_class'] = crop_class + if 'gt_score' in sample: + sample['gt_score'] = crop_score + if 'gt_keypoint' in sample.keys(): + sample['gt_keypoint'] = gt_keypoints[0] + sample['keypoint_ignore'] = gt_keypoints[1] + return sample + return sample + + else: + for sampler in self.batch_sampler: + found = 0 + for i in range(sampler[1]): + if found >= sampler[0]: + break + sample_bbox = generate_sample_bbox_square( + sampler, image_width, image_height) + if satisfy_sample_constraint_coverage(sampler, sample_bbox, + gt_bbox): + sampled_bbox.append(sample_bbox) + found = found + 1 + im = np.array(im) + while sampled_bbox: + idx = int(np.random.uniform(0, len(sampled_bbox))) + sample_bbox = sampled_bbox.pop(idx) + sample_bbox = clip_bbox(sample_bbox) + + if 'gt_keypoint' in sample.keys(): + keypoints = (sample['gt_keypoint'], + sample['keypoint_ignore']) + crop_bbox, crop_class, crop_score, gt_keypoints = \ + filter_and_process(sample_bbox, gt_bbox, gt_class, + scores=gt_score, + keypoints=keypoints) + else: + crop_bbox, crop_class, crop_score = filter_and_process( + sample_bbox, gt_bbox, gt_class, scores=gt_score) + # sampling bbox according the bbox area + crop_bbox, crop_class, crop_score = bbox_area_sampling( + crop_bbox, crop_class, crop_score, self.target_size, + self.min_size) + + if self.avoid_no_bbox: + if len(crop_bbox) < 1: + continue + xmin = int(sample_bbox[0] * image_width) + xmax = int(sample_bbox[2] * image_width) + ymin = int(sample_bbox[1] * image_height) + ymax = int(sample_bbox[3] * image_height) + im = im[ymin:ymax, xmin:xmax] + height, width = im.shape[:2] + crop_bbox[:, 0] 
*= width + crop_bbox[:, 1] *= height + crop_bbox[:, 2] *= width + crop_bbox[:, 3] *= height + sample['image'] = im + sample['gt_bbox'] = crop_bbox + sample['gt_class'] = crop_class + if 'gt_score' in sample: + sample['gt_score'] = crop_score + if 'gt_keypoint' in sample.keys(): + sample['gt_keypoint'] = gt_keypoints[0] + sample['keypoint_ignore'] = gt_keypoints[1] + return sample + return sample + + +@register_op +class RandomCrop(BaseOperator): + """Random crop image and bboxes. + Args: + aspect_ratio (list): aspect ratio of cropped region. + in [min, max] format. + thresholds (list): iou thresholds for decide a valid bbox crop. + scaling (list): ratio between a cropped region and the original image. + in [min, max] format. + num_attempts (int): number of tries before giving up. + allow_no_crop (bool): allow return without actually cropping them. + cover_all_box (bool): ensure all bboxes are covered in the final crop. + is_mask_crop(bool): whether crop the segmentation. + """ + + def __init__(self, + aspect_ratio=[.5, 2.], + thresholds=[.0, .1, .3, .5, .7, .9], + scaling=[.3, 1.], + num_attempts=50, + allow_no_crop=True, + cover_all_box=False, + is_mask_crop=False, + ioumode="iou"): + super(RandomCrop, self).__init__() + self.aspect_ratio = aspect_ratio + self.thresholds = thresholds + self.scaling = scaling + self.num_attempts = num_attempts + self.allow_no_crop = allow_no_crop + self.cover_all_box = cover_all_box + self.is_mask_crop = is_mask_crop + self.ioumode = ioumode + + def crop_segms(self, segms, valid_ids, crop, height, width): + def _crop_poly(segm, crop): + xmin, ymin, xmax, ymax = crop + crop_coord = [xmin, ymin, xmin, ymax, xmax, ymax, xmax, ymin] + crop_p = np.array(crop_coord).reshape(4, 2) + crop_p = Polygon(crop_p) + + crop_segm = list() + for poly in segm: + poly = np.array(poly).reshape(len(poly) // 2, 2) + polygon = Polygon(poly) + if not polygon.is_valid: + exterior = polygon.exterior + multi_lines = exterior.intersection(exterior) + polygons = shapely.ops.polygonize(multi_lines) + polygon = MultiPolygon(polygons) + multi_polygon = list() + if isinstance(polygon, MultiPolygon): + multi_polygon = copy.deepcopy(polygon) + else: + multi_polygon.append(copy.deepcopy(polygon)) + for per_polygon in multi_polygon: + inter = per_polygon.intersection(crop_p) + if not inter: + continue + if isinstance(inter, (MultiPolygon, GeometryCollection)): + for part in inter: + if not isinstance(part, Polygon): + continue + part = np.squeeze( + np.array(part.exterior.coords[:-1]).reshape(1, + -1)) + part[0::2] -= xmin + part[1::2] -= ymin + crop_segm.append(part.tolist()) + elif isinstance(inter, Polygon): + crop_poly = np.squeeze( + np.array(inter.exterior.coords[:-1]).reshape(1, -1)) + crop_poly[0::2] -= xmin + crop_poly[1::2] -= ymin + crop_segm.append(crop_poly.tolist()) + else: + continue + return crop_segm + + def _crop_rle(rle, crop, height, width): + if 'counts' in rle and type(rle['counts']) == list: + rle = mask_util.frPyObjects(rle, height, width) + mask = mask_util.decode(rle) + mask = mask[crop[1]:crop[3], crop[0]:crop[2]] + rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8)) + return rle + + crop_segms = [] + for id in valid_ids: + segm = segms[id] + if is_poly(segm): + import copy + import shapely.ops + from shapely.geometry import Polygon, MultiPolygon, GeometryCollection + logging.getLogger("shapely").setLevel(logging.WARNING) + # Polygon format + crop_segms.append(_crop_poly(segm, crop)) + else: + # RLE format + import pycocotools.mask as mask_util + 
crop_segms.append(_crop_rle(segm, crop, height, width)) + return crop_segms + + def set_fake_bboxes(self, sample): + sample['gt_bbox'] = np.array( + [ + [32, 32, 128, 128], + [32, 32, 128, 256], + [32, 64, 128, 128], + [32, 64, 128, 256], + [64, 64, 128, 256], + [64, 64, 256, 256], + [64, 32, 128, 256], + [64, 32, 128, 256], + [96, 32, 128, 256], + [96, 32, 128, 256], + ], + dtype=np.float32) + sample['gt_class'] = np.array( + [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]], np.int32) + return sample + + def apply(self, sample, context=None): + if 'gt_bbox' not in sample: + # only used in semi-det as unsup data + sample = self.set_fake_bboxes(sample) + sample = self.random_crop(sample, fake_bboxes=True) + return sample + + if 'gt_bbox' in sample and len(sample['gt_bbox']) == 0: + return sample + sample = self.random_crop(sample) + return sample + + def random_crop(self, sample, fake_bboxes=False): + h, w = sample['image'].shape[:2] + gt_bbox = sample['gt_bbox'] + + # NOTE Original method attempts to generate one candidate for each + # threshold then randomly sample one from the resulting list. + # Here a short circuit approach is taken, i.e., randomly choose a + # threshold and attempt to find a valid crop, and simply return the + # first one found. + # The probability is not exactly the same, kinda resembling the + # "Monty Hall" problem. Actually carrying out the attempts will affect + # observability (just like opening doors in the "Monty Hall" game). + thresholds = list(self.thresholds) + if self.allow_no_crop: + thresholds.append('no_crop') + np.random.shuffle(thresholds) + + for thresh in thresholds: + if thresh == 'no_crop': + return sample + + found = False + for i in range(self.num_attempts): + scale = np.random.uniform(*self.scaling) + if self.aspect_ratio is not None: + min_ar, max_ar = self.aspect_ratio + aspect_ratio = np.random.uniform( + max(min_ar, scale**2), min(max_ar, scale**-2)) + h_scale = scale / np.sqrt(aspect_ratio) + w_scale = scale * np.sqrt(aspect_ratio) + else: + h_scale = np.random.uniform(*self.scaling) + w_scale = np.random.uniform(*self.scaling) + crop_h = h * h_scale + crop_w = w * w_scale + if self.aspect_ratio is None: + if crop_h / crop_w < 0.5 or crop_h / crop_w > 2.0: + continue + + crop_h = int(crop_h) + crop_w = int(crop_w) + crop_y = np.random.randint(0, h - crop_h) + crop_x = np.random.randint(0, w - crop_w) + crop_box = [crop_x, crop_y, crop_x + crop_w, crop_y + crop_h] + if self.ioumode == "iof": + iou = self._gtcropiou_matrix( + gt_bbox, np.array( + [crop_box], dtype=np.float32)) + elif self.ioumode == "iou": + iou = self._iou_matrix( + gt_bbox, np.array( + [crop_box], dtype=np.float32)) + if iou.max() < thresh: + continue + + if self.cover_all_box and iou.min() < thresh: + continue + + cropped_box, valid_ids = self._crop_box_with_center_constraint( + gt_bbox, np.array( + crop_box, dtype=np.float32)) + if valid_ids.size > 0: + found = True + break + + if found: + if self.is_mask_crop and 'gt_poly' in sample and len(sample[ + 'gt_poly']) > 0: + crop_polys = self.crop_segms( + sample['gt_poly'], + valid_ids, + np.array( + crop_box, dtype=np.int64), + h, + w) + if [] in crop_polys: + delete_id = list() + valid_polys = list() + for id, crop_poly in enumerate(crop_polys): + if crop_poly == []: + delete_id.append(id) + else: + valid_polys.append(crop_poly) + valid_ids = np.delete(valid_ids, delete_id) + if len(valid_polys) == 0: + return sample + sample['gt_poly'] = valid_polys + else: + sample['gt_poly'] = crop_polys + + if 'gt_segm' in sample: + 
sample['gt_segm'] = self._crop_segm(sample['gt_segm'], + crop_box) + sample['gt_segm'] = np.take( + sample['gt_segm'], valid_ids, axis=0) + + sample['image'] = self._crop_image(sample['image'], crop_box) + if fake_bboxes == True: + return sample + + sample['gt_bbox'] = np.take(cropped_box, valid_ids, axis=0) + sample['gt_class'] = np.take( + sample['gt_class'], valid_ids, axis=0) + if 'gt_score' in sample: + sample['gt_score'] = np.take( + sample['gt_score'], valid_ids, axis=0) + + if 'is_crowd' in sample: + sample['is_crowd'] = np.take( + sample['is_crowd'], valid_ids, axis=0) + + if 'difficult' in sample: + sample['difficult'] = np.take( + sample['difficult'], valid_ids, axis=0) + + if 'gt_joints' in sample: + sample['gt_joints'] = self._crop_joints(sample['gt_joints'], + crop_box) + + return sample + + return sample + + def _iou_matrix(self, a, b): + tl_i = np.maximum(a[:, np.newaxis, :2], b[:, :2]) + br_i = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) + + area_i = np.prod(br_i - tl_i, axis=2) * (tl_i < br_i).all(axis=2) + area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) + area_b = np.prod(b[:, 2:] - b[:, :2], axis=1) + area_o = (area_a[:, np.newaxis] + area_b - area_i) + return area_i / (area_o + 1e-10) + + def _gtcropiou_matrix(self, a, b): + tl_i = np.maximum(a[:, np.newaxis, :2], b[:, :2]) + br_i = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) + + area_i = np.prod(br_i - tl_i, axis=2) * (tl_i < br_i).all(axis=2) + area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) + area_b = np.prod(b[:, 2:] - b[:, :2], axis=1) + area_o = (area_a[:, np.newaxis] + area_b - area_i) + return area_i / (area_a + 1e-10) + + def _crop_box_with_center_constraint(self, box, crop): + cropped_box = box.copy() + + cropped_box[:, :2] = np.maximum(box[:, :2], crop[:2]) + cropped_box[:, 2:] = np.minimum(box[:, 2:], crop[2:]) + cropped_box[:, :2] -= crop[:2] + cropped_box[:, 2:] -= crop[:2] + + centers = (box[:, :2] + box[:, 2:]) / 2 + valid = np.logical_and(crop[:2] <= centers, + centers < crop[2:]).all(axis=1) + valid = np.logical_and( + valid, (cropped_box[:, :2] < cropped_box[:, 2:]).all(axis=1)) + + return cropped_box, np.where(valid)[0] + + def _crop_image(self, img, crop): + x1, y1, x2, y2 = crop + return img[y1:y2, x1:x2, :] + + def _crop_segm(self, segm, crop): + x1, y1, x2, y2 = crop + return segm[:, y1:y2, x1:x2] + + def _crop_joints(self, joints, crop): + x1, y1, x2, y2 = crop + joints[joints[..., 0] > x2, :] = 0 + joints[joints[..., 1] > y2, :] = 0 + joints[joints[..., 0] < x1, :] = 0 + joints[joints[..., 1] < y1, :] = 0 + joints[..., 0] -= x1 + joints[..., 1] -= y1 + return joints + + +@register_op +class RandomScaledCrop(BaseOperator): + """Resize image and bbox based on long side (with optional random scaling), + then crop or pad image to target size. + Args: + target_dim (int): target size. + scale_range (list): random scale range. + interp (int): interpolation method, default to `cv2.INTER_LINEAR`. 
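+
+    A usage sketch (values are illustrative; the op expects `sample` to carry
+    'image' and 'scale_factor', as the preceding decode/resize ops produce):
+
+        op = RandomScaledCrop(target_dim=512, scale_range=[0.5, 1.5])
+        sample = op(sample)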
+ """ + + def __init__(self, + target_dim=512, + scale_range=[.1, 2.], + interp=cv2.INTER_LINEAR): + super(RandomScaledCrop, self).__init__() + self.target_dim = target_dim + self.scale_range = scale_range + self.interp = interp + + def apply(self, sample, context=None): + img = sample['image'] + h, w = img.shape[:2] + random_scale = np.random.uniform(*self.scale_range) + dim = self.target_dim + random_dim = int(dim * random_scale) + dim_max = max(h, w) + scale = random_dim / dim_max + resize_w = int(w * scale + 0.5) + resize_h = int(h * scale + 0.5) + offset_x = int(max(0, np.random.uniform(0., resize_w - dim))) + offset_y = int(max(0, np.random.uniform(0., resize_h - dim))) + + img = cv2.resize(img, (resize_w, resize_h), interpolation=self.interp) + img = np.array(img) + canvas = np.zeros((dim, dim, 3), dtype=img.dtype) + canvas[:min(dim, resize_h), :min(dim, resize_w), :] = img[ + offset_y:offset_y + dim, offset_x:offset_x + dim, :] + sample['image'] = canvas + sample['im_shape'] = np.asarray([resize_h, resize_w], dtype=np.float32) + scale_factor = sample['sacle_factor'] + sample['scale_factor'] = np.asarray( + [scale_factor[0] * scale, scale_factor[1] * scale], + dtype=np.float32) + + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + scale_array = np.array([scale, scale] * 2, dtype=np.float32) + shift_array = np.array([offset_x, offset_y] * 2, dtype=np.float32) + boxes = sample['gt_bbox'] * scale_array - shift_array + boxes = np.clip(boxes, 0, dim - 1) + # filter boxes with no area + area = np.prod(boxes[..., 2:] - boxes[..., :2], axis=1) + valid = (area > 1.).nonzero()[0] + sample['gt_bbox'] = boxes[valid] + sample['gt_class'] = sample['gt_class'][valid] + + return sample + + +@register_op +class Cutmix(BaseOperator): + def __init__(self, alpha=1.5, beta=1.5): + """ + CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features, see https://arxiv.org/abs/1905.04899 + Cutmix image and gt_bbbox/gt_score + Args: + alpha (float): alpha parameter of beta distribute + beta (float): beta parameter of beta distribute + """ + super(Cutmix, self).__init__() + self.alpha = alpha + self.beta = beta + if self.alpha <= 0.0: + raise ValueError("alpha shold be positive in {}".format(self)) + if self.beta <= 0.0: + raise ValueError("beta shold be positive in {}".format(self)) + + def apply_image(self, img1, img2, factor): + """ _rand_bbox """ + h = max(img1.shape[0], img2.shape[0]) + w = max(img1.shape[1], img2.shape[1]) + cut_rat = np.sqrt(1. 
- factor) + + cut_w = np.int32(w * cut_rat) + cut_h = np.int32(h * cut_rat) + + # uniform + cx = np.random.randint(w) + cy = np.random.randint(h) + + bbx1 = np.clip(cx - cut_w // 2, 0, w - 1) + bby1 = np.clip(cy - cut_h // 2, 0, h - 1) + bbx2 = np.clip(cx + cut_w // 2, 0, w - 1) + bby2 = np.clip(cy + cut_h // 2, 0, h - 1) + + img_1_pad = np.zeros((h, w, img1.shape[2]), 'float32') + img_1_pad[:img1.shape[0], :img1.shape[1], :] = \ + img1.astype('float32') + img_2_pad = np.zeros((h, w, img2.shape[2]), 'float32') + img_2_pad[:img2.shape[0], :img2.shape[1], :] = \ + img2.astype('float32') + img_1_pad[bby1:bby2, bbx1:bbx2, :] = img_2_pad[bby1:bby2, bbx1:bbx2, :] + return img_1_pad + + def __call__(self, sample, context=None): + if not isinstance(sample, Sequence): + return sample + + assert len(sample) == 2, 'cutmix need two samples' + + factor = np.random.beta(self.alpha, self.beta) + factor = max(0.0, min(1.0, factor)) + if factor >= 1.0: + return sample[0] + if factor <= 0.0: + return sample[1] + img1 = sample[0]['image'] + img2 = sample[1]['image'] + img = self.apply_image(img1, img2, factor) + gt_bbox1 = sample[0]['gt_bbox'] + gt_bbox2 = sample[1]['gt_bbox'] + gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0) + gt_class1 = sample[0]['gt_class'] + gt_class2 = sample[1]['gt_class'] + gt_class = np.concatenate((gt_class1, gt_class2), axis=0) + gt_score1 = np.ones_like(sample[0]['gt_class']) + gt_score2 = np.ones_like(sample[1]['gt_class']) + gt_score = np.concatenate( + (gt_score1 * factor, gt_score2 * (1. - factor)), axis=0) + result = copy.deepcopy(sample[0]) + result['image'] = img + result['gt_bbox'] = gt_bbox + result['gt_score'] = gt_score + result['gt_class'] = gt_class + if 'is_crowd' in sample[0]: + is_crowd1 = sample[0]['is_crowd'] + is_crowd2 = sample[1]['is_crowd'] + is_crowd = np.concatenate((is_crowd1, is_crowd2), axis=0) + result['is_crowd'] = is_crowd + if 'difficult' in sample[0]: + is_difficult1 = sample[0]['difficult'] + is_difficult2 = sample[1]['difficult'] + is_difficult = np.concatenate( + (is_difficult1, is_difficult2), axis=0) + result['difficult'] = is_difficult + return result + + +@register_op +class Mixup(BaseOperator): + def __init__(self, alpha=1.5, beta=1.5): + """ Mixup image and gt_bbbox/gt_score + Args: + alpha (float): alpha parameter of beta distribute + beta (float): beta parameter of beta distribute + """ + super(Mixup, self).__init__() + self.alpha = alpha + self.beta = beta + if self.alpha <= 0.0: + raise ValueError("alpha shold be positive in {}".format(self)) + if self.beta <= 0.0: + raise ValueError("beta shold be positive in {}".format(self)) + + def apply_image(self, img1, img2, factor): + h = max(img1.shape[0], img2.shape[0]) + w = max(img1.shape[1], img2.shape[1]) + img = np.zeros((h, w, img1.shape[2]), 'float32') + img[:img1.shape[0], :img1.shape[1], :] = \ + img1.astype('float32') * factor + img[:img2.shape[0], :img2.shape[1], :] += \ + img2.astype('float32') * (1.0 - factor) + return img.astype('uint8') + + def __call__(self, sample, context=None): + if not isinstance(sample, Sequence): + return sample + + assert len(sample) == 2, 'mixup need two samples' + + factor = np.random.beta(self.alpha, self.beta) + factor = max(0.0, min(1.0, factor)) + if factor >= 1.0: + return sample[0] + if factor <= 0.0: + return sample[1] + im = self.apply_image(sample[0]['image'], sample[1]['image'], factor) + result = copy.deepcopy(sample[0]) + result['image'] = im + # apply bbox and score + if 'gt_bbox' in sample[0]: + gt_bbox1 = sample[0]['gt_bbox'] + 
+            gt_bbox2 = sample[1]['gt_bbox']
+            gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
+            result['gt_bbox'] = gt_bbox
+        if 'gt_class' in sample[0]:
+            gt_class1 = sample[0]['gt_class']
+            gt_class2 = sample[1]['gt_class']
+            gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
+            result['gt_class'] = gt_class
+
+            gt_score1 = np.ones_like(sample[0]['gt_class'])
+            gt_score2 = np.ones_like(sample[1]['gt_class'])
+            gt_score = np.concatenate(
+                (gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
+            result['gt_score'] = gt_score.astype('float32')
+        if 'is_crowd' in sample[0]:
+            is_crowd1 = sample[0]['is_crowd']
+            is_crowd2 = sample[1]['is_crowd']
+            is_crowd = np.concatenate((is_crowd1, is_crowd2), axis=0)
+            result['is_crowd'] = is_crowd
+        if 'difficult' in sample[0]:
+            is_difficult1 = sample[0]['difficult']
+            is_difficult2 = sample[1]['difficult']
+            is_difficult = np.concatenate(
+                (is_difficult1, is_difficult2), axis=0)
+            result['difficult'] = is_difficult
+
+        if 'gt_ide' in sample[0]:
+            gt_ide1 = sample[0]['gt_ide']
+            gt_ide2 = sample[1]['gt_ide']
+            gt_ide = np.concatenate((gt_ide1, gt_ide2), axis=0)
+            result['gt_ide'] = gt_ide
+        return result
+
+
+@register_op
+class NormalizeBox(BaseOperator):
+    """Transform the bounding box's coordinates to [0, 1]."""
+
+    def __init__(self):
+        super(NormalizeBox, self).__init__()
+
+    def apply(self, sample, context):
+        im = sample['image']
+        gt_bbox = sample['gt_bbox']
+        height, width, _ = im.shape
+        for i in range(gt_bbox.shape[0]):
+            gt_bbox[i][0] = gt_bbox[i][0] / width
+            gt_bbox[i][1] = gt_bbox[i][1] / height
+            gt_bbox[i][2] = gt_bbox[i][2] / width
+            gt_bbox[i][3] = gt_bbox[i][3] / height
+        sample['gt_bbox'] = gt_bbox
+
+        if 'gt_keypoint' in sample.keys():
+            gt_keypoint = sample['gt_keypoint']
+
+            for i in range(gt_keypoint.shape[1]):
+                if i % 2:
+                    gt_keypoint[:, i] = gt_keypoint[:, i] / height
+                else:
+                    gt_keypoint[:, i] = gt_keypoint[:, i] / width
+            sample['gt_keypoint'] = gt_keypoint
+
+        return sample
+
+
+@register_op
+class BboxXYXY2XYWH(BaseOperator):
+    """
+    Convert bbox XYXY format to XYWH format,
+    i.e. [x0, y0, x1, y1] -> [center_x, center_y, width, height].
+    """
+
+    def __init__(self):
+        super(BboxXYXY2XYWH, self).__init__()
+
+    def apply(self, sample, context=None):
+        assert 'gt_bbox' in sample
+        bbox = sample['gt_bbox']
+        bbox[:, 2:4] = bbox[:, 2:4] - bbox[:, :2]
+        bbox[:, :2] = bbox[:, :2] + bbox[:, 2:4] / 2.
+        sample['gt_bbox'] = bbox
+        return sample
+
+
+@register_op
+class PadBox(BaseOperator):
+    def __init__(self, num_max_boxes=50):
+        """
+        Pad zeros to bboxes if the number of bboxes is less than num_max_boxes.
+        Args:
+            num_max_boxes (int): the max number of bboxes.
+        """
+        self.num_max_boxes = num_max_boxes
+        super(PadBox, self).__init__()
+
+    def apply(self, sample, context=None):
+        assert 'gt_bbox' in sample
+        bbox = sample['gt_bbox']
+        gt_num = min(self.num_max_boxes, len(bbox))
+        num_max = self.num_max_boxes
+        # fields = context['fields'] if context else []
+        pad_bbox = np.zeros((num_max, 4), dtype=np.float32)
+        if gt_num > 0:
+            pad_bbox[:gt_num, :] = bbox[:gt_num, :]
+        sample['gt_bbox'] = pad_bbox
+        if 'gt_class' in sample:
+            pad_class = np.zeros((num_max, ), dtype=np.int32)
+            if gt_num > 0:
+                pad_class[:gt_num] = sample['gt_class'][:gt_num, 0]
+            sample['gt_class'] = pad_class
+        if 'gt_score' in sample:
+            pad_score = np.zeros((num_max, ), dtype=np.float32)
+            if gt_num > 0:
+                pad_score[:gt_num] = sample['gt_score'][:gt_num, 0]
+            sample['gt_score'] = pad_score
+        # in training, for example in op ExpandImage,
+        # the bbox and gt_class are expanded but the difficult is not,
+        # so judge by its own length here
+        if 'difficult' in sample:
+            pad_diff = np.zeros((num_max, ), dtype=np.int32)
+            if gt_num > 0:
+                pad_diff[:gt_num] = sample['difficult'][:gt_num, 0]
+            sample['difficult'] = pad_diff
+        if 'is_crowd' in sample:
+            pad_crowd = np.zeros((num_max, ), dtype=np.int32)
+            if gt_num > 0:
+                pad_crowd[:gt_num] = sample['is_crowd'][:gt_num, 0]
+            sample['is_crowd'] = pad_crowd
+        if 'gt_ide' in sample:
+            pad_ide = np.zeros((num_max, ), dtype=np.int32)
+            if gt_num > 0:
+                pad_ide[:gt_num] = sample['gt_ide'][:gt_num, 0]
+            sample['gt_ide'] = pad_ide
+        return sample
+
+
+@register_op
+class DebugVisibleImage(BaseOperator):
+    """
+    In debug mode, visualize images according to `gt_bbox`.
+    (Currently only supported when the image is not cropped or flipped.)
+    """
+
+    def __init__(self, output_dir='output/debug', is_normalized=False):
+        super(DebugVisibleImage, self).__init__()
+        self.is_normalized = is_normalized
+        self.output_dir = output_dir
+        if not os.path.isdir(output_dir):
+            os.makedirs(output_dir)
+        if not isinstance(self.is_normalized, bool):
+            raise TypeError("{}: input type is invalid.".format(self))
+
+    def apply(self, sample, context=None):
+        image = Image.fromarray(sample['image'].astype(np.uint8))
+        out_file_name = '{:012d}.jpg'.format(sample['im_id'][0])
+        width = sample['w']
+        height = sample['h']
+        gt_bbox = sample['gt_bbox']
+        gt_class = sample['gt_class']
+        draw = ImageDraw.Draw(image)
+        for i in range(gt_bbox.shape[0]):
+            if self.is_normalized:
+                gt_bbox[i][0] = gt_bbox[i][0] * width
+                gt_bbox[i][1] = gt_bbox[i][1] * height
+                gt_bbox[i][2] = gt_bbox[i][2] * width
+                gt_bbox[i][3] = gt_bbox[i][3] * height
+
+            xmin, ymin, xmax, ymax = gt_bbox[i]
+            draw.line(
+                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+                 (xmin, ymin)],
+                width=2,
+                fill='green')
+            # draw label
+            text = str(gt_class[i][0])
+            tw, th = draw.textsize(text)
+            draw.rectangle(
+                [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill='green')
+            draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
+
+        if 'gt_keypoint' in sample.keys():
+            gt_keypoint = sample['gt_keypoint']
+            if self.is_normalized:
+                for i in range(gt_keypoint.shape[1]):
+                    if i % 2:
+                        gt_keypoint[:, i] = gt_keypoint[:, i] * height
+                    else:
+                        gt_keypoint[:, i] = gt_keypoint[:, i] * width
+            for i in range(gt_keypoint.shape[0]):
+                keypoint = gt_keypoint[i]
+                for j in range(int(keypoint.shape[0] / 2)):
+                    x1 = round(keypoint[2 * j]).astype(np.int32)
+                    y1 = round(keypoint[2 * j + 1]).astype(np.int32)
+                    draw.ellipse(
+                        (x1, y1, x1 + 5, y1 + 5), fill='green',
+                        outline='green')
+        save_path = os.path.join(self.output_dir, out_file_name)
+        image.save(save_path, quality=95)
+        return sample
+
+
+@register_op
+class Pad(BaseOperator):
+    def __init__(self,
+                 size=None,
+                 size_divisor=32,
+                 pad_mode=0,
+                 offsets=None,
+                 fill_value=(127.5, 127.5, 127.5)):
+        """
+        Pad image to a specified size or to a multiple of size_divisor.
+        Args:
+            size (int, Sequence): image target size; if None, pad to a multiple of size_divisor. Default None.
+            size_divisor (int): size divisor. Default 32.
+            pad_mode (int): pad mode, currently only supports four modes [-1, 0, 1, 2]:
+                if -1, use specified offsets; if 0, only pad to right and bottom;
+                if 1, pad according to center; if 2, only pad left and top.
+            offsets (list): [offset_x, offset_y], specify offsets while padding, only supported when pad_mode=-1.
+            fill_value (list|tuple): RGB value of the pad area. Default (127.5, 127.5, 127.5).
+        """
+        super(Pad, self).__init__()
+
+        if size is not None and not isinstance(size, (int, Sequence)):
+            raise TypeError(
+                "Type of size is invalid. Must be Integer or List or Tuple, "
+                "now is {}".format(type(size)))
+
+        if isinstance(size, int):
+            size = [size, size]
+
+        assert pad_mode in [
+            -1, 0, 1, 2
+        ], 'currently only supports four modes [-1, 0, 1, 2]'
+        if pad_mode == -1:
+            assert offsets, 'if pad_mode is -1, offsets should not be None'
+
+        self.size = size
+        self.size_divisor = size_divisor
+        self.pad_mode = pad_mode
+        self.fill_value = fill_value
+        self.offsets = offsets
+
+    def apply_segm(self, segms, offsets, im_size, size):
+        def _expand_poly(poly, x, y):
+            expanded_poly = np.array(poly)
+            expanded_poly[0::2] += x
+            expanded_poly[1::2] += y
+            return expanded_poly.tolist()
+
+        def _expand_rle(rle, x, y, height, width, h, w):
+            if 'counts' in rle and type(rle['counts']) == list:
+                rle = mask_util.frPyObjects(rle, height, width)
+            mask = mask_util.decode(rle)
+            expanded_mask = np.full((h, w), 0).astype(mask.dtype)
+            expanded_mask[y:y + height, x:x + width] = mask
+            rle = mask_util.encode(
+                np.array(
+                    expanded_mask, order='F', dtype=np.uint8))
+            return rle
+
+        x, y = offsets
+        height, width = im_size
+        h, w = size
+        expanded_segms = []
+        for segm in segms:
+            if is_poly(segm):
+                # Polygon format
+                expanded_segms.append(
+                    [_expand_poly(poly, x, y) for poly in segm])
+            else:
+                # RLE format
+                import pycocotools.mask as mask_util
+                expanded_segms.append(
+                    _expand_rle(segm, x, y, height, width, h, w))
+        return expanded_segms
+
+    def apply_bbox(self, bbox, offsets):
+        return bbox + np.array(offsets * 2, dtype=np.float32)
+
+    def apply_keypoint(self, keypoints, offsets):
+        n = len(keypoints[0]) // 2
+        return keypoints + np.array(offsets * n, dtype=np.float32)
+
+    def apply_image(self, image, offsets, im_size, size):
+        x, y = offsets
+        im_h, im_w = im_size
+        h, w = size
+        canvas = np.ones((h, w, 3), dtype=np.float32)
+        canvas *= np.array(self.fill_value, dtype=np.float32)
+        canvas[y:y + im_h, x:x + im_w, :] = image.astype(np.float32)
+        return canvas
+
+    def apply(self, sample, context=None):
+        im = sample['image']
+        im_h, im_w = im.shape[:2]
+        if self.size:
+            h, w = self.size
+            assert (
+                im_h <= h and im_w <= w
+            ), '(h, w) of target size should be greater than (im_h, im_w)'
+        else:
+            h = int(np.ceil(im_h / self.size_divisor) * self.size_divisor)
+            w = int(np.ceil(im_w / self.size_divisor) * self.size_divisor)
+
+        if h == im_h and w == im_w:
+            sample['image'] = im.astype(np.float32)
+            return sample
+
+        if self.pad_mode == -1:
+            offset_x, offset_y = self.offsets
+        elif self.pad_mode == 0:
+            offset_y, offset_x = 0, 0
+        elif self.pad_mode == 1:
+            offset_y, offset_x = (h - im_h) // 2, (w - im_w) // 2
+        else:
+            offset_y, offset_x = h - im_h, w - im_w
+
+        offsets, im_size, size = [offset_x, offset_y], [im_h, im_w], [h, w]
+
+        sample['image'] = self.apply_image(im, offsets, im_size, size)
+
+        if self.pad_mode == 0:
+            return sample
+        if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0:
+            sample['gt_bbox'] = self.apply_bbox(sample['gt_bbox'], offsets)
+
+        if 'gt_poly' in sample and len(sample['gt_poly']) > 0:
+            sample['gt_poly'] = self.apply_segm(sample['gt_poly'], offsets,
+                                                im_size, size)
+
+        if 'gt_keypoint' in sample and len(sample['gt_keypoint']) > 0:
+            sample['gt_keypoint'] = self.apply_keypoint(sample['gt_keypoint'],
+                                                        offsets)
+
+        return sample
+
+
+@register_op
+class Poly2Mask(BaseOperator):
+    """
+    Convert polygon (`gt_poly`) annotations to mask (`gt_segm`) annotations.
+    Args:
+        del_poly (bool): Whether to delete the polygons after generating masks. Default: False.
+    """
+
+    def __init__(self, del_poly=False):
+        super(Poly2Mask, self).__init__()
+        import pycocotools.mask as maskUtils
+        self.maskutils = maskUtils
+        self.del_poly = del_poly
+
+    def _poly2mask(self, mask_ann, img_h, img_w):
+        if isinstance(mask_ann, list):
+            # polygon -- a single object might consist of multiple parts
+            # we merge all parts into one mask rle code
+            rles = self.maskutils.frPyObjects(mask_ann, img_h, img_w)
+            rle = self.maskutils.merge(rles)
+        elif isinstance(mask_ann['counts'], list):
+            # uncompressed RLE
+            rle = self.maskutils.frPyObjects(mask_ann, img_h, img_w)
+        else:
+            # rle
+            rle = mask_ann
+        mask = self.maskutils.decode(rle)
+        return mask
+
+    def apply(self, sample, context=None):
+        assert 'gt_poly' in sample
+        im_h, im_w = sample['im_shape']
+        masks = [
+            self._poly2mask(gt_poly, im_h, im_w)
+            for gt_poly in sample['gt_poly']
+        ]
+        sample['gt_segm'] = np.asarray(masks).astype(np.uint8)
+        if self.del_poly:
+            del (sample['gt_poly'])
+
+        return sample
+
+
+@register_op
+class AugmentHSV(BaseOperator):
+    """
+    Augment the S and V channels of the image (and the H channel as well
+    when hgain/sgain/vgain are given).
+    Args:
+        fraction (float): the fraction for augmentation. Default: 0.5.
+        is_bgr (bool): whether the image is in BGR mode. Default: True.
+        hgain (float): H channel gain.
+        sgain (float): S channel gain.
+        vgain (float): V channel gain.
+    """
+
+    def __init__(self,
+                 fraction=0.50,
+                 is_bgr=True,
+                 hgain=None,
+                 sgain=None,
+                 vgain=None):
+        super(AugmentHSV, self).__init__()
+        self.fraction = fraction
+        self.is_bgr = is_bgr
+        self.hgain = hgain
+        self.sgain = sgain
+        self.vgain = vgain
+        self.use_hsvgain = False if hgain is None else True
+
+    def apply(self, sample, context=None):
+        img = sample['image']
+        if self.is_bgr:
+            img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
+        else:
+            img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
+
+        if self.use_hsvgain:
+            hsv_augs = np.random.uniform(
+                -1, 1, 3) * [self.hgain, self.sgain, self.vgain]
+            # random selection of h, s, v
+            hsv_augs *= np.random.randint(0, 2, 3)
+            img_hsv[..., 0] = (img_hsv[..., 0] + hsv_augs[0]) % 180
+            img_hsv[..., 1] = np.clip(img_hsv[..., 1] + hsv_augs[1], 0, 255)
+            img_hsv[..., 2] = np.clip(img_hsv[..., 2] + hsv_augs[2], 0, 255)
+
+        else:
+            S = img_hsv[:, :, 1].astype(np.float32)
+            V = img_hsv[:, :, 2].astype(np.float32)
+
+            a = (random.random() * 2 - 1) * self.fraction + 1
+            S *= a
+            if a > 1:
+                np.clip(S, a_min=0, a_max=255, out=S)
+
+            a = (random.random() * 2 - 1) * self.fraction + 1
+            V *= a
+            if a > 1:
+                np.clip(V, a_min=0, a_max=255, out=V)
+
+            img_hsv[:, :, 1] = S.astype(np.uint8)
+            img_hsv[:, :, 2] = V.astype(np.uint8)
+
+        if self.is_bgr:
+            cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)
+        else:
+            cv2.cvtColor(img_hsv, cv2.COLOR_HSV2RGB, dst=img)
+
+        sample['image'] = img.astype(np.float32)
+        return sample
+
+
+@register_op
+class Norm2PixelBbox(BaseOperator):
+    """
+    Transform the bounding box's coordinates, which are in [0, 1], to pixels.
+    """
+
+    def __init__(self):
+        super(Norm2PixelBbox, self).__init__()
+
+    def apply(self, sample, context=None):
+        assert 'gt_bbox' in sample
+        bbox = sample['gt_bbox']
+        height, width = sample['image'].shape[:2]
+        bbox[:, 0::2] = bbox[:, 0::2] * width
+        bbox[:, 1::2] = bbox[:, 1::2] * height
+        sample['gt_bbox'] = bbox
+        return sample
+
+
+@register_op
+class BboxCXCYWH2XYXY(BaseOperator):
+    """
+    Convert bbox CXCYWH format to XYXY format.
+    [center_x, center_y, width, height] -> [x0, y0, x1, y1]
+    """
+
+    def __init__(self):
+        super(BboxCXCYWH2XYXY, self).__init__()
+
+    def apply(self, sample, context=None):
+        assert 'gt_bbox' in sample
+        bbox0 = sample['gt_bbox']
+        bbox = bbox0.copy()
+
+        bbox[:, :2] = bbox0[:, :2] - bbox0[:, 2:4] / 2.
+        bbox[:, 2:4] = bbox0[:, :2] + bbox0[:, 2:4] / 2.
+        sample['gt_bbox'] = bbox
+        return sample
+
+
+@register_op
+class RandomResizeCrop(BaseOperator):
+    """Random resize and crop image and bboxes.
+    Args:
+        resizes (list): resize the image to one of the resizes. If keep_ratio
+            is True and mode is 'long', resize the image's long side to the
+            maximum of target_size; if keep_ratio is True and mode is 'short',
+            resize the image's short side to the minimum of target_size.
+        cropsizes (list): crop sizes after resize, [(min_crop_1, max_crop_1), ...]
+        mode (str): resize mode, `long` or `short`. See resizes for details.
+        prob (float): probability of this op.
+        keep_ratio (bool): whether to keep the aspect ratio, default True.
+        interp (int): the interpolation method.
+        thresholds (list): iou thresholds to decide a valid bbox crop.
+        num_attempts (int): number of tries before giving up.
+        allow_no_crop (bool): allow returning the sample without actually
+            cropping it.
+        cover_all_box (bool): ensure all bboxes are covered in the final crop.
+        is_mask_crop (bool): whether to crop the segmentation.
+        ioumode (str): overlap mode used to validate a crop, 'iou' or 'iof'.
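+
+    A usage sketch (the sizes below are illustrative assumptions):
+
+        op = RandomResizeCrop(resizes=[400, 500, 600],
+                              cropsizes=[(384, 600)], mode='short')
+        sample = op(sample)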
+ """ + + def __init__(self, + resizes, + cropsizes, + prob=0.5, + mode='short', + keep_ratio=True, + interp=cv2.INTER_LINEAR, + num_attempts=3, + cover_all_box=False, + allow_no_crop=False, + thresholds=[0.3, 0.5, 0.7], + is_mask_crop=False, + ioumode="iou"): + super(RandomResizeCrop, self).__init__() + + self.resizes = resizes + self.cropsizes = cropsizes + self.prob = prob + self.mode = mode + self.ioumode = ioumode + + self.resizer = Resize(0, keep_ratio=keep_ratio, interp=interp) + self.croper = RandomCrop( + num_attempts=num_attempts, + cover_all_box=cover_all_box, + thresholds=thresholds, + allow_no_crop=allow_no_crop, + is_mask_crop=is_mask_crop) + + def _format_size(self, size): + if isinstance(size, Integral): + size = (size, size) + return size + + def apply(self, sample, context=None): + if random.random() < self.prob: + _resize = self._format_size(random.choice(self.resizes)) + _cropsize = self._format_size(random.choice(self.cropsizes)) + sample = self._resize( + self.resizer, + sample, + size=_resize, + mode=self.mode, + context=context) + sample = self._random_crop( + self.croper, sample, size=_cropsize, context=context) + return sample + + @staticmethod + def _random_crop(croper, sample, size, context=None): + if 'gt_bbox' in sample and len(sample['gt_bbox']) == 0: + return sample + + self = croper + h, w = sample['image'].shape[:2] + gt_bbox = sample['gt_bbox'] + cropsize = size + min_crop = min(cropsize) + max_crop = max(cropsize) + + thresholds = list(self.thresholds) + np.random.shuffle(thresholds) + + for thresh in thresholds: + found = False + for _ in range(self.num_attempts): + + crop_h = random.randint(min_crop, min(h, max_crop)) + crop_w = random.randint(min_crop, min(w, max_crop)) + + crop_y = random.randint(0, h - crop_h) + crop_x = random.randint(0, w - crop_w) + + crop_box = [crop_x, crop_y, crop_x + crop_w, crop_y + crop_h] + if self.ioumode == "iof": + iou = self._gtcropiou_matrix( + gt_bbox, np.array( + [crop_box], dtype=np.float32)) + elif self.ioumode == "iou": + iou = self._iou_matrix( + gt_bbox, np.array( + [crop_box], dtype=np.float32)) + if iou.max() < thresh: + continue + + if self.cover_all_box and iou.min() < thresh: + continue + + cropped_box, valid_ids = self._crop_box_with_center_constraint( + gt_bbox, np.array( + crop_box, dtype=np.float32)) + if valid_ids.size > 0: + found = True + break + + if found: + if self.is_mask_crop and 'gt_poly' in sample and len(sample[ + 'gt_poly']) > 0: + crop_polys = self.crop_segms( + sample['gt_poly'], + valid_ids, + np.array( + crop_box, dtype=np.int64), + h, + w) + if [] in crop_polys: + delete_id = list() + valid_polys = list() + for id, crop_poly in enumerate(crop_polys): + if crop_poly == []: + delete_id.append(id) + else: + valid_polys.append(crop_poly) + valid_ids = np.delete(valid_ids, delete_id) + if len(valid_polys) == 0: + return sample + sample['gt_poly'] = valid_polys + else: + sample['gt_poly'] = crop_polys + + if 'gt_segm' in sample: + sample['gt_segm'] = self._crop_segm(sample['gt_segm'], + crop_box) + sample['gt_segm'] = np.take( + sample['gt_segm'], valid_ids, axis=0) + + sample['image'] = self._crop_image(sample['image'], crop_box) + sample['gt_bbox'] = np.take(cropped_box, valid_ids, axis=0) + sample['gt_class'] = np.take( + sample['gt_class'], valid_ids, axis=0) + if 'gt_score' in sample: + sample['gt_score'] = np.take( + sample['gt_score'], valid_ids, axis=0) + + if 'is_crowd' in sample: + sample['is_crowd'] = np.take( + sample['is_crowd'], valid_ids, axis=0) + + if 'gt_areas' in sample: + 
sample['gt_areas'] = np.take( + sample['gt_areas'], valid_ids, axis=0) + + if 'gt_joints' in sample: + gt_joints = self._crop_joints(sample['gt_joints'], crop_box) + sample['gt_joints'] = gt_joints[valid_ids] + return sample + + return sample + + @staticmethod + def _resize(resizer, sample, size, mode='short', context=None): + self = resizer + im = sample['image'] + target_size = size + + if not isinstance(im, np.ndarray): + raise TypeError("{}: image type is not numpy.".format(self)) + if len(im.shape) != 3: + raise ImageError('{}: image is not 3-dimensional.'.format(self)) + + # apply image + im_shape = im.shape + if self.keep_ratio: + + im_size_min = np.min(im_shape[0:2]) + im_size_max = np.max(im_shape[0:2]) + + target_size_min = np.min(target_size) + target_size_max = np.max(target_size) + + if mode == 'long': + im_scale = min(target_size_min / im_size_min, + target_size_max / im_size_max) + else: + im_scale = max(target_size_min / im_size_min, + target_size_max / im_size_max) + + resize_h = int(im_scale * float(im_shape[0]) + 0.5) + resize_w = int(im_scale * float(im_shape[1]) + 0.5) + + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = target_size + im_scale_y = resize_h / im_shape[0] + im_scale_x = resize_w / im_shape[1] + + im = self.apply_image(sample['image'], [im_scale_x, im_scale_y]) + sample['image'] = im + sample['im_shape'] = np.asarray([resize_h, resize_w], dtype=np.float32) + if 'scale_factor' in sample: + scale_factor = sample['scale_factor'] + sample['scale_factor'] = np.asarray( + [scale_factor[0] * im_scale_y, scale_factor[1] * im_scale_x], + dtype=np.float32) + else: + sample['scale_factor'] = np.asarray( + [im_scale_y, im_scale_x], dtype=np.float32) + + # apply bbox + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + sample['gt_bbox'] = self.apply_bbox(sample['gt_bbox'], + [im_scale_x, im_scale_y], + [resize_w, resize_h]) + + # apply polygon + if 'gt_poly' in sample and len(sample['gt_poly']) > 0: + sample['gt_poly'] = self.apply_segm(sample['gt_poly'], im_shape[:2], + [im_scale_x, im_scale_y]) + + # apply semantic + if 'semantic' in sample and sample['semantic']: + semantic = sample['semantic'] + semantic = cv2.resize( + semantic.astype('float32'), + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + semantic = np.asarray(semantic).astype('int32') + semantic = np.expand_dims(semantic, 0) + sample['semantic'] = semantic + + # apply gt_segm + if 'gt_segm' in sample and len(sample['gt_segm']) > 0: + masks = [ + cv2.resize( + gt_segm, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=cv2.INTER_NEAREST) + for gt_segm in sample['gt_segm'] + ] + sample['gt_segm'] = np.asarray(masks).astype(np.uint8) + + if 'gt_joints' in sample: + sample['gt_joints'] = self.apply_joints(sample['gt_joints'], + [im_scale_x, im_scale_y], + [resize_w, resize_h]) + + return sample + + +@register_op +class RandomSelect(BaseOperator): + """ + Randomly choose a transformation between transforms1 and transforms2, + and the probability of choosing transforms1 is p. 
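+
+    Typically configured from a reader YAML, e.g. (the inner transforms are
+    illustrative assumptions):
+
+        - RandomSelect:
+            transforms1:
+            - RandomShortSideResize: {short_side_sizes: [480, 512, 544]}
+            transforms2:
+            - RandomSizeCrop: {min_size: 384, max_size: 600}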
+
+    The code is based on https://github.com/facebookresearch/detr/blob/main/datasets/transforms.py
+
+    """
+
+    def __init__(self, transforms1, transforms2, p=0.5):
+        super(RandomSelect, self).__init__()
+        self.transforms1 = Compose(transforms1)
+        self.transforms2 = Compose(transforms2)
+        self.p = p
+
+    def apply(self, sample, context=None):
+        if random.random() < self.p:
+            return self.transforms1(sample)
+        return self.transforms2(sample)
+
+
+@register_op
+class RandomShortSideResize(BaseOperator):
+    def __init__(self,
+                 short_side_sizes,
+                 max_size=None,
+                 interp=cv2.INTER_LINEAR,
+                 random_interp=False):
+        """
+        Resize the image randomly according to the short side. If max_size is
+        not None, the long side is scaled according to max_size. The whole
+        process keeps the aspect ratio.
+        Args:
+            short_side_sizes (list|tuple): Image target short side sizes.
+            max_size (int): The size of the longest side of the image after resize.
+            interp (int): The interpolation method.
+            random_interp (bool): Whether to randomly select the interpolation method.
+        """
+        super(RandomShortSideResize, self).__init__()
+
+        assert isinstance(short_side_sizes,
+                          Sequence), "short_side_sizes must be List or Tuple"
+
+        self.short_side_sizes = short_side_sizes
+        self.max_size = max_size
+        self.interp = interp
+        self.random_interp = random_interp
+        self.interps = [
+            cv2.INTER_NEAREST,
+            cv2.INTER_LINEAR,
+            cv2.INTER_AREA,
+            cv2.INTER_CUBIC,
+            cv2.INTER_LANCZOS4,
+        ]
+
+    def get_size_with_aspect_ratio(self, image_shape, size, max_size=None):
+        h, w = image_shape
+        if max_size is not None:
+            min_original_size = float(min((w, h)))
+            max_original_size = float(max((w, h)))
+            if max_original_size / min_original_size * size > max_size:
+                size = int(
+                    round(max_size * min_original_size / max_original_size))
+
+        if (w <= h and w == size) or (h <= w and h == size):
+            return (w, h)
+
+        if w < h:
+            ow = size
+            oh = int(round(size * h / w))
+        else:
+            oh = size
+            ow = int(round(size * w / h))
+
+        return (ow, oh)
+
+    def resize(self,
+               sample,
+               target_size,
+               max_size=None,
+               interp=cv2.INTER_LINEAR):
+        im = sample['image']
+        if not isinstance(im, np.ndarray):
+            raise TypeError("{}: image type is not numpy.".format(self))
+        if len(im.shape) != 3:
+            raise ImageError('{}: image is not 3-dimensional.'.format(self))
+
+        target_size = self.get_size_with_aspect_ratio(im.shape[:2], target_size,
+                                                      max_size)
+        im_scale_y, im_scale_x = target_size[1] / im.shape[0], target_size[
+            0] / im.shape[1]
+
+        sample['image'] = cv2.resize(im, target_size, interpolation=interp)
+        sample['im_shape'] = np.asarray(target_size[::-1], dtype=np.float32)
+        if 'scale_factor' in sample:
+            scale_factor = sample['scale_factor']
+            sample['scale_factor'] = np.asarray(
+                [scale_factor[0] * im_scale_y, scale_factor[1] * im_scale_x],
+                dtype=np.float32)
+        else:
+            sample['scale_factor'] = np.asarray(
+                [im_scale_y, im_scale_x], dtype=np.float32)
+
+        # apply bbox
+        if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0:
+            sample['gt_bbox'] = self.apply_bbox(
+                sample['gt_bbox'], [im_scale_x, im_scale_y], target_size)
+        # apply polygon
+        if 'gt_poly' in sample and len(sample['gt_poly']) > 0:
+            sample['gt_poly'] = self.apply_segm(sample['gt_poly'], im.shape[:2],
+                                                [im_scale_x, im_scale_y])
+        # apply semantic
+        if 'semantic' in sample and sample['semantic']:
+            semantic = sample['semantic']
+            semantic = cv2.resize(
+                semantic.astype('float32'),
+                target_size,
+                interpolation=self.interp)
+            semantic = np.asarray(semantic).astype('int32')
+            semantic = np.expand_dims(semantic, 0)
+            sample['semantic'] = semantic
+        # apply gt_segm
+        if 'gt_segm' in sample and len(sample['gt_segm']) > 0:
+            masks = [
+                cv2.resize(
+                    gt_segm, target_size, interpolation=cv2.INTER_NEAREST)
+                for gt_segm in sample['gt_segm']
+            ]
+            sample['gt_segm'] = np.asarray(masks).astype(np.uint8)
+
+        if 'gt_joints' in sample:
+            sample['gt_joints'] = self.apply_joints(
+                sample['gt_joints'], [im_scale_x, im_scale_y], target_size)
+
+        # apply areas
+        if 'gt_areas' in sample:
+            sample['gt_areas'] = self.apply_area(sample['gt_areas'],
+                                                 [im_scale_x, im_scale_y])
+
+        return sample
+
+    def apply_bbox(self, bbox, scale, size):
+        im_scale_x, im_scale_y = scale
+        resize_w, resize_h = size
+        bbox[:, 0::2] *= im_scale_x
+        bbox[:, 1::2] *= im_scale_y
+        bbox[:, 0::2] = np.clip(bbox[:, 0::2], 0, resize_w)
+        bbox[:, 1::2] = np.clip(bbox[:, 1::2], 0, resize_h)
+        return bbox.astype('float32')
+
+    def apply_joints(self, joints, scale, size):
+        im_scale_x, im_scale_y = scale
+        resize_w, resize_h = size
+        joints[..., 0] *= im_scale_x
+        joints[..., 1] *= im_scale_y
+        joints[..., 0] = np.clip(joints[..., 0], 0, resize_w)
+        joints[..., 1] = np.clip(joints[..., 1], 0, resize_h)
+        return joints
+
+    def apply_area(self, area, scale):
+        im_scale_x, im_scale_y = scale
+        return area * im_scale_x * im_scale_y
+
+    def apply_segm(self, segms, im_size, scale):
+        def _resize_poly(poly, im_scale_x, im_scale_y):
+            resized_poly = np.array(poly).astype('float32')
+            resized_poly[0::2] *= im_scale_x
+            resized_poly[1::2] *= im_scale_y
+            return resized_poly.tolist()
+
+        def _resize_rle(rle, im_h, im_w, im_scale_x, im_scale_y):
+            if 'counts' in rle and type(rle['counts']) == list:
+                rle = mask_util.frPyObjects(rle, im_h, im_w)
+
+            mask = mask_util.decode(rle)
+            mask = cv2.resize(
+                mask,
+                None,
+                None,
+                fx=im_scale_x,
+                fy=im_scale_y,
+                interpolation=self.interp)
+            rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
+            return rle
+
+        im_h, im_w = im_size
+        im_scale_x, im_scale_y = scale
+        resized_segms = []
+        for segm in segms:
+            if is_poly(segm):
+                # Polygon format
+                resized_segms.append([
+                    _resize_poly(poly, im_scale_x, im_scale_y) for poly in segm
+                ])
+            else:
+                # RLE format
+                import pycocotools.mask as mask_util
+                resized_segms.append(
+                    _resize_rle(segm, im_h, im_w, im_scale_x, im_scale_y))
+
+        return resized_segms
+
+    def apply(self, sample, context=None):
+        target_size = random.choice(self.short_side_sizes)
+        interp = random.choice(
+            self.interps) if self.random_interp else self.interp
+
+        return self.resize(sample, target_size, self.max_size, interp)
+
+
+@register_op
+class RandomShortSideRangeResize(RandomShortSideResize):
+    def __init__(self, scales, interp=cv2.INTER_LINEAR, random_interp=False):
+        """
+        Resize the image randomly according to the short side, with the long
+        side scaled accordingly. The whole process keeps the aspect ratio.
+        Args:
+            scales (list[list|tuple]): candidate [long side, short side]
+                scales; the target long and short edges are sampled from the
+                ranges these candidates span.
+            interp (int): The interpolation method.
+            random_interp (bool): Whether to randomly select the interpolation method.
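+
+        A usage sketch (the scales below are illustrative; each entry is a
+        [long side, short side] pair):
+
+            op = RandomShortSideRangeResize(scales=[[1333, 640], [1333, 800]])
+            sample = op(sample)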
+ """ + super(RandomShortSideRangeResize, self).__init__(scales, None, interp, + random_interp) + + assert isinstance(scales, + Sequence), "short_side_sizes must be List or Tuple" + + self.scales = scales + + def random_sample(self, img_scales): + img_scale_long = [max(s) for s in img_scales] + img_scale_short = [min(s) for s in img_scales] + long_edge = np.random.randint( + min(img_scale_long), max(img_scale_long) + 1) + short_edge = np.random.randint( + min(img_scale_short), max(img_scale_short) + 1) + img_scale = (long_edge, short_edge) + return img_scale + + def apply(self, sample, context=None): + long_edge, short_edge = self.random_sample(self.short_side_sizes) + # print("target size:{}".format((long_edge, short_edge))) + interp = random.choice( + self.interps) if self.random_interp else self.interp + + return self.resize(sample, short_edge, long_edge, interp) + + +@register_op +class RandomSizeCrop(BaseOperator): + """ + Cut the image randomly according to `min_size` and `max_size` + Args: + min_size (int): Min size for edges of cropped image. + max_size (int): Max size for edges of cropped image. If it + is set to larger than length of the input image, + the output will keep the origin length. + keep_empty (bool): Whether to keep the cropped result with no object. + If it is set to False, the no-object result will not + be returned, replaced by the original input. + """ + + def __init__(self, min_size, max_size, keep_empty=True): + super(RandomSizeCrop, self).__init__() + self.min_size = min_size + self.max_size = max_size + self.keep_empty = keep_empty + + from paddle.vision.transforms.functional import crop as paddle_crop + self.paddle_crop = paddle_crop + + @staticmethod + def get_crop_params(img_shape, output_size): + """Get parameters for ``crop`` for a random crop. + Args: + img_shape (list|tuple): Image's height and width. + output_size (list|tuple): Expected output size of the crop. + Returns: + tuple: params (i, j, h, w) to be passed to ``crop`` for random crop. + """ + h, w = img_shape + th, tw = output_size + + if h + 1 < th or w + 1 < tw: + raise ValueError( + "Required crop size {} is larger then input image size {}". + format((th, tw), (h, w))) + + if w == tw and h == th: + return 0, 0, h, w + + i = random.randint(0, h - th + 1) + j = random.randint(0, w - tw + 1) + return i, j, th, tw + + def crop(self, sample, region): + keep_index = None + # apply bbox and check whether the cropped result is valid + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + croped_bbox = self.apply_bbox(sample['gt_bbox'], region) + bbox = croped_bbox.reshape([-1, 2, 2]) + area = (bbox[:, 1, :] - bbox[:, 0, :]).prod(axis=1) + keep_index = np.where(area > 0)[0] + + if not self.keep_empty and len(keep_index) == 0: + # When keep_empty is set to False, cropped with no-object will + # not be used and return the origin content. 
+ return sample + + sample['gt_bbox'] = croped_bbox[keep_index] if len( + keep_index) > 0 else np.zeros( + [0, 4], dtype=np.float32) + sample['gt_class'] = sample['gt_class'][keep_index] if len( + keep_index) > 0 else np.zeros( + [0, 1], dtype=np.float32) + if 'gt_score' in sample: + sample['gt_score'] = sample['gt_score'][keep_index] if len( + keep_index) > 0 else np.zeros( + [0, 1], dtype=np.float32) + if 'is_crowd' in sample: + sample['is_crowd'] = sample['is_crowd'][keep_index] if len( + keep_index) > 0 else np.zeros( + [0, 1], dtype=np.float32) + if 'gt_areas' in sample: + sample['gt_areas'] = np.take( + sample['gt_areas'], keep_index, axis=0) + + image_shape = sample['image'].shape[:2] + sample['image'] = self.paddle_crop(sample['image'], *region) + sample['im_shape'] = np.array( + sample['image'].shape[:2], dtype=np.float32) + + # apply polygon + if 'gt_poly' in sample and len(sample['gt_poly']) > 0: + sample['gt_poly'] = self.apply_segm(sample['gt_poly'], region, + image_shape) + sample['gt_poly'] = np.array(sample['gt_poly']) + if keep_index is not None and len(keep_index) > 0: + sample['gt_poly'] = sample['gt_poly'][keep_index] + sample['gt_poly'] = sample['gt_poly'].tolist() + # apply gt_segm + if 'gt_segm' in sample and len(sample['gt_segm']) > 0: + i, j, h, w = region + sample['gt_segm'] = sample['gt_segm'][:, i:i + h, j:j + w] + if keep_index is not None and len(keep_index) > 0: + sample['gt_segm'] = sample['gt_segm'][keep_index] + + if 'gt_joints' in sample: + gt_joints = self._crop_joints(sample['gt_joints'], region) + sample['gt_joints'] = gt_joints + if keep_index is not None: + sample['gt_joints'] = sample['gt_joints'][keep_index] + + return sample + + def apply_bbox(self, bbox, region): + i, j, h, w = region + region_size = np.asarray([w, h]) + crop_bbox = bbox - np.asarray([j, i, j, i]) + crop_bbox = np.minimum(crop_bbox.reshape([-1, 2, 2]), region_size) + crop_bbox = crop_bbox.clip(min=0) + return crop_bbox.reshape([-1, 4]).astype('float32') + + def _crop_joints(self, joints, region): + y1, x1, h, w = region + x2 = x1 + w + y2 = y1 + h + # x1, y1, x2, y2 = crop + joints[..., 0] -= x1 + joints[..., 1] -= y1 + joints[joints[..., 0] > w, :] = 0 + joints[joints[..., 1] > h, :] = 0 + joints[joints[..., 0] < 0, :] = 0 + joints[joints[..., 1] < 0, :] = 0 + return joints + + def apply_segm(self, segms, region, image_shape): + def _crop_poly(segm, crop): + xmin, ymin, xmax, ymax = crop + crop_coord = [xmin, ymin, xmin, ymax, xmax, ymax, xmax, ymin] + crop_p = np.array(crop_coord).reshape(4, 2) + crop_p = Polygon(crop_p) + + crop_segm = list() + for poly in segm: + poly = np.array(poly).reshape(len(poly) // 2, 2) + polygon = Polygon(poly) + if not polygon.is_valid: + exterior = polygon.exterior + multi_lines = exterior.intersection(exterior) + polygons = shapely.ops.polygonize(multi_lines) + polygon = MultiPolygon(polygons) + multi_polygon = list() + if isinstance(polygon, MultiPolygon): + multi_polygon = copy.deepcopy(polygon) + else: + multi_polygon.append(copy.deepcopy(polygon)) + for per_polygon in multi_polygon: + inter = per_polygon.intersection(crop_p) + if not inter: + continue + if isinstance(inter, (MultiPolygon, GeometryCollection)): + for part in inter: + if not isinstance(part, Polygon): + continue + part = np.squeeze( + np.array(part.exterior.coords[:-1]).reshape(1, + -1)) + part[0::2] -= xmin + part[1::2] -= ymin + crop_segm.append(part.tolist()) + elif isinstance(inter, Polygon): + crop_poly = np.squeeze( + np.array(inter.exterior.coords[:-1]).reshape(1, -1)) + 
crop_poly[0::2] -= xmin + crop_poly[1::2] -= ymin + crop_segm.append(crop_poly.tolist()) + else: + continue + return crop_segm + + def _crop_rle(rle, crop, height, width): + if 'counts' in rle and type(rle['counts']) == list: + rle = mask_util.frPyObjects(rle, height, width) + mask = mask_util.decode(rle) + mask = mask[crop[1]:crop[3], crop[0]:crop[2]] + rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8)) + return rle + + i, j, h, w = region + crop = [j, i, j + w, i + h] + height, width = image_shape + crop_segms = [] + for segm in segms: + if is_poly(segm): + import copy + import shapely.ops + from shapely.geometry import Polygon, MultiPolygon, GeometryCollection + # Polygon format + crop_segms.append(_crop_poly(segm, crop)) + else: + # RLE format + import pycocotools.mask as mask_util + crop_segms.append(_crop_rle(segm, crop, height, width)) + return crop_segms + + def apply(self, sample, context=None): + h = random.randint(self.min_size, + min(sample['image'].shape[0], self.max_size)) + w = random.randint(self.min_size, + min(sample['image'].shape[1], self.max_size)) + + region = self.get_crop_params(sample['image'].shape[:2], [h, w]) + return self.crop(sample, region) + + +@register_op +class WarpAffine(BaseOperator): + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + input_w=512, + scale=0.4, + shift=0.1, + down_ratio=4): + """WarpAffine + Warp affine the image + The code is based on https://github.com/xingyizhou/CenterNet/blob/master/src/lib/datasets/sample/ctdet.py + """ + super(WarpAffine, self).__init__() + self.keep_res = keep_res + self.pad = pad + self.input_h = input_h + self.input_w = input_w + self.scale = scale + self.shift = shift + self.down_ratio = down_ratio + + def apply(self, sample, context=None): + img = sample['image'] + img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) + + h, w = img.shape[:2] + + if self.keep_res: + # True in detection eval/infer + input_h = (h | self.pad) + 1 + input_w = (w | self.pad) + 1 + s = np.array([input_w, input_h], dtype=np.float32) + c = np.array([w // 2, h // 2], dtype=np.float32) + else: + # False in centertrack eval_mot/eval_mot + s = max(h, w) * 1.0 + input_h, input_w = self.input_h, self.input_w + c = np.array([w / 2., h / 2.], dtype=np.float32) + + trans_input = get_affine_transform(c, s, 0, [input_w, input_h]) + img = cv2.resize(img, (w, h)) + inp = cv2.warpAffine( + img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR) + sample['image'] = inp + + if not self.keep_res: + out_h = input_h // self.down_ratio + out_w = input_w // self.down_ratio + trans_output = get_affine_transform(c, s, 0, [out_w, out_h]) + + sample.update({ + 'center': c, + 'scale': s, + 'out_height': out_h, + 'out_width': out_w, + 'inp_height': input_h, + 'inp_width': input_w, + 'trans_input': trans_input, + 'trans_output': trans_output, + }) + return sample + + +@register_op +class FlipWarpAffine(BaseOperator): + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + input_w=512, + not_rand_crop=False, + scale=0.4, + shift=0.1, + flip=0.5, + is_scale=True, + use_random=True, + add_pre_img=False): + """FlipWarpAffine + 1. Random Crop + 2. Flip the image horizontal + 3. Warp affine the image + 4. 
(Optional) Add previous image
+        """
+        super(FlipWarpAffine, self).__init__()
+        self.keep_res = keep_res
+        self.pad = pad
+        self.input_h = input_h
+        self.input_w = input_w
+        self.not_rand_crop = not_rand_crop
+        self.scale = scale
+        self.shift = shift
+        self.flip = flip
+        self.is_scale = is_scale
+        self.use_random = use_random
+        self.add_pre_img = add_pre_img
+
+    def __call__(self, samples, context=None):
+        if self.add_pre_img:
+            assert isinstance(samples, Sequence) and len(samples) == 2
+            sample, pre_sample = samples[0], samples[1]
+        else:
+            sample = samples
+
+        img = sample['image']
+        img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
+        if 'gt_bbox' in sample and len(sample['gt_bbox']) == 0:
+            return sample
+
+        h, w = img.shape[:2]
+        flipped = 0
+
+        if self.keep_res:
+            input_h = (h | self.pad) + 1
+            input_w = (w | self.pad) + 1
+            s = np.array([input_w, input_h], dtype=np.float32)
+            c = np.array([w // 2, h // 2], dtype=np.float32)
+        else:
+            # centernet training default
+            s = max(h, w) * 1.0
+            input_h, input_w = self.input_h, self.input_w
+            c = np.array([w / 2., h / 2.], dtype=np.float32)
+
+        if self.use_random:
+            gt_bbox = sample['gt_bbox']
+            if not self.not_rand_crop:
+                # centernet default
+                s = s * np.random.choice(np.arange(0.6, 1.4, 0.1))
+                w_border = get_border(128, w)
+                h_border = get_border(128, h)
+                c[0] = np.random.randint(low=w_border, high=w - w_border)
+                c[1] = np.random.randint(low=h_border, high=h - h_border)
+            else:
+                sf = self.scale
+                cf = self.shift
+                c[0] += s * np.clip(np.random.randn() * cf, -2 * cf, 2 * cf)
+                c[1] += s * np.clip(np.random.randn() * cf, -2 * cf, 2 * cf)
+                s = s * np.clip(np.random.randn() * sf + 1, 1 - sf, 1 + sf)
+
+            if np.random.random() < self.flip:
+                img = img[:, ::-1, :]
+                c[0] = w - c[0] - 1
+                oldx1 = gt_bbox[:, 0].copy()
+                oldx2 = gt_bbox[:, 2].copy()
+                gt_bbox[:, 0] = w - oldx2 - 1
+                gt_bbox[:, 2] = w - oldx1 - 1
+                flipped = 1
+            sample['gt_bbox'] = gt_bbox
+
+        trans_input = get_affine_transform(c, s, 0, [input_w, input_h])
+        inp = cv2.warpAffine(
+            img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR)
+        if self.is_scale:
+            inp = (inp.astype(np.float32) / 255.)
+
+        sample['image'] = inp
+        sample['center'] = c
+        sample['scale'] = s
+
+        if self.add_pre_img:
+            sample['trans_input'] = trans_input
+
+            # previous image, use same aug trans_input as current image
+            pre_img = pre_sample['image']
+            pre_img = cv2.cvtColor(pre_img, cv2.COLOR_RGB2BGR)
+            if flipped:
+                pre_img = pre_img[:, ::-1, :].copy()
+            pre_inp = cv2.warpAffine(
+                pre_img,
+                trans_input, (input_w, input_h),
+                flags=cv2.INTER_LINEAR)
+            if self.is_scale:
+                pre_inp = (pre_inp.astype(np.float32) / 255.)
+            sample['pre_image'] = pre_inp
+
+            # if empty gt_bbox
+            if 'gt_bbox' in pre_sample and len(pre_sample['gt_bbox']) == 0:
+                return sample
+            pre_gt_bbox = pre_sample['gt_bbox']
+            if flipped:
+                pre_oldx1 = pre_gt_bbox[:, 0].copy()
+                pre_oldx2 = pre_gt_bbox[:, 2].copy()
+                # mirror x like gt_bbox above: new x1 comes from the old x2
+                # and new x2 from the old x1, keeping x1 <= x2
+                pre_gt_bbox[:, 0] = w - pre_oldx2 - 1
+                pre_gt_bbox[:, 2] = w - pre_oldx1 - 1
+            sample['pre_gt_bbox'] = pre_gt_bbox
+
+            sample['pre_gt_class'] = pre_sample['gt_class']
+            sample['pre_gt_track_id'] = pre_sample['gt_track_id']
+            del pre_sample
+
+        return sample
+
+
+@register_op
+class CenterRandColor(BaseOperator):
+    """Random color for CenterNet series models.
+    Args:
+        saturation (float): saturation settings.
+        contrast (float): contrast settings.
+        brightness (float): brightness settings.
+ """ + + def __init__(self, saturation=0.4, contrast=0.4, brightness=0.4): + super(CenterRandColor, self).__init__() + self.saturation = saturation + self.contrast = contrast + self.brightness = brightness + + def apply_saturation(self, img, img_gray): + alpha = 1. + np.random.uniform( + low=-self.saturation, high=self.saturation) + self._blend(alpha, img, img_gray[:, :, None]) + return img + + def apply_contrast(self, img, img_gray): + alpha = 1. + np.random.uniform(low=-self.contrast, high=self.contrast) + img_mean = img_gray.mean() + self._blend(alpha, img, img_mean) + return img + + def apply_brightness(self, img, img_gray): + alpha = 1 + np.random.uniform( + low=-self.brightness, high=self.brightness) + img *= alpha + return img + + def _blend(self, alpha, img, img_mean): + img *= alpha + img_mean *= (1 - alpha) + img += img_mean + + def apply(self, sample, context=None): + functions = [ + self.apply_brightness, + self.apply_contrast, + self.apply_saturation, + ] + + img = sample['image'] + img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) + distortions = np.random.permutation(functions) + for func in distortions: + img = func(img, img_gray) + sample['image'] = img + + if 'pre_image' in sample: + pre_img = sample['pre_image'] + pre_img_gray = cv2.cvtColor(pre_img, cv2.COLOR_BGR2GRAY) + pre_distortions = np.random.permutation(functions) + for func in pre_distortions: + pre_img = func(pre_img, pre_img_gray) + sample['pre_image'] = pre_img + + return sample + + +@register_op +class Mosaic(BaseOperator): + """ Mosaic operator for image and gt_bboxes + The code is based on https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/datasets/mosaicdetection.py + + 1. get mosaic coords + 2. clip bbox and get mosaic_labels + 3. random_affine augment + 4. Mixup augment as copypaste (optinal), not used in tiny/nano + + Args: + prob (float): probability of using Mosaic, 1.0 as default + input_dim (list[int]): input shape + degrees (list[2]): the rotate range to apply, transform range is [min, max] + translate (list[2]): the translate range to apply, transform range is [min, max] + scale (list[2]): the scale range to apply, transform range is [min, max] + shear (list[2]): the shear range to apply, transform range is [min, max] + enable_mixup (bool): whether to enable Mixup or not + mixup_prob (float): probability of using Mixup, 1.0 as default + mixup_scale (list[int]): scale range of Mixup + remove_outside_box (bool): whether remove outside boxes, False as + default in COCO dataset, True in MOT dataset + """ + + def __init__(self, + prob=1.0, + input_dim=[640, 640], + degrees=[-10, 10], + translate=[-0.1, 0.1], + scale=[0.1, 2], + shear=[-2, 2], + enable_mixup=True, + mixup_prob=1.0, + mixup_scale=[0.5, 1.5], + remove_outside_box=False): + super(Mosaic, self).__init__() + self.prob = prob + if isinstance(input_dim, Integral): + input_dim = [input_dim, input_dim] + self.input_dim = input_dim + self.degrees = degrees + self.translate = translate + self.scale = scale + self.shear = shear + self.enable_mixup = enable_mixup + self.mixup_prob = mixup_prob + self.mixup_scale = mixup_scale + self.remove_outside_box = remove_outside_box + + def get_mosaic_coords(self, mosaic_idx, xc, yc, w, h, input_h, input_w): + # (x1, y1, x2, y2) means coords in large image, + # small_coords means coords in small image in mosaic aug. 
+ if mosaic_idx == 0: + # top left + x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc + small_coords = w - (x2 - x1), h - (y2 - y1), w, h + elif mosaic_idx == 1: + # top right + x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc + small_coords = 0, h - (y2 - y1), min(w, x2 - x1), h + elif mosaic_idx == 2: + # bottom left + x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h) + small_coords = w - (x2 - x1), 0, w, min(y2 - y1, h) + elif mosaic_idx == 3: + # bottom right + x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, + yc + h) + small_coords = 0, 0, min(w, x2 - x1), min(y2 - y1, h) + + return (x1, y1, x2, y2), small_coords + + def random_affine_augment(self, + img, + labels=[], + input_dim=[640, 640], + degrees=[-10, 10], + scales=[0.1, 2], + shears=[-2, 2], + translates=[-0.1, 0.1]): + # random rotation and scale + degree = random.uniform(degrees[0], degrees[1]) + scale = random.uniform(scales[0], scales[1]) + assert scale > 0, "Argument scale should be positive." + R = cv2.getRotationMatrix2D(angle=degree, center=(0, 0), scale=scale) + M = np.ones([2, 3]) + + # random shear + shear = random.uniform(shears[0], shears[1]) + shear_x = math.tan(shear * math.pi / 180) + shear_y = math.tan(shear * math.pi / 180) + M[0] = R[0] + shear_y * R[1] + M[1] = R[1] + shear_x * R[0] + + # random translation + translate = random.uniform(translates[0], translates[1]) + translation_x = translate * input_dim[0] + translation_y = translate * input_dim[1] + M[0, 2] = translation_x + M[1, 2] = translation_y + + # warpAffine + img = cv2.warpAffine( + img, M, dsize=tuple(input_dim), borderValue=(114, 114, 114)) + + num_gts = len(labels) + if num_gts > 0: + # warp corner points + corner_points = np.ones((4 * num_gts, 3)) + corner_points[:, :2] = labels[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + 4 * num_gts, 2) # x1y1, x2y2, x1y2, x2y1 + # apply affine transform + corner_points = corner_points @M.T + corner_points = corner_points.reshape(num_gts, 8) + + # create new boxes + corner_xs = corner_points[:, 0::2] + corner_ys = corner_points[:, 1::2] + new_bboxes = np.concatenate((corner_xs.min(1), corner_ys.min(1), + corner_xs.max(1), corner_ys.max(1))) + new_bboxes = new_bboxes.reshape(4, num_gts).T + + # clip boxes + new_bboxes[:, 0::2] = np.clip(new_bboxes[:, 0::2], 0, input_dim[0]) + new_bboxes[:, 1::2] = np.clip(new_bboxes[:, 1::2], 0, input_dim[1]) + labels[:, :4] = new_bboxes + + return img, labels + + def __call__(self, sample, context=None): + if not isinstance(sample, Sequence): + return sample + + assert len( + sample) == 5, "Mosaic needs 5 samples, 4 for mosaic and 1 for mixup." + if np.random.uniform(0., 1.) > self.prob: + return sample[0] + + mosaic_gt_bbox, mosaic_gt_class, mosaic_is_crowd, mosaic_difficult = [], [], [], [] + input_h, input_w = self.input_dim + yc = int(random.uniform(0.5 * input_h, 1.5 * input_h)) + xc = int(random.uniform(0.5 * input_w, 1.5 * input_w)) + mosaic_img = np.full((input_h * 2, input_w * 2, 3), 114, dtype=np.uint8) + + # 1. get mosaic coords + for mosaic_idx, sp in enumerate(sample[:4]): + img = sp['image'] + gt_bbox = sp['gt_bbox'] + h0, w0 = img.shape[:2] + scale = min(1. * input_h / h0, 1. * input_w / w0) + img = cv2.resize( + img, (int(w0 * scale), int(h0 * scale)), + interpolation=cv2.INTER_LINEAR) + (h, w, c) = img.shape[:3] + + # suffix l means large image, while s means small image in mosaic aug. 
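+            # paste the resized sample into its quadrant of the canvas, then
+            # shift its boxes by the paste offset (padw, padh) computed below
+            # so they stay aligned with the mosaic canvas.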
+ (l_x1, l_y1, l_x2, l_y2), ( + s_x1, s_y1, s_x2, s_y2) = self.get_mosaic_coords( + mosaic_idx, xc, yc, w, h, input_h, input_w) + + mosaic_img[l_y1:l_y2, l_x1:l_x2] = img[s_y1:s_y2, s_x1:s_x2] + padw, padh = l_x1 - s_x1, l_y1 - s_y1 + + # Normalized xywh to pixel xyxy format + _gt_bbox = gt_bbox.copy() + if len(gt_bbox) > 0: + _gt_bbox[:, 0] = scale * gt_bbox[:, 0] + padw + _gt_bbox[:, 1] = scale * gt_bbox[:, 1] + padh + _gt_bbox[:, 2] = scale * gt_bbox[:, 2] + padw + _gt_bbox[:, 3] = scale * gt_bbox[:, 3] + padh + + mosaic_gt_bbox.append(_gt_bbox) + mosaic_gt_class.append(sp['gt_class']) + if 'is_crowd' in sp: + mosaic_is_crowd.append(sp['is_crowd']) + if 'difficult' in sp: + mosaic_difficult.append(sp['difficult']) + + # 2. clip bbox and get mosaic_labels([gt_bbox, gt_class, is_crowd]) + if len(mosaic_gt_bbox): + mosaic_gt_bbox = np.concatenate(mosaic_gt_bbox, 0) + mosaic_gt_class = np.concatenate(mosaic_gt_class, 0) + if mosaic_is_crowd: + mosaic_is_crowd = np.concatenate(mosaic_is_crowd, 0) + mosaic_labels = np.concatenate([ + mosaic_gt_bbox, + mosaic_gt_class.astype(mosaic_gt_bbox.dtype), + mosaic_is_crowd.astype(mosaic_gt_bbox.dtype) + ], 1) + elif mosaic_difficult: + mosaic_difficult = np.concatenate(mosaic_difficult, 0) + mosaic_labels = np.concatenate([ + mosaic_gt_bbox, + mosaic_gt_class.astype(mosaic_gt_bbox.dtype), + mosaic_difficult.astype(mosaic_gt_bbox.dtype) + ], 1) + else: + mosaic_labels = np.concatenate([ + mosaic_gt_bbox, mosaic_gt_class.astype(mosaic_gt_bbox.dtype) + ], 1) + if self.remove_outside_box: + # for MOT dataset + flag1 = mosaic_gt_bbox[:, 0] < 2 * input_w + flag2 = mosaic_gt_bbox[:, 2] > 0 + flag3 = mosaic_gt_bbox[:, 1] < 2 * input_h + flag4 = mosaic_gt_bbox[:, 3] > 0 + flag_all = flag1 * flag2 * flag3 * flag4 + mosaic_labels = mosaic_labels[flag_all] + else: + mosaic_labels[:, 0] = np.clip(mosaic_labels[:, 0], 0, + 2 * input_w) + mosaic_labels[:, 1] = np.clip(mosaic_labels[:, 1], 0, + 2 * input_h) + mosaic_labels[:, 2] = np.clip(mosaic_labels[:, 2], 0, + 2 * input_w) + mosaic_labels[:, 3] = np.clip(mosaic_labels[:, 3], 0, + 2 * input_h) + else: + mosaic_labels = np.zeros((1, 6)) + + # 3. random_affine augment + mosaic_img, mosaic_labels = self.random_affine_augment( + mosaic_img, + mosaic_labels, + input_dim=self.input_dim, + degrees=self.degrees, + translates=self.translate, + scales=self.scale, + shears=self.shear) + + # 4. 
Mixup augment as copypaste, https://arxiv.org/abs/2012.07177 + # optinal, not used(enable_mixup=False) in tiny/nano + if (self.enable_mixup and not len(mosaic_labels) == 0 and + random.random() < self.mixup_prob): + sample_mixup = sample[4] + mixup_img = sample_mixup['image'] + if 'is_crowd' in sample_mixup: + cp_labels = np.concatenate([ + sample_mixup['gt_bbox'], + sample_mixup['gt_class'].astype(mosaic_labels.dtype), + sample_mixup['is_crowd'].astype(mosaic_labels.dtype) + ], 1) + elif 'difficult' in sample_mixup: + cp_labels = np.concatenate([ + sample_mixup['gt_bbox'], + sample_mixup['gt_class'].astype(mosaic_labels.dtype), + sample_mixup['difficult'].astype(mosaic_labels.dtype) + ], 1) + else: + cp_labels = np.concatenate([ + sample_mixup['gt_bbox'], + sample_mixup['gt_class'].astype(mosaic_labels.dtype) + ], 1) + mosaic_img, mosaic_labels = self.mixup_augment( + mosaic_img, mosaic_labels, self.input_dim, cp_labels, mixup_img) + + sample0 = sample[0] + sample0['image'] = mosaic_img.astype(np.uint8) # can not be float32 + sample0['h'] = float(mosaic_img.shape[0]) + sample0['w'] = float(mosaic_img.shape[1]) + sample0['im_shape'][0] = sample0['h'] + sample0['im_shape'][1] = sample0['w'] + sample0['gt_bbox'] = mosaic_labels[:, :4].astype(np.float32) + sample0['gt_class'] = mosaic_labels[:, 4:5].astype(np.float32) + if 'is_crowd' in sample[0]: + sample0['is_crowd'] = mosaic_labels[:, 5:6].astype(np.float32) + if 'difficult' in sample[0]: + sample0['difficult'] = mosaic_labels[:, 5:6].astype(np.float32) + return sample0 + + def mixup_augment(self, origin_img, origin_labels, input_dim, cp_labels, + img): + jit_factor = random.uniform(*self.mixup_scale) + FLIP = random.uniform(0, 1) > 0.5 + if len(img.shape) == 3: + cp_img = np.ones( + (input_dim[0], input_dim[1], 3), dtype=np.uint8) * 114 + else: + cp_img = np.ones(input_dim, dtype=np.uint8) * 114 + + cp_scale_ratio = min(input_dim[0] / img.shape[0], + input_dim[1] / img.shape[1]) + resized_img = cv2.resize( + img, (int(img.shape[1] * cp_scale_ratio), + int(img.shape[0] * cp_scale_ratio)), + interpolation=cv2.INTER_LINEAR) + + cp_img[:int(img.shape[0] * cp_scale_ratio), :int(img.shape[ + 1] * cp_scale_ratio)] = resized_img + + cp_img = cv2.resize(cp_img, (int(cp_img.shape[1] * jit_factor), + int(cp_img.shape[0] * jit_factor))) + cp_scale_ratio *= jit_factor + + if FLIP: + cp_img = cp_img[:, ::-1, :] + + origin_h, origin_w = cp_img.shape[:2] + target_h, target_w = origin_img.shape[:2] + padded_img = np.zeros( + (max(origin_h, target_h), max(origin_w, target_w), 3), + dtype=np.uint8) + padded_img[:origin_h, :origin_w] = cp_img + + x_offset, y_offset = 0, 0 + if padded_img.shape[0] > target_h: + y_offset = random.randint(0, padded_img.shape[0] - target_h - 1) + if padded_img.shape[1] > target_w: + x_offset = random.randint(0, padded_img.shape[1] - target_w - 1) + padded_cropped_img = padded_img[y_offset:y_offset + target_h, x_offset: + x_offset + target_w] + + # adjust boxes + cp_bboxes_origin_np = cp_labels[:, :4].copy() + cp_bboxes_origin_np[:, 0::2] = np.clip(cp_bboxes_origin_np[:, 0::2] * + cp_scale_ratio, 0, origin_w) + cp_bboxes_origin_np[:, 1::2] = np.clip(cp_bboxes_origin_np[:, 1::2] * + cp_scale_ratio, 0, origin_h) + + if FLIP: + cp_bboxes_origin_np[:, 0::2] = ( + origin_w - cp_bboxes_origin_np[:, 0::2][:, ::-1]) + cp_bboxes_transformed_np = cp_bboxes_origin_np.copy() + if self.remove_outside_box: + # for MOT dataset + cp_bboxes_transformed_np[:, 0::2] -= x_offset + cp_bboxes_transformed_np[:, 1::2] -= y_offset + else: + 
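# default (e.g. COCO): clamp the shifted boxes to the target
+            # image instead of dropping the ones that fall outside it
+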
cp_bboxes_transformed_np[:, 0::2] = np.clip( + cp_bboxes_transformed_np[:, 0::2] - x_offset, 0, target_w) + cp_bboxes_transformed_np[:, 1::2] = np.clip( + cp_bboxes_transformed_np[:, 1::2] - y_offset, 0, target_h) + + cls_labels = cp_labels[:, 4:5].copy() + box_labels = cp_bboxes_transformed_np + if cp_labels.shape[-1] == 6: + crd_labels = cp_labels[:, 5:6].copy() + labels = np.hstack((box_labels, cls_labels, crd_labels)) + else: + labels = np.hstack((box_labels, cls_labels)) + if self.remove_outside_box: + labels = labels[labels[:, 0] < target_w] + labels = labels[labels[:, 2] > 0] + labels = labels[labels[:, 1] < target_h] + labels = labels[labels[:, 3] > 0] + + origin_labels = np.vstack((origin_labels, labels)) + origin_img = origin_img.astype(np.float32) + origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype( + np.float32) + + return origin_img.astype(np.uint8), origin_labels + + +@register_op +class PadResize(BaseOperator): + """ PadResize for image and gt_bbbox + + Args: + target_size (list[int]): input shape + fill_value (float): pixel value of padded image + """ + + def __init__(self, target_size, fill_value=114): + super(PadResize, self).__init__() + if isinstance(target_size, Integral): + target_size = [target_size, target_size] + self.target_size = target_size + self.fill_value = fill_value + + def _resize(self, img, bboxes, labels): + ratio = min(self.target_size[0] / img.shape[0], + self.target_size[1] / img.shape[1]) + w, h = int(img.shape[1] * ratio), int(img.shape[0] * ratio) + resized_img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR) + + if len(bboxes) > 0: + bboxes *= ratio + mask = np.minimum(bboxes[:, 2] - bboxes[:, 0], + bboxes[:, 3] - bboxes[:, 1]) > 1 + bboxes = bboxes[mask] + labels = labels[mask] + return resized_img, bboxes, labels + + def _pad(self, img): + h, w, _ = img.shape + if h == self.target_size[0] and w == self.target_size[1]: + return img + padded_img = np.full( + (self.target_size[0], self.target_size[1], 3), + self.fill_value, + dtype=np.uint8) + padded_img[:h, :w] = img + return padded_img + + def apply(self, sample, context=None): + image = sample['image'] + bboxes = sample['gt_bbox'] + labels = sample['gt_class'] + image, bboxes, labels = self._resize(image, bboxes, labels) + sample['image'] = self._pad(image).astype(np.float32) + sample['gt_bbox'] = bboxes + sample['gt_class'] = labels + return sample + + +@register_op +class RandomShift(BaseOperator): + """ + Randomly shift image + + Args: + prob (float): probability to do random shift. 
+ max_shift (int): max shift pixels + filter_thr (int): filter gt bboxes if one side is smaller than this + """ + + def __init__(self, prob=0.5, max_shift=32, filter_thr=1): + super(RandomShift, self).__init__() + self.prob = prob + self.max_shift = max_shift + self.filter_thr = filter_thr + + def calc_shift_coor(self, im_h, im_w, shift_h, shift_w): + return [ + max(0, shift_w), max(0, shift_h), min(im_w, im_w + shift_w), + min(im_h, im_h + shift_h) + ] + + def apply(self, sample, context=None): + if random.random() > self.prob: + return sample + + im = sample['image'] + gt_bbox = sample['gt_bbox'] + gt_class = sample['gt_class'] + im_h, im_w = im.shape[:2] + shift_h = random.randint(-self.max_shift, self.max_shift) + shift_w = random.randint(-self.max_shift, self.max_shift) + + gt_bbox[:, 0::2] += shift_w + gt_bbox[:, 1::2] += shift_h + gt_bbox[:, 0::2] = np.clip(gt_bbox[:, 0::2], 0, im_w) + gt_bbox[:, 1::2] = np.clip(gt_bbox[:, 1::2], 0, im_h) + gt_bbox_h = gt_bbox[:, 2] - gt_bbox[:, 0] + gt_bbox_w = gt_bbox[:, 3] - gt_bbox[:, 1] + keep = (gt_bbox_w > self.filter_thr) & (gt_bbox_h > self.filter_thr) + if not keep.any(): + return sample + + gt_bbox = gt_bbox[keep] + gt_class = gt_class[keep] + + # shift image + coor_new = self.calc_shift_coor(im_h, im_w, shift_h, shift_w) + # shift frame to the opposite direction + coor_old = self.calc_shift_coor(im_h, im_w, -shift_h, -shift_w) + canvas = np.zeros_like(im) + canvas[coor_new[1]:coor_new[3], coor_new[0]:coor_new[2]] \ + = im[coor_old[1]:coor_old[3], coor_old[0]:coor_old[2]] + + sample['image'] = canvas + sample['gt_bbox'] = gt_bbox + sample['gt_class'] = gt_class + return sample + + +@register_op +class StrongAugImage(BaseOperator): + def __init__(self, transforms): + super(StrongAugImage, self).__init__() + self.transforms = Compose(transforms) + + def apply(self, sample, context=None): + im = sample + im['image'] = sample['image'].astype('uint8') + results = self.transforms(im) + sample['image'] = results['image'].astype('uint8') + return sample + + +@register_op +class RandomColorJitter(BaseOperator): + def __init__(self, + prob=0.8, + brightness=0.4, + contrast=0.4, + saturation=0.4, + hue=0.1): + super(RandomColorJitter, self).__init__() + self.prob = prob + self.brightness = brightness + self.contrast = contrast + self.saturation = saturation + self.hue = hue + + def apply(self, sample, context=None): + if np.random.uniform(0, 1) < self.prob: + from paddle.vision.transforms import ColorJitter + transform = ColorJitter(self.brightness, self.contrast, + self.saturation, self.hue) + sample['image'] = transform(sample['image'].astype(np.uint8)) + sample['image'] = sample['image'].astype(np.float32) + return sample + + +@register_op +class RandomGrayscale(BaseOperator): + def __init__(self, prob=0.2): + super(RandomGrayscale, self).__init__() + self.prob = prob + + def apply(self, sample, context=None): + if np.random.uniform(0, 1) < self.prob: + from paddle.vision.transforms import Grayscale + transform = Grayscale(num_output_channels=3) + sample['image'] = transform(sample['image']) + return sample + + +@register_op +class RandomGaussianBlur(BaseOperator): + def __init__(self, prob=0.5, sigma=[0.1, 2.0]): + super(RandomGaussianBlur, self).__init__() + self.prob = prob + self.sigma = sigma + + def apply(self, sample, context=None): + if np.random.uniform(0, 1) < self.prob: + sigma = np.random.uniform(self.sigma[0], self.sigma[1]) + im = cv2.GaussianBlur(sample['image'], (23, 23), sigma) + sample['image'] = im + return sample + + 
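+# Illustrative usage sketch, not part of the pipeline: the color/blur ops
+# above are typically chained through StrongAugImage using reader-style op
+# dicts; the op names and fields below are assumptions based on the defaults
+# defined in this file.
+#
+#   strong_aug = StrongAugImage(transforms=[
+#       {'RandomColorJitter': {'prob': 0.8, 'brightness': 0.4}},
+#       {'RandomGrayscale': {'prob': 0.2}},
+#       {'RandomGaussianBlur': {'prob': 0.5, 'sigma': [0.1, 2.0]}},
+#   ])
+#   sample = strong_aug(sample)  # sample['image']: uint8 HWC ndarray
+
+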
+@register_op +class RandomErasing(BaseOperator): + def __init__(self, + prob=0.5, + scale=(0.02, 0.33), + ratio=(0.3, 3.3), + value=0, + inplace=False): + super(RandomErasing, self).__init__() + assert isinstance(scale, + (tuple, list)), "scale should be a tuple or list" + assert (scale[0] >= 0 and scale[1] <= 1 and scale[0] <= scale[1] + ), "scale should be of kind (min, max) and in range [0, 1]" + assert isinstance(ratio, + (tuple, list)), "ratio should be a tuple or list" + assert (ratio[0] >= 0 and + ratio[0] <= ratio[1]), "ratio should be of kind (min, max)" + assert isinstance( + value, (Number, str, tuple, + list)), "value should be a number, tuple, list or str" + if isinstance(value, str) and value != "random": + raise ValueError("value must be 'random' when type is str") + self.prob = prob + self.scale = scale + self.ratio = ratio + self.value = value + self.inplace = inplace + + def _erase(self, img, i, j, h, w, v, inplace=False): + if not inplace: + img = img.copy() + img[i:i + h, j:j + w, ...] = v + return img + + def _get_param(self, img, scale, ratio, value): + shape = np.asarray(img).astype(np.uint8).shape + h, w, c = shape[-3], shape[-2], shape[-1] + img_area = h * w + log_ratio = np.log(ratio) + for _ in range(1): + erase_area = np.random.uniform(*scale) * img_area + aspect_ratio = np.exp(np.random.uniform(*log_ratio)) + erase_h = int(round(np.sqrt(erase_area * aspect_ratio))) + erase_w = int(round(np.sqrt(erase_area / aspect_ratio))) + if erase_h >= h or erase_w >= w: + continue + + if value is None: + v = np.random.normal(size=[erase_h, erase_w, c]) * 255 + else: + v = np.array(value)[None, None, :] + top = np.random.randint(0, h - erase_h + 1) + left = np.random.randint(0, w - erase_w + 1) + return top, left, erase_h, erase_w, v + return 0, 0, h, w, img + + def apply(self, sample, context=None): + if random.random() < self.prob: + if isinstance(self.value, Number): + value = [self.value] + elif isinstance(self.value, str): + value = None + else: + value = self.value + if value is not None and not (len(value) == 1 or len(value) == 3): + raise ValueError( + "Value should be a single number or a sequence with length equals to image's channel." + ) + im = sample['image'] + top, left, erase_h, erase_w, v = self._get_param(im, self.scale, + self.ratio, value) + im = self._erase(im, top, left, erase_h, erase_w, v, self.inplace) + sample['image'] = im + return sample + + +@register_op +class RandomErasingCrop(BaseOperator): + def __init__(self): + super(RandomErasingCrop, self).__init__() + self.transform1 = RandomErasing( + prob=0.7, scale=(0.05, 0.2), ratio=(0.3, 3.3), value="random") + self.transform2 = RandomErasing( + prob=0.5, scale=(0.05, 0.2), ratio=(0.1, 6), value="random") + self.transform3 = RandomErasing( + prob=0.3, scale=(0.05, 0.2), ratio=(0.05, 8), value="random") + + def apply(self, sample, context=None): + sample = self.transform1(sample) + sample = self.transform2(sample) + sample = self.transform3(sample) + return sample diff --git a/PaddleDetection-release-2.6/ppdet/data/transform/rotated_operators.py b/PaddleDetection-release-2.6/ppdet/data/transform/rotated_operators.py new file mode 100644 index 0000000000000000000000000000000000000000..99dfb572e6c730d651b5169c74cdecaffce8fe7b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/transform/rotated_operators.py @@ -0,0 +1,479 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import print_function +from __future__ import division + +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +from numbers import Number, Integral + +import cv2 +import numpy as np +import math +import copy + +from .operators import register_op, BaseOperator +from ppdet.modeling.rbox_utils import poly2rbox_le135_np, poly2rbox_oc_np, rbox2poly_np +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +@register_op +class RRotate(BaseOperator): + """ Rotate Image, Polygon, Box + + Args: + scale (float): rotate scale + angle (float): rotate angle + fill_value (int, tuple): fill color + auto_bound (bool): whether auto bound or not + """ + + def __init__(self, scale=1.0, angle=0., fill_value=0., auto_bound=True): + super(RRotate, self).__init__() + self.scale = scale + self.angle = angle + self.fill_value = fill_value + self.auto_bound = auto_bound + + def get_rotated_matrix(self, angle, scale, h, w): + center = ((w - 1) * 0.5, (h - 1) * 0.5) + matrix = cv2.getRotationMatrix2D(center, -angle, scale) + # calculate the new size + cos = np.abs(matrix[0, 0]) + sin = np.abs(matrix[0, 1]) + new_w = h * sin + w * cos + new_h = h * cos + w * sin + # calculate offset + n_w = int(np.round(new_w)) + n_h = int(np.round(new_h)) + if self.auto_bound: + ratio = min(w / n_w, h / n_h) + matrix = cv2.getRotationMatrix2D(center, -angle, ratio) + else: + matrix[0, 2] += (new_w - w) * 0.5 + matrix[1, 2] += (new_h - h) * 0.5 + w = n_w + h = n_h + return matrix, h, w + + def get_rect_from_pts(self, pts, h, w): + """ get minimum rectangle of points + """ + assert pts.shape[-1] % 2 == 0, 'the dim of input [pts] is not correct' + min_x, min_y = np.min(pts[:, 0::2], axis=1), np.min(pts[:, 1::2], + axis=1) + max_x, max_y = np.max(pts[:, 0::2], axis=1), np.max(pts[:, 1::2], + axis=1) + min_x, min_y = np.clip(min_x, 0, w), np.clip(min_y, 0, h) + max_x, max_y = np.clip(max_x, 0, w), np.clip(max_y, 0, h) + boxes = np.stack([min_x, min_y, max_x, max_y], axis=-1) + return boxes + + def apply_image(self, image, matrix, h, w): + return cv2.warpAffine( + image, matrix, (w, h), borderValue=self.fill_value) + + def apply_pts(self, pts, matrix, h, w): + assert pts.shape[-1] % 2 == 0, 'the dim of input [pts] is not correct' + # n is number of samples and m is two times the number of points due to (x, y) + _, m = pts.shape + # transpose points + pts_ = pts.reshape(-1, 2).T + # pad 1 to convert the points to homogeneous coordinates + padding = np.ones((1, pts_.shape[1]), pts.dtype) + rotated_pts = np.matmul(matrix, np.concatenate((pts_, padding), axis=0)) + return rotated_pts[:2, :].T.reshape(-1, m) + + def apply(self, sample, context=None): + image = sample['image'] + h, w = image.shape[:2] + matrix, h, w = self.get_rotated_matrix(self.angle, self.scale, h, w) + sample['image'] = self.apply_image(image, matrix, h, w) + polys = sample['gt_poly'] + # TODO: 
segment or keypoint to be processed + if len(polys) > 0: + pts = self.apply_pts(polys, matrix, h, w) + sample['gt_poly'] = pts + sample['gt_bbox'] = self.get_rect_from_pts(pts, h, w) + + return sample + + +@register_op +class RandomRRotate(BaseOperator): + """ Random Rotate Image + Args: + scale (float, tuple, list): rotate scale + scale_mode (str): mode of scale, [range, value, None] + angle (float, tuple, list): rotate angle + angle_mode (str): mode of angle, [range, value, None] + fill_value (float, tuple, list): fill value + rotate_prob (float): probability of rotation + auto_bound (bool): whether auto bound or not + """ + + def __init__(self, + scale=1.0, + scale_mode=None, + angle=0., + angle_mode=None, + fill_value=0., + rotate_prob=1.0, + auto_bound=True): + super(RandomRRotate, self).__init__() + self.scale = scale + self.scale_mode = scale_mode + self.angle = angle + self.angle_mode = angle_mode + self.fill_value = fill_value + self.rotate_prob = rotate_prob + self.auto_bound = auto_bound + + def get_angle(self, angle, angle_mode): + assert not angle_mode or angle_mode in [ + 'range', 'value' + ], 'angle mode should be in [range, value, None]' + if not angle_mode: + return angle + elif angle_mode == 'range': + low, high = angle + return np.random.rand() * (high - low) + low + elif angle_mode == 'value': + return np.random.choice(angle) + + def get_scale(self, scale, scale_mode): + assert not scale_mode or scale_mode in [ + 'range', 'value' + ], 'scale mode should be in [range, value, None]' + if not scale_mode: + return scale + elif scale_mode == 'range': + low, high = scale + return np.random.rand() * (high - low) + low + elif scale_mode == 'value': + return np.random.choice(scale) + + def apply(self, sample, context=None): + if np.random.rand() > self.rotate_prob: + return sample + + angle = self.get_angle(self.angle, self.angle_mode) + scale = self.get_scale(self.scale, self.scale_mode) + rotator = RRotate(scale, angle, self.fill_value, self.auto_bound) + return rotator(sample) + + +@register_op +class Poly2RBox(BaseOperator): + """ Polygon to Rotated Box, using new OpenCV definition since 4.5.1 + + Args: + filter_threshold (int, float): threshold to filter annotations + filter_mode (str): filter mode, ['area', 'edge'] + rbox_type (str): rbox type, ['le135', 'oc'] + + """ + + def __init__(self, filter_threshold=4, filter_mode=None, rbox_type='le135'): + super(Poly2RBox, self).__init__() + self.filter_fn = lambda size: self.filter(size, filter_threshold, filter_mode) + self.rbox_fn = poly2rbox_le135_np if rbox_type == 'le135' else poly2rbox_oc_np + + def filter(self, size, threshold, mode): + if mode == 'area': + if size[0] * size[1] < threshold: + return True + elif mode == 'edge': + if min(size) < threshold: + return True + return False + + def get_rbox(self, polys): + valid_ids, rboxes, bboxes = [], [], [] + for i, poly in enumerate(polys): + cx, cy, w, h, angle = self.rbox_fn(poly) + if self.filter_fn((w, h)): + continue + rboxes.append(np.array([cx, cy, w, h, angle], dtype=np.float32)) + valid_ids.append(i) + xmin, ymin = min(poly[0::2]), min(poly[1::2]) + xmax, ymax = max(poly[0::2]), max(poly[1::2]) + bboxes.append(np.array([xmin, ymin, xmax, ymax], dtype=np.float32)) + + if len(valid_ids) == 0: + rboxes = np.zeros((0, 5), dtype=np.float32) + bboxes = np.zeros((0, 4), dtype=np.float32) + else: + rboxes = np.stack(rboxes) + bboxes = np.stack(bboxes) + + return rboxes, bboxes, valid_ids + + def apply(self, sample, context=None): + rboxes, bboxes, valid_ids = 
self.get_rbox(sample['gt_poly']) + sample['gt_rbox'] = rboxes + sample['gt_bbox'] = bboxes + for k in ['gt_class', 'gt_score', 'gt_poly', 'is_crowd', 'difficult']: + if k in sample: + sample[k] = sample[k][valid_ids] + + return sample + + +@register_op +class Poly2Array(BaseOperator): + """ convert gt_poly to np.array for rotated bboxes + """ + + def __init__(self): + super(Poly2Array, self).__init__() + + def apply(self, sample, context=None): + if 'gt_poly' in sample: + sample['gt_poly'] = np.array( + sample['gt_poly'], dtype=np.float32).reshape((-1, 8)) + + return sample + + +@register_op +class RResize(BaseOperator): + def __init__(self, target_size, keep_ratio, interp=cv2.INTER_LINEAR): + """ + Resize image to target size. if keep_ratio is True, + resize the image's long side to the maximum of target_size + if keep_ratio is False, resize the image to target size(h, w) + Args: + target_size (int|list): image target size + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): the interpolation method + """ + super(RResize, self).__init__() + self.keep_ratio = keep_ratio + self.interp = interp + if not isinstance(target_size, (Integral, Sequence)): + raise TypeError( + "Type of target_size is invalid. Must be Integer or List or Tuple, now is {}". + format(type(target_size))) + if isinstance(target_size, Integral): + target_size = [target_size, target_size] + self.target_size = target_size + + def apply_image(self, image, scale): + im_scale_x, im_scale_y = scale + + return cv2.resize( + image, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + + def apply_pts(self, pts, scale, size): + im_scale_x, im_scale_y = scale + resize_w, resize_h = size + pts[:, 0::2] *= im_scale_x + pts[:, 1::2] *= im_scale_y + pts[:, 0::2] = np.clip(pts[:, 0::2], 0, resize_w) + pts[:, 1::2] = np.clip(pts[:, 1::2], 0, resize_h) + return pts + + def apply(self, sample, context=None): + """ Resize the image numpy. 
+ """ + im = sample['image'] + if not isinstance(im, np.ndarray): + raise TypeError("{}: image type is not numpy.".format(self)) + if len(im.shape) != 3: + raise ImageError('{}: image is not 3-dimensional.'.format(self)) + + # apply image + im_shape = im.shape + if self.keep_ratio: + + im_size_min = np.min(im_shape[0:2]) + im_size_max = np.max(im_shape[0:2]) + + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + + im_scale = min(target_size_min / im_size_min, + target_size_max / im_size_max) + + resize_h = im_scale * float(im_shape[0]) + resize_w = im_scale * float(im_shape[1]) + + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / im_shape[0] + im_scale_x = resize_w / im_shape[1] + + im = self.apply_image(sample['image'], [im_scale_x, im_scale_y]) + sample['image'] = im.astype(np.float32) + sample['im_shape'] = np.asarray([resize_h, resize_w], dtype=np.float32) + if 'scale_factor' in sample: + scale_factor = sample['scale_factor'] + sample['scale_factor'] = np.asarray( + [scale_factor[0] * im_scale_y, scale_factor[1] * im_scale_x], + dtype=np.float32) + else: + sample['scale_factor'] = np.asarray( + [im_scale_y, im_scale_x], dtype=np.float32) + + # apply bbox + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + sample['gt_bbox'] = self.apply_pts(sample['gt_bbox'], + [im_scale_x, im_scale_y], + [resize_w, resize_h]) + + # apply polygon + if 'gt_poly' in sample and len(sample['gt_poly']) > 0: + sample['gt_poly'] = self.apply_pts(sample['gt_poly'], + [im_scale_x, im_scale_y], + [resize_w, resize_h]) + + return sample + + +@register_op +class RandomRFlip(BaseOperator): + def __init__(self, prob=0.5): + """ + Args: + prob (float): the probability of flipping image + """ + super(RandomRFlip, self).__init__() + self.prob = prob + if not (isinstance(self.prob, float)): + raise TypeError("{}: input type is invalid.".format(self)) + + def apply_image(self, image): + return image[:, ::-1, :] + + def apply_pts(self, pts, width): + oldx = pts[:, 0::2].copy() + pts[:, 0::2] = width - oldx - 1 + return pts + + def apply(self, sample, context=None): + """Filp the image and bounding box. + Operators: + 1. Flip the image numpy. + 2. Transform the bboxes' x coordinates. + (Must judge whether the coordinates are normalized!) + 3. Transform the segmentations' x coordinates. + (Must judge whether the coordinates are normalized!) + Output: + sample: the image, bounding box and segmentation part + in sample are flipped. + """ + if np.random.uniform(0, 1) < self.prob: + im = sample['image'] + height, width = im.shape[:2] + im = self.apply_image(im) + if 'gt_bbox' in sample and len(sample['gt_bbox']) > 0: + sample['gt_bbox'] = self.apply_pts(sample['gt_bbox'], width) + if 'gt_poly' in sample and len(sample['gt_poly']) > 0: + sample['gt_poly'] = self.apply_pts(sample['gt_poly'], width) + + sample['flipped'] = True + sample['image'] = im + return sample + + +@register_op +class VisibleRBox(BaseOperator): + """ + In debug mode, visualize images according to `gt_box`. + (Currently only supported when not cropping and flipping image.) 
+ """ + + def __init__(self, output_dir='debug'): + super(VisibleRBox, self).__init__() + self.output_dir = output_dir + if not os.path.isdir(output_dir): + os.makedirs(output_dir) + + def apply(self, sample, context=None): + image = Image.fromarray(sample['image'].astype(np.uint8)) + out_file_name = '{:012d}.jpg'.format(sample['im_id'][0]) + width = sample['w'] + height = sample['h'] + # gt_poly = sample['gt_rbox'] + gt_poly = sample['gt_poly'] + gt_class = sample['gt_class'] + draw = ImageDraw.Draw(image) + for i in range(gt_poly.shape[0]): + x1, y1, x2, y2, x3, y3, x4, y4 = gt_poly[i] + draw.line( + [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)], + width=2, + fill='green') + # draw label + xmin = min(x1, x2, x3, x4) + ymin = min(y1, y2, y3, y4) + text = str(gt_class[i][0]) + tw, th = draw.textsize(text) + draw.rectangle( + [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill='green') + draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255)) + + if 'gt_keypoint' in sample.keys(): + gt_keypoint = sample['gt_keypoint'] + if self.is_normalized: + for i in range(gt_keypoint.shape[1]): + if i % 2: + gt_keypoint[:, i] = gt_keypoint[:, i] * height + else: + gt_keypoint[:, i] = gt_keypoint[:, i] * width + for i in range(gt_keypoint.shape[0]): + keypoint = gt_keypoint[i] + for j in range(int(keypoint.shape[0] / 2)): + x1 = round(keypoint[2 * j]).astype(np.int32) + y1 = round(keypoint[2 * j + 1]).astype(np.int32) + draw.ellipse( + (x1, y1, x1 + 5, y1 + 5), fill='green', outline='green') + save_path = os.path.join(self.output_dir, out_file_name) + image.save(save_path, quality=95) + return sample + + +@register_op +class Rbox2Poly(BaseOperator): + """ + Convert rbbox format to poly format. + """ + + def __init__(self): + super(Rbox2Poly, self).__init__() + + def apply(self, sample, context=None): + assert 'gt_rbox' in sample + assert sample['gt_rbox'].shape[1] == 5 + rboxes = sample['gt_rbox'] + polys = rbox2poly_np(rboxes) + sample['gt_poly'] = polys + xmin, ymin = polys[:, 0::2].min(1), polys[:, 1::2].min(1) + xmax, ymax = polys[:, 0::2].max(1), polys[:, 1::2].max(1) + sample['gt_bbox'] = np.stack([xmin, ymin, xmin, ymin], axis=1) + return sample diff --git a/PaddleDetection-release-2.6/ppdet/data/utils.py b/PaddleDetection-release-2.6/ppdet/data/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..02573e61484bc5ef07353dbef124c8afa54ccc64 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/data/utils.py @@ -0,0 +1,72 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import paddle
+import numbers
+import numpy as np
+
+try:
+    from collections.abc import Sequence, Mapping
+except:
+    from collections import Sequence, Mapping
+
+
+def default_collate_fn(batch):
+    """
+    Default batch collating function for :code:`paddle.io.DataLoader`. It
+    takes a list of sample data as input, where each element in the list is
+    the data of one sample, and a sample may be composed of list, dictionary,
+    string, number or numpy array. This function parses the input data
+    recursively and stacks the number and numpy array fields into batch
+    data. e.g. for the following input data:
+    [{'image': np.array(shape=[3, 224, 224]), 'label': 1},
+     {'image': np.array(shape=[3, 224, 224]), 'label': 3},
+     {'image': np.array(shape=[3, 224, 224]), 'label': 4},
+     {'image': np.array(shape=[3, 224, 224]), 'label': 5},]
+
+    this default collate function zips the number and numpy array
+    fields together and stacks each field into a batch field as follows:
+    {'image': np.array(shape=[4, 3, 224, 224]), 'label': np.array([1, 3, 4, 5])}
+    Args:
+        batch (list of sample data): batch should be a list of sample data.
+
+    Returns:
+        Batched data: each number and numpy array in the input data batched
+            together.
+    """
+    sample = batch[0]
+    if isinstance(sample, np.ndarray):
+        batch = np.stack(batch, axis=0)
+        return batch
+    elif isinstance(sample, numbers.Number):
+        batch = np.array(batch)
+        return batch
+    elif isinstance(sample, (str, bytes)):
+        return batch
+    elif isinstance(sample, Mapping):
+        return {
+            key: default_collate_fn([d[key] for d in batch])
+            for key in sample
+        }
+    elif isinstance(sample, Sequence):
+        sample_fields_num = len(sample)
+        if not all(len(sample) == sample_fields_num for sample in iter(batch)):
+            raise RuntimeError(
+                "fields number not the same among samples in a batch")
+        return [default_collate_fn(fields) for fields in zip(*batch)]
+
+    raise TypeError("batch data can only contain: tensor, numpy.ndarray, "
+                    "dict, list, number, but got {}".format(type(sample)))
diff --git a/PaddleDetection-release-2.6/ppdet/engine/__init__.py b/PaddleDetection-release-2.6/ppdet/engine/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..91166e8764f521cb3dd78ba86c5681c4b531413c
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/engine/__init__.py
@@ -0,0 +1,37 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import trainer
+from .trainer import *
+
+from . import trainer_cot
+from .trainer_cot import *
+
+from . import callbacks
+from .callbacks import *
+
+from . import env
+from .env import *
+
+__all__ = trainer.__all__ \
+    + callbacks.__all__ \
+    + env.__all__
+
+from . import tracker
+from .tracker import *
+__all__ = __all__ + tracker.__all__
+
+from .
import trainer_ssod +from .trainer_ssod import * +__all__ = __all__ + trainer_ssod.__all__ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..786b0ffc2abe351854ec47536c27d6cce7f69141 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/callbacks.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/callbacks.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8f661c7f4b87597370fbdca847ba2bb5be945c88 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/callbacks.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/env.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/env.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d1c0840cfc75e070bc3db8c3fcab9b0b38b3762e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/env.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/export_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/export_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..49c6e43d224da81de6e923ac0ee2cf2266933e71 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/export_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e7b456f8034dc6b00f5a116b1cbefc5ec8abc44a Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/tracker.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6653167985aa49f8f2fe7b578dd9f4b1bf8e9745 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer_cot.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer_cot.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8d3a4f7d12d56f3f16c5b160c5cc4ef0f659cf50 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer_cot.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer_ssod.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer_ssod.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6d67f1d67908998717ac09b1cdc6c76940cef50c Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/engine/__pycache__/trainer_ssod.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/engine/callbacks.py b/PaddleDetection-release-2.6/ppdet/engine/callbacks.py new file mode 100644 index 0000000000000000000000000000000000000000..1f2d546d86e9473c5dd6b7fd15068c940006dab5 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/engine/callbacks.py @@ 
-0,0 +1,557 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +import datetime +import six +import copy +import json + +import paddle +import paddle.distributed as dist + +from ppdet.utils.checkpoint import save_model +from ppdet.metrics import get_infer_results + +from ppdet.utils.logger import setup_logger +logger = setup_logger('ppdet.engine') + +__all__ = [ + 'Callback', 'ComposeCallback', 'LogPrinter', 'Checkpointer', + 'VisualDLWriter', 'SniperProposalsGenerator' +] + + +class Callback(object): + def __init__(self, model): + self.model = model + + def on_step_begin(self, status): + pass + + def on_step_end(self, status): + pass + + def on_epoch_begin(self, status): + pass + + def on_epoch_end(self, status): + pass + + def on_train_begin(self, status): + pass + + def on_train_end(self, status): + pass + + +class ComposeCallback(object): + def __init__(self, callbacks): + callbacks = [c for c in list(callbacks) if c is not None] + for c in callbacks: + assert isinstance( + c, Callback), "callback should be subclass of Callback" + self._callbacks = callbacks + + def on_step_begin(self, status): + for c in self._callbacks: + c.on_step_begin(status) + + def on_step_end(self, status): + for c in self._callbacks: + c.on_step_end(status) + + def on_epoch_begin(self, status): + for c in self._callbacks: + c.on_epoch_begin(status) + + def on_epoch_end(self, status): + for c in self._callbacks: + c.on_epoch_end(status) + + def on_train_begin(self, status): + for c in self._callbacks: + c.on_train_begin(status) + + def on_train_end(self, status): + for c in self._callbacks: + c.on_train_end(status) + + +class LogPrinter(Callback): + def __init__(self, model): + super(LogPrinter, self).__init__(model) + + def on_step_end(self, status): + if dist.get_world_size() < 2 or dist.get_rank() == 0: + mode = status['mode'] + if mode == 'train': + epoch_id = status['epoch_id'] + step_id = status['step_id'] + steps_per_epoch = status['steps_per_epoch'] + training_staus = status['training_staus'] + batch_time = status['batch_time'] + data_time = status['data_time'] + + epoches = self.model.cfg.epoch + batch_size = self.model.cfg['{}Reader'.format(mode.capitalize( + ))]['batch_size'] + + logs = training_staus.log() + space_fmt = ':' + str(len(str(steps_per_epoch))) + 'd' + if step_id % self.model.cfg.log_iter == 0: + eta_steps = (epoches - epoch_id) * steps_per_epoch - step_id + eta_sec = eta_steps * batch_time.global_avg + eta_str = str(datetime.timedelta(seconds=int(eta_sec))) + ips = float(batch_size) / batch_time.avg + fmt = ' '.join([ + 'Epoch: [{}]', + '[{' + space_fmt + '}/{}]', + 'learning_rate: {lr:.6f}', + '{meters}', + 'eta: {eta}', + 'batch_cost: {btime}', + 'data_cost: {dtime}', + 'ips: {ips:.4f} images/s', + ]) + fmt = fmt.format( + epoch_id, + step_id, + steps_per_epoch, + 
lr=status['learning_rate'], + meters=logs, + eta=eta_str, + btime=str(batch_time), + dtime=str(data_time), + ips=ips) + logger.info(fmt) + if mode == 'eval': + step_id = status['step_id'] + if step_id % 100 == 0: + logger.info("Eval iter: {}".format(step_id)) + + def on_epoch_end(self, status): + if dist.get_world_size() < 2 or dist.get_rank() == 0: + mode = status['mode'] + if mode == 'eval': + sample_num = status['sample_num'] + cost_time = status['cost_time'] + logger.info('Total sample number: {}, average FPS: {}'.format( + sample_num, sample_num / cost_time)) + + +class Checkpointer(Callback): + def __init__(self, model): + super(Checkpointer, self).__init__(model) + self.best_ap = -1000. + self.save_dir = os.path.join(self.model.cfg.save_dir, + self.model.cfg.filename) + if hasattr(self.model.model, 'student_model'): + self.weight = self.model.model.student_model + else: + self.weight = self.model.model + + def on_epoch_end(self, status): + # Checkpointer only performed during training + mode = status['mode'] + epoch_id = status['epoch_id'] + weight = None + save_name = None + if dist.get_world_size() < 2 or dist.get_rank() == 0: + if mode == 'train': + end_epoch = self.model.cfg.epoch + if ( + epoch_id + 1 + ) % self.model.cfg.snapshot_epoch == 0 or epoch_id == end_epoch - 1: + save_name = str( + epoch_id) if epoch_id != end_epoch - 1 else "model_final" + weight = self.weight.state_dict() + elif mode == 'eval': + if 'save_best_model' in status and status['save_best_model']: + for metric in self.model._metrics: + map_res = metric.get_results() + eval_func = "ap" + if 'pose3d' in map_res: + key = 'pose3d' + eval_func = "mpjpe" + elif 'bbox' in map_res: + key = 'bbox' + elif 'keypoint' in map_res: + key = 'keypoint' + else: + key = 'mask' + if key not in map_res: + logger.warning("Evaluation results empty, this may be due to " \ + "training iterations being too few or not " \ + "loading the correct weights.") + return + if map_res[key][0] >= self.best_ap: + self.best_ap = map_res[key][0] + save_name = 'best_model' + weight = self.weight.state_dict() + logger.info("Best test {} {} is {:0.3f}.".format( + key, eval_func, abs(self.best_ap))) + if weight: + if self.model.use_ema: + exchange_save_model = status.get('exchange_save_model', + False) + if not exchange_save_model: + # save model and ema_model + save_model( + status['weight'], + self.model.optimizer, + self.save_dir, + save_name, + epoch_id + 1, + ema_model=weight) + else: + # save model(student model) and ema_model(teacher model) + # in DenseTeacher SSOD, the teacher model will be higher, + # so exchange when saving pdparams + student_model = status['weight'] # model + teacher_model = weight # ema_model + save_model( + teacher_model, + self.model.optimizer, + self.save_dir, + save_name, + epoch_id + 1, + ema_model=student_model) + del teacher_model + del student_model + else: + save_model(weight, self.model.optimizer, self.save_dir, + save_name, epoch_id + 1) + + +class WiferFaceEval(Callback): + def __init__(self, model): + super(WiferFaceEval, self).__init__(model) + + def on_epoch_begin(self, status): + assert self.model.mode == 'eval', \ + "WiferFaceEval can only be set during evaluation" + for metric in self.model._metrics: + metric.update(self.model.model) + sys.exit() + + +class VisualDLWriter(Callback): + """ + Use VisualDL to log data or image + """ + + def __init__(self, model): + super(VisualDLWriter, self).__init__(model) + + assert six.PY3, "VisualDL requires Python >= 3.5" + try: + from visualdl import LogWriter + 
except Exception as e:
+            logger.error('visualdl not found, please install visualdl. '
+                         'for example: `pip install visualdl`.')
+            raise e
+        self.vdl_writer = LogWriter(
+            model.cfg.get('vdl_log_dir', 'vdl_log_dir/scalar'))
+        self.vdl_loss_step = 0
+        self.vdl_mAP_step = 0
+        self.vdl_image_step = 0
+        self.vdl_image_frame = 0
+
+    def on_step_end(self, status):
+        mode = status['mode']
+        if dist.get_world_size() < 2 or dist.get_rank() == 0:
+            if mode == 'train':
+                # 'training_staus' is the (misspelled) status key set by the trainer
+                training_staus = status['training_staus']
+                for loss_name, loss_value in training_staus.get().items():
+                    self.vdl_writer.add_scalar(loss_name, loss_value,
+                                               self.vdl_loss_step)
+                self.vdl_loss_step += 1
+            elif mode == 'test':
+                ori_image = status['original_image']
+                result_image = status['result_image']
+                self.vdl_writer.add_image(
+                    "original/frame_{}".format(self.vdl_image_frame),
+                    ori_image, self.vdl_image_step)
+                self.vdl_writer.add_image(
+                    "result/frame_{}".format(self.vdl_image_frame),
+                    result_image, self.vdl_image_step)
+                self.vdl_image_step += 1
+                # each frame can display ten pictures at most.
+                if self.vdl_image_step % 10 == 0:
+                    self.vdl_image_step = 0
+                    self.vdl_image_frame += 1
+
+    def on_epoch_end(self, status):
+        mode = status['mode']
+        if dist.get_world_size() < 2 or dist.get_rank() == 0:
+            if mode == 'eval':
+                for metric in self.model._metrics:
+                    for key, map_value in metric.get_results().items():
+                        self.vdl_writer.add_scalar("{}-mAP".format(key),
+                                                   map_value[0],
+                                                   self.vdl_mAP_step)
+                self.vdl_mAP_step += 1
+
+
+class WandbCallback(Callback):
+    def __init__(self, model):
+        super(WandbCallback, self).__init__(model)
+
+        try:
+            import wandb
+            self.wandb = wandb
+        except Exception as e:
+            logger.error('wandb not found, please install wandb. '
+                         'Use: `pip install wandb`.')
+            raise e
+
+        self.wandb_params = model.cfg.get('wandb', None)
+        self.save_dir = os.path.join(self.model.cfg.save_dir,
+                                     self.model.cfg.filename)
+        if self.wandb_params is None:
+            self.wandb_params = {}
+        for k, v in model.cfg.items():
+            if k.startswith("wandb_"):
+                # drop the "wandb_" prefix; lstrip("wandb_") would strip any
+                # leading characters from that set and mangle keys
+                self.wandb_params.update({k[len("wandb_"):]: v})
+
+        self._run = None
+        if dist.get_world_size() < 2 or dist.get_rank() == 0:
+            _ = self.run
+            self.run.config.update(self.model.cfg)
+            self.run.define_metric("epoch")
+            self.run.define_metric("eval/*", step_metric="epoch")
+
+        self.best_ap = -1000.
+        self.fps = []
+
+    @property
+    def run(self):
+        if self._run is None:
+            if self.wandb.run is not None:
+                logger.info(
+                    "There is an ongoing wandb run which will be used "
+                    "for logging. Please use `wandb.finish()` to end that "
+                    "if the behaviour is not intended")
+                self._run = self.wandb.run
+            else:
+                self._run = self.wandb.init(**self.wandb_params)
+        return self._run
+
+    def save_model(self,
+                   optimizer,
+                   save_dir,
+                   save_name,
+                   last_epoch,
+                   ema_model=None,
+                   ap=None,
+                   fps=None,
+                   tags=None):
+        if dist.get_world_size() < 2 or dist.get_rank() == 0:
+            model_path = os.path.join(save_dir, save_name)
+            metadata = {}
+            metadata["last_epoch"] = last_epoch
+            if ap:
+                metadata["ap"] = ap
+
+            if fps:
+                metadata["fps"] = fps
+
+            if ema_model is None:
+                ema_artifact = self.wandb.Artifact(
+                    name="ema_model-{}".format(self.run.id),
+                    type="model",
+                    metadata=metadata)
+                model_artifact = self.wandb.Artifact(
+                    name="model-{}".format(self.run.id),
+                    type="model",
+                    metadata=metadata)
+
+                ema_artifact.add_file(model_path + ".pdema", name="model_ema")
+                model_artifact.add_file(model_path + ".pdparams", name="model")
+
+                self.run.log_artifact(ema_artifact, aliases=tags)
+                self.run.log_artifact(model_artifact, aliases=tags)
+            else:
+                model_artifact = self.wandb.Artifact(
+                    name="model-{}".format(self.run.id),
+                    type="model",
+                    metadata=metadata)
+                model_artifact.add_file(model_path + ".pdparams", name="model")
+                self.run.log_artifact(model_artifact, aliases=tags)
+
+    def on_step_end(self, status):
+        mode = status['mode']
+        if dist.get_world_size() < 2 or dist.get_rank() == 0:
+            if mode == 'train':
+                training_status = status['training_staus'].get()
+                for k, v in training_status.items():
+                    training_status[k] = float(v)
+
+                # calculate ips, data_cost, batch_cost
+                batch_time = status['batch_time']
+                data_time = status['data_time']
+                batch_size = self.model.cfg['{}Reader'.format(
+                    mode.capitalize())]['batch_size']
+
+                ips = float(batch_size) / float(batch_time.avg)
+                data_cost = float(data_time.avg)
+                batch_cost = float(batch_time.avg)
+
+                metrics = {"train/" + k: v for k, v in training_status.items()}
+
+                metrics["train/ips"] = ips
+                metrics["train/data_cost"] = data_cost
+                metrics["train/batch_cost"] = batch_cost
+
+                self.fps.append(ips)
+                self.run.log(metrics)
+
+    def on_epoch_end(self, status):
+        mode = status['mode']
+        epoch_id = status['epoch_id']
+        save_name = None
+        if dist.get_world_size() < 2 or dist.get_rank() == 0:
+            if mode == 'train':
+                fps = sum(self.fps) / len(self.fps)
+                self.fps = []
+
+                end_epoch = self.model.cfg.epoch
+                if (epoch_id + 1) % self.model.cfg.snapshot_epoch == 0 \
+                        or epoch_id == end_epoch - 1:
+                    save_name = str(
+                        epoch_id) if epoch_id != end_epoch - 1 else "model_final"
+                    tags = ["latest", "epoch_{}".format(epoch_id)]
+                    self.save_model(
+                        self.model.optimizer,
+                        self.save_dir,
+                        save_name,
+                        epoch_id + 1,
+                        self.model.use_ema,
+                        fps=fps,
+                        tags=tags)
+            if mode == 'eval':
+                sample_num = status['sample_num']
+                cost_time = status['cost_time']
+
+                fps = sample_num / cost_time
+
+                merged_dict = {}
+                for metric in self.model._metrics:
+                    for key, map_value in metric.get_results().items():
+                        merged_dict["eval/{}-mAP".format(key)] = map_value[0]
+                merged_dict["epoch"] = status["epoch_id"]
+                merged_dict["eval/fps"] = sample_num / cost_time
+
+                self.run.log(merged_dict)
+
+                if 'save_best_model' in status and status['save_best_model']:
+                    for metric in self.model._metrics:
+                        map_res = metric.get_results()
+                        if 'pose3d' in map_res:
+                            key = 'pose3d'
+                        elif 'bbox' in map_res:
+                            key = 'bbox'
+                        elif 'keypoint' in map_res:
+                            key = 'keypoint'
+                        else:
+                            key = 'mask'
+                        if key not in map_res:
+                            logger.warning("Evaluation results empty, this may be due to " \
+                                "training iterations being too few or not " \
+                                "loading the correct weights.")
+                            return
+                        if map_res[key][0] >= self.best_ap:
+                            self.best_ap = map_res[key][0]
+                            save_name = 'best_model'
+                            tags = ["best", "epoch_{}".format(epoch_id)]
+
+                            self.save_model(
+                                self.model.optimizer,
+                                self.save_dir,
+                                save_name,
+                                last_epoch=epoch_id + 1,
+                                ema_model=self.model.use_ema,
+                                ap=abs(self.best_ap),
+                                fps=fps,
+                                tags=tags)
+
+    def on_train_end(self, status):
+        self.run.finish()
+
+
+class SniperProposalsGenerator(Callback):
+    def __init__(self, model):
+        super(SniperProposalsGenerator, self).__init__(model)
+        ori_dataset = self.model.dataset
+        self.dataset = self._create_new_dataset(ori_dataset)
+        self.loader = self.model.loader
+        self.cfg = self.model.cfg
+        self.infer_model = self.model.model
+
+    def _create_new_dataset(self, ori_dataset):
+        dataset = copy.deepcopy(ori_dataset)
+        # init anno_cropper
+        dataset.init_anno_cropper()
+        # generate infer roidbs
+        ori_roidbs = dataset.get_ori_roidbs()
+        roidbs = dataset.anno_cropper.crop_infer_anno_records(ori_roidbs)
+        # set new roidbs
+        dataset.set_roidbs(roidbs)
+
+        return dataset
+
+    def _eval_with_loader(self, loader):
+        results = []
+        with paddle.no_grad():
+            self.infer_model.eval()
+            for step_id, data in enumerate(loader):
+                outs = self.infer_model(data)
+                for key in ['im_shape', 'scale_factor', 'im_id']:
+                    outs[key] = data[key]
+                for key, value in outs.items():
+                    if hasattr(value, 'numpy'):
+                        outs[key] = value.numpy()
+
+                results.append(outs)
+
+        return results
+
+    def on_train_end(self, status):
+        self.loader.dataset = self.dataset
+        results = self._eval_with_loader(self.loader)
+        results = self.dataset.anno_cropper.aggregate_chips_detections(results)
+        # sniper
+        proposals = []
+        clsid2catid = {v: k for k, v in self.dataset.catid2clsid.items()}
+        for outs in results:
+            batch_res = get_infer_results(outs, clsid2catid)
+            start = 0
+            for i, im_id in enumerate(outs['im_id']):
+                bbox_num = outs['bbox_num']
+                end = start + bbox_num[i]
+                bbox_res = batch_res['bbox'][start:end] \
+                    if 'bbox' in batch_res else None
+                if bbox_res:
+                    proposals += bbox_res
+                start = end  # advance the slice offset, otherwise every image reuses the first image's boxes
+        logger.info("save proposals in {}".format(self.cfg.proposals_path))
+        with open(self.cfg.proposals_path, 'w') as f:
+            json.dump(proposals, f)
diff --git a/PaddleDetection-release-2.6/ppdet/engine/env.py b/PaddleDetection-release-2.6/ppdet/engine/env.py
new file mode 100644
index 0000000000000000000000000000000000000000..0a896571db8bee03f3fdb172443af88622a912bd
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/engine/env.py
@@ -0,0 +1,50 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
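All of the callbacks above share the same hook surface (`on_step_begin/end`, `on_epoch_begin/end`, `on_train_begin/end`) and are fanned out through `ComposeCallback`; user code can plug in the same way via `register_callbacks`. A minimal sketch of a custom callback, assuming a built `Trainer` instance named `trainer` (the `BestAPTracker` name and the `patience` knob are illustrative, not part of the release):

```python
from ppdet.engine.callbacks import Callback
from ppdet.utils.logger import setup_logger

logger = setup_logger(__name__)


class BestAPTracker(Callback):
    """Track the best eval bbox AP and note when it stops improving."""

    def __init__(self, model, patience=5):
        # `model` is the Trainer here, mirroring LogPrinter/Checkpointer above
        super(BestAPTracker, self).__init__(model)
        self.best_ap = -1.
        self.stale_evals = 0
        self.patience = patience

    def on_epoch_end(self, status):
        if status['mode'] != 'eval':
            return
        for metric in self.model._metrics:
            map_res = metric.get_results()
            if 'bbox' not in map_res:
                continue
            ap = map_res['bbox'][0]
            if ap > self.best_ap:
                self.best_ap, self.stale_evals = ap, 0
            else:
                self.stale_evals += 1
            if self.stale_evals >= self.patience:
                logger.info("bbox AP has not improved for {} evals "
                            "(best={:.3f})".format(self.stale_evals,
                                                   self.best_ap))


# registration mirrors Trainer.register_callbacks shown later in trainer.py:
# trainer.register_callbacks([BestAPTracker(trainer, patience=3)])
```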
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import random +import numpy as np + +import paddle +from paddle.distributed import fleet + +__all__ = ['init_parallel_env', 'set_random_seed', 'init_fleet_env'] + + +def init_fleet_env(find_unused_parameters=False): + strategy = fleet.DistributedStrategy() + strategy.find_unused_parameters = find_unused_parameters + fleet.init(is_collective=True, strategy=strategy) + + +def init_parallel_env(): + env = os.environ + dist = 'PADDLE_TRAINER_ID' in env and 'PADDLE_TRAINERS_NUM' in env + if dist: + trainer_id = int(env['PADDLE_TRAINER_ID']) + local_seed = (99 + trainer_id) + random.seed(local_seed) + np.random.seed(local_seed) + + paddle.distributed.init_parallel_env() + + +def set_random_seed(seed): + paddle.seed(seed) + random.seed(seed) + np.random.seed(seed) diff --git a/PaddleDetection-release-2.6/ppdet/engine/export_utils.py b/PaddleDetection-release-2.6/ppdet/engine/export_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d7d2e883d2dc38ee09fc4e3c4cdab3a1dad6d8ac --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/engine/export_utils.py @@ -0,0 +1,249 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
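The seeding scheme in `env.py` above gives each worker a distinct Python/NumPy seed (`99 + PADDLE_TRAINER_ID`), so random augmentation differs across ranks, while `set_random_seed` pins everything for reproducible runs. A sketch of how these helpers are typically wired into a training entrypoint (the `setup_env` wrapper and its arguments are illustrative):

```python
import paddle.distributed as dist

from ppdet.engine.env import init_fleet_env, init_parallel_env, set_random_seed


def setup_env(seed=None, use_fleet=False):
    # fleet (collective) and plain data-parallel init are alternative entrypoints
    if use_fleet:
        init_fleet_env(find_unused_parameters=False)
    elif dist.get_world_size() > 1:
        # also seeds python/numpy per rank with 99 + PADDLE_TRAINER_ID
        init_parallel_env()
    if seed is not None:
        # a fixed seed overrides the per-rank seeding for reproducibility
        set_random_seed(seed)
```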
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import yaml +from collections import OrderedDict + +import paddle +from ppdet.data.source.category import get_categories + +from ppdet.utils.logger import setup_logger +logger = setup_logger('ppdet.engine') + +# Global dictionary +TRT_MIN_SUBGRAPH = { + 'YOLO': 3, + 'PPYOLOE': 3, + 'SSD': 60, + 'RCNN': 40, + 'RetinaNet': 40, + 'S2ANet': 80, + 'EfficientDet': 40, + 'Face': 3, + 'TTFNet': 60, + 'FCOS': 16, + 'SOLOv2': 60, + 'HigherHRNet': 3, + 'HRNet': 3, + 'DeepSORT': 3, + 'ByteTrack': 10, + 'CenterTrack': 5, + 'JDE': 10, + 'FairMOT': 5, + 'GFL': 16, + 'PicoDet': 3, + 'CenterNet': 5, + 'TOOD': 5, + 'YOLOX': 8, + 'YOLOF': 40, + 'METRO_Body': 3, + 'DETR': 3, +} + +KEYPOINT_ARCH = ['HigherHRNet', 'TopDownHRNet'] +MOT_ARCH = ['JDE', 'FairMOT', 'DeepSORT', 'ByteTrack', 'CenterTrack'] + +TO_STATIC_SPEC = { + 'yolov3_darknet53_270e_coco': [{ + 'im_id': paddle.static.InputSpec( + name='im_id', shape=[-1, 1], dtype='float32'), + 'is_crowd': paddle.static.InputSpec( + name='is_crowd', shape=[-1, 50], dtype='float32'), + 'gt_bbox': paddle.static.InputSpec( + name='gt_bbox', shape=[-1, 50, 4], dtype='float32'), + 'curr_iter': paddle.static.InputSpec( + name='curr_iter', shape=[-1], dtype='float32'), + 'image': paddle.static.InputSpec( + name='image', shape=[-1, 3, -1, -1], dtype='float32'), + 'im_shape': paddle.static.InputSpec( + name='im_shape', shape=[-1, 2], dtype='float32'), + 'scale_factor': paddle.static.InputSpec( + name='scale_factor', shape=[-1, 2], dtype='float32'), + 'target0': paddle.static.InputSpec( + name='target0', shape=[-1, 3, 86, -1, -1], dtype='float32'), + 'target1': paddle.static.InputSpec( + name='target1', shape=[-1, 3, 86, -1, -1], dtype='float32'), + 'target2': paddle.static.InputSpec( + name='target2', shape=[-1, 3, 86, -1, -1], dtype='float32'), + }], +} + + +def apply_to_static(config, model): + filename = config.get('filename', None) + spec = TO_STATIC_SPEC.get(filename, None) + model = paddle.jit.to_static(model, input_spec=spec) + logger.info("Successfully to apply @to_static with specs: {}".format(spec)) + return model + + +def _prune_input_spec(input_spec, program, targets): + # try to prune static program to figure out pruned input spec + # so we perform following operations in static mode + device = paddle.get_device() + paddle.enable_static() + paddle.set_device(device) + pruned_input_spec = [{}] + program = program.clone() + program = program._prune(targets=targets) + global_block = program.global_block() + for name, spec in input_spec[0].items(): + try: + v = global_block.var(name) + pruned_input_spec[0][name] = spec + except Exception: + pass + paddle.disable_static(place=device) + return pruned_input_spec + + +def _parse_reader(reader_cfg, dataset_cfg, metric, arch, image_shape): + preprocess_list = [] + + anno_file = dataset_cfg.get_anno() + + clsid2catid, catid2name = get_categories(metric, anno_file, arch) + + label_list = [str(cat) for cat in catid2name.values()] + + fuse_normalize = reader_cfg.get('fuse_normalize', False) + sample_transforms = reader_cfg['sample_transforms'] + for st in sample_transforms[1:]: + for key, value in st.items(): + p = {'type': key} + if key == 'Resize': + if int(image_shape[1]) != -1: + value['target_size'] = image_shape[1:] + value['interp'] = value.get('interp', 1) # cv2.INTER_LINEAR + if fuse_normalize and key == 'NormalizeImage': + continue + p.update(value) + preprocess_list.append(p) + 
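# Deploy-side note on the loop above: sample_transforms are copied almost
+    # verbatim into the exported config, but the batched PadBatch op handled
+    # below is rewritten as the per-image PadStride, e.g.
+    #   {'PadBatch': {'pad_to_stride': 32}} -> {'type': 'PadStride', 'stride': 32},
+    # since deploy/infer preprocesses one image at a time.
+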
batch_transforms = reader_cfg.get('batch_transforms', None) + if batch_transforms: + for bt in batch_transforms: + for key, value in bt.items(): + # for deploy/infer, use PadStride(stride) instead PadBatch(pad_to_stride) + if key == 'PadBatch': + preprocess_list.append({ + 'type': 'PadStride', + 'stride': value['pad_to_stride'] + }) + break + + return preprocess_list, label_list + + +def _parse_tracker(tracker_cfg): + tracker_params = {} + for k, v in tracker_cfg.items(): + tracker_params.update({k: v}) + return tracker_params + + +def _dump_infer_config(config, path, image_shape, model): + arch_state = False + from ppdet.core.config.yaml_helpers import setup_orderdict + setup_orderdict() + use_dynamic_shape = True if image_shape[2] == -1 else False + infer_cfg = OrderedDict({ + 'mode': 'paddle', + 'draw_threshold': 0.5, + 'metric': config['metric'], + 'use_dynamic_shape': use_dynamic_shape + }) + export_onnx = config.get('export_onnx', False) + export_eb = config.get('export_eb', False) + + infer_arch = config['architecture'] + if 'RCNN' in infer_arch and export_onnx: + logger.warning( + "Exporting RCNN model to ONNX only support batch_size = 1") + infer_cfg['export_onnx'] = True + infer_cfg['export_eb'] = export_eb + + if infer_arch in MOT_ARCH: + if infer_arch == 'DeepSORT': + tracker_cfg = config['DeepSORTTracker'] + elif infer_arch == 'CenterTrack': + tracker_cfg = config['CenterTracker'] + else: + tracker_cfg = config['JDETracker'] + infer_cfg['tracker'] = _parse_tracker(tracker_cfg) + + for arch, min_subgraph_size in TRT_MIN_SUBGRAPH.items(): + if arch in infer_arch: + infer_cfg['arch'] = arch + infer_cfg['min_subgraph_size'] = min_subgraph_size + arch_state = True + break + + if infer_arch == 'PPYOLOEWithAuxHead': + infer_arch = 'PPYOLOE' + + if infer_arch in ['PPYOLOE', 'YOLOX', 'YOLOF']: + infer_cfg['arch'] = infer_arch + infer_cfg['min_subgraph_size'] = TRT_MIN_SUBGRAPH[infer_arch] + arch_state = True + + if not arch_state: + logger.error( + 'Architecture: {} is not supported for exporting model now.\n'. 
+ format(infer_arch) + + 'Please set TRT_MIN_SUBGRAPH in ppdet/engine/export_utils.py') + os._exit(0) + if 'mask_head' in config[config['architecture']] and config[config[ + 'architecture']]['mask_head']: + infer_cfg['mask'] = True + label_arch = 'detection_arch' + if infer_arch in KEYPOINT_ARCH: + label_arch = 'keypoint_arch' + + if infer_arch in MOT_ARCH: + if config['metric'] in ['COCO', 'VOC']: + # MOT model run as Detector + reader_cfg = config['TestReader'] + dataset_cfg = config['TestDataset'] + else: + # 'metric' in ['MOT', 'MCMOT', 'KITTI'] + label_arch = 'mot_arch' + reader_cfg = config['TestMOTReader'] + dataset_cfg = config['TestMOTDataset'] + else: + reader_cfg = config['TestReader'] + dataset_cfg = config['TestDataset'] + + infer_cfg['Preprocess'], infer_cfg['label_list'] = _parse_reader( + reader_cfg, dataset_cfg, config['metric'], label_arch, image_shape[1:]) + + if infer_arch == 'PicoDet': + if hasattr(config, 'export') and config['export'].get( + 'post_process', + False) and not config['export'].get('benchmark', False): + infer_cfg['arch'] = 'GFL' + head_name = 'PicoHeadV2' if config['PicoHeadV2'] else 'PicoHead' + infer_cfg['NMS'] = config[head_name]['nms'] + # In order to speed up the prediction, the threshold of nms + # is adjusted here, which can be changed in infer_cfg.yml + config[head_name]['nms']["score_threshold"] = 0.3 + config[head_name]['nms']["nms_threshold"] = 0.5 + infer_cfg['fpn_stride'] = config[head_name]['fpn_stride'] + + yaml.dump(infer_cfg, open(path, 'w')) + logger.info("Export inference config file to {}".format(os.path.join(path))) diff --git a/PaddleDetection-release-2.6/ppdet/engine/tracker.py b/PaddleDetection-release-2.6/ppdet/engine/tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..90eb0c50fd9c280798140020b6d9fbd0c2a3ebf0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/engine/tracker.py @@ -0,0 +1,731 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
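`_dump_infer_config` above serializes everything the deploy runtime needs into `infer_cfg.yml` next to the exported model. A sketch of a consumer of that file, assuming the field names serialized above (`arch`, `min_subgraph_size`, `use_dynamic_shape`, `Preprocess`, `label_list`) and the conventional file location:

```python
import os

import yaml


def load_infer_cfg(model_dir):
    """Read the infer_cfg.yml written by _dump_infer_config (path assumed)."""
    with open(os.path.join(model_dir, 'infer_cfg.yml')) as f:
        cfg = yaml.safe_load(f)
    print('arch: {} | min_subgraph_size: {} | dynamic shape: {}'.format(
        cfg['arch'], cfg['min_subgraph_size'], cfg['use_dynamic_shape']))
    # Preprocess is an ordered list of op dicts, each with a 'type' key
    # (e.g. Resize / NormalizeImage / Permute / PadStride), applied in order
    for op in cfg['Preprocess']:
        print(op['type'], {k: v for k, v in op.items() if k != 'type'})
    print('{} classes, e.g. {}'.format(
        len(cfg['label_list']), cfg['label_list'][:3]))
    return cfg
```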
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import glob +import re +import paddle +import paddle.nn as nn +import numpy as np +from tqdm import tqdm +from collections import defaultdict + +from ppdet.core.workspace import create +from ppdet.utils.checkpoint import load_weight, load_pretrain_weight +from ppdet.modeling.mot.utils import Detection, get_crops, scale_coords, clip_box +from ppdet.modeling.mot.utils import MOTTimer, load_det_results, write_mot_results, save_vis_results +from ppdet.modeling.mot.tracker import JDETracker, CenterTracker +from ppdet.modeling.mot.tracker import DeepSORTTracker, OCSORTTracker, BOTSORTTracker +from ppdet.modeling.architectures import YOLOX +from ppdet.metrics import Metric, MOTMetric, KITTIMOTMetric, MCMOTMetric +from ppdet.data.source.category import get_categories +import ppdet.utils.stats as stats + +from .callbacks import Callback, ComposeCallback + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +MOT_ARCH = ['JDE', 'FairMOT', 'DeepSORT', 'ByteTrack', 'CenterTrack'] +MOT_ARCH_JDE = MOT_ARCH[:2] +MOT_ARCH_SDE = MOT_ARCH[2:4] +MOT_DATA_TYPE = ['mot', 'mcmot', 'kitti'] + +__all__ = ['Tracker'] + + +class Tracker(object): + def __init__(self, cfg, mode='eval'): + self.cfg = cfg + assert mode.lower() in ['test', 'eval'], \ + "mode should be 'test' or 'eval'" + self.mode = mode.lower() + self.optimizer = None + + # build MOT data loader + self.dataset = cfg['{}MOTDataset'.format(self.mode.capitalize())] + + # build model + self.model = create(cfg.architecture) + + if isinstance(self.model.detector, YOLOX): + for k, m in self.model.named_sublayers(): + if isinstance(m, nn.BatchNorm2D): + m._epsilon = 1e-3 # for amp(fp16) + m._momentum = 0.97 # 0.03 in pytorch + + anno_file = self.dataset.get_anno() + clsid2catid, catid2name = get_categories( + self.cfg.metric, anno_file=anno_file) + self.ids2names = [] + for k, v in catid2name.items(): + self.ids2names.append(v) + + self.status = {} + self.start_epoch = 0 + + # initial default callbacks + self._init_callbacks() + + # initial default metrics + self._init_metrics() + self._reset_metrics() + + def _init_callbacks(self): + self._callbacks = [] + self._compose_callback = None + + def _init_metrics(self): + if self.mode in ['test']: + self._metrics = [] + return + + if self.cfg.metric == 'MOT': + self._metrics = [MOTMetric(), ] + elif self.cfg.metric == 'MCMOT': + self._metrics = [MCMOTMetric(self.cfg.num_classes), ] + elif self.cfg.metric == 'KITTI': + self._metrics = [KITTIMOTMetric(), ] + else: + logger.warning("Metric not support for metric type {}".format( + self.cfg.metric)) + self._metrics = [] + + def _reset_metrics(self): + for metric in self._metrics: + metric.reset() + + def register_callbacks(self, callbacks): + callbacks = [h for h in list(callbacks) if h is not None] + for c in callbacks: + assert isinstance(c, Callback), \ + "metrics shoule be instances of subclass of Metric" + self._callbacks.extend(callbacks) + self._compose_callback = ComposeCallback(self._callbacks) + + def register_metrics(self, metrics): + metrics = [m for m in list(metrics) if m is not None] + for m in metrics: + assert isinstance(m, Metric), \ + "metrics shoule be instances of subclass of Metric" + self._metrics.extend(metrics) + + def load_weights_jde(self, weights): + load_weight(self.model, weights, self.optimizer) + + def load_weights_sde(self, det_weights, reid_weights): + with_detector = 
self.model.detector is not None + with_reid = self.model.reid is not None + + if with_detector: + load_weight(self.model.detector, det_weights) + if with_reid: + load_weight(self.model.reid, reid_weights) + else: + load_weight(self.model.reid, reid_weights) + + def _eval_seq_centertrack(self, + dataloader, + save_dir=None, + show_image=False, + frame_rate=30, + draw_threshold=0): + assert isinstance(self.model.tracker, CenterTracker) + if save_dir: + if not os.path.exists(save_dir): os.makedirs(save_dir) + tracker = self.model.tracker + + timer = MOTTimer() + frame_id = 0 + self.status['mode'] = 'track' + self.model.eval() + results = defaultdict(list) # only support single class now + + for step_id, data in enumerate(tqdm(dataloader)): + self.status['step_id'] = step_id + if step_id == 0: + self.model.reset_tracking() + + # forward + timer.tic() + pred_ret = self.model(data) + + online_targets = tracker.update(pred_ret) + online_tlwhs, online_scores, online_ids = [], [], [] + for t in online_targets: + bbox = t['bbox'] + tlwh = [bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]] + tscore = float(t['score']) + tid = int(t['tracking_id']) + if tlwh[2] * tlwh[3] > 0: + online_tlwhs.append(tlwh) + online_ids.append(tid) + online_scores.append(tscore) + timer.toc() + # save results + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes, self.ids2names) + frame_id += 1 + return results, frame_id, timer.average_time, timer.calls + + def _eval_seq_jde(self, + dataloader, + save_dir=None, + show_image=False, + frame_rate=30, + draw_threshold=0): + if save_dir: + if not os.path.exists(save_dir): os.makedirs(save_dir) + tracker = self.model.tracker + tracker.max_time_lost = int(frame_rate / 30.0 * tracker.track_buffer) + + timer = MOTTimer() + frame_id = 0 + self.status['mode'] = 'track' + self.model.eval() + results = defaultdict(list) # support single class and multi classes + + for step_id, data in enumerate(tqdm(dataloader)): + self.status['step_id'] = step_id + # forward + timer.tic() + pred_dets, pred_embs = self.model(data) + + pred_dets, pred_embs = pred_dets.numpy(), pred_embs.numpy() + online_targets_dict = self.model.tracker.update(pred_dets, + pred_embs) + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + for cls_id in range(self.cfg.num_classes): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] <= tracker.min_box_area: continue + if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + # save results + results[cls_id].append( + (frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id], + online_ids[cls_id])) + + timer.toc() + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes, self.ids2names) + frame_id += 1 + + return results, frame_id, timer.average_time, timer.calls + + def _eval_seq_sde(self, + dataloader, + save_dir=None, + show_image=False, + frame_rate=30, + seq_name='', + scaled=False, + det_file='', + draw_threshold=0): + if save_dir: + if not os.path.exists(save_dir): os.makedirs(save_dir) + use_detector = False if not 
self.model.detector else True + use_reid = hasattr(self.model, 'reid') + if use_reid and self.model.reid is not None: + use_reid = True + else: + use_reid = False + + timer = MOTTimer() + results = defaultdict(list) + frame_id = 0 + self.status['mode'] = 'track' + self.model.eval() + if use_reid: + self.model.reid.eval() + if not use_detector: + dets_list = load_det_results(det_file, len(dataloader)) + logger.info('Finish loading detection results file {}.'.format( + det_file)) + + tracker = self.model.tracker + for step_id, data in enumerate(tqdm(dataloader)): + self.status['step_id'] = step_id + ori_image = data['ori_image'] # [bs, H, W, 3] + ori_image_shape = data['ori_image'].shape[1:3] + # ori_image_shape: [H, W] + + input_shape = data['image'].shape[2:] + # input_shape: [h, w], before data transforms, set in model config + + im_shape = data['im_shape'][0].numpy() + # im_shape: [new_h, new_w], after data transforms + scale_factor = data['scale_factor'][0].numpy() + + empty_detections = False + # when it has no detected bboxes, will not inference reid model + # and if visualize, use original image instead + + # forward + timer.tic() + if not use_detector: + dets = dets_list[frame_id] + bbox_tlwh = np.array(dets['bbox'], dtype='float32') + if bbox_tlwh.shape[0] > 0: + # detector outputs: pred_cls_ids, pred_scores, pred_bboxes + pred_cls_ids = np.array(dets['cls_id'], dtype='float32') + pred_scores = np.array(dets['score'], dtype='float32') + pred_bboxes = np.concatenate( + (bbox_tlwh[:, 0:2], + bbox_tlwh[:, 2:4] + bbox_tlwh[:, 0:2]), + axis=1) + else: + logger.warning( + 'Frame {} has not object, try to modify score threshold.'. + format(frame_id)) + empty_detections = True + else: + outs = self.model.detector(data) + outs['bbox'] = outs['bbox'].numpy() + outs['bbox_num'] = outs['bbox_num'].numpy() + + if len(outs['bbox']) > 0 and empty_detections == False: + # detector outputs: pred_cls_ids, pred_scores, pred_bboxes + pred_cls_ids = outs['bbox'][:, 0:1] + pred_scores = outs['bbox'][:, 1:2] + if not scaled: + # Note: scaled=False only in JDE YOLOv3 or other detectors + # with LetterBoxResize and JDEBBoxPostProcess. + # + # 'scaled' means whether the coords after detector outputs + # have been scaled back to the original image, set True + # in general detector, set False in JDE YOLOv3. + pred_bboxes = scale_coords(outs['bbox'][:, 2:], + input_shape, im_shape, + scale_factor) + else: + pred_bboxes = outs['bbox'][:, 2:] + pred_dets_old = np.concatenate( + (pred_cls_ids, pred_scores, pred_bboxes), axis=1) + else: + logger.warning( + 'Frame {} has not detected object, try to modify score threshold.'. + format(frame_id)) + empty_detections = True + + if not empty_detections: + pred_xyxys, keep_idx = clip_box(pred_bboxes, ori_image_shape) + if len(keep_idx[0]) == 0: + logger.warning( + 'Frame {} has not detected object left after clip_box.'. 
+ format(frame_id)) + empty_detections = True + + if empty_detections: + timer.toc() + # if visualize, use original image instead + online_ids, online_tlwhs, online_scores = None, None, None + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes, self.ids2names) + frame_id += 1 + # thus will not inference reid model + continue + + pred_cls_ids = pred_cls_ids[keep_idx[0]] + pred_scores = pred_scores[keep_idx[0]] + pred_dets = np.concatenate( + (pred_cls_ids, pred_scores, pred_xyxys), axis=1) + + if use_reid: + crops = get_crops( + pred_xyxys, + ori_image, + w=tracker.input_size[0], + h=tracker.input_size[1]) + crops = paddle.to_tensor(crops) + + data.update({'crops': crops}) + pred_embs = self.model(data)['embeddings'].numpy() + else: + pred_embs = None + + if isinstance(tracker, DeepSORTTracker): + online_tlwhs, online_scores, online_ids = [], [], [] + tracker.predict() + online_targets = tracker.update(pred_dets, pred_embs) + for t in online_targets: + if not t.is_confirmed() or t.time_since_update > 1: + continue + tlwh = t.to_tlwh() + tscore = t.score + tid = t.track_id + if tscore < draw_threshold: continue + if tlwh[2] * tlwh[3] <= tracker.min_box_area: continue + if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > tracker.vertical_ratio: + continue + online_tlwhs.append(tlwh) + online_scores.append(tscore) + online_ids.append(tid) + timer.toc() + + # save results + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes, self.ids2names) + + elif isinstance(tracker, JDETracker): + # trick hyperparams only used for MOTChallenge (MOT17, MOT20) Test-set + tracker.track_buffer, tracker.conf_thres = get_trick_hyperparams( + seq_name, tracker.track_buffer, tracker.conf_thres) + + online_targets_dict = tracker.update(pred_dets_old, pred_embs) + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + for cls_id in range(self.cfg.num_classes): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] <= tracker.min_box_area: continue + if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + # save results + results[cls_id].append( + (frame_id + 1, online_tlwhs[cls_id], + online_scores[cls_id], online_ids[cls_id])) + timer.toc() + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes, self.ids2names) + + elif isinstance(tracker, OCSORTTracker): + # OC_SORT Tracker + online_targets = tracker.update(pred_dets_old, pred_embs) + online_tlwhs = [] + online_ids = [] + online_scores = [] + for t in online_targets: + tlwh = [t[0], t[1], t[2] - t[0], t[3] - t[1]] + tscore = float(t[4]) + tid = int(t[5]) + if tlwh[2] * tlwh[3] > 0: + online_tlwhs.append(tlwh) + online_ids.append(tid) + online_scores.append(tscore) + timer.toc() + # save results + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes, self.ids2names) + + 
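# Note on result layouts: the JDETracker branch below fills per-class
+            # lists keyed by cls_id (results[cls_id]), while the DeepSORT,
+            # OC-SORT and BOTSORT paths are effectively single-class here and
+            # append everything under results[0]; every entry is a
+            # (frame_id + 1, tlwhs, scores, track_ids) tuple, the layout
+            # write_mot_results expects.
+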
elif isinstance(tracker, BOTSORTTracker): + # BOTSORT Tracker + online_targets = tracker.update( + pred_dets_old, img=ori_image.numpy()) + online_tlwhs = [] + online_ids = [] + online_scores = [] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] > 0: + online_tlwhs.append(tlwh) + online_ids.append(tid) + online_scores.append(tscore) + timer.toc() + # save results + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes, self.ids2names) + + else: + raise ValueError(tracker) + frame_id += 1 + + return results, frame_id, timer.average_time, timer.calls + + def mot_evaluate(self, + data_root, + seqs, + output_dir, + data_type='mot', + model_type='JDE', + save_images=False, + save_videos=False, + show_image=False, + scaled=False, + det_results_dir=''): + if not os.path.exists(output_dir): os.makedirs(output_dir) + result_root = os.path.join(output_dir, 'mot_results') + if not os.path.exists(result_root): os.makedirs(result_root) + assert data_type in MOT_DATA_TYPE, \ + "data_type should be 'mot', 'mcmot' or 'kitti'" + assert model_type in MOT_ARCH, \ + "model_type should be 'JDE', 'DeepSORT', 'FairMOT' or 'ByteTrack'" + + # run tracking + n_frame = 0 + timer_avgs, timer_calls = [], [] + for seq in seqs: + infer_dir = os.path.join(data_root, seq) + if not os.path.exists(infer_dir) or not os.path.isdir(infer_dir): + logger.warning("Seq {} error, {} has no images.".format( + seq, infer_dir)) + continue + if os.path.exists(os.path.join(infer_dir, 'img1')): + infer_dir = os.path.join(infer_dir, 'img1') + + frame_rate = 30 + seqinfo = os.path.join(data_root, seq, 'seqinfo.ini') + if os.path.exists(seqinfo): + meta_info = open(seqinfo).read() + frame_rate = int(meta_info[meta_info.find('frameRate') + 10: + meta_info.find('\nseqLength')]) + + save_dir = os.path.join(output_dir, 'mot_outputs', + seq) if save_images or save_videos else None + logger.info('Evaluate seq: {}'.format(seq)) + + self.dataset.set_images(self.get_infer_images(infer_dir)) + dataloader = create('EvalMOTReader')(self.dataset, 0) + + result_filename = os.path.join(result_root, '{}.txt'.format(seq)) + + with paddle.no_grad(): + if model_type in MOT_ARCH_JDE: + results, nf, ta, tc = self._eval_seq_jde( + dataloader, + save_dir=save_dir, + show_image=show_image, + frame_rate=frame_rate) + elif model_type in MOT_ARCH_SDE: + results, nf, ta, tc = self._eval_seq_sde( + dataloader, + save_dir=save_dir, + show_image=show_image, + frame_rate=frame_rate, + seq_name=seq, + scaled=scaled, + det_file=os.path.join(det_results_dir, + '{}.txt'.format(seq))) + elif model_type == 'CenterTrack': + results, nf, ta, tc = self._eval_seq_centertrack( + dataloader, + save_dir=save_dir, + show_image=show_image, + frame_rate=frame_rate) + else: + raise ValueError(model_type) + + write_mot_results(result_filename, results, data_type, + self.cfg.num_classes) + n_frame += nf + timer_avgs.append(ta) + timer_calls.append(tc) + + if save_videos: + output_video_path = os.path.join(save_dir, '..', + '{}_vis.mp4'.format(seq)) + cmd_str = 'ffmpeg -f image2 -i {}/%05d.jpg {}'.format( + save_dir, output_video_path) + os.system(cmd_str) + logger.info('Save video in {}.'.format(output_video_path)) + + # update metrics + for metric in self._metrics: + metric.update(data_root, seq, data_type, result_root, + result_filename) + + timer_avgs = 
np.asarray(timer_avgs) + timer_calls = np.asarray(timer_calls) + all_time = np.dot(timer_avgs, timer_calls) + avg_time = all_time / np.sum(timer_calls) + logger.info('Time elapsed: {:.2f} seconds, FPS: {:.2f}'.format( + all_time, 1.0 / avg_time)) + + # accumulate metric to log out + for metric in self._metrics: + metric.accumulate() + metric.log() + # reset metric states for metric may performed multiple times + self._reset_metrics() + + def get_infer_images(self, infer_dir): + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + images = set() + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + images.sort() + assert len(images) > 0, "no image found in {}".format(infer_dir) + logger.info("Found {} inference images in total.".format(len(images))) + return images + + def mot_predict_seq(self, + video_file, + frame_rate, + image_dir, + output_dir, + data_type='mot', + model_type='JDE', + save_images=False, + save_videos=True, + show_image=False, + scaled=False, + det_results_dir='', + draw_threshold=0.5): + assert video_file is not None or image_dir is not None, \ + "--video_file or --image_dir should be set." + assert video_file is None or os.path.isfile(video_file), \ + "{} is not a file".format(video_file) + assert image_dir is None or os.path.isdir(image_dir), \ + "{} is not a directory".format(image_dir) + + if not os.path.exists(output_dir): os.makedirs(output_dir) + result_root = os.path.join(output_dir, 'mot_results') + if not os.path.exists(result_root): os.makedirs(result_root) + assert data_type in MOT_DATA_TYPE, \ + "data_type should be 'mot', 'mcmot' or 'kitti'" + assert model_type in MOT_ARCH, \ + "model_type should be 'JDE', 'DeepSORT', 'FairMOT' or 'ByteTrack'" + + # run tracking + if video_file: + seq = video_file.split('/')[-1].split('.')[0] + self.dataset.set_video(video_file, frame_rate) + logger.info('Starting tracking video {}'.format(video_file)) + elif image_dir: + seq = image_dir.split('/')[-1].split('.')[0] + if os.path.exists(os.path.join(image_dir, 'img1')): + image_dir = os.path.join(image_dir, 'img1') + images = [ + '{}/{}'.format(image_dir, x) for x in os.listdir(image_dir) + ] + images.sort() + self.dataset.set_images(images) + logger.info('Starting tracking folder {}, found {} images'.format( + image_dir, len(images))) + else: + raise ValueError('--video_file or --image_dir should be set.') + + save_dir = os.path.join(output_dir, 'mot_outputs', + seq) if save_images or save_videos else None + + dataloader = create('TestMOTReader')(self.dataset, 0) + result_filename = os.path.join(result_root, '{}.txt'.format(seq)) + if frame_rate == -1: + frame_rate = self.dataset.frame_rate + + with paddle.no_grad(): + if model_type in MOT_ARCH_JDE: + results, nf, ta, tc = self._eval_seq_jde( + dataloader, + save_dir=save_dir, + show_image=show_image, + frame_rate=frame_rate, + draw_threshold=draw_threshold) + elif model_type in MOT_ARCH_SDE: + results, nf, ta, tc = self._eval_seq_sde( + dataloader, + save_dir=save_dir, + show_image=show_image, + frame_rate=frame_rate, + seq_name=seq, + scaled=scaled, + det_file=os.path.join(det_results_dir, + '{}.txt'.format(seq)), + draw_threshold=draw_threshold) + elif model_type == 'CenterTrack': + results, nf, ta, tc = self._eval_seq_centertrack( + dataloader, + 
save_dir=save_dir,
+                    show_image=show_image,
+                    frame_rate=frame_rate)
+            else:
+                raise ValueError(model_type)
+
+        if save_videos:
+            output_video_path = os.path.join(save_dir, '..',
+                                             '{}_vis.mp4'.format(seq))
+            cmd_str = 'ffmpeg -f image2 -i {}/%05d.jpg {}'.format(
+                save_dir, output_video_path)
+            os.system(cmd_str)
+            logger.info('Save video in {}'.format(output_video_path))
+
+        write_mot_results(result_filename, results, data_type,
+                          self.cfg.num_classes)
+
+
+def get_trick_hyperparams(video_name, ori_buffer, ori_thresh):
+    if video_name[:3] != 'MOT':
+        # only used for MOTChallenge (MOT17, MOT20) Test-set
+        return ori_buffer, ori_thresh
+
+    video_name = video_name[:8]
+    if 'MOT17-05' in video_name:
+        track_buffer = 14
+    elif 'MOT17-13' in video_name:
+        track_buffer = 25
+    else:
+        track_buffer = ori_buffer
+
+    if 'MOT17-01' in video_name:
+        track_thresh = 0.65
+    elif 'MOT17-06' in video_name:
+        track_thresh = 0.65
+    elif 'MOT17-12' in video_name:
+        track_thresh = 0.7
+    elif 'MOT17-14' in video_name:
+        track_thresh = 0.67
+    else:
+        track_thresh = ori_thresh
+
+    # only override for these two sequences; an unconditional else here
+    # would discard the MOT17 thresholds chosen above
+    if 'MOT20-06' in video_name or 'MOT20-08' in video_name:
+        track_thresh = 0.3
+
+    return track_buffer, track_thresh
diff --git a/PaddleDetection-release-2.6/ppdet/engine/trainer.py b/PaddleDetection-release-2.6/ppdet/engine/trainer.py
new file mode 100644
index 0000000000000000000000000000000000000000..0378e00ecb548747ca0996bc206b22d498f8b301
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/engine/trainer.py
@@ -0,0 +1,1263 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
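The JDE/SDE eval loops in `tracker.py` above repeat the same per-target bookkeeping before a frame's results are stored. A compact sketch of that filtering step (the `filter_targets` helper is illustrative; the thresholds come from the tracker attributes used above):

```python
def filter_targets(targets, min_box_area, vertical_ratio):
    """Keep the targets the eval loops above would keep.

    `targets` is assumed to expose .tlwh / .track_id / .score, as the
    JDETracker outputs in _eval_seq_jde do.
    """
    tlwhs, ids, scores = [], [], []
    for t in targets:
        tlwh = t.tlwh  # [x, y, w, h]
        if tlwh[2] * tlwh[3] <= min_box_area:
            continue  # drop tiny boxes
        if vertical_ratio > 0 and tlwh[2] / tlwh[3] > vertical_ratio:
            continue  # drop boxes too wide to be a pedestrian
        tlwhs.append(tlwh)
        ids.append(t.track_id)
        scores.append(t.score)
    # one results entry per frame: (frame_id + 1, tlwhs, scores, ids)
    return tlwhs, scores, ids
```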
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +import copy +import time +from tqdm import tqdm + +import numpy as np +import typing +from PIL import Image, ImageOps, ImageFile + +ImageFile.LOAD_TRUNCATED_IMAGES = True + +import paddle +import paddle.nn as nn +import paddle.distributed as dist +from paddle.distributed import fleet +from paddle.static import InputSpec +from ppdet.optimizer import ModelEMA + +from ppdet.core.workspace import create +from ppdet.utils.checkpoint import load_weight, load_pretrain_weight +from ppdet.utils.visualizer import visualize_results, save_result +from ppdet.metrics import Metric, COCOMetric, VOCMetric, WiderFaceMetric, get_infer_results, KeyPointTopDownCOCOEval, KeyPointTopDownMPIIEval, Pose3DEval +from ppdet.metrics import RBoxMetric, JDEDetMetric, SNIPERCOCOMetric +from ppdet.data.source.sniper_coco import SniperCOCODataSet +from ppdet.data.source.category import get_categories +import ppdet.utils.stats as stats +from ppdet.utils.fuse_utils import fuse_conv_bn +from ppdet.utils import profiler +from ppdet.modeling.post_process import multiclass_nms + +from .callbacks import Callback, ComposeCallback, LogPrinter, Checkpointer, WiferFaceEval, VisualDLWriter, SniperProposalsGenerator, WandbCallback +from .export_utils import _dump_infer_config, _prune_input_spec, apply_to_static + +from paddle.distributed.fleet.utils.hybrid_parallel_util import fused_allreduce_gradients + +from ppdet.utils.logger import setup_logger +logger = setup_logger('ppdet.engine') + +__all__ = ['Trainer'] + +MOT_ARCH = ['JDE', 'FairMOT', 'DeepSORT', 'ByteTrack', 'CenterTrack'] + + +class Trainer(object): + def __init__(self, cfg, mode='train'): + self.cfg = cfg.copy() + assert mode.lower() in ['train', 'eval', 'test'], \ + "mode should be 'train', 'eval' or 'test'" + self.mode = mode.lower() + self.optimizer = None + self.is_loaded_weights = False + self.use_amp = self.cfg.get('amp', False) + self.amp_level = self.cfg.get('amp_level', 'O1') + self.custom_white_list = self.cfg.get('custom_white_list', None) + self.custom_black_list = self.cfg.get('custom_black_list', None) + if 'slim' in cfg and cfg['slim_type'] == 'PTQ': + self.cfg['TestDataset'] = create('TestDataset')() + + # build data loader + capital_mode = self.mode.capitalize() + if cfg.architecture in MOT_ARCH and self.mode in [ + 'eval', 'test' + ] and cfg.metric not in ['COCO', 'VOC']: + self.dataset = self.cfg['{}MOTDataset'.format( + capital_mode)] = create('{}MOTDataset'.format(capital_mode))() + else: + self.dataset = self.cfg['{}Dataset'.format(capital_mode)] = create( + '{}Dataset'.format(capital_mode))() + + if cfg.architecture == 'DeepSORT' and self.mode == 'train': + logger.error('DeepSORT has no need of training on mot dataset.') + sys.exit(1) + + if cfg.architecture == 'FairMOT' and self.mode == 'eval': + images = self.parse_mot_images(cfg) + self.dataset.set_images(images) + + if self.mode == 'train': + self.loader = create('{}Reader'.format(capital_mode))( + self.dataset, cfg.worker_num) + + if cfg.architecture == 'JDE' and self.mode == 'train': + self.cfg['JDEEmbeddingHead'][ + 'num_identities'] = self.dataset.num_identities_dict[0] + # JDE only support single class MOT now. + + if cfg.architecture == 'FairMOT' and self.mode == 'train': + self.cfg['FairMOTEmbeddingHead'][ + 'num_identities_dict'] = self.dataset.num_identities_dict + # FairMOT support single class and multi-class MOT now. 
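+        # (Readers: MOT architectures evaluated/tested with a MOT metric use
+        # the {Eval,Test}MOTDataset built above in place of the standard
+        # dataset; with COCO/VOC metrics the same models run as plain
+        # detectors through the regular readers.)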
+ + # build model + if 'model' not in self.cfg: + self.model = create(cfg.architecture) + else: + self.model = self.cfg.model + self.is_loaded_weights = True + + if cfg.architecture == 'YOLOX': + for k, m in self.model.named_sublayers(): + if isinstance(m, nn.BatchNorm2D): + m._epsilon = 1e-3 # for amp(fp16) + m._momentum = 0.97 # 0.03 in pytorch + + #normalize params for deploy + if 'slim' in cfg and cfg['slim_type'] == 'OFA': + self.model.model.load_meanstd(cfg['TestReader'][ + 'sample_transforms']) + elif 'slim' in cfg and cfg['slim_type'] == 'Distill': + self.model.student_model.load_meanstd(cfg['TestReader'][ + 'sample_transforms']) + elif 'slim' in cfg and cfg[ + 'slim_type'] == 'DistillPrune' and self.mode == 'train': + self.model.student_model.load_meanstd(cfg['TestReader'][ + 'sample_transforms']) + else: + self.model.load_meanstd(cfg['TestReader']['sample_transforms']) + + # EvalDataset build with BatchSampler to evaluate in single device + # TODO: multi-device evaluate + if self.mode == 'eval': + if cfg.architecture == 'FairMOT': + self.loader = create('EvalMOTReader')(self.dataset, 0) + elif cfg.architecture == "METRO_Body": + reader_name = '{}Reader'.format(self.mode.capitalize()) + self.loader = create(reader_name)(self.dataset, cfg.worker_num) + else: + self._eval_batch_sampler = paddle.io.BatchSampler( + self.dataset, batch_size=self.cfg.EvalReader['batch_size']) + reader_name = '{}Reader'.format(self.mode.capitalize()) + # If metric is VOC, need to be set collate_batch=False. + if cfg.metric == 'VOC': + self.cfg[reader_name]['collate_batch'] = False + self.loader = create(reader_name)(self.dataset, cfg.worker_num, + self._eval_batch_sampler) + # TestDataset build after user set images, skip loader creation here + + # get Params + print_params = self.cfg.get('print_params', False) + if print_params: + params = sum([ + p.numel() for n, p in self.model.named_parameters() + if all([x not in n for x in ['_mean', '_variance', 'aux_']]) + ]) # exclude BatchNorm running status + logger.info('Model Params : {} M.'.format((params / 1e6).numpy()[ + 0])) + + # build optimizer in train mode + if self.mode == 'train': + steps_per_epoch = len(self.loader) + if steps_per_epoch < 1: + logger.warning( + "Samples in dataset are less than batch_size, please set smaller batch_size in TrainReader." + ) + self.lr = create('LearningRate')(steps_per_epoch) + self.optimizer = create('OptimizerBuilder')(self.lr, self.model) + + # Unstructured pruner is only enabled in the train mode. 
+ if self.cfg.get('unstructured_prune'): + self.pruner = create('UnstructuredPruner')(self.model, + steps_per_epoch) + if self.use_amp and self.amp_level == 'O2': + self.model, self.optimizer = paddle.amp.decorate( + models=self.model, + optimizers=self.optimizer, + level=self.amp_level) + self.use_ema = ('use_ema' in cfg and cfg['use_ema']) + if self.use_ema: + ema_decay = self.cfg.get('ema_decay', 0.9998) + ema_decay_type = self.cfg.get('ema_decay_type', 'threshold') + cycle_epoch = self.cfg.get('cycle_epoch', -1) + ema_black_list = self.cfg.get('ema_black_list', None) + ema_filter_no_grad = self.cfg.get('ema_filter_no_grad', False) + self.ema = ModelEMA( + self.model, + decay=ema_decay, + ema_decay_type=ema_decay_type, + cycle_epoch=cycle_epoch, + ema_black_list=ema_black_list, + ema_filter_no_grad=ema_filter_no_grad) + + self._nranks = dist.get_world_size() + self._local_rank = dist.get_rank() + + self.status = {} + + self.start_epoch = 0 + self.end_epoch = 0 if 'epoch' not in cfg else cfg.epoch + + # initial default callbacks + self._init_callbacks() + + # initial default metrics + self._init_metrics() + self._reset_metrics() + + def _init_callbacks(self): + if self.mode == 'train': + self._callbacks = [LogPrinter(self), Checkpointer(self)] + if self.cfg.get('use_vdl', False): + self._callbacks.append(VisualDLWriter(self)) + if self.cfg.get('save_proposals', False): + self._callbacks.append(SniperProposalsGenerator(self)) + if self.cfg.get('use_wandb', False) or 'wandb' in self.cfg: + self._callbacks.append(WandbCallback(self)) + self._compose_callback = ComposeCallback(self._callbacks) + elif self.mode == 'eval': + self._callbacks = [LogPrinter(self)] + if self.cfg.metric == 'WiderFace': + self._callbacks.append(WiferFaceEval(self)) + self._compose_callback = ComposeCallback(self._callbacks) + elif self.mode == 'test' and self.cfg.get('use_vdl', False): + self._callbacks = [VisualDLWriter(self)] + self._compose_callback = ComposeCallback(self._callbacks) + else: + self._callbacks = [] + self._compose_callback = None + + def _init_metrics(self, validate=False): + if self.mode == 'test' or (self.mode == 'train' and not validate): + self._metrics = [] + return + classwise = self.cfg['classwise'] if 'classwise' in self.cfg else False + if self.cfg.metric == 'COCO' or self.cfg.metric == "SNIPERCOCO": + # TODO: bias should be unified + bias = 1 if self.cfg.get('bias', False) else 0 + output_eval = self.cfg['output_eval'] \ + if 'output_eval' in self.cfg else None + save_prediction_only = self.cfg.get('save_prediction_only', False) + + # pass clsid2catid info to metric instance to avoid multiple loading + # annotation file + clsid2catid = {v: k for k, v in self.dataset.catid2clsid.items()} \ + if self.mode == 'eval' else None + + # when do validation in train, annotation file should be get from + # EvalReader instead of self.dataset(which is TrainReader) + if self.mode == 'train' and validate: + eval_dataset = self.cfg['EvalDataset'] + eval_dataset.check_or_download_dataset() + anno_file = eval_dataset.get_anno() + dataset = eval_dataset + else: + dataset = self.dataset + anno_file = dataset.get_anno() + + IouType = self.cfg['IouType'] if 'IouType' in self.cfg else 'bbox' + if self.cfg.metric == "COCO": + self._metrics = [ + COCOMetric( + anno_file=anno_file, + clsid2catid=clsid2catid, + classwise=classwise, + output_eval=output_eval, + bias=bias, + IouType=IouType, + save_prediction_only=save_prediction_only) + ] + elif self.cfg.metric == "SNIPERCOCO": # sniper + self._metrics = [ + 
SNIPERCOCOMetric( + anno_file=anno_file, + dataset=dataset, + clsid2catid=clsid2catid, + classwise=classwise, + output_eval=output_eval, + bias=bias, + IouType=IouType, + save_prediction_only=save_prediction_only) + ] + elif self.cfg.metric == 'RBOX': + # TODO: bias should be unified + bias = self.cfg['bias'] if 'bias' in self.cfg else 0 + output_eval = self.cfg['output_eval'] \ + if 'output_eval' in self.cfg else None + save_prediction_only = self.cfg.get('save_prediction_only', False) + imid2path = self.cfg.get('imid2path', None) + + # when do validation in train, annotation file should be get from + # EvalReader instead of self.dataset(which is TrainReader) + anno_file = self.dataset.get_anno() + if self.mode == 'train' and validate: + eval_dataset = self.cfg['EvalDataset'] + eval_dataset.check_or_download_dataset() + anno_file = eval_dataset.get_anno() + + self._metrics = [ + RBoxMetric( + anno_file=anno_file, + classwise=classwise, + output_eval=output_eval, + bias=bias, + save_prediction_only=save_prediction_only, + imid2path=imid2path) + ] + elif self.cfg.metric == 'VOC': + output_eval = self.cfg['output_eval'] \ + if 'output_eval' in self.cfg else None + save_prediction_only = self.cfg.get('save_prediction_only', False) + + self._metrics = [ + VOCMetric( + label_list=self.dataset.get_label_list(), + class_num=self.cfg.num_classes, + map_type=self.cfg.map_type, + classwise=classwise, + output_eval=output_eval, + save_prediction_only=save_prediction_only) + ] + elif self.cfg.metric == 'WiderFace': + multi_scale = self.cfg.multi_scale_eval if 'multi_scale_eval' in self.cfg else True + self._metrics = [ + WiderFaceMetric( + image_dir=os.path.join(self.dataset.dataset_dir, + self.dataset.image_dir), + anno_file=self.dataset.get_anno(), + multi_scale=multi_scale) + ] + elif self.cfg.metric == 'KeyPointTopDownCOCOEval': + eval_dataset = self.cfg['EvalDataset'] + eval_dataset.check_or_download_dataset() + anno_file = eval_dataset.get_anno() + save_prediction_only = self.cfg.get('save_prediction_only', False) + self._metrics = [ + KeyPointTopDownCOCOEval( + anno_file, + len(eval_dataset), + self.cfg.num_joints, + self.cfg.save_dir, + save_prediction_only=save_prediction_only) + ] + elif self.cfg.metric == 'KeyPointTopDownMPIIEval': + eval_dataset = self.cfg['EvalDataset'] + eval_dataset.check_or_download_dataset() + anno_file = eval_dataset.get_anno() + save_prediction_only = self.cfg.get('save_prediction_only', False) + self._metrics = [ + KeyPointTopDownMPIIEval( + anno_file, + len(eval_dataset), + self.cfg.num_joints, + self.cfg.save_dir, + save_prediction_only=save_prediction_only) + ] + elif self.cfg.metric == 'Pose3DEval': + save_prediction_only = self.cfg.get('save_prediction_only', False) + self._metrics = [ + Pose3DEval( + self.cfg.save_dir, + save_prediction_only=save_prediction_only) + ] + elif self.cfg.metric == 'MOTDet': + self._metrics = [JDEDetMetric(), ] + else: + logger.warning("Metric not support for metric type {}".format( + self.cfg.metric)) + self._metrics = [] + + def _reset_metrics(self): + for metric in self._metrics: + metric.reset() + + def register_callbacks(self, callbacks): + callbacks = [c for c in list(callbacks) if c is not None] + for c in callbacks: + assert isinstance(c, Callback), \ + "metrics shoule be instances of subclass of Metric" + self._callbacks.extend(callbacks) + self._compose_callback = ComposeCallback(self._callbacks) + + def register_metrics(self, metrics): + metrics = [m for m in list(metrics) if m is not None] + for m in metrics: + assert 
isinstance(m, Metric), \ + "metrics shoule be instances of subclass of Metric" + self._metrics.extend(metrics) + + def load_weights(self, weights): + if self.is_loaded_weights: + return + self.start_epoch = 0 + load_pretrain_weight(self.model, weights) + logger.debug("Load weights {} to start training".format(weights)) + + def load_weights_sde(self, det_weights, reid_weights): + if self.model.detector: + load_weight(self.model.detector, det_weights) + if self.model.reid: + load_weight(self.model.reid, reid_weights) + else: + load_weight(self.model.reid, reid_weights) + + def resume_weights(self, weights): + # support Distill resume weights + if hasattr(self.model, 'student_model'): + self.start_epoch = load_weight(self.model.student_model, weights, + self.optimizer) + else: + self.start_epoch = load_weight(self.model, weights, self.optimizer, + self.ema if self.use_ema else None) + logger.debug("Resume weights of epoch {}".format(self.start_epoch)) + + def train(self, validate=False): + assert self.mode == 'train', "Model not in 'train' mode" + Init_mark = False + if validate: + self.cfg['EvalDataset'] = self.cfg.EvalDataset = create( + "EvalDataset")() + + model = self.model + if self.cfg.get('to_static', False): + model = apply_to_static(self.cfg, model) + sync_bn = ( + getattr(self.cfg, 'norm_type', None) == 'sync_bn' and + (self.cfg.use_gpu or self.cfg.use_npu or self.cfg.use_mlu) and + self._nranks > 1) + if sync_bn: + model = paddle.nn.SyncBatchNorm.convert_sync_batchnorm(model) + + # enabel auto mixed precision mode + if self.use_amp: + scaler = paddle.amp.GradScaler( + enable=self.cfg.use_gpu or self.cfg.use_npu or self.cfg.use_mlu, + init_loss_scaling=self.cfg.get('init_loss_scaling', 1024)) + # get distributed model + if self.cfg.get('fleet', False): + model = fleet.distributed_model(model) + self.optimizer = fleet.distributed_optimizer(self.optimizer) + elif self._nranks > 1: + find_unused_parameters = self.cfg[ + 'find_unused_parameters'] if 'find_unused_parameters' in self.cfg else False + model = paddle.DataParallel( + model, find_unused_parameters=find_unused_parameters) + + self.status.update({ + 'epoch_id': self.start_epoch, + 'step_id': 0, + 'steps_per_epoch': len(self.loader) + }) + + self.status['batch_time'] = stats.SmoothedValue( + self.cfg.log_iter, fmt='{avg:.4f}') + self.status['data_time'] = stats.SmoothedValue( + self.cfg.log_iter, fmt='{avg:.4f}') + self.status['training_staus'] = stats.TrainingStats(self.cfg.log_iter) + + if self.cfg.get('print_flops', False): + flops_loader = create('{}Reader'.format(self.mode.capitalize()))( + self.dataset, self.cfg.worker_num) + self._flops(flops_loader) + profiler_options = self.cfg.get('profiler_options', None) + + self._compose_callback.on_train_begin(self.status) + + use_fused_allreduce_gradients = self.cfg[ + 'use_fused_allreduce_gradients'] if 'use_fused_allreduce_gradients' in self.cfg else False + + for epoch_id in range(self.start_epoch, self.cfg.epoch): + self.status['mode'] = 'train' + self.status['epoch_id'] = epoch_id + self._compose_callback.on_epoch_begin(self.status) + self.loader.dataset.set_epoch(epoch_id) + model.train() + iter_tic = time.time() + for step_id, data in enumerate(self.loader): + self.status['data_time'].update(time.time() - iter_tic) + self.status['step_id'] = step_id + profiler.add_profiler_step(profiler_options) + self._compose_callback.on_step_begin(self.status) + data['epoch_id'] = epoch_id + + if self.use_amp: + if isinstance( + model, paddle. 
+ DataParallel) and use_fused_allreduce_gradients: + with model.no_sync(): + with paddle.amp.auto_cast( + enable=self.cfg.use_gpu or + self.cfg.use_npu or self.cfg.use_mlu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + scaled_loss = scaler.scale(loss) + scaled_loss.backward() + fused_allreduce_gradients( + list(model.parameters()), None) + else: + with paddle.amp.auto_cast( + enable=self.cfg.use_gpu or self.cfg.use_npu or + self.cfg.use_mlu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + scaled_loss = scaler.scale(loss) + scaled_loss.backward() + # in dygraph mode, optimizer.minimize is equal to optimizer.step + scaler.minimize(self.optimizer, scaled_loss) + else: + if isinstance( + model, paddle. + DataParallel) and use_fused_allreduce_gradients: + with model.no_sync(): + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + loss.backward() + fused_allreduce_gradients( + list(model.parameters()), None) + else: + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + loss.backward() + self.optimizer.step() + curr_lr = self.optimizer.get_lr() + self.lr.step() + if self.cfg.get('unstructured_prune'): + self.pruner.step() + self.optimizer.clear_grad() + self.status['learning_rate'] = curr_lr + + if self._nranks < 2 or self._local_rank == 0: + self.status['training_staus'].update(outputs) + + self.status['batch_time'].update(time.time() - iter_tic) + self._compose_callback.on_step_end(self.status) + if self.use_ema: + self.ema.update() + iter_tic = time.time() + + if self.cfg.get('unstructured_prune'): + self.pruner.update_params() + + is_snapshot = (self._nranks < 2 or (self._local_rank == 0 or self.cfg.metric == "Pose3DEval")) \ + and ((epoch_id + 1) % self.cfg.snapshot_epoch == 0 or epoch_id == self.end_epoch - 1) + if is_snapshot and self.use_ema: + # apply ema weight on model + weight = copy.deepcopy(self.model.state_dict()) + self.model.set_dict(self.ema.apply()) + self.status['weight'] = weight + + self._compose_callback.on_epoch_end(self.status) + + if validate and is_snapshot: + if not hasattr(self, '_eval_loader'): + # build evaluation dataset and loader + self._eval_dataset = self.cfg.EvalDataset + self._eval_batch_sampler = \ + paddle.io.BatchSampler( + self._eval_dataset, + batch_size=self.cfg.EvalReader['batch_size']) + # If metric is VOC, need to be set collate_batch=False. 
+ if self.cfg.metric == 'VOC': + self.cfg['EvalReader']['collate_batch'] = False + if self.cfg.metric == "Pose3DEval": + self._eval_loader = create('EvalReader')( + self._eval_dataset, self.cfg.worker_num) + else: + self._eval_loader = create('EvalReader')( + self._eval_dataset, + self.cfg.worker_num, + batch_sampler=self._eval_batch_sampler) + # if validation in training is enabled, metrics should be re-init + # Init_mark makes sure this code will only execute once + if validate and Init_mark == False: + Init_mark = True + self._init_metrics(validate=validate) + self._reset_metrics() + + with paddle.no_grad(): + self.status['save_best_model'] = True + self._eval_with_loader(self._eval_loader) + + if is_snapshot and self.use_ema: + # reset original weight + self.model.set_dict(weight) + self.status.pop('weight') + + self._compose_callback.on_train_end(self.status) + + def _eval_with_loader(self, loader): + sample_num = 0 + tic = time.time() + self._compose_callback.on_epoch_begin(self.status) + self.status['mode'] = 'eval' + + self.model.eval() + if self.cfg.get('print_flops', False): + flops_loader = create('{}Reader'.format(self.mode.capitalize()))( + self.dataset, self.cfg.worker_num, self._eval_batch_sampler) + self._flops(flops_loader) + for step_id, data in enumerate(loader): + self.status['step_id'] = step_id + self._compose_callback.on_step_begin(self.status) + # forward + if self.use_amp: + with paddle.amp.auto_cast( + enable=self.cfg.use_gpu or self.cfg.use_npu or + self.cfg.use_mlu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + outs = self.model(data) + else: + outs = self.model(data) + + # update metrics + for metric in self._metrics: + metric.update(data, outs) + + # multi-scale inputs: all inputs have same im_id + if isinstance(data, typing.Sequence): + sample_num += data[0]['im_id'].numpy().shape[0] + else: + sample_num += data['im_id'].numpy().shape[0] + self._compose_callback.on_step_end(self.status) + + self.status['sample_num'] = sample_num + self.status['cost_time'] = time.time() - tic + + # accumulate metric to log out + for metric in self._metrics: + metric.accumulate() + metric.log() + self._compose_callback.on_epoch_end(self.status) + # reset metric states for metric may performed multiple times + self._reset_metrics() + + def evaluate(self): + # get distributed model + if self.cfg.get('fleet', False): + self.model = fleet.distributed_model(self.model) + self.optimizer = fleet.distributed_optimizer(self.optimizer) + elif self._nranks > 1: + find_unused_parameters = self.cfg[ + 'find_unused_parameters'] if 'find_unused_parameters' in self.cfg else False + self.model = paddle.DataParallel( + self.model, find_unused_parameters=find_unused_parameters) + with paddle.no_grad(): + self._eval_with_loader(self.loader) + + def _eval_with_loader_slice(self, + loader, + slice_size=[640, 640], + overlap_ratio=[0.25, 0.25], + combine_method='nms', + match_threshold=0.6, + match_metric='iou'): + sample_num = 0 + tic = time.time() + self._compose_callback.on_epoch_begin(self.status) + self.status['mode'] = 'eval' + self.model.eval() + if self.cfg.get('print_flops', False): + flops_loader = create('{}Reader'.format(self.mode.capitalize()))( + self.dataset, self.cfg.worker_num, self._eval_batch_sampler) + self._flops(flops_loader) + + merged_bboxs = [] + for step_id, data in enumerate(loader): + self.status['step_id'] = step_id + self._compose_callback.on_step_begin(self.status) + # forward + if self.use_amp: + with 
paddle.amp.auto_cast( + enable=self.cfg.use_gpu or self.cfg.use_npu or + self.cfg.use_mlu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + outs = self.model(data) + else: + outs = self.model(data) + + shift_amount = data['st_pix'] + outs['bbox'][:, 2:4] = outs['bbox'][:, 2:4] + shift_amount + outs['bbox'][:, 4:6] = outs['bbox'][:, 4:6] + shift_amount + merged_bboxs.append(outs['bbox']) + + if data['is_last'] > 0: + # merge matching predictions + merged_results = {'bbox': []} + if combine_method == 'nms': + final_boxes = multiclass_nms( + np.concatenate(merged_bboxs), self.cfg.num_classes, + match_threshold, match_metric) + merged_results['bbox'] = np.concatenate(final_boxes) + elif combine_method == 'concat': + merged_results['bbox'] = np.concatenate(merged_bboxs) + else: + raise ValueError( + "Now only support 'nms' or 'concat' to fuse detection results." + ) + merged_results['im_id'] = np.array([[0]]) + merged_results['bbox_num'] = np.array( + [len(merged_results['bbox'])]) + + merged_bboxs = [] + data['im_id'] = data['ori_im_id'] + # update metrics + for metric in self._metrics: + metric.update(data, merged_results) + + # multi-scale inputs: all inputs have same im_id + if isinstance(data, typing.Sequence): + sample_num += data[0]['im_id'].numpy().shape[0] + else: + sample_num += data['im_id'].numpy().shape[0] + + self._compose_callback.on_step_end(self.status) + + self.status['sample_num'] = sample_num + self.status['cost_time'] = time.time() - tic + + # accumulate metric to log out + for metric in self._metrics: + metric.accumulate() + metric.log() + self._compose_callback.on_epoch_end(self.status) + # reset metric states for metric may performed multiple times + self._reset_metrics() + + def evaluate_slice(self, + slice_size=[640, 640], + overlap_ratio=[0.25, 0.25], + combine_method='nms', + match_threshold=0.6, + match_metric='iou'): + with paddle.no_grad(): + self._eval_with_loader_slice(self.loader, slice_size, overlap_ratio, + combine_method, match_threshold, + match_metric) + + def slice_predict(self, + images, + slice_size=[640, 640], + overlap_ratio=[0.25, 0.25], + combine_method='nms', + match_threshold=0.6, + match_metric='iou', + draw_threshold=0.5, + output_dir='output', + save_results=False, + visualize=True): + if not os.path.exists(output_dir): + os.makedirs(output_dir) + + self.dataset.set_slice_images(images, slice_size, overlap_ratio) + loader = create('TestReader')(self.dataset, 0) + imid2path = self.dataset.get_imid2path() + + def setup_metrics_for_loader(): + # mem + metrics = copy.deepcopy(self._metrics) + mode = self.mode + save_prediction_only = self.cfg[ + 'save_prediction_only'] if 'save_prediction_only' in self.cfg else None + output_eval = self.cfg[ + 'output_eval'] if 'output_eval' in self.cfg else None + + # modify + self.mode = '_test' + self.cfg['save_prediction_only'] = True + self.cfg['output_eval'] = output_dir + self.cfg['imid2path'] = imid2path + self._init_metrics() + + # restore + self.mode = mode + self.cfg.pop('save_prediction_only') + if save_prediction_only is not None: + self.cfg['save_prediction_only'] = save_prediction_only + + self.cfg.pop('output_eval') + if output_eval is not None: + self.cfg['output_eval'] = output_eval + + self.cfg.pop('imid2path') + + _metrics = copy.deepcopy(self._metrics) + self._metrics = metrics + + return _metrics + + if save_results: + metrics = setup_metrics_for_loader() + else: + metrics = [] + + anno_file = self.dataset.get_anno() + 
clsid2catid, catid2name = get_categories( + self.cfg.metric, anno_file=anno_file) + + # Run Infer + self.status['mode'] = 'test' + self.model.eval() + if self.cfg.get('print_flops', False): + flops_loader = create('TestReader')(self.dataset, 0) + self._flops(flops_loader) + + results = [] # all images + merged_bboxs = [] # single image + for step_id, data in enumerate(tqdm(loader)): + self.status['step_id'] = step_id + # forward + outs = self.model(data) + + outs['bbox'] = outs['bbox'].numpy() # only in test mode + shift_amount = data['st_pix'] + outs['bbox'][:, 2:4] = outs['bbox'][:, 2:4] + shift_amount.numpy() + outs['bbox'][:, 4:6] = outs['bbox'][:, 4:6] + shift_amount.numpy() + merged_bboxs.append(outs['bbox']) + + if data['is_last'] > 0: + # merge matching predictions + merged_results = {'bbox': []} + if combine_method == 'nms': + final_boxes = multiclass_nms( + np.concatenate(merged_bboxs), self.cfg.num_classes, + match_threshold, match_metric) + merged_results['bbox'] = np.concatenate(final_boxes) + elif combine_method == 'concat': + merged_results['bbox'] = np.concatenate(merged_bboxs) + else: + raise ValueError( + "Now only support 'nms' or 'concat' to fuse detection results." + ) + merged_results['im_id'] = np.array([[0]]) + merged_results['bbox_num'] = np.array( + [len(merged_results['bbox'])]) + + merged_bboxs = [] + data['im_id'] = data['ori_im_id'] + + for _m in metrics: + _m.update(data, merged_results) + + for key in ['im_shape', 'scale_factor', 'im_id']: + if isinstance(data, typing.Sequence): + merged_results[key] = data[0][key] + else: + merged_results[key] = data[key] + for key, value in merged_results.items(): + if hasattr(value, 'numpy'): + merged_results[key] = value.numpy() + results.append(merged_results) + + for _m in metrics: + _m.accumulate() + _m.reset() + + if visualize: + for outs in results: + batch_res = get_infer_results(outs, clsid2catid) + bbox_num = outs['bbox_num'] + + start = 0 + for i, im_id in enumerate(outs['im_id']): + image_path = imid2path[int(im_id)] + image = Image.open(image_path).convert('RGB') + image = ImageOps.exif_transpose(image) + self.status['original_image'] = np.array(image.copy()) + + end = start + bbox_num[i] + bbox_res = batch_res['bbox'][start:end] \ + if 'bbox' in batch_res else None + mask_res = batch_res['mask'][start:end] \ + if 'mask' in batch_res else None + segm_res = batch_res['segm'][start:end] \ + if 'segm' in batch_res else None + keypoint_res = batch_res['keypoint'][start:end] \ + if 'keypoint' in batch_res else None + pose3d_res = batch_res['pose3d'][start:end] \ + if 'pose3d' in batch_res else None + image = visualize_results( + image, bbox_res, mask_res, segm_res, keypoint_res, + pose3d_res, int(im_id), catid2name, draw_threshold) + self.status['result_image'] = np.array(image.copy()) + if self._compose_callback: + self._compose_callback.on_step_end(self.status) + # save image with detection + save_name = self._get_save_image_name(output_dir, + image_path) + logger.info("Detection bbox results save in {}".format( + save_name)) + image.save(save_name, quality=95) + + start = end + + def predict(self, + images, + draw_threshold=0.5, + output_dir='output', + save_results=False, + visualize=True): + if not os.path.exists(output_dir): + os.makedirs(output_dir) + + self.dataset.set_images(images) + loader = create('TestReader')(self.dataset, 0) + + imid2path = self.dataset.get_imid2path() + + def setup_metrics_for_loader(): + # mem + metrics = copy.deepcopy(self._metrics) + mode = self.mode + save_prediction_only = 
self.cfg[ + 'save_prediction_only'] if 'save_prediction_only' in self.cfg else None + output_eval = self.cfg[ + 'output_eval'] if 'output_eval' in self.cfg else None + + # modify + self.mode = '_test' + self.cfg['save_prediction_only'] = True + self.cfg['output_eval'] = output_dir + self.cfg['imid2path'] = imid2path + self._init_metrics() + + # restore + self.mode = mode + self.cfg.pop('save_prediction_only') + if save_prediction_only is not None: + self.cfg['save_prediction_only'] = save_prediction_only + + self.cfg.pop('output_eval') + if output_eval is not None: + self.cfg['output_eval'] = output_eval + + self.cfg.pop('imid2path') + + _metrics = copy.deepcopy(self._metrics) + self._metrics = metrics + + return _metrics + + if save_results: + metrics = setup_metrics_for_loader() + else: + metrics = [] + + anno_file = self.dataset.get_anno() + clsid2catid, catid2name = get_categories( + self.cfg.metric, anno_file=anno_file) + + # Run Infer + self.status['mode'] = 'test' + self.model.eval() + if self.cfg.get('print_flops', False): + flops_loader = create('TestReader')(self.dataset, 0) + self._flops(flops_loader) + results = [] + for step_id, data in enumerate(tqdm(loader)): + self.status['step_id'] = step_id + # forward + outs = self.model(data) + + for _m in metrics: + _m.update(data, outs) + + for key in ['im_shape', 'scale_factor', 'im_id']: + if isinstance(data, typing.Sequence): + outs[key] = data[0][key] + else: + outs[key] = data[key] + for key, value in outs.items(): + if hasattr(value, 'numpy'): + outs[key] = value.numpy() + results.append(outs) + + # sniper + if type(self.dataset) == SniperCOCODataSet: + results = self.dataset.anno_cropper.aggregate_chips_detections( + results) + + for _m in metrics: + _m.accumulate() + _m.reset() + + if visualize: + for outs in results: + batch_res = get_infer_results(outs, clsid2catid) + bbox_num = outs['bbox_num'] + + start = 0 + for i, im_id in enumerate(outs['im_id']): + image_path = imid2path[int(im_id)] + image = Image.open(image_path).convert('RGB') + image = ImageOps.exif_transpose(image) + self.status['original_image'] = np.array(image.copy()) + + end = start + bbox_num[i] + bbox_res = batch_res['bbox'][start:end] \ + if 'bbox' in batch_res else None + mask_res = batch_res['mask'][start:end] \ + if 'mask' in batch_res else None + segm_res = batch_res['segm'][start:end] \ + if 'segm' in batch_res else None + keypoint_res = batch_res['keypoint'][start:end] \ + if 'keypoint' in batch_res else None + pose3d_res = batch_res['pose3d'][start:end] \ + if 'pose3d' in batch_res else None + image = visualize_results( + image, bbox_res, mask_res, segm_res, keypoint_res, + pose3d_res, int(im_id), catid2name, draw_threshold) + self.status['result_image'] = np.array(image.copy()) + if self._compose_callback: + self._compose_callback.on_step_end(self.status) + # save image with detection + save_name = self._get_save_image_name(output_dir, + image_path) + logger.info("Detection bbox results save in {}".format( + save_name)) + image.save(save_name, quality=95) + + start = end + return results + + def _get_save_image_name(self, output_dir, image_path): + """ + Get save image name from source image path. 
+ """ + image_name = os.path.split(image_path)[-1] + name, ext = os.path.splitext(image_name) + return os.path.join(output_dir, "{}".format(name)) + ext + + def _get_infer_cfg_and_input_spec(self, + save_dir, + prune_input=True, + kl_quant=False): + image_shape = None + im_shape = [None, 2] + scale_factor = [None, 2] + if self.cfg.architecture in MOT_ARCH: + test_reader_name = 'TestMOTReader' + else: + test_reader_name = 'TestReader' + if 'inputs_def' in self.cfg[test_reader_name]: + inputs_def = self.cfg[test_reader_name]['inputs_def'] + image_shape = inputs_def.get('image_shape', None) + # set image_shape=[None, 3, -1, -1] as default + if image_shape is None: + image_shape = [None, 3, -1, -1] + + if len(image_shape) == 3: + image_shape = [None] + image_shape + else: + im_shape = [image_shape[0], 2] + scale_factor = [image_shape[0], 2] + + if hasattr(self.model, 'deploy'): + self.model.deploy = True + + if 'slim' not in self.cfg: + for layer in self.model.sublayers(): + if hasattr(layer, 'convert_to_deploy'): + layer.convert_to_deploy() + + if hasattr(self.cfg, 'export') and 'fuse_conv_bn' in self.cfg[ + 'export'] and self.cfg['export']['fuse_conv_bn']: + self.model = fuse_conv_bn(self.model) + + export_post_process = self.cfg['export'].get( + 'post_process', False) if hasattr(self.cfg, 'export') else True + export_nms = self.cfg['export'].get('nms', False) if hasattr( + self.cfg, 'export') else True + export_benchmark = self.cfg['export'].get( + 'benchmark', False) if hasattr(self.cfg, 'export') else False + if hasattr(self.model, 'fuse_norm'): + self.model.fuse_norm = self.cfg['TestReader'].get('fuse_normalize', + False) + if hasattr(self.model, 'export_post_process'): + self.model.export_post_process = export_post_process if not export_benchmark else False + if hasattr(self.model, 'export_nms'): + self.model.export_nms = export_nms if not export_benchmark else False + if export_post_process and not export_benchmark: + image_shape = [None] + image_shape[1:] + + # Save infer cfg + _dump_infer_config(self.cfg, + os.path.join(save_dir, 'infer_cfg.yml'), image_shape, + self.model) + + input_spec = [{ + "image": InputSpec( + shape=image_shape, name='image'), + "im_shape": InputSpec( + shape=im_shape, name='im_shape'), + "scale_factor": InputSpec( + shape=scale_factor, name='scale_factor') + }] + if self.cfg.architecture == 'DeepSORT': + input_spec[0].update({ + "crops": InputSpec( + shape=[None, 3, 192, 64], name='crops') + }) + if prune_input: + static_model = paddle.jit.to_static( + self.model, input_spec=input_spec) + # NOTE: dy2st do not pruned program, but jit.save will prune program + # input spec, prune input spec here and save with pruned input spec + pruned_input_spec = _prune_input_spec( + input_spec, static_model.forward.main_program, + static_model.forward.outputs) + else: + static_model = None + pruned_input_spec = input_spec + + # TODO: Hard code, delete it when support prune input_spec. 
+ if self.cfg.architecture == 'PicoDet' and not export_post_process: + pruned_input_spec = [{ + "image": InputSpec( + shape=image_shape, name='image') + }] + if kl_quant: + if self.cfg.architecture == 'PicoDet' or 'ppyoloe' in self.cfg.weights: + pruned_input_spec = [{ + "image": InputSpec( + shape=image_shape, name='image'), + "scale_factor": InputSpec( + shape=scale_factor, name='scale_factor') + }] + elif 'tinypose' in self.cfg.weights: + pruned_input_spec = [{ + "image": InputSpec( + shape=image_shape, name='image') + }] + + return static_model, pruned_input_spec + + def export(self, output_dir='output_inference'): + if hasattr(self.model, 'aux_neck'): + self.model.__delattr__('aux_neck') + if hasattr(self.model, 'aux_head'): + self.model.__delattr__('aux_head') + self.model.eval() + + model_name = os.path.splitext(os.path.split(self.cfg.filename)[-1])[0] + save_dir = os.path.join(output_dir, model_name) + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + static_model, pruned_input_spec = self._get_infer_cfg_and_input_spec( + save_dir) + + # dy2st and save model + if 'slim' not in self.cfg or 'QAT' not in self.cfg['slim_type']: + paddle.jit.save( + static_model, + os.path.join(save_dir, 'model'), + input_spec=pruned_input_spec) + else: + self.cfg.slim.save_quantized_model( + self.model, + os.path.join(save_dir, 'model'), + input_spec=pruned_input_spec) + logger.info("Export model and saved in {}".format(save_dir)) + + def post_quant(self, output_dir='output_inference'): + model_name = os.path.splitext(os.path.split(self.cfg.filename)[-1])[0] + save_dir = os.path.join(output_dir, model_name) + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + for idx, data in enumerate(self.loader): + self.model(data) + if idx == int(self.cfg.get('quant_batch_num', 10)): + break + + # TODO: support prune input_spec + kl_quant = True if hasattr(self.cfg.slim, 'ptq') else False + _, pruned_input_spec = self._get_infer_cfg_and_input_spec( + save_dir, prune_input=False, kl_quant=kl_quant) + + self.cfg.slim.save_quantized_model( + self.model, + os.path.join(save_dir, 'model'), + input_spec=pruned_input_spec) + logger.info("Export Post-Quant model and saved in {}".format(save_dir)) + + def _flops(self, loader): + if hasattr(self.model, 'aux_neck'): + self.model.__delattr__('aux_neck') + if hasattr(self.model, 'aux_head'): + self.model.__delattr__('aux_head') + self.model.eval() + try: + import paddleslim + except Exception as e: + logger.warning( + 'Unable to calculate flops, please install paddleslim, for example: `pip install paddleslim`' + ) + return + + from paddleslim.analysis import dygraph_flops as flops + input_data = None + for data in loader: + input_data = data + break + + input_spec = [{ + "image": input_data['image'][0].unsqueeze(0), + "im_shape": input_data['im_shape'][0].unsqueeze(0), + "scale_factor": input_data['scale_factor'][0].unsqueeze(0) + }] + flops = flops(self.model, input_spec) / (1000**3) + logger.info(" Model FLOPs : {:.6f}G. 
(image shape is {})".format( + flops, input_data['image'][0].unsqueeze(0).shape)) + + def parse_mot_images(self, cfg): + import glob + # for quant + dataset_dir = cfg['EvalMOTDataset'].dataset_dir + data_root = cfg['EvalMOTDataset'].data_root + data_root = '{}/{}'.format(dataset_dir, data_root) + seqs = os.listdir(data_root) + seqs.sort() + all_images = [] + for seq in seqs: + infer_dir = os.path.join(data_root, seq) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + images = set() + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + images.sort() + assert len(images) > 0, "no image found in {}".format(infer_dir) + all_images.extend(images) + logger.info("Found {} inference images in total.".format( + len(images))) + return all_images diff --git a/PaddleDetection-release-2.6/ppdet/engine/trainer_cot.py b/PaddleDetection-release-2.6/ppdet/engine/trainer_cot.py new file mode 100644 index 0000000000000000000000000000000000000000..38d95fabfd0d19312af3cc40309cc8051ff538c3 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/engine/trainer_cot.py @@ -0,0 +1,42 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from ppdet.core.workspace import create +from ppdet.utils.logger import setup_logger +logger = setup_logger('ppdet.engine') + +from . import Trainer +__all__ = ['TrainerCot'] + +class TrainerCot(Trainer): + """ + Trainer for label-cotuning + calculate the relationship between base_classes and novel_classes + """ + def __init__(self, cfg, mode='train'): + super(TrainerCot, self).__init__(cfg, mode) + self.cotuning_init() + + def cotuning_init(self): + num_classes_novel = self.cfg['num_classes'] + + self.load_weights(self.cfg.pretrain_weights) + + self.model.eval() + relationship = self.model.relationship_learning(self.loader, num_classes_novel) + + self.model.init_cot_head(relationship) + self.optimizer = create('OptimizerBuilder')(self.lr, self.model) + + diff --git a/PaddleDetection-release-2.6/ppdet/engine/trainer_ssod.py b/PaddleDetection-release-2.6/ppdet/engine/trainer_ssod.py new file mode 100644 index 0000000000000000000000000000000000000000..ef2409b09b4e599200abf3e670577926bd07792d --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/engine/trainer_ssod.py @@ -0,0 +1,475 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import copy +import time +import typing +import numpy as np + +import paddle +import paddle.nn as nn +import paddle.distributed as dist +from paddle.distributed import fleet +from ppdet.optimizer import ModelEMA, SimpleModelEMA + +from ppdet.core.workspace import create +from ppdet.utils.checkpoint import load_weight, load_pretrain_weight +import ppdet.utils.stats as stats +from ppdet.utils import profiler +from ppdet.modeling.ssod.utils import align_weak_strong_shape +from .trainer import Trainer + +from ppdet.utils.logger import setup_logger +logger = setup_logger('ppdet.engine') + +__all__ = ['Trainer_DenseTeacher'] + + +class Trainer_DenseTeacher(Trainer): + def __init__(self, cfg, mode='train'): + self.cfg = cfg + assert mode.lower() in ['train', 'eval', 'test'], \ + "mode should be 'train', 'eval' or 'test'" + self.mode = mode.lower() + self.optimizer = None + self.is_loaded_weights = False + self.use_amp = self.cfg.get('amp', False) + self.amp_level = self.cfg.get('amp_level', 'O1') + self.custom_white_list = self.cfg.get('custom_white_list', None) + self.custom_black_list = self.cfg.get('custom_black_list', None) + + # build data loader + capital_mode = self.mode.capitalize() + self.dataset = self.cfg['{}Dataset'.format(capital_mode)] = create( + '{}Dataset'.format(capital_mode))() + + if self.mode == 'train': + self.dataset_unlabel = self.cfg['UnsupTrainDataset'] = create( + 'UnsupTrainDataset') + self.loader = create('SemiTrainReader')( + self.dataset, self.dataset_unlabel, cfg.worker_num) + + # build model + if 'model' not in self.cfg: + self.model = create(cfg.architecture) + else: + self.model = self.cfg.model + self.is_loaded_weights = True + + # EvalDataset build with BatchSampler to evaluate in single device + # TODO: multi-device evaluate + if self.mode == 'eval': + self._eval_batch_sampler = paddle.io.BatchSampler( + self.dataset, batch_size=self.cfg.EvalReader['batch_size']) + # If metric is VOC, need to be set collate_batch=False. + if cfg.metric == 'VOC': + cfg['EvalReader']['collate_batch'] = False + self.loader = create('EvalReader')(self.dataset, cfg.worker_num, + self._eval_batch_sampler) + # TestDataset build after user set images, skip loader creation here + + # build optimizer in train mode + if self.mode == 'train': + steps_per_epoch = len(self.loader) + if steps_per_epoch < 1: + logger.warning( + "Samples in dataset are less than batch_size, please set smaller batch_size in TrainReader." + ) + self.lr = create('LearningRate')(steps_per_epoch) + self.optimizer = create('OptimizerBuilder')(self.lr, self.model) + + # Unstructured pruner is only enabled in the train mode. 
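In DenseTeacher the teacher network is never trained directly; it is an exponential moving average of the student, which is what `SimpleModelEMA` maintains. A sketch of the assumed update rule (the real class lives in `ppdet.optimizer`; this is an illustration, not its implementation):

```python
import paddle

def ema_update(teacher, student, decay=0.9996):
    # teacher_w <- decay * teacher_w + (1 - decay) * student_w
    # calling with decay=0 copies the student outright, as done when EMA starts
    with paddle.no_grad():
        student_sd = student.state_dict()
        for name, param in teacher.state_dict().items():
            if paddle.is_floating_point(param):
                param.set_value(param * decay + student_sd[name] * (1.0 - decay))
```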
+ if self.cfg.get('unstructured_prune'): + self.pruner = create('UnstructuredPruner')(self.model, + steps_per_epoch) + if self.use_amp and self.amp_level == 'O2': + self.model, self.optimizer = paddle.amp.decorate( + models=self.model, + optimizers=self.optimizer, + level=self.amp_level) + + self.use_ema = ('use_ema' in cfg and cfg['use_ema']) + if self.use_ema: + ema_decay = self.cfg.get('ema_decay', 0.9998) + ema_decay_type = self.cfg.get('ema_decay_type', 'threshold') + cycle_epoch = self.cfg.get('cycle_epoch', -1) + ema_black_list = self.cfg.get('ema_black_list', None) + self.ema = ModelEMA( + self.model, + decay=ema_decay, + ema_decay_type=ema_decay_type, + cycle_epoch=cycle_epoch, + ema_black_list=ema_black_list) + self.ema_start_iters = self.cfg.get('ema_start_iters', 0) + + # simple_ema for SSOD + self.use_simple_ema = ('use_simple_ema' in cfg and + cfg['use_simple_ema']) + if self.use_simple_ema: + self.use_ema = True + ema_decay = self.cfg.get('ema_decay', 0.9996) + self.ema = SimpleModelEMA(self.model, decay=ema_decay) + self.ema_start_iters = self.cfg.get('ema_start_iters', 0) + + self._nranks = dist.get_world_size() + self._local_rank = dist.get_rank() + + self.status = {} + + self.start_epoch = 0 + self.end_epoch = 0 if 'epoch' not in cfg else cfg.epoch + + # initial default callbacks + self._init_callbacks() + + # initial default metrics + self._init_metrics() + self._reset_metrics() + + def load_weights(self, weights): + if self.is_loaded_weights: + return + self.start_epoch = 0 + load_pretrain_weight(self.model, weights) + load_pretrain_weight(self.ema.model, weights) + logger.info("Load weights {} to start training for teacher and student". + format(weights)) + + def resume_weights(self, weights, exchange=True): + # support Distill resume weights + if hasattr(self.model, 'student_model'): + self.start_epoch = load_weight(self.model.student_model, weights, + self.optimizer, exchange) + else: + self.start_epoch = load_weight(self.model, weights, self.optimizer, + self.ema + if self.use_ema else None, exchange) + logger.debug("Resume weights of epoch {}".format(self.start_epoch)) + + def train(self, validate=False): + self.semi_start_iters = self.cfg.get('semi_start_iters', 5000) + Init_mark = False + if validate: + self.cfg['EvalDataset'] = self.cfg.EvalDataset = create( + "EvalDataset")() + + sync_bn = (getattr(self.cfg, 'norm_type', None) == 'sync_bn' and + self.cfg.use_gpu and self._nranks > 1) + if sync_bn: + self.model = paddle.nn.SyncBatchNorm.convert_sync_batchnorm( + self.model) + + if self.cfg.get('fleet', False): + self.model = fleet.distributed_model(self.model) + self.optimizer = fleet.distributed_optimizer(self.optimizer) + elif self._nranks > 1: + find_unused_parameters = self.cfg[ + 'find_unused_parameters'] if 'find_unused_parameters' in self.cfg else False + self.model = paddle.DataParallel( + self.model, find_unused_parameters=find_unused_parameters) + self.ema.model = paddle.DataParallel( + self.ema.model, find_unused_parameters=find_unused_parameters) + + self.status.update({ + 'epoch_id': self.start_epoch, + 'step_id': 0, + 'steps_per_epoch': len(self.loader), + 'exchange_save_model': True, + }) + # Note: exchange_save_model + # in DenseTeacher SSOD, the teacher model will be higher, so exchange when saving pdparams + + self.status['batch_time'] = stats.SmoothedValue( + self.cfg.log_iter, fmt='{avg:.4f}') + self.status['data_time'] = stats.SmoothedValue( + self.cfg.log_iter, fmt='{avg:.4f}') + self.status['training_staus'] = 
stats.TrainingStats(self.cfg.log_iter) + + if self.cfg.get('print_flops', False): + flops_loader = create('{}Reader'.format(self.mode.capitalize()))( + self.dataset, self.cfg.worker_num) + self._flops(flops_loader) + profiler_options = self.cfg.get('profiler_options', None) + self._compose_callback.on_train_begin(self.status) + + train_cfg = self.cfg.DenseTeacher['train_cfg'] + concat_sup_data = train_cfg.get('concat_sup_data', True) + + for param in self.ema.model.parameters(): + param.stop_gradient = True + + for epoch_id in range(self.start_epoch, self.cfg.epoch): + self.status['mode'] = 'train' + self.status['epoch_id'] = epoch_id + self._compose_callback.on_epoch_begin(self.status) + self.loader.dataset_label.set_epoch(epoch_id) + self.loader.dataset_unlabel.set_epoch(epoch_id) + iter_tic = time.time() + loss_dict = { + 'loss': paddle.to_tensor([0]), + 'loss_sup_sum': paddle.to_tensor([0]), + 'loss_unsup_sum': paddle.to_tensor([0]), + 'fg_sum': paddle.to_tensor([0]), + } + if self._nranks > 1: + for k in self.model._layers.get_loss_keys(): + loss_dict.update({k: paddle.to_tensor([0.])}) + for k in self.model._layers.get_loss_keys(): + loss_dict.update({'distill_' + k: paddle.to_tensor([0.])}) + else: + for k in self.model.get_loss_keys(): + loss_dict.update({k: paddle.to_tensor([0.])}) + for k in self.model.get_loss_keys(): + loss_dict.update({'distill_' + k: paddle.to_tensor([0.])}) + + # Note: for step_id, data in enumerate(self.loader): # enumerate bug + for step_id in range(len(self.loader)): + data = next(self.loader) + + self.model.train() + self.ema.model.eval() + data_sup_w, data_sup_s, data_unsup_w, data_unsup_s = data + + self.status['data_time'].update(time.time() - iter_tic) + self.status['step_id'] = step_id + profiler.add_profiler_step(profiler_options) + self._compose_callback.on_step_begin(self.status) + + if data_sup_w['image'].shape != data_sup_s['image'].shape: + data_sup_w, data_sup_s = align_weak_strong_shape(data_sup_w, + data_sup_s) + + data_sup_w['epoch_id'] = epoch_id + data_sup_s['epoch_id'] = epoch_id + if concat_sup_data: + for k, v in data_sup_s.items(): + if k in ['epoch_id']: + continue + data_sup_s[k] = paddle.concat([v, data_sup_w[k]]) + loss_dict_sup = self.model(data_sup_s) + else: + loss_dict_sup_w = self.model(data_sup_w) + loss_dict_sup = self.model(data_sup_s) + for k, v in loss_dict_sup_w.items(): + loss_dict_sup[k] = (loss_dict_sup[k] + v) * 0.5 + + losses_sup = loss_dict_sup['loss'] * train_cfg['sup_weight'] + losses_sup.backward() + + losses = losses_sup.detach() + loss_dict.update(loss_dict_sup) + loss_dict.update({'loss_sup_sum': loss_dict['loss']}) + + curr_iter = len(self.loader) * epoch_id + step_id + st_iter = self.semi_start_iters + if curr_iter == st_iter: + logger.info("***" * 30) + logger.info('Semi starting ...') + logger.info("***" * 30) + if curr_iter > st_iter: + unsup_weight = train_cfg['unsup_weight'] + if train_cfg['suppress'] == 'linear': + tar_iter = st_iter * 2 + if curr_iter <= tar_iter: + unsup_weight *= (curr_iter - st_iter) / st_iter + elif train_cfg['suppress'] == 'exp': + tar_iter = st_iter + 2000 + if curr_iter <= tar_iter: + scale = np.exp((curr_iter - tar_iter) / 1000) + unsup_weight *= scale + elif train_cfg['suppress'] == 'step': + tar_iter = st_iter * 2 + if curr_iter <= tar_iter: + unsup_weight *= 0.25 + else: + raise ValueError + + if data_unsup_w['image'].shape != data_unsup_s[ + 'image'].shape: + data_unsup_w, data_unsup_s = align_weak_strong_shape( + data_unsup_w, data_unsup_s) + + data_unsup_w['epoch_id'] 
= epoch_id + data_unsup_s['epoch_id'] = epoch_id + + data_unsup_s['get_data'] = True + student_preds = self.model(data_unsup_s) + + with paddle.no_grad(): + data_unsup_w['is_teacher'] = True + teacher_preds = self.ema.model(data_unsup_w) + + train_cfg['curr_iter'] = curr_iter + train_cfg['st_iter'] = st_iter + if self._nranks > 1: + loss_dict_unsup = self.model._layers.get_ssod_loss( + student_preds, teacher_preds, train_cfg) + else: + loss_dict_unsup = self.model.get_ssod_loss( + student_preds, teacher_preds, train_cfg) + + fg_num = loss_dict_unsup["fg_sum"] + del loss_dict_unsup["fg_sum"] + distill_weights = train_cfg['loss_weight'] + loss_dict_unsup = { + k: v * distill_weights[k] + for k, v in loss_dict_unsup.items() + } + + losses_unsup = sum([ + metrics_value + for metrics_value in loss_dict_unsup.values() + ]) * unsup_weight + losses_unsup.backward() + + loss_dict.update(loss_dict_unsup) + loss_dict.update({'loss_unsup_sum': losses_unsup}) + losses += losses_unsup.detach() + loss_dict.update({"fg_sum": fg_num}) + loss_dict['loss'] = losses + + self.optimizer.step() + curr_lr = self.optimizer.get_lr() + self.lr.step() + self.optimizer.clear_grad() + self.status['learning_rate'] = curr_lr + if self._nranks < 2 or self._local_rank == 0: + self.status['training_staus'].update(loss_dict) + + self.status['batch_time'].update(time.time() - iter_tic) + self._compose_callback.on_step_end(self.status) + # Note: ema_start_iters + if self.use_ema and curr_iter == self.ema_start_iters: + logger.info("***" * 30) + logger.info('EMA starting ...') + logger.info("***" * 30) + self.ema.update(self.model, decay=0) + elif self.use_ema and curr_iter > self.ema_start_iters: + self.ema.update(self.model) + iter_tic = time.time() + + is_snapshot = (self._nranks < 2 or self._local_rank == 0) \ + and ((epoch_id + 1) % self.cfg.snapshot_epoch == 0 or epoch_id == self.end_epoch - 1) + if is_snapshot and self.use_ema: + # apply ema weight on model + weight = copy.deepcopy(self.ema.model.state_dict()) + for k, v in weight.items(): + if paddle.is_floating_point(v): + weight[k].stop_gradient = True + self.status['weight'] = weight + + self._compose_callback.on_epoch_end(self.status) + + if validate and is_snapshot: + if not hasattr(self, '_eval_loader'): + # build evaluation dataset and loader + self._eval_dataset = self.cfg.EvalDataset + self._eval_batch_sampler = \ + paddle.io.BatchSampler( + self._eval_dataset, + batch_size=self.cfg.EvalReader['batch_size']) + # If metric is VOC, need to be set collate_batch=False. 
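The `suppress` schedules in the loop above gate how quickly the unsupervised loss is phased in after `semi_start_iters`. Restated as a standalone function with the same constants, for clarity:

```python
import numpy as np

def unsup_weight_at(curr_iter, st_iter, base_weight, suppress='linear'):
    if curr_iter <= st_iter:      # unsupervised branch not active yet
        return 0.0
    if suppress == 'linear':      # ramp 0 -> base over [st_iter, 2 * st_iter]
        return base_weight * min((curr_iter - st_iter) / st_iter, 1.0)
    if suppress == 'exp':         # exponential ramp over the next 2000 iters
        tar_iter = st_iter + 2000
        scale = np.exp((curr_iter - tar_iter) / 1000) if curr_iter <= tar_iter else 1.0
        return base_weight * scale
    if suppress == 'step':        # quarter weight until 2 * st_iter
        return base_weight * (0.25 if curr_iter <= 2 * st_iter else 1.0)
    raise ValueError("unknown suppress mode: {}".format(suppress))
```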
+ if self.cfg.metric == 'VOC': + self.cfg['EvalReader']['collate_batch'] = False + self._eval_loader = create('EvalReader')( + self._eval_dataset, + self.cfg.worker_num, + batch_sampler=self._eval_batch_sampler) + # if validation in training is enabled, metrics should be re-init + # Init_mark makes sure this code will only execute once + if validate and Init_mark == False: + Init_mark = True + self._init_metrics(validate=validate) + self._reset_metrics() + + with paddle.no_grad(): + self.status['save_best_model'] = True + self._eval_with_loader(self._eval_loader) + + if is_snapshot and self.use_ema: + self.status.pop('weight') + + self._compose_callback.on_train_end(self.status) + + def evaluate(self): + # get distributed model + if self.cfg.get('fleet', False): + self.model = fleet.distributed_model(self.model) + self.optimizer = fleet.distributed_optimizer(self.optimizer) + elif self._nranks > 1: + find_unused_parameters = self.cfg[ + 'find_unused_parameters'] if 'find_unused_parameters' in self.cfg else False + self.model = paddle.DataParallel( + self.model, find_unused_parameters=find_unused_parameters) + with paddle.no_grad(): + self._eval_with_loader(self.loader) + + def _eval_with_loader(self, loader): + sample_num = 0 + tic = time.time() + self._compose_callback.on_epoch_begin(self.status) + self.status['mode'] = 'eval' + + test_cfg = self.cfg.DenseTeacher['test_cfg'] + if test_cfg['inference_on'] == 'teacher': + logger.info("***** teacher model evaluating *****") + eval_model = self.ema.model + else: + logger.info("***** student model evaluating *****") + eval_model = self.model + + eval_model.eval() + if self.cfg.get('print_flops', False): + flops_loader = create('{}Reader'.format(self.mode.capitalize()))( + self.dataset, self.cfg.worker_num, self._eval_batch_sampler) + self._flops(flops_loader) + for step_id, data in enumerate(loader): + self.status['step_id'] = step_id + self._compose_callback.on_step_begin(self.status) + # forward + if self.use_amp: + with paddle.amp.auto_cast( + enable=self.cfg.use_gpu or self.cfg.use_mlu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + outs = eval_model(data) + else: + outs = eval_model(data) + + # update metrics + for metric in self._metrics: + metric.update(data, outs) + + # multi-scale inputs: all inputs have same im_id + if isinstance(data, typing.Sequence): + sample_num += data[0]['im_id'].numpy().shape[0] + else: + sample_num += data['im_id'].numpy().shape[0] + self._compose_callback.on_step_end(self.status) + + self.status['sample_num'] = sample_num + self.status['cost_time'] = time.time() - tic + + # accumulate metric to log out + for metric in self._metrics: + metric.accumulate() + metric.log() + self._compose_callback.on_epoch_end(self.status) + # reset metric states for metric may performed multiple times + self._reset_metrics() diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/README.md b/PaddleDetection-release-2.6/ppdet/ext_op/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0d67062ade859b0ca025d6ad35d9a630cf4ec523 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/ext_op/README.md @@ -0,0 +1,35 @@ +# 自定义OP编译 +旋转框IOU计算OP是参考[自定义外部算子](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/custom_op/new_cpp_op_cn.html) 。 + +## 1. 环境依赖 +- Paddle >= 2.0.1 +- gcc 8.2 + +## 2. 
安装 +``` +python setup.py install +``` + +编译完成后即可使用,以下为`rbox_iou`的使用示例 +``` +# 引入自定义op +from ext_op import rbox_iou + +paddle.set_device('gpu:0') +paddle.disable_static() + +rbox1 = np.random.rand(13000, 5) +rbox2 = np.random.rand(7, 5) + +pd_rbox1 = paddle.to_tensor(rbox1) +pd_rbox2 = paddle.to_tensor(rbox2) + +iou = rbox_iou(pd_rbox1, pd_rbox2) +print('iou', iou) +``` + +## 3. 单元测试 +可以通过执行单元测试来确认自定义算子功能的正确性,执行单元测试的示例如下所示: +``` +python unittest/test_matched_rbox_iou.py +``` diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/csrc/matched_rbox_iou/matched_rbox_iou.cc b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/matched_rbox_iou/matched_rbox_iou.cc new file mode 100644 index 0000000000000000000000000000000000000000..b16e8c1f2ef93c322fe062af1735189d3eb98f47 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/matched_rbox_iou/matched_rbox_iou.cc @@ -0,0 +1,91 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// The code is based on +// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/ + +#include "../rbox_iou/rbox_iou_utils.h" +#include "paddle/extension.h" + +template +void matched_rbox_iou_cpu_kernel(const int rbox_num, const T *rbox1_data_ptr, + const T *rbox2_data_ptr, T *output_data_ptr) { + + int i; + for (i = 0; i < rbox_num; i++) { + output_data_ptr[i] = + rbox_iou_single(rbox1_data_ptr + i * 5, rbox2_data_ptr + i * 5); + } +} + +#define CHECK_INPUT_CPU(x) \ + PD_CHECK(x.is_cpu(), #x " must be a CPU Tensor.") + +std::vector +MatchedRboxIouCPUForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_CPU(rbox1); + CHECK_INPUT_CPU(rbox2); + PD_CHECK(rbox1.shape()[0] == rbox2.shape()[0], "inputs must be same dim"); + + auto rbox_num = rbox1.shape()[0]; + auto output = paddle::empty({rbox_num}, rbox1.dtype(), paddle::CPUPlace()); + + PD_DISPATCH_FLOATING_TYPES(rbox1.type(), "matched_rbox_iou_cpu_kernel", ([&] { + matched_rbox_iou_cpu_kernel( + rbox_num, rbox1.data(), + rbox2.data(), output.data()); + })); + + return {output}; +} + +#ifdef PADDLE_WITH_CUDA +std::vector +MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2); +#endif + +#define CHECK_INPUT_SAME(x1, x2) \ + PD_CHECK(x1.place() == x2.place(), "input must be smae pacle.") + +std::vector MatchedRboxIouForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_SAME(rbox1, rbox2); + if (rbox1.is_cpu()) { + return MatchedRboxIouCPUForward(rbox1, rbox2); +#ifdef PADDLE_WITH_CUDA + } else if (rbox1.is_gpu()) { + return MatchedRboxIouCUDAForward(rbox1, rbox2); +#endif + } +} + +std::vector> +MatchedRboxIouInferShape(std::vector rbox1_shape, + std::vector rbox2_shape) { + return {{rbox1_shape[0]}}; +} + +std::vector MatchedRboxIouInferDtype(paddle::DataType t1, + paddle::DataType t2) { + return {t1}; +} + +PD_BUILD_OP(matched_rbox_iou) + .Inputs({"RBOX1", "RBOX2"}) + .Outputs({"Output"}) + 
.SetKernelFn(PD_KERNEL(MatchedRboxIouForward)) + .SetInferShapeFn(PD_INFER_SHAPE(MatchedRboxIouInferShape)) + .SetInferDtypeFn(PD_INFER_DTYPE(MatchedRboxIouInferDtype)); diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/csrc/matched_rbox_iou/matched_rbox_iou.cu b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/matched_rbox_iou/matched_rbox_iou.cu new file mode 100644 index 0000000000000000000000000000000000000000..53454d106392f208e72a5e1d1fd6e9bcf609927f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/matched_rbox_iou/matched_rbox_iou.cu @@ -0,0 +1,58 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// The code is based on +// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/ + +#include "../rbox_iou/rbox_iou_utils.h" +#include "paddle/extension.h" + +template +__global__ void +matched_rbox_iou_cuda_kernel(const int rbox_num, const T *rbox1_data_ptr, + const T *rbox2_data_ptr, T *output_data_ptr) { + for (int tid = blockIdx.x * blockDim.x + threadIdx.x; tid < rbox_num; + tid += blockDim.x * gridDim.x) { + output_data_ptr[tid] = + rbox_iou_single(rbox1_data_ptr + tid * 5, rbox2_data_ptr + tid * 5); + } +} + +#define CHECK_INPUT_GPU(x) \ + PD_CHECK(x.is_gpu(), #x " must be a GPU Tensor.") + +std::vector +MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_GPU(rbox1); + CHECK_INPUT_GPU(rbox2); + PD_CHECK(rbox1.shape()[0] == rbox2.shape()[0], "inputs must be same dim"); + + auto rbox_num = rbox1.shape()[0]; + + auto output = paddle::empty({rbox_num}, rbox1.dtype(), paddle::GPUPlace()); + + const int thread_per_block = 512; + const int block_per_grid = CeilDiv(rbox_num, thread_per_block); + + PD_DISPATCH_FLOATING_TYPES( + rbox1.type(), "matched_rbox_iou_cuda_kernel", ([&] { + matched_rbox_iou_cuda_kernel< + data_t><<>>( + rbox_num, rbox1.data(), rbox2.data(), + output.data()); + })); + + return {output}; +} diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/csrc/nms_rotated/nms_rotated.cc b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/nms_rotated/nms_rotated.cc new file mode 100644 index 0000000000000000000000000000000000000000..44f4eb62b851736176f7fade903248e6c95c6d83 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/nms_rotated/nms_rotated.cc @@ -0,0 +1,121 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. + +#include "../rbox_iou/rbox_iou_utils.h" +#include "paddle/extension.h" + +template +void nms_rotated_cpu_kernel(const T *boxes_data, const float threshold, + const int64_t num_boxes, int64_t *num_keep_boxes, + int64_t *output_data) { + + int num_masks = CeilDiv(num_boxes, 64); + std::vector masks(num_masks, 0); + for (int64_t i = 0; i < num_boxes; ++i) { + if (masks[i / 64] & 1ULL << (i % 64)) + continue; + T box_1[5]; + for (int k = 0; k < 5; ++k) { + box_1[k] = boxes_data[i * 5 + k]; + } + for (int64_t j = i + 1; j < num_boxes; ++j) { + if (masks[j / 64] & 1ULL << (j % 64)) + continue; + T box_2[5]; + for (int k = 0; k < 5; ++k) { + box_2[k] = boxes_data[j * 5 + k]; + } + if (rbox_iou_single(box_1, box_2) > threshold) { + masks[j / 64] |= 1ULL << (j % 64); + } + } + } + int64_t output_data_idx = 0; + for (int64_t i = 0; i < num_boxes; ++i) { + if (masks[i / 64] & 1ULL << (i % 64)) + continue; + output_data[output_data_idx++] = i; + } + *num_keep_boxes = output_data_idx; + for (; output_data_idx < num_boxes; ++output_data_idx) { + output_data[output_data_idx] = 0; + } +} + +#define CHECK_INPUT_CPU(x) \ + PD_CHECK(x.is_cpu(), #x " must be a CPU Tensor.") + +std::vector NMSRotatedCPUForward(const paddle::Tensor &boxes, + const paddle::Tensor &scores, + float threshold) { + CHECK_INPUT_CPU(boxes); + CHECK_INPUT_CPU(scores); + + auto num_boxes = boxes.shape()[0]; + + auto order_t = + std::get<1>(paddle::argsort(scores, /* axis=*/0, /* descending=*/true)); + auto boxes_sorted = paddle::gather(boxes, order_t, /* axis=*/0); + + auto keep = + paddle::empty({num_boxes}, paddle::DataType::INT64, paddle::CPUPlace()); + int64_t num_keep_boxes = 0; + + PD_DISPATCH_FLOATING_TYPES(boxes.type(), "nms_rotated_cpu_kernel", ([&] { + nms_rotated_cpu_kernel( + boxes_sorted.data(), threshold, + num_boxes, &num_keep_boxes, + keep.data()); + })); + + keep = keep.slice(0, num_keep_boxes); + return {paddle::gather(order_t, keep, /* axis=*/0)}; +} + +#ifdef PADDLE_WITH_CUDA +std::vector NMSRotatedCUDAForward(const paddle::Tensor &boxes, + const paddle::Tensor &scores, + float threshold); +#endif + +std::vector NMSRotatedForward(const paddle::Tensor &boxes, + const paddle::Tensor &scores, + float threshold) { + if (boxes.is_cpu()) { + return NMSRotatedCPUForward(boxes, scores, threshold); +#ifdef PADDLE_WITH_CUDA + } else if (boxes.is_gpu()) { + return NMSRotatedCUDAForward(boxes, scores, threshold); +#endif + } +} + +std::vector> +NMSRotatedInferShape(std::vector boxes_shape, + std::vector scores_shape) { + return {{-1}}; +} + +std::vector NMSRotatedInferDtype(paddle::DataType t1, + paddle::DataType t2) { + return {paddle::DataType::INT64}; +} + +PD_BUILD_OP(nms_rotated) + .Inputs({"Boxes", "Scores"}) + .Outputs({"Output"}) + .Attrs({"threshold: float"}) + .SetKernelFn(PD_KERNEL(NMSRotatedForward)) + .SetInferShapeFn(PD_INFER_SHAPE(NMSRotatedInferShape)) + .SetInferDtypeFn(PD_INFER_DTYPE(NMSRotatedInferDtype)); \ No newline at end of file diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/csrc/nms_rotated/nms_rotated.cu b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/nms_rotated/nms_rotated.cu new file mode 100644 index 0000000000000000000000000000000000000000..d20dddb5739619de9fc616c1e0d59941952e73c5 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/nms_rotated/nms_rotated.cu @@ -0,0 +1,96 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "../rbox_iou/rbox_iou_utils.h" +#include "paddle/extension.h" + +static const int64_t threadsPerBlock = sizeof(int64_t) * 8; + +template +__global__ void +nms_rotated_cuda_kernel(const T *boxes_data, const float threshold, + const int64_t num_boxes, int64_t *masks) { + auto raw_start = blockIdx.y; + auto col_start = blockIdx.x; + if (raw_start > col_start) + return; + const int raw_last_storage = + min(num_boxes - raw_start * threadsPerBlock, threadsPerBlock); + const int col_last_storage = + min(num_boxes - col_start * threadsPerBlock, threadsPerBlock); + if (threadIdx.x < raw_last_storage) { + int64_t mask = 0; + auto current_box_idx = raw_start * threadsPerBlock + threadIdx.x; + const T *current_box = boxes_data + current_box_idx * 5; + for (int i = 0; i < col_last_storage; ++i) { + const T *target_box = boxes_data + (col_start * threadsPerBlock + i) * 5; + if (rbox_iou_single(current_box, target_box) > threshold) { + mask |= 1ULL << i; + } + } + const int blocks_per_line = CeilDiv(num_boxes, threadsPerBlock); + masks[current_box_idx * blocks_per_line + col_start] = mask; + } +} + +#define CHECK_INPUT_GPU(x) \ + PD_CHECK(x.is_gpu(), #x " must be a GPU Tensor.") + +std::vector NMSRotatedCUDAForward(const paddle::Tensor &boxes, + const paddle::Tensor &scores, + float threshold) { + CHECK_INPUT_GPU(boxes); + CHECK_INPUT_GPU(scores); + + auto num_boxes = boxes.shape()[0]; + auto order_t = + std::get<1>(paddle::argsort(scores, /* axis=*/0, /* descending=*/true)); + auto boxes_sorted = paddle::gather(boxes, order_t, /* axis=*/0); + + const auto blocks_per_line = CeilDiv(num_boxes, threadsPerBlock); + dim3 block(threadsPerBlock); + dim3 grid(blocks_per_line, blocks_per_line); + auto mask_dev = paddle::empty({num_boxes * blocks_per_line}, + paddle::DataType::INT64, paddle::GPUPlace()); + + PD_DISPATCH_FLOATING_TYPES( + boxes.type(), "nms_rotated_cuda_kernel", ([&] { + nms_rotated_cuda_kernel<<>>( + boxes_sorted.data(), threshold, num_boxes, + mask_dev.data()); + })); + + auto mask_host = mask_dev.copy_to(paddle::CPUPlace(), true); + auto keep_host = + paddle::empty({num_boxes}, paddle::DataType::INT64, paddle::CPUPlace()); + int64_t *keep_host_ptr = keep_host.data(); + int64_t *mask_host_ptr = mask_host.data(); + std::vector remv(blocks_per_line); + int64_t last_box_num = 0; + for (int64_t i = 0; i < num_boxes; ++i) { + auto remv_element_id = i / threadsPerBlock; + auto remv_bit_id = i % threadsPerBlock; + if (!(remv[remv_element_id] & 1ULL << remv_bit_id)) { + keep_host_ptr[last_box_num++] = i; + int64_t *current_mask = mask_host_ptr + i * blocks_per_line; + for (auto j = remv_element_id; j < blocks_per_line; ++j) { + remv[j] |= current_mask[j]; + } + } + } + + keep_host = keep_host.slice(0, last_box_num); + auto keep_dev = keep_host.copy_to(paddle::GPUPlace(), true); + return {paddle::gather(order_t, keep_dev, /* axis=*/0)}; +} \ No newline at end of file diff --git 
a/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou.cc b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou.cc new file mode 100644 index 0000000000000000000000000000000000000000..c8e7528d35857eb39b8be441558876a4130a7ce6 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou.cc @@ -0,0 +1,95 @@ +// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// The code is based on +// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/ + +#include "paddle/extension.h" +#include "rbox_iou_utils.h" + +template +void rbox_iou_cpu_kernel(const int rbox1_num, const int rbox2_num, + const T *rbox1_data_ptr, const T *rbox2_data_ptr, + T *output_data_ptr) { + + int i, j; + for (i = 0; i < rbox1_num; i++) { + for (j = 0; j < rbox2_num; j++) { + int offset = i * rbox2_num + j; + output_data_ptr[offset] = + rbox_iou_single(rbox1_data_ptr + i * 5, rbox2_data_ptr + j * 5); + } + } +} + +#define CHECK_INPUT_CPU(x) \ + PD_CHECK(x.is_cpu(), #x " must be a CPU Tensor.") + +std::vector RboxIouCPUForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_CPU(rbox1); + CHECK_INPUT_CPU(rbox2); + + auto rbox1_num = rbox1.shape()[0]; + auto rbox2_num = rbox2.shape()[0]; + + auto output = + paddle::empty({rbox1_num, rbox2_num}, rbox1.dtype(), paddle::CPUPlace()); + + PD_DISPATCH_FLOATING_TYPES(rbox1.type(), "rbox_iou_cpu_kernel", ([&] { + rbox_iou_cpu_kernel( + rbox1_num, rbox2_num, rbox1.data(), + rbox2.data(), output.data()); + })); + + return {output}; +} + +#ifdef PADDLE_WITH_CUDA +std::vector RboxIouCUDAForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2); +#endif + +#define CHECK_INPUT_SAME(x1, x2) \ + PD_CHECK(x1.place() == x2.place(), "input must be smae pacle.") + +std::vector RboxIouForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_SAME(rbox1, rbox2); + if (rbox1.is_cpu()) { + return RboxIouCPUForward(rbox1, rbox2); +#ifdef PADDLE_WITH_CUDA + } else if (rbox1.is_gpu()) { + return RboxIouCUDAForward(rbox1, rbox2); +#endif + } +} + +std::vector> +RboxIouInferShape(std::vector rbox1_shape, + std::vector rbox2_shape) { + return {{rbox1_shape[0], rbox2_shape[0]}}; +} + +std::vector RboxIouInferDtype(paddle::DataType t1, + paddle::DataType t2) { + return {t1}; +} + +PD_BUILD_OP(rbox_iou) + .Inputs({"RBox1", "RBox2"}) + .Outputs({"Output"}) + .SetKernelFn(PD_KERNEL(RboxIouForward)) + .SetInferShapeFn(PD_INFER_SHAPE(RboxIouInferShape)) + .SetInferDtypeFn(PD_INFER_DTYPE(RboxIouInferDtype)); diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou.cu b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou.cu new file mode 100644 index 0000000000000000000000000000000000000000..baedb6dedba6edbf207f4c68e84ab0b9b03b28ac --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou.cu @@ -0,0 +1,109 @@ +// Copyright (c) 2021 
PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// The code is based on +// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/ + +#include "paddle/extension.h" +#include "rbox_iou_utils.h" + +// 2D block with 32 * 16 = 512 threads per block +const int BLOCK_DIM_X = 32; +const int BLOCK_DIM_Y = 16; + +template +__global__ void rbox_iou_cuda_kernel(const int rbox1_num, const int rbox2_num, + const T *rbox1_data_ptr, + const T *rbox2_data_ptr, + T *output_data_ptr) { + + // get row_start and col_start + const int rbox1_block_idx = blockIdx.x * blockDim.x; + const int rbox2_block_idx = blockIdx.y * blockDim.y; + + const int rbox1_thread_num = min(rbox1_num - rbox1_block_idx, blockDim.x); + const int rbox2_thread_num = min(rbox2_num - rbox2_block_idx, blockDim.y); + + __shared__ T block_boxes1[BLOCK_DIM_X * 5]; + __shared__ T block_boxes2[BLOCK_DIM_Y * 5]; + + // It's safe to copy using threadIdx.x since BLOCK_DIM_X >= BLOCK_DIM_Y + if (threadIdx.x < rbox1_thread_num && threadIdx.y == 0) { + block_boxes1[threadIdx.x * 5 + 0] = + rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 0]; + block_boxes1[threadIdx.x * 5 + 1] = + rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 1]; + block_boxes1[threadIdx.x * 5 + 2] = + rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 2]; + block_boxes1[threadIdx.x * 5 + 3] = + rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 3]; + block_boxes1[threadIdx.x * 5 + 4] = + rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 4]; + } + + // threadIdx.x < BLOCK_DIM_Y=rbox2_thread_num, just use same condition as + // above: threadIdx.y == 0 + if (threadIdx.x < rbox2_thread_num && threadIdx.y == 0) { + block_boxes2[threadIdx.x * 5 + 0] = + rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 0]; + block_boxes2[threadIdx.x * 5 + 1] = + rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 1]; + block_boxes2[threadIdx.x * 5 + 2] = + rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 2]; + block_boxes2[threadIdx.x * 5 + 3] = + rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 3]; + block_boxes2[threadIdx.x * 5 + 4] = + rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 4]; + } + + // sync + __syncthreads(); + + if (threadIdx.x < rbox1_thread_num && threadIdx.y < rbox2_thread_num) { + int offset = (rbox1_block_idx + threadIdx.x) * rbox2_num + rbox2_block_idx + + threadIdx.y; + output_data_ptr[offset] = rbox_iou_single( + block_boxes1 + threadIdx.x * 5, block_boxes2 + threadIdx.y * 5); + } +} + +#define CHECK_INPUT_GPU(x) \ + PD_CHECK(x.is_gpu(), #x " must be a GPU Tensor.") + +std::vector RboxIouCUDAForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_GPU(rbox1); + CHECK_INPUT_GPU(rbox2); + + auto rbox1_num = rbox1.shape()[0]; + auto rbox2_num = rbox2.shape()[0]; + + auto output = + paddle::empty({rbox1_num, rbox2_num}, rbox1.dtype(), paddle::GPUPlace()); + + const int blocks_x = CeilDiv(rbox1_num, BLOCK_DIM_X); + const int 
diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou_utils.h b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou_utils.h
new file mode 100644
index 0000000000000000000000000000000000000000..6f275dd65a7d83962affc92be35fece8348a6a91
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/ext_op/csrc/rbox_iou/rbox_iou_utils.h
@@ -0,0 +1,356 @@
+// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// The code is based on
+// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/
+
+#pragma once
+
+#include <cassert>
+#include <cmath>
+#include <vector>
+
+#ifdef __CUDACC__
+// Designates functions callable from the host (CPU) and the device (GPU)
+#define HOST_DEVICE __host__ __device__
+#define HOST_DEVICE_INLINE HOST_DEVICE __forceinline__
+#else
+#include <algorithm>
+#define HOST_DEVICE
+#define HOST_DEVICE_INLINE HOST_DEVICE inline
+#endif
+
+namespace {
+
+template <typename T> struct RotatedBox { T x_ctr, y_ctr, w, h, a; };
+
+template <typename T> struct Point {
+  T x, y;
+  HOST_DEVICE_INLINE Point(const T &px = 0, const T &py = 0) : x(px), y(py) {}
+  HOST_DEVICE_INLINE Point operator+(const Point &p) const {
+    return Point(x + p.x, y + p.y);
+  }
+  HOST_DEVICE_INLINE Point &operator+=(const Point &p) {
+    x += p.x;
+    y += p.y;
+    return *this;
+  }
+  HOST_DEVICE_INLINE Point operator-(const Point &p) const {
+    return Point(x - p.x, y - p.y);
+  }
+  HOST_DEVICE_INLINE Point operator*(const T coeff) const {
+    return Point(x * coeff, y * coeff);
+  }
+};
+
+template <typename T>
+HOST_DEVICE_INLINE T dot_2d(const Point<T> &A, const Point<T> &B) {
+  return A.x * B.x + A.y * B.y;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE T cross_2d(const Point<T> &A, const Point<T> &B) {
+  return A.x * B.y - B.x * A.y;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE void get_rotated_vertices(const RotatedBox<T> &box,
+                                             Point<T> (&pts)[4]) {
+  // M_PI / 180. == 0.01745329251
+  // double theta = box.a * 0.01745329251;
+  // MODIFIED: the angle is already in radians
+  double theta = box.a;
+  T cosTheta2 = (T)cos(theta) * 0.5f;
+  T sinTheta2 = (T)sin(theta) * 0.5f;
+
+  // y: top --> down; x: left --> right
+  pts[0].x = box.x_ctr - sinTheta2 * box.h - cosTheta2 * box.w;
+  pts[0].y = box.y_ctr + cosTheta2 * box.h - sinTheta2 * box.w;
+  pts[1].x = box.x_ctr + sinTheta2 * box.h - cosTheta2 * box.w;
+  pts[1].y = box.y_ctr - cosTheta2 * box.h - sinTheta2 * box.w;
+  pts[2].x = 2 * box.x_ctr - pts[0].x;
+  pts[2].y = 2 * box.y_ctr - pts[0].y;
+  pts[3].x = 2 * box.x_ctr - pts[1].x;
+  pts[3].y = 2 * box.y_ctr - pts[1].y;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE int get_intersection_points(const Point<T> (&pts1)[4],
+                                               const Point<T> (&pts2)[4],
+                                               Point<T> (&intersections)[24]) {
+  // Line vector
+  // A line from p1 to p2 is: p1 + (p2-p1)*t, t=[0,1]
+  Point<T> vec1[4], vec2[4];
+  for (int i = 0; i < 4; i++) {
+    vec1[i] = pts1[(i + 1) % 4] - pts1[i];
+    vec2[i] = pts2[(i + 1) % 4] - pts2[i];
+  }
+
+  // Line test - test all line combos for intersection
+  int num = 0; // number of intersections
+  for (int i = 0; i < 4; i++) {
+    for (int j = 0; j < 4; j++) {
+      // Solve for 2x2 Ax=b
+      T det = cross_2d<T>(vec2[j], vec1[i]);
+
+      // This takes care of parallel lines
+      if (fabs(det) <= 1e-14) {
+        continue;
+      }
+
+      auto vec12 = pts2[j] - pts1[i];
+
+      T t1 = cross_2d<T>(vec2[j], vec12) / det;
+      T t2 = cross_2d<T>(vec1[i], vec12) / det;
+
+      if (t1 >= 0.0f && t1 <= 1.0f && t2 >= 0.0f && t2 <= 1.0f) {
+        intersections[num++] = pts1[i] + vec1[i] * t1;
+      }
+    }
+  }
+
+  // Check for vertices of rect1 inside rect2
+  {
+    const auto &AB = vec2[0];
+    const auto &DA = vec2[3];
+    auto ABdotAB = dot_2d<T>(AB, AB);
+    auto ADdotAD = dot_2d<T>(DA, DA);
+    for (int i = 0; i < 4; i++) {
+      // assume ABCD is the rectangle, and P is the point to be judged
+      // P is inside ABCD iff. P's projection on AB lies within AB
+      // and P's projection on AD lies within AD
+
+      auto AP = pts1[i] - pts2[0];
+
+      auto APdotAB = dot_2d<T>(AP, AB);
+      auto APdotAD = -dot_2d<T>(AP, DA);
+
+      if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) &&
+          (APdotAD <= ADdotAD)) {
+        intersections[num++] = pts1[i];
+      }
+    }
+  }
+
+  // Reverse the check - check for vertices of rect2 inside rect1
+  {
+    const auto &AB = vec1[0];
+    const auto &DA = vec1[3];
+    auto ABdotAB = dot_2d<T>(AB, AB);
+    auto ADdotAD = dot_2d<T>(DA, DA);
+    for (int i = 0; i < 4; i++) {
+      auto AP = pts2[i] - pts1[0];
+
+      auto APdotAB = dot_2d<T>(AP, AB);
+      auto APdotAD = -dot_2d<T>(AP, DA);
+
+      if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) &&
+          (APdotAD <= ADdotAD)) {
+        intersections[num++] = pts2[i];
+      }
+    }
+  }
+
+  return num;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE int convex_hull_graham(const Point<T> (&p)[24],
+                                          const int &num_in, Point<T> (&q)[24],
+                                          bool shift_to_zero = false) {
+  assert(num_in >= 2);
+
+  // Step 1:
+  // Find the point with minimum y
+  // if more than 1 point has the same minimum y,
+  // pick the one with the minimum x.
+  int t = 0;
+  for (int i = 1; i < num_in; i++) {
+    if (p[i].y < p[t].y || (p[i].y == p[t].y && p[i].x < p[t].x)) {
+      t = i;
+    }
+  }
+  auto &start = p[t]; // starting point
+
+  // Step 2:
+  // Subtract the starting point from every point (for sorting in the next
+  // step)
+  for (int i = 0; i < num_in; i++) {
+    q[i] = p[i] - start;
+  }
+
+  // Swap the starting point to position 0
+  auto tmp = q[0];
+  q[0] = q[t];
+  q[t] = tmp;
+
+  // Step 3:
+  // Sort point 1 ~ num_in according to their relative cross-product values
+  // (essentially sorting according to angles)
+  // If the angles are the same, sort according to their distance to origin
+  T dist[24];
+  for (int i = 0; i < num_in; i++) {
+    dist[i] = dot_2d<T>(q[i], q[i]);
+  }
+
+#ifdef __CUDACC__
+  // CUDA version
+  // In the future, we can potentially use thrust
+  // for sorting here to improve speed (though not guaranteed)
+  for (int i = 1; i < num_in - 1; i++) {
+    for (int j = i + 1; j < num_in; j++) {
+      T crossProduct = cross_2d<T>(q[i], q[j]);
+      if ((crossProduct < -1e-6) ||
+          (fabs(crossProduct) < 1e-6 && dist[i] > dist[j])) {
+        auto q_tmp = q[i];
+        q[i] = q[j];
+        q[j] = q_tmp;
+        auto dist_tmp = dist[i];
+        dist[i] = dist[j];
+        dist[j] = dist_tmp;
+      }
+    }
+  }
+#else
+  // CPU version
+  std::sort(q + 1, q + num_in,
+            [](const Point<T> &A, const Point<T> &B) -> bool {
+              T temp = cross_2d<T>(A, B);
+              if (fabs(temp) < 1e-6) {
+                return dot_2d<T>(A, A) < dot_2d<T>(B, B);
+              } else {
+                return temp > 0;
+              }
+            });
+#endif
+
+  // Step 4:
+  // Make sure there are at least 2 points (that don't overlap with each other)
+  // in the stack
+  int k; // index of the non-overlapped second point
+  for (k = 1; k < num_in; k++) {
+    if (dist[k] > 1e-8) {
+      break;
+    }
+  }
+  if (k == num_in) {
+    // We reach the end, which means the convex hull is just one point
+    q[0] = p[t];
+    return 1;
+  }
+  q[1] = q[k];
+  int m = 2; // 2 points in the stack
+  // Step 5:
+  // Finally we can start the scanning process.
+  // When a non-convex relationship between the 3 points is found
+  // (either concave shape or duplicated points),
+  // we pop the previous point from the stack
+  // until the 3-point relationship is convex again, or
+  // until the stack only contains two points
+  for (int i = k + 1; i < num_in; i++) {
+    while (m > 1 && cross_2d<T>(q[i] - q[m - 2], q[m - 1] - q[m - 2]) >= 0) {
+      m--;
+    }
+    q[m++] = q[i];
+  }
+
+  // Step 6 (Optional):
+  // In general sense we need the original coordinates, so we
+  // need to shift the points back (reverting Step 2)
+  // But if we're only interested in getting the area/perimeter of the shape
+  // We can simply return.
+  if (!shift_to_zero) {
+    for (int i = 0; i < m; i++) {
+      q[i] += start;
+    }
+  }
+
+  return m;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE T polygon_area(const Point<T> (&q)[24], const int &m) {
+  if (m <= 2) {
+    return 0;
+  }
+
+  T area = 0;
+  for (int i = 1; i < m - 1; i++) {
+    area += fabs(cross_2d<T>(q[i] - q[0], q[i + 1] - q[0]));
+  }
+
+  return area / 2.0;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE T rboxes_intersection(const RotatedBox<T> &box1,
+                                         const RotatedBox<T> &box2) {
+  // There are up to 4 x 4 + 4 + 4 = 24 intersections (including dups) returned
+  // from rotated_rect_intersection_pts
+  Point<T> intersectPts[24], orderedPts[24];
+
+  Point<T> pts1[4];
+  Point<T> pts2[4];
+  get_rotated_vertices<T>(box1, pts1);
+  get_rotated_vertices<T>(box2, pts2);
+
+  int num = get_intersection_points<T>(pts1, pts2, intersectPts);
+
+  if (num <= 2) {
+    return 0.0;
+  }
+
+  // Convex Hull to order the intersection points in clockwise order and find
+  // the contour area.
+  int num_convex = convex_hull_graham<T>(intersectPts, num, orderedPts, true);
+  return polygon_area<T>(orderedPts, num_convex);
+}
+
+} // namespace
+
+template <typename T>
+HOST_DEVICE_INLINE T rbox_iou_single(T const *const box1_raw,
+                                     T const *const box2_raw) {
+  // shift centers to the middle point to achieve higher precision in the
+  // result
+  RotatedBox<T> box1, box2;
+  auto center_shift_x = (box1_raw[0] + box2_raw[0]) / 2.0;
+  auto center_shift_y = (box1_raw[1] + box2_raw[1]) / 2.0;
+  box1.x_ctr = box1_raw[0] - center_shift_x;
+  box1.y_ctr = box1_raw[1] - center_shift_y;
+  box1.w = box1_raw[2];
+  box1.h = box1_raw[3];
+  box1.a = box1_raw[4];
+  box2.x_ctr = box2_raw[0] - center_shift_x;
+  box2.y_ctr = box2_raw[1] - center_shift_y;
+  box2.w = box2_raw[2];
+  box2.h = box2_raw[3];
+  box2.a = box2_raw[4];
+
+  if (box1.w < 1e-2 || box1.h < 1e-2 || box2.w < 1e-2 || box2.h < 1e-2) {
+    return 0.f;
+  }
+  const T area1 = box1.w * box1.h;
+  const T area2 = box2.w * box2.h;
+
+  const T intersection = rboxes_intersection<T>(box1, box2);
+  const T iou = intersection / (area1 + area2 - intersection);
+  return iou;
+}
+
+/**
+  Computes ceil(a / b)
+*/
+
+HOST_DEVICE inline int CeilDiv(const int a, const int b) {
+  return (a + b - 1) / b;
+}
\ No newline at end of file
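The header above computes rotated IoU in four steps: recover the four vertices of each box, collect up to 24 candidate intersection and containment points, order them with a Graham scan, and take the polygon area. The same quantity can be cross-checked independently from Python with Shapely; this sketch is not part of the op, and it uses a standard counter-clockwise rotation (IoU is invariant under the y-axis reflection that separates it from the image-coordinate convention in the header):

import math
from shapely.geometry import Polygon

def rbox_to_polygon(cx, cy, w, h, angle):
    # angle in radians, matching the MODIFIED convention noted in the header
    c, s = math.cos(angle), math.sin(angle)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2),
               (w / 2, h / 2), (-w / 2, h / 2)]
    return Polygon([(cx + c * x - s * y, cy + s * x + c * y)
                    for x, y in corners])

def rbox_iou_reference(box1, box2):
    # box1, box2: (cx, cy, w, h, angle); reference value for rbox_iou_single
    p1, p2 = rbox_to_polygon(*box1), rbox_to_polygon(*box2)
    inter = p1.intersection(p2).area
    return inter / (p1.area + p2.area - inter)
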
diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/setup.py b/PaddleDetection-release-2.6/ppdet/ext_op/setup.py
new file mode 100644
index 0000000000000000000000000000000000000000..5892f4625c263b9eac19a434aca10968882bc4bc
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/ext_op/setup.py
@@ -0,0 +1,33 @@
+import os
+import glob
+import paddle
+from paddle.utils.cpp_extension import CppExtension, CUDAExtension, setup
+
+
+def get_extensions():
+    root_dir = os.path.dirname(os.path.abspath(__file__))
+    ext_root_dir = os.path.join(root_dir, 'csrc')
+    sources = []
+    for ext_name in os.listdir(ext_root_dir):
+        ext_dir = os.path.join(ext_root_dir, ext_name)
+        source = glob.glob(os.path.join(ext_dir, '*.cc'))
+        if paddle.device.is_compiled_with_cuda():
+            source += glob.glob(os.path.join(ext_dir, '*.cu'))
+
+        if not source:
+            continue
+
+        sources += source
+
+    if paddle.device.is_compiled_with_cuda():
+        extension = CUDAExtension(
+            sources, extra_compile_args={'cxx': ['-DPADDLE_WITH_CUDA']})
+    else:
+        extension = CppExtension(sources)
+
+    return extension
+
+
+if __name__ == "__main__":
+    setup(name='ext_op', ext_modules=get_extensions())
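With the sources in place, one plausible workflow (the exact install command may vary with the Paddle version) is to build the extension from ppdet/ext_op and then call the registered op directly; a minimal usage sketch:

# Build first (from ppdet/ext_op):  python setup.py install
# Afterwards the custom ops are importable from the generated ext_op module.
import paddle
from ext_op import rbox_iou

# Rotated boxes are [N, 5] tensors of (cx, cy, w, h, angle in radians).
a = paddle.rand([8, 5], dtype='float32')
b = paddle.rand([4, 5], dtype='float32')
iou = rbox_iou(a, b)  # shape [8, 4]; dispatched to CPU or CUDA by place
print(iou.shape)
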
diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/unittest/test_matched_rbox_iou.py b/PaddleDetection-release-2.6/ppdet/ext_op/unittest/test_matched_rbox_iou.py
new file mode 100644
index 0000000000000000000000000000000000000000..af7b076da2435a4f025f608430549f2334c22e08
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/ext_op/unittest/test_matched_rbox_iou.py
@@ -0,0 +1,149 @@
+import numpy as np
+from shapely.geometry import Polygon
+import paddle
+import unittest
+
+from ext_op import matched_rbox_iou
+
+
+def rbox2poly_single(rrect, get_best_begin_point=False):
+    """
+    rrect: [x_ctr, y_ctr, w, h, angle]
+    to
+    poly: [x0, y0, x1, y1, x2, y2, x3, y3]
+    """
+    x_ctr, y_ctr, width, height, angle = rrect[:5]
+    tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
+    # rect 2x4
+    rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
+    R = np.array([[np.cos(angle), -np.sin(angle)],
+                  [np.sin(angle), np.cos(angle)]])
+    # poly
+    poly = R.dot(rect)
+    x0, x1, x2, x3 = poly[0, :4] + x_ctr
+    y0, y1, y2, y3 = poly[1, :4] + y_ctr
+    poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float64)
+    return poly
+
+
+def intersection(g, p):
+    """
+    IoU of two quadrilaterals given as 8-value polygons.
+    """
+
+    g = g[:8].reshape((4, 2))
+    p = p[:8].reshape((4, 2))
+
+    a = g
+    b = p
+
+    use_filter = True
+    if use_filter:
+        # step1: reject pairs whose axis-aligned bounds do not overlap
+        inter_x1 = np.maximum(np.min(a[:, 0]), np.min(b[:, 0]))
+        inter_x2 = np.minimum(np.max(a[:, 0]), np.max(b[:, 0]))
+        inter_y1 = np.maximum(np.min(a[:, 1]), np.min(b[:, 1]))
+        inter_y2 = np.minimum(np.max(a[:, 1]), np.max(b[:, 1]))
+        if inter_x1 >= inter_x2 or inter_y1 >= inter_y2:
+            return 0.
+        x1 = np.minimum(np.min(a[:, 0]), np.min(b[:, 0]))
+        x2 = np.maximum(np.max(a[:, 0]), np.max(b[:, 0]))
+        y1 = np.minimum(np.min(a[:, 1]), np.min(b[:, 1]))
+        y2 = np.maximum(np.max(a[:, 1]), np.max(b[:, 1]))
+        if x1 >= x2 or y1 >= y2 or (x2 - x1) < 2 or (y2 - y1) < 2:
+            return 0.
+
+    g = Polygon(g)
+    p = Polygon(p)
+    if not g.is_valid or not p.is_valid:
+        return 0
+
+    inter = Polygon(g).intersection(Polygon(p)).area
+    union = g.area + p.area - inter
+    if union == 0:
+        return 0
+    else:
+        return inter / union
+
+
+def matched_rbox_overlaps(anchors, gt_bboxes, use_cv2=False):
+    """
+    Args:
+        anchors: [M, 5] cx, cy, w, h, angle
+        gt_bboxes: [M, 5] cx, cy, w, h, angle
+
+    Returns:
+        matched_iou: [M]
+    """
+    assert anchors.shape[1] == 5
+    assert gt_bboxes.shape[1] == 5
+
+    gt_bboxes_poly = [rbox2poly_single(e) for e in gt_bboxes]
+    anchors_poly = [rbox2poly_single(e) for e in anchors]
+
+    num = len(anchors_poly)
+    iou = np.zeros((num, ), dtype=np.float64)
+
+    for i in range(num):
+        try:
+            iou[i] = intersection(gt_bboxes_poly[i], anchors_poly[i])
+        except Exception as e:
+            print('cur gt_bboxes_poly[i]', gt_bboxes_poly[i],
+                  'anchors_poly[i]', anchors_poly[i], e)
+    return iou
+
+
+def gen_sample(n):
+    rbox = np.random.rand(n, 5)
+    rbox[:, 0:4] = rbox[:, 0:4] * 0.45 + 0.001
+    rbox[:, 4] = rbox[:, 4] - 0.5
+    return rbox
+
+
+class MatchedRBoxIoUTest(unittest.TestCase):
+    def setUp(self):
+        self.initTestCase()
+        self.rbox1 = gen_sample(self.n)
+        self.rbox2 = gen_sample(self.n)
+
+    def initTestCase(self):
+        self.n = 1000
+
+    def assertAllClose(self, x, y, msg, atol=5e-1, rtol=1e-2):
+        self.assertTrue(np.allclose(x, y, atol=atol, rtol=rtol), msg=msg)
+
+    def get_places(self):
+        places = [paddle.CPUPlace()]
+        if paddle.device.is_compiled_with_cuda():
+            places.append(paddle.CUDAPlace(0))
+
+        return places
+
+    def check_output(self, place):
+        paddle.disable_static()
+        pd_rbox1 = paddle.to_tensor(self.rbox1, place=place)
+        pd_rbox2 = paddle.to_tensor(self.rbox2, place=place)
+        actual_t = matched_rbox_iou(pd_rbox1, pd_rbox2).numpy()
+        # copy before scaling so the inputs above are not mutated in place
+        poly_rbox1 = self.rbox1.copy()
+        poly_rbox2 = self.rbox2.copy()
+        poly_rbox1[:, 0:4] = self.rbox1[:, 0:4] * 1024
+        poly_rbox2[:, 0:4] = self.rbox2[:, 0:4] * 1024
+        expect_t = matched_rbox_overlaps(poly_rbox1, poly_rbox2, use_cv2=False)
+        self.assertAllClose(
+            actual_t,
+            expect_t,
+            msg="matched_rbox_iou has diff at {} \nExpect {}\nBut got {}".
+            format(str(place), str(expect_t), str(actual_t)))
+
+    def test_output(self):
+        places = self.get_places()
+        for place in places:
+            self.check_output(place)
+
+
+if __name__ == "__main__":
+    unittest.main()
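The test above checks the elementwise variant (row i of rbox1 against row i of rbox2, giving an [M] vector), while the test below checks the full [N, M] cross product; both use the same Shapely brute force as ground truth. Assuming both ops are built alongside each other, the matched op should agree with the diagonal of the pairwise op, a relationship worth sanity-checking:

import paddle
from ext_op import rbox_iou, matched_rbox_iou

x = paddle.rand([16, 5])
y = paddle.rand([16, 5])
full = rbox_iou(x, y)             # [16, 16] pairwise IoU
matched = matched_rbox_iou(x, y)  # [16] elementwise IoU
# matched_rbox_iou(x, y)[i] should equal rbox_iou(x, y)[i, i]
print(paddle.allclose(matched, paddle.diag(full)))
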
diff --git a/PaddleDetection-release-2.6/ppdet/ext_op/unittest/test_rbox_iou.py b/PaddleDetection-release-2.6/ppdet/ext_op/unittest/test_rbox_iou.py
new file mode 100644
index 0000000000000000000000000000000000000000..8ef19ae841d5a73c5b90f1b971ed36d1d7f61a7a
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/ext_op/unittest/test_rbox_iou.py
@@ -0,0 +1,151 @@
+import numpy as np
+from shapely.geometry import Polygon
+import paddle
+import unittest
+
+from ext_op import rbox_iou
+
+
+def rbox2poly_single(rrect, get_best_begin_point=False):
+    """
+    rrect: [x_ctr, y_ctr, w, h, angle]
+    to
+    poly: [x0, y0, x1, y1, x2, y2, x3, y3]
+    """
+    x_ctr, y_ctr, width, height, angle = rrect[:5]
+    tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
+    # rect 2x4
+    rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
+    R = np.array([[np.cos(angle), -np.sin(angle)],
+                  [np.sin(angle), np.cos(angle)]])
+    # poly
+    poly = R.dot(rect)
+    x0, x1, x2, x3 = poly[0, :4] + x_ctr
+    y0, y1, y2, y3 = poly[1, :4] + y_ctr
+    poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float64)
+    return poly
+
+
+def intersection(g, p):
+    """
+    IoU of two quadrilaterals given as 8-value polygons.
+    """
+
+    g = g[:8].reshape((4, 2))
+    p = p[:8].reshape((4, 2))
+
+    a = g
+    b = p
+
+    use_filter = True
+    if use_filter:
+        # step1: reject pairs whose axis-aligned bounds do not overlap
+        inter_x1 = np.maximum(np.min(a[:, 0]), np.min(b[:, 0]))
+        inter_x2 = np.minimum(np.max(a[:, 0]), np.max(b[:, 0]))
+        inter_y1 = np.maximum(np.min(a[:, 1]), np.min(b[:, 1]))
+        inter_y2 = np.minimum(np.max(a[:, 1]), np.max(b[:, 1]))
+        if inter_x1 >= inter_x2 or inter_y1 >= inter_y2:
+            return 0.
+        x1 = np.minimum(np.min(a[:, 0]), np.min(b[:, 0]))
+        x2 = np.maximum(np.max(a[:, 0]), np.max(b[:, 0]))
+        y1 = np.minimum(np.min(a[:, 1]), np.min(b[:, 1]))
+        y2 = np.maximum(np.max(a[:, 1]), np.max(b[:, 1]))
+        if x1 >= x2 or y1 >= y2 or (x2 - x1) < 2 or (y2 - y1) < 2:
+            return 0.
+
+    g = Polygon(g)
+    p = Polygon(p)
+    if not g.is_valid or not p.is_valid:
+        return 0
+
+    inter = Polygon(g).intersection(Polygon(p)).area
+    union = g.area + p.area - inter
+    if union == 0:
+        return 0
+    else:
+        return inter / union
+
+
+def rbox_overlaps(anchors, gt_bboxes, use_cv2=False):
+    """
+    Args:
+        anchors: [NA, 5] cx, cy, w, h, angle
+        gt_bboxes: [M, 5] cx, cy, w, h, angle
+
+    Returns:
+        iou: [NA, M]
+    """
+    assert anchors.shape[1] == 5
+    assert gt_bboxes.shape[1] == 5
+
+    gt_bboxes_poly = [rbox2poly_single(e) for e in gt_bboxes]
+    anchors_poly = [rbox2poly_single(e) for e in anchors]
+
+    num_gt, num_anchors = len(gt_bboxes_poly), len(anchors_poly)
+    iou = np.zeros((num_anchors, num_gt), dtype=np.float64)
+
+    for i in range(num_anchors):
+        for j in range(num_gt):
+            try:
+                iou[i, j] = intersection(anchors_poly[i], gt_bboxes_poly[j])
+            except Exception as e:
+                print('cur anchors_poly[i]', anchors_poly[i],
+                      'gt_bboxes_poly[j]', gt_bboxes_poly[j], e)
+    return iou
+
+
+def gen_sample(n):
+    rbox = np.random.rand(n, 5)
+    rbox[:, 0:4] = rbox[:, 0:4] * 0.45 + 0.001
+    rbox[:, 4] = rbox[:, 4] - 0.5
+    return rbox
+
+
+class RBoxIoUTest(unittest.TestCase):
+    def setUp(self):
+        self.initTestCase()
+        self.rbox1 = gen_sample(self.n)
+        self.rbox2 = gen_sample(self.m)
+
+    def initTestCase(self):
+        self.n = 13000
+        self.m = 7
+
+    def assertAllClose(self, x, y, msg, atol=5e-1, rtol=1e-2):
+        self.assertTrue(np.allclose(x, y, atol=atol, rtol=rtol), msg=msg)
+
+    def get_places(self):
+        places = [paddle.CPUPlace()]
+        if paddle.device.is_compiled_with_cuda():
+            places.append(paddle.CUDAPlace(0))
+
+        return places
+
+    def check_output(self, place):
+        paddle.disable_static()
+        pd_rbox1 = paddle.to_tensor(self.rbox1, place=place)
+        pd_rbox2 = paddle.to_tensor(self.rbox2, place=place)
+        actual_t = rbox_iou(pd_rbox1, pd_rbox2).numpy()
+        # copy before scaling so the inputs above are not mutated in place
+        poly_rbox1 = self.rbox1.copy()
+        poly_rbox2 = self.rbox2.copy()
+        poly_rbox1[:, 0:4] = self.rbox1[:, 0:4] * 1024
+        poly_rbox2[:, 0:4] = self.rbox2[:, 0:4] * 1024
+        expect_t = rbox_overlaps(poly_rbox1, poly_rbox2, use_cv2=False)
+        self.assertAllClose(
+            actual_t,
+            expect_t,
+            msg="rbox_iou has diff at {} \nExpect {}\nBut got {}".format(
+                str(place), str(expect_t), str(actual_t)))
+
+    def test_output(self):
+        places = self.get_places()
+        for place in places:
+            self.check_output(place)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__init__.py b/PaddleDetection-release-2.6/ppdet/metrics/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..3e1b83cca4425ae82c8af14096a1e91bd7e0503d
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/metrics/__init__.py
@@ -0,0 +1,30 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import metrics
+from . import keypoint_metrics
+
+from .metrics import *
+from .keypoint_metrics import *
+from .pose3d_metrics import *
+
+__all__ = metrics.__all__ + keypoint_metrics.__all__
+
+from . import mot_metrics
+from .mot_metrics import *
+__all__ = __all__ + mot_metrics.__all__
+
+from . import mcmot_metrics
+from .mcmot_metrics import *
+__all__ = __all__ + mcmot_metrics.__all__
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6126105d03e214b9604ea6a068373d6aea48aa7d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/coco_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/coco_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7da64aa243376a7f91e0dd132d79993348ad9841 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/coco_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/json_results.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/json_results.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fbe53dfa44e0e5b8a337c12839d7465db05a1e5e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/json_results.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/keypoint_metrics.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/keypoint_metrics.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5f84e7caa33113608af5b3679a5be0c1e5fe030c Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/keypoint_metrics.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/map_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/map_utils.cpython-37.pyc new file mode 100644 index
0000000000000000000000000000000000000000..6fd40a477661bdc11c1b9b4e5ae6a20e29575f78 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/map_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/mcmot_metrics.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/mcmot_metrics.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2073268232c96a6022fe9fdaad02efbc3e2fe633 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/mcmot_metrics.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/metrics.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/metrics.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fbbaa7b50b5f1246bd8e8c74915125b97479b130 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/metrics.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/mot_metrics.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/mot_metrics.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ae20d0439988a1474a16a68bb50473274d669ff5 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/mot_metrics.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/munkres.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/munkres.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..209f532aadb441e54aa00d821e4551a2b41dadd1 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/munkres.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/pose3d_metrics.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/pose3d_metrics.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4805215d90128e02047f81c518d1903b023c12d9 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/pose3d_metrics.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/widerface_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/widerface_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..081e3b0e3608e3df7fecec74eb517440dfae6b55 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/metrics/__pycache__/widerface_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/metrics/coco_utils.py b/PaddleDetection-release-2.6/ppdet/metrics/coco_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b7a4d7e323064c56f1c5c8158983d18f70be236b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/coco_utils.py @@ -0,0 +1,188 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+import numpy as np
+import itertools
+
+from ppdet.metrics.json_results import get_det_res, get_det_poly_res, get_seg_res, get_solov2_segm_res, get_keypoint_res, get_pose3d_res
+from ppdet.metrics.map_utils import draw_pr_curve
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+
+def get_infer_results(outs, catid, bias=0):
+    """
+    Get results at the inference stage.
+    The output format is a dictionary containing the bbox or mask results.
+
+    For example, the bbox result is a list and each element contains
+    image_id, category_id, bbox and score.
+    """
+    if outs is None or len(outs) == 0:
+        raise ValueError(
+            'The number of valid detection results is zero. Please use a reasonable model and check the input data.'
+        )
+
+    im_id = outs['im_id']
+
+    infer_res = {}
+    if 'bbox' in outs:
+        if len(outs['bbox']) > 0 and len(outs['bbox'][0]) > 6:
+            infer_res['bbox'] = get_det_poly_res(
+                outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
+        else:
+            infer_res['bbox'] = get_det_res(
+                outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
+
+    if 'mask' in outs:
+        # mask post process
+        infer_res['mask'] = get_seg_res(outs['mask'], outs['bbox'],
+                                        outs['bbox_num'], im_id, catid)
+
+    if 'segm' in outs:
+        infer_res['segm'] = get_solov2_segm_res(outs, im_id, catid)
+
+    if 'keypoint' in outs:
+        infer_res['keypoint'] = get_keypoint_res(outs, im_id)
+        outs['bbox_num'] = [len(infer_res['keypoint'])]
+
+    if 'pose3d' in outs:
+        infer_res['pose3d'] = get_pose3d_res(outs, im_id)
+        outs['bbox_num'] = [len(infer_res['pose3d'])]
+
+    return infer_res
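For reference, a bbox entry produced by get_infer_results (via get_det_res, and consumed by cocoapi_eval below) follows the standard COCO detection format; a representative record, with illustrative values only:

# A representative COCO-format detection record (values are made up).
example_bbox_record = {
    'image_id': 42,        # int(im_id) of the source image
    'category_id': 3,      # label index mapped through the catid dict
    'bbox': [12.5, 40.0, 100.0, 80.0],  # [xmin, ymin, width, height]
    'score': 0.87,
}
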
+ """ + assert coco_gt != None or anno_file != None + if style == 'keypoints_crowd': + #please install xtcocotools==1.6 + from xtcocotools.coco import COCO + from xtcocotools.cocoeval import COCOeval + else: + from pycocotools.coco import COCO + from pycocotools.cocoeval import COCOeval + + if coco_gt == None: + coco_gt = COCO(anno_file) + logger.info("Start evaluate...") + coco_dt = coco_gt.loadRes(jsonfile) + if style == 'proposal': + coco_eval = COCOeval(coco_gt, coco_dt, 'bbox') + coco_eval.params.useCats = 0 + coco_eval.params.maxDets = list(max_dets) + elif style == 'keypoints_crowd': + coco_eval = COCOeval(coco_gt, coco_dt, style, sigmas, use_area) + else: + coco_eval = COCOeval(coco_gt, coco_dt, style) + coco_eval.evaluate() + coco_eval.accumulate() + coco_eval.summarize() + if classwise: + # Compute per-category AP and PR curve + try: + from terminaltables import AsciiTable + except Exception as e: + logger.error( + 'terminaltables not found, plaese install terminaltables. ' + 'for example: `pip install terminaltables`.') + raise e + precisions = coco_eval.eval['precision'] + cat_ids = coco_gt.getCatIds() + # precision: (iou, recall, cls, area range, max dets) + assert len(cat_ids) == precisions.shape[2] + results_per_category = [] + for idx, catId in enumerate(cat_ids): + # area range index 0: all area ranges + # max dets index -1: typically 100 per image + nm = coco_gt.loadCats(catId)[0] + precision = precisions[:, :, idx, 0, -1] + precision = precision[precision > -1] + if precision.size: + ap = np.mean(precision) + else: + ap = float('nan') + results_per_category.append( + (str(nm["name"]), '{:0.3f}'.format(float(ap)))) + pr_array = precisions[0, :, idx, 0, 2] + recall_array = np.arange(0.0, 1.01, 0.01) + draw_pr_curve( + pr_array, + recall_array, + out_dir=style + '_pr_curve', + file_name='{}_precision_recall_curve.jpg'.format(nm["name"])) + + num_columns = min(6, len(results_per_category) * 2) + results_flatten = list(itertools.chain(*results_per_category)) + headers = ['category', 'AP'] * (num_columns // 2) + results_2d = itertools.zip_longest( + * [results_flatten[i::num_columns] for i in range(num_columns)]) + table_data = [headers] + table_data += [result for result in results_2d] + table = AsciiTable(table_data) + logger.info('Per-category of {} AP: \n{}'.format(style, table.table)) + logger.info("per-category PR curve has output to {} folder.".format( + style + '_pr_curve')) + # flush coco evaluation result + sys.stdout.flush() + return coco_eval.stats + + +def json_eval_results(metric, json_directory, dataset): + """ + cocoapi eval with already exists proposal.json, bbox.json or mask.json + """ + assert metric == 'COCO' + anno_file = dataset.get_anno() + json_file_list = ['proposal.json', 'bbox.json', 'mask.json'] + if json_directory: + assert os.path.exists( + json_directory), "The json directory:{} does not exist".format( + json_directory) + for k, v in enumerate(json_file_list): + json_file_list[k] = os.path.join(str(json_directory), v) + + coco_eval_style = ['proposal', 'bbox', 'segm'] + for i, v_json in enumerate(json_file_list): + if os.path.exists(v_json): + cocoapi_eval(v_json, coco_eval_style[i], anno_file=anno_file) + else: + logger.info("{} not exists!".format(v_json)) diff --git a/PaddleDetection-release-2.6/ppdet/metrics/json_results.py b/PaddleDetection-release-2.6/ppdet/metrics/json_results.py new file mode 100644 index 0000000000000000000000000000000000000000..d2575af434b19a579f910496e1fc876d10b00707 --- /dev/null +++ 
b/PaddleDetection-release-2.6/ppdet/metrics/json_results.py @@ -0,0 +1,175 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import six +import numpy as np + + +def get_det_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0): + det_res = [] + k = 0 + for i in range(len(bbox_nums)): + cur_image_id = int(image_id[i][0]) + det_nums = bbox_nums[i] + for j in range(det_nums): + dt = bboxes[k] + k = k + 1 + num_id, score, xmin, ymin, xmax, ymax = dt.tolist() + if int(num_id) < 0: + continue + category_id = label_to_cat_id_map[int(num_id)] + w = xmax - xmin + bias + h = ymax - ymin + bias + bbox = [xmin, ymin, w, h] + dt_res = { + 'image_id': cur_image_id, + 'category_id': category_id, + 'bbox': bbox, + 'score': score + } + det_res.append(dt_res) + return det_res + + +def get_det_poly_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0): + det_res = [] + k = 0 + for i in range(len(bbox_nums)): + cur_image_id = int(image_id[i][0]) + det_nums = bbox_nums[i] + for j in range(det_nums): + dt = bboxes[k] + k = k + 1 + num_id, score, x1, y1, x2, y2, x3, y3, x4, y4 = dt.tolist() + if int(num_id) < 0: + continue + category_id = label_to_cat_id_map[int(num_id)] + rbox = [x1, y1, x2, y2, x3, y3, x4, y4] + dt_res = { + 'image_id': cur_image_id, + 'category_id': category_id, + 'bbox': rbox, + 'score': score + } + det_res.append(dt_res) + return det_res + + +def strip_mask(mask): + row = mask[0, 0, :] + col = mask[0, :, 0] + im_h = len(col) - np.count_nonzero(col == -1) + im_w = len(row) - np.count_nonzero(row == -1) + return mask[:, :im_h, :im_w] + + +def get_seg_res(masks, bboxes, mask_nums, image_id, label_to_cat_id_map): + import pycocotools.mask as mask_util + seg_res = [] + k = 0 + for i in range(len(mask_nums)): + cur_image_id = int(image_id[i][0]) + det_nums = mask_nums[i] + mask_i = masks[k:k + det_nums] + mask_i = strip_mask(mask_i) + for j in range(det_nums): + mask = mask_i[j].astype(np.uint8) + score = float(bboxes[k][1]) + label = int(bboxes[k][0]) + k = k + 1 + if label == -1: + continue + cat_id = label_to_cat_id_map[label] + rle = mask_util.encode( + np.array( + mask[:, :, None], order="F", dtype="uint8"))[0] + if six.PY3: + if 'counts' in rle: + rle['counts'] = rle['counts'].decode("utf8") + sg_res = { + 'image_id': cur_image_id, + 'category_id': cat_id, + 'segmentation': rle, + 'score': score + } + seg_res.append(sg_res) + return seg_res + + +def get_solov2_segm_res(results, image_id, num_id_to_cat_id_map): + import pycocotools.mask as mask_util + segm_res = [] + # for each batch + segms = results['segm'].astype(np.uint8) + clsid_labels = results['cate_label'] + clsid_scores = results['cate_score'] + lengths = segms.shape[0] + im_id = int(image_id[0][0]) + if lengths == 0 or segms is None: + return None + # for each sample + for i in range(lengths - 1): + clsid = int(clsid_labels[i]) + catid = num_id_to_cat_id_map[clsid] + score = float(clsid_scores[i]) + mask = segms[i] + segm = 
mask_util.encode(np.array(mask[:, :, np.newaxis], order='F'))[0] + segm['counts'] = segm['counts'].decode('utf8') + coco_res = { + 'image_id': im_id, + 'category_id': catid, + 'segmentation': segm, + 'score': score + } + segm_res.append(coco_res) + return segm_res + + +def get_keypoint_res(results, im_id): + anns = [] + preds = results['keypoint'] + for idx in range(im_id.shape[0]): + image_id = im_id[idx].item() + kpts, scores = preds[idx] + for kpt, score in zip(kpts, scores): + kpt = kpt.flatten() + ann = { + 'image_id': image_id, + 'category_id': 1, # XXX hard code + 'keypoints': kpt.tolist(), + 'score': float(score) + } + x = kpt[0::3] + y = kpt[1::3] + x0, x1, y0, y1 = np.min(x).item(), np.max(x).item(), np.min(y).item( + ), np.max(y).item() + ann['area'] = (x1 - x0) * (y1 - y0) + ann['bbox'] = [x0, y0, x1 - x0, y1 - y0] + anns.append(ann) + return anns + + +def get_pose3d_res(results, im_id): + anns = [] + preds = results['pose3d'] + for idx in range(im_id.shape[0]): + image_id = im_id[idx].item() + pose3d = preds[idx] + ann = { + 'image_id': image_id, + 'category_id': 1, # XXX hard code + 'pose3d': pose3d.tolist(), + 'score': float(1.) + } + anns.append(ann) + return anns diff --git a/PaddleDetection-release-2.6/ppdet/metrics/keypoint_metrics.py b/PaddleDetection-release-2.6/ppdet/metrics/keypoint_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..cbd52d02d4af1f6dd81edd0ea63a98b7ed77e614 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/keypoint_metrics.py @@ -0,0 +1,410 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import json +from collections import defaultdict, OrderedDict +import numpy as np +import paddle +from pycocotools.coco import COCO +from pycocotools.cocoeval import COCOeval +from ..modeling.keypoint_utils import oks_nms +from scipy.io import loadmat, savemat +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['KeyPointTopDownCOCOEval', 'KeyPointTopDownMPIIEval'] + + +class KeyPointTopDownCOCOEval(object): + """refer to + https://github.com/leoxiaobin/deep-high-resolution-net.pytorch + Copyright (c) Microsoft, under the MIT License. 
+ """ + + def __init__(self, + anno_file, + num_samples, + num_joints, + output_eval, + iou_type='keypoints', + in_vis_thre=0.2, + oks_thre=0.9, + save_prediction_only=False): + super(KeyPointTopDownCOCOEval, self).__init__() + self.coco = COCO(anno_file) + self.num_samples = num_samples + self.num_joints = num_joints + self.iou_type = iou_type + self.in_vis_thre = in_vis_thre + self.oks_thre = oks_thre + self.output_eval = output_eval + self.res_file = os.path.join(output_eval, "keypoints_results.json") + self.save_prediction_only = save_prediction_only + self.reset() + + def reset(self): + self.results = { + 'all_preds': np.zeros( + (self.num_samples, self.num_joints, 3), dtype=np.float32), + 'all_boxes': np.zeros((self.num_samples, 6)), + 'image_path': [] + } + self.eval_results = {} + self.idx = 0 + + def update(self, inputs, outputs): + kpts, _ = outputs['keypoint'][0] + + num_images = inputs['image'].shape[0] + self.results['all_preds'][self.idx:self.idx + num_images, :, 0: + 3] = kpts[:, :, 0:3] + self.results['all_boxes'][self.idx:self.idx + num_images, 0:2] = inputs[ + 'center'].numpy()[:, 0:2] if isinstance( + inputs['center'], paddle.Tensor) else inputs['center'][:, 0:2] + self.results['all_boxes'][self.idx:self.idx + num_images, 2:4] = inputs[ + 'scale'].numpy()[:, 0:2] if isinstance( + inputs['scale'], paddle.Tensor) else inputs['scale'][:, 0:2] + self.results['all_boxes'][self.idx:self.idx + num_images, 4] = np.prod( + inputs['scale'].numpy() * 200, + 1) if isinstance(inputs['scale'], paddle.Tensor) else np.prod( + inputs['scale'] * 200, 1) + self.results['all_boxes'][ + self.idx:self.idx + num_images, + 5] = np.squeeze(inputs['score'].numpy()) if isinstance( + inputs['score'], paddle.Tensor) else np.squeeze(inputs['score']) + if isinstance(inputs['im_id'], paddle.Tensor): + self.results['image_path'].extend(inputs['im_id'].numpy()) + else: + self.results['image_path'].extend(inputs['im_id']) + self.idx += num_images + + def _write_coco_keypoint_results(self, keypoints): + data_pack = [{ + 'cat_id': 1, + 'cls': 'person', + 'ann_type': 'keypoints', + 'keypoints': keypoints + }] + results = self._coco_keypoint_results_one_category_kernel(data_pack[0]) + if not os.path.exists(self.output_eval): + os.makedirs(self.output_eval) + with open(self.res_file, 'w') as f: + json.dump(results, f, sort_keys=True, indent=4) + logger.info(f'The keypoint result is saved to {self.res_file}.') + try: + json.load(open(self.res_file)) + except Exception: + content = [] + with open(self.res_file, 'r') as f: + for line in f: + content.append(line) + content[-1] = ']' + with open(self.res_file, 'w') as f: + for c in content: + f.write(c) + + def _coco_keypoint_results_one_category_kernel(self, data_pack): + cat_id = data_pack['cat_id'] + keypoints = data_pack['keypoints'] + cat_results = [] + + for img_kpts in keypoints: + if len(img_kpts) == 0: + continue + + _key_points = np.array( + [img_kpts[k]['keypoints'] for k in range(len(img_kpts))]) + _key_points = _key_points.reshape(_key_points.shape[0], -1) + + result = [{ + 'image_id': img_kpts[k]['image'], + 'category_id': cat_id, + 'keypoints': _key_points[k].tolist(), + 'score': img_kpts[k]['score'], + 'center': list(img_kpts[k]['center']), + 'scale': list(img_kpts[k]['scale']) + } for k in range(len(img_kpts))] + cat_results.extend(result) + + return cat_results + + def get_final_results(self, preds, all_boxes, img_path): + _kpts = [] + for idx, kpt in enumerate(preds): + _kpts.append({ + 'keypoints': kpt, + 'center': all_boxes[idx][0:2], + 
'scale': all_boxes[idx][2:4], + 'area': all_boxes[idx][4], + 'score': all_boxes[idx][5], + 'image': int(img_path[idx]) + }) + # image x person x (keypoints) + kpts = defaultdict(list) + for kpt in _kpts: + kpts[kpt['image']].append(kpt) + + # rescoring and oks nms + num_joints = preds.shape[1] + in_vis_thre = self.in_vis_thre + oks_thre = self.oks_thre + oks_nmsed_kpts = [] + for img in kpts.keys(): + img_kpts = kpts[img] + for n_p in img_kpts: + box_score = n_p['score'] + kpt_score = 0 + valid_num = 0 + for n_jt in range(0, num_joints): + t_s = n_p['keypoints'][n_jt][2] + if t_s > in_vis_thre: + kpt_score = kpt_score + t_s + valid_num = valid_num + 1 + if valid_num != 0: + kpt_score = kpt_score / valid_num + # rescoring + n_p['score'] = kpt_score * box_score + + keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))], + oks_thre) + + if len(keep) == 0: + oks_nmsed_kpts.append(img_kpts) + else: + oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep]) + + self._write_coco_keypoint_results(oks_nmsed_kpts) + + def accumulate(self): + self.get_final_results(self.results['all_preds'], + self.results['all_boxes'], + self.results['image_path']) + if self.save_prediction_only: + logger.info(f'The keypoint result is saved to {self.res_file} ' + 'and do not evaluate the mAP.') + return + coco_dt = self.coco.loadRes(self.res_file) + coco_eval = COCOeval(self.coco, coco_dt, 'keypoints') + coco_eval.params.useSegm = None + coco_eval.evaluate() + coco_eval.accumulate() + coco_eval.summarize() + + keypoint_stats = [] + for ind in range(len(coco_eval.stats)): + keypoint_stats.append((coco_eval.stats[ind])) + self.eval_results['keypoint'] = keypoint_stats + + def log(self): + if self.save_prediction_only: + return + stats_names = [ + 'AP', 'Ap .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5', + 'AR .75', 'AR (M)', 'AR (L)' + ] + num_values = len(stats_names) + print(' '.join(['| {}'.format(name) for name in stats_names]) + ' |') + print('|---' * (num_values + 1) + '|') + + print(' '.join([ + '| {:.3f}'.format(value) for value in self.eval_results['keypoint'] + ]) + ' |') + + def get_results(self): + return self.eval_results + + +class KeyPointTopDownMPIIEval(object): + def __init__(self, + anno_file, + num_samples, + num_joints, + output_eval, + oks_thre=0.9, + save_prediction_only=False): + super(KeyPointTopDownMPIIEval, self).__init__() + self.ann_file = anno_file + self.res_file = os.path.join(output_eval, "keypoints_results.json") + self.save_prediction_only = save_prediction_only + self.reset() + + def reset(self): + self.results = [] + self.eval_results = {} + self.idx = 0 + + def update(self, inputs, outputs): + kpts, _ = outputs['keypoint'][0] + + num_images = inputs['image'].shape[0] + results = {} + results['preds'] = kpts[:, :, 0:3] + results['boxes'] = np.zeros((num_images, 6)) + results['boxes'][:, 0:2] = inputs['center'].numpy()[:, 0:2] + results['boxes'][:, 2:4] = inputs['scale'].numpy()[:, 0:2] + results['boxes'][:, 4] = np.prod(inputs['scale'].numpy() * 200, 1) + results['boxes'][:, 5] = np.squeeze(inputs['score'].numpy()) + results['image_path'] = inputs['image_file'] + + self.results.append(results) + + def accumulate(self): + self._mpii_keypoint_results_save() + if self.save_prediction_only: + logger.info(f'The keypoint result is saved to {self.res_file} ' + 'and do not evaluate the mAP.') + return + + self.eval_results = self.evaluate(self.results) + + def _mpii_keypoint_results_save(self): + results = [] + for res in self.results: + if len(res) == 0: + continue + result = [{ + 
'preds': res['preds'][k].tolist(), + 'boxes': res['boxes'][k].tolist(), + 'image_path': res['image_path'][k], + } for k in range(len(res))] + results.extend(result) + with open(self.res_file, 'w') as f: + json.dump(results, f, sort_keys=True, indent=4) + logger.info(f'The keypoint result is saved to {self.res_file}.') + + def log(self): + if self.save_prediction_only: + return + for item, value in self.eval_results.items(): + print("{} : {}".format(item, value)) + + def get_results(self): + return self.eval_results + + def evaluate(self, outputs, savepath=None): + """Evaluate PCKh for MPII dataset. refer to + https://github.com/leoxiaobin/deep-high-resolution-net.pytorch + Copyright (c) Microsoft, under the MIT License. + + Args: + outputs(list(preds, boxes)): + + * preds (np.ndarray[N,K,3]): The first two dimensions are + coordinates, score is the third dimension of the array. + * boxes (np.ndarray[N,6]): [center[0], center[1], scale[0] + , scale[1],area, score] + + Returns: + dict: PCKh for each joint + """ + + kpts = [] + for output in outputs: + preds = output['preds'] + batch_size = preds.shape[0] + for i in range(batch_size): + kpts.append({'keypoints': preds[i]}) + + preds = np.stack([kpt['keypoints'] for kpt in kpts]) + + # convert 0-based index to 1-based index, + # and get the first two dimensions. + preds = preds[..., :2] + 1.0 + + if savepath is not None: + pred_file = os.path.join(savepath, 'pred.mat') + savemat(pred_file, mdict={'preds': preds}) + + SC_BIAS = 0.6 + threshold = 0.5 + + gt_file = os.path.join( + os.path.dirname(self.ann_file), 'mpii_gt_val.mat') + gt_dict = loadmat(gt_file) + dataset_joints = gt_dict['dataset_joints'] + jnt_missing = gt_dict['jnt_missing'] + pos_gt_src = gt_dict['pos_gt_src'] + headboxes_src = gt_dict['headboxes_src'] + + pos_pred_src = np.transpose(preds, [1, 2, 0]) + + head = np.where(dataset_joints == 'head')[1][0] + lsho = np.where(dataset_joints == 'lsho')[1][0] + lelb = np.where(dataset_joints == 'lelb')[1][0] + lwri = np.where(dataset_joints == 'lwri')[1][0] + lhip = np.where(dataset_joints == 'lhip')[1][0] + lkne = np.where(dataset_joints == 'lkne')[1][0] + lank = np.where(dataset_joints == 'lank')[1][0] + + rsho = np.where(dataset_joints == 'rsho')[1][0] + relb = np.where(dataset_joints == 'relb')[1][0] + rwri = np.where(dataset_joints == 'rwri')[1][0] + rkne = np.where(dataset_joints == 'rkne')[1][0] + rank = np.where(dataset_joints == 'rank')[1][0] + rhip = np.where(dataset_joints == 'rhip')[1][0] + + jnt_visible = 1 - jnt_missing + uv_error = pos_pred_src - pos_gt_src + uv_err = np.linalg.norm(uv_error, axis=1) + headsizes = headboxes_src[1, :, :] - headboxes_src[0, :, :] + headsizes = np.linalg.norm(headsizes, axis=0) + headsizes *= SC_BIAS + scale = headsizes * np.ones((len(uv_err), 1), dtype=np.float32) + scaled_uv_err = uv_err / scale + scaled_uv_err = scaled_uv_err * jnt_visible + jnt_count = np.sum(jnt_visible, axis=1) + less_than_threshold = (scaled_uv_err <= threshold) * jnt_visible + PCKh = 100. * np.sum(less_than_threshold, axis=1) / jnt_count + + # save + rng = np.arange(0, 0.5 + 0.01, 0.01) + pckAll = np.zeros((len(rng), 16), dtype=np.float32) + + for r, threshold in enumerate(rng): + less_than_threshold = (scaled_uv_err <= threshold) * jnt_visible + pckAll[r, :] = 100. 
* np.sum(less_than_threshold, axis=1) / jnt_count
+
+        PCKh = np.ma.array(PCKh, mask=False)
+        PCKh.mask[6:8] = True
+
+        jnt_count = np.ma.array(jnt_count, mask=False)
+        jnt_count.mask[6:8] = True
+        jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64)
+
+        name_value = [  # noqa
+            ('Head', PCKh[head]),
+            ('Shoulder', 0.5 * (PCKh[lsho] + PCKh[rsho])),
+            ('Elbow', 0.5 * (PCKh[lelb] + PCKh[relb])),
+            ('Wrist', 0.5 * (PCKh[lwri] + PCKh[rwri])),
+            ('Hip', 0.5 * (PCKh[lhip] + PCKh[rhip])),
+            ('Knee', 0.5 * (PCKh[lkne] + PCKh[rkne])),
+            ('Ankle', 0.5 * (PCKh[lank] + PCKh[rank])),
+            ('PCKh', np.sum(PCKh * jnt_ratio)),
+            ('PCKh@0.1', np.sum(pckAll[11, :] * jnt_ratio))
+        ]
+        name_value = OrderedDict(name_value)
+
+        return name_value
+
+    def _sort_and_unique_bboxes(self, kpts, key='bbox_id'):
+        """Sort kpts and remove the repeated ones."""
+        kpts = sorted(kpts, key=lambda x: x[key])
+        num = len(kpts)
+        for i in range(num - 1, 0, -1):
+            if kpts[i][key] == kpts[i - 1][key]:
+                del kpts[i]
+
+        return kpts
diff --git a/PaddleDetection-release-2.6/ppdet/metrics/map_utils.py b/PaddleDetection-release-2.6/ppdet/metrics/map_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..57f12d9e2d2c2f4001de5eae3477fdabb2a94744 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/map_utils.py
@@ -0,0 +1,436 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import os
+import sys
+import numpy as np
+import itertools
+import paddle
+from ppdet.modeling.rbox_utils import poly2rbox_np
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+__all__ = [
+    'draw_pr_curve',
+    'bbox_area',
+    'jaccard_overlap',
+    'prune_zero_padding',
+    'DetectionMAP',
+    'ap_per_class',
+    'compute_ap',
+]
+
+
+def draw_pr_curve(precision,
+                  recall,
+                  iou=0.5,
+                  out_dir='pr_curve',
+                  file_name='precision_recall_curve.jpg'):
+    if not os.path.exists(out_dir):
+        os.makedirs(out_dir)
+    output_path = os.path.join(out_dir, file_name)
+    try:
+        import matplotlib.pyplot as plt
+    except Exception as e:
+        logger.error('Matplotlib not found, please install matplotlib. '
+                     'for example: `pip install matplotlib`.')
+        raise e
+    plt.cla()
+    plt.figure('P-R Curve')
+    plt.title('Precision/Recall Curve(IoU={})'.format(iou))
+    plt.xlabel('Recall')
+    plt.ylabel('Precision')
+    plt.grid(True)
+    plt.plot(recall, precision)
+    plt.savefig(output_path)
+
+
+def bbox_area(bbox, is_bbox_normalized):
+    """
+    Calculate area of a bounding box
+    """
+    norm = 1. - float(is_bbox_normalized)
+    width = bbox[2] - bbox[0] + norm
+    height = bbox[3] - bbox[1] + norm
+    return width * height
+
+
+def jaccard_overlap(pred, gt, is_bbox_normalized=False):
+    """
+    Calculate jaccard overlap ratio between two bounding boxes
+    """
+    if pred[0] >= gt[2] or pred[2] <= gt[0] or \
+        pred[1] >= gt[3] or pred[3] <= gt[1]:
+        return 0.
+    inter_xmin = max(pred[0], gt[0])
+    inter_ymin = max(pred[1], gt[1])
+    inter_xmax = min(pred[2], gt[2])
+    inter_ymax = min(pred[3], gt[3])
+    inter_size = bbox_area([inter_xmin, inter_ymin, inter_xmax, inter_ymax],
+                           is_bbox_normalized)
+    pred_size = bbox_area(pred, is_bbox_normalized)
+    gt_size = bbox_area(gt, is_bbox_normalized)
+    overlap = float(inter_size) / (pred_size + gt_size - inter_size)
+    return overlap
+
+
+def calc_rbox_iou(pred, gt_poly):
+    """
+    Calculate IoU between two rotated boxes
+    """
+    # calc iou of bounding box for speedup
+    pred = np.array(pred, np.float32).reshape(-1, 2)
+    gt_poly = np.array(gt_poly, np.float32).reshape(-1, 2)
+    pred_rect = [
+        np.min(pred[:, 0]), np.min(pred[:, 1]), np.max(pred[:, 0]),
+        np.max(pred[:, 1])
+    ]
+    gt_rect = [
+        np.min(gt_poly[:, 0]), np.min(gt_poly[:, 1]), np.max(gt_poly[:, 0]),
+        np.max(gt_poly[:, 1])
+    ]
+    iou = jaccard_overlap(pred_rect, gt_rect, False)
+
+    if iou <= 0:
+        return iou
+
+    # calc rbox iou
+    pred_rbox = poly2rbox_np(pred.reshape(-1, 8)).reshape(-1, 5)
+    gt_rbox = poly2rbox_np(gt_poly.reshape(-1, 8)).reshape(-1, 5)
+    try:
+        from ext_op import rbox_iou
+    except Exception as e:
+        print("Failed to import rbox_iou from ext_op; please install ext_op "
+              "following ppdet/ext_op/README.md", e)
+        sys.stdout.flush()
+        sys.exit(-1)
+    pd_gt_rbox = paddle.to_tensor(gt_rbox, dtype='float32')
+    pd_pred_rbox = paddle.to_tensor(pred_rbox, dtype='float32')
+    iou = rbox_iou(pd_gt_rbox, pd_pred_rbox)
+    iou = iou.numpy()
+    return iou[0][0]
+
+
+def prune_zero_padding(gt_box, gt_label, difficult=None):
+    valid_cnt = 0
+    for i in range(len(gt_box)):
+        if (gt_box[i] == 0).all():
+            break
+        valid_cnt += 1
+    return (gt_box[:valid_cnt], gt_label[:valid_cnt], difficult[:valid_cnt]
+            if difficult is not None else None)
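As a quick hand check of jaccard_overlap above (on which the DetectionMAP class below relies): with is_bbox_normalized=False, bbox_area uses the +1 pixel convention on each side, so two 3x3 boxes sharing a 2x3 strip give IoU 0.5:

from ppdet.metrics.map_utils import jaccard_overlap

pred = [0., 0., 2., 2.]
gt = [1., 0., 3., 2.]
# areas are 3*3 = 9 each, intersection is 2*3 = 6,
# so IoU = 6 / (9 + 9 - 6) = 0.5
assert abs(jaccard_overlap(pred, gt, False) - 0.5) < 1e-6
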
+ """ + + def __init__(self, + class_num, + overlap_thresh=0.5, + map_type='11point', + is_bbox_normalized=False, + evaluate_difficult=False, + catid2name=None, + classwise=False): + self.class_num = class_num + self.overlap_thresh = overlap_thresh + assert map_type in ['11point', 'integral'], \ + "map_type currently only support '11point' "\ + "and 'integral'" + self.map_type = map_type + self.is_bbox_normalized = is_bbox_normalized + self.evaluate_difficult = evaluate_difficult + self.classwise = classwise + self.classes = [] + for cname in catid2name.values(): + self.classes.append(cname) + self.reset() + + def update(self, bbox, score, label, gt_box, gt_label, difficult=None): + """ + Update metric statics from given prediction and ground + truth infomations. + """ + if difficult is None: + difficult = np.zeros_like(gt_label) + + # record class gt count + for gtl, diff in zip(gt_label, difficult): + if self.evaluate_difficult or int(diff) == 0: + self.class_gt_counts[int(np.array(gtl))] += 1 + + # record class score positive + visited = [False] * len(gt_label) + for b, s, l in zip(bbox, score, label): + pred = b.tolist() if isinstance(b, np.ndarray) else b + max_idx = -1 + max_overlap = -1.0 + for i, gl in enumerate(gt_label): + if int(gl) == int(l): + if len(gt_box[i]) == 8: + overlap = calc_rbox_iou(pred, gt_box[i]) + else: + overlap = jaccard_overlap(pred, gt_box[i], + self.is_bbox_normalized) + if overlap > max_overlap: + max_overlap = overlap + max_idx = i + + if max_overlap > self.overlap_thresh: + if self.evaluate_difficult or \ + int(np.array(difficult[max_idx])) == 0: + if not visited[max_idx]: + self.class_score_poss[int(l)].append([s, 1.0]) + visited[max_idx] = True + else: + self.class_score_poss[int(l)].append([s, 0.0]) + else: + self.class_score_poss[int(l)].append([s, 0.0]) + + def reset(self): + """ + Reset metric statics + """ + self.class_score_poss = [[] for _ in range(self.class_num)] + self.class_gt_counts = [0] * self.class_num + self.mAP = 0.0 + + def accumulate(self): + """ + Accumulate metric results and calculate mAP + """ + mAP = 0. + valid_cnt = 0 + eval_results = [] + for score_pos, count in zip(self.class_score_poss, + self.class_gt_counts): + if count == 0: continue + if len(score_pos) == 0: + valid_cnt += 1 + continue + + accum_tp_list, accum_fp_list = \ + self._get_tp_fp_accum(score_pos) + precision = [] + recall = [] + for ac_tp, ac_fp in zip(accum_tp_list, accum_fp_list): + precision.append(float(ac_tp) / (ac_tp + ac_fp)) + recall.append(float(ac_tp) / count) + + one_class_ap = 0.0 + if self.map_type == '11point': + max_precisions = [0.] * 11 + start_idx = len(precision) - 1 + for j in range(10, -1, -1): + for i in range(start_idx, -1, -1): + if recall[i] < float(j) / 10.: + start_idx = i + if j > 0: + max_precisions[j - 1] = max_precisions[j] + break + else: + if max_precisions[j] < precision[i]: + max_precisions[j] = precision[i] + one_class_ap = sum(max_precisions) / 11. + mAP += one_class_ap + valid_cnt += 1 + elif self.map_type == 'integral': + import math + prev_recall = 0. 
+                for i in range(len(precision)):
+                    recall_gap = math.fabs(recall[i] - prev_recall)
+                    if recall_gap > 1e-6:
+                        one_class_ap += precision[i] * recall_gap
+                        prev_recall = recall[i]
+                mAP += one_class_ap
+                valid_cnt += 1
+            else:
+                logger.error("Unsupported mAP type {}".format(self.map_type))
+                sys.exit(1)
+            eval_results.append({
+                'class': self.classes[valid_cnt - 1],
+                'ap': one_class_ap,
+                'precision': precision,
+                'recall': recall,
+            })
+        self.eval_results = eval_results
+        self.mAP = mAP / float(valid_cnt) if valid_cnt > 0 else mAP
+
+    def get_map(self):
+        """
+        Get mAP result
+        """
+        if self.mAP is None:
+            logger.error("mAP is not calculated.")
+        if self.classwise:
+            # Compute per-category AP and draw P-R curves
+            try:
+                from terminaltables import AsciiTable
+            except Exception as e:
+                logger.error(
+                    'terminaltables not found, please install terminaltables, '
+                    'for example: `pip install terminaltables`.')
+                raise e
+            results_per_category = []
+            for eval_result in self.eval_results:
+                results_per_category.append(
+                    (str(eval_result['class']),
+                     '{:0.3f}'.format(float(eval_result['ap']))))
+                draw_pr_curve(
+                    eval_result['precision'],
+                    eval_result['recall'],
+                    out_dir='voc_pr_curve',
+                    file_name='{}_precision_recall_curve.jpg'.format(
+                        eval_result['class']))
+
+            num_columns = min(6, len(results_per_category) * 2)
+            results_flatten = list(itertools.chain(*results_per_category))
+            headers = ['category', 'AP'] * (num_columns // 2)
+            results_2d = itertools.zip_longest(*[
+                results_flatten[i::num_columns] for i in range(num_columns)
+            ])
+            table_data = [headers]
+            table_data += [result for result in results_2d]
+            table = AsciiTable(table_data)
+            logger.info('Per-category VOC AP: \n{}'.format(table.table))
+            logger.info(
+                "Per-category PR curves have been saved to the voc_pr_curve folder.")
+        return self.mAP
+
+    def _get_tp_fp_accum(self, score_pos_list):
+        """
+        Calculate accumulated true/false positive results from
+        [score, pos] records
+        """
+        sorted_list = sorted(score_pos_list, key=lambda s: s[0], reverse=True)
+        accum_tp = 0
+        accum_fp = 0
+        accum_tp_list = []
+        accum_fp_list = []
+        for (score, pos) in sorted_list:
+            accum_tp += int(pos)
+            accum_tp_list.append(accum_tp)
+            accum_fp += 1 - int(pos)
+            accum_fp_list.append(accum_fp)
+        return accum_tp_list, accum_fp_list
+
+
+def ap_per_class(tp, conf, pred_cls, target_cls):
+    """
+    Computes the average precision, given the recall and precision curves.
+    Method originally from https://github.com/rafaelpadilla/Object-Detection-Metrics.
+
+    Args:
+        tp (list): True positives.
+        conf (list): Objectness value from 0-1.
+        pred_cls (list): Predicted object classes.
+        target_cls (list): Target object classes.
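+
+    Returns:
+        A tuple of numpy arrays `(ap, unique_classes, r, p)`: per-class
+        average precision, the class ids (int32) they correspond to, and
+        the final recall and precision reached for each class.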
+ """ + tp, conf, pred_cls, target_cls = np.array(tp), np.array(conf), np.array( + pred_cls), np.array(target_cls) + + # Sort by objectness + i = np.argsort(-conf) + tp, conf, pred_cls = tp[i], conf[i], pred_cls[i] + + # Find unique classes + unique_classes = np.unique(np.concatenate((pred_cls, target_cls), 0)) + + # Create Precision-Recall curve and compute AP for each class + ap, p, r = [], [], [] + for c in unique_classes: + i = pred_cls == c + n_gt = sum(target_cls == c) # Number of ground truth objects + n_p = sum(i) # Number of predicted objects + + if (n_p == 0) and (n_gt == 0): + continue + elif (n_p == 0) or (n_gt == 0): + ap.append(0) + r.append(0) + p.append(0) + else: + # Accumulate FPs and TPs + fpc = np.cumsum(1 - tp[i]) + tpc = np.cumsum(tp[i]) + + # Recall + recall_curve = tpc / (n_gt + 1e-16) + r.append(tpc[-1] / (n_gt + 1e-16)) + + # Precision + precision_curve = tpc / (tpc + fpc) + p.append(tpc[-1] / (tpc[-1] + fpc[-1])) + + # AP from recall-precision curve + ap.append(compute_ap(recall_curve, precision_curve)) + + return np.array(ap), unique_classes.astype('int32'), np.array(r), np.array( + p) + + +def compute_ap(recall, precision): + """ + Computes the average precision, given the recall and precision curves. + Code originally from https://github.com/rbgirshick/py-faster-rcnn. + + Args: + recall (list): The recall curve. + precision (list): The precision curve. + + Returns: + The average precision as computed in py-faster-rcnn. + """ + # correct AP calculation + # first append sentinel values at the end + mrec = np.concatenate(([0.], recall, [1.])) + mpre = np.concatenate(([0.], precision, [0.])) + + # compute the precision envelope + for i in range(mpre.size - 1, 0, -1): + mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) + + # to calculate area under PR curve, look for points + # where X axis (recall) changes value + i = np.where(mrec[1:] != mrec[:-1])[0] + + # and sum (\Delta recall) * prec + ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) + return ap diff --git a/PaddleDetection-release-2.6/ppdet/metrics/mcmot_metrics.py b/PaddleDetection-release-2.6/ppdet/metrics/mcmot_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..c9b5ef7506e92adcfe58d5dd1f2f2cad0d9d9e70 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/mcmot_metrics.py @@ -0,0 +1,473 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
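As a quick, hand-worked sanity check of `compute_ap` above (the import path assumes this file lands at `ppdet/metrics/map_utils.py`, which is where the modules below import it from):

```python
from ppdet.metrics.map_utils import compute_ap

# Two PR points: precision 1.0 at recall 0.5, then precision 0.5 at recall 1.0.
# The precision envelope keeps 1.0 up to recall 0.5 and 0.5 up to recall 1.0,
# so AP = 0.5 * 1.0 + 0.5 * 0.5 = 0.75.
print(compute_ap(recall=[0.5, 1.0], precision=[1.0, 0.5]))  # 0.75
```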
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import copy +import sys +import math +from collections import defaultdict + +import numpy as np +import pandas as pd + +from .metrics import Metric +try: + import motmetrics as mm + from motmetrics.math_util import quiet_divide + metrics = mm.metrics.motchallenge_metrics + mh = mm.metrics.create() +except: + print( + 'Warning: Unable to use MCMOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + pass +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['MCMOTEvaluator', 'MCMOTMetric'] + +METRICS_LIST = [ + 'num_frames', 'num_matches', 'num_switches', 'num_transfer', 'num_ascend', + 'num_migrate', 'num_false_positives', 'num_misses', 'num_detections', + 'num_objects', 'num_predictions', 'num_unique_objects', 'mostly_tracked', + 'partially_tracked', 'mostly_lost', 'num_fragmentations', 'motp', 'mota', + 'precision', 'recall', 'idfp', 'idfn', 'idtp', 'idp', 'idr', 'idf1' +] + +NAME_MAP = { + 'num_frames': 'num_frames', + 'num_matches': 'num_matches', + 'num_switches': 'IDs', + 'num_transfer': 'IDt', + 'num_ascend': 'IDa', + 'num_migrate': 'IDm', + 'num_false_positives': 'FP', + 'num_misses': 'FN', + 'num_detections': 'num_detections', + 'num_objects': 'num_objects', + 'num_predictions': 'num_predictions', + 'num_unique_objects': 'GT', + 'mostly_tracked': 'MT', + 'partially_tracked': 'partially_tracked', + 'mostly_lost': 'ML', + 'num_fragmentations': 'FM', + 'motp': 'MOTP', + 'mota': 'MOTA', + 'precision': 'Prcn', + 'recall': 'Rcll', + 'idfp': 'idfp', + 'idfn': 'idfn', + 'idtp': 'idtp', + 'idp': 'IDP', + 'idr': 'IDR', + 'idf1': 'IDF1' +} + + +def parse_accs_metrics(seq_acc, index_name, verbose=False): + """ + Parse the evaluation indicators of multiple MOTAccumulator + """ + mh = mm.metrics.create() + summary = MCMOTEvaluator.get_summary(seq_acc, index_name, METRICS_LIST) + summary.loc['OVERALL', 'motp'] = (summary['motp'] * summary['num_detections']).sum() / \ + summary.loc['OVERALL', 'num_detections'] + if verbose: + strsummary = mm.io.render_summary( + summary, formatters=mh.formatters, namemap=NAME_MAP) + print(strsummary) + + return summary + + +def seqs_overall_metrics(summary_df, verbose=False): + """ + Calculate overall metrics for multiple sequences + """ + add_col = [ + 'num_frames', 'num_matches', 'num_switches', 'num_transfer', + 'num_ascend', 'num_migrate', 'num_false_positives', 'num_misses', + 'num_detections', 'num_objects', 'num_predictions', + 'num_unique_objects', 'mostly_tracked', 'partially_tracked', + 'mostly_lost', 'num_fragmentations', 'idfp', 'idfn', 'idtp' + ] + calc_col = ['motp', 'mota', 'precision', 'recall', 'idp', 'idr', 'idf1'] + calc_df = summary_df.copy() + + overall_dic = {} + for col in add_col: + overall_dic[col] = calc_df[col].sum() + + for col in calc_col: + overall_dic[col] = getattr(MCMOTMetricOverall, col + '_overall')( + calc_df, overall_dic) + + overall_df = pd.DataFrame(overall_dic, index=['overall_calc']) + calc_df = pd.concat([calc_df, overall_df]) + + if verbose: + mh = mm.metrics.create() + str_calc_df = mm.io.render_summary( + calc_df, formatters=mh.formatters, namemap=NAME_MAP) + print(str_calc_df) + + return calc_df + + +class MCMOTMetricOverall(object): + def motp_overall(summary_df, overall_dic): + motp = quiet_divide((summary_df['motp'] * + summary_df['num_detections']).sum(), + 
overall_dic['num_detections']) + return motp + + def mota_overall(summary_df, overall_dic): + del summary_df + mota = 1. - quiet_divide( + (overall_dic['num_misses'] + overall_dic['num_switches'] + + overall_dic['num_false_positives']), overall_dic['num_objects']) + return mota + + def precision_overall(summary_df, overall_dic): + del summary_df + precision = quiet_divide(overall_dic['num_detections'], ( + overall_dic['num_false_positives'] + overall_dic['num_detections'])) + return precision + + def recall_overall(summary_df, overall_dic): + del summary_df + recall = quiet_divide(overall_dic['num_detections'], + overall_dic['num_objects']) + return recall + + def idp_overall(summary_df, overall_dic): + del summary_df + idp = quiet_divide(overall_dic['idtp'], + (overall_dic['idtp'] + overall_dic['idfp'])) + return idp + + def idr_overall(summary_df, overall_dic): + del summary_df + idr = quiet_divide(overall_dic['idtp'], + (overall_dic['idtp'] + overall_dic['idfn'])) + return idr + + def idf1_overall(summary_df, overall_dic): + del summary_df + idf1 = quiet_divide(2. * overall_dic['idtp'], ( + overall_dic['num_objects'] + overall_dic['num_predictions'])) + return idf1 + + +def read_mcmot_results_union(filename, is_gt, is_ignore): + results_dict = dict() + if os.path.isfile(filename): + all_result = np.loadtxt(filename, delimiter=',') + if all_result.shape[0] == 0 or all_result.shape[1] < 7: + return results_dict + if is_ignore: + return results_dict + if is_gt: + # only for test use + all_result = all_result[all_result[:, 7] != 0] + all_result[:, 7] = all_result[:, 7] - 1 + + if all_result.shape[0] == 0: + return results_dict + + class_unique = np.unique(all_result[:, 7]) + + last_max_id = 0 + result_cls_list = [] + for cls in class_unique: + result_cls_split = all_result[all_result[:, 7] == cls] + result_cls_split[:, 1] = result_cls_split[:, 1] + last_max_id + # make sure track id different between every category + last_max_id = max(np.unique(result_cls_split[:, 1])) + 1 + result_cls_list.append(result_cls_split) + + results_con = np.concatenate(result_cls_list) + + for line in range(len(results_con)): + linelist = results_con[line] + fid = int(linelist[0]) + if fid < 1: + continue + results_dict.setdefault(fid, list()) + + if is_gt: + score = 1 + else: + score = float(linelist[6]) + + tlwh = tuple(map(float, linelist[2:6])) + target_id = int(linelist[1]) + cls = int(linelist[7]) + + results_dict[fid].append((tlwh, target_id, cls, score)) + + return results_dict + + +def read_mcmot_results(filename, is_gt, is_ignore): + results_dict = dict() + if os.path.isfile(filename): + with open(filename, 'r') as f: + for line in f.readlines(): + linelist = line.strip().split(',') + if len(linelist) < 7: + continue + fid = int(linelist[0]) + if fid < 1: + continue + cid = int(linelist[7]) + if is_gt: + score = 1 + # only for test use + cid -= 1 + else: + score = float(linelist[6]) + + cls_result_dict = results_dict.setdefault(cid, dict()) + cls_result_dict.setdefault(fid, list()) + + tlwh = tuple(map(float, linelist[2:6])) + target_id = int(linelist[1]) + cls_result_dict[fid].append((tlwh, target_id, score)) + return results_dict + + +def read_results(filename, + data_type, + is_gt=False, + is_ignore=False, + multi_class=False, + union=False): + if data_type in ['mcmot', 'lab']: + if multi_class: + if union: + # The results are evaluated by union all the categories. + # Track IDs between different categories cannot be duplicate. 
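+                # (read_mcmot_results_union enforces this by offsetting each
+                # category's track ids with the running maximum id.)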
+                read_fun = read_mcmot_results_union
+            else:
+                # The results are evaluated separately by category.
+                read_fun = read_mcmot_results
+        else:
+            raise ValueError('multi_class: {}, MCMOT should have cls_id.'.
+                             format(multi_class))
+    else:
+        raise ValueError('Unknown data type: {}'.format(data_type))
+
+    return read_fun(filename, is_gt, is_ignore)
+
+
+def unzip_objs(objs):
+    if len(objs) > 0:
+        tlwhs, ids, scores = zip(*objs)
+    else:
+        tlwhs, ids, scores = [], [], []
+    tlwhs = np.asarray(tlwhs, dtype=float).reshape(-1, 4)
+    return tlwhs, ids, scores
+
+
+def unzip_objs_cls(objs):
+    if len(objs) > 0:
+        tlwhs, ids, cls, scores = zip(*objs)
+    else:
+        tlwhs, ids, cls, scores = [], [], [], []
+    tlwhs = np.asarray(tlwhs, dtype=float).reshape(-1, 4)
+    ids = np.array(ids)
+    cls = np.array(cls)
+    scores = np.array(scores)
+    return tlwhs, ids, cls, scores
+
+
+class MCMOTEvaluator(object):
+    def __init__(self, data_root, seq_name, data_type, num_classes):
+        self.data_root = data_root
+        self.seq_name = seq_name
+        self.data_type = data_type
+        self.num_classes = num_classes
+
+        self.load_annotations()
+        try:
+            import motmetrics as mm
+            mm.lap.default_solver = 'lap'
+        except Exception as e:
+            raise RuntimeError(
+                'Unable to use MCMOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics'
+            )
+        self.reset_accumulator()
+
+        self.class_accs = []
+
+    def load_annotations(self):
+        assert self.data_type == 'mcmot'
+        self.gt_filename = os.path.join(self.data_root, '../', 'sequences',
+                                        '{}.txt'.format(self.seq_name))
+        if not os.path.exists(self.gt_filename):
+            logger.warning(
+                "gt_filename '{}' of MCMOTEvaluator does not exist, so the MOTA will be -INF.".
+                format(self.gt_filename))
+
+    def reset_accumulator(self):
+        self.acc = mm.MOTAccumulator(auto_id=True)
+
+    def eval_frame_dict(self, trk_objs, gt_objs, rtn_events=False, union=False):
+        if union:
+            trk_tlwhs, trk_ids, trk_cls = unzip_objs_cls(trk_objs)[:3]
+            gt_tlwhs, gt_ids, gt_cls = unzip_objs_cls(gt_objs)[:3]
+
+            # get distance matrix
+            iou_distance = mm.distances.iou_matrix(
+                gt_tlwhs, trk_tlwhs, max_iou=0.5)
+
+            # Set the distance between objects of different categories to nan
+            gt_cls_len = len(gt_cls)
+            trk_cls_len = len(trk_cls)
+            # When the number of GT or Trk boxes is 0, iou_distance has shape (0, 0)
+            if gt_cls_len != 0 and trk_cls_len != 0:
+                gt_cls = gt_cls.reshape(gt_cls_len, 1)
+                gt_cls = np.repeat(gt_cls, trk_cls_len, axis=1)
+                trk_cls = trk_cls.reshape(1, trk_cls_len)
+                trk_cls = np.repeat(trk_cls, gt_cls_len, axis=0)
+                iou_distance = np.where(gt_cls == trk_cls, iou_distance, np.nan)
+
+        else:
+            trk_tlwhs, trk_ids = unzip_objs(trk_objs)[:2]
+            gt_tlwhs, gt_ids = unzip_objs(gt_objs)[:2]
+
+            # get distance matrix
+            iou_distance = mm.distances.iou_matrix(
+                gt_tlwhs, trk_tlwhs, max_iou=0.5)
+
+        self.acc.update(gt_ids, trk_ids, iou_distance)
+
+        if rtn_events and iou_distance.size > 0 and hasattr(self.acc,
+                                                            'mot_events'):
+            events = self.acc.mot_events  # only supported by https://github.com/longcw/py-motmetrics
+        else:
+            events = None
+        return events
+
+    def eval_file(self, result_filename):
+        # evaluation of each category
+        gt_frame_dict = read_results(
+            self.gt_filename,
+            self.data_type,
+            is_gt=True,
+            multi_class=True,
+            union=False)
+        result_frame_dict = read_results(
+            result_filename,
+            self.data_type,
+            is_gt=False,
+            multi_class=True,
+            union=False)
+
+        for cid in range(self.num_classes):
+            self.reset_accumulator()
+            cls_result_frame_dict = result_frame_dict.setdefault(cid, dict())
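+            # setdefault() keeps categories that are missing from the results
+            # (or from the gt) evaluable: they simply contribute an empty
+            # frame dict to this category's accumulator.
+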
cls_gt_frame_dict = gt_frame_dict.setdefault(cid, dict()) + + # only labeled frames will be evaluated + frames = sorted(list(set(cls_gt_frame_dict.keys()))) + + for frame_id in frames: + trk_objs = cls_result_frame_dict.get(frame_id, []) + gt_objs = cls_gt_frame_dict.get(frame_id, []) + self.eval_frame_dict(trk_objs, gt_objs, rtn_events=False) + + self.class_accs.append(self.acc) + + return self.class_accs + + @staticmethod + def get_summary(accs, + names, + metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', + 'precision', 'recall')): + names = copy.deepcopy(names) + if metrics is None: + metrics = mm.metrics.motchallenge_metrics + metrics = copy.deepcopy(metrics) + + mh = mm.metrics.create() + summary = mh.compute_many( + accs, metrics=metrics, names=names, generate_overall=True) + + return summary + + @staticmethod + def save_summary(summary, filename): + import pandas as pd + writer = pd.ExcelWriter(filename) + summary.to_excel(writer) + writer.save() + + +class MCMOTMetric(Metric): + def __init__(self, num_classes, save_summary=False): + self.num_classes = num_classes + self.save_summary = save_summary + self.MCMOTEvaluator = MCMOTEvaluator + self.result_root = None + self.reset() + + self.seqs_overall = defaultdict(list) + + def reset(self): + self.accs = [] + self.seqs = [] + + def update(self, data_root, seq, data_type, result_root, result_filename): + evaluator = self.MCMOTEvaluator(data_root, seq, data_type, + self.num_classes) + seq_acc = evaluator.eval_file(result_filename) + self.accs.append(seq_acc) + self.seqs.append(seq) + self.result_root = result_root + + cls_index_name = [ + '{}_{}'.format(seq, i) for i in range(self.num_classes) + ] + summary = parse_accs_metrics(seq_acc, cls_index_name) + summary.rename( + index={'OVERALL': '{}_OVERALL'.format(seq)}, inplace=True) + for row in range(len(summary)): + self.seqs_overall[row].append(summary.iloc[row:row + 1]) + + def accumulate(self): + self.cls_summary_list = [] + for row in range(self.num_classes): + seqs_cls_df = pd.concat(self.seqs_overall[row]) + seqs_cls_summary = seqs_overall_metrics(seqs_cls_df) + cls_summary_overall = seqs_cls_summary.iloc[-1:].copy() + cls_summary_overall.rename( + index={'overall_calc': 'overall_calc_{}'.format(row)}, + inplace=True) + self.cls_summary_list.append(cls_summary_overall) + + def log(self): + seqs_summary = seqs_overall_metrics( + pd.concat(self.seqs_overall[self.num_classes]), verbose=True) + class_summary = seqs_overall_metrics( + pd.concat(self.cls_summary_list), verbose=True) + + def get_results(self): + return 1 diff --git a/PaddleDetection-release-2.6/ppdet/metrics/metrics.py b/PaddleDetection-release-2.6/ppdet/metrics/metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..b473509599b9fbadb48ed792c5100857ab1c30ca --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/metrics.py @@ -0,0 +1,505 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
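To make the driver contract above concrete, here is a hedged usage sketch of `MCMOTMetric` (all paths and sequence names are placeholders; `data_type` must be `'mcmot'`, and ground-truth files are expected at `<data_root>/../sequences/<seq>.txt` as in `load_annotations`):

```python
from ppdet.metrics.mcmot_metrics import MCMOTMetric

metric = MCMOTMetric(num_classes=4)  # number of tracked categories
for seq in ['seq-01', 'seq-02']:  # placeholder sequence names
    metric.update(
        data_root='dataset/mcmot/images',
        seq=seq,
        data_type='mcmot',
        result_root='output/mcmot_results',
        result_filename='output/mcmot_results/{}.txt'.format(seq))
metric.accumulate()  # aggregates the per-class summaries across sequences
metric.log()         # prints per-sequence and per-class overall tables
```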
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +import json +import paddle +import numpy as np +import typing +from collections import defaultdict +from pathlib import Path + +from .map_utils import prune_zero_padding, DetectionMAP +from .coco_utils import get_infer_results, cocoapi_eval +from .widerface_utils import face_eval_run +from ppdet.data.source.category import get_categories +from ppdet.modeling.rbox_utils import poly2rbox_np + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = [ + 'Metric', 'COCOMetric', 'VOCMetric', 'WiderFaceMetric', 'get_infer_results', + 'RBoxMetric', 'SNIPERCOCOMetric' +] + +COCO_SIGMAS = np.array([ + .26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, + .89, .89 +]) / 10.0 +CROWD_SIGMAS = np.array( + [.79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89, .79, + .79]) / 10.0 + + +class Metric(paddle.metric.Metric): + def name(self): + return self.__class__.__name__ + + def reset(self): + pass + + def accumulate(self): + pass + + # paddle.metric.Metric defined :metch:`update`, :meth:`accumulate` + # :metch:`reset`, in ppdet, we also need following 2 methods: + + # abstract method for logging metric results + def log(self): + pass + + # abstract method for getting metric results + def get_results(self): + pass + + +class COCOMetric(Metric): + def __init__(self, anno_file, **kwargs): + self.anno_file = anno_file + self.clsid2catid = kwargs.get('clsid2catid', None) + if self.clsid2catid is None: + self.clsid2catid, _ = get_categories('COCO', anno_file) + self.classwise = kwargs.get('classwise', False) + self.output_eval = kwargs.get('output_eval', None) + # TODO: bias should be unified + self.bias = kwargs.get('bias', 0) + self.save_prediction_only = kwargs.get('save_prediction_only', False) + self.iou_type = kwargs.get('IouType', 'bbox') + + if not self.save_prediction_only: + assert os.path.isfile(anno_file), \ + "anno_file {} not a file".format(anno_file) + + if self.output_eval is not None: + Path(self.output_eval).mkdir(exist_ok=True) + + self.reset() + + def reset(self): + # only bbox and mask evaluation support currently + self.results = {'bbox': [], 'mask': [], 'segm': [], 'keypoint': []} + self.eval_results = {} + + def update(self, inputs, outputs): + outs = {} + # outputs Tensor -> numpy.ndarray + for k, v in outputs.items(): + outs[k] = v.numpy() if isinstance(v, paddle.Tensor) else v + + # multi-scale inputs: all inputs have same im_id + if isinstance(inputs, typing.Sequence): + im_id = inputs[0]['im_id'] + else: + im_id = inputs['im_id'] + outs['im_id'] = im_id.numpy() if isinstance(im_id, + paddle.Tensor) else im_id + + infer_results = get_infer_results( + outs, self.clsid2catid, bias=self.bias) + self.results['bbox'] += infer_results[ + 'bbox'] if 'bbox' in infer_results else [] + self.results['mask'] += infer_results[ + 'mask'] if 'mask' in infer_results else [] + self.results['segm'] += infer_results[ + 'segm'] if 'segm' in infer_results else [] + self.results['keypoint'] += infer_results[ + 'keypoint'] if 'keypoint' in infer_results else [] + + def accumulate(self): + if len(self.results['bbox']) > 0: + output = "bbox.json" + if self.output_eval: + output = os.path.join(self.output_eval, output) + with open(output, 'w') as f: + json.dump(self.results['bbox'], f) + logger.info('The bbox result is saved to bbox.json.') + + if self.save_prediction_only: + logger.info('The bbox result 
is saved to {} and do not ' + 'evaluate the mAP.'.format(output)) + else: + bbox_stats = cocoapi_eval( + output, + 'bbox', + anno_file=self.anno_file, + classwise=self.classwise) + self.eval_results['bbox'] = bbox_stats + sys.stdout.flush() + + if len(self.results['mask']) > 0: + output = "mask.json" + if self.output_eval: + output = os.path.join(self.output_eval, output) + with open(output, 'w') as f: + json.dump(self.results['mask'], f) + logger.info('The mask result is saved to mask.json.') + + if self.save_prediction_only: + logger.info('The mask result is saved to {} and do not ' + 'evaluate the mAP.'.format(output)) + else: + seg_stats = cocoapi_eval( + output, + 'segm', + anno_file=self.anno_file, + classwise=self.classwise) + self.eval_results['mask'] = seg_stats + sys.stdout.flush() + + if len(self.results['segm']) > 0: + output = "segm.json" + if self.output_eval: + output = os.path.join(self.output_eval, output) + with open(output, 'w') as f: + json.dump(self.results['segm'], f) + logger.info('The segm result is saved to segm.json.') + + if self.save_prediction_only: + logger.info('The segm result is saved to {} and do not ' + 'evaluate the mAP.'.format(output)) + else: + seg_stats = cocoapi_eval( + output, + 'segm', + anno_file=self.anno_file, + classwise=self.classwise) + self.eval_results['mask'] = seg_stats + sys.stdout.flush() + + if len(self.results['keypoint']) > 0: + output = "keypoint.json" + if self.output_eval: + output = os.path.join(self.output_eval, output) + with open(output, 'w') as f: + json.dump(self.results['keypoint'], f) + logger.info('The keypoint result is saved to keypoint.json.') + + if self.save_prediction_only: + logger.info('The keypoint result is saved to {} and do not ' + 'evaluate the mAP.'.format(output)) + else: + style = 'keypoints' + use_area = True + sigmas = COCO_SIGMAS + if self.iou_type == 'keypoints_crowd': + style = 'keypoints_crowd' + use_area = False + sigmas = CROWD_SIGMAS + keypoint_stats = cocoapi_eval( + output, + style, + anno_file=self.anno_file, + classwise=self.classwise, + sigmas=sigmas, + use_area=use_area) + self.eval_results['keypoint'] = keypoint_stats + sys.stdout.flush() + + def log(self): + pass + + def get_results(self): + return self.eval_results + + +class VOCMetric(Metric): + def __init__(self, + label_list, + class_num=20, + overlap_thresh=0.5, + map_type='11point', + is_bbox_normalized=False, + evaluate_difficult=False, + classwise=False, + output_eval=None, + save_prediction_only=False): + assert os.path.isfile(label_list), \ + "label_list {} not a file".format(label_list) + self.clsid2catid, self.catid2name = get_categories('VOC', label_list) + + self.overlap_thresh = overlap_thresh + self.map_type = map_type + self.evaluate_difficult = evaluate_difficult + self.output_eval = output_eval + self.save_prediction_only = save_prediction_only + self.detection_map = DetectionMAP( + class_num=class_num, + overlap_thresh=overlap_thresh, + map_type=map_type, + is_bbox_normalized=is_bbox_normalized, + evaluate_difficult=evaluate_difficult, + catid2name=self.catid2name, + classwise=classwise) + + self.reset() + + def reset(self): + self.results = {'bbox': [], 'score': [], 'label': []} + self.detection_map.reset() + + def update(self, inputs, outputs): + bbox_np = outputs['bbox'].numpy() if isinstance( + outputs['bbox'], paddle.Tensor) else outputs['bbox'] + bboxes = bbox_np[:, 2:] + scores = bbox_np[:, 1] + labels = bbox_np[:, 0] + bbox_lengths = outputs['bbox_num'].numpy() if isinstance( + outputs['bbox_num'], 
paddle.Tensor) else outputs['bbox_num']
+
+        self.results['bbox'].append(bboxes.tolist())
+        self.results['score'].append(scores.tolist())
+        self.results['label'].append(labels.tolist())
+
+        if bboxes is None or bboxes.shape == (1, 1):
+            return
+        if self.save_prediction_only:
+            return
+
+        gt_boxes = inputs['gt_bbox']
+        gt_labels = inputs['gt_class']
+        difficults = inputs['difficult'] if not self.evaluate_difficult \
+            else None
+
+        if 'scale_factor' in inputs:
+            scale_factor = inputs['scale_factor'].numpy() if isinstance(
+                inputs['scale_factor'],
+                paddle.Tensor) else inputs['scale_factor']
+        else:
+            scale_factor = np.ones((gt_boxes.shape[0], 2)).astype('float32')
+
+        bbox_idx = 0
+        for i in range(len(gt_boxes)):
+            gt_box = gt_boxes[i].numpy() if isinstance(
+                gt_boxes[i], paddle.Tensor) else gt_boxes[i]
+            h, w = scale_factor[i]
+            gt_box = gt_box / np.array([w, h, w, h])
+            gt_label = gt_labels[i].numpy() if isinstance(
+                gt_labels[i], paddle.Tensor) else gt_labels[i]
+            if difficults is not None:
+                difficult = difficults[i].numpy() if isinstance(
+                    difficults[i], paddle.Tensor) else difficults[i]
+            else:
+                difficult = None
+            bbox_num = bbox_lengths[i]
+            bbox = bboxes[bbox_idx:bbox_idx + bbox_num]
+            score = scores[bbox_idx:bbox_idx + bbox_num]
+            label = labels[bbox_idx:bbox_idx + bbox_num]
+            gt_box, gt_label, difficult = prune_zero_padding(gt_box, gt_label,
+                                                             difficult)
+            self.detection_map.update(bbox, score, label, gt_box, gt_label,
+                                      difficult)
+            bbox_idx += bbox_num
+
+    def accumulate(self):
+        output = "bbox.json"
+        if self.output_eval:
+            output = os.path.join(self.output_eval, output)
+            with open(output, 'w') as f:
+                json.dump(self.results, f)
+            logger.info('The bbox result is saved to bbox.json.')
+        if self.save_prediction_only:
+            return
+
+        logger.info("Accumulating evaluation results...")
+        self.detection_map.accumulate()
+
+    def log(self):
+        map_stat = 100. 
* self.detection_map.get_map() + logger.info("mAP({:.2f}, {}) = {:.2f}%".format(self.overlap_thresh, + self.map_type, map_stat)) + + def get_results(self): + return {'bbox': [self.detection_map.get_map()]} + + +class WiderFaceMetric(Metric): + def __init__(self, image_dir, anno_file, multi_scale=True): + self.image_dir = image_dir + self.anno_file = anno_file + self.multi_scale = multi_scale + self.clsid2catid, self.catid2name = get_categories('widerface') + + def update(self, model): + + face_eval_run( + model, + self.image_dir, + self.anno_file, + pred_dir='output/pred', + eval_mode='widerface', + multi_scale=self.multi_scale) + + +class RBoxMetric(Metric): + def __init__(self, anno_file, **kwargs): + self.anno_file = anno_file + self.clsid2catid, self.catid2name = get_categories('RBOX', anno_file) + self.catid2clsid = {v: k for k, v in self.clsid2catid.items()} + self.classwise = kwargs.get('classwise', False) + self.output_eval = kwargs.get('output_eval', None) + self.save_prediction_only = kwargs.get('save_prediction_only', False) + self.overlap_thresh = kwargs.get('overlap_thresh', 0.5) + self.map_type = kwargs.get('map_type', '11point') + self.evaluate_difficult = kwargs.get('evaluate_difficult', False) + self.imid2path = kwargs.get('imid2path', None) + class_num = len(self.catid2name) + self.detection_map = DetectionMAP( + class_num=class_num, + overlap_thresh=self.overlap_thresh, + map_type=self.map_type, + is_bbox_normalized=False, + evaluate_difficult=self.evaluate_difficult, + catid2name=self.catid2name, + classwise=self.classwise) + + self.reset() + + def reset(self): + self.results = [] + self.detection_map.reset() + + def update(self, inputs, outputs): + outs = {} + # outputs Tensor -> numpy.ndarray + for k, v in outputs.items(): + outs[k] = v.numpy() if isinstance(v, paddle.Tensor) else v + + im_id = inputs['im_id'] + im_id = im_id.numpy() if isinstance(im_id, paddle.Tensor) else im_id + outs['im_id'] = im_id + + infer_results = get_infer_results(outs, self.clsid2catid) + infer_results = infer_results['bbox'] if 'bbox' in infer_results else [] + self.results += infer_results + if self.save_prediction_only: + return + + gt_boxes = inputs['gt_poly'] + gt_labels = inputs['gt_class'] + + if 'scale_factor' in inputs: + scale_factor = inputs['scale_factor'].numpy() if isinstance( + inputs['scale_factor'], + paddle.Tensor) else inputs['scale_factor'] + else: + scale_factor = np.ones((gt_boxes.shape[0], 2)).astype('float32') + + for i in range(len(gt_boxes)): + gt_box = gt_boxes[i].numpy() if isinstance( + gt_boxes[i], paddle.Tensor) else gt_boxes[i] + h, w = scale_factor[i] + gt_box = gt_box / np.array([w, h, w, h, w, h, w, h]) + gt_label = gt_labels[i].numpy() if isinstance( + gt_labels[i], paddle.Tensor) else gt_labels[i] + gt_box, gt_label, _ = prune_zero_padding(gt_box, gt_label) + bbox = [ + res['bbox'] for res in infer_results + if int(res['image_id']) == int(im_id[i]) + ] + score = [ + res['score'] for res in infer_results + if int(res['image_id']) == int(im_id[i]) + ] + label = [ + self.catid2clsid[int(res['category_id'])] + for res in infer_results + if int(res['image_id']) == int(im_id[i]) + ] + self.detection_map.update(bbox, score, label, gt_box, gt_label) + + def save_results(self, results, output_dir, imid2path): + if imid2path: + data_dicts = defaultdict(list) + for result in results: + image_id = result['image_id'] + data_dicts[image_id].append(result) + + for image_id, image_path in imid2path.items(): + basename = os.path.splitext(os.path.split(image_path)[-1])[0] 
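+                # One txt file is written per image; each line below has the
+                # form "<category name> <score> <bbox values...>" (for rotated
+                # boxes the bbox is typically the 8-value polygon).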
+ output = os.path.join(output_dir, "{}.txt".format(basename)) + dets = data_dicts.get(image_id, []) + with open(output, 'w') as f: + for det in dets: + catid, bbox, score = det['category_id'], det[ + 'bbox'], det['score'] + bbox_pred = '{} {} '.format(self.catid2name[catid], + score) + ' '.join( + [str(e) for e in bbox]) + f.write(bbox_pred + '\n') + + logger.info('The bbox result is saved to {}.'.format(output_dir)) + else: + output = os.path.join(output_dir, "bbox.json") + with open(output, 'w') as f: + json.dump(results, f) + + logger.info('The bbox result is saved to {}.'.format(output)) + + def accumulate(self): + if self.output_eval: + self.save_results(self.results, self.output_eval, self.imid2path) + + if not self.save_prediction_only: + logger.info("Accumulating evaluatation results...") + self.detection_map.accumulate() + + def log(self): + map_stat = 100. * self.detection_map.get_map() + logger.info("mAP({:.2f}, {}) = {:.2f}%".format(self.overlap_thresh, + self.map_type, map_stat)) + + def get_results(self): + return {'bbox': [self.detection_map.get_map()]} + + +class SNIPERCOCOMetric(COCOMetric): + def __init__(self, anno_file, **kwargs): + super(SNIPERCOCOMetric, self).__init__(anno_file, **kwargs) + self.dataset = kwargs["dataset"] + self.chip_results = [] + + def reset(self): + # only bbox and mask evaluation support currently + self.results = {'bbox': [], 'mask': [], 'segm': [], 'keypoint': []} + self.eval_results = {} + self.chip_results = [] + + def update(self, inputs, outputs): + outs = {} + # outputs Tensor -> numpy.ndarray + for k, v in outputs.items(): + outs[k] = v.numpy() if isinstance(v, paddle.Tensor) else v + + im_id = inputs['im_id'] + outs['im_id'] = im_id.numpy() if isinstance(im_id, + paddle.Tensor) else im_id + + self.chip_results.append(outs) + + def accumulate(self): + results = self.dataset.anno_cropper.aggregate_chips_detections( + self.chip_results) + for outs in results: + infer_results = get_infer_results( + outs, self.clsid2catid, bias=self.bias) + self.results['bbox'] += infer_results[ + 'bbox'] if 'bbox' in infer_results else [] + + super(SNIPERCOCOMetric, self).accumulate() diff --git a/PaddleDetection-release-2.6/ppdet/metrics/mot_metrics.py b/PaddleDetection-release-2.6/ppdet/metrics/mot_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..b5ed8a2d4a8f60d37297a94265733970212e24d0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/mot_metrics.py @@ -0,0 +1,1246 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
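Before the MOT-specific metrics below, a minimal sketch of the `Metric` lifecycle these classes share (a hypothetical metric for illustration, not part of PaddleDetection; it only assumes the `outputs['bbox_num']` convention used by COCOMetric and VOCMetric above):

```python
import paddle
from ppdet.metrics.metrics import Metric


class BoxCountMetric(Metric):
    """Toy metric: counts predicted boxes over an evaluation run."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.num_boxes = 0

    def update(self, inputs, outputs):
        # called once per batch with the raw model outputs
        bbox_num = outputs['bbox_num']
        if isinstance(bbox_num, paddle.Tensor):
            bbox_num = bbox_num.numpy()
        self.num_boxes += int(bbox_num.sum())

    def accumulate(self):
        pass  # nothing to aggregate for a simple counter

    def log(self):
        print('total predicted boxes: {}'.format(self.num_boxes))

    def get_results(self):
        return {'num_boxes': self.num_boxes}
```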
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import copy +import sys +import math +from collections import defaultdict +import numpy as np + +from ppdet.modeling.bbox_utils import bbox_iou_np_expand +from .map_utils import ap_per_class +from .metrics import Metric +from .munkres import Munkres + +try: + import motmetrics as mm + mm.lap.default_solver = 'lap' +except: + print( + 'Warning: Unable to use MOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + pass + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['MOTEvaluator', 'MOTMetric', 'JDEDetMetric', 'KITTIMOTMetric'] + + +def read_mot_results(filename, is_gt=False, is_ignore=False): + valid_label = [1] + ignore_labels = [2, 7, 8, 12] # only in motchallenge datasets like 'MOT16' + if is_gt: + logger.info( + "In MOT16/17 dataset the valid_label of ground truth is '{}', " + "in other dataset it should be '0' for single classs MOT.".format( + valid_label[0])) + results_dict = dict() + if os.path.isfile(filename): + with open(filename, 'r') as f: + for line in f.readlines(): + linelist = line.split(',') + if len(linelist) < 7: + continue + fid = int(linelist[0]) + if fid < 1: + continue + results_dict.setdefault(fid, list()) + + if is_gt: + label = int(float(linelist[7])) + mark = int(float(linelist[6])) + if mark == 0 or label not in valid_label: + continue + score = 1 + elif is_ignore: + if 'MOT16-' in filename or 'MOT17-' in filename or 'MOT15-' in filename or 'MOT20-' in filename: + label = int(float(linelist[7])) + vis_ratio = float(linelist[8]) + if label not in ignore_labels and vis_ratio >= 0: + continue + else: + continue + score = 1 + else: + score = float(linelist[6]) + + tlwh = tuple(map(float, linelist[2:6])) + target_id = int(linelist[1]) + + results_dict[fid].append((tlwh, target_id, score)) + return results_dict + + +""" +MOT dataset label list, see in https://motchallenge.net +labels={'ped', ... % 1 + 'person_on_vhcl', ... % 2 + 'car', ... % 3 + 'bicycle', ... % 4 + 'mbike', ... % 5 + 'non_mot_vhcl', ... % 6 + 'static_person', ... % 7 + 'distractor', ... % 8 + 'occluder', ... % 9 + 'occluder_on_grnd', ... % 10 + 'occluder_full', ... % 11 + 'reflection', ... % 12 + 'crowd' ... % 13 +}; +""" + + +def unzip_objs(objs): + if len(objs) > 0: + tlwhs, ids, scores = zip(*objs) + else: + tlwhs, ids, scores = [], [], [] + tlwhs = np.asarray(tlwhs, dtype=float).reshape(-1, 4) + return tlwhs, ids, scores + + +class MOTEvaluator(object): + def __init__(self, data_root, seq_name, data_type): + self.data_root = data_root + self.seq_name = seq_name + self.data_type = data_type + + self.load_annotations() + try: + import motmetrics as mm + mm.lap.default_solver = 'lap' + except Exception as e: + raise RuntimeError( + 'Unable to use MOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + self.reset_accumulator() + + def load_annotations(self): + assert self.data_type == 'mot' + gt_filename = os.path.join(self.data_root, self.seq_name, 'gt', + 'gt.txt') + if not os.path.exists(gt_filename): + logger.warning( + "gt_filename '{}' of MOTEvaluator is not exist, so the MOTA will be -INF." 
+ ) + self.gt_frame_dict = read_mot_results(gt_filename, is_gt=True) + self.gt_ignore_frame_dict = read_mot_results( + gt_filename, is_ignore=True) + + def reset_accumulator(self): + self.acc = mm.MOTAccumulator(auto_id=True) + + def eval_frame(self, frame_id, trk_tlwhs, trk_ids, rtn_events=False): + # results + trk_tlwhs = np.copy(trk_tlwhs) + trk_ids = np.copy(trk_ids) + + # gts + gt_objs = self.gt_frame_dict.get(frame_id, []) + gt_tlwhs, gt_ids = unzip_objs(gt_objs)[:2] + + # ignore boxes + ignore_objs = self.gt_ignore_frame_dict.get(frame_id, []) + ignore_tlwhs = unzip_objs(ignore_objs)[0] + + # remove ignored results + keep = np.ones(len(trk_tlwhs), dtype=bool) + iou_distance = mm.distances.iou_matrix( + ignore_tlwhs, trk_tlwhs, max_iou=0.5) + if len(iou_distance) > 0: + match_is, match_js = mm.lap.linear_sum_assignment(iou_distance) + match_is, match_js = map(lambda a: np.asarray(a, dtype=int), [match_is, match_js]) + match_ious = iou_distance[match_is, match_js] + + match_js = np.asarray(match_js, dtype=int) + match_js = match_js[np.logical_not(np.isnan(match_ious))] + keep[match_js] = False + trk_tlwhs = trk_tlwhs[keep] + trk_ids = trk_ids[keep] + + # get distance matrix + iou_distance = mm.distances.iou_matrix(gt_tlwhs, trk_tlwhs, max_iou=0.5) + + # acc + self.acc.update(gt_ids, trk_ids, iou_distance) + + if rtn_events and iou_distance.size > 0 and hasattr(self.acc, + 'last_mot_events'): + events = self.acc.last_mot_events # only supported by https://github.com/longcw/py-motmetrics + else: + events = None + return events + + def eval_file(self, filename): + self.reset_accumulator() + + result_frame_dict = read_mot_results(filename, is_gt=False) + frames = sorted(list(set(result_frame_dict.keys()))) + for frame_id in frames: + trk_objs = result_frame_dict.get(frame_id, []) + trk_tlwhs, trk_ids = unzip_objs(trk_objs)[:2] + self.eval_frame(frame_id, trk_tlwhs, trk_ids, rtn_events=False) + + return self.acc + + @staticmethod + def get_summary(accs, + names, + metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', + 'precision', 'recall')): + names = copy.deepcopy(names) + if metrics is None: + metrics = mm.metrics.motchallenge_metrics + metrics = copy.deepcopy(metrics) + + mh = mm.metrics.create() + summary = mh.compute_many( + accs, metrics=metrics, names=names, generate_overall=True) + return summary + + @staticmethod + def save_summary(summary, filename): + import pandas as pd + writer = pd.ExcelWriter(filename) + summary.to_excel(writer) + writer.save() + + +class MOTMetric(Metric): + def __init__(self, save_summary=False): + self.save_summary = save_summary + self.MOTEvaluator = MOTEvaluator + self.result_root = None + self.reset() + + def reset(self): + self.accs = [] + self.seqs = [] + + def update(self, data_root, seq, data_type, result_root, result_filename): + evaluator = self.MOTEvaluator(data_root, seq, data_type) + self.accs.append(evaluator.eval_file(result_filename)) + self.seqs.append(seq) + self.result_root = result_root + + def accumulate(self): + metrics = mm.metrics.motchallenge_metrics + mh = mm.metrics.create() + summary = self.MOTEvaluator.get_summary(self.accs, self.seqs, metrics) + self.strsummary = mm.io.render_summary( + summary, + formatters=mh.formatters, + namemap=mm.io.motchallenge_metric_names) + if self.save_summary: + self.MOTEvaluator.save_summary( + summary, os.path.join(self.result_root, 'summary.xlsx')) + + def log(self): + print(self.strsummary) + + def get_results(self): + return self.strsummary + + +class JDEDetMetric(Metric): + # Note this 
detection AP metric is different from COCOMetric or VOCMetric,
+    # and the bbox coordinates are not scaled to the original image
+    def __init__(self, overlap_thresh=0.5):
+        self.overlap_thresh = overlap_thresh
+        self.reset()
+
+    def reset(self):
+        self.AP_accum = np.zeros(1)
+        self.AP_accum_count = np.zeros(1)
+
+    def update(self, inputs, outputs):
+        bboxes = outputs['bbox'][:, 2:].numpy()
+        scores = outputs['bbox'][:, 1].numpy()
+        labels = outputs['bbox'][:, 0].numpy()
+        bbox_lengths = outputs['bbox_num'].numpy()
+        if bboxes.shape[0] == 1 and bboxes.sum() == 0.0:
+            return
+
+        gt_boxes = inputs['gt_bbox'].numpy()[0]
+        gt_labels = inputs['gt_class'].numpy()[0]
+        if gt_labels.shape[0] == 0:
+            return
+
+        correct = []
+        detected = []
+        for i in range(bboxes.shape[0]):
+            # JDE detection is single-class, so the predicted class is always 0
+            obj_pred = 0
+            pred_bbox = bboxes[i].reshape(1, 4)
+            # Compute iou with target boxes
+            iou = bbox_iou_np_expand(pred_bbox, gt_boxes, x1y1x2y2=True)[0]
+            # Extract index of largest overlap
+            best_i = np.argmax(iou)
+            # If overlap exceeds threshold and classification is correct, mark as correct
+            if iou[best_i] > self.overlap_thresh and obj_pred == gt_labels[
+                    best_i] and best_i not in detected:
+                correct.append(1)
+                detected.append(best_i)
+            else:
+                correct.append(0)
+
+        # Compute Average Precision (AP) per class
+        target_cls = list(gt_labels.T[0])
+        AP, AP_class, R, P = ap_per_class(
+            tp=correct,
+            conf=scores,
+            pred_cls=np.zeros_like(scores),
+            target_cls=target_cls)
+        self.AP_accum_count += np.bincount(AP_class, minlength=1)
+        self.AP_accum += np.bincount(AP_class, minlength=1, weights=AP)
+
+    def accumulate(self):
+        logger.info("Accumulating evaluation results...")
+        self.map_stat = self.AP_accum[0] / (self.AP_accum_count[0] + 1E-16)
+
+    def log(self):
+        map_stat = 100. * self.map_stat
+        logger.info("mAP({:.2f}) = {:.2f}%".format(self.overlap_thresh,
+                                                   map_stat))
+
+    def get_results(self):
+        return self.map_stat
+
+
+"""
+Following code is borrowed from https://github.com/xingyizhou/CenterTrack/blob/master/src/tools/eval_kitti_track/evaluate_tracking.py
+"""
+
+
+class tData:
+    """
+    Utility class to load data.
+    """
+    def __init__(self,frame=-1,obj_type="unset",truncation=-1,occlusion=-1,\
+        obs_angle=-10,x1=-1,y1=-1,x2=-1,y2=-1,w=-1,h=-1,l=-1,\
+        X=-1000,Y=-1000,Z=-1000,yaw=-10,score=-1000,track_id=-1):
+        """
+        Constructor, initializes the object given the parameters.
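+
+        All defaults are sentinel values (-1, -10, -1000, "unset") that mark
+        a field as not yet filled in; they are overwritten when a KITTI
+        label line is parsed in KITTIEvaluation._loadData().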
+ """ + self.frame = frame + self.track_id = track_id + self.obj_type = obj_type + self.truncation = truncation + self.occlusion = occlusion + self.obs_angle = obs_angle + self.x1 = x1 + self.y1 = y1 + self.x2 = x2 + self.y2 = y2 + self.w = w + self.h = h + self.l = l + self.X = X + self.Y = Y + self.Z = Z + self.yaw = yaw + self.score = score + self.ignored = False + self.valid = False + self.tracker = -1 + + def __str__(self): + attrs = vars(self) + return '\n'.join("%s: %s" % item for item in attrs.items()) + + +class KITTIEvaluation(object): + """ KITTI tracking statistics (CLEAR MOT, id-switches, fragments, ML/PT/MT, precision/recall) + MOTA - Multi-object tracking accuracy in [0,100] + MOTP - Multi-object tracking precision in [0,100] (3D) / [td,100] (2D) + MOTAL - Multi-object tracking accuracy in [0,100] with log10(id-switches) + + id-switches - number of id switches + fragments - number of fragmentations + + MT, PT, ML - number of mostly tracked, partially tracked and mostly lost trajectories + + recall - recall = percentage of detected targets + precision - precision = percentage of correctly detected targets + FAR - number of false alarms per frame + falsepositives - number of false positives (FP) + missed - number of missed targets (FN) + """ + def __init__(self, result_path, gt_path, min_overlap=0.5, max_truncation = 0,\ + min_height = 25, max_occlusion = 2, cls="car",\ + n_frames=[], seqs=[], n_sequences=0): + # get number of sequences and + # get number of frames per sequence from test mapping + # (created while extracting the benchmark) + self.gt_path = os.path.join(gt_path, "../labels") + self.n_frames = n_frames + self.sequence_name = seqs + self.n_sequences = n_sequences + + self.cls = cls # class to evaluate, i.e. pedestrian or car + + self.result_path = result_path + + # statistics and numbers for evaluation + self.n_gt = 0 # number of ground truth detections minus ignored false negatives and true positives + self.n_igt = 0 # number of ignored ground truth detections + self.n_gts = [ + ] # number of ground truth detections minus ignored false negatives and true positives PER SEQUENCE + self.n_igts = [ + ] # number of ground ignored truth detections PER SEQUENCE + self.n_gt_trajectories = 0 + self.n_gt_seq = [] + self.n_tr = 0 # number of tracker detections minus ignored tracker detections + self.n_trs = [ + ] # number of tracker detections minus ignored tracker detections PER SEQUENCE + self.n_itr = 0 # number of ignored tracker detections + self.n_itrs = [] # number of ignored tracker detections PER SEQUENCE + self.n_igttr = 0 # number of ignored ground truth detections where the corresponding associated tracker detection is also ignored + self.n_tr_trajectories = 0 + self.n_tr_seq = [] + self.MOTA = 0 + self.MOTP = 0 + self.MOTAL = 0 + self.MODA = 0 + self.MODP = 0 + self.MODP_t = [] + self.recall = 0 + self.precision = 0 + self.F1 = 0 + self.FAR = 0 + self.total_cost = 0 + self.itp = 0 # number of ignored true positives + self.itps = [] # number of ignored true positives PER SEQUENCE + self.tp = 0 # number of true positives including ignored true positives! 
+ self.tps = [ + ] # number of true positives including ignored true positives PER SEQUENCE + self.fn = 0 # number of false negatives WITHOUT ignored false negatives + self.fns = [ + ] # number of false negatives WITHOUT ignored false negatives PER SEQUENCE + self.ifn = 0 # number of ignored false negatives + self.ifns = [] # number of ignored false negatives PER SEQUENCE + self.fp = 0 # number of false positives + # a bit tricky, the number of ignored false negatives and ignored true positives + # is subtracted, but if both tracker detection and ground truth detection + # are ignored this number is added again to avoid double counting + self.fps = [] # above PER SEQUENCE + self.mme = 0 + self.fragments = 0 + self.id_switches = 0 + self.MT = 0 + self.PT = 0 + self.ML = 0 + + self.min_overlap = min_overlap # minimum bounding box overlap for 3rd party metrics + self.max_truncation = max_truncation # maximum truncation of an object for evaluation + self.max_occlusion = max_occlusion # maximum occlusion of an object for evaluation + self.min_height = min_height # minimum height of an object for evaluation + self.n_sample_points = 500 + + # this should be enough to hold all groundtruth trajectories + # is expanded if necessary and reduced in any case + self.gt_trajectories = [[] for x in range(self.n_sequences)] + self.ign_trajectories = [[] for x in range(self.n_sequences)] + + def loadGroundtruth(self): + try: + self._loadData(self.gt_path, cls=self.cls, loading_groundtruth=True) + except IOError: + return False + return True + + def loadTracker(self): + try: + if not self._loadData( + self.result_path, cls=self.cls, loading_groundtruth=False): + return False + except IOError: + return False + return True + + def _loadData(self, + root_dir, + cls, + min_score=-1000, + loading_groundtruth=False): + """ + Generic loader for ground truth and tracking data. + Use loadGroundtruth() or loadTracker() to load this data. + Loads detections in KITTI format from textfiles. + """ + # construct objectDetections object to hold detection data + t_data = tData() + data = [] + eval_2d = True + eval_3d = True + + seq_data = [] + n_trajectories = 0 + n_trajectories_seq = [] + for seq, s_name in enumerate(self.sequence_name): + i = 0 + filename = os.path.join(root_dir, "%s.txt" % s_name) + f = open(filename, "r") + + f_data = [ + [] for x in range(self.n_frames[seq]) + ] # current set has only 1059 entries, sufficient length is checked anyway + ids = [] + n_in_seq = 0 + id_frame_cache = [] + for line in f: + # KITTI tracking benchmark data format: + # (frame,tracklet_id,objectType,truncation,occlusion,alpha,x1,y1,x2,y2,h,w,l,X,Y,Z,ry) + line = line.strip() + fields = line.split(" ") + # classes that should be loaded (ignored neighboring classes) + if "car" in cls.lower(): + classes = ["car", "van"] + elif "pedestrian" in cls.lower(): + classes = ["pedestrian", "person_sitting"] + else: + classes = [cls.lower()] + classes += ["dontcare"] + if not any([s for s in classes if s in fields[2].lower()]): + continue + # get fields from table + t_data.frame = int(float(fields[0])) # frame + t_data.track_id = int(float(fields[1])) # id + t_data.obj_type = fields[ + 2].lower() # object type [car, pedestrian, cyclist, ...] 
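+                # e.g. for an illustrative label line
+                # "0 2 car 0 0 -1.57 614.2 181.8 727.3 284.0 1.57 1.73 4.15 1.0 1.8 13.2 -1.62"
+                # fields[0] is the frame, fields[1] the track id and
+                # fields[2] the object type parsed above.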
+ t_data.truncation = int( + float(fields[3])) # truncation [-1,0,1,2] + t_data.occlusion = int( + float(fields[4])) # occlusion [-1,0,1,2] + t_data.obs_angle = float(fields[5]) # observation angle [rad] + t_data.x1 = float(fields[6]) # left [px] + t_data.y1 = float(fields[7]) # top [px] + t_data.x2 = float(fields[8]) # right [px] + t_data.y2 = float(fields[9]) # bottom [px] + t_data.h = float(fields[10]) # height [m] + t_data.w = float(fields[11]) # width [m] + t_data.l = float(fields[12]) # length [m] + t_data.X = float(fields[13]) # X [m] + t_data.Y = float(fields[14]) # Y [m] + t_data.Z = float(fields[15]) # Z [m] + t_data.yaw = float(fields[16]) # yaw angle [rad] + if not loading_groundtruth: + if len(fields) == 17: + t_data.score = -1 + elif len(fields) == 18: + t_data.score = float(fields[17]) # detection score + else: + logger.info("file is not in KITTI format") + return + + # do not consider objects marked as invalid + if t_data.track_id is -1 and t_data.obj_type != "dontcare": + continue + + idx = t_data.frame + # check if length for frame data is sufficient + if idx >= len(f_data): + print("extend f_data", idx, len(f_data)) + f_data += [[] for x in range(max(500, idx - len(f_data)))] + try: + id_frame = (t_data.frame, t_data.track_id) + if id_frame in id_frame_cache and not loading_groundtruth: + logger.info( + "track ids are not unique for sequence %d: frame %d" + % (seq, t_data.frame)) + logger.info( + "track id %d occurred at least twice for this frame" + % t_data.track_id) + logger.info("Exiting...") + #continue # this allows to evaluate non-unique result files + return False + id_frame_cache.append(id_frame) + f_data[t_data.frame].append(copy.copy(t_data)) + except: + print(len(f_data), idx) + raise + + if t_data.track_id not in ids and t_data.obj_type != "dontcare": + ids.append(t_data.track_id) + n_trajectories += 1 + n_in_seq += 1 + + # check if uploaded data provides information for 2D and 3D evaluation + if not loading_groundtruth and eval_2d is True and ( + t_data.x1 == -1 or t_data.x2 == -1 or t_data.y1 == -1 or + t_data.y2 == -1): + eval_2d = False + if not loading_groundtruth and eval_3d is True and ( + t_data.X == -1000 or t_data.Y == -1000 or + t_data.Z == -1000): + eval_3d = False + + # only add existing frames + n_trajectories_seq.append(n_in_seq) + seq_data.append(f_data) + f.close() + + if not loading_groundtruth: + self.tracker = seq_data + self.n_tr_trajectories = n_trajectories + self.eval_2d = eval_2d + self.eval_3d = eval_3d + self.n_tr_seq = n_trajectories_seq + if self.n_tr_trajectories == 0: + return False + else: + # split ground truth and DontCare areas + self.dcareas = [] + self.groundtruth = [] + for seq_idx in range(len(seq_data)): + seq_gt = seq_data[seq_idx] + s_g, s_dc = [], [] + for f in range(len(seq_gt)): + all_gt = seq_gt[f] + g, dc = [], [] + for gg in all_gt: + if gg.obj_type == "dontcare": + dc.append(gg) + else: + g.append(gg) + s_g.append(g) + s_dc.append(dc) + self.dcareas.append(s_dc) + self.groundtruth.append(s_g) + self.n_gt_seq = n_trajectories_seq + self.n_gt_trajectories = n_trajectories + return True + + def boxoverlap(self, a, b, criterion="union"): + """ + boxoverlap computes intersection over union for bbox a and b in KITTI format. + If the criterion is 'union', overlap = (a inter b) / a union b). + If the criterion is 'a', overlap = (a inter b) / a, where b should be a dontcare area. 
+ """ + x1 = max(a.x1, b.x1) + y1 = max(a.y1, b.y1) + x2 = min(a.x2, b.x2) + y2 = min(a.y2, b.y2) + + w = x2 - x1 + h = y2 - y1 + + if w <= 0. or h <= 0.: + return 0. + inter = w * h + aarea = (a.x2 - a.x1) * (a.y2 - a.y1) + barea = (b.x2 - b.x1) * (b.y2 - b.y1) + # intersection over union overlap + if criterion.lower() == "union": + o = inter / float(aarea + barea - inter) + elif criterion.lower() == "a": + o = float(inter) / float(aarea) + else: + raise TypeError("Unkown type for criterion") + return o + + def compute3rdPartyMetrics(self): + """ + Computes the metrics defined in + - Stiefelhagen 2008: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics + MOTA, MOTAL, MOTP + - Nevatia 2008: Global Data Association for Multi-Object Tracking Using Network Flows + MT/PT/ML + """ + # construct Munkres object for Hungarian Method association + hm = Munkres() + max_cost = 1e9 + + # go through all frames and associate ground truth and tracker results + # groundtruth and tracker contain lists for every single frame containing lists of KITTI format detections + fr, ids = 0, 0 + for seq_idx in range(len(self.groundtruth)): + seq_gt = self.groundtruth[seq_idx] + seq_dc = self.dcareas[seq_idx] # don't care areas + seq_tracker = self.tracker[seq_idx] + seq_trajectories = defaultdict(list) + seq_ignored = defaultdict(list) + + # statistics over the current sequence, check the corresponding + # variable comments in __init__ to get their meaning + seqtp = 0 + seqitp = 0 + seqfn = 0 + seqifn = 0 + seqfp = 0 + seqigt = 0 + seqitr = 0 + + last_ids = [[], []] + n_gts = 0 + n_trs = 0 + + for f in range(len(seq_gt)): + g = seq_gt[f] + dc = seq_dc[f] + + t = seq_tracker[f] + # counting total number of ground truth and tracker objects + self.n_gt += len(g) + self.n_tr += len(t) + + n_gts += len(g) + n_trs += len(t) + + # use hungarian method to associate, using boxoverlap 0..1 as cost + # build cost matrix + cost_matrix = [] + this_ids = [[], []] + for gg in g: + # save current ids + this_ids[0].append(gg.track_id) + this_ids[1].append(-1) + gg.tracker = -1 + gg.id_switch = 0 + gg.fragmentation = 0 + cost_row = [] + for tt in t: + # overlap == 1 is cost ==0 + c = 1 - self.boxoverlap(gg, tt) + # gating for boxoverlap + if c <= self.min_overlap: + cost_row.append(c) + else: + cost_row.append(max_cost) # = 1e9 + cost_matrix.append(cost_row) + # all ground truth trajectories are initially not associated + # extend groundtruth trajectories lists (merge lists) + seq_trajectories[gg.track_id].append(-1) + seq_ignored[gg.track_id].append(False) + + if len(g) is 0: + cost_matrix = [[]] + # associate + association_matrix = hm.compute(cost_matrix) + + # tmp variables for sanity checks and MODP computation + tmptp = 0 + tmpfp = 0 + tmpfn = 0 + tmpc = 0 # this will sum up the overlaps for all true positives + tmpcs = [0] * len( + g) # this will save the overlaps for all true positives + # the reason is that some true positives might be ignored + # later such that the corrsponding overlaps can + # be subtracted from tmpc for MODP computation + + # mapping for tracker ids and ground truth ids + for row, col in association_matrix: + # apply gating on boxoverlap + c = cost_matrix[row][col] + if c < max_cost: + g[row].tracker = t[col].track_id + this_ids[1][row] = t[col].track_id + t[col].valid = True + g[row].distance = c + self.total_cost += 1 - c + tmpc += 1 - c + tmpcs[row] = 1 - c + seq_trajectories[g[row].track_id][-1] = t[col].track_id + + # true positives are only valid associations + self.tp += 1 + 
+    def compute3rdPartyMetrics(self):
+        """
+        Computes the metrics defined in
+        - Stiefelhagen 2008: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics
+          MOTA, MOTAL, MOTP
+        - Nevatia 2008: Global Data Association for Multi-Object Tracking Using Network Flows
+          MT/PT/ML
+        """
+        # construct Munkres object for Hungarian Method association
+        hm = Munkres()
+        max_cost = 1e9
+
+        # go through all frames and associate ground truth and tracker results
+        # groundtruth and tracker contain lists for every single frame containing lists of KITTI format detections
+        fr, ids = 0, 0
+        for seq_idx in range(len(self.groundtruth)):
+            seq_gt = self.groundtruth[seq_idx]
+            seq_dc = self.dcareas[seq_idx]  # don't care areas
+            seq_tracker = self.tracker[seq_idx]
+            seq_trajectories = defaultdict(list)
+            seq_ignored = defaultdict(list)
+
+            # statistics over the current sequence, check the corresponding
+            # variable comments in __init__ to get their meaning
+            seqtp = 0
+            seqitp = 0
+            seqfn = 0
+            seqifn = 0
+            seqfp = 0
+            seqigt = 0
+            seqitr = 0
+
+            last_ids = [[], []]
+            n_gts = 0
+            n_trs = 0
+
+            for f in range(len(seq_gt)):
+                g = seq_gt[f]
+                dc = seq_dc[f]
+
+                t = seq_tracker[f]
+                # counting total number of ground truth and tracker objects
+                self.n_gt += len(g)
+                self.n_tr += len(t)
+
+                n_gts += len(g)
+                n_trs += len(t)
+
+                # use hungarian method to associate, using boxoverlap 0..1 as cost
+                # build cost matrix
+                cost_matrix = []
+                this_ids = [[], []]
+                for gg in g:
+                    # save current ids
+                    this_ids[0].append(gg.track_id)
+                    this_ids[1].append(-1)
+                    gg.tracker = -1
+                    gg.id_switch = 0
+                    gg.fragmentation = 0
+                    cost_row = []
+                    for tt in t:
+                        # overlap == 1 means cost == 0
+                        c = 1 - self.boxoverlap(gg, tt)
+                        # gating for boxoverlap
+                        if c <= self.min_overlap:
+                            cost_row.append(c)
+                        else:
+                            cost_row.append(max_cost)  # = 1e9
+                    cost_matrix.append(cost_row)
+                    # all ground truth trajectories are initially not associated
+                    # extend groundtruth trajectories lists (merge lists)
+                    seq_trajectories[gg.track_id].append(-1)
+                    seq_ignored[gg.track_id].append(False)
+
+                if len(g) == 0:
+                    cost_matrix = [[]]
+                # associate
+                association_matrix = hm.compute(cost_matrix)
+
+                # tmp variables for sanity checks and MODP computation
+                tmptp = 0
+                tmpfp = 0
+                tmpfn = 0
+                tmpc = 0  # this will sum up the overlaps for all true positives
+                tmpcs = [0] * len(g)  # this will save the overlaps for all true positives
+                # the reason is that some true positives might be ignored
+                # later such that the corresponding overlaps can
+                # be subtracted from tmpc for MODP computation
+
+                # mapping for tracker ids and ground truth ids
+                for row, col in association_matrix:
+                    # apply gating on boxoverlap
+                    c = cost_matrix[row][col]
+                    if c < max_cost:
+                        g[row].tracker = t[col].track_id
+                        this_ids[1][row] = t[col].track_id
+                        t[col].valid = True
+                        g[row].distance = c
+                        self.total_cost += 1 - c
+                        tmpc += 1 - c
+                        tmpcs[row] = 1 - c
+                        seq_trajectories[g[row].track_id][-1] = t[col].track_id
+
+                        # true positives are only valid associations
+                        self.tp += 1
+                        tmptp += 1
+                    else:
+                        g[row].tracker = -1
+                        self.fn += 1
+                        tmpfn += 1
+
+                # associate tracker and DontCare areas
+                # ignore tracker in neighboring classes
+                nignoredtracker = 0  # number of ignored tracker detections
+                ignoredtrackers = dict()  # will associate the track_id with -1 if it
+                # is not ignored and 1 if it is ignored;
+                # this is used to avoid double counting ignored
+                # cases, see the next loop
+
+                for tt in t:
+                    ignoredtrackers[tt.track_id] = -1
+                    # ignore detection if it belongs to a neighboring class or is
+                    # smaller or equal to the minimum height
+                    tt_height = abs(tt.y1 - tt.y2)
+                    if ((self.cls == "car" and tt.obj_type == "van") or
+                        (self.cls == "pedestrian" and
+                         tt.obj_type == "person_sitting") or
+                            tt_height <= self.min_height) and not tt.valid:
+                        nignoredtracker += 1
+                        tt.ignored = True
+                        ignoredtrackers[tt.track_id] = 1
+                        continue
+                    for d in dc:
+                        overlap = self.boxoverlap(tt, d, "a")
+                        if overlap > 0.5 and not tt.valid:
+                            tt.ignored = True
+                            nignoredtracker += 1
+                            ignoredtrackers[tt.track_id] = 1
+                            break
+
+                # check for ignored FN/TP (truncation or neighboring object class)
+                ignoredfn = 0  # the number of ignored false negatives
+                nignoredtp = 0  # the number of ignored true positives
+                nignoredpairs = 0  # the number of ignored pairs, i.e. a true positive
+                # which is ignored but where the associated tracker
+                # detection has already been ignored
+
+                gi = 0
+                for gg in g:
+                    if gg.tracker < 0:
+                        if gg.occlusion > self.max_occlusion or gg.truncation > self.max_truncation \
+                                or (self.cls == "car" and gg.obj_type == "van") \
+                                or (self.cls == "pedestrian" and gg.obj_type == "person_sitting"):
+                            seq_ignored[gg.track_id][-1] = True
+                            gg.ignored = True
+                            ignoredfn += 1
+
+                    elif gg.tracker >= 0:
+                        if gg.occlusion > self.max_occlusion or gg.truncation > self.max_truncation \
+                                or (self.cls == "car" and gg.obj_type == "van") \
+                                or (self.cls == "pedestrian" and gg.obj_type == "person_sitting"):
+                            seq_ignored[gg.track_id][-1] = True
+                            gg.ignored = True
+                            nignoredtp += 1
+
+                            # if the associated tracker detection is already ignored,
+                            # we want to avoid double counting ignored detections
+                            if ignoredtrackers[gg.tracker] > 0:
+                                nignoredpairs += 1
+
+                            # for computing MODP, the overlaps from ignored detections
+                            # are subtracted
+                            tmpc -= tmpcs[gi]
+                    gi += 1
+
+                # the below might be confusing, check the comments in __init__
+                # to see what the individual statistics represent
+
+                # correct TP by number of ignored TP due to truncation
+                # ignored TP are shown as tracked in visualization
+                tmptp -= nignoredtp
+
+                # count the number of ignored true positives
+                self.itp += nignoredtp
+
+                # adjust the number of ground truth objects considered
+                self.n_gt -= (ignoredfn + nignoredtp)
+
+                # count the number of ignored ground truth objects
+                self.n_igt += ignoredfn + nignoredtp
+
+                # count the number of ignored tracker objects
+                self.n_itr += nignoredtracker
+
+                # count the number of ignored pairs, i.e. associated tracker and
+                # ground truth objects that are both ignored
+                self.n_igttr += nignoredpairs
+
+                # false negatives = associated gt bboxes exceeding association threshold + non-associated gt bboxes
+                tmpfn += len(g) - len(association_matrix) - ignoredfn
+                self.fn += len(g) - len(association_matrix) - ignoredfn
+                self.ifn += ignoredfn
+
+                # false positives = tracker bboxes - associated tracker bboxes
+                # mismatches (mme_t)
+                tmpfp += len(t) - tmptp - nignoredtracker - nignoredtp + nignoredpairs
+                self.fp += len(t) - tmptp - nignoredtracker - nignoredtp + nignoredpairs
+
+                # update sequence data
+                seqtp += tmptp
+                seqitp += nignoredtp
+                seqfp += tmpfp
+                seqfn += tmpfn
+                seqifn += ignoredfn
+                seqigt += ignoredfn + nignoredtp
+                seqitr += nignoredtracker
+
+                # sanity checks
+                # - the number of true positives minus ignored true positives
+                #   should be greater or equal to 0
+                # - the number of false negatives should be greater or equal to 0
+                # - the number of false positives needs to be greater or equal to 0
+                #   otherwise ignored detections might be counted double
+                # - the number of counted true positives (plus ignored ones)
+                #   and the number of counted false negatives (plus ignored ones)
+                #   should match the total number of ground truth objects
+                # - the number of counted true positives (plus ignored ones)
+                #   and the number of counted false positives
+                #   plus the number of ignored tracker detections should
+                #   match the total number of tracker detections; note that
+                #   nignoredpairs is subtracted here to avoid double counting
+                #   of ignored detections in nignoredtp and nignoredtracker
+                if tmptp < 0:
+                    print(tmptp, nignoredtp)
+                    raise NameError("Something went wrong! TP is negative")
+                if tmpfn < 0:
+                    print(tmpfn, len(g), len(association_matrix), ignoredfn, nignoredpairs)
+                    raise NameError("Something went wrong! FN is negative")
+                if tmpfp < 0:
+                    print(tmpfp, len(t), tmptp, nignoredtracker, nignoredtp, nignoredpairs)
+                    raise NameError("Something went wrong! FP is negative")
+                if tmptp + tmpfn != len(g) - ignoredfn - nignoredtp:
+                    print("seqidx", seq_idx)
+                    print("frame ", f)
+                    print("TP    ", tmptp)
+                    print("FN    ", tmpfn)
+                    print("FP    ", tmpfp)
+                    print("nGT   ", len(g))
+                    print("nAss  ", len(association_matrix))
+                    print("ign GT", ignoredfn)
+                    print("ign TP", nignoredtp)
+                    raise NameError("Something went wrong! nGroundtruth is not TP+FN")
+                if tmptp + tmpfp + nignoredtp + nignoredtracker - nignoredpairs != len(t):
+                    print(seq_idx, f, len(t), tmptp, tmpfp)
+                    print(len(association_matrix), association_matrix)
+                    raise NameError("Something went wrong! nTracker is not TP+FP")
+
+                # check for id switches or fragmentations
+                for i, tt in enumerate(this_ids[0]):
+                    if tt in last_ids[0]:
+                        idx = last_ids[0].index(tt)
+                        tid = this_ids[1][i]
+                        lid = last_ids[1][idx]
+                        if tid != lid and lid != -1 and tid != -1:
+                            if g[i].truncation < self.max_truncation:
+                                g[i].id_switch = 1
+                                ids += 1
+                        if tid != lid and lid != -1:
+                            if g[i].truncation < self.max_truncation:
+                                g[i].fragmentation = 1
+                                fr += 1
+
+                # save current index
+                last_ids = this_ids
+                # compute MODP_t
+                MODP_t = 1
+                if tmptp != 0:
+                    MODP_t = tmpc / float(tmptp)
+                self.MODP_t.append(MODP_t)
+
+            # remove empty lists for current gt trajectories
+            self.gt_trajectories[seq_idx] = seq_trajectories
+            self.ign_trajectories[seq_idx] = seq_ignored
+
+            # gather statistics for "per sequence" statistics.
+ self.n_gts.append(n_gts) + self.n_trs.append(n_trs) + self.tps.append(seqtp) + self.itps.append(seqitp) + self.fps.append(seqfp) + self.fns.append(seqfn) + self.ifns.append(seqifn) + self.n_igts.append(seqigt) + self.n_itrs.append(seqitr) + + # compute MT/PT/ML, fragments, idswitches for all groundtruth trajectories + n_ignored_tr_total = 0 + for seq_idx, ( + seq_trajectories, seq_ignored + ) in enumerate(zip(self.gt_trajectories, self.ign_trajectories)): + if len(seq_trajectories) == 0: + continue + tmpMT, tmpML, tmpPT, tmpId_switches, tmpFragments = [0] * 5 + n_ignored_tr = 0 + for g, ign_g in zip(seq_trajectories.values(), + seq_ignored.values()): + # all frames of this gt trajectory are ignored + if all(ign_g): + n_ignored_tr += 1 + n_ignored_tr_total += 1 + continue + # all frames of this gt trajectory are not assigned to any detections + if all([this == -1 for this in g]): + tmpML += 1 + self.ML += 1 + continue + # compute tracked frames in trajectory + last_id = g[0] + # first detection (necessary to be in gt_trajectories) is always tracked + tracked = 1 if g[0] >= 0 else 0 + lgt = 0 if ign_g[0] else 1 + for f in range(1, len(g)): + if ign_g[f]: + last_id = -1 + continue + lgt += 1 + if last_id != g[f] and last_id != -1 and g[f] != -1 and g[ + f - 1] != -1: + tmpId_switches += 1 + self.id_switches += 1 + if f < len(g) - 1 and g[f - 1] != g[ + f] and last_id != -1 and g[f] != -1 and g[f + + 1] != -1: + tmpFragments += 1 + self.fragments += 1 + if g[f] != -1: + tracked += 1 + last_id = g[f] + # handle last frame; tracked state is handled in for loop (g[f]!=-1) + if len(g) > 1 and g[f - 1] != g[f] and last_id != -1 and g[ + f] != -1 and not ign_g[f]: + tmpFragments += 1 + self.fragments += 1 + + # compute MT/PT/ML + tracking_ratio = tracked / float(len(g) - sum(ign_g)) + if tracking_ratio > 0.8: + tmpMT += 1 + self.MT += 1 + elif tracking_ratio < 0.2: + tmpML += 1 + self.ML += 1 + else: # 0.2 <= tracking_ratio <= 0.8 + tmpPT += 1 + self.PT += 1 + + if (self.n_gt_trajectories - n_ignored_tr_total) == 0: + self.MT = 0. + self.PT = 0. + self.ML = 0. + else: + self.MT /= float(self.n_gt_trajectories - n_ignored_tr_total) + self.PT /= float(self.n_gt_trajectories - n_ignored_tr_total) + self.ML /= float(self.n_gt_trajectories - n_ignored_tr_total) + + # precision/recall etc. + if (self.fp + self.tp) == 0 or (self.tp + self.fn) == 0: + self.recall = 0. + self.precision = 0. + else: + self.recall = self.tp / float(self.tp + self.fn) + self.precision = self.tp / float(self.fp + self.tp) + if (self.recall + self.precision) == 0: + self.F1 = 0. + else: + self.F1 = 2. 
* (self.precision * self.recall) / ( + self.precision + self.recall) + if sum(self.n_frames) == 0: + self.FAR = "n/a" + else: + self.FAR = self.fp / float(sum(self.n_frames)) + + # compute CLEARMOT + if self.n_gt == 0: + self.MOTA = -float("inf") + self.MODA = -float("inf") + else: + self.MOTA = 1 - (self.fn + self.fp + self.id_switches + ) / float(self.n_gt) + self.MODA = 1 - (self.fn + self.fp) / float(self.n_gt) + if self.tp == 0: + self.MOTP = float("inf") + else: + self.MOTP = self.total_cost / float(self.tp) + if self.n_gt != 0: + if self.id_switches == 0: + self.MOTAL = 1 - (self.fn + self.fp + self.id_switches + ) / float(self.n_gt) + else: + self.MOTAL = 1 - (self.fn + self.fp + + math.log10(self.id_switches) + ) / float(self.n_gt) + else: + self.MOTAL = -float("inf") + if sum(self.n_frames) == 0: + self.MODP = "n/a" + else: + self.MODP = sum(self.MODP_t) / float(sum(self.n_frames)) + return True + + def createSummary(self): + summary = "" + summary += "tracking evaluation summary".center(80, "=") + "\n" + summary += self.printEntry("Multiple Object Tracking Accuracy (MOTA)", + self.MOTA) + "\n" + summary += self.printEntry("Multiple Object Tracking Precision (MOTP)", + self.MOTP) + "\n" + summary += self.printEntry("Multiple Object Tracking Accuracy (MOTAL)", + self.MOTAL) + "\n" + summary += self.printEntry("Multiple Object Detection Accuracy (MODA)", + self.MODA) + "\n" + summary += self.printEntry("Multiple Object Detection Precision (MODP)", + self.MODP) + "\n" + summary += "\n" + summary += self.printEntry("Recall", self.recall) + "\n" + summary += self.printEntry("Precision", self.precision) + "\n" + summary += self.printEntry("F1", self.F1) + "\n" + summary += self.printEntry("False Alarm Rate", self.FAR) + "\n" + summary += "\n" + summary += self.printEntry("Mostly Tracked", self.MT) + "\n" + summary += self.printEntry("Partly Tracked", self.PT) + "\n" + summary += self.printEntry("Mostly Lost", self.ML) + "\n" + summary += "\n" + summary += self.printEntry("True Positives", self.tp) + "\n" + #summary += self.printEntry("True Positives per Sequence", self.tps) + "\n" + summary += self.printEntry("Ignored True Positives", self.itp) + "\n" + #summary += self.printEntry("Ignored True Positives per Sequence", self.itps) + "\n" + + summary += self.printEntry("False Positives", self.fp) + "\n" + #summary += self.printEntry("False Positives per Sequence", self.fps) + "\n" + summary += self.printEntry("False Negatives", self.fn) + "\n" + #summary += self.printEntry("False Negatives per Sequence", self.fns) + "\n" + summary += self.printEntry("ID-switches", self.id_switches) + "\n" + self.fp = self.fp / self.n_gt + self.fn = self.fn / self.n_gt + self.id_switches = self.id_switches / self.n_gt + summary += self.printEntry("False Positives Ratio", self.fp) + "\n" + #summary += self.printEntry("False Positives per Sequence", self.fps) + "\n" + summary += self.printEntry("False Negatives Ratio", self.fn) + "\n" + #summary += self.printEntry("False Negatives per Sequence", self.fns) + "\n" + summary += self.printEntry("Ignored False Negatives Ratio", + self.ifn) + "\n" + + #summary += self.printEntry("Ignored False Negatives per Sequence", self.ifns) + "\n" + summary += self.printEntry("Missed Targets", self.fn) + "\n" + summary += self.printEntry("ID-switches", self.id_switches) + "\n" + summary += self.printEntry("Fragmentations", self.fragments) + "\n" + summary += "\n" + summary += self.printEntry("Ground Truth Objects (Total)", self.n_gt + + self.n_igt) + "\n" + #summary += 
self.printEntry("Ground Truth Objects (Total) per Sequence", self.n_gts) + "\n" + summary += self.printEntry("Ignored Ground Truth Objects", + self.n_igt) + "\n" + #summary += self.printEntry("Ignored Ground Truth Objects per Sequence", self.n_igts) + "\n" + summary += self.printEntry("Ground Truth Trajectories", + self.n_gt_trajectories) + "\n" + summary += "\n" + summary += self.printEntry("Tracker Objects (Total)", self.n_tr) + "\n" + #summary += self.printEntry("Tracker Objects (Total) per Sequence", self.n_trs) + "\n" + summary += self.printEntry("Ignored Tracker Objects", self.n_itr) + "\n" + #summary += self.printEntry("Ignored Tracker Objects per Sequence", self.n_itrs) + "\n" + summary += self.printEntry("Tracker Trajectories", + self.n_tr_trajectories) + "\n" + #summary += "\n" + #summary += self.printEntry("Ignored Tracker Objects with Associated Ignored Ground Truth Objects", self.n_igttr) + "\n" + summary += "=" * 80 + return summary + + def printEntry(self, key, val, width=(70, 10)): + """ + Pretty print an entry in a table fashion. + """ + s_out = key.ljust(width[0]) + if type(val) == int: + s = "%%%dd" % width[1] + s_out += s % val + elif type(val) == float: + s = "%%%df" % (width[1]) + s_out += s % val + else: + s_out += ("%s" % val).rjust(width[1]) + return s_out + + def saveToStats(self, save_summary): + """ + Save the statistics in a whitespace separate file. + """ + summary = self.createSummary() + if save_summary: + filename = os.path.join(self.result_path, + "summary_%s.txt" % self.cls) + dump = open(filename, "w+") + dump.write(summary) + dump.close() + return summary + + +class KITTIMOTMetric(Metric): + def __init__(self, save_summary=True): + self.save_summary = save_summary + self.MOTEvaluator = KITTIEvaluation + self.result_root = None + self.reset() + + def reset(self): + self.seqs = [] + self.n_sequences = 0 + self.n_frames = [] + self.strsummary = '' + + def update(self, data_root, seq, data_type, result_root, result_filename): + assert data_type == 'kitti', "data_type should 'kitti'" + self.result_root = result_root + self.gt_path = data_root + gt_path = '{}/../labels/{}.txt'.format(data_root, seq) + gt = open(gt_path, "r") + max_frame = 0 + for line in gt: + line = line.strip() + line_list = line.split(" ") + if int(line_list[0]) > max_frame: + max_frame = int(line_list[0]) + rs = open(result_filename, "r") + for line in rs: + line = line.strip() + line_list = line.split(" ") + if int(line_list[0]) > max_frame: + max_frame = int(line_list[0]) + gt.close() + rs.close() + self.n_frames.append(max_frame + 1) + self.seqs.append(seq) + self.n_sequences += 1 + + def accumulate(self): + logger.info("Processing Result for KITTI Tracking Benchmark") + e = self.MOTEvaluator(result_path=self.result_root, gt_path=self.gt_path,\ + n_frames=self.n_frames, seqs=self.seqs, n_sequences=self.n_sequences) + try: + if not e.loadTracker(): + return + logger.info("Loading Results - Success") + logger.info("Evaluate Object Class: %s" % c.upper()) + except: + logger.info("Caught exception while loading result data.") + if not e.loadGroundtruth(): + raise ValueError("Ground truth not found.") + logger.info("Loading Groundtruth - Success") + # sanity checks + if len(e.groundtruth) is not len(e.tracker): + logger.info( + "The uploaded data does not provide results for every sequence.") + return False + logger.info("Loaded %d Sequences." 
% len(e.groundtruth)) + logger.info("Start Evaluation...") + + if e.compute3rdPartyMetrics(): + self.strsummary = e.saveToStats(self.save_summary) + else: + logger.info( + "There seem to be no true positives or false positives at all in the submitted data." + ) + + def log(self): + print(self.strsummary) + + def get_results(self): + return self.strsummary diff --git a/PaddleDetection-release-2.6/ppdet/metrics/munkres.py b/PaddleDetection-release-2.6/ppdet/metrics/munkres.py new file mode 100644 index 0000000000000000000000000000000000000000..fbd4a92d2a793bf130c8a1d253bd45bde8cbb0d1 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/munkres.py @@ -0,0 +1,428 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is borrow from https://github.com/xingyizhou/CenterTrack/blob/master/src/tools/eval_kitti_track/munkres.py +""" + +import sys + +__all__ = ['Munkres', 'make_cost_matrix'] + + +class Munkres: + """ + Calculate the Munkres solution to the classical assignment problem. + See the module documentation for usage. + """ + + def __init__(self): + """Create a new instance""" + self.C = None + self.row_covered = [] + self.col_covered = [] + self.n = 0 + self.Z0_r = 0 + self.Z0_c = 0 + self.marked = None + self.path = None + + def make_cost_matrix(profit_matrix, inversion_function): + """ + **DEPRECATED** + + Please use the module function ``make_cost_matrix()``. + """ + import munkres + return munkres.make_cost_matrix(profit_matrix, inversion_function) + + make_cost_matrix = staticmethod(make_cost_matrix) + + def pad_matrix(self, matrix, pad_value=0): + """ + Pad a possibly non-square matrix to make it square. + + :Parameters: + matrix : list of lists + matrix to pad + + pad_value : int + value to use to pad the matrix + + :rtype: list of lists + :return: a new, possibly padded, matrix + """ + max_columns = 0 + total_rows = len(matrix) + + for row in matrix: + max_columns = max(max_columns, len(row)) + + total_rows = max(max_columns, total_rows) + + new_matrix = [] + for row in matrix: + row_len = len(row) + new_row = row[:] + if total_rows > row_len: + # Row too short. Pad it. + new_row += [0] * (total_rows - row_len) + new_matrix += [new_row] + + while len(new_matrix) < total_rows: + new_matrix += [[0] * total_rows] + + return new_matrix + + def compute(self, cost_matrix): + """ + Compute the indexes for the lowest-cost pairings between rows and + columns in the database. Returns a list of (row, column) tuples + that can be used to traverse the matrix. + + :Parameters: + cost_matrix : list of lists + The cost matrix. If this cost matrix is not square, it + will be padded with zeros, via a call to ``pad_matrix()``. + (This method does *not* modify the caller's matrix. It + operates on a copy of the matrix.) + + **WARNING**: This code handles square and rectangular + matrices. It does *not* handle irregular matrices. 
+ + :rtype: list + :return: A list of ``(row, column)`` tuples that describe the lowest + cost path through the matrix + + """ + self.C = self.pad_matrix(cost_matrix) + self.n = len(self.C) + self.original_length = len(cost_matrix) + self.original_width = len(cost_matrix[0]) + self.row_covered = [False for i in range(self.n)] + self.col_covered = [False for i in range(self.n)] + self.Z0_r = 0 + self.Z0_c = 0 + self.path = self.__make_matrix(self.n * 2, 0) + self.marked = self.__make_matrix(self.n, 0) + + done = False + step = 1 + + steps = { + 1: self.__step1, + 2: self.__step2, + 3: self.__step3, + 4: self.__step4, + 5: self.__step5, + 6: self.__step6 + } + + while not done: + try: + func = steps[step] + step = func() + except KeyError: + done = True + + # Look for the starred columns + results = [] + for i in range(self.original_length): + for j in range(self.original_width): + if self.marked[i][j] == 1: + results += [(i, j)] + + return results + + def __copy_matrix(self, matrix): + """Return an exact copy of the supplied matrix""" + return copy.deepcopy(matrix) + + def __make_matrix(self, n, val): + """Create an *n*x*n* matrix, populating it with the specific value.""" + matrix = [] + for i in range(n): + matrix += [[val for j in range(n)]] + return matrix + + def __step1(self): + """ + For each row of the matrix, find the smallest element and + subtract it from every element in its row. Go to Step 2. + """ + C = self.C + n = self.n + for i in range(n): + minval = min(self.C[i]) + # Find the minimum value for this row and subtract that minimum + # from every element in the row. + for j in range(n): + self.C[i][j] -= minval + + return 2 + + def __step2(self): + """ + Find a zero (Z) in the resulting matrix. If there is no starred + zero in its row or column, star Z. Repeat for each element in the + matrix. Go to Step 3. + """ + n = self.n + for i in range(n): + for j in range(n): + if (self.C[i][j] == 0) and \ + (not self.col_covered[j]) and \ + (not self.row_covered[i]): + self.marked[i][j] = 1 + self.col_covered[j] = True + self.row_covered[i] = True + + self.__clear_covers() + return 3 + + def __step3(self): + """ + Cover each column containing a starred zero. If K columns are + covered, the starred zeros describe a complete set of unique + assignments. In this case, Go to DONE, otherwise, Go to Step 4. + """ + n = self.n + count = 0 + for i in range(n): + for j in range(n): + if self.marked[i][j] == 1: + self.col_covered[j] = True + count += 1 + + if count >= n: + step = 7 # done + else: + step = 4 + + return step + + def __step4(self): + """ + Find a noncovered zero and prime it. If there is no starred zero + in the row containing this primed zero, Go to Step 5. Otherwise, + cover this row and uncover the column containing the starred + zero. Continue in this manner until there are no uncovered zeros + left. Save the smallest uncovered value and Go to Step 6. + """ + step = 0 + done = False + row = -1 + col = -1 + star_col = -1 + while not done: + (row, col) = self.__find_a_zero() + if row < 0: + done = True + step = 6 + else: + self.marked[row][col] = 2 + star_col = self.__find_star_in_row(row) + if star_col >= 0: + col = star_col + self.row_covered[row] = True + self.col_covered[col] = False + else: + done = True + self.Z0_r = row + self.Z0_c = col + step = 5 + + return step + + def __step5(self): + """ + Construct a series of alternating primed and starred zeros as + follows. Let Z0 represent the uncovered primed zero found in Step 4. 
+ Let Z1 denote the starred zero in the column of Z0 (if any). + Let Z2 denote the primed zero in the row of Z1 (there will always + be one). Continue until the series terminates at a primed zero + that has no starred zero in its column. Unstar each starred zero + of the series, star each primed zero of the series, erase all + primes and uncover every line in the matrix. Return to Step 3 + """ + count = 0 + path = self.path + path[count][0] = self.Z0_r + path[count][1] = self.Z0_c + done = False + while not done: + row = self.__find_star_in_col(path[count][1]) + if row >= 0: + count += 1 + path[count][0] = row + path[count][1] = path[count - 1][1] + else: + done = True + + if not done: + col = self.__find_prime_in_row(path[count][0]) + count += 1 + path[count][0] = path[count - 1][0] + path[count][1] = col + + self.__convert_path(path, count) + self.__clear_covers() + self.__erase_primes() + return 3 + + def __step6(self): + """ + Add the value found in Step 4 to every element of each covered + row, and subtract it from every element of each uncovered column. + Return to Step 4 without altering any stars, primes, or covered + lines. + """ + minval = self.__find_smallest() + for i in range(self.n): + for j in range(self.n): + if self.row_covered[i]: + self.C[i][j] += minval + if not self.col_covered[j]: + self.C[i][j] -= minval + return 4 + + def __find_smallest(self): + """Find the smallest uncovered value in the matrix.""" + minval = 2e9 # sys.maxint + for i in range(self.n): + for j in range(self.n): + if (not self.row_covered[i]) and (not self.col_covered[j]): + if minval > self.C[i][j]: + minval = self.C[i][j] + return minval + + def __find_a_zero(self): + """Find the first uncovered element with value 0""" + row = -1 + col = -1 + i = 0 + n = self.n + done = False + + while not done: + j = 0 + while True: + if (self.C[i][j] == 0) and \ + (not self.row_covered[i]) and \ + (not self.col_covered[j]): + row = i + col = j + done = True + j += 1 + if j >= n: + break + i += 1 + if i >= n: + done = True + + return (row, col) + + def __find_star_in_row(self, row): + """ + Find the first starred element in the specified row. Returns + the column index, or -1 if no starred element was found. + """ + col = -1 + for j in range(self.n): + if self.marked[row][j] == 1: + col = j + break + + return col + + def __find_star_in_col(self, col): + """ + Find the first starred element in the specified row. Returns + the row index, or -1 if no starred element was found. + """ + row = -1 + for i in range(self.n): + if self.marked[i][col] == 1: + row = i + break + + return row + + def __find_prime_in_row(self, row): + """ + Find the first prime element in the specified row. Returns + the column index, or -1 if no starred element was found. 
+ """ + col = -1 + for j in range(self.n): + if self.marked[row][j] == 2: + col = j + break + + return col + + def __convert_path(self, path, count): + for i in range(count + 1): + if self.marked[path[i][0]][path[i][1]] == 1: + self.marked[path[i][0]][path[i][1]] = 0 + else: + self.marked[path[i][0]][path[i][1]] = 1 + + def __clear_covers(self): + """Clear all covered matrix cells""" + for i in range(self.n): + self.row_covered[i] = False + self.col_covered[i] = False + + def __erase_primes(self): + """Erase all prime markings""" + for i in range(self.n): + for j in range(self.n): + if self.marked[i][j] == 2: + self.marked[i][j] = 0 + + +def make_cost_matrix(profit_matrix, inversion_function): + """ + Create a cost matrix from a profit matrix by calling + 'inversion_function' to invert each value. The inversion + function must take one numeric argument (of any type) and return + another numeric argument which is presumed to be the cost inverse + of the original profit. + + This is a static method. Call it like this: + + .. python:: + + cost_matrix = Munkres.make_cost_matrix(matrix, inversion_func) + + For example: + + .. python:: + + cost_matrix = Munkres.make_cost_matrix(matrix, lambda x : sys.maxint - x) + + :Parameters: + profit_matrix : list of lists + The matrix to convert from a profit to a cost matrix + + inversion_function : function + The function to use to invert each entry in the profit matrix + + :rtype: list of lists + :return: The converted matrix + """ + cost_matrix = [] + for row in profit_matrix: + cost_matrix.append([inversion_function(value) for value in row]) + return cost_matrix diff --git a/PaddleDetection-release-2.6/ppdet/metrics/pose3d_metrics.py b/PaddleDetection-release-2.6/ppdet/metrics/pose3d_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..ea21de90b07e8883b7e5c4717b995527331b48d6 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/pose3d_metrics.py @@ -0,0 +1,200 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import paddle +from paddle.distributed import ParallelEnv +import os +import json +from collections import defaultdict, OrderedDict +import numpy as np +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['Pose3DEval'] + + +class AverageMeter(object): + def __init__(self): + self.reset() + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + self.avg = self.sum / self.count + + +def mean_per_joint_position_error(pred, gt, has_3d_joints): + """ + Compute mPJPE + """ + gt = gt[has_3d_joints == 1] + gt = gt[:, :, :3] + pred = pred[has_3d_joints == 1] + + with paddle.no_grad(): + gt_pelvis = (gt[:, 2, :] + gt[:, 3, :]) / 2 + gt = gt - gt_pelvis[:, None, :] + pred_pelvis = (pred[:, 2, :] + pred[:, 3, :]) / 2 + pred = pred - pred_pelvis[:, None, :] + error = paddle.sqrt(((pred - gt)**2).sum(axis=-1)).mean(axis=-1).numpy() + return error + + +def compute_similarity_transform(S1, S2): + """Computes a similarity transform (sR, t) that takes + a set of 3D points S1 (3 x N) closest to a set of 3D points S2, + where R is an 3x3 rotation matrix, t 3x1 translation, s scale. + i.e. solves the orthogonal Procrutes problem. + """ + transposed = False + if S1.shape[0] != 3 and S1.shape[0] != 2: + S1 = S1.T + S2 = S2.T + transposed = True + assert (S2.shape[1] == S1.shape[1]) + + # 1. Remove mean. + mu1 = S1.mean(axis=1, keepdims=True) + mu2 = S2.mean(axis=1, keepdims=True) + X1 = S1 - mu1 + X2 = S2 - mu2 + + # 2. Compute variance of X1 used for scale. + var1 = np.sum(X1**2) + + # 3. The outer product of X1 and X2. + K = X1.dot(X2.T) + + # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are + # singular vectors of K. + U, s, Vh = np.linalg.svd(K) + V = Vh.T + # Construct Z that fixes the orientation of R to get det(R)=1. + Z = np.eye(U.shape[0]) + Z[-1, -1] *= np.sign(np.linalg.det(U.dot(V.T))) + # Construct R. + R = V.dot(Z.dot(U.T)) + + # 5. Recover scale. + scale = np.trace(R.dot(K)) / var1 + + # 6. Recover translation. + t = mu2 - scale * (R.dot(mu1)) + + # 7. 
Error: + S1_hat = scale * R.dot(S1) + t + + if transposed: + S1_hat = S1_hat.T + + return S1_hat + + +def compute_similarity_transform_batch(S1, S2): + """Batched version of compute_similarity_transform.""" + S1_hat = np.zeros_like(S1) + for i in range(S1.shape[0]): + S1_hat[i] = compute_similarity_transform(S1[i], S2[i]) + return S1_hat + + +def reconstruction_error(S1, S2, reduction='mean'): + """Do Procrustes alignment and compute reconstruction error.""" + S1_hat = compute_similarity_transform_batch(S1, S2) + re = np.sqrt(((S1_hat - S2)**2).sum(axis=-1)).mean(axis=-1) + if reduction == 'mean': + re = re.mean() + elif reduction == 'sum': + re = re.sum() + return re + + +def all_gather(data): + if paddle.distributed.get_world_size() == 1: + return data + vlist = [] + paddle.distributed.all_gather(vlist, data) + data = paddle.concat(vlist, 0) + return data + + +class Pose3DEval(object): + def __init__(self, output_eval, save_prediction_only=False): + super(Pose3DEval, self).__init__() + self.output_eval = output_eval + self.res_file = os.path.join(output_eval, "pose3d_results.json") + self.save_prediction_only = save_prediction_only + self.reset() + + def reset(self): + self.PAmPJPE = AverageMeter() + self.mPJPE = AverageMeter() + self.eval_results = {} + + def get_human36m_joints(self, input): + J24_TO_J14 = paddle.to_tensor( + [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18]) + J24_TO_J17 = paddle.to_tensor( + [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 18, 19]) + return paddle.index_select(input, J24_TO_J14, axis=1) + + def update(self, inputs, outputs): + gt_3d_joints = all_gather(inputs['joints_3d'].cuda(ParallelEnv() + .local_rank)) + has_3d_joints = all_gather(inputs['has_3d_joints'].cuda(ParallelEnv() + .local_rank)) + pred_3d_joints = all_gather(outputs['pose3d']) + if gt_3d_joints.shape[1] == 24: + gt_3d_joints = self.get_human36m_joints(gt_3d_joints) + if pred_3d_joints.shape[1] == 24: + pred_3d_joints = self.get_human36m_joints(pred_3d_joints) + mPJPE_val = mean_per_joint_position_error(pred_3d_joints, gt_3d_joints, + has_3d_joints).mean() + PAmPJPE_val = reconstruction_error( + pred_3d_joints.numpy(), + gt_3d_joints[:, :, :3].numpy(), + reduction=None).mean() + count = int(np.sum(has_3d_joints.numpy())) + self.PAmPJPE.update(PAmPJPE_val * 1000., count) + self.mPJPE.update(mPJPE_val * 1000., count) + + def accumulate(self): + if self.save_prediction_only: + logger.info(f'The pose3d result is saved to {self.res_file} ' + 'and do not evaluate the model.') + return + self.eval_results['pose3d'] = [-self.mPJPE.avg, -self.PAmPJPE.avg] + + def log(self): + if self.save_prediction_only: + return + stats_names = ['mPJPE', 'PAmPJPE'] + num_values = len(stats_names) + print(' '.join(['| {}'.format(name) for name in stats_names]) + ' |') + print('|---' * (num_values + 1) + '|') + + print(' '.join([ + '| {:.3f}'.format(abs(value)) + for value in self.eval_results['pose3d'] + ]) + ' |') + + def get_results(self): + return self.eval_results diff --git a/PaddleDetection-release-2.6/ppdet/metrics/widerface_utils.py b/PaddleDetection-release-2.6/ppdet/metrics/widerface_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2f64bf6d50a3678592c774c42a3fc4181e7bd167 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/metrics/widerface_utils.py @@ -0,0 +1,391 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import cv2 +import numpy as np +from collections import OrderedDict + +import paddle + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['face_eval_run', 'lmk2out'] + + +def face_eval_run(model, + image_dir, + gt_file, + pred_dir='output/pred', + eval_mode='widerface', + multi_scale=False): + # load ground truth files + with open(gt_file, 'r') as f: + gt_lines = f.readlines() + imid2path = [] + pos_gt = 0 + while pos_gt < len(gt_lines): + name_gt = gt_lines[pos_gt].strip('\n\t').split()[0] + imid2path.append(name_gt) + pos_gt += 1 + n_gt = int(gt_lines[pos_gt].strip('\n\t').split()[0]) + pos_gt += 1 + n_gt + logger.info('The ground truth file load {} images'.format(len(imid2path))) + + dets_dist = OrderedDict() + for iter_id, im_path in enumerate(imid2path): + image_path = os.path.join(image_dir, im_path) + if eval_mode == 'fddb': + image_path += '.jpg' + assert os.path.exists(image_path) + image = cv2.imread(image_path) + image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + if multi_scale: + shrink, max_shrink = get_shrink(image.shape[0], image.shape[1]) + det0 = detect_face(model, image, shrink) + det1 = flip_test(model, image, shrink) + [det2, det3] = multi_scale_test(model, image, max_shrink) + det4 = multi_scale_test_pyramid(model, image, max_shrink) + det = np.row_stack((det0, det1, det2, det3, det4)) + dets = bbox_vote(det) + else: + dets = detect_face(model, image, 1) + if eval_mode == 'widerface': + save_widerface_bboxes(image_path, dets, pred_dir) + else: + dets_dist[im_path] = dets + if iter_id % 100 == 0: + logger.info('Test iter {}'.format(iter_id)) + if eval_mode == 'fddb': + save_fddb_bboxes(dets_dist, pred_dir) + logger.info("Finish evaluation.") + + +def detect_face(model, image, shrink): + image_shape = [image.shape[0], image.shape[1]] + if shrink != 1: + h, w = int(image_shape[0] * shrink), int(image_shape[1] * shrink) + image = cv2.resize(image, (w, h)) + image_shape = [h, w] + + img = face_img_process(image) + image_shape = np.asarray([image_shape]) + scale_factor = np.asarray([[shrink, shrink]]) + data = { + "image": paddle.to_tensor( + img, dtype='float32'), + "im_shape": paddle.to_tensor( + image_shape, dtype='float32'), + "scale_factor": paddle.to_tensor( + scale_factor, dtype='float32') + } + model.eval() + detection = model(data) + detection = detection['bbox'].numpy() + # layout: xmin, ymin, xmax. 
ymax, score + if np.prod(detection.shape) == 1: + logger.info("No face detected") + return np.array([[0, 0, 0, 0, 0]]) + det_conf = detection[:, 1] + det_xmin = detection[:, 2] + det_ymin = detection[:, 3] + det_xmax = detection[:, 4] + det_ymax = detection[:, 5] + + det = np.column_stack((det_xmin, det_ymin, det_xmax, det_ymax, det_conf)) + return det + + +def flip_test(model, image, shrink): + img = cv2.flip(image, 1) + det_f = detect_face(model, img, shrink) + det_t = np.zeros(det_f.shape) + img_width = image.shape[1] + det_t[:, 0] = img_width - det_f[:, 2] + det_t[:, 1] = det_f[:, 1] + det_t[:, 2] = img_width - det_f[:, 0] + det_t[:, 3] = det_f[:, 3] + det_t[:, 4] = det_f[:, 4] + return det_t + + +def multi_scale_test(model, image, max_shrink): + # Shrink detecting is only used to detect big faces + st = 0.5 if max_shrink >= 0.75 else 0.5 * max_shrink + det_s = detect_face(model, image, st) + index = np.where( + np.maximum(det_s[:, 2] - det_s[:, 0] + 1, det_s[:, 3] - det_s[:, 1] + 1) + > 30)[0] + det_s = det_s[index, :] + # Enlarge one times + bt = min(2, max_shrink) if max_shrink > 1 else (st + max_shrink) / 2 + det_b = detect_face(model, image, bt) + + # Enlarge small image x times for small faces + if max_shrink > 2: + bt *= 2 + while bt < max_shrink: + det_b = np.row_stack((det_b, detect_face(model, image, bt))) + bt *= 2 + det_b = np.row_stack((det_b, detect_face(model, image, max_shrink))) + + # Enlarged images are only used to detect small faces. + if bt > 1: + index = np.where( + np.minimum(det_b[:, 2] - det_b[:, 0] + 1, + det_b[:, 3] - det_b[:, 1] + 1) < 100)[0] + det_b = det_b[index, :] + # Shrinked images are only used to detect big faces. + else: + index = np.where( + np.maximum(det_b[:, 2] - det_b[:, 0] + 1, + det_b[:, 3] - det_b[:, 1] + 1) > 30)[0] + det_b = det_b[index, :] + return det_s, det_b + + +def multi_scale_test_pyramid(model, image, max_shrink): + # Use image pyramids to detect faces + det_b = detect_face(model, image, 0.25) + index = np.where( + np.maximum(det_b[:, 2] - det_b[:, 0] + 1, det_b[:, 3] - det_b[:, 1] + 1) + > 30)[0] + det_b = det_b[index, :] + + st = [0.75, 1.25, 1.5, 1.75] + for i in range(len(st)): + if st[i] <= max_shrink: + det_temp = detect_face(model, image, st[i]) + # Enlarged images are only used to detect small faces. + if st[i] > 1: + index = np.where( + np.minimum(det_temp[:, 2] - det_temp[:, 0] + 1, + det_temp[:, 3] - det_temp[:, 1] + 1) < 100)[0] + det_temp = det_temp[index, :] + # Shrinked images are only used to detect big faces. + else: + index = np.where( + np.maximum(det_temp[:, 2] - det_temp[:, 0] + 1, + det_temp[:, 3] - det_temp[:, 1] + 1) > 30)[0] + det_temp = det_temp[index, :] + det_b = np.row_stack((det_b, det_temp)) + return det_b + + +def to_chw(image): + """ + Transpose image from HWC to CHW. + Args: + image (np.array): an image with HWC layout. + """ + # HWC to CHW + if len(image.shape) == 3: + image = np.swapaxes(image, 1, 2) + image = np.swapaxes(image, 1, 0) + return image + + +def face_img_process(image, + mean=[104., 117., 123.], + std=[127.502231, 127.502231, 127.502231]): + img = np.array(image) + img = to_chw(img) + img = img.astype('float32') + img -= np.array(mean)[:, np.newaxis, np.newaxis].astype('float32') + img /= np.array(std)[:, np.newaxis, np.newaxis].astype('float32') + img = [img] + img = np.array(img) + return img + + +def get_shrink(height, width): + """ + Args: + height (int): image height. + width (int): image width. 
+ """ + # avoid out of memory + max_shrink_v1 = (0x7fffffff / 577.0 / (height * width))**0.5 + max_shrink_v2 = ((678 * 1024 * 2.0 * 2.0) / (height * width))**0.5 + + def get_round(x, loc): + str_x = str(x) + if '.' in str_x: + str_before, str_after = str_x.split('.') + len_after = len(str_after) + if len_after >= 3: + str_final = str_before + '.' + str_after[0:loc] + return float(str_final) + else: + return x + + max_shrink = get_round(min(max_shrink_v1, max_shrink_v2), 2) - 0.3 + if max_shrink >= 1.5 and max_shrink < 2: + max_shrink = max_shrink - 0.1 + elif max_shrink >= 2 and max_shrink < 3: + max_shrink = max_shrink - 0.2 + elif max_shrink >= 3 and max_shrink < 4: + max_shrink = max_shrink - 0.3 + elif max_shrink >= 4 and max_shrink < 5: + max_shrink = max_shrink - 0.4 + elif max_shrink >= 5: + max_shrink = max_shrink - 0.5 + elif max_shrink <= 0.1: + max_shrink = 0.1 + + shrink = max_shrink if max_shrink < 1 else 1 + return shrink, max_shrink + + +def bbox_vote(det): + order = det[:, 4].ravel().argsort()[::-1] + det = det[order, :] + if det.shape[0] == 0: + dets = np.array([[10, 10, 20, 20, 0.002]]) + det = np.empty(shape=[0, 5]) + while det.shape[0] > 0: + # IOU + area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1) + xx1 = np.maximum(det[0, 0], det[:, 0]) + yy1 = np.maximum(det[0, 1], det[:, 1]) + xx2 = np.minimum(det[0, 2], det[:, 2]) + yy2 = np.minimum(det[0, 3], det[:, 3]) + w = np.maximum(0.0, xx2 - xx1 + 1) + h = np.maximum(0.0, yy2 - yy1 + 1) + inter = w * h + o = inter / (area[0] + area[:] - inter) + + # nms + merge_index = np.where(o >= 0.3)[0] + det_accu = det[merge_index, :] + det = np.delete(det, merge_index, 0) + if merge_index.shape[0] <= 1: + if det.shape[0] == 0: + try: + dets = np.row_stack((dets, det_accu)) + except: + dets = det_accu + continue + det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4)) + max_score = np.max(det_accu[:, 4]) + det_accu_sum = np.zeros((1, 5)) + det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4], + axis=0) / np.sum(det_accu[:, -1:]) + det_accu_sum[:, 4] = max_score + try: + dets = np.row_stack((dets, det_accu_sum)) + except: + dets = det_accu_sum + dets = dets[0:750, :] + keep_index = np.where(dets[:, 4] >= 0.01)[0] + dets = dets[keep_index, :] + return dets + + +def save_widerface_bboxes(image_path, bboxes_scores, output_dir): + image_name = image_path.split('/')[-1] + image_class = image_path.split('/')[-2] + odir = os.path.join(output_dir, image_class) + if not os.path.exists(odir): + os.makedirs(odir) + + ofname = os.path.join(odir, '%s.txt' % (image_name[:-4])) + f = open(ofname, 'w') + f.write('{:s}\n'.format(image_class + '/' + image_name)) + f.write('{:d}\n'.format(bboxes_scores.shape[0])) + for box_score in bboxes_scores: + xmin, ymin, xmax, ymax, score = box_score + f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'.format(xmin, ymin, ( + xmax - xmin + 1), (ymax - ymin + 1), score)) + f.close() + logger.info("The predicted result is saved as {}".format(ofname)) + + +def save_fddb_bboxes(bboxes_scores, + output_dir, + output_fname='pred_fddb_res.txt'): + if not os.path.exists(output_dir): + os.makedirs(output_dir) + predict_file = os.path.join(output_dir, output_fname) + f = open(predict_file, 'w') + for image_path, dets in bboxes_scores.iteritems(): + f.write('{:s}\n'.format(image_path)) + f.write('{:d}\n'.format(dets.shape[0])) + for box_score in dets: + xmin, ymin, xmax, ymax, score = box_score + width, height = xmax - xmin, ymax - ymin + f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n' + .format(xmin, ymin, 
width, height, score)) + logger.info("The predicted result is saved as {}".format(predict_file)) + return predict_file + + +def lmk2out(results, is_bbox_normalized=False): + """ + Args: + results: request a dict, should include: `landmark`, `im_id`, + if is_bbox_normalized=True, also need `im_shape`. + is_bbox_normalized: whether or not landmark is normalized. + """ + xywh_res = [] + for t in results: + bboxes = t['bbox'][0] + lengths = t['bbox'][1][0] + im_ids = np.array(t['im_id'][0]).flatten() + if bboxes.shape == (1, 1) or bboxes is None: + continue + face_index = t['face_index'][0] + prior_box = t['prior_boxes'][0] + predict_lmk = t['landmark'][0] + prior = np.reshape(prior_box, (-1, 4)) + predictlmk = np.reshape(predict_lmk, (-1, 10)) + + k = 0 + for a in range(len(lengths)): + num = lengths[a] + im_id = int(im_ids[a]) + for i in range(num): + score = bboxes[k][1] + theindex = face_index[i][0] + me_prior = prior[theindex, :] + lmk_pred = predictlmk[theindex, :] + prior_w = me_prior[2] - me_prior[0] + prior_h = me_prior[3] - me_prior[1] + prior_w_center = (me_prior[2] + me_prior[0]) / 2 + prior_h_center = (me_prior[3] + me_prior[1]) / 2 + lmk_decode = np.zeros((10)) + for j in [0, 2, 4, 6, 8]: + lmk_decode[j] = lmk_pred[j] * 0.1 * prior_w + prior_w_center + for j in [1, 3, 5, 7, 9]: + lmk_decode[j] = lmk_pred[j] * 0.1 * prior_h + prior_h_center + im_shape = t['im_shape'][0][a].tolist() + image_h, image_w = int(im_shape[0]), int(im_shape[1]) + if is_bbox_normalized: + lmk_decode = lmk_decode * np.array([ + image_w, image_h, image_w, image_h, image_w, image_h, + image_w, image_h, image_w, image_h + ]) + lmk_res = { + 'image_id': im_id, + 'landmark': lmk_decode, + 'score': score, + } + xywh_res.append(lmk_res) + k += 1 + return xywh_res diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/.gitignore b/PaddleDetection-release-2.6/ppdet/model_zoo/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..f296851d6dae0aa69eb0954ad59c095850b135ba --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/model_zoo/.gitignore @@ -0,0 +1 @@ +MODEL_ZOO diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/__init__.py b/PaddleDetection-release-2.6/ppdet/model_zoo/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6db6eb6c6da542405cd3c61ee991b04530c7b3a9 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/model_zoo/__init__.py @@ -0,0 +1,18 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . 
import model_zoo +from .model_zoo import * + +__all__ = model_zoo.__all__ diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/model_zoo/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..27de16af56544e6d8453657c688c3542f02edb30 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/model_zoo/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/__pycache__/model_zoo.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/model_zoo/__pycache__/model_zoo.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..16cbe3cb28dd7621d1d5a60d0426a717d54ad25d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/model_zoo/__pycache__/model_zoo.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/model_zoo.py b/PaddleDetection-release-2.6/ppdet/model_zoo/model_zoo.py new file mode 100644 index 0000000000000000000000000000000000000000..27581ef793dee60e0661f3b2fb69d9b4421ec1a5 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/model_zoo/model_zoo.py @@ -0,0 +1,84 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
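For context, the model_zoo module defined below boils down to a three-call API: list the packaged MODEL_ZOO index, resolve a config file, and build an (optionally pretrained) model. A hedged usage sketch; the model name is one example entry, and availability depends on the packaged index and network access to the weights:

```python
# Hedged usage sketch of the ppdet.model_zoo helpers defined below.
import ppdet

# log every YOLO-family entry from the packaged MODEL_ZOO index
ppdet.model_zoo.list_model(['yolo'])

# resolve the config and build the architecture with pretrained weights
cfg = ppdet.model_zoo.get_config_file('yolov3/yolov3_darknet53_270e_coco')
model = ppdet.model_zoo.get_model('yolov3/yolov3_darknet53_270e_coco',
                                  pretrained=True)  # a paddle.nn.Layer
```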
+ +import os.path as osp +import pkg_resources + +try: + from collections.abc import Sequence +except: + from collections import Sequence + +from ppdet.core.workspace import load_config, create +from ppdet.utils.checkpoint import load_weight +from ppdet.utils.download import get_config_path + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = [ + 'list_model', 'get_config_file', 'get_weights_url', 'get_model', + 'MODEL_ZOO_FILENAME' +] + +MODEL_ZOO_FILENAME = 'MODEL_ZOO' + + +def list_model(filters=[]): + model_zoo_file = pkg_resources.resource_filename('ppdet.model_zoo', + MODEL_ZOO_FILENAME) + with open(model_zoo_file) as f: + model_names = f.read().splitlines() + + # filter model_name + def filt(name): + for f in filters: + if name.find(f) < 0: + return False + return True + + if isinstance(filters, str) or not isinstance(filters, Sequence): + filters = [filters] + model_names = [name for name in model_names if filt(name)] + if len(model_names) == 0 and len(filters) > 0: + raise ValueError("no model found, please check filters seeting, " + "filters can be set as following kinds:\n" + "\tDataset: coco, voc ...\n" + "\tArchitecture: yolo, rcnn, ssd ...\n" + "\tBackbone: resnet, vgg, darknet ...\n") + + model_str = "Available Models:\n" + for model_name in model_names: + model_str += "\t{}\n".format(model_name) + logger.info(model_str) + + +# models and configs save on bcebos under dygraph directory +def get_config_file(model_name): + return get_config_path("ppdet://configs/{}.yml".format(model_name)) + + +def get_weights_url(model_name): + return "ppdet://models/{}.pdparams".format(osp.split(model_name)[-1]) + + +def get_model(model_name, pretrained=True): + cfg_file = get_config_file(model_name) + cfg = load_config(cfg_file) + model = create(cfg.architecture) + + if pretrained: + load_weight(model, get_weights_url(model_name)) + + return model diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/tests/__init__.py b/PaddleDetection-release-2.6/ppdet/model_zoo/tests/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6f0ea85344b7e0c679730356928c8749cf71cd66 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/model_zoo/tests/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/tests/test_get_model.py b/PaddleDetection-release-2.6/ppdet/model_zoo/tests/test_get_model.py new file mode 100644 index 0000000000000000000000000000000000000000..8887185e0ca2f6c8edc020be2b92b47c9933d604 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/model_zoo/tests/test_get_model.py @@ -0,0 +1,48 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import paddle +import ppdet +import unittest + +# NOTE: weights downloading costs time, we choose +# a small model for unittesting +MODEL_NAME = 'ppyolo/ppyolo_tiny_650e_coco' + + +class TestGetConfigFile(unittest.TestCase): + def test_main(self): + try: + cfg_file = ppdet.model_zoo.get_config_file(MODEL_NAME) + assert os.path.isfile(cfg_file) + except: + self.assertTrue(False) + + +class TestGetModel(unittest.TestCase): + def test_main(self): + try: + model = ppdet.model_zoo.get_model(MODEL_NAME) + assert isinstance(model, paddle.nn.Layer) + except: + self.assertTrue(False) + + +if __name__ == '__main__': + unittest.main() diff --git a/PaddleDetection-release-2.6/ppdet/model_zoo/tests/test_list_model.py b/PaddleDetection-release-2.6/ppdet/model_zoo/tests/test_list_model.py new file mode 100644 index 0000000000000000000000000000000000000000..8f91afe0058ff32fae7e1006bb8b4c4de9500fef --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/model_zoo/tests/test_list_model.py @@ -0,0 +1,68 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import unittest +import ppdet + + +class TestListModel(unittest.TestCase): + def setUp(self): + self._filter = [] + + def test_main(self): + try: + ppdet.model_zoo.list_model(self._filter) + self.assertTrue(True) + except: + self.assertTrue(False) + + +class TestListModelYOLO(TestListModel): + def setUp(self): + self._filter = ['yolo'] + + +class TestListModelRCNN(TestListModel): + def setUp(self): + self._filter = ['rcnn'] + + +class TestListModelSSD(TestListModel): + def setUp(self): + self._filter = ['ssd'] + + +class TestListModelMultiFilter(TestListModel): + def setUp(self): + self._filter = ['yolo', 'darknet'] + + +class TestListModelError(unittest.TestCase): + def setUp(self): + self._filter = ['xxx'] + + def test_main(self): + try: + ppdet.model_zoo.list_model(self._filter) + self.assertTrue(False) + except ValueError: + self.assertTrue(True) + + +if __name__ == '__main__': + unittest.main() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..fc7caf4403318f0ff37fecc1a4a032c468009fb0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/__init__.py @@ -0,0 +1,49 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import warnings +warnings.filterwarnings( + action='ignore', category=DeprecationWarning, module='ops') + +from . import ops +from . import backbones +from . import necks +from . import proposal_generator +from . import heads +from . import losses +from . import architectures +from . import post_process +from . import layers +from . import reid +from . import mot +from . import transformers +from . import assigners +from . import rbox_utils +from . 
import ssod + +from .ops import * +from .backbones import * +from .necks import * +from .proposal_generator import * +from .heads import * +from .losses import * +from .architectures import * +from .post_process import * +from .layers import * +from .reid import * +from .mot import * +from .transformers import * +from .assigners import * +from .rbox_utils import * +from .ssod import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..97b32a6db8dd18e0179c700f730d23930a899d25 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/bbox_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/bbox_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..33528e544bd63009bf131d18191a98fc744007a9 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/bbox_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/cls_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/cls_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b8575141190ce00c2cdbbbb898536408028c42ef Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/cls_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/initializer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/initializer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e72088e7cdac2b90ac93ec33dbf069b54fc82b0a Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/initializer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/keypoint_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/keypoint_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..952ae2c0c72118354b62be03c2c668f97c6c06e8 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/keypoint_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/layers.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/layers.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3c8f8ab2918a47efa76f922ae3157bb75c753e93 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/layers.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/ops.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/ops.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c8513d51e53ddc5407847f2cc3f7ea3cf2dd6b4b Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/ops.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/post_process.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/__pycache__/post_process.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ee54c658155b410a77d0b270b526e10385e58468 Binary files /dev/null and 
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..8899e5c0b4cb6957810dcbce20c35f55f0dcbdf2
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/__init__.py
@@ -0,0 +1,75 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import meta_arch
+from . import faster_rcnn
+from . import mask_rcnn
+from . import yolo
+from . import ppyoloe
+from . import cascade_rcnn
+from . import ssd
+from . import fcos
+from . import solov2
+from . import ttfnet
+from . import s2anet
+from . import keypoint_hrhrnet
+from . import keypoint_hrnet
+from . import jde
+from . import deepsort
+from . import fairmot
+from . import centernet
+from . import gfl
+from . import picodet
+from . import detr
+from . import sparse_rcnn
+from . import tood
+from . import retinanet
+from . import bytetrack
+from . import yolox
+from . import yolof
+from . import pose3d_metro
+from . import centertrack
+from . import queryinst
+
+from .meta_arch import *
+from .faster_rcnn import *
+from .mask_rcnn import *
+from .yolo import *
+from .ppyoloe import *
+from .cascade_rcnn import *
+from .ssd import *
+from .fcos import *
+from .solov2 import *
+from .ttfnet import *
+from .s2anet import *
+from .keypoint_hrhrnet import *
+from .keypoint_hrnet import *
+from .jde import *
+from .deepsort import *
+from .fairmot import *
+from .centernet import *
+from .blazeface import *
+from .gfl import *
+from .picodet import *
+from .detr import *
+from .sparse_rcnn import *
+from .tood import *
+from .retinanet import *
+from .bytetrack import *
+from .yolox import *
+from .yolof import *
+from .pose3d_metro import *
+from .centertrack import *
+from .queryinst import *
+from .keypoint_petr import *
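# --- Editor's note (not part of the patch): the star imports above exist for
# their side effect — importing each module runs the @register decorator on
# its classes, so a YAML key such as `architecture: FasterRCNN` can be
# resolved by name at runtime. A hedged, self-contained sketch of that
# pattern (toy registry, not the real ppdet.core.workspace implementation):
_REGISTRY = {}

def register(cls):
    """Record a class under its own name, as ppdet's decorator does."""
    _REGISTRY[cls.__name__] = cls
    return cls

def create(name, **kwargs):
    """Instantiate a registered class by its config-file name."""
    return _REGISTRY[name](**kwargs)

@register
class ToyArch:                     # hypothetical class, for illustration only
    def __init__(self, depth=50):
        self.depth = depth

model = create('ToyArch', depth=18)   # mirrors create(cfg['backbone']) below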
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/blazeface.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/blazeface.py
new file mode 100644
index 0000000000000000000000000000000000000000..477732d95eec16448b42b062918d099344a81a10
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/blazeface.py
@@ -0,0 +1,117 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
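# --- Editor's note (not part of the patch): BlazeFace below follows the
# composition idiom used by every architecture in this patch — from_config()
# chains components by feeding one component's out_shape into the next one's
# input_shape. A hedged sketch of the wiring with toy stand-ins (these
# classes are illustrative, not the real ppdet modules):
class ToyBackbone:
    out_shape = ['C3', 'C4', 'C5']        # placeholder feature shape specs

class ToyNeck:
    def __init__(self, input_shape):
        self.out_shape = input_shape       # pass shapes through unchanged

class ToyHead:
    def __init__(self, input_shape):
        self.input_shape = input_shape

backbone = ToyBackbone()
neck = ToyNeck(input_shape=backbone.out_shape)
head = ToyHead(input_shape=neck.out_shape)   # same wiring as from_config()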
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from ppdet.core.workspace import register, create
+from .meta_arch import BaseArch
+import paddle
+import paddle.nn.functional as F
+
+__all__ = ['BlazeFace']
+
+
+@register
+class BlazeFace(BaseArch):
+    """
+    BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs,
+               see https://arxiv.org/abs/1907.05047
+
+    Args:
+        backbone (nn.Layer): backbone instance
+        neck (nn.Layer): neck instance
+        blaze_head (nn.Layer): `BlazeHead` instance
+        post_process (object): `BBoxPostProcess` instance
+    """
+
+    __category__ = 'architecture'
+    __inject__ = ['post_process']
+
+    def __init__(self, backbone, blaze_head, neck, post_process):
+        super(BlazeFace, self).__init__()
+        self.backbone = backbone
+        self.neck = neck
+        self.blaze_head = blaze_head
+        self.post_process = post_process
+
+    @classmethod
+    def from_config(cls, cfg, *args, **kwargs):
+        # backbone
+        backbone = create(cfg['backbone'])
+        # fpn
+        kwargs = {'input_shape': backbone.out_shape}
+        neck = create(cfg['neck'], **kwargs)
+        # head
+        kwargs = {'input_shape': neck.out_shape}
+        blaze_head = create(cfg['blaze_head'], **kwargs)
+
+        return {
+            'backbone': backbone,
+            'neck': neck,
+            'blaze_head': blaze_head,
+        }
+
+    def _forward(self):
+        # Backbone
+        body_feats = self.backbone(self.inputs)
+        # neck
+        neck_feats = self.neck(body_feats)
+        # blaze Head
+        if self.training:
+            return self.blaze_head(neck_feats, self.inputs['image'],
+                                   self.inputs['gt_bbox'],
+                                   self.inputs['gt_class'])
+        else:
+            preds, anchors = self.blaze_head(neck_feats, self.inputs['image'])
+            bbox, bbox_num, nms_keep_idx = self.post_process(
+                preds, anchors, self.inputs['im_shape'],
+                self.inputs['scale_factor'])
+            if self.use_extra_data:
+                extra_data = {}  # record the bbox output before nms, such as scores and nms_keep_idx
+                """extra_data:{
+                            'scores': predict scores,
+                            'nms_keep_idx': bbox index before nms,
+                           }
+                """
+                preds_logits = preds[1]  # list of [1, num_bbox, num_class]
+                extra_data['scores'] = F.softmax(paddle.concat(
+                    preds_logits, axis=1)).transpose([0, 2, 1])
+                extra_data['logits'] = paddle.concat(
+                    preds_logits, axis=1).transpose([0, 2, 1])
+                extra_data['nms_keep_idx'] = nms_keep_idx  # bbox index before nms
+                return bbox, bbox_num, extra_data
+            else:
+                return bbox, bbox_num
+
+    def get_loss(self, ):
+        return {"loss": self._forward()}
+
+    def get_pred(self):
+        if self.use_extra_data:
+            bbox_pred, bbox_num, extra_data = self._forward()
+            output = {
+                "bbox": bbox_pred,
+                "bbox_num": bbox_num,
+                "extra_data": extra_data
+            }
+        else:
+            bbox_pred, bbox_num = self._forward()
+            output = {
+                "bbox": bbox_pred,
+                "bbox_num": bbox_num,
+            }
+
+        return output
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/bytetrack.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/bytetrack.py
new file mode 100644
index 0000000000000000000000000000000000000000..1f3d0d173a1369d9044348a7a97af793fedd745e
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/bytetrack.py
@@ -0,0 +1,83 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['ByteTrack'] + + +@register +class ByteTrack(BaseArch): + """ + ByteTrack network, see https://arxiv.org/abs/2110.06864 + + Args: + detector (object): detector model instance + reid (object): reid model instance, default None + tracker (object): tracker instance + """ + __category__ = 'architecture' + + def __init__(self, + detector='YOLOX', + reid=None, + tracker='JDETracker'): + super(ByteTrack, self).__init__() + self.detector = detector + self.reid = reid + self.tracker = tracker + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + detector = create(cfg['detector']) + + if cfg['reid'] != 'None': + reid = create(cfg['reid']) + else: + reid = None + + tracker = create(cfg['tracker']) + + return { + "detector": detector, + "reid": reid, + "tracker": tracker, + } + + def _forward(self): + det_outs = self.detector(self.inputs) + + if self.training: + return det_outs + else: + if self.reid is not None: + assert 'crops' in self.inputs + crops = self.inputs['crops'] + pred_embs = self.reid(crops) + else: + pred_embs = None + det_outs['embeddings'] = pred_embs + return det_outs + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() + diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/cascade_rcnn.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/cascade_rcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..c5d454f4948891ed1400d09d0e24490ce46fb361 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/cascade_rcnn.py @@ -0,0 +1,143 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
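# --- Editor's note (not part of the patch): Cascade R-CNN below runs a
# sequence of box heads, each re-sampling proposals at a higher IoU threshold
# and refining the survivors; the staging itself lives in the injected bbox
# head, not in this file. A hedged toy sketch of the idea:
def toy_cascade(proposals, stages=(0.5, 0.6, 0.7)):
    """Keep only proposals that clear each stage's IoU bar (toy version)."""
    for iou_thresh in stages:
        # a real stage would also regress each surviving box here
        proposals = [p for p in proposals if p['iou'] >= iou_thresh]
    return proposals

boxes = [{'iou': 0.55}, {'iou': 0.65}, {'iou': 0.80}]
print(toy_cascade(boxes))   # -> [{'iou': 0.8}]; only one box survives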
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['CascadeRCNN'] + + +@register +class CascadeRCNN(BaseArch): + """ + Cascade R-CNN network, see https://arxiv.org/abs/1712.00726 + + Args: + backbone (object): backbone instance + rpn_head (object): `RPNHead` instance + bbox_head (object): `BBoxHead` instance + bbox_post_process (object): `BBoxPostProcess` instance + neck (object): 'FPN' instance + mask_head (object): `MaskHead` instance + mask_post_process (object): `MaskPostProcess` instance + """ + __category__ = 'architecture' + __inject__ = [ + 'bbox_post_process', + 'mask_post_process', + ] + + def __init__(self, + backbone, + rpn_head, + bbox_head, + bbox_post_process, + neck=None, + mask_head=None, + mask_post_process=None): + super(CascadeRCNN, self).__init__() + self.backbone = backbone + self.rpn_head = rpn_head + self.bbox_head = bbox_head + self.bbox_post_process = bbox_post_process + self.neck = neck + self.mask_head = mask_head + self.mask_post_process = mask_post_process + self.with_mask = mask_head is not None + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + kwargs = {'input_shape': backbone.out_shape} + neck = cfg['neck'] and create(cfg['neck'], **kwargs) + + out_shape = neck and neck.out_shape or backbone.out_shape + kwargs = {'input_shape': out_shape} + rpn_head = create(cfg['rpn_head'], **kwargs) + bbox_head = create(cfg['bbox_head'], **kwargs) + + out_shape = neck and out_shape or bbox_head.get_head().out_shape + kwargs = {'input_shape': out_shape} + mask_head = cfg['mask_head'] and create(cfg['mask_head'], **kwargs) + return { + 'backbone': backbone, + 'neck': neck, + "rpn_head": rpn_head, + "bbox_head": bbox_head, + "mask_head": mask_head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + if self.neck is not None: + body_feats = self.neck(body_feats) + + if self.training: + rois, rois_num, rpn_loss = self.rpn_head(body_feats, self.inputs) + bbox_loss, bbox_feat = self.bbox_head(body_feats, rois, rois_num, + self.inputs) + rois, rois_num = self.bbox_head.get_assigned_rois() + bbox_targets = self.bbox_head.get_assigned_targets() + if self.with_mask: + mask_loss = self.mask_head(body_feats, rois, rois_num, + self.inputs, bbox_targets, bbox_feat) + return rpn_loss, bbox_loss, mask_loss + else: + return rpn_loss, bbox_loss, {} + else: + rois, rois_num, _ = self.rpn_head(body_feats, self.inputs) + preds, _ = self.bbox_head(body_feats, rois, rois_num, self.inputs) + refined_rois = self.bbox_head.get_refined_rois() + + im_shape = self.inputs['im_shape'] + scale_factor = self.inputs['scale_factor'] + + bbox, bbox_num, nms_keep_idx = self.bbox_post_process( + preds, (refined_rois, rois_num), im_shape, scale_factor) + # rescale the prediction back to origin image + bbox, bbox_pred, bbox_num = self.bbox_post_process.get_pred( + bbox, bbox_num, im_shape, scale_factor) + if not self.with_mask: + return bbox_pred, bbox_num, None + mask_out = self.mask_head(body_feats, bbox, bbox_num, self.inputs) + origin_shape = self.bbox_post_process.get_origin_shape() + mask_pred = self.mask_post_process(mask_out, bbox_pred, bbox_num, + origin_shape) + return bbox_pred, bbox_num, mask_pred + + def get_loss(self, ): + rpn_loss, bbox_loss, mask_loss = self._forward() + loss = {} + loss.update(rpn_loss) + loss.update(bbox_loss) + if self.with_mask: + 
loss.update(mask_loss) + total_loss = paddle.add_n(list(loss.values())) + loss.update({'loss': total_loss}) + return loss + + def get_pred(self): + bbox_pred, bbox_num, mask_pred = self._forward() + output = { + 'bbox': bbox_pred, + 'bbox_num': bbox_num, + } + if self.with_mask: + output.update({'mask': mask_pred}) + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/centernet.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/centernet.py new file mode 100644 index 0000000000000000000000000000000000000000..439e5f872668e5bcd5445adf5a6e41b320679c59 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/centernet.py @@ -0,0 +1,103 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['CenterNet'] + + +@register +class CenterNet(BaseArch): + """ + CenterNet network, see http://arxiv.org/abs/1904.07850 + + Args: + backbone (object): backbone instance + neck (object): FPN instance, default use 'CenterNetDLAFPN' + head (object): 'CenterNetHead' instance + post_process (object): 'CenterNetPostProcess' instance + for_mot (bool): whether return other features used in tracking model + + """ + __category__ = 'architecture' + __inject__ = ['post_process'] + __shared__ = ['for_mot'] + + def __init__(self, + backbone, + neck='CenterNetDLAFPN', + head='CenterNetHead', + post_process='CenterNetPostProcess', + for_mot=False): + super(CenterNet, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + self.post_process = post_process + self.for_mot = for_mot + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = cfg['neck'] and create(cfg['neck'], **kwargs) + + out_shape = neck and neck.out_shape or backbone.out_shape + kwargs = {'input_shape': out_shape} + head = create(cfg['head'], **kwargs) + + return {'backbone': backbone, 'neck': neck, "head": head} + + def _forward(self): + neck_feat = self.backbone(self.inputs) + if self.neck is not None: + neck_feat = self.neck(neck_feat) + head_out = self.head(neck_feat, self.inputs) + if self.for_mot: + head_out.update({'neck_feat': neck_feat}) + elif self.training: + head_out['loss'] = head_out.pop('det_loss') + return head_out + + def get_pred(self): + head_out = self._forward() + bbox, bbox_num, bbox_inds, topk_clses, topk_ys, topk_xs = self.post_process( + head_out['heatmap'], + head_out['size'], + head_out['offset'], + im_shape=self.inputs['im_shape'], + scale_factor=self.inputs['scale_factor']) + + if self.for_mot: + output = { + "bbox": bbox, + "bbox_num": bbox_num, + "bbox_inds": bbox_inds, + "topk_clses": topk_clses, + "topk_ys": topk_ys, + "topk_xs": topk_xs, + "neck_feat": head_out['neck_feat'] 
+ } + else: + output = {"bbox": bbox, "bbox_num": bbox_num} + return output + + def get_loss(self): + return self._forward() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/centertrack.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/centertrack.py new file mode 100644 index 0000000000000000000000000000000000000000..b9880dbbb21435f1fc84c4c3203e5d818143e776 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/centertrack.py @@ -0,0 +1,176 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import copy +import math +import numpy as np +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +from ..keypoint_utils import affine_transform +from ppdet.data.transform.op_helper import gaussian_radius, gaussian2D, draw_umich_gaussian + +__all__ = ['CenterTrack'] + + +@register +class CenterTrack(BaseArch): + """ + CenterTrack network, see http://arxiv.org/abs/2004.01177 + + Args: + detector (object): 'CenterNet' instance + plugin_head (object): 'CenterTrackHead' instance + tracker (object): 'CenterTracker' instance + """ + __category__ = 'architecture' + __shared__ = ['mot_metric'] + + def __init__(self, + detector='CenterNet', + plugin_head='CenterTrackHead', + tracker='CenterTracker', + mot_metric=False): + super(CenterTrack, self).__init__() + self.detector = detector + self.plugin_head = plugin_head + self.tracker = tracker + self.mot_metric = mot_metric + self.pre_image = None + self.deploy = False + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + detector = create(cfg['detector']) + detector_out_shape = detector.neck and detector.neck.out_shape or detector.backbone.out_shape + + kwargs = {'input_shape': detector_out_shape} + plugin_head = create(cfg['plugin_head'], **kwargs) + tracker = create(cfg['tracker']) + + return { + 'detector': detector, + 'plugin_head': plugin_head, + 'tracker': tracker, + } + + def _forward(self): + if self.training: + det_outs = self.detector(self.inputs) + neck_feat = det_outs['neck_feat'] + + losses = {} + for k, v in det_outs.items(): + if 'loss' not in k: continue + losses.update({k: v}) + + plugin_outs = self.plugin_head(neck_feat, self.inputs) + for k, v in plugin_outs.items(): + if 'loss' not in k: continue + losses.update({k: v}) + + losses['loss'] = det_outs['det_loss'] + plugin_outs['plugin_loss'] + return losses + + else: + if not self.mot_metric: + # detection, support bs>=1 + det_outs = self.detector(self.inputs) + return { + 'bbox': det_outs['bbox'], + 'bbox_num': det_outs['bbox_num'] + } + + else: + # MOT, only support bs=1 + if not self.deploy: + if self.pre_image is None: + self.pre_image = self.inputs['image'] + # initializing tracker for the first frame + self.tracker.init_track([]) + self.inputs['pre_image'] = self.pre_image + self.pre_image = 
self.inputs[ + 'image'] # Note: update for next image + + # render input heatmap from tracker status + pre_hm = self.get_additional_inputs( + self.tracker.tracks, self.inputs, with_hm=True) + self.inputs['pre_hm'] = paddle.to_tensor(pre_hm) + + # model inference + det_outs = self.detector(self.inputs) + neck_feat = det_outs['neck_feat'] + result = self.plugin_head( + neck_feat, self.inputs, det_outs['bbox'], + det_outs['bbox_inds'], det_outs['topk_clses'], + det_outs['topk_ys'], det_outs['topk_xs']) + + if not self.deploy: + # convert the cropped and 4x downsampled output coordinate system + # back to the input image coordinate system + result = self.plugin_head.centertrack_post_process( + result, self.inputs, self.tracker.out_thresh) + return result + + def get_pred(self): + return self._forward() + + def get_loss(self): + return self._forward() + + def reset_tracking(self): + self.tracker.reset() + self.pre_image = None + + def get_additional_inputs(self, dets, meta, with_hm=True): + # Render input heatmap from previous trackings. + trans_input = meta['trans_input'][0].numpy() + inp_width, inp_height = int(meta['inp_width'][0]), int(meta[ + 'inp_height'][0]) + input_hm = np.zeros((1, inp_height, inp_width), dtype=np.float32) + + for det in dets: + if det['score'] < self.tracker.pre_thresh: + continue + bbox = affine_transform_bbox(det['bbox'], trans_input, inp_width, + inp_height) + h, w = bbox[3] - bbox[1], bbox[2] - bbox[0] + if (h > 0 and w > 0): + radius = gaussian_radius( + (math.ceil(h), math.ceil(w)), min_overlap=0.7) + radius = max(0, int(radius)) + ct = np.array( + [(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], + dtype=np.float32) + ct_int = ct.astype(np.int32) + if with_hm: + input_hm[0] = draw_umich_gaussian(input_hm[0], ct_int, + radius) + if with_hm: + input_hm = input_hm[np.newaxis] + return input_hm + + +def affine_transform_bbox(bbox, trans, width, height): + bbox = np.array(copy.deepcopy(bbox), dtype=np.float32) + bbox[:2] = affine_transform(bbox[:2], trans) + bbox[2:] = affine_transform(bbox[2:], trans) + bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, width - 1) + bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, height - 1) + return bbox diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/deepsort.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/deepsort.py new file mode 100644 index 0000000000000000000000000000000000000000..164c279760311d1c5b5a6c3635cff4cc8cc4aacf --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/deepsort.py @@ -0,0 +1,70 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
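# --- Editor's note (not part of the patch): DeepSORT below only runs the
# ReID branch — detections arrive precomputed as cropped patches in
# inputs['crops'], and the embeddings drive the tracker's appearance
# matching. A hedged sketch of the distance computation such a tracker
# typically uses (toy data; the real tracker combines this with motion
# gating and Hungarian assignment):
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two sets of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T                # rows: tracks, cols: detections

tracks = np.random.rand(3, 128)         # 3 existing track embeddings
dets = np.random.rand(5, 128)           # 5 new detection embeddings
nearest = cosine_distance(tracks, dets).argmin(axis=1)  # per-track best det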
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch +from ppdet.modeling.mot.utils import Detection, get_crops, scale_coords, clip_box + +__all__ = ['DeepSORT'] + + +@register +class DeepSORT(BaseArch): + """ + DeepSORT network, see https://arxiv.org/abs/1703.07402 + + Args: + detector (object): detector model instance + reid (object): reid model instance + tracker (object): tracker instance + """ + __category__ = 'architecture' + + def __init__(self, + detector='YOLOv3', + reid='PCBPyramid', + tracker='DeepSORTTracker'): + super(DeepSORT, self).__init__() + self.detector = detector + self.reid = reid + self.tracker = tracker + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + if cfg['detector'] != 'None': + detector = create(cfg['detector']) + else: + detector = None + reid = create(cfg['reid']) + tracker = create(cfg['tracker']) + + return { + "detector": detector, + "reid": reid, + "tracker": tracker, + } + + def _forward(self): + crops = self.inputs['crops'] + outs = {} + outs['embeddings'] = self.reid(crops) + return outs + + def get_pred(self): + return self._forward() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/detr.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/detr.py new file mode 100644 index 0000000000000000000000000000000000000000..21379185b02a493b2ffc8f4174b4524bd7412556 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/detr.py @@ -0,0 +1,101 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
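# --- Editor's note (not part of the patch): DETR below returns a dict of
# per-component losses that get_loss() sums with paddle.add_n. Upstream of
# that, DETR-style training assigns its fixed set of queries to ground-truth
# boxes one-to-one. A hedged sketch of that bipartite matching step with
# SciPy's Hungarian solver (toy cost matrix):
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.random.rand(100, 7)           # 100 queries x 7 ground-truth boxes
query_idx, gt_idx = linear_sum_assignment(cost)
# every ground-truth box gets exactly one query; the remaining queries are
# supervised toward the "no object" class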
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from .meta_arch import BaseArch +from ppdet.core.workspace import register, create + +__all__ = ['DETR'] + + +@register +class DETR(BaseArch): + __category__ = 'architecture' + __inject__ = ['post_process'] + __shared__ = ['exclude_post_process'] + + def __init__(self, + backbone, + transformer, + detr_head, + post_process='DETRBBoxPostProcess', + exclude_post_process=False): + super(DETR, self).__init__() + self.backbone = backbone + self.transformer = transformer + self.detr_head = detr_head + self.post_process = post_process + self.exclude_post_process = exclude_post_process + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + # transformer + kwargs = {'input_shape': backbone.out_shape} + transformer = create(cfg['transformer'], **kwargs) + # head + kwargs = { + 'hidden_dim': transformer.hidden_dim, + 'nhead': transformer.nhead, + 'input_shape': backbone.out_shape + } + detr_head = create(cfg['detr_head'], **kwargs) + + return { + 'backbone': backbone, + 'transformer': transformer, + "detr_head": detr_head, + } + + def _forward(self): + # Backbone + body_feats = self.backbone(self.inputs) + + # Transformer + pad_mask = self.inputs['pad_mask'] if self.training else None + out_transformer = self.transformer(body_feats, pad_mask, self.inputs) + + # DETR Head + if self.training: + return self.detr_head(out_transformer, body_feats, self.inputs) + else: + preds = self.detr_head(out_transformer, body_feats) + if self.exclude_post_process: + bboxes, logits, masks = preds + return bboxes, logits + else: + bbox, bbox_num = self.post_process( + preds, self.inputs['im_shape'], self.inputs['scale_factor']) + return bbox, bbox_num + + def get_loss(self): + losses = self._forward() + losses.update({ + 'loss': + paddle.add_n([v for k, v in losses.items() if 'log' not in k]) + }) + return losses + + def get_pred(self): + bbox_pred, bbox_num = self._forward() + output = { + "bbox": bbox_pred, + "bbox_num": bbox_num, + } + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/fairmot.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/fairmot.py new file mode 100644 index 0000000000000000000000000000000000000000..2714508397cd6628298bd5097a486827078b16f4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/fairmot.py @@ -0,0 +1,100 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
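# --- Editor's note (not part of the patch): FairMOT below fuses a CenterNet
# detection loss with a ReID embedding loss through the injected
# 'FairMOTLoss'. In the paper that fusion is uncertainty weighting; a hedged
# sketch of the formula (plain floats here, the real loss learns w1/w2 as
# parameters):
import math

def fairmot_loss(det_loss, reid_loss, w1=0.0, w2=0.0):
    # L = 0.5 * (exp(-w1) * L_det + exp(-w2) * L_reid + w1 + w2)
    return 0.5 * (math.exp(-w1) * det_loss + math.exp(-w2) * reid_loss
                  + w1 + w2)

print(fairmot_loss(1.2, 0.8))   # -> 1.0 with both weights at zero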
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['FairMOT'] + + +@register +class FairMOT(BaseArch): + """ + FairMOT network, see http://arxiv.org/abs/2004.01888 + + Args: + detector (object): 'CenterNet' instance + reid (object): 'FairMOTEmbeddingHead' instance + tracker (object): 'JDETracker' instance + loss (object): 'FairMOTLoss' instance + + """ + + __category__ = 'architecture' + __inject__ = ['loss'] + + def __init__(self, + detector='CenterNet', + reid='FairMOTEmbeddingHead', + tracker='JDETracker', + loss='FairMOTLoss'): + super(FairMOT, self).__init__() + self.detector = detector + self.reid = reid + self.tracker = tracker + self.loss = loss + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + detector = create(cfg['detector']) + detector_out_shape = detector.neck and detector.neck.out_shape or detector.backbone.out_shape + + kwargs = {'input_shape': detector_out_shape} + reid = create(cfg['reid'], **kwargs) + loss = create(cfg['loss']) + tracker = create(cfg['tracker']) + + return { + 'detector': detector, + 'reid': reid, + 'loss': loss, + 'tracker': tracker + } + + def _forward(self): + loss = dict() + # det_outs keys: + # train: neck_feat, det_loss, heatmap_loss, size_loss, offset_loss (optional: iou_loss) + # eval/infer: neck_feat, bbox, bbox_inds + det_outs = self.detector(self.inputs) + neck_feat = det_outs['neck_feat'] + if self.training: + reid_loss = self.reid(neck_feat, self.inputs) + + det_loss = det_outs['det_loss'] + loss = self.loss(det_loss, reid_loss) + for k, v in det_outs.items(): + if 'loss' not in k: + continue + loss.update({k: v}) + loss.update({'reid_loss': reid_loss}) + return loss + else: + pred_dets, pred_embs = self.reid( + neck_feat, self.inputs, det_outs['bbox'], det_outs['bbox_inds'], + det_outs['topk_clses']) + return pred_dets, pred_embs + + def get_pred(self): + output = self._forward() + return output + + def get_loss(self): + loss = self._forward() + return loss diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/faster_rcnn.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/faster_rcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..41c286fe02ef9d25f5b086ed99931fdd2aa70062 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/faster_rcnn.py @@ -0,0 +1,162 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
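# --- Editor's note (not part of the patch): FasterRCNN below, like the
# other two-stage models in this patch, branches on self.training inside
# _forward(): training returns RPN and box-head loss dicts, eval returns
# post-processed boxes. A hedged toy sketch of that dispatch and of how
# get_loss() totals the dict, as paddle.add_n does:
class ToyTwoStage:
    training = True

    def _forward(self):
        if self.training:
            return {'loss_rpn': 0.25, 'loss_bbox': 0.75}
        return {'bbox': [], 'bbox_num': 0}

    def get_loss(self):
        loss = self._forward()
        loss['loss'] = sum(loss.values())   # plays the role of paddle.add_n
        return loss

print(ToyTwoStage().get_loss())   # {..., 'loss': 1.0}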
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+from ppdet.core.workspace import register, create
+from .meta_arch import BaseArch
+import numpy as np
+
+__all__ = ['FasterRCNN']
+
+
+@register
+class FasterRCNN(BaseArch):
+    """
+    Faster R-CNN network, see https://arxiv.org/abs/1506.01497
+
+    Args:
+        backbone (object): backbone instance
+        rpn_head (object): `RPNHead` instance
+        bbox_head (object): `BBoxHead` instance
+        bbox_post_process (object): `BBoxPostProcess` instance
+        neck (object): 'FPN' instance
+    """
+    __category__ = 'architecture'
+    __inject__ = ['bbox_post_process']
+
+    def __init__(self,
+                 backbone,
+                 rpn_head,
+                 bbox_head,
+                 bbox_post_process,
+                 neck=None):
+        super(FasterRCNN, self).__init__()
+        self.backbone = backbone
+        self.neck = neck
+        self.rpn_head = rpn_head
+        self.bbox_head = bbox_head
+        self.bbox_post_process = bbox_post_process
+
+    def init_cot_head(self, relationship):
+        self.bbox_head.init_cot_head(relationship)
+
+    @classmethod
+    def from_config(cls, cfg, *args, **kwargs):
+        backbone = create(cfg['backbone'])
+        kwargs = {'input_shape': backbone.out_shape}
+        neck = cfg['neck'] and create(cfg['neck'], **kwargs)
+
+        out_shape = neck and neck.out_shape or backbone.out_shape
+        kwargs = {'input_shape': out_shape}
+        rpn_head = create(cfg['rpn_head'], **kwargs)
+        bbox_head = create(cfg['bbox_head'], **kwargs)
+        return {
+            'backbone': backbone,
+            'neck': neck,
+            "rpn_head": rpn_head,
+            "bbox_head": bbox_head,
+        }
+
+    def _forward(self):
+        body_feats = self.backbone(self.inputs)
+        if self.neck is not None:
+            body_feats = self.neck(body_feats)
+        if self.training:
+            rois, rois_num, rpn_loss = self.rpn_head(body_feats, self.inputs)
+            bbox_loss, _ = self.bbox_head(body_feats, rois, rois_num,
+                                          self.inputs)
+            return rpn_loss, bbox_loss
+        else:
+            rois, rois_num, _ = self.rpn_head(body_feats, self.inputs)
+            preds, _ = self.bbox_head(body_feats, rois, rois_num, None)
+            im_shape = self.inputs['im_shape']
+            scale_factor = self.inputs['scale_factor']
+            bbox, bbox_num, nms_keep_idx = self.bbox_post_process(
+                preds, (rois, rois_num), im_shape, scale_factor)
+
+            # rescale the prediction back to origin image
+            bboxes, bbox_pred, bbox_num = self.bbox_post_process.get_pred(
+                bbox, bbox_num, im_shape, scale_factor)
+
+            if self.use_extra_data:
+                extra_data = {}  # record the bbox output before nms, such as scores and nms_keep_idx
+                """extra_data:{
+                            'scores': predict scores,
+                            'nms_keep_idx': bbox index before nms,
+                           }
+                """
+                extra_data['scores'] = preds[1]  # predict scores (probability)
+                # TODO: get logits output
+                extra_data['nms_keep_idx'] = nms_keep_idx  # bbox index before nms
+                return bbox_pred, bbox_num, extra_data
+            else:
+                return bbox_pred, bbox_num
+
+    def get_loss(self, ):
+        rpn_loss, bbox_loss = self._forward()
+        loss = {}
+        loss.update(rpn_loss)
+        loss.update(bbox_loss)
+        total_loss = paddle.add_n(list(loss.values()))
+        loss.update({'loss': total_loss})
+        return loss
+
+    def get_pred(self):
+        if self.use_extra_data:
+            bbox_pred, bbox_num, extra_data = self._forward()
+            output = {
+                'bbox': bbox_pred,
+                'bbox_num': bbox_num,
+                'extra_data': extra_data
+            }
+        else:
+            bbox_pred, bbox_num = self._forward()
+            output = {'bbox': bbox_pred, 'bbox_num': bbox_num}
+        return output
+
+    def target_bbox_forward(self, data):
+        body_feats = self.backbone(data)
+        if self.neck is not None:
+            body_feats = self.neck(body_feats)
+        rois = [roi for roi in data['gt_bbox']]
+        rois_num = paddle.concat([paddle.shape(roi)[0] for roi in rois])
+
+        preds, _ = self.bbox_head(body_feats, rois, rois_num, None, cot=True)
+        return preds
+
+    def relationship_learning(self, loader, num_classes_novel):
+        print('computing relationship')
+        train_labels_list = []
+        label_list = []
+
+        for step_id, data in enumerate(loader):
+            _, bbox_prob = self.target_bbox_forward(data)
+            batch_size = data['im_id'].shape[0]
+            for i in range(batch_size):
+                num_bbox = data['gt_class'][i].shape[0]
+                train_labels = data['gt_class'][i]
+                train_labels_list.append(train_labels.numpy().squeeze(1))
+            base_labels = bbox_prob.detach().numpy()[:, :-1]
+            label_list.append(base_labels)
+
+        labels = np.concatenate(train_labels_list, 0)
+        probabilities = np.concatenate(label_list, 0)
+        N_t = np.max(labels) + 1
+        conditional = []
+        for i in range(N_t):
+            this_class = probabilities[labels == i]
+            average = np.mean(this_class, axis=0, keepdims=True)
+            conditional.append(average)
+        return np.concatenate(conditional)
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/fcos.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/fcos.py
new file mode 100644
index 0000000000000000000000000000000000000000..efebb6efb8a20558540772f5c31994e15ff8d09c
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/fcos.py
@@ -0,0 +1,96 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
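# --- Editor's note (not part of the patch): FCOS below is anchor-free —
# every FPN location predicts distances (l, t, r, b) to the four box sides,
# decoded against the location's own coordinates. A hedged sketch of that
# decoding (toy numbers):
def decode_fcos(x, y, l, t, r, b):
    """Turn one location's distance predictions into an (x1, y1, x2, y2) box."""
    return (x - l, y - t, x + r, y + b)

print(decode_fcos(100, 80, 30, 20, 50, 40))   # -> (70, 60, 150, 120)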
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['FCOS'] + + +@register +class FCOS(BaseArch): + """ + FCOS network, see https://arxiv.org/abs/1904.01355 + + Args: + backbone (object): backbone instance + neck (object): 'FPN' instance + fcos_head (object): 'FCOSHead' instance + ssod_loss (object): 'SSODFCOSLoss' instance, only used for semi-det(ssod) + """ + + __category__ = 'architecture' + __inject__ = ['ssod_loss'] + + def __init__(self, + backbone='ResNet', + neck='FPN', + fcos_head='FCOSHead', + ssod_loss='SSODFCOSLoss'): + super(FCOS, self).__init__() + self.backbone = backbone + self.neck = neck + self.fcos_head = fcos_head + + # for ssod, semi-det + self.is_teacher = False + self.ssod_loss = ssod_loss + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + fcos_head = create(cfg['fcos_head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "fcos_head": fcos_head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + fpn_feats = self.neck(body_feats) + + self.is_teacher = self.inputs.get('is_teacher', False) + if self.training or self.is_teacher: + losses = self.fcos_head(fpn_feats, self.inputs) + return losses + else: + fcos_head_outs = self.fcos_head(fpn_feats) + bbox_pred, bbox_num = self.fcos_head.post_process( + fcos_head_outs, self.inputs['scale_factor']) + return {'bbox': bbox_pred, 'bbox_num': bbox_num} + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() + + def get_loss_keys(self): + return ['loss_cls', 'loss_box', 'loss_quality'] + + def get_ssod_loss(self, student_head_outs, teacher_head_outs, train_cfg): + ssod_losses = self.ssod_loss(student_head_outs, teacher_head_outs, + train_cfg) + return ssod_losses diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/gfl.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/gfl.py new file mode 100644 index 0000000000000000000000000000000000000000..91c13077f90b93fdd40686c579c7cd5eac2c9296 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/gfl.py @@ -0,0 +1,87 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
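# --- Editor's note (not part of the patch): GFL below pairs classification
# with a distributional box representation — each box side is predicted as a
# discrete distribution over bins, and the regressed offset is its
# expectation. A hedged sketch of that step (toy logits, toy bin count):
import numpy as np

def distribution_to_offset(logits):
    """Softmax over bins, then the expected bin index as a continuous offset."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float((probs * np.arange(len(logits))).sum())

print(distribution_to_offset(np.zeros(8)))   # uniform over 8 bins -> 3.5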
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['GFL'] + + +@register +class GFL(BaseArch): + """ + Generalized Focal Loss network, see https://arxiv.org/abs/2006.04388 + + Args: + backbone (object): backbone instance + neck (object): 'FPN' instance + head (object): 'GFLHead' instance + """ + + __category__ = 'architecture' + + def __init__(self, backbone, neck, head='GFLHead'): + super(GFL, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "head": head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + fpn_feats = self.neck(body_feats) + head_outs = self.head(fpn_feats) + if not self.training: + im_shape = self.inputs['im_shape'] + scale_factor = self.inputs['scale_factor'] + bboxes, bbox_num = self.head.post_process(head_outs, im_shape, + scale_factor) + return bboxes, bbox_num + else: + return head_outs + + def get_loss(self, ): + loss = {} + + head_outs = self._forward() + loss_gfl = self.head.get_loss(head_outs, self.inputs) + loss.update(loss_gfl) + total_loss = paddle.add_n(list(loss.values())) + loss.update({'loss': total_loss}) + return loss + + def get_pred(self): + bbox_pred, bbox_num = self._forward() + output = {'bbox': bbox_pred, 'bbox_num': bbox_num} + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/jde.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/jde.py new file mode 100644 index 0000000000000000000000000000000000000000..11b45c8c1614ee06807e4bdca0d924b2bf7c3dc8 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/jde.py @@ -0,0 +1,110 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['JDE'] + + +@register +class JDE(BaseArch): + __category__ = 'architecture' + __shared__ = ['metric'] + """ + JDE network, see https://arxiv.org/abs/1909.12605v1 + + Args: + detector (object): detector model instance + reid (object): reid model instance + tracker (object): tracker instance + metric (str): 'MOTDet' for training and detection evaluation, 'ReID' + for ReID embedding evaluation, or 'MOT' for multi object tracking + evaluation. 
+ """ + + def __init__(self, + detector='YOLOv3', + reid='JDEEmbeddingHead', + tracker='JDETracker', + metric='MOT'): + super(JDE, self).__init__() + self.detector = detector + self.reid = reid + self.tracker = tracker + self.metric = metric + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + detector = create(cfg['detector']) + kwargs = {'input_shape': detector.neck.out_shape} + + reid = create(cfg['reid'], **kwargs) + + tracker = create(cfg['tracker']) + + return { + "detector": detector, + "reid": reid, + "tracker": tracker, + } + + def _forward(self): + det_outs = self.detector(self.inputs) + + if self.training: + emb_feats = det_outs['emb_feats'] + loss_confs = det_outs['det_losses']['loss_confs'] + loss_boxes = det_outs['det_losses']['loss_boxes'] + jde_losses = self.reid( + emb_feats, + self.inputs, + loss_confs=loss_confs, + loss_boxes=loss_boxes) + return jde_losses + else: + if self.metric == 'MOTDet': + det_results = { + 'bbox': det_outs['bbox'], + 'bbox_num': det_outs['bbox_num'], + } + return det_results + + elif self.metric == 'MOT': + emb_feats = det_outs['emb_feats'] + bboxes = det_outs['bbox'] + boxes_idx = det_outs['boxes_idx'] + nms_keep_idx = det_outs['nms_keep_idx'] + + pred_dets, pred_embs = self.reid( + emb_feats, + self.inputs, + bboxes=bboxes, + boxes_idx=boxes_idx, + nms_keep_idx=nms_keep_idx) + return pred_dets, pred_embs + + else: + raise ValueError("Unknown metric {} for multi object tracking.". + format(self.metric)) + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_hrhrnet.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_hrhrnet.py new file mode 100644 index 0000000000000000000000000000000000000000..366e9e3eed466f5e52503e94e6dea2afbf556c7e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_hrhrnet.py @@ -0,0 +1,287 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from scipy.optimize import linear_sum_assignment +from collections import abc, defaultdict +import numpy as np +import paddle + +from ppdet.core.workspace import register, create, serializable +from .meta_arch import BaseArch +from .. 
import layers as L
+from ..keypoint_utils import transpred
+
+__all__ = ['HigherHRNet']
+
+
+@register
+class HigherHRNet(BaseArch):
+    __category__ = 'architecture'
+
+    def __init__(self,
+                 backbone='HRNet',
+                 hrhrnet_head='HrHRNetHead',
+                 post_process='HrHRNetPostProcess',
+                 eval_flip=True,
+                 flip_perm=None,
+                 max_num_people=30):
+        """
+        HigherHRNet network, see https://arxiv.org/abs/1908.10357;
+        HigherHRNet+SWAHR, see https://arxiv.org/abs/2012.15175
+
+        Args:
+            backbone (nn.Layer): backbone instance
+            hrhrnet_head (nn.Layer): keypoint_head instance
+            post_process (object): `HrHRNetPostProcess` instance
+            eval_flip (bool): whether to run the flip test during evaluation
+            flip_perm (list): left-right joint exchange order, required when
+                eval_flip is True
+            max_num_people (int): max number of people kept by the top-k step
+        """
+        super(HigherHRNet, self).__init__()
+        self.backbone = backbone
+        self.hrhrnet_head = hrhrnet_head
+        self.post_process = post_process
+        self.flip = eval_flip
+        self.flip_perm = paddle.to_tensor(flip_perm)
+        self.deploy = False
+        self.interpolate = L.Upsample(2, mode='bilinear')
+        self.pool = L.MaxPool(5, 1, 2)
+        self.max_num_people = max_num_people
+
+    @classmethod
+    def from_config(cls, cfg, *args, **kwargs):
+        # backbone
+        backbone = create(cfg['backbone'])
+        # head
+        kwargs = {'input_shape': backbone.out_shape}
+        hrhrnet_head = create(cfg['hrhrnet_head'], **kwargs)
+        post_process = create(cfg['post_process'])
+
+        return {
+            'backbone': backbone,
+            "hrhrnet_head": hrhrnet_head,
+            "post_process": post_process,
+        }
+
+    def _forward(self):
+        if self.flip and not self.training and not self.deploy:
+            self.inputs['image'] = paddle.concat(
+                (self.inputs['image'], paddle.flip(self.inputs['image'], [3])))
+        body_feats = self.backbone(self.inputs)
+
+        if self.training:
+            return self.hrhrnet_head(body_feats, self.inputs)
+        else:
+            outputs = self.hrhrnet_head(body_feats)
+
+            if self.flip and not self.deploy:
+                outputs = [paddle.split(o, 2) for o in outputs]
+                output_rflip = [
+                    paddle.flip(paddle.gather(o[1], self.flip_perm, 1), [3])
+                    for o in outputs
+                ]
+                output1 = [o[0] for o in outputs]
+                heatmap = (output1[0] + output_rflip[0]) / 2.
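+                # flip test: the two views' heatmaps are averaged, while the
+                # two tagmaps are stacked as extra channels in get_topk, so
+                # grouping compares concatenated tag vectors rather than
+                # averaged embeddings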
+                tagmaps = [output1[1], output_rflip[1]]
+                outputs = [heatmap] + tagmaps
+            outputs = self.get_topk(outputs)
+
+            if self.deploy:
+                return outputs
+
+            res_lst = []
+            h = self.inputs['im_shape'][0, 0].numpy().item()
+            w = self.inputs['im_shape'][0, 1].numpy().item()
+            kpts, scores = self.post_process(*outputs, h, w)
+            res_lst.append([kpts, scores])
+            return res_lst
+
+    def get_loss(self):
+        return self._forward()
+
+    def get_pred(self):
+        outputs = {}
+        res_lst = self._forward()
+        outputs['keypoint'] = res_lst
+        return outputs
+
+    def get_topk(self, outputs):
+        # resize to image size
+        outputs = [self.interpolate(x) for x in outputs]
+        if len(outputs) == 3:
+            tagmap = paddle.concat(
+                (outputs[1].unsqueeze(4), outputs[2].unsqueeze(4)), axis=4)
+        else:
+            tagmap = outputs[1].unsqueeze(4)
+
+        heatmap = outputs[0]
+        N, J = 1, self.hrhrnet_head.num_joints
+        heatmap_maxpool = self.pool(heatmap)
+        # topk
+        maxmap = heatmap * (heatmap == heatmap_maxpool)
+        maxmap = maxmap.reshape([N, J, -1])
+        heat_k, inds_k = maxmap.topk(self.max_num_people, axis=2)
+
+        outputs = [heatmap, tagmap, heat_k, inds_k]
+        return outputs
+
+
+@register
+@serializable
+class HrHRNetPostProcess(object):
+    '''
+    HrHRNet postprocess contains:
+    1) get top-k keypoints from the output heatmap
+    2) sample the tagmap's value at each of the top-k coordinates
+    3) match joints into people with the Hungarian algorithm
+    4) adjust each coordinate by +-0.25 to decrease the error std
+    5) salvage missing joints by checking positivity of heatmap - tagdiff_norm
+    Args:
+        max_num_people (int): max number of people supported in postprocess
+        heat_thresh (float): top-k values below this threshold will be ignored
+        tag_thresh (float): coordinates whose sampled tagmap value is within
+            this threshold of a cluster centroid are assigned to that person
+
+        inputs (list[heatmap]): the output list of the model, [heatmap,
+            heatmap_maxpool, tagmap]; heatmap_maxpool is used to get the top-k
+        original_height, original_width (float): the original image size
+    '''
+
+    def __init__(self, max_num_people=30, heat_thresh=0.1, tag_thresh=1.):
+        self.max_num_people = max_num_people
+        self.heat_thresh = heat_thresh
+        self.tag_thresh = tag_thresh
+
+    def lerp(self, j, y, x, heatmap):
+        H, W = heatmap.shape[-2:]
+        left = np.clip(x - 1, 0, W - 1)
+        right = np.clip(x + 1, 0, W - 1)
+        up = np.clip(y - 1, 0, H - 1)
+        down = np.clip(y + 1, 0, H - 1)
+        offset_y = np.where(heatmap[j, down, x] > heatmap[j, up, x], 0.25,
+                            -0.25)
+        offset_x = np.where(heatmap[j, y, right] > heatmap[j, y, left], 0.25,
+                            -0.25)
+        return offset_y + 0.5, offset_x + 0.5
+
+    def __call__(self, heatmap, tagmap, heat_k, inds_k, original_height,
+                 original_width):
+
+        N, J, H, W = heatmap.shape
+        assert N == 1, "only support batch size 1"
+        heatmap = heatmap[0].cpu().detach().numpy()
+        tagmap = tagmap[0].cpu().detach().numpy()
+        heats = heat_k[0].cpu().detach().numpy()
+        inds_np = inds_k[0].cpu().detach().numpy()
+        y = inds_np // W
+        x = inds_np % W
+        tags = tagmap[np.arange(J)[None, :].repeat(self.max_num_people),
+                      y.flatten(), x.flatten()].reshape(J, -1,
+                                                        tagmap.shape[-1])
+        coords = np.stack((y, x), axis=2)
+        # threshold
+        mask = heats > self.heat_thresh
+        # cluster
+        cluster = defaultdict(lambda: {
+            'coords': np.zeros((J, 2), dtype=np.float32),
+            'scores': np.zeros(J, dtype=np.float32),
+            'tags': []
+        })
+        for jid, m in enumerate(mask):
+            num_valid = m.sum()
+            if num_valid == 0:
+                continue
+            valid_inds = np.where(m)[0]
+            valid_tags = tags[jid, m, :]
+            if len(cluster) == 0:  # initialize
+                for i in valid_inds:
+                    tag = 
tags[jid, i] + key = tag[0] + cluster[key]['tags'].append(tag) + cluster[key]['scores'][jid] = heats[jid, i] + cluster[key]['coords'][jid] = coords[jid, i] + continue + candidates = list(cluster.keys())[:self.max_num_people] + centroids = [ + np.mean( + cluster[k]['tags'], axis=0) for k in candidates + ] + num_clusters = len(centroids) + # shape is (num_valid, num_clusters, tag_dim) + dist = valid_tags[:, None, :] - np.array(centroids)[None, ...] + l2_dist = np.linalg.norm(dist, ord=2, axis=2) + # modulate dist with heat value, see `use_detection_val` + cost = np.round(l2_dist) * 100 - heats[jid, m, None] + # pad the cost matrix, otherwise new pose are ignored + if num_valid > num_clusters: + cost = np.pad(cost, ((0, 0), (0, num_valid - num_clusters)), + 'constant', + constant_values=((0, 0), (0, 1e-10))) + rows, cols = linear_sum_assignment(cost) + for y, x in zip(rows, cols): + tag = tags[jid, y] + if y < num_valid and x < num_clusters and \ + l2_dist[y, x] < self.tag_thresh: + key = candidates[x] # merge to cluster + else: + key = tag[0] # initialize new cluster + cluster[key]['tags'].append(tag) + cluster[key]['scores'][jid] = heats[jid, y] + cluster[key]['coords'][jid] = coords[jid, y] + + # shape is [k, J, 2] and [k, J] + pose_tags = np.array([cluster[k]['tags'] for k in cluster]) + pose_coords = np.array([cluster[k]['coords'] for k in cluster]) + pose_scores = np.array([cluster[k]['scores'] for k in cluster]) + valid = pose_scores > 0 + + pose_kpts = np.zeros((pose_scores.shape[0], J, 3), dtype=np.float32) + if valid.sum() == 0: + return pose_kpts, pose_kpts + + # refine coords + valid_coords = pose_coords[valid].astype(np.int32) + y = valid_coords[..., 0].flatten() + x = valid_coords[..., 1].flatten() + _, j = np.nonzero(valid) + offsets = self.lerp(j, y, x, heatmap) + pose_coords[valid, 0] += offsets[0] + pose_coords[valid, 1] += offsets[1] + + # mean score before salvage + mean_score = pose_scores.mean(axis=1) + pose_kpts[valid, 2] = pose_scores[valid] + + # salvage missing joints + if True: + for pid, coords in enumerate(pose_coords): + tag_mean = np.array(pose_tags[pid]).mean(axis=0) + norm = np.sum((tagmap - tag_mean)**2, axis=3)**0.5 + score = heatmap - np.round(norm) # (J, H, W) + flat_score = score.reshape(J, -1) + max_inds = np.argmax(flat_score, axis=1) + max_scores = np.max(flat_score, axis=1) + salvage_joints = (pose_scores[pid] == 0) & (max_scores > 0) + if salvage_joints.sum() == 0: + continue + y = max_inds[salvage_joints] // W + x = max_inds[salvage_joints] % W + offsets = self.lerp(salvage_joints.nonzero()[0], y, x, heatmap) + y = y.astype(np.float32) + offsets[0] + x = x.astype(np.float32) + offsets[1] + pose_coords[pid][salvage_joints, 0] = y + pose_coords[pid][salvage_joints, 1] = x + pose_kpts[pid][salvage_joints, 2] = max_scores[salvage_joints] + pose_kpts[..., :2] = transpred(pose_coords[..., :2][..., ::-1], + original_height, original_width, + min(H, W)) + return pose_kpts, mean_score diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_hrnet.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_hrnet.py new file mode 100644 index 0000000000000000000000000000000000000000..8d50502e71143dfc3dbfc28f7c9bfec912a832d0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_hrnet.py @@ -0,0 +1,468 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import numpy as np +import math +import cv2 +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch +from ..keypoint_utils import transform_preds +from .. import layers as L +from paddle.nn import functional as F + +__all__ = ['TopDownHRNet', 'TinyPose3DHRNet', 'TinyPose3DHRHeatmapNet'] + + +@register +class TopDownHRNet(BaseArch): + __category__ = 'architecture' + __inject__ = ['loss'] + + def __init__(self, + width, + num_joints, + backbone='HRNet', + loss='KeyPointMSELoss', + post_process='HRNetPostProcess', + flip_perm=None, + flip=True, + shift_heatmap=True, + use_dark=True): + """ + HRNet network, see https://arxiv.org/abs/1902.09212 + + Args: + backbone (nn.Layer): backbone instance + post_process (object): `HRNetPostProcess` instance + flip_perm (list): The left-right joints exchange order list + use_dark(bool): Whether to use DARK in post processing + """ + super(TopDownHRNet, self).__init__() + self.backbone = backbone + self.post_process = HRNetPostProcess(use_dark) + self.loss = loss + self.flip_perm = flip_perm + self.flip = flip + self.final_conv = L.Conv2d(width, num_joints, 1, 1, 0, bias=True) + self.shift_heatmap = shift_heatmap + self.deploy = False + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + return {'backbone': backbone, } + + def _forward(self): + feats = self.backbone(self.inputs) + hrnet_outputs = self.final_conv(feats[0]) + + if self.training: + return self.loss(hrnet_outputs, self.inputs) + elif self.deploy: + outshape = hrnet_outputs.shape + max_idx = paddle.argmax( + hrnet_outputs.reshape( + (outshape[0], outshape[1], outshape[2] * outshape[3])), + axis=-1) + return hrnet_outputs, max_idx + else: + if self.flip: + self.inputs['image'] = self.inputs['image'].flip([3]) + feats = self.backbone(self.inputs) + output_flipped = self.final_conv(feats[0]) + output_flipped = self.flip_back(output_flipped.numpy(), + self.flip_perm) + output_flipped = paddle.to_tensor(output_flipped.copy()) + if self.shift_heatmap: + output_flipped[:, :, :, 1:] = output_flipped.clone( + )[:, :, :, 0:-1] + hrnet_outputs = (hrnet_outputs + output_flipped) * 0.5 + imshape = (self.inputs['im_shape'].numpy() + )[:, ::-1] if 'im_shape' in self.inputs else None + center = self.inputs['center'].numpy( + ) if 'center' in self.inputs else np.round(imshape / 2.) + scale = self.inputs['scale'].numpy( + ) if 'scale' in self.inputs else imshape / 200. 
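+            # the fallbacks above treat the whole image as the person box
+            # when no crop metadata is given: center = image center and
+            # scale = image size / 200 (the 200-pixel std convention of
+            # COCO-style top-down pose pipelines)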
+            outputs = self.post_process(hrnet_outputs, center, scale)
+            return outputs
+
+    def get_loss(self):
+        return self._forward()
+
+    def get_pred(self):
+        res_lst = self._forward()
+        outputs = {'keypoint': res_lst}
+        return outputs
+
+    def flip_back(self, output_flipped, matched_parts):
+        assert output_flipped.ndim == 4,\
+            'output_flipped should be [batch_size, num_joints, height, width]'
+
+        output_flipped = output_flipped[:, :, :, ::-1]
+
+        for pair in matched_parts:
+            tmp = output_flipped[:, pair[0], :, :].copy()
+            output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
+            output_flipped[:, pair[1], :, :] = tmp
+
+        return output_flipped
+
+
+class HRNetPostProcess(object):
+    def __init__(self, use_dark=True):
+        self.use_dark = use_dark
+
+    def get_max_preds(self, heatmaps):
+        '''Get predictions from score maps.
+
+        Args:
+            heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
+
+        Returns:
+            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
+            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
+        '''
+        assert isinstance(heatmaps,
+                          np.ndarray), 'heatmaps should be numpy.ndarray'
+        assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'
+
+        batch_size = heatmaps.shape[0]
+        num_joints = heatmaps.shape[1]
+        width = heatmaps.shape[3]
+        heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
+        idx = np.argmax(heatmaps_reshaped, 2)
+        maxvals = np.amax(heatmaps_reshaped, 2)
+
+        maxvals = maxvals.reshape((batch_size, num_joints, 1))
+        idx = idx.reshape((batch_size, num_joints, 1))
+
+        preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
+
+        preds[:, :, 0] = (preds[:, :, 0]) % width
+        preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
+
+        pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
+        pred_mask = pred_mask.astype(np.float32)
+
+        preds *= pred_mask
+
+        return preds, maxvals
+
+    def gaussian_blur(self, heatmap, kernel):
+        border = (kernel - 1) // 2
+        batch_size = heatmap.shape[0]
+        num_joints = heatmap.shape[1]
+        height = heatmap.shape[2]
+        width = heatmap.shape[3]
+        for i in range(batch_size):
+            for j in range(num_joints):
+                origin_max = np.max(heatmap[i, j])
+                dr = np.zeros((height + 2 * border, width + 2 * border))
+                dr[border:-border, border:-border] = heatmap[i, j].copy()
+                dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
+                heatmap[i, j] = dr[border:-border, border:-border].copy()
+                heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
+        return heatmap
+
+    def dark_parse(self, hm, coord):
+        heatmap_height = hm.shape[0]
+        heatmap_width = hm.shape[1]
+        px = int(coord[0])
+        py = int(coord[1])
+        if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
+            dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
+            dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
+            dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
+            dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
+                + hm[py-1][px-1])
+            dyy = 0.25 * (
+                hm[py + 2][px] - 2 * hm[py][px] + hm[py - 2][px])
+            derivative = np.matrix([[dx], [dy]])
+            hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
+            if dxx * dyy - dxy**2 != 0:
+                hessianinv = hessian.I
+                offset = -hessianinv * derivative
+                offset = np.squeeze(np.array(offset.T), axis=0)
+                coord += offset
+        return coord
+
+    def dark_postprocess(self, hm, coords, kernelsize):
+        '''DARK postprocessing, Zhang et al. Distribution-Aware Coordinate
+        Representation for Human Pose Estimation (CVPR 2020).
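+
+        The heatmap is blurred, clipped to a 1e-10 floor and log-transformed,
+        then each coordinate is refined in `dark_parse` with a second-order
+        Taylor step around the peak: offset = -inv(Hessian) * gradient.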
+        '''
+
+        hm = self.gaussian_blur(hm, kernelsize)
+        hm = np.maximum(hm, 1e-10)
+        hm = np.log(hm)
+        for n in range(coords.shape[0]):
+            for p in range(coords.shape[1]):
+                coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
+        return coords
+
+    def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
+        """Get the highest heat-value location with a quarter offset in the
+        direction from the highest response to the second highest response.
+
+        Args:
+            heatmaps (numpy.ndarray): The predicted heatmaps
+            center (numpy.ndarray): The boxes center
+            scale (numpy.ndarray): The scale factor
+
+        Returns:
+            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
+            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
+        """
+        coords, maxvals = self.get_max_preds(heatmaps)
+
+        heatmap_height = heatmaps.shape[2]
+        heatmap_width = heatmaps.shape[3]
+
+        if self.use_dark:
+            coords = self.dark_postprocess(heatmaps, coords, kernelsize)
+        else:
+            for n in range(coords.shape[0]):
+                for p in range(coords.shape[1]):
+                    hm = heatmaps[n][p]
+                    px = int(math.floor(coords[n][p][0] + 0.5))
+                    py = int(math.floor(coords[n][p][1] + 0.5))
+                    if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
+                        diff = np.array([
+                            hm[py][px + 1] - hm[py][px - 1],
+                            hm[py + 1][px] - hm[py - 1][px]
+                        ])
+                        coords[n][p] += np.sign(diff) * .25
+        preds = coords.copy()
+
+        # Transform back
+        for i in range(coords.shape[0]):
+            preds[i] = transform_preds(coords[i], center[i], scale[i],
+                                       [heatmap_width, heatmap_height])
+
+        return preds, maxvals
+
+    def __call__(self, output, center, scale):
+        preds, maxvals = self.get_final_preds(output.numpy(), center, scale)
+        outputs = [[
+            np.concatenate(
+                (preds, maxvals), axis=-1), np.mean(
+                    maxvals, axis=1)
+        ]]
+        return outputs
+
+
+class TinyPose3DPostProcess(object):
+    def __init__(self):
+        pass
+
+    def __call__(self, output, center, scale):
+        """
+        Args:
+            output (numpy.ndarray): numpy.ndarray([batch_size, num_joints, 3]), keypoints coords
+            scale (numpy.ndarray): The scale factor
+        Returns:
+            preds: numpy.ndarray([batch_size, num_joints, 3]), keypoints coords
+        """
+
+        preds = output.numpy().copy()
+
+        # Transform back
+        for i in range(output.shape[0]):  # batch_size
+            preds[i][:, 0] = preds[i][:, 0] * scale[i][0]
+            preds[i][:, 1] = preds[i][:, 1] * scale[i][1]
+
+        return preds
+
+
+def soft_argmax(heatmaps, joint_num):
+    dims = heatmaps.shape
+    depth_dim = dims[1] // joint_num
+    heatmaps = heatmaps.reshape((-1, joint_num, depth_dim * dims[2] * dims[3]))
+    heatmaps = F.softmax(heatmaps, 2)
+    heatmaps = heatmaps.reshape((-1, joint_num, depth_dim, dims[2], dims[3]))
+
+    accu_x = heatmaps.sum(axis=(2, 3))
+    accu_y = heatmaps.sum(axis=(2, 4))
+    accu_z = heatmaps.sum(axis=(3, 4))
+
+    # expected value along each axis; assumes 32 bins (arange(1, 33)),
+    # shifted back to 0-based coordinates below
+    accu_x = accu_x * paddle.arange(1, 33)
+    accu_y = accu_y * paddle.arange(1, 33)
+    accu_z = accu_z * paddle.arange(1, 33)
+
+    accu_x = accu_x.sum(axis=2, keepdim=True) - 1
+    accu_y = accu_y.sum(axis=2, keepdim=True) - 1
+    accu_z = accu_z.sum(axis=2, keepdim=True) - 1
+
+    coord_out = paddle.concat(
+        (accu_x, accu_y, accu_z), axis=2)  # [batch_size, joint_num, 3]
+
+    return coord_out
+
+
+@register
+class TinyPose3DHRHeatmapNet(BaseArch):
+    __category__ = 'architecture'
+    __inject__ = ['loss']
+
+    def __init__(
+            self,
+            width,  # e.g. 40, the channel number of the backbone output
+            num_joints,
+            backbone='HRNet',
+            loss='KeyPointRegressionMSELoss',
+            post_process=TinyPose3DPostProcess):
+        """
+        Args:
+            width (int): channel width of the backbone output
+            num_joints (int): number of keypoints
+            backbone (nn.Layer): backbone instance
+            loss (nn.Layer): keypoint regression loss instance
+            post_process (object): post process instance
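+
+        The final 1x1 conv maps the backbone feature to num_joints * 32
+        channels, which `soft_argmax` above treats as per-joint depth
+        volumes and reduces to (x, y, z) coordinates by expectation.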
+ """ + super(TinyPose3DHRHeatmapNet, self).__init__() + + self.backbone = backbone + self.post_process = TinyPose3DPostProcess() + self.loss = loss + self.deploy = False + self.num_joints = num_joints + + self.final_conv = L.Conv2d(width, num_joints * 32, 1, 1, 0, bias=True) + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + return {'backbone': backbone, } + + def _forward(self): + feats = self.backbone(self.inputs) # feats:[[batch_size, 40, 32, 24]] + + hrnet_outputs = self.final_conv(feats[0]) + res = soft_argmax(hrnet_outputs, self.num_joints) + return res + + def get_loss(self): + pose3d = self._forward() + loss = self.loss(pose3d, None, self.inputs) + outputs = {'loss': loss} + return outputs + + def get_pred(self): + res_lst = self._forward() + outputs = {'pose3d': res_lst} + return outputs + + def flip_back(self, output_flipped, matched_parts): + assert output_flipped.ndim == 4,\ + 'output_flipped should be [batch_size, num_joints, height, width]' + + output_flipped = output_flipped[:, :, :, ::-1] + + for pair in matched_parts: + tmp = output_flipped[:, pair[0], :, :].copy() + output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] + output_flipped[:, pair[1], :, :] = tmp + + return output_flipped + + +@register +class TinyPose3DHRNet(BaseArch): + __category__ = 'architecture' + __inject__ = ['loss'] + + def __init__(self, + width, + num_joints, + fc_channel=768, + backbone='HRNet', + loss='KeyPointRegressionMSELoss', + post_process=TinyPose3DPostProcess): + """ + Args: + backbone (nn.Layer): backbone instance + post_process (object): post process instance + """ + super(TinyPose3DHRNet, self).__init__() + self.backbone = backbone + self.post_process = TinyPose3DPostProcess() + self.loss = loss + self.deploy = False + self.num_joints = num_joints + + self.final_conv = L.Conv2d(width, num_joints, 1, 1, 0, bias=True) + + self.flatten = paddle.nn.Flatten(start_axis=2, stop_axis=3) + self.fc1 = paddle.nn.Linear(fc_channel, 256) + self.act1 = paddle.nn.ReLU() + self.fc2 = paddle.nn.Linear(256, 64) + self.act2 = paddle.nn.ReLU() + self.fc3 = paddle.nn.Linear(64, 3) + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + return {'backbone': backbone, } + + def _forward(self): + ''' + self.inputs is a dict + ''' + feats = self.backbone( + self.inputs) # feats:[[batch_size, 40, width/4, height/4]] + + hrnet_outputs = self.final_conv( + feats[0]) # hrnet_outputs: [batch_size, num_joints*32,32,32] + + flatten_res = self.flatten( + hrnet_outputs) # [batch_size,num_joints*32,32*32] + + res = self.fc1(flatten_res) + res = self.act1(res) + res = self.fc2(res) + res = self.act2(res) + res = self.fc3(res) + + if self.training: + return self.loss(res, self.inputs) + else: # export model need + return res + + def get_loss(self): + return self._forward() + + def get_pred(self): + res_lst = self._forward() + outputs = {'pose3d': res_lst} + return outputs + + def flip_back(self, output_flipped, matched_parts): + assert output_flipped.ndim == 4,\ + 'output_flipped should be [batch_size, num_joints, height, width]' + + output_flipped = output_flipped[:, :, :, ::-1] + + for pair in matched_parts: + tmp = output_flipped[:, pair[0], :, :].copy() + output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] + output_flipped[:, pair[1], :, :] = tmp + + return output_flipped diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_petr.py 
b/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_petr.py
new file mode 100644
index 0000000000000000000000000000000000000000..b587c1f0668c968371def398c08b5968839c5b6f
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/keypoint_petr.py
@@ -0,0 +1,219 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+this code is based on https://github.com/hikvision-research/opera/blob/main/opera/models/detectors/petr.py
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+import paddle
+from ppdet.core.workspace import register
+from .meta_arch import BaseArch
+from .. import layers as L
+
+__all__ = ['PETR']
+
+
+@register
+class PETR(BaseArch):
+    __category__ = 'architecture'
+    __inject__ = ['backbone', 'neck', 'bbox_head']
+
+    def __init__(self,
+                 backbone='ResNet',
+                 neck='ChannelMapper',
+                 bbox_head='PETRHead'):
+        """
+        PETR, see https://openaccess.thecvf.com/content/CVPR2022/papers/Shi_End-to-End_Multi-Person_Pose_Estimation_With_Transformers_CVPR_2022_paper.pdf
+
+        Args:
+            backbone (nn.Layer): backbone instance
+            neck (nn.Layer): neck between backbone and head
+            bbox_head (nn.Layer): model output and loss
+        """
+        super(PETR, self).__init__()
+        self.backbone = backbone
+        self.with_neck = False
+        if neck is not None:
+            self.with_neck = True
+            self.neck = neck
+        self.bbox_head = bbox_head
+        self.deploy = False
+
+    def extract_feat(self, img):
+        """Directly extract features from the backbone+neck."""
+        x = self.backbone(img)
+        if self.with_neck:
+            x = self.neck(x)
+        return x
+
+    def get_inputs(self):
+        img_metas = []
+        gt_bboxes = []
+        gt_labels = []
+        gt_keypoints = []
+        gt_areas = []
+        pad_gt_mask = self.inputs['pad_gt_mask'].astype("bool").squeeze(-1)
+        for idx, im_shape in enumerate(self.inputs['im_shape']):
+            img_meta = {
+                'img_shape': im_shape.astype("int32").tolist() + [1, ],
+                'batch_input_shape': self.inputs['image'].shape[-2:],
+                'image_name': self.inputs['image_file'][idx]
+            }
+            img_metas.append(img_meta)
+            if (not pad_gt_mask[idx].any()):
+                gt_keypoints.append(self.inputs['gt_joints'][idx][:1])
+                gt_labels.append(self.inputs['gt_class'][idx][:1])
+                gt_bboxes.append(self.inputs['gt_bbox'][idx][:1])
+                gt_areas.append(self.inputs['gt_areas'][idx][:1])
+                continue
+
+            gt_keypoints.append(self.inputs['gt_joints'][idx][pad_gt_mask[idx]])
+            gt_labels.append(self.inputs['gt_class'][idx][pad_gt_mask[idx]])
+            gt_bboxes.append(self.inputs['gt_bbox'][idx][pad_gt_mask[idx]])
+            gt_areas.append(self.inputs['gt_areas'][idx][pad_gt_mask[idx]])
+
+        return img_metas, gt_bboxes, gt_labels, gt_keypoints, gt_areas
+
+    def get_loss(self):
+        """
+        Args:
+            img (Tensor): Input images of shape (N, C, H, W).
+                Typically these should be mean centered and std scaled.
+            img_metas (list[dict]): A list of image info dicts where each dict
+                has: 'img_shape', 'scale_factor', 'flip', and may also contain
+                'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
+ For details on the values of these keys see + :class:`mmdet.datasets.pipelines.Collect`. + gt_bboxes (list[Tensor]): Each item are the truth boxes for each + image in [tl_x, tl_y, br_x, br_y] format. + gt_labels (list[Tensor]): Class indices corresponding to each box. + gt_keypoints (list[Tensor]): Each item are the truth keypoints for + each image in [p^{1}_x, p^{1}_y, p^{1}_v, ..., p^{K}_x, + p^{K}_y, p^{K}_v] format. + gt_areas (list[Tensor]): mask areas corresponding to each box. + gt_bboxes_ignore (None | list[Tensor]): Specify which bounding + boxes can be ignored when computing the loss. + + Returns: + dict[str, Tensor]: A dictionary of loss components. + """ + + img_metas, gt_bboxes, gt_labels, gt_keypoints, gt_areas = self.get_inputs( + ) + gt_bboxes_ignore = getattr(self.inputs, 'gt_bboxes_ignore', None) + + x = self.extract_feat(self.inputs) + losses = self.bbox_head.forward_train(x, img_metas, gt_bboxes, + gt_labels, gt_keypoints, gt_areas, + gt_bboxes_ignore) + loss = 0 + for k, v in losses.items(): + loss += v + losses['loss'] = loss + + return losses + + def get_pred_numpy(self): + """Used for computing network flops. + """ + + img = self.inputs['image'] + batch_size, _, height, width = img.shape + dummy_img_metas = [ + dict( + batch_input_shape=(height, width), + img_shape=(height, width, 3), + scale_factor=(1., 1., 1., 1.)) for _ in range(batch_size) + ] + x = self.extract_feat(img) + outs = self.bbox_head(x, img_metas=dummy_img_metas) + bbox_list = self.bbox_head.get_bboxes( + *outs, dummy_img_metas, rescale=True) + return bbox_list + + def get_pred(self): + """ + """ + img = self.inputs['image'] + batch_size, _, height, width = img.shape + img_metas = [ + dict( + batch_input_shape=(height, width), + img_shape=(height, width, 3), + scale_factor=self.inputs['scale_factor'][i]) + for i in range(batch_size) + ] + kptpred = self.simple_test( + self.inputs, img_metas=img_metas, rescale=True) + keypoints = kptpred[0][1][0] + bboxs = kptpred[0][0][0] + keypoints[..., 2] = bboxs[:, None, 4] + res_lst = [[keypoints, bboxs[:, 4]]] + outputs = {'keypoint': res_lst} + return outputs + + def simple_test(self, inputs, img_metas, rescale=False): + """Test function without test time augmentation. + + Args: + inputs (list[paddle.Tensor]): List of multiple images. + img_metas (list[dict]): List of image information. + rescale (bool, optional): Whether to rescale the results. + Defaults to False. + + Returns: + list[list[np.ndarray]]: BBox and keypoint results of each image + and classes. The outer list corresponds to each image. + The inner list corresponds to each class. + """ + batch_size = len(img_metas) + assert batch_size == 1, 'Currently only batch_size 1 for inference ' \ + f'mode is supported. Found batch_size {batch_size}.' + feat = self.extract_feat(inputs) + results_list = self.bbox_head.simple_test( + feat, img_metas, rescale=rescale) + + bbox_kpt_results = [ + self.bbox_kpt2result(det_bboxes, det_labels, det_kpts, + self.bbox_head.num_classes) + for det_bboxes, det_labels, det_kpts in results_list + ] + return bbox_kpt_results + + def bbox_kpt2result(self, bboxes, labels, kpts, num_classes): + """Convert detection results to a list of numpy arrays. + + Args: + bboxes (paddle.Tensor | np.ndarray): shape (n, 5). + labels (paddle.Tensor | np.ndarray): shape (n, ). + kpts (paddle.Tensor | np.ndarray): shape (n, K, 3). + num_classes (int): class number, including background class. + + Returns: + list(ndarray): bbox and keypoint results of each class. 
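+                Both returned lists have length num_classes: entry i holds
+                the (n_i, 5) boxes and the (n_i, K, 3) keypoints of class i.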
+ """ + if bboxes.shape[0] == 0: + return [np.zeros((0, 5), dtype=np.float32) for i in range(num_classes)], \ + [np.zeros((0, kpts.size(1), 3), dtype=np.float32) + for i in range(num_classes)] + else: + if isinstance(bboxes, paddle.Tensor): + bboxes = bboxes.numpy() + labels = labels.numpy() + kpts = kpts.numpy() + return [bboxes[labels == i, :] for i in range(num_classes)], \ + [kpts[labels == i, :, :] for i in range(num_classes)] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/mask_rcnn.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/mask_rcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..4f6a9ce10f76801ca4bff4e3dc3e304b8e3567f5 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/mask_rcnn.py @@ -0,0 +1,152 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['MaskRCNN'] + + +@register +class MaskRCNN(BaseArch): + """ + Mask R-CNN network, see https://arxiv.org/abs/1703.06870 + + Args: + backbone (object): backbone instance + rpn_head (object): `RPNHead` instance + bbox_head (object): `BBoxHead` instance + mask_head (object): `MaskHead` instance + bbox_post_process (object): `BBoxPostProcess` instance + mask_post_process (object): `MaskPostProcess` instance + neck (object): 'FPN' instance + """ + + __category__ = 'architecture' + __inject__ = [ + 'bbox_post_process', + 'mask_post_process', + ] + + def __init__(self, + backbone, + rpn_head, + bbox_head, + mask_head, + bbox_post_process, + mask_post_process, + neck=None): + super(MaskRCNN, self).__init__() + self.backbone = backbone + self.neck = neck + self.rpn_head = rpn_head + self.bbox_head = bbox_head + self.mask_head = mask_head + + self.bbox_post_process = bbox_post_process + self.mask_post_process = mask_post_process + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + kwargs = {'input_shape': backbone.out_shape} + neck = cfg['neck'] and create(cfg['neck'], **kwargs) + + out_shape = neck and neck.out_shape or backbone.out_shape + kwargs = {'input_shape': out_shape} + rpn_head = create(cfg['rpn_head'], **kwargs) + bbox_head = create(cfg['bbox_head'], **kwargs) + + out_shape = neck and out_shape or bbox_head.get_head().out_shape + kwargs = {'input_shape': out_shape} + mask_head = create(cfg['mask_head'], **kwargs) + return { + 'backbone': backbone, + 'neck': neck, + "rpn_head": rpn_head, + "bbox_head": bbox_head, + "mask_head": mask_head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + if self.neck is not None: + body_feats = self.neck(body_feats) + + if self.training: + rois, rois_num, rpn_loss = self.rpn_head(body_feats, self.inputs) + bbox_loss, bbox_feat = self.bbox_head(body_feats, rois, 
rois_num, + self.inputs) + rois, rois_num = self.bbox_head.get_assigned_rois() + bbox_targets = self.bbox_head.get_assigned_targets() + # Mask Head needs bbox_feat in Mask RCNN + mask_loss = self.mask_head(body_feats, rois, rois_num, self.inputs, + bbox_targets, bbox_feat) + return rpn_loss, bbox_loss, mask_loss + else: + rois, rois_num, _ = self.rpn_head(body_feats, self.inputs) + preds, feat_func = self.bbox_head(body_feats, rois, rois_num, None) + + im_shape = self.inputs['im_shape'] + scale_factor = self.inputs['scale_factor'] + + bbox, bbox_num, nms_keep_idx = self.bbox_post_process( + preds, (rois, rois_num), im_shape, scale_factor) + mask_out = self.mask_head( + body_feats, bbox, bbox_num, self.inputs, feat_func=feat_func) + + # rescale the prediction back to origin image + bbox, bbox_pred, bbox_num = self.bbox_post_process.get_pred( + bbox, bbox_num, im_shape, scale_factor) + origin_shape = self.bbox_post_process.get_origin_shape() + mask_pred = self.mask_post_process(mask_out, bbox_pred, bbox_num, + origin_shape) + + if self.use_extra_data: + extra_data = {} # record the bbox output before nms, such like scores and nms_keep_idx + """extra_data:{ + 'scores': predict scores, + 'nms_keep_idx': bbox index before nms, + } + """ + extra_data['scores'] = preds[1] # predict scores (probability) + # Todo: get logits output + extra_data['nms_keep_idx'] = nms_keep_idx # bbox index before nms + return bbox_pred, bbox_num, mask_pred, extra_data + else: + return bbox_pred, bbox_num, mask_pred + + def get_loss(self, ): + bbox_loss, mask_loss, rpn_loss = self._forward() + loss = {} + loss.update(rpn_loss) + loss.update(bbox_loss) + loss.update(mask_loss) + total_loss = paddle.add_n(list(loss.values())) + loss.update({'loss': total_loss}) + return loss + + def get_pred(self): + if self.use_extra_data: + bbox_pred, bbox_num, mask_pred, extra_data = self._forward() + output = {'bbox': bbox_pred, 'bbox_num': bbox_num, 'mask': mask_pred, 'extra_data': extra_data} + else: + bbox_pred, bbox_num, mask_pred = self._forward() + output = {'bbox': bbox_pred, 'bbox_num': bbox_num, 'mask': mask_pred} + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/meta_arch.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/meta_arch.py new file mode 100644 index 0000000000000000000000000000000000000000..370b2b124bfc1f5477a942f972731f2857e1641c --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/meta_arch.py @@ -0,0 +1,132 @@ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import paddle +import paddle.nn as nn +import typing + +from ppdet.core.workspace import register +from ppdet.modeling.post_process import nms + +__all__ = ['BaseArch'] + + +@register +class BaseArch(nn.Layer): + def __init__(self, data_format='NCHW', use_extra_data=False): + super(BaseArch, self).__init__() + self.data_format = data_format + self.inputs = {} + self.fuse_norm = False + self.use_extra_data = use_extra_data + + def load_meanstd(self, cfg_transform): + scale = 1. + mean = np.array([0.485, 0.456, 0.406], dtype=np.float32) + std = np.array([0.229, 0.224, 0.225], dtype=np.float32) + for item in cfg_transform: + if 'NormalizeImage' in item: + mean = np.array( + item['NormalizeImage']['mean'], dtype=np.float32) + std = np.array(item['NormalizeImage']['std'], dtype=np.float32) + if item['NormalizeImage'].get('is_scale', True): + scale = 1. / 255. 
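+                # fold the NormalizeImage transform into the model itself:
+                # with fuse_norm enabled, forward() applies
+                # image * (scale / std) + (-mean / std), so deployment can
+                # drop the external preprocessing step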
+ break + if self.data_format == 'NHWC': + self.scale = paddle.to_tensor(scale / std).reshape((1, 1, 1, 3)) + self.bias = paddle.to_tensor(-mean / std).reshape((1, 1, 1, 3)) + else: + self.scale = paddle.to_tensor(scale / std).reshape((1, 3, 1, 1)) + self.bias = paddle.to_tensor(-mean / std).reshape((1, 3, 1, 1)) + + def forward(self, inputs): + if self.data_format == 'NHWC': + image = inputs['image'] + inputs['image'] = paddle.transpose(image, [0, 2, 3, 1]) + + if self.fuse_norm: + image = inputs['image'] + self.inputs['image'] = image * self.scale + self.bias + self.inputs['im_shape'] = inputs['im_shape'] + self.inputs['scale_factor'] = inputs['scale_factor'] + else: + self.inputs = inputs + + self.model_arch() + + if self.training: + out = self.get_loss() + else: + inputs_list = [] + # multi-scale input + if not isinstance(inputs, typing.Sequence): + inputs_list.append(inputs) + else: + inputs_list.extend(inputs) + outs = [] + for inp in inputs_list: + if self.fuse_norm: + self.inputs['image'] = inp['image'] * self.scale + self.bias + self.inputs['im_shape'] = inp['im_shape'] + self.inputs['scale_factor'] = inp['scale_factor'] + else: + self.inputs = inp + outs.append(self.get_pred()) + + # multi-scale test + if len(outs) > 1: + out = self.merge_multi_scale_predictions(outs) + else: + out = outs[0] + return out + + def merge_multi_scale_predictions(self, outs): + # default values for architectures not included in following list + num_classes = 80 + nms_threshold = 0.5 + keep_top_k = 100 + + if self.__class__.__name__ in ('CascadeRCNN', 'FasterRCNN', 'MaskRCNN'): + num_classes = self.bbox_head.num_classes + keep_top_k = self.bbox_post_process.nms.keep_top_k + nms_threshold = self.bbox_post_process.nms.nms_threshold + else: + raise Exception( + "Multi scale test only supports CascadeRCNN, FasterRCNN and MaskRCNN for now" + ) + + final_boxes = [] + all_scale_outs = paddle.concat([o['bbox'] for o in outs]).numpy() + for c in range(num_classes): + idxs = all_scale_outs[:, 0] == c + if np.count_nonzero(idxs) == 0: + continue + r = nms(all_scale_outs[idxs, 1:], nms_threshold) + final_boxes.append( + np.concatenate([np.full((r.shape[0], 1), c), r], 1)) + out = np.concatenate(final_boxes) + out = np.concatenate(sorted( + out, key=lambda e: e[1])[-keep_top_k:]).reshape((-1, 6)) + out = { + 'bbox': paddle.to_tensor(out), + 'bbox_num': paddle.to_tensor(np.array([out.shape[0], ])) + } + + return out + + def build_inputs(self, data, input_def): + inputs = {} + for i, k in enumerate(input_def): + inputs[k] = data[i] + return inputs + + def model_arch(self, ): + pass + + def get_loss(self, ): + raise NotImplementedError("Should implement get_loss method!") + + def get_pred(self, ): + raise NotImplementedError("Should implement get_pred method!") diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/picodet.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/picodet.py new file mode 100644 index 0000000000000000000000000000000000000000..0b87a4baa429dae1c03286f09243ca3211b199df --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/picodet.py @@ -0,0 +1,95 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['PicoDet'] + + +@register +class PicoDet(BaseArch): + """ + Generalized Focal Loss network, see https://arxiv.org/abs/2006.04388 + + Args: + backbone (object): backbone instance + neck (object): 'FPN' instance + head (object): 'PicoHead' instance + """ + + __category__ = 'architecture' + + def __init__(self, backbone, neck, head='PicoHead'): + super(PicoDet, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + self.export_post_process = True + self.export_nms = True + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "head": head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + fpn_feats = self.neck(body_feats) + head_outs = self.head(fpn_feats, self.export_post_process) + if self.training or not self.export_post_process: + return head_outs, None + else: + scale_factor = self.inputs['scale_factor'] + bboxes, bbox_num = self.head.post_process( + head_outs, scale_factor, export_nms=self.export_nms) + return bboxes, bbox_num + + def get_loss(self, ): + loss = {} + + head_outs, _ = self._forward() + loss_gfl = self.head.get_loss(head_outs, self.inputs) + loss.update(loss_gfl) + total_loss = paddle.add_n(list(loss.values())) + loss.update({'loss': total_loss}) + return loss + + def get_pred(self): + if not self.export_post_process: + return {'picodet': self._forward()[0]} + elif self.export_nms: + bbox_pred, bbox_num = self._forward() + output = {'bbox': bbox_pred, 'bbox_num': bbox_num} + return output + else: + bboxes, mlvl_scores = self._forward() + output = {'bbox': bboxes, 'scores': mlvl_scores} + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/pose3d_metro.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/pose3d_metro.py new file mode 100644 index 0000000000000000000000000000000000000000..4275154d137ccd838ee36cbe2f09c520d0ea3d2b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/pose3d_metro.py @@ -0,0 +1,114 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
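+
+# orthographic_projection below implements weak-perspective projection:
+# each sample's camera is (s, tx, ty) and X_2d = s * (X_xy + t). A tiny
+# numeric check (values illustrative only):
+#
+#     X = paddle.to_tensor([[[0.5, -0.5, 2.0]]])    # [B, N, 3]
+#     cam = paddle.to_tensor([[2.0, 0.1, -0.1]])    # [s, tx, ty]
+#     # X_trans = X[..., :2] + cam[..., 1:]  ->  [[[0.6, -0.6]]]
+#     # X_2d = s * X_trans                   ->  [[[1.2, -1.2]]]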
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch +from .. import layers as L + +__all__ = ['METRO_Body'] + + +def orthographic_projection(X, camera): + """Perform orthographic projection of 3D points X using the camera parameters + Args: + X: size = [B, N, 3] + camera: size = [B, 3] + Returns: + Projected 2D points -- size = [B, N, 2] + """ + camera = camera.reshape((-1, 1, 3)) + X_trans = X[:, :, :2] + camera[:, :, 1:] + shape = paddle.shape(X_trans) + X_2d = (camera[:, :, 0] * X_trans.reshape((shape[0], -1))).reshape(shape) + return X_2d + + +@register +class METRO_Body(BaseArch): + __category__ = 'architecture' + __inject__ = ['loss'] + + def __init__( + self, + num_joints, + backbone='HRNet', + trans_encoder='', + loss='Pose3DLoss', ): + """ + Modified from METRO network, see https://arxiv.org/abs/2012.09760 + + Args: + backbone (nn.Layer): backbone instance + """ + super(METRO_Body, self).__init__() + self.num_joints = num_joints + self.backbone = backbone + self.loss = loss + self.deploy = False + + self.trans_encoder = trans_encoder + self.conv_learn_tokens = paddle.nn.Conv1D(49, num_joints + 10, 1) + self.cam_param_fc = paddle.nn.Linear(3, 2) + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + trans_encoder = create(cfg['trans_encoder']) + + return {'backbone': backbone, 'trans_encoder': trans_encoder} + + def _forward(self): + batch_size = self.inputs['image'].shape[0] + + image_feat = self.backbone(self.inputs) + image_feat_flatten = image_feat.reshape((batch_size, 2048, 49)) + image_feat_flatten = image_feat_flatten.transpose(perm=(0, 2, 1)) + # and apply a conv layer to learn image token for each 3d joint/vertex position + features = self.conv_learn_tokens(image_feat_flatten) # (B, J, C) + + if self.training: + # apply mask vertex/joint modeling + # meta_masks is a tensor of all the masks, randomly generated in dataloader + # we pre-define a [MASK] token, which is a floating-value vector with 0.01s + meta_masks = self.inputs['mjm_mask'].expand((-1, -1, 2048)) + constant_tensor = paddle.ones_like(features) * 0.01 + features = features * meta_masks + constant_tensor * (1 - meta_masks + ) + pred_out = self.trans_encoder(features) + + pred_3d_joints = pred_out[:, :self.num_joints, :] + cam_features = pred_out[:, self.num_joints:, :] + + # learn camera parameters + pred_2d_joints = self.cam_param_fc(cam_features) + return pred_3d_joints, pred_2d_joints + + def get_loss(self): + preds_3d, preds_2d = self._forward() + loss = self.loss(preds_3d, preds_2d, self.inputs) + output = {'loss': loss} + return output + + def get_pred(self): + preds_3d, preds_2d = self._forward() + outputs = {'pose3d': preds_3d, 'pose2d': preds_2d} + return outputs diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/ppyoloe.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/ppyoloe.py new file mode 100644 index 0000000000000000000000000000000000000000..330542b8ab20d04fb433eddffa14bf05afd8e6a2 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/ppyoloe.py @@ -0,0 +1,260 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import copy +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['PPYOLOE', 'PPYOLOEWithAuxHead'] +# PP-YOLOE and PP-YOLOE+ are recommended to use this architecture, especially when use distillation or aux head +# PP-YOLOE and PP-YOLOE+ can also use the same architecture of YOLOv3 in yolo.py when not use distillation or aux head + + +@register +class PPYOLOE(BaseArch): + """ + PPYOLOE network, see https://arxiv.org/abs/2203.16250 + + Args: + backbone (nn.Layer): backbone instance + neck (nn.Layer): neck instance + yolo_head (nn.Layer): anchor_head instance + post_process (object): `BBoxPostProcess` instance + ssod_loss (object): 'SSODPPYOLOELoss' instance, only used for semi-det(ssod) + for_distill (bool): whether for distillation + feat_distill_place (str): distill which feature for distillation + for_mot (bool): whether return other features for multi-object tracking + models, default False in pure object detection models. + """ + + __category__ = 'architecture' + __shared__ = ['for_distill'] + __inject__ = ['post_process', 'ssod_loss'] + + def __init__(self, + backbone='CSPResNet', + neck='CustomCSPPAN', + yolo_head='PPYOLOEHead', + post_process='BBoxPostProcess', + ssod_loss='SSODPPYOLOELoss', + for_distill=False, + feat_distill_place='neck_feats', + for_mot=False): + super(PPYOLOE, self).__init__() + self.backbone = backbone + self.neck = neck + self.yolo_head = yolo_head + self.post_process = post_process + self.for_mot = for_mot + + # for ssod, semi-det + self.is_teacher = False + self.ssod_loss = ssod_loss + + # distill + self.for_distill = for_distill + self.feat_distill_place = feat_distill_place + if for_distill: + assert feat_distill_place in ['backbone_feats', 'neck_feats'] + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + yolo_head = create(cfg['yolo_head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "yolo_head": yolo_head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + neck_feats = self.neck(body_feats, self.for_mot) + + self.is_teacher = self.inputs.get('is_teacher', False) # for semi-det + if self.training or self.is_teacher: + yolo_losses = self.yolo_head(neck_feats, self.inputs) + + if self.for_distill: + if self.feat_distill_place == 'backbone_feats': + self.yolo_head.distill_pairs['backbone_feats'] = body_feats + elif self.feat_distill_place == 'neck_feats': + self.yolo_head.distill_pairs['neck_feats'] = neck_feats + else: + raise ValueError + return yolo_losses + else: + + yolo_head_outs = self.yolo_head(neck_feats) + + if self.post_process is not None: + bbox, bbox_num, nms_keep_idx = self.post_process( + yolo_head_outs, self.yolo_head.mask_anchors, + self.inputs['im_shape'], self.inputs['scale_factor']) + + else: + bbox, bbox_num, nms_keep_idx = 
self.yolo_head.post_process( + yolo_head_outs, self.inputs['scale_factor']) + + if self.use_extra_data: + extra_data = {} # record the bbox output before nms, such like scores and nms_keep_idx + """extra_data:{ + 'scores': predict scores, + 'nms_keep_idx': bbox index before nms, + } + """ + extra_data['scores'] = yolo_head_outs[0] # predict scores (probability) + extra_data['nms_keep_idx'] = nms_keep_idx + output = {'bbox': bbox, 'bbox_num': bbox_num, 'extra_data': extra_data} + else: + output = {'bbox': bbox, 'bbox_num': bbox_num} + + return output + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() + + def get_loss_keys(self): + return ['loss_cls', 'loss_iou', 'loss_dfl', 'loss_contrast'] + + def get_ssod_loss(self, student_head_outs, teacher_head_outs, train_cfg): + ssod_losses = self.ssod_loss(student_head_outs, teacher_head_outs, + train_cfg) + return ssod_losses + + +@register +class PPYOLOEWithAuxHead(BaseArch): + __category__ = 'architecture' + __inject__ = ['post_process'] + + def __init__(self, + backbone='CSPResNet', + neck='CustomCSPPAN', + yolo_head='PPYOLOEHead', + aux_head='SimpleConvHead', + post_process='BBoxPostProcess', + for_mot=False, + detach_epoch=5): + """ + PPYOLOE network, see https://arxiv.org/abs/2203.16250 + + Args: + backbone (nn.Layer): backbone instance + neck (nn.Layer): neck instance + yolo_head (nn.Layer): anchor_head instance + post_process (object): `BBoxPostProcess` instance + for_mot (bool): whether return other features for multi-object tracking + models, default False in pure object detection models. + """ + super(PPYOLOEWithAuxHead, self).__init__() + self.backbone = backbone + self.neck = neck + self.aux_neck = copy.deepcopy(self.neck) + + self.yolo_head = yolo_head + self.aux_head = aux_head + self.post_process = post_process + self.for_mot = for_mot + self.detach_epoch = detach_epoch + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + # fpn + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + aux_neck = copy.deepcopy(neck) + + # head + kwargs = {'input_shape': neck.out_shape} + yolo_head = create(cfg['yolo_head'], **kwargs) + aux_head = create(cfg['aux_head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "yolo_head": yolo_head, + 'aux_head': aux_head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + neck_feats = self.neck(body_feats, self.for_mot) + + if self.training: + if self.inputs['epoch_id'] >= self.detach_epoch: + aux_neck_feats = self.aux_neck([f.detach() for f in body_feats]) + dual_neck_feats = (paddle.concat( + [f.detach(), aux_f], axis=1) for f, aux_f in + zip(neck_feats, aux_neck_feats)) + else: + aux_neck_feats = self.aux_neck(body_feats) + dual_neck_feats = (paddle.concat( + [f, aux_f], axis=1) for f, aux_f in + zip(neck_feats, aux_neck_feats)) + aux_cls_scores, aux_bbox_preds = self.aux_head(dual_neck_feats) + loss = self.yolo_head( + neck_feats, + self.inputs, + aux_pred=[aux_cls_scores, aux_bbox_preds]) + return loss + else: + yolo_head_outs = self.yolo_head(neck_feats) + + if self.post_process is not None: + bbox, bbox_num, nms_keep_idx = self.post_process( + yolo_head_outs, self.yolo_head.mask_anchors, + self.inputs['im_shape'], self.inputs['scale_factor']) + else: + bbox, bbox_num, nms_keep_idx = self.yolo_head.post_process( + yolo_head_outs, self.inputs['scale_factor']) + + if self.use_extra_data: + extra_data = {} # record the bbox 
output before nms, such like scores and nms_keep_idx + """extra_data:{ + 'scores': predict scores, + 'nms_keep_idx': bbox index before nms, + } + """ + extra_data['scores'] = yolo_head_outs[0] # predict scores (probability) + # Todo: get logits output + extra_data['nms_keep_idx'] = nms_keep_idx + output = {'bbox': bbox, 'bbox_num': bbox_num, 'extra_data': extra_data} + else: + output = {'bbox': bbox, 'bbox_num': bbox_num} + + return output + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/queryinst.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/queryinst.py new file mode 100644 index 0000000000000000000000000000000000000000..76a65ed3a2565d638546f4b3deb09670bd809c1c --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/queryinst.py @@ -0,0 +1,104 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['QueryInst'] + + +@register +class QueryInst(BaseArch): + __category__ = 'architecture' + __inject__ = ['post_process'] + + def __init__(self, + backbone, + neck, + rpn_head, + roi_head, + post_process='SparsePostProcess'): + super(QueryInst, self).__init__() + self.backbone = backbone + self.neck = neck + self.rpn_head = rpn_head + self.roi_head = roi_head + self.post_process = post_process + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + rpn_head = create(cfg['rpn_head'], **kwargs) + roi_head = create(cfg['roi_head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + 'rpn_head': rpn_head, + "roi_head": roi_head + } + + def _forward(self, targets=None): + features = self.backbone(self.inputs) + features = self.neck(features) + + proposal_bboxes, proposal_features = self.rpn_head(self.inputs[ + 'img_whwh']) + outputs = self.roi_head(features, proposal_bboxes, proposal_features, + targets) + + if self.training: + return outputs + else: + bbox_pred, bbox_num, mask_pred = self.post_process( + outputs['class_logits'], outputs['bbox_pred'], + self.inputs['scale_factor_whwh'], self.inputs['ori_shape'], + outputs['mask_logits']) + return bbox_pred, bbox_num, mask_pred + + def get_loss(self): + targets = [] + for i in range(len(self.inputs['img_whwh'])): + boxes = self.inputs['gt_bbox'][i] + labels = self.inputs['gt_class'][i].squeeze(-1) + img_whwh = self.inputs['img_whwh'][i] + if boxes.shape[0] != 0: + img_whwh_tgt = img_whwh.unsqueeze(0).tile([boxes.shape[0], 1]) + else: + img_whwh_tgt = paddle.zeros_like(boxes) + gt_segm = self.inputs['gt_segm'][i].astype('float32') 
+ targets.append({ + 'boxes': boxes, + 'labels': labels, + 'img_whwh': img_whwh, + 'img_whwh_tgt': img_whwh_tgt, + 'gt_segm': gt_segm + }) + losses = self._forward(targets) + losses.update({'loss': sum(losses.values())}) + return losses + + def get_pred(self): + bbox_pred, bbox_num, mask_pred = self._forward() + return {'bbox': bbox_pred, 'bbox_num': bbox_num, 'mask': mask_pred} diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/retinanet.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/retinanet.py new file mode 100644 index 0000000000000000000000000000000000000000..fc49f0e97365ef10d14c9214133d53db304b880b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/retinanet.py @@ -0,0 +1,84 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch +import paddle +import paddle.nn.functional as F + +__all__ = ['RetinaNet'] + + +@register +class RetinaNet(BaseArch): + __category__ = 'architecture' + + def __init__(self, backbone, neck, head): + super(RetinaNet, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + 'head': head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + neck_feats = self.neck(body_feats) + + if self.training: + return self.head(neck_feats, self.inputs) + else: + head_outs = self.head(neck_feats) + bbox, bbox_num, nms_keep_idx = self.head.post_process( + head_outs, self.inputs['im_shape'], self.inputs['scale_factor']) + + if self.use_extra_data: + extra_data = {} # record the bbox output before nms, such like scores and nms_keep_idx + """extra_data:{ + 'scores': predict scores, + 'nms_keep_idx': bbox index before nms, + } + """ + preds_logits = self.head.decode_cls_logits(head_outs[0]) + preds_scores = F.sigmoid(preds_logits) + extra_data['logits'] = preds_logits + extra_data['scores'] = preds_scores + extra_data['nms_keep_idx'] = nms_keep_idx # bbox index before nms + return {'bbox': bbox, 'bbox_num': bbox_num, "extra_data": extra_data} + else: + return {'bbox': bbox, 'bbox_num': bbox_num} + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/s2anet.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/s2anet.py new file mode 100644 index 0000000000000000000000000000000000000000..8fb71e205abd27a2a503ba3eb5e16e158b8e2a95 --- /dev/null +++ 
b/PaddleDetection-release-2.6/ppdet/modeling/architectures/s2anet.py @@ -0,0 +1,83 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['S2ANet'] + + +@register +class S2ANet(BaseArch): + __category__ = 'architecture' + __inject__ = ['head'] + + def __init__(self, backbone, neck, head): + """ + S2ANet, see https://arxiv.org/pdf/2008.09397.pdf + + Args: + backbone (object): backbone instance + neck (object): `FPN` instance + head (object): `Head` instance + """ + super(S2ANet, self).__init__() + self.backbone = backbone + self.neck = neck + self.s2anet_head = head + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + kwargs = {'input_shape': backbone.out_shape} + neck = cfg['neck'] and create(cfg['neck'], **kwargs) + + out_shape = neck and neck.out_shape or backbone.out_shape + kwargs = {'input_shape': out_shape} + head = create(cfg['head'], **kwargs) + + return {'backbone': backbone, 'neck': neck, "head": head} + + def _forward(self): + body_feats = self.backbone(self.inputs) + if self.neck is not None: + body_feats = self.neck(body_feats) + if self.training: + loss = self.s2anet_head(body_feats, self.inputs) + return loss + else: + head_outs = self.s2anet_head(body_feats) + # post_process + bboxes, bbox_num = self.s2anet_head.get_bboxes(head_outs) + # rescale the prediction back to origin image + im_shape = self.inputs['im_shape'] + scale_factor = self.inputs['scale_factor'] + bboxes = self.s2anet_head.get_pred(bboxes, bbox_num, im_shape, + scale_factor) + # output + output = {'bbox': bboxes, 'bbox_num': bbox_num} + return output + + def get_loss(self, ): + loss = self._forward() + return loss + + def get_pred(self): + output = self._forward() + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/solov2.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/solov2.py new file mode 100644 index 0000000000000000000000000000000000000000..4e5fc211863b92ba609c958ac9206f99573ecfe4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/solov2.py @@ -0,0 +1,110 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
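# Editor's sketch (not part of the original diff): S2ANet.from_config above
# makes the neck optional via `neck = cfg['neck'] and create(cfg['neck'], **kwargs)`,
# and resolves the head's input shape with
# `out_shape = neck and neck.out_shape or backbone.out_shape`, i.e. the head
# falls back to the backbone's output shape when no neck is configured. A
# minimal, self-contained illustration; the helper name below is hypothetical,
# not a ppdet API.

def _resolve_head_input_shape(neck, backbone):
    # Explicit `if` equivalent of `neck and neck.out_shape or backbone.out_shape`;
    # unlike the and/or idiom, it behaves correctly even if out_shape were falsy.
    return neck.out_shape if neck is not None else backbone.out_shape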
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['SOLOv2'] + + +@register +class SOLOv2(BaseArch): + """ + SOLOv2 network, see https://arxiv.org/abs/2003.10152 + + Args: + backbone (object): an backbone instance + solov2_head (object): an `SOLOv2Head` instance + mask_head (object): an `SOLOv2MaskHead` instance + neck (object): neck of network, such as feature pyramid network instance + """ + + __category__ = 'architecture' + + def __init__(self, backbone, solov2_head, mask_head, neck=None): + super(SOLOv2, self).__init__() + self.backbone = backbone + self.neck = neck + self.solov2_head = solov2_head + self.mask_head = mask_head + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + solov2_head = create(cfg['solov2_head'], **kwargs) + mask_head = create(cfg['mask_head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + 'solov2_head': solov2_head, + 'mask_head': mask_head, + } + + def model_arch(self): + body_feats = self.backbone(self.inputs) + + body_feats = self.neck(body_feats) + + self.seg_pred = self.mask_head(body_feats) + + self.cate_pred_list, self.kernel_pred_list = self.solov2_head( + body_feats) + + def get_loss(self, ): + loss = {} + # get gt_ins_labels, gt_cate_labels, etc. + gt_ins_labels, gt_cate_labels, gt_grid_orders = [], [], [] + fg_num = self.inputs['fg_num'] + for i in range(len(self.solov2_head.seg_num_grids)): + ins_label = 'ins_label{}'.format(i) + if ins_label in self.inputs: + gt_ins_labels.append(self.inputs[ins_label]) + cate_label = 'cate_label{}'.format(i) + if cate_label in self.inputs: + gt_cate_labels.append(self.inputs[cate_label]) + grid_order = 'grid_order{}'.format(i) + if grid_order in self.inputs: + gt_grid_orders.append(self.inputs[grid_order]) + + loss_solov2 = self.solov2_head.get_loss( + self.cate_pred_list, self.kernel_pred_list, self.seg_pred, + gt_ins_labels, gt_cate_labels, gt_grid_orders, fg_num) + loss.update(loss_solov2) + total_loss = paddle.add_n(list(loss.values())) + loss.update({'loss': total_loss}) + return loss + + def get_pred(self): + seg_masks, cate_labels, cate_scores, bbox_num = self.solov2_head.get_prediction( + self.cate_pred_list, self.kernel_pred_list, self.seg_pred, + self.inputs['im_shape'], self.inputs['scale_factor']) + outs = { + "segm": seg_masks, + "bbox_num": bbox_num, + 'cate_label': cate_labels, + 'cate_score': cate_scores + } + return outs diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/sparse_rcnn.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/sparse_rcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..2cbc85338eaf899f7344c415b09d8d49901972b0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/sparse_rcnn.py @@ -0,0 +1,99 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ["SparseRCNN"] + + +@register +class SparseRCNN(BaseArch): + __category__ = 'architecture' + __inject__ = ["postprocess"] + + def __init__(self, + backbone, + neck, + head="SparsercnnHead", + postprocess="SparsePostProcess"): + super(SparseRCNN, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + self.postprocess = postprocess + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'roi_input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "head": head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + fpn_feats = self.neck(body_feats) + head_outs = self.head(fpn_feats, self.inputs["img_whwh"]) + + if not self.training: + bbox_pred, bbox_num = self.postprocess( + head_outs["pred_logits"], head_outs["pred_boxes"], + self.inputs["scale_factor_whwh"], self.inputs["ori_shape"]) + return bbox_pred, bbox_num + else: + return head_outs + + def get_loss(self): + batch_gt_class = self.inputs["gt_class"] + batch_gt_box = self.inputs["gt_bbox"] + batch_whwh = self.inputs["img_whwh"] + targets = [] + + for i in range(len(batch_gt_class)): + boxes = batch_gt_box[i] + labels = batch_gt_class[i].squeeze(-1) + img_whwh = batch_whwh[i] + img_whwh_tgt = img_whwh.unsqueeze(0).tile([int(boxes.shape[0]), 1]) + targets.append({ + "boxes": boxes, + "labels": labels, + "img_whwh": img_whwh, + "img_whwh_tgt": img_whwh_tgt + }) + + outputs = self._forward() + loss_dict = self.head.get_loss(outputs, targets) + acc = loss_dict["acc"] + loss_dict.pop("acc") + total_loss = sum(loss_dict.values()) + loss_dict.update({"loss": total_loss, "acc": acc}) + return loss_dict + + def get_pred(self): + bbox_pred, bbox_num = self._forward() + output = {'bbox': bbox_pred, 'bbox_num': bbox_num} + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/ssd.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/ssd.py new file mode 100644 index 0000000000000000000000000000000000000000..b8669b7cf127857662f0f78e9c52e43f29fbfcfe --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/ssd.py @@ -0,0 +1,118 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch +import paddle +import paddle.nn.functional as F + +__all__ = ['SSD'] + + +@register +class SSD(BaseArch): + """ + Single Shot MultiBox Detector, see https://arxiv.org/abs/1512.02325 + + Args: + backbone (nn.Layer): backbone instance + ssd_head (nn.Layer): `SSDHead` instance + post_process (object): `BBoxPostProcess` instance + """ + + __category__ = 'architecture' + __inject__ = ['post_process'] + + def __init__(self, backbone, ssd_head, post_process, r34_backbone=False): + super(SSD, self).__init__() + self.backbone = backbone + self.ssd_head = ssd_head + self.post_process = post_process + self.r34_backbone = r34_backbone + if self.r34_backbone: + from ppdet.modeling.backbones.resnet import ResNet + assert isinstance(self.backbone, ResNet) and \ + self.backbone.depth == 34, \ + "If you set r34_backbone=True, please use ResNet-34 as backbone." + self.backbone.res_layers[2].blocks[0].branch2a.conv._stride = [1, 1] + self.backbone.res_layers[2].blocks[0].short.conv._stride = [1, 1] + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + # head + kwargs = {'input_shape': backbone.out_shape} + ssd_head = create(cfg['ssd_head'], **kwargs) + + return { + 'backbone': backbone, + "ssd_head": ssd_head, + } + + def _forward(self): + # Backbone + body_feats = self.backbone(self.inputs) + + # SSD Head + if self.training: + return self.ssd_head(body_feats, self.inputs['image'], + self.inputs['gt_bbox'], + self.inputs['gt_class']) + else: + preds, anchors = self.ssd_head(body_feats, self.inputs['image']) + bbox, bbox_num, nms_keep_idx = self.post_process( + preds, anchors, self.inputs['im_shape'], + self.inputs['scale_factor']) + + if self.use_extra_data: + extra_data = {} # record the bbox output before nms, such like scores and nms_keep_idx + """extra_data:{ + 'scores': predict scores, + 'nms_keep_idx': bbox index before nms, + } + """ + preds_logits = preds[1] # [[1xNumBBoxNumClass]] + extra_data['scores'] = F.softmax(paddle.concat( + preds_logits, axis=1)).transpose([0, 2, 1]) + extra_data['logits'] = paddle.concat( + preds_logits, axis=1).transpose([0, 2, 1]) + extra_data['nms_keep_idx'] = nms_keep_idx # bbox index before nms + return bbox, bbox_num, extra_data + else: + return bbox, bbox_num + + def get_loss(self, ): + return {"loss": self._forward()} + + def get_pred(self): + if self.use_extra_data: + bbox_pred, bbox_num, extra_data = self._forward() + output = { + "bbox": bbox_pred, + "bbox_num": bbox_num, + "extra_data": extra_data + } + else: + bbox_pred, bbox_num = self._forward() + output = { + "bbox": bbox_pred, + "bbox_num": bbox_num, + } + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/tood.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/tood.py new file mode 100644 index 0000000000000000000000000000000000000000..157ec6f3a581a1a4f14b915553c397213c29dcd2 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/tood.py @@ -0,0 +1,77 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['TOOD'] + + +@register +class TOOD(BaseArch): + """ + TOOD: Task-aligned One-stage Object Detection, see https://arxiv.org/abs/2108.07755 + Args: + backbone (nn.Layer): backbone instance + neck (nn.Layer): 'FPN' instance + head (nn.Layer): 'TOODHead' instance + """ + + __category__ = 'architecture' + + def __init__(self, backbone, neck, head): + super(TOOD, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "head": head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + fpn_feats = self.neck(body_feats) + head_outs = self.head(fpn_feats) + if not self.training: + bboxes, bbox_num = self.head.post_process( + head_outs, self.inputs['im_shape'], self.inputs['scale_factor']) + return bboxes, bbox_num + else: + loss = self.head.get_loss(head_outs, self.inputs) + return loss + + def get_loss(self): + return self._forward() + + def get_pred(self): + bbox_pred, bbox_num = self._forward() + output = {'bbox': bbox_pred, 'bbox_num': bbox_num} + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/ttfnet.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/ttfnet.py new file mode 100644 index 0000000000000000000000000000000000000000..c3eb61c877efbffd8f5d6c3d957aff161d1af185 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/ttfnet.py @@ -0,0 +1,98 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
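# Editor's sketch (not part of the original diff): TOOD above follows the
# BaseArch contract shared by every architecture in this directory: both
# get_loss() and get_pred() delegate to _forward(), and self.training selects
# between the head's loss dict and post-processed predictions. A toy version
# of that dispatch, with hypothetical names:

import paddle
import paddle.nn as nn


class _ToyArch(nn.Layer):
    def __init__(self):
        super(_ToyArch, self).__init__()
        self.fc = nn.Linear(4, 2)

    def _forward(self, x):
        logits = self.fc(x)
        if self.training:
            # training branch: return a loss dict, like head.get_loss(...)
            return {'loss_cls': logits.mean()}
        # eval branch: return "post-processed" predictions
        return {'bbox': logits, 'bbox_num': paddle.to_tensor([logits.shape[0]])}

    def get_loss(self, x):
        return self._forward(x)

    def get_pred(self, x):
        return self._forward(x)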
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['TTFNet'] + + +@register +class TTFNet(BaseArch): + """ + TTFNet network, see https://arxiv.org/abs/1909.00700 + + Args: + backbone (object): backbone instance + neck (object): 'TTFFPN' instance + ttf_head (object): 'TTFHead' instance + post_process (object): 'BBoxPostProcess' instance + """ + + __category__ = 'architecture' + __inject__ = ['post_process'] + + def __init__(self, + backbone='DarkNet', + neck='TTFFPN', + ttf_head='TTFHead', + post_process='BBoxPostProcess'): + super(TTFNet, self).__init__() + self.backbone = backbone + self.neck = neck + self.ttf_head = ttf_head + self.post_process = post_process + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + backbone = create(cfg['backbone']) + + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + kwargs = {'input_shape': neck.out_shape} + ttf_head = create(cfg['ttf_head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "ttf_head": ttf_head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + body_feats = self.neck(body_feats) + hm, wh = self.ttf_head(body_feats) + if self.training: + return hm, wh + else: + bbox, bbox_num = self.post_process(hm, wh, self.inputs['im_shape'], + self.inputs['scale_factor']) + return bbox, bbox_num + + def get_loss(self, ): + loss = {} + heatmap = self.inputs['ttf_heatmap'] + box_target = self.inputs['ttf_box_target'] + reg_weight = self.inputs['ttf_reg_weight'] + hm, wh = self._forward() + head_loss = self.ttf_head.get_loss(hm, wh, heatmap, box_target, + reg_weight) + loss.update(head_loss) + total_loss = paddle.add_n(list(loss.values())) + loss.update({'loss': total_loss}) + return loss + + def get_pred(self): + bbox_pred, bbox_num = self._forward() + output = { + "bbox": bbox_pred, + "bbox_num": bbox_num, + } + return output diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolo.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolo.py new file mode 100644 index 0000000000000000000000000000000000000000..b004935654ed0ec2290af758133f479147231dbd --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolo.py @@ -0,0 +1,150 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
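# Editor's sketch (not part of the original diff): TTFNet.get_loss above
# reduces the head's per-component losses with paddle.add_n and stores the sum
# under the 'loss' key that the trainer backpropagates. The reduction in
# isolation (the component names below are illustrative):

import paddle

_loss = {
    'hm_loss': paddle.to_tensor(0.7),  # heatmap (classification) term
    'wh_loss': paddle.to_tensor(0.3),  # box size (regression) term
}
_loss['loss'] = paddle.add_n(list(_loss.values()))  # total = 1.0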
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch +from ..post_process import JDEBBoxPostProcess + +__all__ = ['YOLOv3'] +# YOLOv3,PP-YOLO,PP-YOLOv2,PP-YOLOE,PP-YOLOE+ use the same architecture as YOLOv3 +# PP-YOLOE and PP-YOLOE+ are recommended to use PPYOLOE architecture in ppyoloe.py, especially when use distillation or aux head + + +@register +class YOLOv3(BaseArch): + __category__ = 'architecture' + __shared__ = ['data_format'] + __inject__ = ['post_process'] + + def __init__(self, + backbone='DarkNet', + neck='YOLOv3FPN', + yolo_head='YOLOv3Head', + post_process='BBoxPostProcess', + data_format='NCHW', + for_mot=False): + """ + YOLOv3 network, see https://arxiv.org/abs/1804.02767 + + Args: + backbone (nn.Layer): backbone instance + neck (nn.Layer): neck instance + yolo_head (nn.Layer): anchor_head instance + bbox_post_process (object): `BBoxPostProcess` instance + data_format (str): data format, NCHW or NHWC + for_mot (bool): whether return other features for multi-object tracking + models, default False in pure object detection models. + """ + super(YOLOv3, self).__init__(data_format=data_format) + self.backbone = backbone + self.neck = neck + self.yolo_head = yolo_head + self.post_process = post_process + self.for_mot = for_mot + self.return_idx = isinstance(post_process, JDEBBoxPostProcess) + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + # fpn + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + # head + kwargs = {'input_shape': neck.out_shape} + yolo_head = create(cfg['yolo_head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "yolo_head": yolo_head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + if self.for_mot: + neck_feats = self.neck(body_feats, self.for_mot) + else: + neck_feats = self.neck(body_feats) + + if isinstance(neck_feats, dict): + assert self.for_mot == True + emb_feats = neck_feats['emb_feats'] + neck_feats = neck_feats['yolo_feats'] + + if self.training: + yolo_losses = self.yolo_head(neck_feats, self.inputs) + + if self.for_mot: + return {'det_losses': yolo_losses, 'emb_feats': emb_feats} + else: + return yolo_losses + + else: + yolo_head_outs = self.yolo_head(neck_feats) + + if self.for_mot: + # the detection part of JDE MOT model + boxes_idx, bbox, bbox_num, nms_keep_idx = self.post_process( + yolo_head_outs, self.yolo_head.mask_anchors) + output = { + 'bbox': bbox, + 'bbox_num': bbox_num, + 'boxes_idx': boxes_idx, + 'nms_keep_idx': nms_keep_idx, + 'emb_feats': emb_feats, + } + else: + if self.return_idx: + # the detection part of JDE MOT model + _, bbox, bbox_num, nms_keep_idx = self.post_process( + yolo_head_outs, self.yolo_head.mask_anchors) + elif self.post_process is not None: + # anchor based YOLOs: YOLOv3,PP-YOLO,PP-YOLOv2 use mask_anchors + bbox, bbox_num, nms_keep_idx = self.post_process( + yolo_head_outs, self.yolo_head.mask_anchors, + self.inputs['im_shape'], self.inputs['scale_factor']) + else: + # anchor free YOLOs: PP-YOLOE, PP-YOLOE+ + bbox, bbox_num, nms_keep_idx = self.yolo_head.post_process( + yolo_head_outs, self.inputs['scale_factor']) + + if self.use_extra_data: + extra_data = {} # record the bbox output before nms, such like scores and nms_keep_idx + """extra_data:{ + 'scores': predict scores, + 'nms_keep_idx': bbox index before nms, + } + """ 
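# Editor's note: the block below exposes raw, pre-NMS head outputs (predicted
# scores and the indices of boxes kept by NMS) for callers that set
# use_extra_data and need more than the final 'bbox'/'bbox_num' tensors.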
+ extra_data['scores'] = yolo_head_outs[0] # predict scores (probability) + # Todo: get logits output + extra_data['nms_keep_idx'] = nms_keep_idx + # Todo support for mask_anchors yolo + output = {'bbox': bbox, 'bbox_num': bbox_num, 'extra_data': extra_data} + else: + output = {'bbox': bbox, 'bbox_num': bbox_num} + + return output + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolof.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolof.py new file mode 100644 index 0000000000000000000000000000000000000000..b6a2920529e7203f88ae9150f6f3e014cc36cab0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolof.py @@ -0,0 +1,88 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +__all__ = ['YOLOF'] + + +@register +class YOLOF(BaseArch): + __category__ = 'architecture' + + def __init__(self, + backbone='ResNet', + neck='DilatedEncoder', + head='YOLOFHead', + for_mot=False): + """ + YOLOF network, see https://arxiv.org/abs/2103.09460 + + Args: + backbone (nn.Layer): backbone instance + neck (nn.Layer): DilatedEncoder instance + head (nn.Layer): YOLOFHead instance + for_mot (bool): whether return other features for multi-object tracking + models, default False in pure object detection models. 
+ """ + super(YOLOF, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + self.for_mot = for_mot + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + # fpn + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + # head + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "head": head, + } + + def _forward(self): + body_feats = self.backbone(self.inputs) + neck_feats = self.neck(body_feats, self.for_mot) + + if self.training: + yolo_losses = self.head(neck_feats, self.inputs) + return yolo_losses + else: + yolo_head_outs = self.head(neck_feats) + bbox, bbox_num = self.head.post_process(yolo_head_outs, + self.inputs['im_shape'], + self.inputs['scale_factor']) + output = {'bbox': bbox, 'bbox_num': bbox_num} + return output + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolox.py b/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolox.py new file mode 100644 index 0000000000000000000000000000000000000000..8e02e9ef7ecce137013ec2e7707dc04e3afabb28 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/architectures/yolox.py @@ -0,0 +1,138 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +import random +import paddle +import paddle.nn.functional as F +import paddle.distributed as dist + +__all__ = ['YOLOX'] + + +@register +class YOLOX(BaseArch): + """ + YOLOX network, see https://arxiv.org/abs/2107.08430 + + Args: + backbone (nn.Layer): backbone instance + neck (nn.Layer): neck instance + head (nn.Layer): head instance + for_mot (bool): whether used for MOT or not + input_size (list[int]): initial scale, will be reset by self._preprocess() + size_stride (int): stride of the size range + size_range (list[int]): multi-scale range for training + random_interval (int): interval of iter to change self._input_size + """ + __category__ = 'architecture' + + def __init__(self, + backbone='CSPDarkNet', + neck='YOLOCSPPAN', + head='YOLOXHead', + for_mot=False, + input_size=[640, 640], + size_stride=32, + size_range=[15, 25], + random_interval=10): + super(YOLOX, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + self.for_mot = for_mot + + self.input_size = input_size + self._input_size = paddle.to_tensor(input_size) + self.size_stride = size_stride + self.size_range = size_range + self.random_interval = random_interval + self._step = 0 + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + # fpn + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + # head + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "head": head, + } + + def _forward(self): + if self.training: + self._preprocess() + body_feats = self.backbone(self.inputs) + neck_feats = self.neck(body_feats, self.for_mot) + + if self.training: + yolox_losses = self.head(neck_feats, self.inputs) + yolox_losses.update({'size': self._input_size[0]}) + return yolox_losses + else: + head_outs = self.head(neck_feats) + bbox, bbox_num = self.head.post_process( + head_outs, self.inputs['im_shape'], self.inputs['scale_factor']) + return {'bbox': bbox, 'bbox_num': bbox_num} + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() + + def _preprocess(self): + # YOLOX multi-scale training, interpolate resize before inputs of the network. 
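        # Editor's note: every `random_interval` steps, _get_size() samples a
        # size_factor from size_range and sets the new target size to
        # size_stride * size_factor (height), with the width scaled by the
        # original aspect ratio; the batch images are then bilinearly resized
        # to _input_size and the gt boxes rescaled by the same x/y factors
        # below.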
+ self._get_size() + scale_y = self._input_size[0] / self.input_size[0] + scale_x = self._input_size[1] / self.input_size[1] + if scale_x != 1 or scale_y != 1: + self.inputs['image'] = F.interpolate( + self.inputs['image'], + size=self._input_size, + mode='bilinear', + align_corners=False) + gt_bboxes = self.inputs['gt_bbox'] + for i in range(len(gt_bboxes)): + if len(gt_bboxes[i]) > 0: + gt_bboxes[i][:, 0::2] = gt_bboxes[i][:, 0::2] * scale_x + gt_bboxes[i][:, 1::2] = gt_bboxes[i][:, 1::2] * scale_y + self.inputs['gt_bbox'] = gt_bboxes + + def _get_size(self): + # random_interval = 10 as default, every 10 iters to change self._input_size + image_ratio = self.input_size[1] * 1.0 / self.input_size[0] + if self._step % self.random_interval == 0: + size_factor = random.randint(*self.size_range) + size = [ + self.size_stride * size_factor, + self.size_stride * int(size_factor * image_ratio) + ] + self._input_size = paddle.to_tensor(size) + self._step += 1 diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f462a9fd35190148f0285686d691806f3af8f4e2 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__init__.py @@ -0,0 +1,35 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import utils +from . import task_aligned_assigner +from . import atss_assigner +from . import simota_assigner +from . import max_iou_assigner +from . import fcosr_assigner +from . import rotated_task_aligned_assigner +from . import task_aligned_assigner_cr +from . 
import uniform_assigner + +from .utils import * +from .task_aligned_assigner import * +from .atss_assigner import * +from .simota_assigner import * +from .max_iou_assigner import * +from .fcosr_assigner import * +from .rotated_task_aligned_assigner import * +from .task_aligned_assigner_cr import * +from .uniform_assigner import * +from .hungarian_assigner import * +from .pose_utils import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..da5a24c298428717a17ed266b706439f2c57222b Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/atss_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/atss_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..43703ea9003b9c55f420f63649487dbbf4b78dc0 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/atss_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/fcosr_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/fcosr_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7e8d0a3fcb38113334222dcabb61621c6a114ad4 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/fcosr_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/hungarian_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/hungarian_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0d7856ec08f9e99d0120eacbf1a5da455a6ca66e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/hungarian_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/max_iou_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/max_iou_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0babfc4e5a00f1cc049a0145508e0f76e31081c6 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/max_iou_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/pose_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/pose_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..78095dd7fa98f821eacdbb005d98fab5e7ad2222 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/pose_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/rotated_task_aligned_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/rotated_task_aligned_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..339c3d8c76ef017fda3b8bf33637aef0f5831723 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/rotated_task_aligned_assigner.cpython-37.pyc differ diff 
--git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/simota_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/simota_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ead7ae041b63adcb5d070b12f68466bb6e2ec57e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/simota_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/task_aligned_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/task_aligned_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..10e0b5d449cd8f2cc08fd38a1b0da4066bb66930 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/task_aligned_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/task_aligned_assigner_cr.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/task_aligned_assigner_cr.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c1edc77ab24e496fceb76bf02e263db5e608820e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/task_aligned_assigner_cr.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/uniform_assigner.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/uniform_assigner.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8cdee10a0d8bdde250a37550592087eb81db749d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/uniform_assigner.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..25bfcdda1c34b286d407b9818150730003ba1a95 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/assigners/__pycache__/utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/atss_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/atss_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..a1e753c9434708d1fa80cc2499812906e5411f77 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/atss_assigner.py @@ -0,0 +1,224 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
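# Editor's sketch (not part of the original diff): the ATSSAssigner defined
# below implements the adaptive threshold from the ATSS paper: for each gt,
# the top-k center-closest anchors per pyramid level become candidates, and
# mean(IoU) + std(IoU) over those candidates is used as the positive/negative
# cutoff. The core statistic in isolation:

import paddle

_candidate_ious = paddle.to_tensor([0.1, 0.3, 0.5, 0.6])  # one gt's candidates
_iou_threshold = _candidate_ious.mean() + _candidate_ious.std()
_is_positive = _candidate_ious > _iou_threshold  # only strong overlaps survive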
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ..bbox_utils import iou_similarity, batch_iou_similarity +from ..bbox_utils import bbox_center +from .utils import (check_points_inside_bboxes, compute_max_iou_anchor, + compute_max_iou_gt) + +__all__ = ['ATSSAssigner'] + + +@register +class ATSSAssigner(nn.Layer): + """Bridging the Gap Between Anchor-based and Anchor-free Detection + via Adaptive Training Sample Selection + """ + __shared__ = ['num_classes'] + + def __init__(self, + topk=9, + num_classes=80, + force_gt_matching=False, + eps=1e-9, + sm_use=False): + super(ATSSAssigner, self).__init__() + self.topk = topk + self.num_classes = num_classes + self.force_gt_matching = force_gt_matching + self.eps = eps + self.sm_use = sm_use + + def _gather_topk_pyramid(self, gt2anchor_distances, num_anchors_list, + pad_gt_mask): + gt2anchor_distances_list = paddle.split( + gt2anchor_distances, num_anchors_list, axis=-1) + num_anchors_index = np.cumsum(num_anchors_list).tolist() + num_anchors_index = [0, ] + num_anchors_index[:-1] + is_in_topk_list = [] + topk_idxs_list = [] + for distances, anchors_index in zip(gt2anchor_distances_list, + num_anchors_index): + num_anchors = distances.shape[-1] + _, topk_idxs = paddle.topk( + distances, self.topk, axis=-1, largest=False) + topk_idxs_list.append(topk_idxs + anchors_index) + is_in_topk = F.one_hot(topk_idxs, num_anchors).sum( + axis=-2).astype(gt2anchor_distances.dtype) + is_in_topk_list.append(is_in_topk * pad_gt_mask) + is_in_topk_list = paddle.concat(is_in_topk_list, axis=-1) + topk_idxs_list = paddle.concat(topk_idxs_list, axis=-1) + return is_in_topk_list, topk_idxs_list + + @paddle.no_grad() + def forward(self, + anchor_bboxes, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index, + gt_scores=None, + pred_bboxes=None): + r"""This code is based on + https://github.com/fcjian/TOOD/blob/master/mmdet/core/bbox/assigners/atss_assigner.py + + The assignment is done in following steps + 1. compute iou between all bbox (bbox of all pyramid levels) and gt + 2. compute center distance between all bbox and gt + 3. on each pyramid level, for each gt, select k bbox whose center + are closest to the gt center, so we total select k*l bbox as + candidates for each gt + 4. get corresponding iou for the these candidates, and compute the + mean and std, set mean + std as the iou threshold + 5. select these candidates whose iou are greater than or equal to + the threshold as positive + 6. limit the positive sample's center in gt + 7. if an anchor box is assigned to multiple gts, the one with the + highest iou will be selected. 
+ Args: + anchor_bboxes (Tensor, float32): pre-defined anchors, shape(L, 4), + "xmin, xmax, ymin, ymax" format + num_anchors_list (List): num of anchors in each level + gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1) + gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4) + pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1) + bg_index (int): background index + gt_scores (Tensor|None, float32) Score of gt_bboxes, + shape(B, n, 1), if None, then it will initialize with one_hot label + pred_bboxes (Tensor, float32, optional): predicted bounding boxes, shape(B, L, 4) + Returns: + assigned_labels (Tensor): (B, L) + assigned_bboxes (Tensor): (B, L, 4) + assigned_scores (Tensor): (B, L, C), if pred_bboxes is not None, then output ious + """ + assert gt_labels.ndim == gt_bboxes.ndim and \ + gt_bboxes.ndim == 3 + + num_anchors, _ = anchor_bboxes.shape + batch_size, num_max_boxes, _ = gt_bboxes.shape + + # negative batch + if num_max_boxes == 0: + assigned_labels = paddle.full( + [batch_size, num_anchors], bg_index, dtype='int32') + assigned_bboxes = paddle.zeros([batch_size, num_anchors, 4]) + assigned_scores = paddle.zeros( + [batch_size, num_anchors, self.num_classes]) + return assigned_labels, assigned_bboxes, assigned_scores + + # 1. compute iou between gt and anchor bbox, [B, n, L] + ious = iou_similarity(gt_bboxes.reshape([-1, 4]), anchor_bboxes) + ious = ious.reshape([batch_size, -1, num_anchors]) + + # 2. compute center distance between all anchors and gt, [B, n, L] + gt_centers = bbox_center(gt_bboxes.reshape([-1, 4])).unsqueeze(1) + anchor_centers = bbox_center(anchor_bboxes) + gt2anchor_distances = (gt_centers - anchor_centers.unsqueeze(0)) \ + .norm(2, axis=-1).reshape([batch_size, -1, num_anchors]) + + # 3. on each pyramid level, selecting topk closest candidates + # based on the center distance, [B, n, L] + is_in_topk, topk_idxs = self._gather_topk_pyramid( + gt2anchor_distances, num_anchors_list, pad_gt_mask) + + # 4. get corresponding iou for the these candidates, and compute the + # mean and std, 5. set mean + std as the iou threshold + iou_candidates = ious * is_in_topk + iou_threshold = paddle.index_sample( + iou_candidates.flatten(stop_axis=-2), + topk_idxs.flatten(stop_axis=-2)) + iou_threshold = iou_threshold.reshape([batch_size, num_max_boxes, -1]) + iou_threshold = iou_threshold.mean(axis=-1, keepdim=True) + \ + iou_threshold.std(axis=-1, keepdim=True) + is_in_topk = paddle.where(iou_candidates > iou_threshold, is_in_topk, + paddle.zeros_like(is_in_topk)) + + # 6. check the positive sample's center in gt, [B, n, L] + if self.sm_use: + is_in_gts = check_points_inside_bboxes( + anchor_centers, gt_bboxes, sm_use=True) + else: + is_in_gts = check_points_inside_bboxes(anchor_centers, gt_bboxes) + + # select positive sample, [B, n, L] + mask_positive = is_in_topk * is_in_gts * pad_gt_mask + + # 7. if an anchor box is assigned to multiple gts, + # the one with the highest iou will be selected. + mask_positive_sum = mask_positive.sum(axis=-2) + if mask_positive_sum.max() > 1: + mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile( + [1, num_max_boxes, 1]) + if self.sm_use: + is_max_iou = compute_max_iou_anchor(ious * mask_positive) + else: + is_max_iou = compute_max_iou_anchor(ious) + mask_positive = paddle.where(mask_multiple_gts, is_max_iou, + mask_positive) + mask_positive_sum = mask_positive.sum(axis=-2) + # 8. 
make sure every gt_bbox matches the anchor + if self.force_gt_matching: + is_max_iou = compute_max_iou_gt(ious) * pad_gt_mask + mask_max_iou = (is_max_iou.sum(-2, keepdim=True) == 1).tile( + [1, num_max_boxes, 1]) + mask_positive = paddle.where(mask_max_iou, is_max_iou, + mask_positive) + mask_positive_sum = mask_positive.sum(axis=-2) + assigned_gt_index = mask_positive.argmax(axis=-2) + + # assigned target + batch_ind = paddle.arange( + end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1) + assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes + assigned_labels = paddle.gather( + gt_labels.flatten(), assigned_gt_index.flatten(), axis=0) + assigned_labels = assigned_labels.reshape([batch_size, num_anchors]) + assigned_labels = paddle.where( + mask_positive_sum > 0, assigned_labels, + paddle.full_like(assigned_labels, bg_index)) + + assigned_bboxes = paddle.gather( + gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0) + assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4]) + + assigned_scores = F.one_hot(assigned_labels, self.num_classes + 1) + ind = list(range(self.num_classes + 1)) + ind.remove(bg_index) + assigned_scores = paddle.index_select( + assigned_scores, paddle.to_tensor(ind), axis=-1) + if pred_bboxes is not None: + # assigned iou + ious = batch_iou_similarity(gt_bboxes, pred_bboxes) * mask_positive + ious = ious.max(axis=-2).unsqueeze(-1) + assigned_scores *= ious + elif gt_scores is not None: + gather_scores = paddle.gather( + gt_scores.flatten(), assigned_gt_index.flatten(), axis=0) + gather_scores = gather_scores.reshape([batch_size, num_anchors]) + gather_scores = paddle.where(mask_positive_sum > 0, gather_scores, + paddle.zeros_like(gather_scores)) + assigned_scores *= gather_scores.unsqueeze(-1) + + return assigned_labels, assigned_bboxes, assigned_scores, mask_positive diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/fcosr_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/fcosr_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..46b743e601ab592cb275a554d4adb4c5a0e05bba --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/fcosr_assigner.py @@ -0,0 +1,227 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ppdet.modeling.rbox_utils import box2corners, check_points_in_polys, paddle_gather + +__all__ = ['FCOSRAssigner'] + +EPS = 1e-9 + + +@register +class FCOSRAssigner(nn.Layer): + """ FCOSR Assigner, refer to https://arxiv.org/abs/2111.10780 for details + + 1. compute normalized gaussian distribution score and refined gaussian distribution score + 2. 
refer to ellipse center sampling, sample points whose normalized gaussian distribution score is greater than threshold + 3. refer to multi-level sampling, assign ground truth to feature map which follows two conditions. + i). first, the ratio between the short edge of the target and the stride of the feature map is less than 2. + ii). second, the long edge of minimum bounding rectangle of the target is larger than the acceptance range of feature map + 4. refer to fuzzy sample label assignment, the points satisfying 2 and 3 will be assigned to the ground truth according to gaussian distribution score + """ + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + factor=12, + threshold=0.23, + boundary=[[-1, 128], [128, 320], [320, 10000]], + score_type='iou'): + super(FCOSRAssigner, self).__init__() + self.num_classes = num_classes + self.factor = factor + self.threshold = threshold + self.boundary = [ + paddle.to_tensor( + l, dtype=paddle.float32).reshape([1, 1, 2]) for l in boundary + ] + self.score_type = score_type + + def get_gaussian_distribution_score(self, points, gt_rboxes, gt_polys): + # projecting points to coordinate system defined by each rbox + # [B, N, 4, 2] -> 4 * [B, N, 1, 2] + a, b, c, d = gt_polys.split(4, axis=2) + # [1, L, 2] -> [1, 1, L, 2] + points = points.unsqueeze(0) + ab = b - a + ad = d - a + # [B, N, 5] -> [B, N, 2], [B, N, 2], [B, N, 1] + xy, wh, angle = gt_rboxes.split([2, 2, 1], axis=-1) + # [B, N, 2] -> [B, N, 1, 2] + xy = xy.unsqueeze(2) + # vector of points to center [B, N, L, 2] + vec = points - xy + # = |ab| * |vec| * cos(theta) [B, N, L] + vec_dot_ab = paddle.sum(vec * ab, axis=-1) + # = |ad| * |vec| * cos(theta) [B, N, L] + vec_dot_ad = paddle.sum(vec * ad, axis=-1) + # norm_ab [B, N, L] + norm_ab = paddle.sum(ab * ab, axis=-1).sqrt() + # norm_ad [B, N, L] + norm_ad = paddle.sum(ad * ad, axis=-1).sqrt() + # min(h, w), [B, N, 1] + min_edge = paddle.min(wh, axis=-1, keepdim=True) + # delta_x, delta_y [B, N, L] + delta_x = vec_dot_ab.pow(2) / (norm_ab.pow(3) * min_edge + EPS) + delta_y = vec_dot_ad.pow(2) / (norm_ad.pow(3) * min_edge + EPS) + # score [B, N, L] + norm_score = paddle.exp(-0.5 * self.factor * (delta_x + delta_y)) + + # simplified calculation + sigma = min_edge / self.factor + refined_score = norm_score / (2 * np.pi * sigma + EPS) + return norm_score, refined_score + + def get_rotated_inside_mask(self, points, gt_polys, scores): + inside_mask = check_points_in_polys(points, gt_polys) + center_mask = scores >= self.threshold + return (inside_mask & center_mask).cast(paddle.float32) + + def get_inside_range_mask(self, points, gt_bboxes, gt_rboxes, stride_tensor, + regress_range): + # [1, L, 2] -> [1, 1, L, 2] + points = points.unsqueeze(0) + # [B, n, 4] -> [B, n, 1, 4] + x1y1, x2y2 = gt_bboxes.unsqueeze(2).split(2, axis=-1) + # [B, n, L, 2] + lt = points - x1y1 + rb = x2y2 - points + # [B, n, L, 4] + ltrb = paddle.concat([lt, rb], axis=-1) + # [B, n, L, 4] -> [B, n, L] + inside_mask = paddle.min(ltrb, axis=-1) > EPS + # regress_range [1, L, 2] -> [1, 1, L, 2] + regress_range = regress_range.unsqueeze(0) + # stride_tensor [1, L, 1] -> [1, 1, L] + stride_tensor = stride_tensor.transpose((0, 2, 1)) + # fcos range + # [B, n, L, 4] -> [B, n, L] + ltrb_max = paddle.max(ltrb, axis=-1) + # [1, 1, L, 2] -> [1, 1, L] + low, high = regress_range[..., 0], regress_range[..., 1] + # [B, n, L] + regress_mask = (ltrb_max >= low) & (ltrb_max <= high) + # mask for rotated + # [B, n, 1] + min_edge = paddle.min(gt_rboxes[..., 2:4], axis=-1, 
keepdim=True) + # [B, n , L] + rotated_mask = ((min_edge / stride_tensor) < 2.0) & (ltrb_max > high) + mask = inside_mask & (regress_mask | rotated_mask) + return mask.cast(paddle.float32) + + @paddle.no_grad() + def forward(self, + anchor_points, + stride_tensor, + num_anchors_list, + gt_labels, + gt_bboxes, + gt_rboxes, + pad_gt_mask, + bg_index, + pred_rboxes=None): + r""" + + Args: + anchor_points (Tensor, float32): pre-defined anchor points, shape(1, L, 2), + "x, y" format + stride_tensor (Tensor, float32): stride tensor, shape (1, L, 1) + num_anchors_list (List): num of anchors in each level + gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1) + gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4) + gt_rboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 5) + pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1) + bg_index (int): background index + pred_rboxes (Tensor, float32, optional): predicted bounding boxes, shape(B, L, 5) + Returns: + assigned_labels (Tensor): (B, L) + assigned_rboxes (Tensor): (B, L, 5) + assigned_scores (Tensor): (B, L, C), if pred_rboxes is not None, then output ious + """ + + _, num_anchors, _ = anchor_points.shape + batch_size, num_max_boxes, _ = gt_rboxes.shape + if num_max_boxes == 0: + assigned_labels = paddle.full( + [batch_size, num_anchors], bg_index, dtype=gt_labels.dtype) + assigned_rboxes = paddle.zeros([batch_size, num_anchors, 5]) + assigned_scores = paddle.zeros( + [batch_size, num_anchors, self.num_classes]) + return assigned_labels, assigned_rboxes, assigned_scores + + # get normalized gaussian distribution score and refined distribution score + gt_polys = box2corners(gt_rboxes) + score, refined_score = self.get_gaussian_distribution_score( + anchor_points, gt_rboxes, gt_polys) + inside_mask = self.get_rotated_inside_mask(anchor_points, gt_polys, + score) + regress_ranges = [] + for num, bound in zip(num_anchors_list, self.boundary): + regress_ranges.append(bound.tile((1, num, 1))) + regress_ranges = paddle.concat(regress_ranges, axis=1) + regress_mask = self.get_inside_range_mask( + anchor_points, gt_bboxes, gt_rboxes, stride_tensor, regress_ranges) + # [B, n, L] + mask_positive = inside_mask * regress_mask * pad_gt_mask + refined_score = refined_score * mask_positive - (1. 
- mask_positive) + + argmax_refined_score = refined_score.argmax(axis=-2) + max_refined_score = refined_score.max(axis=-2) + assigned_gt_index = argmax_refined_score + + # assigned target + batch_ind = paddle.arange( + end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1) + assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes + assigned_labels = paddle.gather( + gt_labels.flatten(), assigned_gt_index.flatten(), axis=0) + assigned_labels = assigned_labels.reshape([batch_size, num_anchors]) + assigned_labels = paddle.where( + max_refined_score > 0, assigned_labels, + paddle.full_like(assigned_labels, bg_index)) + + assigned_rboxes = paddle.gather( + gt_rboxes.reshape([-1, 5]), assigned_gt_index.flatten(), axis=0) + assigned_rboxes = assigned_rboxes.reshape([batch_size, num_anchors, 5]) + + assigned_scores = F.one_hot(assigned_labels, self.num_classes + 1) + ind = list(range(self.num_classes + 1)) + ind.remove(bg_index) + assigned_scores = paddle.index_select( + assigned_scores, paddle.to_tensor(ind), axis=-1) + + if self.score_type == 'gaussian': + selected_scores = paddle_gather( + score, 1, argmax_refined_score.unsqueeze(-2)).squeeze(-2) + assigned_scores = assigned_scores * selected_scores.unsqueeze(-1) + elif self.score_type == 'iou': + assert pred_rboxes is not None, 'If score type is iou, pred_rboxes should not be None' + from ext_op import matched_rbox_iou + b, l = pred_rboxes.shape[:2] + iou_score = matched_rbox_iou( + pred_rboxes.reshape((-1, 5)), assigned_rboxes.reshape( + (-1, 5))).reshape((b, l, 1)) + assigned_scores = assigned_scores * iou_score + + return assigned_labels, assigned_rboxes, assigned_scores \ No newline at end of file diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/hungarian_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/hungarian_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..154c27ce978d5d959b7682e19a6c410dd8e9f0a4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/hungarian_assigner.py @@ -0,0 +1,316 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +try: + from scipy.optimize import linear_sum_assignment +except ImportError: + linear_sum_assignment = None + +import paddle + +from ppdet.core.workspace import register + +__all__ = ['PoseHungarianAssigner', 'PseudoSampler'] + + +class AssignResult: + """Stores assignments between predicted and truth boxes. + + Attributes: + num_gts (int): the number of truth boxes considered when computing this + assignment + + gt_inds (LongTensor): for each predicted box indicates the 1-based + index of the assigned truth box. 0 means unassigned and -1 means + ignore. + + max_overlaps (FloatTensor): the iou between the predicted box and its + assigned truth box. 
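A quick numeric check of `get_gaussian_distribution_score` above can make the FCOSR sampling rule concrete. The sketch below re-derives the normalized score in NumPy for a single axis-aligned rbox; the corner layout assumed for `a`, `b`, `d` and all numbers are illustrative, not values taken from the code.

```python
import numpy as np

# One rbox: center (50, 50), w=40, h=20, angle=0; factor=12 as in the class.
cx, cy, w, h, factor = 50.0, 50.0, 40.0, 20.0, 12.0
a = np.array([cx - w / 2, cy - h / 2])  # assumed box2corners layout at angle 0
b = np.array([cx + w / 2, cy - h / 2])
d = np.array([cx - w / 2, cy + h / 2])
ab, ad = b - a, d - a
min_edge = min(w, h)

def norm_score(px, py, eps=1e-9):
    # Mirrors delta_x/delta_y above: squared projection onto each box axis,
    # normalized by the edge length cubed times the shorter edge.
    vec = np.array([px, py]) - np.array([cx, cy])
    dx = np.dot(vec, ab) ** 2 / (np.linalg.norm(ab) ** 3 * min_edge + eps)
    dy = np.dot(vec, ad) ** 2 / (np.linalg.norm(ad) ** 3 * min_edge + eps)
    return np.exp(-0.5 * factor * (dx + dy))

print(norm_score(50, 50))  # 1.0 at the center
print(norm_score(60, 50))  # ~0.47, 10 px along the long edge
print(norm_score(50, 60))  # ~0.22, 10 px along the short edge: faster decay
```

Points whose score clears `threshold` (0.23 by default) form the elliptical center-sampling region consumed by `get_rotated_inside_mask`.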
+ + labels (None | LongTensor): If specified, for each predicted box + indicates the category label of the assigned truth box. + """ + + def __init__(self, num_gts, gt_inds, max_overlaps, labels=None): + self.num_gts = num_gts + self.gt_inds = gt_inds + self.max_overlaps = max_overlaps + self.labels = labels + # Interface for possible user-defined properties + self._extra_properties = {} + + @property + def num_preds(self): + """int: the number of predictions in this assignment""" + return len(self.gt_inds) + + def set_extra_property(self, key, value): + """Set user-defined new property.""" + assert key not in self.info + self._extra_properties[key] = value + + def get_extra_property(self, key): + """Get user-defined property.""" + return self._extra_properties.get(key, None) + + @property + def info(self): + """dict: a dictionary of info about the object""" + basic_info = { + 'num_gts': self.num_gts, + 'num_preds': self.num_preds, + 'gt_inds': self.gt_inds, + 'max_overlaps': self.max_overlaps, + 'labels': self.labels, + } + basic_info.update(self._extra_properties) + return basic_info + + +@register +class PoseHungarianAssigner: + """Computes one-to-one matching between predictions and ground truth. + + This class computes an assignment between the targets and the predictions + based on the costs. The costs are weighted sum of three components: + classification cost, regression L1 cost and regression oks cost. The + targets don't include the no_object, so generally there are more + predictions than targets. After the one-to-one matching, the un-matched + are treated as backgrounds. Thus each query prediction will be assigned + with `0` or a positive integer indicating the ground truth index: + + - 0: negative sample, no assigned gt. + - positive integer: positive sample, index (1-based) of assigned gt. + + Args: + cls_weight (int | float, optional): The scale factor for classification + cost. Default 1.0. + kpt_weight (int | float, optional): The scale factor for regression + L1 cost. Default 1.0. + oks_weight (int | float, optional): The scale factor for regression + oks cost. Default 1.0. + """ + __inject__ = ['cls_cost', 'kpt_cost', 'oks_cost'] + + def __init__(self, + cls_cost='ClassificationCost', + kpt_cost='KptL1Cost', + oks_cost='OksCost'): + self.cls_cost = cls_cost + self.kpt_cost = kpt_cost + self.oks_cost = oks_cost + + def assign(self, + cls_pred, + kpt_pred, + gt_labels, + gt_keypoints, + gt_areas, + img_meta, + eps=1e-7): + """Computes one-to-one matching based on the weighted costs. + + This method assign each query prediction to a ground truth or + background. The `assigned_gt_inds` with -1 means don't care, + 0 means negative sample, and positive number is the index (1-based) + of assigned gt. + The assignment is done in the following steps, the order matters. + + 1. assign every prediction to -1 + 2. compute the weighted costs + 3. do Hungarian matching on CPU based on the costs + 4. assign all to 0 (background) first, then for each matched pair + between predictions and gts, treat this prediction as foreground + and assign the corresponding gt index (plus 1) to it. + + Args: + cls_pred (Tensor): Predicted classification logits, shape + [num_query, num_class]. + kpt_pred (Tensor): Predicted keypoints with normalized coordinates + (x_{i}, y_{i}), which are all in range [0, 1]. Shape + [num_query, K*2]. + gt_labels (Tensor): Label of `gt_keypoints`, shape (num_gt,). 
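The `gt_inds` convention documented above (-1 ignore, 0 background, k > 0 meaning GT k-1) is easy to get wrong by one. A minimal decode, with made-up indices:

```python
import numpy as np

# Hypothetical gt_inds for 6 predictions.
gt_inds = np.array([-1, 0, 2, 0, 1, 0])

ignore = gt_inds == -1               # excluded from the loss
background = gt_inds == 0            # negatives
foreground = gt_inds > 0             # positives
matched_gt = gt_inds[foreground] - 1 # back to 0-based GT indices
print(matched_gt)                    # [1 0]
```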
+ gt_keypoints (Tensor): Ground truth keypoints with unnormalized + coordinates [p^{1}_x, p^{1}_y, p^{1}_v, ..., \ + p^{K}_x, p^{K}_y, p^{K}_v]. Shape [num_gt, K*3]. + gt_areas (Tensor): Ground truth mask areas, shape (num_gt,). + img_meta (dict): Meta information for current image. + eps (int | float, optional): A value added to the denominator for + numerical stability. Default 1e-7. + + Returns: + :obj:`AssignResult`: The assigned result. + """ + num_gts, num_kpts = gt_keypoints.shape[0], kpt_pred.shape[0] + if not gt_keypoints.astype('bool').any(): + num_gts = 0 + + # 1. assign -1 by default + assigned_gt_inds = paddle.full((num_kpts, ), -1, dtype="int64") + assigned_labels = paddle.full((num_kpts, ), -1, dtype="int64") + if num_gts == 0 or num_kpts == 0: + # No ground truth or keypoints, return empty assignment + if num_gts == 0: + # No ground truth, assign all to background + assigned_gt_inds[:] = 0 + return AssignResult( + num_gts, assigned_gt_inds, None, labels=assigned_labels) + img_h, img_w, _ = img_meta['img_shape'] + factor = paddle.to_tensor( + [img_w, img_h, img_w, img_h], dtype=gt_keypoints.dtype).reshape( + (1, -1)) + + # 2. compute the weighted costs + # classification cost + cls_cost = self.cls_cost(cls_pred, gt_labels) + + # keypoint regression L1 cost + gt_keypoints_reshape = gt_keypoints.reshape((gt_keypoints.shape[0], -1, + 3)) + valid_kpt_flag = gt_keypoints_reshape[..., -1] + kpt_pred_tmp = kpt_pred.clone().detach().reshape((kpt_pred.shape[0], -1, + 2)) + normalize_gt_keypoints = gt_keypoints_reshape[ + ..., :2] / factor[:, :2].unsqueeze(0) + kpt_cost = self.kpt_cost(kpt_pred_tmp, normalize_gt_keypoints, + valid_kpt_flag) + # keypoint OKS cost + kpt_pred_tmp = kpt_pred.clone().detach().reshape((kpt_pred.shape[0], -1, + 2)) + kpt_pred_tmp = kpt_pred_tmp * factor[:, :2].unsqueeze(0) + oks_cost = self.oks_cost(kpt_pred_tmp, gt_keypoints_reshape[..., :2], + valid_kpt_flag, gt_areas) + # weighted sum of above three costs + cost = cls_cost + kpt_cost + oks_cost + + # 3. do Hungarian matching on CPU using linear_sum_assignment + cost = cost.detach().cpu() + if linear_sum_assignment is None: + raise ImportError('Please run "pip install scipy" ' + 'to install scipy first.') + matched_row_inds, matched_col_inds = linear_sum_assignment(cost) + matched_row_inds = paddle.to_tensor(matched_row_inds) + matched_col_inds = paddle.to_tensor(matched_col_inds) + + # 4. assign backgrounds and foregrounds + # assign all indices to backgrounds first + assigned_gt_inds[:] = 0 + # assign foregrounds based on matching results + assigned_gt_inds[matched_row_inds] = matched_col_inds + 1 + assigned_labels[matched_row_inds] = gt_labels[matched_col_inds][ + ..., 0].astype("int64") + return AssignResult( + num_gts, assigned_gt_inds, None, labels=assigned_labels) + + +class SamplingResult: + """Bbox sampling result. 
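Step 3 above delegates the actual matching to SciPy. A toy run, with an illustrative 4x2 cost matrix standing in for the weighted cls + kpt L1 + OKS sum:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[0.9, 0.3],
                 [0.2, 0.8],
                 [0.5, 0.1],
                 [0.7, 0.6]])              # 4 queries x 2 GTs, made-up values
rows, cols = linear_sum_assignment(cost)   # one-to-one, minimum total cost
print(rows, cols)                          # [1 2] [0 1]

# Step 4: everything becomes background (0), then matched queries get gt+1.
assigned = np.zeros(cost.shape[0], dtype=np.int64)
assigned[rows] = cols + 1
print(assigned)                            # [0 1 2 0]
```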
+ """ + + def __init__(self, pos_inds, neg_inds, bboxes, gt_bboxes, assign_result, + gt_flags): + self.pos_inds = pos_inds + self.neg_inds = neg_inds + if pos_inds.size > 0: + self.pos_bboxes = bboxes[pos_inds] + self.neg_bboxes = bboxes[neg_inds] + self.pos_is_gt = gt_flags[pos_inds] + + self.num_gts = gt_bboxes.shape[0] + self.pos_assigned_gt_inds = assign_result.gt_inds[pos_inds] - 1 + + if gt_bboxes.numel() == 0: + # hack for index error case + assert self.pos_assigned_gt_inds.numel() == 0 + self.pos_gt_bboxes = paddle.zeros( + gt_bboxes.shape, dtype=gt_bboxes.dtype).reshape((-1, 4)) + else: + if len(gt_bboxes.shape) < 2: + gt_bboxes = gt_bboxes.reshape((-1, 4)) + + self.pos_gt_bboxes = paddle.index_select( + gt_bboxes, + self.pos_assigned_gt_inds.astype('int64'), + axis=0) + + if assign_result.labels is not None: + self.pos_gt_labels = assign_result.labels[pos_inds] + else: + self.pos_gt_labels = None + + @property + def bboxes(self): + """paddle.Tensor: concatenated positive and negative boxes""" + return paddle.concat([self.pos_bboxes, self.neg_bboxes]) + + def __nice__(self): + data = self.info.copy() + data['pos_bboxes'] = data.pop('pos_bboxes').shape + data['neg_bboxes'] = data.pop('neg_bboxes').shape + parts = [f"'{k}': {v!r}" for k, v in sorted(data.items())] + body = ' ' + ',\n '.join(parts) + return '{\n' + body + '\n}' + + @property + def info(self): + """Returns a dictionary of info about the object.""" + return { + 'pos_inds': self.pos_inds, + 'neg_inds': self.neg_inds, + 'pos_bboxes': self.pos_bboxes, + 'neg_bboxes': self.neg_bboxes, + 'pos_is_gt': self.pos_is_gt, + 'num_gts': self.num_gts, + 'pos_assigned_gt_inds': self.pos_assigned_gt_inds, + } + + +@register +class PseudoSampler: + """A pseudo sampler that does not do sampling actually.""" + + def __init__(self, **kwargs): + pass + + def _sample_pos(self, **kwargs): + """Sample positive samples.""" + raise NotImplementedError + + def _sample_neg(self, **kwargs): + """Sample negative samples.""" + raise NotImplementedError + + def sample(self, assign_result, bboxes, gt_bboxes, *args, **kwargs): + """Directly returns the positive and negative indices of samples. + + Args: + assign_result (:obj:`AssignResult`): Assigned results + bboxes (paddle.Tensor): Bounding boxes + gt_bboxes (paddle.Tensor): Ground truth boxes + + Returns: + :obj:`SamplingResult`: sampler results + """ + pos_inds = paddle.nonzero( + assign_result.gt_inds > 0, as_tuple=False).squeeze(-1) + neg_inds = paddle.nonzero( + assign_result.gt_inds == 0, as_tuple=False).squeeze(-1) + gt_flags = paddle.zeros([bboxes.shape[0]], dtype='int32') + sampling_result = SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes, + assign_result, gt_flags) + return sampling_result diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/max_iou_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/max_iou_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..98a4fdf8c5e2c7179318ac5c80ca3a9dd137cf7a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/max_iou_assigner.py @@ -0,0 +1,52 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
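`PseudoSampler` just repackages an `AssignResult` without doing any real sampling. A minimal end-to-end sketch, assuming the two classes above are in scope and using toy tensors:

```python
import paddle

# 5 priors, 2 GTs; gt_inds uses the 1-based convention (0 = background).
gt_inds = paddle.to_tensor([0, 2, 0, 1, 0], dtype='int64')
result = AssignResult(num_gts=2, gt_inds=gt_inds, max_overlaps=None)

bboxes = paddle.rand([5, 4])
gt_bboxes = paddle.rand([2, 4])
out = PseudoSampler().sample(result, bboxes, gt_bboxes)
print(out.pos_inds.numpy(), out.neg_inds.numpy())  # [1 3] [0 2 4]
print(out.pos_assigned_gt_inds.numpy())            # [1 0], back to 0-based
```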
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from ppdet.core.workspace import register
+from ppdet.modeling.proposal_generator.target import label_box
+
+__all__ = ['MaxIoUAssigner']
+
+@register
+class MaxIoUAssigner(object):
+    """A standard bbox assigner based on max IoU, using ppdet's label_box
+    as the backend.
+    Args:
+        positive_overlap (float): threshold for defining positive samples
+        negative_overlap (float): threshold for defining negative samples
+        allow_low_quality (bool): whether to lower the IoU threshold if a GT
+            poorly overlaps with candidate bboxes
+    """
+    def __init__(self,
+                 positive_overlap,
+                 negative_overlap,
+                 allow_low_quality=True):
+        self.positive_overlap = positive_overlap
+        self.negative_overlap = negative_overlap
+        self.allow_low_quality = allow_low_quality
+
+    def __call__(self, bboxes, gt_bboxes):
+        matches, match_labels = label_box(
+            bboxes,
+            gt_bboxes,
+            positive_overlap=self.positive_overlap,
+            negative_overlap=self.negative_overlap,
+            allow_low_quality=self.allow_low_quality,
+            ignore_thresh=-1,
+            is_crowd=None,
+            assign_on_cpu=False)
+        return matches, match_labels
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/pose_utils.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/pose_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..313215a4dd4fc3a61f08a378a7ef598c74265f8d
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/pose_utils.py
@@ -0,0 +1,275 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+
+from ppdet.core.workspace import register
+
+__all__ = ['KptL1Cost', 'OksCost', 'ClassificationCost']
+
+
+def masked_fill(x, mask, value):
+    y = paddle.full(x.shape, value, x.dtype)
+    return paddle.where(mask, y, x)
+
+
+@register
+class KptL1Cost(object):
+    """KptL1Cost.
+
+    this function is based on: https://github.com/hikvision-research/opera/blob/main/opera/core/bbox/match_costs/match_cost.py
+
+    Args:
+        weight (int | float, optional): loss_weight.
+    """
+
+    def __init__(self, weight=1.0):
+        self.weight = weight
+
+    def __call__(self, kpt_pred, gt_keypoints, valid_kpt_flag):
+        """
+        Args:
+            kpt_pred (Tensor): Predicted keypoints with normalized coordinates
+                (x_{i}, y_{i}), which are all in range [0, 1]. Shape
+                [num_query, K, 2].
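`label_box` does the heavy lifting for `MaxIoUAssigner` above; the NumPy sketch below shows the generic max-IoU labeling scheme the docstring describes (thresholds, ignore band, low-quality rescue), not the exact `label_box` internals such as crowd handling. All IoU values are made up.

```python
import numpy as np

ious = np.array([[0.8, 0.10],
                 [0.4, 0.30],
                 [0.2, 0.35],
                 [0.1, 0.10],
                 [0.0, 0.45]])   # 5 candidate boxes x 2 GTs
pos_thr, neg_thr = 0.5, 0.4

matches = ious.argmax(axis=1)
max_iou = ious.max(axis=1)
labels = np.full(5, -1)          # -1 = ignored (between the two thresholds)
labels[max_iou >= pos_thr] = 1   # positives
labels[max_iou < neg_thr] = 0    # negatives
# allow_low_quality: each GT keeps its best candidate even below pos_thr.
labels[ious.argmax(axis=0)] = 1
print(matches, labels)           # [0 0 1 0 1] [ 1 -1  0  0  1]
```

Without the low-quality rescue, GT 1 (best IoU 0.45) would receive no positive anchor at all.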
+ gt_keypoints (Tensor): Ground truth keypoints with normalized + coordinates (x_{i}, y_{i}). Shape [num_gt, K, 2]. + valid_kpt_flag (Tensor): valid flag of ground truth keypoints. + Shape [num_gt, K]. + + Returns: + paddle.Tensor: kpt_cost value with weight. + """ + kpt_cost = [] + for i in range(len(gt_keypoints)): + if gt_keypoints[i].size == 0: + kpt_cost.append(kpt_pred.sum() * 0) + kpt_pred_tmp = kpt_pred.clone() + valid_flag = valid_kpt_flag[i] > 0 + valid_flag_expand = valid_flag.unsqueeze(0).unsqueeze(-1).expand_as( + kpt_pred_tmp) + if not valid_flag_expand.all(): + kpt_pred_tmp = masked_fill(kpt_pred_tmp, ~valid_flag_expand, 0) + cost = F.pairwise_distance( + kpt_pred_tmp.reshape((kpt_pred_tmp.shape[0], -1)), + gt_keypoints[i].reshape((-1, )).unsqueeze(0), + p=1, + keepdim=True) + avg_factor = paddle.clip( + valid_flag.astype('float32').sum() * 2, 1.0) + cost = cost / avg_factor + kpt_cost.append(cost) + kpt_cost = paddle.concat(kpt_cost, axis=1) + return kpt_cost * self.weight + + +@register +class OksCost(object): + """OksCost. + + this function based on: https://github.com/hikvision-research/opera/blob/main/opera/core/bbox/match_costs/match_cost.py + + Args: + num_keypoints (int): number of keypoints + weight (int | float, optional): loss_weight. + """ + + def __init__(self, num_keypoints=17, weight=1.0): + self.weight = weight + if num_keypoints == 17: + self.sigmas = np.array( + [ + .26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, + 1.07, .87, .87, .89, .89 + ], + dtype=np.float32) / 10.0 + elif num_keypoints == 14: + self.sigmas = np.array( + [ + .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, + .89, .79, .79 + ], + dtype=np.float32) / 10.0 + else: + raise ValueError(f'Unsupported keypoints number {num_keypoints}') + + def __call__(self, kpt_pred, gt_keypoints, valid_kpt_flag, gt_areas): + """ + Args: + kpt_pred (Tensor): Predicted keypoints with unnormalized + coordinates (x_{i}, y_{i}). Shape [num_query, K, 2]. + gt_keypoints (Tensor): Ground truth keypoints with unnormalized + coordinates (x_{i}, y_{i}). Shape [num_gt, K, 2]. + valid_kpt_flag (Tensor): valid flag of ground truth keypoints. + Shape [num_gt, K]. + gt_areas (Tensor): Ground truth mask areas. Shape [num_gt,]. + + Returns: + paddle.Tensor: oks_cost value with weight. + """ + sigmas = paddle.to_tensor(self.sigmas) + variances = (sigmas * 2)**2 + + oks_cost = [] + assert len(gt_keypoints) == len(gt_areas) + for i in range(len(gt_keypoints)): + if gt_keypoints[i].size == 0: + oks_cost.append(kpt_pred.sum() * 0) + squared_distance = \ + (kpt_pred[:, :, 0] - gt_keypoints[i, :, 0].unsqueeze(0)) ** 2 + \ + (kpt_pred[:, :, 1] - gt_keypoints[i, :, 1].unsqueeze(0)) ** 2 + vis_flag = (valid_kpt_flag[i] > 0).astype('int') + vis_ind = vis_flag.nonzero(as_tuple=False)[:, 0] + num_vis_kpt = vis_ind.shape[0] + # assert num_vis_kpt > 0 + if num_vis_kpt == 0: + oks_cost.append(paddle.zeros((squared_distance.shape[0], 1))) + continue + area = gt_areas[i] + + squared_distance0 = squared_distance / (area * variances * 2) + squared_distance0 = paddle.index_select( + squared_distance0, vis_ind, axis=1) + squared_distance1 = paddle.exp(-squared_distance0).sum(axis=1, + keepdim=True) + oks = squared_distance1 / num_vis_kpt + # The 1 is a constant that doesn't change the matching, so omitted. + oks_cost.append(-oks) + oks_cost = paddle.concat(oks_cost, axis=1) + return oks_cost * self.weight + + +@register +class ClassificationCost: + """ClsSoftmaxCost. 
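`OksCost` is the negated COCO keypoint similarity. A NumPy re-derivation for one prediction/GT pair, using the first five of the seventeen sigmas above and an assumed instance area; all coordinates are illustrative:

```python
import numpy as np

sigmas = np.array([.26, .25, .25, .35, .35]) / 10.0
var = (2 * sigmas) ** 2                      # matches (sigmas * 2)**2 above
gt = np.array([[10., 10.], [12., 11.], [8., 11.], [14., 10.], [6., 10.]])
pred = gt + np.array([1.0, -0.5])            # uniformly offset prediction
vis = np.array([1, 1, 0, 1, 1])              # keypoint 2 is unlabeled
area = 100.0

d2 = ((pred - gt) ** 2).sum(-1)
e = d2 / (2 * area * var)                    # squared_distance0 above
oks = np.exp(-e)[vis > 0].mean()             # average over visible joints
print(oks)                                   # ~0.18 for this offset
```

The assigner feeds `-oks` into the cost matrix, so a perfect pose (OKS of 1) is the cheapest match.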
+
+    Args:
+        weight (int | float, optional): loss_weight
+    """
+
+    def __init__(self, weight=1.):
+        self.weight = weight
+
+    def __call__(self, cls_pred, gt_labels):
+        """
+        Args:
+            cls_pred (Tensor): Predicted classification logits, shape
+                (num_query, num_class).
+            gt_labels (Tensor): Label of `gt_bboxes`, shape (num_gt,).
+
+        Returns:
+            paddle.Tensor: cls_cost value with weight
+        """
+        # Following the official DETR repo, contrary to the loss that
+        # NLL is used, we approximate it in 1 - cls_score[gt_label].
+        # The 1 is a constant that doesn't change the matching,
+        # so it can be omitted.
+        cls_score = cls_pred.softmax(-1)
+        cls_cost = -cls_score[:, gt_labels]
+        return cls_cost * self.weight
+
+
+@register
+class FocalLossCost:
+    """FocalLossCost.
+
+    Args:
+        weight (int | float, optional): loss_weight
+        alpha (int | float, optional): focal_loss alpha
+        gamma (int | float, optional): focal_loss gamma
+        eps (float, optional): default 1e-12
+        binary_input (bool, optional): Whether the input is binary,
+            default False.
+    """
+
+    def __init__(self,
+                 weight=1.,
+                 alpha=0.25,
+                 gamma=2,
+                 eps=1e-12,
+                 binary_input=False):
+        self.weight = weight
+        self.alpha = alpha
+        self.gamma = gamma
+        self.eps = eps
+        self.binary_input = binary_input
+
+    def _focal_loss_cost(self, cls_pred, gt_labels):
+        """
+        Args:
+            cls_pred (Tensor): Predicted classification logits, shape
+                (num_query, num_class).
+            gt_labels (Tensor): Label of `gt_bboxes`, shape (num_gt,).
+
+        Returns:
+            paddle.Tensor: cls_cost value with weight
+        """
+        if gt_labels.size == 0:
+            return cls_pred.sum() * 0
+        cls_pred = F.sigmoid(cls_pred)
+        neg_cost = -(1 - cls_pred + self.eps).log() * (
+            1 - self.alpha) * cls_pred.pow(self.gamma)
+        pos_cost = -(cls_pred + self.eps).log() * self.alpha * (
+            1 - cls_pred).pow(self.gamma)
+
+        cls_cost = paddle.index_select(
+            pos_cost, gt_labels, axis=1) - paddle.index_select(
+                neg_cost, gt_labels, axis=1)
+        return cls_cost * self.weight
+
+    def _mask_focal_loss_cost(self, cls_pred, gt_labels):
+        """
+        Args:
+            cls_pred (Tensor): Predicted classification logits
+                in shape (num_query, d1, ..., dn), dtype=paddle.float32.
+            gt_labels (Tensor): Ground truth in shape (num_gt, d1, ..., dn),
+                dtype=paddle.int64. Labels should be binary.
+
+        Returns:
+            Tensor: Focal cost matrix with weight in shape\
+                (num_query, num_gt).
+        """
+        cls_pred = cls_pred.flatten(1)
+        gt_labels = gt_labels.flatten(1).float()
+        n = cls_pred.shape[1]
+        cls_pred = F.sigmoid(cls_pred)
+        neg_cost = -(1 - cls_pred + self.eps).log() * (
+            1 - self.alpha) * cls_pred.pow(self.gamma)
+        pos_cost = -(cls_pred + self.eps).log() * self.alpha * (
+            1 - cls_pred).pow(self.gamma)
+
+        cls_cost = paddle.einsum('nc,mc->nm', pos_cost, gt_labels) + \
+            paddle.einsum('nc,mc->nm', neg_cost, (1 - gt_labels))
+        return cls_cost / n * self.weight
+
+    def __call__(self, cls_pred, gt_labels):
+        """
+        Args:
+            cls_pred (Tensor): Predicted classification logits.
+            gt_labels (Tensor): Labels.
+
+        Returns:
+            Tensor: Focal cost matrix with weight in shape\
+                (num_query, num_gt).
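`_focal_loss_cost` builds the query-by-GT cost from per-class focal terms: column-select the positive term at each GT's label and subtract the negative term. A NumPy sketch with arbitrary logits and the class defaults:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

alpha, gamma, eps = 0.25, 2.0, 1e-12          # defaults from the class above
logits = np.array([[2.0, -1.0, 0.5, -2.0],
                   [-0.5, 1.5, 0.0, 0.3],
                   [0.1, 0.1, 0.1, 0.1]])     # 3 queries x 4 classes
gt_labels = np.array([0, 1])

p = sigmoid(logits)
neg = -np.log(1 - p + eps) * (1 - alpha) * p ** gamma
pos = -np.log(p + eps) * alpha * (1 - p) ** gamma
cost = pos[:, gt_labels] - neg[:, gt_labels]  # [num_query, num_gt]
print(cost.round(3))                          # low where p[gt_label] is high
```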
+ """ + if self.binary_input: + return self._mask_focal_loss_cost(cls_pred, gt_labels) + else: + return self._focal_loss_cost(cls_pred, gt_labels) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/rotated_task_aligned_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/rotated_task_aligned_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..eeb9a68b6705fd2cb1c2b51b7d1496a943c1cd79 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/rotated_task_aligned_assigner.py @@ -0,0 +1,164 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ..rbox_utils import rotated_iou_similarity, check_points_in_rotated_boxes +from .utils import gather_topk_anchors, compute_max_iou_anchor + +__all__ = ['RotatedTaskAlignedAssigner'] + + +@register +class RotatedTaskAlignedAssigner(nn.Layer): + """TOOD: Task-aligned One-stage Object Detection + """ + + def __init__(self, topk=13, alpha=1.0, beta=6.0, eps=1e-9): + super(RotatedTaskAlignedAssigner, self).__init__() + self.topk = topk + self.alpha = alpha + self.beta = beta + self.eps = eps + + @paddle.no_grad() + def forward(self, + pred_scores, + pred_bboxes, + anchor_points, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index, + gt_scores=None): + r"""This code is based on + https://github.com/fcjian/TOOD/blob/master/mmdet/core/bbox/assigners/task_aligned_assigner.py + + The assignment is done in following steps + 1. compute alignment metric between all bbox (bbox of all pyramid levels) and gt + 2. select top-k bbox as candidates for each gt + 3. limit the positive sample's center in gt (because the anchor-free detector + only can predict positive distance) + 4. if an anchor box is assigned to multiple gts, the one with the + highest iou will be selected. 
+ Args: + pred_scores (Tensor, float32): predicted class probability, shape(B, L, C) + pred_bboxes (Tensor, float32): predicted bounding boxes, shape(B, L, 5) + anchor_points (Tensor, float32): pre-defined anchors, shape(1, L, 2), "cxcy" format + num_anchors_list (List): num of anchors in each level, shape(L) + gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1) + gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 5) + pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1) + bg_index (int): background index + gt_scores (Tensor|None, float32) Score of gt_bboxes, shape(B, n, 1) + Returns: + assigned_labels (Tensor): (B, L) + assigned_bboxes (Tensor): (B, L, 5) + assigned_scores (Tensor): (B, L, C) + """ + assert pred_scores.ndim == pred_bboxes.ndim + assert gt_labels.ndim == gt_bboxes.ndim and \ + gt_bboxes.ndim == 3 + + batch_size, num_anchors, num_classes = pred_scores.shape + _, num_max_boxes, _ = gt_bboxes.shape + + # negative batch + if num_max_boxes == 0: + assigned_labels = paddle.full( + [batch_size, num_anchors], bg_index, dtype=gt_labels.dtype) + assigned_bboxes = paddle.zeros([batch_size, num_anchors, 5]) + assigned_scores = paddle.zeros( + [batch_size, num_anchors, num_classes]) + return assigned_labels, assigned_bboxes, assigned_scores + + # compute iou between gt and pred bbox, [B, n, L] + ious = rotated_iou_similarity(gt_bboxes, pred_bboxes) + ious = paddle.where(ious > 1 + self.eps, paddle.zeros_like(ious), ious) + ious.stop_gradient = True + # gather pred bboxes class score + pred_scores = pred_scores.transpose([0, 2, 1]) + batch_ind = paddle.arange( + end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1) + gt_labels_ind = paddle.stack( + [batch_ind.tile([1, num_max_boxes]), gt_labels.squeeze(-1)], + axis=-1) + bbox_cls_scores = paddle.gather_nd(pred_scores, gt_labels_ind) + # compute alignment metrics, [B, n, L] + alignment_metrics = bbox_cls_scores.pow(self.alpha) * ious.pow( + self.beta) + + # check the positive sample's center in gt, [B, n, L] + is_in_gts = check_points_in_rotated_boxes(anchor_points, gt_bboxes) + + # select topk largest alignment metrics pred bbox as candidates + # for each gt, [B, n, L] + is_in_topk = gather_topk_anchors( + alignment_metrics * is_in_gts, self.topk, topk_mask=pad_gt_mask) + + # select positive sample, [B, n, L] + mask_positive = is_in_topk * is_in_gts * pad_gt_mask + + # if an anchor box is assigned to multiple gts, + # the one with the highest iou will be selected, [B, n, L] + mask_positive_sum = mask_positive.sum(axis=-2) + if mask_positive_sum.max() > 1: + mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile( + [1, num_max_boxes, 1]) + is_max_iou = compute_max_iou_anchor(ious) + mask_positive = paddle.where(mask_multiple_gts, is_max_iou, + mask_positive) + mask_positive_sum = mask_positive.sum(axis=-2) + assigned_gt_index = mask_positive.argmax(axis=-2) + + # assigned target + assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes + assigned_labels = paddle.gather( + gt_labels.flatten(), assigned_gt_index.flatten(), axis=0) + assigned_labels = assigned_labels.reshape([batch_size, num_anchors]) + assigned_labels = paddle.where( + mask_positive_sum > 0, assigned_labels, + paddle.full_like(assigned_labels, bg_index)) + + assigned_bboxes = paddle.gather( + gt_bboxes.reshape([-1, 5]), assigned_gt_index.flatten(), axis=0) + assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 5]) + + assigned_scores = F.one_hot(assigned_labels, num_classes + 1) + ind = 
list(range(num_classes + 1)) + ind.remove(bg_index) + assigned_scores = paddle.index_select( + assigned_scores, paddle.to_tensor(ind), axis=-1) + # rescale alignment metrics + alignment_metrics *= mask_positive + max_metrics_per_instance = alignment_metrics.max(axis=-1, keepdim=True) + max_ious_per_instance = (ious * mask_positive).max(axis=-1, + keepdim=True) + alignment_metrics = alignment_metrics / ( + max_metrics_per_instance + self.eps) * max_ious_per_instance + alignment_metrics = alignment_metrics.max(-2).unsqueeze(-1) + assigned_scores = assigned_scores * alignment_metrics + + assigned_bboxes.stop_gradient = True + assigned_scores.stop_gradient = True + assigned_labels.stop_gradient = True + return assigned_labels, assigned_bboxes, assigned_scores diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/simota_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/simota_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..abc055d29e3405174753e03501b40817accb583b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/simota_assigner.py @@ -0,0 +1,265 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The code is based on: +# https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/assigners/sim_ota_assigner.py + +import paddle +import numpy as np +import paddle.nn.functional as F + +from ppdet.modeling.losses.varifocal_loss import varifocal_loss +from ppdet.modeling.bbox_utils import batch_bbox_overlaps +from ppdet.core.workspace import register + + +@register +class SimOTAAssigner(object): + """Computes matching between predictions and ground truth. + Args: + center_radius (int | float, optional): Ground truth center size + to judge whether a prior is in center. Default 2.5. + candidate_topk (int, optional): The candidate top-k which used to + get top-k ious to calculate dynamic-k. Default 10. + iou_weight (int | float, optional): The scale factor for regression + iou cost. Default 3.0. + cls_weight (int | float, optional): The scale factor for classification + cost. Default 1.0. + num_classes (int): The num_classes of dataset. + use_vfl (int): Whether to use varifocal_loss when calculating the cost matrix. 
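SimOTA's "dynamic k" (implemented in `dynamic_k_matching` below) estimates how many anchors each GT deserves by summing its top candidate IoUs. A NumPy sketch with illustrative IoUs:

```python
import numpy as np

ious = np.array([[0.6, 0.1],
                 [0.5, 0.2],
                 [0.4, 0.1],
                 [0.3, 0.0],
                 [0.1, 0.3],
                 [0.0, 0.2]])                 # 6 candidates x 2 GTs
topk = min(10, ious.shape[0])                 # candidate_topk default is 10
topk_ious = -np.sort(-ious, axis=0)[:topk]    # top IoUs per GT column
dynamic_ks = np.clip(topk_ious.sum(0).astype(int), 1, None)
print(dynamic_ks)  # [1 1]: 1.9 truncates to 1; 0.9 truncates to 0, clipped to 1
```

Each GT then takes its `dynamic_k` lowest-cost candidates, so a well-covered GT earns more positives than a poorly covered one.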
+ """ + __shared__ = ['num_classes'] + + def __init__(self, + center_radius=2.5, + candidate_topk=10, + iou_weight=3.0, + cls_weight=1.0, + num_classes=80, + use_vfl=True): + self.center_radius = center_radius + self.candidate_topk = candidate_topk + self.iou_weight = iou_weight + self.cls_weight = cls_weight + self.num_classes = num_classes + self.use_vfl = use_vfl + + def get_in_gt_and_in_center_info(self, flatten_center_and_stride, + gt_bboxes): + num_gt = gt_bboxes.shape[0] + + flatten_x = flatten_center_and_stride[:, 0].unsqueeze(1).tile( + [1, num_gt]) + flatten_y = flatten_center_and_stride[:, 1].unsqueeze(1).tile( + [1, num_gt]) + flatten_stride_x = flatten_center_and_stride[:, 2].unsqueeze(1).tile( + [1, num_gt]) + flatten_stride_y = flatten_center_and_stride[:, 3].unsqueeze(1).tile( + [1, num_gt]) + + # is prior centers in gt bboxes, shape: [n_center, n_gt] + l_ = flatten_x - gt_bboxes[:, 0] + t_ = flatten_y - gt_bboxes[:, 1] + r_ = gt_bboxes[:, 2] - flatten_x + b_ = gt_bboxes[:, 3] - flatten_y + + deltas = paddle.stack([l_, t_, r_, b_], axis=1) + is_in_gts = deltas.min(axis=1) > 0 + is_in_gts_all = is_in_gts.sum(axis=1) > 0 + + # is prior centers in gt centers + gt_center_xs = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0 + gt_center_ys = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0 + ct_bound_l = gt_center_xs - self.center_radius * flatten_stride_x + ct_bound_t = gt_center_ys - self.center_radius * flatten_stride_y + ct_bound_r = gt_center_xs + self.center_radius * flatten_stride_x + ct_bound_b = gt_center_ys + self.center_radius * flatten_stride_y + + cl_ = flatten_x - ct_bound_l + ct_ = flatten_y - ct_bound_t + cr_ = ct_bound_r - flatten_x + cb_ = ct_bound_b - flatten_y + + ct_deltas = paddle.stack([cl_, ct_, cr_, cb_], axis=1) + is_in_cts = ct_deltas.min(axis=1) > 0 + is_in_cts_all = is_in_cts.sum(axis=1) > 0 + + # in any of gts or gt centers, shape: [n_center] + is_in_gts_or_centers_all = paddle.logical_or(is_in_gts_all, + is_in_cts_all) + + is_in_gts_or_centers_all_inds = paddle.nonzero( + is_in_gts_or_centers_all).squeeze(1) + + # both in gts and gt centers, shape: [num_fg, num_gt] + is_in_gts_and_centers = paddle.logical_and( + paddle.gather( + is_in_gts.cast('int'), is_in_gts_or_centers_all_inds, + axis=0).cast('bool'), + paddle.gather( + is_in_cts.cast('int'), is_in_gts_or_centers_all_inds, + axis=0).cast('bool')) + return is_in_gts_or_centers_all, is_in_gts_or_centers_all_inds, is_in_gts_and_centers + + def dynamic_k_matching(self, cost_matrix, pairwise_ious, num_gt): + match_matrix = np.zeros_like(cost_matrix.numpy()) + # select candidate topk ious for dynamic-k calculation + topk_ious, _ = paddle.topk( + pairwise_ious, + min(self.candidate_topk, pairwise_ious.shape[0]), + axis=0) + # calculate dynamic k for each gt + dynamic_ks = paddle.clip(topk_ious.sum(0).cast('int'), min=1) + for gt_idx in range(num_gt): + _, pos_idx = paddle.topk( + cost_matrix[:, gt_idx], k=dynamic_ks[gt_idx], largest=False) + match_matrix[:, gt_idx][pos_idx.numpy()] = 1.0 + + del topk_ious, dynamic_ks, pos_idx + + # match points more than two gts + extra_match_gts_mask = match_matrix.sum(1) > 1 + if extra_match_gts_mask.sum() > 0: + cost_matrix = cost_matrix.numpy() + cost_argmin = np.argmin( + cost_matrix[extra_match_gts_mask, :], axis=1) + match_matrix[extra_match_gts_mask, :] *= 0.0 + match_matrix[extra_match_gts_mask, cost_argmin] = 1.0 + # get foreground mask + match_fg_mask_inmatrix = match_matrix.sum(1) > 0 + match_gt_inds_to_fg = match_matrix[match_fg_mask_inmatrix, :].argmax(1) + + return 
match_gt_inds_to_fg, match_fg_mask_inmatrix + + def get_sample(self, assign_gt_inds, gt_bboxes): + pos_inds = np.unique(np.nonzero(assign_gt_inds > 0)[0]) + neg_inds = np.unique(np.nonzero(assign_gt_inds == 0)[0]) + pos_assigned_gt_inds = assign_gt_inds[pos_inds] - 1 + + if gt_bboxes.size == 0: + # hack for index error case + assert pos_assigned_gt_inds.size == 0 + pos_gt_bboxes = np.empty_like(gt_bboxes).reshape(-1, 4) + else: + if len(gt_bboxes.shape) < 2: + gt_bboxes = gt_bboxes.resize(-1, 4) + pos_gt_bboxes = gt_bboxes[pos_assigned_gt_inds, :] + return pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds + + def __call__(self, + flatten_cls_pred_scores, + flatten_center_and_stride, + flatten_bboxes, + gt_bboxes, + gt_labels, + eps=1e-7): + """Assign gt to priors using SimOTA. + TODO: add comment. + Returns: + assign_result: The assigned result. + """ + num_gt = gt_bboxes.shape[0] + num_bboxes = flatten_bboxes.shape[0] + + if num_gt == 0 or num_bboxes == 0: + # No ground truth or boxes + label = np.ones([num_bboxes], dtype=np.int64) * self.num_classes + label_weight = np.ones([num_bboxes], dtype=np.float32) + bbox_target = np.zeros_like(flatten_center_and_stride) + return 0, label, label_weight, bbox_target + + is_in_gts_or_centers_all, is_in_gts_or_centers_all_inds, is_in_boxes_and_center = self.get_in_gt_and_in_center_info( + flatten_center_and_stride, gt_bboxes) + + # bboxes and scores to calculate matrix + valid_flatten_bboxes = flatten_bboxes[is_in_gts_or_centers_all_inds] + valid_cls_pred_scores = flatten_cls_pred_scores[ + is_in_gts_or_centers_all_inds] + num_valid_bboxes = valid_flatten_bboxes.shape[0] + + pairwise_ious = batch_bbox_overlaps(valid_flatten_bboxes, + gt_bboxes) # [num_points,num_gts] + if self.use_vfl: + gt_vfl_labels = gt_labels.squeeze(-1).unsqueeze(0).tile( + [num_valid_bboxes, 1]).reshape([-1]) + valid_pred_scores = valid_cls_pred_scores.unsqueeze(1).tile( + [1, num_gt, 1]).reshape([-1, self.num_classes]) + vfl_score = np.zeros(valid_pred_scores.shape) + vfl_score[np.arange(0, vfl_score.shape[0]), gt_vfl_labels.numpy( + )] = pairwise_ious.reshape([-1]) + vfl_score = paddle.to_tensor(vfl_score) + losses_vfl = varifocal_loss( + valid_pred_scores, vfl_score, + use_sigmoid=False).reshape([num_valid_bboxes, num_gt]) + losses_giou = batch_bbox_overlaps( + valid_flatten_bboxes, gt_bboxes, mode='giou') + cost_matrix = ( + losses_vfl * self.cls_weight + losses_giou * self.iou_weight + + paddle.logical_not(is_in_boxes_and_center).cast('float32') * + 100000000) + else: + iou_cost = -paddle.log(pairwise_ious + eps) + gt_onehot_label = (F.one_hot( + gt_labels.squeeze(-1).cast(paddle.int64), + flatten_cls_pred_scores.shape[-1]).cast('float32').unsqueeze(0) + .tile([num_valid_bboxes, 1, 1])) + + valid_pred_scores = valid_cls_pred_scores.unsqueeze(1).tile( + [1, num_gt, 1]) + cls_cost = F.binary_cross_entropy( + valid_pred_scores, gt_onehot_label, reduction='none').sum(-1) + + cost_matrix = ( + cls_cost * self.cls_weight + iou_cost * self.iou_weight + + paddle.logical_not(is_in_boxes_and_center).cast('float32') * + 100000000) + + match_gt_inds_to_fg, match_fg_mask_inmatrix = \ + self.dynamic_k_matching( + cost_matrix, pairwise_ious, num_gt) + + # sample and assign results + assigned_gt_inds = np.zeros([num_bboxes], dtype=np.int64) + match_fg_mask_inall = np.zeros_like(assigned_gt_inds) + match_fg_mask_inall[is_in_gts_or_centers_all.numpy( + )] = match_fg_mask_inmatrix + + assigned_gt_inds[match_fg_mask_inall.astype( + np.bool_)] = match_gt_inds_to_fg + 1 + + pos_inds, 
neg_inds, pos_gt_bboxes, pos_assigned_gt_inds \ + = self.get_sample(assigned_gt_inds, gt_bboxes.numpy()) + + bbox_target = np.zeros_like(flatten_bboxes) + bbox_weight = np.zeros_like(flatten_bboxes) + label = np.ones([num_bboxes], dtype=np.int64) * self.num_classes + label_weight = np.zeros([num_bboxes], dtype=np.float32) + + if len(pos_inds) > 0: + gt_labels = gt_labels.numpy() + pos_bbox_targets = pos_gt_bboxes + bbox_target[pos_inds, :] = pos_bbox_targets + bbox_weight[pos_inds, :] = 1.0 + if not np.any(gt_labels): + label[pos_inds] = 0 + else: + label[pos_inds] = gt_labels.squeeze(-1)[pos_assigned_gt_inds] + + label_weight[pos_inds] = 1.0 + if len(neg_inds) > 0: + label_weight[neg_inds] = 1.0 + + pos_num = max(pos_inds.size, 1) + + return pos_num, label, label_weight, bbox_target diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/task_aligned_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/task_aligned_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..5a756fa67dac6d5ab6bfe276cf5da3535038ea56 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/task_aligned_assigner.py @@ -0,0 +1,193 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ..bbox_utils import batch_iou_similarity +from .utils import (gather_topk_anchors, check_points_inside_bboxes, + compute_max_iou_anchor) + +__all__ = ['TaskAlignedAssigner'] + + +def is_close_gt(anchor, gt, stride_lst, max_dist=2.0, alpha=2.): + """Calculate distance ratio of box1 and box2 in batch for larger stride + anchors dist/stride to promote the survive of large distance match + Args: + anchor (Tensor): box with the shape [L, 2] + gt (Tensor): box with the shape [N, M2, 4] + Return: + dist (Tensor): dist ratio between box1 and box2 with the shape [N, M1, M2] + """ + center1 = anchor.unsqueeze(0) + center2 = (gt[..., :2] + gt[..., -2:]) / 2. + center1 = center1.unsqueeze(1) # [N, M1, 2] -> [N, 1, M1, 2] + center2 = center2.unsqueeze(2) # [N, M2, 2] -> [N, M2, 1, 2] + + stride = paddle.concat([ + paddle.full([x], 32 / pow(2, idx)) for idx, x in enumerate(stride_lst) + ]).unsqueeze(0).unsqueeze(0) + dist = paddle.linalg.norm(center1 - center2, p=2, axis=-1) / stride + dist_ratio = dist + dist_ratio[dist < max_dist] = 1. + dist_ratio[dist >= max_dist] = 0. 
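`is_close_gt` above gates candidates by center distance measured in units of the level stride, so the same pixel gap is forgiving on coarse levels and strict on fine ones. A NumPy sketch with illustrative strides (the code derives one stride per level as `32 / 2**idx` from `num_anchors_list`):

```python
import numpy as np

strides = np.array([8., 8., 16., 32.])         # per-anchor strides (made up)
dist = np.array([10., 40., 40., 40.])          # |anchor - gt_center| in pixels
ratio = (dist / strides < 2.0).astype(float)   # max_dist = 2.0 as above
print(ratio)   # [1. 0. 0. 1.]: the same 40 px gap passes only on stride 32
```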
+ return dist_ratio + + +@register +class TaskAlignedAssigner(nn.Layer): + """TOOD: Task-aligned One-stage Object Detection + """ + + def __init__(self, + topk=13, + alpha=1.0, + beta=6.0, + eps=1e-9, + is_close_gt=False): + super(TaskAlignedAssigner, self).__init__() + self.topk = topk + self.alpha = alpha + self.beta = beta + self.eps = eps + self.is_close_gt = is_close_gt + + @paddle.no_grad() + def forward(self, + pred_scores, + pred_bboxes, + anchor_points, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index, + gt_scores=None): + r"""This code is based on + https://github.com/fcjian/TOOD/blob/master/mmdet/core/bbox/assigners/task_aligned_assigner.py + + The assignment is done in following steps + 1. compute alignment metric between all bbox (bbox of all pyramid levels) and gt + 2. select top-k bbox as candidates for each gt + 3. limit the positive sample's center in gt (because the anchor-free detector + only can predict positive distance) + 4. if an anchor box is assigned to multiple gts, the one with the + highest iou will be selected. + Args: + pred_scores (Tensor, float32): predicted class probability, shape(B, L, C) + pred_bboxes (Tensor, float32): predicted bounding boxes, shape(B, L, 4) + anchor_points (Tensor, float32): pre-defined anchors, shape(L, 2), "cxcy" format + num_anchors_list (List): num of anchors in each level, shape(L) + gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1) + gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4) + pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1) + bg_index (int): background index + gt_scores (Tensor|None, float32) Score of gt_bboxes, shape(B, n, 1) + Returns: + assigned_labels (Tensor): (B, L) + assigned_bboxes (Tensor): (B, L, 4) + assigned_scores (Tensor): (B, L, C) + """ + assert pred_scores.ndim == pred_bboxes.ndim + assert gt_labels.ndim == gt_bboxes.ndim and \ + gt_bboxes.ndim == 3 + + batch_size, num_anchors, num_classes = pred_scores.shape + _, num_max_boxes, _ = gt_bboxes.shape + + # negative batch + if num_max_boxes == 0: + assigned_labels = paddle.full( + [batch_size, num_anchors], bg_index, dtype='int32') + assigned_bboxes = paddle.zeros([batch_size, num_anchors, 4]) + assigned_scores = paddle.zeros( + [batch_size, num_anchors, num_classes]) + return assigned_labels, assigned_bboxes, assigned_scores + + # compute iou between gt and pred bbox, [B, n, L] + ious = batch_iou_similarity(gt_bboxes, pred_bboxes) + # gather pred bboxes class score + pred_scores = pred_scores.transpose([0, 2, 1]) + batch_ind = paddle.arange( + end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1) + gt_labels_ind = paddle.stack( + [batch_ind.tile([1, num_max_boxes]), gt_labels.squeeze(-1)], + axis=-1) + bbox_cls_scores = paddle.gather_nd(pred_scores, gt_labels_ind) + # compute alignment metrics, [B, n, L] + alignment_metrics = bbox_cls_scores.pow(self.alpha) * ious.pow( + self.beta) + + # check the positive sample's center in gt, [B, n, L] + if self.is_close_gt: + is_in_gts = is_close_gt(anchor_points, gt_bboxes, num_anchors_list) + else: + is_in_gts = check_points_inside_bboxes(anchor_points, gt_bboxes) + + # select topk largest alignment metrics pred bbox as candidates + # for each gt, [B, n, L] + is_in_topk = gather_topk_anchors( + alignment_metrics * is_in_gts, self.topk, topk_mask=pad_gt_mask) + + # select positive sample, [B, n, L] + mask_positive = is_in_topk * is_in_gts * pad_gt_mask + + # if an anchor box is assigned to multiple gts, + # the one with the highest iou 
will be selected, [B, n, L] + mask_positive_sum = mask_positive.sum(axis=-2) + if mask_positive_sum.max() > 1: + mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile( + [1, num_max_boxes, 1]) + is_max_iou = compute_max_iou_anchor(ious) + mask_positive = paddle.where(mask_multiple_gts, is_max_iou, + mask_positive) + mask_positive_sum = mask_positive.sum(axis=-2) + assigned_gt_index = mask_positive.argmax(axis=-2) + + # assigned target + assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes + assigned_labels = paddle.gather( + gt_labels.flatten(), assigned_gt_index.flatten(), axis=0) + assigned_labels = assigned_labels.reshape([batch_size, num_anchors]) + assigned_labels = paddle.where( + mask_positive_sum > 0, assigned_labels, + paddle.full_like(assigned_labels, bg_index)) + + assigned_bboxes = paddle.gather( + gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0) + assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4]) + + assigned_scores = F.one_hot(assigned_labels, num_classes + 1) + ind = list(range(num_classes + 1)) + ind.remove(bg_index) + assigned_scores = paddle.index_select( + assigned_scores, paddle.to_tensor(ind), axis=-1) + # rescale alignment metrics + alignment_metrics *= mask_positive + max_metrics_per_instance = alignment_metrics.max(axis=-1, keepdim=True) + max_ious_per_instance = (ious * mask_positive).max(axis=-1, + keepdim=True) + alignment_metrics = alignment_metrics / ( + max_metrics_per_instance + self.eps) * max_ious_per_instance + alignment_metrics = alignment_metrics.max(-2).unsqueeze(-1) + assigned_scores = assigned_scores * alignment_metrics + + return assigned_labels, assigned_bboxes, assigned_scores, mask_positive diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/task_aligned_assigner_cr.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/task_aligned_assigner_cr.py new file mode 100644 index 0000000000000000000000000000000000000000..4558d6e8ec7af5a59fc4975bff089616f0b0b209 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/task_aligned_assigner_cr.py @@ -0,0 +1,181 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
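Two idioms recur in every assigner above and are worth unpacking once: batched GT gathers are done flat, offsetting per-image indices by `batch_ind * num_max_boxes` so a single `paddle.gather` serves the whole batch, and soft targets are built as a one-hot over `num_classes + 1` with the background column dropped. A NumPy sketch with toy sizes:

```python
import numpy as np

B, n, L, C = 2, 3, 4, 5                  # batch, max GTs, anchors, classes
bg = C                                   # background index
gt_labels = np.array([[1, 2, 0], [3, 0, 0]])               # [B, n], zero-padded
assigned_gt_index = np.array([[0, 1, 1, 0], [0, 0, 0, 0]]) # per-anchor GT id
mask_positive_sum = np.array([[1, 1, 0, 0], [1, 0, 0, 0]])

flat = assigned_gt_index + np.arange(B)[:, None] * n       # offset per image
assigned_labels = gt_labels.reshape(-1)[flat.reshape(-1)].reshape(B, L)
assigned_labels = np.where(mask_positive_sum > 0, assigned_labels, bg)

one_hot = np.eye(C + 1)[assigned_labels]                   # [B, L, C+1]
assigned_scores = np.delete(one_hot, bg, axis=-1)          # drop bg column
print(assigned_scores.shape)                               # (2, 4, 5)
```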
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ..bbox_utils import batch_iou_similarity +from .utils import (gather_topk_anchors, check_points_inside_bboxes, + compute_max_iou_anchor) + +__all__ = ['TaskAlignedAssigner_CR'] + + +@register +class TaskAlignedAssigner_CR(nn.Layer): + """TOOD: Task-aligned One-stage Object Detection with Center R + """ + + def __init__(self, + topk=13, + alpha=1.0, + beta=6.0, + center_radius=None, + eps=1e-9): + super(TaskAlignedAssigner_CR, self).__init__() + self.topk = topk + self.alpha = alpha + self.beta = beta + self.center_radius = center_radius + self.eps = eps + + @paddle.no_grad() + def forward(self, + pred_scores, + pred_bboxes, + anchor_points, + stride_tensor, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index, + gt_scores=None): + r"""This code is based on + https://github.com/fcjian/TOOD/blob/master/mmdet/core/bbox/assigners/task_aligned_assigner.py + + The assignment is done in following steps + 1. compute alignment metric between all bbox (bbox of all pyramid levels) and gt + 2. select top-k bbox as candidates for each gt + 3. limit the positive sample's center in gt (because the anchor-free detector + only can predict positive distance) + 4. if an anchor box is assigned to multiple gts, the one with the + highest iou will be selected. + Args: + pred_scores (Tensor, float32): predicted class probability, shape(B, L, C) + pred_bboxes (Tensor, float32): predicted bounding boxes, shape(B, L, 4) + anchor_points (Tensor, float32): pre-defined anchors, shape(L, 2), "cxcy" format + stride_tensor (Tensor, float32): stride of feature map, shape(L, 1) + gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1) + gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4) + pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1) + bg_index (int): background index + gt_scores (Tensor|None, float32) Score of gt_bboxes, shape(B, n, 1) + Returns: + assigned_labels (Tensor): (B, L) + assigned_bboxes (Tensor): (B, L, 4) + assigned_scores (Tensor): (B, L, C) + """ + assert pred_scores.ndim == pred_bboxes.ndim + assert gt_labels.ndim == gt_bboxes.ndim and \ + gt_bboxes.ndim == 3 + + batch_size, num_anchors, num_classes = pred_scores.shape + _, num_max_boxes, _ = gt_bboxes.shape + + # negative batch + if num_max_boxes == 0: + assigned_labels = paddle.full( + [batch_size, num_anchors], bg_index, dtype='int32') + assigned_bboxes = paddle.zeros([batch_size, num_anchors, 4]) + assigned_scores = paddle.zeros( + [batch_size, num_anchors, num_classes]) + return assigned_labels, assigned_bboxes, assigned_scores + + # compute iou between gt and pred bbox, [B, n, L] + ious = batch_iou_similarity(gt_bboxes, pred_bboxes) + # gather pred bboxes class score + pred_scores = pred_scores.transpose([0, 2, 1]) + batch_ind = paddle.arange( + end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1) + gt_labels_ind = paddle.stack( + [batch_ind.tile([1, num_max_boxes]), gt_labels.squeeze(-1)], + axis=-1) + bbox_cls_scores = paddle.gather_nd(pred_scores, gt_labels_ind) + # compute alignment metrics, [B, n, L] + alignment_metrics = bbox_cls_scores.pow(self.alpha) * ious.pow( + self.beta) * pad_gt_mask + + # select positive sample, [B, n, L] + if self.center_radius is None: + # check the positive sample's center in gt, [B, n, L] + is_in_gts = 
check_points_inside_bboxes( + anchor_points, gt_bboxes, sm_use=True) + # select topk largest alignment metrics pred bbox as candidates + # for each gt, [B, n, L] + mask_positive = gather_topk_anchors( + alignment_metrics, self.topk, topk_mask=pad_gt_mask) * is_in_gts + else: + is_in_gts, is_in_center = check_points_inside_bboxes( + anchor_points, + gt_bboxes, + stride_tensor * self.center_radius, + sm_use=True) + is_in_gts *= pad_gt_mask + is_in_center *= pad_gt_mask + candidate_metrics = paddle.where( + is_in_gts.sum(-1, keepdim=True) == 0, + alignment_metrics + is_in_center, + alignment_metrics) + mask_positive = gather_topk_anchors( + candidate_metrics, self.topk, + topk_mask=pad_gt_mask) * paddle.cast((is_in_center > 0) | + (is_in_gts > 0), 'float32') + + # if an anchor box is assigned to multiple gts, + # the one with the highest iou will be selected, [B, n, L] + mask_positive_sum = mask_positive.sum(axis=-2) + if mask_positive_sum.max() > 1: + mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile( + [1, num_max_boxes, 1]) + is_max_iou = compute_max_iou_anchor(ious * mask_positive) + mask_positive = paddle.where(mask_multiple_gts, is_max_iou, + mask_positive) + mask_positive_sum = mask_positive.sum(axis=-2) + assigned_gt_index = mask_positive.argmax(axis=-2) + + # assigned target + assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes + assigned_labels = paddle.gather( + gt_labels.flatten(), assigned_gt_index.flatten(), axis=0) + assigned_labels = assigned_labels.reshape([batch_size, num_anchors]) + assigned_labels = paddle.where( + mask_positive_sum > 0, assigned_labels, + paddle.full_like(assigned_labels, bg_index)) + + assigned_bboxes = paddle.gather( + gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0) + assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4]) + + assigned_scores = F.one_hot(assigned_labels, num_classes + 1) + ind = list(range(num_classes + 1)) + ind.remove(bg_index) + assigned_scores = paddle.index_select( + assigned_scores, paddle.to_tensor(ind), axis=-1) + # rescale alignment metrics + alignment_metrics *= mask_positive + max_metrics_per_instance = alignment_metrics.max(axis=-1, keepdim=True) + max_ious_per_instance = (ious * mask_positive).max(axis=-1, + keepdim=True) + alignment_metrics = alignment_metrics / ( + max_metrics_per_instance + self.eps) * max_ious_per_instance + alignment_metrics = alignment_metrics.max(-2).unsqueeze(-1) + assigned_scores = assigned_scores * alignment_metrics + + return assigned_labels, assigned_bboxes, assigned_scores, mask_positive diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/uniform_assigner.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/uniform_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..1c1480593d9fc66147b2eefb9d3b6246713e2c74 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/uniform_assigner.py @@ -0,0 +1,93 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
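The `center_radius` branch above is a fallback for GTs so small that no anchor center lands inside them: within the radius, their alignment metrics get a +1 boost so top-k can still recruit candidates. A toy NumPy version of that `paddle.where` (one GT, four anchors, made-up values):

```python
import numpy as np

metrics = np.array([[0.20, 0.10, 0.05, 0.02]])
is_in_gts = np.array([[0., 0., 0., 0.]])      # no anchor center inside the box
is_in_center = np.array([[1., 1., 0., 0.]])   # two anchors within the radius

no_inside = is_in_gts.sum(-1, keepdims=True) == 0
candidate = np.where(no_inside, metrics + is_in_center, metrics)
print(candidate)  # [[1.2  1.1  0.05 0.02]]: radius anchors now dominate top-k
```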
+# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register + +from ppdet.modeling.bbox_utils import batch_bbox_overlaps +from ppdet.modeling.transformers import bbox_xyxy_to_cxcywh + +__all__ = ['UniformAssigner'] + + +def batch_p_dist(x, y, p=2): + """ + calculate pairwise p_dist, the first index of x and y are batch + return [x.shape[0], y.shape[0]] + """ + x = x.unsqueeze(1) + diff = x - y + return paddle.norm(diff, p=p, axis=list(range(2, diff.dim()))) + + +@register +class UniformAssigner(nn.Layer): + def __init__(self, pos_ignore_thr, neg_ignore_thr, match_times=4): + super(UniformAssigner, self).__init__() + self.pos_ignore_thr = pos_ignore_thr + self.neg_ignore_thr = neg_ignore_thr + self.match_times = match_times + + def forward(self, bbox_pred, anchor, gt_bboxes, gt_labels=None): + num_bboxes = bbox_pred.shape[0] + num_gts = gt_bboxes.shape[0] + match_labels = paddle.full([num_bboxes], -1, dtype=paddle.int32) + + pred_ious = batch_bbox_overlaps(bbox_pred, gt_bboxes) + pred_max_iou = pred_ious.max(axis=1) + neg_ignore = pred_max_iou > self.neg_ignore_thr + # exclude potential ignored neg samples first, deal with pos samples later + #match_labels: -2(ignore), -1(neg) or >=0(pos_inds) + match_labels = paddle.where(neg_ignore, + paddle.full_like(match_labels, -2), + match_labels) + + bbox_pred_c = bbox_xyxy_to_cxcywh(bbox_pred) + anchor_c = bbox_xyxy_to_cxcywh(anchor) + gt_bboxes_c = bbox_xyxy_to_cxcywh(gt_bboxes) + bbox_pred_dist = batch_p_dist(bbox_pred_c, gt_bboxes_c, p=1) + anchor_dist = batch_p_dist(anchor_c, gt_bboxes_c, p=1) + + top_pred = bbox_pred_dist.topk( + k=self.match_times, axis=0, largest=False)[1] + top_anchor = anchor_dist.topk( + k=self.match_times, axis=0, largest=False)[1] + + tar_pred = paddle.arange(num_gts).expand([self.match_times, num_gts]) + tar_anchor = paddle.arange(num_gts).expand([self.match_times, num_gts]) + pos_places = paddle.concat([top_pred, top_anchor]).reshape([-1]) + pos_inds = paddle.concat([tar_pred, tar_anchor]).reshape([-1]) + + pos_anchor = anchor[pos_places] + pos_tar_bbox = gt_bboxes[pos_inds] + pos_ious = batch_bbox_overlaps( + pos_anchor, pos_tar_bbox, is_aligned=True) + pos_ignore = pos_ious < self.pos_ignore_thr + pos_inds = paddle.where(pos_ignore, + paddle.full_like(pos_inds, -2), pos_inds) + match_labels[pos_places] = pos_inds + match_labels.stop_gradient = True + pos_keep = ~pos_ignore + + if pos_keep.sum() > 0: + pos_places_keep = pos_places[pos_keep] + pos_bbox_pred = bbox_pred[pos_places_keep].reshape([-1, 4]) + pos_bbox_tar = pos_tar_bbox[pos_keep].reshape([-1, 4]).detach() + else: + pos_bbox_pred = None + pos_bbox_tar = None + + return match_labels, pos_bbox_pred, pos_bbox_tar diff --git a/PaddleDetection-release-2.6/ppdet/modeling/assigners/utils.py b/PaddleDetection-release-2.6/ppdet/modeling/assigners/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..8fe7c9382c359a55b8c4b6efc491eaa049aab5b1 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/assigners/utils.py @@ -0,0 +1,230 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn.functional as F + +__all__ = [ + 'pad_gt', 'gather_topk_anchors', 'check_points_inside_bboxes', + 'compute_max_iou_anchor', 'compute_max_iou_gt', + 'generate_anchors_for_grid_cell' +] + + +def pad_gt(gt_labels, gt_bboxes, gt_scores=None): + r""" Pad 0 in gt_labels and gt_bboxes. + Args: + gt_labels (Tensor|List[Tensor], int64): Label of gt_bboxes, + shape is [B, n, 1] or [[n_1, 1], [n_2, 1], ...], here n = sum(n_i) + gt_bboxes (Tensor|List[Tensor], float32): Ground truth bboxes, + shape is [B, n, 4] or [[n_1, 4], [n_2, 4], ...], here n = sum(n_i) + gt_scores (Tensor|List[Tensor]|None, float32): Score of gt_bboxes, + shape is [B, n, 1] or [[n_1, 4], [n_2, 4], ...], here n = sum(n_i) + Returns: + pad_gt_labels (Tensor, int64): shape[B, n, 1] + pad_gt_bboxes (Tensor, float32): shape[B, n, 4] + pad_gt_scores (Tensor, float32): shape[B, n, 1] + pad_gt_mask (Tensor, float32): shape[B, n, 1], 1 means bbox, 0 means no bbox + """ + if isinstance(gt_labels, paddle.Tensor) and isinstance(gt_bboxes, + paddle.Tensor): + assert gt_labels.ndim == gt_bboxes.ndim and \ + gt_bboxes.ndim == 3 + pad_gt_mask = ( + gt_bboxes.sum(axis=-1, keepdim=True) > 0).astype(gt_bboxes.dtype) + if gt_scores is None: + gt_scores = pad_gt_mask.clone() + assert gt_labels.ndim == gt_scores.ndim + + return gt_labels, gt_bboxes, gt_scores, pad_gt_mask + elif isinstance(gt_labels, list) and isinstance(gt_bboxes, list): + assert len(gt_labels) == len(gt_bboxes), \ + 'The number of `gt_labels` and `gt_bboxes` is not equal. ' + num_max_boxes = max([len(a) for a in gt_bboxes]) + batch_size = len(gt_bboxes) + # pad label and bbox + pad_gt_labels = paddle.zeros( + [batch_size, num_max_boxes, 1], dtype=gt_labels[0].dtype) + pad_gt_bboxes = paddle.zeros( + [batch_size, num_max_boxes, 4], dtype=gt_bboxes[0].dtype) + pad_gt_scores = paddle.zeros( + [batch_size, num_max_boxes, 1], dtype=gt_bboxes[0].dtype) + pad_gt_mask = paddle.zeros( + [batch_size, num_max_boxes, 1], dtype=gt_bboxes[0].dtype) + for i, (label, bbox) in enumerate(zip(gt_labels, gt_bboxes)): + if len(label) > 0 and len(bbox) > 0: + pad_gt_labels[i, :len(label)] = label + pad_gt_bboxes[i, :len(bbox)] = bbox + pad_gt_mask[i, :len(bbox)] = 1. + if gt_scores is not None: + pad_gt_scores[i, :len(gt_scores[i])] = gt_scores[i] + if gt_scores is None: + pad_gt_scores = pad_gt_mask.clone() + return pad_gt_labels, pad_gt_bboxes, pad_gt_scores, pad_gt_mask + else: + raise ValueError('The input `gt_labels` or `gt_bboxes` is invalid! ') + + +def gather_topk_anchors(metrics, topk, largest=True, topk_mask=None, eps=1e-9): + r""" + Args: + metrics (Tensor, float32): shape[B, n, L], n: num_gts, L: num_anchors + topk (int): The number of top elements to look for along the axis. + largest (bool) : largest is a flag, if set to true, + algorithm will sort by descending order, otherwise sort by + ascending order. 
Default: True + topk_mask (Tensor, float32): shape[B, n, 1], ignore bbox mask, + Default: None + eps (float): Default: 1e-9 + Returns: + is_in_topk (Tensor, float32): shape[B, n, L], value=1. means selected + """ + num_anchors = metrics.shape[-1] + topk_metrics, topk_idxs = paddle.topk( + metrics, topk, axis=-1, largest=largest) + if topk_mask is None: + topk_mask = ( + topk_metrics.max(axis=-1, keepdim=True) > eps).astype(metrics.dtype) + is_in_topk = F.one_hot(topk_idxs, num_anchors).sum( + axis=-2).astype(metrics.dtype) + return is_in_topk * topk_mask + + +def check_points_inside_bboxes(points, + bboxes, + center_radius_tensor=None, + eps=1e-9, + sm_use=False): + r""" + Args: + points (Tensor, float32): shape[L, 2], "xy" format, L: num_anchors + bboxes (Tensor, float32): shape[B, n, 4], "xmin, ymin, xmax, ymax" format + center_radius_tensor (Tensor, float32): shape [L, 1]. Default: None. + eps (float): Default: 1e-9 + Returns: + is_in_bboxes (Tensor, float32): shape[B, n, L], value=1. means selected + """ + points = points.unsqueeze([0, 1]) + x, y = points.chunk(2, axis=-1) + xmin, ymin, xmax, ymax = bboxes.unsqueeze(2).chunk(4, axis=-1) + # check whether `points` is in `bboxes` + l = x - xmin + t = y - ymin + r = xmax - x + b = ymax - y + delta_ltrb = paddle.concat([l, t, r, b], axis=-1) + is_in_bboxes = (delta_ltrb.min(axis=-1) > eps) + if center_radius_tensor is not None: + # check whether `points` is in `center_radius` + center_radius_tensor = center_radius_tensor.unsqueeze([0, 1]) + cx = (xmin + xmax) * 0.5 + cy = (ymin + ymax) * 0.5 + l = x - (cx - center_radius_tensor) + t = y - (cy - center_radius_tensor) + r = (cx + center_radius_tensor) - x + b = (cy + center_radius_tensor) - y + delta_ltrb_c = paddle.concat([l, t, r, b], axis=-1) + is_in_center = (delta_ltrb_c.min(axis=-1) > eps) + if sm_use: + return is_in_bboxes.astype(bboxes.dtype), is_in_center.astype( + bboxes.dtype) + else: + return (paddle.logical_and(is_in_bboxes, is_in_center), + paddle.logical_or(is_in_bboxes, is_in_center)) + + return is_in_bboxes.astype(bboxes.dtype) + + +def compute_max_iou_anchor(ious): + r""" + For each anchor, find the GT with the largest IOU. + Args: + ious (Tensor, float32): shape[B, n, L], n: num_gts, L: num_anchors + Returns: + is_max_iou (Tensor, float32): shape[B, n, L], value=1. means selected + """ + num_max_boxes = ious.shape[-2] + max_iou_index = ious.argmax(axis=-2) + is_max_iou = F.one_hot(max_iou_index, num_max_boxes).transpose([0, 2, 1]) + return is_max_iou.astype(ious.dtype) + + +def compute_max_iou_gt(ious): + r""" + For each GT, find the anchor with the largest IOU. + Args: + ious (Tensor, float32): shape[B, n, L], n: num_gts, L: num_anchors + Returns: + is_max_iou (Tensor, float32): shape[B, n, L], value=1. means selected + """ + num_anchors = ious.shape[-1] + max_iou_index = ious.argmax(axis=-1) + is_max_iou = F.one_hot(max_iou_index, num_anchors) + return is_max_iou.astype(ious.dtype) + + +def generate_anchors_for_grid_cell(feats, + fpn_strides, + grid_cell_size=5.0, + grid_cell_offset=0.5, + dtype='float32'): + r""" + Like ATSS, generate anchors based on grid size. + Args: + feats (List[Tensor]): shape[s, (b, c, h, w)] + fpn_strides (tuple|list): shape[s], stride for each scale feature + grid_cell_size (float): anchor size + grid_cell_offset (float): The range is between 0 and 1. + Returns: + anchors (Tensor): shape[l, 4], "xmin, ymin, xmax, ymax" format. + anchor_points (Tensor): shape[l, 2], "x, y" format. 
+        num_anchors_list (List[int]): length s, number of anchors per level,
+            [h_1*w_1, h_2*w_2, ...].
+        stride_tensor (Tensor): shape[l, 1], contains the stride for each scale.
+    """
+    assert len(feats) == len(fpn_strides)
+    anchors = []
+    anchor_points = []
+    num_anchors_list = []
+    stride_tensor = []
+    for feat, stride in zip(feats, fpn_strides):
+        _, _, h, w = feat.shape
+        cell_half_size = grid_cell_size * stride * 0.5
+        shift_x = (paddle.arange(end=w) + grid_cell_offset) * stride
+        shift_y = (paddle.arange(end=h) + grid_cell_offset) * stride
+        shift_y, shift_x = paddle.meshgrid(shift_y, shift_x)
+        anchor = paddle.stack(
+            [
+                shift_x - cell_half_size, shift_y - cell_half_size,
+                shift_x + cell_half_size, shift_y + cell_half_size
+            ],
+            axis=-1).astype(dtype)
+        anchor_point = paddle.stack([shift_x, shift_y], axis=-1).astype(dtype)
+
+        anchors.append(anchor.reshape([-1, 4]))
+        anchor_points.append(anchor_point.reshape([-1, 2]))
+        num_anchors_list.append(len(anchors[-1]))
+        stride_tensor.append(
+            paddle.full(
+                [num_anchors_list[-1], 1], stride, dtype=dtype))
+    anchors = paddle.concat(anchors)
+    anchors.stop_gradient = True
+    anchor_points = paddle.concat(anchor_points)
+    anchor_points.stop_gradient = True
+    stride_tensor = paddle.concat(stride_tensor)
+    stride_tensor.stop_gradient = True
+    return anchors, anchor_points, num_anchors_list, stride_tensor
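A quick shape check makes the grid-cell layout above concrete; the feature shapes and strides in this sketch are invented for illustration:

import paddle

# Two hypothetical FPN levels: 80x80 at stride 8 and 40x40 at stride 16.
feats = [paddle.zeros([1, 256, 80, 80]), paddle.zeros([1, 256, 40, 40])]
anchors, points, counts, strides = generate_anchors_for_grid_cell(
    feats, fpn_strides=[8, 16], grid_cell_size=5.0, grid_cell_offset=0.5)

print(counts)         # [6400, 1600] -> 80*80 and 40*40 anchors per level
print(anchors.shape)  # [8000, 4], xyxy; each cell spans grid_cell_size * stride pixels
print(points[0])      # [4., 4.] -> (0 + 0.5) * 8, center of the first stride-8 cell
print(strides.shape)  # [8000, 1]

diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..fcca7159fc75f21ad49ba09262bd656ad6bed201
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__init__.py
@@ -0,0 +1,62 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import vgg
+from . import resnet
+from . import darknet
+from . import mobilenet_v1
+from . import mobilenet_v3
+from . import hrnet
+from . import lite_hrnet
+from . import blazenet
+from . import ghostnet
+from . import senet
+from . import res2net
+from . import dla
+from . import shufflenet_v2
+from . import swin_transformer
+from . import lcnet
+from . import hardnet
+from . import esnet
+from . import cspresnet
+from . import csp_darknet
+from . import convnext
+from . import vision_transformer
+from . import mobileone
+from . import trans_encoder
+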
+from .vgg import *
+from .resnet import *
+from .darknet import *
+from .mobilenet_v1 import *
+from .mobilenet_v3 import *
+from .hrnet import *
+from .lite_hrnet import *
+from .blazenet import *
+from .ghostnet import *
+from .senet import *
+from .res2net import *
+from .dla import *
+from .shufflenet_v2 import *
+from .swin_transformer import *
+from .lcnet import *
+from .hardnet import *
+from .esnet import *
+from .cspresnet import *
+from .csp_darknet import *
+from .convnext import *
+from .vision_transformer import *
+from .mobileone import *
+from .trans_encoder import *
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/__init__.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..8b75714b1aa8ba2455ae57a1fc96a07790089eca
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/__init__.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/blazenet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/blazenet.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..000586897d2863a4e159ba5495607d1d277d7451
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/blazenet.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/convnext.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/convnext.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..6098f6c4493f039340e423867deff3f72ba83c67
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/convnext.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/csp_darknet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/csp_darknet.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..70f2025a1ab4904754040a5b2b7f424173df94bb
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/csp_darknet.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/cspresnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/cspresnet.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..9647259e79bf41e9d1992b6d21b8fb4328d3187a
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/cspresnet.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/darknet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/darknet.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..b46c5ad737cdd8757b45badaf7e0eae809340099
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/darknet.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/dla.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/dla.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..af91619d019d24244c58076999cad4a3b16e537f
Binary files /dev/null and 
b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/dla.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/esnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/esnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..71b3fa42b7898f0fdb2e14afdf7f61e2120b71da Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/esnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/ghostnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/ghostnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..09692474e6bacd3ed240eea264b76764345ed992 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/ghostnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/hardnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/hardnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e1e2b6bda438a27521bda62a48d6b44615fd3e22 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/hardnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/hrnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/hrnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5d7dc3b31e92c71a422a41dfcd67c7d29a4e2395 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/hrnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/lcnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/lcnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f4110a368836b27eafbce86eccb6049cc3cfaafe Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/lcnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/lite_hrnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/lite_hrnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ad4940634c5a0efeefa1a83bc896479fa7a85a57 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/lite_hrnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobilenet_v1.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobilenet_v1.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6e86ca2b2ae889236c52913cce14b507091c522a Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobilenet_v1.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobilenet_v3.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobilenet_v3.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..123077df489a5de364b5f291c9ef7f0384952d84 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobilenet_v3.cpython-37.pyc differ diff --git 
a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobileone.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobileone.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..11ac7d63071de6c2717462a4b86880703daec2af Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/mobileone.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/name_adapter.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/name_adapter.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f62923e72a394ee9eb7426ebb4acc06d843b1456 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/name_adapter.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/res2net.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/res2net.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fac6e2d4b714a317c80ed3e54e9f1a9d60ae65c2 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/res2net.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/resnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/resnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..14cef6b0e04cc504b5e24e7539a4d9aa656b55e9 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/resnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/senet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/senet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d92ba820988129da81665f6d7a9619dde863ca2d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/senet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/shufflenet_v2.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/shufflenet_v2.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..14f7d27442953ad33383d945bcb2e484eca5ebd0 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/shufflenet_v2.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/swin_transformer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/swin_transformer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6194ee80246e92c9c7e06952909884253e95f3be Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/swin_transformer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/trans_encoder.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/trans_encoder.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..29d0519ad97d5bed92675ccb72fe6f84906713f6 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/trans_encoder.cpython-37.pyc differ diff --git 
a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/transformer_utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/transformer_utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..08c5538cd15674c92748bd8a60ae0c60f49d9b86 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/transformer_utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/vgg.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/vgg.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..51f3f07af2a876f2d19becc353dbc74f358bd179 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/vgg.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/vision_transformer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/vision_transformer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c31e54c7f3656180fc7465fb11e5a71822424b2f Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/backbones/__pycache__/vision_transformer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/blazenet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/blazenet.py new file mode 100644 index 0000000000000000000000000000000000000000..fbfdcec9de9f6caa7c2ad68c4c828ba48c66b8dd --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/blazenet.py @@ -0,0 +1,319 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import KaimingNormal +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec + +__all__ = ['BlazeNet'] + + +def hard_swish(x): + return x * F.relu6(x + 3) / 6. 
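hard_swish above is the standard piecewise approximation x * relu6(x + 3) / 6: zero for x <= -3 and effectively the identity for x >= 3. A quick endpoint check (illustrative only):

import paddle
import paddle.nn.functional as F

x = paddle.to_tensor([-4., 0., 4.])
print((x * F.relu6(x + 3.) / 6.).numpy())  # [-0., 0., 4.]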
+ + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride, + padding, + num_groups=1, + act='relu', + conv_lr=0.1, + conv_decay=0., + norm_decay=0., + norm_type='bn', + name=None): + super(ConvBNLayer, self).__init__() + self.act = act + self._conv = nn.Conv2D( + in_channels, + out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=num_groups, + weight_attr=ParamAttr( + learning_rate=conv_lr, initializer=KaimingNormal()), + bias_attr=False) + + if norm_type in ['bn', 'sync_bn']: + self._batch_norm = nn.BatchNorm2D(out_channels) + + def forward(self, x): + x = self._conv(x) + x = self._batch_norm(x) + if self.act == "relu": + x = F.relu(x) + elif self.act == "relu6": + x = F.relu6(x) + elif self.act == 'leaky': + x = F.leaky_relu(x) + elif self.act == 'hard_swish': + x = hard_swish(x) + return x + + +class BlazeBlock(nn.Layer): + def __init__(self, + in_channels, + out_channels1, + out_channels2, + double_channels=None, + stride=1, + use_5x5kernel=True, + act='relu', + name=None): + super(BlazeBlock, self).__init__() + assert stride in [1, 2] + self.use_pool = not stride == 1 + self.use_double_block = double_channels is not None + self.conv_dw = [] + if use_5x5kernel: + self.conv_dw.append( + self.add_sublayer( + name + "1_dw", + ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels1, + kernel_size=5, + stride=stride, + padding=2, + num_groups=out_channels1, + name=name + "1_dw"))) + else: + self.conv_dw.append( + self.add_sublayer( + name + "1_dw_1", + ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels1, + kernel_size=3, + stride=1, + padding=1, + num_groups=out_channels1, + name=name + "1_dw_1"))) + self.conv_dw.append( + self.add_sublayer( + name + "1_dw_2", + ConvBNLayer( + in_channels=out_channels1, + out_channels=out_channels1, + kernel_size=3, + stride=stride, + padding=1, + num_groups=out_channels1, + name=name + "1_dw_2"))) + self.act = act if self.use_double_block else None + self.conv_pw = ConvBNLayer( + in_channels=out_channels1, + out_channels=out_channels2, + kernel_size=1, + stride=1, + padding=0, + act=self.act, + name=name + "1_sep") + if self.use_double_block: + self.conv_dw2 = [] + if use_5x5kernel: + self.conv_dw2.append( + self.add_sublayer( + name + "2_dw", + ConvBNLayer( + in_channels=out_channels2, + out_channels=out_channels2, + kernel_size=5, + stride=1, + padding=2, + num_groups=out_channels2, + name=name + "2_dw"))) + else: + self.conv_dw2.append( + self.add_sublayer( + name + "2_dw_1", + ConvBNLayer( + in_channels=out_channels2, + out_channels=out_channels2, + kernel_size=3, + stride=1, + padding=1, + num_groups=out_channels2, + name=name + "1_dw_1"))) + self.conv_dw2.append( + self.add_sublayer( + name + "2_dw_2", + ConvBNLayer( + in_channels=out_channels2, + out_channels=out_channels2, + kernel_size=3, + stride=1, + padding=1, + num_groups=out_channels2, + name=name + "2_dw_2"))) + self.conv_pw2 = ConvBNLayer( + in_channels=out_channels2, + out_channels=double_channels, + kernel_size=1, + stride=1, + padding=0, + name=name + "2_sep") + # shortcut + if self.use_pool: + shortcut_channel = double_channels or out_channels2 + self._shortcut = [] + self._shortcut.append( + self.add_sublayer( + name + '_shortcut_pool', + nn.MaxPool2D( + kernel_size=stride, stride=stride, ceil_mode=True))) + self._shortcut.append( + self.add_sublayer( + name + '_shortcut_conv', + ConvBNLayer( + in_channels=in_channels, + out_channels=shortcut_channel, + kernel_size=1, + 
stride=1, + padding=0, + name="shortcut" + name))) + + def forward(self, x): + y = x + for conv_dw_block in self.conv_dw: + y = conv_dw_block(y) + y = self.conv_pw(y) + if self.use_double_block: + for conv_dw2_block in self.conv_dw2: + y = conv_dw2_block(y) + y = self.conv_pw2(y) + if self.use_pool: + for shortcut in self._shortcut: + x = shortcut(x) + return F.relu(paddle.add(x, y)) + + +@register +@serializable +class BlazeNet(nn.Layer): + """ + BlazeFace, see https://arxiv.org/abs/1907.05047 + + Args: + blaze_filters (list): number of filter for each blaze block. + double_blaze_filters (list): number of filter for each double_blaze block. + use_5x5kernel (bool): whether or not filter size is 5x5 in depth-wise conv. + """ + + def __init__( + self, + blaze_filters=[[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]], + double_blaze_filters=[[48, 24, 96, 2], [96, 24, 96], [96, 24, 96], + [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]], + use_5x5kernel=True, + act=None): + super(BlazeNet, self).__init__() + conv1_num_filters = blaze_filters[0][0] + self.conv1 = ConvBNLayer( + in_channels=3, + out_channels=conv1_num_filters, + kernel_size=3, + stride=2, + padding=1, + name="conv1") + in_channels = conv1_num_filters + self.blaze_block = [] + self._out_channels = [] + for k, v in enumerate(blaze_filters): + assert len(v) in [2, 3], \ + "blaze_filters {} not in [2, 3]" + if len(v) == 2: + self.blaze_block.append( + self.add_sublayer( + 'blaze_{}'.format(k), + BlazeBlock( + in_channels, + v[0], + v[1], + use_5x5kernel=use_5x5kernel, + act=act, + name='blaze_{}'.format(k)))) + elif len(v) == 3: + self.blaze_block.append( + self.add_sublayer( + 'blaze_{}'.format(k), + BlazeBlock( + in_channels, + v[0], + v[1], + stride=v[2], + use_5x5kernel=use_5x5kernel, + act=act, + name='blaze_{}'.format(k)))) + in_channels = v[1] + + for k, v in enumerate(double_blaze_filters): + assert len(v) in [3, 4], \ + "blaze_filters {} not in [3, 4]" + if len(v) == 3: + self.blaze_block.append( + self.add_sublayer( + 'double_blaze_{}'.format(k), + BlazeBlock( + in_channels, + v[0], + v[1], + double_channels=v[2], + use_5x5kernel=use_5x5kernel, + act=act, + name='double_blaze_{}'.format(k)))) + elif len(v) == 4: + self.blaze_block.append( + self.add_sublayer( + 'double_blaze_{}'.format(k), + BlazeBlock( + in_channels, + v[0], + v[1], + double_channels=v[2], + stride=v[3], + use_5x5kernel=use_5x5kernel, + act=act, + name='double_blaze_{}'.format(k)))) + in_channels = v[2] + self._out_channels.append(in_channels) + + def forward(self, inputs): + outs = [] + y = self.conv1(inputs['image']) + for block in self.blaze_block: + y = block(y) + outs.append(y) + return [outs[-4], outs[-1]] + + @property + def out_shape(self): + return [ + ShapeSpec(channels=c) + for c in [self._out_channels[-4], self._out_channels[-1]] + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/convnext.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/convnext.py new file mode 100644 index 0000000000000000000000000000000000000000..476e12b2da50585dd142f3049ba024769e691e8b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/convnext.py @@ -0,0 +1,245 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +''' +Modified from https://github.com/facebookresearch/ConvNeXt +Copyright (c) Meta Platforms, Inc. and affiliates. +All rights reserved. +This source code is licensed under the license found in the +LICENSE file in the root directory of this source tree. +''' + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Constant + +import numpy as np + +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec +from .transformer_utils import DropPath, trunc_normal_, zeros_ + +__all__ = ['ConvNeXt'] + + +class Block(nn.Layer): + r""" ConvNeXt Block. There are two equivalent implementations: + (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W) + (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back + We use (2) as we find it slightly faster in Pypaddle + + Args: + dim (int): Number of input channels. + drop_path (float): Stochastic depth rate. Default: 0.0 + layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6. + """ + + def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6): + super().__init__() + self.dwconv = nn.Conv2D( + dim, dim, kernel_size=7, padding=3, groups=dim) # depthwise conv + self.norm = LayerNorm(dim, eps=1e-6) + self.pwconv1 = nn.Linear( + dim, 4 * dim) # pointwise/1x1 convs, implemented with linear layers + self.act = nn.GELU() + self.pwconv2 = nn.Linear(4 * dim, dim) + + if layer_scale_init_value > 0: + self.gamma = self.create_parameter( + shape=(dim, ), + attr=ParamAttr(initializer=Constant(layer_scale_init_value))) + else: + self.gamma = None + + self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity( + ) + + def forward(self, x): + input = x + x = self.dwconv(x) + x = x.transpose([0, 2, 3, 1]) + x = self.norm(x) + x = self.pwconv1(x) + x = self.act(x) + x = self.pwconv2(x) + if self.gamma is not None: + x = self.gamma * x + x = x.transpose([0, 3, 1, 2]) + x = input + self.drop_path(x) + return x + + +class LayerNorm(nn.Layer): + r""" LayerNorm that supports two data formats: channels_last (default) or channels_first. + The ordering of the dimensions in the inputs. channels_last corresponds to inputs with + shape (batch_size, height, width, channels) while channels_first corresponds to inputs + with shape (batch_size, channels, height, width). 
+ """ + + def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"): + super().__init__() + + self.weight = self.create_parameter( + shape=(normalized_shape, ), + attr=ParamAttr(initializer=Constant(1.))) + self.bias = self.create_parameter( + shape=(normalized_shape, ), + attr=ParamAttr(initializer=Constant(0.))) + + self.eps = eps + self.data_format = data_format + if self.data_format not in ["channels_last", "channels_first"]: + raise NotImplementedError + self.normalized_shape = (normalized_shape, ) + + def forward(self, x): + if self.data_format == "channels_last": + return F.layer_norm(x, self.normalized_shape, self.weight, + self.bias, self.eps) + elif self.data_format == "channels_first": + u = x.mean(1, keepdim=True) + s = (x - u).pow(2).mean(1, keepdim=True) + x = (x - u) / paddle.sqrt(s + self.eps) + x = self.weight[:, None, None] * x + self.bias[:, None, None] + return x + + +@register +@serializable +class ConvNeXt(nn.Layer): + r""" ConvNeXt + A Pypaddle impl of : `A ConvNet for the 2020s` - + https://arxiv.org/pdf/2201.03545.pdf + + Args: + in_chans (int): Number of input image channels. Default: 3 + depths (tuple(int)): Number of blocks at each stage. Default: [3, 3, 9, 3] + dims (int): Feature dimension at each stage. Default: [96, 192, 384, 768] + drop_path_rate (float): Stochastic depth rate. Default: 0. + layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6. + """ + + arch_settings = { + 'tiny': { + 'depths': [3, 3, 9, 3], + 'dims': [96, 192, 384, 768] + }, + 'small': { + 'depths': [3, 3, 27, 3], + 'dims': [96, 192, 384, 768] + }, + 'base': { + 'depths': [3, 3, 27, 3], + 'dims': [128, 256, 512, 1024] + }, + 'large': { + 'depths': [3, 3, 27, 3], + 'dims': [192, 384, 768, 1536] + }, + 'xlarge': { + 'depths': [3, 3, 27, 3], + 'dims': [256, 512, 1024, 2048] + }, + } + + def __init__( + self, + arch='tiny', + in_chans=3, + drop_path_rate=0., + layer_scale_init_value=1e-6, + return_idx=[1, 2, 3], + norm_output=True, + pretrained=None, ): + super().__init__() + depths = self.arch_settings[arch]['depths'] + dims = self.arch_settings[arch]['dims'] + self.downsample_layers = nn.LayerList( + ) # stem and 3 intermediate downsampling conv layers + stem = nn.Sequential( + nn.Conv2D( + in_chans, dims[0], kernel_size=4, stride=4), + LayerNorm( + dims[0], eps=1e-6, data_format="channels_first")) + self.downsample_layers.append(stem) + for i in range(3): + downsample_layer = nn.Sequential( + LayerNorm( + dims[i], eps=1e-6, data_format="channels_first"), + nn.Conv2D( + dims[i], dims[i + 1], kernel_size=2, stride=2), ) + self.downsample_layers.append(downsample_layer) + + self.stages = nn.LayerList( + ) # 4 feature resolution stages, each consisting of multiple residual blocks + dp_rates = [x for x in np.linspace(0, drop_path_rate, sum(depths))] + cur = 0 + for i in range(4): + stage = nn.Sequential(* [ + Block( + dim=dims[i], + drop_path=dp_rates[cur + j], + layer_scale_init_value=layer_scale_init_value) + for j in range(depths[i]) + ]) + self.stages.append(stage) + cur += depths[i] + + self.return_idx = return_idx + self.dims = [dims[i] for i in return_idx] # [::-1] + + self.norm_output = norm_output + if norm_output: + self.norms = nn.LayerList([ + LayerNorm( + c, eps=1e-6, data_format="channels_first") + for c in self.dims + ]) + + self.apply(self._init_weights) + + if pretrained is not None: + if 'http' in pretrained: #URL + path = paddle.utils.download.get_weights_path_from_url( + pretrained) + else: #model in local path + path = pretrained + 
self.set_state_dict(paddle.load(path)) + + def _init_weights(self, m): + if isinstance(m, (nn.Conv2D, nn.Linear)): + trunc_normal_(m.weight) + zeros_(m.bias) + + def forward_features(self, x): + output = [] + for i in range(4): + x = self.downsample_layers[i](x) + x = self.stages[i](x) + output.append(x) + + outputs = [output[i] for i in self.return_idx] + if self.norm_output: + outputs = [self.norms[i](out) for i, out in enumerate(outputs)] + + return outputs + + def forward(self, x): + x = self.forward_features(x['image']) + return x + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self.dims] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/csp_darknet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/csp_darknet.py new file mode 100644 index 0000000000000000000000000000000000000000..4c225d15c8b560385078b19dd3dfafd272858bd4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/csp_darknet.py @@ -0,0 +1,404 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from ppdet.core.workspace import register, serializable +from ppdet.modeling.initializer import conv_init_ +from ..shape_spec import ShapeSpec + +__all__ = [ + 'CSPDarkNet', 'BaseConv', 'DWConv', 'BottleNeck', 'SPPLayer', 'SPPFLayer' +] + + +class BaseConv(nn.Layer): + def __init__(self, + in_channels, + out_channels, + ksize, + stride, + groups=1, + bias=False, + act="silu"): + super(BaseConv, self).__init__() + self.conv = nn.Conv2D( + in_channels, + out_channels, + kernel_size=ksize, + stride=stride, + padding=(ksize - 1) // 2, + groups=groups, + bias_attr=bias) + self.bn = nn.BatchNorm2D( + out_channels, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + + self._init_weights() + + def _init_weights(self): + conv_init_(self.conv) + + def forward(self, x): + # use 'x * F.sigmoid(x)' replace 'silu' + x = self.bn(self.conv(x)) + y = x * F.sigmoid(x) + return y + + +class DWConv(nn.Layer): + """Depthwise Conv""" + + def __init__(self, + in_channels, + out_channels, + ksize, + stride=1, + bias=False, + act="silu"): + super(DWConv, self).__init__() + self.dw_conv = BaseConv( + in_channels, + in_channels, + ksize=ksize, + stride=stride, + groups=in_channels, + bias=bias, + act=act) + self.pw_conv = BaseConv( + in_channels, + out_channels, + ksize=1, + stride=1, + groups=1, + bias=bias, + act=act) + + def forward(self, x): + return self.pw_conv(self.dw_conv(x)) + + +class Focus(nn.Layer): + """Focus width and height information into channel space, used in YOLOX.""" + + def __init__(self, + in_channels, + out_channels, + ksize=3, + stride=1, + bias=False, + act="silu"): + super(Focus, self).__init__() + self.conv = BaseConv( + in_channels * 4, + out_channels, + ksize=ksize, + stride=stride, + bias=bias, + act=act) + 
+ def forward(self, inputs): + # inputs [bs, C, H, W] -> outputs [bs, 4C, W/2, H/2] + top_left = inputs[:, :, 0::2, 0::2] + top_right = inputs[:, :, 0::2, 1::2] + bottom_left = inputs[:, :, 1::2, 0::2] + bottom_right = inputs[:, :, 1::2, 1::2] + outputs = paddle.concat( + [top_left, bottom_left, top_right, bottom_right], 1) + return self.conv(outputs) + + +class BottleNeck(nn.Layer): + def __init__(self, + in_channels, + out_channels, + shortcut=True, + expansion=0.5, + depthwise=False, + bias=False, + act="silu"): + super(BottleNeck, self).__init__() + hidden_channels = int(out_channels * expansion) + Conv = DWConv if depthwise else BaseConv + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.conv2 = Conv( + hidden_channels, + out_channels, + ksize=3, + stride=1, + bias=bias, + act=act) + self.add_shortcut = shortcut and in_channels == out_channels + + def forward(self, x): + y = self.conv2(self.conv1(x)) + if self.add_shortcut: + y = y + x + return y + + +class SPPLayer(nn.Layer): + """Spatial Pyramid Pooling (SPP) layer used in YOLOv3-SPP and YOLOX""" + + def __init__(self, + in_channels, + out_channels, + kernel_sizes=(5, 9, 13), + bias=False, + act="silu"): + super(SPPLayer, self).__init__() + hidden_channels = in_channels // 2 + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.maxpoolings = nn.LayerList([ + nn.MaxPool2D( + kernel_size=ks, stride=1, padding=ks // 2) + for ks in kernel_sizes + ]) + conv2_channels = hidden_channels * (len(kernel_sizes) + 1) + self.conv2 = BaseConv( + conv2_channels, out_channels, ksize=1, stride=1, bias=bias, act=act) + + def forward(self, x): + x = self.conv1(x) + x = paddle.concat([x] + [mp(x) for mp in self.maxpoolings], axis=1) + x = self.conv2(x) + return x + + +class SPPFLayer(nn.Layer): + """ Spatial Pyramid Pooling - Fast (SPPF) layer used in YOLOv5 by Glenn Jocher, + equivalent to SPP(k=(5, 9, 13)) + """ + + def __init__(self, + in_channels, + out_channels, + ksize=5, + bias=False, + act='silu'): + super(SPPFLayer, self).__init__() + hidden_channels = in_channels // 2 + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.maxpooling = nn.MaxPool2D( + kernel_size=ksize, stride=1, padding=ksize // 2) + conv2_channels = hidden_channels * 4 + self.conv2 = BaseConv( + conv2_channels, out_channels, ksize=1, stride=1, bias=bias, act=act) + + def forward(self, x): + x = self.conv1(x) + y1 = self.maxpooling(x) + y2 = self.maxpooling(y1) + y3 = self.maxpooling(y2) + concats = paddle.concat([x, y1, y2, y3], axis=1) + out = self.conv2(concats) + return out + + +class CSPLayer(nn.Layer): + """CSP (Cross Stage Partial) layer with 3 convs, named C3 in YOLOv5""" + + def __init__(self, + in_channels, + out_channels, + num_blocks=1, + shortcut=True, + expansion=0.5, + depthwise=False, + bias=False, + act="silu"): + super(CSPLayer, self).__init__() + hidden_channels = int(out_channels * expansion) + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.conv2 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.bottlenecks = nn.Sequential(* [ + BottleNeck( + hidden_channels, + hidden_channels, + shortcut=shortcut, + expansion=1.0, + depthwise=depthwise, + bias=bias, + act=act) for _ in range(num_blocks) + ]) + self.conv3 = BaseConv( + hidden_channels * 2, + out_channels, + ksize=1, + stride=1, + bias=bias, + act=act) + + def 
forward(self, x):
+        x_1 = self.conv1(x)
+        x_1 = self.bottlenecks(x_1)
+        x_2 = self.conv2(x)
+        x = paddle.concat([x_1, x_2], axis=1)
+        x = self.conv3(x)
+        return x
+
+
+@register
+@serializable
+class CSPDarkNet(nn.Layer):
+    """
+    CSPDarkNet backbone.
+    Args:
+        arch (str): Architecture of CSPDarkNet, from {P5, P6, X}, default as X,
+            and 'X' means used in YOLOX, 'P5/P6' means used in YOLOv5.
+        depth_mult (float): Depth multiplier, multiply number of blocks in
+            each CSPLayer, default as 1.0.
+        width_mult (float): Width multiplier, multiply number of channels in
+            each layer, default as 1.0.
+        depthwise (bool): Whether to use depth-wise conv layer.
+        act (str): Activation function type, default as 'silu'.
+        return_idx (list): Index of stages whose feature maps are returned.
+    """
+
+    __shared__ = ['depth_mult', 'width_mult', 'act', 'trt']
+
+    # in_channels, out_channels, num_blocks, add_shortcut, use_spp(use_sppf)
+    # 'X' means setting used in YOLOX, 'P5/P6' means setting used in YOLOv5.
+    arch_settings = {
+        'X': [[64, 128, 3, True, False], [128, 256, 9, True, False],
+              [256, 512, 9, True, False], [512, 1024, 3, False, True]],
+        'P5': [[64, 128, 3, True, False], [128, 256, 6, True, False],
+               [256, 512, 9, True, False], [512, 1024, 3, True, True]],
+        'P6': [[64, 128, 3, True, False], [128, 256, 6, True, False],
+               [256, 512, 9, True, False], [512, 768, 3, True, False],
+               [768, 1024, 3, True, True]],
+    }
+
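The two multipliers are easy to mix up, so here is the scaling arithmetic from __init__ below as a minimal sketch; the 0.33/0.50 pair mirrors a YOLOv5-s-like setting and is illustrative, not taken from this diff:

# width_mult scales channels, depth_mult scales block counts.
depth_mult, width_mult = 0.33, 0.50
in_ch, out_ch, num_blocks = 64, 128, 3                  # first 'P5' stage setting
scaled_in = int(in_ch * width_mult)                     # 32 channels
scaled_out = int(out_ch * width_mult)                   # 64 channels
scaled_blocks = max(round(num_blocks * depth_mult), 1)  # 1 bottleneck

+    def __init__(self,
+                 arch='X',
+                 depth_mult=1.0,
+                 width_mult=1.0,
+                 depthwise=False,
+                 act='silu',
+                 trt=False,
+                 return_idx=[2, 3, 4]):
+        super(CSPDarkNet, self).__init__()
+        self.arch = arch
+        self.return_idx = return_idx
+        Conv = DWConv if depthwise else BaseConv
+        arch_setting = self.arch_settings[arch]
+        base_channels = int(arch_setting[0][0] * width_mult)
+
+        # Note: differences between the latest YOLOv5 and the original YOLOX
+        # 1. self.stem, use Conv(in YOLOv5) or Focus(in YOLOX)
+        # 2. use SPPF(in YOLOv5) or SPP(in YOLOX)
+        # 3. put SPPF before(YOLOv5) or SPP after(YOLOX) the last cspdark block's CSPLayer
+        # 4. 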
whether SPPF(SPP)'CSPLayer add shortcut, True in YOLOv5, False in YOLOX + if arch in ['P5', 'P6']: + # in the latest YOLOv5, use Conv stem, and SPPF (fast, only single spp kernal size) + self.stem = Conv( + 3, base_channels, ksize=6, stride=2, bias=False, act=act) + spp_kernal_sizes = 5 + elif arch in ['X']: + # in the original YOLOX, use Focus stem, and SPP (three spp kernal sizes) + self.stem = Focus( + 3, base_channels, ksize=3, stride=1, bias=False, act=act) + spp_kernal_sizes = (5, 9, 13) + else: + raise AttributeError("Unsupported arch type: {}".format(arch)) + + _out_channels = [base_channels] + layers_num = 1 + self.csp_dark_blocks = [] + + for i, (in_channels, out_channels, num_blocks, shortcut, + use_spp) in enumerate(arch_setting): + in_channels = int(in_channels * width_mult) + out_channels = int(out_channels * width_mult) + _out_channels.append(out_channels) + num_blocks = max(round(num_blocks * depth_mult), 1) + stage = [] + + conv_layer = self.add_sublayer( + 'layers{}.stage{}.conv_layer'.format(layers_num, i + 1), + Conv( + in_channels, out_channels, 3, 2, bias=False, act=act)) + stage.append(conv_layer) + layers_num += 1 + + if use_spp and arch in ['X']: + # in YOLOX use SPPLayer + spp_layer = self.add_sublayer( + 'layers{}.stage{}.spp_layer'.format(layers_num, i + 1), + SPPLayer( + out_channels, + out_channels, + kernel_sizes=spp_kernal_sizes, + bias=False, + act=act)) + stage.append(spp_layer) + layers_num += 1 + + csp_layer = self.add_sublayer( + 'layers{}.stage{}.csp_layer'.format(layers_num, i + 1), + CSPLayer( + out_channels, + out_channels, + num_blocks=num_blocks, + shortcut=shortcut, + depthwise=depthwise, + bias=False, + act=act)) + stage.append(csp_layer) + layers_num += 1 + + if use_spp and arch in ['P5', 'P6']: + # in latest YOLOv5 use SPPFLayer instead of SPPLayer + sppf_layer = self.add_sublayer( + 'layers{}.stage{}.sppf_layer'.format(layers_num, i + 1), + SPPFLayer( + out_channels, + out_channels, + ksize=5, + bias=False, + act=act)) + stage.append(sppf_layer) + layers_num += 1 + + self.csp_dark_blocks.append(nn.Sequential(*stage)) + + self._out_channels = [_out_channels[i] for i in self.return_idx] + self.strides = [[2, 4, 8, 16, 32, 64][i] for i in self.return_idx] + + def forward(self, inputs): + x = inputs['image'] + outputs = [] + x = self.stem(x) + for i, layer in enumerate(self.csp_dark_blocks): + x = layer(x) + if i + 1 in self.return_idx: + outputs.append(x) + return outputs + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=c, stride=s) + for c, s in zip(self._out_channels, self.strides) + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/cspresnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/cspresnet.py new file mode 100644 index 0000000000000000000000000000000000000000..5268ec835381052988b9ceaca47c89ab2755bec9 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/cspresnet.py @@ -0,0 +1,321 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from paddle.nn.initializer import Constant + +from ppdet.modeling.ops import get_act_fn +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec + +__all__ = ['CSPResNet', 'BasicBlock', 'EffectiveSELayer', 'ConvBNLayer'] + + +class ConvBNLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size=3, + stride=1, + groups=1, + padding=0, + act=None): + super(ConvBNLayer, self).__init__() + + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=padding, + groups=groups, + bias_attr=False) + + self.bn = nn.BatchNorm2D( + ch_out, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.act = get_act_fn(act) if act is None or isinstance(act, ( + str, dict)) else act + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.act(x) + + return x + + +class RepVggBlock(nn.Layer): + def __init__(self, ch_in, ch_out, act='relu', alpha=False): + super(RepVggBlock, self).__init__() + self.ch_in = ch_in + self.ch_out = ch_out + self.conv1 = ConvBNLayer( + ch_in, ch_out, 3, stride=1, padding=1, act=None) + self.conv2 = ConvBNLayer( + ch_in, ch_out, 1, stride=1, padding=0, act=None) + self.act = get_act_fn(act) if act is None or isinstance(act, ( + str, dict)) else act + if alpha: + self.alpha = self.create_parameter( + shape=[1], + attr=ParamAttr(initializer=Constant(value=1.)), + dtype="float32") + else: + self.alpha = None + + def forward(self, x): + if hasattr(self, 'conv'): + y = self.conv(x) + else: + if self.alpha: + y = self.conv1(x) + self.alpha * self.conv2(x) + else: + y = self.conv1(x) + self.conv2(x) + y = self.act(y) + return y + + def convert_to_deploy(self): + if not hasattr(self, 'conv'): + self.conv = nn.Conv2D( + in_channels=self.ch_in, + out_channels=self.ch_out, + kernel_size=3, + stride=1, + padding=1, + groups=1) + kernel, bias = self.get_equivalent_kernel_bias() + self.conv.weight.set_value(kernel) + self.conv.bias.set_value(bias) + self.__delattr__('conv1') + self.__delattr__('conv2') + + def get_equivalent_kernel_bias(self): + kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1) + kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2) + if self.alpha: + return kernel3x3 + self.alpha * self._pad_1x1_to_3x3_tensor( + kernel1x1), bias3x3 + self.alpha * bias1x1 + else: + return kernel3x3 + self._pad_1x1_to_3x3_tensor( + kernel1x1), bias3x3 + bias1x1 + + def _pad_1x1_to_3x3_tensor(self, kernel1x1): + if kernel1x1 is None: + return 0 + else: + return nn.functional.pad(kernel1x1, [1, 1, 1, 1]) + + def _fuse_bn_tensor(self, branch): + if branch is None: + return 0, 0 + kernel = branch.conv.weight + running_mean = branch.bn._mean + running_var = branch.bn._variance + gamma = branch.bn.weight + beta = branch.bn.bias + eps = branch.bn._epsilon + std = (running_var + eps).sqrt() + t = (gamma / std).reshape((-1, 1, 1, 1)) + return kernel * t, beta - running_mean * gamma / std + + +class BasicBlock(nn.Layer): + def __init__(self, + ch_in, + ch_out, + act='relu', + shortcut=True, + use_alpha=False): + super(BasicBlock, self).__init__() + assert ch_in == ch_out 
+ self.conv1 = ConvBNLayer(ch_in, ch_out, 3, stride=1, padding=1, act=act) + self.conv2 = RepVggBlock(ch_out, ch_out, act=act, alpha=use_alpha) + self.shortcut = shortcut + + def forward(self, x): + y = self.conv1(x) + y = self.conv2(y) + if self.shortcut: + return paddle.add(x, y) + else: + return y + + +class EffectiveSELayer(nn.Layer): + """ Effective Squeeze-Excitation + From `CenterMask : Real-Time Anchor-Free Instance Segmentation` - https://arxiv.org/abs/1911.06667 + """ + + def __init__(self, channels, act='hardsigmoid'): + super(EffectiveSELayer, self).__init__() + self.fc = nn.Conv2D(channels, channels, kernel_size=1, padding=0) + self.act = get_act_fn(act) if act is None or isinstance(act, ( + str, dict)) else act + + def forward(self, x): + x_se = x.mean((2, 3), keepdim=True) + x_se = self.fc(x_se) + return x * self.act(x_se) + + +class CSPResStage(nn.Layer): + def __init__(self, + block_fn, + ch_in, + ch_out, + n, + stride, + act='relu', + attn='eca', + use_alpha=False): + super(CSPResStage, self).__init__() + + ch_mid = (ch_in + ch_out) // 2 + if stride == 2: + self.conv_down = ConvBNLayer( + ch_in, ch_mid, 3, stride=2, padding=1, act=act) + else: + self.conv_down = None + self.conv1 = ConvBNLayer(ch_mid, ch_mid // 2, 1, act=act) + self.conv2 = ConvBNLayer(ch_mid, ch_mid // 2, 1, act=act) + self.blocks = nn.Sequential(*[ + block_fn( + ch_mid // 2, + ch_mid // 2, + act=act, + shortcut=True, + use_alpha=use_alpha) for i in range(n) + ]) + if attn: + self.attn = EffectiveSELayer(ch_mid, act='hardsigmoid') + else: + self.attn = None + + self.conv3 = ConvBNLayer(ch_mid, ch_out, 1, act=act) + + def forward(self, x): + if self.conv_down is not None: + x = self.conv_down(x) + y1 = self.conv1(x) + y2 = self.blocks(self.conv2(x)) + y = paddle.concat([y1, y2], axis=1) + if self.attn is not None: + y = self.attn(y) + y = self.conv3(y) + return y + + +@register +@serializable +class CSPResNet(nn.Layer): + __shared__ = ['width_mult', 'depth_mult', 'trt'] + + def __init__(self, + layers=[3, 6, 6, 3], + channels=[64, 128, 256, 512, 1024], + act='swish', + return_idx=[1, 2, 3], + depth_wise=False, + use_large_stem=False, + width_mult=1.0, + depth_mult=1.0, + trt=False, + use_checkpoint=False, + use_alpha=False, + **args): + super(CSPResNet, self).__init__() + self.use_checkpoint = use_checkpoint + channels = [max(round(c * width_mult), 1) for c in channels] + layers = [max(round(l * depth_mult), 1) for l in layers] + act = get_act_fn( + act, trt=trt) if act is None or isinstance(act, + (str, dict)) else act + + if use_large_stem: + self.stem = nn.Sequential( + ('conv1', ConvBNLayer( + 3, channels[0] // 2, 3, stride=2, padding=1, act=act)), + ('conv2', ConvBNLayer( + channels[0] // 2, + channels[0] // 2, + 3, + stride=1, + padding=1, + act=act)), ('conv3', ConvBNLayer( + channels[0] // 2, + channels[0], + 3, + stride=1, + padding=1, + act=act))) + else: + self.stem = nn.Sequential( + ('conv1', ConvBNLayer( + 3, channels[0] // 2, 3, stride=2, padding=1, act=act)), + ('conv2', ConvBNLayer( + channels[0] // 2, + channels[0], + 3, + stride=1, + padding=1, + act=act))) + + n = len(channels) - 1 + self.stages = nn.Sequential(*[(str(i), CSPResStage( + BasicBlock, + channels[i], + channels[i + 1], + layers[i], + 2, + act=act, + use_alpha=use_alpha)) for i in range(n)]) + + self._out_channels = channels[1:] + self._out_strides = [4 * 2**i for i in range(n)] + self.return_idx = return_idx + if use_checkpoint: + paddle.seed(0) + + def forward(self, inputs): + x = inputs['image'] + x = self.stem(x) + outs 
= [] + for idx, stage in enumerate(self.stages): + if self.use_checkpoint and self.training: + x = paddle.distributed.fleet.utils.recompute( + stage, x, **{"preserve_rng_state": True}) + else: + x = stage(x) + if idx in self.return_idx: + outs.append(x) + + return outs + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self._out_channels[i], stride=self._out_strides[i]) + for i in self.return_idx + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/darknet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/darknet.py new file mode 100644 index 0000000000000000000000000000000000000000..c68c65027e83e0c1b353d05c1795ed7a622438a4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/darknet.py @@ -0,0 +1,345 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register, serializable +from ppdet.modeling.ops import batch_norm, mish +from ..shape_spec import ShapeSpec + +__all__ = ['DarkNet', 'ConvBNLayer'] + + +class ConvBNLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size=3, + stride=1, + groups=1, + padding=0, + norm_type='bn', + norm_decay=0., + act="leaky", + freeze_norm=False, + data_format='NCHW', + name=''): + """ + conv + bn + activation layer + + Args: + ch_in (int): input channel + ch_out (int): output channel + filter_size (int): filter size, default 3 + stride (int): stride, default 1 + groups (int): number of groups of conv layer, default 1 + padding (int): padding size, default 0 + norm_type (str): batch norm type, default bn + norm_decay (str): decay for weight and bias of batch norm layer, default 0. 
+ act (str): activation function type, default 'leaky', which means leaky_relu + freeze_norm (bool): whether to freeze norm, default False + data_format (str): data format, NCHW or NHWC + """ + super(ConvBNLayer, self).__init__() + + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=padding, + groups=groups, + data_format=data_format, + bias_attr=False) + self.batch_norm = batch_norm( + ch_out, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format) + self.act = act + + def forward(self, inputs): + out = self.conv(inputs) + out = self.batch_norm(out) + if self.act == 'leaky': + out = F.leaky_relu(out, 0.1) + else: + out = getattr(F, self.act)(out) + return out + + +class DownSample(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size=3, + stride=2, + padding=1, + norm_type='bn', + norm_decay=0., + freeze_norm=False, + data_format='NCHW'): + """ + downsample layer + + Args: + ch_in (int): input channel + ch_out (int): output channel + filter_size (int): filter size, default 3 + stride (int): stride, default 2 + padding (int): padding size, default 1 + norm_type (str): batch norm type, default bn + norm_decay (str): decay for weight and bias of batch norm layer, default 0. + freeze_norm (bool): whether to freeze norm, default False + data_format (str): data format, NCHW or NHWC + """ + + super(DownSample, self).__init__() + + self.conv_bn_layer = ConvBNLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=filter_size, + stride=stride, + padding=padding, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format) + self.ch_out = ch_out + + def forward(self, inputs): + out = self.conv_bn_layer(inputs) + return out + + +class BasicBlock(nn.Layer): + def __init__(self, + ch_in, + ch_out, + norm_type='bn', + norm_decay=0., + freeze_norm=False, + data_format='NCHW'): + """ + BasicBlock layer of DarkNet + + Args: + ch_in (int): input channel + ch_out (int): output channel + norm_type (str): batch norm type, default bn + norm_decay (str): decay for weight and bias of batch norm layer, default 0. 
+ freeze_norm (bool): whether to freeze norm, default False + data_format (str): data format, NCHW or NHWC + """ + + super(BasicBlock, self).__init__() + + assert ch_in == ch_out and (ch_in % 2) == 0, \ + f"ch_in and ch_out should be the same even int, but got ch_in={ch_in}, ch_out={ch_out}" + # example: + # --------------{conv1} --> {conv2} + # channel route: 10-->5 --> 5-->10 + self.conv1 = ConvBNLayer( + ch_in=ch_in, + ch_out=int(ch_out / 2), + filter_size=1, + stride=1, + padding=0, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format) + self.conv2 = ConvBNLayer( + ch_in=int(ch_out / 2), + ch_out=ch_out, + filter_size=3, + stride=1, + padding=1, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format) + + def forward(self, inputs): + conv1 = self.conv1(inputs) + conv2 = self.conv2(conv1) + out = paddle.add(x=inputs, y=conv2) + return out + + +class Blocks(nn.Layer): + def __init__(self, + ch_in, + ch_out, + count, + norm_type='bn', + norm_decay=0., + freeze_norm=False, + name=None, + data_format='NCHW'): + """ + Blocks layer, which consists of several BasicBlock layers + + Args: + ch_in (int): input channel + ch_out (int): output channel + count (int): number of BasicBlock layers + norm_type (str): batch norm type, default bn + norm_decay (float): decay for weight and bias of batch norm layer, default 0. + freeze_norm (bool): whether to freeze norm, default False + name (str): layer name + data_format (str): data format, NCHW or NHWC + """ + super(Blocks, self).__init__() + + self.basicblock0 = BasicBlock( + ch_in, + ch_out, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format) + self.res_out_list = [] + for i in range(1, count): + block_name = '{}.{}'.format(name, i) + res_out = self.add_sublayer( + block_name, + BasicBlock( + ch_out, + ch_out, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format)) + self.res_out_list.append(res_out) + self.ch_out = ch_out + + def forward(self, inputs): + y = self.basicblock0(inputs) + for basic_block_i in self.res_out_list: + y = basic_block_i(y) + return y + + +DarkNet_cfg = {53: ([1, 2, 8, 8, 4])} + + +@register +@serializable +class DarkNet(nn.Layer): + __shared__ = ['norm_type', 'data_format'] + + def __init__(self, + depth=53, + freeze_at=-1, + return_idx=[2, 3, 4], + num_stages=5, + norm_type='bn', + norm_decay=0., + freeze_norm=False, + data_format='NCHW'): + """ + Darknet, see https://pjreddie.com/darknet/yolo/ + + Args: + depth (int): depth of network + freeze_at (int): freeze the backbone at which stage + return_idx (list): index of stages whose feature maps are returned + num_stages (int): number of stages whose layers are built, default 5 + norm_type (str): batch norm type, default bn + norm_decay (float): decay for weight and bias of batch norm layer, default 0. + freeze_norm (bool): whether to freeze norm, default False
+ data_format (str): data format, NCHW or NHWC + """ + super(DarkNet, self).__init__() + self.depth = depth + self.freeze_at = freeze_at + self.return_idx = return_idx + self.num_stages = num_stages + self.stages = DarkNet_cfg[self.depth][0:num_stages] + + self.conv0 = ConvBNLayer( + ch_in=3, + ch_out=32, + filter_size=3, + stride=1, + padding=1, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format) + + self.downsample0 = DownSample( + ch_in=32, + ch_out=32 * 2, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format) + + self._out_channels = [] + self.darknet_conv_block_list = [] + self.downsample_list = [] + ch_in = [64, 128, 256, 512, 1024] + for i, stage in enumerate(self.stages): + name = 'stage.{}'.format(i) + conv_block = self.add_sublayer( + name, + Blocks( + int(ch_in[i]), + int(ch_in[i]), + stage, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format, + name=name)) + self.darknet_conv_block_list.append(conv_block) + if i in return_idx: + self._out_channels.append(int(ch_in[i])) + for i in range(num_stages - 1): + down_name = 'stage.{}.downsample'.format(i) + downsample = self.add_sublayer( + down_name, + DownSample( + ch_in=int(ch_in[i]), + ch_out=int(ch_in[i + 1]), + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + data_format=data_format)) + self.downsample_list.append(downsample) + + def forward(self, inputs): + x = inputs['image'] + + out = self.conv0(x) + out = self.downsample0(out) + blocks = [] + for i, conv_block_i in enumerate(self.darknet_conv_block_list): + out = conv_block_i(out) + if i == self.freeze_at: + out.stop_gradient = True + if i in self.return_idx: + blocks.append(out) + if i < self.num_stages - 1: + out = self.downsample_list[i](out) + return blocks + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/dla.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/dla.py new file mode 100644 index 0000000000000000000000000000000000000000..51d1f0782f760839b5320e272a72ca765f47fd79 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/dla.py @@ -0,0 +1,283 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
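+# Deep Layer Aggregation (DLA) backbone, see https://arxiv.org/pdf/1707.06484.pdf.
+# Like the other backbones in this directory, DLA consumes inputs['image'] and
+# returns one feature map per level; with the stride-2 downsampling in level1
+# and in each Tree stage, the six outputs have strides of roughly 1, 2, 4, 8,
+# 16 and 32 relative to the input. Illustrative usage only (input shape is an
+# assumption, not taken from a config):
+#
+#   >>> dla = DLA(depth=34)
+#   >>> feats = dla({'image': paddle.rand([1, 3, 512, 512])})
+#   >>> [f.shape[1] for f in feats]     # per-level channel counts
+#   [16, 32, 64, 128, 256, 512]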
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import ConvNormLayer +from ..shape_spec import ShapeSpec + +DLA_cfg = {34: ([1, 1, 1, 2, 2, 1], [16, 32, 64, 128, 256, 512]), } + + +class BasicBlock(nn.Layer): + def __init__(self, ch_in, ch_out, stride=1): + super(BasicBlock, self).__init__() + self.conv1 = ConvNormLayer( + ch_in, + ch_out, + filter_size=3, + stride=stride, + bias_on=False, + norm_decay=None) + self.conv2 = ConvNormLayer( + ch_out, + ch_out, + filter_size=3, + stride=1, + bias_on=False, + norm_decay=None) + + def forward(self, inputs, residual=None): + if residual is None: + residual = inputs + + out = self.conv1(inputs) + out = F.relu(out) + + out = self.conv2(out) + + out = paddle.add(x=out, y=residual) + out = F.relu(out) + + return out + + +class Root(nn.Layer): + def __init__(self, ch_in, ch_out, kernel_size, residual): + super(Root, self).__init__() + self.conv = ConvNormLayer( + ch_in, + ch_out, + filter_size=1, + stride=1, + bias_on=False, + norm_decay=None) + self.residual = residual + + def forward(self, inputs): + children = inputs + out = self.conv(paddle.concat(inputs, axis=1)) + if self.residual: + out = paddle.add(x=out, y=children[0]) + out = F.relu(out) + + return out + + +class Tree(nn.Layer): + def __init__(self, + level, + block, + ch_in, + ch_out, + stride=1, + level_root=False, + root_dim=0, + root_kernel_size=1, + root_residual=False): + super(Tree, self).__init__() + if root_dim == 0: + root_dim = 2 * ch_out + if level_root: + root_dim += ch_in + if level == 1: + self.tree1 = block(ch_in, ch_out, stride) + self.tree2 = block(ch_out, ch_out, 1) + else: + self.tree1 = Tree( + level - 1, + block, + ch_in, + ch_out, + stride, + root_dim=0, + root_kernel_size=root_kernel_size, + root_residual=root_residual) + self.tree2 = Tree( + level - 1, + block, + ch_out, + ch_out, + 1, + root_dim=root_dim + ch_out, + root_kernel_size=root_kernel_size, + root_residual=root_residual) + + if level == 1: + self.root = Root(root_dim, ch_out, root_kernel_size, root_residual) + self.level_root = level_root + self.root_dim = root_dim + self.downsample = None + self.project = None + self.level = level + if stride > 1: + self.downsample = nn.MaxPool2D(stride, stride=stride) + if ch_in != ch_out: + self.project = ConvNormLayer( + ch_in, + ch_out, + filter_size=1, + stride=1, + bias_on=False, + norm_decay=None) + + def forward(self, x, residual=None, children=None): + children = [] if children is None else children + bottom = self.downsample(x) if self.downsample else x + residual = self.project(bottom) if self.project else bottom + if self.level_root: + children.append(bottom) + x1 = self.tree1(x, residual) + if self.level == 1: + x2 = self.tree2(x1) + x = self.root([x2, x1] + children) + else: + children.append(x1) + x = self.tree2(x1, children=children) + return x + + +@register +@serializable +class DLA(nn.Layer): + """ + DLA, see https://arxiv.org/pdf/1707.06484.pdf + + Args: + depth (int): DLA depth, only 34 is supported now + residual_root (bool): whether to use a residual layer in the root block + pre_img (bool): whether to add the pre_img branch, only used in CenterTrack + pre_hm (bool): whether to add the pre_hm branch, only used in CenterTrack + """ + + def __init__(self, + depth=34, + residual_root=False, + pre_img=False, + pre_hm=False): + super(DLA, self).__init__() + assert depth == 34, 'Only DLA with depth 34 is supported now.'
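+        # For DLA-34, DLA_cfg unpacks below into `levels` = [1, 1, 1, 2, 2, 1]
+        # (conv repeats for level0/level1, tree depths for level2-level5) and
+        # `channels` = [16, 32, 64, 128, 256, 512], the per-level widths.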
+ if depth == 34: + block = BasicBlock + levels, channels = DLA_cfg[depth] + self.channels = channels + self.num_levels = len(levels) + + self.base_layer = nn.Sequential( + ConvNormLayer( + 3, + channels[0], + filter_size=7, + stride=1, + bias_on=False, + norm_decay=None), + nn.ReLU()) + self.level0 = self._make_conv_level(channels[0], channels[0], levels[0]) + self.level1 = self._make_conv_level( + channels[0], channels[1], levels[1], stride=2) + self.level2 = Tree( + levels[2], + block, + channels[1], + channels[2], + 2, + level_root=False, + root_residual=residual_root) + self.level3 = Tree( + levels[3], + block, + channels[2], + channels[3], + 2, + level_root=True, + root_residual=residual_root) + self.level4 = Tree( + levels[4], + block, + channels[3], + channels[4], + 2, + level_root=True, + root_residual=residual_root) + self.level5 = Tree( + levels[5], + block, + channels[4], + channels[5], + 2, + level_root=True, + root_residual=residual_root) + + if pre_img: + self.pre_img_layer = nn.Sequential( + ConvNormLayer( + 3, + channels[0], + filter_size=7, + stride=1, + bias_on=False, + norm_decay=None), + nn.ReLU()) + if pre_hm: + self.pre_hm_layer = nn.Sequential( + ConvNormLayer( + 1, + channels[0], + filter_size=7, + stride=1, + bias_on=False, + norm_decay=None), + nn.ReLU()) + self.pre_img = pre_img + self.pre_hm = pre_hm + + def _make_conv_level(self, ch_in, ch_out, conv_num, stride=1): + modules = [] + for i in range(conv_num): + modules.extend([ + ConvNormLayer( + ch_in, + ch_out, + filter_size=3, + stride=stride if i == 0 else 1, + bias_on=False, + norm_decay=None), nn.ReLU() + ]) + ch_in = ch_out + return nn.Sequential(*modules) + + @property + def out_shape(self): + return [ + ShapeSpec(channels=self.channels[i]) for i in range(self.num_levels) + ] + + def forward(self, inputs): + outs = [] + feats = self.base_layer(inputs['image']) + + if self.pre_img and 'pre_image' in inputs and inputs[ + 'pre_image'] is not None: + feats = feats + self.pre_img_layer(inputs['pre_image']) + + if self.pre_hm and 'pre_hm' in inputs and inputs['pre_hm'] is not None: + feats = feats + self.pre_hm_layer(inputs['pre_hm']) + + for i in range(self.num_levels): + feats = getattr(self, 'level{}'.format(i))(feats) + outs.append(feats) + + return outs diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/esnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/esnet.py new file mode 100644 index 0000000000000000000000000000000000000000..2b3f3c54a2f3bbc50caba2a86dd82b96bf689ba4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/esnet.py @@ -0,0 +1,290 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
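+# ESNet, the lightweight backbone used by the PicoDet models.
+# make_divisible() below rounds scaled channel counts to a multiple of
+# `divisor` (default 16) while keeping at least 90% of the requested width.
+# A worked example, illustrative only:
+#
+#   make_divisible(128 * 0.7)                     # v = 89.6
+#   -> max(16, int(89.6 + 8) // 16 * 16) = 96     # kept: 96 >= 0.9 * 89.6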
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn import Conv2D, MaxPool2D, AdaptiveAvgPool2D, BatchNorm +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + +from ppdet.core.workspace import register, serializable +from numbers import Integral +from ..shape_spec import ShapeSpec +from ppdet.modeling.ops import channel_shuffle +from ppdet.modeling.backbones.shufflenet_v2 import ConvBNLayer + +__all__ = ['ESNet'] + + +def make_divisible(v, divisor=16, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class SEModule(nn.Layer): + def __init__(self, channel, reduction=4): + super(SEModule, self).__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D( + in_channels=channel, + out_channels=channel // reduction, + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr(), + bias_attr=ParamAttr()) + self.conv2 = Conv2D( + in_channels=channel // reduction, + out_channels=channel, + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr(), + bias_attr=ParamAttr()) + + def forward(self, inputs): + outputs = self.avg_pool(inputs) + outputs = self.conv1(outputs) + outputs = F.relu(outputs) + outputs = self.conv2(outputs) + outputs = F.hardsigmoid(outputs) + return paddle.multiply(x=inputs, y=outputs) + + +class InvertedResidual(nn.Layer): + def __init__(self, + in_channels, + mid_channels, + out_channels, + stride, + act="relu"): + super(InvertedResidual, self).__init__() + self._conv_pw = ConvBNLayer( + in_channels=in_channels // 2, + out_channels=mid_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + self._conv_dw = ConvBNLayer( + in_channels=mid_channels // 2, + out_channels=mid_channels // 2, + kernel_size=3, + stride=stride, + padding=1, + groups=mid_channels // 2, + act=None) + self._se = SEModule(mid_channels) + + self._conv_linear = ConvBNLayer( + in_channels=mid_channels, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + + def forward(self, inputs): + x1, x2 = paddle.split( + inputs, + num_or_sections=[inputs.shape[1] // 2, inputs.shape[1] // 2], + axis=1) + x2 = self._conv_pw(x2) + x3 = self._conv_dw(x2) + x3 = paddle.concat([x2, x3], axis=1) + x3 = self._se(x3) + x3 = self._conv_linear(x3) + out = paddle.concat([x1, x3], axis=1) + return channel_shuffle(out, 2) + + +class InvertedResidualDS(nn.Layer): + def __init__(self, + in_channels, + mid_channels, + out_channels, + stride, + act="relu"): + super(InvertedResidualDS, self).__init__() + + # branch1 + self._conv_dw_1 = ConvBNLayer( + in_channels=in_channels, + out_channels=in_channels, + kernel_size=3, + stride=stride, + padding=1, + groups=in_channels, + act=None) + self._conv_linear_1 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + # branch2 + self._conv_pw_2 = ConvBNLayer( + in_channels=in_channels, + out_channels=mid_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + self._conv_dw_2 = ConvBNLayer( + in_channels=mid_channels // 2, + out_channels=mid_channels // 2, + kernel_size=3, + stride=stride, + padding=1, + groups=mid_channels // 2, + 
act=None) + self._se = SEModule(mid_channels // 2) + self._conv_linear_2 = ConvBNLayer( + in_channels=mid_channels // 2, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + self._conv_dw_mv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=1, + padding=1, + groups=out_channels, + act="hard_swish") + self._conv_pw_mv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act="hard_swish") + + def forward(self, inputs): + x1 = self._conv_dw_1(inputs) + x1 = self._conv_linear_1(x1) + x2 = self._conv_pw_2(inputs) + x2 = self._conv_dw_2(x2) + x2 = self._se(x2) + x2 = self._conv_linear_2(x2) + out = paddle.concat([x1, x2], axis=1) + out = self._conv_dw_mv1(out) + out = self._conv_pw_mv1(out) + + return out + + +@register +@serializable +class ESNet(nn.Layer): + def __init__(self, + scale=1.0, + act="hard_swish", + feature_maps=[4, 11, 14], + channel_ratio=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]): + super(ESNet, self).__init__() + self.scale = scale + if isinstance(feature_maps, Integral): + feature_maps = [feature_maps] + self.feature_maps = feature_maps + stage_repeats = [3, 7, 3] + + stage_out_channels = [ + -1, 24, make_divisible(128 * scale), make_divisible(256 * scale), + make_divisible(512 * scale), 1024 + ] + + self._out_channels = [] + self._feature_idx = 0 + # 1. conv1 + self._conv1 = ConvBNLayer( + in_channels=3, + out_channels=stage_out_channels[1], + kernel_size=3, + stride=2, + padding=1, + act=act) + self._max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1) + self._feature_idx += 1 + + # 2. bottleneck sequences + self._block_list = [] + arch_idx = 0 + for stage_id, num_repeat in enumerate(stage_repeats): + for i in range(num_repeat): + channels_scales = channel_ratio[arch_idx] + mid_c = make_divisible( + int(stage_out_channels[stage_id + 2] * channels_scales), + divisor=8) + if i == 0: + block = self.add_sublayer( + name=str(stage_id + 2) + '_' + str(i + 1), + sublayer=InvertedResidualDS( + in_channels=stage_out_channels[stage_id + 1], + mid_channels=mid_c, + out_channels=stage_out_channels[stage_id + 2], + stride=2, + act=act)) + else: + block = self.add_sublayer( + name=str(stage_id + 2) + '_' + str(i + 1), + sublayer=InvertedResidual( + in_channels=stage_out_channels[stage_id + 2], + mid_channels=mid_c, + out_channels=stage_out_channels[stage_id + 2], + stride=1, + act=act)) + self._block_list.append(block) + arch_idx += 1 + self._feature_idx += 1 + self._update_out_channels(stage_out_channels[stage_id + 2], + self._feature_idx, self.feature_maps) + + def _update_out_channels(self, channel, feature_idx, feature_maps): + if feature_idx in feature_maps: + self._out_channels.append(channel) + + def forward(self, inputs): + y = self._conv1(inputs['image']) + y = self._max_pool(y) + outs = [] + for i, inv in enumerate(self._block_list): + y = inv(y) + if i + 2 in self.feature_maps: + outs.append(y) + + return outs + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/ghostnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/ghostnet.py new file mode 100644 index 0000000000000000000000000000000000000000..cd333b4fe0f23be4df85974a7af5744d9641a1e7 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/ghostnet.py @@ -0,0 +1,470 @@ +# copyright (c) 2021 PaddlePaddle Authors. 
All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import paddle +from paddle import ParamAttr +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import AdaptiveAvgPool2D, Linear +from paddle.nn.initializer import Uniform + +from ppdet.core.workspace import register, serializable +from numbers import Integral +from ..shape_spec import ShapeSpec +from .mobilenet_v3 import make_divisible, ConvBNLayer + +__all__ = ['GhostNet'] + + +class ExtraBlockDW(nn.Layer): + def __init__(self, + in_c, + ch_1, + ch_2, + stride, + lr_mult, + conv_decay=0., + norm_type='bn', + norm_decay=0., + freeze_norm=False, + name=None): + super(ExtraBlockDW, self).__init__() + self.pointwise_conv = ConvBNLayer( + in_c=in_c, + out_c=ch_1, + filter_size=1, + stride=1, + padding=0, + act='relu6', + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_extra1") + self.depthwise_conv = ConvBNLayer( + in_c=ch_1, + out_c=ch_2, + filter_size=3, + stride=stride, + padding=1, # + num_groups=int(ch_1), + act='relu6', + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_extra2_dw") + self.normal_conv = ConvBNLayer( + in_c=ch_2, + out_c=ch_2, + filter_size=1, + stride=1, + padding=0, + act='relu6', + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_extra2_sep") + + def forward(self, inputs): + x = self.pointwise_conv(inputs) + x = self.depthwise_conv(x) + x = self.normal_conv(x) + return x + + +class SEBlock(nn.Layer): + def __init__(self, num_channels, lr_mult, reduction_ratio=4, name=None): + super(SEBlock, self).__init__() + self.pool2d_gap = AdaptiveAvgPool2D(1) + self._num_channels = num_channels + stdv = 1.0 / math.sqrt(num_channels * 1.0) + med_ch = num_channels // reduction_ratio + self.squeeze = Linear( + num_channels, + med_ch, + weight_attr=ParamAttr( + learning_rate=lr_mult, initializer=Uniform(-stdv, stdv)), + bias_attr=ParamAttr(learning_rate=lr_mult)) + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = Linear( + med_ch, + num_channels, + weight_attr=ParamAttr( + learning_rate=lr_mult, initializer=Uniform(-stdv, stdv)), + bias_attr=ParamAttr(learning_rate=lr_mult)) + + def forward(self, inputs): + pool = self.pool2d_gap(inputs) + pool = paddle.squeeze(pool, axis=[2, 3]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = paddle.clip(x=excitation, min=0, max=1) + excitation = paddle.unsqueeze(excitation, axis=[2, 3]) + out = paddle.multiply(inputs, excitation) + return out + + +class GhostModule(nn.Layer): + def __init__(self, + in_channels, + output_channels, + kernel_size=1, + ratio=2, + dw_size=3, + stride=1, + relu=True, + lr_mult=1., + conv_decay=0., + norm_type='bn', + norm_decay=0., + freeze_norm=False, + name=None): + super(GhostModule, 
self).__init__() + init_channels = int(math.ceil(output_channels / ratio)) + new_channels = int(init_channels * (ratio - 1)) + self.primary_conv = ConvBNLayer( + in_c=in_channels, + out_c=init_channels, + filter_size=kernel_size, + stride=stride, + padding=int((kernel_size - 1) // 2), + num_groups=1, + act="relu" if relu else None, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_primary_conv") + self.cheap_operation = ConvBNLayer( + in_c=init_channels, + out_c=new_channels, + filter_size=dw_size, + stride=1, + padding=int((dw_size - 1) // 2), + num_groups=init_channels, + act="relu" if relu else None, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_cheap_operation") + + def forward(self, inputs): + x = self.primary_conv(inputs) + y = self.cheap_operation(x) + out = paddle.concat([x, y], axis=1) + return out + + +class GhostBottleneck(nn.Layer): + def __init__(self, + in_channels, + hidden_dim, + output_channels, + kernel_size, + stride, + use_se, + lr_mult, + conv_decay=0., + norm_type='bn', + norm_decay=0., + freeze_norm=False, + return_list=False, + name=None): + super(GhostBottleneck, self).__init__() + self._stride = stride + self._use_se = use_se + self._num_channels = in_channels + self._output_channels = output_channels + self.return_list = return_list + + self.ghost_module_1 = GhostModule( + in_channels=in_channels, + output_channels=hidden_dim, + kernel_size=1, + stride=1, + relu=True, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_ghost_module_1") + if stride == 2: + self.depthwise_conv = ConvBNLayer( + in_c=hidden_dim, + out_c=hidden_dim, + filter_size=kernel_size, + stride=stride, + padding=int((kernel_size - 1) // 2), + num_groups=hidden_dim, + act=None, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + + "_depthwise_depthwise" # looks strange due to an old typo, will be fixed later. + ) + if use_se: + self.se_block = SEBlock(hidden_dim, lr_mult, name=name + "_se") + self.ghost_module_2 = GhostModule( + in_channels=hidden_dim, + output_channels=output_channels, + kernel_size=1, + relu=False, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_ghost_module_2") + if stride != 1 or in_channels != output_channels: + self.shortcut_depthwise = ConvBNLayer( + in_c=in_channels, + out_c=in_channels, + filter_size=kernel_size, + stride=stride, + padding=int((kernel_size - 1) // 2), + num_groups=in_channels, + act=None, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + + "_shortcut_depthwise_depthwise" # looks strange due to an old typo, will be fixed later. 
+ ) + self.shortcut_conv = ConvBNLayer( + in_c=in_channels, + out_c=output_channels, + filter_size=1, + stride=1, + padding=0, + num_groups=1, + act=None, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_shortcut_conv") + + def forward(self, inputs): + y = self.ghost_module_1(inputs) + x = y + if self._stride == 2: + x = self.depthwise_conv(x) + if self._use_se: + x = self.se_block(x) + x = self.ghost_module_2(x) + + if self._stride == 1 and self._num_channels == self._output_channels: + shortcut = inputs + else: + shortcut = self.shortcut_depthwise(inputs) + shortcut = self.shortcut_conv(shortcut) + x = paddle.add(x=x, y=shortcut) + + if self.return_list: + return [y, x] + else: + return x + + +@register +@serializable +class GhostNet(nn.Layer): + __shared__ = ['norm_type'] + + def __init__( + self, + scale=1.3, + feature_maps=[6, 12, 15], + with_extra_blocks=False, + extra_block_filters=[[256, 512], [128, 256], [128, 256], [64, 128]], + lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0], + conv_decay=0., + norm_type='bn', + norm_decay=0.0, + freeze_norm=False): + super(GhostNet, self).__init__() + if isinstance(feature_maps, Integral): + feature_maps = [feature_maps] + if norm_type == 'sync_bn' and freeze_norm: + raise ValueError( + "The norm_type should not be sync_bn when freeze_norm is True") + self.feature_maps = feature_maps + self.with_extra_blocks = with_extra_blocks + self.extra_block_filters = extra_block_filters + + inplanes = 16 + self.cfgs = [ + # k, t, c, SE, s + [3, 16, 16, 0, 1], + [3, 48, 24, 0, 2], + [3, 72, 24, 0, 1], + [5, 72, 40, 1, 2], + [5, 120, 40, 1, 1], + [3, 240, 80, 0, 2], + [3, 200, 80, 0, 1], + [3, 184, 80, 0, 1], + [3, 184, 80, 0, 1], + [3, 480, 112, 1, 1], + [3, 672, 112, 1, 1], + [5, 672, 160, 1, 2], # SSDLite output + [5, 960, 160, 0, 1], + [5, 960, 160, 1, 1], + [5, 960, 160, 0, 1], + [5, 960, 160, 1, 1] + ] + self.scale = scale + conv1_out_ch = int(make_divisible(inplanes * self.scale, 4)) + self.conv1 = ConvBNLayer( + in_c=3, + out_c=conv1_out_ch, + filter_size=3, + stride=2, + padding=1, + num_groups=1, + act="relu", + lr_mult=1., + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="conv1") + + # build inverted residual blocks + self._out_channels = [] + self.ghost_bottleneck_list = [] + idx = 0 + inplanes = conv1_out_ch + for k, exp_size, c, use_se, s in self.cfgs: + lr_idx = min(idx // 3, len(lr_mult_list) - 1) + lr_mult = lr_mult_list[lr_idx] + + # for SSD/SSDLite, first head input is after ResidualUnit expand_conv + return_list = self.with_extra_blocks and idx + 2 in self.feature_maps + + ghost_bottleneck = self.add_sublayer( + "_ghostbottleneck_" + str(idx), + sublayer=GhostBottleneck( + in_channels=inplanes, + hidden_dim=int(make_divisible(exp_size * self.scale, 4)), + output_channels=int(make_divisible(c * self.scale, 4)), + kernel_size=k, + stride=s, + use_se=use_se, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + return_list=return_list, + name="_ghostbottleneck_" + str(idx))) + self.ghost_bottleneck_list.append(ghost_bottleneck) + inplanes = int(make_divisible(c * self.scale, 4)) + idx += 1 + self._update_out_channels( + int(make_divisible(exp_size * self.scale, 4)) + if return_list else inplanes, idx + 1, feature_maps) + + if self.with_extra_blocks: + self.extra_block_list = [] + extra_out_c = int(make_divisible(self.scale * self.cfgs[-1][1], 
4)) + lr_idx = min(idx // 3, len(lr_mult_list) - 1) + lr_mult = lr_mult_list[lr_idx] + + conv_extra = self.add_sublayer( + "conv" + str(idx + 2), + sublayer=ConvBNLayer( + in_c=inplanes, + out_c=extra_out_c, + filter_size=1, + stride=1, + padding=0, + num_groups=1, + act="relu6", + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="conv" + str(idx + 2))) + self.extra_block_list.append(conv_extra) + idx += 1 + self._update_out_channels(extra_out_c, idx + 1, feature_maps) + + for j, block_filter in enumerate(self.extra_block_filters): + in_c = extra_out_c if j == 0 else self.extra_block_filters[j - + 1][1] + conv_extra = self.add_sublayer( + "conv" + str(idx + 2), + sublayer=ExtraBlockDW( + in_c, + block_filter[0], + block_filter[1], + stride=2, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name='conv' + str(idx + 2))) + self.extra_block_list.append(conv_extra) + idx += 1 + self._update_out_channels(block_filter[1], idx + 1, + feature_maps) + + def _update_out_channels(self, channel, feature_idx, feature_maps): + if feature_idx in feature_maps: + self._out_channels.append(channel) + + def forward(self, inputs): + x = self.conv1(inputs['image']) + outs = [] + for idx, ghost_bottleneck in enumerate(self.ghost_bottleneck_list): + x = ghost_bottleneck(x) + if idx + 2 in self.feature_maps: + if isinstance(x, list): + outs.append(x[0]) + x = x[1] + else: + outs.append(x) + + if not self.with_extra_blocks: + return outs + + for i, block in enumerate(self.extra_block_list): + idx = i + len(self.ghost_bottleneck_list) + x = block(x) + if idx + 2 in self.feature_maps: + outs.append(x) + return outs + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/hardnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/hardnet.py new file mode 100644 index 0000000000000000000000000000000000000000..8615fb6a67f316cace6f2e9fb0132becf52f2d71 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/hardnet.py @@ -0,0 +1,226 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
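+# HarDNet (Harmonic DenseNet), see https://arxiv.org/abs/1909.00948.
+# In HarDBlock.get_link() below, layer k takes input from layers k - 2**i for
+# every i with k % 2**i == 0, and its output width grows by a factor of
+# `grmul` for each extra hop. A worked example, illustrative only, with the
+# arch-68 settings growth_rate=14 and grmul=1.7:
+#
+#   layer 4: links = [3, 2, 0]               # 4 - 1, 4 - 2, 4 - 4
+#   out_ch  = 14 * 1.7**2 = 40.46 -> 40      # rounded to an even integer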
+ +import paddle +import paddle.nn as nn +from ppdet.core.workspace import register +from ..shape_spec import ShapeSpec + +__all__ = ['HarDNet'] + + +def ConvLayer(in_channels, + out_channels, + kernel_size=3, + stride=1, + bias_attr=False): + layer = nn.Sequential( + ('conv', nn.Conv2D( + in_channels, + out_channels, + kernel_size=kernel_size, + stride=stride, + padding=kernel_size // 2, + groups=1, + bias_attr=bias_attr)), ('norm', nn.BatchNorm2D(out_channels)), + ('relu', nn.ReLU6())) + return layer + + +def DWConvLayer(in_channels, + out_channels, + kernel_size=3, + stride=1, + bias_attr=False): + layer = nn.Sequential( + ('dwconv', nn.Conv2D( + in_channels, + out_channels, + kernel_size=kernel_size, + stride=stride, + padding=1, + groups=out_channels, + bias_attr=bias_attr)), ('norm', nn.BatchNorm2D(out_channels))) + return layer + + +def CombConvLayer(in_channels, out_channels, kernel_size=1, stride=1): + layer = nn.Sequential( + ('layer1', ConvLayer( + in_channels, out_channels, kernel_size=kernel_size)), + ('layer2', DWConvLayer( + out_channels, out_channels, stride=stride))) + return layer + + +class HarDBlock(nn.Layer): + def __init__(self, + in_channels, + growth_rate, + grmul, + n_layers, + keepBase=False, + residual_out=False, + dwconv=False): + super().__init__() + self.keepBase = keepBase + self.links = [] + layers_ = [] + self.out_channels = 0 + for i in range(n_layers): + outch, inch, link = self.get_link(i + 1, in_channels, growth_rate, + grmul) + self.links.append(link) + if dwconv: + layers_.append(CombConvLayer(inch, outch)) + else: + layers_.append(ConvLayer(inch, outch)) + + if (i % 2 == 0) or (i == n_layers - 1): + self.out_channels += outch + self.layers = nn.LayerList(layers_) + + def get_out_ch(self): + return self.out_channels + + def get_link(self, layer, base_ch, growth_rate, grmul): + if layer == 0: + return base_ch, 0, [] + out_channels = growth_rate + + link = [] + for i in range(10): + dv = 2**i + if layer % dv == 0: + k = layer - dv + link.append(k) + if i > 0: + out_channels *= grmul + + out_channels = int(int(out_channels + 1) / 2) * 2 + in_channels = 0 + + for i in link: + ch, _, _ = self.get_link(i, base_ch, growth_rate, grmul) + in_channels += ch + + return out_channels, in_channels, link + + def forward(self, x): + layers_ = [x] + + for layer in range(len(self.layers)): + link = self.links[layer] + tin = [] + for i in link: + tin.append(layers_[i]) + if len(tin) > 1: + x = paddle.concat(tin, 1) + else: + x = tin[0] + out = self.layers[layer](x) + layers_.append(out) + + t = len(layers_) + out_ = [] + for i in range(t): + if (i == 0 and self.keepBase) or (i == t - 1) or (i % 2 == 1): + out_.append(layers_[i]) + out = paddle.concat(out_, 1) + + return out + + +@register +class HarDNet(nn.Layer): + def __init__(self, depth_wise=False, return_idx=[1, 3, 8, 13], arch=85): + super(HarDNet, self).__init__() + assert arch in [68, 85], "HarDNet-{} is not supported.".format(arch) + if arch == 85: + first_ch = [48, 96] + second_kernel = 3 + ch_list = [192, 256, 320, 480, 720] + grmul = 1.7 + gr = [24, 24, 28, 36, 48] + n_layers = [8, 16, 16, 16, 16] + elif arch == 68: + first_ch = [32, 64] + second_kernel = 3 + ch_list = [128, 256, 320, 640] + grmul = 1.7 + gr = [14, 16, 20, 40] + n_layers = [8, 16, 16, 16] + else: + raise ValueError("HarDNet-{} is not supported.".format(arch)) + + self.return_idx = return_idx + self._out_channels = [96, 214, 458, 784] + + avg_pool = True + if depth_wise: + second_kernel = 1 + avg_pool = False + + blks = len(n_layers) + 
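+        # The stem assembled below is: a 3x3 stride-2 conv, a second conv
+        # (1x1 when depth_wise is set), then a stride-2 average pool or a
+        # stride-2 depthwise conv as the second downsampling step; the
+        # HarDBlocks and 1x1 transition convs follow, with a further average
+        # pool after the first and third blocks.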
self.base = nn.LayerList([]) + + # First Layer: Standard Conv3x3, Stride=2 + self.base.append( + ConvLayer( + in_channels=3, + out_channels=first_ch[0], + kernel_size=3, + stride=2, + bias_attr=False)) + + # Second Layer + self.base.append( + ConvLayer( + first_ch[0], first_ch[1], kernel_size=second_kernel)) + + # Avgpooling or DWConv3x3 downsampling + if avg_pool: + self.base.append(nn.AvgPool2D(kernel_size=3, stride=2, padding=1)) + else: + self.base.append(DWConvLayer(first_ch[1], first_ch[1], stride=2)) + + # Build all HarDNet blocks + ch = first_ch[1] + for i in range(blks): + blk = HarDBlock(ch, gr[i], grmul, n_layers[i], dwconv=depth_wise) + ch = blk.out_channels + self.base.append(blk) + + if i != blks - 1: + self.base.append(ConvLayer(ch, ch_list[i], kernel_size=1)) + ch = ch_list[i] + if i == 0: + self.base.append( + nn.AvgPool2D( + kernel_size=2, stride=2, ceil_mode=True)) + elif i != blks - 1 and i != 1 and i != 3: + self.base.append(nn.AvgPool2D(kernel_size=2, stride=2)) + + def forward(self, inputs): + x = inputs['image'] + outs = [] + for i, layer in enumerate(self.base): + x = layer(x) + if i in self.return_idx: + outs.append(x) + return outs + + @property + def out_shape(self): + return [ShapeSpec(channels=self._out_channels[i]) for i in range(4)] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/hrnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/hrnet.py new file mode 100644 index 0000000000000000000000000000000000000000..977edd69e96a90c6d62dcdeb3b4a49c4f5daff4c --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/hrnet.py @@ -0,0 +1,869 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import AdaptiveAvgPool2D, Linear +from paddle.regularizer import L2Decay +from paddle import ParamAttr +from paddle.nn.initializer import Normal, Uniform +from numbers import Integral +import math + +from ppdet.core.workspace import register +from ..shape_spec import ShapeSpec + +__all__ = ['HRNet'] + + +class ConvNormLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size, + stride=1, + norm_type='bn', + norm_groups=32, + use_dcn=False, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=False, + act=None, + name=None): + super(ConvNormLayer, self).__init__() + assert norm_type in ['bn', 'sync_bn', 'gn'] + + self.act = act + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=False) + + norm_lr = 0. if freeze_norm else 1. 
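+        # When freeze_norm is set, the norm layer's scale and bias get a zero
+        # learning rate, BatchNorm runs with use_global_stats=True, and
+        # stop_gradient is switched on below, so the normalization acts as a
+        # fixed affine transform during training.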
+ + param_attr = ParamAttr( + learning_rate=norm_lr, regularizer=L2Decay(norm_decay)) + bias_attr = ParamAttr( + learning_rate=norm_lr, regularizer=L2Decay(norm_decay)) + global_stats = True if freeze_norm else None + if norm_type in ['bn', 'sync_bn']: + self.norm = nn.BatchNorm2D( + ch_out, + momentum=norm_momentum, + weight_attr=param_attr, + bias_attr=bias_attr, + use_global_stats=global_stats) + elif norm_type == 'gn': + self.norm = nn.GroupNorm( + num_groups=norm_groups, + num_channels=ch_out, + weight_attr=param_attr, + bias_attr=bias_attr) + norm_params = self.norm.parameters() + if freeze_norm: + for param in norm_params: + param.stop_gradient = True + + def forward(self, inputs): + out = self.conv(inputs) + out = self.norm(out) + + if self.act == 'relu': + out = F.relu(out) + return out + + +class Layer1(nn.Layer): + def __init__(self, + num_channels, + has_se=False, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + name=None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(4): + bottleneck_block = self.add_sublayer( + "block_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else 256, + num_filters=64, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, input): + conv = input + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, + in_channels, + out_channels, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + name=None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + out = [] + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + ConvNormLayer( + ch_in=in_channels[i], + ch_out=out_channels[i], + filter_size=3, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act='relu', + name=name + '_layer_' + str(i + 1))) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + ConvNormLayer( + ch_in=in_channels[-1], + ch_out=out_channels[i], + filter_size=3, + stride=2, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act='relu', + name=name + '_layer_' + str(i + 1))) + self.conv_bn_func_list.append(residual) + + def forward(self, input): + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(input[idx]) + else: + if idx < len(input): + outs.append(conv_bn_func(input[idx])) + else: + outs.append(conv_bn_func(input[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, + block_num, + in_channels, + out_channels, + has_se=False, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + name=None): + super(Branches, self).__init__() + + self.basic_block_list = [] + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(block_num): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + 
norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + '_branch_layer_' + str(i + 1) + '_' + + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, inputs): + outs = [] + for idx, input in enumerate(inputs): + conv = input + basic_block_list = self.basic_block_list[idx] + for basic_block_func in basic_block_list: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels, + num_filters, + has_se, + stride=1, + downsample=False, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + name=None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = ConvNormLayer( + ch_in=num_channels, + ch_out=num_filters, + filter_size=1, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act="relu", + name=name + "_conv1") + self.conv2 = ConvNormLayer( + ch_in=num_filters, + ch_out=num_filters, + filter_size=3, + stride=stride, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act="relu", + name=name + "_conv2") + self.conv3 = ConvNormLayer( + ch_in=num_filters, + ch_out=num_filters * 4, + filter_size=1, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act=None, + name=name + "_conv3") + + if self.downsample: + self.conv_down = ConvNormLayer( + ch_in=num_channels, + ch_out=num_filters * 4, + filter_size=1, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act=None, + name=name + "_downsample") + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, + num_filters=num_filters * 4, + reduction_ratio=16, + name='fc' + name) + + def forward(self, input): + residual = input + conv1 = self.conv1(input) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(input) + + if self.has_se: + conv3 = self.se(conv3) + + y = paddle.add(x=residual, y=conv3) + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels, + num_filters, + stride=1, + has_se=False, + downsample=False, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + name=None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + self.conv1 = ConvNormLayer( + ch_in=num_channels, + ch_out=num_filters, + filter_size=3, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + stride=stride, + act="relu", + name=name + "_conv1") + self.conv2 = ConvNormLayer( + ch_in=num_filters, + ch_out=num_filters, + filter_size=3, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + stride=1, + act=None, + name=name + "_conv2") + + if self.downsample: + self.conv_down = ConvNormLayer( + ch_in=num_channels, + ch_out=num_filters * 4, + filter_size=1, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act=None, + name=name + "_downsample") + + if self.has_se: + self.se = SELayer( + num_channels=num_filters, + num_filters=num_filters, + reduction_ratio=16, + name='fc' + name) + + def forward(self, input): + residual = input + conv1 = self.conv1(input) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(input) + + if self.has_se: + conv2 = self.se(conv2) + + y = paddle.add(x=residual, y=conv2) + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def 
__init__(self, num_channels, num_filters, reduction_ratio, name=None): + super(SELayer, self).__init__() + + self.pool2d_gap = AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = Linear( + num_channels, + med_ch, + weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = Linear( + med_ch, + num_filters, + weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv))) + + def forward(self, input): + pool = self.pool2d_gap(input) + pool = paddle.squeeze(pool, axis=[2, 3]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.unsqueeze(excitation, axis=[2, 3]) + out = input * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels, + num_modules, + num_filters, + has_se=False, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + multi_scale_output=True, + name=None): + super(Stage, self).__init__() + + self._num_modules = num_modules + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_filters=num_filters, + has_se=has_se, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + multi_scale_output=False, + name=name + '_' + str(i + 1))) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_filters=num_filters, + has_se=has_se, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + '_' + str(i + 1))) + + self.stage_func_list.append(stage_func) + + def forward(self, input): + out = input + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels, + num_filters, + has_se=False, + multi_scale_output=True, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + name=None): + super(HighResolutionModule, self).__init__() + self.branches_func = Branches( + block_num=4, + in_channels=num_channels, + out_channels=num_filters, + has_se=has_se, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name) + + def forward(self, input): + out = self.branches_func(input) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels, + out_channels, + multi_scale_output=True, + norm_momentum=0.9, + norm_decay=0., + freeze_norm=True, + name=None): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + residual_func = None + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + ConvNormLayer( + ch_in=in_channels[j], + ch_out=out_channels[i], + filter_size=1, + stride=1, + act=None, + norm_momentum=norm_momentum, + 
norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + '_layer_' + str(i + 1) + '_' + + str(j + 1))) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format( + name, i + 1, j + 1, k + 1), + ConvNormLayer( + ch_in=pre_num_filters, + ch_out=out_channels[i], + filter_size=3, + stride=2, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act=None, + name=name + '_layer_' + str(i + 1) + '_' + + str(j + 1) + '_' + str(k + 1))) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format( + name, i + 1, j + 1, k + 1), + ConvNormLayer( + ch_in=pre_num_filters, + ch_out=out_channels[j], + filter_size=3, + stride=2, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act="relu", + name=name + '_layer_' + str(i + 1) + '_' + + str(j + 1) + '_' + str(k + 1))) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def forward(self, input): + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = input[i] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](input[j]) + residual_func_idx += 1 + y = F.interpolate(y, scale_factor=2**(j - i)) + residual = paddle.add(x=residual, y=y) + elif j < i: + y = input[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + residual = paddle.add(x=residual, y=y) + residual = F.relu(residual) + outs.append(residual) + + return outs + + +@register +class HRNet(nn.Layer): + """ + HRNet, see https://arxiv.org/abs/1908.07919 + + Args: + width (int): the width of HRNet + has_se (bool): whether to add SE block for each stage + freeze_at (int): the stage to freeze + freeze_norm (bool): whether to freeze norm in HRNet + norm_momentum (float): momentum of BatchNorm + norm_decay (float): weight decay for normalization layer weights + return_idx (List): the stage to return + upsample (bool): whether to upsample and concat the backbone feats + """ + + def __init__(self, + width=18, + has_se=False, + freeze_at=0, + freeze_norm=True, + norm_momentum=0.9, + norm_decay=0., + return_idx=[0, 1, 2, 3], + upsample=False, + downsample=False): + super(HRNet, self).__init__() + + self.width = width + self.has_se = has_se + if isinstance(return_idx, Integral): + return_idx = [return_idx] + + assert len(return_idx) > 0, "need one or more return index" + self.freeze_at = freeze_at + self.return_idx = return_idx + self.upsample = upsample + self.downsample = downsample + + self.channels = { + 18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]], + 30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]], + 32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]], + 40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]], + 44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]], + 48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]], + 60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]], + 64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]] + } + + channels_2, channels_3, channels_4 = self.channels[width] + num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3 + self._out_channels = [sum(channels_4)] if self.upsample else channels_4 + self._out_strides = [4] if self.upsample else [4, 8, 16, 32] + + self.conv_layer1_1 = ConvNormLayer( + ch_in=3, + ch_out=64, + filter_size=3, + 
stride=2, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act='relu', + name="layer1_1") + + self.conv_layer1_2 = ConvNormLayer( + ch_in=64, + ch_out=64, + filter_size=3, + stride=2, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + act='relu', + name="layer1_2") + + self.la1 = Layer1( + num_channels=64, + has_se=has_se, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="layer2") + + self.tr1 = TransitionLayer( + in_channels=[256], + out_channels=channels_2, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="tr1") + + self.st2 = Stage( + num_channels=channels_2, + num_modules=num_modules_2, + num_filters=channels_2, + has_se=self.has_se, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="st2") + + self.tr2 = TransitionLayer( + in_channels=channels_2, + out_channels=channels_3, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="tr2") + + self.st3 = Stage( + num_channels=channels_3, + num_modules=num_modules_3, + num_filters=channels_3, + has_se=self.has_se, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="st3") + + self.tr3 = TransitionLayer( + in_channels=channels_3, + out_channels=channels_4, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="tr3") + self.st4 = Stage( + num_channels=channels_4, + num_modules=num_modules_4, + num_filters=channels_4, + has_se=self.has_se, + norm_momentum=norm_momentum, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + multi_scale_output=len(return_idx) > 1, + name="st4") + + if self.downsample: + self.incre_modules, self.downsamp_modules, \ + self.final_layer = self._make_head(channels_4, norm_momentum=norm_momentum, has_se=self.has_se) + + def _make_layer(self, + block, + inplanes, + planes, + blocks, + stride=1, + norm_momentum=0.9, + has_se=False, + name=None): + downsample = None + if stride != 1 or inplanes != planes * 4: + downsample = True + + layers = [] + layers.append( + block( + inplanes, + planes, + has_se, + stride, + downsample, + norm_momentum=norm_momentum, + freeze_norm=False, + name=name + "_s0")) + inplanes = planes * 4 + for i in range(1, blocks): + layers.append( + block( + inplanes, + planes, + has_se, + norm_momentum=norm_momentum, + freeze_norm=False, + name=name + "_s" + str(i))) + + return nn.Sequential(*layers) + + def _make_head(self, pre_stage_channels, norm_momentum=0.9, has_se=False): + head_block = BottleneckBlock + head_channels = [32, 64, 128, 256] + + # Increasing the #channels on each resolution + # from C, 2C, 4C, 8C to 128, 256, 512, 1024 + incre_modules = [] + for i, channels in enumerate(pre_stage_channels): + incre_module = self._make_layer( + head_block, + channels, + head_channels[i], + 1, + stride=1, + norm_momentum=norm_momentum, + has_se=has_se, + name='incre' + str(i)) + incre_modules.append(incre_module) + incre_modules = nn.LayerList(incre_modules) + + # downsampling modules + downsamp_modules = [] + for i in range(len(pre_stage_channels) - 1): + in_channels = head_channels[i] * 4 + out_channels = head_channels[i + 1] * 4 + + downsamp_module = nn.Sequential( + nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=2, + padding=1), + nn.BatchNorm2D( + out_channels, momentum=norm_momentum), + nn.ReLU()) + + downsamp_modules.append(downsamp_module) + 
downsamp_modules = nn.LayerList(downsamp_modules) + + final_layer = nn.Sequential( + nn.Conv2D( + in_channels=head_channels[3] * 4, + out_channels=2048, + kernel_size=1, + stride=1, + padding=0), + nn.BatchNorm2D( + 2048, momentum=norm_momentum), + nn.ReLU()) + + return incre_modules, downsamp_modules, final_layer + + def forward(self, inputs): + x = inputs['image'] + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + tr2 = self.tr2(st2) + + st3 = self.st3(tr2) + tr3 = self.tr3(st3) + + st4 = self.st4(tr3) + + if self.upsample: + # Upsampling + x0_h, x0_w = st4[0].shape[2:4] + x1 = F.upsample(st4[1], size=(x0_h, x0_w), mode='bilinear') + x2 = F.upsample(st4[2], size=(x0_h, x0_w), mode='bilinear') + x3 = F.upsample(st4[3], size=(x0_h, x0_w), mode='bilinear') + x = paddle.concat([st4[0], x1, x2, x3], 1) + return x + + if self.downsample: + y = self.incre_modules[0](st4[0]) + for i in range(len(self.downsamp_modules)): + y = self.incre_modules[i+1](st4[i+1]) + \ + self.downsamp_modules[i](y) + y = self.final_layer(y) + return y + + res = [] + for i, layer in enumerate(st4): + if i == self.freeze_at: + layer.stop_gradient = True + if i in self.return_idx: + res.append(layer) + + return res + + @property + def out_shape(self): + if self.upsample: + self.return_idx = [0] + return [ + ShapeSpec( + channels=self._out_channels[i], stride=self._out_strides[i]) + for i in self.return_idx + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/lcnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/lcnet.py new file mode 100644 index 0000000000000000000000000000000000000000..76da139ee36be9a5e5c10c08b56989435ff69e9f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/lcnet.py @@ -0,0 +1,271 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
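# === Editor's example (illustrative; not part of the upstream diff) ===
# A minimal sketch of driving the HRNet backbone defined in hrnet.py above.
# Assumptions: a working paddle/ppdet install, this repository's import path,
# and an arbitrary 512x512 input; none of this appears in the original diff.
import paddle
from ppdet.modeling.backbones.hrnet import HRNet

# width=18 selects the HRNet-W18 channel table, so stage 4 emits branches of
# 18/36/72/144 channels at strides 4/8/16/32; return_idx picks which to keep.
hrnet = HRNet(width=18, freeze_at=0, return_idx=[0, 1, 2, 3])
feats = hrnet({'image': paddle.rand([1, 3, 512, 512])})
print([f.shape for f in feats])
# expected: [1, 18, 128, 128], [1, 36, 64, 64], [1, 72, 32, 32], [1, 144, 16, 16]
# === end editor's example ===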
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D, Conv2D +from paddle.regularizer import L2Decay +from paddle.nn.initializer import KaimingNormal + +from ppdet.core.workspace import register, serializable +from numbers import Integral +from ..shape_spec import ShapeSpec + +__all__ = ['LCNet'] + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False], ], + "blocks3": [ + [3, 32, 64, 2, False], + [3, 64, 64, 1, False], + ], + "blocks4": [ + [3, 64, 128, 2, False], + [3, 128, 128, 1, False], + ], + "blocks5": [ + [3, 128, 256, 2, False], + [5, 256, 256, 1, False], + [5, 256, 256, 1, False], + [5, 256, 256, 1, False], + [5, 256, 256, 1, False], + [5, 256, 256, 1, False], + ], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(nn.Layer): + def __init__(self, + num_channels, + filter_size, + num_filters, + stride, + num_groups=1, + act='hard_swish'): + super().__init__() + + self.conv = Conv2D( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = nn.BatchNorm2D( + num_filters, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + if act == 'hard_swish': + self.act = nn.Hardswish() + elif act == 'relu6': + self.act = nn.ReLU6() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.act(x) + return x + + +class DepthwiseSeparable(nn.Layer): + def __init__(self, + num_channels, + num_filters, + stride, + dw_size=3, + use_se=False, + act='hard_swish'): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer( + num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels, + act=act) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer( + num_channels=num_channels, + filter_size=1, + num_filters=num_filters, + stride=1, + act=act) + + def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(nn.Layer): + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D( + in_channels=channel, + out_channels=channel // reduction, + kernel_size=1, + stride=1, + padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D( + in_channels=channel // reduction, + out_channels=channel, + kernel_size=1, + stride=1, + padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +@register +@serializable +class LCNet(nn.Layer): + def __init__(self, scale=1.0, feature_maps=[3, 4, 5], act='hard_swish'): + super().__init__() + self.scale = scale + self.feature_maps = feature_maps + + out_channels = [] + + self.conv1 = ConvBNLayer( + num_channels=3, + filter_size=3, + 
num_filters=make_divisible(16 * scale), + stride=2, + act=act) + + self.blocks2 = nn.Sequential(* [ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se, + act=act) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(* [ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se, + act=act) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + out_channels.append( + make_divisible(NET_CONFIG["blocks3"][-1][2] * scale)) + + self.blocks4 = nn.Sequential(* [ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se, + act=act) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + out_channels.append( + make_divisible(NET_CONFIG["blocks4"][-1][2] * scale)) + + self.blocks5 = nn.Sequential(* [ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se, + act=act) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + out_channels.append( + make_divisible(NET_CONFIG["blocks5"][-1][2] * scale)) + + self.blocks6 = nn.Sequential(* [ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se, + act=act) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + out_channels.append( + make_divisible(NET_CONFIG["blocks6"][-1][2] * scale)) + self._out_channels = [ + ch for idx, ch in enumerate(out_channels) if idx + 2 in feature_maps + ] + + def forward(self, inputs): + x = inputs['image'] + outs = [] + + x = self.conv1(x) + x = self.blocks2(x) + x = self.blocks3(x) + outs.append(x) + x = self.blocks4(x) + outs.append(x) + x = self.blocks5(x) + outs.append(x) + x = self.blocks6(x) + outs.append(x) + outs = [o for i, o in enumerate(outs) if i + 2 in self.feature_maps] + return outs + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/lite_hrnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/lite_hrnet.py new file mode 100644 index 0000000000000000000000000000000000000000..95e3a2630bbe631040daae5c8381a92be6ba0ae6 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/lite_hrnet.py @@ -0,0 +1,891 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
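# === Editor's example (illustrative; not part of the upstream diff) ===
# A standalone check of the channel rounding used throughout lcnet.py above:
# make_divisible() snaps a scaled width to a multiple of 8 and, if rounding
# would shrink it by more than 10%, bumps it up one step. The function body
# is restated here verbatim so the example runs without paddle installed.
def make_divisible(v, divisor=8, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

assert make_divisible(16 * 0.5) == 8      # 8 is already a multiple of 8
assert make_divisible(16 * 0.75) == 16    # 12 rounds up to the nearest 8
assert make_divisible(512 * 0.35) == 176  # 179.2 -> 176, still > 0.9 * 179.2
# === end editor's example ===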
+""" +This code is based on +https://github.com/HRNet/Lite-HRNet/blob/hrnet/models/backbones/litehrnet.py +""" + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from numbers import Integral +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from paddle.nn.initializer import Normal, Constant +from ppdet.core.workspace import register +from ppdet.modeling.shape_spec import ShapeSpec +from ppdet.modeling.ops import channel_shuffle +from .. import layers as L + +__all__ = ['LiteHRNet'] + + +class ConvNormLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size, + stride=1, + groups=1, + norm_type=None, + norm_groups=32, + norm_decay=0., + freeze_norm=False, + act=None): + super(ConvNormLayer, self).__init__() + self.act = act + norm_lr = 0. if freeze_norm else 1. + if norm_type is not None: + assert norm_type in ['bn', 'sync_bn', 'gn'], \ + "norm_type should be one of ['bn', 'sync_bn', 'gn'], but got {}".format(norm_type) + param_attr = ParamAttr( + initializer=Constant(1.0), + learning_rate=norm_lr, + regularizer=L2Decay(norm_decay), ) + bias_attr = ParamAttr( + learning_rate=norm_lr, regularizer=L2Decay(norm_decay)) + global_stats = True if freeze_norm else None + if norm_type in ['bn', 'sync_bn']: + self.norm = nn.BatchNorm2D( + ch_out, + weight_attr=param_attr, + bias_attr=bias_attr, + use_global_stats=global_stats, ) + elif norm_type == 'gn': + self.norm = nn.GroupNorm( + num_groups=norm_groups, + num_channels=ch_out, + weight_attr=param_attr, + bias_attr=bias_attr) + norm_params = self.norm.parameters() + if freeze_norm: + for param in norm_params: + param.stop_gradient = True + conv_bias_attr = False + else: + conv_bias_attr = True + self.norm = None + + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.001)), + bias_attr=conv_bias_attr) + + def forward(self, inputs): + out = self.conv(inputs) + if self.norm is not None: + out = self.norm(out) + + if self.act == 'relu': + out = F.relu(out) + elif self.act == 'sigmoid': + out = F.sigmoid(out) + return out + + +class DepthWiseSeparableConvNormLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size, + stride=1, + dw_norm_type=None, + pw_norm_type=None, + norm_decay=0., + freeze_norm=False, + dw_act=None, + pw_act=None): + super(DepthWiseSeparableConvNormLayer, self).__init__() + self.depthwise_conv = ConvNormLayer( + ch_in=ch_in, + ch_out=ch_in, + filter_size=filter_size, + stride=stride, + groups=ch_in, + norm_type=dw_norm_type, + act=dw_act, + norm_decay=norm_decay, + freeze_norm=freeze_norm, ) + self.pointwise_conv = ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=1, + stride=1, + norm_type=pw_norm_type, + act=pw_act, + norm_decay=norm_decay, + freeze_norm=freeze_norm, ) + + def forward(self, x): + x = self.depthwise_conv(x) + x = self.pointwise_conv(x) + return x + + +class CrossResolutionWeightingModule(nn.Layer): + def __init__(self, + channels, + ratio=16, + norm_type='bn', + freeze_norm=False, + norm_decay=0.): + super(CrossResolutionWeightingModule, self).__init__() + self.channels = channels + total_channel = sum(channels) + self.conv1 = ConvNormLayer( + ch_in=total_channel, + ch_out=total_channel // ratio, + filter_size=1, + stride=1, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay) + self.conv2 = ConvNormLayer( + 
ch_in=total_channel // ratio, + ch_out=total_channel, + filter_size=1, + stride=1, + norm_type=norm_type, + act='sigmoid', + freeze_norm=freeze_norm, + norm_decay=norm_decay) + + def forward(self, x): + mini_size = x[-1].shape[-2:] + out = [F.adaptive_avg_pool2d(s, mini_size) for s in x[:-1]] + [x[-1]] + out = paddle.concat(out, 1) + out = self.conv1(out) + out = self.conv2(out) + out = paddle.split(out, self.channels, 1) + out = [ + s * F.interpolate( + a, s.shape[-2:], mode='nearest') for s, a in zip(x, out) + ] + return out + + +class SpatialWeightingModule(nn.Layer): + def __init__(self, in_channel, ratio=16, freeze_norm=False, norm_decay=0.): + super(SpatialWeightingModule, self).__init__() + self.global_avgpooling = nn.AdaptiveAvgPool2D(1) + self.conv1 = ConvNormLayer( + ch_in=in_channel, + ch_out=in_channel // ratio, + filter_size=1, + stride=1, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay) + self.conv2 = ConvNormLayer( + ch_in=in_channel // ratio, + ch_out=in_channel, + filter_size=1, + stride=1, + act='sigmoid', + freeze_norm=freeze_norm, + norm_decay=norm_decay) + + def forward(self, x): + out = self.global_avgpooling(x) + out = self.conv1(out) + out = self.conv2(out) + return x * out + + +class ConditionalChannelWeightingBlock(nn.Layer): + def __init__(self, + in_channels, + stride, + reduce_ratio, + norm_type='bn', + freeze_norm=False, + norm_decay=0.): + super(ConditionalChannelWeightingBlock, self).__init__() + assert stride in [1, 2] + branch_channels = [channel // 2 for channel in in_channels] + + self.cross_resolution_weighting = CrossResolutionWeightingModule( + branch_channels, + ratio=reduce_ratio, + norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay) + self.depthwise_convs = nn.LayerList([ + ConvNormLayer( + channel, + channel, + filter_size=3, + stride=stride, + groups=channel, + norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay) for channel in branch_channels + ]) + + self.spatial_weighting = nn.LayerList([ + SpatialWeightingModule( + channel, + ratio=4, + freeze_norm=freeze_norm, + norm_decay=norm_decay) for channel in branch_channels + ]) + + def forward(self, x): + x = [s.chunk(2, axis=1) for s in x] + x1 = [s[0] for s in x] + x2 = [s[1] for s in x] + + x2 = self.cross_resolution_weighting(x2) + x2 = [dw(s) for s, dw in zip(x2, self.depthwise_convs)] + x2 = [sw(s) for s, sw in zip(x2, self.spatial_weighting)] + + out = [paddle.concat([s1, s2], axis=1) for s1, s2 in zip(x1, x2)] + out = [channel_shuffle(s, groups=2) for s in out] + return out + + +class ShuffleUnit(nn.Layer): + def __init__(self, + in_channel, + out_channel, + stride, + norm_type='bn', + freeze_norm=False, + norm_decay=0.): + super(ShuffleUnit, self).__init__() + branch_channel = out_channel // 2 + self.stride = stride + if self.stride == 1: + assert in_channel == branch_channel * 2, \ + "when stride=1, in_channel {} should equal to branch_channel*2 {}".format(in_channel, branch_channel * 2) + if stride > 1: + self.branch1 = nn.Sequential( + ConvNormLayer( + ch_in=in_channel, + ch_out=in_channel, + filter_size=3, + stride=self.stride, + groups=in_channel, + norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay), + ConvNormLayer( + ch_in=in_channel, + ch_out=branch_channel, + filter_size=1, + stride=1, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay), ) + self.branch2 = nn.Sequential( + ConvNormLayer( + ch_in=branch_channel if stride == 1 else in_channel, + ch_out=branch_channel, 
+ filter_size=1, + stride=1, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay), + ConvNormLayer( + ch_in=branch_channel, + ch_out=branch_channel, + filter_size=3, + stride=self.stride, + groups=branch_channel, + norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay), + ConvNormLayer( + ch_in=branch_channel, + ch_out=branch_channel, + filter_size=1, + stride=1, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay), ) + + def forward(self, x): + if self.stride > 1: + x1 = self.branch1(x) + x2 = self.branch2(x) + else: + x1, x2 = x.chunk(2, axis=1) + x2 = self.branch2(x2) + out = paddle.concat([x1, x2], axis=1) + out = channel_shuffle(out, groups=2) + return out + + +class IterativeHead(nn.Layer): + def __init__(self, + in_channels, + norm_type='bn', + freeze_norm=False, + norm_decay=0.): + super(IterativeHead, self).__init__() + num_branches = len(in_channels) + self.in_channels = in_channels[::-1] + + projects = [] + for i in range(num_branches): + if i != num_branches - 1: + projects.append( + DepthWiseSeparableConvNormLayer( + ch_in=self.in_channels[i], + ch_out=self.in_channels[i + 1], + filter_size=3, + stride=1, + dw_act=None, + pw_act='relu', + dw_norm_type=norm_type, + pw_norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay)) + else: + projects.append( + DepthWiseSeparableConvNormLayer( + ch_in=self.in_channels[i], + ch_out=self.in_channels[i], + filter_size=3, + stride=1, + dw_act=None, + pw_act='relu', + dw_norm_type=norm_type, + pw_norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay)) + self.projects = nn.LayerList(projects) + + def forward(self, x): + x = x[::-1] + y = [] + last_x = None + for i, s in enumerate(x): + if last_x is not None: + last_x = F.interpolate( + last_x, + size=s.shape[-2:], + mode='bilinear', + align_corners=True) + s = s + last_x + s = self.projects[i](s) + y.append(s) + last_x = s + + return y[::-1] + + +class Stem(nn.Layer): + def __init__(self, + in_channel, + stem_channel, + out_channel, + expand_ratio, + norm_type='bn', + freeze_norm=False, + norm_decay=0.): + super(Stem, self).__init__() + self.conv1 = ConvNormLayer( + in_channel, + stem_channel, + filter_size=3, + stride=2, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay) + mid_channel = int(round(stem_channel * expand_ratio)) + branch_channel = stem_channel // 2 + if stem_channel == out_channel: + inc_channel = out_channel - branch_channel + else: + inc_channel = out_channel - stem_channel + self.branch1 = nn.Sequential( + ConvNormLayer( + ch_in=branch_channel, + ch_out=branch_channel, + filter_size=3, + stride=2, + groups=branch_channel, + norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay), + ConvNormLayer( + ch_in=branch_channel, + ch_out=inc_channel, + filter_size=1, + stride=1, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay), ) + self.expand_conv = ConvNormLayer( + ch_in=branch_channel, + ch_out=mid_channel, + filter_size=1, + stride=1, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay) + self.depthwise_conv = ConvNormLayer( + ch_in=mid_channel, + ch_out=mid_channel, + filter_size=3, + stride=2, + groups=mid_channel, + norm_type=norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay) + self.linear_conv = ConvNormLayer( + ch_in=mid_channel, + ch_out=branch_channel + if stem_channel == out_channel else stem_channel, + 
filter_size=1, + stride=1, + norm_type=norm_type, + act='relu', + freeze_norm=freeze_norm, + norm_decay=norm_decay) + + def forward(self, x): + x = self.conv1(x) + x1, x2 = x.chunk(2, axis=1) + x1 = self.branch1(x1) + x2 = self.expand_conv(x2) + x2 = self.depthwise_conv(x2) + x2 = self.linear_conv(x2) + out = paddle.concat([x1, x2], axis=1) + out = channel_shuffle(out, groups=2) + + return out + + +class LiteHRNetModule(nn.Layer): + def __init__(self, + num_branches, + num_blocks, + in_channels, + reduce_ratio, + module_type, + multiscale_output=False, + with_fuse=True, + norm_type='bn', + freeze_norm=False, + norm_decay=0.): + super(LiteHRNetModule, self).__init__() + assert num_branches == len(in_channels),\ + "num_branches {} should equal to num_in_channels {}".format(num_branches, len(in_channels)) + assert module_type in [ + 'LITE', 'NAIVE' + ], "module_type should be one of ['LITE', 'NAIVE']" + self.num_branches = num_branches + self.in_channels = in_channels + self.multiscale_output = multiscale_output + self.with_fuse = with_fuse + self.norm_type = 'bn' + self.module_type = module_type + + if self.module_type == 'LITE': + self.layers = self._make_weighting_blocks( + num_blocks, + reduce_ratio, + freeze_norm=freeze_norm, + norm_decay=norm_decay) + elif self.module_type == 'NAIVE': + self.layers = self._make_naive_branches( + num_branches, + num_blocks, + freeze_norm=freeze_norm, + norm_decay=norm_decay) + + if self.with_fuse: + self.fuse_layers = self._make_fuse_layers( + freeze_norm=freeze_norm, norm_decay=norm_decay) + self.relu = nn.ReLU() + + def _make_weighting_blocks(self, + num_blocks, + reduce_ratio, + stride=1, + freeze_norm=False, + norm_decay=0.): + layers = [] + for i in range(num_blocks): + layers.append( + ConditionalChannelWeightingBlock( + self.in_channels, + stride=stride, + reduce_ratio=reduce_ratio, + norm_type=self.norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay)) + return nn.Sequential(*layers) + + def _make_naive_branches(self, + num_branches, + num_blocks, + freeze_norm=False, + norm_decay=0.): + branches = [] + for branch_idx in range(num_branches): + layers = [] + for i in range(num_blocks): + layers.append( + ShuffleUnit( + self.in_channels[branch_idx], + self.in_channels[branch_idx], + stride=1, + norm_type=self.norm_type, + freeze_norm=freeze_norm, + norm_decay=norm_decay)) + branches.append(nn.Sequential(*layers)) + return nn.LayerList(branches) + + def _make_fuse_layers(self, freeze_norm=False, norm_decay=0.): + if self.num_branches == 1: + return None + fuse_layers = [] + num_out_branches = self.num_branches if self.multiscale_output else 1 + for i in range(num_out_branches): + fuse_layer = [] + for j in range(self.num_branches): + if j > i: + fuse_layer.append( + nn.Sequential( + L.Conv2d( + self.in_channels[j], + self.in_channels[i], + kernel_size=1, + stride=1, + padding=0, + bias=False, ), + nn.BatchNorm2D(self.in_channels[i]), + nn.Upsample( + scale_factor=2**(j - i), mode='nearest'))) + elif j == i: + fuse_layer.append(None) + else: + conv_downsamples = [] + for k in range(i - j): + if k == i - j - 1: + conv_downsamples.append( + nn.Sequential( + L.Conv2d( + self.in_channels[j], + self.in_channels[j], + kernel_size=3, + stride=2, + padding=1, + groups=self.in_channels[j], + bias=False, ), + nn.BatchNorm2D(self.in_channels[j]), + L.Conv2d( + self.in_channels[j], + self.in_channels[i], + kernel_size=1, + stride=1, + padding=0, + bias=False, ), + nn.BatchNorm2D(self.in_channels[i]))) + else: + conv_downsamples.append( + nn.Sequential( 
+ L.Conv2d( + self.in_channels[j], + self.in_channels[j], + kernel_size=3, + stride=2, + padding=1, + groups=self.in_channels[j], + bias=False, ), + nn.BatchNorm2D(self.in_channels[j]), + L.Conv2d( + self.in_channels[j], + self.in_channels[j], + kernel_size=1, + stride=1, + padding=0, + bias=False, ), + nn.BatchNorm2D(self.in_channels[j]), + nn.ReLU())) + + fuse_layer.append(nn.Sequential(*conv_downsamples)) + fuse_layers.append(nn.LayerList(fuse_layer)) + + return nn.LayerList(fuse_layers) + + def forward(self, x): + if self.num_branches == 1: + return [self.layers[0](x[0])] + if self.module_type == 'LITE': + out = self.layers(x) + elif self.module_type == 'NAIVE': + for i in range(self.num_branches): + x[i] = self.layers[i](x[i]) + out = x + if self.with_fuse: + out_fuse = [] + for i in range(len(self.fuse_layers)): + y = out[0] if i == 0 else self.fuse_layers[i][0](out[0]) + for j in range(self.num_branches): + if j == 0: + y += y + elif i == j: + y += out[j] + else: + y += self.fuse_layers[i][j](out[j]) + if i == 0: + out[i] = y + out_fuse.append(self.relu(y)) + out = out_fuse + elif not self.multiscale_output: + out = [out[0]] + return out + + +@register +class LiteHRNet(nn.Layer): + """ + @inproceedings{Yulitehrnet21, + title={Lite-HRNet: A Lightweight High-Resolution Network}, + author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong}, + booktitle={CVPR},year={2021} + } + Args: + network_type (str): the network_type should be one of ["lite_18", "lite_30", "naive", "wider_naive"], + "naive": Simply combining the shuffle block in ShuffleNet and the highresolution design pattern in HRNet. + "wider_naive": Naive network with wider channels in each block. + "lite_18": Lite-HRNet-18, which replaces the pointwise convolution in a shuffle block by conditional channel weighting. + "lite_30": Lite-HRNet-30, with more blocks compared with Lite-HRNet-18. 
+ freeze_at (int): the stage to freeze + freeze_norm (bool): whether to freeze norm in HRNet + norm_decay (float): weight decay for normalization layer weights + return_idx (List): the stage to return + """ + + def __init__(self, + network_type, + freeze_at=0, + freeze_norm=True, + norm_decay=0., + return_idx=[0, 1, 2, 3]): + super(LiteHRNet, self).__init__() + if isinstance(return_idx, Integral): + return_idx = [return_idx] + assert network_type in ["lite_18", "lite_30", "naive", "wider_naive"], \ + "the network_type should be one of [lite_18, lite_30, naive, wider_naive]" + assert len(return_idx) > 0, "need one or more return index" + self.freeze_at = freeze_at + self.freeze_norm = freeze_norm + self.norm_decay = norm_decay + self.return_idx = return_idx + self.norm_type = 'bn' + + self.module_configs = { + "lite_18": { + "num_modules": [2, 4, 2], + "num_branches": [2, 3, 4], + "num_blocks": [2, 2, 2], + "module_type": ["LITE", "LITE", "LITE"], + "reduce_ratios": [8, 8, 8], + "num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]], + }, + "lite_30": { + "num_modules": [3, 8, 3], + "num_branches": [2, 3, 4], + "num_blocks": [2, 2, 2], + "module_type": ["LITE", "LITE", "LITE"], + "reduce_ratios": [8, 8, 8], + "num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]], + }, + "naive": { + "num_modules": [2, 4, 2], + "num_branches": [2, 3, 4], + "num_blocks": [2, 2, 2], + "module_type": ["NAIVE", "NAIVE", "NAIVE"], + "reduce_ratios": [1, 1, 1], + "num_channels": [[30, 60], [30, 60, 120], [30, 60, 120, 240]], + }, + "wider_naive": { + "num_modules": [2, 4, 2], + "num_branches": [2, 3, 4], + "num_blocks": [2, 2, 2], + "module_type": ["NAIVE", "NAIVE", "NAIVE"], + "reduce_ratios": [1, 1, 1], + "num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]], + }, + } + + self.stages_config = self.module_configs[network_type] + + self.stem = Stem(3, 32, 32, 1) + num_channels_pre_layer = [32] + for stage_idx in range(3): + num_channels = self.stages_config["num_channels"][stage_idx] + setattr(self, 'transition{}'.format(stage_idx), + self._make_transition_layer(num_channels_pre_layer, + num_channels, self.freeze_norm, + self.norm_decay)) + stage, num_channels_pre_layer = self._make_stage( + self.stages_config, stage_idx, num_channels, True, + self.freeze_norm, self.norm_decay) + setattr(self, 'stage{}'.format(stage_idx), stage) + self.head_layer = IterativeHead(num_channels_pre_layer, 'bn', + self.freeze_norm, self.norm_decay) + + def _make_transition_layer(self, + num_channels_pre_layer, + num_channels_cur_layer, + freeze_norm=False, + norm_decay=0.): + num_branches_pre = len(num_channels_pre_layer) + num_branches_cur = len(num_channels_cur_layer) + transition_layers = [] + for i in range(num_branches_cur): + if i < num_branches_pre: + if num_channels_cur_layer[i] != num_channels_pre_layer[i]: + transition_layers.append( + nn.Sequential( + L.Conv2d( + num_channels_pre_layer[i], + num_channels_pre_layer[i], + kernel_size=3, + stride=1, + padding=1, + groups=num_channels_pre_layer[i], + bias=False), + nn.BatchNorm2D(num_channels_pre_layer[i]), + L.Conv2d( + num_channels_pre_layer[i], + num_channels_cur_layer[i], + kernel_size=1, + stride=1, + padding=0, + bias=False, ), + nn.BatchNorm2D(num_channels_cur_layer[i]), + nn.ReLU())) + else: + transition_layers.append(None) + else: + conv_downsamples = [] + for j in range(i + 1 - num_branches_pre): + conv_downsamples.append( + nn.Sequential( + L.Conv2d( + num_channels_pre_layer[-1], + num_channels_pre_layer[-1], + 
groups=num_channels_pre_layer[-1], + kernel_size=3, + stride=2, + padding=1, + bias=False, ), + nn.BatchNorm2D(num_channels_pre_layer[-1]), + L.Conv2d( + num_channels_pre_layer[-1], + num_channels_cur_layer[i] + if j == i - num_branches_pre else + num_channels_pre_layer[-1], + kernel_size=1, + stride=1, + padding=0, + bias=False, ), + nn.BatchNorm2D(num_channels_cur_layer[i] + if j == i - num_branches_pre else + num_channels_pre_layer[-1]), + nn.ReLU())) + transition_layers.append(nn.Sequential(*conv_downsamples)) + return nn.LayerList(transition_layers) + + def _make_stage(self, + stages_config, + stage_idx, + in_channels, + multiscale_output, + freeze_norm=False, + norm_decay=0.): + num_modules = stages_config["num_modules"][stage_idx] + num_branches = stages_config["num_branches"][stage_idx] + num_blocks = stages_config["num_blocks"][stage_idx] + reduce_ratio = stages_config['reduce_ratios'][stage_idx] + module_type = stages_config['module_type'][stage_idx] + + modules = [] + for i in range(num_modules): + if not multiscale_output and i == num_modules - 1: + reset_multiscale_output = False + else: + reset_multiscale_output = True + modules.append( + LiteHRNetModule( + num_branches, + num_blocks, + in_channels, + reduce_ratio, + module_type, + multiscale_output=reset_multiscale_output, + with_fuse=True, + freeze_norm=freeze_norm, + norm_decay=norm_decay)) + in_channels = modules[-1].in_channels + return nn.Sequential(*modules), in_channels + + def forward(self, inputs): + x = inputs['image'] + dims = x.shape + if len(dims) == 5: + x = paddle.reshape(x, (dims[0] * dims[1], dims[2], dims[3], + dims[4])) # [6, 3, 128, 96] + + x = self.stem(x) + y_list = [x] + for stage_idx in range(3): + x_list = [] + transition = getattr(self, 'transition{}'.format(stage_idx)) + for j in range(self.stages_config["num_branches"][stage_idx]): + if transition[j] is not None: + if j >= len(y_list): + x_list.append(transition[j](y_list[-1])) + else: + x_list.append(transition[j](y_list[j])) + else: + x_list.append(y_list[j]) + y_list = getattr(self, 'stage{}'.format(stage_idx))(x_list) + x = self.head_layer(y_list) + res = [] + for i, layer in enumerate(x): + if i == self.freeze_at: + layer.stop_gradient = True + if i in self.return_idx: + res.append(layer) + return res + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self._out_channels[i], stride=self._out_strides[i]) + for i in self.return_idx + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobilenet_v1.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobilenet_v1.py new file mode 100644 index 0000000000000000000000000000000000000000..a39435be5289b47ef4ad8ac73580d9fe4cb21d10 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobilenet_v1.py @@ -0,0 +1,402 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
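# === Editor's example (illustrative; not part of the upstream diff) ===
# A minimal sketch of the LiteHRNet backbone defined in lite_hrnet.py above.
# network_type chooses the stage recipe: "lite_18"/"lite_30" use the 'LITE'
# conditional-channel-weighting modules, while "naive"/"wider_naive" use
# plain shuffle blocks. Import path and the 256x192 input are assumptions.
import paddle
from ppdet.modeling.backbones.lite_hrnet import LiteHRNet

lite = LiteHRNet(network_type='lite_18', freeze_at=0, return_idx=[0, 1, 2, 3])
# lite_18's channel table is [40, 80, 160, 320]; the IterativeHead refines all
# four branches, so four feature maps at strides 4/8/16/32 come back.
feats = lite({'image': paddle.rand([1, 3, 256, 192])})
print([f.shape for f in feats])
# === end editor's example ===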
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from paddle.nn.initializer import KaimingNormal +from ppdet.core.workspace import register, serializable +from numbers import Integral +from ..shape_spec import ShapeSpec + +__all__ = ['MobileNet'] + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride, + padding, + num_groups=1, + act='relu', + conv_lr=1., + conv_decay=0., + norm_decay=0., + norm_type='bn', + name=None): + super(ConvBNLayer, self).__init__() + self.act = act + self._conv = nn.Conv2D( + in_channels, + out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=num_groups, + weight_attr=ParamAttr( + learning_rate=conv_lr, + initializer=KaimingNormal(), + regularizer=L2Decay(conv_decay)), + bias_attr=False) + + param_attr = ParamAttr(regularizer=L2Decay(norm_decay)) + bias_attr = ParamAttr(regularizer=L2Decay(norm_decay)) + if norm_type in ['sync_bn', 'bn']: + self._batch_norm = nn.BatchNorm2D( + out_channels, weight_attr=param_attr, bias_attr=bias_attr) + + def forward(self, x): + x = self._conv(x) + x = self._batch_norm(x) + if self.act == "relu": + x = F.relu(x) + elif self.act == "relu6": + x = F.relu6(x) + return x + + +class DepthwiseSeparable(nn.Layer): + def __init__(self, + in_channels, + out_channels1, + out_channels2, + num_groups, + stride, + scale, + conv_lr=1., + conv_decay=0., + norm_decay=0., + norm_type='bn', + name=None): + super(DepthwiseSeparable, self).__init__() + + self._depthwise_conv = ConvBNLayer( + in_channels, + int(out_channels1 * scale), + kernel_size=3, + stride=stride, + padding=1, + num_groups=int(num_groups * scale), + conv_lr=conv_lr, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name=name + "_dw") + + self._pointwise_conv = ConvBNLayer( + int(out_channels1 * scale), + int(out_channels2 * scale), + kernel_size=1, + stride=1, + padding=0, + conv_lr=conv_lr, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name=name + "_sep") + + def forward(self, x): + x = self._depthwise_conv(x) + x = self._pointwise_conv(x) + return x + + +class ExtraBlock(nn.Layer): + def __init__(self, + in_channels, + out_channels1, + out_channels2, + num_groups=1, + stride=2, + conv_lr=1., + conv_decay=0., + norm_decay=0., + norm_type='bn', + name=None): + super(ExtraBlock, self).__init__() + + self.pointwise_conv = ConvBNLayer( + in_channels, + int(out_channels1), + kernel_size=1, + stride=1, + padding=0, + num_groups=int(num_groups), + act='relu6', + conv_lr=conv_lr, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name=name + "_extra1") + + self.normal_conv = ConvBNLayer( + int(out_channels1), + int(out_channels2), + kernel_size=3, + stride=stride, + padding=1, + num_groups=int(num_groups), + act='relu6', + conv_lr=conv_lr, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name=name + "_extra2") + + def forward(self, x): + x = self.pointwise_conv(x) + x = self.normal_conv(x) + return x + + +@register +@serializable +class MobileNet(nn.Layer): + __shared__ = ['norm_type'] + + def __init__(self, + norm_type='bn', + norm_decay=0., + conv_decay=0., + scale=1, + conv_learning_rate=1.0, + feature_maps=[4, 6, 13], + with_extra_blocks=False, + extra_block_filters=[[256, 512], [128, 256], [128, 256], 
+ [64, 128]]): + super(MobileNet, self).__init__() + if isinstance(feature_maps, Integral): + feature_maps = [feature_maps] + self.feature_maps = feature_maps + self.with_extra_blocks = with_extra_blocks + self.extra_block_filters = extra_block_filters + + self._out_channels = [] + + self.conv1 = ConvBNLayer( + in_channels=3, + out_channels=int(32 * scale), + kernel_size=3, + stride=2, + padding=1, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv1") + + self.dwsl = [] + dws21 = self.add_sublayer( + "conv2_1", + sublayer=DepthwiseSeparable( + in_channels=int(32 * scale), + out_channels1=32, + out_channels2=64, + num_groups=32, + stride=1, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv2_1")) + self.dwsl.append(dws21) + self._update_out_channels(int(64 * scale), len(self.dwsl), feature_maps) + dws22 = self.add_sublayer( + "conv2_2", + sublayer=DepthwiseSeparable( + in_channels=int(64 * scale), + out_channels1=64, + out_channels2=128, + num_groups=64, + stride=2, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv2_2")) + self.dwsl.append(dws22) + self._update_out_channels(int(128 * scale), len(self.dwsl), feature_maps) + # 1/4 + dws31 = self.add_sublayer( + "conv3_1", + sublayer=DepthwiseSeparable( + in_channels=int(128 * scale), + out_channels1=128, + out_channels2=128, + num_groups=128, + stride=1, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv3_1")) + self.dwsl.append(dws31) + self._update_out_channels(int(128 * scale), len(self.dwsl), feature_maps) + dws32 = self.add_sublayer( + "conv3_2", + sublayer=DepthwiseSeparable( + in_channels=int(128 * scale), + out_channels1=128, + out_channels2=256, + num_groups=128, + stride=2, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv3_2")) + self.dwsl.append(dws32) + self._update_out_channels(int(256 * scale), len(self.dwsl), feature_maps) + # 1/8 + dws41 = self.add_sublayer( + "conv4_1", + sublayer=DepthwiseSeparable( + in_channels=int(256 * scale), + out_channels1=256, + out_channels2=256, + num_groups=256, + stride=1, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv4_1")) + self.dwsl.append(dws41) + self._update_out_channels(int(256 * scale), len(self.dwsl), feature_maps) + dws42 = self.add_sublayer( + "conv4_2", + sublayer=DepthwiseSeparable( + in_channels=int(256 * scale), + out_channels1=256, + out_channels2=512, + num_groups=256, + stride=2, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv4_2")) + self.dwsl.append(dws42) + self._update_out_channels(int(512 * scale), len(self.dwsl), feature_maps) + # 1/16 + for i in range(5): + tmp = self.add_sublayer( + "conv5_" + str(i + 1), + sublayer=DepthwiseSeparable( + in_channels=int(512 * scale), + out_channels1=512, + out_channels2=512, + num_groups=512, + stride=1, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv5_" + str(i + 1))) + self.dwsl.append(tmp) + self._update_out_channels(int(512 * scale), len(self.dwsl), feature_maps) + dws56 = self.add_sublayer( + "conv5_6", + 
sublayer=DepthwiseSeparable( + in_channels=int(512 * scale), + out_channels1=512, + out_channels2=1024, + num_groups=512, + stride=2, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv5_6")) + self.dwsl.append(dws56) + self._update_out_channels(int(1024 * scale), len(self.dwsl), feature_maps) + # 1/32 + dws6 = self.add_sublayer( + "conv6", + sublayer=DepthwiseSeparable( + in_channels=int(1024 * scale), + out_channels1=1024, + out_channels2=1024, + num_groups=1024, + stride=1, + scale=scale, + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv6")) + self.dwsl.append(dws6) + self._update_out_channels(int(1024 * scale), len(self.dwsl), feature_maps) + + if self.with_extra_blocks: + self.extra_blocks = [] + for i, block_filter in enumerate(self.extra_block_filters): + in_c = 1024 if i == 0 else self.extra_block_filters[i - 1][1] + conv_extra = self.add_sublayer( + "conv7_" + str(i + 1), + sublayer=ExtraBlock( + in_c, + block_filter[0], + block_filter[1], + conv_lr=conv_learning_rate, + conv_decay=conv_decay, + norm_decay=norm_decay, + norm_type=norm_type, + name="conv7_" + str(i + 1))) + self.extra_blocks.append(conv_extra) + self._update_out_channels( + block_filter[1], + len(self.dwsl) + len(self.extra_blocks), feature_maps) + + def _update_out_channels(self, channel, feature_idx, feature_maps): + if feature_idx in feature_maps: + self._out_channels.append(channel) + + def forward(self, inputs): + outs = [] + y = self.conv1(inputs['image']) + for i, block in enumerate(self.dwsl): + y = block(y) + if i + 1 in self.feature_maps: + outs.append(y) + + if not self.with_extra_blocks: + return outs + + y = outs[-1] + for i, block in enumerate(self.extra_blocks): + idx = i + len(self.dwsl) + y = block(y) + if idx + 1 in self.feature_maps: + outs.append(y) + return outs + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobilenet_v3.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobilenet_v3.py new file mode 100644 index 0000000000000000000000000000000000000000..2bd88567a1487437a067ec68497ee9f3b62b4d47 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobilenet_v3.py @@ -0,0 +1,478 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
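# === Editor's example (illustrative; not part of the upstream diff) ===
# The parameter arithmetic behind the DepthwiseSeparable block in
# mobilenet_v1.py above: a depthwise 3x3 followed by a pointwise 1x1 stands in
# for one dense 3x3 convolution. Plain-Python count (BN parameters ignored):
c_in, c_out, k = 256, 512, 3
dense = k * k * c_in * c_out             # 1,179,648 weights for a dense 3x3
separable = k * k * c_in + c_in * c_out  # 2,304 depthwise + 131,072 pointwise
print(dense / separable)                 # ~8.8x fewer weights
# === end editor's example ===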
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from ppdet.core.workspace import register, serializable +from numbers import Integral +from ..shape_spec import ShapeSpec + +__all__ = ['MobileNetV3'] + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_c, + out_c, + filter_size, + stride, + padding, + num_groups=1, + act=None, + lr_mult=1., + conv_decay=0., + norm_type='bn', + norm_decay=0., + freeze_norm=False, + name=""): + super(ConvBNLayer, self).__init__() + self.act = act + self.conv = nn.Conv2D( + in_channels=in_c, + out_channels=out_c, + kernel_size=filter_size, + stride=stride, + padding=padding, + groups=num_groups, + weight_attr=ParamAttr( + learning_rate=lr_mult, regularizer=L2Decay(conv_decay)), + bias_attr=False) + + norm_lr = 0. if freeze_norm else lr_mult + param_attr = ParamAttr( + learning_rate=norm_lr, + regularizer=L2Decay(norm_decay), + trainable=False if freeze_norm else True) + bias_attr = ParamAttr( + learning_rate=norm_lr, + regularizer=L2Decay(norm_decay), + trainable=False if freeze_norm else True) + global_stats = True if freeze_norm else None + if norm_type in ['sync_bn', 'bn']: + self.bn = nn.BatchNorm2D( + out_c, + weight_attr=param_attr, + bias_attr=bias_attr, + use_global_stats=global_stats) + norm_params = self.bn.parameters() + if freeze_norm: + for param in norm_params: + param.stop_gradient = True + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + if self.act is not None: + if self.act == "relu": + x = F.relu(x) + elif self.act == "relu6": + x = F.relu6(x) + elif self.act == "hard_swish": + x = F.hardswish(x) + else: + raise NotImplementedError( + "The activation function is selected incorrectly.") + return x + + +class ResidualUnit(nn.Layer): + def __init__(self, + in_c, + mid_c, + out_c, + filter_size, + stride, + use_se, + lr_mult, + conv_decay=0., + norm_type='bn', + norm_decay=0., + freeze_norm=False, + act=None, + return_list=False, + name=''): + super(ResidualUnit, self).__init__() + self.if_shortcut = stride == 1 and in_c == out_c + self.use_se = use_se + self.return_list = return_list + + self.expand_conv = ConvBNLayer( + in_c=in_c, + out_c=mid_c, + filter_size=1, + stride=1, + padding=0, + act=act, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_expand") + self.bottleneck_conv = ConvBNLayer( + in_c=mid_c, + out_c=mid_c, + filter_size=filter_size, + stride=stride, + padding=int((filter_size - 1) // 2), + num_groups=mid_c, + act=act, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_depthwise") + if self.use_se: + self.mid_se = SEModule( + mid_c, lr_mult, conv_decay, name=name + "_se") + self.linear_conv = ConvBNLayer( + in_c=mid_c, + out_c=out_c, + filter_size=1, + stride=1, + padding=0, + act=None, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_linear") + + def forward(self, inputs): + y = self.expand_conv(inputs) + x = 
self.bottleneck_conv(y) + if self.use_se: + x = self.mid_se(x) + x = self.linear_conv(x) + if self.if_shortcut: + x = paddle.add(inputs, x) + if self.return_list: + return [y, x] + else: + return x + + +class SEModule(nn.Layer): + def __init__(self, channel, lr_mult, conv_decay, reduction=4, name=""): + super(SEModule, self).__init__() + self.avg_pool = nn.AdaptiveAvgPool2D(1) + mid_channels = int(channel // reduction) + self.conv1 = nn.Conv2D( + in_channels=channel, + out_channels=mid_channels, + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr( + learning_rate=lr_mult, regularizer=L2Decay(conv_decay)), + bias_attr=ParamAttr( + learning_rate=lr_mult, regularizer=L2Decay(conv_decay))) + self.conv2 = nn.Conv2D( + in_channels=mid_channels, + out_channels=channel, + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr( + learning_rate=lr_mult, regularizer=L2Decay(conv_decay)), + bias_attr=ParamAttr( + learning_rate=lr_mult, regularizer=L2Decay(conv_decay))) + + def forward(self, inputs): + outputs = self.avg_pool(inputs) + outputs = self.conv1(outputs) + outputs = F.relu(outputs) + outputs = self.conv2(outputs) + outputs = F.hardsigmoid(outputs, slope=0.2, offset=0.5) + return paddle.multiply(x=inputs, y=outputs) + + +class ExtraBlockDW(nn.Layer): + def __init__(self, + in_c, + ch_1, + ch_2, + stride, + lr_mult, + conv_decay=0., + norm_type='bn', + norm_decay=0., + freeze_norm=False, + name=None): + super(ExtraBlockDW, self).__init__() + self.pointwise_conv = ConvBNLayer( + in_c=in_c, + out_c=ch_1, + filter_size=1, + stride=1, + padding='SAME', + act='relu6', + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_extra1") + self.depthwise_conv = ConvBNLayer( + in_c=ch_1, + out_c=ch_2, + filter_size=3, + stride=stride, + padding='SAME', + num_groups=int(ch_1), + act='relu6', + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_extra2_dw") + self.normal_conv = ConvBNLayer( + in_c=ch_2, + out_c=ch_2, + filter_size=1, + stride=1, + padding='SAME', + act='relu6', + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name=name + "_extra2_sep") + + def forward(self, inputs): + x = self.pointwise_conv(inputs) + x = self.depthwise_conv(x) + x = self.normal_conv(x) + return x + + +@register +@serializable +class MobileNetV3(nn.Layer): + __shared__ = ['norm_type'] + + def __init__( + self, + scale=1.0, + model_name="large", + feature_maps=[6, 12, 15], + with_extra_blocks=False, + extra_block_filters=[[256, 512], [128, 256], [128, 256], [64, 128]], + lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0], + conv_decay=0.0, + multiplier=1.0, + norm_type='bn', + norm_decay=0.0, + freeze_norm=False): + super(MobileNetV3, self).__init__() + if isinstance(feature_maps, Integral): + feature_maps = [feature_maps] + if norm_type == 'sync_bn' and freeze_norm: + raise ValueError( + "The norm_type should not be sync_bn when freeze_norm is True") + self.feature_maps = feature_maps + self.with_extra_blocks = with_extra_blocks + self.extra_block_filters = extra_block_filters + + inplanes = 16 + if model_name == "large": + self.cfg = [ + # k, exp, c, se, nl, s, + [3, 16, 16, False, "relu", 1], + [3, 64, 24, False, "relu", 2], + [3, 72, 24, False, "relu", 1], + [5, 72, 40, True, "relu", 2], # RCNN output + [5, 120, 40, True, "relu", 1], + [5, 120, 40, True, "relu", 1], # YOLOv3 output 
+ [3, 240, 80, False, "hard_swish", 2], # RCNN output + [3, 200, 80, False, "hard_swish", 1], + [3, 184, 80, False, "hard_swish", 1], + [3, 184, 80, False, "hard_swish", 1], + [3, 480, 112, True, "hard_swish", 1], + [3, 672, 112, True, "hard_swish", 1], # YOLOv3 output + [5, 672, 160, True, "hard_swish", 2], # SSD/SSDLite/RCNN output + [5, 960, 160, True, "hard_swish", 1], + [5, 960, 160, True, "hard_swish", 1], # YOLOv3 output + ] + elif model_name == "small": + self.cfg = [ + # k, exp, c, se, nl, s, + [3, 16, 16, True, "relu", 2], + [3, 72, 24, False, "relu", 2], # RCNN output + [3, 88, 24, False, "relu", 1], # YOLOv3 output + [5, 96, 40, True, "hard_swish", 2], # RCNN output + [5, 240, 40, True, "hard_swish", 1], + [5, 240, 40, True, "hard_swish", 1], + [5, 120, 48, True, "hard_swish", 1], + [5, 144, 48, True, "hard_swish", 1], # YOLOv3 output + [5, 288, 96, True, "hard_swish", 2], # SSD/SSDLite/RCNN output + [5, 576, 96, True, "hard_swish", 1], + [5, 576, 96, True, "hard_swish", 1], # YOLOv3 output + ] + else: + raise NotImplementedError( + "mode[{}_model] is not implemented!".format(model_name)) + + if multiplier != 1.0: + self.cfg[-3][2] = int(self.cfg[-3][2] * multiplier) + self.cfg[-2][1] = int(self.cfg[-2][1] * multiplier) + self.cfg[-2][2] = int(self.cfg[-2][2] * multiplier) + self.cfg[-1][1] = int(self.cfg[-1][1] * multiplier) + self.cfg[-1][2] = int(self.cfg[-1][2] * multiplier) + + self.conv1 = ConvBNLayer( + in_c=3, + out_c=make_divisible(inplanes * scale), + filter_size=3, + stride=2, + padding=1, + num_groups=1, + act="hard_swish", + lr_mult=lr_mult_list[0], + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="conv1") + + self._out_channels = [] + self.block_list = [] + i = 0 + inplanes = make_divisible(inplanes * scale) + for (k, exp, c, se, nl, s) in self.cfg: + lr_idx = min(i // 3, len(lr_mult_list) - 1) + lr_mult = lr_mult_list[lr_idx] + + # for SSD/SSDLite, first head input is after ResidualUnit expand_conv + return_list = self.with_extra_blocks and i + 2 in self.feature_maps + + block = self.add_sublayer( + "conv" + str(i + 2), + sublayer=ResidualUnit( + in_c=inplanes, + mid_c=make_divisible(scale * exp), + out_c=make_divisible(scale * c), + filter_size=k, + stride=s, + use_se=se, + act=nl, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + return_list=return_list, + name="conv" + str(i + 2))) + self.block_list.append(block) + inplanes = make_divisible(scale * c) + i += 1 + self._update_out_channels( + make_divisible(scale * exp) + if return_list else inplanes, i + 1, feature_maps) + + if self.with_extra_blocks: + self.extra_block_list = [] + extra_out_c = make_divisible(scale * self.cfg[-1][1]) + lr_idx = min(i // 3, len(lr_mult_list) - 1) + lr_mult = lr_mult_list[lr_idx] + + conv_extra = self.add_sublayer( + "conv" + str(i + 2), + sublayer=ConvBNLayer( + in_c=inplanes, + out_c=extra_out_c, + filter_size=1, + stride=1, + padding=0, + num_groups=1, + act="hard_swish", + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name="conv" + str(i + 2))) + self.extra_block_list.append(conv_extra) + i += 1 + self._update_out_channels(extra_out_c, i + 1, feature_maps) + + for j, block_filter in enumerate(self.extra_block_filters): + in_c = extra_out_c if j == 0 else self.extra_block_filters[j - + 1][1] + conv_extra = self.add_sublayer( + "conv" + str(i + 2), + sublayer=ExtraBlockDW( + 
in_c, + block_filter[0], + block_filter[1], + stride=2, + lr_mult=lr_mult, + conv_decay=conv_decay, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + name='conv' + str(i + 2))) + self.extra_block_list.append(conv_extra) + i += 1 + self._update_out_channels(block_filter[1], i + 1, feature_maps) + + def _update_out_channels(self, channel, feature_idx, feature_maps): + if feature_idx in feature_maps: + self._out_channels.append(channel) + + def forward(self, inputs): + x = self.conv1(inputs['image']) + outs = [] + for idx, block in enumerate(self.block_list): + x = block(x) + if idx + 2 in self.feature_maps: + if isinstance(x, list): + outs.append(x[0]) + x = x[1] + else: + outs.append(x) + + if not self.with_extra_blocks: + return outs + + for i, block in enumerate(self.extra_block_list): + idx = i + len(self.block_list) + x = block(x) + if idx + 2 in self.feature_maps: + outs.append(x) + return outs + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobileone.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobileone.py new file mode 100644 index 0000000000000000000000000000000000000000..e548badd3ed714946e961bc29459191ec0ab7fcb --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/mobileone.py @@ -0,0 +1,266 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is the paddle implementation of MobileOne block, see: https://arxiv.org/pdf/2206.04040.pdf. 
+Some of the code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
+The copyright of DingXiaoH/RepVGG is as follows:
+MIT License [see LICENSE for details]
+"""
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.regularizer import L2Decay
+from paddle.nn.initializer import Normal, Constant
+
+from ppdet.modeling.ops import get_act_fn
+from ppdet.modeling.layers import ConvNormLayer
+
+
+class MobileOneBlock(nn.Layer):
+    def __init__(
+            self,
+            ch_in,
+            ch_out,
+            stride,
+            kernel_size,
+            conv_num=1,
+            norm_type='bn',
+            norm_decay=0.,
+            norm_groups=32,
+            bias_on=False,
+            lr_scale=1.,
+            freeze_norm=False,
+            initializer=Normal(
+                mean=0., std=0.01),
+            skip_quant=False,
+            act='relu', ):
+        super(MobileOneBlock, self).__init__()
+
+        self.ch_in = ch_in
+        self.ch_out = ch_out
+        self.kernel_size = kernel_size
+        self.stride = stride
+        self.padding = (kernel_size - 1) // 2
+        # number of over-parameterized depthwise/pointwise branches
+        self.k = conv_num
+
+        self.depth_conv = nn.LayerList()
+        self.point_conv = nn.LayerList()
+        for _ in range(self.k):
+            self.depth_conv.append(
+                ConvNormLayer(
+                    ch_in,
+                    ch_in,
+                    kernel_size,
+                    stride=stride,
+                    groups=ch_in,
+                    norm_type=norm_type,
+                    norm_decay=norm_decay,
+                    norm_groups=norm_groups,
+                    bias_on=bias_on,
+                    lr_scale=lr_scale,
+                    freeze_norm=freeze_norm,
+                    initializer=initializer,
+                    skip_quant=skip_quant))
+            self.point_conv.append(
+                ConvNormLayer(
+                    ch_in,
+                    ch_out,
+                    1,
+                    stride=1,
+                    groups=1,
+                    norm_type=norm_type,
+                    norm_decay=norm_decay,
+                    norm_groups=norm_groups,
+                    bias_on=bias_on,
+                    lr_scale=lr_scale,
+                    freeze_norm=freeze_norm,
+                    initializer=initializer,
+                    skip_quant=skip_quant))
+        self.rbr_1x1 = ConvNormLayer(
+            ch_in,
+            ch_in,
+            1,
+            stride=self.stride,
+            groups=ch_in,
+            norm_type=norm_type,
+            norm_decay=norm_decay,
+            norm_groups=norm_groups,
+            bias_on=bias_on,
+            lr_scale=lr_scale,
+            freeze_norm=freeze_norm,
+            initializer=initializer,
+            skip_quant=skip_quant)
+        # BN-only identity branches, only available when input/output shapes match
+        self.rbr_identity_st1 = nn.BatchNorm2D(
+            num_features=ch_in,
+            weight_attr=ParamAttr(regularizer=L2Decay(0.0)),
+            bias_attr=ParamAttr(regularizer=L2Decay(
+                0.0))) if ch_in == ch_out and self.stride == 1 else None
+        self.rbr_identity_st2 = nn.BatchNorm2D(
+            num_features=ch_out,
+            weight_attr=ParamAttr(regularizer=L2Decay(0.0)),
+            bias_attr=ParamAttr(regularizer=L2Decay(
+                0.0))) if ch_in == ch_out and self.stride == 1 else None
+        self.act = get_act_fn(act) if act is None or isinstance(act, (
+            str, dict)) else act
+
+    def forward(self, x):
+        if hasattr(self, "conv1") and hasattr(self, "conv2"):
+            y = self.act(self.conv2(self.act(self.conv1(x))))
+        else:
+            if self.rbr_identity_st1 is None:
+                id_out_st1 = 0
+            else:
+                id_out_st1 = self.rbr_identity_st1(x)
+
+            x1_1 = 0
+            for i in range(self.k):
+                x1_1 += self.depth_conv[i](x)
+
+            x1_2 = self.rbr_1x1(x)
+            x1 = self.act(x1_1 + x1_2 + id_out_st1)
+
+            if self.rbr_identity_st2 is None:
+                id_out_st2 = 0
+            else:
+                id_out_st2 = self.rbr_identity_st2(x1)
+
+            x2_1 = 0
+            for i in range(self.k):
+                x2_1 += self.point_conv[i](x1)
+            y = self.act(x2_1 + id_out_st2)
+
+        return y
+
+    def convert_to_deploy(self):
+        if not hasattr(self, 'conv1'):
+            self.conv1 = nn.Conv2D(
+                in_channels=self.ch_in,
+                out_channels=self.ch_in,
+                kernel_size=self.kernel_size,
+                stride=self.stride,
+                padding=self.padding,
+                groups=self.ch_in,
+                bias_attr=ParamAttr(
+                    initializer=Constant(value=0.), learning_rate=1.))
+        if not hasattr(self, 'conv2'):
+            self.conv2 = nn.Conv2D(
+                in_channels=self.ch_in,
+                out_channels=self.ch_out,
+                kernel_size=1,
+                stride=1,
+                padding='SAME',
+                groups=1,
+                bias_attr=ParamAttr(
+                    initializer=Constant(value=0.), learning_rate=1.))
+
+        conv1_kernel, conv1_bias, conv2_kernel, conv2_bias = self.get_equivalent_kernel_bias(
+        )
+        self.conv1.weight.set_value(conv1_kernel)
+        self.conv1.bias.set_value(conv1_bias)
+        self.conv2.weight.set_value(conv2_kernel)
+        self.conv2.bias.set_value(conv2_bias)
+        self.__delattr__('depth_conv')
+        self.__delattr__('point_conv')
+        self.__delattr__('rbr_1x1')
+        if hasattr(self, 'rbr_identity_st1'):
+            self.__delattr__('rbr_identity_st1')
+        if hasattr(self, 'rbr_identity_st2'):
+            self.__delattr__('rbr_identity_st2')
+
+    def get_equivalent_kernel_bias(self):
+        st1_kernel3x3, st1_bias3x3 = self._fuse_bn_tensor(self.depth_conv)
+        st1_kernel1x1, st1_bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
+        st1_kernelid, st1_biasid = self._fuse_bn_tensor(
+            self.rbr_identity_st1, kernel_size=self.kernel_size)
+
+        st2_kernel1x1, st2_bias1x1 = self._fuse_bn_tensor(self.point_conv)
+        st2_kernelid, st2_biasid = self._fuse_bn_tensor(
+            self.rbr_identity_st2, kernel_size=1)
+
+        conv1_kernel = st1_kernel3x3 + self._pad_1x1_to_3x3_tensor(
+            st1_kernel1x1) + st1_kernelid
+
+        conv1_bias = st1_bias3x3 + st1_bias1x1 + st1_biasid
+
+        conv2_kernel = st2_kernel1x1 + st2_kernelid
+        conv2_bias = st2_bias1x1 + st2_biasid
+
+        return conv1_kernel, conv1_bias, conv2_kernel, conv2_bias
+
+    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
+        if kernel1x1 is None:
+            return 0
+        else:
+            padding_size = (self.kernel_size - 1) // 2
+            return nn.functional.pad(
+                kernel1x1,
+                [padding_size, padding_size, padding_size, padding_size])
+
+    def _fuse_bn_tensor(self, branch, kernel_size=3):
+        if branch is None:
+            return 0, 0
+
+        if isinstance(branch, nn.LayerList):
+            fused_kernels = []
+            fused_bias = []
+            for block in branch:
+                kernel = block.conv.weight
+                running_mean = block.norm._mean
+                running_var = block.norm._variance
+                gamma = block.norm.weight
+                beta = block.norm.bias
+                eps = block.norm._epsilon
+
+                std = (running_var + eps).sqrt()
+                t = (gamma / std).reshape((-1, 1, 1, 1))
+
+                fused_kernels.append(kernel * t)
+                fused_bias.append(beta - running_mean * gamma / std)
+
+            return sum(fused_kernels), sum(fused_bias)
+
+        elif isinstance(branch, ConvNormLayer):
+            kernel = branch.conv.weight
+            running_mean = branch.norm._mean
+            running_var = branch.norm._variance
+            gamma = branch.norm.weight
+            beta = branch.norm.bias
+            eps = branch.norm._epsilon
+        else:
+            assert isinstance(branch, nn.BatchNorm2D)
+            # build an identity kernel so the BN-only branch can be fused as a conv
+            input_dim = self.ch_in if kernel_size == 1 else 1
+            kernel_value = paddle.zeros(
+                shape=[self.ch_in, input_dim, kernel_size, kernel_size],
+                dtype='float32')
+            if kernel_size > 1:
+                for i in range(self.ch_in):
+                    kernel_value[i, i % input_dim, (kernel_size - 1) // 2, (
+                        kernel_size - 1) // 2] = 1
+            elif kernel_size == 1:
+                for i in range(self.ch_in):
+                    kernel_value[i, i % input_dim, 0, 0] = 1
+            else:
+                raise ValueError("Invalid kernel size received!")
+            kernel = paddle.to_tensor(kernel_value, place=branch.weight.place)
+            running_mean = branch._mean
+            running_var = branch._variance
+            gamma = branch.weight
+            beta = branch.bias
+            eps = branch._epsilon
+
+        std = (running_var + eps).sqrt()
+        t = (gamma / std).reshape((-1, 1, 1, 1))
+
+        return kernel * t, beta - running_mean * gamma / std
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/name_adapter.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/name_adapter.py
new file mode 100644
index 0000000000000000000000000000000000000000..4afbb9b189e5091dc048194ca5f3a5cbaea061d3
--- /dev/null
+++
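# Standalone sanity check of the re-parameterization above: (1) convolution is
# linear in its weights, so summing branch kernels equals summing branch
# outputs; (2) _fuse_bn_tensor folds a BatchNorm into the preceding conv via
# W' = W * gamma/std and b' = beta - mean * gamma/std. A minimal sketch:
import paddle
import paddle.nn.functional as F

x = paddle.randn([1, 8, 16, 16])
w1, w2 = paddle.randn([8, 8, 3, 3]), paddle.randn([8, 8, 3, 3])
branch_sum = F.conv2d(x, w1, padding=1) + F.conv2d(x, w2, padding=1)
fused = F.conv2d(x, w1 + w2, padding=1)
assert paddle.allclose(branch_sum, fused, atol=1e-5)

gamma, beta = paddle.rand([8]), paddle.rand([8])
mean, var, eps = paddle.rand([8]), paddle.rand([8]), 1e-5
std = (var + eps).sqrt()
w_folded = w1 * (gamma / std).reshape([-1, 1, 1, 1])
b_folded = beta - mean * gamma / std  # matches _fuse_bn_tensor's return value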
b/PaddleDetection-release-2.6/ppdet/modeling/backbones/name_adapter.py @@ -0,0 +1,69 @@ +class NameAdapter(object): + """Fix the backbones variable names for pretrained weight""" + + def __init__(self, model): + super(NameAdapter, self).__init__() + self.model = model + + @property + def model_type(self): + return getattr(self.model, '_model_type', '') + + @property + def variant(self): + return getattr(self.model, 'variant', '') + + def fix_conv_norm_name(self, name): + if name == "conv1": + bn_name = "bn_" + name + else: + bn_name = "bn" + name[3:] + # the naming rule is same as pretrained weight + if self.model_type == 'SEResNeXt': + bn_name = name + "_bn" + return bn_name + + def fix_shortcut_name(self, name): + if self.model_type == 'SEResNeXt': + name = 'conv' + name + '_prj' + return name + + def fix_bottleneck_name(self, name): + if self.model_type == 'SEResNeXt': + conv_name1 = 'conv' + name + '_x1' + conv_name2 = 'conv' + name + '_x2' + conv_name3 = 'conv' + name + '_x3' + shortcut_name = name + else: + conv_name1 = name + "_branch2a" + conv_name2 = name + "_branch2b" + conv_name3 = name + "_branch2c" + shortcut_name = name + "_branch1" + return conv_name1, conv_name2, conv_name3, shortcut_name + + def fix_basicblock_name(self, name): + if self.model_type == 'SEResNeXt': + conv_name1 = 'conv' + name + '_x1' + conv_name2 = 'conv' + name + '_x2' + shortcut_name = name + else: + conv_name1 = name + "_branch2a" + conv_name2 = name + "_branch2b" + shortcut_name = name + "_branch1" + return conv_name1, conv_name2, shortcut_name + + def fix_layer_warp_name(self, stage_num, count, i): + name = 'res' + str(stage_num) + if count > 10 and stage_num == 4: + if i == 0: + conv_name = name + "a" + else: + conv_name = name + "b" + str(i) + else: + conv_name = name + chr(ord("a") + i) + if self.model_type == 'SEResNeXt': + conv_name = str(stage_num + 2) + '_' + str(i + 1) + return conv_name + + def fix_c1_stage_name(self): + return "res_conv1" if self.model_type == 'ResNeXt' else "conv1" diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/res2net.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/res2net.py new file mode 100644 index 0000000000000000000000000000000000000000..9e7677247914afdcfce375acdbaa595bf9dbe75d --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/res2net.py @@ -0,0 +1,357 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
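# Example of the layer names fix_layer_warp_name above produces, matching the
# pretrained-weight naming convention: deep res4 stages (count > 10, e.g.
# ResNet101/152) switch from letter suffixes to numbered "b" suffixes.
def layer_name_sketch(stage_num, count, i):
    name = 'res' + str(stage_num)
    if count > 10 and stage_num == 4:
        return name + 'a' if i == 0 else name + 'b' + str(i)
    return name + chr(ord('a') + i)

print([layer_name_sketch(2, 3, i) for i in range(3)])   # ['res2a', 'res2b', 'res2c']
print([layer_name_sketch(4, 23, i) for i in range(3)])  # ['res4a', 'res4b1', 'res4b2']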
+ +from numbers import Integral + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec +from .resnet import ConvNormLayer + +__all__ = ['Res2Net', 'Res2NetC5'] + +Res2Net_cfg = { + 50: [3, 4, 6, 3], + 101: [3, 4, 23, 3], + 152: [3, 8, 36, 3], + 200: [3, 12, 48, 3] +} + + +class BottleNeck(nn.Layer): + def __init__(self, + ch_in, + ch_out, + stride, + shortcut, + width, + scales=4, + variant='b', + groups=1, + lr=1.0, + norm_type='bn', + norm_decay=0., + freeze_norm=True, + dcn_v2=False): + super(BottleNeck, self).__init__() + + self.shortcut = shortcut + self.scales = scales + self.stride = stride + if not shortcut: + if variant == 'd' and stride == 2: + self.branch1 = nn.Sequential() + self.branch1.add_sublayer( + 'pool', + nn.AvgPool2D( + kernel_size=2, stride=2, padding=0, ceil_mode=True)) + self.branch1.add_sublayer( + 'conv', + ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=1, + stride=1, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr)) + else: + self.branch1 = ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=1, + stride=stride, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + self.branch2a = ConvNormLayer( + ch_in=ch_in, + ch_out=width * scales, + filter_size=1, + stride=stride if variant == 'a' else 1, + groups=1, + act='relu', + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + self.branch2b = nn.LayerList([ + ConvNormLayer( + ch_in=width, + ch_out=width, + filter_size=3, + stride=1 if variant == 'a' else stride, + groups=groups, + act='relu', + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr, + dcn_v2=dcn_v2) for _ in range(self.scales - 1) + ]) + + self.branch2c = ConvNormLayer( + ch_in=width * scales, + ch_out=ch_out, + filter_size=1, + stride=1, + groups=1, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + def forward(self, inputs): + + out = self.branch2a(inputs) + feature_split = paddle.split(out, self.scales, 1) + out_split = [] + for i in range(self.scales - 1): + if i == 0 or self.stride == 2: + out_split.append(self.branch2b[i](feature_split[i])) + else: + out_split.append(self.branch2b[i](paddle.add(feature_split[i], + out_split[-1]))) + if self.stride == 1: + out_split.append(feature_split[-1]) + else: + out_split.append(F.avg_pool2d(feature_split[-1], 3, self.stride, 1)) + out = self.branch2c(paddle.concat(out_split, 1)) + + if self.shortcut: + short = inputs + else: + short = self.branch1(inputs) + + out = paddle.add(out, short) + out = F.relu(out) + + return out + + +class Blocks(nn.Layer): + def __init__(self, + ch_in, + ch_out, + count, + stage_num, + width, + scales=4, + variant='b', + groups=1, + lr=1.0, + norm_type='bn', + norm_decay=0., + freeze_norm=True, + dcn_v2=False): + super(Blocks, self).__init__() + + self.blocks = nn.Sequential() + for i in range(count): + self.blocks.add_sublayer( + str(i), + BottleNeck( + ch_in=ch_in if i == 0 else ch_out, + ch_out=ch_out, + stride=2 if i == 0 and stage_num != 2 else 1, + shortcut=False if i == 0 else True, + width=width * (2**(stage_num - 2)), + scales=scales, + variant=variant, + groups=groups, + lr=lr, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + dcn_v2=dcn_v2)) + + def forward(self, inputs): + return self.blocks(inputs) + + +@register +@serializable +class 
Res2Net(nn.Layer): + """ + Res2Net, see https://arxiv.org/abs/1904.01169 + Args: + depth (int): Res2Net depth, should be 50, 101, 152, 200. + width (int): Res2Net width + scales (int): Res2Net scale + variant (str): Res2Net variant, supports 'a', 'b', 'c', 'd' currently + lr_mult_list (list): learning rate ratio of different resnet stages(2,3,4,5), + lower learning rate ratio is need for pretrained model + got using distillation(default as [1.0, 1.0, 1.0, 1.0]). + groups (int): The groups number of the Conv Layer. + norm_type (str): normalization type, 'bn' or 'sync_bn' + norm_decay (float): weight decay for normalization layer weights + freeze_norm (bool): freeze normalization layers + freeze_at (int): freeze the backbone at which stage + return_idx (list): index of stages whose feature maps are returned, + index 0 stands for res2 + dcn_v2_stages (list): index of stages who select deformable conv v2 + num_stages (int): number of stages created + + """ + __shared__ = ['norm_type'] + + def __init__(self, + depth=50, + width=26, + scales=4, + variant='b', + lr_mult_list=[1.0, 1.0, 1.0, 1.0], + groups=1, + norm_type='bn', + norm_decay=0., + freeze_norm=True, + freeze_at=0, + return_idx=[0, 1, 2, 3], + dcn_v2_stages=[-1], + num_stages=4): + super(Res2Net, self).__init__() + + self._model_type = 'Res2Net' if groups == 1 else 'Res2NeXt' + + assert depth in [50, 101, 152, 200], \ + "depth {} not in [50, 101, 152, 200]" + assert variant in ['a', 'b', 'c', 'd'], "invalid Res2Net variant" + assert num_stages >= 1 and num_stages <= 4 + + self.depth = depth + self.variant = variant + self.norm_type = norm_type + self.norm_decay = norm_decay + self.freeze_norm = freeze_norm + self.freeze_at = freeze_at + if isinstance(return_idx, Integral): + return_idx = [return_idx] + assert max(return_idx) < num_stages, \ + 'the maximum return index must smaller than num_stages, ' \ + 'but received maximum return index is {} and num_stages ' \ + 'is {}'.format(max(return_idx), num_stages) + self.return_idx = return_idx + self.num_stages = num_stages + assert len(lr_mult_list) == 4, \ + "lr_mult_list length must be 4 but got {}".format(len(lr_mult_list)) + if isinstance(dcn_v2_stages, Integral): + dcn_v2_stages = [dcn_v2_stages] + assert max(dcn_v2_stages) < num_stages + self.dcn_v2_stages = dcn_v2_stages + + block_nums = Res2Net_cfg[depth] + + # C1 stage + if self.variant in ['c', 'd']: + conv_def = [ + [3, 32, 3, 2, "conv1_1"], + [32, 32, 3, 1, "conv1_2"], + [32, 64, 3, 1, "conv1_3"], + ] + else: + conv_def = [[3, 64, 7, 2, "conv1"]] + self.res1 = nn.Sequential() + for (c_in, c_out, k, s, _name) in conv_def: + self.res1.add_sublayer( + _name, + ConvNormLayer( + ch_in=c_in, + ch_out=c_out, + filter_size=k, + stride=s, + groups=1, + act='relu', + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=1.0)) + + self._in_channels = [64, 256, 512, 1024] + self._out_channels = [256, 512, 1024, 2048] + self._out_strides = [4, 8, 16, 32] + + # C2-C5 stages + self.res_layers = [] + for i in range(num_stages): + lr_mult = lr_mult_list[i] + stage_num = i + 2 + self.res_layers.append( + self.add_sublayer( + "res{}".format(stage_num), + Blocks( + self._in_channels[i], + self._out_channels[i], + count=block_nums[i], + stage_num=stage_num, + width=width, + scales=scales, + groups=groups, + lr=lr_mult, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + dcn_v2=(i in self.dcn_v2_stages)))) + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self._out_channels[i], 
stride=self._out_strides[i]) + for i in self.return_idx + ] + + def forward(self, inputs): + x = inputs['image'] + res1 = self.res1(x) + x = F.max_pool2d(res1, kernel_size=3, stride=2, padding=1) + outs = [] + for idx, stage in enumerate(self.res_layers): + x = stage(x) + if idx == self.freeze_at: + x.stop_gradient = True + if idx in self.return_idx: + outs.append(x) + return outs + + +@register +class Res2NetC5(nn.Layer): + def __init__(self, depth=50, width=26, scales=4, variant='b'): + super(Res2NetC5, self).__init__() + feat_in, feat_out = [1024, 2048] + self.res5 = Blocks( + feat_in, + feat_out, + count=3, + stage_num=5, + width=width, + scales=scales, + variant=variant) + self.feat_out = feat_out + + @property + def out_shape(self): + return [ShapeSpec( + channels=self.feat_out, + stride=32, )] + + def forward(self, roi_feat, stage=0): + y = self.res5(roi_feat) + return y diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/resnet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..3b9508c49f932ffa34f53a946224ed8d7a3ae564 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/resnet.py @@ -0,0 +1,609 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
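# Shape sketch of the Res2Net BottleNeck above: branch2a expands to
# width * scales channels, paddle.split carves them into `scales` groups, and
# every 3x3 conv except the first also receives the previous group's output,
# which is what yields the hierarchical multi-scale receptive fields.
import paddle

width, scales = 26, 4
out = paddle.randn([1, width * scales, 14, 14])
feature_split = paddle.split(out, scales, 1)  # 4 groups of 26 channels
print([f.shape[1] for f in feature_split])    # [26, 26, 26, 26]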
+ +import math +from numbers import Integral + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from paddle.regularizer import L2Decay +from paddle.nn.initializer import Uniform +from paddle import ParamAttr +from paddle.nn.initializer import Constant +from paddle.vision.ops import DeformConv2D +from .name_adapter import NameAdapter +from ..shape_spec import ShapeSpec + +__all__ = ['ResNet', 'Res5Head', 'Blocks', 'BasicBlock', 'BottleNeck'] + +ResNet_cfg = { + 18: [2, 2, 2, 2], + 34: [3, 4, 6, 3], + 50: [3, 4, 6, 3], + 101: [3, 4, 23, 3], + 152: [3, 8, 36, 3], +} + + +class ConvNormLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size, + stride, + groups=1, + act=None, + norm_type='bn', + norm_decay=0., + freeze_norm=True, + lr=1.0, + dcn_v2=False): + super(ConvNormLayer, self).__init__() + assert norm_type in ['bn', 'sync_bn'] + self.norm_type = norm_type + self.act = act + self.dcn_v2 = dcn_v2 + + if not self.dcn_v2: + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + weight_attr=ParamAttr(learning_rate=lr), + bias_attr=False) + else: + self.offset_channel = 2 * filter_size**2 + self.mask_channel = filter_size**2 + + self.conv_offset = nn.Conv2D( + in_channels=ch_in, + out_channels=3 * filter_size**2, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + weight_attr=ParamAttr(initializer=Constant(0.)), + bias_attr=ParamAttr(initializer=Constant(0.))) + self.conv = DeformConv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + dilation=1, + groups=groups, + weight_attr=ParamAttr(learning_rate=lr), + bias_attr=False) + + norm_lr = 0. 
if freeze_norm else lr + param_attr = ParamAttr( + learning_rate=norm_lr, + regularizer=L2Decay(norm_decay), + trainable=False if freeze_norm else True) + bias_attr = ParamAttr( + learning_rate=norm_lr, + regularizer=L2Decay(norm_decay), + trainable=False if freeze_norm else True) + + global_stats = True if freeze_norm else None + if norm_type in ['sync_bn', 'bn']: + self.norm = nn.BatchNorm2D( + ch_out, + weight_attr=param_attr, + bias_attr=bias_attr, + use_global_stats=global_stats) + norm_params = self.norm.parameters() + + if freeze_norm: + for param in norm_params: + param.stop_gradient = True + + def forward(self, inputs): + if not self.dcn_v2: + out = self.conv(inputs) + else: + offset_mask = self.conv_offset(inputs) + offset, mask = paddle.split( + offset_mask, + num_or_sections=[self.offset_channel, self.mask_channel], + axis=1) + mask = F.sigmoid(mask) + out = self.conv(inputs, offset, mask=mask) + + if self.norm_type in ['bn', 'sync_bn']: + out = self.norm(out) + if self.act: + out = getattr(F, self.act)(out) + return out + + +class SELayer(nn.Layer): + def __init__(self, ch, reduction_ratio=16): + super(SELayer, self).__init__() + self.pool = nn.AdaptiveAvgPool2D(1) + stdv = 1.0 / math.sqrt(ch) + c_ = ch // reduction_ratio + self.squeeze = nn.Linear( + ch, + c_, + weight_attr=paddle.ParamAttr(initializer=Uniform(-stdv, stdv)), + bias_attr=True) + + stdv = 1.0 / math.sqrt(c_) + self.extract = nn.Linear( + c_, + ch, + weight_attr=paddle.ParamAttr(initializer=Uniform(-stdv, stdv)), + bias_attr=True) + + def forward(self, inputs): + out = self.pool(inputs) + out = paddle.squeeze(out, axis=[2, 3]) + out = self.squeeze(out) + out = F.relu(out) + out = self.extract(out) + out = F.sigmoid(out) + out = paddle.unsqueeze(out, axis=[2, 3]) + scale = out * inputs + return scale + + +class BasicBlock(nn.Layer): + + expansion = 1 + + def __init__(self, + ch_in, + ch_out, + stride, + shortcut, + variant='b', + groups=1, + base_width=64, + lr=1.0, + norm_type='bn', + norm_decay=0., + freeze_norm=True, + dcn_v2=False, + std_senet=False): + super(BasicBlock, self).__init__() + assert groups == 1 and base_width == 64, 'BasicBlock only supports groups=1 and base_width=64' + + self.shortcut = shortcut + if not shortcut: + if variant == 'd' and stride == 2: + self.short = nn.Sequential() + self.short.add_sublayer( + 'pool', + nn.AvgPool2D( + kernel_size=2, stride=2, padding=0, ceil_mode=True)) + self.short.add_sublayer( + 'conv', + ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=1, + stride=1, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr)) + else: + self.short = ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=1, + stride=stride, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + self.branch2a = ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=3, + stride=stride, + act='relu', + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + self.branch2b = ConvNormLayer( + ch_in=ch_out, + ch_out=ch_out, + filter_size=3, + stride=1, + act=None, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr, + dcn_v2=dcn_v2) + + self.std_senet = std_senet + if self.std_senet: + self.se = SELayer(ch_out) + + def forward(self, inputs): + out = self.branch2a(inputs) + out = self.branch2b(out) + if self.std_senet: + out = self.se(out) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + out = paddle.add(x=out, y=short) + out = 
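# Channel bookkeeping of the DCN v2 path in ConvNormLayer above: conv_offset
# predicts 3*k*k channels, split into 2*k*k (x, y) offsets plus k*k modulation
# masks, with the mask squashed by sigmoid. Standalone sketch using paddle's
# public deform_conv2d (assumed here to behave like the DeformConv2D layer):
import paddle
from paddle.vision.ops import deform_conv2d

k = 3
x = paddle.randn([1, 8, 14, 14])
offset = paddle.zeros([1, 2 * k * k, 14, 14])  # zero offsets: regular sampling grid
mask = paddle.nn.functional.sigmoid(paddle.zeros([1, k * k, 14, 14]))  # all 0.5
weight = paddle.randn([16, 8, k, k])
y = deform_conv2d(x, offset, weight, padding=1, mask=mask)
print(y.shape)  # [1, 16, 14, 14]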
F.relu(out) + + return out + + +class BottleNeck(nn.Layer): + + expansion = 4 + + def __init__(self, + ch_in, + ch_out, + stride, + shortcut, + variant='b', + groups=1, + base_width=4, + lr=1.0, + norm_type='bn', + norm_decay=0., + freeze_norm=True, + dcn_v2=False, + std_senet=False): + super(BottleNeck, self).__init__() + if variant == 'a': + stride1, stride2 = stride, 1 + else: + stride1, stride2 = 1, stride + + # ResNeXt + width = int(ch_out * (base_width / 64.)) * groups + + self.branch2a = ConvNormLayer( + ch_in=ch_in, + ch_out=width, + filter_size=1, + stride=stride1, + groups=1, + act='relu', + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + self.branch2b = ConvNormLayer( + ch_in=width, + ch_out=width, + filter_size=3, + stride=stride2, + groups=groups, + act='relu', + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr, + dcn_v2=dcn_v2) + + self.branch2c = ConvNormLayer( + ch_in=width, + ch_out=ch_out * self.expansion, + filter_size=1, + stride=1, + groups=1, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + self.shortcut = shortcut + if not shortcut: + if variant == 'd' and stride == 2: + self.short = nn.Sequential() + self.short.add_sublayer( + 'pool', + nn.AvgPool2D( + kernel_size=2, stride=2, padding=0, ceil_mode=True)) + self.short.add_sublayer( + 'conv', + ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out * self.expansion, + filter_size=1, + stride=1, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr)) + else: + self.short = ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out * self.expansion, + filter_size=1, + stride=stride, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + lr=lr) + + self.std_senet = std_senet + if self.std_senet: + self.se = SELayer(ch_out * self.expansion) + + def forward(self, inputs): + + out = self.branch2a(inputs) + out = self.branch2b(out) + out = self.branch2c(out) + + if self.std_senet: + out = self.se(out) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + out = paddle.add(x=out, y=short) + out = F.relu(out) + + return out + + +class Blocks(nn.Layer): + def __init__(self, + block, + ch_in, + ch_out, + count, + name_adapter, + stage_num, + variant='b', + groups=1, + base_width=64, + lr=1.0, + norm_type='bn', + norm_decay=0., + freeze_norm=True, + dcn_v2=False, + std_senet=False): + super(Blocks, self).__init__() + + self.blocks = [] + for i in range(count): + conv_name = name_adapter.fix_layer_warp_name(stage_num, count, i) + layer = self.add_sublayer( + conv_name, + block( + ch_in=ch_in, + ch_out=ch_out, + stride=2 if i == 0 and stage_num != 2 else 1, + shortcut=False if i == 0 else True, + variant=variant, + groups=groups, + base_width=base_width, + lr=lr, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + dcn_v2=dcn_v2, + std_senet=std_senet)) + self.blocks.append(layer) + if i == 0: + ch_in = ch_out * block.expansion + + def forward(self, inputs): + block_out = inputs + for block in self.blocks: + block_out = block(block_out) + return block_out + + +@register +@serializable +class ResNet(nn.Layer): + __shared__ = ['norm_type'] + + def __init__(self, + depth=50, + ch_in=64, + variant='b', + lr_mult_list=[1.0, 1.0, 1.0, 1.0], + groups=1, + base_width=64, + norm_type='bn', + norm_decay=0, + freeze_norm=True, + freeze_at=0, + return_idx=[0, 1, 2, 3], + dcn_v2_stages=[-1], + num_stages=4, + std_senet=False): + """ + Residual Network, see 
https://arxiv.org/abs/1512.03385
+
+        Args:
+            depth (int): ResNet depth, should be 18, 34, 50, 101, 152.
+            ch_in (int): output channel of first stage, default 64
+            variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
+            lr_mult_list (list): learning rate ratio of different resnet stages (2, 3, 4, 5);
+                a lower learning rate ratio is needed for pretrained models
+                obtained via distillation (default [1.0, 1.0, 1.0, 1.0]).
+            groups (int): group convolution cardinality
+            base_width (int): base width of each group convolution
+            norm_type (str): normalization type, 'bn' or 'sync_bn'
+            norm_decay (float): weight decay for normalization layer weights
+            freeze_norm (bool): freeze normalization layers
+            freeze_at (int): freeze backbone stages up to and including this index
+            return_idx (list): index of the stages whose feature maps are returned
+            dcn_v2_stages (list): indices of stages that use deformable conv v2
+            num_stages (int): total number of stages
+            std_senet (bool): whether to use SE blocks, default False
+        """
+        super(ResNet, self).__init__()
+        self._model_type = 'ResNet' if groups == 1 else 'ResNeXt'
+        assert num_stages >= 1 and num_stages <= 4
+        self.depth = depth
+        self.variant = variant
+        self.groups = groups
+        self.base_width = base_width
+        self.norm_type = norm_type
+        self.norm_decay = norm_decay
+        self.freeze_norm = freeze_norm
+        self.freeze_at = freeze_at
+        if isinstance(return_idx, Integral):
+            return_idx = [return_idx]
+        assert max(return_idx) < num_stages, \
+            'the maximum return index must smaller than num_stages, ' \
+            'but received maximum return index is {} and num_stages ' \
+            'is {}'.format(max(return_idx), num_stages)
+        self.return_idx = return_idx
+        self.num_stages = num_stages
+        assert len(lr_mult_list) == 4, \
+            "lr_mult_list length must be 4 but got {}".format(len(lr_mult_list))
+        if isinstance(dcn_v2_stages, Integral):
+            dcn_v2_stages = [dcn_v2_stages]
+        assert max(dcn_v2_stages) < num_stages
+        self.dcn_v2_stages = dcn_v2_stages
+
+        block_nums = ResNet_cfg[depth]
+        na = NameAdapter(self)
+
+        conv1_name = na.fix_c1_stage_name()
+        if variant in ['c', 'd']:
+            conv_def = [
+                [3, ch_in // 2, 3, 2, "conv1_1"],
+                [ch_in // 2, ch_in // 2, 3, 1, "conv1_2"],
+                [ch_in // 2, ch_in, 3, 1, "conv1_3"],
+            ]
+        else:
+            conv_def = [[3, ch_in, 7, 2, conv1_name]]
+        self.conv1 = nn.Sequential()
+        for (c_in, c_out, k, s, _name) in conv_def:
+            self.conv1.add_sublayer(
+                _name,
+                ConvNormLayer(
+                    ch_in=c_in,
+                    ch_out=c_out,
+                    filter_size=k,
+                    stride=s,
+                    groups=1,
+                    act='relu',
+                    norm_type=norm_type,
+                    norm_decay=norm_decay,
+                    freeze_norm=freeze_norm,
+                    lr=1.0))
+
+        self.ch_in = ch_in
+        ch_out_list = [64, 128, 256, 512]
+        block = BottleNeck if depth >= 50 else BasicBlock
+
+        self._out_channels = [block.expansion * v for v in ch_out_list]
+        self._out_strides = [4, 8, 16, 32]
+
+        self.res_layers = []
+        for i in range(num_stages):
+            lr_mult = lr_mult_list[i]
+            stage_num = i + 2
+            res_name = "res{}".format(stage_num)
+            res_layer = self.add_sublayer(
+                res_name,
+                Blocks(
+                    block,
+                    self.ch_in,
+                    ch_out_list[i],
+                    count=block_nums[i],
+                    name_adapter=na,
+                    stage_num=stage_num,
+                    variant=variant,
+                    groups=groups,
+                    base_width=base_width,
+                    lr=lr_mult,
+                    norm_type=norm_type,
+                    norm_decay=norm_decay,
+                    freeze_norm=freeze_norm,
+                    dcn_v2=(i in self.dcn_v2_stages),
+                    std_senet=std_senet))
+            self.res_layers.append(res_layer)
+            self.ch_in = self._out_channels[i]
+
+        if freeze_at >= 0:
self._freeze_parameters(self.conv1) + for i in range(min(freeze_at + 1, num_stages)): + self._freeze_parameters(self.res_layers[i]) + + def _freeze_parameters(self, m): + for p in m.parameters(): + p.stop_gradient = True + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self._out_channels[i], stride=self._out_strides[i]) + for i in self.return_idx + ] + + def forward(self, inputs): + x = inputs['image'] + conv1 = self.conv1(x) + x = F.max_pool2d(conv1, kernel_size=3, stride=2, padding=1) + outs = [] + for idx, stage in enumerate(self.res_layers): + x = stage(x) + if idx in self.return_idx: + outs.append(x) + return outs + + +@register +class Res5Head(nn.Layer): + def __init__(self, depth=50): + super(Res5Head, self).__init__() + feat_in, feat_out = [1024, 512] + if depth < 50: + feat_in = 256 + na = NameAdapter(self) + block = BottleNeck if depth >= 50 else BasicBlock + self.res5 = Blocks( + block, feat_in, feat_out, count=3, name_adapter=na, stage_num=5) + self.feat_out = feat_out if depth < 50 else feat_out * 4 + + @property + def out_shape(self): + return [ShapeSpec( + channels=self.feat_out, + stride=16, )] + + def forward(self, roi_feat, stage=0): + y = self.res5(roi_feat) + return y diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/senet.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/senet.py new file mode 100644 index 0000000000000000000000000000000000000000..db1e29b9779aef5fc8ccd4ca2f4cffeedee715f1 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/senet.py @@ -0,0 +1,141 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle.nn as nn + +from ppdet.core.workspace import register, serializable +from .resnet import ResNet, Blocks, BasicBlock, BottleNeck +from ..shape_spec import ShapeSpec +from .name_adapter import NameAdapter + +__all__ = ['SENet', 'SERes5Head'] + + +@register +@serializable +class SENet(ResNet): + __shared__ = ['norm_type'] + + def __init__(self, + depth=50, + variant='b', + lr_mult_list=[1.0, 1.0, 1.0, 1.0], + groups=1, + base_width=64, + norm_type='bn', + norm_decay=0, + freeze_norm=True, + freeze_at=0, + return_idx=[0, 1, 2, 3], + dcn_v2_stages=[-1], + std_senet=True, + num_stages=4): + """ + Squeeze-and-Excitation Networks, see https://arxiv.org/abs/1709.01507 + + Args: + depth (int): SENet depth, should be 50, 101, 152 + variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently + lr_mult_list (list): learning rate ratio of different resnet stages(2,3,4,5), + lower learning rate ratio is need for pretrained model + got using distillation(default as [1.0, 1.0, 1.0, 1.0]). 
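# Usage sketch for the ResNet backbone above (assumes the ppdet package from
# this release is importable): out_shape pairs each returned stage with its
# channel count and stride, here C2-C5 of a depth-50 model.
from ppdet.modeling.backbones.resnet import ResNet

backbone = ResNet(depth=50, return_idx=[0, 1, 2, 3])
print([(s.channels, s.stride) for s in backbone.out_shape])
# [(256, 4), (512, 8), (1024, 16), (2048, 32)]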
+ groups (int): group convolution cardinality + base_width (int): base width of each group convolution + norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel' + norm_decay (float): weight decay for normalization layer weights + freeze_norm (bool): freeze normalization layers + freeze_at (int): freeze the backbone at which stage + return_idx (list): index of the stages whose feature maps are returned + dcn_v2_stages (list): index of stages who select deformable conv v2 + std_senet (bool): whether use senet, default True + num_stages (int): total num of stages + """ + + super(SENet, self).__init__( + depth=depth, + variant=variant, + lr_mult_list=lr_mult_list, + ch_in=128, + groups=groups, + base_width=base_width, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + freeze_at=freeze_at, + return_idx=return_idx, + dcn_v2_stages=dcn_v2_stages, + std_senet=std_senet, + num_stages=num_stages) + + +@register +class SERes5Head(nn.Layer): + def __init__(self, + depth=50, + variant='b', + lr_mult=1.0, + groups=1, + base_width=64, + norm_type='bn', + norm_decay=0, + dcn_v2=False, + freeze_norm=False, + std_senet=True): + """ + SERes5Head layer + + Args: + depth (int): SENet depth, should be 50, 101, 152 + variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently + lr_mult (list): learning rate ratio of SERes5Head, default as 1.0. + groups (int): group convolution cardinality + base_width (int): base width of each group convolution + norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel' + norm_decay (float): weight decay for normalization layer weights + dcn_v2_stages (list): index of stages who select deformable conv v2 + std_senet (bool): whether use senet, default True + + """ + super(SERes5Head, self).__init__() + ch_out = 512 + ch_in = 256 if depth < 50 else 1024 + na = NameAdapter(self) + block = BottleNeck if depth >= 50 else BasicBlock + self.res5 = Blocks( + block, + ch_in, + ch_out, + count=3, + name_adapter=na, + stage_num=5, + variant=variant, + groups=groups, + base_width=base_width, + lr=lr_mult, + norm_type=norm_type, + norm_decay=norm_decay, + freeze_norm=freeze_norm, + dcn_v2=dcn_v2, + std_senet=std_senet) + self.ch_out = ch_out * block.expansion + + @property + def out_shape(self): + return [ShapeSpec( + channels=self.ch_out, + stride=16, )] + + def forward(self, roi_feat): + y = self.res5(roi_feat) + return y diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/shufflenet_v2.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/shufflenet_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..ca7ebb93fb8099aa07f348a051d9c9e2f95e3a5f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/shufflenet_v2.py @@ -0,0 +1,250 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
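# The squeeze-and-excitation recalibration behind std_senet, end to end:
# global pool -> bottleneck FC (ch // reduction_ratio) -> ReLU -> FC -> sigmoid,
# then a per-channel rescale of the input, as in SELayer. Toy sketch with
# randomly initialized Linear layers:
import paddle
import paddle.nn.functional as F

x = paddle.randn([2, 64, 7, 7])
s = x.mean(axis=[2, 3])                  # squeeze: [2, 64]
s = F.relu(paddle.nn.Linear(64, 4)(s))   # 64 // reduction_ratio(16) = 4
s = F.sigmoid(paddle.nn.Linear(4, 64)(s))  # excitation back to 64 channels
y = x * s.unsqueeze(-1).unsqueeze(-1)    # channel-wise scaling of the input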
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +import paddle.nn.functional as F +from paddle.nn import Conv2D, MaxPool2D, AdaptiveAvgPool2D, BatchNorm2D +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + +from ppdet.core.workspace import register, serializable +from numbers import Integral +from ..shape_spec import ShapeSpec +from ppdet.modeling.ops import channel_shuffle + +__all__ = ['ShuffleNetV2'] + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride, + padding, + groups=1, + act=None): + super(ConvBNLayer, self).__init__() + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self._batch_norm = BatchNorm2D( + out_channels, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + if act == "hard_swish": + act = 'hardswish' + self.act = act + + def forward(self, inputs): + y = self._conv(inputs) + y = self._batch_norm(y) + if self.act: + y = getattr(F, self.act)(y) + return y + + +class InvertedResidual(nn.Layer): + def __init__(self, in_channels, out_channels, stride, act="relu"): + super(InvertedResidual, self).__init__() + self._conv_pw = ConvBNLayer( + in_channels=in_channels // 2, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + self._conv_dw = ConvBNLayer( + in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=3, + stride=stride, + padding=1, + groups=out_channels // 2, + act=None) + self._conv_linear = ConvBNLayer( + in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + + def forward(self, inputs): + x1, x2 = paddle.split( + inputs, + num_or_sections=[inputs.shape[1] // 2, inputs.shape[1] // 2], + axis=1) + x2 = self._conv_pw(x2) + x2 = self._conv_dw(x2) + x2 = self._conv_linear(x2) + out = paddle.concat([x1, x2], axis=1) + return channel_shuffle(out, 2) + + +class InvertedResidualDS(nn.Layer): + def __init__(self, in_channels, out_channels, stride, act="relu"): + super(InvertedResidualDS, self).__init__() + + # branch1 + self._conv_dw_1 = ConvBNLayer( + in_channels=in_channels, + out_channels=in_channels, + kernel_size=3, + stride=stride, + padding=1, + groups=in_channels, + act=None) + self._conv_linear_1 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + # branch2 + self._conv_pw_2 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + self._conv_dw_2 = ConvBNLayer( + in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=3, + stride=stride, + padding=1, + groups=out_channels // 2, + act=None) + self._conv_linear_2 = ConvBNLayer( + in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=1, + stride=1, + padding=0, + groups=1, + act=act) + + def forward(self, inputs): + x1 = self._conv_dw_1(inputs) + x1 = self._conv_linear_1(x1) + x2 = self._conv_pw_2(inputs) + x2 = self._conv_dw_2(x2) + x2 = self._conv_linear_2(x2) + out = paddle.concat([x1, x2], 
axis=1) + + return channel_shuffle(out, 2) + + +@register +@serializable +class ShuffleNetV2(nn.Layer): + def __init__(self, scale=1.0, act="relu", feature_maps=[5, 13, 17]): + super(ShuffleNetV2, self).__init__() + self.scale = scale + if isinstance(feature_maps, Integral): + feature_maps = [feature_maps] + self.feature_maps = feature_maps + stage_repeats = [4, 8, 4] + + if scale == 0.25: + stage_out_channels = [-1, 24, 24, 48, 96, 512] + elif scale == 0.33: + stage_out_channels = [-1, 24, 32, 64, 128, 512] + elif scale == 0.5: + stage_out_channels = [-1, 24, 48, 96, 192, 1024] + elif scale == 1.0: + stage_out_channels = [-1, 24, 116, 232, 464, 1024] + elif scale == 1.5: + stage_out_channels = [-1, 24, 176, 352, 704, 1024] + elif scale == 2.0: + stage_out_channels = [-1, 24, 244, 488, 976, 2048] + else: + raise NotImplementedError("This scale size:[" + str(scale) + + "] is not implemented!") + self._out_channels = [] + self._feature_idx = 0 + # 1. conv1 + self._conv1 = ConvBNLayer( + in_channels=3, + out_channels=stage_out_channels[1], + kernel_size=3, + stride=2, + padding=1, + act=act) + self._max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1) + self._feature_idx += 1 + + # 2. bottleneck sequences + self._block_list = [] + for stage_id, num_repeat in enumerate(stage_repeats): + for i in range(num_repeat): + if i == 0: + block = self.add_sublayer( + name=str(stage_id + 2) + '_' + str(i + 1), + sublayer=InvertedResidualDS( + in_channels=stage_out_channels[stage_id + 1], + out_channels=stage_out_channels[stage_id + 2], + stride=2, + act=act)) + else: + block = self.add_sublayer( + name=str(stage_id + 2) + '_' + str(i + 1), + sublayer=InvertedResidual( + in_channels=stage_out_channels[stage_id + 2], + out_channels=stage_out_channels[stage_id + 2], + stride=1, + act=act)) + self._block_list.append(block) + self._feature_idx += 1 + self._update_out_channels(stage_out_channels[stage_id + 2], + self._feature_idx, self.feature_maps) + + def _update_out_channels(self, channel, feature_idx, feature_maps): + if feature_idx in feature_maps: + self._out_channels.append(channel) + + def forward(self, inputs): + y = self._conv1(inputs['image']) + y = self._max_pool(y) + outs = [] + for i, inv in enumerate(self._block_list): + y = inv(y) + if i + 2 in self.feature_maps: + outs.append(y) + + return outs + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/swin_transformer.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/swin_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..aa4311ff812dffcfe889b843ad9a5ec6a5ce8e48 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/swin_transformer.py @@ -0,0 +1,696 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
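# Reference sketch of the channel_shuffle helper imported from
# ppdet.modeling.ops (assuming the standard reshape/transpose definition):
# it interleaves the two half-channel groups so information crosses the
# ShuffleNetV2 branches between blocks.
import paddle

def channel_shuffle_ref(x, groups):
    n, c, h, w = x.shape
    x = x.reshape([n, groups, c // groups, h, w])
    x = x.transpose([0, 2, 1, 3, 4])
    return x.reshape([n, c, h, w])

x = paddle.arange(8).reshape([1, 8, 1, 1])
print(channel_shuffle_ref(x, 2).flatten().tolist())  # [0, 4, 1, 5, 2, 6, 3, 7]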
+""" +This code is based on https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py +Ths copyright of microsoft/Swin-Transformer is as follows: +MIT License [see LICENSE for details] +""" + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.modeling.shape_spec import ShapeSpec +from ppdet.core.workspace import register, serializable +import numpy as np + +from .transformer_utils import DropPath, Identity +from .transformer_utils import add_parameter, to_2tuple +from .transformer_utils import ones_, zeros_, trunc_normal_ + + +class Mlp(nn.Layer): + def __init__(self, + in_features, + hidden_features=None, + out_features=None, + act_layer=nn.GELU, + drop=0.): + super().__init__() + out_features = out_features or in_features + hidden_features = hidden_features or in_features + self.fc1 = nn.Linear(in_features, hidden_features) + self.act = act_layer() + self.fc2 = nn.Linear(hidden_features, out_features) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +def window_partition(x, window_size): + """ + Args: + x: (B, H, W, C) + window_size (int): window size + Returns: + windows: (num_windows*B, window_size, window_size, C) + """ + B, H, W, C = x.shape + x = x.reshape( + [-1, H // window_size, window_size, W // window_size, window_size, C]) + windows = x.transpose([0, 1, 3, 2, 4, 5]).reshape( + [-1, window_size, window_size, C]) + return windows + + +def window_reverse(windows, window_size, H, W): + """ + Args: + windows: (num_windows*B, window_size, window_size, C) + window_size (int): Window size + H (int): Height of image + W (int): Width of image + Returns: + x: (B, H, W, C) + """ + _, _, _, C = windows.shape + B = int(windows.shape[0] / (H * W / window_size / window_size)) + x = windows.reshape( + [-1, H // window_size, W // window_size, window_size, window_size, C]) + x = x.transpose([0, 1, 3, 2, 4, 5]).reshape([-1, H, W, C]) + return x + + +class WindowAttention(nn.Layer): + """ Window based multi-head self attention (W-MSA) module with relative position bias. + It supports both of shifted and non-shifted window. + + Args: + dim (int): Number of input channels. + window_size (tuple[int]): The height and width of the window. + num_heads (int): Number of attention heads. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set + attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0 + proj_drop (float, optional): Dropout ratio of output. 
Default: 0.0 + """ + + def __init__(self, + dim, + window_size, + num_heads, + qkv_bias=True, + qk_scale=None, + attn_drop=0., + proj_drop=0.): + + super().__init__() + self.dim = dim + self.window_size = window_size # Wh, Ww + self.num_heads = num_heads + head_dim = dim // num_heads + self.scale = qk_scale or head_dim**-0.5 + + # define a parameter table of relative position bias + self.relative_position_bias_table = add_parameter( + self, + paddle.zeros(((2 * window_size[0] - 1) * (2 * window_size[1] - 1), + num_heads))) # 2*Wh-1 * 2*Ww-1, nH + + # get pair-wise relative position index for each token inside the window + coords_h = paddle.arange(self.window_size[0]) + coords_w = paddle.arange(self.window_size[1]) + coords = paddle.stack(paddle.meshgrid( + [coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = paddle.flatten(coords, 1) # 2, Wh*Ww + coords_flatten_1 = coords_flatten.unsqueeze(axis=2) + coords_flatten_2 = coords_flatten.unsqueeze(axis=1) + relative_coords = coords_flatten_1 - coords_flatten_2 + relative_coords = relative_coords.transpose( + [1, 2, 0]) # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += self.window_size[ + 0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += self.window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1 + self.relative_position_index = relative_coords.sum(-1) # Wh*Ww, Wh*Ww + self.register_buffer("relative_position_index", + self.relative_position_index) + + self.qkv = nn.Linear(dim, dim * 3, bias_attr=qkv_bias) + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + + trunc_normal_(self.relative_position_bias_table) + self.softmax = nn.Softmax(axis=-1) + + def forward(self, x, mask=None): + """ Forward function. + Args: + x: input features with shape of (num_windows*B, N, C) + mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None + """ + B_, N, C = x.shape + qkv = self.qkv(x).reshape( + [-1, N, 3, self.num_heads, C // self.num_heads]).transpose( + [2, 0, 3, 1, 4]) + q, k, v = qkv[0], qkv[1], qkv[2] + + q = q * self.scale + attn = paddle.mm(q, k.transpose([0, 1, 3, 2])) + + index = self.relative_position_index.flatten() + + relative_position_bias = paddle.index_select( + self.relative_position_bias_table, index) + relative_position_bias = relative_position_bias.reshape([ + self.window_size[0] * self.window_size[1], + self.window_size[0] * self.window_size[1], -1 + ]) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.transpose( + [2, 0, 1]) # nH, Wh*Ww, Wh*Ww + attn = attn + relative_position_bias.unsqueeze(0) + + if mask is not None: + nW = mask.shape[0] + attn = attn.reshape([-1, nW, self.num_heads, N, N + ]) + mask.unsqueeze(1).unsqueeze(0) + attn = attn.reshape([-1, self.num_heads, N, N]) + attn = self.softmax(attn) + else: + attn = self.softmax(attn) + + attn = self.attn_drop(attn) + + # x = (attn @ v).transpose(1, 2).reshape([B_, N, C]) + x = paddle.mm(attn, v).transpose([0, 2, 1, 3]).reshape([-1, N, C]) + x = self.proj(x) + x = self.proj_drop(x) + return x + + +class SwinTransformerBlock(nn.Layer): + """ Swin Transformer Block. + Args: + dim (int): Number of input channels. + num_heads (int): Number of attention heads. + window_size (int): Window size. + shift_size (int): Shift size for SW-MSA. + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. 
Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set. + drop (float, optional): Dropout rate. Default: 0.0 + attn_drop (float, optional): Attention dropout rate. Default: 0.0 + drop_path (float, optional): Stochastic depth rate. Default: 0.0 + act_layer (nn.Layer, optional): Activation layer. Default: nn.GELU + norm_layer (nn.Layer, optional): Normalization layer. Default: nn.LayerNorm + """ + + def __init__(self, + dim, + num_heads, + window_size=7, + shift_size=0, + mlp_ratio=4., + qkv_bias=True, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + act_layer=nn.GELU, + norm_layer=nn.LayerNorm): + super().__init__() + self.dim = dim + self.num_heads = num_heads + self.window_size = window_size + self.shift_size = shift_size + self.mlp_ratio = mlp_ratio + assert 0 <= self.shift_size < self.window_size, "shift_size must in 0-window_size" + + self.norm1 = norm_layer(dim) + self.attn = WindowAttention( + dim, + window_size=to_2tuple(self.window_size), + num_heads=num_heads, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + attn_drop=attn_drop, + proj_drop=drop) + + self.drop_path = DropPath(drop_path) if drop_path > 0. else Identity() + self.norm2 = norm_layer(dim) + mlp_hidden_dim = int(dim * mlp_ratio) + self.mlp = Mlp(in_features=dim, + hidden_features=mlp_hidden_dim, + act_layer=act_layer, + drop=drop) + + self.H = None + self.W = None + + def forward(self, x, mask_matrix): + """ Forward function. + Args: + x: Input feature, tensor size (B, H*W, C). + H, W: Spatial resolution of the input feature. + mask_matrix: Attention mask for cyclic shift. + """ + B, L, C = x.shape + H, W = self.H, self.W + assert L == H * W, "input feature has wrong size" + + shortcut = x + x = self.norm1(x) + x = x.reshape([-1, H, W, C]) + + # pad feature maps to multiples of window size + pad_l = pad_t = 0 + pad_r = (self.window_size - W % self.window_size) % self.window_size + pad_b = (self.window_size - H % self.window_size) % self.window_size + x = F.pad(x, [0, pad_l, 0, pad_b, 0, pad_r, 0, pad_t]) + _, Hp, Wp, _ = x.shape + + # cyclic shift + if self.shift_size > 0: + shifted_x = paddle.roll( + x, shifts=(-self.shift_size, -self.shift_size), axis=(1, 2)) + attn_mask = mask_matrix + else: + shifted_x = x + attn_mask = None + + # partition windows + x_windows = window_partition( + shifted_x, self.window_size) # nW*B, window_size, window_size, C + x_windows = x_windows.reshape( + [x_windows.shape[0], self.window_size * self.window_size, + C]) # nW*B, window_size*window_size, C + + # W-MSA/SW-MSA + attn_windows = self.attn( + x_windows, mask=attn_mask) # nW*B, window_size*window_size, C + + # merge windows + attn_windows = attn_windows.reshape( + [x_windows.shape[0], self.window_size, self.window_size, C]) + shifted_x = window_reverse(attn_windows, self.window_size, Hp, + Wp) # B H' W' C + + # reverse cyclic shift + if self.shift_size > 0: + x = paddle.roll( + shifted_x, + shifts=(self.shift_size, self.shift_size), + axis=(1, 2)) + else: + x = shifted_x + + if pad_r > 0 or pad_b > 0: + x = x[:, :H, :W, :] + + x = x.reshape([-1, H * W, C]) + + # FFN + x = shortcut + self.drop_path(x) + x = x + self.drop_path(self.mlp(self.norm2(x))) + + return x + + +class PatchMerging(nn.Layer): + r""" Patch Merging Layer. + Args: + dim (int): Number of input channels. + norm_layer (nn.Layer, optional): Normalization layer. 
Default: nn.LayerNorm + """ + + def __init__(self, dim, norm_layer=nn.LayerNorm): + super().__init__() + self.dim = dim + self.reduction = nn.Linear(4 * dim, 2 * dim, bias_attr=False) + self.norm = norm_layer(4 * dim) + + def forward(self, x, H, W): + """ Forward function. + Args: + x: Input feature, tensor size (B, H*W, C). + H, W: Spatial resolution of the input feature. + """ + B, L, C = x.shape + assert L == H * W, "input feature has wrong size" + + x = x.reshape([-1, H, W, C]) + + # padding + pad_input = (H % 2 == 1) or (W % 2 == 1) + if pad_input: + x = F.pad(x, [0, 0, 0, W % 2, 0, H % 2]) + + x0 = x[:, 0::2, 0::2, :] # B H/2 W/2 C + x1 = x[:, 1::2, 0::2, :] # B H/2 W/2 C + x2 = x[:, 0::2, 1::2, :] # B H/2 W/2 C + x3 = x[:, 1::2, 1::2, :] # B H/2 W/2 C + x = paddle.concat([x0, x1, x2, x3], -1) # B H/2 W/2 4*C + x = x.reshape([-1, H * W // 4, 4 * C]) # B H/2*W/2 4*C + + x = self.norm(x) + x = self.reduction(x) + + return x + + +class BasicLayer(nn.Layer): + """ A basic Swin Transformer layer for one stage. + Args: + dim (int): Number of input channels. + input_resolution (tuple[int]): Input resolution. + depth (int): Number of blocks. + num_heads (int): Number of attention heads. + window_size (int): Local window size. + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set. + drop (float, optional): Dropout rate. Default: 0.0 + attn_drop (float, optional): Attention dropout rate. Default: 0.0 + drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0 + norm_layer (nn.Layer, optional): Normalization layer. Default: nn.LayerNorm + downsample (nn.Layer | None, optional): Downsample layer at the end of the layer. Default: None + """ + + def __init__(self, + dim, + depth, + num_heads, + window_size=7, + mlp_ratio=4., + qkv_bias=True, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + norm_layer=nn.LayerNorm, + downsample=None): + super().__init__() + self.window_size = window_size + self.shift_size = window_size // 2 + self.depth = depth + + # build blocks + self.blocks = nn.LayerList([ + SwinTransformerBlock( + dim=dim, + num_heads=num_heads, + window_size=window_size, + shift_size=0 if (i % 2 == 0) else window_size // 2, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop, + attn_drop=attn_drop, + drop_path=drop_path[i] + if isinstance(drop_path, np.ndarray) else drop_path, + norm_layer=norm_layer) for i in range(depth) + ]) + + # patch merging layer + if downsample is not None: + self.downsample = downsample(dim=dim, norm_layer=norm_layer) + else: + self.downsample = None + + def forward(self, x, H, W): + """ Forward function. + Args: + x: Input feature, tensor size (B, H*W, C). + H, W: Spatial resolution of the input feature. 
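# Shape check for PatchMerging above: each 2x2 neighborhood of patches is
# concatenated (4*dim) and linearly reduced to 2*dim, halving each spatial side
# (sketch assumes the class above is in scope):
import paddle

pm = PatchMerging(dim=96)
x = paddle.randn([2, 56 * 56, 96])
y = pm(x, 56, 56)
assert list(y.shape) == [2, 28 * 28, 192]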
+ """ + + # calculate attention mask for SW-MSA + Hp = int(np.ceil(H / self.window_size)) * self.window_size + Wp = int(np.ceil(W / self.window_size)) * self.window_size + img_mask = paddle.zeros([1, Hp, Wp, 1], dtype='float32') # 1 Hp Wp 1 + h_slices = (slice(0, -self.window_size), + slice(-self.window_size, -self.shift_size), + slice(-self.shift_size, None)) + w_slices = (slice(0, -self.window_size), + slice(-self.window_size, -self.shift_size), + slice(-self.shift_size, None)) + cnt = 0 + for h in h_slices: + for w in w_slices: + try: + img_mask[:, h, w, :] = cnt + except: + pass + + cnt += 1 + + mask_windows = window_partition( + img_mask, self.window_size) # nW, window_size, window_size, 1 + mask_windows = mask_windows.reshape( + [-1, self.window_size * self.window_size]) + attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2) + huns = -100.0 * paddle.ones_like(attn_mask) + attn_mask = huns * (attn_mask != 0).astype("float32") + + for blk in self.blocks: + blk.H, blk.W = H, W + x = blk(x, attn_mask) + if self.downsample is not None: + x_down = self.downsample(x, H, W) + Wh, Ww = (H + 1) // 2, (W + 1) // 2 + return x, H, W, x_down, Wh, Ww + else: + return x, H, W, x, H, W + + +class PatchEmbed(nn.Layer): + """ Image to Patch Embedding + Args: + patch_size (int): Patch token size. Default: 4. + in_chans (int): Number of input image channels. Default: 3. + embed_dim (int): Number of linear projection output channels. Default: 96. + norm_layer (nn.Layer, optional): Normalization layer. Default: None + """ + + def __init__(self, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None): + super().__init__() + patch_size = to_2tuple(patch_size) + self.patch_size = patch_size + + self.in_chans = in_chans + self.embed_dim = embed_dim + + self.proj = nn.Conv2D( + in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) + if norm_layer is not None: + self.norm = norm_layer(embed_dim) + else: + self.norm = None + + def forward(self, x): + B, C, H, W = x.shape + # assert [H, W] == self.img_size[:2], "Input image size ({H}*{W}) doesn't match model ({}*{}).".format(H, W, self.img_size[0], self.img_size[1]) + if W % self.patch_size[1] != 0: + x = F.pad(x, [0, self.patch_size[1] - W % self.patch_size[1], 0, 0]) + if H % self.patch_size[0] != 0: + x = F.pad(x, [0, 0, 0, self.patch_size[0] - H % self.patch_size[0]]) + + x = self.proj(x) + if self.norm is not None: + _, _, Wh, Ww = x.shape + x = x.flatten(2).transpose([0, 2, 1]) + x = self.norm(x) + x = x.transpose([0, 2, 1]).reshape([-1, self.embed_dim, Wh, Ww]) + + return x + + +@register +@serializable +class SwinTransformer(nn.Layer): + """ Swin Transformer + A PaddlePaddle impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` - + https://arxiv.org/pdf/2103.14030 + + Args: + img_size (int | tuple(int)): Input image size. Default 224 + patch_size (int | tuple(int)): Patch size. Default: 4 + in_chans (int): Number of input image channels. Default: 3 + num_classes (int): Number of classes for classification head. Default: 1000 + embed_dim (int): Patch embedding dimension. Default: 96 + depths (tuple(int)): Depth of each Swin Transformer layer. + num_heads (tuple(int)): Number of attention heads in different layers. + window_size (int): Window size. Default: 7 + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4 + qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float): Override default qk scale of head_dim ** -0.5 if set. 
Default: None + drop_rate (float): Dropout rate. Default: 0 + attn_drop_rate (float): Attention dropout rate. Default: 0 + drop_path_rate (float): Stochastic depth rate. Default: 0.1 + norm_layer (nn.Layer): Normalization layer. Default: nn.LayerNorm. + ape (bool): If True, add absolute position embedding to the patch embedding. Default: False + patch_norm (bool): If True, add normalization after patch embedding. Default: True + """ + + def __init__(self, + pretrain_img_size=224, + patch_size=4, + in_chans=3, + embed_dim=96, + depths=[2, 2, 6, 2], + num_heads=[3, 6, 12, 24], + window_size=7, + mlp_ratio=4., + qkv_bias=True, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0.2, + norm_layer=nn.LayerNorm, + ape=False, + patch_norm=True, + out_indices=(0, 1, 2, 3), + frozen_stages=-1, + pretrained=None): + super(SwinTransformer, self).__init__() + + self.pretrain_img_size = pretrain_img_size + self.num_layers = len(depths) + self.embed_dim = embed_dim + self.ape = ape + self.patch_norm = patch_norm + self.out_indices = out_indices + self.frozen_stages = frozen_stages + + # split image into non-overlapping patches + self.patch_embed = PatchEmbed( + patch_size=patch_size, + in_chans=in_chans, + embed_dim=embed_dim, + norm_layer=norm_layer if self.patch_norm else None) + + # absolute position embedding + if self.ape: + pretrain_img_size = to_2tuple(pretrain_img_size) + patch_size = to_2tuple(patch_size) + patches_resolution = [ + pretrain_img_size[0] // patch_size[0], + pretrain_img_size[1] // patch_size[1] + ] + + self.absolute_pos_embed = add_parameter( + self, + paddle.zeros((1, embed_dim, patches_resolution[0], + patches_resolution[1]))) + trunc_normal_(self.absolute_pos_embed) + + self.pos_drop = nn.Dropout(p=drop_rate) + + # stochastic depth + dpr = np.linspace(0, drop_path_rate, + sum(depths)) # stochastic depth decay rule + + # build layers + self.layers = nn.LayerList() + for i_layer in range(self.num_layers): + layer = BasicLayer( + dim=int(embed_dim * 2**i_layer), + depth=depths[i_layer], + num_heads=num_heads[i_layer], + window_size=window_size, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])], + norm_layer=norm_layer, + downsample=PatchMerging + if (i_layer < self.num_layers - 1) else None) + self.layers.append(layer) + + num_features = [int(embed_dim * 2**i) for i in range(self.num_layers)] + self.num_features = num_features + + # add a norm layer for each output + for i_layer in out_indices: + layer = norm_layer(num_features[i_layer]) + layer_name = f'norm{i_layer}' + self.add_sublayer(layer_name, layer) + + self.apply(self._init_weights) + self._freeze_stages() + if pretrained: + if 'http' in pretrained: #URL + path = paddle.utils.download.get_weights_path_from_url( + pretrained) + else: #model in local path + path = pretrained + self.set_state_dict(paddle.load(path)) + + def _freeze_stages(self): + if self.frozen_stages >= 0: + self.patch_embed.eval() + for param in self.patch_embed.parameters(): + param.stop_gradient = True + + if self.frozen_stages >= 1 and self.ape: + self.absolute_pos_embed.stop_gradient = True + + if self.frozen_stages >= 2: + self.pos_drop.eval() + for i in range(0, self.frozen_stages - 1): + m = self.layers[i] + m.eval() + for param in m.parameters(): + param.stop_gradient = True + + def _init_weights(self, m): + if isinstance(m, nn.Linear): + trunc_normal_(m.weight) + if isinstance(m, nn.Linear) and m.bias 
is not None: + zeros_(m.bias) + elif isinstance(m, nn.LayerNorm): + zeros_(m.bias) + ones_(m.weight) + + def forward(self, x): + """Forward function.""" + x = self.patch_embed(x['image']) + B, _, Wh, Ww = x.shape + if self.ape: + # interpolate the position embedding to the corresponding size + absolute_pos_embed = F.interpolate( + self.absolute_pos_embed, size=(Wh, Ww), mode='bicubic') + x = (x + absolute_pos_embed).flatten(2).transpose([0, 2, 1]) + else: + x = x.flatten(2).transpose([0, 2, 1]) + x = self.pos_drop(x) + outs = [] + for i in range(self.num_layers): + layer = self.layers[i] + x_out, H, W, x, Wh, Ww = layer(x, Wh, Ww) + if i in self.out_indices: + norm_layer = getattr(self, f'norm{i}') + x_out = norm_layer(x_out) + out = x_out.reshape((-1, H, W, self.num_features[i])).transpose( + (0, 3, 1, 2)) + outs.append(out) + + return tuple(outs) + + @property + def out_shape(self): + out_strides = [4, 8, 16, 32] + return [ + ShapeSpec( + channels=self.num_features[i], stride=out_strides[i]) + for i in self.out_indices + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/trans_encoder.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/trans_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..1a45e0f0e61567b153b2210fa59ea2bfe2bb8b16 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/trans_encoder.py @@ -0,0 +1,381 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
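+#
+# This file implements a METRO-style transformer encoder from BERT-type
+# building blocks (BertEmbeddings, BertSelfAttention/BertAttention,
+# BertFeedForward, BertLayer, BertEncoder, BertPooler). METROEncoder wires
+# them together, and the TransEncoder backbone registered below chains one
+# METROEncoder per stage of input_feat_dim/hidden_feat_dim.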
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import ReLU, Swish, GELU +import math + +from ppdet.core.workspace import register +from ..shape_spec import ShapeSpec + +__all__ = ['TransEncoder'] + + +class BertEmbeddings(nn.Layer): + def __init__(self, word_size, position_embeddings_size, word_type_size, + hidden_size, dropout_prob): + super(BertEmbeddings, self).__init__() + self.word_embeddings = nn.Embedding( + word_size, hidden_size, padding_idx=0) + self.position_embeddings = nn.Embedding(position_embeddings_size, + hidden_size) + self.token_type_embeddings = nn.Embedding(word_type_size, hidden_size) + self.layernorm = nn.LayerNorm(hidden_size, epsilon=1e-8) + self.dropout = nn.Dropout(dropout_prob) + + def forward(self, x, token_type_ids=None, position_ids=None): + seq_len = paddle.shape(x)[1] + if position_ids is None: + position_ids = paddle.arange(seq_len).unsqueeze(0).expand_as(x) + if token_type_ids is None: + token_type_ids = paddle.zeros(paddle.shape(x)) + + word_embs = self.word_embeddings(x) + position_embs = self.position_embeddings(position_ids) + token_type_embs = self.token_type_embeddings(token_type_ids) + + embs_cmb = word_embs + position_embs + token_type_embs + embs_out = self.layernorm(embs_cmb) + embs_out = self.dropout(embs_out) + return embs_out + + +class BertSelfAttention(nn.Layer): + def __init__(self, + hidden_size, + num_attention_heads, + attention_probs_dropout_prob, + output_attentions=False): + super(BertSelfAttention, self).__init__() + if hidden_size % num_attention_heads != 0: + raise ValueError( + "The hidden_size must be a multiple of the number of attention " + "heads, but got {} % {} != 0" % + (hidden_size, num_attention_heads)) + + self.num_attention_heads = num_attention_heads + self.attention_head_size = int(hidden_size / num_attention_heads) + self.all_head_size = self.num_attention_heads * self.attention_head_size + + self.query = nn.Linear(hidden_size, self.all_head_size) + self.key = nn.Linear(hidden_size, self.all_head_size) + self.value = nn.Linear(hidden_size, self.all_head_size) + + self.dropout = nn.Dropout(attention_probs_dropout_prob) + self.output_attentions = output_attentions + + def forward(self, x, attention_mask, head_mask=None): + query = self.query(x) + key = self.key(x) + value = self.value(x) + + query_dim1, query_dim2 = paddle.shape(query)[:-1] + new_shape = [ + query_dim1, query_dim2, self.num_attention_heads, + self.attention_head_size + ] + query = query.reshape(new_shape).transpose(perm=(0, 2, 1, 3)) + key = key.reshape(new_shape).transpose(perm=(0, 2, 3, 1)) + value = value.reshape(new_shape).transpose(perm=(0, 2, 1, 3)) + + attention = paddle.matmul(query, + key) / math.sqrt(self.attention_head_size) + attention = attention + attention_mask + attention_value = F.softmax(attention, axis=-1) + attention_value = self.dropout(attention_value) + + if head_mask is not None: + attention_value = attention_value * head_mask + + context = paddle.matmul(attention_value, value).transpose(perm=(0, 2, 1, + 3)) + ctx_dim1, ctx_dim2 = paddle.shape(context)[:-2] + new_context_shape = [ + ctx_dim1, + ctx_dim2, + self.all_head_size, + ] + context = context.reshape(new_context_shape) + + if self.output_attentions: + return (context, attention_value) + else: + return (context, ) + + +class BertAttention(nn.Layer): + def __init__(self, + hidden_size, + num_attention_heads, + attention_probs_dropout_prob, + fc_dropout_prob, + output_attentions=False): + super(BertAttention, self).__init__() + 
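+        # Post-norm transformer sub-block, as in the original BERT:
+        # self-attention -> linear projection -> dropout -> residual add
+        # -> LayerNorm. Shapes, for x of shape [B, L, hidden_size]:
+        #   bert_selfattention: [B, L, H] -> [B, L, H] (per-head context,
+        #   heads re-merged); fc + dropout keep [B, L, H]; the final
+        #   layernorm(features + x) applies the residual connection.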
self.bert_selfattention = BertSelfAttention(
+            hidden_size, num_attention_heads, attention_probs_dropout_prob,
+            output_attentions)
+        self.fc = nn.Linear(hidden_size, hidden_size)
+        self.layernorm = nn.LayerNorm(hidden_size, epsilon=1e-8)
+        self.dropout = nn.Dropout(fc_dropout_prob)
+
+    def forward(self, x, attention_mask, head_mask=None):
+        attention_feats = self.bert_selfattention(x, attention_mask, head_mask)
+        features = self.fc(attention_feats[0])
+        features = self.dropout(features)
+        features = self.layernorm(features + x)
+        if len(attention_feats) == 2:
+            return (features, attention_feats[1])
+        else:
+            return (features, )
+
+
+class BertFeedForward(nn.Layer):
+    def __init__(self,
+                 hidden_size,
+                 intermediate_size,
+                 num_attention_heads,
+                 attention_probs_dropout_prob,
+                 fc_dropout_prob,
+                 act_fn='ReLU',
+                 output_attentions=False):
+        super(BertFeedForward, self).__init__()
+        self.fc1 = nn.Linear(hidden_size, intermediate_size)
+        # eval(act_fn) may yield a Layer class (e.g. 'ReLU') or a plain
+        # function (e.g. 'gelu'); instantiate classes so that calling
+        # self.act_fn(features) always applies the activation.
+        act_layer = eval(act_fn)
+        self.act_fn = act_layer() if isinstance(act_layer, type) else act_layer
+        self.fc2 = nn.Linear(intermediate_size, hidden_size)
+        self.layernorm = nn.LayerNorm(hidden_size, epsilon=1e-8)
+        self.dropout = nn.Dropout(fc_dropout_prob)
+
+    def forward(self, x):
+        features = self.fc1(x)
+        features = self.act_fn(features)
+        features = self.fc2(features)
+        features = self.dropout(features)
+        features = self.layernorm(features + x)
+        return features
+
+
+class BertLayer(nn.Layer):
+    def __init__(self,
+                 hidden_size,
+                 intermediate_size,
+                 num_attention_heads,
+                 attention_probs_dropout_prob,
+                 fc_dropout_prob,
+                 act_fn='ReLU',
+                 output_attentions=False):
+        super(BertLayer, self).__init__()
+        self.attention = BertAttention(hidden_size, num_attention_heads,
+                                       attention_probs_dropout_prob,
+                                       fc_dropout_prob, output_attentions)
+        self.feed_forward = BertFeedForward(
+            hidden_size, intermediate_size, num_attention_heads,
+            attention_probs_dropout_prob, fc_dropout_prob, act_fn,
+            output_attentions)
+
+    def forward(self, x, attention_mask, head_mask=None):
+        attention_feats = self.attention(x, attention_mask, head_mask)
+        features = self.feed_forward(attention_feats[0])
+        if len(attention_feats) == 2:
+            return (features, attention_feats[1])
+        else:
+            return (features, )
+
+
+class BertEncoder(nn.Layer):
+    def __init__(self,
+                 num_hidden_layers,
+                 hidden_size,
+                 intermediate_size,
+                 num_attention_heads,
+                 attention_probs_dropout_prob,
+                 fc_dropout_prob,
+                 act_fn='ReLU',
+                 output_attentions=False,
+                 output_hidden_feats=False):
+        super(BertEncoder, self).__init__()
+        self.output_attentions = output_attentions
+        self.output_hidden_feats = output_hidden_feats
+        self.layers = nn.LayerList([
+            BertLayer(hidden_size, intermediate_size, num_attention_heads,
+                      attention_probs_dropout_prob, fc_dropout_prob, act_fn,
+                      output_attentions) for _ in range(num_hidden_layers)
+        ])
+
+    def forward(self, x, attention_mask, head_mask=None):
+        all_features = (x, )
+        all_attentions = ()
+
+        for i, layer in enumerate(self.layers):
+            mask = head_mask[i] if head_mask is not None else None
+            layer_out = layer(x, attention_mask, mask)
+
+            if self.output_hidden_feats:
+                all_features = all_features + (x, )
+            x = layer_out[0]
+            if self.output_attentions:
+                all_attentions = all_attentions + (layer_out[1], )
+
+        outputs = (x, )
+        if self.output_hidden_feats:
+            outputs += (all_features, )
+        if self.output_attentions:
+            outputs += (all_attentions, )
+        return outputs
+
+
+class BertPooler(nn.Layer):
+    def __init__(self, hidden_size):
+        super(BertPooler, self).__init__()
+        self.fc = nn.Linear(hidden_size, hidden_size)
+        self.act = nn.Tanh()
+
+    def forward(self, x):
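+        """Pool the sequence representation by taking its first token
+        (the [CLS]-style position) through Linear + Tanh, as in the
+        original BERT pooler: [B, L, H] -> [B, H].
+        """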
first_token = x[:, 0] + pooled_output = self.fc(first_token) + pooled_output = self.act(pooled_output) + return pooled_output + + +class METROEncoder(nn.Layer): + def __init__(self, + vocab_size, + num_hidden_layers, + features_dims, + position_embeddings_size, + hidden_size, + intermediate_size, + output_feature_dim, + num_attention_heads, + attention_probs_dropout_prob, + fc_dropout_prob, + act_fn='ReLU', + output_attentions=False, + output_hidden_feats=False, + use_img_layernorm=False): + super(METROEncoder, self).__init__() + self.img_dims = features_dims + self.num_hidden_layers = num_hidden_layers + self.use_img_layernorm = use_img_layernorm + self.output_attentions = output_attentions + self.embedding = BertEmbeddings(vocab_size, position_embeddings_size, 2, + hidden_size, fc_dropout_prob) + self.encoder = BertEncoder( + num_hidden_layers, hidden_size, intermediate_size, + num_attention_heads, attention_probs_dropout_prob, fc_dropout_prob, + act_fn, output_attentions, output_hidden_feats) + self.pooler = BertPooler(hidden_size) + self.position_embeddings = nn.Embedding(position_embeddings_size, + hidden_size) + self.img_embedding = nn.Linear( + features_dims, hidden_size, bias_attr=True) + self.dropout = nn.Dropout(fc_dropout_prob) + self.cls_head = nn.Linear(hidden_size, output_feature_dim) + self.residual = nn.Linear(features_dims, output_feature_dim) + + self.apply(self.init_weights) + + def init_weights(self, module): + """ Initialize the weights. + """ + if isinstance(module, (nn.Linear, nn.Embedding)): + module.weight.set_value( + paddle.normal( + mean=0.0, std=0.02, shape=module.weight.shape)) + elif isinstance(module, nn.LayerNorm): + module.bias.set_value(paddle.zeros(shape=module.bias.shape)) + module.weight.set_value( + paddle.full( + shape=module.weight.shape, fill_value=1.0)) + if isinstance(module, nn.Linear) and module.bias is not None: + module.bias.set_value(paddle.zeros(shape=module.bias.shape)) + + def forward(self, x): + batchsize, seq_len = paddle.shape(x)[:2] + input_ids = paddle.zeros((batchsize, seq_len), dtype="int64") + position_ids = paddle.arange( + seq_len, dtype="int64").unsqueeze(0).expand_as(input_ids) + + attention_mask = paddle.ones_like(input_ids).unsqueeze(1).unsqueeze(2) + head_mask = [None] * self.num_hidden_layers + + position_embs = self.position_embeddings(position_ids) + attention_mask = (1.0 - attention_mask) * -10000.0 + + img_features = self.img_embedding(x) + + # We empirically observe that adding an additional learnable position embedding leads to more stable training + embeddings = position_embs + img_features + if self.use_img_layernorm: + embeddings = self.layernorm(embeddings) + embeddings = self.dropout(embeddings) + + encoder_outputs = self.encoder( + embeddings, attention_mask, head_mask=head_mask) + + pred_score = self.cls_head(encoder_outputs[0]) + res_img_feats = self.residual(x) + pred_score = pred_score + res_img_feats + + if self.output_attentions and self.output_hidden_feats: + return pred_score, encoder_outputs[1], encoder_outputs[-1] + else: + return pred_score + + +def gelu(x): + """Implementation of the gelu activation function. 
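+
+    Computes the exact erf-based form
+        gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
+    as described in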
+ https://arxiv.org/abs/1606.08415 + """ + return x * 0.5 * (1.0 + paddle.erf(x / math.sqrt(2.0))) + + +@register +class TransEncoder(nn.Layer): + def __init__(self, + vocab_size=30522, + num_hidden_layers=4, + num_attention_heads=4, + position_embeddings_size=512, + intermediate_size=3072, + input_feat_dim=[2048, 512, 128], + hidden_feat_dim=[1024, 256, 128], + attention_probs_dropout_prob=0.1, + fc_dropout_prob=0.1, + act_fn='gelu', + output_attentions=False, + output_hidden_feats=False): + super(TransEncoder, self).__init__() + output_feat_dim = input_feat_dim[1:] + [3] + trans_encoder = [] + for i in range(len(output_feat_dim)): + features_dims = input_feat_dim[i] + output_feature_dim = output_feat_dim[i] + hidden_size = hidden_feat_dim[i] + + # init a transformer encoder and append it to a list + assert hidden_size % num_attention_heads == 0 + model = METROEncoder(vocab_size, num_hidden_layers, features_dims, + position_embeddings_size, hidden_size, + intermediate_size, output_feature_dim, + num_attention_heads, + attention_probs_dropout_prob, fc_dropout_prob, + act_fn, output_attentions, output_hidden_feats) + trans_encoder.append(model) + self.trans_encoder = paddle.nn.Sequential(*trans_encoder) + + def forward(self, x): + out = self.trans_encoder(x) + return out diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/transformer_utils.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/transformer_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..46d7b9f28e6271d7d304d329854c0f38cd3e350a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/transformer_utils.py @@ -0,0 +1,74 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn + +from paddle.nn.initializer import TruncatedNormal, Constant, Assign + +# Common initializations +ones_ = Constant(value=1.) +zeros_ = Constant(value=0.) +trunc_normal_ = TruncatedNormal(std=.02) + + +# Common Layers +def drop_path(x, drop_prob=0., training=False): + """ + Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). + the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... + See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... + """ + if drop_prob == 0. 
or not training: + return x + keep_prob = paddle.to_tensor(1 - drop_prob, dtype=x.dtype) + shape = (paddle.shape(x)[0], ) + (1, ) * (x.ndim - 1) + random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype) + random_tensor = paddle.floor(random_tensor) # binarize + output = x.divide(keep_prob) * random_tensor + return output + + +class DropPath(nn.Layer): + def __init__(self, drop_prob=None): + super(DropPath, self).__init__() + self.drop_prob = drop_prob + + def forward(self, x): + return drop_path(x, self.drop_prob, self.training) + + +class Identity(nn.Layer): + def __init__(self): + super(Identity, self).__init__() + + def forward(self, input): + return input + + +# common funcs + + +def to_2tuple(x): + if isinstance(x, (list, tuple)): + return x + return tuple([x] * 2) + + +def add_parameter(layer, datas, name=None): + parameter = layer.create_parameter( + shape=(datas.shape), default_initializer=Assign(datas)) + if name: + layer.add_parameter(name, parameter) + return parameter diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/vgg.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/vgg.py new file mode 100644 index 0000000000000000000000000000000000000000..e05753209a6235a33101fda669979990c3b225f0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/vgg.py @@ -0,0 +1,210 @@ +from __future__ import division + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn import Conv2D, MaxPool2D +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec + +__all__ = ['VGG'] + +VGG_cfg = {16: [2, 2, 3, 3, 3], 19: [2, 2, 4, 4, 4]} + + +class ConvBlock(nn.Layer): + def __init__(self, + in_channels, + out_channels, + groups, + pool_size=2, + pool_stride=2, + pool_padding=0, + name=None): + super(ConvBlock, self).__init__() + + self.groups = groups + self.conv0 = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=1, + padding=1) + self.conv_out_list = [] + for i in range(1, groups): + conv_out = self.add_sublayer( + 'conv{}'.format(i), + Conv2D( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=1, + padding=1)) + self.conv_out_list.append(conv_out) + + self.pool = MaxPool2D( + kernel_size=pool_size, + stride=pool_stride, + padding=pool_padding, + ceil_mode=True) + + def forward(self, inputs): + out = self.conv0(inputs) + out = F.relu(out) + for conv_i in self.conv_out_list: + out = conv_i(out) + out = F.relu(out) + pool = self.pool(out) + return out, pool + + +class ExtraBlock(nn.Layer): + def __init__(self, + in_channels, + mid_channels, + out_channels, + padding, + stride, + kernel_size, + name=None): + super(ExtraBlock, self).__init__() + + self.conv0 = Conv2D( + in_channels=in_channels, + out_channels=mid_channels, + kernel_size=1, + stride=1, + padding=0) + self.conv1 = Conv2D( + in_channels=mid_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding) + + def forward(self, inputs): + out = self.conv0(inputs) + out = F.relu(out) + out = self.conv1(out) + out = F.relu(out) + return out + + +class L2NormScale(nn.Layer): + def __init__(self, num_channels, scale=1.0): + super(L2NormScale, self).__init__() + self.scale = self.create_parameter( + attr=ParamAttr(initializer=paddle.nn.initializer.Constant(scale)), + shape=[num_channels]) + + def forward(self, inputs): + out = F.normalize(inputs, axis=1, epsilon=1e-10) + # out = 
self.scale.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as( + # out) * out + out = self.scale.unsqueeze(0).unsqueeze(2).unsqueeze(3) * out + return out + + +@register +@serializable +class VGG(nn.Layer): + def __init__(self, + depth=16, + normalizations=[20., -1, -1, -1, -1, -1], + extra_block_filters=[[256, 512, 1, 2, 3], [128, 256, 1, 2, 3], + [128, 256, 0, 1, 3], + [128, 256, 0, 1, 3]]): + super(VGG, self).__init__() + + assert depth in [16, 19], \ + "depth as 16/19 supported currently, but got {}".format(depth) + self.depth = depth + self.groups = VGG_cfg[depth] + self.normalizations = normalizations + self.extra_block_filters = extra_block_filters + + self._out_channels = [] + + self.conv_block_0 = ConvBlock( + 3, 64, self.groups[0], 2, 2, 0, name="conv1_") + self.conv_block_1 = ConvBlock( + 64, 128, self.groups[1], 2, 2, 0, name="conv2_") + self.conv_block_2 = ConvBlock( + 128, 256, self.groups[2], 2, 2, 0, name="conv3_") + self.conv_block_3 = ConvBlock( + 256, 512, self.groups[3], 2, 2, 0, name="conv4_") + self.conv_block_4 = ConvBlock( + 512, 512, self.groups[4], 3, 1, 1, name="conv5_") + self._out_channels.append(512) + + self.fc6 = Conv2D( + in_channels=512, + out_channels=1024, + kernel_size=3, + stride=1, + padding=6, + dilation=6) + self.fc7 = Conv2D( + in_channels=1024, + out_channels=1024, + kernel_size=1, + stride=1, + padding=0) + self._out_channels.append(1024) + + # extra block + self.extra_convs = [] + last_channels = 1024 + for i, v in enumerate(self.extra_block_filters): + assert len(v) == 5, "extra_block_filters size not fix" + extra_conv = self.add_sublayer("conv{}".format(6 + i), + ExtraBlock(last_channels, v[0], v[1], + v[2], v[3], v[4])) + last_channels = v[1] + self.extra_convs.append(extra_conv) + self._out_channels.append(last_channels) + + self.norms = [] + for i, n in enumerate(self.normalizations): + if n != -1: + norm = self.add_sublayer("norm{}".format(i), + L2NormScale( + self.extra_block_filters[i][1], n)) + else: + norm = None + self.norms.append(norm) + + def forward(self, inputs): + outputs = [] + + conv, pool = self.conv_block_0(inputs['image']) + conv, pool = self.conv_block_1(pool) + conv, pool = self.conv_block_2(pool) + conv, pool = self.conv_block_3(pool) + outputs.append(conv) + + conv, pool = self.conv_block_4(pool) + out = self.fc6(pool) + out = F.relu(out) + out = self.fc7(out) + out = F.relu(out) + outputs.append(out) + + if not self.extra_block_filters: + return outputs + + # extra block + for extra_conv in self.extra_convs: + out = extra_conv(out) + outputs.append(out) + + for i, n in enumerate(self.normalizations): + if n != -1: + outputs[i] = self.norms[i](outputs[i]) + + return outputs + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/backbones/vision_transformer.py b/PaddleDetection-release-2.6/ppdet/modeling/backbones/vision_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..a21eefc7aca0d2a5fe0bfa94eddf007612f5f464 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/backbones/vision_transformer.py @@ -0,0 +1,652 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +import numpy as np +from paddle.nn.initializer import Constant + +from ppdet.modeling.shape_spec import ShapeSpec +from ppdet.core.workspace import register, serializable + +from .transformer_utils import zeros_, DropPath, Identity + + +class Mlp(nn.Layer): + def __init__(self, + in_features, + hidden_features=None, + out_features=None, + act_layer=nn.GELU, + drop=0.): + super().__init__() + out_features = out_features or in_features + hidden_features = hidden_features or in_features + self.fc1 = nn.Linear(in_features, hidden_features) + self.act = act_layer() + self.fc2 = nn.Linear(hidden_features, out_features) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +class Attention(nn.Layer): + def __init__(self, + dim, + num_heads=8, + qkv_bias=False, + qk_scale=None, + attn_drop=0., + proj_drop=0., + window_size=None): + super().__init__() + self.num_heads = num_heads + head_dim = dim // num_heads + self.scale = qk_scale or head_dim**-0.5 + + self.qkv = nn.Linear(dim, dim * 3, bias_attr=False) + + if qkv_bias: + self.q_bias = self.create_parameter( + shape=([dim]), default_initializer=zeros_) + self.v_bias = self.create_parameter( + shape=([dim]), default_initializer=zeros_) + else: + self.q_bias = None + self.v_bias = None + if window_size: + self.window_size = window_size + self.num_relative_distance = (2 * window_size[0] - 1) * ( + 2 * window_size[1] - 1) + 3 + self.relative_position_bias_table = self.create_parameter( + shape=(self.num_relative_distance, num_heads), + default_initializer=zeros_) # 2*Wh-1 * 2*Ww-1, nH + # cls to token & token 2 cls & cls to cls + + # get pair-wise relative position index for each token inside the window + coords_h = paddle.arange(window_size[0]) + coords_w = paddle.arange(window_size[1]) + coords = paddle.stack(paddle.meshgrid( + [coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = paddle.flatten(coords, 1) # 2, Wh*Ww + coords_flatten_1 = paddle.unsqueeze(coords_flatten, 2) + coords_flatten_2 = paddle.unsqueeze(coords_flatten, 1) + relative_coords = coords_flatten_1.clone() - coords_flatten_2.clone( + ) + + #relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :] # 2, Wh*Ww, Wh*Wh + relative_coords = relative_coords.transpose( + (1, 2, 0)) #.contiguous() # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += window_size[ + 0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * window_size[1] - 1 + relative_position_index = \ + paddle.zeros(shape=(window_size[0] * window_size[1] + 1, ) * 2, dtype=relative_coords.dtype) + relative_position_index[1:, 1:] = relative_coords.sum( + -1) # Wh*Ww, Wh*Ww + relative_position_index[0, 0:] = self.num_relative_distance - 3 + relative_position_index[0:, 0] = self.num_relative_distance - 2 + relative_position_index[0, 0] = self.num_relative_distance - 1 + + self.register_buffer("relative_position_index", + relative_position_index) + # 
trunc_normal_(self.relative_position_bias_table, std=.0) + else: + self.window_size = None + self.relative_position_bias_table = None + self.relative_position_index = None + + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + + def forward(self, x, rel_pos_bias=None): + x_shape = paddle.shape(x) + N, C = x_shape[1], x_shape[2] + + qkv_bias = None + if self.q_bias is not None: + qkv_bias = paddle.concat( + (self.q_bias, paddle.zeros_like(self.v_bias), self.v_bias)) + qkv = F.linear(x, weight=self.qkv.weight, bias=qkv_bias) + + qkv = qkv.reshape((-1, N, 3, self.num_heads, + C // self.num_heads)).transpose((2, 0, 3, 1, 4)) + q, k, v = qkv[0], qkv[1], qkv[2] + attn = (q.matmul(k.transpose((0, 1, 3, 2)))) * self.scale + + if self.relative_position_bias_table is not None: + relative_position_bias = self.relative_position_bias_table[ + self.relative_position_index.reshape([-1])].reshape([ + self.window_size[0] * self.window_size[1] + 1, + self.window_size[0] * self.window_size[1] + 1, -1 + ]) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.transpose( + (2, 0, 1)) #.contiguous() # nH, Wh*Ww, Wh*Ww + attn = attn + relative_position_bias.unsqueeze(0) + if rel_pos_bias is not None: + attn = attn + rel_pos_bias + + attn = nn.functional.softmax(attn, axis=-1) + attn = self.attn_drop(attn) + + x = (attn.matmul(v)).transpose((0, 2, 1, 3)).reshape((-1, N, C)) + x = self.proj(x) + x = self.proj_drop(x) + return x + + +class Block(nn.Layer): + def __init__(self, + dim, + num_heads, + mlp_ratio=4., + qkv_bias=False, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + window_size=None, + init_values=None, + act_layer=nn.GELU, + norm_layer='nn.LayerNorm', + epsilon=1e-5): + super().__init__() + self.norm1 = nn.LayerNorm(dim, epsilon=1e-6) + self.attn = Attention( + dim, + num_heads=num_heads, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + attn_drop=attn_drop, + proj_drop=drop, + window_size=window_size) + # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here + self.drop_path = DropPath(drop_path) if drop_path > 0. 
else Identity()
+        self.norm2 = eval(norm_layer)(dim, epsilon=epsilon)
+        mlp_hidden_dim = int(dim * mlp_ratio)
+        self.mlp = Mlp(in_features=dim,
+                       hidden_features=mlp_hidden_dim,
+                       act_layer=act_layer,
+                       drop=drop)
+        if init_values is not None:
+            self.gamma_1 = self.create_parameter(
+                shape=([dim]), default_initializer=Constant(value=init_values))
+            self.gamma_2 = self.create_parameter(
+                shape=([dim]), default_initializer=Constant(value=init_values))
+        else:
+            self.gamma_1, self.gamma_2 = None, None
+
+    def forward(self, x, rel_pos_bias=None):
+
+        if self.gamma_1 is None:
+            x = x + self.drop_path(
+                self.attn(
+                    self.norm1(x), rel_pos_bias=rel_pos_bias))
+            x = x + self.drop_path(self.mlp(self.norm2(x)))
+        else:
+            x = x + self.drop_path(self.gamma_1 * self.attn(
+                self.norm1(x), rel_pos_bias=rel_pos_bias))
+            x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x)))
+        return x
+
+
+class PatchEmbed(nn.Layer):
+    """ Image to Patch Embedding
+    """
+
+    def __init__(self,
+                 img_size=[224, 224],
+                 patch_size=16,
+                 in_chans=3,
+                 embed_dim=768):
+        super().__init__()
+        self.num_patches_w = img_size[0] // patch_size
+        self.num_patches_h = img_size[1] // patch_size
+
+        num_patches = self.num_patches_w * self.num_patches_h
+        self.patch_shape = (img_size[0] // patch_size,
+                            img_size[1] // patch_size)
+        self.img_size = img_size
+        self.patch_size = patch_size
+        self.num_patches = num_patches
+
+        self.proj = nn.Conv2D(
+            in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
+
+    @property
+    def num_patches_in_h(self):
+        return self.img_size[1] // self.patch_size
+
+    @property
+    def num_patches_in_w(self):
+        return self.img_size[0] // self.patch_size
+
+    def forward(self, x, mask=None):
+        B, C, H, W = x.shape
+        return self.proj(x)
+
+
+class RelativePositionBias(nn.Layer):
+    def __init__(self, window_size, num_heads):
+        super().__init__()
+        self.window_size = window_size
+        self.num_relative_distance = (2 * window_size[0] - 1) * (
+            2 * window_size[1] - 1) + 3
+        self.relative_position_bias_table = self.create_parameter(
+            shape=(self.num_relative_distance, num_heads),
+            default_initializer=zeros_)
+        # cls to token & token to cls & cls to cls
+
+        # get pair-wise relative position index for each token inside the window
+        coords_h = paddle.arange(window_size[0])
+        coords_w = paddle.arange(window_size[1])
+        coords = paddle.stack(paddle.meshgrid(
+            [coords_h, coords_w]))  # 2, Wh, Ww
+        coords_flatten = coords.flatten(1)  # 2, Wh*Ww
+
+        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, Wh*Ww, Wh*Ww
+        relative_coords = relative_coords.transpose(
+            (1, 2, 0))  # Wh*Ww, Wh*Ww, 2
+        relative_coords[:, :, 0] += window_size[0] - 1  # shift to start from 0
+        relative_coords[:, :, 1] += window_size[1] - 1
+        relative_coords[:, :, 0] *= 2 * window_size[1] - 1
+        relative_position_index = \
+            paddle.zeros(shape=(window_size[0] * window_size[1] + 1,) * 2, dtype=relative_coords.dtype)
+        relative_position_index[1:, 1:] = relative_coords.sum(
+            -1)  # Wh*Ww, Wh*Ww
+        relative_position_index[0, 0:] = self.num_relative_distance - 3
+        relative_position_index[0:, 0] = self.num_relative_distance - 2
+        relative_position_index[0, 0] = self.num_relative_distance - 1
+        self.register_buffer("relative_position_index",
+                             relative_position_index)
+
+    def forward(self):
+        relative_position_bias = \
+            self.relative_position_bias_table[self.relative_position_index.reshape([-1])].reshape([
+                self.window_size[0] * self.window_size[1] + 1,
+                self.window_size[0] * self.window_size[1] + 1, -1])  # Wh*Ww,Wh*Ww,nH
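+        # The lookup above expands the (2*Wh-1)*(2*Ww-1)+3 learned offsets
+        # into a per-pair bias; the 3 extra entries handle cls->token,
+        # token->cls and cls->cls. Transpose to
+        # [num_heads, Wh*Ww+1, Wh*Ww+1] so it can be added directly to the
+        # attention logits.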
return relative_position_bias.transpose((2, 0, 1)) # nH, Wh*Ww, Wh*Ww + + +def get_sinusoid_encoding_table(n_position, d_hid, token=False): + ''' Sinusoid position encoding table ''' + + def get_position_angle_vec(position): + return [ + position / np.power(10000, 2 * (hid_j // 2) / d_hid) + for hid_j in range(d_hid) + ] + + sinusoid_table = np.array( + [get_position_angle_vec(pos_i) for pos_i in range(n_position)]) + sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i + sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1 + if token: + sinusoid_table = np.concatenate( + [sinusoid_table, np.zeros([1, d_hid])], dim=0) + + return paddle.to_tensor(sinusoid_table, dtype=paddle.float32).unsqueeze(0) + + +@register +@serializable +class VisionTransformer(nn.Layer): + """ Vision Transformer with support for patch input + """ + + def __init__(self, + img_size=[672, 1092], + patch_size=16, + in_chans=3, + embed_dim=768, + depth=12, + num_heads=12, + mlp_ratio=4, + qkv_bias=False, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0., + norm_layer='nn.LayerNorm', + init_values=None, + use_rel_pos_bias=False, + use_shared_rel_pos_bias=False, + epsilon=1e-5, + final_norm=False, + pretrained=None, + out_indices=[3, 5, 7, 11], + use_abs_pos_emb=False, + use_sincos_pos_emb=True, + with_fpn=True, + num_fpn_levels=4, + use_checkpoint=False, + **args): + super().__init__() + self.img_size = img_size + self.embed_dim = embed_dim + self.with_fpn = with_fpn + self.use_checkpoint = use_checkpoint + self.use_sincos_pos_emb = use_sincos_pos_emb + self.use_rel_pos_bias = use_rel_pos_bias + self.final_norm = final_norm + self.out_indices = out_indices + self.num_fpn_levels = num_fpn_levels + + if use_checkpoint: + paddle.seed(0) + + self.patch_embed = PatchEmbed( + img_size=img_size, + patch_size=patch_size, + in_chans=in_chans, + embed_dim=embed_dim) + + self.pos_w = self.patch_embed.num_patches_in_w + self.pos_h = self.patch_embed.num_patches_in_h + + self.cls_token = self.create_parameter( + shape=(1, 1, embed_dim), + default_initializer=paddle.nn.initializer.Constant(value=0.)) + + if use_abs_pos_emb: + self.pos_embed = self.create_parameter( + shape=(1, self.pos_w * self.pos_h + 1, embed_dim), + default_initializer=paddle.nn.initializer.TruncatedNormal( + std=.02)) + elif use_sincos_pos_emb: + pos_embed = self.build_2d_sincos_position_embedding(embed_dim) + + self.pos_embed = pos_embed + self.pos_embed = self.create_parameter(shape=pos_embed.shape) + self.pos_embed.set_value(pos_embed.numpy()) + self.pos_embed.stop_gradient = True + + else: + self.pos_embed = None + + self.pos_drop = nn.Dropout(p=drop_rate) + + if use_shared_rel_pos_bias: + self.rel_pos_bias = RelativePositionBias( + window_size=self.patch_embed.patch_shape, num_heads=num_heads) + else: + self.rel_pos_bias = None + + dpr = np.linspace(0, drop_path_rate, depth) + + self.blocks = nn.LayerList([ + Block( + dim=embed_dim, + num_heads=num_heads, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[i], + norm_layer=norm_layer, + init_values=init_values, + window_size=self.patch_embed.patch_shape + if use_rel_pos_bias else None, + epsilon=epsilon) for i in range(depth) + ]) + + self.pretrained = pretrained + self.init_weight() + + assert len(out_indices) <= 4, '' + self.out_indices = out_indices + self.out_channels = [embed_dim for _ in range(num_fpn_levels)] + self.out_strides = [4, 8, 16, 32][-num_fpn_levels:] if with_fpn 
else [ + patch_size for _ in range(len(out_indices)) + ] + + self.norm = Identity() + + if self.with_fpn: + assert num_fpn_levels <= 4, '' + self.init_fpn( + embed_dim=embed_dim, + patch_size=patch_size, ) + + def init_weight(self): + pretrained = self.pretrained + + if pretrained: + if 'http' in pretrained: #URL + path = paddle.utils.download.get_weights_path_from_url( + pretrained) + else: #model in local path + path = pretrained + + load_state_dict = paddle.load(path) + model_state_dict = self.state_dict() + pos_embed_name = "pos_embed" + + if pos_embed_name in load_state_dict.keys(): + load_pos_embed = paddle.to_tensor( + load_state_dict[pos_embed_name], dtype="float32") + if self.pos_embed.shape != load_pos_embed.shape: + pos_size = int(math.sqrt(load_pos_embed.shape[1] - 1)) + model_state_dict[pos_embed_name] = self.resize_pos_embed( + load_pos_embed, (pos_size, pos_size), + (self.pos_h, self.pos_w)) + + # self.set_state_dict(model_state_dict) + load_state_dict[pos_embed_name] = model_state_dict[ + pos_embed_name] + + print("Load pos_embed and resize it from {} to {} .".format( + load_pos_embed.shape, self.pos_embed.shape)) + + self.set_state_dict(load_state_dict) + print("Load load_state_dict....") + + def init_fpn(self, embed_dim=768, patch_size=16, out_with_norm=False): + if patch_size == 16: + self.fpn1 = nn.Sequential( + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), + nn.BatchNorm2D(embed_dim), + nn.GELU(), + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), ) + + self.fpn2 = nn.Sequential( + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), ) + + self.fpn3 = Identity() + + self.fpn4 = nn.MaxPool2D(kernel_size=2, stride=2) + elif patch_size == 8: + self.fpn1 = nn.Sequential( + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), ) + + self.fpn2 = Identity() + + self.fpn3 = nn.Sequential(nn.MaxPool2D(kernel_size=2, stride=2), ) + + self.fpn4 = nn.Sequential(nn.MaxPool2D(kernel_size=4, stride=4), ) + + if not out_with_norm: + self.norm = Identity() + else: + self.norm = nn.LayerNorm(embed_dim, epsilon=1e-6) + + def interpolate_pos_encoding(self, x, w, h): + npatch = x.shape[1] - 1 + N = self.pos_embed.shape[1] - 1 + w0 = w // self.patch_embed.patch_size + h0 = h // self.patch_embed.patch_size + if npatch == N and w0 == self.patch_embed.num_patches_w and h0 == self.patch_embed.num_patches_h: + return self.pos_embed + class_pos_embed = self.pos_embed[:, 0] + patch_pos_embed = self.pos_embed[:, 1:] + dim = x.shape[-1] + # we add a small number to avoid floating point error in the interpolation + # see discussion at https://github.com/facebookresearch/dino/issues/8 + # w0, h0 = w0 + 0.1, h0 + 0.1 + # patch_pos_embed = nn.functional.interpolate( + # patch_pos_embed.reshape([ + # 1, self.patch_embed.num_patches_w, + # self.patch_embed.num_patches_h, dim + # ]).transpose((0, 3, 1, 2)), + # scale_factor=(w0 / self.patch_embed.num_patches_w, + # h0 / self.patch_embed.num_patches_h), + # mode='bicubic', ) + + patch_pos_embed = nn.functional.interpolate( + patch_pos_embed.reshape([ + 1, self.patch_embed.num_patches_w, + self.patch_embed.num_patches_h, dim + ]).transpose((0, 3, 1, 2)), + (w0, h0), + mode='bicubic', ) + + assert int(w0) == patch_pos_embed.shape[-2] and int( + h0) == patch_pos_embed.shape[-1] + patch_pos_embed = patch_pos_embed.transpose( + (0, 2, 3, 1)).reshape([1, -1, dim]) + return paddle.concat( + (class_pos_embed.unsqueeze(0), patch_pos_embed), axis=1) + + def resize_pos_embed(self, pos_embed, 
old_hw, new_hw): + """ + Resize pos_embed weight. + Args: + pos_embed (Tensor): the pos_embed weight + old_hw (list[int]): the height and width of old pos_embed + new_hw (list[int]): the height and width of new pos_embed + Returns: + Tensor: the resized pos_embed weight + """ + cls_pos_embed = pos_embed[:, :1, :] + pos_embed = pos_embed[:, 1:, :] + + pos_embed = pos_embed.transpose([0, 2, 1]) + pos_embed = pos_embed.reshape([1, -1, old_hw[0], old_hw[1]]) + pos_embed = F.interpolate( + pos_embed, new_hw, mode='bicubic', align_corners=False) + pos_embed = pos_embed.flatten(2).transpose([0, 2, 1]) + pos_embed = paddle.concat([cls_pos_embed, pos_embed], axis=1) + + return pos_embed + + def build_2d_sincos_position_embedding( + self, + embed_dim=768, + temperature=10000., ): + h, w = self.patch_embed.patch_shape + grid_w = paddle.arange(w, dtype=paddle.float32) + grid_h = paddle.arange(h, dtype=paddle.float32) + grid_w, grid_h = paddle.meshgrid(grid_w, grid_h) + assert embed_dim % 4 == 0, 'Embed dimension must be divisible by 4 for 2D sin-cos position embedding' + pos_dim = embed_dim // 4 + omega = paddle.arange(pos_dim, dtype=paddle.float32) / pos_dim + omega = 1. / (temperature**omega) + + out_w = grid_w.flatten()[..., None] @omega[None] + out_h = grid_h.flatten()[..., None] @omega[None] + + pos_emb = paddle.concat( + [ + paddle.sin(out_w), paddle.cos(out_w), paddle.sin(out_h), + paddle.cos(out_h) + ], + axis=1)[None, :, :] + + pe_token = paddle.zeros([1, 1, embed_dim], dtype=paddle.float32) + pos_embed = paddle.concat([pe_token, pos_emb], axis=1) + # pos_embed.stop_gradient = True + + return pos_embed + + def forward(self, x): + x = x['image'] if isinstance(x, dict) else x + _, _, h, w = x.shape + + x = self.patch_embed(x) + + B, D, Hp, Wp = x.shape # b * c * h * w + + cls_tokens = self.cls_token.expand( + (B, self.cls_token.shape[-2], self.cls_token.shape[-1])) + x = x.flatten(2).transpose([0, 2, 1]) # b * hw * c + x = paddle.concat([cls_tokens, x], axis=1) + + if self.pos_embed is not None: + # x = x + self.interpolate_pos_encoding(x, w, h) + x = x + self.interpolate_pos_encoding(x, h, w) + + x = self.pos_drop(x) + + rel_pos_bias = self.rel_pos_bias( + ) if self.rel_pos_bias is not None else None + + feats = [] + for idx, blk in enumerate(self.blocks): + if self.use_checkpoint and self.training: + x = paddle.distributed.fleet.utils.recompute( + blk, x, rel_pos_bias, **{"preserve_rng_state": True}) + else: + x = blk(x, rel_pos_bias) + + if idx in self.out_indices: + xp = paddle.reshape( + paddle.transpose( + self.norm(x[:, 1:, :]), perm=[0, 2, 1]), + shape=[B, D, Hp, Wp]) + feats.append(xp) + + if self.with_fpn: + fpns = [self.fpn1, self.fpn2, self.fpn3, self.fpn4][ + -self.num_fpn_levels:] + assert len(fpns) == len(feats) or len(feats) == 1, '' + outputs = [] + for i, m in enumerate(fpns): + outputs.append( + m(feats[i] if len(feats) == len(fpns) else feats[-1])) + + return outputs + + return feats + + @property + def num_layers(self): + return len(self.blocks) + + @property + def no_weight_decay(self): + return {'pos_embed', 'cls_token'} + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=c, stride=s) + for c, s in zip(self.out_channels, self.out_strides) + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/bbox_utils.py b/PaddleDetection-release-2.6/ppdet/modeling/bbox_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..576cbbf04bff806242f36f81785b84c921e523f0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/bbox_utils.py 
@@ -0,0 +1,607 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import paddle +import numpy as np + + +def bbox2delta(src_boxes, tgt_boxes, weights=[1.0, 1.0, 1.0, 1.0]): + """Encode bboxes to deltas. + """ + src_w = src_boxes[:, 2] - src_boxes[:, 0] + src_h = src_boxes[:, 3] - src_boxes[:, 1] + src_ctr_x = src_boxes[:, 0] + 0.5 * src_w + src_ctr_y = src_boxes[:, 1] + 0.5 * src_h + + tgt_w = tgt_boxes[:, 2] - tgt_boxes[:, 0] + tgt_h = tgt_boxes[:, 3] - tgt_boxes[:, 1] + tgt_ctr_x = tgt_boxes[:, 0] + 0.5 * tgt_w + tgt_ctr_y = tgt_boxes[:, 1] + 0.5 * tgt_h + + wx, wy, ww, wh = weights + dx = wx * (tgt_ctr_x - src_ctr_x) / src_w + dy = wy * (tgt_ctr_y - src_ctr_y) / src_h + dw = ww * paddle.log(tgt_w / src_w) + dh = wh * paddle.log(tgt_h / src_h) + + deltas = paddle.stack((dx, dy, dw, dh), axis=1) + return deltas + + +def delta2bbox(deltas, boxes, weights=[1.0, 1.0, 1.0, 1.0], max_shape=None): + """Decode deltas to boxes. Used in RCNNBox,CascadeHead,RCNNHead,RetinaHead. + Note: return tensor shape [n,1,4] + If you want to add a reshape, please add after the calling code instead of here. + """ + clip_scale = math.log(1000.0 / 16) + + widths = boxes[:, 2] - boxes[:, 0] + heights = boxes[:, 3] - boxes[:, 1] + ctr_x = boxes[:, 0] + 0.5 * widths + ctr_y = boxes[:, 1] + 0.5 * heights + + wx, wy, ww, wh = weights + dx = deltas[:, 0::4] / wx + dy = deltas[:, 1::4] / wy + dw = deltas[:, 2::4] / ww + dh = deltas[:, 3::4] / wh + # Prevent sending too large values into paddle.exp() + dw = paddle.clip(dw, max=clip_scale) + dh = paddle.clip(dh, max=clip_scale) + + pred_ctr_x = dx * widths.unsqueeze(1) + ctr_x.unsqueeze(1) + pred_ctr_y = dy * heights.unsqueeze(1) + ctr_y.unsqueeze(1) + pred_w = paddle.exp(dw) * widths.unsqueeze(1) + pred_h = paddle.exp(dh) * heights.unsqueeze(1) + + pred_boxes = [] + pred_boxes.append(pred_ctr_x - 0.5 * pred_w) + pred_boxes.append(pred_ctr_y - 0.5 * pred_h) + pred_boxes.append(pred_ctr_x + 0.5 * pred_w) + pred_boxes.append(pred_ctr_y + 0.5 * pred_h) + pred_boxes = paddle.stack(pred_boxes, axis=-1) + + if max_shape is not None: + pred_boxes[..., 0::2] = pred_boxes[..., 0::2].clip( + min=0, max=max_shape[1]) + pred_boxes[..., 1::2] = pred_boxes[..., 1::2].clip( + min=0, max=max_shape[0]) + return pred_boxes + + +def bbox2delta_v2(src_boxes, + tgt_boxes, + delta_mean=[0.0, 0.0, 0.0, 0.0], + delta_std=[1.0, 1.0, 1.0, 1.0]): + """Encode bboxes to deltas. + Modified from bbox2delta() which just use weight parameters to multiply deltas. 
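+    Each delta is first computed in the standard R-CNN way and then
+    standardized instead of weighted:
+        dx = (tgt_ctr_x - src_ctr_x) / src_w
+        dy = (tgt_ctr_y - src_ctr_y) / src_h
+        dw = log(tgt_w / src_w)
+        dh = log(tgt_h / src_h)
+        deltas = (deltas - delta_mean) / delta_std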
+ """ + src_w = src_boxes[:, 2] - src_boxes[:, 0] + src_h = src_boxes[:, 3] - src_boxes[:, 1] + src_ctr_x = src_boxes[:, 0] + 0.5 * src_w + src_ctr_y = src_boxes[:, 1] + 0.5 * src_h + + tgt_w = tgt_boxes[:, 2] - tgt_boxes[:, 0] + tgt_h = tgt_boxes[:, 3] - tgt_boxes[:, 1] + tgt_ctr_x = tgt_boxes[:, 0] + 0.5 * tgt_w + tgt_ctr_y = tgt_boxes[:, 1] + 0.5 * tgt_h + + dx = (tgt_ctr_x - src_ctr_x) / src_w + dy = (tgt_ctr_y - src_ctr_y) / src_h + dw = paddle.log(tgt_w / src_w) + dh = paddle.log(tgt_h / src_h) + + deltas = paddle.stack((dx, dy, dw, dh), axis=1) + deltas = ( + deltas - paddle.to_tensor(delta_mean)) / paddle.to_tensor(delta_std) + return deltas + + +def delta2bbox_v2(deltas, + boxes, + delta_mean=[0.0, 0.0, 0.0, 0.0], + delta_std=[1.0, 1.0, 1.0, 1.0], + max_shape=None, + ctr_clip=32.0): + """Decode deltas to bboxes. + Modified from delta2bbox() which just use weight parameters to be divided by deltas. + Used in YOLOFHead. + Note: return tensor shape [n,1,4] + If you want to add a reshape, please add after the calling code instead of here. + """ + clip_scale = math.log(1000.0 / 16) + + widths = boxes[:, 2] - boxes[:, 0] + heights = boxes[:, 3] - boxes[:, 1] + ctr_x = boxes[:, 0] + 0.5 * widths + ctr_y = boxes[:, 1] + 0.5 * heights + + deltas = deltas * paddle.to_tensor(delta_std) + paddle.to_tensor(delta_mean) + dx = deltas[:, 0::4] + dy = deltas[:, 1::4] + dw = deltas[:, 2::4] + dh = deltas[:, 3::4] + + # Prevent sending too large values into paddle.exp() + dx = dx * widths.unsqueeze(1) + dy = dy * heights.unsqueeze(1) + if ctr_clip is not None: + dx = paddle.clip(dx, max=ctr_clip, min=-ctr_clip) + dy = paddle.clip(dy, max=ctr_clip, min=-ctr_clip) + dw = paddle.clip(dw, max=clip_scale) + dh = paddle.clip(dh, max=clip_scale) + else: + dw = dw.clip(min=-clip_scale, max=clip_scale) + dh = dh.clip(min=-clip_scale, max=clip_scale) + + pred_ctr_x = dx + ctr_x.unsqueeze(1) + pred_ctr_y = dy + ctr_y.unsqueeze(1) + pred_w = paddle.exp(dw) * widths.unsqueeze(1) + pred_h = paddle.exp(dh) * heights.unsqueeze(1) + + pred_boxes = [] + pred_boxes.append(pred_ctr_x - 0.5 * pred_w) + pred_boxes.append(pred_ctr_y - 0.5 * pred_h) + pred_boxes.append(pred_ctr_x + 0.5 * pred_w) + pred_boxes.append(pred_ctr_y + 0.5 * pred_h) + pred_boxes = paddle.stack(pred_boxes, axis=-1) + + if max_shape is not None: + pred_boxes[..., 0::2] = pred_boxes[..., 0::2].clip( + min=0, max=max_shape[1]) + pred_boxes[..., 1::2] = pred_boxes[..., 1::2].clip( + min=0, max=max_shape[0]) + return pred_boxes + + +def expand_bbox(bboxes, scale): + w_half = (bboxes[:, 2] - bboxes[:, 0]) * .5 + h_half = (bboxes[:, 3] - bboxes[:, 1]) * .5 + x_c = (bboxes[:, 2] + bboxes[:, 0]) * .5 + y_c = (bboxes[:, 3] + bboxes[:, 1]) * .5 + + w_half *= scale + h_half *= scale + + bboxes_exp = np.zeros(bboxes.shape, dtype=np.float32) + bboxes_exp[:, 0] = x_c - w_half + bboxes_exp[:, 2] = x_c + w_half + bboxes_exp[:, 1] = y_c - h_half + bboxes_exp[:, 3] = y_c + h_half + + return bboxes_exp + + +def clip_bbox(boxes, im_shape): + h, w = im_shape[0], im_shape[1] + x1 = boxes[:, 0].clip(0, w) + y1 = boxes[:, 1].clip(0, h) + x2 = boxes[:, 2].clip(0, w) + y2 = boxes[:, 3].clip(0, h) + return paddle.stack([x1, y1, x2, y2], axis=1) + + +def nonempty_bbox(boxes, min_size=0, return_mask=False): + w = boxes[:, 2] - boxes[:, 0] + h = boxes[:, 3] - boxes[:, 1] + mask = paddle.logical_and(h > min_size, w > min_size) + if return_mask: + return mask + keep = paddle.nonzero(mask).flatten() + return keep + + +def bbox_area(boxes): + return (boxes[:, 2] - boxes[:, 0]) * 
(boxes[:, 3] - boxes[:, 1]) + + +def bbox_overlaps(boxes1, boxes2): + """ + Calculate overlaps between boxes1 and boxes2 + + Args: + boxes1 (Tensor): boxes with shape [M, 4] + boxes2 (Tensor): boxes with shape [N, 4] + + Return: + overlaps (Tensor): overlaps between boxes1 and boxes2 with shape [M, N] + """ + M = boxes1.shape[0] + N = boxes2.shape[0] + if M * N == 0: + return paddle.zeros([M, N], dtype='float32') + area1 = bbox_area(boxes1) + area2 = bbox_area(boxes2) + + xy_max = paddle.minimum( + paddle.unsqueeze(boxes1, 1)[:, :, 2:], boxes2[:, 2:]) + xy_min = paddle.maximum( + paddle.unsqueeze(boxes1, 1)[:, :, :2], boxes2[:, :2]) + width_height = xy_max - xy_min + width_height = width_height.clip(min=0) + inter = width_height.prod(axis=2) + + overlaps = paddle.where(inter > 0, inter / + (paddle.unsqueeze(area1, 1) + area2 - inter), + paddle.zeros_like(inter)) + return overlaps + + +def batch_bbox_overlaps(bboxes1, + bboxes2, + mode='iou', + is_aligned=False, + eps=1e-6): + """Calculate overlap between two set of bboxes. + If ``is_aligned `` is ``False``, then calculate the overlaps between each + bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned + pair of bboxes1 and bboxes2. + Args: + bboxes1 (Tensor): shape (B, m, 4) in format or empty. + bboxes2 (Tensor): shape (B, n, 4) in format or empty. + B indicates the batch dim, in shape (B1, B2, ..., Bn). + If ``is_aligned `` is ``True``, then m and n must be equal. + mode (str): "iou" (intersection over union) or "iof" (intersection over + foreground). + is_aligned (bool, optional): If True, then m and n must be equal. + Default False. + eps (float, optional): A value added to the denominator for numerical + stability. Default 1e-6. + Returns: + Tensor: shape (m, n) if ``is_aligned `` is False else shape (m,) + """ + assert mode in ['iou', 'iof', 'giou'], 'Unsupported mode {}'.format(mode) + # Either the boxes are empty or the length of boxes's last dimenstion is 4 + assert (bboxes1.shape[-1] == 4 or bboxes1.shape[0] == 0) + assert (bboxes2.shape[-1] == 4 or bboxes2.shape[0] == 0) + + # Batch dim must be the same + # Batch dim: (B1, B2, ... 
Bn) + assert bboxes1.shape[:-2] == bboxes2.shape[:-2] + batch_shape = bboxes1.shape[:-2] + + rows = bboxes1.shape[-2] if bboxes1.shape[0] > 0 else 0 + cols = bboxes2.shape[-2] if bboxes2.shape[0] > 0 else 0 + if is_aligned: + assert rows == cols + + if rows * cols == 0: + if is_aligned: + return paddle.full(batch_shape + (rows, ), 1) + else: + return paddle.full(batch_shape + (rows, cols), 1) + + area1 = (bboxes1[:, 2] - bboxes1[:, 0]) * (bboxes1[:, 3] - bboxes1[:, 1]) + area2 = (bboxes2[:, 2] - bboxes2[:, 0]) * (bboxes2[:, 3] - bboxes2[:, 1]) + + if is_aligned: + lt = paddle.maximum(bboxes1[:, :2], bboxes2[:, :2]) # [B, rows, 2] + rb = paddle.minimum(bboxes1[:, 2:], bboxes2[:, 2:]) # [B, rows, 2] + + wh = (rb - lt).clip(min=0) # [B, rows, 2] + overlap = wh[:, 0] * wh[:, 1] + + if mode in ['iou', 'giou']: + union = area1 + area2 - overlap + else: + union = area1 + if mode == 'giou': + enclosed_lt = paddle.minimum(bboxes1[:, :2], bboxes2[:, :2]) + enclosed_rb = paddle.maximum(bboxes1[:, 2:], bboxes2[:, 2:]) + else: + lt = paddle.maximum(bboxes1[:, :2].reshape([rows, 1, 2]), + bboxes2[:, :2]) # [B, rows, cols, 2] + rb = paddle.minimum(bboxes1[:, 2:].reshape([rows, 1, 2]), + bboxes2[:, 2:]) # [B, rows, cols, 2] + + wh = (rb - lt).clip(min=0) # [B, rows, cols, 2] + overlap = wh[:, :, 0] * wh[:, :, 1] + + if mode in ['iou', 'giou']: + union = area1.reshape([rows,1]) \ + + area2.reshape([1,cols]) - overlap + else: + union = area1[:, None] + if mode == 'giou': + enclosed_lt = paddle.minimum(bboxes1[:, :2].reshape([rows, 1, 2]), + bboxes2[:, :2]) + enclosed_rb = paddle.maximum(bboxes1[:, 2:].reshape([rows, 1, 2]), + bboxes2[:, 2:]) + + eps = paddle.to_tensor([eps]) + union = paddle.maximum(union, eps) + ious = overlap / union + if mode in ['iou', 'iof']: + return ious + # calculate gious + enclose_wh = (enclosed_rb - enclosed_lt).clip(min=0) + enclose_area = enclose_wh[:, :, 0] * enclose_wh[:, :, 1] + enclose_area = paddle.maximum(enclose_area, eps) + gious = ious - (enclose_area - union) / enclose_area + return 1 - gious + + +def xywh2xyxy(box): + x, y, w, h = box + x1 = x - w * 0.5 + y1 = y - h * 0.5 + x2 = x + w * 0.5 + y2 = y + h * 0.5 + return [x1, y1, x2, y2] + + +def make_grid(h, w, dtype): + yv, xv = paddle.meshgrid([paddle.arange(h), paddle.arange(w)]) + return paddle.stack((xv, yv), 2).cast(dtype=dtype) + + +def decode_yolo(box, anchor, downsample_ratio): + """decode yolo box + + Args: + box (list): [x, y, w, h], all have the shape [b, na, h, w, 1] + anchor (list): anchor with the shape [na, 2] + downsample_ratio (int): downsample ratio, default 32 + scale (float): scale, default 1. 
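A quick numeric sanity check for the pairwise overlap helper above, a minimal sketch assuming ppdet is installed and that these utilities live in ppdet.modeling.bbox_utils (as the imports elsewhere in this patch suggest):

import paddle
from ppdet.modeling.bbox_utils import bbox_overlaps

# Two 10x10 boxes offset by (5, 5): intersection = 5 * 5 = 25,
# union = 100 + 100 - 25 = 175, so IoU = 25 / 175 ~ 0.143.
b1 = paddle.to_tensor([[0., 0., 10., 10.]])
b2 = paddle.to_tensor([[5., 5., 15., 15.]])
print(bbox_overlaps(b1, b2))  # shape [1, 1], value ~ 0.143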
+ + Return: + box (list): decoded box, [x, y, w, h], all have the shape [b, na, h, w, 1] + """ + x, y, w, h = box + na, grid_h, grid_w = x.shape[1:4] + grid = make_grid(grid_h, grid_w, x.dtype).reshape((1, 1, grid_h, grid_w, 2)) + x1 = (x + grid[:, :, :, :, 0:1]) / grid_w + y1 = (y + grid[:, :, :, :, 1:2]) / grid_h + + anchor = paddle.to_tensor(anchor, dtype=x.dtype) + anchor = anchor.reshape((1, na, 1, 1, 2)) + w1 = paddle.exp(w) * anchor[:, :, :, :, 0:1] / (downsample_ratio * grid_w) + h1 = paddle.exp(h) * anchor[:, :, :, :, 1:2] / (downsample_ratio * grid_h) + + return [x1, y1, w1, h1] + + +def batch_iou_similarity(box1, box2, eps=1e-9): + """Calculate iou of box1 and box2 in batch + + Args: + box1 (Tensor): box with the shape [N, M1, 4] + box2 (Tensor): box with the shape [N, M2, 4] + + Return: + iou (Tensor): iou between box1 and box2 with the shape [N, M1, M2] + """ + box1 = box1.unsqueeze(2) # [N, M1, 4] -> [N, M1, 1, 4] + box2 = box2.unsqueeze(1) # [N, M2, 4] -> [N, 1, M2, 4] + px1y1, px2y2 = box1[:, :, :, 0:2], box1[:, :, :, 2:4] + gx1y1, gx2y2 = box2[:, :, :, 0:2], box2[:, :, :, 2:4] + x1y1 = paddle.maximum(px1y1, gx1y1) + x2y2 = paddle.minimum(px2y2, gx2y2) + overlap = (x2y2 - x1y1).clip(0).prod(-1) + area1 = (px2y2 - px1y1).clip(0).prod(-1) + area2 = (gx2y2 - gx1y1).clip(0).prod(-1) + union = area1 + area2 - overlap + eps + return overlap / union + + +def bbox_iou(box1, box2, giou=False, diou=False, ciou=False, eps=1e-9): + """Calculate the iou of box1 and box2 + + Args: + box1 (list): [x, y, w, h], all have the shape [b, na, h, w, 1] + box2 (list): [x, y, w, h], all have the shape [b, na, h, w, 1] + giou (bool): whether to use giou or not, default False + diou (bool): whether to use diou or not, default False + ciou (bool): whether to use ciou or not, default False + eps (float): epsilon to avoid divide by zero + + Return: + iou (Tensor): iou of box1 and box2, with the shape [b, na, h, w, 1] + """ + px1, py1, px2, py2 = box1 + gx1, gy1, gx2, gy2 = box2 + x1 = paddle.maximum(px1, gx1) + y1 = paddle.maximum(py1, gy1) + x2 = paddle.minimum(px2, gx2) + y2 = paddle.minimum(py2, gy2) + + overlap = ((x2 - x1).clip(0)) * ((y2 - y1).clip(0)) + + area1 = (px2 - px1) * (py2 - py1) + area1 = area1.clip(0) + + area2 = (gx2 - gx1) * (gy2 - gy1) + area2 = area2.clip(0) + + union = area1 + area2 - overlap + eps + iou = overlap / union + + if giou or ciou or diou: + # convex w, h + cw = paddle.maximum(px2, gx2) - paddle.minimum(px1, gx1) + ch = paddle.maximum(py2, gy2) - paddle.minimum(py1, gy1) + if giou: + c_area = cw * ch + eps + return iou - (c_area - union) / c_area + else: + # convex diagonal squared + c2 = cw**2 + ch**2 + eps + # center distance + rho2 = ((px1 + px2 - gx1 - gx2)**2 + (py1 + py2 - gy1 - gy2)**2) / 4 + if diou: + return iou - rho2 / c2 + else: + w1, h1 = px2 - px1, py2 - py1 + eps + w2, h2 = gx2 - gx1, gy2 - gy1 + eps + delta = paddle.atan(w1 / h1) - paddle.atan(w2 / h2) + v = (4 / math.pi**2) * paddle.pow(delta, 2) + alpha = v / (1 + eps - iou + v) + alpha.stop_gradient = True + return iou - (rho2 / c2 + v * alpha) + else: + return iou + + +def bbox_iou_np_expand(box1, box2, x1y1x2y2=True, eps=1e-16): + """ + Calculate the iou of box1 and box2 with numpy.
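One convention difference worth flagging: bbox_iou(..., giou=True) above returns GIoU itself, whereas batch_bbox_overlaps(..., mode='giou') defined earlier returns 1 - GIoU, a loss-style value. A small worked example, again assuming the ppdet.modeling.bbox_utils module path:

import paddle
from ppdet.modeling.bbox_utils import batch_bbox_overlaps

a = paddle.to_tensor([[0., 0., 2., 2.]])
b = paddle.to_tensor([[4., 0., 6., 2.]])  # disjoint from a

print(batch_bbox_overlaps(a, b, mode='iou'))   # 0.0, no overlap
# enclosing box area = 12, union = 8, so GIoU = 0 - (12 - 8) / 12 = -1/3
print(batch_bbox_overlaps(a, b, mode='giou'))  # 1 - GIoU = 4/3 ~ 1.33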
+ + Args: + box1 (ndarray): [N, 4] + box2 (ndarray): [M, 4], usually N != M + x1y1x2y2 (bool): whether boxes are in x1y1x2y2 style, default True + eps (float): epsilon to avoid divide by zero + Return: + iou (ndarray): iou of box1 and box2, [N, M] + """ + N, M = len(box1), len(box2) # usually N != M + if x1y1x2y2: + b1_x1, b1_y1 = box1[:, 0], box1[:, 1] + b1_x2, b1_y2 = box1[:, 2], box1[:, 3] + b2_x1, b2_y1 = box2[:, 0], box2[:, 1] + b2_x2, b2_y2 = box2[:, 2], box2[:, 3] + else: + # cxcywh style + # Transform from center and width to exact coordinates + b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2 + b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2 + b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2 + b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2 + + # get the coordinates of the intersection rectangle + inter_rect_x1 = np.zeros((N, M), dtype=np.float32) + inter_rect_y1 = np.zeros((N, M), dtype=np.float32) + inter_rect_x2 = np.zeros((N, M), dtype=np.float32) + inter_rect_y2 = np.zeros((N, M), dtype=np.float32) + for i in range(len(box2)): + inter_rect_x1[:, i] = np.maximum(b1_x1, b2_x1[i]) + inter_rect_y1[:, i] = np.maximum(b1_y1, b2_y1[i]) + inter_rect_x2[:, i] = np.minimum(b1_x2, b2_x2[i]) + inter_rect_y2[:, i] = np.minimum(b1_y2, b2_y2[i]) + # Intersection area + inter_area = np.maximum(inter_rect_x2 - inter_rect_x1, 0) * np.maximum( + inter_rect_y2 - inter_rect_y1, 0) + # Union Area + b1_area = np.repeat( + ((b1_x2 - b1_x1) * (b1_y2 - b1_y1)).reshape(-1, 1), M, axis=-1) + b2_area = np.repeat( + ((b2_x2 - b2_x1) * (b2_y2 - b2_y1)).reshape(1, -1), N, axis=0) + + ious = inter_area / (b1_area + b2_area - inter_area + eps) + return ious + + +def bbox2distance(points, bbox, max_dis=None, eps=0.1): + """Encode a bounding box as distances from the given points to its four sides. + Args: + points (Tensor): Shape (n, 2), [x, y]. + bbox (Tensor): Shape (n, 4), "xyxy" format + max_dis (float): Upper bound of the distance. + eps (float): a small value to ensure target < max_dis instead of <= + Returns: + Tensor: Encoded distances (left, top, right, bottom). + """ + left = points[:, 0] - bbox[:, 0] + top = points[:, 1] - bbox[:, 1] + right = bbox[:, 2] - points[:, 0] + bottom = bbox[:, 3] - points[:, 1] + if max_dis is not None: + left = left.clip(min=0, max=max_dis - eps) + top = top.clip(min=0, max=max_dis - eps) + right = right.clip(min=0, max=max_dis - eps) + bottom = bottom.clip(min=0, max=max_dis - eps) + return paddle.stack([left, top, right, bottom], -1) + + +def distance2bbox(points, distance, max_shape=None): + """Decode distance prediction to bounding box. + Args: + points (Tensor): Shape (n, 2), [x, y]. + distance (Tensor): Distance from the given point to 4 + boundaries (left, top, right, bottom). + max_shape (tuple): Shape of the image. + Returns: + Tensor: Decoded bboxes. + """ + x1 = points[:, 0] - distance[:, 0] + y1 = points[:, 1] - distance[:, 1] + x2 = points[:, 0] + distance[:, 2] + y2 = points[:, 1] + distance[:, 3] + if max_shape is not None: + x1 = x1.clip(min=0, max=max_shape[1]) + y1 = y1.clip(min=0, max=max_shape[0]) + x2 = x2.clip(min=0, max=max_shape[1]) + y2 = y2.clip(min=0, max=max_shape[0]) + return paddle.stack([x1, y1, x2, y2], -1) + + +def bbox_center(boxes): + """Get bbox centers from boxes. + Args: + boxes (Tensor): boxes with shape (..., 4), "xmin, ymin, xmax, ymax" format. + Returns: + Tensor: boxes centers with shape (..., 2), "cx, cy" format.
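bbox2distance and distance2bbox above are inverses of each other whenever no clipping applies; a minimal round-trip sketch under the same module-path assumption:

import paddle
from ppdet.modeling.bbox_utils import bbox2distance, distance2bbox

points = paddle.to_tensor([[50., 50.]])
boxes = paddle.to_tensor([[20., 30., 80., 90.]])

dist = bbox2distance(points, boxes)  # [[30., 20., 30., 40.]] == (l, t, r, b)
print(distance2bbox(points, dist))   # [[20., 30., 80., 90.]], the original box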
+ """ + boxes_cx = (boxes[..., 0] + boxes[..., 2]) / 2 + boxes_cy = (boxes[..., 1] + boxes[..., 3]) / 2 + return paddle.stack([boxes_cx, boxes_cy], axis=-1) + + +def batch_distance2bbox(points, distance, max_shapes=None): + """Decode distance prediction to bounding box for batch. + Args: + points (Tensor): [B, ..., 2], "xy" format + distance (Tensor): [B, ..., 4], "ltrb" format + max_shapes (Tensor): [B, 2], "h,w" format, Shape of the image. + Returns: + Tensor: Decoded bboxes, "x1y1x2y2" format. + """ + lt, rb = paddle.split(distance, 2, -1) + # while tensor add parameters, parameters should be better placed on the second place + x1y1 = -lt + points + x2y2 = rb + points + out_bbox = paddle.concat([x1y1, x2y2], -1) + if max_shapes is not None: + max_shapes = max_shapes.flip(-1).tile([1, 2]) + delta_dim = out_bbox.ndim - max_shapes.ndim + for _ in range(delta_dim): + max_shapes.unsqueeze_(1) + out_bbox = paddle.where(out_bbox < max_shapes, out_bbox, max_shapes) + out_bbox = paddle.where(out_bbox > 0, out_bbox, + paddle.zeros_like(out_bbox)) + return out_bbox + + +def iou_similarity(box1, box2, eps=1e-10): + """Calculate iou of box1 and box2 + + Args: + box1 (Tensor): box with the shape [M1, 4] + box2 (Tensor): box with the shape [M2, 4] + + Return: + iou (Tensor): iou between box1 and box2 with the shape [M1, M2] + """ + box1 = box1.unsqueeze(1) # [M1, 4] -> [M1, 1, 4] + box2 = box2.unsqueeze(0) # [M2, 4] -> [1, M2, 4] + px1y1, px2y2 = box1[:, :, 0:2], box1[:, :, 2:4] + gx1y1, gx2y2 = box2[:, :, 0:2], box2[:, :, 2:4] + x1y1 = paddle.maximum(px1y1, gx1y1) + x2y2 = paddle.minimum(px2y2, gx2y2) + overlap = (x2y2 - x1y1).clip(0).prod(-1) + area1 = (px2y2 - px1y1).clip(0).prod(-1) + area2 = (gx2y2 - gx1y1).clip(0).prod(-1) + union = area1 + area2 - overlap + eps + return overlap / union diff --git a/PaddleDetection-release-2.6/ppdet/modeling/cls_utils.py b/PaddleDetection-release-2.6/ppdet/modeling/cls_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..3ae8d116959a96bb2bf337dee7330c5909bc61ac --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/cls_utils.py @@ -0,0 +1,40 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + +def _get_class_default_kwargs(cls, *args, **kwargs): + """ + Get default arguments of a class in dict format, if args and + kwargs is specified, it will replace default arguments + """ + varnames = cls.__init__.__code__.co_varnames + argcount = cls.__init__.__code__.co_argcount + keys = varnames[:argcount] + assert keys[0] == 'self' + keys = keys[1:] + + values = list(cls.__init__.__defaults__) + assert len(values) == len(keys) + + if len(args) > 0: + for i, arg in enumerate(args): + values[i] = arg + + default_kwargs = dict(zip(keys, values)) + + if len(kwargs) > 0: + for k, v in kwargs.items(): + default_kwargs[k] = v + + return default_kwargs diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..07df124cd3aeeb2b77910cee115700aed1234632 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/__init__.py @@ -0,0 +1,70 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import bbox_head +from . import mask_head +from . import yolo_head +from . import roi_extractor +from . import ssd_head +from . import fcos_head +from . import solov2_head +from . import ttf_head +from . import cascade_head +from . import face_head +from . import s2anet_head +from . import keypoint_hrhrnet_head +from . import centernet_head +from . import gfl_head +from . import simota_head +from . import pico_head +from . import detr_head +from . import sparsercnn_head +from . import tood_head +from . import retina_head +from . import ppyoloe_head +from . import fcosr_head +from . import ppyoloe_r_head +from . import yolof_head +from . import ppyoloe_contrast_head +from . import centertrack_head +from . 
import sparse_roi_head + +from .bbox_head import * +from .mask_head import * +from .yolo_head import * +from .roi_extractor import * +from .ssd_head import * +from .fcos_head import * +from .solov2_head import * +from .ttf_head import * +from .cascade_head import * +from .face_head import * +from .s2anet_head import * +from .keypoint_hrhrnet_head import * +from .centernet_head import * +from .gfl_head import * +from .simota_head import * +from .pico_head import * +from .detr_head import * +from .sparsercnn_head import * +from .tood_head import * +from .retina_head import * +from .ppyoloe_head import * +from .fcosr_head import * +from .ppyoloe_r_head import * +from .yolof_head import * +from .ppyoloe_contrast_head import * +from .centertrack_head import * +from .sparse_roi_head import * +from .petr_head import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/bbox_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/bbox_head.py new file mode 100644 index 0000000000000000000000000000000000000000..3ce47983cec3e7e5961be95e719a1187c7d7fc55 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/bbox_head.py @@ -0,0 +1,443 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal, XavierUniform, KaimingNormal +from paddle.regularizer import L2Decay + +from ppdet.core.workspace import register, create +from .roi_extractor import RoIAlign +from ..shape_spec import ShapeSpec +from ..bbox_utils import bbox2delta +from ..cls_utils import _get_class_default_kwargs +from ppdet.modeling.layers import ConvNormLayer + +__all__ = ['TwoFCHead', 'XConvNormHead', 'BBoxHead'] + + +@register +class TwoFCHead(nn.Layer): + """ + RCNN bbox head with Two fc layers to extract feature + + Args: + in_channel (int): Input channel which can be derived by from_config + out_channel (int): Output channel + resolution (int): Resolution of input feature map, default 7 + """ + + def __init__(self, in_channel=256, out_channel=1024, resolution=7): + super(TwoFCHead, self).__init__() + self.in_channel = in_channel + self.out_channel = out_channel + fan = in_channel * resolution * resolution + self.fc6 = nn.Linear( + in_channel * resolution * resolution, + out_channel, + weight_attr=paddle.ParamAttr( + initializer=XavierUniform(fan_out=fan))) + self.fc6.skip_quant = True + + self.fc7 = nn.Linear( + out_channel, + out_channel, + weight_attr=paddle.ParamAttr(initializer=XavierUniform())) + self.fc7.skip_quant = True + + @classmethod + def from_config(cls, cfg, input_shape): + s = input_shape + s = s[0] if isinstance(s, (list, tuple)) else s + return {'in_channel': s.channels} + + @property + def out_shape(self): + return [ShapeSpec(channels=self.out_channel, )] + + def forward(self, rois_feat): + rois_feat = paddle.flatten(rois_feat, start_axis=1, stop_axis=-1) + fc6 = self.fc6(rois_feat) + fc6 = F.relu(fc6) + fc7 = self.fc7(fc6) + fc7 = F.relu(fc7) + return fc7 + + +@register +class XConvNormHead(nn.Layer): + __shared__ = ['norm_type', 'freeze_norm'] + """ + RCNN bbox head with serveral convolution layers + + Args: + in_channel (int): Input channels which can be derived by from_config + num_convs (int): The number of conv layers + conv_dim (int): The number of channels for the conv layers + out_channel (int): Output channels + resolution (int): Resolution of input feature map + norm_type (string): Norm type, bn, gn, sync_bn are available, + default `gn` + freeze_norm (bool): Whether to freeze the norm + stage_name (string): Prefix name for conv layer, '' by default + """ + + def __init__(self, + in_channel=256, + num_convs=4, + conv_dim=256, + out_channel=1024, + resolution=7, + norm_type='gn', + freeze_norm=False, + stage_name=''): + super(XConvNormHead, self).__init__() + self.in_channel = in_channel + self.num_convs = num_convs + self.conv_dim = conv_dim + self.out_channel = out_channel + self.norm_type = norm_type + self.freeze_norm = freeze_norm + + self.bbox_head_convs = [] + fan = conv_dim * 3 * 3 + initializer = KaimingNormal(fan_in=fan) + for i in range(self.num_convs): + in_c = in_channel if i == 0 else conv_dim + head_conv_name = stage_name + 'bbox_head_conv{}'.format(i) + head_conv = self.add_sublayer( + head_conv_name, + ConvNormLayer( + ch_in=in_c, + ch_out=conv_dim, + filter_size=3, + stride=1, + norm_type=self.norm_type, + freeze_norm=self.freeze_norm, + initializer=initializer)) + self.bbox_head_convs.append(head_conv) + + fan = conv_dim * resolution * resolution + self.fc6 = nn.Linear( + conv_dim * resolution * resolution, + out_channel, + 
weight_attr=paddle.ParamAttr( + initializer=XavierUniform(fan_out=fan)), + bias_attr=paddle.ParamAttr( + learning_rate=2., regularizer=L2Decay(0.))) + + @classmethod + def from_config(cls, cfg, input_shape): + s = input_shape + s = s[0] if isinstance(s, (list, tuple)) else s + return {'in_channel': s.channels} + + @property + def out_shape(self): + return [ShapeSpec(channels=self.out_channel, )] + + def forward(self, rois_feat): + for i in range(self.num_convs): + rois_feat = F.relu(self.bbox_head_convs[i](rois_feat)) + rois_feat = paddle.flatten(rois_feat, start_axis=1, stop_axis=-1) + fc6 = F.relu(self.fc6(rois_feat)) + return fc6 + + +@register +class BBoxHead(nn.Layer): + __shared__ = ['num_classes', 'use_cot'] + __inject__ = ['bbox_assigner', 'bbox_loss', 'loss_cot'] + """ + RCNN bbox head + + Args: + head (nn.Layer): Extract feature in bbox head + in_channel (int): Input channel after RoI extractor + roi_extractor (object): The module of RoI Extractor + bbox_assigner (object): The module of Box Assigner, label and sample the + box. + with_pool (bool): Whether to use pooling for the RoI feature. + num_classes (int): The number of classes + bbox_weight (List[float]): The weight to get the decode box + cot_classes (int): The number of base classes + loss_cot (object): The module of Label-cotuning + use_cot(bool): whether to use Label-cotuning + """ + + def __init__(self, + head, + in_channel, + roi_extractor=_get_class_default_kwargs(RoIAlign), + bbox_assigner='BboxAssigner', + with_pool=False, + num_classes=80, + bbox_weight=[10., 10., 5., 5.], + bbox_loss=None, + loss_normalize_pos=False, + cot_classes=None, + loss_cot='COTLoss', + use_cot=False): + super(BBoxHead, self).__init__() + self.head = head + self.roi_extractor = roi_extractor + if isinstance(roi_extractor, dict): + self.roi_extractor = RoIAlign(**roi_extractor) + self.bbox_assigner = bbox_assigner + + self.with_pool = with_pool + self.num_classes = num_classes + self.bbox_weight = bbox_weight + self.bbox_loss = bbox_loss + self.loss_normalize_pos = loss_normalize_pos + + self.loss_cot = loss_cot + self.cot_relation = None + self.cot_classes = cot_classes + self.use_cot = use_cot + if use_cot: + self.cot_bbox_score = nn.Linear( + in_channel, + self.num_classes + 1, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0.0, std=0.01))) + + self.bbox_score = nn.Linear( + in_channel, + self.cot_classes + 1, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0.0, std=0.01))) + self.cot_bbox_score.skip_quant = True + else: + self.bbox_score = nn.Linear( + in_channel, + self.num_classes + 1, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0.0, std=0.01))) + self.bbox_score.skip_quant = True + + self.bbox_delta = nn.Linear( + in_channel, + 4 * self.num_classes, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0.0, std=0.001))) + self.bbox_delta.skip_quant = True + self.assigned_label = None + self.assigned_rois = None + + def init_cot_head(self, relationship): + self.cot_relation = relationship + + @classmethod + def from_config(cls, cfg, input_shape): + roi_pooler = cfg['roi_extractor'] + assert isinstance(roi_pooler, dict) + kwargs = RoIAlign.from_config(cfg, input_shape) + roi_pooler.update(kwargs) + kwargs = {'input_shape': input_shape} + head = create(cfg['head'], **kwargs) + return { + 'roi_extractor': roi_pooler, + 'head': head, + 'in_channel': head.out_shape[0].channels + } + + def forward(self, body_feats=None, rois=None, rois_num=None, inputs=None, cot=False): + """ + body_feats 
(list[Tensor]): Feature maps from backbone + rois (list[Tensor]): RoIs generated from RPN module + rois_num (Tensor): The number of RoIs in each image + inputs (dict{Tensor}): The ground-truth of image + """ + if self.training: + rois, rois_num, targets = self.bbox_assigner(rois, rois_num, inputs) + self.assigned_rois = (rois, rois_num) + self.assigned_targets = targets + + rois_feat = self.roi_extractor(body_feats, rois, rois_num) + bbox_feat = self.head(rois_feat) + if self.with_pool: + feat = F.adaptive_avg_pool2d(bbox_feat, output_size=1) + feat = paddle.squeeze(feat, axis=[2, 3]) + else: + feat = bbox_feat + if self.use_cot: + scores = self.cot_bbox_score(feat) + cot_scores = self.bbox_score(feat) + else: + scores = self.bbox_score(feat) + deltas = self.bbox_delta(feat) + + if self.training: + loss = self.get_loss( + scores, + deltas, + targets, + rois, + self.bbox_weight, + loss_normalize_pos=self.loss_normalize_pos) + + if self.cot_relation is not None: + loss_cot = self.loss_cot(cot_scores, targets, self.cot_relation) + loss.update(loss_cot) + return loss, bbox_feat + else: + if cot: + pred = self.get_prediction(cot_scores, deltas) + else: + pred = self.get_prediction(scores, deltas) + return pred, self.head + + + def get_loss(self, + scores, + deltas, + targets, + rois, + bbox_weight, + loss_normalize_pos=False): + """ + scores (Tensor): scores from bbox head outputs + deltas (Tensor): deltas from bbox head outputs + targets (list[List[Tensor]]): bbox targets containing tgt_labels, tgt_bboxes and tgt_gt_inds + rois (List[Tensor]): RoIs generated in each batch + """ + cls_name = 'loss_bbox_cls' + reg_name = 'loss_bbox_reg' + loss_bbox = {} + + # TODO: better pass args + tgt_labels, tgt_bboxes, tgt_gt_inds = targets + + # bbox cls + tgt_labels = paddle.concat(tgt_labels) if len( + tgt_labels) > 1 else tgt_labels[0] + valid_inds = paddle.nonzero(tgt_labels >= 0).flatten() + if valid_inds.shape[0] == 0: + loss_bbox[cls_name] = paddle.zeros([1], dtype='float32') + else: + tgt_labels = tgt_labels.cast('int64') + tgt_labels.stop_gradient = True + + if not loss_normalize_pos: + loss_bbox_cls = F.cross_entropy( + input=scores, label=tgt_labels, reduction='mean') + else: + loss_bbox_cls = F.cross_entropy( + input=scores, label=tgt_labels, + reduction='none').sum() / (tgt_labels.shape[0] + 1e-7) + + loss_bbox[cls_name] = loss_bbox_cls + + # bbox reg + + cls_agnostic_bbox_reg = deltas.shape[1] == 4 + + fg_inds = paddle.nonzero( + paddle.logical_and(tgt_labels >= 0, tgt_labels < + self.num_classes)).flatten() + + if fg_inds.numel() == 0: + loss_bbox[reg_name] = paddle.zeros([1], dtype='float32') + return loss_bbox + + if cls_agnostic_bbox_reg: + reg_delta = paddle.gather(deltas, fg_inds) + else: + fg_gt_classes = paddle.gather(tgt_labels, fg_inds) + + reg_row_inds = paddle.arange(fg_gt_classes.shape[0]).unsqueeze(1) + reg_row_inds = paddle.tile(reg_row_inds, [1, 4]).reshape([-1, 1]) + + reg_col_inds = 4 * fg_gt_classes.unsqueeze(1) + paddle.arange(4) + + reg_col_inds = reg_col_inds.reshape([-1, 1]) + reg_inds = paddle.concat([reg_row_inds, reg_col_inds], axis=1) + + reg_delta = paddle.gather(deltas, fg_inds) + reg_delta = paddle.gather_nd(reg_delta, reg_inds).reshape([-1, 4]) + rois = paddle.concat(rois) if len(rois) > 1 else rois[0] + tgt_bboxes = paddle.concat(tgt_bboxes) if len( + tgt_bboxes) > 1 else tgt_bboxes[0] + + reg_target = bbox2delta(rois, tgt_bboxes, bbox_weight) + reg_target = paddle.gather(reg_target, fg_inds) + reg_target.stop_gradient = True + + if self.bbox_loss is not None: 
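# NOTE (assumed from the surrounding code): bbox_loss, when configured, is an
# IoU-style loss such as GIoULoss, so both the predicted and target deltas are
# first decoded back to corner-format boxes by bbox_transform() below before
# the loss is evaluated.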
+ reg_delta = self.bbox_transform(reg_delta) + reg_target = self.bbox_transform(reg_target) + + if not loss_normalize_pos: + loss_bbox_reg = self.bbox_loss( + reg_delta, reg_target).sum() / tgt_labels.shape[0] + loss_bbox_reg *= self.num_classes + + else: + loss_bbox_reg = self.bbox_loss( + reg_delta, reg_target).sum() / (tgt_labels.shape[0] + 1e-7) + + else: + loss_bbox_reg = paddle.abs(reg_delta - reg_target).sum( + ) / tgt_labels.shape[0] + + loss_bbox[reg_name] = loss_bbox_reg + + return loss_bbox + + def bbox_transform(self, deltas, weights=[0.1, 0.1, 0.2, 0.2]): + wx, wy, ww, wh = weights + + deltas = paddle.reshape(deltas, shape=(0, -1, 4)) + + dx = paddle.slice(deltas, axes=[2], starts=[0], ends=[1]) * wx + dy = paddle.slice(deltas, axes=[2], starts=[1], ends=[2]) * wy + dw = paddle.slice(deltas, axes=[2], starts=[2], ends=[3]) * ww + dh = paddle.slice(deltas, axes=[2], starts=[3], ends=[4]) * wh + + dw = paddle.clip(dw, -1.e10, np.log(1000. / 16)) + dh = paddle.clip(dh, -1.e10, np.log(1000. / 16)) + + pred_ctr_x = dx + pred_ctr_y = dy + pred_w = paddle.exp(dw) + pred_h = paddle.exp(dh) + + x1 = pred_ctr_x - 0.5 * pred_w + y1 = pred_ctr_y - 0.5 * pred_h + x2 = pred_ctr_x + 0.5 * pred_w + y2 = pred_ctr_y + 0.5 * pred_h + + x1 = paddle.reshape(x1, shape=(-1, )) + y1 = paddle.reshape(y1, shape=(-1, )) + x2 = paddle.reshape(x2, shape=(-1, )) + y2 = paddle.reshape(y2, shape=(-1, )) + + return paddle.concat([x1, y1, x2, y2]) + + def get_prediction(self, score, delta): + bbox_prob = F.softmax(score) + return delta, bbox_prob + + def get_head(self, ): + return self.head + + def get_assigned_targets(self, ): + return self.assigned_targets + + def get_assigned_rois(self, ): + return self.assigned_rois diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/cascade_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/cascade_head.py new file mode 100644 index 0000000000000000000000000000000000000000..bb0beadbd38f2c2c34a730cfb1705058f3f538bd --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/cascade_head.py @@ -0,0 +1,337 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
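To make the per-class regression gather in BBoxHead.get_loss above concrete, here is a minimal sketch of how the row/column indices pick out the 4 delta columns belonging to each foreground RoI's assigned class:

import paddle

fg_gt_classes = paddle.to_tensor([2, 0])        # classes of two foreground RoIs
reg_row_inds = paddle.arange(2).unsqueeze(1)    # [[0], [1]]
reg_row_inds = paddle.tile(reg_row_inds, [1, 4]).reshape([-1, 1])
reg_col_inds = (4 * fg_gt_classes.unsqueeze(1) + paddle.arange(4)).reshape([-1, 1])
inds = paddle.concat([reg_row_inds, reg_col_inds], axis=1)
# rows: 0,0,0,0,1,1,1,1   cols: 8,9,10,11,0,1,2,3
# i.e. RoI 0 keeps deltas[0, 8:12] (class 2), RoI 1 keeps deltas[1, 0:4] (class 0).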
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal + +from ppdet.core.workspace import register +from .bbox_head import BBoxHead, TwoFCHead, XConvNormHead +from .roi_extractor import RoIAlign +from ..shape_spec import ShapeSpec +from ..bbox_utils import delta2bbox, clip_bbox, nonempty_bbox +from ..cls_utils import _get_class_default_kwargs + +__all__ = ['CascadeTwoFCHead', 'CascadeXConvNormHead', 'CascadeHead'] + + +@register +class CascadeTwoFCHead(nn.Layer): + __shared__ = ['num_cascade_stage'] + """ + Cascade RCNN bbox head with Two fc layers to extract feature + + Args: + in_channel (int): Input channel which can be derived by from_config + out_channel (int): Output channel + resolution (int): Resolution of input feature map, default 7 + num_cascade_stage (int): The number of cascade stage, default 3 + """ + + def __init__(self, + in_channel=256, + out_channel=1024, + resolution=7, + num_cascade_stage=3): + super(CascadeTwoFCHead, self).__init__() + + self.in_channel = in_channel + self.out_channel = out_channel + + self.head_list = [] + for stage in range(num_cascade_stage): + head_per_stage = self.add_sublayer( + str(stage), TwoFCHead(in_channel, out_channel, resolution)) + self.head_list.append(head_per_stage) + + @classmethod + def from_config(cls, cfg, input_shape): + s = input_shape + s = s[0] if isinstance(s, (list, tuple)) else s + return {'in_channel': s.channels} + + @property + def out_shape(self): + return [ShapeSpec(channels=self.out_channel, )] + + def forward(self, rois_feat, stage=0): + out = self.head_list[stage](rois_feat) + return out + + +@register +class CascadeXConvNormHead(nn.Layer): + __shared__ = ['norm_type', 'freeze_norm', 'num_cascade_stage'] + """ + Cascade RCNN bbox head with serveral convolution layers + + Args: + in_channel (int): Input channels which can be derived by from_config + num_convs (int): The number of conv layers + conv_dim (int): The number of channels for the conv layers + out_channel (int): Output channels + resolution (int): Resolution of input feature map + norm_type (string): Norm type, bn, gn, sync_bn are available, + default `gn` + freeze_norm (bool): Whether to freeze the norm + num_cascade_stage (int): The number of cascade stage, default 3 + """ + + def __init__(self, + in_channel=256, + num_convs=4, + conv_dim=256, + out_channel=1024, + resolution=7, + norm_type='gn', + freeze_norm=False, + num_cascade_stage=3): + super(CascadeXConvNormHead, self).__init__() + self.in_channel = in_channel + self.out_channel = out_channel + + self.head_list = [] + for stage in range(num_cascade_stage): + head_per_stage = self.add_sublayer( + str(stage), + XConvNormHead( + in_channel, + num_convs, + conv_dim, + out_channel, + resolution, + norm_type, + freeze_norm, + stage_name='stage{}_'.format(stage))) + self.head_list.append(head_per_stage) + + @classmethod + def from_config(cls, cfg, input_shape): + s = input_shape + s = s[0] if isinstance(s, (list, tuple)) else s + return {'in_channel': s.channels} + + @property + def out_shape(self): + return [ShapeSpec(channels=self.out_channel, )] + + def forward(self, rois_feat, stage=0): + out = self.head_list[stage](rois_feat) + return out + + +@register +class CascadeHead(BBoxHead): + __shared__ = ['num_classes', 'num_cascade_stages'] + __inject__ = ['bbox_assigner', 'bbox_loss'] + """ + Cascade RCNN bbox head + + Args: + head (nn.Layer): Extract feature in bbox head + in_channel (int): Input channel after RoI extractor + roi_extractor 
(object): The module of RoI Extractor + bbox_assigner (object): The module of Box Assigner, label and sample the + box. + num_classes (int): The number of classes + bbox_weight (List[List[float]]): The weight to get the decode box and the + length of weight is the number of cascade stage + num_cascade_stages (int): THe number of stage to refine the box + """ + + def __init__(self, + head, + in_channel, + roi_extractor=_get_class_default_kwargs(RoIAlign), + bbox_assigner='BboxAssigner', + num_classes=80, + bbox_weight=[[10., 10., 5., 5.], [20.0, 20.0, 10.0, 10.0], + [30.0, 30.0, 15.0, 15.0]], + num_cascade_stages=3, + bbox_loss=None, + reg_class_agnostic=True, + stage_loss_weights=None, + loss_normalize_pos=False, + add_gt_as_proposals=[True, False, False]): + + nn.Layer.__init__(self, ) + self.head = head + self.roi_extractor = roi_extractor + if isinstance(roi_extractor, dict): + self.roi_extractor = RoIAlign(**roi_extractor) + self.bbox_assigner = bbox_assigner + + self.num_classes = num_classes + self.bbox_weight = bbox_weight + self.num_cascade_stages = num_cascade_stages + self.bbox_loss = bbox_loss + self.stage_loss_weights = [ + 1. / num_cascade_stages for _ in range(num_cascade_stages) + ] if stage_loss_weights is None else stage_loss_weights + self.add_gt_as_proposals = add_gt_as_proposals + + assert len( + self.stage_loss_weights + ) == num_cascade_stages, f'stage_loss_weights({len(self.stage_loss_weights)}) do not equal to num_cascade_stages({num_cascade_stages})' + + self.reg_class_agnostic = reg_class_agnostic + num_bbox_delta = 4 if reg_class_agnostic else 4 * num_classes + self.loss_normalize_pos = loss_normalize_pos + + self.bbox_score_list = [] + self.bbox_delta_list = [] + for i in range(num_cascade_stages): + score_name = 'bbox_score_stage{}'.format(i) + delta_name = 'bbox_delta_stage{}'.format(i) + bbox_score = self.add_sublayer( + score_name, + nn.Linear( + in_channel, + self.num_classes + 1, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0.0, std=0.01)))) + + bbox_delta = self.add_sublayer( + delta_name, + nn.Linear( + in_channel, + num_bbox_delta, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0.0, std=0.001)))) + self.bbox_score_list.append(bbox_score) + self.bbox_delta_list.append(bbox_delta) + self.assigned_label = None + self.assigned_rois = None + + def forward(self, body_feats=None, rois=None, rois_num=None, inputs=None): + """ + body_feats (list[Tensor]): Feature maps from backbone + rois (Tensor): RoIs generated from RPN module + rois_num (Tensor): The number of RoIs in each image + inputs (dict{Tensor}): The ground-truth of image + """ + targets = [] + if self.training: + rois, rois_num, targets = self.bbox_assigner( + rois, + rois_num, + inputs, + add_gt_as_proposals=self.add_gt_as_proposals[0]) + targets_list = [targets] + self.assigned_rois = (rois, rois_num) + self.assigned_targets = targets + + pred_bbox = None + head_out_list = [] + for i in range(self.num_cascade_stages): + if i > 0: + rois, rois_num = self._get_rois_from_boxes(pred_bbox, + inputs['im_shape']) + if self.training: + rois, rois_num, targets = self.bbox_assigner( + rois, + rois_num, + inputs, + i, + is_cascade=True, + add_gt_as_proposals=self.add_gt_as_proposals[i]) + targets_list.append(targets) + + rois_feat = self.roi_extractor(body_feats, rois, rois_num) + bbox_feat = self.head(rois_feat, i) + scores = self.bbox_score_list[i](bbox_feat) + deltas = self.bbox_delta_list[i](bbox_feat) + + # TODO (lyuwenyu) Is it correct for only one class ? 
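# Shape note (illustrative, assuming num_classes=80 and reg_class_agnostic=False):
# deltas leaves the stage head as [R, 320]; below it is reshaped to [R, 80, 4]
# and, per RoI, only the 4-vector of the highest-scoring class is kept, giving
# [R, 4] boxes to refine in the next cascade stage.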
+ if not self.reg_class_agnostic and i < self.num_cascade_stages - 1: + deltas = deltas.reshape([deltas.shape[0], self.num_classes, 4]) + labels = scores[:, :-1].argmax(axis=-1) + + if self.training: + deltas = deltas[paddle.arange(deltas.shape[0]), labels] + else: + deltas = deltas[((deltas + 10000) * F.one_hot( + labels, num_classes=self.num_classes).unsqueeze(-1) != 0 + ).nonzero(as_tuple=True)].reshape( + [deltas.shape[0], 4]) + + head_out_list.append([scores, deltas, rois]) + pred_bbox = self._get_pred_bbox(deltas, rois, self.bbox_weight[i]) + + if self.training: + loss = {} + for stage, value in enumerate(zip(head_out_list, targets_list)): + (scores, deltas, rois), targets = value + loss_stage = self.get_loss( + scores, + deltas, + targets, + rois, + self.bbox_weight[stage], + loss_normalize_pos=self.loss_normalize_pos) + for k, v in loss_stage.items(): + loss[k + "_stage{}".format( + stage)] = v * self.stage_loss_weights[stage] + + return loss, bbox_feat + else: + scores, deltas, self.refined_rois = self.get_prediction( + head_out_list) + return (deltas, scores), self.head + + def _get_rois_from_boxes(self, boxes, im_shape): + rois = [] + for i, boxes_per_image in enumerate(boxes): + clip_box = clip_bbox(boxes_per_image, im_shape[i]) + if self.training: + keep = nonempty_bbox(clip_box) + if keep.shape[0] == 0: + keep = paddle.zeros([1], dtype='int32') + clip_box = paddle.gather(clip_box, keep) + rois.append(clip_box) + rois_num = paddle.concat([paddle.shape(r)[0] for r in rois]) + return rois, rois_num + + def _get_pred_bbox(self, deltas, proposals, weights): + pred_proposals = paddle.concat(proposals) if len( + proposals) > 1 else proposals[0] + pred_bbox = delta2bbox(deltas, pred_proposals, weights) + pred_bbox = paddle.reshape(pred_bbox, [-1, deltas.shape[-1]]) + num_prop = [] + for p in proposals: + num_prop.append(p.shape[0]) + + # NOTE(dev): num_prop will be tagged as a LoDTensorArray because it + # depends on batch_size under @to_static. However, the argument + # num_or_sections in paddle.split does not support LoDTensorArray, + # so we replace it with [-1] if num_prop is not a list. This + # modification ensures the correctness of both dynamic and static graphs. + if not isinstance(num_prop, list): + num_prop = [-1] + return pred_bbox.split(num_prop) + + def get_prediction(self, head_out_list): + """ + head_out_list (List[Tensor]): scores, deltas and rois of each stage + """ + scores_list = [F.softmax(head[0]) for head in head_out_list] + scores = paddle.add_n(scores_list) / self.num_cascade_stages + # Get deltas and rois from the last stage + _, deltas, rois = head_out_list[-1] + return scores, deltas, rois + + def get_refined_rois(self, ): + return self.refined_rois diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/centernet_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/centernet_head.py new file mode 100644 index 0000000000000000000000000000000000000000..76577749a8c45cf752cba6572ab81490ad4d1e7a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/centernet_head.py @@ -0,0 +1,293 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Constant, Uniform +from ppdet.core.workspace import register +from ppdet.modeling.losses import CTFocalLoss, GIoULoss + + +class ConvLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + bias=False): + super(ConvLayer, self).__init__() + bias_attr = False + fan_in = ch_in * kernel_size**2 + bound = 1 / math.sqrt(fan_in) + param_attr = paddle.ParamAttr(initializer=Uniform(-bound, bound)) + if bias: + bias_attr = paddle.ParamAttr(initializer=Constant(0.)) + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + weight_attr=param_attr, + bias_attr=bias_attr) + + def forward(self, inputs): + out = self.conv(inputs) + return out + + +@register +class CenterNetHead(nn.Layer): + """ + Args: + in_channels (int): the channel number of input to CenterNetHead. + num_classes (int): the number of classes, 80 (COCO dataset) by default. + head_planes (int): the channel number in all head, 256 by default. + prior_bias (float): prior bias in heatmap head, -2.19 by default, -4.6 in CenterTrack + regress_ltrb (bool): whether to regress left/top/right/bottom or + width/height for a box, True by default. + size_loss (str): the type of size regression loss, 'L1' by default, can be 'giou'. + loss_weight (dict): the weight of each loss. + add_iou (bool): whether to add iou branch, False by default. 
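On the prior_bias default above: the heatmap head's final conv bias is initialized so that the initial foreground probability after the sigmoid is small, the usual focal-loss prior trick. A quick sanity check of the numbers, as a sketch:

import math

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
print(sigmoid(-2.19))  # ~ 0.101, i.e. a ~0.1 prior probability (CenterNet default)
print(sigmoid(-4.6))   # ~ 0.010, the CenterTrack setting mentioned above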
+ """ + + __shared__ = ['num_classes'] + + def __init__(self, + in_channels, + num_classes=80, + head_planes=256, + prior_bias=-2.19, + regress_ltrb=True, + size_loss='L1', + loss_weight={ + 'heatmap': 1.0, + 'size': 0.1, + 'offset': 1.0, + 'iou': 0.0, + }, + add_iou=False): + super(CenterNetHead, self).__init__() + self.regress_ltrb = regress_ltrb + self.loss_weight = loss_weight + self.add_iou = add_iou + + # heatmap head + self.heatmap = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, + num_classes, + kernel_size=1, + stride=1, + padding=0, + bias=True)) + with paddle.no_grad(): + self.heatmap[2].conv.bias[:] = prior_bias + + # size(ltrb or wh) head + self.size = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, + 4 if regress_ltrb else 2, + kernel_size=1, + stride=1, + padding=0, + bias=True)) + self.size_loss = size_loss + + # offset head + self.offset = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, 2, kernel_size=1, stride=1, padding=0, bias=True)) + + # iou head (optinal) + if self.add_iou and 'iou' in self.loss_weight: + self.iou = nn.Sequential( + ConvLayer( + in_channels, + head_planes, + kernel_size=3, + padding=1, + bias=True), + nn.ReLU(), + ConvLayer( + head_planes, + 4 if regress_ltrb else 2, + kernel_size=1, + stride=1, + padding=0, + bias=True)) + + @classmethod + def from_config(cls, cfg, input_shape): + if isinstance(input_shape, (list, tuple)): + input_shape = input_shape[0] + return {'in_channels': input_shape.channels} + + def forward(self, feat, inputs): + heatmap = F.sigmoid(self.heatmap(feat)) + size = self.size(feat) + offset = self.offset(feat) + head_outs = {'heatmap': heatmap, 'size': size, 'offset': offset} + if self.add_iou and 'iou' in self.loss_weight: + iou = self.iou(feat) + head_outs.update({'iou': iou}) + + if self.training: + losses = self.get_loss(inputs, self.loss_weight, head_outs) + return losses + else: + return head_outs + + def get_loss(self, inputs, weights, head_outs): + # 1.heatmap(hm) head loss: CTFocalLoss + heatmap = head_outs['heatmap'] + heatmap_target = inputs['heatmap'] + heatmap = paddle.clip(heatmap, 1e-4, 1 - 1e-4) + ctfocal_loss = CTFocalLoss() + heatmap_loss = ctfocal_loss(heatmap, heatmap_target) + + # 2.size(wh) head loss: L1 loss or GIoU loss + size = head_outs['size'] + index = inputs['index'] + mask = inputs['index_mask'] + size = paddle.transpose(size, perm=[0, 2, 3, 1]) + size_n, _, _, size_c = size.shape + size = paddle.reshape(size, shape=[size_n, -1, size_c]) + index = paddle.unsqueeze(index, 2) + batch_inds = list() + for i in range(size_n): + batch_ind = paddle.full( + shape=[1, index.shape[1], 1], fill_value=i, dtype='int64') + batch_inds.append(batch_ind) + batch_inds = paddle.concat(batch_inds, axis=0) + index = paddle.concat(x=[batch_inds, index], axis=2) + pos_size = paddle.gather_nd(size, index=index) + mask = paddle.unsqueeze(mask, axis=2) + size_mask = paddle.expand_as(mask, pos_size) + size_mask = paddle.cast(size_mask, dtype=pos_size.dtype) + pos_num = size_mask.sum() + size_mask.stop_gradient = True + if self.size_loss == 'L1': + if self.regress_ltrb: + size_target = inputs['size'] + # shape: [bs, max_per_img, 4] + else: + if inputs['size'].shape[-1] == 2: + # inputs['size'] is wh, and regress as wh + # shape: [bs, max_per_img, 2] + size_target = 
inputs['size'] + else: + # inputs['size'] is ltrb, but regress as wh + # shape: [bs, max_per_img, 4] + size_target = inputs['size'][:, :, 0:2] + inputs[ + 'size'][:, :, 2:] + + size_target.stop_gradient = True + size_loss = F.l1_loss( + pos_size * size_mask, size_target * size_mask, reduction='sum') + size_loss = size_loss / (pos_num + 1e-4) + elif self.size_loss == 'giou': + size_target = inputs['bbox_xys'] + size_target.stop_gradient = True + centers_x = (size_target[:, :, 0:1] + size_target[:, :, 2:3]) / 2.0 + centers_y = (size_target[:, :, 1:2] + size_target[:, :, 3:4]) / 2.0 + x1 = centers_x - pos_size[:, :, 0:1] + y1 = centers_y - pos_size[:, :, 1:2] + x2 = centers_x + pos_size[:, :, 2:3] + y2 = centers_y + pos_size[:, :, 3:4] + pred_boxes = paddle.concat([x1, y1, x2, y2], axis=-1) + giou_loss = GIoULoss(reduction='sum') + size_loss = giou_loss( + pred_boxes * size_mask, + size_target * size_mask, + iou_weight=size_mask, + loc_reweight=None) + size_loss = size_loss / (pos_num + 1e-4) + + # 3.offset(reg) head loss: L1 loss + offset = head_outs['offset'] + offset_target = inputs['offset'] + offset = paddle.transpose(offset, perm=[0, 2, 3, 1]) + offset_n, _, _, offset_c = offset.shape + offset = paddle.reshape(offset, shape=[offset_n, -1, offset_c]) + pos_offset = paddle.gather_nd(offset, index=index) + offset_mask = paddle.expand_as(mask, pos_offset) + offset_mask = paddle.cast(offset_mask, dtype=pos_offset.dtype) + pos_num = offset_mask.sum() + offset_mask.stop_gradient = True + offset_target.stop_gradient = True + offset_loss = F.l1_loss( + pos_offset * offset_mask, + offset_target * offset_mask, + reduction='sum') + offset_loss = offset_loss / (pos_num + 1e-4) + + # 4.iou head loss: GIoU loss (optional) + if self.add_iou and 'iou' in self.loss_weight: + iou = head_outs['iou'] + iou = paddle.transpose(iou, perm=[0, 2, 3, 1]) + iou_n, _, _, iou_c = iou.shape + iou = paddle.reshape(iou, shape=[iou_n, -1, iou_c]) + pos_iou = paddle.gather_nd(iou, index=index) + iou_mask = paddle.expand_as(mask, pos_iou) + iou_mask = paddle.cast(iou_mask, dtype=pos_iou.dtype) + pos_num = iou_mask.sum() + iou_mask.stop_gradient = True + gt_bbox_xys = inputs['bbox_xys'] + gt_bbox_xys.stop_gradient = True + centers_x = (gt_bbox_xys[:, :, 0:1] + gt_bbox_xys[:, :, 2:3]) / 2.0 + centers_y = (gt_bbox_xys[:, :, 1:2] + gt_bbox_xys[:, :, 3:4]) / 2.0 + x1 = centers_x - pos_size[:, :, 0:1] + y1 = centers_y - pos_size[:, :, 1:2] + x2 = centers_x + pos_size[:, :, 2:3] + y2 = centers_y + pos_size[:, :, 3:4] + pred_boxes = paddle.concat([x1, y1, x2, y2], axis=-1) + giou_loss = GIoULoss(reduction='sum') + iou_loss = giou_loss( + pred_boxes * iou_mask, + gt_bbox_xys * iou_mask, + iou_weight=iou_mask, + loc_reweight=None) + iou_loss = iou_loss / (pos_num + 1e-4) + + losses = { + 'heatmap_loss': heatmap_loss, + 'size_loss': size_loss, + 'offset_loss': offset_loss, + } + det_loss = weights['heatmap'] * heatmap_loss + weights[ + 'size'] * size_loss + weights['offset'] * offset_loss + + if self.add_iou and 'iou' in self.loss_weight: + losses.update({'iou_loss': iou_loss}) + det_loss += weights['iou'] * iou_loss + losses.update({'det_loss': det_loss}) + return losses diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/centertrack_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/centertrack_head.py new file mode 100644 index 0000000000000000000000000000000000000000..dc353362ad85bc6f61619e9210627d0e6f6c9862 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/centertrack_head.py @@ -0,0 +1,244
@@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register +from .centernet_head import ConvLayer +from ..keypoint_utils import get_affine_transform + +__all__ = ['CenterTrackHead'] + + +@register +class CenterTrackHead(nn.Layer): + """ + Args: + in_channels (int): the channel number of input to CenterNetHead. + num_classes (int): the number of classes, 1 (MOT17 dataset) by default. + head_planes (int): the channel number in all head, 256 by default. + task (str): the type of task for regression, 'tracking' by default. + loss_weight (dict): the weight of each loss. + add_ltrb_amodal (bool): whether to add ltrb_amodal branch, False by default. + """ + + __shared__ = ['num_classes'] + + def __init__(self, + in_channels, + num_classes=1, + head_planes=256, + task='tracking', + loss_weight={ + 'tracking': 1.0, + 'ltrb_amodal': 0.1, + }, + add_ltrb_amodal=True): + super(CenterTrackHead, self).__init__() + self.task = task + self.loss_weight = loss_weight + self.add_ltrb_amodal = add_ltrb_amodal + + # tracking head + self.tracking = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, 2, kernel_size=1, stride=1, padding=0, bias=True)) + + # ltrb_amodal head + if self.add_ltrb_amodal and 'ltrb_amodal' in self.loss_weight: + self.ltrb_amodal = nn.Sequential( + ConvLayer( + in_channels, + head_planes, + kernel_size=3, + padding=1, + bias=True), + nn.ReLU(), + ConvLayer( + head_planes, + 4, + kernel_size=1, + stride=1, + padding=0, + bias=True)) + + # TODO: add more tasks + + @classmethod + def from_config(cls, cfg, input_shape): + if isinstance(input_shape, (list, tuple)): + input_shape = input_shape[0] + return {'in_channels': input_shape.channels} + + def forward(self, + feat, + inputs, + bboxes=None, + bbox_inds=None, + topk_clses=None, + topk_ys=None, + topk_xs=None): + tracking = self.tracking(feat) + head_outs = {'tracking': tracking} + if self.add_ltrb_amodal and 'ltrb_amodal' in self.loss_weight: + ltrb_amodal = self.ltrb_amodal(feat) + head_outs.update({'ltrb_amodal': ltrb_amodal}) + + if self.training: + losses = self.get_loss(inputs, self.loss_weight, head_outs) + return losses + else: + ret = self.generic_decode(head_outs, bboxes, bbox_inds, topk_ys, + topk_xs) + return ret + + def get_loss(self, inputs, weights, head_outs): + index = inputs['index'].unsqueeze(2) + mask = inputs['index_mask'].unsqueeze(2) + batch_inds = list() + for i in range(head_outs['tracking'].shape[0]): + batch_ind = paddle.full( + shape=[1, index.shape[1], 1], fill_value=i, dtype='int64') + batch_inds.append(batch_ind) + batch_inds = paddle.concat(batch_inds, axis=0) + index = paddle.concat(x=[batch_inds, index], axis=2) + + # 1.tracking head loss: L1 loss + tracking = head_outs['tracking'].transpose([0, 2, 3, 1]) + tracking_target = 
inputs['tracking'] + bs, _, _, c = tracking.shape + tracking = tracking.reshape([bs, -1, c]) + pos_tracking = paddle.gather_nd(tracking, index=index) + tracking_mask = paddle.cast( + paddle.expand_as(mask, pos_tracking), dtype=pos_tracking.dtype) + pos_num = tracking_mask.sum() + tracking_mask.stop_gradient = True + tracking_target.stop_gradient = True + tracking_loss = F.l1_loss( + pos_tracking * tracking_mask, + tracking_target * tracking_mask, + reduction='sum') + tracking_loss = tracking_loss / (pos_num + 1e-4) + + # 2.ltrb_amodal head loss (optional): L1 loss + if self.add_ltrb_amodal and 'ltrb_amodal' in self.loss_weight: + ltrb_amodal = head_outs['ltrb_amodal'].transpose([0, 2, 3, 1]) + ltrb_amodal_target = inputs['ltrb_amodal'] + bs, _, _, c = ltrb_amodal.shape + ltrb_amodal = ltrb_amodal.reshape([bs, -1, c]) + pos_ltrb_amodal = paddle.gather_nd(ltrb_amodal, index=index) + ltrb_amodal_mask = paddle.cast( + paddle.expand_as(mask, pos_ltrb_amodal), + dtype=pos_ltrb_amodal.dtype) + pos_num = ltrb_amodal_mask.sum() + ltrb_amodal_mask.stop_gradient = True + ltrb_amodal_target.stop_gradient = True + ltrb_amodal_loss = F.l1_loss( + pos_ltrb_amodal * ltrb_amodal_mask, + ltrb_amodal_target * ltrb_amodal_mask, + reduction='sum') + ltrb_amodal_loss = ltrb_amodal_loss / (pos_num + 1e-4) + + losses = {'tracking_loss': tracking_loss, } + plugin_loss = weights['tracking'] * tracking_loss + + if self.add_ltrb_amodal and 'ltrb_amodal' in self.loss_weight: + losses.update({'ltrb_amodal_loss': ltrb_amodal_loss}) + plugin_loss += weights['ltrb_amodal'] * ltrb_amodal_loss + losses.update({'plugin_loss': plugin_loss}) + return losses + + def generic_decode(self, head_outs, bboxes, bbox_inds, topk_ys, topk_xs): + topk_ys = paddle.floor(topk_ys) # note: more accurate + topk_xs = paddle.floor(topk_xs) + cts = paddle.concat([topk_xs, topk_ys], 1) + ret = {'bboxes': bboxes, 'cts': cts} + + regression_heads = ['tracking'] # todo: add more tasks + for head in regression_heads: + if head in head_outs: + ret[head] = _tranpose_and_gather_feat(head_outs[head], + bbox_inds) + + if 'ltrb_amodal' in head_outs: + ltrb_amodal = head_outs['ltrb_amodal'] + ltrb_amodal = _tranpose_and_gather_feat(ltrb_amodal, bbox_inds) + bboxes_amodal = paddle.concat( + [ + topk_xs * 1.0 + ltrb_amodal[..., 0:1], + topk_ys * 1.0 + ltrb_amodal[..., 1:2], + topk_xs * 1.0 + ltrb_amodal[..., 2:3], + topk_ys * 1.0 + ltrb_amodal[..., 3:4] + ], + axis=1) + ret['bboxes'] = paddle.concat([bboxes[:, 0:2], bboxes_amodal], 1) + # cls_id, score, x0, y0, x1, y1 + + return ret + + def centertrack_post_process(self, dets, meta, out_thresh): + if not ('bboxes' in dets): + return [{}] + + preds = [] + c, s = meta['center'].numpy(), meta['scale'].numpy() + h, w = meta['out_height'].numpy(), meta['out_width'].numpy() + trans = get_affine_transform( + center=c[0], + input_size=s[0], + rot=0, + output_size=[w[0], h[0]], + shift=(0., 0.), + inv=True).astype(np.float32) + for i, dets_bbox in enumerate(dets['bboxes']): + if dets_bbox[1] < out_thresh: + break + item = {} + item['score'] = dets_bbox[1] + item['class'] = int(dets_bbox[0]) + 1 + item['ct'] = transform_preds_with_trans( + dets['cts'][i].reshape([1, 2]), trans).reshape(2) + + if 'tracking' in dets: + tracking = transform_preds_with_trans( + (dets['tracking'][i] + dets['cts'][i]).reshape([1, 2]), + trans).reshape(2) + item['tracking'] = tracking - item['ct'] + + if 'bboxes' in dets: + bbox = transform_preds_with_trans( + dets_bbox[2:6].reshape([2, 2]), trans).reshape(4) + item['bbox'] = bbox + +
preds.append(item) + return preds + + +def transform_preds_with_trans(coords, trans): + target_coords = np.ones((coords.shape[0], 3), np.float32) + target_coords[:, :2] = coords + target_coords = np.dot(trans, target_coords.transpose()).transpose() + return target_coords[:, :2] + + +def _tranpose_and_gather_feat(feat, bbox_inds): + feat = feat.transpose([0, 2, 3, 1]) + feat = feat.reshape([-1, feat.shape[3]]) + feat = paddle.gather(feat, bbox_inds) + return feat diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/detr_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/detr_head.py new file mode 100644 index 0000000000000000000000000000000000000000..6b9d8d8db91bbd44efd9d3e80889f3c418bed176 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/detr_head.py @@ -0,0 +1,404 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register +import pycocotools.mask as mask_util +from ..initializer import linear_init_, constant_ +from ..transformers.utils import inverse_sigmoid + +__all__ = ['DETRHead', 'DeformableDETRHead', 'DINOHead'] + + +class MLP(nn.Layer): + """This code is based on + https://github.com/facebookresearch/detr/blob/main/models/detr.py + """ + + def __init__(self, input_dim, hidden_dim, output_dim, num_layers): + super().__init__() + self.num_layers = num_layers + h = [hidden_dim] * (num_layers - 1) + self.layers = nn.LayerList( + nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim])) + + self._reset_parameters() + + def _reset_parameters(self): + for l in self.layers: + linear_init_(l) + + def forward(self, x): + for i, layer in enumerate(self.layers): + x = F.relu(layer(x)) if i < self.num_layers - 1 else layer(x) + return x + + +class MultiHeadAttentionMap(nn.Layer): + """This code is based on + https://github.com/facebookresearch/detr/blob/main/models/segmentation.py + + This is a 2D attention module, which only returns the attention softmax (no multiplication by value) + """ + + def __init__(self, query_dim, hidden_dim, num_heads, dropout=0.0, + bias=True): + super().__init__() + self.num_heads = num_heads + self.hidden_dim = hidden_dim + self.dropout = nn.Dropout(dropout) + + weight_attr = paddle.ParamAttr( + initializer=paddle.nn.initializer.XavierUniform()) + bias_attr = paddle.framework.ParamAttr( + initializer=paddle.nn.initializer.Constant()) if bias else False + + self.q_proj = nn.Linear(query_dim, hidden_dim, weight_attr, bias_attr) + self.k_proj = nn.Conv2D( + query_dim, + hidden_dim, + 1, + weight_attr=weight_attr, + bias_attr=bias_attr) + + self.normalize_fact = float(hidden_dim / self.num_heads)**-0.5 + + def forward(self, q, k, mask=None): + q = self.q_proj(q) + k = self.k_proj(k) + bs, num_queries, n, c, h, w = q.shape[0], q.shape[1], 
self.num_heads,\ + self.hidden_dim // self.num_heads, k.shape[-2], k.shape[-1] + qh = q.reshape([bs, num_queries, n, c]) + kh = k.reshape([bs, n, c, h, w]) + # weights = paddle.einsum("bqnc,bnchw->bqnhw", qh * self.normalize_fact, kh) + qh = qh.transpose([0, 2, 1, 3]).reshape([-1, num_queries, c]) + kh = kh.reshape([-1, c, h * w]) + weights = paddle.bmm(qh * self.normalize_fact, kh).reshape( + [bs, n, num_queries, h, w]).transpose([0, 2, 1, 3, 4]) + + if mask is not None: + weights += mask + # fix a potential bug: https://github.com/facebookresearch/detr/issues/247 + weights = F.softmax(weights.flatten(3), axis=-1).reshape(weights.shape) + weights = self.dropout(weights) + return weights + + +class MaskHeadFPNConv(nn.Layer): + """This code is based on + https://github.com/facebookresearch/detr/blob/main/models/segmentation.py + + Simple convolutional head, using group norm. + Upsampling is done using an FPN approach + """ + + def __init__(self, input_dim, fpn_dims, context_dim, num_groups=8): + super().__init__() + + inter_dims = [input_dim, + ] + [context_dim // (2**i) for i in range(1, 5)] + weight_attr = paddle.ParamAttr( + initializer=paddle.nn.initializer.KaimingUniform()) + bias_attr = paddle.framework.ParamAttr( + initializer=paddle.nn.initializer.Constant()) + + self.conv0 = self._make_layers(input_dim, input_dim, 3, num_groups, + weight_attr, bias_attr) + self.conv_inter = nn.LayerList() + for in_dims, out_dims in zip(inter_dims[:-1], inter_dims[1:]): + self.conv_inter.append( + self._make_layers(in_dims, out_dims, 3, num_groups, weight_attr, + bias_attr)) + + self.conv_out = nn.Conv2D( + inter_dims[-1], + 1, + 3, + padding=1, + weight_attr=weight_attr, + bias_attr=bias_attr) + + self.adapter = nn.LayerList() + for i in range(len(fpn_dims)): + self.adapter.append( + nn.Conv2D( + fpn_dims[i], + inter_dims[i + 1], + 1, + weight_attr=weight_attr, + bias_attr=bias_attr)) + + def _make_layers(self, + in_dims, + out_dims, + kernel_size, + num_groups, + weight_attr=None, + bias_attr=None): + return nn.Sequential( + nn.Conv2D( + in_dims, + out_dims, + kernel_size, + padding=kernel_size // 2, + weight_attr=weight_attr, + bias_attr=bias_attr), + nn.GroupNorm(num_groups, out_dims), + nn.ReLU()) + + def forward(self, x, bbox_attention_map, fpns): + x = paddle.concat([ + x.tile([bbox_attention_map.shape[1], 1, 1, 1]), + bbox_attention_map.flatten(0, 1) + ], 1) + x = self.conv0(x) + for inter_layer, adapter_layer, feat in zip(self.conv_inter[:-1], + self.adapter, fpns): + feat = adapter_layer(feat).tile( + [bbox_attention_map.shape[1], 1, 1, 1]) + x = inter_layer(x) + x = feat + F.interpolate(x, size=feat.shape[-2:]) + + x = self.conv_inter[-1](x) + x = self.conv_out(x) + return x + + +@register +class DETRHead(nn.Layer): + __shared__ = ['num_classes', 'hidden_dim', 'use_focal_loss'] + __inject__ = ['loss'] + + def __init__(self, + num_classes=80, + hidden_dim=256, + nhead=8, + num_mlp_layers=3, + loss='DETRLoss', + fpn_dims=[1024, 512, 256], + with_mask_head=False, + use_focal_loss=False): + super(DETRHead, self).__init__() + # add background class + self.num_classes = num_classes if use_focal_loss else num_classes + 1 + self.hidden_dim = hidden_dim + self.loss = loss + self.with_mask_head = with_mask_head + self.use_focal_loss = use_focal_loss + + self.score_head = nn.Linear(hidden_dim, self.num_classes) + self.bbox_head = MLP(hidden_dim, + hidden_dim, + output_dim=4, + num_layers=num_mlp_layers) + if self.with_mask_head: + self.bbox_attention = MultiHeadAttentionMap(hidden_dim, hidden_dim,
nhead) + self.mask_head = MaskHeadFPNConv(hidden_dim + nhead, fpn_dims, + hidden_dim) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.score_head) + + @classmethod + def from_config(cls, cfg, hidden_dim, nhead, input_shape): + + return { + 'hidden_dim': hidden_dim, + 'nhead': nhead, + 'fpn_dims': [i.channels for i in input_shape[::-1]][1:] + } + + @staticmethod + def get_gt_mask_from_polygons(gt_poly, pad_mask): + out_gt_mask = [] + for polygons, padding in zip(gt_poly, pad_mask): + height, width = int(padding[:, 0].sum()), int(padding[0, :].sum()) + masks = [] + for obj_poly in polygons: + rles = mask_util.frPyObjects(obj_poly, height, width) + rle = mask_util.merge(rles) + masks.append( + paddle.to_tensor(mask_util.decode(rle)).astype('float32')) + masks = paddle.stack(masks) + masks_pad = paddle.zeros( + [masks.shape[0], pad_mask.shape[1], pad_mask.shape[2]]) + masks_pad[:, :height, :width] = masks + out_gt_mask.append(masks_pad) + return out_gt_mask + + def forward(self, out_transformer, body_feats, inputs=None): + r""" + Args: + out_transformer (Tuple): (feats: [num_levels, batch_size, + num_queries, hidden_dim], + memory: [batch_size, hidden_dim, h, w], + src_proj: [batch_size, h*w, hidden_dim], + src_mask: [batch_size, 1, 1, h, w]) + body_feats (List(Tensor)): list[[B, C, H, W]] + inputs (dict): dict(inputs) + """ + feats, memory, src_proj, src_mask = out_transformer + outputs_logit = self.score_head(feats) + outputs_bbox = F.sigmoid(self.bbox_head(feats)) + outputs_seg = None + if self.with_mask_head: + bbox_attention_map = self.bbox_attention(feats[-1], memory, + src_mask) + fpn_feats = [a for a in body_feats[::-1]][1:] + outputs_seg = self.mask_head(src_proj, bbox_attention_map, + fpn_feats) + outputs_seg = outputs_seg.reshape([ + feats.shape[1], feats.shape[2], outputs_seg.shape[-2], + outputs_seg.shape[-1] + ]) + + if self.training: + assert inputs is not None + assert 'gt_bbox' in inputs and 'gt_class' in inputs + gt_mask = self.get_gt_mask_from_polygons( + inputs['gt_poly'], + inputs['pad_mask']) if 'gt_poly' in inputs else None + return self.loss( + outputs_bbox, + outputs_logit, + inputs['gt_bbox'], + inputs['gt_class'], + masks=outputs_seg, + gt_mask=gt_mask) + else: + return (outputs_bbox[-1], outputs_logit[-1], outputs_seg) + + +@register +class DeformableDETRHead(nn.Layer): + __shared__ = ['num_classes', 'hidden_dim'] + __inject__ = ['loss'] + + def __init__(self, + num_classes=80, + hidden_dim=512, + nhead=8, + num_mlp_layers=3, + loss='DETRLoss'): + super(DeformableDETRHead, self).__init__() + self.num_classes = num_classes + self.hidden_dim = hidden_dim + self.nhead = nhead + self.loss = loss + + self.score_head = nn.Linear(hidden_dim, self.num_classes) + self.bbox_head = MLP(hidden_dim, + hidden_dim, + output_dim=4, + num_layers=num_mlp_layers) + + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.score_head) + constant_(self.score_head.bias, -4.595) + constant_(self.bbox_head.layers[-1].weight) + + with paddle.no_grad(): + bias = paddle.zeros_like(self.bbox_head.layers[-1].bias) + bias[2:] = -2.0 + self.bbox_head.layers[-1].bias.set_value(bias) + + @classmethod + def from_config(cls, cfg, hidden_dim, nhead, input_shape): + return {'hidden_dim': hidden_dim, 'nhead': nhead} + + def forward(self, out_transformer, body_feats, inputs=None): + r""" + Args: + out_transformer (Tuple): (feats: [num_levels, batch_size, + num_queries, hidden_dim], + memory: [batch_size, + \sum_{l=0}^{L-1} H_l \cdot W_l, hidden_dim], + 
reference_points: [batch_size, num_queries, 2]) + body_feats (List(Tensor)): list[[B, C, H, W]] + inputs (dict): dict(inputs) + """ + feats, memory, reference_points = out_transformer + reference_points = inverse_sigmoid(reference_points.unsqueeze(0)) + outputs_bbox = self.bbox_head(feats) + + # It's equivalent to "outputs_bbox[:, :, :, :2] += reference_points", + # but the gradient is wrong in paddle. + outputs_bbox = paddle.concat( + [ + outputs_bbox[:, :, :, :2] + reference_points, + outputs_bbox[:, :, :, 2:] + ], + axis=-1) + + outputs_bbox = F.sigmoid(outputs_bbox) + outputs_logit = self.score_head(feats) + + if self.training: + assert inputs is not None + assert 'gt_bbox' in inputs and 'gt_class' in inputs + + return self.loss(outputs_bbox, outputs_logit, inputs['gt_bbox'], + inputs['gt_class']) + else: + return (outputs_bbox[-1], outputs_logit[-1], None) + + +@register +class DINOHead(nn.Layer): + __inject__ = ['loss'] + + def __init__(self, loss='DINOLoss'): + super(DINOHead, self).__init__() + self.loss = loss + + def forward(self, out_transformer, body_feats, inputs=None): + (dec_out_bboxes, dec_out_logits, enc_topk_bboxes, enc_topk_logits, + dn_meta) = out_transformer + if self.training: + assert inputs is not None + assert 'gt_bbox' in inputs and 'gt_class' in inputs + + if dn_meta is not None: + dn_out_bboxes, dec_out_bboxes = paddle.split( + dec_out_bboxes, dn_meta['dn_num_split'], axis=2) + dn_out_logits, dec_out_logits = paddle.split( + dec_out_logits, dn_meta['dn_num_split'], axis=2) + else: + dn_out_bboxes, dn_out_logits = None, None + + out_bboxes = paddle.concat( + [enc_topk_bboxes.unsqueeze(0), dec_out_bboxes]) + out_logits = paddle.concat( + [enc_topk_logits.unsqueeze(0), dec_out_logits]) + + return self.loss( + out_bboxes, + out_logits, + inputs['gt_bbox'], + inputs['gt_class'], + dn_out_bboxes=dn_out_bboxes, + dn_out_logits=dn_out_logits, + dn_meta=dn_meta) + else: + return (dec_out_bboxes[-1], dec_out_logits[-1], None) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/face_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/face_head.py new file mode 100644 index 0000000000000000000000000000000000000000..360f909a67fd272acc15cdbcd79c1172e9b1088a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/face_head.py @@ -0,0 +1,111 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn + +from ppdet.core.workspace import register +from ..layers import AnchorGeneratorSSD +from ..cls_utils import _get_class_default_kwargs + + +@register +class FaceHead(nn.Layer): + """ + Head block for Face detection network + + Args: + num_classes (int): Number of output classes. + in_channels (int): Number of input channels. + anchor_generator(object): instance of anchor generator method. + kernel_size (int): kernel size of Conv2D in FaceHead. + padding (int): padding of Conv2D in FaceHead.
+        conv_decay (float): weight decay for conv layer weights. + loss (object): loss of face detection model. + """ + __shared__ = ['num_classes'] + __inject__ = ['anchor_generator', 'loss'] + + def __init__(self, + num_classes=80, + in_channels=[96, 96], + anchor_generator=_get_class_default_kwargs(AnchorGeneratorSSD), + kernel_size=3, + padding=1, + conv_decay=0., + loss='SSDLoss'): + super(FaceHead, self).__init__() + # add background class + self.num_classes = num_classes + 1 + self.in_channels = in_channels + self.anchor_generator = anchor_generator + self.loss = loss + + if isinstance(anchor_generator, dict): + self.anchor_generator = AnchorGeneratorSSD(**anchor_generator) + + self.num_priors = self.anchor_generator.num_priors + self.box_convs = [] + self.score_convs = [] + for i, num_prior in enumerate(self.num_priors): + box_conv_name = "boxes{}".format(i) + box_conv = self.add_sublayer( + box_conv_name, + nn.Conv2D( + in_channels=self.in_channels[i], + out_channels=num_prior * 4, + kernel_size=kernel_size, + padding=padding)) + self.box_convs.append(box_conv) + + score_conv_name = "scores{}".format(i) + score_conv = self.add_sublayer( + score_conv_name, + nn.Conv2D( + in_channels=self.in_channels[i], + out_channels=num_prior * self.num_classes, + kernel_size=kernel_size, + padding=padding)) + self.score_convs.append(score_conv) + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + def forward(self, feats, image, gt_bbox=None, gt_class=None): + box_preds = [] + cls_scores = [] + prior_boxes = [] + for feat, box_conv, score_conv in zip(feats, self.box_convs, + self.score_convs): + box_pred = box_conv(feat) + box_pred = paddle.transpose(box_pred, [0, 2, 3, 1]) + box_pred = paddle.reshape(box_pred, [0, -1, 4]) + box_preds.append(box_pred) + + cls_score = score_conv(feat) + cls_score = paddle.transpose(cls_score, [0, 2, 3, 1]) + cls_score = paddle.reshape(cls_score, [0, -1, self.num_classes]) + cls_scores.append(cls_score) + + prior_boxes = self.anchor_generator(feats, image) + + if self.training: + return self.get_loss(box_preds, cls_scores, gt_bbox, gt_class, + prior_boxes) + else: + return (box_preds, cls_scores), prior_boxes + + def get_loss(self, boxes, scores, gt_bbox, gt_class, prior_boxes): + return self.loss(boxes, scores, gt_bbox, gt_class, prior_boxes) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/fcos_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/fcos_head.py new file mode 100644 index 0000000000000000000000000000000000000000..d6dab8c8d851685034c7a4b61418f495dd02b5fa --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/fcos_head.py @@ -0,0 +1,363 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
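+
+# Decoding refresher (illustrative): FCOS regresses, for every feature-map
+# location (cx, cy), the distances (l, t, r, b) to the four box sides, so a
+# box is recovered as
+#     x1, y1, x2, y2 = cx - l, cy - t, cx + r, cy + b
+# which is what _post_process_by_level below computes with paddle.stack.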
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Normal, Constant + +from ppdet.core.workspace import register +from ppdet.modeling.layers import ConvNormLayer, MultiClassNMS + +__all__ = ['FCOSFeat', 'FCOSHead'] + + +class ScaleReg(nn.Layer): + """ + Parameter for scaling the regression outputs. + """ + + def __init__(self): + super(ScaleReg, self).__init__() + self.scale_reg = self.create_parameter( + shape=[1], + attr=ParamAttr(initializer=Constant(value=1.)), + dtype="float32") + + def forward(self, inputs): + out = inputs * self.scale_reg + return out + + +@register +class FCOSFeat(nn.Layer): + """ + FCOSFeat of FCOS + + Args: + feat_in (int): The channel number of input Tensor. + feat_out (int): The channel number of output Tensor. + num_convs (int): The convolution number of the FCOSFeat. + norm_type (str): Normalization type, 'bn'/'sync_bn'/'gn'. + use_dcn (bool): Whether to use dcn in tower or not. + """ + + def __init__(self, + feat_in=256, + feat_out=256, + num_convs=4, + norm_type='bn', + use_dcn=False): + super(FCOSFeat, self).__init__() + self.feat_in = feat_in + self.feat_out = feat_out + self.num_convs = num_convs + self.norm_type = norm_type + self.cls_subnet_convs = [] + self.reg_subnet_convs = [] + for i in range(self.num_convs): + in_c = feat_in if i == 0 else feat_out + + cls_conv_name = 'fcos_head_cls_tower_conv_{}'.format(i) + cls_conv = self.add_sublayer( + cls_conv_name, + ConvNormLayer( + ch_in=in_c, + ch_out=feat_out, + filter_size=3, + stride=1, + norm_type=norm_type, + use_dcn=use_dcn, + bias_on=True, + lr_scale=2.)) + self.cls_subnet_convs.append(cls_conv) + + reg_conv_name = 'fcos_head_reg_tower_conv_{}'.format(i) + reg_conv = self.add_sublayer( + reg_conv_name, + ConvNormLayer( + ch_in=in_c, + ch_out=feat_out, + filter_size=3, + stride=1, + norm_type=norm_type, + use_dcn=use_dcn, + bias_on=True, + lr_scale=2.)) + self.reg_subnet_convs.append(reg_conv) + + def forward(self, fpn_feat): + cls_feat = fpn_feat + reg_feat = fpn_feat + for i in range(self.num_convs): + cls_feat = F.relu(self.cls_subnet_convs[i](cls_feat)) + reg_feat = F.relu(self.reg_subnet_convs[i](reg_feat)) + return cls_feat, reg_feat + + +@register +class FCOSHead(nn.Layer): + """ + FCOSHead + Args: + num_classes (int): Number of classes + fcos_feat (object): Instance of 'FCOSFeat' + fpn_stride (list): The stride of each FPN Layer + prior_prob (float): Used to set the bias init for the class prediction layer + norm_reg_targets (bool): Normalize the regression target if true + centerness_on_reg (bool): The prediction of centerness on regression or classification branch + num_shift (float): Relative offset between the center of the first shift and the top-left corner of img + fcos_loss (object): Instance of 'FCOSLoss' + nms (object): Instance of 'MultiClassNMS' + trt (bool): Whether to use trt in nms of deploy + """ + __inject__ = ['fcos_feat', 'fcos_loss', 'nms'] + __shared__ = ['num_classes', 'trt'] + + def __init__(self, + num_classes=80, + fcos_feat='FCOSFeat', + fpn_stride=[8, 16, 32, 64, 128], + prior_prob=0.01, + multiply_strides_reg_targets=False, + norm_reg_targets=True, + centerness_on_reg=True, + num_shift=0.5, + sqrt_score=False, + fcos_loss='FCOSLoss', + nms='MultiClassNMS', + trt=False): + super(FCOSHead, self).__init__() + self.fcos_feat = fcos_feat +
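+        # With the default prior_prob=0.01 the classification bias below is
+        # initialized to -math.log((1 - 0.01) / 0.01) ~= -4.595, so sigmoid
+        # scores start near 1% and early training is not swamped by background.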
self.num_classes = num_classes + self.fpn_stride = fpn_stride + self.prior_prob = prior_prob + self.fcos_loss = fcos_loss + self.norm_reg_targets = norm_reg_targets + self.centerness_on_reg = centerness_on_reg + self.multiply_strides_reg_targets = multiply_strides_reg_targets + self.num_shift = num_shift + self.nms = nms + if isinstance(self.nms, MultiClassNMS) and trt: + self.nms.trt = trt + self.sqrt_score = sqrt_score + self.is_teacher = False + + conv_cls_name = "fcos_head_cls" + bias_init_value = -math.log((1 - self.prior_prob) / self.prior_prob) + self.fcos_head_cls = self.add_sublayer( + conv_cls_name, + nn.Conv2D( + in_channels=256, + out_channels=self.num_classes, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr( + initializer=Constant(value=bias_init_value)))) + + conv_reg_name = "fcos_head_reg" + self.fcos_head_reg = self.add_sublayer( + conv_reg_name, + nn.Conv2D( + in_channels=256, + out_channels=4, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + + conv_centerness_name = "fcos_head_centerness" + self.fcos_head_centerness = self.add_sublayer( + conv_centerness_name, + nn.Conv2D( + in_channels=256, + out_channels=1, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + + self.scales_regs = [] + for i in range(len(self.fpn_stride)): + lvl = int(math.log(int(self.fpn_stride[i]), 2)) + feat_name = 'p{}_feat'.format(lvl) + scale_reg = self.add_sublayer(feat_name, ScaleReg()) + self.scales_regs.append(scale_reg) + + def _compute_locations_by_level(self, fpn_stride, feature, num_shift=0.5): + """ + Compute locations of anchor points of each FPN layer + Args: + fpn_stride (int): The stride of current FPN feature map + feature (Tensor): Tensor of current FPN feature map + Return: + Anchor points locations of current FPN feature map + """ + h, w = feature.shape[2], feature.shape[3] + shift_x = paddle.arange(0, w * fpn_stride, fpn_stride) + shift_y = paddle.arange(0, h * fpn_stride, fpn_stride) + shift_x = paddle.unsqueeze(shift_x, axis=0) + shift_y = paddle.unsqueeze(shift_y, axis=1) + shift_x = paddle.expand(shift_x, shape=[h, w]) + shift_y = paddle.expand(shift_y, shape=[h, w]) + + shift_x = paddle.reshape(shift_x, shape=[-1]) + shift_y = paddle.reshape(shift_y, shape=[-1]) + location = paddle.stack( + [shift_x, shift_y], axis=-1) + float(fpn_stride * num_shift) + return location + + def forward(self, fpn_feats, targets=None): + assert len(fpn_feats) == len( + self.fpn_stride + ), "The size of fpn_feats is not equal to size of fpn_stride" + cls_logits_list = [] + bboxes_reg_list = [] + centerness_list = [] + for scale_reg, fpn_stride, fpn_feat in zip(self.scales_regs, + self.fpn_stride, fpn_feats): + fcos_cls_feat, fcos_reg_feat = self.fcos_feat(fpn_feat) + cls_logits = self.fcos_head_cls(fcos_cls_feat) + bbox_reg = scale_reg(self.fcos_head_reg(fcos_reg_feat)) + if self.centerness_on_reg: + centerness = self.fcos_head_centerness(fcos_reg_feat) + else: + centerness = self.fcos_head_centerness(fcos_cls_feat) + if self.norm_reg_targets: + bbox_reg = F.relu(bbox_reg) + if self.multiply_strides_reg_targets: + bbox_reg = bbox_reg * fpn_stride + else: + if not self.training or targets.get( + 'get_data', + False) or targets.get('is_teacher', False): + bbox_reg = bbox_reg * fpn_stride + 
else: + bbox_reg = paddle.exp(bbox_reg) + cls_logits_list.append(cls_logits) + bboxes_reg_list.append(bbox_reg) + centerness_list.append(centerness) + + if targets is not None: + self.is_teacher = targets.get('is_teacher', False) + if self.is_teacher: + return [cls_logits_list, bboxes_reg_list, centerness_list] + + if self.training and targets is not None: + get_data = targets.get('get_data', False) + if get_data: + return [cls_logits_list, bboxes_reg_list, centerness_list] + + losses = {} + fcos_head_outs = [cls_logits_list, bboxes_reg_list, centerness_list] + losses_fcos = self.get_loss(fcos_head_outs, targets) + losses.update(losses_fcos) + + total_loss = paddle.add_n(list(losses.values())) + losses.update({'loss': total_loss}) + return losses + else: + # eval or infer + locations_list = [] + for fpn_stride, feature in zip(self.fpn_stride, fpn_feats): + location = self._compute_locations_by_level(fpn_stride, feature, + self.num_shift) + locations_list.append(location) + + fcos_head_outs = [ + locations_list, cls_logits_list, bboxes_reg_list, + centerness_list + ] + return fcos_head_outs + + def get_loss(self, fcos_head_outs, targets): + cls_logits, bboxes_reg, centerness = fcos_head_outs + + # get labels,reg_target,centerness + tag_labels, tag_bboxes, tag_centerness = [], [], [] + for i in range(len(self.fpn_stride)): + k_lbl = 'labels{}'.format(i) + if k_lbl in targets: + tag_labels.append(targets[k_lbl]) + k_box = 'reg_target{}'.format(i) + if k_box in targets: + tag_bboxes.append(targets[k_box]) + k_ctn = 'centerness{}'.format(i) + if k_ctn in targets: + tag_centerness.append(targets[k_ctn]) + + losses_fcos = self.fcos_loss(cls_logits, bboxes_reg, centerness, + tag_labels, tag_bboxes, tag_centerness) + return losses_fcos + + def _post_process_by_level(self, + locations, + box_cls, + box_reg, + box_ctn, + sqrt_score=False): + box_scores = F.sigmoid(box_cls).flatten(2).transpose([0, 2, 1]) + box_centerness = F.sigmoid(box_ctn).flatten(2).transpose([0, 2, 1]) + pred_scores = box_scores * box_centerness + if sqrt_score: + pred_scores = paddle.sqrt(pred_scores) + + box_reg_ch_last = box_reg.flatten(2).transpose([0, 2, 1]) + box_reg_decoding = paddle.stack( + [ + locations[:, 0] - box_reg_ch_last[:, :, 0], + locations[:, 1] - box_reg_ch_last[:, :, 1], + locations[:, 0] + box_reg_ch_last[:, :, 2], + locations[:, 1] + box_reg_ch_last[:, :, 3] + ], + axis=1) + pred_boxes = box_reg_decoding.transpose([0, 2, 1]) + + return pred_scores, pred_boxes + + def post_process(self, fcos_head_outs, scale_factor): + locations, cls_logits, bboxes_reg, centerness = fcos_head_outs + pred_bboxes, pred_scores = [], [] + + for pts, cls, reg, ctn in zip(locations, cls_logits, bboxes_reg, + centerness): + scores, boxes = self._post_process_by_level(pts, cls, reg, ctn, + self.sqrt_score) + pred_scores.append(scores) + pred_bboxes.append(boxes) + pred_bboxes = paddle.concat(pred_bboxes, axis=1) + pred_scores = paddle.concat(pred_scores, axis=1) + + # scale bbox to origin + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], axis=-1).reshape([-1, 1, 4]) + pred_bboxes /= scale_factor + + pred_scores = pred_scores.transpose([0, 2, 1]) + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) + return bbox_pred, bbox_num diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/fcosr_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/fcosr_head.py new file mode 100644 index 
0000000000000000000000000000000000000000..df98883dffdb4d10697fea0d28a1aa890186b835 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/fcosr_head.py @@ -0,0 +1,396 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register +from paddle import ParamAttr +from paddle.regularizer import L2Decay + +from .fcos_head import ScaleReg +from ..initializer import bias_init_with_prob, constant_, normal_ +from ..ops import get_act_fn, anchor_generator +from ..rbox_utils import box2corners +from ..losses import ProbIoULoss +import numpy as np + +__all__ = ['FCOSRHead'] + + +def trunc_div(a, b): + ipt = paddle.divide(a, b) + sign_ipt = paddle.sign(ipt) + abs_ipt = paddle.abs(ipt) + abs_ipt = paddle.floor(abs_ipt) + out = paddle.multiply(sign_ipt, abs_ipt) + return out + + +def fmod(a, b): + return a - trunc_div(a, b) * b + + +def fmod_eval(a, b): + return a - a.divide(b).cast(paddle.int32).cast(paddle.float32) * b + + +class ConvBNLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size=3, + stride=1, + groups=1, + padding=0, + norm_cfg={'name': 'gn', + 'num_groups': 32}, + act=None): + super(ConvBNLayer, self).__init__() + + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=padding, + groups=groups, + bias_attr=False) + + norm_type = norm_cfg['name'] + if norm_type in ['sync_bn', 'bn']: + self.norm = nn.BatchNorm2D( + ch_out, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + else: + groups = norm_cfg.get('num_groups', 1) + self.norm = nn.GroupNorm( + num_groups=groups, + num_channels=ch_out, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.act = get_act_fn(act) if act is None or isinstance(act, ( + str, dict)) else act + + def forward(self, x): + x = self.conv(x) + x = self.norm(x) + x = self.act(x) + + return x + + +@register +class FCOSRHead(nn.Layer): + """ FCOSR Head, refer to https://arxiv.org/abs/2111.10780 for details """ + + __shared__ = ['num_classes', 'trt'] + __inject__ = ['assigner', 'nms'] + + def __init__(self, + num_classes=15, + in_channels=256, + feat_channels=256, + stacked_convs=4, + act='relu', + fpn_strides=[4, 8, 16, 32, 64], + trt=False, + loss_weight={'class': 1.0, + 'probiou': 1.0}, + norm_cfg={'name': 'gn', + 'num_groups': 32}, + assigner='FCOSRAssigner', + nms='MultiClassNMS'): + + super(FCOSRHead, self).__init__() + self.in_channels = in_channels + self.num_classes = num_classes + self.fpn_strides = fpn_strides + self.stacked_convs = stacked_convs + self.loss_weight = loss_weight + self.half_pi = paddle.to_tensor( + [1.5707963267948966], dtype=paddle.float32) + self.probiou_loss = ProbIoULoss(mode='l1') + act = get_act_fn( + act, trt=trt) if act is None or isinstance(act, + (str, dict)) else act + 
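+        # forward_train/forward_eval wrap the predicted angle with
+        # fmod(reg_angle, self.half_pi), i.e. into (-pi/2, pi/2); trunc_div and
+        # fmod above emulate C-style truncated-division modulo with paddle ops.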
self.trt = trt + self.loss_weight = loss_weight + self.assigner = assigner + self.nms = nms + # stem + self.stem_cls = nn.LayerList() + self.stem_reg = nn.LayerList() + for i in range(self.stacked_convs): + self.stem_cls.append( + ConvBNLayer( + self.in_channels[i], + feat_channels, + filter_size=3, + stride=1, + padding=1, + norm_cfg=norm_cfg, + act=act)) + self.stem_reg.append( + ConvBNLayer( + self.in_channels[i], + feat_channels, + filter_size=3, + stride=1, + padding=1, + norm_cfg=norm_cfg, + act=act)) + + self.scales = nn.LayerList( + [ScaleReg() for _ in range(len(fpn_strides))]) + + # prediction + self.pred_cls = nn.Conv2D(feat_channels, self.num_classes, 3, padding=1) + + self.pred_xy = nn.Conv2D(feat_channels, 2, 3, padding=1) + + self.pred_wh = nn.Conv2D(feat_channels, 2, 3, padding=1) + + self.pred_angle = nn.Conv2D(feat_channels, 1, 3, padding=1) + + self._init_weights() + + def _init_weights(self): + for cls_, reg_ in zip(self.stem_cls, self.stem_reg): + normal_(cls_.conv.weight, std=0.01) + normal_(reg_.conv.weight, std=0.01) + + bias_cls = bias_init_with_prob(0.01) + normal_(self.pred_cls.weight, std=0.01) + constant_(self.pred_cls.bias, bias_cls) + normal_(self.pred_xy.weight, std=0.01) + normal_(self.pred_wh.weight, std=0.01) + normal_(self.pred_angle.weight, std=0.01) + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + def _generate_anchors(self, feats): + if self.trt: + anchor_points = [] + for feat, stride in zip(feats, self.fpn_strides): + _, _, h, w = paddle.shape(feat) + anchor, _ = anchor_generator( + feat, + stride * 4, + 1.0, [1.0, 1.0, 1.0, 1.0], [stride, stride], + offset=0.5) + x1, y1, x2, y2 = paddle.split(anchor, 4, axis=-1) + xc = (x1 + x2 + 1) / 2 + yc = (y1 + y2 + 1) / 2 + anchor_point = paddle.concat( + [xc, yc], axis=-1).reshape((1, h * w, 2)) + anchor_points.append(anchor_point) + anchor_points = paddle.concat(anchor_points, axis=1) + return anchor_points, None, None + else: + anchor_points = [] + stride_tensor = [] + num_anchors_list = [] + for feat, stride in zip(feats, self.fpn_strides): + _, _, h, w = paddle.shape(feat) + shift_x = (paddle.arange(end=w) + 0.5) * stride + shift_y = (paddle.arange(end=h) + 0.5) * stride + shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) + anchor_point = paddle.cast( + paddle.stack( + [shift_x, shift_y], axis=-1), dtype='float32') + anchor_points.append(anchor_point.reshape([1, -1, 2])) + stride_tensor.append( + paddle.full( + [1, h * w, 1], stride, dtype='float32')) + num_anchors_list.append(h * w) + anchor_points = paddle.concat(anchor_points, axis=1) + stride_tensor = paddle.concat(stride_tensor, axis=1) + return anchor_points, stride_tensor, num_anchors_list + + def forward(self, feats, target=None): + if self.training: + return self.forward_train(feats, target) + else: + return self.forward_eval(feats, target) + + def forward_train(self, feats, target=None): + anchor_points, stride_tensor, num_anchors_list = self._generate_anchors( + feats) + cls_pred_list, reg_pred_list = [], [] + for stride, feat, scale in zip(self.fpn_strides, feats, self.scales): + # cls + cls_feat = feat + for cls_layer in self.stem_cls: + cls_feat = cls_layer(cls_feat) + cls_pred = F.sigmoid(self.pred_cls(cls_feat)) + cls_pred_list.append(cls_pred.flatten(2).transpose((0, 2, 1))) + # reg + reg_feat = feat + for reg_layer in self.stem_reg: + reg_feat = reg_layer(reg_feat) + + reg_xy = scale(self.pred_xy(reg_feat)) * stride + reg_wh = 
F.elu(scale(self.pred_wh(reg_feat)) + 1.) * stride + reg_angle = self.pred_angle(reg_feat) + reg_angle = fmod(reg_angle, self.half_pi) + reg_pred = paddle.concat([reg_xy, reg_wh, reg_angle], axis=1) + reg_pred_list.append(reg_pred.flatten(2).transpose((0, 2, 1))) + + cls_pred_list = paddle.concat(cls_pred_list, axis=1) + reg_pred_list = paddle.concat(reg_pred_list, axis=1) + + return self.get_loss([ + cls_pred_list, reg_pred_list, anchor_points, stride_tensor, + num_anchors_list + ], target) + + def forward_eval(self, feats, target=None): + cls_pred_list, reg_pred_list = [], [] + anchor_points, _, _ = self._generate_anchors(feats) + for stride, feat, scale in zip(self.fpn_strides, feats, self.scales): + b, _, h, w = paddle.shape(feat) + # cls + cls_feat = feat + for cls_layer in self.stem_cls: + cls_feat = cls_layer(cls_feat) + cls_pred = F.sigmoid(self.pred_cls(cls_feat)) + cls_pred_list.append(cls_pred.reshape([b, self.num_classes, h * w])) + # reg + reg_feat = feat + for reg_layer in self.stem_reg: + reg_feat = reg_layer(reg_feat) + + reg_xy = scale(self.pred_xy(reg_feat)) * stride + reg_wh = F.elu(scale(self.pred_wh(reg_feat)) + 1.) * stride + reg_angle = self.pred_angle(reg_feat) + reg_angle = fmod_eval(reg_angle, self.half_pi) + reg_pred = paddle.concat([reg_xy, reg_wh, reg_angle], axis=1) + reg_pred = reg_pred.reshape([b, 5, h * w]).transpose((0, 2, 1)) + reg_pred_list.append(reg_pred) + + cls_pred_list = paddle.concat(cls_pred_list, axis=2) + reg_pred_list = paddle.concat(reg_pred_list, axis=1) + reg_pred_list = self._bbox_decode(anchor_points, reg_pred_list) + return cls_pred_list, reg_pred_list + + def _bbox_decode(self, points, reg_pred_list): + xy, wha = paddle.split(reg_pred_list, [2, 3], axis=-1) + xy = xy + points + return paddle.concat([xy, wha], axis=-1) + + def _box2corners(self, pred_bboxes): + """ convert (x, y, w, h, angle) to (x1, y1, x2, y2, x3, y3, x4, y4) + + Args: + pred_bboxes (Tensor): [B, N, 5] + + Returns: + polys (Tensor): [B, N, 8] + """ + x, y, w, h, angle = paddle.split(pred_bboxes, 5, axis=-1) + cos_a_half = paddle.cos(angle) * 0.5 + sin_a_half = paddle.sin(angle) * 0.5 + w_x = cos_a_half * w + w_y = sin_a_half * w + h_x = -sin_a_half * h + h_y = cos_a_half * h + return paddle.concat( + [ + x + w_x + h_x, y + w_y + h_y, x - w_x + h_x, y - w_y + h_y, + x - w_x - h_x, y - w_y - h_y, x + w_x - h_x, y + w_y - h_y + ], + axis=-1) + + def get_loss(self, head_outs, gt_meta): + cls_pred_list, reg_pred_list, anchor_points, stride_tensor, num_anchors_list = head_outs + gt_labels = gt_meta['gt_class'] + gt_bboxes = gt_meta['gt_bbox'] + gt_rboxes = gt_meta['gt_rbox'] + pad_gt_mask = gt_meta['pad_gt_mask'] + # decode + pred_rboxes = self._bbox_decode(anchor_points, reg_pred_list) + # label assignment + assigned_labels, assigned_rboxes, assigned_scores = \ + self.assigner( + anchor_points, + stride_tensor, + num_anchors_list, + gt_labels, + gt_bboxes, + gt_rboxes, + pad_gt_mask, + self.num_classes, + pred_rboxes + ) + + # reg_loss + mask_positive = (assigned_labels != self.num_classes) + num_pos = mask_positive.sum().item() + if num_pos > 0: + bbox_mask = mask_positive.unsqueeze(-1).tile([1, 1, 5]) + pred_rboxes_pos = paddle.masked_select(pred_rboxes, + bbox_mask).reshape([-1, 5]) + assigned_rboxes_pos = paddle.masked_select( + assigned_rboxes, bbox_mask).reshape([-1, 5]) + bbox_weight = paddle.masked_select( + assigned_scores.sum(-1), mask_positive).reshape([-1]) + avg_factor = bbox_weight.sum() + loss_probiou = self.probiou_loss(pred_rboxes_pos, + 
assigned_rboxes_pos) + loss_probiou = paddle.sum(loss_probiou * bbox_weight) / avg_factor + else: + loss_probiou = pred_rboxes.sum() * 0. + + avg_factor = max(num_pos, 1.0) + # cls_loss + loss_cls = self._qfocal_loss( + cls_pred_list, assigned_scores, reduction='sum') + loss_cls = loss_cls / avg_factor + + loss = self.loss_weight['class'] * loss_cls + \ + self.loss_weight['probiou'] * loss_probiou + out_dict = { + 'loss': loss, + 'loss_probiou': loss_probiou, + 'loss_cls': loss_cls + } + return out_dict + + @staticmethod + def _qfocal_loss(score, label, gamma=2.0, reduction='sum'): + weight = (score - label).pow(gamma) + loss = F.binary_cross_entropy( + score, label, weight=weight, reduction=reduction) + return loss + + def post_process(self, head_outs, scale_factor): + pred_scores, pred_rboxes = head_outs + # [B, N, 5] -> [B, N, 4, 2] -> [B, N, 8] + pred_rboxes = self._box2corners(pred_rboxes) + # scale bbox to origin + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [ + scale_x, scale_y, scale_x, scale_y, scale_x, scale_y, scale_x, + scale_y + ], + axis=-1).reshape([-1, 1, 8]) + pred_rboxes /= scale_factor + bbox_pred, bbox_num, before_nms_indexes = self.nms(pred_rboxes, + pred_scores) + return bbox_pred, bbox_num, before_nms_indexes diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/gfl_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/gfl_head.py new file mode 100644 index 0000000000000000000000000000000000000000..040a3f7090d4a82ed5f4641967ceae1c0349d3fb --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/gfl_head.py @@ -0,0 +1,736 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The code is based on: +# https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/gfl_head.py + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Normal, Constant + +from ppdet.core.workspace import register +from ppdet.modeling.bbox_utils import distance2bbox, bbox2distance, batch_distance2bbox +from ppdet.data.transform.atss_assigner import bbox_overlaps + +__all__ = ['GFLHead', 'LDGFLHead'] + + +class ScaleReg(nn.Layer): + """ + Parameter for scaling the regression outputs. + """ + + def __init__(self): + super(ScaleReg, self).__init__() + self.scale_reg = self.create_parameter( + shape=[1], + attr=ParamAttr(initializer=Constant(value=1.)), + dtype="float32") + + def forward(self, inputs): + out = inputs * self.scale_reg + return out + + +class Integral(nn.Layer): + """A fixed layer for calculating integral result from distribution. 
+    This layer calculates the target location by :math:`\sum{P(y_i) * y_i}`, + P(y_i) denotes the softmax vector that represents the discrete distribution + y_i denotes the discrete set, usually {0, 1, 2, ..., reg_max} + Args: + reg_max (int): The maximal value of the discrete set. Default: 16. You + may want to reset it according to your new dataset or related + settings. + """ + + def __init__(self, reg_max=16): + super(Integral, self).__init__() + self.reg_max = reg_max + self.register_buffer('project', + paddle.linspace(0, self.reg_max, self.reg_max + 1)) + + def forward(self, x): + """Forward feature from the regression head to get integral result of + bounding box location. + Args: + x (Tensor): Features of the regression head, shape (N, 4*(n+1)), + n is self.reg_max. + Returns: + x (Tensor): Integral result of box locations, i.e., distance + offsets from the box center in four directions, shape (N, 4). + """ + x = F.softmax(x.reshape([-1, self.reg_max + 1]), axis=1) + x = F.linear(x, self.project) + if self.training: + x = x.reshape([-1, 4]) + return x + + +@register +class DGQP(nn.Layer): + """Distribution-Guided Quality Predictor of GFocal head + Args: + reg_topk (int): top-k statistics of distribution to guide LQE + reg_channels (int): hidden layer unit to generate LQE + add_mean (bool): Whether to calculate the mean of top-k statistics + """ + + def __init__(self, reg_topk=4, reg_channels=64, add_mean=True): + super(DGQP, self).__init__() + self.reg_topk = reg_topk + self.reg_channels = reg_channels + self.add_mean = add_mean + self.total_dim = reg_topk + if add_mean: + self.total_dim += 1 + self.reg_conv1 = self.add_sublayer( + 'dgqp_reg_conv1', + nn.Conv2D( + in_channels=4 * self.total_dim, + out_channels=self.reg_channels, + kernel_size=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + self.reg_conv2 = self.add_sublayer( + 'dgqp_reg_conv2', + nn.Conv2D( + in_channels=self.reg_channels, + out_channels=1, + kernel_size=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + + def forward(self, x): + """Forward the top-k statistics of the regression distribution to + predict a localization quality score. + Args: + x (Tensor): Regression distribution logits, shape + (N, 4*(reg_max+1), H, W). + Returns: + y (Tensor): Predicted quality score in (0, 1) for each location, + shape (N, 1, H, W). + """ + N, _, H, W = x.shape[:] + prob = F.softmax(x.reshape([N, 4, -1, H, W]), axis=2) + prob_topk, _ = prob.topk(self.reg_topk, axis=2) + if self.add_mean: + stat = paddle.concat( + [prob_topk, prob_topk.mean( + axis=2, keepdim=True)], axis=2) + else: + stat = prob_topk + y = F.relu(self.reg_conv1(stat.reshape([N, 4 * self.total_dim, H, W]))) + y = F.sigmoid(self.reg_conv2(y)) + return y + + +@register +class GFLHead(nn.Layer): + """ + GFLHead + Args: + conv_feat (object): Instance of 'FCOSFeat' + num_classes (int): Number of classes + fpn_stride (list): The stride of each FPN Layer + prior_prob (float): Used to set the bias init for the class prediction layer + loss_class (object): Instance of QualityFocalLoss. + loss_dfl (object): Instance of DistributionFocalLoss. + loss_bbox (object): Instance of bbox loss. + reg_max: Max value of integral set :math:`{0, ..., reg_max}` + in QFL setting. Default: 16.
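+
+        Example (an illustrative sketch of the integral decoding performed by
+        Integral above; the batch size of 8 is arbitrary):
+
+            logits = paddle.randn([8, 4 * 17])        # reg_max=16 -> 17 bins per side
+            prob = F.softmax(logits.reshape([-1, 17]), axis=1)
+            dist = (prob * paddle.linspace(0, 16, 17)).sum(-1)  # sum P(y_i) * y_i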
+ """ + __inject__ = [ + 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', 'nms' + ] + __shared__ = ['num_classes'] + + def __init__(self, + conv_feat='FCOSFeat', + dgqp_module=None, + num_classes=80, + fpn_stride=[8, 16, 32, 64, 128], + prior_prob=0.01, + loss_class='QualityFocalLoss', + loss_dfl='DistributionFocalLoss', + loss_bbox='GIoULoss', + reg_max=16, + feat_in_chan=256, + nms=None, + nms_pre=1000, + cell_offset=0): + super(GFLHead, self).__init__() + self.conv_feat = conv_feat + self.dgqp_module = dgqp_module + self.num_classes = num_classes + self.fpn_stride = fpn_stride + self.prior_prob = prior_prob + self.loss_qfl = loss_class + self.loss_dfl = loss_dfl + self.loss_bbox = loss_bbox + self.reg_max = reg_max + self.feat_in_chan = feat_in_chan + self.nms = nms + self.nms_pre = nms_pre + self.cell_offset = cell_offset + self.use_sigmoid = self.loss_qfl.use_sigmoid + if self.use_sigmoid: + self.cls_out_channels = self.num_classes + else: + self.cls_out_channels = self.num_classes + 1 + + conv_cls_name = "gfl_head_cls" + bias_init_value = -math.log((1 - self.prior_prob) / self.prior_prob) + self.gfl_head_cls = self.add_sublayer( + conv_cls_name, + nn.Conv2D( + in_channels=self.feat_in_chan, + out_channels=self.cls_out_channels, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr( + initializer=Constant(value=bias_init_value)))) + + conv_reg_name = "gfl_head_reg" + self.gfl_head_reg = self.add_sublayer( + conv_reg_name, + nn.Conv2D( + in_channels=self.feat_in_chan, + out_channels=4 * (self.reg_max + 1), + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + + self.scales_regs = [] + for i in range(len(self.fpn_stride)): + lvl = int(math.log(int(self.fpn_stride[i]), 2)) + feat_name = 'p{}_feat'.format(lvl) + scale_reg = self.add_sublayer(feat_name, ScaleReg()) + self.scales_regs.append(scale_reg) + + self.distribution_project = Integral(self.reg_max) + + def forward(self, fpn_feats): + assert len(fpn_feats) == len( + self.fpn_stride + ), "The size of fpn_feats is not equal to size of fpn_stride" + cls_logits_list = [] + bboxes_reg_list = [] + for stride, scale_reg, fpn_feat in zip(self.fpn_stride, + self.scales_regs, fpn_feats): + conv_cls_feat, conv_reg_feat = self.conv_feat(fpn_feat) + cls_score = self.gfl_head_cls(conv_cls_feat) + bbox_pred = scale_reg(self.gfl_head_reg(conv_reg_feat)) + if self.dgqp_module: + quality_score = self.dgqp_module(bbox_pred) + cls_score = F.sigmoid(cls_score) * quality_score + if not self.training: + cls_score = F.sigmoid(cls_score.transpose([0, 2, 3, 1])) + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]) + b, cell_h, cell_w, _ = paddle.shape(cls_score) + y, x = self.get_single_level_center_point( + [cell_h, cell_w], stride, cell_offset=self.cell_offset) + center_points = paddle.stack([x, y], axis=-1) + cls_score = cls_score.reshape([b, -1, self.cls_out_channels]) + bbox_pred = self.distribution_project(bbox_pred) * stride + bbox_pred = bbox_pred.reshape([-1, cell_h * cell_w, 4]) + + # NOTE: If keep_ratio=False and image shape value that + # multiples of 32, distance2bbox not set max_shapes parameter + # to speed up model prediction. If need to set max_shapes, + # please use inputs['im_shape']. 
+                bbox_pred = batch_distance2bbox(
+                    center_points, bbox_pred, max_shapes=None)
+
+            cls_logits_list.append(cls_score)
+            bboxes_reg_list.append(bbox_pred)
+
+        return (cls_logits_list, bboxes_reg_list)
+
+    def _images_to_levels(self, target, num_level_anchors):
+        """
+        Convert targets by image to targets by feature level.
+        """
+        level_targets = []
+        start = 0
+        for n in num_level_anchors:
+            end = start + n
+            level_targets.append(target[:, start:end].squeeze(0))
+            start = end
+        return level_targets
+
+    def _grid_cells_to_center(self, grid_cells):
+        """
+        Get center location of each grid cell
+        Args:
+            grid_cells: grid cells of a feature map
+        Returns:
+            center points
+        """
+        cells_cx = (grid_cells[:, 2] + grid_cells[:, 0]) / 2
+        cells_cy = (grid_cells[:, 3] + grid_cells[:, 1]) / 2
+        return paddle.stack([cells_cx, cells_cy], axis=-1)
+
+    def get_loss(self, gfl_head_outs, gt_meta):
+        cls_logits, bboxes_reg = gfl_head_outs
+        num_level_anchors = [
+            featmap.shape[-2] * featmap.shape[-1] for featmap in cls_logits
+        ]
+        grid_cells_list = self._images_to_levels(gt_meta['grid_cells'],
+                                                 num_level_anchors)
+        labels_list = self._images_to_levels(gt_meta['labels'],
+                                             num_level_anchors)
+        label_weights_list = self._images_to_levels(gt_meta['label_weights'],
+                                                    num_level_anchors)
+        bbox_targets_list = self._images_to_levels(gt_meta['bbox_targets'],
+                                                   num_level_anchors)
+        num_total_pos = sum(gt_meta['pos_num'])
+        try:
+            paddle.distributed.all_reduce(num_total_pos)
+            num_total_pos = paddle.clip(
+                num_total_pos / paddle.distributed.get_world_size(), min=1)
+        except:
+            num_total_pos = max(num_total_pos, 1)
+
+        loss_bbox_list, loss_dfl_list, loss_qfl_list, avg_factor = [], [], [], []
+        for cls_score, bbox_pred, grid_cells, labels, label_weights, bbox_targets, stride in zip(
+                cls_logits, bboxes_reg, grid_cells_list, labels_list,
+                label_weights_list, bbox_targets_list, self.fpn_stride):
+            grid_cells = grid_cells.reshape([-1, 4])
+            cls_score = cls_score.transpose([0, 2, 3, 1]).reshape(
+                [-1, self.cls_out_channels])
+            bbox_pred = bbox_pred.transpose([0, 2, 3, 1]).reshape(
+                [-1, 4 * (self.reg_max + 1)])
+            bbox_targets = bbox_targets.reshape([-1, 4])
+            labels = labels.reshape([-1])
+            label_weights = label_weights.reshape([-1])
+
+            bg_class_ind = self.num_classes
+            pos_inds = paddle.nonzero(
+                paddle.logical_and((labels >= 0), (labels < bg_class_ind)),
+                as_tuple=False).squeeze(1)
+            score = np.zeros(labels.shape)
+            if len(pos_inds) > 0:
+                pos_bbox_targets = paddle.gather(bbox_targets, pos_inds, axis=0)
+                pos_bbox_pred = paddle.gather(bbox_pred, pos_inds, axis=0)
+                pos_grid_cells = paddle.gather(grid_cells, pos_inds, axis=0)
+                pos_grid_cell_centers = self._grid_cells_to_center(
+                    pos_grid_cells) / stride
+
+                weight_targets = F.sigmoid(cls_score.detach())
+                weight_targets = paddle.gather(
+                    weight_targets.max(axis=1, keepdim=True), pos_inds, axis=0)
+                pos_bbox_pred_corners = self.distribution_project(pos_bbox_pred)
+                pos_decode_bbox_pred = distance2bbox(pos_grid_cell_centers,
+                                                     pos_bbox_pred_corners)
+                pos_decode_bbox_targets = pos_bbox_targets / stride
+                bbox_iou = bbox_overlaps(
+                    pos_decode_bbox_pred.detach().numpy(),
+                    pos_decode_bbox_targets.detach().numpy(),
+                    is_aligned=True)
+                score[pos_inds.numpy()] = bbox_iou
+                pred_corners = pos_bbox_pred.reshape([-1, self.reg_max + 1])
+                target_corners = bbox2distance(pos_grid_cell_centers,
+                                               pos_decode_bbox_targets,
+                                               self.reg_max).reshape([-1])
+                # regression loss
+                loss_bbox = paddle.sum(
+                    self.loss_bbox(pos_decode_bbox_pred,
+                                   pos_decode_bbox_targets) * weight_targets)
+
+                # dfl loss
+                loss_dfl = self.loss_dfl(
+                    pred_corners,
+                    target_corners,
+                    weight=weight_targets.expand([-1, 4]).reshape([-1]),
+                    avg_factor=4.0)
+            else:
+                loss_bbox = bbox_pred.sum() * 0
+                loss_dfl = bbox_pred.sum() * 0
+                weight_targets = paddle.to_tensor([0], dtype='float32')
+
+            # qfl loss
+            score = paddle.to_tensor(score)
+            loss_qfl = self.loss_qfl(
+                cls_score, (labels, score),
+                weight=label_weights,
+                avg_factor=num_total_pos)
+            loss_bbox_list.append(loss_bbox)
+            loss_dfl_list.append(loss_dfl)
+            loss_qfl_list.append(loss_qfl)
+            avg_factor.append(weight_targets.sum())
+
+        avg_factor = sum(avg_factor)
+        try:
+            paddle.distributed.all_reduce(avg_factor)
+            avg_factor = paddle.clip(
+                avg_factor / paddle.distributed.get_world_size(), min=1)
+        except:
+            avg_factor = max(avg_factor.item(), 1)
+        if avg_factor <= 0:
+            loss_qfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False)
+            loss_bbox = paddle.to_tensor(
+                0, dtype='float32', stop_gradient=False)
+            loss_dfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False)
+        else:
+            losses_bbox = list(map(lambda x: x / avg_factor, loss_bbox_list))
+            losses_dfl = list(map(lambda x: x / avg_factor, loss_dfl_list))
+            loss_qfl = sum(loss_qfl_list)
+            loss_bbox = sum(losses_bbox)
+            loss_dfl = sum(losses_dfl)
+
+        loss_states = dict(
+            loss_qfl=loss_qfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl)
+
+        return loss_states
+
+    def get_single_level_center_point(self, featmap_size, stride,
+                                      cell_offset=0):
+        """
+        Generate pixel centers of a single stage feature map.
+        Args:
+            featmap_size: height and width of the feature map
+            stride: down sample stride of the feature map
+            cell_offset: offset added to each cell index before scaling
+        Returns:
+            y and x of the center points
+        """
+        h, w = featmap_size
+        x_range = (paddle.arange(w, dtype='float32') + cell_offset) * stride
+        y_range = (paddle.arange(h, dtype='float32') + cell_offset) * stride
+        y, x = paddle.meshgrid(y_range, x_range)
+        y = y.flatten()
+        x = x.flatten()
+        return y, x
+
+    def post_process(self, gfl_head_outs, im_shape, scale_factor):
+        cls_scores, bboxes_reg = gfl_head_outs
+        bboxes = paddle.concat(bboxes_reg, axis=1)
+        # rescale: [h_scale, w_scale] -> [w_scale, h_scale, w_scale, h_scale]
+        im_scale = scale_factor.flip([1]).tile([1, 2]).unsqueeze(1)
+        bboxes /= im_scale
+        mlvl_scores = paddle.concat(cls_scores, axis=1)
+        mlvl_scores = mlvl_scores.transpose([0, 2, 1])
+        bbox_pred, bbox_num, _ = self.nms(bboxes, mlvl_scores)
+        return bbox_pred, bbox_num
+
+
+@register
+class LDGFLHead(GFLHead):
+    """
+    GFLHead for LD (localization distillation)
+    Args:
+        conv_feat (object): Instance of 'FCOSFeat'
+        num_classes (int): Number of classes
+        fpn_stride (list): The stride of each FPN Layer
+        prior_prob (float): Used to set the bias init for the class prediction layer
+        loss_class (object): Instance of QualityFocalLoss.
+        loss_dfl (object): Instance of DistributionFocalLoss.
+        loss_bbox (object): Instance of bbox loss.
+        reg_max: Max value of integral set :math:`{0, ..., reg_max}`
+            in QFL setting. Default: 16.
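+        loss_ld (object): Instance of the localization distillation loss.
+        loss_ld_vlr (object): Instance of the localization distillation loss
+            applied on the valuable localization region (VLR).
+        loss_kd (object): Instance of the classification knowledge
+            distillation loss.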
+ """ + __inject__ = [ + 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', + 'loss_ld', 'loss_ld_vlr', 'loss_kd', 'nms' + ] + __shared__ = ['num_classes'] + + def __init__(self, + conv_feat='FCOSFeat', + dgqp_module=None, + num_classes=80, + fpn_stride=[8, 16, 32, 64, 128], + prior_prob=0.01, + loss_class='QualityFocalLoss', + loss_dfl='DistributionFocalLoss', + loss_bbox='GIoULoss', + loss_ld='KnowledgeDistillationKLDivLoss', + loss_ld_vlr='KnowledgeDistillationKLDivLoss', + loss_kd='KnowledgeDistillationKLDivLoss', + reg_max=16, + feat_in_chan=256, + nms=None, + nms_pre=1000, + cell_offset=0): + + super(LDGFLHead, self).__init__( + conv_feat=conv_feat, + dgqp_module=dgqp_module, + num_classes=num_classes, + fpn_stride=fpn_stride, + prior_prob=prior_prob, + loss_class=loss_class, + loss_dfl=loss_dfl, + loss_bbox=loss_bbox, + reg_max=reg_max, + feat_in_chan=feat_in_chan, + nms=nms, + nms_pre=nms_pre, + cell_offset=cell_offset) + self.loss_ld = loss_ld + self.loss_kd = loss_kd + self.loss_ld_vlr = loss_ld_vlr + + def forward(self, fpn_feats): + assert len(fpn_feats) == len( + self.fpn_stride + ), "The size of fpn_feats is not equal to size of fpn_stride" + cls_logits_list = [] + bboxes_reg_list = [] + for stride, scale_reg, fpn_feat in zip(self.fpn_stride, + self.scales_regs, fpn_feats): + conv_cls_feat, conv_reg_feat = self.conv_feat(fpn_feat) + cls_score = self.gfl_head_cls(conv_cls_feat) + bbox_pred = scale_reg(self.gfl_head_reg(conv_reg_feat)) + + if self.dgqp_module: + quality_score = self.dgqp_module(bbox_pred) + cls_score = F.sigmoid(cls_score) * quality_score + if not self.training: + cls_score = F.sigmoid(cls_score.transpose([0, 2, 3, 1])) + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]) + b, cell_h, cell_w, _ = paddle.shape(cls_score) + y, x = self.get_single_level_center_point( + [cell_h, cell_w], stride, cell_offset=self.cell_offset) + center_points = paddle.stack([x, y], axis=-1) + cls_score = cls_score.reshape([b, -1, self.cls_out_channels]) + bbox_pred = self.distribution_project(bbox_pred) * stride + bbox_pred = bbox_pred.reshape([b, cell_h * cell_w, 4]) + + # NOTE: If keep_ratio=False and image shape value that + # multiples of 32, distance2bbox not set max_shapes parameter + # to speed up model prediction. If need to set max_shapes, + # please use inputs['im_shape']. + bbox_pred = batch_distance2bbox( + center_points, bbox_pred, max_shapes=None) + + cls_logits_list.append(cls_score) + bboxes_reg_list.append(bbox_pred) + + return (cls_logits_list, bboxes_reg_list) + + def get_loss(self, gfl_head_outs, gt_meta, soft_label_list, + soft_targets_list): + cls_logits, bboxes_reg = gfl_head_outs + + num_level_anchors = [ + featmap.shape[-2] * featmap.shape[-1] for featmap in cls_logits + ] + + grid_cells_list = self._images_to_levels(gt_meta['grid_cells'], + num_level_anchors) + + labels_list = self._images_to_levels(gt_meta['labels'], + num_level_anchors) + + label_weights_list = self._images_to_levels(gt_meta['label_weights'], + num_level_anchors) + bbox_targets_list = self._images_to_levels(gt_meta['bbox_targets'], + num_level_anchors) + # vlr regions + vlr_regions_list = self._images_to_levels(gt_meta['vlr_regions'], + num_level_anchors) + + num_total_pos = sum(gt_meta['pos_num']) + try: + paddle.distributed.all_reduce(num_total_pos) + num_total_pos = paddle.clip( + num_total_pos / paddle.distributed.get_world_size(), min=1.) 
+ except: + num_total_pos = max(num_total_pos, 1) + + loss_bbox_list, loss_dfl_list, loss_qfl_list, loss_ld_list, avg_factor = [], [], [], [], [] + loss_ld_vlr_list, loss_kd_list = [], [] + + for cls_score, bbox_pred, grid_cells, labels, label_weights, bbox_targets, stride, soft_targets,\ + soft_label, vlr_region in zip( + cls_logits, bboxes_reg, grid_cells_list, labels_list, + label_weights_list, bbox_targets_list, self.fpn_stride, soft_targets_list, + soft_label_list, vlr_regions_list): + + grid_cells = grid_cells.reshape([-1, 4]) + cls_score = cls_score.transpose([0, 2, 3, 1]).reshape( + [-1, self.cls_out_channels]) + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]).reshape( + [-1, 4 * (self.reg_max + 1)]) + + soft_targets = soft_targets.transpose([0, 2, 3, 1]).reshape( + [-1, 4 * (self.reg_max + 1)]) + + soft_label = soft_label.transpose([0, 2, 3, 1]).reshape( + [-1, self.cls_out_channels]) + + # feture im + # teacher_x = teacher_x.transpose([0, 2, 3, 1]).reshape([-1, 256]) + # x = x.transpose([0, 2, 3, 1]).reshape([-1, 256]) + + bbox_targets = bbox_targets.reshape([-1, 4]) + labels = labels.reshape([-1]) + label_weights = label_weights.reshape([-1]) + + vlr_region = vlr_region.reshape([-1]) + + bg_class_ind = self.num_classes + pos_inds = paddle.nonzero( + paddle.logical_and((labels >= 0), (labels < bg_class_ind)), + as_tuple=False).squeeze(1) + score = np.zeros(labels.shape) + + remain_inds = (vlr_region > 0).nonzero() + + if len(pos_inds) > 0: + pos_bbox_targets = paddle.gather(bbox_targets, pos_inds, axis=0) + pos_bbox_pred = paddle.gather(bbox_pred, pos_inds, axis=0) + pos_grid_cells = paddle.gather(grid_cells, pos_inds, axis=0) + + pos_grid_cell_centers = self._grid_cells_to_center( + pos_grid_cells) / stride + + weight_targets = F.sigmoid(cls_score.detach()) + weight_targets = paddle.gather( + weight_targets.max(axis=1, keepdim=True), pos_inds, axis=0) + pos_bbox_pred_corners = self.distribution_project(pos_bbox_pred) + pos_decode_bbox_pred = distance2bbox(pos_grid_cell_centers, + pos_bbox_pred_corners) + pos_decode_bbox_targets = pos_bbox_targets / stride + bbox_iou = bbox_overlaps( + pos_decode_bbox_pred.detach().numpy(), + pos_decode_bbox_targets.detach().numpy(), + is_aligned=True) + score[pos_inds.numpy()] = bbox_iou + pred_corners = pos_bbox_pred.reshape([-1, self.reg_max + 1]) + + pos_soft_targets = paddle.gather(soft_targets, pos_inds, axis=0) + soft_corners = pos_soft_targets.reshape([-1, self.reg_max + 1]) + + target_corners = bbox2distance(pos_grid_cell_centers, + pos_decode_bbox_targets, + self.reg_max).reshape([-1]) + # regression loss + loss_bbox = paddle.sum( + self.loss_bbox(pos_decode_bbox_pred, + pos_decode_bbox_targets) * weight_targets) + + # dfl loss + loss_dfl = self.loss_dfl( + pred_corners, + target_corners, + weight=weight_targets.expand([-1, 4]).reshape([-1]), + avg_factor=4.0) + + # ld loss + loss_ld = self.loss_ld( + pred_corners, + soft_corners, + weight=weight_targets.expand([-1, 4]).reshape([-1]), + avg_factor=4.0) + + loss_kd = self.loss_kd( + paddle.gather( + cls_score, pos_inds, axis=0), + paddle.gather( + soft_label, pos_inds, axis=0), + weight=paddle.gather( + label_weights, pos_inds, axis=0), + avg_factor=pos_inds.shape[0]) + + else: + loss_bbox = bbox_pred.sum() * 0 + loss_dfl = bbox_pred.sum() * 0 + loss_ld = bbox_pred.sum() * 0 + loss_kd = bbox_pred.sum() * 0 + weight_targets = paddle.to_tensor([0], dtype='float32') + + if len(remain_inds) > 0: + neg_pred_corners = bbox_pred[remain_inds].reshape( + [-1, self.reg_max + 1]) + neg_soft_corners = 
soft_targets[remain_inds].reshape( + [-1, self.reg_max + 1]) + + remain_targets = vlr_region[remain_inds] + + loss_ld_vlr = self.loss_ld_vlr( + neg_pred_corners, + neg_soft_corners, + weight=remain_targets.expand([-1, 4]).reshape([-1]), + avg_factor=16.0) + else: + loss_ld_vlr = bbox_pred.sum() * 0 + + # qfl loss + score = paddle.to_tensor(score) + loss_qfl = self.loss_qfl( + cls_score, (labels, score), + weight=label_weights, + avg_factor=num_total_pos) + + loss_bbox_list.append(loss_bbox) + loss_dfl_list.append(loss_dfl) + loss_qfl_list.append(loss_qfl) + loss_ld_list.append(loss_ld) + loss_ld_vlr_list.append(loss_ld_vlr) + loss_kd_list.append(loss_kd) + avg_factor.append(weight_targets.sum()) + + avg_factor = sum(avg_factor) # + 1e-6 + try: + paddle.distributed.all_reduce(avg_factor) + avg_factor = paddle.clip( + avg_factor / paddle.distributed.get_world_size(), min=1) + except: + avg_factor = max(avg_factor.item(), 1) + + if avg_factor <= 0: + loss_qfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + loss_bbox = paddle.to_tensor( + 0, dtype='float32', stop_gradient=False) + loss_dfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + loss_ld = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + loss_ld_vlr = paddle.to_tensor( + 0, dtype='float32', stop_gradient=False) + loss_kd = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + else: + losses_bbox = list(map(lambda x: x / avg_factor, loss_bbox_list)) + losses_dfl = list(map(lambda x: x / avg_factor, loss_dfl_list)) + loss_qfl = sum(loss_qfl_list) + loss_bbox = sum(losses_bbox) + loss_dfl = sum(losses_dfl) + loss_ld = sum(loss_ld_list) + loss_ld_vlr = sum(loss_ld_vlr_list) + loss_kd = sum(loss_kd_list) + + loss_states = dict( + loss_qfl=loss_qfl, + loss_bbox=loss_bbox, + loss_dfl=loss_dfl, + loss_ld=loss_ld, + loss_ld_vlr=loss_ld_vlr, + loss_kd=loss_kd) + + return loss_states diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/keypoint_hrhrnet_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/keypoint_hrhrnet_head.py new file mode 100644 index 0000000000000000000000000000000000000000..869b1816e6688bd354296e5e4c4ed84cd6a0566a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/keypoint_hrhrnet_head.py @@ -0,0 +1,108 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn + +from ppdet.core.workspace import register +from .. 
import layers as L +from ..backbones.hrnet import BasicBlock + + +@register +class HrHRNetHead(nn.Layer): + __inject__ = ['loss'] + + def __init__(self, num_joints, loss='HrHRNetLoss', swahr=False, width=32): + """ + Head for HigherHRNet network + + Args: + num_joints (int): number of keypoints + hrloss (object): HrHRNetLoss instance + swahr (bool): whether to use swahr + width (int): hrnet channel width + """ + super(HrHRNetHead, self).__init__() + self.loss = loss + + self.num_joints = num_joints + num_featout1 = num_joints * 2 + num_featout2 = num_joints + self.swahr = swahr + self.conv1 = L.Conv2d(width, num_featout1, 1, 1, 0, bias=True) + self.conv2 = L.Conv2d(width, num_featout2, 1, 1, 0, bias=True) + self.deconv = nn.Sequential( + L.ConvTranspose2d( + num_featout1 + width, width, 4, 2, 1, 0, bias=False), + L.BatchNorm2d(width), + L.ReLU()) + self.blocks = nn.Sequential(*(BasicBlock( + num_channels=width, + num_filters=width, + has_se=False, + freeze_norm=False, + name='HrHRNetHead_{}'.format(i)) for i in range(4))) + + self.interpolate = L.Upsample(2, mode='bilinear') + self.concat = L.Concat(dim=1) + if swahr: + self.scalelayer0 = nn.Sequential( + L.Conv2d( + width, num_joints, 1, 1, 0, bias=True), + L.BatchNorm2d(num_joints), + L.ReLU(), + L.Conv2d( + num_joints, + num_joints, + 9, + 1, + 4, + groups=num_joints, + bias=True)) + self.scalelayer1 = nn.Sequential( + L.Conv2d( + width, num_joints, 1, 1, 0, bias=True), + L.BatchNorm2d(num_joints), + L.ReLU(), + L.Conv2d( + num_joints, + num_joints, + 9, + 1, + 4, + groups=num_joints, + bias=True)) + + def forward(self, feats, targets=None): + x1 = feats[0] + xo1 = self.conv1(x1) + x2 = self.blocks(self.deconv(self.concat((x1, xo1)))) + xo2 = self.conv2(x2) + num_joints = self.num_joints + if self.training: + heatmap1, tagmap = paddle.split(xo1, 2, axis=1) + if self.swahr: + so1 = self.scalelayer0(x1) + so2 = self.scalelayer1(x2) + hrhrnet_outputs = ([heatmap1, so1], [xo2, so2], tagmap) + return self.loss(hrhrnet_outputs, targets) + else: + hrhrnet_outputs = (heatmap1, xo2, tagmap) + return self.loss(hrhrnet_outputs, targets) + + # averaged heatmap, upsampled tagmap + upsampled = self.interpolate(xo1) + avg = (upsampled[:, :num_joints] + xo2[:, :num_joints]) / 2 + return avg, upsampled[:, num_joints:] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/mask_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/mask_head.py new file mode 100644 index 0000000000000000000000000000000000000000..403d4ceebcf676f9c55c666e2969a33be5816e9c --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/mask_head.py @@ -0,0 +1,251 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import KaimingNormal + +from ppdet.core.workspace import register, create +from ppdet.modeling.layers import ConvNormLayer +from .roi_extractor import RoIAlign +from ..cls_utils import _get_class_default_kwargs + + +@register +class MaskFeat(nn.Layer): + """ + Feature extraction in Mask head + + Args: + in_channel (int): Input channels + out_channel (int): Output channels + num_convs (int): The number of conv layers, default 4 + norm_type (string | None): Norm type, bn, gn, sync_bn are available, + default None + """ + + def __init__(self, + in_channel=256, + out_channel=256, + num_convs=4, + norm_type=None): + super(MaskFeat, self).__init__() + self.num_convs = num_convs + self.in_channel = in_channel + self.out_channel = out_channel + self.norm_type = norm_type + fan_conv = out_channel * 3 * 3 + fan_deconv = out_channel * 2 * 2 + + mask_conv = nn.Sequential() + if norm_type == 'gn': + for i in range(self.num_convs): + conv_name = 'mask_inter_feat_{}'.format(i + 1) + mask_conv.add_sublayer( + conv_name, + ConvNormLayer( + ch_in=in_channel if i == 0 else out_channel, + ch_out=out_channel, + filter_size=3, + stride=1, + norm_type=self.norm_type, + initializer=KaimingNormal(fan_in=fan_conv), + skip_quant=True)) + mask_conv.add_sublayer(conv_name + 'act', nn.ReLU()) + else: + for i in range(self.num_convs): + conv_name = 'mask_inter_feat_{}'.format(i + 1) + conv = nn.Conv2D( + in_channels=in_channel if i == 0 else out_channel, + out_channels=out_channel, + kernel_size=3, + padding=1, + weight_attr=paddle.ParamAttr( + initializer=KaimingNormal(fan_in=fan_conv))) + conv.skip_quant = True + mask_conv.add_sublayer(conv_name, conv) + mask_conv.add_sublayer(conv_name + 'act', nn.ReLU()) + mask_conv.add_sublayer( + 'conv5_mask', + nn.Conv2DTranspose( + in_channels=self.out_channel if num_convs > 0 else self.in_channel, + out_channels=self.out_channel, + kernel_size=2, + stride=2, + weight_attr=paddle.ParamAttr( + initializer=KaimingNormal(fan_in=fan_deconv)))) + mask_conv.add_sublayer('conv5_mask' + 'act', nn.ReLU()) + self.upsample = mask_conv + + @classmethod + def from_config(cls, cfg, input_shape): + if isinstance(input_shape, (list, tuple)): + input_shape = input_shape[0] + return {'in_channel': input_shape.channels, } + + def out_channels(self): + return self.out_channel + + def forward(self, feats): + return self.upsample(feats) + + +@register +class MaskHead(nn.Layer): + __shared__ = ['num_classes', 'export_onnx'] + __inject__ = ['mask_assigner'] + """ + RCNN mask head + + Args: + head (nn.Layer): Extract feature in mask head + roi_extractor (object): The module of RoI Extractor + mask_assigner (object): The module of Mask Assigner, + label and sample the mask + num_classes (int): The number of classes + share_bbox_feat (bool): Whether to share the feature from bbox head, + default false + """ + + def __init__(self, + head, + roi_extractor=_get_class_default_kwargs(RoIAlign), + mask_assigner='MaskAssigner', + num_classes=80, + share_bbox_feat=False, + export_onnx=False): + super(MaskHead, self).__init__() + self.num_classes = num_classes + self.export_onnx = export_onnx + + self.roi_extractor = roi_extractor + if isinstance(roi_extractor, dict): + self.roi_extractor = RoIAlign(**roi_extractor) + self.head = head + self.in_channels = head.out_channels() + self.mask_assigner = mask_assigner + self.share_bbox_feat = share_bbox_feat + self.bbox_head = None + + self.mask_fcn_logits = nn.Conv2D( 
+ in_channels=self.in_channels, + out_channels=self.num_classes, + kernel_size=1, + weight_attr=paddle.ParamAttr(initializer=KaimingNormal( + fan_in=self.num_classes))) + self.mask_fcn_logits.skip_quant = True + + @classmethod + def from_config(cls, cfg, input_shape): + roi_pooler = cfg['roi_extractor'] + assert isinstance(roi_pooler, dict) + kwargs = RoIAlign.from_config(cfg, input_shape) + roi_pooler.update(kwargs) + kwargs = {'input_shape': input_shape} + head = create(cfg['head'], **kwargs) + return { + 'roi_extractor': roi_pooler, + 'head': head, + } + + def get_loss(self, mask_logits, mask_label, mask_target, mask_weight): + mask_label = F.one_hot(mask_label, self.num_classes).unsqueeze([2, 3]) + mask_label = paddle.expand_as(mask_label, mask_logits) + mask_label.stop_gradient = True + mask_pred = paddle.gather_nd(mask_logits, paddle.nonzero(mask_label)) + shape = mask_logits.shape + mask_pred = paddle.reshape(mask_pred, [shape[0], shape[2], shape[3]]) + + mask_target = mask_target.cast('float32') + mask_weight = mask_weight.unsqueeze([1, 2]) + loss_mask = F.binary_cross_entropy_with_logits( + mask_pred, mask_target, weight=mask_weight, reduction="mean") + return loss_mask + + def forward_train(self, body_feats, rois, rois_num, inputs, targets, + bbox_feat): + """ + body_feats (list[Tensor]): Multi-level backbone features + rois (list[Tensor]): Proposals for each batch with shape [N, 4] + rois_num (Tensor): The number of proposals for each batch + inputs (dict): ground truth info + """ + tgt_labels, _, tgt_gt_inds = targets + rois, rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights = self.mask_assigner( + rois, tgt_labels, tgt_gt_inds, inputs) + + if self.share_bbox_feat: + rois_feat = paddle.gather(bbox_feat, mask_index) + else: + rois_feat = self.roi_extractor(body_feats, rois, rois_num) + mask_feat = self.head(rois_feat) + mask_logits = self.mask_fcn_logits(mask_feat) + + loss_mask = self.get_loss(mask_logits, tgt_classes, tgt_masks, + tgt_weights) + return {'loss_mask': loss_mask} + + def forward_test(self, + body_feats, + rois, + rois_num, + scale_factor, + feat_func=None): + """ + body_feats (list[Tensor]): Multi-level backbone features + rois (Tensor): Prediction from bbox head with shape [N, 6] + rois_num (Tensor): The number of prediction for each batch + scale_factor (Tensor): The scale factor from origin size to input size + """ + if not self.export_onnx and rois.shape[0] == 0: + mask_out = paddle.full([1, 1, 1], -1) + else: + bbox = [rois[:, 2:]] + labels = rois[:, 0].cast('int32') + rois_feat = self.roi_extractor(body_feats, bbox, rois_num) + if self.share_bbox_feat: + assert feat_func is not None + rois_feat = feat_func(rois_feat) + + mask_feat = self.head(rois_feat) + mask_logit = self.mask_fcn_logits(mask_feat) + if self.num_classes == 1: + mask_out = F.sigmoid(mask_logit)[:, 0, :, :] + else: + num_masks = paddle.shape(mask_logit)[0] + index = paddle.arange(num_masks).cast('int32') + mask_out = mask_logit[index, labels] + mask_out_shape = paddle.shape(mask_out) + mask_out = paddle.reshape(mask_out, [ + paddle.shape(index), mask_out_shape[-2], mask_out_shape[-1] + ]) + mask_out = F.sigmoid(mask_out) + return mask_out + + def forward(self, + body_feats, + rois, + rois_num, + inputs, + targets=None, + bbox_feat=None, + feat_func=None): + if self.training: + return self.forward_train(body_feats, rois, rois_num, inputs, + targets, bbox_feat) + else: + im_scale = inputs['scale_factor'] + return self.forward_test(body_feats, rois, rois_num, im_scale, + feat_func) diff 
--git a/PaddleDetection-release-2.6/ppdet/modeling/heads/petr_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/petr_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..90760c665157b658f776e2ff9f7fbef0b525a006
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/petr_head.py
@@ -0,0 +1,1161 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+this code is based on https://github.com/hikvision-research/opera/blob/main/opera/models/dense_heads/petr_head.py
+"""
+import copy
+import numpy as np
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from ppdet.core.workspace import register
+import paddle.distributed as dist
+
+from ..transformers.petr_transformer import inverse_sigmoid, masked_fill
+from ..initializer import constant_, normal_
+
+__all__ = ["PETRHead"]
+
+from functools import partial
+
+
+def bias_init_with_prob(prior_prob: float) -> float:
+    """Initialize conv/fc bias value according to a given probability value."""
+    bias_init = float(-np.log((1 - prior_prob) / prior_prob))
+    return bias_init
+
+
+def multi_apply(func, *args, **kwargs):
+    """Apply function to a list of arguments.
+
+    Note:
+        This function applies the ``func`` to multiple inputs and
+        maps the multiple outputs of the ``func`` into different
+        lists. Each list contains the same type of outputs corresponding
+        to different inputs.
+
+    Args:
+        func (Function): A function that will be applied to a list of
+            arguments
+
+    Returns:
+        tuple(list): A tuple containing multiple lists, each list contains \
+            a kind of returned results by the function
+    """
+    pfunc = partial(func, **kwargs) if kwargs else func
+    map_results = map(pfunc, *args)
+    res = tuple(map(list, zip(*map_results)))
+    return res
+
+
+def reduce_mean(tensor):
+    """Obtain the mean of a tensor across different GPUs."""
+    if not (dist.get_world_size() and dist.is_initialized()):
+        return tensor
+    tensor = tensor.clone()
+    dist.all_reduce(
+        tensor.divide(
+            paddle.to_tensor(
+                dist.get_world_size(), dtype='float32')),
+        op=dist.ReduceOp.SUM)
+    return tensor
+
+
+def gaussian_radius(det_size, min_overlap=0.7):
+    """Calculate the Gaussian radius according to the object size.
+    """
+    height, width = det_size
+
+    a1 = 1
+    b1 = (height + width)
+    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
+    sq1 = paddle.sqrt(b1**2 - 4 * a1 * c1)
+    r1 = (b1 + sq1) / 2
+
+    a2 = 4
+    b2 = 2 * (height + width)
+    c2 = (1 - min_overlap) * width * height
+    sq2 = paddle.sqrt(b2**2 - 4 * a2 * c2)
+    r2 = (b2 + sq2) / 2
+
+    a3 = 4 * min_overlap
+    b3 = -2 * min_overlap * (height + width)
+    c3 = (min_overlap - 1) * width * height
+    sq3 = paddle.sqrt(b3**2 - 4 * a3 * c3)
+    r3 = (b3 + sq3) / 2
+    return min(r1, r2, r3)
+
+
+def gaussian2D(shape, sigma=1):
+    m, n = [(ss - 1.) / 2.
for ss in shape] + y = paddle.arange(-m, m + 1, dtype="float32")[:, None] + x = paddle.arange(-n, n + 1, dtype="float32")[None, :] + # y, x = np.ogrid[-m:m + 1, -n:n + 1] + + h = paddle.exp(-(x * x + y * y) / (2 * sigma * sigma)) + h[h < np.finfo(np.float32).eps * h.max()] = 0 + return h + + +def draw_umich_gaussian(heatmap, center, radius, k=1): + diameter = 2 * radius + 1 + gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6) + gaussian = paddle.to_tensor(gaussian, dtype=heatmap.dtype) + + x, y = int(center[0]), int(center[1]) + radius = int(radius) + + height, width = heatmap.shape[0:2] + + left, right = min(x, radius), min(width - x, radius + 1) + top, bottom = min(y, radius), min(height - y, radius + 1) + + masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] + masked_gaussian = gaussian[radius - top:radius + bottom, radius - left: + radius + right] + # assert masked_gaussian.equal(1).float().sum() == 1 + if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: + heatmap[y - top:y + bottom, x - left:x + right] = paddle.maximum( + masked_heatmap, masked_gaussian * k) + return heatmap + + +@register +class PETRHead(nn.Layer): + """Head of `End-to-End Multi-Person Pose Estimation with Transformers`. + + Args: + num_classes (int): Number of categories excluding the background. + in_channels (int): Number of channels in the input feature map. + num_query (int): Number of query in Transformer. + num_kpt_fcs (int, optional): Number of fully-connected layers used in + `FFN`, which is then used for the keypoint regression head. + Default 2. + transformer (obj:`mmcv.ConfigDict`|dict): ConfigDict is used for + building the Encoder and Decoder. Default: None. + sync_cls_avg_factor (bool): Whether to sync the avg_factor of + all ranks. Default to False. + positional_encoding (obj:`mmcv.ConfigDict`|dict): + Config for position encoding. + loss_cls (obj:`mmcv.ConfigDict`|dict): Config of the + classification loss. Default `CrossEntropyLoss`. + loss_kpt (obj:`mmcv.ConfigDict`|dict): Config of the + regression loss. Default `L1Loss`. + loss_oks (obj:`mmcv.ConfigDict`|dict): Config of the + regression oks loss. Default `OKSLoss`. + loss_hm (obj:`mmcv.ConfigDict`|dict): Config of the + regression heatmap loss. Default `NegLoss`. + as_two_stage (bool) : Whether to generate the proposal from + the outputs of encoder. + with_kpt_refine (bool): Whether to refine the reference points + in the decoder. Defaults to True. + test_cfg (obj:`mmcv.ConfigDict`|dict): Testing config of + transformer head. + init_cfg (dict or list[dict], optional): Initialization config dict. + Default: None. + """ + __inject__ = [ + "transformer", "positional_encoding", "assigner", "sampler", "loss_cls", + "loss_kpt", "loss_oks", "loss_hm", "loss_kpt_rpn", "loss_kpt_refine", + "loss_oks_refine" + ] + + def __init__(self, + num_classes, + in_channels, + num_query=100, + num_kpt_fcs=2, + num_keypoints=17, + transformer=None, + sync_cls_avg_factor=True, + positional_encoding='SinePositionalEncoding', + loss_cls='FocalLoss', + loss_kpt='L1Loss', + loss_oks='OKSLoss', + loss_hm='CenterFocalLoss', + with_kpt_refine=True, + assigner='PoseHungarianAssigner', + sampler='PseudoSampler', + loss_kpt_rpn='L1Loss', + loss_kpt_refine='L1Loss', + loss_oks_refine='opera.OKSLoss', + test_cfg=dict(max_per_img=100), + init_cfg=None, + **kwargs): + # NOTE here use `AnchorFreeHead` instead of `TransformerHead`, + # since it brings inconvenience when the initialization of + # `AnchorFreeHead` is called. 
+ super().__init__() + self.bg_cls_weight = 0 + self.sync_cls_avg_factor = sync_cls_avg_factor + self.assigner = assigner + self.sampler = sampler + self.num_query = num_query + self.num_classes = num_classes + self.in_channels = in_channels + self.num_kpt_fcs = num_kpt_fcs + self.test_cfg = test_cfg + self.fp16_enabled = False + self.as_two_stage = transformer.as_two_stage + self.with_kpt_refine = with_kpt_refine + self.num_keypoints = num_keypoints + self.loss_cls = loss_cls + self.loss_kpt = loss_kpt + self.loss_kpt_rpn = loss_kpt_rpn + self.loss_kpt_refine = loss_kpt_refine + self.loss_oks = loss_oks + self.loss_oks_refine = loss_oks_refine + self.loss_hm = loss_hm + if self.loss_cls.use_sigmoid: + self.cls_out_channels = num_classes + else: + self.cls_out_channels = num_classes + 1 + self.positional_encoding = positional_encoding + self.transformer = transformer + self.embed_dims = self.transformer.embed_dims + # assert 'num_feats' in positional_encoding + num_feats = positional_encoding.num_pos_feats + assert num_feats * 2 == self.embed_dims, 'embed_dims should' \ + f' be exactly 2 times of num_feats. Found {self.embed_dims}' \ + f' and {num_feats}.' + self._init_layers() + self.init_weights() + + def _init_layers(self): + """Initialize classification branch and keypoint branch of head.""" + + fc_cls = nn.Linear(self.embed_dims, self.cls_out_channels) + + kpt_branch = [] + kpt_branch.append(nn.Linear(self.embed_dims, 512)) + kpt_branch.append(nn.ReLU()) + for _ in range(self.num_kpt_fcs): + kpt_branch.append(nn.Linear(512, 512)) + kpt_branch.append(nn.ReLU()) + kpt_branch.append(nn.Linear(512, 2 * self.num_keypoints)) + kpt_branch = nn.Sequential(*kpt_branch) + + def _get_clones(module, N): + return nn.LayerList([copy.deepcopy(module) for i in range(N)]) + + # last kpt_branch is used to generate proposal from + # encode feature map when as_two_stage is True. 
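+        # hence, in the two-stage setting one extra prediction branch is
+        # needed on top of the per-decoder-layer branches, e.g. 7 branches
+        # for a 6-layer decoder.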
+        num_pred = (self.transformer.decoder.num_layers + 1) if \
+            self.as_two_stage else self.transformer.decoder.num_layers
+
+        if self.with_kpt_refine:
+            self.cls_branches = _get_clones(fc_cls, num_pred)
+            self.kpt_branches = _get_clones(kpt_branch, num_pred)
+        else:
+            self.cls_branches = nn.LayerList([fc_cls for _ in range(num_pred)])
+            self.kpt_branches = nn.LayerList(
+                [kpt_branch for _ in range(num_pred)])
+
+        self.query_embedding = nn.Embedding(self.num_query, self.embed_dims * 2)
+
+        refine_kpt_branch = []
+        for _ in range(self.num_kpt_fcs):
+            refine_kpt_branch.append(
+                nn.Linear(self.embed_dims, self.embed_dims))
+            refine_kpt_branch.append(nn.ReLU())
+        refine_kpt_branch.append(nn.Linear(self.embed_dims, 2))
+        refine_kpt_branch = nn.Sequential(*refine_kpt_branch)
+        if self.with_kpt_refine:
+            num_pred = self.transformer.refine_decoder.num_layers
+            self.refine_kpt_branches = _get_clones(refine_kpt_branch, num_pred)
+        self.fc_hm = nn.Linear(self.embed_dims, self.num_keypoints)
+
+    def init_weights(self):
+        """Initialize weights of the PETR head."""
+        self.transformer.init_weights()
+        if self.loss_cls.use_sigmoid:
+            bias_init = bias_init_with_prob(0.01)
+            for m in self.cls_branches:
+                constant_(m.bias, bias_init)
+        for m in self.kpt_branches:
+            constant_(m[-1].bias, 0)
+        # initialization of keypoint refinement branch
+        if self.with_kpt_refine:
+            for m in self.refine_kpt_branches:
+                constant_(m[-1].bias, 0)
+        # initialize bias for heatmap prediction
+        bias_init = bias_init_with_prob(0.1)
+        normal_(self.fc_hm.weight, std=0.01)
+        constant_(self.fc_hm.bias, bias_init)
+
+    def forward(self, mlvl_feats, img_metas):
+        """Forward function.
+
+        Args:
+            mlvl_feats (tuple[Tensor]): Features from the upstream
+                network, each is a 4D-tensor with shape
+                (N, C, H, W).
+            img_metas (list[dict]): List of image information.
+
+        Returns:
+            outputs_classes (Tensor): Outputs from the classification head,
+                shape [nb_dec, bs, num_query, cls_out_channels]. Note
+                cls_out_channels should include background.
+            outputs_kpts (Tensor): Sigmoid outputs from the regression
+                head with normalized coordinate format (x_{i}, y_{i}).
+                Shape [nb_dec, bs, num_query, K*2].
+            enc_outputs_class (Tensor): The score of each point on encode
+                feature map, has shape (N, h*w, num_classes). Only when
+                as_two_stage is True it would be returned, otherwise
+                `None` would be returned.
+            enc_outputs_kpt (Tensor): The proposal generated from the
+                encode feature map, has shape (N, h*w, K*2). Only when
+                as_two_stage is True it would be returned, otherwise
+                `None` would be returned.
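+            hm_proto (tuple[Tensor] | None): Heatmap prediction and its mask
+                from the encoder feature map, used to compute the heatmap
+                loss during training.
+            memory (Tensor): Encoder memory, reused by the keypoint
+                refinement decoder.
+            mlvl_masks (list[Tensor]): Padding masks of each feature level.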
+ """ + + batch_size = mlvl_feats[0].shape[0] + input_img_h, input_img_w = img_metas[0]['batch_input_shape'] + img_masks = paddle.zeros( + (batch_size, input_img_h, input_img_w), dtype=mlvl_feats[0].dtype) + for img_id in range(batch_size): + img_h, img_w, _ = img_metas[img_id]['img_shape'] + img_masks[img_id, :img_h, :img_w] = 1 + + mlvl_masks = [] + mlvl_positional_encodings = [] + for feat in mlvl_feats: + mlvl_masks.append( + F.interpolate( + img_masks[None], size=feat.shape[-2:]).squeeze(0)) + mlvl_positional_encodings.append( + self.positional_encoding(mlvl_masks[-1]).transpose( + [0, 3, 1, 2])) + + query_embeds = self.query_embedding.weight + hs, init_reference, inter_references, \ + enc_outputs_class, enc_outputs_kpt, hm_proto, memory = \ + self.transformer( + mlvl_feats, + mlvl_masks, + query_embeds, + mlvl_positional_encodings, + kpt_branches=self.kpt_branches \ + if self.with_kpt_refine else None, # noqa:E501 + cls_branches=self.cls_branches \ + if self.as_two_stage else None # noqa:E501 + ) + + outputs_classes = [] + outputs_kpts = [] + + for lvl in range(hs.shape[0]): + if lvl == 0: + reference = init_reference + else: + reference = inter_references[lvl - 1] + reference = inverse_sigmoid(reference) + outputs_class = self.cls_branches[lvl](hs[lvl]) + tmp_kpt = self.kpt_branches[lvl](hs[lvl]) + assert reference.shape[-1] == self.num_keypoints * 2 + tmp_kpt += reference + outputs_kpt = F.sigmoid(tmp_kpt) + outputs_classes.append(outputs_class) + outputs_kpts.append(outputs_kpt) + + outputs_classes = paddle.stack(outputs_classes) + outputs_kpts = paddle.stack(outputs_kpts) + + if hm_proto is not None: + # get heatmap prediction (training phase) + hm_memory, hm_mask = hm_proto + hm_pred = self.fc_hm(hm_memory) + hm_proto = (hm_pred.transpose((0, 3, 1, 2)), hm_mask) + + if self.as_two_stage: + return outputs_classes, outputs_kpts, \ + enc_outputs_class, F.sigmoid(enc_outputs_kpt), \ + hm_proto, memory, mlvl_masks + else: + raise RuntimeError('only "as_two_stage=True" is supported.') + + def forward_refine(self, memory, mlvl_masks, refine_targets, losses, + img_metas): + """Forward function. + + Args: + mlvl_masks (tuple[Tensor]): The key_padding_mask from + different level used for encoder and decoder, + each is a 3D-tensor with shape (bs, H, W). + losses (dict[str, Tensor]): A dictionary of loss components. + img_metas (list[dict]): List of image information. + + Returns: + dict[str, Tensor]: A dictionary of loss components. 
+ """ + kpt_preds, kpt_targets, area_targets, kpt_weights = refine_targets + pos_inds = kpt_weights.sum(-1) > 0 + if not pos_inds.any(): + pos_kpt_preds = paddle.zeros_like(kpt_preds[:1]) + pos_img_inds = paddle.zeros([1], dtype="int64") + else: + pos_kpt_preds = kpt_preds[pos_inds] + pos_img_inds = (pos_inds.nonzero() / + self.num_query).squeeze(1).astype("int64") + hs, init_reference, inter_references = self.transformer.forward_refine( + mlvl_masks, + memory, + pos_kpt_preds.detach(), + pos_img_inds, + kpt_branches=self.refine_kpt_branches + if self.with_kpt_refine else None, # noqa:E501 + ) + + outputs_kpts = [] + + for lvl in range(hs.shape[0]): + if lvl == 0: + reference = init_reference + else: + reference = inter_references[lvl - 1] + reference = inverse_sigmoid(reference) + tmp_kpt = self.refine_kpt_branches[lvl](hs[lvl]) + assert reference.shape[-1] == 2 + tmp_kpt += reference + outputs_kpt = F.sigmoid(tmp_kpt) + outputs_kpts.append(outputs_kpt) + outputs_kpts = paddle.stack(outputs_kpts) + + if not self.training: + return outputs_kpts + + num_valid_kpt = paddle.clip( + reduce_mean(kpt_weights.sum()), min=1).item() + num_total_pos = paddle.to_tensor( + [outputs_kpts.shape[1]], dtype=kpt_weights.dtype) + num_total_pos = paddle.clip(reduce_mean(num_total_pos), min=1).item() + + if not pos_inds.any(): + for i, kpt_refine_preds in enumerate(outputs_kpts): + loss_kpt = loss_oks = kpt_refine_preds.sum() * 0 + losses[f'd{i}.loss_kpt_refine'] = loss_kpt + losses[f'd{i}.loss_oks_refine'] = loss_oks + continue + return losses + + batch_size = mlvl_masks[0].shape[0] + factors = [] + for img_id in range(batch_size): + img_h, img_w, _ = img_metas[img_id]['img_shape'] + factor = paddle.to_tensor( + [img_w, img_h, img_w, img_h], + dtype="float32").squeeze(-1).unsqueeze(0).tile( + (self.num_query, 1)) + factors.append(factor) + factors = paddle.concat(factors, 0) + factors = factors[pos_inds][:, :2].tile((1, kpt_preds.shape[-1] // 2)) + + pos_kpt_weights = kpt_weights[pos_inds] + pos_kpt_targets = kpt_targets[pos_inds] + pos_kpt_targets_scaled = pos_kpt_targets * factors + pos_areas = area_targets[pos_inds] + pos_valid = kpt_weights[pos_inds][:, 0::2] + for i, kpt_refine_preds in enumerate(outputs_kpts): + if not pos_inds.any(): + print("refine kpt and oks skip") + loss_kpt = loss_oks = kpt_refine_preds.sum() * 0 + losses[f'd{i}.loss_kpt_refine'] = loss_kpt + losses[f'd{i}.loss_oks_refine'] = loss_oks + continue + + # kpt L1 Loss + pos_refine_preds = kpt_refine_preds.reshape( + (kpt_refine_preds.shape[0], -1)) + loss_kpt = self.loss_kpt_refine( + pos_refine_preds, + pos_kpt_targets, + pos_kpt_weights, + avg_factor=num_valid_kpt) + losses[f'd{i}.loss_kpt_refine'] = loss_kpt + # kpt oks loss + pos_refine_preds_scaled = pos_refine_preds * factors + assert (pos_areas > 0).all() + loss_oks = self.loss_oks_refine( + pos_refine_preds_scaled, + pos_kpt_targets_scaled, + pos_valid, + pos_areas, + avg_factor=num_total_pos) + losses[f'd{i}.loss_oks_refine'] = loss_oks + return losses + + # over-write because img_metas are needed as inputs for bbox_head. + def forward_train(self, + x, + img_metas, + gt_bboxes, + gt_labels=None, + gt_keypoints=None, + gt_areas=None, + gt_bboxes_ignore=None, + proposal_cfg=None, + **kwargs): + """Forward function for training mode. + + Args: + x (list[Tensor]): Features from backbone. + img_metas (list[dict]): Meta information of each image, e.g., + image size, scaling factor, etc. + gt_bboxes (list[Tensor]): Ground truth bboxes of the image, + shape (num_gts, 4). 
+ gt_labels (list[Tensor]): Ground truth labels of each box, + shape (num_gts,). + gt_keypoints (list[Tensor]): Ground truth keypoints of the image, + shape (num_gts, K*3). + gt_areas (list[Tensor]): Ground truth mask areas of each box, + shape (num_gts,). + gt_bboxes_ignore (list[Tensor]): Ground truth bboxes to be + ignored, shape (num_ignored_gts, 4). + proposal_cfg (mmcv.Config): Test / postprocessing configuration, + if None, test_cfg would be used. + + Returns: + dict[str, Tensor]: A dictionary of loss components. + """ + assert proposal_cfg is None, '"proposal_cfg" must be None' + outs = self(x, img_metas) + memory, mlvl_masks = outs[-2:] + outs = outs[:-2] + if gt_labels is None: + loss_inputs = outs + (gt_bboxes, gt_keypoints, gt_areas, img_metas) + else: + loss_inputs = outs + (gt_bboxes, gt_labels, gt_keypoints, gt_areas, + img_metas) + losses_and_targets = self.loss( + *loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore) + # losses = losses_and_targets + losses, refine_targets = losses_and_targets + # get pose refinement loss + losses = self.forward_refine(memory, mlvl_masks, refine_targets, losses, + img_metas) + return losses + + def loss(self, + all_cls_scores, + all_kpt_preds, + enc_cls_scores, + enc_kpt_preds, + enc_hm_proto, + gt_bboxes_list, + gt_labels_list, + gt_keypoints_list, + gt_areas_list, + img_metas, + gt_bboxes_ignore=None): + """Loss function. + + Args: + all_cls_scores (Tensor): Classification score of all + decoder layers, has shape + [nb_dec, bs, num_query, cls_out_channels]. + all_kpt_preds (Tensor): Sigmoid regression + outputs of all decode layers. Each is a 4D-tensor with + normalized coordinate format (x_{i}, y_{i}) and shape + [nb_dec, bs, num_query, K*2]. + enc_cls_scores (Tensor): Classification scores of + points on encode feature map, has shape + (N, h*w, num_classes). Only be passed when as_two_stage is + True, otherwise is None. + enc_kpt_preds (Tensor): Regression results of each points + on the encode feature map, has shape (N, h*w, K*2). Only be + passed when as_two_stage is True, otherwise is None. + gt_bboxes_list (list[Tensor]): Ground truth bboxes for each image + with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format. + gt_labels_list (list[Tensor]): Ground truth class indices for each + image with shape (num_gts, ). + gt_keypoints_list (list[Tensor]): Ground truth keypoints for each + image with shape (num_gts, K*3) in [p^{1}_x, p^{1}_y, p^{1}_v, + ..., p^{K}_x, p^{K}_y, p^{K}_v] format. + gt_areas_list (list[Tensor]): Ground truth mask areas for each + image with shape (num_gts, ). + img_metas (list[dict]): List of image meta information. + gt_bboxes_ignore (list[Tensor], optional): Bounding boxes + which can be ignored for each image. Default None. + + Returns: + dict[str, Tensor]: A dictionary of loss components. + """ + assert gt_bboxes_ignore is None, \ + f'{self.__class__.__name__} only supports ' \ + f'for gt_bboxes_ignore setting to None.' 
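+        # every decoder layer is supervised with the same ground-truth lists,
+        # so the targets below are repeated once per decoder layer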
+ + num_dec_layers = len(all_cls_scores) + all_gt_labels_list = [gt_labels_list for _ in range(num_dec_layers)] + all_gt_keypoints_list = [ + gt_keypoints_list for _ in range(num_dec_layers) + ] + all_gt_areas_list = [gt_areas_list for _ in range(num_dec_layers)] + img_metas_list = [img_metas for _ in range(num_dec_layers)] + + losses_cls, losses_kpt, losses_oks, kpt_preds_list, kpt_targets_list, \ + area_targets_list, kpt_weights_list = multi_apply( + self.loss_single, all_cls_scores, all_kpt_preds, + all_gt_labels_list, all_gt_keypoints_list, + all_gt_areas_list, img_metas_list) + + loss_dict = dict() + # loss of proposal generated from encode feature map. + if enc_cls_scores is not None: + binary_labels_list = [ + paddle.zeros_like(gt_labels_list[i]) + for i in range(len(img_metas)) + ] + enc_loss_cls, enc_losses_kpt = \ + self.loss_single_rpn( + enc_cls_scores, enc_kpt_preds, binary_labels_list, + gt_keypoints_list, gt_areas_list, img_metas) + loss_dict['enc_loss_cls'] = enc_loss_cls + loss_dict['enc_loss_kpt'] = enc_losses_kpt + + # loss from the last decoder layer + loss_dict['loss_cls'] = losses_cls[-1] + loss_dict['loss_kpt'] = losses_kpt[-1] + loss_dict['loss_oks'] = losses_oks[-1] + # loss from other decoder layers + num_dec_layer = 0 + for loss_cls_i, loss_kpt_i, loss_oks_i in zip( + losses_cls[:-1], losses_kpt[:-1], losses_oks[:-1]): + loss_dict[f'd{num_dec_layer}.loss_cls'] = loss_cls_i + loss_dict[f'd{num_dec_layer}.loss_kpt'] = loss_kpt_i + loss_dict[f'd{num_dec_layer}.loss_oks'] = loss_oks_i + num_dec_layer += 1 + + # losses of heatmap generated from P3 feature map + hm_pred, hm_mask = enc_hm_proto + loss_hm = self.loss_heatmap(hm_pred, hm_mask, gt_keypoints_list, + gt_labels_list, gt_bboxes_list) + loss_dict['loss_hm'] = loss_hm + + return loss_dict, (kpt_preds_list[-1], kpt_targets_list[-1], + area_targets_list[-1], kpt_weights_list[-1]) + + def loss_heatmap(self, hm_pred, hm_mask, gt_keypoints, gt_labels, + gt_bboxes): + assert hm_pred.shape[-2:] == hm_mask.shape[-2:] + num_img, _, h, w = hm_pred.shape + # placeholder of heatmap target (Gaussian distribution) + hm_target = paddle.zeros(hm_pred.shape, hm_pred.dtype) + for i, (gt_label, gt_bbox, gt_keypoint + ) in enumerate(zip(gt_labels, gt_bboxes, gt_keypoints)): + if gt_label.shape[0] == 0: + continue + gt_keypoint = gt_keypoint.reshape((gt_keypoint.shape[0], -1, + 3)).clone() + gt_keypoint[..., :2] /= 8 + + assert gt_keypoint[..., 0].max() <= w + 0.5 # new coordinate system + assert gt_keypoint[..., 1].max() <= h + 0.5 # new coordinate system + gt_bbox /= 8 + gt_w = gt_bbox[:, 2] - gt_bbox[:, 0] + gt_h = gt_bbox[:, 3] - gt_bbox[:, 1] + for j in range(gt_label.shape[0]): + # get heatmap radius + kp_radius = paddle.clip( + paddle.floor( + gaussian_radius( + (gt_h[j], gt_w[j]), min_overlap=0.9)), + min=0, + max=3) + for k in range(self.num_keypoints): + if gt_keypoint[j, k, 2] > 0: + gt_kp = gt_keypoint[j, k, :2] + gt_kp_int = paddle.floor(gt_kp) + hm_target[i, k] = draw_umich_gaussian( + hm_target[i, k], gt_kp_int, kp_radius) + # compute heatmap loss + hm_pred = paddle.clip( + F.sigmoid(hm_pred), min=1e-4, max=1 - 1e-4) # refer to CenterNet + loss_hm = self.loss_hm( + hm_pred, + hm_target.detach(), + mask=~hm_mask.astype("bool").unsqueeze(1)) + return loss_hm + + def loss_single(self, cls_scores, kpt_preds, gt_labels_list, + gt_keypoints_list, gt_areas_list, img_metas): + """Loss function for outputs from a single decoder layer of a single + feature level. 
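+        It computes the classification loss, the keypoint L1 regression loss
+        and the keypoint OKS loss over the matched query-target pairs.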
+
+        Args:
+            cls_scores (Tensor): Box score logits from a single decoder layer
+                for all images. Shape [bs, num_query, cls_out_channels].
+            kpt_preds (Tensor): Sigmoid outputs from a single decoder layer
+                for all images, with normalized coordinate (x_{i}, y_{i}) and
+                shape [bs, num_query, K*2].
+            gt_labels_list (list[Tensor]): Ground truth class indices for each
+                image with shape (num_gts, ).
+            gt_keypoints_list (list[Tensor]): Ground truth keypoints for each
+                image with shape (num_gts, K*3) in [p^{1}_x, p^{1}_y, p^{1}_v,
+                ..., p^{K}_x, p^{K}_y, p^{K}_v] format.
+            gt_areas_list (list[Tensor]): Ground truth mask areas for each
+                image with shape (num_gts, ).
+            img_metas (list[dict]): List of image meta information.
+
+        Returns:
+            tuple: Loss components for outputs from a single decoder layer,
+                together with the keypoint predictions, targets, area targets
+                and keypoint weights used later by the refinement stage.
+        """
+        num_imgs = cls_scores.shape[0]
+        cls_scores_list = [cls_scores[i] for i in range(num_imgs)]
+        kpt_preds_list = [kpt_preds[i] for i in range(num_imgs)]
+        cls_reg_targets = self.get_targets(cls_scores_list, kpt_preds_list,
+                                           gt_labels_list, gt_keypoints_list,
+                                           gt_areas_list, img_metas)
+        (labels_list, label_weights_list, kpt_targets_list, kpt_weights_list,
+         area_targets_list, num_total_pos, num_total_neg) = cls_reg_targets
+        labels = paddle.concat(labels_list, 0)
+        label_weights = paddle.concat(label_weights_list, 0)
+        kpt_targets = paddle.concat(kpt_targets_list, 0)
+        kpt_weights = paddle.concat(kpt_weights_list, 0)
+        area_targets = paddle.concat(area_targets_list, 0)
+
+        # classification loss
+        cls_scores = cls_scores.reshape((-1, self.cls_out_channels))
+        # construct weighted avg_factor to match with the official DETR repo
+        cls_avg_factor = num_total_pos * 1.0 + \
+            num_total_neg * self.bg_cls_weight
+        if self.sync_cls_avg_factor:
+            cls_avg_factor = reduce_mean(
+                paddle.to_tensor(
+                    [cls_avg_factor], dtype=cls_scores.dtype))
+        cls_avg_factor = max(cls_avg_factor, 1)
+
+        loss_cls = self.loss_cls(
+            cls_scores, labels, label_weights, avg_factor=cls_avg_factor)
+
+        # Compute the average number of gt keypoints across all gpus, for
+        # normalization purposes
+        num_total_pos = paddle.to_tensor([num_total_pos], dtype=loss_cls.dtype)
+        num_total_pos = paddle.clip(reduce_mean(num_total_pos), min=1).item()
+
+        # construct factors used to rescale keypoints
+        factors = []
+        for img_meta, kpt_pred in zip(img_metas, kpt_preds):
+            img_h, img_w, _ = img_meta['img_shape']
+            factor = paddle.to_tensor(
+                [img_w, img_h, img_w, img_h],
+                dtype=kpt_pred.dtype).squeeze().unsqueeze(0).tile(
+                    (kpt_pred.shape[0], 1))
+            factors.append(factor)
+        factors = paddle.concat(factors, 0)
+
+        # keypoint regression loss
+        kpt_preds = kpt_preds.reshape((-1, kpt_preds.shape[-1]))
+        num_valid_kpt = paddle.clip(
+            reduce_mean(kpt_weights.sum()), min=1).item()
+        # assert num_valid_kpt == (kpt_targets>0).sum().item()
+        loss_kpt = self.loss_kpt(
+            kpt_preds,
+            kpt_targets.detach(),
+            kpt_weights.detach(),
+            avg_factor=num_valid_kpt)
+
+        # keypoint oks loss
+        pos_inds = kpt_weights.sum(-1) > 0
+        if not pos_inds.any():
+            loss_oks = kpt_preds.sum() * 0
+        else:
+            factors = factors[pos_inds][:, :2].tile((
+                (1, kpt_preds.shape[-1] // 2)))
+            pos_kpt_preds = kpt_preds[pos_inds] * factors
+            pos_kpt_targets = kpt_targets[pos_inds] * factors
+            pos_areas = area_targets[pos_inds]
+            pos_valid = kpt_weights[pos_inds][..., 0::2]
+            assert (pos_areas > 0).all()
+            loss_oks = self.loss_oks(
+                pos_kpt_preds,
+                pos_kpt_targets,
+                pos_valid,
+                pos_areas,
+                avg_factor=num_total_pos)
+        return loss_cls, loss_kpt,
loss_oks, kpt_preds, kpt_targets, \ + area_targets, kpt_weights + + def get_targets(self, cls_scores_list, kpt_preds_list, gt_labels_list, + gt_keypoints_list, gt_areas_list, img_metas): + """Compute regression and classification targets for a batch image. + + Outputs from a single decoder layer of a single feature level are used. + + Args: + cls_scores_list (list[Tensor]): Box score logits from a single + decoder layer for each image with shape [num_query, + cls_out_channels]. + kpt_preds_list (list[Tensor]): Sigmoid outputs from a single + decoder layer for each image, with normalized coordinate + (x_{i}, y_{i}) and shape [num_query, K*2]. + gt_labels_list (list[Tensor]): Ground truth class indices for each + image with shape (num_gts, ). + gt_keypoints_list (list[Tensor]): Ground truth keypoints for each + image with shape (num_gts, K*3). + gt_areas_list (list[Tensor]): Ground truth mask areas for each + image with shape (num_gts, ). + img_metas (list[dict]): List of image meta information. + + Returns: + tuple: a tuple containing the following targets. + + - labels_list (list[Tensor]): Labels for all images. + - label_weights_list (list[Tensor]): Label weights for all + images. + - kpt_targets_list (list[Tensor]): Keypoint targets for all + images. + - kpt_weights_list (list[Tensor]): Keypoint weights for all + images. + - area_targets_list (list[Tensor]): area targets for all + images. + - num_total_pos (int): Number of positive samples in all + images. + - num_total_neg (int): Number of negative samples in all + images. + """ + (labels_list, label_weights_list, kpt_targets_list, kpt_weights_list, + area_targets_list, pos_inds_list, neg_inds_list) = multi_apply( + self._get_target_single, cls_scores_list, kpt_preds_list, + gt_labels_list, gt_keypoints_list, gt_areas_list, img_metas) + num_total_pos = sum((inds.numel() for inds in pos_inds_list)) + num_total_neg = sum((inds.numel() for inds in neg_inds_list)) + return (labels_list, label_weights_list, kpt_targets_list, + kpt_weights_list, area_targets_list, num_total_pos, + num_total_neg) + + def _get_target_single(self, cls_score, kpt_pred, gt_labels, gt_keypoints, + gt_areas, img_meta): + """Compute regression and classification targets for one image. + + Outputs from a single decoder layer of a single feature level are used. + + Args: + cls_score (Tensor): Box score logits from a single decoder layer + for one image. Shape [num_query, cls_out_channels]. + kpt_pred (Tensor): Sigmoid outputs from a single decoder layer + for one image, with normalized coordinate (x_{i}, y_{i}) and + shape [num_query, K*2]. + gt_labels (Tensor): Ground truth class indices for one image + with shape (num_gts, ). + gt_keypoints (Tensor): Ground truth keypoints for one image with + shape (num_gts, K*3) in [p^{1}_x, p^{1}_y, p^{1}_v, ..., \ + p^{K}_x, p^{K}_y, p^{K}_v] format. + gt_areas (Tensor): Ground truth mask areas for one image + with shape (num_gts, ). + img_meta (dict): Meta information for one image. + + Returns: + tuple[Tensor]: a tuple containing the following for one image. + + - labels (Tensor): Labels of each image. + - label_weights (Tensor): Label weights of each image. + - kpt_targets (Tensor): Keypoint targets of each image. + - kpt_weights (Tensor): Keypoint weights of each image. + - area_targets (Tensor): Area targets of each image. + - pos_inds (Tensor): Sampled positive indices for each image. + - neg_inds (Tensor): Sampled negative indices for each image. 
+ """ + num_bboxes = kpt_pred.shape[0] + # assigner and sampler + assign_result = self.assigner.assign(cls_score, kpt_pred, gt_labels, + gt_keypoints, gt_areas, img_meta) + sampling_result = self.sampler.sample(assign_result, kpt_pred, + gt_keypoints) + + pos_inds = sampling_result.pos_inds + neg_inds = sampling_result.neg_inds + + # label targets + labels = paddle.full((num_bboxes, ), self.num_classes, dtype="int64") + label_weights = paddle.ones((num_bboxes, ), dtype=gt_labels.dtype) + kpt_targets = paddle.zeros_like(kpt_pred) + kpt_weights = paddle.zeros_like(kpt_pred) + area_targets = paddle.zeros((kpt_pred.shape[0], ), dtype=kpt_pred.dtype) + + if pos_inds.size == 0: + return (labels, label_weights, kpt_targets, kpt_weights, + area_targets, pos_inds, neg_inds) + + labels[pos_inds] = gt_labels[sampling_result.pos_assigned_gt_inds][ + ..., 0].astype("int64") + + img_h, img_w, _ = img_meta['img_shape'] + # keypoint targets + pos_gt_kpts = gt_keypoints[sampling_result.pos_assigned_gt_inds] + pos_gt_kpts = pos_gt_kpts.reshape( + (len(sampling_result.pos_assigned_gt_inds), -1, 3)) + valid_idx = pos_gt_kpts[:, :, 2] > 0 + pos_kpt_weights = kpt_weights[pos_inds].reshape( + (pos_gt_kpts.shape[0], kpt_weights.shape[-1] // 2, 2)) + # pos_kpt_weights[valid_idx][...] = 1.0 + pos_kpt_weights = masked_fill(pos_kpt_weights, + valid_idx.unsqueeze(-1), 1.0) + kpt_weights[pos_inds] = pos_kpt_weights.reshape( + (pos_kpt_weights.shape[0], kpt_pred.shape[-1])) + + factor = paddle.to_tensor( + [img_w, img_h], dtype=kpt_pred.dtype).squeeze().unsqueeze(0) + pos_gt_kpts_normalized = pos_gt_kpts[..., :2] + pos_gt_kpts_normalized[..., 0] = pos_gt_kpts_normalized[..., 0] / \ + factor[:, 0:1] + pos_gt_kpts_normalized[..., 1] = pos_gt_kpts_normalized[..., 1] / \ + factor[:, 1:2] + kpt_targets[pos_inds] = pos_gt_kpts_normalized.reshape( + (pos_gt_kpts.shape[0], kpt_pred.shape[-1])) + + pos_gt_areas = gt_areas[sampling_result.pos_assigned_gt_inds][..., 0] + area_targets[pos_inds] = pos_gt_areas + + return (labels, label_weights, kpt_targets, kpt_weights, area_targets, + pos_inds, neg_inds) + + def loss_single_rpn(self, cls_scores, kpt_preds, gt_labels_list, + gt_keypoints_list, gt_areas_list, img_metas): + """Loss function for outputs from a single decoder layer of a single + feature level. + + Args: + cls_scores (Tensor): Box score logits from a single decoder layer + for all images. Shape [bs, num_query, cls_out_channels]. + kpt_preds (Tensor): Sigmoid outputs from a single decoder layer + for all images, with normalized coordinate (x_{i}, y_{i}) and + shape [bs, num_query, K*2]. + gt_labels_list (list[Tensor]): Ground truth class indices for each + image with shape (num_gts, ). + gt_keypoints_list (list[Tensor]): Ground truth keypoints for each + image with shape (num_gts, K*3) in [p^{1}_x, p^{1}_y, p^{1}_v, + ..., p^{K}_x, p^{K}_y, p^{K}_v] format. + gt_areas_list (list[Tensor]): Ground truth mask areas for each + image with shape (num_gts, ). + img_metas (list[dict]): List of image meta information. + + Returns: + dict[str, Tensor]: A dictionary of loss components for outputs from + a single decoder layer. 
+ """ + num_imgs = cls_scores.shape[0] + cls_scores_list = [cls_scores[i] for i in range(num_imgs)] + kpt_preds_list = [kpt_preds[i] for i in range(num_imgs)] + cls_reg_targets = self.get_targets(cls_scores_list, kpt_preds_list, + gt_labels_list, gt_keypoints_list, + gt_areas_list, img_metas) + (labels_list, label_weights_list, kpt_targets_list, kpt_weights_list, + area_targets_list, num_total_pos, num_total_neg) = cls_reg_targets + labels = paddle.concat(labels_list, 0) + label_weights = paddle.concat(label_weights_list, 0) + kpt_targets = paddle.concat(kpt_targets_list, 0) + kpt_weights = paddle.concat(kpt_weights_list, 0) + + # classification loss + cls_scores = cls_scores.reshape((-1, self.cls_out_channels)) + # construct weighted avg_factor to match with the official DETR repo + cls_avg_factor = num_total_pos * 1.0 + \ + num_total_neg * self.bg_cls_weight + if self.sync_cls_avg_factor: + cls_avg_factor = reduce_mean( + paddle.to_tensor( + [cls_avg_factor], dtype=cls_scores.dtype)) + cls_avg_factor = max(cls_avg_factor, 1) + + cls_avg_factor = max(cls_avg_factor, 1) + loss_cls = self.loss_cls( + cls_scores, labels, label_weights, avg_factor=cls_avg_factor) + + # Compute the average number of gt keypoints accross all gpus, for + # normalization purposes + # num_total_pos = loss_cls.to_tensor([num_total_pos]) + # num_total_pos = paddle.clip(reduce_mean(num_total_pos), min=1).item() + + # keypoint regression loss + kpt_preds = kpt_preds.reshape((-1, kpt_preds.shape[-1])) + num_valid_kpt = paddle.clip( + reduce_mean(kpt_weights.sum()), min=1).item() + # assert num_valid_kpt == (kpt_targets>0).sum().item() + loss_kpt = self.loss_kpt_rpn( + kpt_preds, kpt_targets, kpt_weights, avg_factor=num_valid_kpt) + + return loss_cls, loss_kpt + + def get_bboxes(self, + all_cls_scores, + all_kpt_preds, + enc_cls_scores, + enc_kpt_preds, + hm_proto, + memory, + mlvl_masks, + img_metas, + rescale=False): + """Transform network outputs for a batch into bbox predictions. + + Args: + all_cls_scores (Tensor): Classification score of all + decoder layers, has shape + [nb_dec, bs, num_query, cls_out_channels]. + all_kpt_preds (Tensor): Sigmoid regression + outputs of all decode layers. Each is a 4D-tensor with + normalized coordinate format (x_{i}, y_{i}) and shape + [nb_dec, bs, num_query, K*2]. + enc_cls_scores (Tensor): Classification scores of points on + encode feature map, has shape (N, h*w, num_classes). + Only be passed when as_two_stage is True, otherwise is None. + enc_kpt_preds (Tensor): Regression results of each points + on the encode feature map, has shape (N, h*w, K*2). Only be + passed when as_two_stage is True, otherwise is None. + img_metas (list[dict]): Meta information of each image. + rescale (bool, optional): If True, return boxes in original + image space. Defalut False. + + Returns: + list[list[Tensor, Tensor]]: Each item in result_list is 3-tuple. + The first item is an (n, 5) tensor, where the first 4 columns + are bounding box positions (tl_x, tl_y, br_x, br_y) and the + 5-th column is a score between 0 and 1. The second item is a + (n,) tensor where each item is the predicted class label of + the corresponding box. The third item is an (n, K, 3) tensor + with [p^{1}_x, p^{1}_y, p^{1}_v, ..., p^{K}_x, p^{K}_y, + p^{K}_v] format. 
+ """ + cls_scores = all_cls_scores[-1] + kpt_preds = all_kpt_preds[-1] + + result_list = [] + for img_id in range(len(img_metas)): + cls_score = cls_scores[img_id] + kpt_pred = kpt_preds[img_id] + img_shape = img_metas[img_id]['img_shape'] + scale_factor = img_metas[img_id]['scale_factor'] + # TODO: only support single image test + # memory_i = memory[:, img_id, :] + # mlvl_mask = mlvl_masks[img_id] + proposals = self._get_bboxes_single(cls_score, kpt_pred, img_shape, + scale_factor, memory, + mlvl_masks, rescale) + result_list.append(proposals) + return result_list + + def _get_bboxes_single(self, + cls_score, + kpt_pred, + img_shape, + scale_factor, + memory, + mlvl_masks, + rescale=False): + """Transform outputs from the last decoder layer into bbox predictions + for each image. + + Args: + cls_score (Tensor): Box score logits from the last decoder layer + for each image. Shape [num_query, cls_out_channels]. + kpt_pred (Tensor): Sigmoid outputs from the last decoder layer + for each image, with coordinate format (x_{i}, y_{i}) and + shape [num_query, K*2]. + img_shape (tuple[int]): Shape of input image, (height, width, 3). + scale_factor (ndarray, optional): Scale factor of the image arange + as (w_scale, h_scale, w_scale, h_scale). + rescale (bool, optional): If True, return boxes in original image + space. Default False. + + Returns: + tuple[Tensor]: Results of detected bboxes and labels. + + - det_bboxes: Predicted bboxes with shape [num_query, 5], + where the first 4 columns are bounding box positions + (tl_x, tl_y, br_x, br_y) and the 5-th column are scores + between 0 and 1. + - det_labels: Predicted labels of the corresponding box with + shape [num_query]. + - det_kpts: Predicted keypoints with shape [num_query, K, 3]. + """ + assert len(cls_score) == len(kpt_pred) + max_per_img = self.test_cfg.get('max_per_img', self.num_query) + # exclude background + if self.loss_cls.use_sigmoid: + cls_score = F.sigmoid(cls_score) + scores, indexs = cls_score.reshape([-1]).topk(max_per_img) + det_labels = indexs % self.num_classes + bbox_index = indexs // self.num_classes + kpt_pred = kpt_pred[bbox_index] + else: + scores, det_labels = F.softmax(cls_score, axis=-1)[..., :-1].max(-1) + scores, bbox_index = scores.topk(max_per_img) + kpt_pred = kpt_pred[bbox_index] + det_labels = det_labels[bbox_index] + + # ----- results after pose decoder ----- + # det_kpts = kpt_pred.reshape((kpt_pred.shape[0], -1, 2)) + + # ----- results after joint decoder (default) ----- + # import time + # start = time.time() + refine_targets = (kpt_pred, None, None, paddle.ones_like(kpt_pred)) + refine_outputs = self.forward_refine(memory, mlvl_masks, refine_targets, + None, None) + # end = time.time() + # print(f'refine time: {end - start:.6f}') + det_kpts = refine_outputs[-1] + + det_kpts[..., 0] = det_kpts[..., 0] * img_shape[1] + det_kpts[..., 1] = det_kpts[..., 1] * img_shape[0] + det_kpts[..., 0].clip_(min=0, max=img_shape[1]) + det_kpts[..., 1].clip_(min=0, max=img_shape[0]) + if rescale: + det_kpts /= paddle.to_tensor( + scale_factor[:2], + dtype=det_kpts.dtype).unsqueeze(0).unsqueeze(0) + + # use circumscribed rectangle box of keypoints as det bboxes + x1 = det_kpts[..., 0].min(axis=1, keepdim=True) + y1 = det_kpts[..., 1].min(axis=1, keepdim=True) + x2 = det_kpts[..., 0].max(axis=1, keepdim=True) + y2 = det_kpts[..., 1].max(axis=1, keepdim=True) + det_bboxes = paddle.concat([x1, y1, x2, y2], axis=1) + det_bboxes = paddle.concat((det_bboxes, scores.unsqueeze(1)), -1) + + det_kpts = paddle.concat( + (det_kpts, 
paddle.ones( + det_kpts[..., :1].shape, dtype=det_kpts.dtype)), + axis=2) + + return det_bboxes, det_labels, det_kpts + + def simple_test(self, feats, img_metas, rescale=False): + """Test det bboxes without test-time augmentation. + + Args: + feats (tuple[paddle.Tensor]): Multi-level features from the + upstream network, each is a 4D-tensor. + img_metas (list[dict]): List of image information. + rescale (bool, optional): Whether to rescale the results. + Defaults to False. + + Returns: + list[tuple[Tensor, Tensor, Tensor]]: Each item in result_list is + 3-tuple. The first item is ``bboxes`` with shape (n, 5), + where 5 represent (tl_x, tl_y, br_x, br_y, score). + The shape of the second tensor in the tuple is ``labels`` + with shape (n,). The third item is ``kpts`` with shape + (n, K, 3), in [p^{1}_x, p^{1}_y, p^{1}_v, p^{K}_x, p^{K}_y, + p^{K}_v] format. + """ + # forward of this head requires img_metas + outs = self.forward(feats, img_metas) + results_list = self.get_bboxes(*outs, img_metas, rescale=rescale) + return results_list + + def get_loss(self, boxes, scores, gt_bbox, gt_class, prior_boxes): + return self.loss(boxes, scores, gt_bbox, gt_class, prior_boxes) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/pico_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/pico_head.py new file mode 100644 index 0000000000000000000000000000000000000000..e5232239910be187bffa294e5d0cc719bb4f10da --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/pico_head.py @@ -0,0 +1,783 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Normal, Constant + +from ppdet.modeling.ops import get_static_shape +from ..initializer import normal_ +from ..assigners.utils import generate_anchors_for_grid_cell +from ..bbox_utils import bbox_center, batch_distance2bbox, bbox2distance +from ppdet.core.workspace import register +from ppdet.modeling.layers import ConvNormLayer +from .simota_head import OTAVFLHead +from .gfl_head import Integral, GFLHead +from ppdet.modeling.necks.csp_pan import DPModule + +eps = 1e-9 + +__all__ = ['PicoHead', 'PicoHeadV2', 'PicoFeat'] + + +class PicoSE(nn.Layer): + def __init__(self, feat_channels): + super(PicoSE, self).__init__() + self.fc = nn.Conv2D(feat_channels, feat_channels, 1) + self.conv = ConvNormLayer(feat_channels, feat_channels, 1, 1) + + self._init_weights() + + def _init_weights(self): + normal_(self.fc.weight, std=0.001) + + def forward(self, feat, avg_feat): + weight = F.sigmoid(self.fc(avg_feat)) + out = self.conv(feat * weight) + return out + + +@register +class PicoFeat(nn.Layer): + """ + PicoFeat of PicoDet + + Args: + feat_in (int): The channel number of input Tensor. 
+        feat_out (int): The channel number of output Tensor.
+        num_convs (int): The number of convolutions in PicoFeat.
+        norm_type (str): Normalization type, 'bn'/'sync_bn'/'gn'.
+        share_cls_reg (bool): Whether to share the cls and reg output.
+        act (str): The activation used in each conv layer.
+        use_se (bool): Whether to use the se module.
+    """
+
+    def __init__(self,
+                 feat_in=256,
+                 feat_out=96,
+                 num_fpn_stride=3,
+                 num_convs=2,
+                 norm_type='bn',
+                 share_cls_reg=False,
+                 act='hard_swish',
+                 use_se=False):
+        super(PicoFeat, self).__init__()
+        self.num_convs = num_convs
+        self.norm_type = norm_type
+        self.share_cls_reg = share_cls_reg
+        self.act = act
+        self.use_se = use_se
+        self.cls_convs = []
+        self.reg_convs = []
+        if use_se:
+            assert share_cls_reg == True, \
+                'In the case of using se, share_cls_reg must be set to True'
+            self.se = nn.LayerList()
+        for stage_idx in range(num_fpn_stride):
+            cls_subnet_convs = []
+            reg_subnet_convs = []
+            for i in range(self.num_convs):
+                in_c = feat_in if i == 0 else feat_out
+                cls_conv_dw = self.add_sublayer(
+                    'cls_conv_dw{}.{}'.format(stage_idx, i),
+                    ConvNormLayer(
+                        ch_in=in_c,
+                        ch_out=feat_out,
+                        filter_size=5,
+                        stride=1,
+                        groups=feat_out,
+                        norm_type=norm_type,
+                        bias_on=False,
+                        lr_scale=2.))
+                cls_subnet_convs.append(cls_conv_dw)
+                cls_conv_pw = self.add_sublayer(
+                    'cls_conv_pw{}.{}'.format(stage_idx, i),
+                    ConvNormLayer(
+                        ch_in=in_c,
+                        ch_out=feat_out,
+                        filter_size=1,
+                        stride=1,
+                        norm_type=norm_type,
+                        bias_on=False,
+                        lr_scale=2.))
+                cls_subnet_convs.append(cls_conv_pw)
+
+                if not self.share_cls_reg:
+                    reg_conv_dw = self.add_sublayer(
+                        'reg_conv_dw{}.{}'.format(stage_idx, i),
+                        ConvNormLayer(
+                            ch_in=in_c,
+                            ch_out=feat_out,
+                            filter_size=5,
+                            stride=1,
+                            groups=feat_out,
+                            norm_type=norm_type,
+                            bias_on=False,
+                            lr_scale=2.))
+                    reg_subnet_convs.append(reg_conv_dw)
+                    reg_conv_pw = self.add_sublayer(
+                        'reg_conv_pw{}.{}'.format(stage_idx, i),
+                        ConvNormLayer(
+                            ch_in=in_c,
+                            ch_out=feat_out,
+                            filter_size=1,
+                            stride=1,
+                            norm_type=norm_type,
+                            bias_on=False,
+                            lr_scale=2.))
+                    reg_subnet_convs.append(reg_conv_pw)
+            self.cls_convs.append(cls_subnet_convs)
+            self.reg_convs.append(reg_subnet_convs)
+            if use_se:
+                self.se.append(PicoSE(feat_out))
+
+    def act_func(self, x):
+        if self.act == "leaky_relu":
+            x = F.leaky_relu(x)
+        elif self.act == "hard_swish":
+            x = F.hardswish(x)
+        elif self.act == "relu6":
+            x = F.relu6(x)
+        return x
+
+    def forward(self, fpn_feat, stage_idx):
+        assert stage_idx < len(self.cls_convs)
+        cls_feat = fpn_feat
+        reg_feat = fpn_feat
+        for i in range(len(self.cls_convs[stage_idx])):
+            cls_feat = self.act_func(self.cls_convs[stage_idx][i](cls_feat))
+            reg_feat = cls_feat
+            if not self.share_cls_reg:
+                reg_feat = self.act_func(self.reg_convs[stage_idx][i](reg_feat))
+        if self.use_se:
+            avg_feat = F.adaptive_avg_pool2d(cls_feat, (1, 1))
+            se_feat = self.act_func(self.se[stage_idx](cls_feat, avg_feat))
+            return cls_feat, se_feat
+        return cls_feat, reg_feat
+
+
+@register
+class PicoHead(OTAVFLHead):
+    """
+    PicoHead
+    Args:
+        conv_feat (object): Instance of 'PicoFeat'
+        num_classes (int): Number of classes
+        fpn_stride (list): The stride of each FPN Layer
+        prior_prob (float): Used to set the bias init for the class prediction layer
+        loss_class (object): Instance of VariFocalLoss.
+        loss_dfl (object): Instance of DistributionFocalLoss.
+        loss_bbox (object): Instance of bbox loss.
+        assigner (object): Instance of label assigner.
+        reg_max: Max value of integral set :math:`{0, ..., reg_max}`
+            in QFL setting. Default: 16.
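+
+    Construction sketch (editorial note, not part of the original patch;
+    in practice the loss, assigner and nms objects are injected from the
+    YAML config via the workspace registry rather than passed by hand)::
+
+        feat = PicoFeat(feat_in=96, feat_out=96, share_cls_reg=True)
+        head = PicoHead(conv_feat=feat, num_classes=80,
+                        fpn_stride=[8, 16, 32], reg_max=7)
+        cls_logits, bbox_regs = head(fpn_feats)
+        # in training mode: lists with one tensor per FPN level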
+ """ + __inject__ = [ + 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', + 'assigner', 'nms' + ] + __shared__ = ['num_classes', 'eval_size'] + + def __init__(self, + conv_feat='PicoFeat', + dgqp_module=None, + num_classes=80, + fpn_stride=[8, 16, 32], + prior_prob=0.01, + loss_class='VariFocalLoss', + loss_dfl='DistributionFocalLoss', + loss_bbox='GIoULoss', + assigner='SimOTAAssigner', + reg_max=16, + feat_in_chan=96, + nms=None, + nms_pre=1000, + cell_offset=0, + eval_size=None): + super(PicoHead, self).__init__( + conv_feat=conv_feat, + dgqp_module=dgqp_module, + num_classes=num_classes, + fpn_stride=fpn_stride, + prior_prob=prior_prob, + loss_class=loss_class, + loss_dfl=loss_dfl, + loss_bbox=loss_bbox, + assigner=assigner, + reg_max=reg_max, + feat_in_chan=feat_in_chan, + nms=nms, + nms_pre=nms_pre, + cell_offset=cell_offset) + self.conv_feat = conv_feat + self.num_classes = num_classes + self.fpn_stride = fpn_stride + self.prior_prob = prior_prob + self.loss_vfl = loss_class + self.loss_dfl = loss_dfl + self.loss_bbox = loss_bbox + self.assigner = assigner + self.reg_max = reg_max + self.feat_in_chan = feat_in_chan + self.nms = nms + self.nms_pre = nms_pre + self.cell_offset = cell_offset + self.eval_size = eval_size + + self.use_sigmoid = self.loss_vfl.use_sigmoid + if self.use_sigmoid: + self.cls_out_channels = self.num_classes + else: + self.cls_out_channels = self.num_classes + 1 + bias_init_value = -math.log((1 - self.prior_prob) / self.prior_prob) + # Clear the super class initialization + self.gfl_head_cls = None + self.gfl_head_reg = None + self.scales_regs = None + + self.head_cls_list = [] + self.head_reg_list = [] + for i in range(len(fpn_stride)): + head_cls = self.add_sublayer( + "head_cls" + str(i), + nn.Conv2D( + in_channels=self.feat_in_chan, + out_channels=self.cls_out_channels + 4 * (self.reg_max + 1) + if self.conv_feat.share_cls_reg else self.cls_out_channels, + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr( + initializer=Constant(value=bias_init_value)))) + self.head_cls_list.append(head_cls) + if not self.conv_feat.share_cls_reg: + head_reg = self.add_sublayer( + "head_reg" + str(i), + nn.Conv2D( + in_channels=self.feat_in_chan, + out_channels=4 * (self.reg_max + 1), + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + self.head_reg_list.append(head_reg) + + # initialize the anchor points + if self.eval_size: + self.anchor_points, self.stride_tensor = self._generate_anchors() + + def forward(self, fpn_feats, export_post_process=True): + assert len(fpn_feats) == len( + self.fpn_stride + ), "The size of fpn_feats is not equal to size of fpn_stride" + + if self.training: + return self.forward_train(fpn_feats) + else: + return self.forward_eval( + fpn_feats, export_post_process=export_post_process) + + def forward_train(self, fpn_feats): + cls_logits_list, bboxes_reg_list = [], [] + for i, fpn_feat in enumerate(fpn_feats): + conv_cls_feat, conv_reg_feat = self.conv_feat(fpn_feat, i) + if self.conv_feat.share_cls_reg: + cls_logits = self.head_cls_list[i](conv_cls_feat) + cls_score, bbox_pred = paddle.split( + cls_logits, + [self.cls_out_channels, 4 * (self.reg_max + 1)], + axis=1) + else: + cls_score = self.head_cls_list[i](conv_cls_feat) + bbox_pred = self.head_reg_list[i](conv_reg_feat) + + if self.dgqp_module: + quality_score = 
self.dgqp_module(bbox_pred) + cls_score = F.sigmoid(cls_score) * quality_score + + cls_logits_list.append(cls_score) + bboxes_reg_list.append(bbox_pred) + + return (cls_logits_list, bboxes_reg_list) + + def forward_eval(self, fpn_feats, export_post_process=True): + if self.eval_size: + anchor_points, stride_tensor = self.anchor_points, self.stride_tensor + else: + anchor_points, stride_tensor = self._generate_anchors(fpn_feats) + cls_logits_list, bboxes_reg_list = [], [] + for i, fpn_feat in enumerate(fpn_feats): + conv_cls_feat, conv_reg_feat = self.conv_feat(fpn_feat, i) + if self.conv_feat.share_cls_reg: + cls_logits = self.head_cls_list[i](conv_cls_feat) + cls_score, bbox_pred = paddle.split( + cls_logits, + [self.cls_out_channels, 4 * (self.reg_max + 1)], + axis=1) + else: + cls_score = self.head_cls_list[i](conv_cls_feat) + bbox_pred = self.head_reg_list[i](conv_reg_feat) + + if self.dgqp_module: + quality_score = self.dgqp_module(bbox_pred) + cls_score = F.sigmoid(cls_score) * quality_score + + if not export_post_process: + # Now only supports batch size = 1 in deploy + # TODO(ygh): support batch size > 1 + cls_score_out = F.sigmoid(cls_score).reshape( + [1, self.cls_out_channels, -1]).transpose([0, 2, 1]) + bbox_pred = bbox_pred.reshape([1, (self.reg_max + 1) * 4, + -1]).transpose([0, 2, 1]) + else: + _, _, h, w = fpn_feat.shape + l = h * w + cls_score_out = F.sigmoid( + cls_score.reshape([-1, self.cls_out_channels, l])) + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]) + bbox_pred = self.distribution_project(bbox_pred) + bbox_pred = bbox_pred.reshape([-1, l, 4]) + + cls_logits_list.append(cls_score_out) + bboxes_reg_list.append(bbox_pred) + + if export_post_process: + cls_logits_list = paddle.concat(cls_logits_list, axis=-1) + bboxes_reg_list = paddle.concat(bboxes_reg_list, axis=1) + bboxes_reg_list = batch_distance2bbox(anchor_points, + bboxes_reg_list) + bboxes_reg_list *= stride_tensor + + return (cls_logits_list, bboxes_reg_list) + + def _generate_anchors(self, feats=None): + # just use in eval time + anchor_points = [] + stride_tensor = [] + for i, stride in enumerate(self.fpn_stride): + if feats is not None: + _, _, h, w = feats[i].shape + else: + h = math.ceil(self.eval_size[0] / stride) + w = math.ceil(self.eval_size[1] / stride) + shift_x = paddle.arange(end=w) + self.cell_offset + shift_y = paddle.arange(end=h) + self.cell_offset + shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) + anchor_point = paddle.cast( + paddle.stack( + [shift_x, shift_y], axis=-1), dtype='float32') + anchor_points.append(anchor_point.reshape([-1, 2])) + stride_tensor.append( + paddle.full( + [h * w, 1], stride, dtype='float32')) + anchor_points = paddle.concat(anchor_points) + stride_tensor = paddle.concat(stride_tensor) + return anchor_points, stride_tensor + + def post_process(self, head_outs, scale_factor, export_nms=True): + pred_scores, pred_bboxes = head_outs + if not export_nms: + return pred_bboxes, pred_scores + else: + # rescale: [h_scale, w_scale] -> [w_scale, h_scale, w_scale, h_scale] + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], + axis=-1).reshape([-1, 1, 4]) + # scale bbox to origin image size. 
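+            # Editorial sketch (not part of the original patch): scale_factor
+            # arrives as [N, 2] in (h_scale, w_scale) order, so e.g.
+            # [[0.5, 0.8]] is rearranged above into [[[0.8, 0.5, 0.8, 0.5]]],
+            # letting the division below undo the resize on (x1, y1, x2, y2).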
+ pred_bboxes /= scale_factor + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) + return bbox_pred, bbox_num + + +@register +class PicoHeadV2(GFLHead): + """ + PicoHeadV2 + Args: + conv_feat (object): Instance of 'PicoFeat' + num_classes (int): Number of classes + fpn_stride (list): The stride of each FPN Layer + prior_prob (float): Used to set the bias init for the class prediction layer + loss_class (object): Instance of VariFocalLoss. + loss_dfl (object): Instance of DistributionFocalLoss. + loss_bbox (object): Instance of bbox loss. + assigner (object): Instance of label assigner. + reg_max: Max value of integral set :math: `{0, ..., reg_max}` + n QFL setting. Default: 7. + """ + __inject__ = [ + 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', + 'static_assigner', 'assigner', 'nms' + ] + __shared__ = ['num_classes', 'eval_size'] + + def __init__(self, + conv_feat='PicoFeatV2', + dgqp_module=None, + num_classes=80, + fpn_stride=[8, 16, 32], + prior_prob=0.01, + use_align_head=True, + loss_class='VariFocalLoss', + loss_dfl='DistributionFocalLoss', + loss_bbox='GIoULoss', + static_assigner_epoch=60, + static_assigner='ATSSAssigner', + assigner='TaskAlignedAssigner', + reg_max=16, + feat_in_chan=96, + nms=None, + nms_pre=1000, + cell_offset=0, + act='hard_swish', + grid_cell_scale=5.0, + eval_size=None): + super(PicoHeadV2, self).__init__( + conv_feat=conv_feat, + dgqp_module=dgqp_module, + num_classes=num_classes, + fpn_stride=fpn_stride, + prior_prob=prior_prob, + loss_class=loss_class, + loss_dfl=loss_dfl, + loss_bbox=loss_bbox, + reg_max=reg_max, + feat_in_chan=feat_in_chan, + nms=nms, + nms_pre=nms_pre, + cell_offset=cell_offset, ) + self.conv_feat = conv_feat + self.num_classes = num_classes + self.fpn_stride = fpn_stride + self.prior_prob = prior_prob + self.loss_vfl = loss_class + self.loss_dfl = loss_dfl + self.loss_bbox = loss_bbox + + self.static_assigner_epoch = static_assigner_epoch + self.static_assigner = static_assigner + self.assigner = assigner + + self.reg_max = reg_max + self.feat_in_chan = feat_in_chan + self.nms = nms + self.nms_pre = nms_pre + self.cell_offset = cell_offset + self.act = act + self.grid_cell_scale = grid_cell_scale + self.use_align_head = use_align_head + self.cls_out_channels = self.num_classes + self.eval_size = eval_size + + bias_init_value = -math.log((1 - self.prior_prob) / self.prior_prob) + # Clear the super class initialization + self.gfl_head_cls = None + self.gfl_head_reg = None + self.scales_regs = None + + self.head_cls_list = nn.LayerList() + self.head_reg_list = nn.LayerList() + self.cls_align = nn.LayerList() + + for i in range(len(fpn_stride)): + head_cls = self.add_sublayer( + "head_cls" + str(i), + nn.Conv2D( + in_channels=self.feat_in_chan, + out_channels=self.cls_out_channels, + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr( + initializer=Constant(value=bias_init_value)))) + self.head_cls_list.append(head_cls) + head_reg = self.add_sublayer( + "head_reg" + str(i), + nn.Conv2D( + in_channels=self.feat_in_chan, + out_channels=4 * (self.reg_max + 1), + kernel_size=1, + stride=1, + padding=0, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + self.head_reg_list.append(head_reg) + if self.use_align_head: + self.cls_align.append( + DPModule( + self.feat_in_chan, + 1, + 5, + act=self.act, + use_act_in_out=False)) + + # initialize the anchor points + if 
self.eval_size: + self.anchor_points, self.stride_tensor = self._generate_anchors() + + def forward(self, fpn_feats, export_post_process=True): + assert len(fpn_feats) == len( + self.fpn_stride + ), "The size of fpn_feats is not equal to size of fpn_stride" + + if self.training: + return self.forward_train(fpn_feats) + else: + return self.forward_eval( + fpn_feats, export_post_process=export_post_process) + + def forward_train(self, fpn_feats): + cls_score_list, reg_list, box_list = [], [], [] + for i, (fpn_feat, stride) in enumerate(zip(fpn_feats, self.fpn_stride)): + b, _, h, w = get_static_shape(fpn_feat) + # task decomposition + conv_cls_feat, se_feat = self.conv_feat(fpn_feat, i) + cls_logit = self.head_cls_list[i](se_feat) + reg_pred = self.head_reg_list[i](se_feat) + + # cls prediction and alignment + if self.use_align_head: + cls_prob = F.sigmoid(self.cls_align[i](conv_cls_feat)) + cls_score = (F.sigmoid(cls_logit) * cls_prob + eps).sqrt() + else: + cls_score = F.sigmoid(cls_logit) + + cls_score_out = cls_score.transpose([0, 2, 3, 1]) + bbox_pred = reg_pred.transpose([0, 2, 3, 1]) + b, cell_h, cell_w, _ = paddle.shape(cls_score_out) + y, x = self.get_single_level_center_point( + [cell_h, cell_w], stride, cell_offset=self.cell_offset) + center_points = paddle.stack([x, y], axis=-1) + cls_score_out = cls_score_out.reshape( + [b, -1, self.cls_out_channels]) + bbox_pred = self.distribution_project(bbox_pred) * stride + bbox_pred = bbox_pred.reshape([b, cell_h * cell_w, 4]) + bbox_pred = batch_distance2bbox( + center_points, bbox_pred, max_shapes=None) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + reg_list.append(reg_pred.flatten(2).transpose([0, 2, 1])) + box_list.append(bbox_pred / stride) + + cls_score_list = paddle.concat(cls_score_list, axis=1) + box_list = paddle.concat(box_list, axis=1) + reg_list = paddle.concat(reg_list, axis=1) + return cls_score_list, reg_list, box_list, fpn_feats + + def forward_eval(self, fpn_feats, export_post_process=True): + if self.eval_size: + anchor_points, stride_tensor = self.anchor_points, self.stride_tensor + else: + anchor_points, stride_tensor = self._generate_anchors(fpn_feats) + cls_score_list, box_list = [], [] + for i, (fpn_feat, stride) in enumerate(zip(fpn_feats, self.fpn_stride)): + _, _, h, w = fpn_feat.shape + # task decomposition + conv_cls_feat, se_feat = self.conv_feat(fpn_feat, i) + cls_logit = self.head_cls_list[i](se_feat) + reg_pred = self.head_reg_list[i](se_feat) + + # cls prediction and alignment + if self.use_align_head: + cls_prob = F.sigmoid(self.cls_align[i](conv_cls_feat)) + cls_score = (F.sigmoid(cls_logit) * cls_prob + eps).sqrt() + else: + cls_score = F.sigmoid(cls_logit) + + if not export_post_process: + # Now only supports batch size = 1 in deploy + cls_score_list.append( + cls_score.reshape([1, self.cls_out_channels, -1]).transpose( + [0, 2, 1])) + box_list.append( + reg_pred.reshape([1, (self.reg_max + 1) * 4, -1]).transpose( + [0, 2, 1])) + else: + l = h * w + cls_score_out = cls_score.reshape( + [-1, self.cls_out_channels, l]) + bbox_pred = reg_pred.transpose([0, 2, 3, 1]) + bbox_pred = self.distribution_project(bbox_pred) + bbox_pred = bbox_pred.reshape([-1, l, 4]) + cls_score_list.append(cls_score_out) + box_list.append(bbox_pred) + + if export_post_process: + cls_score_list = paddle.concat(cls_score_list, axis=-1) + box_list = paddle.concat(box_list, axis=1) + box_list = batch_distance2bbox(anchor_points, box_list) + box_list *= stride_tensor + + return cls_score_list, box_list + + def 
get_loss(self, head_outs, gt_meta): + pred_scores, pred_regs, pred_bboxes, fpn_feats = head_outs + gt_labels = gt_meta['gt_class'] + gt_bboxes = gt_meta['gt_bbox'] + gt_scores = gt_meta['gt_score'] if 'gt_score' in gt_meta else None + num_imgs = gt_meta['im_id'].shape[0] + pad_gt_mask = gt_meta['pad_gt_mask'] + + anchors, _, num_anchors_list, stride_tensor_list = generate_anchors_for_grid_cell( + fpn_feats, self.fpn_stride, self.grid_cell_scale, self.cell_offset) + + centers = bbox_center(anchors) + + # label assignment + if gt_meta['epoch_id'] < self.static_assigner_epoch: + assigned_labels, assigned_bboxes, assigned_scores, _ = self.static_assigner( + anchors, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes, + gt_scores=gt_scores, + pred_bboxes=pred_bboxes.detach() * stride_tensor_list) + + else: + assigned_labels, assigned_bboxes, assigned_scores, _ = self.assigner( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor_list, + centers, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes, + gt_scores=gt_scores) + + assigned_bboxes /= stride_tensor_list + + centers_shape = centers.shape + flatten_centers = centers.expand( + [num_imgs, centers_shape[0], centers_shape[1]]).reshape([-1, 2]) + flatten_strides = stride_tensor_list.expand( + [num_imgs, centers_shape[0], 1]).reshape([-1, 1]) + flatten_cls_preds = pred_scores.reshape([-1, self.num_classes]) + flatten_regs = pred_regs.reshape([-1, 4 * (self.reg_max + 1)]) + flatten_bboxes = pred_bboxes.reshape([-1, 4]) + flatten_bbox_targets = assigned_bboxes.reshape([-1, 4]) + flatten_labels = assigned_labels.reshape([-1]) + flatten_assigned_scores = assigned_scores.reshape( + [-1, self.num_classes]) + + pos_inds = paddle.nonzero( + paddle.logical_and((flatten_labels >= 0), + (flatten_labels < self.num_classes)), + as_tuple=False).squeeze(1) + + num_total_pos = len(pos_inds) + + if num_total_pos > 0: + pos_bbox_targets = paddle.gather( + flatten_bbox_targets, pos_inds, axis=0) + pos_decode_bbox_pred = paddle.gather( + flatten_bboxes, pos_inds, axis=0) + pos_reg = paddle.gather(flatten_regs, pos_inds, axis=0) + pos_strides = paddle.gather(flatten_strides, pos_inds, axis=0) + pos_centers = paddle.gather( + flatten_centers, pos_inds, axis=0) / pos_strides + + weight_targets = flatten_assigned_scores.detach() + weight_targets = paddle.gather( + weight_targets.max(axis=1, keepdim=True), pos_inds, axis=0) + + pred_corners = pos_reg.reshape([-1, self.reg_max + 1]) + target_corners = bbox2distance(pos_centers, pos_bbox_targets, + self.reg_max).reshape([-1]) + # regression loss + loss_bbox = paddle.sum( + self.loss_bbox(pos_decode_bbox_pred, + pos_bbox_targets) * weight_targets) + + # dfl loss + loss_dfl = self.loss_dfl( + pred_corners, + target_corners, + weight=weight_targets.expand([-1, 4]).reshape([-1]), + avg_factor=4.0) + else: + loss_bbox = paddle.zeros([1]) + loss_dfl = paddle.zeros([1]) + + avg_factor = flatten_assigned_scores.sum() + if paddle.distributed.get_world_size() > 1: + paddle.distributed.all_reduce(avg_factor) + avg_factor = paddle.clip( + avg_factor / paddle.distributed.get_world_size(), min=1) + loss_vfl = self.loss_vfl( + flatten_cls_preds, flatten_assigned_scores, avg_factor=avg_factor) + + loss_bbox = loss_bbox / avg_factor + loss_dfl = loss_dfl / avg_factor + + loss_states = dict( + loss_vfl=loss_vfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl) + + return loss_states + + def _generate_anchors(self, feats=None): + # just use in eval time + 
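+        # Editorial sketch (not part of the original patch): for an eval_size
+        # of (320, 320) and stride 32, this level yields a 10 x 10 grid of
+        # anchor points (0, 0), (1, 0), ..., (9, 9) (cell_offset defaults to
+        # 0), each paired with a stride entry of 32 for later rescaling back
+        # to input-image coordinates.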
anchor_points = [] + stride_tensor = [] + for i, stride in enumerate(self.fpn_stride): + if feats is not None: + _, _, h, w = feats[i].shape + else: + h = math.ceil(self.eval_size[0] / stride) + w = math.ceil(self.eval_size[1] / stride) + shift_x = paddle.arange(end=w) + self.cell_offset + shift_y = paddle.arange(end=h) + self.cell_offset + shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) + anchor_point = paddle.cast( + paddle.stack( + [shift_x, shift_y], axis=-1), dtype='float32') + anchor_points.append(anchor_point.reshape([-1, 2])) + stride_tensor.append( + paddle.full( + [h * w, 1], stride, dtype='float32')) + anchor_points = paddle.concat(anchor_points) + stride_tensor = paddle.concat(stride_tensor) + return anchor_points, stride_tensor + + def post_process(self, head_outs, scale_factor, export_nms=True): + pred_scores, pred_bboxes = head_outs + if not export_nms: + return pred_bboxes, pred_scores + else: + # rescale: [h_scale, w_scale] -> [w_scale, h_scale, w_scale, h_scale] + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], + axis=-1).reshape([-1, 1, 4]) + # scale bbox to origin image size. + pred_bboxes /= scale_factor + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) + return bbox_pred, bbox_num diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_contrast_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_contrast_head.py new file mode 100644 index 0000000000000000000000000000000000000000..3b8e26e63d038627fd5f9cd0c7e935f81770ee0b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_contrast_head.py @@ -0,0 +1,197 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
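+
+# Editorial note (not part of the original patch): PPYOLOEContrastHead below
+# extends PPYOLOEHead with a 128-channel contrastive embedding branch. A
+# rough sketch of how the extra term is wired, with illustrative shapes:
+#
+#     emb = self.contrast_encoder[i](self.stem_cls[i](feat, avg_feat) + feat)
+#     # emb: [B, 128, H, W] -> [B, H*W, 128], concatenated over FPN levels,
+#     # then scored by the injected SupContrast loss against the assigned
+#     # labels and weighted by loss_weight['contrast'], a key the config
+#     # must supply on top of the default class/iou/dfl weights.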
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register + +from ..initializer import bias_init_with_prob, constant_ +from ..assigners.utils import generate_anchors_for_grid_cell +from ppdet.modeling.heads.ppyoloe_head import PPYOLOEHead + +__all__ = ['PPYOLOEContrastHead'] + + +@register +class PPYOLOEContrastHead(PPYOLOEHead): + __shared__ = [ + 'num_classes', 'eval_size', 'trt', 'exclude_nms', + 'exclude_post_process', 'use_shared_conv', 'for_distill' + ] + __inject__ = ['static_assigner', 'assigner', 'nms', 'contrast_loss'] + + def __init__(self, + in_channels=[1024, 512, 256], + num_classes=80, + act='swish', + fpn_strides=(32, 16, 8), + grid_cell_scale=5.0, + grid_cell_offset=0.5, + reg_max=16, + reg_range=None, + static_assigner_epoch=4, + use_varifocal_loss=True, + static_assigner='ATSSAssigner', + assigner='TaskAlignedAssigner', + contrast_loss='SupContrast', + nms='MultiClassNMS', + eval_size=None, + loss_weight={ + 'class': 1.0, + 'iou': 2.5, + 'dfl': 0.5, + }, + trt=False, + attn_conv='convbn', + exclude_nms=False, + exclude_post_process=False, + use_shared_conv=True, + for_distill=False): + super().__init__(in_channels, num_classes, act, fpn_strides, + grid_cell_scale, grid_cell_offset, reg_max, reg_range, + static_assigner_epoch, use_varifocal_loss, + static_assigner, assigner, nms, eval_size, loss_weight, + trt, attn_conv, exclude_nms, exclude_post_process, + use_shared_conv, for_distill) + + assert len(in_channels) > 0, "len(in_channels) should > 0" + self.contrast_loss = contrast_loss + self.contrast_encoder = nn.LayerList() + for in_c in self.in_channels: + self.contrast_encoder.append(nn.Conv2D(in_c, 128, 3, padding=1)) + self._init_contrast_encoder() + + def _init_contrast_encoder(self): + bias_en = bias_init_with_prob(0.01) + for en_ in self.contrast_encoder: + constant_(en_.weight) + constant_(en_.bias, bias_en) + + def forward_train(self, feats, targets, aux_pred=None): + anchors, anchor_points, num_anchors_list, stride_tensor = \ + generate_anchors_for_grid_cell( + feats, self.fpn_strides, self.grid_cell_scale, + self.grid_cell_offset) + + cls_score_list, reg_distri_list = [], [] + contrast_encoder_list = [] + for i, feat in enumerate(feats): + avg_feat = F.adaptive_avg_pool2d(feat, (1, 1)) + cls_logit = self.pred_cls[i](self.stem_cls[i](feat, avg_feat) + + feat) + reg_distri = self.pred_reg[i](self.stem_reg[i](feat, avg_feat)) + contrast_logit = self.contrast_encoder[i](self.stem_cls[i]( + feat, avg_feat) + feat) + contrast_encoder_list.append( + contrast_logit.flatten(2).transpose([0, 2, 1])) + # cls and reg + cls_score = F.sigmoid(cls_logit) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + reg_distri_list.append(reg_distri.flatten(2).transpose([0, 2, 1])) + cls_score_list = paddle.concat(cls_score_list, axis=1) + reg_distri_list = paddle.concat(reg_distri_list, axis=1) + contrast_encoder_list = paddle.concat(contrast_encoder_list, axis=1) + + return self.get_loss([ + cls_score_list, reg_distri_list, contrast_encoder_list, anchors, + anchor_points, num_anchors_list, stride_tensor + ], targets) + + def get_loss(self, head_outs, gt_meta): + pred_scores, pred_distri, pred_contrast_encoder, anchors,\ + anchor_points, num_anchors_list, stride_tensor = head_outs + + anchor_points_s = anchor_points / stride_tensor + pred_bboxes = self._bbox_decode(anchor_points_s, pred_distri) + + gt_labels = gt_meta['gt_class'] + gt_bboxes = gt_meta['gt_bbox'] + pad_gt_mask = gt_meta['pad_gt_mask'] + # 
label assignment + if gt_meta['epoch_id'] < self.static_assigner_epoch: + assigned_labels, assigned_bboxes, assigned_scores, _ = \ + self.static_assigner( + anchors, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes, + pred_bboxes=pred_bboxes.detach() * stride_tensor) + alpha_l = 0.25 + else: + if self.sm_use: + assigned_labels, assigned_bboxes, assigned_scores, _ = \ + self.assigner( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor, + anchor_points, + stride_tensor, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + else: + assigned_labels, assigned_bboxes, assigned_scores, _ = \ + self.assigner( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor, + anchor_points, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + alpha_l = -1 + # rescale bbox + assigned_bboxes /= stride_tensor + # cls loss + if self.use_varifocal_loss: + one_hot_label = F.one_hot(assigned_labels, + self.num_classes + 1)[..., :-1] + loss_cls = self._varifocal_loss(pred_scores, assigned_scores, + one_hot_label) + else: + loss_cls = self._focal_loss(pred_scores, assigned_scores, alpha_l) + + assigned_scores_sum = assigned_scores.sum() + if paddle.distributed.get_world_size() > 1: + paddle.distributed.all_reduce(assigned_scores_sum) + assigned_scores_sum /= paddle.distributed.get_world_size() + assigned_scores_sum = paddle.clip(assigned_scores_sum, min=1.) + loss_cls /= assigned_scores_sum + + loss_l1, loss_iou, loss_dfl = \ + self._bbox_loss(pred_distri, pred_bboxes, anchor_points_s, + assigned_labels, assigned_bboxes, assigned_scores, + assigned_scores_sum) + # contrast loss + loss_contrast = self.contrast_loss(pred_contrast_encoder.reshape([-1, pred_contrast_encoder.shape[-1]]), \ + assigned_labels.reshape([-1]), assigned_scores.max(-1).reshape([-1])) + + loss = self.loss_weight['class'] * loss_cls + \ + self.loss_weight['iou'] * loss_iou + \ + self.loss_weight['dfl'] * loss_dfl + \ + self.loss_weight['contrast'] * loss_contrast + + out_dict = { + 'loss': loss, + 'loss_cls': loss_cls, + 'loss_iou': loss_iou, + 'loss_dfl': loss_dfl, + 'loss_l1': loss_l1, + 'loss_contrast': loss_contrast + } + return out_dict diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_head.py new file mode 100644 index 0000000000000000000000000000000000000000..261c0c4933b7288cbaa19750f430dc617cef2111 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_head.py @@ -0,0 +1,701 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
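+
+# Editorial note (not part of the original patch): the regression branch of
+# PPYOLOEHead below predicts a discrete distribution over reg_max + 1 bins
+# for each box side; proj_conv is a fixed 1x1 conv whose weights are
+# linspace(0, reg_max, reg_max + 1) with stop_gradient=True, so applying it
+# to the softmaxed bins takes the distribution's expected value. Sketch:
+#
+#     probs = F.softmax(reg_logits.reshape([-1, l, 4, reg_max + 1]))
+#     dist = proj_conv(probs.transpose([0, 3, 1, 2])).squeeze(1)
+#     boxes = batch_distance2bbox(anchor_points, dist)  # (l, t, r, b) to xyxy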
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register +from paddle import ParamAttr +from paddle.nn.initializer import KaimingNormal +from paddle.nn.initializer import Normal, Constant + +from ..bbox_utils import batch_distance2bbox +from ..losses import GIoULoss +from ..initializer import bias_init_with_prob, constant_, normal_ +from ..assigners.utils import generate_anchors_for_grid_cell +from ppdet.modeling.backbones.cspresnet import ConvBNLayer, RepVggBlock +from ppdet.modeling.ops import get_static_shape, get_act_fn +from ppdet.modeling.layers import MultiClassNMS + +__all__ = ['PPYOLOEHead', 'SimpleConvHead'] + + +class ESEAttn(nn.Layer): + def __init__(self, feat_channels, act='swish', attn_conv='convbn'): + super(ESEAttn, self).__init__() + self.fc = nn.Conv2D(feat_channels, feat_channels, 1) + if attn_conv == 'convbn': + self.conv = ConvBNLayer(feat_channels, feat_channels, 1, act=act) + elif attn_conv == 'repvgg': + self.conv = RepVggBlock(feat_channels, feat_channels, act=act) + else: + self.conv = None + self._init_weights() + + def _init_weights(self): + normal_(self.fc.weight, std=0.001) + + def forward(self, feat, avg_feat): + weight = F.sigmoid(self.fc(avg_feat)) + if self.conv: + return self.conv(feat * weight) + else: + return feat * weight + + +@register +class PPYOLOEHead(nn.Layer): + __shared__ = [ + 'num_classes', 'eval_size', 'trt', 'exclude_nms', + 'exclude_post_process', 'use_shared_conv', 'for_distill' + ] + __inject__ = ['static_assigner', 'assigner', 'nms'] + + def __init__(self, + in_channels=[1024, 512, 256], + num_classes=80, + act='swish', + fpn_strides=(32, 16, 8), + grid_cell_scale=5.0, + grid_cell_offset=0.5, + reg_max=16, + reg_range=None, + static_assigner_epoch=4, + use_varifocal_loss=True, + static_assigner='ATSSAssigner', + assigner='TaskAlignedAssigner', + nms='MultiClassNMS', + eval_size=None, + loss_weight={ + 'class': 1.0, + 'iou': 2.5, + 'dfl': 0.5, + }, + trt=False, + attn_conv='convbn', + exclude_nms=False, + exclude_post_process=False, + use_shared_conv=True, + for_distill=False): + super(PPYOLOEHead, self).__init__() + assert len(in_channels) > 0, "len(in_channels) should > 0" + self.in_channels = in_channels + self.num_classes = num_classes + self.fpn_strides = fpn_strides + self.grid_cell_scale = grid_cell_scale + self.grid_cell_offset = grid_cell_offset + if reg_range: + self.sm_use = True + self.reg_range = reg_range + else: + self.sm_use = False + self.reg_range = (0, reg_max + 1) + self.reg_channels = self.reg_range[1] - self.reg_range[0] + self.iou_loss = GIoULoss() + self.loss_weight = loss_weight + self.use_varifocal_loss = use_varifocal_loss + self.eval_size = eval_size + + self.static_assigner_epoch = static_assigner_epoch + self.static_assigner = static_assigner + self.assigner = assigner + self.nms = nms + if isinstance(self.nms, MultiClassNMS) and trt: + self.nms.trt = trt + self.exclude_nms = exclude_nms + self.exclude_post_process = exclude_post_process + self.use_shared_conv = use_shared_conv + self.for_distill = for_distill + self.is_teacher = False + + # stem + self.stem_cls = nn.LayerList() + self.stem_reg = nn.LayerList() + act = get_act_fn( + act, trt=trt) if act is None or isinstance(act, + (str, dict)) else act + for in_c in self.in_channels: + self.stem_cls.append(ESEAttn(in_c, act=act, attn_conv=attn_conv)) + self.stem_reg.append(ESEAttn(in_c, act=act, attn_conv=attn_conv)) + # pred head + self.pred_cls = nn.LayerList() + self.pred_reg = nn.LayerList() 
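+        # Editorial sketch (not part of the original patch): with the default
+        # reg_max = 16, reg_channels = 17, so each pred_reg conv below emits
+        # 4 * 17 = 68 channels, i.e. a 17-bin distribution for each of the
+        # four (left, top, right, bottom) distances decoded in _bbox_decode.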
+ for in_c in self.in_channels: + self.pred_cls.append( + nn.Conv2D( + in_c, self.num_classes, 3, padding=1)) + self.pred_reg.append( + nn.Conv2D( + in_c, 4 * self.reg_channels, 3, padding=1)) + # projection conv + self.proj_conv = nn.Conv2D(self.reg_channels, 1, 1, bias_attr=False) + self.proj_conv.skip_quant = True + self._init_weights() + + if self.for_distill: + self.distill_pairs = {} + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + def _init_weights(self): + bias_cls = bias_init_with_prob(0.01) + for cls_, reg_ in zip(self.pred_cls, self.pred_reg): + constant_(cls_.weight) + constant_(cls_.bias, bias_cls) + constant_(reg_.weight) + constant_(reg_.bias, 1.0) + + proj = paddle.linspace(self.reg_range[0], self.reg_range[1] - 1, + self.reg_channels).reshape( + [1, self.reg_channels, 1, 1]) + self.proj_conv.weight.set_value(proj) + self.proj_conv.weight.stop_gradient = True + if self.eval_size: + anchor_points, stride_tensor = self._generate_anchors() + self.anchor_points = anchor_points + self.stride_tensor = stride_tensor + + def forward_train(self, feats, targets, aux_pred=None): + anchors, anchor_points, num_anchors_list, stride_tensor = \ + generate_anchors_for_grid_cell( + feats, self.fpn_strides, self.grid_cell_scale, + self.grid_cell_offset) + + cls_score_list, reg_distri_list = [], [] + for i, feat in enumerate(feats): + avg_feat = F.adaptive_avg_pool2d(feat, (1, 1)) + cls_logit = self.pred_cls[i](self.stem_cls[i](feat, avg_feat) + + feat) + reg_distri = self.pred_reg[i](self.stem_reg[i](feat, avg_feat)) + # cls and reg + cls_score = F.sigmoid(cls_logit) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + reg_distri_list.append(reg_distri.flatten(2).transpose([0, 2, 1])) + cls_score_list = paddle.concat(cls_score_list, axis=1) + reg_distri_list = paddle.concat(reg_distri_list, axis=1) + + if targets.get('is_teacher', False): + pred_deltas, pred_dfls = self._bbox_decode_fake(reg_distri_list) + return cls_score_list, pred_deltas * stride_tensor, pred_dfls + + if targets.get('get_data', False): + pred_deltas, pred_dfls = self._bbox_decode_fake(reg_distri_list) + return cls_score_list, pred_deltas * stride_tensor, pred_dfls + + return self.get_loss([ + cls_score_list, reg_distri_list, anchors, anchor_points, + num_anchors_list, stride_tensor + ], targets, aux_pred) + + def _generate_anchors(self, feats=None, dtype='float32'): + # just use in eval time + anchor_points = [] + stride_tensor = [] + for i, stride in enumerate(self.fpn_strides): + if feats is not None: + _, _, h, w = feats[i].shape + else: + h = int(self.eval_size[0] / stride) + w = int(self.eval_size[1] / stride) + shift_x = paddle.arange(end=w) + self.grid_cell_offset + shift_y = paddle.arange(end=h) + self.grid_cell_offset + shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) + anchor_point = paddle.cast( + paddle.stack( + [shift_x, shift_y], axis=-1), dtype=dtype) + anchor_points.append(anchor_point.reshape([-1, 2])) + stride_tensor.append(paddle.full([h * w, 1], stride, dtype=dtype)) + anchor_points = paddle.concat(anchor_points) + stride_tensor = paddle.concat(stride_tensor) + return anchor_points, stride_tensor + + def forward_eval(self, feats): + if self.eval_size: + anchor_points, stride_tensor = self.anchor_points, self.stride_tensor + else: + anchor_points, stride_tensor = self._generate_anchors(feats) + cls_score_list, reg_dist_list = [], [] + for i, feat in enumerate(feats): + _, _, h, w = feat.shape + l = h * w + 
avg_feat = F.adaptive_avg_pool2d(feat, (1, 1)) + cls_logit = self.pred_cls[i](self.stem_cls[i](feat, avg_feat) + + feat) + reg_dist = self.pred_reg[i](self.stem_reg[i](feat, avg_feat)) + reg_dist = reg_dist.reshape( + [-1, 4, self.reg_channels, l]).transpose([0, 2, 3, 1]) + if self.use_shared_conv: + reg_dist = self.proj_conv(F.softmax( + reg_dist, axis=1)).squeeze(1) + else: + reg_dist = F.softmax(reg_dist, axis=1) + # cls and reg + cls_score = F.sigmoid(cls_logit) + cls_score_list.append(cls_score.reshape([-1, self.num_classes, l])) + reg_dist_list.append(reg_dist) + + cls_score_list = paddle.concat(cls_score_list, axis=-1) + if self.use_shared_conv: + reg_dist_list = paddle.concat(reg_dist_list, axis=1) + else: + reg_dist_list = paddle.concat(reg_dist_list, axis=2) + reg_dist_list = self.proj_conv(reg_dist_list).squeeze(1) + + return cls_score_list, reg_dist_list, anchor_points, stride_tensor + + def forward(self, feats, targets=None, aux_pred=None): + assert len(feats) == len(self.fpn_strides), \ + "The size of feats is not equal to size of fpn_strides" + + if self.training: + return self.forward_train(feats, targets, aux_pred) + else: + if targets is not None: + # only for semi-det + self.is_teacher = targets.get('is_teacher', False) + if self.is_teacher: + return self.forward_train(feats, targets, aux_pred=None) + else: + return self.forward_eval(feats) + + return self.forward_eval(feats) + + @staticmethod + def _focal_loss(score, label, alpha=0.25, gamma=2.0): + weight = (score - label).pow(gamma) + if alpha > 0: + alpha_t = alpha * label + (1 - alpha) * (1 - label) + weight *= alpha_t + loss = F.binary_cross_entropy( + score, label, weight=weight, reduction='sum') + return loss + + @staticmethod + def _varifocal_loss(pred_score, gt_score, label, alpha=0.75, gamma=2.0): + weight = alpha * pred_score.pow(gamma) * (1 - label) + gt_score * label + loss = F.binary_cross_entropy( + pred_score, gt_score, weight=weight, reduction='sum') + return loss + + def _bbox_decode(self, anchor_points, pred_dist): + _, l, _ = get_static_shape(pred_dist) + pred_dist = F.softmax(pred_dist.reshape([-1, l, 4, self.reg_channels])) + pred_dist = self.proj_conv(pred_dist.transpose([0, 3, 1, 2])).squeeze(1) + return batch_distance2bbox(anchor_points, pred_dist) + + def _bbox_decode_fake(self, pred_dist): + _, l, _ = get_static_shape(pred_dist) + pred_dist_dfl = F.softmax( + pred_dist.reshape([-1, l, 4, self.reg_channels])) + pred_dist = self.proj_conv(pred_dist_dfl.transpose([0, 3, 1, 2 + ])).squeeze(1) + return pred_dist, pred_dist_dfl + + def _bbox2distance(self, points, bbox): + x1y1, x2y2 = paddle.split(bbox, 2, -1) + lt = points - x1y1 + rb = x2y2 - points + return paddle.concat([lt, rb], -1).clip(self.reg_range[0], + self.reg_range[1] - 1 - 0.01) + + def _df_loss(self, pred_dist, target, lower_bound=0): + target_left = paddle.cast(target.floor(), 'int64') + target_right = target_left + 1 + weight_left = target_right.astype('float32') - target + weight_right = 1 - weight_left + loss_left = F.cross_entropy( + pred_dist, target_left - lower_bound, + reduction='none') * weight_left + loss_right = F.cross_entropy( + pred_dist, target_right - lower_bound, + reduction='none') * weight_right + return (loss_left + loss_right).mean(-1, keepdim=True) + + def _bbox_loss(self, pred_dist, pred_bboxes, anchor_points, assigned_labels, + assigned_bboxes, assigned_scores, assigned_scores_sum): + # select positive samples mask + mask_positive = (assigned_labels != self.num_classes) + + if self.for_distill: + # only used 
for LD main_kd distill + self.distill_pairs['mask_positive_select'] = mask_positive + + num_pos = mask_positive.sum() + # pos/neg loss + if num_pos > 0: + # l1 + iou + bbox_mask = mask_positive.unsqueeze(-1).tile([1, 1, 4]) + pred_bboxes_pos = paddle.masked_select(pred_bboxes, + bbox_mask).reshape([-1, 4]) + assigned_bboxes_pos = paddle.masked_select( + assigned_bboxes, bbox_mask).reshape([-1, 4]) + bbox_weight = paddle.masked_select( + assigned_scores.sum(-1), mask_positive).unsqueeze(-1) + + loss_l1 = F.l1_loss(pred_bboxes_pos, assigned_bboxes_pos) + + loss_iou = self.iou_loss(pred_bboxes_pos, + assigned_bboxes_pos) * bbox_weight + loss_iou = loss_iou.sum() / assigned_scores_sum + + dist_mask = mask_positive.unsqueeze(-1).tile( + [1, 1, self.reg_channels * 4]) + pred_dist_pos = paddle.masked_select( + pred_dist, dist_mask).reshape([-1, 4, self.reg_channels]) + assigned_ltrb = self._bbox2distance(anchor_points, assigned_bboxes) + assigned_ltrb_pos = paddle.masked_select( + assigned_ltrb, bbox_mask).reshape([-1, 4]) + loss_dfl = self._df_loss(pred_dist_pos, assigned_ltrb_pos, + self.reg_range[0]) * bbox_weight + loss_dfl = loss_dfl.sum() / assigned_scores_sum + if self.for_distill: + self.distill_pairs['pred_bboxes_pos'] = pred_bboxes_pos + self.distill_pairs['pred_dist_pos'] = pred_dist_pos + self.distill_pairs['bbox_weight'] = bbox_weight + else: + loss_l1 = paddle.zeros([1]) + loss_iou = paddle.zeros([1]) + loss_dfl = pred_dist.sum() * 0. + return loss_l1, loss_iou, loss_dfl + + def get_loss(self, head_outs, gt_meta, aux_pred=None): + pred_scores, pred_distri, anchors,\ + anchor_points, num_anchors_list, stride_tensor = head_outs + + anchor_points_s = anchor_points / stride_tensor + pred_bboxes = self._bbox_decode(anchor_points_s, pred_distri) + + if aux_pred is not None: + pred_scores_aux = aux_pred[0] + pred_bboxes_aux = self._bbox_decode(anchor_points_s, aux_pred[1]) + + gt_labels = gt_meta['gt_class'] + gt_bboxes = gt_meta['gt_bbox'] + pad_gt_mask = gt_meta['pad_gt_mask'] + # label assignment + if gt_meta['epoch_id'] < self.static_assigner_epoch: + assigned_labels, assigned_bboxes, assigned_scores, mask_positive = \ + self.static_assigner( + anchors, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes, + pred_bboxes=pred_bboxes.detach() * stride_tensor) + alpha_l = 0.25 + else: + if self.sm_use: + # only used in smalldet of PPYOLOE-SOD model + assigned_labels, assigned_bboxes, assigned_scores, mask_positive = \ + self.assigner( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor, + anchor_points, + stride_tensor, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + else: + if aux_pred is None: + if not hasattr(self, "assigned_labels"): + assigned_labels, assigned_bboxes, assigned_scores, mask_positive = \ + self.assigner( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor, + anchor_points, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + if self.for_distill: + self.assigned_labels = assigned_labels + self.assigned_bboxes = assigned_bboxes + self.assigned_scores = assigned_scores + self.mask_positive = mask_positive + else: + # only used in distill + assigned_labels = self.assigned_labels + assigned_bboxes = self.assigned_bboxes + assigned_scores = self.assigned_scores + mask_positive = self.mask_positive + else: + assigned_labels, assigned_bboxes, assigned_scores, mask_positive = \ + self.assigner( + pred_scores_aux.detach(), + pred_bboxes_aux.detach() * 
stride_tensor, + anchor_points, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + alpha_l = -1 + # rescale bbox + assigned_bboxes /= stride_tensor + + assign_out_dict = self.get_loss_from_assign( + pred_scores, pred_distri, pred_bboxes, anchor_points_s, + assigned_labels, assigned_bboxes, assigned_scores, mask_positive, + alpha_l) + + if aux_pred is not None: + assign_out_dict_aux = self.get_loss_from_assign( + aux_pred[0], aux_pred[1], pred_bboxes_aux, anchor_points_s, + assigned_labels, assigned_bboxes, assigned_scores, + mask_positive, alpha_l) + loss = {} + for key in assign_out_dict.keys(): + loss[key] = assign_out_dict[key] + assign_out_dict_aux[key] + else: + loss = assign_out_dict + + return loss + + def get_loss_from_assign(self, pred_scores, pred_distri, pred_bboxes, + anchor_points_s, assigned_labels, assigned_bboxes, + assigned_scores, mask_positive, alpha_l): + # cls loss + if self.use_varifocal_loss: + one_hot_label = F.one_hot(assigned_labels, + self.num_classes + 1)[..., :-1] + loss_cls = self._varifocal_loss(pred_scores, assigned_scores, + one_hot_label) + else: + loss_cls = self._focal_loss(pred_scores, assigned_scores, alpha_l) + + assigned_scores_sum = assigned_scores.sum() + if paddle.distributed.get_world_size() > 1: + paddle.distributed.all_reduce(assigned_scores_sum) + assigned_scores_sum /= paddle.distributed.get_world_size() + assigned_scores_sum = paddle.clip(assigned_scores_sum, min=1.) + loss_cls /= assigned_scores_sum + + if self.for_distill: + self.distill_pairs['pred_cls_scores'] = pred_scores + self.distill_pairs['pos_num'] = assigned_scores_sum + self.distill_pairs['assigned_scores'] = assigned_scores + self.distill_pairs['mask_positive'] = mask_positive + one_hot_label = F.one_hot(assigned_labels, + self.num_classes + 1)[..., :-1] + self.distill_pairs['target_labels'] = one_hot_label + + loss_l1, loss_iou, loss_dfl = \ + self._bbox_loss(pred_distri, pred_bboxes, anchor_points_s, + assigned_labels, assigned_bboxes, assigned_scores, + assigned_scores_sum) + loss = self.loss_weight['class'] * loss_cls + \ + self.loss_weight['iou'] * loss_iou + \ + self.loss_weight['dfl'] * loss_dfl + out_dict = { + 'loss': loss, + 'loss_cls': loss_cls, + 'loss_iou': loss_iou, + 'loss_dfl': loss_dfl, + 'loss_l1': loss_l1, + } + return out_dict + + def post_process(self, head_outs, scale_factor): + pred_scores, pred_dist, anchor_points, stride_tensor = head_outs + pred_bboxes = batch_distance2bbox(anchor_points, pred_dist) + pred_bboxes *= stride_tensor + if self.exclude_post_process: + return paddle.concat( + [pred_bboxes, pred_scores.transpose([0, 2, 1])], + axis=-1), None, None + else: + # scale bbox to origin + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], + axis=-1).reshape([-1, 1, 4]) + pred_bboxes /= scale_factor + if self.exclude_nms: + # `exclude_nms=True` just use in benchmark + return pred_bboxes, pred_scores, None + else: + bbox_pred, bbox_num, nms_keep_idx = self.nms(pred_bboxes, + pred_scores) + return bbox_pred, bbox_num, nms_keep_idx + + +def get_activation(name="LeakyReLU"): + if name == "silu": + module = nn.Silu() + elif name == "relu": + module = nn.ReLU() + elif name in ["LeakyReLU", 'leakyrelu', 'lrelu']: + module = nn.LeakyReLU(0.1) + elif name is None: + module = nn.Identity() + else: + raise AttributeError("Unsupported act type: {}".format(name)) + return module + + +class ConvNormLayer(nn.Layer): + def __init__(self, + 
in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + norm_type='gn', + activation="LeakyReLU"): + super(ConvNormLayer, self).__init__() + assert norm_type in ['bn', 'sync_bn', 'syncbn', 'gn', None] + self.conv = nn.Conv2D( + in_channels, + out_channels, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias_attr=False, + weight_attr=ParamAttr(initializer=KaimingNormal())) + + if norm_type in ['bn', 'sync_bn', 'syncbn']: + self.norm = nn.BatchNorm2D(out_channels) + elif norm_type == 'gn': + self.norm = nn.GroupNorm(num_groups=32, num_channels=out_channels) + else: + self.norm = None + + self.act = get_activation(activation) + + def forward(self, x): + y = self.conv(x) + if self.norm is not None: + y = self.norm(y) + y = self.act(y) + return y + + +class ScaleReg(nn.Layer): + """ + Parameter for scaling the regression outputs. + """ + + def __init__(self, scale=1.0): + super(ScaleReg, self).__init__() + scale = paddle.to_tensor(scale) + self.scale = self.create_parameter( + shape=[1], + dtype='float32', + default_initializer=nn.initializer.Assign(scale)) + + def forward(self, x): + return x * self.scale + + +@register +class SimpleConvHead(nn.Layer): + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + feat_in=288, + feat_out=288, + num_convs=1, + fpn_strides=[32, 16, 8, 4], + norm_type='gn', + act='LeakyReLU', + prior_prob=0.01, + reg_max=16): + super(SimpleConvHead, self).__init__() + self.num_classes = num_classes + self.feat_in = feat_in + self.feat_out = feat_out + self.num_convs = num_convs + self.fpn_strides = fpn_strides + self.reg_max = reg_max + + self.cls_convs = nn.LayerList() + self.reg_convs = nn.LayerList() + for i in range(self.num_convs): + in_c = feat_in if i == 0 else feat_out + self.cls_convs.append( + ConvNormLayer( + in_c, + feat_out, + 3, + stride=1, + padding=1, + norm_type=norm_type, + activation=act)) + self.reg_convs.append( + ConvNormLayer( + in_c, + feat_out, + 3, + stride=1, + padding=1, + norm_type=norm_type, + activation=act)) + + bias_cls = bias_init_with_prob(prior_prob) + self.gfl_cls = nn.Conv2D( + feat_out, + self.num_classes, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=bias_cls))) + self.gfl_reg = nn.Conv2D( + feat_out, + 4 * (self.reg_max + 1), + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0))) + + self.scales = nn.LayerList() + for i in range(len(self.fpn_strides)): + self.scales.append(ScaleReg(1.0)) + + def forward(self, feats): + cls_scores = [] + bbox_preds = [] + for x, scale in zip(feats, self.scales): + cls_feat = x + reg_feat = x + for cls_conv in self.cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in self.reg_convs: + reg_feat = reg_conv(reg_feat) + + cls_score = self.gfl_cls(cls_feat) + cls_score = F.sigmoid(cls_score) + cls_score = cls_score.flatten(2).transpose([0, 2, 1]) + cls_scores.append(cls_score) + + bbox_pred = scale(self.gfl_reg(reg_feat)) + bbox_pred = bbox_pred.flatten(2).transpose([0, 2, 1]) + bbox_preds.append(bbox_pred) + + cls_scores = paddle.concat(cls_scores, axis=1) + bbox_preds = paddle.concat(bbox_preds, axis=1) + return cls_scores, bbox_preds diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_r_head.py 
b/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_r_head.py new file mode 100644 index 0000000000000000000000000000000000000000..e7cf772f56991152f138dfbf7f5297d01c0e0b0f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/ppyoloe_r_head.py @@ -0,0 +1,425 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register + +from ..losses import ProbIoULoss +from ..initializer import bias_init_with_prob, constant_, normal_, vector_ +from ppdet.modeling.backbones.cspresnet import ConvBNLayer +from ppdet.modeling.ops import get_static_shape, get_act_fn, anchor_generator +from ppdet.modeling.layers import MultiClassNMS + +__all__ = ['PPYOLOERHead'] + + +class ESEAttn(nn.Layer): + def __init__(self, feat_channels, act='swish'): + super(ESEAttn, self).__init__() + self.fc = nn.Conv2D(feat_channels, feat_channels, 1) + self.conv = ConvBNLayer(feat_channels, feat_channels, 1, act=act) + + self._init_weights() + + def _init_weights(self): + normal_(self.fc.weight, std=0.01) + + def forward(self, feat, avg_feat): + weight = F.sigmoid(self.fc(avg_feat)) + return self.conv(feat * weight) + + +@register +class PPYOLOERHead(nn.Layer): + __shared__ = ['num_classes', 'trt', 'export_onnx'] + __inject__ = ['static_assigner', 'assigner', 'nms'] + + def __init__(self, + in_channels=[1024, 512, 256], + num_classes=15, + act='swish', + fpn_strides=(32, 16, 8), + grid_cell_offset=0.5, + angle_max=90, + use_varifocal_loss=True, + static_assigner_epoch=4, + trt=False, + export_onnx=False, + static_assigner='ATSSAssigner', + assigner='TaskAlignedAssigner', + nms='MultiClassNMS', + loss_weight={'class': 1.0, + 'iou': 2.5, + 'dfl': 0.05}): + super(PPYOLOERHead, self).__init__() + assert len(in_channels) > 0, "len(in_channels) should > 0" + self.in_channels = in_channels + self.num_classes = num_classes + self.fpn_strides = fpn_strides + self.grid_cell_offset = grid_cell_offset + self.angle_max = angle_max + self.loss_weight = loss_weight + self.use_varifocal_loss = use_varifocal_loss + self.half_pi = paddle.to_tensor( + [1.5707963267948966], dtype=paddle.float32) + self.half_pi_bin = self.half_pi / angle_max + self.iou_loss = ProbIoULoss() + self.static_assigner_epoch = static_assigner_epoch + self.static_assigner = static_assigner + self.assigner = assigner + self.nms = nms + # stem + self.stem_cls = nn.LayerList() + self.stem_reg = nn.LayerList() + self.stem_angle = nn.LayerList() + trt = False if export_onnx else trt + self.export_onnx = export_onnx + act = get_act_fn( + act, trt=trt) if act is None or isinstance(act, + (str, dict)) else act + self.trt = trt + for in_c in self.in_channels: + self.stem_cls.append(ESEAttn(in_c, act=act)) + self.stem_reg.append(ESEAttn(in_c, act=act)) + self.stem_angle.append(ESEAttn(in_c, act=act)) + # pred head + self.pred_cls = nn.LayerList() + self.pred_reg = nn.LayerList() + self.pred_angle = 
nn.LayerList() + for in_c in self.in_channels: + self.pred_cls.append( + nn.Conv2D( + in_c, self.num_classes, 3, padding=1)) + self.pred_reg.append(nn.Conv2D(in_c, 4, 3, padding=1)) + self.pred_angle.append( + nn.Conv2D( + in_c, self.angle_max + 1, 3, padding=1)) + self.angle_proj_conv = nn.Conv2D( + self.angle_max + 1, 1, 1, bias_attr=False) + self._init_weights() + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + def _init_weights(self): + bias_cls = bias_init_with_prob(0.01) + bias_angle = [10.] + [1.] * self.angle_max + for cls_, reg_, angle_ in zip(self.pred_cls, self.pred_reg, + self.pred_angle): + normal_(cls_.weight, std=0.01) + constant_(cls_.bias, bias_cls) + normal_(reg_.weight, std=0.01) + constant_(reg_.bias) + constant_(angle_.weight) + vector_(angle_.bias, bias_angle) + + angle_proj = paddle.linspace(0, self.angle_max, self.angle_max + 1) + self.angle_proj = angle_proj * self.half_pi_bin + self.angle_proj_conv.weight.set_value( + self.angle_proj.reshape([1, self.angle_max + 1, 1, 1])) + self.angle_proj_conv.weight.stop_gradient = True + + def _generate_anchors(self, feats): + if self.trt: + anchor_points = [] + for feat, stride in zip(feats, self.fpn_strides): + _, _, h, w = paddle.shape(feat) + anchor, _ = anchor_generator( + feat, + stride * 4, + 1.0, [1.0, 1.0, 1.0, 1.0], [stride, stride], + offset=0.5) + x1, y1, x2, y2 = paddle.split(anchor, 4, axis=-1) + xc = (x1 + x2 + 1) / 2 + yc = (y1 + y2 + 1) / 2 + anchor_point = paddle.concat( + [xc, yc], axis=-1).reshape((1, h * w, 2)) + anchor_points.append(anchor_point) + anchor_points = paddle.concat(anchor_points, axis=1) + return anchor_points, None, None + else: + anchor_points = [] + stride_tensor = [] + num_anchors_list = [] + for feat, stride in zip(feats, self.fpn_strides): + _, _, h, w = paddle.shape(feat) + shift_x = (paddle.arange(end=w) + 0.5) * stride + shift_y = (paddle.arange(end=h) + 0.5) * stride + shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) + anchor_point = paddle.cast( + paddle.stack( + [shift_x, shift_y], axis=-1), dtype='float32') + anchor_points.append(anchor_point.reshape([1, -1, 2])) + stride_tensor.append( + paddle.full( + [1, h * w, 1], stride, dtype='float32')) + num_anchors_list.append(h * w) + anchor_points = paddle.concat(anchor_points, axis=1) + stride_tensor = paddle.concat(stride_tensor, axis=1) + return anchor_points, stride_tensor, num_anchors_list + + def forward(self, feats, targets=None): + assert len(feats) == len(self.fpn_strides), \ + "The size of feats is not equal to size of fpn_strides" + + if self.training: + return self.forward_train(feats, targets) + else: + return self.forward_eval(feats) + + def forward_train(self, feats, targets): + anchor_points, stride_tensor, num_anchors_list = self._generate_anchors( + feats) + + cls_score_list, reg_dist_list, reg_angle_list = [], [], [] + for i, feat in enumerate(feats): + avg_feat = F.adaptive_avg_pool2d(feat, (1, 1)) + cls_logit = self.pred_cls[i](self.stem_cls[i](feat, avg_feat) + + feat) + reg_dist = self.pred_reg[i](self.stem_reg[i](feat, avg_feat)) + reg_angle = self.pred_angle[i](self.stem_angle[i](feat, avg_feat)) + # cls and reg + cls_score = F.sigmoid(cls_logit) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + reg_dist_list.append(reg_dist.flatten(2).transpose([0, 2, 1])) + reg_angle_list.append(reg_angle.flatten(2).transpose([0, 2, 1])) + cls_score_list = paddle.concat(cls_score_list, axis=1) + reg_dist_list = 
paddle.concat(reg_dist_list, axis=1) + reg_angle_list = paddle.concat(reg_angle_list, axis=1) + + return self.get_loss([ + cls_score_list, reg_dist_list, reg_angle_list, anchor_points, + num_anchors_list, stride_tensor + ], targets) + + def forward_eval(self, feats): + cls_score_list, reg_box_list = [], [] + anchor_points, _, _ = self._generate_anchors(feats) + for i, (feat, stride) in enumerate(zip(feats, self.fpn_strides)): + b, _, h, w = paddle.shape(feat) + l = h * w + # cls + avg_feat = F.adaptive_avg_pool2d(feat, (1, 1)) + cls_logit = self.pred_cls[i](self.stem_cls[i](feat, avg_feat) + + feat) + # reg + reg_dist = self.pred_reg[i](self.stem_reg[i](feat, avg_feat)) + reg_xy, reg_wh = paddle.split(reg_dist, 2, axis=1) + reg_xy = reg_xy * stride + reg_wh = (F.elu(reg_wh) + 1.) * stride + reg_angle = self.pred_angle[i](self.stem_angle[i](feat, avg_feat)) + reg_angle = self.angle_proj_conv(F.softmax(reg_angle, axis=1)) + reg_box = paddle.concat([reg_xy, reg_wh, reg_angle], axis=1) + # cls and reg + cls_score = F.sigmoid(cls_logit) + cls_score_list.append(cls_score.reshape([b, self.num_classes, l])) + reg_box_list.append(reg_box.reshape([b, 5, l])) + + cls_score_list = paddle.concat(cls_score_list, axis=-1) + reg_box_list = paddle.concat(reg_box_list, axis=-1).transpose([0, 2, 1]) + reg_xy, reg_wha = paddle.split(reg_box_list, [2, 3], axis=-1) + reg_xy = reg_xy + anchor_points + reg_box_list = paddle.concat([reg_xy, reg_wha], axis=-1) + return cls_score_list, reg_box_list + + def _bbox_decode(self, points, pred_dist, pred_angle, stride_tensor): + # predict vector to x, y, w, h, angle + b, l = pred_angle.shape[:2] + xy, wh = paddle.split(pred_dist, 2, axis=-1) + xy = xy * stride_tensor + points + wh = (F.elu(wh) + 1.) * stride_tensor + angle = F.softmax(pred_angle.reshape([b, l, 1, self.angle_max + 1 + ])).matmul(self.angle_proj) + return paddle.concat([xy, wh, angle], axis=-1) + + def get_loss(self, head_outs, gt_meta): + pred_scores, pred_dist, pred_angle, \ + anchor_points, num_anchors_list, stride_tensor = head_outs + # [B, N, 5] -> [B, N, 5] + pred_bboxes = self._bbox_decode(anchor_points, pred_dist, pred_angle, + stride_tensor) + gt_labels = gt_meta['gt_class'] + # [B, N, 5] + gt_bboxes = gt_meta['gt_rbox'] + pad_gt_mask = gt_meta['pad_gt_mask'] + # label assignment + if gt_meta['epoch_id'] < self.static_assigner_epoch: + assigned_labels, assigned_bboxes, assigned_scores = \ + self.static_assigner( + anchor_points, + stride_tensor, + num_anchors_list, + gt_labels, + gt_meta['gt_bbox'], + gt_bboxes, + pad_gt_mask, + self.num_classes, + pred_bboxes.detach() + ) + else: + assigned_labels, assigned_bboxes, assigned_scores = \ + self.assigner( + pred_scores.detach(), + pred_bboxes.detach(), + anchor_points, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + alpha_l = -1 + # cls loss + if self.use_varifocal_loss: + one_hot_label = F.one_hot(assigned_labels, + self.num_classes + 1)[..., :-1] + loss_cls = self._varifocal_loss(pred_scores, assigned_scores, + one_hot_label) + else: + loss_cls = self._focal_loss(pred_scores, assigned_scores, alpha_l) + + assigned_scores_sum = assigned_scores.sum() + if paddle.distributed.get_world_size() > 1: + paddle.distributed.all_reduce(assigned_scores_sum) + assigned_scores_sum = paddle.clip( + assigned_scores_sum / paddle.distributed.get_world_size(), + min=1.) + else: + assigned_scores_sum = paddle.clip(assigned_scores_sum, min=1.) 
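+ # Note: assigned_scores_sum acts as the loss normalizer. With multiple
+ # GPUs it is all-reduced and divided by the world size, so every rank
+ # normalizes by the same mean sum of soft positive scores; the clip to
+ # >= 1 guards against a batch with no positive assignments.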
+ loss_cls /= assigned_scores_sum + + loss_iou, loss_dfl = self._bbox_loss(pred_angle, pred_bboxes, + anchor_points, assigned_labels, + assigned_bboxes, assigned_scores, + assigned_scores_sum, stride_tensor) + + loss = self.loss_weight['class'] * loss_cls + \ + self.loss_weight['iou'] * loss_iou + \ + self.loss_weight['dfl'] * loss_dfl + out_dict = { + 'loss': loss, + 'loss_cls': loss_cls, + 'loss_iou': loss_iou, + 'loss_dfl': loss_dfl + } + return out_dict + + @staticmethod + def _focal_loss(score, label, alpha=0.25, gamma=2.0): + weight = (score - label).pow(gamma) + if alpha > 0: + alpha_t = alpha * label + (1 - alpha) * (1 - label) + weight *= alpha_t + loss = F.binary_cross_entropy( + score, label, weight=weight, reduction='sum') + return loss + + @staticmethod + def _varifocal_loss(pred_score, gt_score, label, alpha=0.75, gamma=2.0): + weight = alpha * pred_score.pow(gamma) * (1 - label) + gt_score * label + loss = F.binary_cross_entropy( + pred_score, gt_score, weight=weight, reduction='sum') + return loss + + @staticmethod + def _df_loss(pred_dist, target): + target_left = paddle.cast(target, 'int64') + target_right = target_left + 1 + weight_left = target_right.astype('float32') - target + weight_right = 1 - weight_left + loss_left = F.cross_entropy( + pred_dist, target_left, reduction='none') * weight_left + loss_right = F.cross_entropy( + pred_dist, target_right, reduction='none') * weight_right + return (loss_left + loss_right).mean(-1, keepdim=True) + + def _bbox_loss(self, pred_angle, pred_bboxes, anchor_points, + assigned_labels, assigned_bboxes, assigned_scores, + assigned_scores_sum, stride_tensor): + # select positive samples mask + mask_positive = (assigned_labels != self.num_classes) + num_pos = mask_positive.sum() + # pos/neg loss + if num_pos > 0: + # iou + bbox_mask = mask_positive.unsqueeze(-1).tile([1, 1, 5]) + pred_bboxes_pos = paddle.masked_select(pred_bboxes, + bbox_mask).reshape([-1, 5]) + assigned_bboxes_pos = paddle.masked_select( + assigned_bboxes, bbox_mask).reshape([-1, 5]) + bbox_weight = paddle.masked_select( + assigned_scores.sum(-1), mask_positive).reshape([-1]) + + loss_iou = self.iou_loss(pred_bboxes_pos, + assigned_bboxes_pos) * bbox_weight + loss_iou = loss_iou.sum() / assigned_scores_sum + + # dfl + angle_mask = mask_positive.unsqueeze(-1).tile( + [1, 1, self.angle_max + 1]) + pred_angle_pos = paddle.masked_select( + pred_angle, angle_mask).reshape([-1, self.angle_max + 1]) + assigned_angle_pos = ( + assigned_bboxes_pos[:, 4] / + self.half_pi_bin).clip(0, self.angle_max - 0.01) + loss_dfl = self._df_loss(pred_angle_pos, assigned_angle_pos) + else: + loss_iou = pred_bboxes.sum() * 0. 
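+ # The zero-multiplied sum above keeps pred_bboxes attached to the
+ # autograd graph, so the regression branch still receives (all-zero)
+ # gradients when no anchor is assigned positive, which sidesteps
+ # unused-parameter issues under distributed training; loss_dfl below
+ # is a plain detached zero.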
+ loss_dfl = paddle.zeros([1]) + + return loss_iou, loss_dfl + + def _box2corners(self, pred_bboxes): + """ convert (x, y, w, h, angle) to (x1, y1, x2, y2, x3, y3, x4, y4) + + Args: + pred_bboxes (Tensor): [B, N, 5] + + Returns: + polys (Tensor): [B, N, 8] + """ + x, y, w, h, angle = paddle.split(pred_bboxes, 5, axis=-1) + cos_a_half = paddle.cos(angle) * 0.5 + sin_a_half = paddle.sin(angle) * 0.5 + w_x = cos_a_half * w + w_y = sin_a_half * w + h_x = -sin_a_half * h + h_y = cos_a_half * h + return paddle.concat( + [ + x + w_x + h_x, y + w_y + h_y, x - w_x + h_x, y - w_y + h_y, + x - w_x - h_x, y - w_y - h_y, x + w_x - h_x, y + w_y - h_y + ], + axis=-1) + + def post_process(self, head_outs, scale_factor): + pred_scores, pred_bboxes = head_outs + # [B, N, 5] -> [B, N, 8] + pred_bboxes = self._box2corners(pred_bboxes) + # scale bbox to origin + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [ + scale_x, scale_y, scale_x, scale_y, scale_x, scale_y, scale_x, + scale_y + ], + axis=-1).reshape([-1, 1, 8]) + pred_bboxes /= scale_factor + if self.export_onnx: + return pred_bboxes, pred_scores, None + bbox_pred, bbox_num, nms_keep_idx = self.nms(pred_bboxes, + pred_scores) + return bbox_pred, bbox_num, nms_keep_idx diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/retina_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/retina_head.py new file mode 100644 index 0000000000000000000000000000000000000000..67a51265d1decee3d3077a504771f5be050101f3 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/retina_head.py @@ -0,0 +1,278 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Normal, Constant +from ppdet.modeling.bbox_utils import bbox2delta, delta2bbox +from ppdet.modeling.heads.fcos_head import FCOSFeat + +from ppdet.core.workspace import register + +__all__ = ['RetinaHead'] + + +@register +class RetinaFeat(FCOSFeat): + """We use FCOSFeat to construct conv layers in RetinaNet. + We rename FCOSFeat to RetinaFeat to avoid confusion. 
+ """ + pass + + +@register +class RetinaHead(nn.Layer): + """Used in RetinaNet proposed in paper https://arxiv.org/pdf/1708.02002.pdf + """ + __shared__ = ['num_classes'] + __inject__ = [ + 'conv_feat', 'anchor_generator', 'bbox_assigner', 'loss_class', + 'loss_bbox', 'nms' + ] + + def __init__(self, + num_classes=80, + conv_feat='RetinaFeat', + anchor_generator='RetinaAnchorGenerator', + bbox_assigner='MaxIoUAssigner', + loss_class='FocalLoss', + loss_bbox='SmoothL1Loss', + nms='MultiClassNMS', + prior_prob=0.01, + nms_pre=1000, + weights=[1., 1., 1., 1.]): + super(RetinaHead, self).__init__() + self.num_classes = num_classes + self.conv_feat = conv_feat + self.anchor_generator = anchor_generator + self.bbox_assigner = bbox_assigner + self.loss_class = loss_class + self.loss_bbox = loss_bbox + self.nms = nms + self.nms_pre = nms_pre + self.weights = weights + + bias_init_value = -math.log((1 - prior_prob) / prior_prob) + num_anchors = self.anchor_generator.num_anchors + self.retina_cls = nn.Conv2D( + in_channels=self.conv_feat.feat_out, + out_channels=self.num_classes * num_anchors, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=bias_init_value))) + self.retina_reg = nn.Conv2D( + in_channels=self.conv_feat.feat_out, + out_channels=4 * num_anchors, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0))) + + def forward(self, neck_feats, targets=None): + cls_logits_list = [] + bboxes_reg_list = [] + for neck_feat in neck_feats: + conv_cls_feat, conv_reg_feat = self.conv_feat(neck_feat) + cls_logits = self.retina_cls(conv_cls_feat) + bbox_reg = self.retina_reg(conv_reg_feat) + cls_logits_list.append(cls_logits) + bboxes_reg_list.append(bbox_reg) + + if self.training: + return self.get_loss([cls_logits_list, bboxes_reg_list], targets) + else: + return [cls_logits_list, bboxes_reg_list] + + def get_loss(self, head_outputs, targets): + """Here we calculate loss for a batch of images. + We assign anchors to gts in each image and gather all the assigned + postive and negative samples. Then loss is calculated on the gathered + samples. 
+ """ + cls_logits_list, bboxes_reg_list = head_outputs + anchors = self.anchor_generator(cls_logits_list) + anchors = paddle.concat(anchors) + + # matches: contain gt_inds + # match_labels: -1(ignore), 0(neg) or 1(pos) + matches_list, match_labels_list = [], [] + # assign anchors to gts, no sampling is involved + for gt_bbox in targets['gt_bbox']: + matches, match_labels = self.bbox_assigner(anchors, gt_bbox) + matches_list.append(matches) + match_labels_list.append(match_labels) + + # reshape network outputs + cls_logits = [ + _.transpose([0, 2, 3, 1]).reshape([0, -1, self.num_classes]) + for _ in cls_logits_list + ] + bboxes_reg = [ + _.transpose([0, 2, 3, 1]).reshape([0, -1, 4]) + for _ in bboxes_reg_list + ] + cls_logits = paddle.concat(cls_logits, axis=1) + bboxes_reg = paddle.concat(bboxes_reg, axis=1) + + cls_pred_list, cls_tar_list = [], [] + reg_pred_list, reg_tar_list = [], [] + # find and gather preds and targets in each image + for matches, match_labels, cls_logit, bbox_reg, gt_bbox, gt_class in \ + zip(matches_list, match_labels_list, cls_logits, bboxes_reg, + targets['gt_bbox'], targets['gt_class']): + pos_mask = (match_labels == 1) + neg_mask = (match_labels == 0) + chosen_mask = paddle.logical_or(pos_mask, neg_mask) + + gt_class = gt_class.reshape([-1]) + bg_class = paddle.to_tensor( + [self.num_classes], dtype=gt_class.dtype) + # a trick to assign num_classes to negative targets + gt_class = paddle.concat([gt_class, bg_class], axis=-1) + matches = paddle.where(neg_mask, + paddle.full_like(matches, gt_class.size - 1), + matches) + + cls_pred = cls_logit[chosen_mask] + cls_tar = gt_class[matches[chosen_mask]] + reg_pred = bbox_reg[pos_mask].reshape([-1, 4]) + reg_tar = gt_bbox[matches[pos_mask]].reshape([-1, 4]) + reg_tar = bbox2delta(anchors[pos_mask], reg_tar, self.weights) + cls_pred_list.append(cls_pred) + cls_tar_list.append(cls_tar) + reg_pred_list.append(reg_pred) + reg_tar_list.append(reg_tar) + cls_pred = paddle.concat(cls_pred_list) + cls_tar = paddle.concat(cls_tar_list) + reg_pred = paddle.concat(reg_pred_list) + reg_tar = paddle.concat(reg_tar_list) + + avg_factor = max(1.0, reg_pred.shape[0]) + cls_loss = self.loss_class( + cls_pred, cls_tar, reduction='sum') / avg_factor + + if reg_pred.shape[0] == 0: + reg_loss = paddle.zeros([1]) + reg_loss.stop_gradient = False + else: + reg_loss = self.loss_bbox( + reg_pred, reg_tar, reduction='sum') / avg_factor + + loss = cls_loss + reg_loss + out_dict = { + 'loss_cls': cls_loss, + 'loss_reg': reg_loss, + 'loss': loss, + } + return out_dict + + def get_bboxes_single(self, + anchors, + cls_scores_list, + bbox_preds_list, + im_shape, + scale_factor, + rescale=True): + assert len(cls_scores_list) == len(bbox_preds_list) + mlvl_bboxes = [] + mlvl_scores = [] + for anchor, cls_score, bbox_pred in zip(anchors, cls_scores_list, + bbox_preds_list): + cls_score = cls_score.reshape([-1, self.num_classes]) + bbox_pred = bbox_pred.reshape([-1, 4]) + if self.nms_pre is not None and cls_score.shape[0] > self.nms_pre: + max_score = cls_score.max(axis=1) + _, topk_inds = max_score.topk(self.nms_pre) + bbox_pred = bbox_pred.gather(topk_inds) + anchor = anchor.gather(topk_inds) + cls_score = cls_score.gather(topk_inds) + bbox_pred = delta2bbox(bbox_pred, anchor, self.weights).squeeze() + mlvl_bboxes.append(bbox_pred) + mlvl_scores.append(F.sigmoid(cls_score)) + mlvl_bboxes = paddle.concat(mlvl_bboxes) + mlvl_bboxes = paddle.squeeze(mlvl_bboxes) + if rescale: + mlvl_bboxes = mlvl_bboxes / paddle.concat( + [scale_factor[::-1], 
scale_factor[::-1]]) + mlvl_scores = paddle.concat(mlvl_scores) + mlvl_scores = mlvl_scores.transpose([1, 0]) + return mlvl_bboxes, mlvl_scores + + def decode(self, anchors, cls_logits, bboxes_reg, im_shape, scale_factor): + batch_bboxes = [] + batch_scores = [] + for img_id in range(cls_logits[0].shape[0]): + num_lvls = len(cls_logits) + cls_scores_list = [cls_logits[i][img_id] for i in range(num_lvls)] + bbox_preds_list = [bboxes_reg[i][img_id] for i in range(num_lvls)] + bboxes, scores = self.get_bboxes_single( + anchors, cls_scores_list, bbox_preds_list, im_shape[img_id], + scale_factor[img_id]) + batch_bboxes.append(bboxes) + batch_scores.append(scores) + batch_bboxes = paddle.stack(batch_bboxes, axis=0) + batch_scores = paddle.stack(batch_scores, axis=0) + return batch_bboxes, batch_scores + + def post_process(self, head_outputs, im_shape, scale_factor): + cls_logits_list, bboxes_reg_list = head_outputs + anchors = self.anchor_generator(cls_logits_list) + cls_logits = [_.transpose([0, 2, 3, 1]) for _ in cls_logits_list] + bboxes_reg = [_.transpose([0, 2, 3, 1]) for _ in bboxes_reg_list] + bboxes, scores = self.decode(anchors, cls_logits, bboxes_reg, im_shape, + scale_factor) + + bbox_pred, bbox_num, nms_keep_idx = self.nms(bboxes, scores) + return bbox_pred, bbox_num, nms_keep_idx + + + def get_scores_single(self, cls_scores_list): + mlvl_logits = [] + for cls_score in cls_scores_list: + cls_score = cls_score.reshape([-1, self.num_classes]) + if self.nms_pre is not None and cls_score.shape[0] > self.nms_pre: + max_score = cls_score.max(axis=1) + _, topk_inds = max_score.topk(self.nms_pre) + cls_score = cls_score.gather(topk_inds) + + mlvl_logits.append(cls_score) + + mlvl_logits = paddle.concat(mlvl_logits) + mlvl_logits = mlvl_logits.transpose([1, 0]) + + return mlvl_logits + + def decode_cls_logits(self, cls_logits_list): + cls_logits = [_.transpose([0, 2, 3, 1]) for _ in cls_logits_list] + batch_logits = [] + for img_id in range(cls_logits[0].shape[0]): + num_lvls = len(cls_logits) + cls_scores_list = [cls_logits[i][img_id] for i in range(num_lvls)] + logits = self.get_scores_single(cls_scores_list) + batch_logits.append(logits) + batch_logits = paddle.stack(batch_logits, axis=0) + return batch_logits + diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/roi_extractor.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/roi_extractor.py new file mode 100644 index 0000000000000000000000000000000000000000..6c2f5c81904bf1e55e7799c78a76edc2034e447b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/roi_extractor.py @@ -0,0 +1,118 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
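+# Note on multi-level RoI extraction: each RoI is pooled from a single FPN
+# level. Assuming distribute_fpn_proposals follows the FPN-paper heuristic,
+# a box of area w*h is routed to level
+#     k = floor(canconical_level + log2(sqrt(w*h) / canonical_size)),
+# clipped to [start_level, end_level]; see RoIAlign.forward below.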
+ +import paddle +from ppdet.core.workspace import register +from ppdet.modeling import ops +import paddle.nn as nn + + +def _to_list(v): + if not isinstance(v, (list, tuple)): + return [v] + return v + + +@register +class RoIAlign(nn.Layer): + """ + RoI Align module + + For more details, please refer to the document of roi_align in + in https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/vision/ops.py + + Args: + resolution (int): The output size, default 14 + spatial_scale (float): Multiplicative spatial scale factor to translate + ROI coords from their input scale to the scale used when pooling. + default 0.0625 + sampling_ratio (int): The number of sampling points in the interpolation + grid, default 0 + canconical_level (int): The referring level of FPN layer with + specified level. default 4 + canonical_size (int): The referring scale of FPN layer with + specified scale. default 224 + start_level (int): The start level of FPN layer to extract RoI feature, + default 0 + end_level (int): The end level of FPN layer to extract RoI feature, + default 3 + aligned (bool): Whether to add offset to rois' coord in roi_align. + default false + """ + + def __init__(self, + resolution=14, + spatial_scale=0.0625, + sampling_ratio=0, + canconical_level=4, + canonical_size=224, + start_level=0, + end_level=3, + aligned=False): + super(RoIAlign, self).__init__() + self.resolution = resolution + self.spatial_scale = _to_list(spatial_scale) + self.sampling_ratio = sampling_ratio + self.canconical_level = canconical_level + self.canonical_size = canonical_size + self.start_level = start_level + self.end_level = end_level + self.aligned = aligned + + @classmethod + def from_config(cls, cfg, input_shape): + return {'spatial_scale': [1. / i.stride for i in input_shape]} + + def forward(self, feats, roi, rois_num): + roi = paddle.concat(roi) if len(roi) > 1 else roi[0] + if len(feats) == 1: + rois_feat = paddle.vision.ops.roi_align( + x=feats[self.start_level], + boxes=roi, + boxes_num=rois_num, + output_size=self.resolution, + spatial_scale=self.spatial_scale[0], + aligned=self.aligned) + else: + offset = 2 + k_min = self.start_level + offset + k_max = self.end_level + offset + if hasattr(paddle.vision.ops, "distribute_fpn_proposals"): + distribute_fpn_proposals = getattr(paddle.vision.ops, + "distribute_fpn_proposals") + else: + distribute_fpn_proposals = ops.distribute_fpn_proposals + rois_dist, restore_index, rois_num_dist = distribute_fpn_proposals( + roi, + k_min, + k_max, + self.canconical_level, + self.canonical_size, + rois_num=rois_num) + + rois_feat_list = [] + for lvl in range(self.start_level, self.end_level + 1): + roi_feat = paddle.vision.ops.roi_align( + x=feats[lvl], + boxes=rois_dist[lvl], + boxes_num=rois_num_dist[lvl], + output_size=self.resolution, + spatial_scale=self.spatial_scale[lvl], + sampling_ratio=self.sampling_ratio, + aligned=self.aligned) + rois_feat_list.append(roi_feat) + rois_feat_shuffle = paddle.concat(rois_feat_list) + rois_feat = paddle.gather(rois_feat_shuffle, restore_index) + + return rois_feat diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/s2anet_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/s2anet_head.py new file mode 100644 index 0000000000000000000000000000000000000000..8abddcff13540fa06a2a7374ed247ebed0915817 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/s2anet_head.py @@ -0,0 +1,745 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# The code is based on https://github.com/csuhan/s2anet/blob/master/mmdet/models/anchor_heads_rotated/s2anet_head.py + +import paddle +from paddle import ParamAttr +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal, Constant +from ppdet.core.workspace import register +from ppdet.modeling.proposal_generator.target_layer import RBoxAssigner +from ppdet.modeling.proposal_generator.anchor_generator import S2ANetAnchorGenerator +from ppdet.modeling.layers import AlignConv +from ..cls_utils import _get_class_default_kwargs +import numpy as np + + +@register +class S2ANetHead(nn.Layer): + """ + S2Anet head + Args: + stacked_convs (int): number of stacked_convs + feat_in (int): input channels of feat + feat_out (int): output channels of feat + num_classes (int): num_classes + anchor_strides (list): stride of anchors + anchor_scales (list): scale of anchors + anchor_ratios (list): ratios of anchors + target_means (list): target_means + target_stds (list): target_stds + align_conv_type (str): align_conv_type ['Conv', 'AlignConv'] + align_conv_size (int): kernel size of align_conv + use_sigmoid_cls (bool): use sigmoid_cls or not + reg_loss_weight (list): loss weight for regression + """ + __shared__ = ['num_classes'] + __inject__ = ['anchor_assign', 'nms'] + + def __init__(self, + stacked_convs=2, + feat_in=256, + feat_out=256, + num_classes=15, + anchor_strides=[8, 16, 32, 64, 128], + anchor_scales=[4], + anchor_ratios=[1.0], + target_means=0.0, + target_stds=1.0, + align_conv_type='AlignConv', + align_conv_size=3, + use_sigmoid_cls=True, + anchor_assign=_get_class_default_kwargs(RBoxAssigner), + reg_loss_weight=[1.0, 1.0, 1.0, 1.0, 1.1], + cls_loss_weight=[1.1, 1.05], + reg_loss_type='l1', + nms_pre=2000, + nms='MultiClassNMS'): + super(S2ANetHead, self).__init__() + self.stacked_convs = stacked_convs + self.feat_in = feat_in + self.feat_out = feat_out + self.anchor_list = None + self.anchor_scales = anchor_scales + self.anchor_ratios = anchor_ratios + self.anchor_strides = anchor_strides + self.anchor_strides = paddle.to_tensor(anchor_strides) + self.anchor_base_sizes = list(anchor_strides) + self.means = paddle.ones(shape=[5]) * target_means + self.stds = paddle.ones(shape=[5]) * target_stds + assert align_conv_type in ['AlignConv', 'Conv', 'DCN'] + self.align_conv_type = align_conv_type + self.align_conv_size = align_conv_size + + self.use_sigmoid_cls = use_sigmoid_cls + self.cls_out_channels = num_classes if self.use_sigmoid_cls else num_classes + 1 + self.sampling = False + self.anchor_assign = anchor_assign + self.reg_loss_weight = reg_loss_weight + self.cls_loss_weight = cls_loss_weight + self.alpha = 1.0 + self.beta = 1.0 + self.reg_loss_type = reg_loss_type + self.nms_pre = nms_pre + self.nms = nms + self.fake_bbox = paddle.to_tensor( + np.array( + [[-1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], + dtype='float32')) + self.fake_bbox_num = paddle.to_tensor(np.array([1], dtype='int32')) + + # 
anchor + self.anchor_generators = [] + for anchor_base in self.anchor_base_sizes: + self.anchor_generators.append( + S2ANetAnchorGenerator(anchor_base, anchor_scales, + anchor_ratios)) + + self.anchor_generators = nn.LayerList(self.anchor_generators) + self.fam_cls_convs = nn.Sequential() + self.fam_reg_convs = nn.Sequential() + + for i in range(self.stacked_convs): + chan_in = self.feat_in if i == 0 else self.feat_out + + self.fam_cls_convs.add_sublayer( + 'fam_cls_conv_{}'.format(i), + nn.Conv2D( + in_channels=chan_in, + out_channels=self.feat_out, + kernel_size=3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0)))) + + self.fam_cls_convs.add_sublayer('fam_cls_conv_{}_act'.format(i), + nn.ReLU()) + + self.fam_reg_convs.add_sublayer( + 'fam_reg_conv_{}'.format(i), + nn.Conv2D( + in_channels=chan_in, + out_channels=self.feat_out, + kernel_size=3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0)))) + + self.fam_reg_convs.add_sublayer('fam_reg_conv_{}_act'.format(i), + nn.ReLU()) + + self.fam_reg = nn.Conv2D( + self.feat_out, + 5, + 1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0))) + prior_prob = 0.01 + bias_init = float(-np.log((1 - prior_prob) / prior_prob)) + self.fam_cls = nn.Conv2D( + self.feat_out, + self.cls_out_channels, + 1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(bias_init))) + + if self.align_conv_type == "AlignConv": + self.align_conv = AlignConv(self.feat_out, self.feat_out, + self.align_conv_size) + elif self.align_conv_type == "Conv": + self.align_conv = nn.Conv2D( + self.feat_out, + self.feat_out, + self.align_conv_size, + padding=(self.align_conv_size - 1) // 2, + bias_attr=ParamAttr(initializer=Constant(0))) + + elif self.align_conv_type == "DCN": + self.align_conv_offset = nn.Conv2D( + self.feat_out, + 2 * self.align_conv_size**2, + 1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0))) + + self.align_conv = paddle.vision.ops.DeformConv2D( + self.feat_out, + self.feat_out, + self.align_conv_size, + padding=(self.align_conv_size - 1) // 2, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=False) + + self.or_conv = nn.Conv2D( + self.feat_out, + self.feat_out, + kernel_size=3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0))) + + # ODM + self.odm_cls_convs = nn.Sequential() + self.odm_reg_convs = nn.Sequential() + + for i in range(self.stacked_convs): + ch_in = self.feat_out + # ch_in = int(self.feat_out / 8) if i == 0 else self.feat_out + + self.odm_cls_convs.add_sublayer( + 'odm_cls_conv_{}'.format(i), + nn.Conv2D( + in_channels=ch_in, + out_channels=self.feat_out, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0)))) + + self.odm_cls_convs.add_sublayer('odm_cls_conv_{}_act'.format(i), + nn.ReLU()) + + self.odm_reg_convs.add_sublayer( + 'odm_reg_conv_{}'.format(i), + nn.Conv2D( + in_channels=self.feat_out, + out_channels=self.feat_out, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0)))) + + self.odm_reg_convs.add_sublayer('odm_reg_conv_{}_act'.format(i), + nn.ReLU()) + + self.odm_cls = nn.Conv2D( + 
self.feat_out, + self.cls_out_channels, + 3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(bias_init))) + self.odm_reg = nn.Conv2D( + self.feat_out, + 5, + 3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)), + bias_attr=ParamAttr(initializer=Constant(0))) + + def forward(self, feats, targets=None): + fam_reg_list, fam_cls_list = [], [] + odm_reg_list, odm_cls_list = [], [] + num_anchors_list, base_anchors_list, refine_anchors_list = [], [], [] + + for i, feat in enumerate(feats): + # get shape + B = feat.shape[0] + H, W = paddle.shape(feat)[2], paddle.shape(feat)[3] + + NA = H * W + num_anchors_list.append(NA) + + fam_cls_feat = self.fam_cls_convs(feat) + fam_cls = self.fam_cls(fam_cls_feat) + # [N, CLS, H, W] --> [N, H, W, CLS] + fam_cls = fam_cls.transpose([0, 2, 3, 1]).reshape( + [B, NA, self.cls_out_channels]) + fam_cls_list.append(fam_cls) + + fam_reg_feat = self.fam_reg_convs(feat) + fam_reg = self.fam_reg(fam_reg_feat) + # [N, 5, H, W] --> [N, H, W, 5] + fam_reg = fam_reg.transpose([0, 2, 3, 1]).reshape([B, NA, 5]) + fam_reg_list.append(fam_reg) + + # prepare anchor + init_anchors = self.anchor_generators[i]((H, W), + self.anchor_strides[i]) + init_anchors = init_anchors.reshape([1, NA, 5]) + base_anchors_list.append(init_anchors.squeeze(0)) + + if self.training: + refine_anchor = self.bbox_decode(fam_reg.detach(), init_anchors) + else: + refine_anchor = self.bbox_decode(fam_reg, init_anchors) + + refine_anchors_list.append(refine_anchor) + + if self.align_conv_type == 'AlignConv': + align_feat = self.align_conv(feat, + refine_anchor.clone(), (H, W), + self.anchor_strides[i]) + elif self.align_conv_type == 'DCN': + align_offset = self.align_conv_offset(feat) + align_feat = self.align_conv(feat, align_offset) + elif self.align_conv_type == 'Conv': + align_feat = self.align_conv(feat) + + or_feat = self.or_conv(align_feat) + odm_reg_feat = or_feat + odm_cls_feat = or_feat + + odm_reg_feat = self.odm_reg_convs(odm_reg_feat) + odm_cls_feat = self.odm_cls_convs(odm_cls_feat) + + odm_cls = self.odm_cls(odm_cls_feat) + # [N, CLS, H, W] --> [N, H, W, CLS] + odm_cls = odm_cls.transpose([0, 2, 3, 1]).reshape( + [B, NA, self.cls_out_channels]) + odm_cls_list.append(odm_cls) + + odm_reg = self.odm_reg(odm_reg_feat) + # [N, 5, H, W] --> [N, H, W, 5] + odm_reg = odm_reg.transpose([0, 2, 3, 1]).reshape([B, NA, 5]) + odm_reg_list.append(odm_reg) + + if self.training: + return self.get_loss([ + fam_cls_list, fam_reg_list, odm_cls_list, odm_reg_list, + num_anchors_list, base_anchors_list, refine_anchors_list + ], targets) + else: + odm_bboxes_list = [] + for odm_reg, refine_anchor in zip(odm_reg_list, + refine_anchors_list): + odm_bboxes = self.bbox_decode(odm_reg, refine_anchor) + odm_bboxes_list.append(odm_bboxes) + return [odm_bboxes_list, odm_cls_list] + + def get_bboxes(self, head_outs): + perd_bboxes_list, pred_scores_list = head_outs + batch = paddle.shape(pred_scores_list[0])[0] + bboxes, bbox_num = [], [] + for i in range(batch): + pred_scores_per_image = [t[i] for t in pred_scores_list] + pred_bboxes_per_image = [t[i] for t in perd_bboxes_list] + bbox_per_image, bbox_num_per_image = self.get_bboxes_single( + pred_scores_per_image, pred_bboxes_per_image) + bboxes.append(bbox_per_image) + bbox_num.append(bbox_num_per_image) + + bboxes = paddle.concat(bboxes) + bbox_num = paddle.concat(bbox_num) + return bboxes, bbox_num + + def get_pred(self, bboxes, bbox_num, im_shape, scale_factor): + """ + Rescale, clip 
and filter the bbox from the output of NMS to + get final prediction. + Args: + bboxes(Tensor): bboxes [N, 10] + bbox_num(Tensor): bbox_num + im_shape(Tensor): [1 2] + scale_factor(Tensor): [1 2] + Returns: + bbox_pred(Tensor): The output is the prediction with shape [N, 8] + including labels, scores and bboxes. The size of + bboxes are corresponding to the original image. + """ + origin_shape = paddle.floor(im_shape / scale_factor + 0.5) + + origin_shape_list = [] + scale_factor_list = [] + # scale_factor: scale_y, scale_x + for i in range(bbox_num.shape[0]): + expand_shape = paddle.expand(origin_shape[i:i + 1, :], + [bbox_num[i], 2]) + scale_y, scale_x = scale_factor[i][0], scale_factor[i][1] + scale = paddle.concat([ + scale_x, scale_y, scale_x, scale_y, scale_x, scale_y, scale_x, + scale_y + ]) + expand_scale = paddle.expand(scale, [bbox_num[i], 8]) + origin_shape_list.append(expand_shape) + scale_factor_list.append(expand_scale) + + origin_shape_list = paddle.concat(origin_shape_list) + scale_factor_list = paddle.concat(scale_factor_list) + + # bboxes: [N, 10], label, score, bbox + pred_label_score = bboxes[:, 0:2] + pred_bbox = bboxes[:, 2:] + + # rescale bbox to original image + pred_bbox = pred_bbox.reshape([-1, 8]) + scaled_bbox = pred_bbox / scale_factor_list + origin_h = origin_shape_list[:, 0] + origin_w = origin_shape_list[:, 1] + + bboxes = scaled_bbox + zeros = paddle.zeros_like(origin_h) + x1 = paddle.maximum(paddle.minimum(bboxes[:, 0], origin_w - 1), zeros) + y1 = paddle.maximum(paddle.minimum(bboxes[:, 1], origin_h - 1), zeros) + x2 = paddle.maximum(paddle.minimum(bboxes[:, 2], origin_w - 1), zeros) + y2 = paddle.maximum(paddle.minimum(bboxes[:, 3], origin_h - 1), zeros) + x3 = paddle.maximum(paddle.minimum(bboxes[:, 4], origin_w - 1), zeros) + y3 = paddle.maximum(paddle.minimum(bboxes[:, 5], origin_h - 1), zeros) + x4 = paddle.maximum(paddle.minimum(bboxes[:, 6], origin_w - 1), zeros) + y4 = paddle.maximum(paddle.minimum(bboxes[:, 7], origin_h - 1), zeros) + pred_bbox = paddle.stack([x1, y1, x2, y2, x3, y3, x4, y4], axis=-1) + pred_result = paddle.concat([pred_label_score, pred_bbox], axis=1) + return pred_result + + def get_bboxes_single(self, cls_score_list, bbox_pred_list): + mlvl_bboxes = [] + mlvl_scores = [] + + for cls_score, bbox_pred in zip(cls_score_list, bbox_pred_list): + if self.use_sigmoid_cls: + scores = F.sigmoid(cls_score) + else: + scores = F.softmax(cls_score, axis=-1) + + if scores.shape[0] > self.nms_pre: + # Get maximum scores for foreground classes. 
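+ # With sigmoid classification every channel is a foreground class; with
+ # softmax the trailing channel is the background and is excluded before
+ # taking the top nms_pre candidates.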
+ if self.use_sigmoid_cls: + max_scores = paddle.max(scores, axis=1) + else: + max_scores = paddle.max(scores[:, :-1], axis=1) + + topk_val, topk_inds = paddle.topk(max_scores, self.nms_pre) + bbox_pred = paddle.gather(bbox_pred, topk_inds) + scores = paddle.gather(scores, topk_inds) + + mlvl_bboxes.append(bbox_pred) + mlvl_scores.append(scores) + + mlvl_bboxes = paddle.concat(mlvl_bboxes) + mlvl_scores = paddle.concat(mlvl_scores) + + mlvl_polys = self.rbox2poly(mlvl_bboxes).unsqueeze(0) + mlvl_scores = paddle.transpose(mlvl_scores, [1, 0]).unsqueeze(0) + + bbox, bbox_num, _ = self.nms(mlvl_polys, mlvl_scores) + if bbox.shape[0] <= 0: + bbox = self.fake_bbox + bbox_num = self.fake_bbox_num + + return bbox, bbox_num + + def smooth_l1_loss(self, pred, label, delta=1.0 / 9.0): + """ + Args: + pred: pred score + label: label + delta: delta + Returns: loss + """ + assert pred.shape == label.shape and label.numel() > 0 + assert delta > 0 + diff = paddle.abs(pred - label) + loss = paddle.where(diff < delta, 0.5 * diff * diff / delta, + diff - 0.5 * delta) + return loss + + def get_fam_loss(self, fam_target, s2anet_head_out, reg_loss_type='l1'): + (labels, label_weights, bbox_targets, bbox_weights, bbox_gt_bboxes, + pos_inds, neg_inds) = fam_target + fam_cls_branch_list, fam_reg_branch_list, odm_cls_branch_list, odm_reg_branch_list, num_anchors_list = s2anet_head_out + + fam_cls_losses = [] + fam_bbox_losses = [] + st_idx = 0 + num_total_samples = len(pos_inds) + len( + neg_inds) if self.sampling else len(pos_inds) + num_total_samples = max(1, num_total_samples) + + for idx, feat_anchor_num in enumerate(num_anchors_list): + # step1: get data + feat_labels = labels[st_idx:st_idx + feat_anchor_num] + feat_label_weights = label_weights[st_idx:st_idx + feat_anchor_num] + + feat_bbox_targets = bbox_targets[st_idx:st_idx + feat_anchor_num, :] + feat_bbox_weights = bbox_weights[st_idx:st_idx + feat_anchor_num, :] + + # step2: calc cls loss + feat_labels = feat_labels.reshape(-1) + feat_label_weights = feat_label_weights.reshape(-1) + + fam_cls_score = fam_cls_branch_list[idx] + fam_cls_score = paddle.squeeze(fam_cls_score, axis=0) + fam_cls_score1 = fam_cls_score + + feat_labels = paddle.to_tensor(feat_labels) + feat_labels_one_hot = paddle.nn.functional.one_hot( + feat_labels, self.cls_out_channels + 1) + feat_labels_one_hot = feat_labels_one_hot[:, 1:] + feat_labels_one_hot.stop_gradient = True + + num_total_samples = paddle.to_tensor( + num_total_samples, dtype='float32', stop_gradient=True) + + fam_cls = F.sigmoid_focal_loss( + fam_cls_score1, + feat_labels_one_hot, + normalizer=num_total_samples, + reduction='none') + + feat_label_weights = feat_label_weights.reshape( + feat_label_weights.shape[0], 1) + feat_label_weights = np.repeat( + feat_label_weights, self.cls_out_channels, axis=1) + feat_label_weights = paddle.to_tensor( + feat_label_weights, stop_gradient=True) + + fam_cls = fam_cls * feat_label_weights + fam_cls_total = paddle.sum(fam_cls) + fam_cls_losses.append(fam_cls_total) + + # step3: regression loss + feat_bbox_targets = paddle.to_tensor( + feat_bbox_targets, dtype='float32', stop_gradient=True) + feat_bbox_targets = paddle.reshape(feat_bbox_targets, [-1, 5]) + + fam_bbox_pred = fam_reg_branch_list[idx] + fam_bbox_pred = paddle.squeeze(fam_bbox_pred, axis=0) + fam_bbox_pred = paddle.reshape(fam_bbox_pred, [-1, 5]) + fam_bbox = self.smooth_l1_loss(fam_bbox_pred, feat_bbox_targets) + loss_weight = paddle.to_tensor( + self.reg_loss_weight, dtype='float32', stop_gradient=True) + fam_bbox 
= paddle.multiply(fam_bbox, loss_weight) + feat_bbox_weights = paddle.to_tensor( + feat_bbox_weights, stop_gradient=True) + + fam_bbox = fam_bbox * feat_bbox_weights + fam_bbox_total = paddle.sum(fam_bbox) / num_total_samples + fam_bbox_losses.append(fam_bbox_total) + st_idx += feat_anchor_num + + fam_cls_loss = paddle.add_n(fam_cls_losses) + fam_cls_loss_weight = paddle.to_tensor( + self.cls_loss_weight[0], dtype='float32', stop_gradient=True) + fam_cls_loss = fam_cls_loss * fam_cls_loss_weight + fam_reg_loss = paddle.add_n(fam_bbox_losses) + return fam_cls_loss, fam_reg_loss + + def get_odm_loss(self, odm_target, s2anet_head_out, reg_loss_type='l1'): + (labels, label_weights, bbox_targets, bbox_weights, bbox_gt_bboxes, + pos_inds, neg_inds) = odm_target + fam_cls_branch_list, fam_reg_branch_list, odm_cls_branch_list, odm_reg_branch_list, num_anchors_list = s2anet_head_out + + odm_cls_losses = [] + odm_bbox_losses = [] + st_idx = 0 + num_total_samples = len(pos_inds) + len( + neg_inds) if self.sampling else len(pos_inds) + num_total_samples = max(1, num_total_samples) + + for idx, feat_anchor_num in enumerate(num_anchors_list): + # step1: get data + feat_labels = labels[st_idx:st_idx + feat_anchor_num] + feat_label_weights = label_weights[st_idx:st_idx + feat_anchor_num] + + feat_bbox_targets = bbox_targets[st_idx:st_idx + feat_anchor_num, :] + feat_bbox_weights = bbox_weights[st_idx:st_idx + feat_anchor_num, :] + + # step2: calc cls loss + feat_labels = feat_labels.reshape(-1) + feat_label_weights = feat_label_weights.reshape(-1) + + odm_cls_score = odm_cls_branch_list[idx] + odm_cls_score = paddle.squeeze(odm_cls_score, axis=0) + odm_cls_score1 = odm_cls_score + + feat_labels = paddle.to_tensor(feat_labels) + feat_labels_one_hot = paddle.nn.functional.one_hot( + feat_labels, self.cls_out_channels + 1) + feat_labels_one_hot = feat_labels_one_hot[:, 1:] + feat_labels_one_hot.stop_gradient = True + + num_total_samples = paddle.to_tensor( + num_total_samples, dtype='float32', stop_gradient=True) + odm_cls = F.sigmoid_focal_loss( + odm_cls_score1, + feat_labels_one_hot, + normalizer=num_total_samples, + reduction='none') + + feat_label_weights = feat_label_weights.reshape( + feat_label_weights.shape[0], 1) + feat_label_weights = np.repeat( + feat_label_weights, self.cls_out_channels, axis=1) + feat_label_weights = paddle.to_tensor(feat_label_weights) + feat_label_weights.stop_gradient = True + + odm_cls = odm_cls * feat_label_weights + odm_cls_total = paddle.sum(odm_cls) + odm_cls_losses.append(odm_cls_total) + + # # step3: regression loss + feat_bbox_targets = paddle.to_tensor( + feat_bbox_targets, dtype='float32') + feat_bbox_targets = paddle.reshape(feat_bbox_targets, [-1, 5]) + feat_bbox_targets.stop_gradient = True + + odm_bbox_pred = odm_reg_branch_list[idx] + odm_bbox_pred = paddle.squeeze(odm_bbox_pred, axis=0) + odm_bbox_pred = paddle.reshape(odm_bbox_pred, [-1, 5]) + odm_bbox = self.smooth_l1_loss(odm_bbox_pred, feat_bbox_targets) + + loss_weight = paddle.to_tensor( + self.reg_loss_weight, dtype='float32', stop_gradient=True) + odm_bbox = paddle.multiply(odm_bbox, loss_weight) + feat_bbox_weights = paddle.to_tensor( + feat_bbox_weights, stop_gradient=True) + + odm_bbox = odm_bbox * feat_bbox_weights + odm_bbox_total = paddle.sum(odm_bbox) / num_total_samples + + odm_bbox_losses.append(odm_bbox_total) + st_idx += feat_anchor_num + + odm_cls_loss = paddle.add_n(odm_cls_losses) + odm_cls_loss_weight = paddle.to_tensor( + self.cls_loss_weight[1], dtype='float32', stop_gradient=True) + 
odm_cls_loss = odm_cls_loss * odm_cls_loss_weight + odm_reg_loss = paddle.add_n(odm_bbox_losses) + return odm_cls_loss, odm_reg_loss + + def get_loss(self, head_outs, inputs): + fam_cls_list, fam_reg_list, odm_cls_list, odm_reg_list, \ + num_anchors_list, base_anchors_list, refine_anchors_list = head_outs + + # compute loss + fam_cls_loss_lst = [] + fam_reg_loss_lst = [] + odm_cls_loss_lst = [] + odm_reg_loss_lst = [] + + batch = len(inputs['gt_rbox']) + for i in range(batch): + # data_format: (xc, yc, w, h, theta) + gt_mask = inputs['pad_gt_mask'][i, :, 0] + gt_idx = paddle.nonzero(gt_mask).squeeze(-1) + gt_bboxes = paddle.gather(inputs['gt_rbox'][i], gt_idx).numpy() + gt_labels = paddle.gather(inputs['gt_class'][i], gt_idx).numpy() + is_crowd = paddle.gather(inputs['is_crowd'][i], gt_idx).numpy() + gt_labels = gt_labels + 1 + + anchors_per_image = np.concatenate(base_anchors_list) + + fam_cls_per_image = [t[i] for t in fam_cls_list] + fam_reg_per_image = [t[i] for t in fam_reg_list] + odm_cls_per_image = [t[i] for t in odm_cls_list] + odm_reg_per_image = [t[i] for t in odm_reg_list] + im_s2anet_head_out = (fam_cls_per_image, fam_reg_per_image, + odm_cls_per_image, odm_reg_per_image, + num_anchors_list) + # FAM + im_fam_target = self.anchor_assign(anchors_per_image, gt_bboxes, + gt_labels, is_crowd) + if im_fam_target is not None: + im_fam_cls_loss, im_fam_reg_loss = self.get_fam_loss( + im_fam_target, im_s2anet_head_out, self.reg_loss_type) + fam_cls_loss_lst.append(im_fam_cls_loss) + fam_reg_loss_lst.append(im_fam_reg_loss) + + # ODM + refine_anchors_per_image = [t[i] for t in refine_anchors_list] + refine_anchors_per_image = paddle.concat( + refine_anchors_per_image).numpy() + im_odm_target = self.anchor_assign(refine_anchors_per_image, + gt_bboxes, gt_labels, is_crowd) + + if im_odm_target is not None: + im_odm_cls_loss, im_odm_reg_loss = self.get_odm_loss( + im_odm_target, im_s2anet_head_out, self.reg_loss_type) + odm_cls_loss_lst.append(im_odm_cls_loss) + odm_reg_loss_lst.append(im_odm_reg_loss) + + fam_cls_loss = paddle.add_n(fam_cls_loss_lst) / batch + fam_reg_loss = paddle.add_n(fam_reg_loss_lst) / batch + odm_cls_loss = paddle.add_n(odm_cls_loss_lst) / batch + odm_reg_loss = paddle.add_n(odm_reg_loss_lst) / batch + loss = fam_cls_loss + fam_reg_loss + odm_cls_loss + odm_reg_loss + + return { + 'loss': loss, + 'fam_cls_loss': fam_cls_loss, + 'fam_reg_loss': fam_reg_loss, + 'odm_cls_loss': odm_cls_loss, + 'odm_reg_loss': odm_reg_loss + } + + def bbox_decode(self, preds, anchors, wh_ratio_clip=1e-6): + """decode bbox from deltas + Args: + preds: [B, L, 5] + anchors: [1, L, 5] + return: + bboxes: [B, L, 5] + """ + preds = paddle.add(paddle.multiply(preds, self.stds), self.means) + + dx, dy, dw, dh, dangle = paddle.split(preds, 5, axis=-1) + max_ratio = np.abs(np.log(wh_ratio_clip)) + dw = paddle.clip(dw, min=-max_ratio, max=max_ratio) + dh = paddle.clip(dh, min=-max_ratio, max=max_ratio) + + rroi_x, rroi_y, rroi_w, rroi_h, rroi_angle = paddle.split( + anchors, 5, axis=-1) + + gx = dx * rroi_w * paddle.cos(rroi_angle) - dy * rroi_h * paddle.sin( + rroi_angle) + rroi_x + gy = dx * rroi_w * paddle.sin(rroi_angle) + dy * rroi_h * paddle.cos( + rroi_angle) + rroi_y + gw = rroi_w * dw.exp() + gh = rroi_h * dh.exp() + ga = np.pi * dangle + rroi_angle + ga = (ga + np.pi / 4) % np.pi - np.pi / 4 + bboxes = paddle.concat([gx, gy, gw, gh, ga], axis=-1) + return bboxes + + def rbox2poly(self, rboxes): + """ + rboxes: [x_ctr,y_ctr,w,h,angle] + to + polys: [x0,y0,x1,y1,x2,y2,x3,y3] + """ + N = 
paddle.shape(rboxes)[0] + + x_ctr = rboxes[:, 0] + y_ctr = rboxes[:, 1] + width = rboxes[:, 2] + height = rboxes[:, 3] + angle = rboxes[:, 4] + + tl_x, tl_y, br_x, br_y = -width * 0.5, -height * 0.5, width * 0.5, height * 0.5 + + normal_rects = paddle.stack( + [tl_x, br_x, br_x, tl_x, tl_y, tl_y, br_y, br_y], axis=0) + normal_rects = paddle.reshape(normal_rects, [2, 4, N]) + normal_rects = paddle.transpose(normal_rects, [2, 0, 1]) + + sin, cos = paddle.sin(angle), paddle.cos(angle) + # M: [N,2,2] + M = paddle.stack([cos, -sin, sin, cos], axis=0) + M = paddle.reshape(M, [2, 2, N]) + M = paddle.transpose(M, [2, 0, 1]) + + # polys: [N,8] + polys = paddle.matmul(M, normal_rects) + polys = paddle.transpose(polys, [2, 1, 0]) + polys = paddle.reshape(polys, [-1, N]) + polys = paddle.transpose(polys, [1, 0]) + + tmp = paddle.stack( + [x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr], axis=1) + polys = polys + tmp + return polys diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/simota_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/simota_head.py new file mode 100644 index 0000000000000000000000000000000000000000..e74f017570e09e95d921b206c532e3cf4435194f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/simota_head.py @@ -0,0 +1,500 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The code is based on: +# https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/yolox_head.py + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +from functools import partial +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Normal, Constant + +from ppdet.core.workspace import register + +from ppdet.modeling.bbox_utils import distance2bbox, bbox2distance +from ppdet.data.transform.atss_assigner import bbox_overlaps + +from .gfl_head import GFLHead + + +@register +class OTAHead(GFLHead): + """ + OTAHead + Args: + conv_feat (object): Instance of 'FCOSFeat' + num_classes (int): Number of classes + fpn_stride (list): The stride of each FPN Layer + prior_prob (float): Used to set the bias init for the class prediction layer + loss_qfl (object): Instance of QualityFocalLoss. + loss_dfl (object): Instance of DistributionFocalLoss. + loss_bbox (object): Instance of bbox loss. + assigner (object): Instance of label assigner. + reg_max: Max value of integral set :math: `{0, ..., reg_max}` + n QFL setting. Default: 16. 
+ """ + __inject__ = [ + 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', + 'assigner', 'nms' + ] + __shared__ = ['num_classes'] + + def __init__(self, + conv_feat='FCOSFeat', + dgqp_module=None, + num_classes=80, + fpn_stride=[8, 16, 32, 64, 128], + prior_prob=0.01, + loss_class='QualityFocalLoss', + loss_dfl='DistributionFocalLoss', + loss_bbox='GIoULoss', + assigner='SimOTAAssigner', + reg_max=16, + feat_in_chan=256, + nms=None, + nms_pre=1000, + cell_offset=0): + super(OTAHead, self).__init__( + conv_feat=conv_feat, + dgqp_module=dgqp_module, + num_classes=num_classes, + fpn_stride=fpn_stride, + prior_prob=prior_prob, + loss_class=loss_class, + loss_dfl=loss_dfl, + loss_bbox=loss_bbox, + reg_max=reg_max, + feat_in_chan=feat_in_chan, + nms=nms, + nms_pre=nms_pre, + cell_offset=cell_offset) + self.conv_feat = conv_feat + self.dgqp_module = dgqp_module + self.num_classes = num_classes + self.fpn_stride = fpn_stride + self.prior_prob = prior_prob + self.loss_qfl = loss_class + self.loss_dfl = loss_dfl + self.loss_bbox = loss_bbox + self.reg_max = reg_max + self.feat_in_chan = feat_in_chan + self.nms = nms + self.nms_pre = nms_pre + self.cell_offset = cell_offset + self.use_sigmoid = self.loss_qfl.use_sigmoid + + self.assigner = assigner + + def _get_target_single(self, flatten_cls_pred, flatten_center_and_stride, + flatten_bbox, gt_bboxes, gt_labels): + """Compute targets for priors in a single image. + """ + pos_num, label, label_weight, bbox_target = self.assigner( + F.sigmoid(flatten_cls_pred), flatten_center_and_stride, + flatten_bbox, gt_bboxes, gt_labels) + + return (pos_num, label, label_weight, bbox_target) + + def get_loss(self, head_outs, gt_meta): + cls_scores, bbox_preds = head_outs + num_level_anchors = [ + featmap.shape[-2] * featmap.shape[-1] for featmap in cls_scores + ] + num_imgs = gt_meta['im_id'].shape[0] + featmap_sizes = [[featmap.shape[-2], featmap.shape[-1]] + for featmap in cls_scores] + + decode_bbox_preds = [] + center_and_strides = [] + for featmap_size, stride, bbox_pred in zip(featmap_sizes, + self.fpn_stride, bbox_preds): + + # center in origin image + yy, xx = self.get_single_level_center_point(featmap_size, stride, + self.cell_offset) + + center_and_stride = paddle.stack([xx, yy, stride, stride], -1).tile( + [num_imgs, 1, 1]) + center_and_strides.append(center_and_stride) + center_in_feature = center_and_stride.reshape( + [-1, 4])[:, :-2] / stride + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]).reshape( + [num_imgs, -1, 4 * (self.reg_max + 1)]) + pred_distances = self.distribution_project(bbox_pred) + decode_bbox_pred_wo_stride = distance2bbox( + center_in_feature, pred_distances).reshape([num_imgs, -1, 4]) + decode_bbox_preds.append(decode_bbox_pred_wo_stride * stride) + + flatten_cls_preds = [ + cls_pred.transpose([0, 2, 3, 1]).reshape( + [num_imgs, -1, self.cls_out_channels]) + for cls_pred in cls_scores + ] + flatten_cls_preds = paddle.concat(flatten_cls_preds, axis=1) + flatten_bboxes = paddle.concat(decode_bbox_preds, axis=1) + flatten_center_and_strides = paddle.concat(center_and_strides, axis=1) + + gt_boxes, gt_labels = gt_meta['gt_bbox'], gt_meta['gt_class'] + pos_num_l, label_l, label_weight_l, bbox_target_l = [], [], [], [] + for flatten_cls_pred,flatten_center_and_stride,flatten_bbox,gt_box, gt_label \ + in zip(flatten_cls_preds.detach(),flatten_center_and_strides.detach(), \ + flatten_bboxes.detach(),gt_boxes, gt_labels): + pos_num, label, label_weight, bbox_target = self._get_target_single( + flatten_cls_pred, 
flatten_center_and_stride, flatten_bbox, + gt_box, gt_label) + pos_num_l.append(pos_num) + label_l.append(label) + label_weight_l.append(label_weight) + bbox_target_l.append(bbox_target) + + labels = paddle.to_tensor(np.stack(label_l, axis=0)) + label_weights = paddle.to_tensor(np.stack(label_weight_l, axis=0)) + bbox_targets = paddle.to_tensor(np.stack(bbox_target_l, axis=0)) + + center_and_strides_list = self._images_to_levels( + flatten_center_and_strides, num_level_anchors) + labels_list = self._images_to_levels(labels, num_level_anchors) + label_weights_list = self._images_to_levels(label_weights, + num_level_anchors) + bbox_targets_list = self._images_to_levels(bbox_targets, + num_level_anchors) + num_total_pos = sum(pos_num_l) + try: + paddle.distributed.all_reduce(num_total_pos) + num_total_pos = paddle.clip( + num_total_pos / paddle.distributed.get_world_size(), min=1.) + except: + num_total_pos = max(num_total_pos, 1) + + loss_bbox_list, loss_dfl_list, loss_qfl_list, avg_factor = [], [], [], [] + for cls_score, bbox_pred, center_and_strides, labels, label_weights, bbox_targets, stride in zip( + cls_scores, bbox_preds, center_and_strides_list, labels_list, + label_weights_list, bbox_targets_list, self.fpn_stride): + center_and_strides = center_and_strides.reshape([-1, 4]) + cls_score = cls_score.transpose([0, 2, 3, 1]).reshape( + [-1, self.cls_out_channels]) + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]).reshape( + [-1, 4 * (self.reg_max + 1)]) + bbox_targets = bbox_targets.reshape([-1, 4]) + labels = labels.reshape([-1]) + label_weights = label_weights.reshape([-1]) + + bg_class_ind = self.num_classes + pos_inds = paddle.nonzero( + paddle.logical_and((labels >= 0), (labels < bg_class_ind)), + as_tuple=False).squeeze(1) + score = np.zeros(labels.shape) + + if len(pos_inds) > 0: + pos_bbox_targets = paddle.gather(bbox_targets, pos_inds, axis=0) + pos_bbox_pred = paddle.gather(bbox_pred, pos_inds, axis=0) + pos_centers = paddle.gather( + center_and_strides[:, :-2], pos_inds, axis=0) / stride + + weight_targets = F.sigmoid(cls_score.detach()) + weight_targets = paddle.gather( + weight_targets.max(axis=1, keepdim=True), pos_inds, axis=0) + pos_bbox_pred_corners = self.distribution_project(pos_bbox_pred) + pos_decode_bbox_pred = distance2bbox(pos_centers, + pos_bbox_pred_corners) + pos_decode_bbox_targets = pos_bbox_targets / stride + bbox_iou = bbox_overlaps( + pos_decode_bbox_pred.detach().numpy(), + pos_decode_bbox_targets.detach().numpy(), + is_aligned=True) + score[pos_inds.numpy()] = bbox_iou + + pred_corners = pos_bbox_pred.reshape([-1, self.reg_max + 1]) + target_corners = bbox2distance(pos_centers, + pos_decode_bbox_targets, + self.reg_max).reshape([-1]) + # regression loss + loss_bbox = paddle.sum( + self.loss_bbox(pos_decode_bbox_pred, + pos_decode_bbox_targets) * weight_targets) + + # dfl loss + loss_dfl = self.loss_dfl( + pred_corners, + target_corners, + weight=weight_targets.expand([-1, 4]).reshape([-1]), + avg_factor=4.0) + else: + loss_bbox = bbox_pred.sum() * 0 + loss_dfl = bbox_pred.sum() * 0 + weight_targets = paddle.to_tensor([0], dtype='float32') + + # qfl loss + score = paddle.to_tensor(score) + loss_qfl = self.loss_qfl( + cls_score, (labels, score), + weight=label_weights, + avg_factor=num_total_pos) + loss_bbox_list.append(loss_bbox) + loss_dfl_list.append(loss_dfl) + loss_qfl_list.append(loss_qfl) + avg_factor.append(weight_targets.sum()) + + avg_factor = sum(avg_factor) + try: + paddle.distributed.all_reduce(avg_factor) + avg_factor = paddle.clip( + 
avg_factor / paddle.distributed.get_world_size(), min=1) + except: + avg_factor = max(avg_factor.item(), 1) + if avg_factor <= 0: + loss_qfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + loss_bbox = paddle.to_tensor( + 0, dtype='float32', stop_gradient=False) + loss_dfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + else: + losses_bbox = list(map(lambda x: x / avg_factor, loss_bbox_list)) + losses_dfl = list(map(lambda x: x / avg_factor, loss_dfl_list)) + loss_qfl = sum(loss_qfl_list) + loss_bbox = sum(losses_bbox) + loss_dfl = sum(losses_dfl) + + loss_states = dict( + loss_qfl=loss_qfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl) + + return loss_states + + +@register +class OTAVFLHead(OTAHead): + __inject__ = [ + 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', + 'assigner', 'nms' + ] + __shared__ = ['num_classes'] + + def __init__(self, + conv_feat='FCOSFeat', + dgqp_module=None, + num_classes=80, + fpn_stride=[8, 16, 32, 64, 128], + prior_prob=0.01, + loss_class='VarifocalLoss', + loss_dfl='DistributionFocalLoss', + loss_bbox='GIoULoss', + assigner='SimOTAAssigner', + reg_max=16, + feat_in_chan=256, + nms=None, + nms_pre=1000, + cell_offset=0): + super(OTAVFLHead, self).__init__( + conv_feat=conv_feat, + dgqp_module=dgqp_module, + num_classes=num_classes, + fpn_stride=fpn_stride, + prior_prob=prior_prob, + loss_class=loss_class, + loss_dfl=loss_dfl, + loss_bbox=loss_bbox, + reg_max=reg_max, + feat_in_chan=feat_in_chan, + nms=nms, + nms_pre=nms_pre, + cell_offset=cell_offset) + self.conv_feat = conv_feat + self.dgqp_module = dgqp_module + self.num_classes = num_classes + self.fpn_stride = fpn_stride + self.prior_prob = prior_prob + self.loss_vfl = loss_class + self.loss_dfl = loss_dfl + self.loss_bbox = loss_bbox + self.reg_max = reg_max + self.feat_in_chan = feat_in_chan + self.nms = nms + self.nms_pre = nms_pre + self.cell_offset = cell_offset + self.use_sigmoid = self.loss_vfl.use_sigmoid + + self.assigner = assigner + + def get_loss(self, head_outs, gt_meta): + cls_scores, bbox_preds = head_outs + num_level_anchors = [ + featmap.shape[-2] * featmap.shape[-1] for featmap in cls_scores + ] + num_imgs = gt_meta['im_id'].shape[0] + featmap_sizes = [[featmap.shape[-2], featmap.shape[-1]] + for featmap in cls_scores] + + decode_bbox_preds = [] + center_and_strides = [] + for featmap_size, stride, bbox_pred in zip(featmap_sizes, + self.fpn_stride, bbox_preds): + # center in origin image + yy, xx = self.get_single_level_center_point(featmap_size, stride, + self.cell_offset) + strides = paddle.full((len(xx), ), stride) + center_and_stride = paddle.stack([xx, yy, strides, strides], + -1).tile([num_imgs, 1, 1]) + center_and_strides.append(center_and_stride) + center_in_feature = center_and_stride.reshape( + [-1, 4])[:, :-2] / stride + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]).reshape( + [num_imgs, -1, 4 * (self.reg_max + 1)]) + pred_distances = self.distribution_project(bbox_pred) + decode_bbox_pred_wo_stride = distance2bbox( + center_in_feature, pred_distances).reshape([num_imgs, -1, 4]) + decode_bbox_preds.append(decode_bbox_pred_wo_stride * stride) + + flatten_cls_preds = [ + cls_pred.transpose([0, 2, 3, 1]).reshape( + [num_imgs, -1, self.cls_out_channels]) + for cls_pred in cls_scores + ] + flatten_cls_preds = paddle.concat(flatten_cls_preds, axis=1) + flatten_bboxes = paddle.concat(decode_bbox_preds, axis=1) + flatten_center_and_strides = paddle.concat(center_and_strides, axis=1) + + gt_boxes, gt_labels = gt_meta['gt_bbox'], 
gt_meta['gt_class'] + pos_num_l, label_l, label_weight_l, bbox_target_l = [], [], [], [] + for flatten_cls_pred, flatten_center_and_stride, flatten_bbox,gt_box,gt_label \ + in zip(flatten_cls_preds.detach(), flatten_center_and_strides.detach(), \ + flatten_bboxes.detach(),gt_boxes,gt_labels): + pos_num, label, label_weight, bbox_target = self._get_target_single( + flatten_cls_pred, flatten_center_and_stride, flatten_bbox, + gt_box, gt_label) + pos_num_l.append(pos_num) + label_l.append(label) + label_weight_l.append(label_weight) + bbox_target_l.append(bbox_target) + + labels = paddle.to_tensor(np.stack(label_l, axis=0)) + label_weights = paddle.to_tensor(np.stack(label_weight_l, axis=0)) + bbox_targets = paddle.to_tensor(np.stack(bbox_target_l, axis=0)) + + center_and_strides_list = self._images_to_levels( + flatten_center_and_strides, num_level_anchors) + labels_list = self._images_to_levels(labels, num_level_anchors) + label_weights_list = self._images_to_levels(label_weights, + num_level_anchors) + bbox_targets_list = self._images_to_levels(bbox_targets, + num_level_anchors) + num_total_pos = sum(pos_num_l) + try: + paddle.distributed.all_reduce(num_total_pos) + num_total_pos = paddle.clip( + num_total_pos / paddle.distributed.get_world_size(), min=1.) + except: + num_total_pos = max(num_total_pos, 1) + + loss_bbox_list, loss_dfl_list, loss_vfl_list, avg_factor = [], [], [], [] + for cls_score, bbox_pred, center_and_strides, labels, label_weights, bbox_targets, stride in zip( + cls_scores, bbox_preds, center_and_strides_list, labels_list, + label_weights_list, bbox_targets_list, self.fpn_stride): + center_and_strides = center_and_strides.reshape([-1, 4]) + cls_score = cls_score.transpose([0, 2, 3, 1]).reshape( + [-1, self.cls_out_channels]) + bbox_pred = bbox_pred.transpose([0, 2, 3, 1]).reshape( + [-1, 4 * (self.reg_max + 1)]) + bbox_targets = bbox_targets.reshape([-1, 4]) + labels = labels.reshape([-1]) + + bg_class_ind = self.num_classes + pos_inds = paddle.nonzero( + paddle.logical_and((labels >= 0), (labels < bg_class_ind)), + as_tuple=False).squeeze(1) + # vfl + vfl_score = np.zeros(cls_score.shape) + + if len(pos_inds) > 0: + pos_bbox_targets = paddle.gather(bbox_targets, pos_inds, axis=0) + pos_bbox_pred = paddle.gather(bbox_pred, pos_inds, axis=0) + pos_centers = paddle.gather( + center_and_strides[:, :-2], pos_inds, axis=0) / stride + + weight_targets = F.sigmoid(cls_score.detach()) + weight_targets = paddle.gather( + weight_targets.max(axis=1, keepdim=True), pos_inds, axis=0) + pos_bbox_pred_corners = self.distribution_project(pos_bbox_pred) + pos_decode_bbox_pred = distance2bbox(pos_centers, + pos_bbox_pred_corners) + pos_decode_bbox_targets = pos_bbox_targets / stride + bbox_iou = bbox_overlaps( + pos_decode_bbox_pred.detach().numpy(), + pos_decode_bbox_targets.detach().numpy(), + is_aligned=True) + + # vfl + pos_labels = paddle.gather(labels, pos_inds, axis=0) + vfl_score[pos_inds.numpy(), pos_labels] = bbox_iou + + pred_corners = pos_bbox_pred.reshape([-1, self.reg_max + 1]) + target_corners = bbox2distance(pos_centers, + pos_decode_bbox_targets, + self.reg_max).reshape([-1]) + # regression loss + loss_bbox = paddle.sum( + self.loss_bbox(pos_decode_bbox_pred, + pos_decode_bbox_targets) * weight_targets) + + # dfl loss + loss_dfl = self.loss_dfl( + pred_corners, + target_corners, + weight=weight_targets.expand([-1, 4]).reshape([-1]), + avg_factor=4.0) + else: + loss_bbox = bbox_pred.sum() * 0 + loss_dfl = bbox_pred.sum() * 0 + weight_targets = paddle.to_tensor([0], 
dtype='float32') + + # vfl loss + num_pos_avg_per_gpu = num_total_pos + vfl_score = paddle.to_tensor(vfl_score) + loss_vfl = self.loss_vfl( + cls_score, vfl_score, avg_factor=num_pos_avg_per_gpu) + + loss_bbox_list.append(loss_bbox) + loss_dfl_list.append(loss_dfl) + loss_vfl_list.append(loss_vfl) + avg_factor.append(weight_targets.sum()) + + avg_factor = sum(avg_factor) + try: + paddle.distributed.all_reduce(avg_factor) + avg_factor = paddle.clip( + avg_factor / paddle.distributed.get_world_size(), min=1) + except: + avg_factor = max(avg_factor.item(), 1) + if avg_factor <= 0: + loss_vfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + loss_bbox = paddle.to_tensor( + 0, dtype='float32', stop_gradient=False) + loss_dfl = paddle.to_tensor(0, dtype='float32', stop_gradient=False) + else: + losses_bbox = list(map(lambda x: x / avg_factor, loss_bbox_list)) + losses_dfl = list(map(lambda x: x / avg_factor, loss_dfl_list)) + loss_vfl = sum(loss_vfl_list) + loss_bbox = sum(losses_bbox) + loss_dfl = sum(losses_dfl) + + loss_states = dict( + loss_vfl=loss_vfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl) + + return loss_states diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/solov2_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/solov2_head.py new file mode 100644 index 0000000000000000000000000000000000000000..6989abb3a8a5378a939b64d0b54ee864580ccd85 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/solov2_head.py @@ -0,0 +1,554 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from paddle import ParamAttr +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal, Constant + +from ppdet.modeling.layers import ConvNormLayer, MaskMatrixNMS, DropBlock +from ppdet.core.workspace import register + +from six.moves import zip +import numpy as np + +__all__ = ['SOLOv2Head'] + + +@register +class SOLOv2MaskHead(nn.Layer): + """ + MaskHead of SOLOv2. + The code of this function is based on: + https://github.com/WXinlong/SOLO/blob/master/mmdet/models/mask_heads/mask_feat_head.py + + Args: + in_channels (int): The channel number of input Tensor. + out_channels (int): The channel number of output Tensor. + start_level (int): The position where the input starts. + end_level (int): The position where the input ends. + use_dcn_in_tower (bool): Whether to use dcn in tower or not. 
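+        mid_channels (int): The channel number of the intermediate Tensors.
+        norm_type (str): The normalization type, 'gn' by default.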
+ """ + __shared__ = ['norm_type'] + + def __init__(self, + in_channels=256, + mid_channels=128, + out_channels=256, + start_level=0, + end_level=3, + use_dcn_in_tower=False, + norm_type='gn'): + super(SOLOv2MaskHead, self).__init__() + assert start_level >= 0 and end_level >= start_level + self.in_channels = in_channels + self.out_channels = out_channels + self.mid_channels = mid_channels + self.use_dcn_in_tower = use_dcn_in_tower + self.range_level = end_level - start_level + 1 + self.use_dcn = True if self.use_dcn_in_tower else False + self.convs_all_levels = [] + self.norm_type = norm_type + for i in range(start_level, end_level + 1): + conv_feat_name = 'mask_feat_head.convs_all_levels.{}'.format(i) + conv_pre_feat = nn.Sequential() + if i == start_level: + conv_pre_feat.add_sublayer( + conv_feat_name + '.conv' + str(i), + ConvNormLayer( + ch_in=self.in_channels, + ch_out=self.mid_channels, + filter_size=3, + stride=1, + use_dcn=self.use_dcn, + norm_type=self.norm_type)) + self.add_sublayer('conv_pre_feat' + str(i), conv_pre_feat) + self.convs_all_levels.append(conv_pre_feat) + else: + for j in range(i): + ch_in = 0 + if j == 0: + ch_in = self.in_channels + 2 if i == end_level else self.in_channels + else: + ch_in = self.mid_channels + conv_pre_feat.add_sublayer( + conv_feat_name + '.conv' + str(j), + ConvNormLayer( + ch_in=ch_in, + ch_out=self.mid_channels, + filter_size=3, + stride=1, + use_dcn=self.use_dcn, + norm_type=self.norm_type)) + conv_pre_feat.add_sublayer( + conv_feat_name + '.conv' + str(j) + 'act', nn.ReLU()) + conv_pre_feat.add_sublayer( + 'upsample' + str(i) + str(j), + nn.Upsample( + scale_factor=2, mode='bilinear')) + self.add_sublayer('conv_pre_feat' + str(i), conv_pre_feat) + self.convs_all_levels.append(conv_pre_feat) + + conv_pred_name = 'mask_feat_head.conv_pred.0' + self.conv_pred = self.add_sublayer( + conv_pred_name, + ConvNormLayer( + ch_in=self.mid_channels, + ch_out=self.out_channels, + filter_size=1, + stride=1, + use_dcn=self.use_dcn, + norm_type=self.norm_type)) + + def forward(self, inputs): + """ + Get SOLOv2MaskHead output. + + Args: + inputs(list[Tensor]): feature map from each necks with shape of [N, C, H, W] + Returns: + ins_pred(Tensor): Output of SOLOv2MaskHead head + """ + feat_all_level = F.relu(self.convs_all_levels[0](inputs[0])) + for i in range(1, self.range_level): + input_p = inputs[i] + if i == (self.range_level - 1): + input_feat = input_p + x_range = paddle.linspace( + -1, 1, paddle.shape(input_feat)[-1], dtype='float32') + y_range = paddle.linspace( + -1, 1, paddle.shape(input_feat)[-2], dtype='float32') + y, x = paddle.meshgrid([y_range, x_range]) + x = paddle.unsqueeze(x, [0, 1]) + y = paddle.unsqueeze(y, [0, 1]) + y = paddle.expand( + y, shape=[paddle.shape(input_feat)[0], 1, -1, -1]) + x = paddle.expand( + x, shape=[paddle.shape(input_feat)[0], 1, -1, -1]) + coord_feat = paddle.concat([x, y], axis=1) + input_p = paddle.concat([input_p, coord_feat], axis=1) + feat_all_level = paddle.add(feat_all_level, + self.convs_all_levels[i](input_p)) + ins_pred = F.relu(self.conv_pred(feat_all_level)) + + return ins_pred + + +@register +class SOLOv2Head(nn.Layer): + """ + Head block for SOLOv2 network + + Args: + num_classes (int): Number of output classes. + in_channels (int): Number of input channels. + seg_feat_channels (int): Num_filters of kernel & categroy branch convolution operation. + stacked_convs (int): Times of convolution operation. + num_grids (list[int]): List of feature map grids size. 
+ kernel_out_channels (int): Number of output channels in kernel branch. + dcn_v2_stages (list): Which stage use dcn v2 in tower. It is between [0, stacked_convs). + segm_strides (list[int]): List of segmentation area stride. + solov2_loss (object): SOLOv2Loss instance. + score_threshold (float): Threshold of categroy score. + mask_nms (object): MaskMatrixNMS instance. + """ + __inject__ = ['solov2_loss', 'mask_nms'] + __shared__ = ['norm_type', 'num_classes'] + + def __init__(self, + num_classes=80, + in_channels=256, + seg_feat_channels=256, + stacked_convs=4, + num_grids=[40, 36, 24, 16, 12], + kernel_out_channels=256, + dcn_v2_stages=[], + segm_strides=[8, 8, 16, 32, 32], + solov2_loss=None, + score_threshold=0.1, + mask_threshold=0.5, + mask_nms=None, + norm_type='gn', + drop_block=False): + super(SOLOv2Head, self).__init__() + self.num_classes = num_classes + self.in_channels = in_channels + self.seg_num_grids = num_grids + self.cate_out_channels = self.num_classes + self.seg_feat_channels = seg_feat_channels + self.stacked_convs = stacked_convs + self.kernel_out_channels = kernel_out_channels + self.dcn_v2_stages = dcn_v2_stages + self.segm_strides = segm_strides + self.solov2_loss = solov2_loss + self.mask_nms = mask_nms + self.score_threshold = score_threshold + self.mask_threshold = mask_threshold + self.norm_type = norm_type + self.drop_block = drop_block + + self.kernel_pred_convs = [] + self.cate_pred_convs = [] + for i in range(self.stacked_convs): + use_dcn = True if i in self.dcn_v2_stages else False + ch_in = self.in_channels + 2 if i == 0 else self.seg_feat_channels + kernel_conv = self.add_sublayer( + 'bbox_head.kernel_convs.' + str(i), + ConvNormLayer( + ch_in=ch_in, + ch_out=self.seg_feat_channels, + filter_size=3, + stride=1, + use_dcn=use_dcn, + norm_type=self.norm_type)) + self.kernel_pred_convs.append(kernel_conv) + ch_in = self.in_channels if i == 0 else self.seg_feat_channels + cate_conv = self.add_sublayer( + 'bbox_head.cate_convs.' 
+ str(i), + ConvNormLayer( + ch_in=ch_in, + ch_out=self.seg_feat_channels, + filter_size=3, + stride=1, + use_dcn=use_dcn, + norm_type=self.norm_type)) + self.cate_pred_convs.append(cate_conv) + + self.solo_kernel = self.add_sublayer( + 'bbox_head.solo_kernel', + nn.Conv2D( + self.seg_feat_channels, + self.kernel_out_channels, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=True)) + self.solo_cate = self.add_sublayer( + 'bbox_head.solo_cate', + nn.Conv2D( + self.seg_feat_channels, + self.cate_out_channels, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.01)), + bias_attr=ParamAttr(initializer=Constant( + value=float(-np.log((1 - 0.01) / 0.01)))))) + + if self.drop_block and self.training: + self.drop_block_fun = DropBlock( + block_size=3, keep_prob=0.9, name='solo_cate.dropblock') + + def _points_nms(self, heat, kernel_size=2): + hmax = F.max_pool2d(heat, kernel_size=kernel_size, stride=1, padding=1) + keep = paddle.cast((hmax[:, :, :-1, :-1] == heat), 'float32') + return heat * keep + + def _split_feats(self, feats): + return (F.interpolate( + feats[0], + scale_factor=0.5, + align_corners=False, + align_mode=0, + mode='bilinear'), feats[1], feats[2], feats[3], F.interpolate( + feats[4], + size=paddle.shape(feats[3])[-2:], + mode='bilinear', + align_corners=False, + align_mode=0)) + + def forward(self, input): + """ + Get SOLOv2 head output + + Args: + input (list): List of Tensors, output of backbone or neck stages + Returns: + cate_pred_list (list): Tensors of each category branch layer + kernel_pred_list (list): Tensors of each kernel branch layer + """ + feats = self._split_feats(input) + cate_pred_list = [] + kernel_pred_list = [] + for idx in range(len(self.seg_num_grids)): + cate_pred, kernel_pred = self._get_output_single(feats[idx], idx) + cate_pred_list.append(cate_pred) + kernel_pred_list.append(kernel_pred) + + return cate_pred_list, kernel_pred_list + + def _get_output_single(self, input, idx): + ins_kernel_feat = input + # CoordConv + x_range = paddle.linspace( + -1, 1, paddle.shape(ins_kernel_feat)[-1], dtype='float32') + y_range = paddle.linspace( + -1, 1, paddle.shape(ins_kernel_feat)[-2], dtype='float32') + y, x = paddle.meshgrid([y_range, x_range]) + x = paddle.unsqueeze(x, [0, 1]) + y = paddle.unsqueeze(y, [0, 1]) + y = paddle.expand( + y, shape=[paddle.shape(ins_kernel_feat)[0], 1, -1, -1]) + x = paddle.expand( + x, shape=[paddle.shape(ins_kernel_feat)[0], 1, -1, -1]) + coord_feat = paddle.concat([x, y], axis=1) + ins_kernel_feat = paddle.concat([ins_kernel_feat, coord_feat], axis=1) + + # kernel branch + kernel_feat = ins_kernel_feat + seg_num_grid = self.seg_num_grids[idx] + kernel_feat = F.interpolate( + kernel_feat, + size=[seg_num_grid, seg_num_grid], + mode='bilinear', + align_corners=False, + align_mode=0) + cate_feat = kernel_feat[:, :-2, :, :] + + for kernel_layer in self.kernel_pred_convs: + kernel_feat = F.relu(kernel_layer(kernel_feat)) + if self.drop_block and self.training: + kernel_feat = self.drop_block_fun(kernel_feat) + kernel_pred = self.solo_kernel(kernel_feat) + # cate branch + for cate_layer in self.cate_pred_convs: + cate_feat = F.relu(cate_layer(cate_feat)) + if self.drop_block and self.training: + cate_feat = self.drop_block_fun(cate_feat) + cate_pred = self.solo_cate(cate_feat) + + if not self.training: + cate_pred = self._points_nms(F.sigmoid(cate_pred), kernel_size=2) + cate_pred = paddle.transpose(cate_pred, [0, 2, 
3, 1])
+        return cate_pred, kernel_pred
+
+    def get_loss(self, cate_preds, kernel_preds, ins_pred, ins_labels,
+                 cate_labels, grid_order_list, fg_num):
+        """
+        Get loss of network of SOLOv2.
+
+        Args:
+            cate_preds (list): Tensor list of category branch output.
+            kernel_preds (list): Tensor list of kernel branch output.
+            ins_pred (list): Tensor list of instance branch output.
+            ins_labels (list): List of instance labels per batch.
+            cate_labels (list): List of category labels per batch.
+            grid_order_list (list): List of positive sample indices in each grid.
+            fg_num (int): Number of positive samples in a mini-batch.
+        Returns:
+            loss_ins (Tensor): The instance loss Tensor of SOLOv2 network.
+            loss_cate (Tensor): The category loss Tensor of SOLOv2 network.
+        """
+        batch_size = paddle.shape(grid_order_list[0])[0]
+        ins_pred_list = []
+        for kernel_preds_level, grid_orders_level in zip(kernel_preds,
+                                                         grid_order_list):
+            if grid_orders_level.shape[1] == 0:
+                ins_pred_list.append(None)
+                continue
+            grid_orders_level = paddle.reshape(grid_orders_level, [-1])
+            reshape_pred = paddle.reshape(
+                kernel_preds_level,
+                shape=(paddle.shape(kernel_preds_level)[0],
+                       paddle.shape(kernel_preds_level)[1], -1))
+            reshape_pred = paddle.transpose(reshape_pred, [0, 2, 1])
+            reshape_pred = paddle.reshape(
+                reshape_pred, shape=(-1, paddle.shape(reshape_pred)[2]))
+            gathered_pred = paddle.gather(reshape_pred, index=grid_orders_level)
+            gathered_pred = paddle.reshape(
+                gathered_pred,
+                shape=[batch_size, -1, paddle.shape(gathered_pred)[1]])
+            cur_ins_pred = ins_pred
+            cur_ins_pred = paddle.reshape(
+                cur_ins_pred,
+                shape=(paddle.shape(cur_ins_pred)[0],
+                       paddle.shape(cur_ins_pred)[1], -1))
+            ins_pred_conv = paddle.matmul(gathered_pred, cur_ins_pred)
+            cur_ins_pred = paddle.reshape(
+                ins_pred_conv,
+                shape=(-1, paddle.shape(ins_pred)[-2],
+                       paddle.shape(ins_pred)[-1]))
+            ins_pred_list.append(cur_ins_pred)
+
+        num_ins = paddle.sum(fg_num)
+        cate_preds = [
+            paddle.reshape(
+                paddle.transpose(cate_pred, [0, 2, 3, 1]),
+                shape=(-1, self.cate_out_channels)) for cate_pred in cate_preds
+        ]
+        flatten_cate_preds = paddle.concat(cate_preds)
+        new_cate_labels = []
+        for cate_label in cate_labels:
+            new_cate_labels.append(paddle.reshape(cate_label, shape=[-1]))
+        cate_labels = paddle.concat(new_cate_labels)
+
+        loss_ins, loss_cate = self.solov2_loss(
+            ins_pred_list, ins_labels, flatten_cate_preds, cate_labels, num_ins)
+
+        return {'loss_ins': loss_ins, 'loss_cate': loss_cate}
+
+    def get_prediction(self, cate_preds, kernel_preds, seg_pred, im_shape,
+                       scale_factor):
+        """
+        Get prediction result of SOLOv2 network.
+
+        Args:
+            cate_preds (list): List of Variables, output of category branch.
+            kernel_preds (list): List of Variables, output of kernel branch.
+            seg_pred (list): List of Variables, output of mask head stages.
+            im_shape (Variables): [h, w] for input images.
+            scale_factor (Variables): [scale, scale] for input images.
+        Returns:
+            seg_masks (Tensor): The prediction segmentation.
+            cate_labels (Tensor): The prediction category label of each segmentation.
+            cate_scores (Tensor): The prediction score of each segmentation.
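+            bbox_num (Tensor): The number of predicted masks.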
+ """ + num_levels = len(cate_preds) + featmap_size = paddle.shape(seg_pred)[-2:] + seg_masks_list = [] + cate_labels_list = [] + cate_scores_list = [] + cate_preds = [cate_pred * 1.0 for cate_pred in cate_preds] + kernel_preds = [kernel_pred * 1.0 for kernel_pred in kernel_preds] + # Currently only supports batch size == 1 + for idx in range(1): + cate_pred_list = [ + paddle.reshape( + cate_preds[i][idx], shape=(-1, self.cate_out_channels)) + for i in range(num_levels) + ] + seg_pred_list = seg_pred + kernel_pred_list = [ + paddle.reshape( + paddle.transpose(kernel_preds[i][idx], [1, 2, 0]), + shape=(-1, self.kernel_out_channels)) + for i in range(num_levels) + ] + cate_pred_list = paddle.concat(cate_pred_list, axis=0) + kernel_pred_list = paddle.concat(kernel_pred_list, axis=0) + + seg_masks, cate_labels, cate_scores = self.get_seg_single( + cate_pred_list, seg_pred_list, kernel_pred_list, featmap_size, + im_shape[idx], scale_factor[idx][0]) + bbox_num = paddle.shape(cate_labels)[0] + return seg_masks, cate_labels, cate_scores, bbox_num + + def get_seg_single(self, cate_preds, seg_preds, kernel_preds, featmap_size, + im_shape, scale_factor): + """ + The code of this function is based on: + https://github.com/WXinlong/SOLO/blob/master/mmdet/models/anchor_heads/solov2_head.py#L385 + """ + h = paddle.cast(im_shape[0], 'int32')[0] + w = paddle.cast(im_shape[1], 'int32')[0] + upsampled_size_out = [featmap_size[0] * 4, featmap_size[1] * 4] + + y = paddle.zeros(shape=paddle.shape(cate_preds), dtype='float32') + inds = paddle.where(cate_preds > self.score_threshold, cate_preds, y) + inds = paddle.nonzero(inds) + cate_preds = paddle.reshape(cate_preds, shape=[-1]) + # Prevent empty and increase fake data + ind_a = paddle.cast(paddle.shape(kernel_preds)[0], 'int64') + ind_b = paddle.zeros(shape=[1], dtype='int64') + inds_end = paddle.unsqueeze(paddle.concat([ind_a, ind_b]), 0) + inds = paddle.concat([inds, inds_end]) + kernel_preds_end = paddle.ones( + shape=[1, self.kernel_out_channels], dtype='float32') + kernel_preds = paddle.concat([kernel_preds, kernel_preds_end]) + cate_preds = paddle.concat( + [cate_preds, paddle.zeros( + shape=[1], dtype='float32')]) + + # cate_labels & kernel_preds + cate_labels = inds[:, 1] + kernel_preds = paddle.gather(kernel_preds, index=inds[:, 0]) + cate_score_idx = paddle.add(inds[:, 0] * self.cate_out_channels, + cate_labels) + cate_scores = paddle.gather(cate_preds, index=cate_score_idx) + + size_trans = np.power(self.seg_num_grids, 2) + strides = [] + for _ind in range(len(self.segm_strides)): + strides.append( + paddle.full( + shape=[int(size_trans[_ind])], + fill_value=self.segm_strides[_ind], + dtype="int32")) + strides = paddle.concat(strides) + strides = paddle.concat( + [strides, paddle.zeros( + shape=[1], dtype='int32')]) + strides = paddle.gather(strides, index=inds[:, 0]) + + # mask encoding. 
+ kernel_preds = paddle.unsqueeze(kernel_preds, [2, 3]) + seg_preds = F.conv2d(seg_preds, kernel_preds) + seg_preds = F.sigmoid(paddle.squeeze(seg_preds, [0])) + seg_masks = seg_preds > self.mask_threshold + seg_masks = paddle.cast(seg_masks, 'float32') + sum_masks = paddle.sum(seg_masks, axis=[1, 2]) + + y = paddle.zeros(shape=paddle.shape(sum_masks), dtype='float32') + keep = paddle.where(sum_masks > strides, sum_masks, y) + keep = paddle.nonzero(keep) + keep = paddle.squeeze(keep, axis=[1]) + # Prevent empty and increase fake data + keep_other = paddle.concat( + [keep, paddle.cast(paddle.shape(sum_masks)[0] - 1, 'int64')]) + keep_scores = paddle.concat( + [keep, paddle.cast(paddle.shape(sum_masks)[0], 'int64')]) + cate_scores_end = paddle.zeros(shape=[1], dtype='float32') + cate_scores = paddle.concat([cate_scores, cate_scores_end]) + + seg_masks = paddle.gather(seg_masks, index=keep_other) + seg_preds = paddle.gather(seg_preds, index=keep_other) + sum_masks = paddle.gather(sum_masks, index=keep_other) + cate_labels = paddle.gather(cate_labels, index=keep_other) + cate_scores = paddle.gather(cate_scores, index=keep_scores) + + # mask scoring. + seg_mul = paddle.cast(seg_preds * seg_masks, 'float32') + seg_scores = paddle.sum(seg_mul, axis=[1, 2]) / sum_masks + cate_scores *= seg_scores + # Matrix NMS + seg_preds, cate_scores, cate_labels = self.mask_nms( + seg_preds, seg_masks, cate_labels, cate_scores, sum_masks=sum_masks) + ori_shape = im_shape[:2] / scale_factor + 0.5 + ori_shape = paddle.cast(ori_shape, 'int32') + seg_preds = F.interpolate( + paddle.unsqueeze(seg_preds, 0), + size=upsampled_size_out, + mode='bilinear', + align_corners=False, + align_mode=0) + seg_preds = paddle.slice( + seg_preds, axes=[2, 3], starts=[0, 0], ends=[h, w]) + seg_masks = paddle.squeeze( + F.interpolate( + seg_preds, + size=ori_shape[:2], + mode='bilinear', + align_corners=False, + align_mode=0), + axis=[0]) + seg_masks = paddle.cast(seg_masks > self.mask_threshold, 'uint8') + return seg_masks, cate_labels, cate_scores diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/sparse_roi_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/sparse_roi_head.py new file mode 100644 index 0000000000000000000000000000000000000000..bdc76a946a2a3a817376f7caba64d99200c81a97 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/sparse_roi_head.py @@ -0,0 +1,467 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# This code is referenced from: https://github.com/open-mmlab/mmdetection + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import copy + +import paddle +from paddle import nn + +from ppdet.core.workspace import register +from ppdet.modeling import initializer as init +from .roi_extractor import RoIAlign +from ..bbox_utils import delta2bbox_v2 +from ..cls_utils import _get_class_default_kwargs +from ..layers import MultiHeadAttention + +__all__ = ['SparseRoIHead', 'DIIHead', 'DynamicMaskHead'] + + +class DynamicConv(nn.Layer): + def __init__(self, + in_channels=256, + feature_channels=64, + out_channels=None, + roi_resolution=7, + with_proj=True): + super(DynamicConv, self).__init__() + + self.in_channels = in_channels + self.feature_channels = feature_channels + self.out_channels = out_channels if out_channels else in_channels + + self.num_params_in = self.in_channels * self.feature_channels + self.num_params_out = self.out_channels * self.feature_channels + self.dynamic_layer = nn.Linear(self.in_channels, + self.num_params_in + self.num_params_out) + + self.norm_in = nn.LayerNorm(self.feature_channels) + self.norm_out = nn.LayerNorm(self.out_channels) + + self.activation = nn.ReLU() + + self.with_proj = with_proj + if self.with_proj: + num_output = self.out_channels * roi_resolution**2 + self.fc_layer = nn.Linear(num_output, self.out_channels) + self.fc_norm = nn.LayerNorm(self.out_channels) + + def forward(self, param_feature, input_feature): + input_feature = input_feature.flatten(2).transpose([2, 0, 1]) + input_feature = input_feature.transpose([1, 0, 2]) + + parameters = self.dynamic_layer(param_feature) + + param_in = parameters[:, :self.num_params_in].reshape( + [-1, self.in_channels, self.feature_channels]) + param_out = parameters[:, -self.num_params_out:].reshape( + [-1, self.feature_channels, self.out_channels]) + + features = paddle.bmm(input_feature, param_in) + features = self.norm_in(features) + features = self.activation(features) + + features = paddle.bmm(features, param_out) + features = self.norm_out(features) + features = self.activation(features) + + if self.with_proj: + features = features.flatten(1) + features = self.fc_layer(features) + features = self.fc_norm(features) + features = self.activation(features) + + return features + + +class FFN(nn.Layer): + def __init__(self, + embed_dims=256, + feedforward_channels=2048, + num_fcs=2, + ffn_drop=0.0, + add_identity=True): + super(FFN, self).__init__() + + layers = [] + in_channels = embed_dims + for _ in range(num_fcs - 1): + layers.append( + nn.Sequential( + nn.Linear(in_channels, feedforward_channels), + nn.ReLU(), nn.Dropout(ffn_drop))) + in_channels = feedforward_channels + layers.append(nn.Linear(feedforward_channels, embed_dims)) + layers.append(nn.Dropout(ffn_drop)) + self.layers = nn.Sequential(*layers) + + self.add_identity = add_identity + + def forward(self, x): + identity = x + out = self.layers(x) + if not self.add_identity: + return out + else: + return out + identity + + +@register +class DynamicMaskHead(nn.Layer): + __shared__ = ['num_classes', 'proposal_embedding_dim', 'norm_type'] + + def __init__(self, + num_classes=80, + proposal_embedding_dim=256, + dynamic_feature_channels=64, + roi_resolution=14, + num_convs=4, + conv_kernel_size=3, + conv_channels=256, + upsample_method='deconv', + upsample_scale_factor=2, + norm_type='bn'): + super(DynamicMaskHead, self).__init__() + + self.d_model = proposal_embedding_dim + + 
self.instance_interactive_conv = DynamicConv( + self.d_model, + dynamic_feature_channels, + roi_resolution=roi_resolution, + with_proj=False) + + self.convs = nn.LayerList() + for i in range(num_convs): + self.convs.append( + nn.Sequential( + nn.Conv2D( + self.d_model if i == 0 else conv_channels, + conv_channels, + conv_kernel_size, + padding='same', + bias_attr=False), + nn.BatchNorm2D(conv_channels), + nn.ReLU())) + if norm_type == 'sync_bn': + self.convs = nn.SyncBatchNorm.convert_sync_batchnorm(self.convs) + + self.upsample_method = upsample_method + if upsample_method is None: + self.upsample = None + elif upsample_method == 'deconv': + self.upsample = nn.Conv2DTranspose( + conv_channels if num_convs > 0 else self.d_model, + conv_channels, + upsample_scale_factor, + stride=upsample_scale_factor) + self.relu = nn.ReLU() + else: + self.upsample = nn.Upsample(None, upsample_scale_factor) + + cls_in_channels = conv_channels if num_convs > 0 else self.d_model + cls_in_channels = conv_channels if upsample_method == 'deconv' else cls_in_channels + self.conv_cls = nn.Conv2D(cls_in_channels, num_classes, 1) + + self._init_weights() + + def _init_weights(self): + for p in self.parameters(): + if p.dim() > 1: + init.xavier_uniform_(p) + + init.constant_(self.conv_cls.bias, 0.) + + def forward(self, roi_features, attn_features): + attn_features = attn_features.reshape([-1, self.d_model]) + attn_features_iic = self.instance_interactive_conv(attn_features, + roi_features) + + x = attn_features_iic.transpose([0, 2, 1]).reshape(roi_features.shape) + + for conv in self.convs: + x = conv(x) + if self.upsample is not None: + x = self.upsample(x) + if self.upsample_method == 'deconv': + x = self.relu(x) + mask_pred = self.conv_cls(x) + return mask_pred + + +@register +class DIIHead(nn.Layer): + __shared__ = ['num_classes', 'proposal_embedding_dim'] + + def __init__(self, + num_classes=80, + proposal_embedding_dim=256, + feedforward_channels=2048, + dynamic_feature_channels=64, + roi_resolution=7, + num_attn_heads=8, + dropout=0.0, + num_ffn_fcs=2, + num_cls_fcs=1, + num_reg_fcs=3): + super(DIIHead, self).__init__() + + self.num_classes = num_classes + self.d_model = proposal_embedding_dim + + self.attention = MultiHeadAttention(self.d_model, num_attn_heads, + dropout) + self.attention_norm = nn.LayerNorm(self.d_model) + + self.instance_interactive_conv = DynamicConv( + self.d_model, + dynamic_feature_channels, + roi_resolution=roi_resolution, + with_proj=True) + self.instance_interactive_conv_dropout = nn.Dropout(dropout) + self.instance_interactive_conv_norm = nn.LayerNorm(self.d_model) + + self.ffn = FFN(self.d_model, feedforward_channels, num_ffn_fcs, dropout) + self.ffn_norm = nn.LayerNorm(self.d_model) + + self.cls_fcs = nn.LayerList() + for _ in range(num_cls_fcs): + self.cls_fcs.append( + nn.Linear( + self.d_model, self.d_model, bias_attr=False)) + self.cls_fcs.append(nn.LayerNorm(self.d_model)) + self.cls_fcs.append(nn.ReLU()) + self.fc_cls = nn.Linear(self.d_model, self.num_classes) + + self.reg_fcs = nn.LayerList() + for _ in range(num_reg_fcs): + self.reg_fcs.append( + nn.Linear( + self.d_model, self.d_model, bias_attr=False)) + self.reg_fcs.append(nn.LayerNorm(self.d_model)) + self.reg_fcs.append(nn.ReLU()) + self.fc_reg = nn.Linear(self.d_model, 4) + + self._init_weights() + + def _init_weights(self): + for p in self.parameters(): + if p.dim() > 1: + init.xavier_uniform_(p) + + bias_init = init.bias_init_with_prob(0.01) + init.constant_(self.fc_cls.bias, bias_init) + + def forward(self, 
roi_features, proposal_features): + N, num_proposals = proposal_features.shape[:2] + + proposal_features = proposal_features + self.attention( + proposal_features) + attn_features = self.attention_norm(proposal_features) + + proposal_features = attn_features.reshape([-1, self.d_model]) + proposal_features_iic = self.instance_interactive_conv( + proposal_features, roi_features) + proposal_features = proposal_features + self.instance_interactive_conv_dropout( + proposal_features_iic) + obj_features = self.instance_interactive_conv_norm(proposal_features) + + obj_features = self.ffn(obj_features) + obj_features = self.ffn_norm(obj_features) + + cls_feature = obj_features.clone() + reg_feature = obj_features.clone() + + for cls_layer in self.cls_fcs: + cls_feature = cls_layer(cls_feature) + class_logits = self.fc_cls(cls_feature) + for reg_layer in self.reg_fcs: + reg_feature = reg_layer(reg_feature) + bbox_deltas = self.fc_reg(reg_feature) + + class_logits = class_logits.reshape( + [N, num_proposals, self.num_classes]) + bbox_deltas = bbox_deltas.reshape([N, num_proposals, 4]) + obj_features = obj_features.reshape([N, num_proposals, self.d_model]) + + return class_logits, bbox_deltas, obj_features, attn_features + + @staticmethod + def refine_bboxes(proposal_bboxes, bbox_deltas): + pred_bboxes = delta2bbox_v2( + bbox_deltas.reshape([-1, 4]), + proposal_bboxes.reshape([-1, 4]), + delta_mean=[0.0, 0.0, 0.0, 0.0], + delta_std=[0.5, 0.5, 1.0, 1.0], + ctr_clip=None) + return pred_bboxes.reshape(proposal_bboxes.shape) + + +@register +class SparseRoIHead(nn.Layer): + __inject__ = ['bbox_head', 'mask_head', 'loss_func'] + + def __init__(self, + num_stages=6, + bbox_roi_extractor=_get_class_default_kwargs(RoIAlign), + mask_roi_extractor=_get_class_default_kwargs(RoIAlign), + bbox_head='DIIHead', + mask_head='DynamicMaskHead', + loss_func='QueryInstLoss'): + super(SparseRoIHead, self).__init__() + + self.num_stages = num_stages + + self.bbox_roi_extractor = bbox_roi_extractor + self.mask_roi_extractor = mask_roi_extractor + if isinstance(bbox_roi_extractor, dict): + self.bbox_roi_extractor = RoIAlign(**bbox_roi_extractor) + if isinstance(mask_roi_extractor, dict): + self.mask_roi_extractor = RoIAlign(**mask_roi_extractor) + + self.bbox_heads = nn.LayerList( + [copy.deepcopy(bbox_head) for _ in range(num_stages)]) + self.mask_heads = nn.LayerList( + [copy.deepcopy(mask_head) for _ in range(num_stages)]) + + self.loss_helper = loss_func + + @classmethod + def from_config(cls, cfg, input_shape): + bbox_roi_extractor = cfg['bbox_roi_extractor'] + mask_roi_extractor = cfg['mask_roi_extractor'] + assert isinstance(bbox_roi_extractor, dict) + assert isinstance(mask_roi_extractor, dict) + + kwargs = RoIAlign.from_config(cfg, input_shape) + bbox_roi_extractor.update(kwargs) + mask_roi_extractor.update(kwargs) + + return { + 'bbox_roi_extractor': bbox_roi_extractor, + 'mask_roi_extractor': mask_roi_extractor + } + + @staticmethod + def get_roi_features(features, bboxes, roi_extractor): + rois_list = [ + bboxes[i] for i in range(len(bboxes)) if len(bboxes[i]) > 0 + ] + rois_num = paddle.to_tensor( + [len(bboxes[i]) for i in range(len(bboxes))], dtype='int32') + + pos_ids = paddle.cast(rois_num, dtype='bool') + if pos_ids.sum() != len(rois_num): + rois_num = rois_num[pos_ids] + features = [features[i][pos_ids] for i in range(len(features))] + + return roi_extractor(features, rois_list, rois_num) + + def _forward_train(self, body_feats, pro_bboxes, pro_feats, targets): + all_stage_losses = {} + for stage in 
range(self.num_stages): + bbox_head = self.bbox_heads[stage] + mask_head = self.mask_heads[stage] + + roi_feats = self.get_roi_features(body_feats, pro_bboxes, + self.bbox_roi_extractor) + class_logits, bbox_deltas, pro_feats, attn_feats = bbox_head( + roi_feats, pro_feats) + bbox_pred = self.bbox_heads[stage].refine_bboxes(pro_bboxes, + bbox_deltas) + + indices = self.loss_helper.matcher({ + 'pred_logits': class_logits.detach(), + 'pred_boxes': bbox_pred.detach() + }, targets) + avg_factor = paddle.to_tensor( + [sum(len(tgt['labels']) for tgt in targets)], dtype='float32') + if paddle.distributed.get_world_size() > 1: + paddle.distributed.all_reduce(avg_factor) + avg_factor /= paddle.distributed.get_world_size() + avg_factor = paddle.clip(avg_factor, min=1.) + + loss_classes = self.loss_helper.loss_classes(class_logits, targets, + indices, avg_factor) + if sum(len(v['labels']) for v in targets) == 0: + loss_bboxes = { + 'loss_bbox': paddle.to_tensor([0.]), + 'loss_giou': paddle.to_tensor([0.]) + } + loss_masks = {'loss_mask': paddle.to_tensor([0.])} + else: + loss_bboxes = self.loss_helper.loss_bboxes(bbox_pred, targets, + indices, avg_factor) + + pos_attn_feats = paddle.concat([ + paddle.gather( + src, src_idx, axis=0) + for src, (src_idx, _) in zip(attn_feats, indices) + ]) + pos_bbox_pred = [ + paddle.gather( + src, src_idx, axis=0) + for src, (src_idx, _) in zip(bbox_pred.detach(), indices) + ] + pos_roi_feats = self.get_roi_features(body_feats, pos_bbox_pred, + self.mask_roi_extractor) + mask_logits = mask_head(pos_roi_feats, pos_attn_feats) + loss_masks = self.loss_helper.loss_masks( + pos_bbox_pred, mask_logits, targets, indices, avg_factor) + + for loss in [loss_classes, loss_bboxes, loss_masks]: + for key in loss.keys(): + all_stage_losses[f'stage{stage}_{key}'] = loss[key] + + pro_bboxes = bbox_pred.detach() + + return all_stage_losses + + def _forward_test(self, body_feats, pro_bboxes, pro_feats): + for stage in range(self.num_stages): + roi_feats = self.get_roi_features(body_feats, pro_bboxes, + self.bbox_roi_extractor) + class_logits, bbox_deltas, pro_feats, attn_feats = self.bbox_heads[ + stage](roi_feats, pro_feats) + bbox_pred = self.bbox_heads[stage].refine_bboxes(pro_bboxes, + bbox_deltas) + + pro_bboxes = bbox_pred.detach() + + roi_feats = self.get_roi_features(body_feats, bbox_pred, + self.mask_roi_extractor) + mask_logits = self.mask_heads[stage](roi_feats, attn_feats) + + return { + 'class_logits': class_logits, + 'bbox_pred': bbox_pred, + 'mask_logits': mask_logits + } + + def forward(self, + body_features, + proposal_bboxes, + proposal_features, + targets=None): + if self.training: + return self._forward_train(body_features, proposal_bboxes, + proposal_features, targets) + else: + return self._forward_test(body_features, proposal_bboxes, + proposal_features) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/sparsercnn_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/sparsercnn_head.py new file mode 100644 index 0000000000000000000000000000000000000000..801ff04fb772aec568dd94d9d4916d2b778e88fe --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/sparsercnn_head.py @@ -0,0 +1,377 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/PeizeSun/SparseR-CNN/blob/main/projects/SparseRCNN/sparsercnn/head.py +Ths copyright of PeizeSun/SparseR-CNN is as follows: +MIT License [see LICENSE for details] +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import copy +import paddle +import paddle.nn as nn + +from ppdet.core.workspace import register +from ppdet.modeling.heads.roi_extractor import RoIAlign +from ppdet.modeling.bbox_utils import delta2bbox +from .. import initializer as init + +_DEFAULT_SCALE_CLAMP = math.log(100000. / 16) + + +class DynamicConv(nn.Layer): + def __init__( + self, + head_hidden_dim, + head_dim_dynamic, + head_num_dynamic, ): + super().__init__() + + self.hidden_dim = head_hidden_dim + self.dim_dynamic = head_dim_dynamic + self.num_dynamic = head_num_dynamic + self.num_params = self.hidden_dim * self.dim_dynamic + self.dynamic_layer = nn.Linear(self.hidden_dim, + self.num_dynamic * self.num_params) + + self.norm1 = nn.LayerNorm(self.dim_dynamic) + self.norm2 = nn.LayerNorm(self.hidden_dim) + + self.activation = nn.ReLU() + + pooler_resolution = 7 + num_output = self.hidden_dim * pooler_resolution**2 + self.out_layer = nn.Linear(num_output, self.hidden_dim) + self.norm3 = nn.LayerNorm(self.hidden_dim) + + def forward(self, pro_features, roi_features): + ''' + pro_features: (1, N * nr_boxes, self.d_model) + roi_features: (49, N * nr_boxes, self.d_model) + ''' + features = roi_features.transpose(perm=[1, 0, 2]) + parameters = self.dynamic_layer(pro_features).transpose(perm=[1, 0, 2]) + + param1 = parameters[:, :, :self.num_params].reshape( + [-1, self.hidden_dim, self.dim_dynamic]) + param2 = parameters[:, :, self.num_params:].reshape( + [-1, self.dim_dynamic, self.hidden_dim]) + + features = paddle.bmm(features, param1) + features = self.norm1(features) + features = self.activation(features) + + features = paddle.bmm(features, param2) + features = self.norm2(features) + features = self.activation(features) + + features = features.flatten(1) + features = self.out_layer(features) + features = self.norm3(features) + features = self.activation(features) + + return features + + +class RCNNHead(nn.Layer): + def __init__( + self, + d_model, + num_classes, + dim_feedforward, + nhead, + dropout, + head_cls, + head_reg, + head_dim_dynamic, + head_num_dynamic, + scale_clamp: float=_DEFAULT_SCALE_CLAMP, + bbox_weights=(2.0, 2.0, 1.0, 1.0), ): + super().__init__() + + self.d_model = d_model + + # dynamic. + self.self_attn = nn.MultiHeadAttention(d_model, nhead, dropout=dropout) + self.inst_interact = DynamicConv(d_model, head_dim_dynamic, + head_num_dynamic) + + self.linear1 = nn.Linear(d_model, dim_feedforward) + self.dropout = nn.Dropout(dropout) + self.linear2 = nn.Linear(dim_feedforward, d_model) + + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.norm3 = nn.LayerNorm(d_model) + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(dropout) + self.dropout3 = nn.Dropout(dropout) + + self.activation = nn.ReLU() + + # cls. 
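+        # classification tower: `head_cls` blocks of Linear -> LayerNorm -> ReLU
+        # feeding the final class_logits layer.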
+ num_cls = head_cls + cls_module = list() + for _ in range(num_cls): + cls_module.append(nn.Linear(d_model, d_model, bias_attr=False)) + cls_module.append(nn.LayerNorm(d_model)) + cls_module.append(nn.ReLU()) + self.cls_module = nn.LayerList(cls_module) + + # reg. + num_reg = head_reg + reg_module = list() + for _ in range(num_reg): + reg_module.append(nn.Linear(d_model, d_model, bias_attr=False)) + reg_module.append(nn.LayerNorm(d_model)) + reg_module.append(nn.ReLU()) + self.reg_module = nn.LayerList(reg_module) + + # pred. + self.class_logits = nn.Linear(d_model, num_classes) + self.bboxes_delta = nn.Linear(d_model, 4) + self.scale_clamp = scale_clamp + self.bbox_weights = bbox_weights + + def forward(self, features, bboxes, pro_features, pooler): + """ + :param bboxes: (N, nr_boxes, 4) + :param pro_features: (N, nr_boxes, d_model) + """ + + N, nr_boxes = bboxes.shape[:2] + + proposal_boxes = list() + for b in range(N): + proposal_boxes.append(bboxes[b]) + roi_num = paddle.full([N], nr_boxes).astype("int32") + + roi_features = pooler(features, proposal_boxes, roi_num) + roi_features = roi_features.reshape( + [N * nr_boxes, self.d_model, -1]).transpose(perm=[2, 0, 1]) + + # self_att. + pro_features = pro_features.reshape([N, nr_boxes, self.d_model]) + pro_features2 = self.self_attn( + pro_features, pro_features, value=pro_features) + pro_features = pro_features.transpose(perm=[1, 0, 2]) + self.dropout1( + pro_features2.transpose(perm=[1, 0, 2])) + pro_features = self.norm1(pro_features) + + # inst_interact. + pro_features = pro_features.reshape( + [nr_boxes, N, self.d_model]).transpose(perm=[1, 0, 2]).reshape( + [1, N * nr_boxes, self.d_model]) + pro_features2 = self.inst_interact(pro_features, roi_features) + pro_features = pro_features + self.dropout2(pro_features2) + obj_features = self.norm2(pro_features) + + # obj_feature. 
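+        # position-wise feed-forward network with residual connection and
+        # LayerNorm, as in a standard transformer decoder layer.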
+        obj_features2 = self.linear2(
+            self.dropout(self.activation(self.linear1(obj_features))))
+        obj_features = obj_features + self.dropout3(obj_features2)
+        obj_features = self.norm3(obj_features)
+
+        fc_feature = obj_features.transpose(perm=[1, 0, 2]).reshape(
+            [N * nr_boxes, -1])
+        cls_feature = fc_feature.clone()
+        reg_feature = fc_feature.clone()
+        for cls_layer in self.cls_module:
+            cls_feature = cls_layer(cls_feature)
+        for reg_layer in self.reg_module:
+            reg_feature = reg_layer(reg_feature)
+        class_logits = self.class_logits(cls_feature)
+        bboxes_deltas = self.bboxes_delta(reg_feature)
+        pred_bboxes = delta2bbox(bboxes_deltas,
+                                 bboxes.reshape([-1, 4]), self.bbox_weights)
+
+        return class_logits.reshape([N, nr_boxes, -1]), pred_bboxes.reshape(
+            [N, nr_boxes, -1]), obj_features
+
+
+@register
+class SparseRCNNHead(nn.Layer):
+    '''
+    SparseRCNNHead
+    Args:
+        roi_input_shape (list[ShapeSpec]): The output shape of the FPN
+        num_classes (int): Number of classes,
+        head_hidden_dim (int): The param of MultiHeadAttention,
+        head_dim_feedforward (int): The param of MultiHeadAttention,
+        nhead (int): The param of MultiHeadAttention,
+        head_dropout (float): The p of dropout,
+        head_cls (int): The number of layers in the classification head,
+        head_reg (int): The number of layers in the regression head,
+        head_dim_dynamic (int): The dimension of DynamicConv's params,
+        head_num_dynamic (int): The number of DynamicConv's params,
+        head_num_heads (int): The number of RCNNHead stages,
+        deep_supervision (bool): whether to supervise the intermediate results,
+        num_proposals (int): the number of proposal boxes and features
+    '''
+    __inject__ = ['loss_func']
+    __shared__ = ['num_classes']
+
+    def __init__(
+            self,
+            head_hidden_dim,
+            head_dim_feedforward,
+            nhead,
+            head_dropout,
+            head_cls,
+            head_reg,
+            head_dim_dynamic,
+            head_num_dynamic,
+            head_num_heads,
+            deep_supervision,
+            num_proposals,
+            num_classes=80,
+            loss_func="SparseRCNNLoss",
+            roi_input_shape=None, ):
+        super().__init__()
+        assert head_num_heads > 0, \
+            f'At least one RoI Head is required, but got {head_num_heads}.'
+
+        # Build RoI.
+        box_pooler = self._init_box_pooler(roi_input_shape)
+        self.box_pooler = box_pooler
+
+        # Build heads.
+        rcnn_head = RCNNHead(
+            head_hidden_dim,
+            num_classes,
+            head_dim_feedforward,
+            nhead,
+            head_dropout,
+            head_cls,
+            head_reg,
+            head_dim_dynamic,
+            head_num_dynamic, )
+        self.head_series = nn.LayerList(
+            [copy.deepcopy(rcnn_head) for _ in range(head_num_heads)])
+        self.return_intermediate = deep_supervision
+
+        self.num_classes = num_classes
+
+        # Build init proposals.
+        self.init_proposal_features = nn.Embedding(num_proposals,
+                                                   head_hidden_dim)
+        self.init_proposal_boxes = nn.Embedding(num_proposals, 4)
+
+        self.lossfunc = loss_func
+
+        # Init parameters.
+        init.reset_initialized_parameter(self)
+        self._reset_parameters()
+
+    def _reset_parameters(self):
+        # init all parameters.
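+        # Focal-loss-style bias init: choose the classification bias so the
+        # initial predicted foreground probability equals prior_prob, i.e.
+        # sigmoid(bias_value) == prior_prob (bias_value ~= -4.6 for 0.01).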
+ prior_prob = 0.01 + bias_value = -math.log((1 - prior_prob) / prior_prob) + + for m in self.sublayers(): + if isinstance(m, nn.Linear): + init.xavier_normal_(m.weight, reverse=True) + elif not isinstance(m, nn.Embedding) and hasattr( + m, "weight") and m.weight.dim() > 1: + init.xavier_normal_(m.weight, reverse=False) + + if hasattr(m, "bias") and m.bias is not None and m.bias.shape[ + -1] == self.num_classes: + init.constant_(m.bias, bias_value) + + init_bboxes = paddle.empty_like(self.init_proposal_boxes.weight) + init_bboxes[:, :2] = 0.5 + init_bboxes[:, 2:] = 1.0 + self.init_proposal_boxes.weight.set_value(init_bboxes) + + @staticmethod + def _init_box_pooler(input_shape): + + pooler_resolution = 7 + sampling_ratio = 2 + + if input_shape is not None: + pooler_scales = tuple(1.0 / input_shape[k].stride + for k in range(len(input_shape))) + in_channels = [ + input_shape[f].channels for f in range(len(input_shape)) + ] + end_level = len(input_shape) - 1 + # Check all channel counts are equal + assert len(set(in_channels)) == 1, in_channels + else: + pooler_scales = [1.0 / 4.0, 1.0 / 8.0, 1.0 / 16.0, 1.0 / 32.0] + end_level = 3 + + box_pooler = RoIAlign( + resolution=pooler_resolution, + spatial_scale=pooler_scales, + sampling_ratio=sampling_ratio, + end_level=end_level, + aligned=True) + return box_pooler + + def forward(self, features, input_whwh): + + bs = len(features[0]) + bboxes = box_cxcywh_to_xyxy(self.init_proposal_boxes.weight.clone( + )).unsqueeze(0) + bboxes = bboxes * input_whwh.unsqueeze(-2) + + init_features = self.init_proposal_features.weight.unsqueeze(0).tile( + [1, bs, 1]) + proposal_features = init_features.clone() + + inter_class_logits = [] + inter_pred_bboxes = [] + + for stage, rcnn_head in enumerate(self.head_series): + class_logits, pred_bboxes, proposal_features = rcnn_head( + features, bboxes, proposal_features, self.box_pooler) + + if self.return_intermediate or stage == len(self.head_series) - 1: + inter_class_logits.append(class_logits) + inter_pred_bboxes.append(pred_bboxes) + bboxes = pred_bboxes.detach() + + output = { + 'pred_logits': inter_class_logits[-1], + 'pred_boxes': inter_pred_bboxes[-1] + } + if self.return_intermediate: + output['aux_outputs'] = [{ + 'pred_logits': a, + 'pred_boxes': b + } for a, b in zip(inter_class_logits[:-1], inter_pred_bboxes[:-1])] + + return output + + def get_loss(self, outputs, targets): + losses = self.lossfunc(outputs, targets) + weight_dict = self.lossfunc.weight_dict + + for k in losses.keys(): + if k in weight_dict: + losses[k] *= weight_dict[k] + + return losses + + +def box_cxcywh_to_xyxy(x): + x_c, y_c, w, h = x.unbind(-1) + b = [(x_c - 0.5 * w), (y_c - 0.5 * h), (x_c + 0.5 * w), (y_c + 0.5 * h)] + return paddle.stack(b, axis=-1) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/ssd_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/ssd_head.py new file mode 100644 index 0000000000000000000000000000000000000000..a6df4824dc036d6419f73ec82dc00e8adf0bd780 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/ssd_head.py @@ -0,0 +1,216 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from ppdet.core.workspace import register
+from paddle.regularizer import L2Decay
+from paddle import ParamAttr
+
+from ..layers import AnchorGeneratorSSD
+from ..cls_utils import _get_class_default_kwargs
+
+
+class SepConvLayer(nn.Layer):
+    def __init__(self,
+                 in_channels,
+                 out_channels,
+                 kernel_size=3,
+                 padding=1,
+                 conv_decay=0.):
+        super(SepConvLayer, self).__init__()
+        self.dw_conv = nn.Conv2D(
+            in_channels=in_channels,
+            out_channels=in_channels,
+            kernel_size=kernel_size,
+            stride=1,
+            padding=padding,
+            groups=in_channels,
+            weight_attr=ParamAttr(regularizer=L2Decay(conv_decay)),
+            bias_attr=False)
+
+        self.bn = nn.BatchNorm2D(
+            in_channels,
+            weight_attr=ParamAttr(regularizer=L2Decay(0.)),
+            bias_attr=ParamAttr(regularizer=L2Decay(0.)))
+
+        self.pw_conv = nn.Conv2D(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=1,
+            stride=1,
+            padding=0,
+            weight_attr=ParamAttr(regularizer=L2Decay(conv_decay)),
+            bias_attr=False)
+
+    def forward(self, x):
+        x = self.dw_conv(x)
+        x = F.relu6(self.bn(x))
+        x = self.pw_conv(x)
+        return x
+
+
+class SSDExtraHead(nn.Layer):
+    def __init__(self,
+                 in_channels=256,
+                 out_channels=([256, 512], [256, 512], [128, 256], [128, 256],
+                               [128, 256]),
+                 strides=(2, 2, 2, 1, 1),
+                 paddings=(1, 1, 1, 0, 0)):
+        super(SSDExtraHead, self).__init__()
+        self.convs = nn.LayerList()
+        for out_channel, stride, padding in zip(out_channels, strides,
+                                                paddings):
+            self.convs.append(
+                self._make_layers(in_channels, out_channel[0], out_channel[1],
+                                  stride, padding))
+            in_channels = out_channel[-1]
+
+    def _make_layers(self, c_in, c_hidden, c_out, stride_3x3, padding_3x3):
+        return nn.Sequential(
+            nn.Conv2D(c_in, c_hidden, 1),
+            nn.ReLU(),
+            nn.Conv2D(c_hidden, c_out, 3, stride_3x3, padding_3x3), nn.ReLU())
+
+    def forward(self, x):
+        out = [x]
+        for conv_layer in self.convs:
+            out.append(conv_layer(out[-1]))
+        return out
+
+
+@register
+class SSDHead(nn.Layer):
+    """
+    SSDHead
+
+    Args:
+        num_classes (int): Number of classes
+        in_channels (list): Number of channels per input feature
+        anchor_generator (dict): Configuration of 'AnchorGeneratorSSD' instance
+        kernel_size (int): Conv kernel size
+        padding (int): Conv padding
+        use_sepconv (bool): Use SepConvLayer if True
+        conv_decay (float): Conv regularization coefficient
+        loss (object): 'SSDLoss' instance
+        use_extra_head (bool): If ResNet34 is used as the backbone,
+            `use_extra_head` should be set to True
+    """
+
+    __shared__ = ['num_classes']
+    __inject__ = ['anchor_generator', 'loss']
+
+    def __init__(self,
+                 num_classes=80,
+                 in_channels=(512, 1024, 512, 256, 256, 256),
+                 anchor_generator=_get_class_default_kwargs(AnchorGeneratorSSD),
+                 kernel_size=3,
+                 padding=1,
+                 use_sepconv=False,
+                 conv_decay=0.,
+                 loss='SSDLoss',
+                 use_extra_head=False):
+        super(SSDHead, self).__init__()
+        # add background class
+        self.num_classes = num_classes + 1
+        self.in_channels = in_channels
+        self.anchor_generator = anchor_generator
+        self.loss = loss
+        self.use_extra_head = use_extra_head
+
+        if self.use_extra_head:
+            self.ssd_extra_head =
SSDExtraHead() + self.in_channels = [256, 512, 512, 256, 256, 256] + + if isinstance(anchor_generator, dict): + self.anchor_generator = AnchorGeneratorSSD(**anchor_generator) + + self.num_priors = self.anchor_generator.num_priors + self.box_convs = [] + self.score_convs = [] + for i, num_prior in enumerate(self.num_priors): + box_conv_name = "boxes{}".format(i) + if not use_sepconv: + box_conv = self.add_sublayer( + box_conv_name, + nn.Conv2D( + in_channels=self.in_channels[i], + out_channels=num_prior * 4, + kernel_size=kernel_size, + padding=padding)) + else: + box_conv = self.add_sublayer( + box_conv_name, + SepConvLayer( + in_channels=self.in_channels[i], + out_channels=num_prior * 4, + kernel_size=kernel_size, + padding=padding, + conv_decay=conv_decay)) + self.box_convs.append(box_conv) + + score_conv_name = "scores{}".format(i) + if not use_sepconv: + score_conv = self.add_sublayer( + score_conv_name, + nn.Conv2D( + in_channels=self.in_channels[i], + out_channels=num_prior * self.num_classes, + kernel_size=kernel_size, + padding=padding)) + else: + score_conv = self.add_sublayer( + score_conv_name, + SepConvLayer( + in_channels=self.in_channels[i], + out_channels=num_prior * self.num_classes, + kernel_size=kernel_size, + padding=padding, + conv_decay=conv_decay)) + self.score_convs.append(score_conv) + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + def forward(self, feats, image, gt_bbox=None, gt_class=None): + if self.use_extra_head: + assert len(feats) == 1, \ + ("If you set use_extra_head=True, backbone feature " + "list length should be 1.") + feats = self.ssd_extra_head(feats[0]) + box_preds = [] + cls_scores = [] + for feat, box_conv, score_conv in zip(feats, self.box_convs, + self.score_convs): + box_pred = box_conv(feat) + box_pred = paddle.transpose(box_pred, [0, 2, 3, 1]) + box_pred = paddle.reshape(box_pred, [0, -1, 4]) + box_preds.append(box_pred) + + cls_score = score_conv(feat) + cls_score = paddle.transpose(cls_score, [0, 2, 3, 1]) + cls_score = paddle.reshape(cls_score, [0, -1, self.num_classes]) + cls_scores.append(cls_score) + + prior_boxes = self.anchor_generator(feats, image) + + if self.training: + return self.get_loss(box_preds, cls_scores, gt_bbox, gt_class, + prior_boxes) + else: + return (box_preds, cls_scores), prior_boxes + + def get_loss(self, boxes, scores, gt_bbox, gt_class, prior_boxes): + return self.loss(boxes, scores, gt_bbox, gt_class, prior_boxes) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/tood_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/tood_head.py new file mode 100644 index 0000000000000000000000000000000000000000..81b2edd7b720acedc48a9d8929d389457c3c1e9e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/tood_head.py @@ -0,0 +1,370 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
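+
+# TOOD (Task-aligned One-stage Object Detection) head. Classification and
+# regression share the inter_convs stack; TaskDecomposition then builds
+# task-specific features for each branch, and the optional alignment layers
+# (cls_prob_conv*, reg_offset_conv*) refine the two predictions.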
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Constant + +from ppdet.core.workspace import register +from ..initializer import normal_, constant_, bias_init_with_prob +from ppdet.modeling.bbox_utils import bbox_center, batch_distance2bbox +from ..losses import GIoULoss +from ppdet.modeling.layers import ConvNormLayer +from ppdet.modeling.ops import get_static_shape +from ppdet.modeling.assigners.utils import generate_anchors_for_grid_cell + + +class ScaleReg(nn.Layer): + """ + Parameter for scaling the regression outputs. + """ + + def __init__(self, init_scale=1.): + super(ScaleReg, self).__init__() + self.scale_reg = self.create_parameter( + shape=[1], + attr=ParamAttr(initializer=Constant(value=init_scale)), + dtype="float32") + + def forward(self, inputs): + out = inputs * self.scale_reg + return out + + +class TaskDecomposition(nn.Layer): + """This code is based on + https://github.com/fcjian/TOOD/blob/master/mmdet/models/dense_heads/tood_head.py + """ + + def __init__( + self, + feat_channels, + stacked_convs, + la_down_rate=8, + norm_type='gn', + norm_groups=32, ): + super(TaskDecomposition, self).__init__() + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.norm_type = norm_type + self.norm_groups = norm_groups + self.in_channels = self.feat_channels * self.stacked_convs + self.la_conv1 = nn.Conv2D(self.in_channels, + self.in_channels // la_down_rate, 1) + self.la_conv2 = nn.Conv2D(self.in_channels // la_down_rate, + self.stacked_convs, 1) + + self.reduction_conv = ConvNormLayer( + self.in_channels, + self.feat_channels, + filter_size=1, + stride=1, + norm_type=self.norm_type, + norm_groups=self.norm_groups) + + self._init_weights() + + def _init_weights(self): + normal_(self.la_conv1.weight, std=0.001) + normal_(self.la_conv2.weight, std=0.001) + + def forward(self, feat, avg_feat): + b, _, h, w = get_static_shape(feat) + weight = F.relu(self.la_conv1(avg_feat)) + weight = F.sigmoid(self.la_conv2(weight)).unsqueeze(-1) + feat = paddle.reshape( + feat, [b, self.stacked_convs, self.feat_channels, h, w]) * weight + feat = self.reduction_conv(feat.flatten(1, 2)) + feat = F.relu(feat) + return feat + + +@register +class TOODHead(nn.Layer): + """This code is based on + https://github.com/fcjian/TOOD/blob/master/mmdet/models/dense_heads/tood_head.py + """ + __inject__ = ['nms', 'static_assigner', 'assigner'] + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + feat_channels=256, + stacked_convs=6, + fpn_strides=(8, 16, 32, 64, 128), + grid_cell_scale=8, + grid_cell_offset=0.5, + norm_type='gn', + norm_groups=32, + static_assigner_epoch=4, + use_align_head=True, + loss_weight={ + 'class': 1.0, + 'bbox': 1.0, + 'iou': 2.0, + }, + nms='MultiClassNMS', + static_assigner='ATSSAssigner', + assigner='TaskAlignedAssigner'): + super(TOODHead, self).__init__() + self.num_classes = num_classes + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.fpn_strides = fpn_strides + self.grid_cell_scale = grid_cell_scale + self.grid_cell_offset = grid_cell_offset + self.static_assigner_epoch = static_assigner_epoch + self.use_align_head = use_align_head + self.nms = nms + self.static_assigner = static_assigner + self.assigner = assigner + self.loss_weight = loss_weight + self.giou_loss = GIoULoss() + + self.inter_convs = 
nn.LayerList() + for i in range(self.stacked_convs): + self.inter_convs.append( + ConvNormLayer( + self.feat_channels, + self.feat_channels, + filter_size=3, + stride=1, + norm_type=norm_type, + norm_groups=norm_groups)) + + self.cls_decomp = TaskDecomposition( + self.feat_channels, + self.stacked_convs, + self.stacked_convs * 8, + norm_type=norm_type, + norm_groups=norm_groups) + self.reg_decomp = TaskDecomposition( + self.feat_channels, + self.stacked_convs, + self.stacked_convs * 8, + norm_type=norm_type, + norm_groups=norm_groups) + + self.tood_cls = nn.Conv2D( + self.feat_channels, self.num_classes, 3, padding=1) + self.tood_reg = nn.Conv2D(self.feat_channels, 4, 3, padding=1) + + if self.use_align_head: + self.cls_prob_conv1 = nn.Conv2D(self.feat_channels * + self.stacked_convs, + self.feat_channels // 4, 1) + self.cls_prob_conv2 = nn.Conv2D( + self.feat_channels // 4, 1, 3, padding=1) + self.reg_offset_conv1 = nn.Conv2D(self.feat_channels * + self.stacked_convs, + self.feat_channels // 4, 1) + self.reg_offset_conv2 = nn.Conv2D( + self.feat_channels // 4, 4 * 2, 3, padding=1) + + self.scales_regs = nn.LayerList([ScaleReg() for _ in self.fpn_strides]) + + self._init_weights() + + @classmethod + def from_config(cls, cfg, input_shape): + return { + 'feat_channels': input_shape[0].channels, + 'fpn_strides': [i.stride for i in input_shape], + } + + def _init_weights(self): + bias_cls = bias_init_with_prob(0.01) + normal_(self.tood_cls.weight, std=0.01) + constant_(self.tood_cls.bias, bias_cls) + normal_(self.tood_reg.weight, std=0.01) + + if self.use_align_head: + normal_(self.cls_prob_conv1.weight, std=0.01) + normal_(self.cls_prob_conv2.weight, std=0.01) + constant_(self.cls_prob_conv2.bias, bias_cls) + normal_(self.reg_offset_conv1.weight, std=0.001) + constant_(self.reg_offset_conv2.weight) + constant_(self.reg_offset_conv2.bias) + + def _reg_grid_sample(self, feat, offset, anchor_points): + b, _, h, w = get_static_shape(feat) + feat = paddle.reshape(feat, [-1, 1, h, w]) + offset = paddle.reshape(offset, [-1, 2, h, w]).transpose([0, 2, 3, 1]) + grid_shape = paddle.concat([w, h]).astype('float32') + grid = (offset + anchor_points) / grid_shape + grid = 2 * grid.clip(0., 1.) 
- 1 + feat = F.grid_sample(feat, grid) + feat = paddle.reshape(feat, [b, -1, h, w]) + return feat + + def forward(self, feats): + assert len(feats) == len(self.fpn_strides), \ + "The size of feats is not equal to size of fpn_strides" + + anchors, anchor_points, num_anchors_list, stride_tensor =\ + generate_anchors_for_grid_cell( + feats, self.fpn_strides, self.grid_cell_scale, + self.grid_cell_offset) + anchor_centers_split = paddle.split(anchor_points / stride_tensor, + num_anchors_list) + + cls_score_list, bbox_pred_list = [], [] + for feat, scale_reg, anchor_centers, stride in zip( + feats, self.scales_regs, anchor_centers_split, + self.fpn_strides): + b, _, h, w = get_static_shape(feat) + inter_feats = [] + for inter_conv in self.inter_convs: + feat = F.relu(inter_conv(feat)) + inter_feats.append(feat) + feat = paddle.concat(inter_feats, axis=1) + + # task decomposition + avg_feat = F.adaptive_avg_pool2d(feat, (1, 1)) + cls_feat = self.cls_decomp(feat, avg_feat) + reg_feat = self.reg_decomp(feat, avg_feat) + + # cls prediction and alignment + cls_logits = self.tood_cls(cls_feat) + if self.use_align_head: + cls_prob = F.relu(self.cls_prob_conv1(feat)) + cls_prob = F.sigmoid(self.cls_prob_conv2(cls_prob)) + cls_score = (F.sigmoid(cls_logits) * cls_prob).sqrt() + else: + cls_score = F.sigmoid(cls_logits) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + + # reg prediction and alignment + reg_dist = scale_reg(self.tood_reg(reg_feat).exp()) + reg_dist = reg_dist.flatten(2).transpose([0, 2, 1]) + reg_bbox = batch_distance2bbox( + anchor_centers.unsqueeze(0), reg_dist) + if self.use_align_head: + reg_offset = F.relu(self.reg_offset_conv1(feat)) + reg_offset = self.reg_offset_conv2(reg_offset) + reg_bbox = reg_bbox.transpose([0, 2, 1]).reshape([b, 4, h, w]) + anchor_centers = anchor_centers.reshape([1, h, w, 2]) + bbox_pred = self._reg_grid_sample(reg_bbox, reg_offset, + anchor_centers) + bbox_pred = bbox_pred.flatten(2).transpose([0, 2, 1]) + else: + bbox_pred = reg_bbox + + if not self.training: + bbox_pred *= stride + bbox_pred_list.append(bbox_pred) + cls_score_list = paddle.concat(cls_score_list, axis=1) + bbox_pred_list = paddle.concat(bbox_pred_list, axis=1) + + return cls_score_list, bbox_pred_list, anchors, num_anchors_list, stride_tensor + + @staticmethod + def _focal_loss(score, label, alpha=0.25, gamma=2.0): + weight = (score - label).pow(gamma) + if alpha > 0: + alpha_t = alpha * label + (1 - alpha) * (1 - label) + weight *= alpha_t + loss = F.binary_cross_entropy( + score, label, weight=weight, reduction='sum') + return loss + + def get_loss(self, head_outs, gt_meta): + pred_scores, pred_bboxes, anchors, \ + num_anchors_list, stride_tensor = head_outs + gt_labels = gt_meta['gt_class'] + gt_bboxes = gt_meta['gt_bbox'] + pad_gt_mask = gt_meta['pad_gt_mask'] + # label assignment + if gt_meta['epoch_id'] < self.static_assigner_epoch: + assigned_labels, assigned_bboxes, assigned_scores, _ = self.static_assigner( + anchors, + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + alpha_l = 0.25 + else: + assigned_labels, assigned_bboxes, assigned_scores, _ = self.assigner( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor, + bbox_center(anchors), + num_anchors_list, + gt_labels, + gt_bboxes, + pad_gt_mask, + bg_index=self.num_classes) + alpha_l = -1 + + # rescale bbox + assigned_bboxes /= stride_tensor + # classification loss + loss_cls = self._focal_loss(pred_scores, assigned_scores, alpha=alpha_l) + # select positive 
samples mask + mask_positive = (assigned_labels != self.num_classes) + num_pos = mask_positive.astype(paddle.float32).sum() + # bbox regression loss + if num_pos > 0: + bbox_mask = mask_positive.unsqueeze(-1).tile([1, 1, 4]) + pred_bboxes_pos = paddle.masked_select(pred_bboxes, + bbox_mask).reshape([-1, 4]) + assigned_bboxes_pos = paddle.masked_select( + assigned_bboxes, bbox_mask).reshape([-1, 4]) + bbox_weight = paddle.masked_select( + assigned_scores.sum(-1), mask_positive).unsqueeze(-1) + # iou loss + loss_iou = self.giou_loss(pred_bboxes_pos, + assigned_bboxes_pos) * bbox_weight + loss_iou = loss_iou.sum() / bbox_weight.sum() + # l1 loss + loss_l1 = F.l1_loss(pred_bboxes_pos, assigned_bboxes_pos) + else: + loss_iou = paddle.zeros([1]) + loss_l1 = paddle.zeros([1]) + + loss_cls /= assigned_scores.sum().clip(min=1) + loss = self.loss_weight['class'] * loss_cls + self.loss_weight[ + 'iou'] * loss_iou + + return { + 'loss': loss, + 'loss_class': loss_cls, + 'loss_iou': loss_iou, + 'loss_l1': loss_l1 + } + + def post_process(self, head_outs, img_shape, scale_factor): + pred_scores, pred_bboxes, _, _, _ = head_outs + pred_scores = pred_scores.transpose([0, 2, 1]) + + for i in range(len(pred_bboxes)): + pred_bboxes[i, :, 0] = pred_bboxes[i, :, 0].clip( + min=0, max=img_shape[i, 1]) + pred_bboxes[i, :, 1] = pred_bboxes[i, :, 1].clip( + min=0, max=img_shape[i, 0]) + pred_bboxes[i, :, 2] = pred_bboxes[i, :, 2].clip( + min=0, max=img_shape[i, 1]) + pred_bboxes[i, :, 3] = pred_bboxes[i, :, 3].clip( + min=0, max=img_shape[i, 0]) + # scale bbox to origin + scale_factor = scale_factor.flip([1]).tile([1, 2]).unsqueeze(1) + pred_bboxes /= scale_factor + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) + return bbox_pred, bbox_num diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/ttf_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/ttf_head.py new file mode 100644 index 0000000000000000000000000000000000000000..dfe97bdb715c613618c78f218ca5b4f99cedaf94 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/ttf_head.py @@ -0,0 +1,311 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Constant, Normal +from paddle.regularizer import L2Decay +from ppdet.core.workspace import register +from ppdet.modeling.layers import DeformableConvV2, LiteConv +import numpy as np + + +@register +class HMHead(nn.Layer): + """ + Args: + ch_in (int): The channel number of input Tensor. + ch_out (int): The channel number of output Tensor. + num_classes (int): Number of classes. + conv_num (int): The convolution number of hm_feat. + dcn_head(bool): whether use dcn in head. False by default. + lite_head(bool): whether use lite version. False by default. + norm_type (string): norm type, 'sync_bn', 'bn', 'gn' are optional. 
+ bn by default + + Return: + Heatmap head output + """ + __shared__ = ['num_classes', 'norm_type'] + + def __init__( + self, + ch_in, + ch_out=128, + num_classes=80, + conv_num=2, + dcn_head=False, + lite_head=False, + norm_type='bn', ): + super(HMHead, self).__init__() + head_conv = nn.Sequential() + for i in range(conv_num): + name = 'conv.{}'.format(i) + if lite_head: + lite_name = 'hm.' + name + head_conv.add_sublayer( + lite_name, + LiteConv( + in_channels=ch_in if i == 0 else ch_out, + out_channels=ch_out, + norm_type=norm_type)) + else: + if dcn_head: + head_conv.add_sublayer( + name, + DeformableConvV2( + in_channels=ch_in if i == 0 else ch_out, + out_channels=ch_out, + kernel_size=3, + weight_attr=ParamAttr(initializer=Normal(0, 0.01)))) + else: + head_conv.add_sublayer( + name, + nn.Conv2D( + in_channels=ch_in if i == 0 else ch_out, + out_channels=ch_out, + kernel_size=3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0, 0.01)), + bias_attr=ParamAttr( + learning_rate=2., regularizer=L2Decay(0.)))) + head_conv.add_sublayer(name + '.act', nn.ReLU()) + self.feat = head_conv + bias_init = float(-np.log((1 - 0.01) / 0.01)) + weight_attr = None if lite_head else ParamAttr(initializer=Normal(0, + 0.01)) + self.head = nn.Conv2D( + in_channels=ch_out, + out_channels=num_classes, + kernel_size=1, + weight_attr=weight_attr, + bias_attr=ParamAttr( + learning_rate=2., + regularizer=L2Decay(0.), + initializer=Constant(bias_init))) + + def forward(self, feat): + out = self.feat(feat) + out = self.head(out) + return out + + +@register +class WHHead(nn.Layer): + """ + Args: + ch_in (int): The channel number of input Tensor. + ch_out (int): The channel number of output Tensor. + conv_num (int): The convolution number of wh_feat. + dcn_head(bool): whether use dcn in head. False by default. + lite_head(bool): whether use lite version. False by default. + norm_type (string): norm type, 'sync_bn', 'bn', 'gn' are optional. + bn by default + Return: + Width & Height head output + """ + __shared__ = ['norm_type'] + + def __init__(self, + ch_in, + ch_out=64, + conv_num=2, + dcn_head=False, + lite_head=False, + norm_type='bn'): + super(WHHead, self).__init__() + head_conv = nn.Sequential() + for i in range(conv_num): + name = 'conv.{}'.format(i) + if lite_head: + lite_name = 'wh.' + name + head_conv.add_sublayer( + lite_name, + LiteConv( + in_channels=ch_in if i == 0 else ch_out, + out_channels=ch_out, + norm_type=norm_type)) + else: + if dcn_head: + head_conv.add_sublayer( + name, + DeformableConvV2( + in_channels=ch_in if i == 0 else ch_out, + out_channels=ch_out, + kernel_size=3, + weight_attr=ParamAttr(initializer=Normal(0, 0.01)))) + else: + head_conv.add_sublayer( + name, + nn.Conv2D( + in_channels=ch_in if i == 0 else ch_out, + out_channels=ch_out, + kernel_size=3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0, 0.01)), + bias_attr=ParamAttr( + learning_rate=2., regularizer=L2Decay(0.)))) + head_conv.add_sublayer(name + '.act', nn.ReLU()) + + weight_attr = None if lite_head else ParamAttr(initializer=Normal(0, + 0.01)) + self.feat = head_conv + self.head = nn.Conv2D( + in_channels=ch_out, + out_channels=4, + kernel_size=1, + weight_attr=weight_attr, + bias_attr=ParamAttr( + learning_rate=2., regularizer=L2Decay(0.))) + + def forward(self, feat): + out = self.feat(feat) + out = self.head(out) + out = F.relu(out) + return out + + +@register +class TTFHead(nn.Layer): + """ + TTFHead + Args: + in_channels (int): the channel number of input to TTFHead. 
+ num_classes (int): the number of classes, 80 by default. + hm_head_planes (int): the channel number in heatmap head, + 128 by default. + wh_head_planes (int): the channel number in width & height head, + 64 by default. + hm_head_conv_num (int): the number of convolution in heatmap head, + 2 by default. + wh_head_conv_num (int): the number of convolution in width & height + head, 2 by default. + hm_loss (object): Instance of 'CTFocalLoss'. + wh_loss (object): Instance of 'GIoULoss'. + wh_offset_base (float): the base offset of width and height, + 16.0 by default. + down_ratio (int): the actual down_ratio is calculated by base_down_ratio + (default 16) and the number of upsample layers. + lite_head(bool): whether use lite version. False by default. + norm_type (string): norm type, 'sync_bn', 'bn', 'gn' are optional. + bn by default + ags_module(bool): whether use AGS module to reweight location feature. + false by default. + + """ + + __shared__ = ['num_classes', 'down_ratio', 'norm_type'] + __inject__ = ['hm_loss', 'wh_loss'] + + def __init__(self, + in_channels, + num_classes=80, + hm_head_planes=128, + wh_head_planes=64, + hm_head_conv_num=2, + wh_head_conv_num=2, + hm_loss='CTFocalLoss', + wh_loss='GIoULoss', + wh_offset_base=16., + down_ratio=4, + dcn_head=False, + lite_head=False, + norm_type='bn', + ags_module=False): + super(TTFHead, self).__init__() + self.in_channels = in_channels + self.hm_head = HMHead(in_channels, hm_head_planes, num_classes, + hm_head_conv_num, dcn_head, lite_head, norm_type) + self.wh_head = WHHead(in_channels, wh_head_planes, wh_head_conv_num, + dcn_head, lite_head, norm_type) + self.hm_loss = hm_loss + self.wh_loss = wh_loss + + self.wh_offset_base = wh_offset_base + self.down_ratio = down_ratio + self.ags_module = ags_module + + @classmethod + def from_config(cls, cfg, input_shape): + if isinstance(input_shape, (list, tuple)): + input_shape = input_shape[0] + return {'in_channels': input_shape.channels, } + + def forward(self, feats): + hm = self.hm_head(feats) + wh = self.wh_head(feats) * self.wh_offset_base + return hm, wh + + def filter_box_by_weight(self, pred, target, weight): + """ + Filter out boxes where ttf_reg_weight is 0, only keep positive samples. 
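+        Here `pred` and `target` are laid out as (N, H, W, 4) and `weight`
+        as (N, H, W); paddle.gather_nd keeps only the locations whose
+        regression weight is positive.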
+ """ + index = paddle.nonzero(weight > 0) + index.stop_gradient = True + weight = paddle.gather_nd(weight, index) + pred = paddle.gather_nd(pred, index) + target = paddle.gather_nd(target, index) + return pred, target, weight + + def filter_loc_by_weight(self, score, weight): + index = paddle.nonzero(weight > 0) + index.stop_gradient = True + score = paddle.gather_nd(score, index) + return score + + def get_loss(self, pred_hm, pred_wh, target_hm, box_target, target_weight): + pred_hm = paddle.clip(F.sigmoid(pred_hm), 1e-4, 1 - 1e-4) + hm_loss = self.hm_loss(pred_hm, target_hm) + H, W = target_hm.shape[2:] + mask = paddle.reshape(target_weight, [-1, H, W]) + avg_factor = paddle.sum(mask) + 1e-4 + + base_step = self.down_ratio + shifts_x = paddle.arange(0, W * base_step, base_step, dtype='int32') + shifts_y = paddle.arange(0, H * base_step, base_step, dtype='int32') + shift_y, shift_x = paddle.tensor.meshgrid([shifts_y, shifts_x]) + base_loc = paddle.stack([shift_x, shift_y], axis=0) + base_loc.stop_gradient = True + + pred_boxes = paddle.concat( + [0 - pred_wh[:, 0:2, :, :] + base_loc, pred_wh[:, 2:4] + base_loc], + axis=1) + pred_boxes = paddle.transpose(pred_boxes, [0, 2, 3, 1]) + boxes = paddle.transpose(box_target, [0, 2, 3, 1]) + boxes.stop_gradient = True + + if self.ags_module: + pred_hm_max = paddle.max(pred_hm, axis=1, keepdim=True) + pred_hm_max_softmax = F.softmax(pred_hm_max, axis=1) + pred_hm_max_softmax = paddle.transpose(pred_hm_max_softmax, + [0, 2, 3, 1]) + pred_hm_max_softmax = self.filter_loc_by_weight(pred_hm_max_softmax, + mask) + else: + pred_hm_max_softmax = None + + pred_boxes, boxes, mask = self.filter_box_by_weight(pred_boxes, boxes, + mask) + mask.stop_gradient = True + wh_loss = self.wh_loss( + pred_boxes, + boxes, + iou_weight=mask.unsqueeze(1), + loc_reweight=pred_hm_max_softmax) + wh_loss = wh_loss / avg_factor + + ttf_loss = {'hm_loss': hm_loss, 'wh_loss': wh_loss} + return ttf_loss diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/yolo_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/yolo_head.py new file mode 100644 index 0000000000000000000000000000000000000000..0a63060d02aab1d20901ab7c4422d58e55166c3d --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/yolo_head.py @@ -0,0 +1,416 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
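+
+# YOLOv3 head (and YOLOX head below). With iou_aware=True, inference
+# re-weights objectness by the predicted IoU branch,
+# obj_t = obj**(1 - iou_aware_factor) * ioup**iou_aware_factor, and
+# _de_sigmoid maps the result back to logit space so downstream decoding
+# can keep applying sigmoid.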
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from ppdet.core.workspace import register + +import math +import numpy as np +from ..initializer import bias_init_with_prob, constant_ +from ..backbones.csp_darknet import BaseConv, DWConv +from ..losses import IouLoss +from ppdet.modeling.assigners.simota_assigner import SimOTAAssigner +from ppdet.modeling.bbox_utils import bbox_overlaps +from ppdet.modeling.layers import MultiClassNMS + +__all__ = ['YOLOv3Head', 'YOLOXHead'] + + +def _de_sigmoid(x, eps=1e-7): + x = paddle.clip(x, eps, 1. / eps) + x = paddle.clip(1. / x - 1., eps, 1. / eps) + x = -paddle.log(x) + return x + + +@register +class YOLOv3Head(nn.Layer): + __shared__ = ['num_classes', 'data_format'] + __inject__ = ['loss'] + + def __init__(self, + in_channels=[1024, 512, 256], + anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], + [59, 119], [116, 90], [156, 198], [373, 326]], + anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]], + num_classes=80, + loss='YOLOv3Loss', + iou_aware=False, + iou_aware_factor=0.4, + data_format='NCHW'): + """ + Head for YOLOv3 network + + Args: + num_classes (int): number of foreground classes + anchors (list): anchors + anchor_masks (list): anchor masks + loss (object): YOLOv3Loss instance + iou_aware (bool): whether to use iou_aware + iou_aware_factor (float): iou aware factor + data_format (str): data format, NCHW or NHWC + """ + super(YOLOv3Head, self).__init__() + assert len(in_channels) > 0, "in_channels length should > 0" + self.in_channels = in_channels + self.num_classes = num_classes + self.loss = loss + + self.iou_aware = iou_aware + self.iou_aware_factor = iou_aware_factor + + self.parse_anchor(anchors, anchor_masks) + self.num_outputs = len(self.anchors) + self.data_format = data_format + + self.yolo_outputs = [] + for i in range(len(self.anchors)): + + if self.iou_aware: + num_filters = len(self.anchors[i]) * (self.num_classes + 6) + else: + num_filters = len(self.anchors[i]) * (self.num_classes + 5) + name = 'yolo_output.{}'.format(i) + conv = nn.Conv2D( + in_channels=self.in_channels[i], + out_channels=num_filters, + kernel_size=1, + stride=1, + padding=0, + data_format=data_format, + bias_attr=ParamAttr(regularizer=L2Decay(0.))) + conv.skip_quant = True + yolo_output = self.add_sublayer(name, conv) + self.yolo_outputs.append(yolo_output) + + def parse_anchor(self, anchors, anchor_masks): + self.anchors = [[anchors[i] for i in mask] for mask in anchor_masks] + self.mask_anchors = [] + anchor_num = len(anchors) + for masks in anchor_masks: + self.mask_anchors.append([]) + for mask in masks: + assert mask < anchor_num, "anchor mask index overflow" + self.mask_anchors[-1].extend(anchors[mask]) + + def forward(self, feats, targets=None): + assert len(feats) == len(self.anchors) + yolo_outputs = [] + for i, feat in enumerate(feats): + yolo_output = self.yolo_outputs[i](feat) + if self.data_format == 'NHWC': + yolo_output = paddle.transpose(yolo_output, [0, 3, 1, 2]) + yolo_outputs.append(yolo_output) + + if self.training: + return self.loss(yolo_outputs, targets, self.anchors) + else: + if self.iou_aware: + y = [] + for i, out in enumerate(yolo_outputs): + na = len(self.anchors[i]) + ioup, x = out[:, 0:na, :, :], out[:, na:, :, :] + b, c, h, w = x.shape + no = c // na + x = x.reshape((b, na, no, h * w)) + ioup = ioup.reshape((b, na, 1, h * w)) + obj = x[:, :, 4:5, :] + ioup = F.sigmoid(ioup) + obj = F.sigmoid(obj) + obj_t = (obj**(1 - 
self.iou_aware_factor)) * ( + ioup**self.iou_aware_factor) + obj_t = _de_sigmoid(obj_t) + loc_t = x[:, :, :4, :] + cls_t = x[:, :, 5:, :] + y_t = paddle.concat([loc_t, obj_t, cls_t], axis=2) + y_t = y_t.reshape((b, c, h, w)) + y.append(y_t) + return y + else: + return yolo_outputs + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + +@register +class YOLOXHead(nn.Layer): + __shared__ = ['num_classes', 'width_mult', 'act', 'trt', 'exclude_nms'] + __inject__ = ['assigner', 'nms'] + + def __init__(self, + num_classes=80, + width_mult=1.0, + depthwise=False, + in_channels=[256, 512, 1024], + feat_channels=256, + fpn_strides=(8, 16, 32), + l1_epoch=285, + act='silu', + assigner=SimOTAAssigner(use_vfl=False), + nms='MultiClassNMS', + loss_weight={ + 'cls': 1.0, + 'obj': 1.0, + 'iou': 5.0, + 'l1': 1.0, + }, + trt=False, + exclude_nms=False): + super(YOLOXHead, self).__init__() + self._dtype = paddle.framework.get_default_dtype() + self.num_classes = num_classes + assert len(in_channels) > 0, "in_channels length should > 0" + self.in_channels = in_channels + feat_channels = int(feat_channels * width_mult) + self.fpn_strides = fpn_strides + self.l1_epoch = l1_epoch + self.assigner = assigner + self.nms = nms + if isinstance(self.nms, MultiClassNMS) and trt: + self.nms.trt = trt + self.exclude_nms = exclude_nms + self.loss_weight = loss_weight + self.iou_loss = IouLoss(loss_weight=1.0) # default loss_weight 2.5 + + ConvBlock = DWConv if depthwise else BaseConv + + self.stem_conv = nn.LayerList() + self.conv_cls = nn.LayerList() + self.conv_reg = nn.LayerList() # reg [x,y,w,h] + obj + for in_c in self.in_channels: + self.stem_conv.append(BaseConv(in_c, feat_channels, 1, 1, act=act)) + + self.conv_cls.append( + nn.Sequential(* [ + ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), + nn.Conv2D( + feat_channels, + self.num_classes, + 1, + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + ])) + + self.conv_reg.append( + nn.Sequential(* [ + ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), + ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), + nn.Conv2D( + feat_channels, + 4 + 1, # reg [x,y,w,h] + obj + 1, + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + ])) + + self._init_weights() + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + def _init_weights(self): + bias_cls = bias_init_with_prob(0.01) + bias_reg = paddle.full([5], math.log(5.), dtype=self._dtype) + bias_reg[:2] = 0. 
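+        # bias_reg covers [x, y, w, h, obj]: the xy biases start at 0, the wh
+        # biases keep log(5.) as a box-size prior, and the obj bias (set on
+        # the next line) reuses the focal-loss prior-probability trick.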
+ bias_reg[-1] = bias_cls + for cls_, reg_ in zip(self.conv_cls, self.conv_reg): + constant_(cls_[-1].weight) + constant_(cls_[-1].bias, bias_cls) + constant_(reg_[-1].weight) + reg_[-1].bias.set_value(bias_reg) + + def _generate_anchor_point(self, feat_sizes, strides, offset=0.): + anchor_points, stride_tensor = [], [] + num_anchors_list = [] + for feat_size, stride in zip(feat_sizes, strides): + h, w = feat_size + x = (paddle.arange(w) + offset) * stride + y = (paddle.arange(h) + offset) * stride + y, x = paddle.meshgrid(y, x) + anchor_points.append(paddle.stack([x, y], axis=-1).reshape([-1, 2])) + stride_tensor.append( + paddle.full( + [len(anchor_points[-1]), 1], stride, dtype=self._dtype)) + num_anchors_list.append(len(anchor_points[-1])) + anchor_points = paddle.concat(anchor_points).astype(self._dtype) + anchor_points.stop_gradient = True + stride_tensor = paddle.concat(stride_tensor) + stride_tensor.stop_gradient = True + return anchor_points, stride_tensor, num_anchors_list + + def forward(self, feats, targets=None): + assert len(feats) == len(self.fpn_strides), \ + "The size of feats is not equal to size of fpn_strides" + + feat_sizes = [[f.shape[-2], f.shape[-1]] for f in feats] + cls_score_list, reg_pred_list = [], [] + obj_score_list = [] + for i, feat in enumerate(feats): + feat = self.stem_conv[i](feat) + cls_logit = self.conv_cls[i](feat) + reg_pred = self.conv_reg[i](feat) + # cls prediction + cls_score = F.sigmoid(cls_logit) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + # reg prediction + reg_xywh, obj_logit = paddle.split(reg_pred, [4, 1], axis=1) + reg_xywh = reg_xywh.flatten(2).transpose([0, 2, 1]) + reg_pred_list.append(reg_xywh) + # obj prediction + obj_score = F.sigmoid(obj_logit) + obj_score_list.append(obj_score.flatten(2).transpose([0, 2, 1])) + + cls_score_list = paddle.concat(cls_score_list, axis=1) + reg_pred_list = paddle.concat(reg_pred_list, axis=1) + obj_score_list = paddle.concat(obj_score_list, axis=1) + + # bbox decode + anchor_points, stride_tensor, _ =\ + self._generate_anchor_point(feat_sizes, self.fpn_strides) + reg_xy, reg_wh = paddle.split(reg_pred_list, 2, axis=-1) + reg_xy += (anchor_points / stride_tensor) + reg_wh = paddle.exp(reg_wh) * 0.5 + bbox_pred_list = paddle.concat( + [reg_xy - reg_wh, reg_xy + reg_wh], axis=-1) + + if self.training: + anchor_points, stride_tensor, num_anchors_list =\ + self._generate_anchor_point(feat_sizes, self.fpn_strides, 0.5) + yolox_losses = self.get_loss([ + cls_score_list, bbox_pred_list, obj_score_list, anchor_points, + stride_tensor, num_anchors_list + ], targets) + return yolox_losses + else: + pred_scores = (cls_score_list * obj_score_list).sqrt() + return pred_scores, bbox_pred_list, stride_tensor + + def get_loss(self, head_outs, targets): + pred_cls, pred_bboxes, pred_obj,\ + anchor_points, stride_tensor, num_anchors_list = head_outs + gt_labels = targets['gt_class'] + gt_bboxes = targets['gt_bbox'] + pred_scores = (pred_cls * pred_obj).sqrt() + # label assignment + center_and_strides = paddle.concat( + [anchor_points, stride_tensor, stride_tensor], axis=-1) + pos_num_list, label_list, bbox_target_list = [], [], [] + for pred_score, pred_bbox, gt_box, gt_label in zip( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor, gt_bboxes, gt_labels): + pos_num, label, _, bbox_target = self.assigner( + pred_score, center_and_strides, pred_bbox, gt_box, gt_label) + pos_num_list.append(pos_num) + label_list.append(label) + bbox_target_list.append(bbox_target) + labels = 
paddle.to_tensor(np.stack(label_list, axis=0)) + bbox_targets = paddle.to_tensor(np.stack(bbox_target_list, axis=0)) + bbox_targets /= stride_tensor # rescale bbox + + # 1. obj score loss + mask_positive = (labels != self.num_classes) + loss_obj = F.binary_cross_entropy( + pred_obj, + mask_positive.astype(pred_obj.dtype).unsqueeze(-1), + reduction='sum') + + num_pos = sum(pos_num_list) + + if num_pos > 0: + num_pos = paddle.to_tensor(num_pos, dtype=self._dtype).clip(min=1) + loss_obj /= num_pos + + # 2. iou loss + bbox_mask = mask_positive.unsqueeze(-1).tile([1, 1, 4]) + pred_bboxes_pos = paddle.masked_select(pred_bboxes, + bbox_mask).reshape([-1, 4]) + assigned_bboxes_pos = paddle.masked_select( + bbox_targets, bbox_mask).reshape([-1, 4]) + bbox_iou = bbox_overlaps(pred_bboxes_pos, assigned_bboxes_pos) + bbox_iou = paddle.diag(bbox_iou) + + loss_iou = self.iou_loss( + pred_bboxes_pos.split( + 4, axis=-1), + assigned_bboxes_pos.split( + 4, axis=-1)) + loss_iou = loss_iou.sum() / num_pos + + # 3. cls loss + cls_mask = mask_positive.unsqueeze(-1).tile( + [1, 1, self.num_classes]) + pred_cls_pos = paddle.masked_select( + pred_cls, cls_mask).reshape([-1, self.num_classes]) + assigned_cls_pos = paddle.masked_select(labels, mask_positive) + assigned_cls_pos = F.one_hot(assigned_cls_pos, + self.num_classes + 1)[..., :-1] + assigned_cls_pos *= bbox_iou.unsqueeze(-1) + loss_cls = F.binary_cross_entropy( + pred_cls_pos, assigned_cls_pos, reduction='sum') + loss_cls /= num_pos + + # 4. l1 loss + if targets['epoch_id'] >= self.l1_epoch: + loss_l1 = F.l1_loss( + pred_bboxes_pos, assigned_bboxes_pos, reduction='sum') + loss_l1 /= num_pos + else: + loss_l1 = paddle.zeros([1]) + loss_l1.stop_gradient = False + else: + loss_cls = paddle.zeros([1]) + loss_iou = paddle.zeros([1]) + loss_l1 = paddle.zeros([1]) + loss_cls.stop_gradient = False + loss_iou.stop_gradient = False + loss_l1.stop_gradient = False + + loss = self.loss_weight['obj'] * loss_obj + \ + self.loss_weight['cls'] * loss_cls + \ + self.loss_weight['iou'] * loss_iou + + if targets['epoch_id'] >= self.l1_epoch: + loss += (self.loss_weight['l1'] * loss_l1) + + yolox_losses = { + 'loss': loss, + 'loss_cls': loss_cls, + 'loss_obj': loss_obj, + 'loss_iou': loss_iou, + 'loss_l1': loss_l1, + } + return yolox_losses + + def post_process(self, head_outs, img_shape, scale_factor): + pred_scores, pred_bboxes, stride_tensor = head_outs + pred_scores = pred_scores.transpose([0, 2, 1]) + pred_bboxes *= stride_tensor + # scale bbox to origin image + scale_factor = scale_factor.flip(-1).tile([1, 2]).unsqueeze(1) + pred_bboxes /= scale_factor + if self.exclude_nms: + # `exclude_nms=True` just use in benchmark + return pred_bboxes.sum(), pred_scores.sum() + else: + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) + return bbox_pred, bbox_num diff --git a/PaddleDetection-release-2.6/ppdet/modeling/heads/yolof_head.py b/PaddleDetection-release-2.6/ppdet/modeling/heads/yolof_head.py new file mode 100644 index 0000000000000000000000000000000000000000..4893337366e5bd9e828bc08ad9b2e41f0002fa98 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/heads/yolof_head.py @@ -0,0 +1,368 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from paddle.nn.initializer import Normal, Constant + +from ppdet.modeling.layers import MultiClassNMS +from ppdet.core.workspace import register +from ppdet.modeling.bbox_utils import delta2bbox_v2 + +__all__ = ['YOLOFHead'] + +INF = 1e8 + + +def reduce_mean(tensor): + world_size = paddle.distributed.get_world_size() + if world_size == 1: + return tensor + paddle.distributed.all_reduce(tensor) + return tensor / world_size + + +def find_inside_anchor(feat_size, stride, num_anchors, im_shape): + feat_h, feat_w = feat_size[:2] + im_h, im_w = im_shape[:2] + inside_h = min(int(np.ceil(im_h / stride)), feat_h) + inside_w = min(int(np.ceil(im_w / stride)), feat_w) + inside_mask = paddle.zeros([feat_h, feat_w], dtype=paddle.bool) + inside_mask[:inside_h, :inside_w] = True + inside_mask = inside_mask.unsqueeze(-1).expand( + [feat_h, feat_w, num_anchors]) + return inside_mask.reshape([-1]) + + +@register +class YOLOFFeat(nn.Layer): + def __init__(self, + feat_in=256, + feat_out=256, + num_cls_convs=2, + num_reg_convs=4, + norm_type='bn'): + super(YOLOFFeat, self).__init__() + assert norm_type == 'bn', "YOLOFFeat only support BN now." + self.feat_in = feat_in + self.feat_out = feat_out + self.num_cls_convs = num_cls_convs + self.num_reg_convs = num_reg_convs + self.norm_type = norm_type + + cls_subnet, reg_subnet = [], [] + for i in range(self.num_cls_convs): + feat_in = self.feat_in if i == 0 else self.feat_out + cls_subnet.append( + nn.Conv2D( + feat_in, + self.feat_out, + 3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0.0)))) + cls_subnet.append( + nn.BatchNorm2D( + self.feat_out, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0)))) + cls_subnet.append(nn.ReLU()) + + for i in range(self.num_reg_convs): + feat_in = self.feat_in if i == 0 else self.feat_out + reg_subnet.append( + nn.Conv2D( + feat_in, + self.feat_out, + 3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0.0)))) + reg_subnet.append( + nn.BatchNorm2D( + self.feat_out, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0)))) + reg_subnet.append(nn.ReLU()) + + self.cls_subnet = nn.Sequential(*cls_subnet) + self.reg_subnet = nn.Sequential(*reg_subnet) + + def forward(self, fpn_feat): + cls_feat = self.cls_subnet(fpn_feat) + reg_feat = self.reg_subnet(fpn_feat) + return cls_feat, reg_feat + + +@register +class YOLOFHead(nn.Layer): + __shared__ = ['num_classes', 'trt', 'exclude_nms'] + __inject__ = [ + 'conv_feat', 'anchor_generator', 'bbox_assigner', 'loss_class', + 'loss_bbox', 'nms' + ] + + def __init__(self, + num_classes=80, + conv_feat='YOLOFFeat', + anchor_generator='AnchorGenerator', + bbox_assigner='UniformAssigner', + loss_class='FocalLoss', + loss_bbox='GIoULoss', + 
ctr_clip=32.0, + delta_mean=[0.0, 0.0, 0.0, 0.0], + delta_std=[1.0, 1.0, 1.0, 1.0], + nms='MultiClassNMS', + prior_prob=0.01, + nms_pre=1000, + use_inside_anchor=False, + trt=False, + exclude_nms=False): + super(YOLOFHead, self).__init__() + self.num_classes = num_classes + self.conv_feat = conv_feat + self.anchor_generator = anchor_generator + self.na = self.anchor_generator.num_anchors + self.bbox_assigner = bbox_assigner + self.loss_class = loss_class + self.loss_bbox = loss_bbox + self.ctr_clip = ctr_clip + self.delta_mean = delta_mean + self.delta_std = delta_std + self.nms = nms + self.nms_pre = nms_pre + self.use_inside_anchor = use_inside_anchor + if isinstance(self.nms, MultiClassNMS) and trt: + self.nms.trt = trt + self.exclude_nms = exclude_nms + + bias_init_value = -math.log((1 - prior_prob) / prior_prob) + self.cls_score = self.add_sublayer( + 'cls_score', + nn.Conv2D( + in_channels=conv_feat.feat_out, + out_channels=self.num_classes * self.na, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant( + value=bias_init_value)))) + + self.bbox_pred = self.add_sublayer( + 'bbox_pred', + nn.Conv2D( + in_channels=conv_feat.feat_out, + out_channels=4 * self.na, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + + self.object_pred = self.add_sublayer( + 'object_pred', + nn.Conv2D( + in_channels=conv_feat.feat_out, + out_channels=self.na, + kernel_size=3, + stride=1, + padding=1, + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(value=0)))) + + def forward(self, feats, targets=None): + assert len(feats) == 1, "YOLOF only has one level feature." 
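+        # YOLOF folds objectness into the class logits in log space
+        # ("implicit objectness"):
+        #   norm_cls = cls + obj - log(1 + exp(cls) + exp(obj))
+        # so a single per-class score reaches NMS; the exp() terms are
+        # clipped at INF below for numerical safety.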
+ conv_cls_feat, conv_reg_feat = self.conv_feat(feats[0]) + cls_logits = self.cls_score(conv_cls_feat) + objectness = self.object_pred(conv_reg_feat) + bboxes_reg = self.bbox_pred(conv_reg_feat) + + N, C, H, W = paddle.shape(cls_logits)[:] + cls_logits = cls_logits.reshape((N, self.na, self.num_classes, H, W)) + objectness = objectness.reshape((N, self.na, 1, H, W)) + norm_cls_logits = cls_logits + objectness - paddle.log( + 1.0 + paddle.clip( + cls_logits.exp(), max=INF) + paddle.clip( + objectness.exp(), max=INF)) + norm_cls_logits = norm_cls_logits.reshape((N, C, H, W)) + + anchors = self.anchor_generator([norm_cls_logits]) + + if self.training: + yolof_losses = self.get_loss( + [anchors[0], norm_cls_logits, bboxes_reg], targets) + return yolof_losses + else: + return anchors[0], norm_cls_logits, bboxes_reg + + def get_loss(self, head_outs, targets): + anchors, cls_logits, bbox_preds = head_outs + + feat_size = cls_logits.shape[-2:] + cls_logits = cls_logits.transpose([0, 2, 3, 1]) + cls_logits = cls_logits.reshape([0, -1, self.num_classes]) + bbox_preds = bbox_preds.transpose([0, 2, 3, 1]) + bbox_preds = bbox_preds.reshape([0, -1, 4]) + + num_pos_list = [] + cls_pred_list, cls_tar_list = [], [] + reg_pred_list, reg_tar_list = [], [] + # find and gather preds and targets in each image + for cls_logit, bbox_pred, gt_bbox, gt_class, im_shape in zip( + cls_logits, bbox_preds, targets['gt_bbox'], targets['gt_class'], + targets['im_shape']): + if self.use_inside_anchor: + inside_mask = find_inside_anchor( + feat_size, self.anchor_generator.strides[0], self.na, + im_shape.tolist()) + cls_logit = cls_logit[inside_mask] + bbox_pred = bbox_pred[inside_mask] + anchors = anchors[inside_mask] + + bbox_pred = delta2bbox_v2( + bbox_pred, + anchors, + self.delta_mean, + self.delta_std, + ctr_clip=self.ctr_clip) + bbox_pred = bbox_pred.reshape([-1, bbox_pred.shape[-1]]) + + # -2:ignore, -1:neg, >=0:pos + match_labels, pos_bbox_pred, pos_bbox_tar = self.bbox_assigner( + bbox_pred, anchors, gt_bbox) + pos_mask = (match_labels >= 0) + neg_mask = (match_labels == -1) + chosen_mask = paddle.logical_or(pos_mask, neg_mask) + + gt_class = gt_class.reshape([-1]) + bg_class = paddle.to_tensor( + [self.num_classes], dtype=gt_class.dtype) + # a trick to assign num_classes to negative targets + gt_class = paddle.concat([gt_class, bg_class], axis=-1) + match_labels = paddle.where( + neg_mask, + paddle.full_like(match_labels, gt_class.size - 1), match_labels) + num_pos_list.append(max(1.0, pos_mask.sum().item())) + + cls_pred_list.append(cls_logit[chosen_mask]) + cls_tar_list.append(gt_class[match_labels[chosen_mask]]) + reg_pred_list.append(pos_bbox_pred) + reg_tar_list.append(pos_bbox_tar) + + num_tot_pos = paddle.to_tensor(sum(num_pos_list)) + num_tot_pos = reduce_mean(num_tot_pos).item() + num_tot_pos = max(1.0, num_tot_pos) + + cls_pred = paddle.concat(cls_pred_list) + cls_tar = paddle.concat(cls_tar_list) + cls_loss = self.loss_class( + cls_pred, cls_tar, reduction='sum') / num_tot_pos + + reg_pred_list = [_ for _ in reg_pred_list if _ is not None] + reg_tar_list = [_ for _ in reg_tar_list if _ is not None] + if len(reg_pred_list) == 0: + reg_loss = bbox_preds.sum() * 0.0 + else: + reg_pred = paddle.concat(reg_pred_list) + reg_tar = paddle.concat(reg_tar_list) + reg_loss = self.loss_bbox(reg_pred, reg_tar).sum() / num_tot_pos + + yolof_losses = { + 'loss': cls_loss + reg_loss, + 'loss_cls': cls_loss, + 'loss_reg': reg_loss, + } + return yolof_losses + + def get_bboxes_single(self, + anchors, + cls_scores, + 
bbox_preds, + im_shape, + scale_factor, + rescale=True): + assert len(cls_scores) == len(bbox_preds) + mlvl_bboxes = [] + mlvl_scores = [] + for anchor, cls_score, bbox_pred in zip(anchors, cls_scores, + bbox_preds): + cls_score = cls_score.reshape([-1, self.num_classes]) + bbox_pred = bbox_pred.reshape([-1, 4]) + if self.nms_pre is not None and cls_score.shape[0] > self.nms_pre: + max_score = cls_score.max(axis=1) + _, topk_inds = max_score.topk(self.nms_pre) + bbox_pred = bbox_pred.gather(topk_inds) + anchor = anchor.gather(topk_inds) + cls_score = cls_score.gather(topk_inds) + + bbox_pred = delta2bbox_v2( + bbox_pred, + anchor, + self.delta_mean, + self.delta_std, + max_shape=im_shape, + ctr_clip=self.ctr_clip).squeeze() + mlvl_bboxes.append(bbox_pred) + mlvl_scores.append(F.sigmoid(cls_score)) + mlvl_bboxes = paddle.concat(mlvl_bboxes) + mlvl_bboxes = paddle.squeeze(mlvl_bboxes) + if rescale: + mlvl_bboxes = mlvl_bboxes / paddle.concat( + [scale_factor[::-1], scale_factor[::-1]]) + mlvl_scores = paddle.concat(mlvl_scores) + mlvl_scores = mlvl_scores.transpose([1, 0]) + return mlvl_bboxes, mlvl_scores + + def decode(self, anchors, cls_scores, bbox_preds, im_shape, scale_factor): + batch_bboxes = [] + batch_scores = [] + for img_id in range(cls_scores[0].shape[0]): + num_lvls = len(cls_scores) + cls_score_list = [cls_scores[i][img_id] for i in range(num_lvls)] + bbox_pred_list = [bbox_preds[i][img_id] for i in range(num_lvls)] + bboxes, scores = self.get_bboxes_single( + anchors, cls_score_list, bbox_pred_list, im_shape[img_id], + scale_factor[img_id]) + batch_bboxes.append(bboxes) + batch_scores.append(scores) + batch_bboxes = paddle.stack(batch_bboxes, 0) + batch_scores = paddle.stack(batch_scores, 0) + return batch_bboxes, batch_scores + + def post_process(self, head_outs, im_shape, scale_factor): + anchors, cls_scores, bbox_preds = head_outs + cls_scores = cls_scores.transpose([0, 2, 3, 1]) + bbox_preds = bbox_preds.transpose([0, 2, 3, 1]) + pred_bboxes, pred_scores = self.decode( + [anchors], [cls_scores], [bbox_preds], im_shape, scale_factor) + + if self.exclude_nms: + # `exclude_nms=True` just use in benchmark + return pred_bboxes.sum(), pred_scores.sum() + else: + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) + return bbox_pred, bbox_num diff --git a/PaddleDetection-release-2.6/ppdet/modeling/initializer.py b/PaddleDetection-release-2.6/ppdet/modeling/initializer.py new file mode 100644 index 0000000000000000000000000000000000000000..758eed240eae4497e14b7fe1cb9e10aca702eb53 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/initializer.py @@ -0,0 +1,324 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/pytorch/pytorch/blob/master/torch/nn/init.py +Ths copyright of pytorch/pytorch is a BSD-style license, as found in the LICENSE file. 
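+
+Shape convention: with reverse=False (the default) weights are treated as
+[fan_out, fan_in, ...] (e.g. paddle.nn.Conv2D), while reverse=True handles
+the [fan_in, fan_out] layout used by paddle.nn.Linear.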
+""" + +import math +import numpy as np + +import paddle +import paddle.nn as nn + +__all__ = [ + 'uniform_', + 'normal_', + 'constant_', + 'ones_', + 'zeros_', + 'xavier_uniform_', + 'xavier_normal_', + 'kaiming_uniform_', + 'kaiming_normal_', + 'linear_init_', + 'conv_init_', + 'reset_initialized_parameter', +] + + +def _no_grad_uniform_(tensor, a, b): + with paddle.no_grad(): + tensor.set_value( + paddle.uniform( + shape=tensor.shape, dtype=tensor.dtype, min=a, max=b)) + return tensor + + +def _no_grad_normal_(tensor, mean=0., std=1.): + with paddle.no_grad(): + tensor.set_value(paddle.normal(mean=mean, std=std, shape=tensor.shape)) + return tensor + + +def _no_grad_fill_(tensor, value=0.): + with paddle.no_grad(): + tensor.set_value(paddle.full_like(tensor, value, dtype=tensor.dtype)) + return tensor + + +def uniform_(tensor, a, b): + """ + Modified tensor inspace using uniform_ + Args: + tensor (paddle.Tensor): paddle Tensor + a (float|int): min value. + b (float|int): max value. + Return: + tensor + """ + return _no_grad_uniform_(tensor, a, b) + + +def normal_(tensor, mean=0., std=1.): + """ + Modified tensor inspace using normal_ + Args: + tensor (paddle.Tensor): paddle Tensor + mean (float|int): mean value. + std (float|int): std value. + Return: + tensor + """ + return _no_grad_normal_(tensor, mean, std) + + +def constant_(tensor, value=0.): + """ + Modified tensor inspace using constant_ + Args: + tensor (paddle.Tensor): paddle Tensor + value (float|int): value to fill tensor. + Return: + tensor + """ + return _no_grad_fill_(tensor, value) + + +def ones_(tensor): + """ + Modified tensor inspace using ones_ + Args: + tensor (paddle.Tensor): paddle Tensor + Return: + tensor + """ + return _no_grad_fill_(tensor, 1) + + +def zeros_(tensor): + """ + Modified tensor inspace using zeros_ + Args: + tensor (paddle.Tensor): paddle Tensor + Return: + tensor + """ + return _no_grad_fill_(tensor, 0) + + +def vector_(tensor, vector): + with paddle.no_grad(): + tensor.set_value(paddle.to_tensor(vector, dtype=tensor.dtype)) + return tensor + + +def _calculate_fan_in_and_fan_out(tensor, reverse=False): + """ + Calculate (fan_in, _fan_out) for tensor + + Args: + tensor (Tensor): paddle.Tensor + reverse (bool: False): tensor data format order, False by default as [fout, fin, ...]. e.g. : conv.weight [cout, cin, kh, kw] is False; linear.weight [cin, cout] is True + + Return: + Tuple[fan_in, fan_out] + """ + if tensor.ndim < 2: + raise ValueError( + "Fan in and fan out can not be computed for tensor with fewer than 2 dimensions" + ) + + if reverse: + num_input_fmaps, num_output_fmaps = tensor.shape[0], tensor.shape[1] + else: + num_input_fmaps, num_output_fmaps = tensor.shape[1], tensor.shape[0] + + receptive_field_size = 1 + if tensor.ndim > 2: + receptive_field_size = np.prod(tensor.shape[2:]) + + fan_in = num_input_fmaps * receptive_field_size + fan_out = num_output_fmaps * receptive_field_size + + return fan_in, fan_out + + +def xavier_uniform_(tensor, gain=1., reverse=False): + """ + Modified tensor inspace using xavier_uniform_ + Args: + tensor (paddle.Tensor): paddle Tensor + gain (float): super parameter, 1. default. + reverse (bool): reverse (bool: False): tensor data format order, False by default as [fout, fin, ...]. 
+ Return: + tensor + """ + fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor, reverse=reverse) + std = gain * math.sqrt(2.0 / float(fan_in + fan_out)) + k = math.sqrt(3.0) * std + return _no_grad_uniform_(tensor, -k, k) + + +def xavier_normal_(tensor, gain=1., reverse=False): + """ + Modify tensor in place using Xavier normal initialization. + Args: + tensor (paddle.Tensor): paddle Tensor + gain (float): scaling factor, 1. by default. + reverse (bool): tensor data format order, False by default as [fout, fin, ...]. + Return: + tensor + """ + fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor, reverse=reverse) + std = gain * math.sqrt(2.0 / float(fan_in + fan_out)) + return _no_grad_normal_(tensor, 0, std) + + +# reference: https://pytorch.org/docs/stable/_modules/torch/nn/init.html +def _calculate_correct_fan(tensor, mode, reverse=False): + mode = mode.lower() + valid_modes = ['fan_in', 'fan_out'] + if mode not in valid_modes: + raise ValueError("Mode {} not supported, please use one of {}".format( + mode, valid_modes)) + + fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor, reverse) + + return fan_in if mode == 'fan_in' else fan_out + + +def _calculate_gain(nonlinearity, param=None): + linear_fns = [ + 'linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', + 'conv_transpose2d', 'conv_transpose3d' + ] + if nonlinearity in linear_fns or nonlinearity == 'sigmoid': + return 1 + elif nonlinearity == 'tanh': + return 5.0 / 3 + elif nonlinearity == 'relu': + return math.sqrt(2.0) + elif nonlinearity == 'leaky_relu': + if param is None: + negative_slope = 0.01 + elif not isinstance(param, bool) and isinstance( + param, int) or isinstance(param, float): + # True/False are instances of int, hence check above + negative_slope = param + else: + raise ValueError("negative_slope {} not a valid number".format( + param)) + return math.sqrt(2.0 / (1 + negative_slope**2)) + elif nonlinearity == 'selu': + return 3.0 / 4 + else: + raise ValueError("Unsupported nonlinearity {}".format(nonlinearity)) + + +def kaiming_uniform_(tensor, + a=0, + mode='fan_in', + nonlinearity='leaky_relu', + reverse=False): + """ + Modify tensor in place using Kaiming uniform initialization. + Args: + tensor (paddle.Tensor): paddle Tensor + a (float): negative slope of the subsequent 'leaky_relu', 0 by default + mode (str): one of ['fan_in', 'fan_out'], 'fan_in' by default + nonlinearity (str): nonlinearity method name + reverse (bool): tensor data format order, False by default as [fout, fin, ...]. + Return: + tensor + """ + fan = _calculate_correct_fan(tensor, mode, reverse) + gain = _calculate_gain(nonlinearity, a) + std = gain / math.sqrt(fan) + k = math.sqrt(3.0) * std + return _no_grad_uniform_(tensor, -k, k) + + +def kaiming_normal_(tensor, + a=0, + mode='fan_in', + nonlinearity='leaky_relu', + reverse=False): + """ + Modify tensor in place using Kaiming normal initialization. + Args: + tensor (paddle.Tensor): paddle Tensor + a (float): negative slope of the subsequent 'leaky_relu', 0 by default + mode (str): one of ['fan_in', 'fan_out'], 'fan_in' by default + nonlinearity (str): nonlinearity method name + reverse (bool): tensor data format order, False by default as [fout, fin, ...].
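Worked numbers for the fan and bound computations above (self-contained, no paddle needed): for a conv weight of shape `[cout, cin, kh, kw]` with `reverse=False`, `fan_in = cin * kh * kw` and `fan_out = cout * kh * kw`, and `xavier_uniform_` samples from `U(-k, k)` with `k = sqrt(3) * std` so the uniform distribution hits the target standard deviation:

```python
import math

# Fan computation for a [64, 3, 7, 7] conv weight (reverse=False).
cout, cin, kh, kw = 64, 3, 7, 7
fan_in = cin * kh * kw     # 147
fan_out = cout * kh * kw   # 3136

# Xavier uniform bound: std = gain * sqrt(2 / (fan_in + fan_out)),
# and U(-k, k) has std k / sqrt(3), hence k = sqrt(3) * std.
std = 1.0 * math.sqrt(2.0 / (fan_in + fan_out))
k = math.sqrt(3.0) * std
```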
+ Return: + tensor + """ + fan = _calculate_correct_fan(tensor, mode, reverse) + gain = _calculate_gain(nonlinearity, a) + std = gain / math.sqrt(fan) + return _no_grad_normal_(tensor, 0, std) + + +def linear_init_(module): + bound = 1 / math.sqrt(module.weight.shape[0]) + uniform_(module.weight, -bound, bound) + uniform_(module.bias, -bound, bound) + + +def conv_init_(module): + bound = 1 / np.sqrt(np.prod(module.weight.shape[1:])) + uniform_(module.weight, -bound, bound) + if module.bias is not None: + uniform_(module.bias, -bound, bound) + + +def bias_init_with_prob(prior_prob=0.01): + """initialize conv/fc bias value according to a given probability value.""" + bias_init = float(-np.log((1 - prior_prob) / prior_prob)) + return bias_init + + +@paddle.no_grad() +def reset_initialized_parameter(model, include_self=True): + """ + Reset initialized parameter using following method for [conv, linear, embedding, bn] + + Args: + model (paddle.Layer): paddle Layer + include_self (bool: False): include_self for Layer.named_sublayers method. Indicate whether including itself + Return: + None + """ + for _, m in model.named_sublayers(include_self=include_self): + if isinstance(m, nn.Conv2D): + k = float(m._groups) / (m._in_channels * m._kernel_size[0] * + m._kernel_size[1]) + k = math.sqrt(k) + _no_grad_uniform_(m.weight, -k, k) + if hasattr(m, 'bias') and getattr(m, 'bias') is not None: + _no_grad_uniform_(m.bias, -k, k) + + elif isinstance(m, nn.Linear): + k = math.sqrt(1. / m.weight.shape[0]) + _no_grad_uniform_(m.weight, -k, k) + if hasattr(m, 'bias') and getattr(m, 'bias') is not None: + _no_grad_uniform_(m.bias, -k, k) + + elif isinstance(m, nn.Embedding): + _no_grad_normal_(m.weight, mean=0., std=1.) + + elif isinstance(m, (nn.BatchNorm2D, nn.LayerNorm)): + _no_grad_fill_(m.weight, 1.) + if hasattr(m, 'bias') and getattr(m, 'bias') is not None: + _no_grad_fill_(m.bias, 0) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/keypoint_utils.py b/PaddleDetection-release-2.6/ppdet/modeling/keypoint_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d5cbeb3ba680b2cf801e541d03b2893d822579d4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/keypoint_utils.py @@ -0,0 +1,342 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +this code is based on https://github.com/open-mmlab/mmpose +""" + +import cv2 +import numpy as np + + +def get_affine_mat_kernel(h, w, s, inv=False): + if w < h: + w_ = s + h_ = int(np.ceil((s / w * h) / 64.) * 64) + scale_w = w + scale_h = h_ / w_ * w + + else: + h_ = s + w_ = int(np.ceil((s / h * w) / 64.) 
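`bias_init_with_prob` above is the focal-loss prior trick: choose the classifier bias so that a freshly initialized sigmoid head predicts foreground with probability `prior_prob`, which keeps early training stable on class-imbalanced detection data. A quick self-contained check:

```python
import math

prior_prob = 0.01
b = float(-math.log((1 - prior_prob) / prior_prob))  # == bias_init_with_prob(0.01)

# sigmoid(b) recovers the prior.
assert abs(1.0 / (1.0 + math.exp(-b)) - prior_prob) < 1e-9
```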
* 64) + scale_h = h + scale_w = w_ / h_ * h + + center = np.array([np.round(w / 2.), np.round(h / 2.)]) + + size_resized = (w_, h_) + trans = get_affine_transform( + center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv) + + return trans, size_resized + + +def get_affine_transform(center, + input_size, + rot, + output_size, + shift=(0., 0.), + inv=False): + """Get the affine transform matrix, given the center/scale/rot/output_size. + + Args: + center (np.ndarray[2, ]): Center of the bounding box (x, y). + input_size (np.ndarray[2, ]): Size of input feature (width, height). + rot (float): Rotation angle (degree). + output_size (np.ndarray[2, ]): Size of the destination heatmaps. + shift (0-100%): Shift translation ratio wrt the width/height. + Default (0., 0.). + inv (bool): Option to inverse the affine transform direction. + (inv=False: src->dst or inv=True: dst->src) + + Returns: + np.ndarray: The transform matrix. + """ + assert len(center) == 2 + assert len(output_size) == 2 + assert len(shift) == 2 + + if not isinstance(input_size, (np.ndarray, list)): + input_size = np.array([input_size, input_size], dtype=np.float32) + scale_tmp = input_size + + shift = np.array(shift) + src_w = scale_tmp[0] + dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = rotate_point([0., src_w * -0.5], rot_rad) + dst_dir = np.array([0., dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + src[2, :] = _get_3rd_point(src[0, :], src[1, :]) + + dst = np.zeros((3, 2), dtype=np.float32) + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +def get_warp_matrix(theta, size_input, size_dst, size_target): + """This code is based on + https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py + + Calculate the transformation matrix under the constraint of unbiased. + Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased + Data Processing for Human Pose Estimation (CVPR 2020). + + Args: + theta (float): Rotation angle in degrees. + size_input (np.ndarray): Size of input image [w, h]. + size_dst (np.ndarray): Size of output image [w, h]. + size_target (np.ndarray): Size of ROI in input plane [w, h]. + + Returns: + matrix (np.ndarray): A matrix for transformation. + """ + theta = np.deg2rad(theta) + matrix = np.zeros((2, 3), dtype=np.float32) + scale_x = size_dst[0] / size_target[0] + scale_y = size_dst[1] / size_target[1] + matrix[0, 0] = np.cos(theta) * scale_x + matrix[0, 1] = -np.sin(theta) * scale_x + matrix[0, 2] = scale_x * ( + -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] * + np.sin(theta) + 0.5 * size_target[0]) + matrix[1, 0] = np.sin(theta) * scale_y + matrix[1, 1] = np.cos(theta) * scale_y + matrix[1, 2] = scale_y * ( + -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] * + np.cos(theta) + 0.5 * size_target[1]) + return matrix + + +def _get_3rd_point(a, b): + """To calculate the affine matrix, three pairs of points are required. This + function is used to get the 3rd point, given 2D points a & b. + + The 3rd point is defined by rotating vector `a - b` by 90 degrees + anticlockwise, using b as the rotation center. 
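A usage sketch for `get_affine_transform` in the usual top-down keypoint setting: build the 2x3 matrix for a detected person box and warp the crop with OpenCV. The box numbers here are hypothetical:

```python
import numpy as np
import cv2

center = np.array([320., 240.])        # box center (x, y) in the source image
input_size = np.array([200., 267.])    # box size (w, h)
output_size = np.array([192., 256.])   # network input (w, h)

trans = get_affine_transform(center, input_size, 0., output_size)
img = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a real frame
patch = cv2.warpAffine(img, trans, (192, 256))  # dsize is (width, height)
```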
+ + Args: + a (np.ndarray): point(x,y) + b (np.ndarray): point(x,y) + + Returns: + np.ndarray: The 3rd point. + """ + assert len( + a) == 2, 'input of _get_3rd_point should be point with length of 2' + assert len( + b) == 2, 'input of _get_3rd_point should be point with length of 2' + direction = a - b + third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32) + + return third_pt + + +def rotate_point(pt, angle_rad): + """Rotate a point by an angle. + + Args: + pt (list[float]): 2 dimensional point to be rotated + angle_rad (float): rotation angle by radian + + Returns: + list[float]: Rotated point. + """ + assert len(pt) == 2 + sn, cs = np.sin(angle_rad), np.cos(angle_rad) + new_x = pt[0] * cs - pt[1] * sn + new_y = pt[0] * sn + pt[1] * cs + rotated_pt = [new_x, new_y] + + return rotated_pt + + +def transpred(kpts, h, w, s): + trans, _ = get_affine_mat_kernel(h, w, s, inv=True) + + return warp_affine_joints(kpts[..., :2].copy(), trans) + + +def warp_affine_joints(joints, mat): + """Apply affine transformation defined by the transform matrix on the + joints. + + Args: + joints (np.ndarray[..., 2]): Origin coordinate of joints. + mat (np.ndarray[3, 2]): The affine matrix. + + Returns: + matrix (np.ndarray[..., 2]): Result coordinate of joints. + """ + joints = np.array(joints) + shape = joints.shape + joints = joints.reshape(-1, 2) + return np.dot(np.concatenate( + (joints, joints[:, 0:1] * 0 + 1), axis=1), + mat.T).reshape(shape) + + +def affine_transform(pt, t): + new_pt = np.array([pt[0], pt[1], 1.]).T + new_pt = np.dot(t, new_pt) + return new_pt[:2] + + +def transform_preds(coords, center, scale, output_size): + target_coords = np.zeros(coords.shape) + trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1) + for p in range(coords.shape[0]): + target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) + return target_coords + + +def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None): + if not isinstance(sigmas, np.ndarray): + sigmas = np.array([ + .26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, + .87, .87, .89, .89 + ]) / 10.0 + vars = (sigmas * 2)**2 + xg = g[0::3] + yg = g[1::3] + vg = g[2::3] + ious = np.zeros((d.shape[0])) + for n_d in range(0, d.shape[0]): + xd = d[n_d, 0::3] + yd = d[n_d, 1::3] + vd = d[n_d, 2::3] + dx = xd - xg + dy = yd - yg + e = (dx**2 + dy**2) / vars / ((a_g + a_d[n_d]) / 2 + np.spacing(1)) / 2 + if in_vis_thre is not None: + ind = list(vg > in_vis_thre) and list(vd > in_vis_thre) + e = e[ind] + ious[n_d] = np.sum(np.exp(-e)) / e.shape[0] if e.shape[0] != 0 else 0.0 + return ious + + +def oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None): + """greedily select boxes with high confidence and overlap with current maximum <= thresh + rule out overlap >= thresh + + Args: + kpts_db (list): The predicted keypoints within the image + thresh (float): The threshold to select the boxes + sigmas (np.array): The variance to calculate the oks iou + Default: None + in_vis_thre (float): The threshold to select the high confidence boxes + Default: None + + Return: + keep (list): indexes to keep + """ + + if len(kpts_db) == 0: + return [] + + scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))]) + kpts = np.array( + [kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))]) + areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))]) + + order = scores.argsort()[::-1] + + keep = [] + while order.size > 0: + i = order[0] + keep.append(i) + + oks_ovr = oks_iou(kpts[i], 
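A minimal sketch exercising `oks_iou` above with two hypothetical COCO-style poses (17 keypoints flattened as `[x, y, v] * 17`); the identical pose scores OKS 1.0 and the shifted copy scores lower:

```python
import numpy as np

g = np.zeros(51)
g[0::3] = np.arange(17) * 4.0   # x coordinates
g[1::3] = 10.0                  # y coordinates
g[2::3] = 2.0                   # visibility flags

d = np.stack([g, g + 1.0])      # one exact duplicate, one copy shifted by (1, 1)
ious = oks_iou(g, d, 1000.0, np.array([1000.0, 1000.0]))
# ious[0] == 1.0, ious[1] < 1.0
```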
kpts[order[1:]], areas[i], areas[order[1:]], + sigmas, in_vis_thre) + + inds = np.where(oks_ovr <= thresh)[0] + order = order[inds + 1] + + return keep + + +def rescore(overlap, scores, thresh, type='gaussian'): + assert overlap.shape[0] == scores.shape[0] + if type == 'linear': + inds = np.where(overlap >= thresh)[0] + scores[inds] = scores[inds] * (1 - overlap[inds]) + else: + scores = scores * np.exp(-overlap**2 / thresh) + + return scores + + +def soft_oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None): + """greedily select boxes with high confidence and overlap with current maximum <= thresh + rule out overlap >= thresh + + Args: + kpts_db (list): The predicted keypoints within the image + thresh (float): The threshold to select the boxes + sigmas (np.array): The variance to calculate the oks iou + Default: None + in_vis_thre (float): The threshold to select the high confidence boxes + Default: None + + Return: + keep (list): indexes to keep + """ + + if len(kpts_db) == 0: + return [] + + scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))]) + kpts = np.array( + [kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))]) + areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))]) + + order = scores.argsort()[::-1] + scores = scores[order] + + # max_dets = order.size + max_dets = 20 + keep = np.zeros(max_dets, dtype=np.intp) + keep_cnt = 0 + while order.size > 0 and keep_cnt < max_dets: + i = order[0] + + oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], + sigmas, in_vis_thre) + + order = order[1:] + scores = rescore(oks_ovr, scores[1:], thresh) + + tmp = scores.argsort()[::-1] + order = order[tmp] + scores = scores[tmp] + + keep[keep_cnt] = i + keep_cnt += 1 + + keep = keep[:keep_cnt] + + return keep diff --git a/PaddleDetection-release-2.6/ppdet/modeling/layers.py b/PaddleDetection-release-2.6/ppdet/modeling/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..16368e81e62d01cae7e628a30bd7c98cb9dcb234 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/layers.py @@ -0,0 +1,1346 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import six +import numpy as np +from numbers import Integral + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle import to_tensor +import paddle.nn.functional as F +from paddle.nn.initializer import Normal, Constant, XavierUniform +from paddle.regularizer import L2Decay + +from ppdet.core.workspace import register, serializable +from ppdet.modeling.bbox_utils import delta2bbox +from . 
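The gaussian branch of `rescore` decays scores by `exp(-overlap**2 / thresh)` rather than deleting overlapping candidates, which is the essential difference between `soft_oks_nms` and the hard `oks_nms` above. Standalone numbers:

```python
import numpy as np

overlap = np.array([0.1, 0.5, 0.9])
scores = np.array([0.9, 0.9, 0.9])
decayed = scores * np.exp(-overlap**2 / 0.5)   # ~[0.88, 0.55, 0.18]
```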
import ops +from .initializer import xavier_uniform_, constant_ + +from paddle.vision.ops import DeformConv2D + + +def _to_list(l): + if isinstance(l, (list, tuple)): + return list(l) + return [l] + + +class AlignConv(nn.Layer): + def __init__(self, in_channels, out_channels, kernel_size=3, groups=1): + super(AlignConv, self).__init__() + self.kernel_size = kernel_size + self.align_conv = paddle.vision.ops.DeformConv2D( + in_channels, + out_channels, + kernel_size=self.kernel_size, + padding=(self.kernel_size - 1) // 2, + groups=groups, + weight_attr=ParamAttr(initializer=Normal(0, 0.01)), + bias_attr=None) + + @paddle.no_grad() + def get_offset(self, anchors, featmap_size, stride): + """ + Args: + anchors: [B, L, 5] xc,yc,w,h,angle + featmap_size: (feat_h, feat_w) + stride: 8 + Returns: + + """ + batch = anchors.shape[0] + dtype = anchors.dtype + feat_h, feat_w = featmap_size + pad = (self.kernel_size - 1) // 2 + idx = paddle.arange(-pad, pad + 1, dtype=dtype) + + yy, xx = paddle.meshgrid(idx, idx) + xx = paddle.reshape(xx, [-1]) + yy = paddle.reshape(yy, [-1]) + + # get sampling locations of default conv + xc = paddle.arange(0, feat_w, dtype=dtype) + yc = paddle.arange(0, feat_h, dtype=dtype) + yc, xc = paddle.meshgrid(yc, xc) + + xc = paddle.reshape(xc, [-1, 1]) + yc = paddle.reshape(yc, [-1, 1]) + x_conv = xc + xx + y_conv = yc + yy + + # get sampling locations of anchors + x_ctr, y_ctr, w, h, a = paddle.split(anchors, 5, axis=-1) + x_ctr = x_ctr / stride + y_ctr = y_ctr / stride + w_s = w / stride + h_s = h / stride + cos, sin = paddle.cos(a), paddle.sin(a) + dw, dh = w_s / self.kernel_size, h_s / self.kernel_size + x, y = dw * xx, dh * yy + xr = cos * x - sin * y + yr = sin * x + cos * y + x_anchor, y_anchor = xr + x_ctr, yr + y_ctr + # get offset filed + offset_x = x_anchor - x_conv + offset_y = y_anchor - y_conv + offset = paddle.stack([offset_y, offset_x], axis=-1) + offset = offset.reshape( + [batch, feat_h, feat_w, self.kernel_size * self.kernel_size * 2]) + offset = offset.transpose([0, 3, 1, 2]) + + return offset + + def forward(self, x, refine_anchors, featmap_size, stride): + batch = paddle.shape(x)[0].numpy() + offset = self.get_offset(refine_anchors, featmap_size, stride) + if self.training: + x = F.relu(self.align_conv(x, offset.detach())) + else: + x = F.relu(self.align_conv(x, offset)) + return x + + +class DeformableConvV2(nn.Layer): + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + weight_attr=None, + bias_attr=None, + lr_scale=1, + regularizer=None, + skip_quant=False, + dcn_bias_regularizer=L2Decay(0.), + dcn_bias_lr_scale=2.): + super(DeformableConvV2, self).__init__() + self.offset_channel = 2 * kernel_size**2 + self.mask_channel = kernel_size**2 + + if lr_scale == 1 and regularizer is None: + offset_bias_attr = ParamAttr(initializer=Constant(0.)) + else: + offset_bias_attr = ParamAttr( + initializer=Constant(0.), + learning_rate=lr_scale, + regularizer=regularizer) + self.conv_offset = nn.Conv2D( + in_channels, + 3 * kernel_size**2, + kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2, + weight_attr=ParamAttr(initializer=Constant(0.0)), + bias_attr=offset_bias_attr) + if skip_quant: + self.conv_offset.skip_quant = True + + if bias_attr: + # in FCOS-DCN head, specifically need learning_rate and regularizer + dcn_bias_attr = ParamAttr( + initializer=Constant(value=0), + regularizer=dcn_bias_regularizer, + learning_rate=dcn_bias_lr_scale) + else: + # in ResNet backbone, do not need 
bias + dcn_bias_attr = False + self.conv_dcn = DeformConv2D( + in_channels, + out_channels, + kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 * dilation, + dilation=dilation, + groups=groups, + weight_attr=weight_attr, + bias_attr=dcn_bias_attr) + + def forward(self, x): + offset_mask = self.conv_offset(x) + offset, mask = paddle.split( + offset_mask, + num_or_sections=[self.offset_channel, self.mask_channel], + axis=1) + mask = F.sigmoid(mask) + y = self.conv_dcn(x, offset, mask=mask) + return y + + +class ConvNormLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size, + stride, + groups=1, + norm_type='bn', + norm_decay=0., + norm_groups=32, + use_dcn=False, + bias_on=False, + lr_scale=1., + freeze_norm=False, + initializer=Normal( + mean=0., std=0.01), + skip_quant=False, + dcn_lr_scale=2., + dcn_regularizer=L2Decay(0.)): + super(ConvNormLayer, self).__init__() + assert norm_type in ['bn', 'sync_bn', 'gn', None] + + if bias_on: + bias_attr = ParamAttr( + initializer=Constant(value=0.), learning_rate=lr_scale) + else: + bias_attr = False + + if not use_dcn: + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + weight_attr=ParamAttr( + initializer=initializer, learning_rate=1.), + bias_attr=bias_attr) + if skip_quant: + self.conv.skip_quant = True + else: + # in FCOS-DCN head, specifically need learning_rate and regularizer + self.conv = DeformableConvV2( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + weight_attr=ParamAttr( + initializer=initializer, learning_rate=1.), + bias_attr=True, + lr_scale=dcn_lr_scale, + regularizer=dcn_regularizer, + dcn_bias_regularizer=dcn_regularizer, + dcn_bias_lr_scale=dcn_lr_scale, + skip_quant=skip_quant) + + norm_lr = 0. if freeze_norm else 1. 
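A forward-shape sketch for the `DeformableConvV2` layer defined above: its `conv_offset` branch predicts `3*k*k` channels that are split into `2*k*k` sampling offsets and `k*k` sigmoid-activated modulation masks before the deformable convolution runs:

```python
import paddle

dcn = DeformableConvV2(in_channels=64, out_channels=64, kernel_size=3)
x = paddle.rand([2, 64, 32, 32])
y = dcn(x)
assert y.shape == [2, 64, 32, 32]   # padding = (k - 1) // 2 keeps the spatial size
```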
+ param_attr = ParamAttr( + learning_rate=norm_lr, + regularizer=L2Decay(norm_decay) if norm_decay is not None else None) + bias_attr = ParamAttr( + learning_rate=norm_lr, + regularizer=L2Decay(norm_decay) if norm_decay is not None else None) + if norm_type in ['bn', 'sync_bn']: + self.norm = nn.BatchNorm2D( + ch_out, weight_attr=param_attr, bias_attr=bias_attr) + elif norm_type == 'gn': + self.norm = nn.GroupNorm( + num_groups=norm_groups, + num_channels=ch_out, + weight_attr=param_attr, + bias_attr=bias_attr) + else: + self.norm = None + + def forward(self, inputs): + out = self.conv(inputs) + if self.norm is not None: + out = self.norm(out) + return out + + +class LiteConv(nn.Layer): + def __init__(self, + in_channels, + out_channels, + stride=1, + with_act=True, + norm_type='sync_bn', + name=None): + super(LiteConv, self).__init__() + self.lite_conv = nn.Sequential() + conv1 = ConvNormLayer( + in_channels, + in_channels, + filter_size=5, + stride=stride, + groups=in_channels, + norm_type=norm_type, + initializer=XavierUniform()) + conv2 = ConvNormLayer( + in_channels, + out_channels, + filter_size=1, + stride=stride, + norm_type=norm_type, + initializer=XavierUniform()) + conv3 = ConvNormLayer( + out_channels, + out_channels, + filter_size=1, + stride=stride, + norm_type=norm_type, + initializer=XavierUniform()) + conv4 = ConvNormLayer( + out_channels, + out_channels, + filter_size=5, + stride=stride, + groups=out_channels, + norm_type=norm_type, + initializer=XavierUniform()) + conv_list = [conv1, conv2, conv3, conv4] + self.lite_conv.add_sublayer('conv1', conv1) + self.lite_conv.add_sublayer('relu6_1', nn.ReLU6()) + self.lite_conv.add_sublayer('conv2', conv2) + if with_act: + self.lite_conv.add_sublayer('relu6_2', nn.ReLU6()) + self.lite_conv.add_sublayer('conv3', conv3) + self.lite_conv.add_sublayer('relu6_3', nn.ReLU6()) + self.lite_conv.add_sublayer('conv4', conv4) + if with_act: + self.lite_conv.add_sublayer('relu6_4', nn.ReLU6()) + + def forward(self, inputs): + out = self.lite_conv(inputs) + return out + + +class DropBlock(nn.Layer): + def __init__(self, block_size, keep_prob, name=None, data_format='NCHW'): + """ + DropBlock layer, see https://arxiv.org/abs/1810.12890 + + Args: + block_size (int): block size + keep_prob (int): keep probability + name (str): layer name + data_format (str): data format, NCHW or NHWC + """ + super(DropBlock, self).__init__() + self.block_size = block_size + self.keep_prob = keep_prob + self.name = name + self.data_format = data_format + + def forward(self, x): + if not self.training or self.keep_prob == 1: + return x + else: + gamma = (1. - self.keep_prob) / (self.block_size**2) + if self.data_format == 'NCHW': + shape = x.shape[2:] + else: + shape = x.shape[1:3] + for s in shape: + gamma *= s / (s - self.block_size + 1) + + matrix = paddle.cast(paddle.rand(x.shape) < gamma, x.dtype) + mask_inv = F.max_pool2d( + matrix, + self.block_size, + stride=1, + padding=self.block_size // 2, + data_format=self.data_format) + mask = 1. 
- mask_inv + y = x * mask * (mask.numel() / mask.sum()) + return y + + +@register +@serializable +class AnchorGeneratorSSD(object): + def __init__(self, + steps=[8, 16, 32, 64, 100, 300], + aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]], + min_ratio=15, + max_ratio=90, + base_size=300, + min_sizes=[30.0, 60.0, 111.0, 162.0, 213.0, 264.0], + max_sizes=[60.0, 111.0, 162.0, 213.0, 264.0, 315.0], + offset=0.5, + flip=True, + clip=False, + min_max_aspect_ratios_order=False): + self.steps = steps + self.aspect_ratios = aspect_ratios + self.min_ratio = min_ratio + self.max_ratio = max_ratio + self.base_size = base_size + self.min_sizes = min_sizes + self.max_sizes = max_sizes + self.offset = offset + self.flip = flip + self.clip = clip + self.min_max_aspect_ratios_order = min_max_aspect_ratios_order + + if self.min_sizes == [] and self.max_sizes == []: + num_layer = len(aspect_ratios) + step = int( + math.floor(((self.max_ratio - self.min_ratio)) / (num_layer - 2 + ))) + for ratio in six.moves.range(self.min_ratio, self.max_ratio + 1, + step): + self.min_sizes.append(self.base_size * ratio / 100.) + self.max_sizes.append(self.base_size * (ratio + step) / 100.) + self.min_sizes = [self.base_size * .10] + self.min_sizes + self.max_sizes = [self.base_size * .20] + self.max_sizes + + self.num_priors = [] + for aspect_ratio, min_size, max_size in zip( + aspect_ratios, self.min_sizes, self.max_sizes): + if isinstance(min_size, (list, tuple)): + self.num_priors.append( + len(_to_list(min_size)) + len(_to_list(max_size))) + else: + self.num_priors.append((len(aspect_ratio) * 2 + 1) * len( + _to_list(min_size)) + len(_to_list(max_size))) + + def __call__(self, inputs, image): + boxes = [] + for input, min_size, max_size, aspect_ratio, step in zip( + inputs, self.min_sizes, self.max_sizes, self.aspect_ratios, + self.steps): + box, _ = ops.prior_box( + input=input, + image=image, + min_sizes=_to_list(min_size), + max_sizes=_to_list(max_size), + aspect_ratios=aspect_ratio, + flip=self.flip, + clip=self.clip, + steps=[step, step], + offset=self.offset, + min_max_aspect_ratios_order=self.min_max_aspect_ratios_order) + boxes.append(paddle.reshape(box, [-1, 4])) + return boxes + + +@register +@serializable +class RCNNBox(object): + __shared__ = ['num_classes', 'export_onnx'] + + def __init__(self, + prior_box_var=[10., 10., 5., 5.], + code_type="decode_center_size", + box_normalized=False, + num_classes=80, + export_onnx=False): + super(RCNNBox, self).__init__() + self.prior_box_var = prior_box_var + self.code_type = code_type + self.box_normalized = box_normalized + self.num_classes = num_classes + self.export_onnx = export_onnx + + def __call__(self, bbox_head_out, rois, im_shape, scale_factor): + bbox_pred = bbox_head_out[0] + cls_prob = bbox_head_out[1] + roi = rois[0] + rois_num = rois[1] + + if self.export_onnx: + onnx_rois_num_per_im = rois_num[0] + origin_shape = paddle.expand(im_shape[0, :], + [onnx_rois_num_per_im, 2]) + + else: + origin_shape_list = [] + if isinstance(roi, list): + batch_size = len(roi) + else: + batch_size = paddle.slice(paddle.shape(im_shape), [0], [0], [1]) + + # bbox_pred.shape: [N, C*4] + for idx in range(batch_size): + rois_num_per_im = rois_num[idx] + expand_im_shape = paddle.expand(im_shape[idx, :], + [rois_num_per_im, 2]) + origin_shape_list.append(expand_im_shape) + + origin_shape = paddle.concat(origin_shape_list) + + # bbox_pred.shape: [N, C*4] + # C=num_classes in faster/mask rcnn(bbox_head), C=1 in cascade rcnn(cascade_head) + bbox = paddle.concat(roi) + 
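When `AnchorGeneratorSSD` is constructed with empty `min_sizes`/`max_sizes`, it derives them from the ratio schedule above. With the constructor defaults (`min_ratio=15`, `max_ratio=90`, `base_size=300`, six feature maps) the derivation works out to:

```python
import math

num_layer, min_ratio, max_ratio, base = 6, 15, 90, 300
step = int(math.floor((max_ratio - min_ratio) / (num_layer - 2)))   # 18
ratios = list(range(min_ratio, max_ratio + 1, step))                # [15, 33, 51, 69, 87]
min_sizes = [base * .10] + [base * r / 100. for r in ratios]
max_sizes = [base * .20] + [base * (r + step) / 100. for r in ratios]
# min_sizes: [30.0, 45.0, 99.0, 153.0, 207.0, 261.0]
# max_sizes: [60.0, 99.0, 153.0, 207.0, 261.0, 315.0]
```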
bbox = delta2bbox(bbox_pred, bbox, self.prior_box_var) + scores = cls_prob[:, :-1] + + # bbox.shape: [N, C, 4] + # bbox.shape[1] must be equal to scores.shape[1] + total_num = bbox.shape[0] + bbox_dim = bbox.shape[-1] + bbox = paddle.expand(bbox, [total_num, self.num_classes, bbox_dim]) + + origin_h = paddle.unsqueeze(origin_shape[:, 0], axis=1) + origin_w = paddle.unsqueeze(origin_shape[:, 1], axis=1) + zeros = paddle.zeros_like(origin_h) + x1 = paddle.maximum(paddle.minimum(bbox[:, :, 0], origin_w), zeros) + y1 = paddle.maximum(paddle.minimum(bbox[:, :, 1], origin_h), zeros) + x2 = paddle.maximum(paddle.minimum(bbox[:, :, 2], origin_w), zeros) + y2 = paddle.maximum(paddle.minimum(bbox[:, :, 3], origin_h), zeros) + bbox = paddle.stack([x1, y1, x2, y2], axis=-1) + bboxes = (bbox, rois_num) + return bboxes, scores + + +@register +@serializable +class MultiClassNMS(object): + def __init__(self, + score_threshold=.05, + nms_top_k=-1, + keep_top_k=100, + nms_threshold=.5, + normalized=True, + nms_eta=1.0, + return_index=False, + return_rois_num=True, + trt=False): + super(MultiClassNMS, self).__init__() + self.score_threshold = score_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + self.nms_threshold = nms_threshold + self.normalized = normalized + self.nms_eta = nms_eta + self.return_index = return_index + self.return_rois_num = return_rois_num + self.trt = trt + + def __call__(self, bboxes, score, background_label=-1): + """ + bboxes (Tensor|List[Tensor]): 1. (Tensor) Predicted bboxes with shape + [N, M, 4], N is the batch size and M + is the number of bboxes + 2. (List[Tensor]) bboxes and bbox_num, + bboxes have shape of [M, C, 4], C + is the class number and bbox_num means + the number of bboxes of each batch with + shape [N,] + score (Tensor): Predicted scores with shape [N, C, M] or [M, C] + background_label (int): Ignore the background label; For example, RCNN + is num_classes and YOLO is -1. 
+ """ + kwargs = self.__dict__.copy() + if isinstance(bboxes, tuple): + bboxes, bbox_num = bboxes + kwargs.update({'rois_num': bbox_num}) + if background_label > -1: + kwargs.update({'background_label': background_label}) + kwargs.pop('trt') + # TODO(wangxinxin08): paddle version should be develop or 2.3 and above to run nms on tensorrt + if self.trt and (int(paddle.version.major) == 0 or + (int(paddle.version.major) >= 2 and + int(paddle.version.minor) >= 3)): + # TODO(wangxinxin08): tricky switch to run nms on tensorrt + kwargs.update({'nms_eta': 1.1}) + bbox, bbox_num, _ = ops.multiclass_nms(bboxes, score, **kwargs) + bbox = bbox.reshape([1, -1, 6]) + idx = paddle.nonzero(bbox[..., 0] != -1) + bbox = paddle.gather_nd(bbox, idx) + return bbox, bbox_num, None + else: + return ops.multiclass_nms(bboxes, score, **kwargs) + + +@register +@serializable +class MatrixNMS(object): + __append_doc__ = True + + def __init__(self, + score_threshold=.05, + post_threshold=.05, + nms_top_k=-1, + keep_top_k=100, + use_gaussian=False, + gaussian_sigma=2., + normalized=False, + background_label=0): + super(MatrixNMS, self).__init__() + self.score_threshold = score_threshold + self.post_threshold = post_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + self.normalized = normalized + self.use_gaussian = use_gaussian + self.gaussian_sigma = gaussian_sigma + self.background_label = background_label + + def __call__(self, bbox, score, *args): + return ops.matrix_nms( + bboxes=bbox, + scores=score, + score_threshold=self.score_threshold, + post_threshold=self.post_threshold, + nms_top_k=self.nms_top_k, + keep_top_k=self.keep_top_k, + use_gaussian=self.use_gaussian, + gaussian_sigma=self.gaussian_sigma, + background_label=self.background_label, + normalized=self.normalized) + + +@register +@serializable +class YOLOBox(object): + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + conf_thresh=0.005, + downsample_ratio=32, + clip_bbox=True, + scale_x_y=1.): + self.num_classes = num_classes + self.conf_thresh = conf_thresh + self.downsample_ratio = downsample_ratio + self.clip_bbox = clip_bbox + self.scale_x_y = scale_x_y + + def __call__(self, + yolo_head_out, + anchors, + im_shape, + scale_factor, + var_weight=None): + boxes_list = [] + scores_list = [] + origin_shape = im_shape / scale_factor + origin_shape = paddle.cast(origin_shape, 'int32') + for i, head_out in enumerate(yolo_head_out): + boxes, scores = paddle.vision.ops.yolo_box( + head_out, + origin_shape, + anchors[i], + self.num_classes, + self.conf_thresh, + self.downsample_ratio // 2**i, + self.clip_bbox, + scale_x_y=self.scale_x_y) + boxes_list.append(boxes) + scores_list.append(paddle.transpose(scores, perm=[0, 2, 1])) + yolo_boxes = paddle.concat(boxes_list, axis=1) + yolo_scores = paddle.concat(scores_list, axis=2) + return yolo_boxes, yolo_scores + + +@register +@serializable +class SSDBox(object): + def __init__(self, + is_normalized=True, + prior_box_var=[0.1, 0.1, 0.2, 0.2], + use_fuse_decode=False): + self.is_normalized = is_normalized + self.norm_delta = float(not self.is_normalized) + self.prior_box_var = prior_box_var + self.use_fuse_decode = use_fuse_decode + + def __call__(self, + preds, + prior_boxes, + im_shape, + scale_factor, + var_weight=None): + boxes, scores = preds + boxes = paddle.concat(boxes, axis=1) + prior_boxes = paddle.concat(prior_boxes) + if self.use_fuse_decode: + output_boxes = ops.box_coder( + prior_boxes, + self.prior_box_var, + boxes, + code_type="decode_center_size", + 
box_normalized=self.is_normalized) + else: + pb_w = prior_boxes[:, 2] - prior_boxes[:, 0] + self.norm_delta + pb_h = prior_boxes[:, 3] - prior_boxes[:, 1] + self.norm_delta + pb_x = prior_boxes[:, 0] + pb_w * 0.5 + pb_y = prior_boxes[:, 1] + pb_h * 0.5 + out_x = pb_x + boxes[:, :, 0] * pb_w * self.prior_box_var[0] + out_y = pb_y + boxes[:, :, 1] * pb_h * self.prior_box_var[1] + out_w = paddle.exp(boxes[:, :, 2] * self.prior_box_var[2]) * pb_w + out_h = paddle.exp(boxes[:, :, 3] * self.prior_box_var[3]) * pb_h + output_boxes = paddle.stack( + [ + out_x - out_w / 2., out_y - out_h / 2., out_x + out_w / 2., + out_y + out_h / 2. + ], + axis=-1) + + if self.is_normalized: + h = (im_shape[:, 0] / scale_factor[:, 0]).unsqueeze(-1) + w = (im_shape[:, 1] / scale_factor[:, 1]).unsqueeze(-1) + im_shape = paddle.stack([w, h, w, h], axis=-1) + output_boxes *= im_shape + else: + output_boxes[..., -2:] -= 1.0 + output_scores = F.softmax(paddle.concat( + scores, axis=1)).transpose([0, 2, 1]) + + return output_boxes, output_scores + + +@register +class TTFBox(object): + __shared__ = ['down_ratio'] + + def __init__(self, max_per_img=100, score_thresh=0.01, down_ratio=4): + super(TTFBox, self).__init__() + self.max_per_img = max_per_img + self.score_thresh = score_thresh + self.down_ratio = down_ratio + + def _simple_nms(self, heat, kernel=3): + """ + Use maxpool to filter the max score, get local peaks. + """ + pad = (kernel - 1) // 2 + hmax = F.max_pool2d(heat, kernel, stride=1, padding=pad) + keep = paddle.cast(hmax == heat, 'float32') + return heat * keep + + def _topk(self, scores): + """ + Select top k scores and decode to get xy coordinates. + """ + k = self.max_per_img + shape_fm = paddle.shape(scores) + shape_fm.stop_gradient = True + cat, height, width = shape_fm[1], shape_fm[2], shape_fm[3] + # batch size is 1 + scores_r = paddle.reshape(scores, [cat, -1]) + topk_scores, topk_inds = paddle.topk(scores_r, k) + topk_ys = topk_inds // width + topk_xs = topk_inds % width + + topk_score_r = paddle.reshape(topk_scores, [-1]) + topk_score, topk_ind = paddle.topk(topk_score_r, k) + k_t = paddle.full(paddle.shape(topk_ind), k, dtype='int64') + topk_clses = paddle.cast(paddle.floor_divide(topk_ind, k_t), 'float32') + + topk_inds = paddle.reshape(topk_inds, [-1]) + topk_ys = paddle.reshape(topk_ys, [-1, 1]) + topk_xs = paddle.reshape(topk_xs, [-1, 1]) + topk_inds = paddle.gather(topk_inds, topk_ind) + topk_ys = paddle.gather(topk_ys, topk_ind) + topk_xs = paddle.gather(topk_xs, topk_ind) + + return topk_score, topk_inds, topk_clses, topk_ys, topk_xs + + def _decode(self, hm, wh, im_shape, scale_factor): + heatmap = F.sigmoid(hm) + heat = self._simple_nms(heatmap) + scores, inds, clses, ys, xs = self._topk(heat) + ys = paddle.cast(ys, 'float32') * self.down_ratio + xs = paddle.cast(xs, 'float32') * self.down_ratio + scores = paddle.tensor.unsqueeze(scores, [1]) + clses = paddle.tensor.unsqueeze(clses, [1]) + + wh_t = paddle.transpose(wh, [0, 2, 3, 1]) + wh = paddle.reshape(wh_t, [-1, paddle.shape(wh_t)[-1]]) + wh = paddle.gather(wh, inds) + + x1 = xs - wh[:, 0:1] + y1 = ys - wh[:, 1:2] + x2 = xs + wh[:, 2:3] + y2 = ys + wh[:, 3:4] + + bboxes = paddle.concat([x1, y1, x2, y2], axis=1) + + scale_y = scale_factor[:, 0:1] + scale_x = scale_factor[:, 1:2] + scale_expand = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], axis=1) + boxes_shape = paddle.shape(bboxes) + boxes_shape.stop_gradient = True + scale_expand = paddle.expand(scale_expand, shape=boxes_shape) + bboxes = paddle.divide(bboxes, scale_expand) + 
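`TTFBox._simple_nms` above implements CenterNet-style peak picking: a 3x3 max-pool computes each cell's local maximum, and only cells equal to that maximum survive, which replaces box NMS on center heatmaps. Standalone sketch:

```python
import paddle
import paddle.nn.functional as F

heat = paddle.rand([1, 80, 128, 128])                 # per-class center heatmap
hmax = F.max_pool2d(heat, 3, stride=1, padding=1)
peaks = heat * paddle.cast(hmax == heat, 'float32')   # non-peak cells zeroed
```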
results = paddle.concat([clses, scores, bboxes], axis=1) + # hack: append result with cls=-1 and score=1. to avoid all scores + # are less than score_thresh which may cause error in gather. + fill_r = paddle.to_tensor(np.array([[-1, 1, 0, 0, 0, 0]])) + fill_r = paddle.cast(fill_r, results.dtype) + results = paddle.concat([results, fill_r]) + scores = results[:, 1] + valid_ind = paddle.nonzero(scores > self.score_thresh) + results = paddle.gather(results, valid_ind) + return results, paddle.shape(results)[0:1] + + def __call__(self, hm, wh, im_shape, scale_factor): + results = [] + results_num = [] + for i in range(scale_factor.shape[0]): + result, num = self._decode(hm[i:i + 1, ], wh[i:i + 1, ], + im_shape[i:i + 1, ], + scale_factor[i:i + 1, ]) + results.append(result) + results_num.append(num) + results = paddle.concat(results, axis=0) + results_num = paddle.concat(results_num, axis=0) + return results, results_num + + +@register +@serializable +class JDEBox(object): + __shared__ = ['num_classes'] + + def __init__(self, num_classes=1, conf_thresh=0.3, downsample_ratio=32): + self.num_classes = num_classes + self.conf_thresh = conf_thresh + self.downsample_ratio = downsample_ratio + + def generate_anchor(self, nGh, nGw, anchor_wh): + nA = len(anchor_wh) + yv, xv = paddle.meshgrid([paddle.arange(nGh), paddle.arange(nGw)]) + mesh = paddle.stack( + (xv, yv), axis=0).cast(dtype='float32') # 2 x nGh x nGw + meshs = paddle.tile(mesh, [nA, 1, 1, 1]) + + anchor_offset_mesh = anchor_wh[:, :, None][:, :, :, None].repeat( + int(nGh), axis=-2).repeat( + int(nGw), axis=-1) + anchor_offset_mesh = paddle.to_tensor( + anchor_offset_mesh.astype(np.float32)) + # nA x 2 x nGh x nGw + + anchor_mesh = paddle.concat([meshs, anchor_offset_mesh], axis=1) + anchor_mesh = paddle.transpose(anchor_mesh, + [0, 2, 3, 1]) # (nA x nGh x nGw) x 4 + return anchor_mesh + + def decode_delta(self, delta, fg_anchor_list): + px, py, pw, ph = fg_anchor_list[:, 0], fg_anchor_list[:,1], \ + fg_anchor_list[:, 2], fg_anchor_list[:,3] + dx, dy, dw, dh = delta[:, 0], delta[:, 1], delta[:, 2], delta[:, 3] + gx = pw * dx + px + gy = ph * dy + py + gw = pw * paddle.exp(dw) + gh = ph * paddle.exp(dh) + gx1 = gx - gw * 0.5 + gy1 = gy - gh * 0.5 + gx2 = gx + gw * 0.5 + gy2 = gy + gh * 0.5 + return paddle.stack([gx1, gy1, gx2, gy2], axis=1) + + def decode_delta_map(self, nA, nGh, nGw, delta_map, anchor_vec): + anchor_mesh = self.generate_anchor(nGh, nGw, anchor_vec) + anchor_mesh = paddle.unsqueeze(anchor_mesh, 0) + pred_list = self.decode_delta( + paddle.reshape( + delta_map, shape=[-1, 4]), + paddle.reshape( + anchor_mesh, shape=[-1, 4])) + pred_map = paddle.reshape(pred_list, shape=[nA * nGh * nGw, 4]) + return pred_map + + def _postprocessing_by_level(self, nA, stride, head_out, anchor_vec): + boxes_shape = head_out.shape # [nB, nA*6, nGh, nGw] + nGh, nGw = boxes_shape[-2], boxes_shape[-1] + nB = 1 # TODO: only support bs=1 now + boxes_list, scores_list = [], [] + for idx in range(nB): + p = paddle.reshape( + head_out[idx], shape=[nA, self.num_classes + 5, nGh, nGw]) + p = paddle.transpose(p, perm=[0, 2, 3, 1]) # [nA, nGh, nGw, 6] + delta_map = p[:, :, :, :4] + boxes = self.decode_delta_map(nA, nGh, nGw, delta_map, anchor_vec) + # [nA * nGh * nGw, 4] + boxes_list.append(boxes * stride) + + p_conf = paddle.transpose( + p[:, :, :, 4:6], perm=[3, 0, 1, 2]) # [2, nA, nGh, nGw] + p_conf = F.softmax( + p_conf, axis=0)[1, :, :, :].unsqueeze(-1) # [nA, nGh, nGw, 1] + scores = paddle.reshape(p_conf, shape=[nA * nGh * nGw, 1]) + 
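`JDEBox.decode_delta` above applies standard YOLO-style box decoding: centers move by `pw*dx, ph*dy` and sizes scale by `exp(dw), exp(dh)`, so zero deltas recover the anchor exactly. Worked numbers:

```python
import numpy as np

px, py, pw, ph = 50., 60., 20., 40.   # anchor center and size
dx = dy = dw = dh = 0.                # zero regression deltas
gx, gy = pw * dx + px, ph * dy + py
gw, gh = pw * np.exp(dw), ph * np.exp(dh)
box = [gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2]   # [40., 40., 60., 80.]
```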
scores_list.append(scores) + + boxes_results = paddle.stack(boxes_list) + scores_results = paddle.stack(scores_list) + return boxes_results, scores_results + + def __call__(self, yolo_head_out, anchors): + bbox_pred_list = [] + for i, head_out in enumerate(yolo_head_out): + stride = self.downsample_ratio // 2**i + anc_w, anc_h = anchors[i][0::2], anchors[i][1::2] + anchor_vec = np.stack((anc_w, anc_h), axis=1) / stride + nA = len(anc_w) + boxes, scores = self._postprocessing_by_level(nA, stride, head_out, + anchor_vec) + bbox_pred_list.append(paddle.concat([boxes, scores], axis=-1)) + + yolo_boxes_scores = paddle.concat(bbox_pred_list, axis=1) + boxes_idx_over_conf_thr = paddle.nonzero( + yolo_boxes_scores[:, :, -1] > self.conf_thresh) + boxes_idx_over_conf_thr.stop_gradient = True + + return boxes_idx_over_conf_thr, yolo_boxes_scores + + +@register +@serializable +class MaskMatrixNMS(object): + """ + Matrix NMS for multi-class masks. + Args: + update_threshold (float): Updated threshold of categroy score in second time. + pre_nms_top_n (int): Number of total instance to be kept per image before NMS + post_nms_top_n (int): Number of total instance to be kept per image after NMS. + kernel (str): 'linear' or 'gaussian'. + sigma (float): std in gaussian method. + Input: + seg_preds (Variable): shape (n, h, w), segmentation feature maps + seg_masks (Variable): shape (n, h, w), segmentation feature maps + cate_labels (Variable): shape (n), mask labels in descending order + cate_scores (Variable): shape (n), mask scores in descending order + sum_masks (Variable): a float tensor of the sum of seg_masks + Returns: + Variable: cate_scores, tensors of shape (n) + """ + + def __init__(self, + update_threshold=0.05, + pre_nms_top_n=500, + post_nms_top_n=100, + kernel='gaussian', + sigma=2.0): + super(MaskMatrixNMS, self).__init__() + self.update_threshold = update_threshold + self.pre_nms_top_n = pre_nms_top_n + self.post_nms_top_n = post_nms_top_n + self.kernel = kernel + self.sigma = sigma + + def _sort_score(self, scores, top_num): + if paddle.shape(scores)[0] > top_num: + return paddle.topk(scores, top_num)[1] + else: + return paddle.argsort(scores, descending=True) + + def __call__(self, + seg_preds, + seg_masks, + cate_labels, + cate_scores, + sum_masks=None): + # sort and keep top nms_pre + sort_inds = self._sort_score(cate_scores, self.pre_nms_top_n) + seg_masks = paddle.gather(seg_masks, index=sort_inds) + seg_preds = paddle.gather(seg_preds, index=sort_inds) + sum_masks = paddle.gather(sum_masks, index=sort_inds) + cate_scores = paddle.gather(cate_scores, index=sort_inds) + cate_labels = paddle.gather(cate_labels, index=sort_inds) + + seg_masks = paddle.flatten(seg_masks, start_axis=1, stop_axis=-1) + # inter. + inter_matrix = paddle.mm(seg_masks, paddle.transpose(seg_masks, [1, 0])) + n_samples = paddle.shape(cate_labels) + # union. + sum_masks_x = paddle.expand(sum_masks, shape=[n_samples, n_samples]) + # iou. + iou_matrix = (inter_matrix / ( + sum_masks_x + paddle.transpose(sum_masks_x, [1, 0]) - inter_matrix)) + iou_matrix = paddle.triu(iou_matrix, diagonal=1) + # label_specific matrix. 
+ cate_labels_x = paddle.expand(cate_labels, shape=[n_samples, n_samples]) + label_matrix = paddle.cast( + (cate_labels_x == paddle.transpose(cate_labels_x, [1, 0])), + 'float32') + label_matrix = paddle.triu(label_matrix, diagonal=1) + + # IoU compensation + compensate_iou = paddle.max((iou_matrix * label_matrix), axis=0) + compensate_iou = paddle.expand( + compensate_iou, shape=[n_samples, n_samples]) + compensate_iou = paddle.transpose(compensate_iou, [1, 0]) + + # IoU decay + decay_iou = iou_matrix * label_matrix + + # matrix nms + if self.kernel == 'gaussian': + decay_matrix = paddle.exp(-1 * self.sigma * (decay_iou**2)) + compensate_matrix = paddle.exp(-1 * self.sigma * + (compensate_iou**2)) + decay_coefficient = paddle.min(decay_matrix / compensate_matrix, + axis=0) + elif self.kernel == 'linear': + decay_matrix = (1 - decay_iou) / (1 - compensate_iou) + decay_coefficient = paddle.min(decay_matrix, axis=0) + else: + raise NotImplementedError + + # update the score. + cate_scores = cate_scores * decay_coefficient + y = paddle.zeros(shape=paddle.shape(cate_scores), dtype='float32') + keep = paddle.where(cate_scores >= self.update_threshold, cate_scores, + y) + keep = paddle.nonzero(keep) + keep = paddle.squeeze(keep, axis=[1]) + # Prevent empty and increase fake data + keep = paddle.concat( + [keep, paddle.cast(paddle.shape(cate_scores)[0] - 1, 'int64')]) + + seg_preds = paddle.gather(seg_preds, index=keep) + cate_scores = paddle.gather(cate_scores, index=keep) + cate_labels = paddle.gather(cate_labels, index=keep) + + # sort and keep top_k + sort_inds = self._sort_score(cate_scores, self.post_nms_top_n) + seg_preds = paddle.gather(seg_preds, index=sort_inds) + cate_scores = paddle.gather(cate_scores, index=sort_inds) + cate_labels = paddle.gather(cate_labels, index=sort_inds) + return seg_preds, cate_scores, cate_labels + + +def Conv2d(in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + bias=True, + weight_init=Normal(std=0.001), + bias_init=Constant(0.)): + weight_attr = paddle.framework.ParamAttr(initializer=weight_init) + if bias: + bias_attr = paddle.framework.ParamAttr(initializer=bias_init) + else: + bias_attr = False + conv = nn.Conv2D( + in_channels, + out_channels, + kernel_size, + stride, + padding, + dilation, + groups, + weight_attr=weight_attr, + bias_attr=bias_attr) + return conv + + +def ConvTranspose2d(in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + output_padding=0, + groups=1, + bias=True, + dilation=1, + weight_init=Normal(std=0.001), + bias_init=Constant(0.)): + weight_attr = paddle.framework.ParamAttr(initializer=weight_init) + if bias: + bias_attr = paddle.framework.ParamAttr(initializer=bias_init) + else: + bias_attr = False + conv = nn.Conv2DTranspose( + in_channels, + out_channels, + kernel_size, + stride, + padding, + output_padding, + dilation, + groups, + weight_attr=weight_attr, + bias_attr=bias_attr) + return conv + + +def BatchNorm2d(num_features, eps=1e-05, momentum=0.9, affine=True): + if not affine: + weight_attr = False + bias_attr = False + else: + weight_attr = None + bias_attr = None + batchnorm = nn.BatchNorm2D( + num_features, + momentum, + eps, + weight_attr=weight_attr, + bias_attr=bias_attr) + return batchnorm + + +def ReLU(): + return nn.ReLU() + + +def Upsample(scale_factor=None, mode='nearest', align_corners=False): + return nn.Upsample(None, scale_factor, mode, align_corners) + + +def MaxPool(kernel_size, stride, padding, ceil_mode=False): + return 
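The core of `MaskMatrixNMS` just shown: each mask's score is multiplied by the most pessimistic gaussian decay over higher-scoring same-class masks, divided by a compensation term for how suppressed those masks already are. A tiny two-mask sketch (the IoU values are made up):

```python
import paddle

sigma = 2.0
decay_iou = paddle.to_tensor([[0.0, 0.8], [0.0, 0.0]])   # upper-triangular pairwise IoU
compensate_iou = paddle.zeros([2, 2])                    # nothing suppressed yet
decay = paddle.exp(-sigma * decay_iou**2) / paddle.exp(-sigma * compensate_iou**2)
coeff = paddle.min(decay, axis=0)   # [1.0, exp(-2 * 0.64)] ~ [1.0, 0.278]
```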
nn.MaxPool2D(kernel_size, stride, padding, ceil_mode=ceil_mode) + + +class Concat(nn.Layer): + def __init__(self, dim=0): + super(Concat, self).__init__() + self.dim = dim + + def forward(self, inputs): + return paddle.concat(inputs, axis=self.dim) + + def extra_repr(self): + return 'dim={}'.format(self.dim) + + +def _convert_attention_mask(attn_mask, dtype): + """ + Convert the attention mask to the target dtype we expect. + Parameters: + attn_mask (Tensor, optional): A tensor used in multi-head attention + to prevents attention to some unwanted positions, usually the + paddings or the subsequent positions. It is a tensor with shape + broadcasted to `[batch_size, n_head, sequence_length, sequence_length]`. + When the data type is bool, the unwanted positions have `False` + values and the others have `True` values. When the data type is + int, the unwanted positions have 0 values and the others have 1 + values. When the data type is float, the unwanted positions have + `-INF` values and the others have 0 values. It can be None when + nothing wanted or needed to be prevented attention to. Default None. + dtype (VarType): The target type of `attn_mask` we expect. + Returns: + Tensor: A Tensor with shape same as input `attn_mask`, with data type `dtype`. + """ + return nn.layer.transformer._convert_attention_mask(attn_mask, dtype) + +@register +class MultiHeadAttention(nn.Layer): + """ + Attention mapps queries and a set of key-value pairs to outputs, and + Multi-Head Attention performs multiple parallel attention to jointly attending + to information from different representation subspaces. + + Please refer to `Attention Is All You Need `_ + for more details. + + Parameters: + embed_dim (int): The expected feature size in the input and output. + num_heads (int): The number of heads in multi-head attention. + dropout (float, optional): The dropout probability used on attention + weights to drop some attention targets. 0 for no dropout. Default 0 + kdim (int, optional): The feature size in key. If None, assumed equal to + `embed_dim`. Default None. + vdim (int, optional): The feature size in value. If None, assumed equal to + `embed_dim`. Default None. + need_weights (bool, optional): Indicate whether to return the attention + weights. Default False. + + Examples: + + .. 
code-block:: python + + import paddle + + # encoder input: [batch_size, sequence_length, d_model] + query = paddle.rand((2, 4, 128)) + # self attention mask: [batch_size, num_heads, query_len, query_len] + attn_mask = paddle.rand((2, 2, 4, 4)) + multi_head_attn = paddle.nn.MultiHeadAttention(128, 2) + output = multi_head_attn(query, None, None, attn_mask=attn_mask) # [2, 4, 128] + """ + + def __init__(self, + embed_dim, + num_heads, + dropout=0., + kdim=None, + vdim=None, + need_weights=False): + super(MultiHeadAttention, self).__init__() + self.embed_dim = embed_dim + self.kdim = kdim if kdim is not None else embed_dim + self.vdim = vdim if vdim is not None else embed_dim + self._qkv_same_embed_dim = self.kdim == embed_dim and self.vdim == embed_dim + + self.num_heads = num_heads + self.dropout = dropout + self.need_weights = need_weights + + self.head_dim = embed_dim // num_heads + assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads" + + if self._qkv_same_embed_dim: + self.in_proj_weight = self.create_parameter( + shape=[embed_dim, 3 * embed_dim], + attr=None, + dtype=self._dtype, + is_bias=False) + self.in_proj_bias = self.create_parameter( + shape=[3 * embed_dim], + attr=None, + dtype=self._dtype, + is_bias=True) + else: + self.q_proj = nn.Linear(embed_dim, embed_dim) + self.k_proj = nn.Linear(self.kdim, embed_dim) + self.v_proj = nn.Linear(self.vdim, embed_dim) + + self.out_proj = nn.Linear(embed_dim, embed_dim) + self._type_list = ('q_proj', 'k_proj', 'v_proj') + + self._reset_parameters() + + def _reset_parameters(self): + for p in self.parameters(): + if p.dim() > 1: + xavier_uniform_(p) + else: + constant_(p) + + def compute_qkv(self, tensor, index): + if self._qkv_same_embed_dim: + tensor = F.linear( + x=tensor, + weight=self.in_proj_weight[:, index * self.embed_dim:(index + 1) + * self.embed_dim], + bias=self.in_proj_bias[index * self.embed_dim:(index + 1) * + self.embed_dim] + if self.in_proj_bias is not None else None) + else: + tensor = getattr(self, self._type_list[index])(tensor) + tensor = tensor.reshape( + [0, 0, self.num_heads, self.head_dim]).transpose([0, 2, 1, 3]) + return tensor + + def forward(self, query, key=None, value=None, attn_mask=None): + r""" + Applies multi-head attention to map queries and a set of key-value pairs + to outputs. + + Parameters: + query (Tensor): The queries for multi-head attention. It is a + tensor with shape `[batch_size, query_length, embed_dim]`. The + data type should be float32 or float64. + key (Tensor, optional): The keys for multi-head attention. It is + a tensor with shape `[batch_size, key_length, kdim]`. The + data type should be float32 or float64. If None, use `query` as + `key`. Default None. + value (Tensor, optional): The values for multi-head attention. It + is a tensor with shape `[batch_size, value_length, vdim]`. + The data type should be float32 or float64. If None, use `query` as + `value`. Default None. + attn_mask (Tensor, optional): A tensor used in multi-head attention + to prevents attention to some unwanted positions, usually the + paddings or the subsequent positions. It is a tensor with shape + broadcasted to `[batch_size, n_head, sequence_length, sequence_length]`. + When the data type is bool, the unwanted positions have `False` + values and the others have `True` values. When the data type is + int, the unwanted positions have 0 values and the others have 1 + values. 
+
+    def forward(self, query, key=None, value=None, attn_mask=None):
+        r"""
+        Applies multi-head attention to map queries and a set of key-value pairs
+        to outputs.
+
+        Parameters:
+            query (Tensor): The queries for multi-head attention. It is a
+                tensor with shape `[batch_size, query_length, embed_dim]`. The
+                data type should be float32 or float64.
+            key (Tensor, optional): The keys for multi-head attention. It is
+                a tensor with shape `[batch_size, key_length, kdim]`. The
+                data type should be float32 or float64. If None, use `query` as
+                `key`. Default None.
+            value (Tensor, optional): The values for multi-head attention. It
+                is a tensor with shape `[batch_size, value_length, vdim]`.
+                The data type should be float32 or float64. If None, use `query` as
+                `value`. Default None.
+            attn_mask (Tensor, optional): A tensor used in multi-head attention
+                to prevent attention to some unwanted positions, usually the
+                paddings or the subsequent positions. It is a tensor with shape
+                broadcasted to `[batch_size, n_head, sequence_length, sequence_length]`.
+                When the data type is bool, the unwanted positions have `False`
+                values and the others have `True` values. When the data type is
+                int, the unwanted positions have 0 values and the others have 1
+                values. When the data type is float, the unwanted positions have
+                `-INF` values and the others have 0 values. It can be None when
+                nothing needs to be prevented from being attended to. Default None.
+
+        Returns:
+            Tensor|tuple: It is a tensor that has the same shape and data type \
+                as `query`, representing attention output. Or a tuple if \
+                `need_weights` is True or `cache` is not None. If `need_weights` \
+                is True, except for attention output, the tuple also includes \
+                the attention weights tensor shaped `[batch_size, num_heads, query_length, key_length]`. \
+                If `cache` is not None, the tuple then includes the new cache \
+                having the same type as `cache`, and if it is `StaticCache`, it \
+                is same as the input `cache`, if it is `Cache`, the new cache \
+                reserves tensors concatenating raw tensors with intermediate \
+                results of the current query.
+        """
+        key = query if key is None else key
+        value = query if value is None else value
+        # compute q, k, v
+        q, k, v = (self.compute_qkv(t, i)
+                   for i, t in enumerate([query, key, value]))
+
+        # scaled dot-product attention
+        product = paddle.matmul(x=q, y=k, transpose_y=True)
+        scaling = float(self.head_dim)**-0.5
+        product = product * scaling
+
+        if attn_mask is not None:
+            # Support bool or int mask
+            attn_mask = _convert_attention_mask(attn_mask, product.dtype)
+            product = product + attn_mask
+        weights = F.softmax(product)
+        if self.dropout:
+            weights = F.dropout(
+                weights,
+                self.dropout,
+                training=self.training,
+                mode="upscale_in_train")
+
+        out = paddle.matmul(weights, v)
+
+        # combine heads
+        out = paddle.transpose(out, perm=[0, 2, 1, 3])
+        out = paddle.reshape(x=out, shape=[0, 0, out.shape[2] * out.shape[3]])
+
+        # project to output
+        out = self.out_proj(out)
+
+        outs = [out]
+        if self.need_weights:
+            outs.append(weights)
+        return out if len(outs) == 1 else tuple(outs)
+
+
+@register
+class ConvMixer(nn.Layer):
+    def __init__(
+            self,
+            dim,
+            depth,
+            kernel_size=3, ):
+        super().__init__()
+        self.dim = dim
+        self.depth = depth
+        self.kernel_size = kernel_size
+
+        self.mixer = self.conv_mixer(dim, depth, kernel_size)
+
+    def forward(self, x):
+        return self.mixer(x)
+
+    @staticmethod
+    def conv_mixer(
+            dim,
+            depth,
+            kernel_size, ):
+        Seq, ActBn = nn.Sequential, lambda x: Seq(x, nn.GELU(), nn.BatchNorm2D(dim))
+        Residual = type('Residual', (Seq, ),
+                        {'forward': lambda self, x: self[0](x) + x})
+        return Seq(*[
+            Seq(Residual(
+                ActBn(
+                    nn.Conv2D(
+                        dim, dim, kernel_size, groups=dim, padding="same"))),
+                ActBn(nn.Conv2D(dim, dim, 1))) for i in range(depth)
+        ])
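+
+# Usage sketch (illustrative): each of the `depth` blocks applies a residual
+# depthwise conv followed by a pointwise conv, both wrapped in GELU +
+# BatchNorm, and preserves the spatial shape, e.g.
+#   mixer = ConvMixer(dim=64, depth=4, kernel_size=3)
+#   y = mixer(paddle.rand([1, 64, 32, 32]))  # -> [1, 64, 32, 32]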
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..0e6b31de8a8fcacce2cfa62242f458565540d0b6
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/__init__.py
@@ -0,0 +1,54 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import yolo_loss
+from . import iou_aware_loss
+from . import iou_loss
+from . import ssd_loss
+from . import fcos_loss
+from . import solov2_loss
+from . import ctfocal_loss
+from . import keypoint_loss
+from . import jde_loss
+from . import fairmot_loss
+from . import gfocal_loss
+from . import detr_loss
+from . import sparsercnn_loss
+from . import focal_loss
+from . import smooth_l1_loss
+from . import probiou_loss
+from . import cot_loss
+from . import supcontrast
+from . import queryinst_loss
+
+from .yolo_loss import *
+from .iou_aware_loss import *
+from .iou_loss import *
+from .ssd_loss import *
+from .fcos_loss import *
+from .solov2_loss import *
+from .ctfocal_loss import *
+from .keypoint_loss import *
+from .jde_loss import *
+from .fairmot_loss import *
+from .gfocal_loss import *
+from .detr_loss import *
+from .sparsercnn_loss import *
+from .focal_loss import *
+from .smooth_l1_loss import *
+from .pose3d_loss import *
+from .probiou_loss import *
+from .cot_loss import *
+from .supcontrast import *
+from .queryinst_loss import *
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/cot_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/cot_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..40f8f9acf9f92e3051c6e90c44604be3460e964a
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/cot_loss.py
@@ -0,0 +1,61 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import numpy as np
+from ppdet.core.workspace import register
+
+__all__ = ['COTLoss']
+
+@register
+class COTLoss(nn.Layer):
+    __shared__ = ['num_classes']
+    def __init__(self,
+                 num_classes=80,
+                 cot_scale=1,
+                 cot_lambda=1):
+        super(COTLoss, self).__init__()
+        self.cot_scale = cot_scale
+        self.cot_lambda = cot_lambda
+        self.num_classes = num_classes
+
+    def forward(self, scores, targets, cot_relation):
+        cls_name = 'loss_bbox_cls_cot'
+        loss_bbox = {}
+
+        tgt_labels, tgt_bboxes, tgt_gt_inds = targets
+        tgt_labels = paddle.concat(tgt_labels) if len(
+            tgt_labels) > 1 else tgt_labels[0]
+        mask = (tgt_labels < self.num_classes)
+        valid_inds = paddle.nonzero(tgt_labels >= 0).flatten()
+        if valid_inds.shape[0] == 0:
+            loss_bbox[cls_name] = paddle.zeros([1], dtype='float32')
+        else:
+            tgt_labels = tgt_labels.cast('int64')
+            valid_cot_targets = []
+            for i in range(tgt_labels.shape[0]):
+                train_label = tgt_labels[i]
+                if train_label < self.num_classes:
+                    valid_cot_targets.append(cot_relation[train_label])
+            coco_targets = paddle.to_tensor(valid_cot_targets)
+            coco_targets.stop_gradient = True
+            coco_loss = - coco_targets * F.log_softmax(scores[mask][:, :-1] * self.cot_scale)
+            loss_bbox[cls_name] = self.cot_lambda * paddle.mean(paddle.sum(coco_loss, axis=-1))
+        return loss_bbox
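+
+# Shape sketch (illustrative): `scores` are RoI-head logits of shape
+# [num_rois, num_classes + 1]; for each foreground RoI of class c,
+# cot_relation[c] is a num_classes-dim soft label, and the loss is the soft
+# cross-entropy between those relations and log_softmax of the scaled logits
+# with the background column dropped, averaged and weighted by cot_lambda.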
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/ctfocal_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/ctfocal_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..dd00eb8540e273f4f25444a5217f44955c80da6e
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/ctfocal_loss.py
@@ -0,0 +1,68 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+
+from ppdet.core.workspace import register, serializable
+
+__all__ = ['CTFocalLoss']
+
+
+@register
+@serializable
+class CTFocalLoss(object):
+    """
+    CTFocalLoss: CornerNet & CenterNet Focal Loss
+    Args:
+        loss_weight (float): loss weight
+        gamma (float): gamma parameter for Focal Loss
+    """
+
+    def __init__(self, loss_weight=1., gamma=2.0):
+        self.loss_weight = loss_weight
+        self.gamma = gamma
+
+    def __call__(self, pred, target):
+        """
+        Calculate the loss
+        Args:
+            pred (Tensor): heatmap prediction
+            target (Tensor): target for positive samples
+        Return:
+            ct_focal_loss (Tensor): Focal Loss used in CornerNet & CenterNet.
+                Note that the values in target are in [0, 1] since gaussian is
+                used to reduce the punishment and we treat [0, 1) as neg example.
+        """
+        fg_map = paddle.cast(target == 1, 'float32')
+        fg_map.stop_gradient = True
+        bg_map = paddle.cast(target < 1, 'float32')
+        bg_map.stop_gradient = True
+
+        neg_weights = paddle.pow(1 - target, 4)
+        pos_loss = 0 - paddle.log(pred) * paddle.pow(1 - pred,
+                                                     self.gamma) * fg_map
+
+        neg_loss = 0 - paddle.log(1 - pred) * paddle.pow(
+            pred, self.gamma) * neg_weights * bg_map
+        pos_loss = paddle.sum(pos_loss)
+        neg_loss = paddle.sum(neg_loss)
+
+        fg_num = paddle.sum(fg_map)
+        ct_focal_loss = (pos_loss + neg_loss) / (
+            fg_num + paddle.cast(fg_num == 0, 'float32'))
+        return ct_focal_loss * self.loss_weight
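+
+# Numeric sketch (illustrative, gamma=2): a positive with pred=0.9 contributes
+# -log(0.9) * 0.1^2 ~= 1.05e-3; a soft negative with target=0.8 and pred=0.5
+# contributes -log(0.5) * 0.5^2 * (1-0.8)^4 ~= 2.8e-4. The sum is divided by
+# the positive count (the `fg_num == 0` cast keeps the denominator >= 1).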
+ """ + super(DETRLoss, self).__init__() + self.num_classes = num_classes + + self.matcher = matcher + self.loss_coeff = loss_coeff + self.aux_loss = aux_loss + self.use_focal_loss = use_focal_loss + + if not self.use_focal_loss: + self.loss_coeff['class'] = paddle.full([num_classes + 1], + loss_coeff['class']) + self.loss_coeff['class'][-1] = loss_coeff['no_object'] + self.giou_loss = GIoULoss() + + def _get_loss_class(self, + logits, + gt_class, + match_indices, + bg_index, + num_gts, + postfix=""): + # logits: [b, query, num_classes], gt_class: list[[n, 1]] + name_class = "loss_class" + postfix + if logits is None: + return {name_class: paddle.zeros([1])} + target_label = paddle.full(logits.shape[:2], bg_index, dtype='int64') + bs, num_query_objects = target_label.shape + if sum(len(a) for a in gt_class) > 0: + index, updates = self._get_index_updates(num_query_objects, + gt_class, match_indices) + target_label = paddle.scatter( + target_label.reshape([-1, 1]), index, updates.astype('int64')) + target_label = target_label.reshape([bs, num_query_objects]) + if self.use_focal_loss: + target_label = F.one_hot(target_label, + self.num_classes + 1)[..., :-1] + return { + name_class: self.loss_coeff['class'] * sigmoid_focal_loss( + logits, target_label, num_gts / num_query_objects) + if self.use_focal_loss else F.cross_entropy( + logits, target_label, weight=self.loss_coeff['class']) + } + + def _get_loss_bbox(self, boxes, gt_bbox, match_indices, num_gts, + postfix=""): + # boxes: [b, query, 4], gt_bbox: list[[n, 4]] + name_bbox = "loss_bbox" + postfix + name_giou = "loss_giou" + postfix + if boxes is None: + return {name_bbox: paddle.zeros([1]), name_giou: paddle.zeros([1])} + loss = dict() + if sum(len(a) for a in gt_bbox) == 0: + loss[name_bbox] = paddle.to_tensor([0.]) + loss[name_giou] = paddle.to_tensor([0.]) + return loss + + src_bbox, target_bbox = self._get_src_target_assign(boxes, gt_bbox, + match_indices) + loss[name_bbox] = self.loss_coeff['bbox'] * F.l1_loss( + src_bbox, target_bbox, reduction='sum') / num_gts + loss[name_giou] = self.giou_loss( + bbox_cxcywh_to_xyxy(src_bbox), bbox_cxcywh_to_xyxy(target_bbox)) + loss[name_giou] = loss[name_giou].sum() / num_gts + loss[name_giou] = self.loss_coeff['giou'] * loss[name_giou] + return loss + + def _get_loss_mask(self, masks, gt_mask, match_indices, num_gts, + postfix=""): + # masks: [b, query, h, w], gt_mask: list[[n, H, W]] + name_mask = "loss_mask" + postfix + name_dice = "loss_dice" + postfix + if masks is None: + return {name_mask: paddle.zeros([1]), name_dice: paddle.zeros([1])} + loss = dict() + if sum(len(a) for a in gt_mask) == 0: + loss[name_mask] = paddle.to_tensor([0.]) + loss[name_dice] = paddle.to_tensor([0.]) + return loss + + src_masks, target_masks = self._get_src_target_assign(masks, gt_mask, + match_indices) + src_masks = F.interpolate( + src_masks.unsqueeze(0), + size=target_masks.shape[-2:], + mode="bilinear")[0] + loss[name_mask] = self.loss_coeff['mask'] * F.sigmoid_focal_loss( + src_masks, + target_masks, + paddle.to_tensor( + [num_gts], dtype='float32')) + loss[name_dice] = self.loss_coeff['dice'] * self._dice_loss( + src_masks, target_masks, num_gts) + return loss + + def _dice_loss(self, inputs, targets, num_gts): + inputs = F.sigmoid(inputs) + inputs = inputs.flatten(1) + targets = targets.flatten(1) + numerator = 2 * (inputs * targets).sum(1) + denominator = inputs.sum(-1) + targets.sum(-1) + loss = 1 - (numerator + 1) / (denominator + 1) + return loss.sum() / num_gts + + def _get_loss_aux(self, + boxes, 
+
+    def _get_loss_aux(self,
+                      boxes,
+                      logits,
+                      gt_bbox,
+                      gt_class,
+                      bg_index,
+                      num_gts,
+                      dn_match_indices=None,
+                      postfix=""):
+        if boxes is None or logits is None:
+            return {
+                "loss_class_aux" + postfix: paddle.zeros([1]),
+                "loss_bbox_aux" + postfix: paddle.zeros([1]),
+                "loss_giou_aux" + postfix: paddle.zeros([1])
+            }
+        loss_class = []
+        loss_bbox = []
+        loss_giou = []
+        for aux_boxes, aux_logits in zip(boxes, logits):
+            if dn_match_indices is None:
+                match_indices = self.matcher(aux_boxes, aux_logits, gt_bbox,
+                                             gt_class)
+            else:
+                match_indices = dn_match_indices
+            loss_class.append(
+                self._get_loss_class(aux_logits, gt_class, match_indices,
+                                     bg_index, num_gts, postfix)['loss_class' +
+                                                                 postfix])
+            loss_ = self._get_loss_bbox(aux_boxes, gt_bbox, match_indices,
+                                        num_gts, postfix)
+            loss_bbox.append(loss_['loss_bbox' + postfix])
+            loss_giou.append(loss_['loss_giou' + postfix])
+        loss = {
+            "loss_class_aux" + postfix: paddle.add_n(loss_class),
+            "loss_bbox_aux" + postfix: paddle.add_n(loss_bbox),
+            "loss_giou_aux" + postfix: paddle.add_n(loss_giou)
+        }
+        return loss
+
+    def _get_index_updates(self, num_query_objects, target, match_indices):
+        batch_idx = paddle.concat([
+            paddle.full_like(src, i) for i, (src, _) in enumerate(match_indices)
+        ])
+        src_idx = paddle.concat([src for (src, _) in match_indices])
+        src_idx += (batch_idx * num_query_objects)
+        target_assign = paddle.concat([
+            paddle.gather(
+                t, dst, axis=0) for t, (_, dst) in zip(target, match_indices)
+        ])
+        return src_idx, target_assign
+
+    def _get_src_target_assign(self, src, target, match_indices):
+        src_assign = paddle.concat([
+            paddle.gather(
+                t, I, axis=0) if len(I) > 0 else paddle.zeros([0, t.shape[-1]])
+            for t, (I, _) in zip(src, match_indices)
+        ])
+        target_assign = paddle.concat([
+            paddle.gather(
+                t, J, axis=0) if len(J) > 0 else paddle.zeros([0, t.shape[-1]])
+            for t, (_, J) in zip(target, match_indices)
+        ])
+        return src_assign, target_assign
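+
+    # Note (illustrative): match_indices holds per-image (pred_idx, gt_idx)
+    # pairs from the Hungarian matcher; _get_src_target_assign gathers both
+    # sides into aligned [num_matched, d] tensors so the L1/GIoU terms above
+    # reduce to simple row-wise losses.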
+
+    def forward(self,
+                boxes,
+                logits,
+                gt_bbox,
+                gt_class,
+                masks=None,
+                gt_mask=None,
+                postfix="",
+                **kwargs):
+        r"""
+        Args:
+            boxes (Tensor|None): [l, b, query, 4]
+            logits (Tensor|None): [l, b, query, num_classes]
+            gt_bbox (List(Tensor)): list[[n, 4]]
+            gt_class (List(Tensor)): list[[n, 1]]
+            masks (Tensor, optional): [b, query, h, w]
+            gt_mask (List(Tensor), optional): list[[n, H, W]]
+            postfix (str): postfix of loss name
+        """
+        dn_match_indices = kwargs.get("dn_match_indices", None)
+        if dn_match_indices is None and (boxes is not None and
+                                         logits is not None):
+            match_indices = self.matcher(boxes[-1].detach(),
+                                         logits[-1].detach(), gt_bbox, gt_class)
+        else:
+            match_indices = dn_match_indices
+
+        num_gts = sum(len(a) for a in gt_bbox)
+        num_gts = paddle.to_tensor([num_gts], dtype="float32")
+        if paddle.distributed.get_world_size() > 1:
+            paddle.distributed.all_reduce(num_gts)
+            num_gts /= paddle.distributed.get_world_size()
+        num_gts = paddle.clip(num_gts, min=1.) * kwargs.get("dn_num_group", 1.)
+
+        total_loss = dict()
+        total_loss.update(
+            self._get_loss_class(logits[
+                -1] if logits is not None else None, gt_class, match_indices,
+                                 self.num_classes, num_gts, postfix))
+        total_loss.update(
+            self._get_loss_bbox(boxes[-1] if boxes is not None else None,
+                                gt_bbox, match_indices, num_gts, postfix))
+        if masks is not None and gt_mask is not None:
+            total_loss.update(
+                self._get_loss_mask(masks if masks is not None else None,
+                                    gt_mask, match_indices, num_gts, postfix))
+
+        if self.aux_loss:
+            total_loss.update(
+                self._get_loss_aux(
+                    boxes[:-1] if boxes is not None else None, logits[:-1]
+                    if logits is not None else None, gt_bbox, gt_class,
+                    self.num_classes, num_gts, dn_match_indices, postfix))
+
+        return total_loss
+
+
+@register
+class DINOLoss(DETRLoss):
+    def forward(self,
+                boxes,
+                logits,
+                gt_bbox,
+                gt_class,
+                masks=None,
+                gt_mask=None,
+                postfix="",
+                dn_out_bboxes=None,
+                dn_out_logits=None,
+                dn_meta=None,
+                **kwargs):
+        total_loss = super(DINOLoss, self).forward(boxes, logits, gt_bbox,
+                                                   gt_class)
+
+        if dn_meta is not None:
+            dn_positive_idx, dn_num_group = \
+                dn_meta["dn_positive_idx"], dn_meta["dn_num_group"]
+            assert len(gt_class) == len(dn_positive_idx)
+
+            # denoising match indices
+            dn_match_indices = []
+            for i in range(len(gt_class)):
+                num_gt = len(gt_class[i])
+                if num_gt > 0:
+                    gt_idx = paddle.arange(end=num_gt, dtype="int64")
+                    gt_idx = gt_idx.unsqueeze(0).tile(
+                        [dn_num_group, 1]).flatten()
+                    assert len(gt_idx) == len(dn_positive_idx[i])
+                    dn_match_indices.append((dn_positive_idx[i], gt_idx))
+                else:
+                    dn_match_indices.append((paddle.zeros(
+                        [0], dtype="int64"), paddle.zeros(
+                            [0], dtype="int64")))
+        else:
+            dn_match_indices, dn_num_group = None, 1.
+
+        # compute denoising training loss
+        dn_loss = super(DINOLoss, self).forward(
+            dn_out_bboxes,
+            dn_out_logits,
+            gt_bbox,
+            gt_class,
+            postfix="_dn",
+            dn_match_indices=dn_match_indices,
+            dn_num_group=dn_num_group)
+        total_loss.update(dn_loss)
+
+        return total_loss
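+
+# Denoising sketch (illustrative): with num_gt=2 and dn_num_group=3,
+# gt_idx = arange(2).tile([3, 1]).flatten() = [0, 1, 0, 1, 0, 1], paired
+# one-to-one with dn_positive_idx[i], so every GT supervises one positive
+# denoising query per group; num_gts is scaled by dn_num_group accordingly.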
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/fairmot_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/fairmot_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..e24ff33fe341cce9bd4865807922a34bc2a91841
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/fairmot_loss.py
@@ -0,0 +1,41 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+import paddle.nn as nn
+from paddle.nn.initializer import Constant
+from ppdet.core.workspace import register
+
+__all__ = ['FairMOTLoss']
+
+
+@register
+class FairMOTLoss(nn.Layer):
+    def __init__(self):
+        super(FairMOTLoss, self).__init__()
+        self.det_weight = self.create_parameter(
+            shape=[1], default_initializer=Constant(-1.85))
+        self.reid_weight = self.create_parameter(
+            shape=[1], default_initializer=Constant(-1.05))
+
+    def forward(self, det_loss, reid_loss):
+        loss = paddle.exp(-self.det_weight) * det_loss + paddle.exp(
+            -self.reid_weight) * reid_loss + (self.det_weight + self.reid_weight
+                                              )
+        loss *= 0.5
+        return {'loss': loss}
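+
+# Numeric sketch (illustrative): with the initial values det_weight=-1.85 and
+# reid_weight=-1.05, this uncertainty weighting gives
+#   loss ~= 0.5 * (e^1.85 * det_loss + e^1.05 * reid_loss - 2.90),
+# i.e. detection starts about 2.2x more strongly weighted (6.36 vs 2.86),
+# and both weights are learned parameters.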
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/fcos_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/fcos_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..6ff52bc2a592b448323ec638b979af2c932562ce
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/fcos_loss.py
@@ -0,0 +1,263 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from ppdet.core.workspace import register
+from ppdet.modeling import ops
+
+__all__ = ['FCOSLoss']
+
+
+def flatten_tensor(inputs, channel_first=False):
+    """
+    Flatten a Tensor
+    Args:
+        inputs (Tensor): 4-D Tensor with shape [N, C, H, W] or [N, H, W, C]
+        channel_first (bool): If true the dimension order of Tensor is
+            [N, C, H, W], otherwise is [N, H, W, C]
+    Return:
+        output_channel_last (Tensor): The flattened Tensor in channel_last style
+    """
+    if channel_first:
+        input_channel_last = paddle.transpose(inputs, perm=[0, 2, 3, 1])
+    else:
+        input_channel_last = inputs
+    output_channel_last = paddle.flatten(
+        input_channel_last, start_axis=0, stop_axis=2)
+    return output_channel_last
+
+
+@register
+class FCOSLoss(nn.Layer):
+    """
+    FCOSLoss
+    Args:
+        loss_alpha (float): alpha in focal loss
+        loss_gamma (float): gamma in focal loss
+        iou_loss_type (str): location loss type, IoU/GIoU/LINEAR_IoU
+        reg_weights (float): weight for location loss
+        quality (str): quality branch, centerness/iou
+    """
+
+    def __init__(self,
+                 loss_alpha=0.25,
+                 loss_gamma=2.0,
+                 iou_loss_type="giou",
+                 reg_weights=1.0,
+                 quality='centerness'):
+        super(FCOSLoss, self).__init__()
+        self.loss_alpha = loss_alpha
+        self.loss_gamma = loss_gamma
+        self.iou_loss_type = iou_loss_type
+        self.reg_weights = reg_weights
+        self.quality = quality
+
+    def __iou_loss(self,
+                   pred,
+                   targets,
+                   positive_mask,
+                   weights=None,
+                   return_iou=False):
+        """
+        Calculate the loss for location prediction
+        Args:
+            pred (Tensor): bounding boxes prediction
+            targets (Tensor): targets for positive samples
+            positive_mask (Tensor): mask of positive samples
+            weights (Tensor): weights for each positive sample
+        Return:
+            loss (Tensor): location loss
+        """
+        plw = pred[:, 0] * positive_mask
+        pth = pred[:, 1] * positive_mask
+        prw = pred[:, 2] * positive_mask
+        pbh = pred[:, 3] * positive_mask
+
+        tlw = targets[:, 0] * positive_mask
+        tth = targets[:, 1] * positive_mask
+        trw = targets[:, 2] * positive_mask
+        tbh = targets[:, 3] * positive_mask
+        tlw.stop_gradient = True
+        trw.stop_gradient = True
+        tth.stop_gradient = True
+        tbh.stop_gradient = True
+
+        ilw = paddle.minimum(plw, tlw)
+        irw = paddle.minimum(prw, trw)
+        ith = paddle.minimum(pth, tth)
+        ibh = paddle.minimum(pbh, tbh)
+
+        clw = paddle.maximum(plw, tlw)
+        crw = paddle.maximum(prw, trw)
+        cth = paddle.maximum(pth, tth)
+        cbh = paddle.maximum(pbh, tbh)
+
+        area_predict = (plw + prw) * (pth + pbh)
+        area_target = (tlw + trw) * (tth + tbh)
+        area_inter = (ilw + irw) * (ith + ibh)
+        ious = (area_inter + 1.0) / (
+            area_predict + area_target - area_inter + 1.0)
+        ious = ious * positive_mask
+
+        if return_iou:
+            return ious
+
+        if self.iou_loss_type.lower() == "linear_iou":
+            loss = 1.0 - ious
+        elif self.iou_loss_type.lower() == "giou":
+            area_uniou = area_predict + area_target - area_inter
+            area_circum = (clw + crw) * (cth + cbh) + 1e-7
+            giou = ious - (area_circum - area_uniou) / area_circum
+            loss = 1.0 - giou
+        elif self.iou_loss_type.lower() == "iou":
+            loss = 0.0 - paddle.log(ious)
+        else:
+            raise KeyError
+        if weights is not None:
+            loss = loss * weights
+        return loss
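+
+    # Note (illustrative): pred/targets are (left, top, right, bottom)
+    # distances from each anchor point, so areas are (l+r)*(t+b); the +1.0
+    # smoothing keeps the IoU ratio finite at masked-out (all-zero) positions.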
+
+    def forward(self, cls_logits, bboxes_reg, centerness, tag_labels,
+                tag_bboxes, tag_center):
+        """
+        Calculate the loss for classification, location and centerness
+        Args:
+            cls_logits (list): list of Tensor, which is predicted
+                score for all anchor points with shape [N, M, C]
+            bboxes_reg (list): list of Tensor, which is predicted
+                offsets for all anchor points with shape [N, M, 4]
+            centerness (list): list of Tensor, which is predicted
+                centerness for all anchor points with shape [N, M, 1]
+            tag_labels (list): list of Tensor, which is category
+                targets for each anchor point
+            tag_bboxes (list): list of Tensor, which is bounding
+                boxes targets for positive samples
+            tag_center (list): list of Tensor, which is centerness
+                targets for positive samples
+        Return:
+            loss (dict): loss composed by classification loss, bounding box
+                loss and quality loss
+        """
+        cls_logits_flatten_list = []
+        bboxes_reg_flatten_list = []
+        centerness_flatten_list = []
+        tag_labels_flatten_list = []
+        tag_bboxes_flatten_list = []
+        tag_center_flatten_list = []
+        num_lvl = len(cls_logits)
+        for lvl in range(num_lvl):
+            cls_logits_flatten_list.append(
+                flatten_tensor(cls_logits[lvl], True))
+            bboxes_reg_flatten_list.append(
+                flatten_tensor(bboxes_reg[lvl], True))
+            centerness_flatten_list.append(
+                flatten_tensor(centerness[lvl], True))
+
+            tag_labels_flatten_list.append(
+                flatten_tensor(tag_labels[lvl], False))
+            tag_bboxes_flatten_list.append(
+                flatten_tensor(tag_bboxes[lvl], False))
+            tag_center_flatten_list.append(
+                flatten_tensor(tag_center[lvl], False))
+
+        cls_logits_flatten = paddle.concat(cls_logits_flatten_list, axis=0)
+        bboxes_reg_flatten = paddle.concat(bboxes_reg_flatten_list, axis=0)
+        centerness_flatten = paddle.concat(centerness_flatten_list, axis=0)
+
+        tag_labels_flatten = paddle.concat(tag_labels_flatten_list, axis=0)
+        tag_bboxes_flatten = paddle.concat(tag_bboxes_flatten_list, axis=0)
+        tag_center_flatten = paddle.concat(tag_center_flatten_list, axis=0)
+        tag_labels_flatten.stop_gradient = True
+        tag_bboxes_flatten.stop_gradient = True
+        tag_center_flatten.stop_gradient = True
+
+        mask_positive_bool = tag_labels_flatten > 0
+        mask_positive_bool.stop_gradient = True
+        mask_positive_float = paddle.cast(mask_positive_bool, dtype="float32")
+        mask_positive_float.stop_gradient = True
+
+        num_positive_fp32 = paddle.sum(mask_positive_float)
+        num_positive_fp32.stop_gradient = True
+        num_positive_int32 = paddle.cast(num_positive_fp32, dtype="int32")
+        num_positive_int32 = num_positive_int32 * 0 + 1
+        num_positive_int32.stop_gradient = True
+
+        normalize_sum = paddle.sum(tag_center_flatten * mask_positive_float)
+        normalize_sum.stop_gradient = True
+
+        # 1. cls_logits: sigmoid_focal_loss
+        # expand onehot labels
+        num_classes = cls_logits_flatten.shape[-1]
+        tag_labels_flatten = paddle.squeeze(tag_labels_flatten, axis=-1)
+        tag_labels_flatten_bin = F.one_hot(
+            tag_labels_flatten, num_classes=1 + num_classes)
+        tag_labels_flatten_bin = tag_labels_flatten_bin[:, 1:]
+        # sigmoid_focal_loss
+        cls_loss = F.sigmoid_focal_loss(
+            cls_logits_flatten, tag_labels_flatten_bin) / num_positive_fp32
+
+        if self.quality == 'centerness':
+            # 2. bboxes_reg: giou_loss
+            mask_positive_float = paddle.squeeze(mask_positive_float, axis=-1)
+            tag_center_flatten = paddle.squeeze(tag_center_flatten, axis=-1)
+            reg_loss = self.__iou_loss(
+                bboxes_reg_flatten,
+                tag_bboxes_flatten,
+                mask_positive_float,
+                weights=tag_center_flatten)
+            reg_loss = reg_loss * mask_positive_float / normalize_sum
+
+            # 3. centerness: sigmoid_cross_entropy_with_logits_loss
+            centerness_flatten = paddle.squeeze(centerness_flatten, axis=-1)
+            quality_loss = ops.sigmoid_cross_entropy_with_logits(
+                centerness_flatten, tag_center_flatten)
+            quality_loss = quality_loss * mask_positive_float / num_positive_fp32
+
+        elif self.quality == 'iou':
+            # 2. bboxes_reg: giou_loss
+            mask_positive_float = paddle.squeeze(mask_positive_float, axis=-1)
+            tag_center_flatten = paddle.squeeze(tag_center_flatten, axis=-1)
+            reg_loss = self.__iou_loss(
+                bboxes_reg_flatten,
+                tag_bboxes_flatten,
+                mask_positive_float,
+                weights=None)
+            reg_loss = reg_loss * mask_positive_float / num_positive_fp32
+            # num_positive_fp32 is num_foreground
+
+            # 3. centerness: sigmoid_cross_entropy_with_logits_loss
+            centerness_flatten = paddle.squeeze(centerness_flatten, axis=-1)
+            gt_ious = self.__iou_loss(
+                bboxes_reg_flatten,
+                tag_bboxes_flatten,
+                mask_positive_float,
+                weights=None,
+                return_iou=True)
+            quality_loss = ops.sigmoid_cross_entropy_with_logits(
+                centerness_flatten, gt_ious)
+            quality_loss = quality_loss * mask_positive_float / num_positive_fp32
+        else:
+            raise Exception(f'Unknown quality type: {self.quality}')
+
+        loss_all = {
+            "loss_cls": paddle.sum(cls_loss),
+            "loss_box": paddle.sum(reg_loss),
+            "loss_quality": paddle.sum(quality_loss),
+        }
+        return loss_all
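+
+# Branch sketch (illustrative): quality='centerness' supervises the quality
+# branch with the centerness target and reweights the IoU loss by it;
+# quality='iou' supervises the branch with the measured IoU of each positive
+# box (return_iou=True) and applies the IoU loss with uniform weights.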
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/focal_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/focal_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..b9a64e1bc22d7e69256b311639ceb450c1381798
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/focal_loss.py
@@ -0,0 +1,138 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+import paddle.nn.functional as F
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+__all__ = ['FocalLoss', 'Weighted_FocalLoss']
+
+@register
+class FocalLoss(nn.Layer):
+    """A wrapper around paddle.nn.functional.sigmoid_focal_loss.
+    Args:
+        use_sigmoid (bool): currently only support use_sigmoid=True
+        alpha (float): parameter alpha in Focal Loss
+        gamma (float): parameter gamma in Focal Loss
+        loss_weight (float): final loss will be multiplied by this
+    """
+    def __init__(self,
+                 use_sigmoid=True,
+                 alpha=0.25,
+                 gamma=2.0,
+                 loss_weight=1.0):
+        super(FocalLoss, self).__init__()
+        assert use_sigmoid == True, \
+            'Focal Loss only supports sigmoid at the moment'
+        self.use_sigmoid = use_sigmoid
+        self.alpha = alpha
+        self.gamma = gamma
+        self.loss_weight = loss_weight
+
+    def forward(self, pred, target, reduction='none'):
+        """forward function.
+        Args:
+            pred (Tensor): logits of class prediction, of shape (N, num_classes)
+            target (Tensor): target class label, of shape (N, )
+            reduction (str): the way to reduce loss, one of (none, sum, mean)
+        """
+        num_classes = pred.shape[1]
+        target = F.one_hot(target, num_classes + 1).cast(pred.dtype)
+        target = target[:, :-1].detach()
+        loss = F.sigmoid_focal_loss(
+            pred, target, alpha=self.alpha, gamma=self.gamma,
+            reduction=reduction)
+        return loss * self.loss_weight
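+
+# Shape sketch (illustrative): labels in [0, num_classes] are one-hot encoded
+# with num_classes+1 columns and the last (background) column dropped, so a
+# background label becomes an all-zero row, which is what sigmoid focal loss
+# expects for pure negatives.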
+
+
+@register
+class Weighted_FocalLoss(FocalLoss):
+    """A wrapper around paddle.nn.functional.sigmoid_focal_loss.
+    Args:
+        use_sigmoid (bool): currently only support use_sigmoid=True
+        alpha (float): parameter alpha in Focal Loss
+        gamma (float): parameter gamma in Focal Loss
+        loss_weight (float): final loss will be multiplied by this
+    """
+    def __init__(self,
+                 use_sigmoid=True,
+                 alpha=0.25,
+                 gamma=2.0,
+                 loss_weight=1.0,
+                 reduction="mean"):
+        super(FocalLoss, self).__init__()
+        assert use_sigmoid == True, \
+            'Focal Loss only supports sigmoid at the moment'
+        self.use_sigmoid = use_sigmoid
+        self.alpha = alpha
+        self.gamma = gamma
+        self.loss_weight = loss_weight
+        self.reduction = reduction
+
+    def forward(self, pred, target, weight=None, avg_factor=None, reduction_override=None):
+        """forward function.
+        Args:
+            pred (Tensor): logits of class prediction, of shape (N, num_classes)
+            target (Tensor): target class label, of shape (N, )
+            reduction (str): the way to reduce loss, one of (none, sum, mean)
+        """
+        assert reduction_override in (None, 'none', 'mean', 'sum')
+        reduction = (
+            reduction_override if reduction_override else self.reduction)
+        num_classes = pred.shape[1]
+        target = F.one_hot(target, num_classes + 1).astype(pred.dtype)
+        target = target[:, :-1].detach()
+        loss = F.sigmoid_focal_loss(
+            pred, target, alpha=self.alpha, gamma=self.gamma,
+            reduction='none')
+
+        if weight is not None:
+            if weight.shape != loss.shape:
+                if weight.shape[0] == loss.shape[0]:
+                    # For most cases, weight is of shape (num_priors, ),
+                    # which means it does not have the second axis num_class
+                    weight = weight.reshape((-1, 1))
+                else:
+                    # Sometimes, weight per anchor per class is also needed. e.g.
+                    # in FSAF. But it may be flattened of shape
+                    # (num_priors x num_class, ), while loss is still of shape
+                    # (num_priors, num_class).
+                    assert weight.numel() == loss.numel()
+                    weight = weight.reshape((loss.shape[0], -1))
+            assert weight.ndim == loss.ndim
+            loss = loss * weight
+
+        # if avg_factor is not specified, just reduce the loss
+        if avg_factor is None:
+            if reduction == 'mean':
+                loss = loss.mean()
+            elif reduction == 'sum':
+                loss = loss.sum()
+        else:
+            # if reduction is mean, then average the loss by avg_factor
+            if reduction == 'mean':
+                # Avoid causing ZeroDivisionError when avg_factor is 0.0,
+                # i.e., all labels of an image belong to ignore index.
+                eps = 1e-10
+                loss = loss.sum() / (avg_factor + eps)
+            # if reduction is 'none', then do nothing, otherwise raise an error
+            elif reduction != 'none':
+                raise ValueError('avg_factor can not be used with reduction="sum"')
+
+        return loss * self.loss_weight
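+
+# Note (illustrative): a per-anchor weight of shape (N,) is reshaped to (N, 1)
+# so it broadcasts over the class axis, while a flattened per-anchor-per-class
+# weight of shape (N*C,) is reshaped back to (N, C) before scaling the loss.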
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/gfocal_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/gfocal_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..37e27f084e624491cc8226420548ea498f86d863
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/gfocal_loss.py
@@ -0,0 +1,217 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# The code is based on:
+# https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/losses/gfocal_loss.py
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from ppdet.core.workspace import register, serializable
+from ppdet.modeling import ops
+
+__all__ = ['QualityFocalLoss', 'DistributionFocalLoss']
+
+
+def quality_focal_loss(pred, target, beta=2.0, use_sigmoid=True):
+    """
+    Quality Focal Loss (QFL) is from `Generalized Focal Loss: Learning
+    Qualified and Distributed Bounding Boxes for Dense Object Detection
+    <https://arxiv.org/abs/2006.04388>`_.
+    Args:
+        pred (Tensor): Predicted joint representation of classification
+            and quality (IoU) estimation with shape (N, C), C is the number of
+            classes.
+        target (tuple([Tensor])): Target category label with shape (N,)
+            and target quality label with shape (N,).
+        beta (float): The beta parameter for calculating the modulating factor.
+            Defaults to 2.0.
+    Returns:
+        Tensor: Loss tensor with shape (N,).
+    """
+    assert len(target) == 2, """target for QFL must be a tuple of two elements,
+        including category label and quality label, respectively"""
+    # label denotes the category id, score denotes the quality score
+    label, score = target
+    if use_sigmoid:
+        func = F.binary_cross_entropy_with_logits
+    else:
+        func = F.binary_cross_entropy
+
+    # negatives are supervised by 0 quality score
+    pred_sigmoid = F.sigmoid(pred) if use_sigmoid else pred
+    scale_factor = pred_sigmoid
+    zerolabel = paddle.zeros(pred.shape, dtype='float32')
+    loss = func(pred, zerolabel, reduction='none') * scale_factor.pow(beta)
+
+    # FG cat_id: [0, num_classes -1], BG cat_id: num_classes
+    bg_class_ind = pred.shape[1]
+    pos = paddle.logical_and((label >= 0),
+                             (label < bg_class_ind)).nonzero().squeeze(1)
+    if pos.shape[0] == 0:
+        return loss.sum(axis=1)
+    pos_label = paddle.gather(label, pos, axis=0)
+    pos_mask = np.zeros(pred.shape, dtype=np.int32)
+    pos_mask[pos.numpy(), pos_label.numpy()] = 1
+    pos_mask = paddle.to_tensor(pos_mask, dtype='bool')
+    score = score.unsqueeze(-1).expand([-1, pred.shape[1]]).cast('float32')
+    # positives are supervised by bbox quality (IoU) score
+    scale_factor_new = score - pred_sigmoid
+
+    loss_pos = func(
+        pred, score, reduction='none') * scale_factor_new.abs().pow(beta)
+    loss = loss * paddle.logical_not(pos_mask) + loss_pos * pos_mask
+    loss = loss.sum(axis=1)
+    return loss
+
+
+def distribution_focal_loss(pred, label):
+    """Distribution Focal Loss (DFL) is from `Generalized Focal Loss: Learning
+    Qualified and Distributed Bounding Boxes for Dense Object Detection
+    <https://arxiv.org/abs/2006.04388>`_.
+    Args:
+        pred (Tensor): Predicted general distribution of bounding boxes
+            (before softmax) with shape (N, n+1), n is the max value of the
+            integral set `{0, ..., n}` in paper.
+        label (Tensor): Target distance label for bounding boxes with
+            shape (N,).
+    Returns:
+        Tensor: Loss tensor with shape (N,).
+    """
+    dis_left = label.cast('int64')
+    dis_right = dis_left + 1
+    weight_left = dis_right.cast('float32') - label
+    weight_right = label - dis_left.cast('float32')
+    loss = F.cross_entropy(pred, dis_left, reduction='none') * weight_left \
+        + F.cross_entropy(pred, dis_right, reduction='none') * weight_right
+    return loss
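+
+# Numeric sketch (illustrative): for a continuous DFL target y=2.4, the
+# cross-entropy at bin 2 is weighted by 0.6 and at bin 3 by 0.4, so the
+# expectation over the two neighbouring bins recovers y exactly.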
+ """ + + def __init__(self, + use_sigmoid=True, + beta=2.0, + reduction='mean', + loss_weight=1.0): + super(QualityFocalLoss, self).__init__() + self.use_sigmoid = use_sigmoid + self.beta = beta + assert reduction in ('none', 'mean', 'sum') + self.reduction = reduction + self.loss_weight = loss_weight + + def forward(self, pred, target, weight=None, avg_factor=None): + """Forward function. + Args: + pred (Tensor): Predicted joint representation of + classification and quality (IoU) estimation with shape (N, C), + C is the number of classes. + target (tuple([Tensor])): Target category label with shape + (N,) and target quality label with shape (N,). + weight (Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + """ + + loss = self.loss_weight * quality_focal_loss( + pred, target, beta=self.beta, use_sigmoid=self.use_sigmoid) + + if weight is not None: + loss = loss * weight + if avg_factor is None: + if self.reduction == 'none': + return loss + elif self.reduction == 'mean': + return loss.mean() + elif self.reduction == 'sum': + return loss.sum() + else: + # if reduction is mean, then average the loss by avg_factor + if self.reduction == 'mean': + loss = loss.sum() / avg_factor + # if reduction is 'none', then do nothing, otherwise raise an error + elif self.reduction != 'none': + raise ValueError( + 'avg_factor can not be used with reduction="sum"') + return loss + + +@register +@serializable +class DistributionFocalLoss(nn.Layer): + """Distribution Focal Loss (DFL) is a variant of `Generalized Focal Loss: + Learning Qualified and Distributed Bounding Boxes for Dense Object + Detection `_. + Args: + reduction (str): Options are `'none'`, `'mean'` and `'sum'`. + loss_weight (float): Loss weight of current loss. + """ + + def __init__(self, reduction='mean', loss_weight=1.0): + super(DistributionFocalLoss, self).__init__() + assert reduction in ('none', 'mean', 'sum') + self.reduction = reduction + self.loss_weight = loss_weight + + def forward(self, pred, target, weight=None, avg_factor=None): + """Forward function. + Args: + pred (Tensor): Predicted general distribution of bounding + boxes (before softmax) with shape (N, n+1), n is the max value + of the integral set `{0, ..., n}` in paper. + target (Tensor): Target distance label for bounding boxes + with shape (N,). + weight (Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. 
+ """ + loss = self.loss_weight * distribution_focal_loss(pred, target) + if weight is not None: + loss = loss * weight + if avg_factor is None: + if self.reduction == 'none': + return loss + elif self.reduction == 'mean': + return loss.mean() + elif self.reduction == 'sum': + return loss.sum() + else: + # if reduction is mean, then average the loss by avg_factor + if self.reduction == 'mean': + loss = loss.sum() / avg_factor + # if reduction is 'none', then do nothing, otherwise raise an error + elif self.reduction != 'none': + raise ValueError( + 'avg_factor can not be used with reduction="sum"') + return loss diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_aware_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_aware_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..4a9e904dd8266c606f61e35bb52121865476997e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_aware_loss.py @@ -0,0 +1,47 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from .iou_loss import IouLoss +from ..bbox_utils import bbox_iou + + +@register +@serializable +class IouAwareLoss(IouLoss): + """ + iou aware loss, see https://arxiv.org/abs/1912.05992 + Args: + loss_weight (float): iou aware loss weight, default is 1.0 + max_height (int): max height of input to support random shape input + max_width (int): max width of input to support random shape input + """ + + def __init__(self, loss_weight=1.0, giou=False, diou=False, ciou=False): + super(IouAwareLoss, self).__init__( + loss_weight=loss_weight, giou=giou, diou=diou, ciou=ciou) + + def __call__(self, ioup, pbox, gbox): + iou = bbox_iou( + pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou) + iou.stop_gradient = True + loss_iou_aware = F.binary_cross_entropy_with_logits( + ioup, iou, reduction='none') + loss_iou_aware = loss_iou_aware * self.loss_weight + return loss_iou_aware diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..b5cac22e342e633b5c413805623ba4015073b3b1 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_loss.py @@ -0,0 +1,295 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..b5cac22e342e633b5c413805623ba4015073b3b1
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/iou_loss.py
@@ -0,0 +1,295 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+import math
+import paddle
+
+from ppdet.core.workspace import register, serializable
+from ..bbox_utils import bbox_iou
+
+__all__ = ['IouLoss', 'GIoULoss', 'DIouLoss', 'SIoULoss']
+
+
+@register
+@serializable
+class IouLoss(object):
+    """
+    iou loss, see https://arxiv.org/abs/1908.03851
+    loss = 1.0 - iou * iou
+    Args:
+        loss_weight (float): iou loss weight, default is 2.5
+        max_height (int): max height of input to support random shape input
+        max_width (int): max width of input to support random shape input
+        ciou_term (bool): whether to add ciou_term
+        loss_square (bool): whether to square the iou term
+    """
+
+    def __init__(self,
+                 loss_weight=2.5,
+                 giou=False,
+                 diou=False,
+                 ciou=False,
+                 loss_square=True):
+        self.loss_weight = loss_weight
+        self.giou = giou
+        self.diou = diou
+        self.ciou = ciou
+        self.loss_square = loss_square
+
+    def __call__(self, pbox, gbox):
+        iou = bbox_iou(
+            pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou)
+        if self.loss_square:
+            loss_iou = 1 - iou * iou
+        else:
+            loss_iou = 1 - iou
+
+        loss_iou = loss_iou * self.loss_weight
+        return loss_iou
+
+
+@register
+@serializable
+class GIoULoss(object):
+    """
+    Generalized Intersection over Union, see https://arxiv.org/abs/1902.09630
+    Args:
+        loss_weight (float): giou loss weight, default as 1
+        eps (float): epsilon to avoid divide by zero, default as 1e-10
+        reduction (string): Options are "none", "mean" and "sum". default as none
+    """
+
+    def __init__(self, loss_weight=1., eps=1e-10, reduction='none'):
+        self.loss_weight = loss_weight
+        self.eps = eps
+        assert reduction in ('none', 'mean', 'sum')
+        self.reduction = reduction
+
+    def bbox_overlap(self, box1, box2, eps=1e-10):
+        """calculate the iou of box1 and box2
+        Args:
+            box1 (Tensor): box1 with the shape (..., 4)
+            box2 (Tensor): box2 with the shape (..., 4)
+            eps (float): epsilon to avoid divide by zero
+        Return:
+            iou (Tensor): iou of box1 and box2
+            overlap (Tensor): overlap of box1 and box2
+            union (Tensor): union of box1 and box2
+        """
+        x1, y1, x2, y2 = box1
+        x1g, y1g, x2g, y2g = box2
+
+        xkis1 = paddle.maximum(x1, x1g)
+        ykis1 = paddle.maximum(y1, y1g)
+        xkis2 = paddle.minimum(x2, x2g)
+        ykis2 = paddle.minimum(y2, y2g)
+        w_inter = (xkis2 - xkis1).clip(0)
+        h_inter = (ykis2 - ykis1).clip(0)
+        overlap = w_inter * h_inter
+
+        area1 = (x2 - x1) * (y2 - y1)
+        area2 = (x2g - x1g) * (y2g - y1g)
+        union = area1 + area2 - overlap + eps
+        iou = overlap / union
+
+        return iou, overlap, union
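+
+    # Numeric sketch (illustrative): box1=(0,0,2,2), box2=(1,1,3,3) gives
+    # overlap=1 and union=4+4-1=7, so iou ~= 0.143; with enclosing area
+    # area_c=9, __call__ below computes miou = iou - (9-7)/9 ~= -0.079.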
+    """
+
+    def __init__(self, loss_weight=1., eps=1e-10, reduction='none'):
+        self.loss_weight = loss_weight
+        self.eps = eps
+        assert reduction in ('none', 'mean', 'sum')
+        self.reduction = reduction
+
+    def bbox_overlap(self, box1, box2, eps=1e-10):
+        """calculate the iou of box1 and box2
+        Args:
+            box1 (Tensor): box1 with the shape (..., 4)
+            box2 (Tensor): box2 with the shape (..., 4)
+            eps (float): epsilon to avoid divide by zero
+        Return:
+            iou (Tensor): iou of box1 and box2
+            overlap (Tensor): overlap of box1 and box2
+            union (Tensor): union of box1 and box2
+        """
+        x1, y1, x2, y2 = box1
+        x1g, y1g, x2g, y2g = box2
+
+        xkis1 = paddle.maximum(x1, x1g)
+        ykis1 = paddle.maximum(y1, y1g)
+        xkis2 = paddle.minimum(x2, x2g)
+        ykis2 = paddle.minimum(y2, y2g)
+        w_inter = (xkis2 - xkis1).clip(0)
+        h_inter = (ykis2 - ykis1).clip(0)
+        overlap = w_inter * h_inter
+
+        area1 = (x2 - x1) * (y2 - y1)
+        area2 = (x2g - x1g) * (y2g - y1g)
+        union = area1 + area2 - overlap + eps
+        iou = overlap / union
+
+        return iou, overlap, union
+
+    def __call__(self, pbox, gbox, iou_weight=1., loc_reweight=None):
+        x1, y1, x2, y2 = paddle.split(pbox, num_or_sections=4, axis=-1)
+        x1g, y1g, x2g, y2g = paddle.split(gbox, num_or_sections=4, axis=-1)
+        box1 = [x1, y1, x2, y2]
+        box2 = [x1g, y1g, x2g, y2g]
+        iou, overlap, union = self.bbox_overlap(box1, box2, self.eps)
+        xc1 = paddle.minimum(x1, x1g)
+        yc1 = paddle.minimum(y1, y1g)
+        xc2 = paddle.maximum(x2, x2g)
+        yc2 = paddle.maximum(y2, y2g)
+
+        area_c = (xc2 - xc1) * (yc2 - yc1) + self.eps
+        miou = iou - ((area_c - union) / area_c)
+        if loc_reweight is not None:
+            loc_reweight = paddle.reshape(loc_reweight, shape=(-1, 1))
+            loc_thresh = 0.9
+            giou = 1 - (1 - loc_thresh
+                        ) * miou - loc_thresh * miou * loc_reweight
+        else:
+            giou = 1 - miou
+        if self.reduction == 'none':
+            loss = giou
+        elif self.reduction == 'sum':
+            loss = paddle.sum(giou * iou_weight)
+        else:
+            loss = paddle.mean(giou * iou_weight)
+        return loss * self.loss_weight
+
+
+@register
+@serializable
+class DIouLoss(GIoULoss):
+    """
+    Distance-IoU Loss, see https://arxiv.org/abs/1911.08287
+    Args:
+        loss_weight (float): diou loss weight, default as 1
+        eps (float): epsilon to avoid divide by zero, default as 1e-10
+        use_complete_iou_loss (bool): whether to use complete iou loss
+    """
+
+    def __init__(self, loss_weight=1., eps=1e-10, use_complete_iou_loss=True):
+        super(DIouLoss, self).__init__(loss_weight=loss_weight, eps=eps)
+        self.use_complete_iou_loss = use_complete_iou_loss
+
+    def __call__(self, pbox, gbox, iou_weight=1.):
+        x1, y1, x2, y2 = paddle.split(pbox, num_or_sections=4, axis=-1)
+        x1g, y1g, x2g, y2g = paddle.split(gbox, num_or_sections=4, axis=-1)
+        cx = (x1 + x2) / 2
+        cy = (y1 + y2) / 2
+        w = x2 - x1
+        h = y2 - y1
+
+        cxg = (x1g + x2g) / 2
+        cyg = (y1g + y2g) / 2
+        wg = x2g - x1g
+        hg = y2g - y1g
+
+        x2 = paddle.maximum(x1, x2)
+        y2 = paddle.maximum(y1, y2)
+
+        # A and B
+        xkis1 = paddle.maximum(x1, x1g)
+        ykis1 = paddle.maximum(y1, y1g)
+        xkis2 = paddle.minimum(x2, x2g)
+        ykis2 = paddle.minimum(y2, y2g)
+
+        # A or B
+        xc1 = paddle.minimum(x1, x1g)
+        yc1 = paddle.minimum(y1, y1g)
+        xc2 = paddle.maximum(x2, x2g)
+        yc2 = paddle.maximum(y2, y2g)
+
+        intsctk = (xkis2 - xkis1) * (ykis2 - ykis1)
+        intsctk = intsctk * paddle.greater_than(
+            xkis2, xkis1) * paddle.greater_than(ykis2, ykis1)
+        unionk = (x2 - x1) * (y2 - y1) + (x2g - x1g) * (y2g - y1g
+                                                        ) - intsctk + self.eps
+        iouk = intsctk / unionk
+
+        # DIOU term
+        dist_intersection = (cx - cxg) * (cx - cxg) + (cy -
cyg) + dist_union = (xc2 - xc1) * (xc2 - xc1) + (yc2 - yc1) * (yc2 - yc1) + diou_term = (dist_intersection + self.eps) / (dist_union + self.eps) + + # CIOU term + ciou_term = 0 + if self.use_complete_iou_loss: + ar_gt = wg / hg + ar_pred = w / h + arctan = paddle.atan(ar_gt) - paddle.atan(ar_pred) + ar_loss = 4. / np.pi / np.pi * arctan * arctan + alpha = ar_loss / (1 - iouk + ar_loss + self.eps) + alpha.stop_gradient = True + ciou_term = alpha * ar_loss + + diou = paddle.mean((1 - iouk + ciou_term + diou_term) * iou_weight) + + return diou * self.loss_weight + + +@register +@serializable +class SIoULoss(GIoULoss): + """ + see https://arxiv.org/pdf/2205.12740.pdf + Args: + loss_weight (float): siou loss weight, default as 1 + eps (float): epsilon to avoid divide by zero, default as 1e-10 + theta (float): default as 4 + reduction (str): Options are "none", "mean" and "sum". default as none + """ + + def __init__(self, loss_weight=1., eps=1e-10, theta=4., reduction='none'): + super(SIoULoss, self).__init__(loss_weight=loss_weight, eps=eps) + self.loss_weight = loss_weight + self.eps = eps + self.theta = theta + self.reduction = reduction + + def __call__(self, pbox, gbox): + x1, y1, x2, y2 = paddle.split(pbox, num_or_sections=4, axis=-1) + x1g, y1g, x2g, y2g = paddle.split(gbox, num_or_sections=4, axis=-1) + + box1 = [x1, y1, x2, y2] + box2 = [x1g, y1g, x2g, y2g] + iou = bbox_iou(box1, box2) + + cx = (x1 + x2) / 2 + cy = (y1 + y2) / 2 + w = x2 - x1 + self.eps + h = y2 - y1 + self.eps + + cxg = (x1g + x2g) / 2 + cyg = (y1g + y2g) / 2 + wg = x2g - x1g + self.eps + hg = y2g - y1g + self.eps + + x2 = paddle.maximum(x1, x2) + y2 = paddle.maximum(y1, y2) + + # A or B + xc1 = paddle.minimum(x1, x1g) + yc1 = paddle.minimum(y1, y1g) + xc2 = paddle.maximum(x2, x2g) + yc2 = paddle.maximum(y2, y2g) + + cw_out = xc2 - xc1 + ch_out = yc2 - yc1 + + ch = paddle.maximum(cy, cyg) - paddle.minimum(cy, cyg) + cw = paddle.maximum(cx, cxg) - paddle.minimum(cx, cxg) + + # angle cost + dist_intersection = paddle.sqrt((cx - cxg)**2 + (cy - cyg)**2) + sin_angle_alpha = ch / dist_intersection + sin_angle_beta = cw / dist_intersection + thred = paddle.pow(paddle.to_tensor(2), 0.5) / 2 + thred.stop_gradient = True + sin_alpha = paddle.where(sin_angle_alpha > thred, sin_angle_beta, + sin_angle_alpha) + angle_cost = paddle.cos(paddle.asin(sin_alpha) * 2 - math.pi / 2) + + # distance cost + gamma = 2 - angle_cost + # gamma.stop_gradient = True + beta_x = ((cxg - cx) / cw_out)**2 + beta_y = ((cyg - cy) / ch_out)**2 + dist_cost = 1 - paddle.exp(-gamma * beta_x) + 1 - paddle.exp(-gamma * + beta_y) + + # shape cost + omega_w = paddle.abs(w - wg) / paddle.maximum(w, wg) + omega_h = paddle.abs(hg - h) / paddle.maximum(h, hg) + omega = (1 - paddle.exp(-omega_w))**self.theta + ( + 1 - paddle.exp(-omega_h))**self.theta + siou_loss = 1 - iou + (omega + dist_cost) / 2 + + if self.reduction == 'mean': + siou_loss = paddle.mean(siou_loss) + elif self.reduction == 'sum': + siou_loss = paddle.sum(siou_loss) + + return siou_loss * self.loss_weight diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/jde_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/jde_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..5c3b5a61534e793b243526fabdcf604114ce2512 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/jde_loss.py @@ -0,0 +1,193 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register + +__all__ = ['JDEDetectionLoss', 'JDEEmbeddingLoss', 'JDELoss'] + + +@register +class JDEDetectionLoss(nn.Layer): + __shared__ = ['num_classes'] + + def __init__(self, num_classes=1, for_mot=True): + super(JDEDetectionLoss, self).__init__() + self.num_classes = num_classes + self.for_mot = for_mot + + def det_loss(self, p_det, anchor, t_conf, t_box): + pshape = paddle.shape(p_det) + pshape.stop_gradient = True + nB, nGh, nGw = pshape[0], pshape[-2], pshape[-1] + nA = len(anchor) + p_det = paddle.reshape( + p_det, [nB, nA, self.num_classes + 5, nGh, nGw]).transpose( + (0, 1, 3, 4, 2)) + + # 1. loss_conf: cross_entropy + p_conf = p_det[:, :, :, :, 4:6] + p_conf_flatten = paddle.reshape(p_conf, [-1, 2]) + t_conf_flatten = t_conf.flatten() + t_conf_flatten = paddle.cast(t_conf_flatten, dtype="int64") + t_conf_flatten.stop_gradient = True + loss_conf = F.cross_entropy( + p_conf_flatten, t_conf_flatten, ignore_index=-1, reduction='mean') + loss_conf.stop_gradient = False + + # 2. loss_box: smooth_l1_loss + p_box = p_det[:, :, :, :, :4] + p_box_flatten = paddle.reshape(p_box, [-1, 4]) + t_box_flatten = paddle.reshape(t_box, [-1, 4]) + fg_inds = paddle.nonzero(t_conf_flatten > 0).flatten() + if fg_inds.numel() > 0: + reg_delta = paddle.gather(p_box_flatten, fg_inds) + reg_target = paddle.gather(t_box_flatten, fg_inds) + else: + reg_delta = paddle.to_tensor([0, 0, 0, 0], dtype='float32') + reg_delta.stop_gradient = False + reg_target = paddle.to_tensor([0, 0, 0, 0], dtype='float32') + reg_target.stop_gradient = True + loss_box = F.smooth_l1_loss( + reg_delta, reg_target, reduction='mean', delta=1.0) + loss_box.stop_gradient = False + + return loss_conf, loss_box + + def forward(self, det_outs, targets, anchors): + """ + Args: + det_outs (list[Tensor]): output from detection head, each one + is a 4-D Tensor with shape [N, C, H, W]. + targets (dict): contains 'im_id', 'gt_bbox', 'gt_ide', 'image', + 'im_shape', 'scale_factor' and 'tbox', 'tconf', 'tide' of + each FPN level. + anchors (list[list]): anchor setting of JDE model, N row M col, N is + the anchor levels(FPN levels), M is the anchor scales each + level. 
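+            Returns:
+                dict: when `for_mot` is True, per-FPN-level lists
+                    'loss_confs' and 'loss_boxes'; otherwise the summed
+                    'loss_conf', 'loss_box' and their total 'loss'.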
+ """ + assert len(det_outs) == len(anchors) + loss_confs = [] + loss_boxes = [] + for i, (p_det, anchor) in enumerate(zip(det_outs, anchors)): + t_conf = targets['tconf{}'.format(i)] + t_box = targets['tbox{}'.format(i)] + + loss_conf, loss_box = self.det_loss(p_det, anchor, t_conf, t_box) + loss_confs.append(loss_conf) + loss_boxes.append(loss_box) + if self.for_mot: + return {'loss_confs': loss_confs, 'loss_boxes': loss_boxes} + else: + jde_conf_losses = sum(loss_confs) + jde_box_losses = sum(loss_boxes) + jde_det_losses = { + "loss_conf": jde_conf_losses, + "loss_box": jde_box_losses, + "loss": jde_conf_losses + jde_box_losses, + } + return jde_det_losses + + +@register +class JDEEmbeddingLoss(nn.Layer): + def __init__(self, ): + super(JDEEmbeddingLoss, self).__init__() + self.phony = self.create_parameter(shape=[1], dtype="float32") + + def emb_loss(self, p_ide, t_conf, t_ide, emb_scale, classifier): + emb_dim = p_ide.shape[1] + p_ide = p_ide.transpose((0, 2, 3, 1)) + p_ide_flatten = paddle.reshape(p_ide, [-1, emb_dim]) + mask = t_conf > 0 + mask = paddle.cast(mask, dtype="int64") + mask.stop_gradient = True + emb_mask = mask.max(1).flatten() + emb_mask_inds = paddle.nonzero(emb_mask > 0).flatten() + emb_mask_inds.stop_gradient = True + # use max(1) to decide the id, TODO: more reseanable strategy + t_ide_flatten = t_ide.max(1).flatten() + t_ide_flatten = paddle.cast(t_ide_flatten, dtype="int64") + valid_inds = paddle.nonzero(t_ide_flatten != -1).flatten() + + if emb_mask_inds.numel() == 0 or valid_inds.numel() == 0: + # loss_ide = paddle.to_tensor([0]) # will be error in gradient backward + loss_ide = self.phony * 0 # todo + else: + embedding = paddle.gather(p_ide_flatten, emb_mask_inds) + embedding = emb_scale * F.normalize(embedding) + logits = classifier(embedding) + + ide_target = paddle.gather(t_ide_flatten, emb_mask_inds) + + loss_ide = F.cross_entropy( + logits, ide_target, ignore_index=-1, reduction='mean') + loss_ide.stop_gradient = False + + return loss_ide + + def forward(self, ide_outs, targets, emb_scale, classifier): + loss_ides = [] + for i, p_ide in enumerate(ide_outs): + t_conf = targets['tconf{}'.format(i)] + t_ide = targets['tide{}'.format(i)] + + loss_ide = self.emb_loss(p_ide, t_conf, t_ide, emb_scale, + classifier) + loss_ides.append(loss_ide) + return loss_ides + + +@register +class JDELoss(nn.Layer): + def __init__(self): + super(JDELoss, self).__init__() + + def forward(self, loss_confs, loss_boxes, loss_ides, loss_params_cls, + loss_params_reg, loss_params_ide, targets): + assert len(loss_confs) == len(loss_boxes) == len(loss_ides) + assert len(loss_params_cls) == len(loss_params_reg) == len( + loss_params_ide) + assert len(loss_confs) == len(loss_params_cls) + + batchsize = targets['gt_bbox'].shape[0] + nTargets = paddle.nonzero(paddle.sum(targets['gt_bbox'], axis=2)).shape[ + 0] / batchsize + nTargets = paddle.to_tensor(nTargets, dtype='float32') + nTargets.stop_gradient = True + + jde_losses = [] + for i, (loss_conf, loss_box, loss_ide, l_conf_p, l_box_p, + l_ide_p) in enumerate( + zip(loss_confs, loss_boxes, loss_ides, loss_params_cls, + loss_params_reg, loss_params_ide)): + + jde_loss = l_conf_p(loss_conf) + l_box_p(loss_box) + l_ide_p( + loss_ide) + jde_losses.append(jde_loss) + + loss_all = { + "loss_conf": sum(loss_confs), + "loss_box": sum(loss_boxes), + "loss_ide": sum(loss_ides), + "loss": sum(jde_losses), + "nTargets": nTargets, + } + return loss_all diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/keypoint_loss.py 
b/PaddleDetection-release-2.6/ppdet/modeling/losses/keypoint_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..37a24102a85eec227ef3acd05f0814274f275e54 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/keypoint_loss.py @@ -0,0 +1,632 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from itertools import cycle, islice +from collections import abc +import numpy as np +import paddle +import paddle.nn as nn + +from ppdet.core.workspace import register, serializable + +__all__ = ['HrHRNetLoss', 'KeyPointMSELoss', 'OKSLoss', 'CenterFocalLoss', 'L1Loss'] + + +@register +@serializable +class KeyPointMSELoss(nn.Layer): + def __init__(self, use_target_weight=True, loss_scale=0.5): + """ + KeyPointMSELoss layer + + Args: + use_target_weight (bool): whether to use target weight + """ + super(KeyPointMSELoss, self).__init__() + self.criterion = nn.MSELoss(reduction='mean') + self.use_target_weight = use_target_weight + self.loss_scale = loss_scale + + def forward(self, output, records): + target = records['target'] + target_weight = records['target_weight'] + batch_size = output.shape[0] + num_joints = output.shape[1] + heatmaps_pred = output.reshape( + (batch_size, num_joints, -1)).split(num_joints, 1) + heatmaps_gt = target.reshape( + (batch_size, num_joints, -1)).split(num_joints, 1) + loss = 0 + for idx in range(num_joints): + heatmap_pred = heatmaps_pred[idx].squeeze() + heatmap_gt = heatmaps_gt[idx].squeeze() + if self.use_target_weight: + loss += self.loss_scale * self.criterion( + heatmap_pred.multiply(target_weight[:, idx]), + heatmap_gt.multiply(target_weight[:, idx])) + else: + loss += self.loss_scale * self.criterion(heatmap_pred, + heatmap_gt) + keypoint_losses = dict() + keypoint_losses['loss'] = loss / num_joints + return keypoint_losses + + +@register +@serializable +class HrHRNetLoss(nn.Layer): + def __init__(self, num_joints, swahr): + """ + HrHRNetLoss layer + + Args: + num_joints (int): number of keypoints + """ + super(HrHRNetLoss, self).__init__() + if swahr: + self.heatmaploss = HeatMapSWAHRLoss(num_joints) + else: + self.heatmaploss = HeatMapLoss() + self.aeloss = AELoss() + self.ziploss = ZipLoss( + [self.heatmaploss, self.heatmaploss, self.aeloss]) + + def forward(self, inputs, records): + targets = [] + targets.append([records['heatmap_gt1x'], records['mask_1x']]) + targets.append([records['heatmap_gt2x'], records['mask_2x']]) + targets.append(records['tagmap']) + keypoint_losses = dict() + loss = self.ziploss(inputs, targets) + keypoint_losses['heatmap_loss'] = loss[0] + loss[1] + keypoint_losses['pull_loss'] = loss[2][0] + keypoint_losses['push_loss'] = loss[2][1] + keypoint_losses['loss'] = recursive_sum(loss) + return keypoint_losses + + +class HeatMapLoss(object): + def __init__(self, loss_factor=1.0): + super(HeatMapLoss, self).__init__() + 
self.loss_factor = loss_factor + + def __call__(self, preds, targets): + heatmap, mask = targets + loss = ((preds - heatmap)**2 * mask.cast('float').unsqueeze(1)) + loss = paddle.clip(loss, min=0, max=2).mean() + loss *= self.loss_factor + return loss + + +class HeatMapSWAHRLoss(object): + def __init__(self, num_joints, loss_factor=1.0): + super(HeatMapSWAHRLoss, self).__init__() + self.loss_factor = loss_factor + self.num_joints = num_joints + + def __call__(self, preds, targets): + heatmaps_gt, mask = targets + heatmaps_pred = preds[0] + scalemaps_pred = preds[1] + + heatmaps_scaled_gt = paddle.where(heatmaps_gt > 0, 0.5 * heatmaps_gt * ( + 1 + (1 + + (scalemaps_pred - 1.) * paddle.log(heatmaps_gt + 1e-10))**2), + heatmaps_gt) + + regularizer_loss = paddle.mean( + paddle.pow((scalemaps_pred - 1.) * (heatmaps_gt > 0).astype(float), + 2)) + omiga = 0.01 + # thres = 2**(-1/omiga), threshold for positive weight + hm_weight = heatmaps_scaled_gt**( + omiga + ) * paddle.abs(1 - heatmaps_pred) + paddle.abs(heatmaps_pred) * ( + 1 - heatmaps_scaled_gt**(omiga)) + + loss = (((heatmaps_pred - heatmaps_scaled_gt)**2) * + mask.cast('float').unsqueeze(1)) * hm_weight + loss = loss.mean() + loss = self.loss_factor * (loss + 1.0 * regularizer_loss) + return loss + + +class AELoss(object): + def __init__(self, pull_factor=0.001, push_factor=0.001): + super(AELoss, self).__init__() + self.pull_factor = pull_factor + self.push_factor = push_factor + + def apply_single(self, pred, tagmap): + if tagmap.numpy()[:, :, 3].sum() == 0: + return (paddle.zeros([1]), paddle.zeros([1])) + nonzero = paddle.nonzero(tagmap[:, :, 3] > 0) + if nonzero.shape[0] == 0: + return (paddle.zeros([1]), paddle.zeros([1])) + p_inds = paddle.unique(nonzero[:, 0]) + num_person = p_inds.shape[0] + if num_person == 0: + return (paddle.zeros([1]), paddle.zeros([1])) + + pull = 0 + tagpull_num = 0 + embs_all = [] + person_unvalid = 0 + for person_idx in p_inds.numpy(): + valid_single = tagmap[person_idx.item()] + validkpts = paddle.nonzero(valid_single[:, 3] > 0) + valid_single = paddle.index_select(valid_single, validkpts) + emb = paddle.gather_nd(pred, valid_single[:, :3]) + if emb.shape[0] == 1: + person_unvalid += 1 + mean = paddle.mean(emb, axis=0) + embs_all.append(mean) + pull += paddle.mean(paddle.pow(emb - mean, 2), axis=0) + tagpull_num += emb.shape[0] + pull /= max(num_person - person_unvalid, 1) + if num_person < 2: + return pull, paddle.zeros([1]) + + embs_all = paddle.stack(embs_all) + A = embs_all.expand([num_person, num_person]) + B = A.transpose([1, 0]) + diff = A - B + + diff = paddle.pow(diff, 2) + push = paddle.exp(-diff) + push = paddle.sum(push) - num_person + + push /= 2 * num_person * (num_person - 1) + return pull, push + + def __call__(self, preds, tagmaps): + bs = preds.shape[0] + losses = [ + self.apply_single(preds[i:i + 1].squeeze(), + tagmaps[i:i + 1].squeeze()) for i in range(bs) + ] + pull = self.pull_factor * sum(loss[0] for loss in losses) / len(losses) + push = self.push_factor * sum(loss[1] for loss in losses) / len(losses) + return pull, push + + +class ZipLoss(object): + def __init__(self, loss_funcs): + super(ZipLoss, self).__init__() + self.loss_funcs = loss_funcs + + def __call__(self, inputs, targets): + assert len(self.loss_funcs) == len(targets) >= len(inputs) + + def zip_repeat(*args): + longest = max(map(len, args)) + filled = [islice(cycle(x), longest) for x in args] + return zip(*filled) + + return tuple( + fn(x, y) + for x, y, fn in zip_repeat(inputs, targets, self.loss_funcs)) + + +def 
recursive_sum(inputs): + if isinstance(inputs, abc.Sequence): + return sum([recursive_sum(x) for x in inputs]) + return inputs + + +def oks_overlaps(kpt_preds, kpt_gts, kpt_valids, kpt_areas, sigmas): + if not kpt_gts.astype('bool').any(): + return kpt_preds.sum()*0 + + sigmas = paddle.to_tensor(sigmas, dtype=kpt_preds.dtype) + variances = (sigmas * 2)**2 + + assert kpt_preds.shape[0] == kpt_gts.shape[0] + kpt_preds = kpt_preds.reshape((-1, kpt_preds.shape[-1] // 2, 2)) + kpt_gts = kpt_gts.reshape((-1, kpt_gts.shape[-1] // 2, 2)) + + squared_distance = (kpt_preds[:, :, 0] - kpt_gts[:, :, 0]) ** 2 + \ + (kpt_preds[:, :, 1] - kpt_gts[:, :, 1]) ** 2 + assert (kpt_valids.sum(-1) > 0).all() + squared_distance0 = squared_distance / ( + kpt_areas[:, None] * variances[None, :] * 2) + squared_distance1 = paddle.exp(-squared_distance0) + squared_distance1 = squared_distance1 * kpt_valids + oks = squared_distance1.sum(axis=1) / kpt_valids.sum(axis=1) + + return oks + + +def oks_loss(pred, + target, + weight, + valid=None, + area=None, + linear=False, + sigmas=None, + eps=1e-6, + avg_factor=None, + reduction=None): + """Oks loss. + + Computing the oks loss between a set of predicted poses and target poses. + The loss is calculated as negative log of oks. + + Args: + pred (Tensor): Predicted poses of format (x1, y1, x2, y2, ...), + shape (n, K*2). + target (Tensor): Corresponding gt poses, shape (n, K*2). + linear (bool, optional): If True, use linear scale of loss instead of + log scale. Default: False. + eps (float): Eps to avoid log(0). + + Returns: + Tensor: Loss tensor. + """ + oks = oks_overlaps(pred, target, valid, area, sigmas).clip(min=eps) + if linear: + loss = 1 - oks + else: + loss = -oks.log() + + if weight is not None: + if weight.shape != loss.shape: + if weight.shape[0] == loss.shape[0]: + # For most cases, weight is of shape (num_priors, ), + # which means it does not have the second axis num_class + weight = weight.reshape((-1, 1)) + else: + # Sometimes, weight per anchor per class is also needed. e.g. + # in FSAF. But it may be flattened of shape + # (num_priors x num_class, ), while loss is still of shape + # (num_priors, num_class). + assert weight.numel() == loss.numel() + weight = weight.reshape((loss.shape[0], -1)) + assert weight.ndim == loss.ndim + loss = loss * weight + + # if avg_factor is not specified, just reduce the loss + if avg_factor is None: + if reduction == 'mean': + loss = loss.mean() + elif reduction == 'sum': + loss = loss.sum() + else: + # if reduction is mean, then average the loss by avg_factor + if reduction == 'mean': + # Avoid causing ZeroDivisionError when avg_factor is 0.0, + # i.e., all labels of an image belong to ignore index. + eps = 1e-10 + loss = loss.sum() / (avg_factor + eps) + # if reduction is 'none', then do nothing, otherwise raise an error + elif reduction != 'none': + raise ValueError('avg_factor can not be used with reduction="sum"') + + + return loss + +@register +@serializable +class OKSLoss(nn.Layer): + """OKSLoss. + + Computing the oks loss between a set of predicted poses and target poses. + + Args: + linear (bool): If True, use linear scale of loss instead of log scale. + Default: False. + eps (float): Eps to avoid log(0). + reduction (str): Options are "none", "mean" and "sum". + loss_weight (float): Weight of loss. 
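+        num_keypoints (int): number of keypoints, 17 (COCO) or 14; selects
+            the per-keypoint sigmas used in the OKS computation.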
+ """ + + def __init__(self, + linear=False, + num_keypoints=17, + eps=1e-6, + reduction='mean', + loss_weight=1.0): + super(OKSLoss, self).__init__() + self.linear = linear + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + if num_keypoints == 17: + self.sigmas = np.array([ + .26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, + 1.07, .87, .87, .89, .89 + ], dtype=np.float32) / 10.0 + elif num_keypoints == 14: + self.sigmas = np.array([ + .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89, + .79, .79 + ]) / 10.0 + else: + raise ValueError(f'Unsupported keypoints number {num_keypoints}') + + def forward(self, + pred, + target, + valid, + area, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs): + """Forward function. + + Args: + pred (Tensor): The prediction. + target (Tensor): The learning target of the prediction. + valid (Tensor): The visible flag of the target pose. + area (Tensor): The area of the target pose. + weight (Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. Options are "none", "mean" and "sum". + """ + assert reduction_override in (None, 'none', 'mean', 'sum') + reduction = ( + reduction_override if reduction_override else self.reduction) + if (weight is not None) and (not paddle.any(weight > 0)) and ( + reduction != 'none'): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + if weight is not None and weight.dim() > 1: + # TODO: remove this in the future + # reduce the weight of shape (n, 4) to (n,) to match the + # iou_loss of shape (n,) + assert weight.shape == pred.shape + weight = weight.mean(-1) + loss = self.loss_weight * oks_loss( + pred, + target, + weight, + valid=valid, + area=area, + linear=self.linear, + sigmas=self.sigmas, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs) + return loss + + +def center_focal_loss(pred, gt, weight=None, mask=None, avg_factor=None, reduction=None): + """Modified focal loss. Exactly the same as CornerNet. + Runs faster and costs a little bit more memory. + + Args: + pred (Tensor): The prediction with shape [bs, c, h, w]. + gt (Tensor): The learning target of the prediction in gaussian + distribution, with shape [bs, c, h, w]. + mask (Tensor): The valid mask. Defaults to None. 
+ """ + if not gt.astype('bool').any(): + return pred.sum()*0 + pos_inds = gt.equal(1).astype('float32') + if mask is None: + neg_inds = gt.less_than(paddle.to_tensor([1], dtype='float32')).astype('float32') + else: + neg_inds = gt.less_than(paddle.to_tensor([1], dtype='float32')).astype('float32') * mask.equal(0).astype('float32') + + neg_weights = paddle.pow(1 - gt, 4) + + loss = 0 + + pos_loss = paddle.log(pred) * paddle.pow(1 - pred, 2) * pos_inds + neg_loss = paddle.log(1 - pred) * paddle.pow(pred, 2) * neg_weights * \ + neg_inds + + num_pos = pos_inds.astype('float32').sum() + pos_loss = pos_loss.sum() + neg_loss = neg_loss.sum() + + if num_pos == 0: + loss = loss - neg_loss + else: + loss = loss - (pos_loss + neg_loss) / num_pos + + if weight is not None: + if weight.shape != loss.shape: + if weight.shape[0] == loss.shape[0]: + # For most cases, weight is of shape (num_priors, ), + # which means it does not have the second axis num_class + weight = weight.reshape((-1, 1)) + else: + # Sometimes, weight per anchor per class is also needed. e.g. + # in FSAF. But it may be flattened of shape + # (num_priors x num_class, ), while loss is still of shape + # (num_priors, num_class). + assert weight.numel() == loss.numel() + weight = weight.reshape((loss.shape[0], -1)) + assert weight.ndim == loss.ndim + loss = loss * weight + + # if avg_factor is not specified, just reduce the loss + if avg_factor is None: + if reduction == 'mean': + loss = loss.mean() + elif reduction == 'sum': + loss = loss.sum() + else: + # if reduction is mean, then average the loss by avg_factor + if reduction == 'mean': + # Avoid causing ZeroDivisionError when avg_factor is 0.0, + # i.e., all labels of an image belong to ignore index. + eps = 1e-10 + loss = loss.sum() / (avg_factor + eps) + # if reduction is 'none', then do nothing, otherwise raise an error + elif reduction != 'none': + raise ValueError('avg_factor can not be used with reduction="sum"') + + return loss + +@register +@serializable +class CenterFocalLoss(nn.Layer): + """CenterFocalLoss is a variant of focal loss. + + More details can be found in the `paper + `_ + + Args: + reduction (str): Options are "none", "mean" and "sum". + loss_weight (float): Loss weight of current loss. + """ + + def __init__(self, + reduction='none', + loss_weight=1.0): + super(CenterFocalLoss, self).__init__() + self.reduction = reduction + self.loss_weight = loss_weight + + def forward(self, + pred, + target, + weight=None, + mask=None, + avg_factor=None, + reduction_override=None): + """Forward function. + + Args: + pred (Tensor): The prediction. + target (Tensor): The learning target of the prediction in gaussian + distribution. + weight (Tensor, optional): The weight of loss for each + prediction. Defaults to None. + mask (Tensor): The valid mask. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. + """ + assert reduction_override in (None, 'none', 'mean', 'sum') + reduction = ( + reduction_override if reduction_override else self.reduction) + loss_reg = self.loss_weight * center_focal_loss( + pred, + target, + weight, + mask=mask, + reduction=reduction, + avg_factor=avg_factor) + return loss_reg + +def l1_loss(pred, target, weight=None, reduction='mean', avg_factor=None): + """L1 loss. + + Args: + pred (Tensor): The prediction. 
+ target (Tensor): The learning target of the prediction. + + Returns: + Tensor: Calculated loss + """ + if not target.astype('bool').any(): + return pred.sum() * 0 + + assert pred.shape == target.shape + loss = paddle.abs(pred - target) + + if weight is not None: + if weight.shape != loss.shape: + if weight.shape[0] == loss.shape[0]: + # For most cases, weight is of shape (num_priors, ), + # which means it does not have the second axis num_class + weight = weight.reshape((-1, 1)) + else: + # Sometimes, weight per anchor per class is also needed. e.g. + # in FSAF. But it may be flattened of shape + # (num_priors x num_class, ), while loss is still of shape + # (num_priors, num_class). + assert weight.numel() == loss.numel() + weight = weight.reshape((loss.shape[0], -1)) + assert weight.ndim == loss.ndim + loss = loss * weight + + # if avg_factor is not specified, just reduce the loss + if avg_factor is None: + if reduction == 'mean': + loss = loss.mean() + elif reduction == 'sum': + loss = loss.sum() + else: + # if reduction is mean, then average the loss by avg_factor + if reduction == 'mean': + # Avoid causing ZeroDivisionError when avg_factor is 0.0, + # i.e., all labels of an image belong to ignore index. + eps = 1e-10 + loss = loss.sum() / (avg_factor + eps) + # if reduction is 'none', then do nothing, otherwise raise an error + elif reduction != 'none': + raise ValueError('avg_factor can not be used with reduction="sum"') + + + return loss + +@register +@serializable +class L1Loss(nn.Layer): + """L1 loss. + + Args: + reduction (str, optional): The method to reduce the loss. + Options are "none", "mean" and "sum". + loss_weight (float, optional): The weight of loss. + """ + + def __init__(self, reduction='mean', loss_weight=1.0): + super(L1Loss, self).__init__() + self.reduction = reduction + self.loss_weight = loss_weight + + def forward(self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None): + """Forward function. + + Args: + pred (Tensor): The prediction. + target (Tensor): The learning target of the prediction. + weight (Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. + """ + assert reduction_override in (None, 'none', 'mean', 'sum') + reduction = ( + reduction_override if reduction_override else self.reduction) + loss_bbox = self.loss_weight * l1_loss( + pred, target, weight, reduction=reduction, avg_factor=avg_factor) + return loss_bbox + diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/pose3d_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/pose3d_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..4781d6e5cd02e468ba9ef58367234d3fd51ec435 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/pose3d_loss.py @@ -0,0 +1,250 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from itertools import cycle, islice
+from collections import abc
+import cv2  # required by rectify_pose below
+import numpy as np  # required by mpjpe_np below
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from ppdet.core.workspace import register, serializable
+from ppdet.utils.logger import setup_logger
+logger = setup_logger('ppdet.engine')
+
+__all__ = ['Pose3DLoss']
+
+
+@register
+@serializable
+class Pose3DLoss(nn.Layer):
+    def __init__(self, weight_3d=1.0, weight_2d=0.0, reduction='none'):
+        """
+        Pose3DLoss layer
+
+        Args:
+            weight_3d (float): weight of 3d loss
+            weight_2d (float): weight of 2d loss
+            reduction (str): reduction used by the 2d/3d criterions
+        """
+        super(Pose3DLoss, self).__init__()
+        self.weight_3d = weight_3d
+        self.weight_2d = weight_2d
+        self.criterion_2dpose = nn.MSELoss(reduction=reduction)
+        self.criterion_3dpose = nn.L1Loss(reduction=reduction)
+        self.criterion_smoothl1 = nn.SmoothL1Loss(
+            reduction=reduction, delta=1.0)
+        self.criterion_vertices = nn.L1Loss()
+
+    def forward(self, pred3d, pred2d, inputs):
+        """
+        mpjpe: mpjpe loss between 3d joints
+        keypoint_2d_loss: 2d joints loss computed by criterion_2dpose
+        """
+        gt_3d_joints = inputs['joints_3d']
+        gt_2d_joints = inputs['joints_2d']
+        has_3d_joints = inputs['has_3d_joints']
+        has_2d_joints = inputs['has_2d_joints']
+
+        loss_3d = mpjpe_focal(pred3d, gt_3d_joints, has_3d_joints)
+        loss = self.weight_3d * loss_3d
+        epoch = inputs['epoch_id']
+        if self.weight_2d > 0:
+            weight = self.weight_2d * pow(0.1, (epoch // 8))
+            if epoch > 8:
+                weight = 0
+            loss_2d = keypoint_2d_loss(self.criterion_2dpose, pred2d,
+                                       gt_2d_joints, has_2d_joints)
+            loss += weight * loss_2d
+        return loss
+
+
+def filter_3d_joints(pred, gt, has_3d_joints):
+    """
+    filter 3d joints
+    """
+    gt = gt[has_3d_joints == 1]
+    gt = gt[:, :, :3]
+    pred = pred[has_3d_joints == 1]
+
+    gt_pelvis = (gt[:, 2, :] + gt[:, 3, :]) / 2
+    gt = gt - gt_pelvis[:, None, :]
+    pred_pelvis = (pred[:, 2, :] + pred[:, 3, :]) / 2
+    pred = pred - pred_pelvis[:, None, :]
+    return pred, gt
+
+
+def mpjpe(pred, gt, has_3d_joints):
+    """
+    mPJPE loss
+    """
+    pred, gt = filter_3d_joints(pred, gt, has_3d_joints)
+    error = paddle.sqrt((paddle.minimum((pred - gt), paddle.to_tensor(1.2))**2
+                         ).sum(axis=-1)).mean()
+    return error
+
+
+def mpjpe_focal(pred, gt, has_3d_joints):
+    """
+    mPJPE loss with a focal-style reweighting of per-joint errors
+    """
+    pred, gt = filter_3d_joints(pred, gt, has_3d_joints)
+    mse_error = ((pred - gt)**2).sum(axis=-1)
+    mpjpe_error = paddle.sqrt(mse_error)
+    mean = mpjpe_error.mean()
+    std = mpjpe_error.std()
+    atte = 2 * F.sigmoid(6 * (mpjpe_error - mean) / std)
+    mse_error *= atte
+    return mse_error.mean()
+
+
+def mpjpe_mse(pred, gt, has_3d_joints, weight=1.):
+    """
+    mPJPE loss in MSE form
+    """
+    pred, gt = filter_3d_joints(pred, gt, has_3d_joints)
+    error = (((pred - gt)**2).sum(axis=-1)).mean()
+    return error
+
+
+def mpjpe_criterion(pred, gt, has_3d_joints, criterion_pose3d):
+    """
+    mPJPE loss with a self-defined criterion
+    """
+    pred, gt = filter_3d_joints(pred, gt, has_3d_joints)
+    error = paddle.sqrt(criterion_pose3d(pred, gt)).mean()
+    return error
+
+
+@register
+@serializable
+def weighted_mpjpe(pred, gt, has_3d_joints):
+    """
+    Weighted_mPJPE
+    """
+    pred, gt = filter_3d_joints(pred, gt, has_3d_joints)
+    # NOTE: the norm-based weight is immediately overridden by the fixed
+    # per-joint weights below
+    weight = paddle.linalg.norm(pred, p=2, axis=-1)
+    weight = paddle.to_tensor(
+        [1.5, 1.3, 1.2, 1.2, 1.3, 1.5, 1.5, 1.3, 1.2, 1.2, 1.3, 1.5, 1., 1.])
+    error = (weight * paddle.linalg.norm(pred - gt, p=2, axis=-1)).mean()
+    return error
+
+
+@register
+@serializable
+def normed_mpjpe(pred, gt, has_3d_joints):
+    """
+    Normalized MPJPE (scale only), adapted from:
+    https://github.com/hrhodin/UnsupervisedGeometryAwareRepresentationLearning/blob/master/losses/poses.py
+    """
+    assert pred.shape == gt.shape
+    pred, gt = filter_3d_joints(pred, gt, has_3d_joints)
+
+    norm_predicted = paddle.mean(
+        paddle.sum(pred**2, axis=-1, keepdim=True), axis=-2, keepdim=True)
+    norm_target = paddle.mean(
+        paddle.sum(gt * pred, axis=-1, keepdim=True), axis=-2, keepdim=True)
+    scale = norm_target / norm_predicted
+    # joints are already filtered and pelvis-centered here, so compute the
+    # per-joint error directly instead of re-filtering through mpjpe()
+    return paddle.sqrt(((scale * pred - gt)**2).sum(axis=-1)).mean()
+
+
+@register
+@serializable
+def mpjpe_np(pred, gt, has_3d_joints):
+    """
+    mPJPE_NP
+    """
+    pred, gt = filter_3d_joints(pred, gt, has_3d_joints)
+    error = np.sqrt(((pred - gt)**2).sum(axis=-1)).mean()
+    return error
+
+
+@register
+@serializable
+def mean_per_vertex_error(pred, gt, has_smpl):
+    """
+    Compute mPVE
+    """
+    pred = pred[has_smpl == 1]
+    gt = gt[has_smpl == 1]
+    with paddle.no_grad():
+        error = paddle.sqrt(((pred - gt)**2).sum(axis=-1)).mean()
+        return error
+
+
+@register
+@serializable
+def keypoint_2d_loss(criterion_keypoints, pred_keypoints_2d, gt_keypoints_2d,
+                     has_pose_2d):
+    """
+    Compute 2D reprojection loss if 2D keypoint annotations are available.
+    The confidence (conf) is binary and indicates whether the keypoints exist or not.
+    """
+    conf = gt_keypoints_2d[:, :, -1].unsqueeze(-1).clone()
+    loss = (conf * criterion_keypoints(
+        pred_keypoints_2d, gt_keypoints_2d[:, :, :-1] * 0.001)).mean()
+    return loss
+
+
+@register
+@serializable
+def keypoint_3d_loss(criterion_keypoints, pred_keypoints_3d, gt_keypoints_3d,
+                     has_pose_3d):
+    """
+    Compute 3D keypoint loss if 3D keypoint annotations are available.
+    """
+    conf = gt_keypoints_3d[:, :, -1].unsqueeze(-1).clone()
+    gt_keypoints_3d = gt_keypoints_3d[:, :, :-1].clone()
+    gt_keypoints_3d = gt_keypoints_3d[has_pose_3d == 1]
+    conf = conf[has_pose_3d == 1]
+    pred_keypoints_3d = pred_keypoints_3d[has_pose_3d == 1]
+    if len(gt_keypoints_3d) > 0:
+        gt_pelvis = (gt_keypoints_3d[:, 2, :] + gt_keypoints_3d[:, 3, :]) / 2
+        gt_keypoints_3d = gt_keypoints_3d - gt_pelvis[:, None, :]
+        pred_pelvis = (
+            pred_keypoints_3d[:, 2, :] + pred_keypoints_3d[:, 3, :]) / 2
+        pred_keypoints_3d = pred_keypoints_3d - pred_pelvis[:, None, :]
+        return (conf * criterion_keypoints(pred_keypoints_3d,
+                                           gt_keypoints_3d)).mean()
+    else:
+        return paddle.to_tensor([1.]).fill_(0.)
+
+
+@register
+@serializable
+def vertices_loss(criterion_vertices, pred_vertices, gt_vertices, has_smpl):
+    """
+    Compute per-vertex loss if vertex annotations are available.
+    """
+    pred_vertices_with_shape = pred_vertices[has_smpl == 1]
+    gt_vertices_with_shape = gt_vertices[has_smpl == 1]
+    if len(gt_vertices_with_shape) > 0:
+        return criterion_vertices(pred_vertices_with_shape,
+                                  gt_vertices_with_shape)
+    else:
+        return paddle.to_tensor([1.]).fill_(0.)
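A minimal usage sketch for Pose3DLoss, for orientation (the batch size of 8 and the joint count of 24 are illustrative assumptions; shapes follow forward() and filter_3d_joints() above, where 3D targets carry xyz plus a visibility flag and 'epoch_id' drives the 2D weight decay):

import paddle

loss_fn = Pose3DLoss(weight_3d=1.0, weight_2d=0.1)
inputs = {
    'joints_3d': paddle.rand([8, 24, 4]),        # (N, K, xyz + visibility)
    'joints_2d': paddle.rand([8, 24, 3]),        # (N, K, xy + confidence)
    'has_3d_joints': paddle.ones([8], 'int64'),  # per-sample annotation flags
    'has_2d_joints': paddle.ones([8], 'int64'),
    'epoch_id': 0,
}
loss = loss_fn(paddle.rand([8, 24, 3]), paddle.rand([8, 24, 2]), inputs)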
+
+
+@register
+@serializable
+def rectify_pose(pose):
+    pose = pose.copy()
+    R_mod = cv2.Rodrigues(np.array([np.pi, 0, 0]))[0]
+    R_root = cv2.Rodrigues(pose[:3])[0]
+    new_root = R_root.dot(R_mod)
+    pose[:3] = cv2.Rodrigues(new_root)[0].reshape(3)
+    return pose
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/probiou_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/probiou_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..c2a1c75879ee02ea8cd721272ad1f12fb3b96a67
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/probiou_loss.py
@@ -0,0 +1,104 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+import paddle
+import paddle.nn.functional as F
+
+from ppdet.core.workspace import register, serializable
+
+__all__ = ['ProbIoULoss']
+
+
+def gbb_form(boxes):
+    xy, wh, angle = paddle.split(boxes, [2, 2, 1], axis=-1)
+    return paddle.concat([xy, wh.pow(2) / 12., angle], axis=-1)
+
+
+def rotated_form(a_, b_, angles):
+    cos_a = paddle.cos(angles)
+    sin_a = paddle.sin(angles)
+    a = a_ * paddle.pow(cos_a, 2) + b_ * paddle.pow(sin_a, 2)
+    b = a_ * paddle.pow(sin_a, 2) + b_ * paddle.pow(cos_a, 2)
+    c = (a_ - b_) * cos_a * sin_a
+    return a, b, c
+
+
+def probiou_loss(pred, target, eps=1e-3, mode='l1'):
+    """
+    pred   -> a matrix [N, 5] (x, y, w, h, angle - in radians) containing
+              the predicted boxes; in case of HBB, angle == 0
+    target -> a matrix [N, 5] (x, y, w, h, angle - in radians) containing
+              the target boxes; in case of HBB, angle == 0
+    eps    -> threshold to avoid infinite values
+    mode   -> 'l1' (range [0, 1]) or 'l2' (range [0, inf)), the metrics
+              described in the paper
+    """
+
+    gbboxes1 = gbb_form(pred)
+    gbboxes2 = gbb_form(target)
+
+    x1, y1, a1_, b1_, c1_ = (gbboxes1[:, 0], gbboxes1[:, 1], gbboxes1[:, 2],
+                             gbboxes1[:, 3], gbboxes1[:, 4])
+    x2, y2, a2_, b2_, c2_ = (gbboxes2[:, 0], gbboxes2[:, 1], gbboxes2[:, 2],
+                             gbboxes2[:, 3], gbboxes2[:, 4])
+
+    a1, b1, c1 = rotated_form(a1_, b1_, c1_)
+    a2, b2, c2 = rotated_form(a2_, b2_, c2_)
+
+    t1 = 0.25 * ((a1 + a2) * (paddle.pow(y1 - y2, 2)) + (b1 + b2) *
+                 (paddle.pow(x1 - x2, 2))) + \
+         0.5 * ((c1 + c2) * (x2 - x1) * (y1 - y2))
+    t2 = (a1 + a2) * (b1 + b2) - paddle.pow(c1 + c2, 2)
+    t3_ = (a1 * b1 - c1 * c1) * (a2 * b2 - c2 * c2)
+    t3 = 0.5 * paddle.log(t2 / (4 * paddle.sqrt(F.relu(t3_)) + eps))
+
+    B_d = (t1 / t2) + t3
+    # B_d = t1 + t2 + t3
+
+    B_d = paddle.clip(B_d, min=eps, max=100.0)
+    l1 = paddle.sqrt(1.0 - paddle.exp(-B_d) + eps)
+    l_i = paddle.pow(l1, 2.0)
+    l2 = -paddle.log(1.0 - l_i + eps)
+
+    if mode == 'l1':
+        probiou = l1
+    if mode == 'l2':
+        probiou = l2
+
+    return probiou
+
+
+@serializable
+@register
+class ProbIoULoss(object):
+    """ ProbIoU Loss, refer to https://arxiv.org/abs/2106.06072 for details """
+
+    def __init__(self, mode='l1', eps=1e-3):
+        super(ProbIoULoss, self).__init__()
+        self.mode = mode
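+        # 'l1' keeps the Bhattacharyya-distance-based loss in [0, 1];
+        # 'l2' maps it to [0, inf) (see probiou_loss above)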
+ self.eps = eps + + def __call__(self, pred_rboxes, assigned_rboxes): + return probiou_loss(pred_rboxes, assigned_rboxes, self.eps, self.mode) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/queryinst_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/queryinst_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..640b9b4102de867cab6852ee1220971c4ef3a405 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/queryinst_loss.py @@ -0,0 +1,175 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ppdet.modeling.losses.iou_loss import GIoULoss +from .sparsercnn_loss import HungarianMatcher + +__all__ = ['QueryInstLoss'] + + +@register +class QueryInstLoss(object): + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + focal_loss_alpha=0.25, + focal_loss_gamma=2.0, + class_weight=2.0, + l1_weight=5.0, + giou_weight=2.0, + mask_weight=8.0): + super(QueryInstLoss, self).__init__() + + self.num_classes = num_classes + self.focal_loss_alpha = focal_loss_alpha + self.focal_loss_gamma = focal_loss_gamma + self.loss_weights = { + "loss_cls": class_weight, + "loss_bbox": l1_weight, + "loss_giou": giou_weight, + "loss_mask": mask_weight + } + self.giou_loss = GIoULoss(eps=1e-6, reduction='sum') + + self.matcher = HungarianMatcher(focal_loss_alpha, focal_loss_gamma, + class_weight, l1_weight, giou_weight) + + def loss_classes(self, class_logits, targets, indices, avg_factor): + tgt_labels = paddle.full( + class_logits.shape[:2], self.num_classes, dtype='int32') + + if sum(len(v['labels']) for v in targets) > 0: + tgt_classes = paddle.concat([ + paddle.gather( + tgt['labels'], tgt_idx, axis=0) + for tgt, (_, tgt_idx) in zip(targets, indices) + ]) + batch_idx, src_idx = self._get_src_permutation_idx(indices) + for i, (batch_i, src_i) in enumerate(zip(batch_idx, src_idx)): + tgt_labels[int(batch_i), int(src_i)] = tgt_classes[i] + + tgt_labels = tgt_labels.flatten(0, 1).unsqueeze(-1) + + tgt_labels_onehot = paddle.cast( + tgt_labels == paddle.arange(0, self.num_classes), dtype='float32') + tgt_labels_onehot.stop_gradient = True + + src_logits = class_logits.flatten(0, 1) + + loss_cls = F.sigmoid_focal_loss( + src_logits, + tgt_labels_onehot, + alpha=self.focal_loss_alpha, + gamma=self.focal_loss_gamma, + reduction='sum') / avg_factor + losses = {'loss_cls': loss_cls * self.loss_weights['loss_cls']} + return losses + + def loss_bboxes(self, bbox_pred, targets, indices, avg_factor): + bboxes = paddle.concat([ + paddle.gather( + src, src_idx, axis=0) + for src, (src_idx, _) in zip(bbox_pred, indices) + ]) + + tgt_bboxes = paddle.concat([ + paddle.gather( + tgt['boxes'], tgt_idx, axis=0) + for tgt, (_, tgt_idx) in zip(targets, indices) + ]) + 
tgt_bboxes.stop_gradient = True + + im_shapes = paddle.concat([tgt['img_whwh_tgt'] for tgt in targets]) + bboxes_norm = bboxes / im_shapes + tgt_bboxes_norm = tgt_bboxes / im_shapes + + loss_giou = self.giou_loss(bboxes, tgt_bboxes) / avg_factor + loss_bbox = F.l1_loss( + bboxes_norm, tgt_bboxes_norm, reduction='sum') / avg_factor + losses = { + 'loss_bbox': loss_bbox * self.loss_weights['loss_bbox'], + 'loss_giou': loss_giou * self.loss_weights['loss_giou'] + } + return losses + + def loss_masks(self, pos_bbox_pred, mask_logits, targets, indices, + avg_factor): + tgt_segm = [ + paddle.gather( + tgt['gt_segm'], tgt_idx, axis=0) + for tgt, (_, tgt_idx) in zip(targets, indices) + ] + + tgt_masks = [] + for i in range(len(indices)): + gt_segm = tgt_segm[i].unsqueeze(1) + if len(gt_segm) == 0: + continue + boxes = pos_bbox_pred[i] + boxes[:, 0::2] = paddle.clip( + boxes[:, 0::2], min=0, max=gt_segm.shape[3]) + boxes[:, 1::2] = paddle.clip( + boxes[:, 1::2], min=0, max=gt_segm.shape[2]) + boxes_num = paddle.to_tensor([1] * len(boxes), dtype='int32') + gt_mask = paddle.vision.ops.roi_align( + gt_segm, + boxes, + boxes_num, + output_size=mask_logits.shape[-2:], + aligned=True) + tgt_masks.append(gt_mask) + tgt_masks = paddle.concat(tgt_masks).squeeze(1) + tgt_masks = paddle.cast(tgt_masks >= 0.5, dtype='float32') + tgt_masks.stop_gradient = True + + tgt_labels = paddle.concat([ + paddle.gather( + tgt['labels'], tgt_idx, axis=0) + for tgt, (_, tgt_idx) in zip(targets, indices) + ]) + + mask_label = F.one_hot(tgt_labels, self.num_classes).unsqueeze([2, 3]) + mask_label = paddle.expand_as(mask_label, mask_logits) + mask_label.stop_gradient = True + + src_masks = paddle.gather_nd(mask_logits, paddle.nonzero(mask_label)) + shape = mask_logits.shape + src_masks = paddle.reshape(src_masks, [shape[0], shape[2], shape[3]]) + src_masks = F.sigmoid(src_masks) + + X = src_masks.flatten(1) + Y = tgt_masks.flatten(1) + inter = paddle.sum(X * Y, 1) + union = paddle.sum(X * X, 1) + paddle.sum(Y * Y, 1) + dice = (2 * inter) / (union + 2e-5) + + loss_mask = (1 - dice).sum() / avg_factor + losses = {'loss_mask': loss_mask * self.loss_weights['loss_mask']} + return losses + + @staticmethod + def _get_src_permutation_idx(indices): + batch_idx = paddle.concat( + [paddle.full_like(src, i) for i, (src, _) in enumerate(indices)]) + src_idx = paddle.concat([src for (src, _) in indices]) + return batch_idx, src_idx diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/smooth_l1_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/smooth_l1_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..f89c28f664ba4ffe2f6d7fcdaa2ce582a15c4a80 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/smooth_l1_loss.py @@ -0,0 +1,60 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register + +__all__ = ['SmoothL1Loss'] + +@register +class SmoothL1Loss(nn.Layer): + """Smooth L1 Loss. + Args: + beta (float): controls smooth region, it becomes L1 Loss when beta=0.0 + loss_weight (float): the final loss will be multiplied by this + """ + def __init__(self, + beta=1.0, + loss_weight=1.0): + super(SmoothL1Loss, self).__init__() + assert beta >= 0 + self.beta = beta + self.loss_weight = loss_weight + + def forward(self, pred, target, reduction='none'): + """forward function, based on fvcore. + Args: + pred (Tensor): prediction tensor + target (Tensor): target tensor, pred.shape must be the same as target.shape + reduction (str): the way to reduce loss, one of (none, sum, mean) + """ + assert reduction in ('none', 'sum', 'mean') + target = target.detach() + if self.beta < 1e-5: + loss = paddle.abs(pred - target) + else: + n = paddle.abs(pred - target) + cond = n < self.beta + loss = paddle.where(cond, 0.5 * n ** 2 / self.beta, n - 0.5 * self.beta) + if reduction == 'mean': + loss = loss.mean() if loss.size > 0 else 0.0 * loss.sum() + elif reduction == 'sum': + loss = loss.sum() + return loss * self.loss_weight diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/solov2_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/solov2_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..ef97a7707b159cbf9fc2acba42dab58de43721b9 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/solov2_loss.py @@ -0,0 +1,101 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable + +__all__ = ['SOLOv2Loss'] + + +@register +@serializable +class SOLOv2Loss(object): + """ + SOLOv2Loss + Args: + ins_loss_weight (float): Weight of instance loss. + focal_loss_gamma (float): Gamma parameter for focal loss. + focal_loss_alpha (float): Alpha parameter for focal loss. 
+ """ + + def __init__(self, + ins_loss_weight=3.0, + focal_loss_gamma=2.0, + focal_loss_alpha=0.25): + self.ins_loss_weight = ins_loss_weight + self.focal_loss_gamma = focal_loss_gamma + self.focal_loss_alpha = focal_loss_alpha + + def _dice_loss(self, input, target): + input = paddle.reshape(input, shape=(paddle.shape(input)[0], -1)) + target = paddle.reshape(target, shape=(paddle.shape(target)[0], -1)) + a = paddle.sum(input * target, axis=1) + b = paddle.sum(input * input, axis=1) + 0.001 + c = paddle.sum(target * target, axis=1) + 0.001 + d = (2 * a) / (b + c) + return 1 - d + + def __call__(self, ins_pred_list, ins_label_list, cate_preds, cate_labels, + num_ins): + """ + Get loss of network of SOLOv2. + Args: + ins_pred_list (list): Variable list of instance branch output. + ins_label_list (list): List of instance labels pre batch. + cate_preds (list): Concat Variable list of categroy branch output. + cate_labels (list): Concat list of categroy labels pre batch. + num_ins (int): Number of positive samples in a mini-batch. + Returns: + loss_ins (Variable): The instance loss Variable of SOLOv2 network. + loss_cate (Variable): The category loss Variable of SOLOv2 network. + """ + + #1. Ues dice_loss to calculate instance loss + loss_ins = [] + total_weights = paddle.zeros(shape=[1], dtype='float32') + for input, target in zip(ins_pred_list, ins_label_list): + if input is None: + continue + target = paddle.cast(target, 'float32') + target = paddle.reshape( + target, + shape=[-1, paddle.shape(input)[-2], paddle.shape(input)[-1]]) + weights = paddle.cast( + paddle.sum(target, axis=[1, 2]) > 0, 'float32') + input = F.sigmoid(input) + dice_out = paddle.multiply(self._dice_loss(input, target), weights) + total_weights += paddle.sum(weights) + loss_ins.append(dice_out) + loss_ins = paddle.sum(paddle.concat(loss_ins)) / total_weights + loss_ins = loss_ins * self.ins_loss_weight + + #2. Ues sigmoid_focal_loss to calculate category loss + # expand onehot labels + num_classes = cate_preds.shape[-1] + cate_labels_bin = F.one_hot(cate_labels, num_classes=num_classes + 1) + cate_labels_bin = cate_labels_bin[:, 1:] + + loss_cate = F.sigmoid_focal_loss( + cate_preds, + label=cate_labels_bin, + normalizer=num_ins + 1., + gamma=self.focal_loss_gamma, + alpha=self.focal_loss_alpha) + + return loss_ins, loss_cate diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/sparsercnn_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/sparsercnn_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..ac9eba6feee5d07d722e4524c4d373b51e7834c4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/sparsercnn_loss.py @@ -0,0 +1,430 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+""" +This code is based on https://github.com/PeizeSun/SparseR-CNN/blob/main/projects/SparseRCNN/sparsercnn/loss.py +Ths copyright of PeizeSun/SparseR-CNN is as follows: +MIT License [see LICENSE for details] +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from scipy.optimize import linear_sum_assignment +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.metric import accuracy +from ppdet.core.workspace import register +from ppdet.modeling.losses.iou_loss import GIoULoss + +__all__ = ["SparseRCNNLoss"] + + +@register +class SparseRCNNLoss(nn.Layer): + """ This class computes the loss for SparseRCNN. + The process happens in two steps: + 1) we compute hungarian assignment between ground truth boxes and the outputs of the model + 2) we supervise each pair of matched ground-truth / prediction (supervise class and box) + """ + __shared__ = ['num_classes'] + + def __init__(self, + losses, + focal_loss_alpha, + focal_loss_gamma, + num_classes=80, + class_weight=2., + l1_weight=5., + giou_weight=2.): + """ Create the criterion. + Parameters: + num_classes: number of object categories, omitting the special no-object category + weight_dict: dict containing as key the names of the losses and as values their relative weight. + losses: list of all the losses to be applied. See get_loss for list of available losses. + matcher: module able to compute a matching between targets and proposals + """ + super().__init__() + self.num_classes = num_classes + weight_dict = { + "loss_ce": class_weight, + "loss_bbox": l1_weight, + "loss_giou": giou_weight + } + self.weight_dict = weight_dict + self.losses = losses + self.giou_loss = GIoULoss(reduction="sum") + + self.focal_loss_alpha = focal_loss_alpha + self.focal_loss_gamma = focal_loss_gamma + + self.matcher = HungarianMatcher(focal_loss_alpha, focal_loss_gamma, + class_weight, l1_weight, giou_weight) + + def loss_labels(self, outputs, targets, indices, num_boxes, log=True): + """Classification loss (NLL) + targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes] + """ + assert 'pred_logits' in outputs + src_logits = outputs['pred_logits'] + + idx = self._get_src_permutation_idx(indices) + target_classes_o = paddle.concat([ + paddle.gather( + t["labels"], J, axis=0) for t, (_, J) in zip(targets, indices) + ]) + target_classes = paddle.full( + src_logits.shape[:2], self.num_classes, dtype="int32") + for i, ind in enumerate(zip(idx[0], idx[1])): + target_classes[int(ind[0]), int(ind[1])] = target_classes_o[i] + target_classes.stop_gradient = True + + src_logits = src_logits.flatten(start_axis=0, stop_axis=1) + + # prepare one_hot target. + target_classes = target_classes.flatten(start_axis=0, stop_axis=1) + class_ids = paddle.arange(0, self.num_classes) + labels = (target_classes.unsqueeze(-1) == class_ids).astype("float32") + labels.stop_gradient = True + + # comp focal loss. 
+ class_loss = sigmoid_focal_loss( + src_logits, + labels, + alpha=self.focal_loss_alpha, + gamma=self.focal_loss_gamma, + reduction="sum", ) / num_boxes + losses = {'loss_ce': class_loss} + + if log: + label_acc = target_classes_o.unsqueeze(-1) + src_idx = [src for (src, _) in indices] + + pred_list = [] + for i in range(outputs["pred_logits"].shape[0]): + pred_list.append( + paddle.gather( + outputs["pred_logits"][i], src_idx[i], axis=0)) + + pred = F.sigmoid(paddle.concat(pred_list, axis=0)) + acc = accuracy(pred, label_acc.astype("int64")) + losses["acc"] = acc + + return losses + + def loss_boxes(self, outputs, targets, indices, num_boxes): + """Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss + targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4] + The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size. + """ + assert 'pred_boxes' in outputs # [batch_size, num_proposals, 4] + src_idx = [src for (src, _) in indices] + src_boxes_list = [] + + for i in range(outputs["pred_boxes"].shape[0]): + src_boxes_list.append( + paddle.gather( + outputs["pred_boxes"][i], src_idx[i], axis=0)) + + src_boxes = paddle.concat(src_boxes_list, axis=0) + + target_boxes = paddle.concat( + [ + paddle.gather( + t['boxes'], I, axis=0) + for t, (_, I) in zip(targets, indices) + ], + axis=0) + target_boxes.stop_gradient = True + losses = {} + + losses['loss_giou'] = self.giou_loss(src_boxes, + target_boxes) / num_boxes + + image_size = paddle.concat([v["img_whwh_tgt"] for v in targets]) + src_boxes_ = src_boxes / image_size + target_boxes_ = target_boxes / image_size + + loss_bbox = F.l1_loss(src_boxes_, target_boxes_, reduction='sum') + losses['loss_bbox'] = loss_bbox / num_boxes + + return losses + + def _get_src_permutation_idx(self, indices): + # permute predictions following indices + batch_idx = paddle.concat( + [paddle.full_like(src, i) for i, (src, _) in enumerate(indices)]) + src_idx = paddle.concat([src for (src, _) in indices]) + return batch_idx, src_idx + + def _get_tgt_permutation_idx(self, indices): + # permute targets following indices + batch_idx = paddle.concat( + [paddle.full_like(tgt, i) for i, (_, tgt) in enumerate(indices)]) + tgt_idx = paddle.concat([tgt for (_, tgt) in indices]) + return batch_idx, tgt_idx + + def get_loss(self, loss, outputs, targets, indices, num_boxes, **kwargs): + loss_map = { + 'labels': self.loss_labels, + 'boxes': self.loss_boxes, + } + assert loss in loss_map, f'do you really want to compute {loss} loss?' + return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs) + + def forward(self, outputs, targets): + """ This performs the loss computation. + Parameters: + outputs: dict of tensors, see the output specification of the model for the format + targets: list of dicts, such that len(targets) == batch_size. 
+ The expected keys in each dict depends on the losses applied, see each loss' doc + """ + outputs_without_aux = { + k: v + for k, v in outputs.items() if k != 'aux_outputs' + } + + # Retrieve the matching between the outputs of the last layer and the targets + indices = self.matcher(outputs_without_aux, targets) + + # Compute the average number of target boxes across all nodes, for normalization purposes + num_boxes = sum(len(t["labels"]) for t in targets) + num_boxes = paddle.to_tensor( + [num_boxes], + dtype="float32", + place=next(iter(outputs.values())).place) + + # Compute all the requested losses + losses = {} + for loss in self.losses: + losses.update( + self.get_loss(loss, outputs, targets, indices, num_boxes)) + + # In case of auxiliary losses, we repeat this process with the output of each intermediate layer. + if 'aux_outputs' in outputs: + for i, aux_outputs in enumerate(outputs['aux_outputs']): + indices = self.matcher(aux_outputs, targets) + for loss in self.losses: + kwargs = {} + if loss == 'labels': + # Logging is enabled only for the last layer + kwargs = {'log': False} + l_dict = self.get_loss(loss, aux_outputs, targets, indices, + num_boxes, **kwargs) + + w_dict = {} + for k in l_dict.keys(): + if k in self.weight_dict: + w_dict[k + f'_{i}'] = l_dict[k] * self.weight_dict[ + k] + else: + w_dict[k + f'_{i}'] = l_dict[k] + losses.update(w_dict) + + return losses + + +class HungarianMatcher(nn.Layer): + """This class computes an assignment between the targets and the predictions of the network + For efficiency reasons, the targets don't include the no_object. Because of this, in general, + there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions, + while the others are un-matched (and thus treated as non-objects). + """ + + def __init__(self, + focal_loss_alpha, + focal_loss_gamma, + cost_class: float=1, + cost_bbox: float=1, + cost_giou: float=1): + """Creates the matcher + Params: + cost_class: This is the relative weight of the classification error in the matching cost + cost_bbox: This is the relative weight of the L1 error of the bounding box coordinates in the matching cost + cost_giou: This is the relative weight of the giou loss of the bounding box in the matching cost + """ + super().__init__() + self.cost_class = cost_class + self.cost_bbox = cost_bbox + self.cost_giou = cost_giou + self.focal_loss_alpha = focal_loss_alpha + self.focal_loss_gamma = focal_loss_gamma + assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs cant be 0" + + @paddle.no_grad() + def forward(self, outputs, targets): + """ Performs the matching + Args: + outputs: This is a dict that contains at least these entries: + "pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits + "pred_boxes": Tensor of dim [batch_size, num_queries, 4] with the predicted box coordinates + eg. outputs = {"pred_logits": pred_logits, "pred_boxes": pred_boxes} + targets: This is a list of targets (len(targets) = batch_size), where each target is a dict containing: + "labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth + objects in the target) containing the class labels + "boxes": Tensor of dim [num_target_boxes, 4] containing the target box coordinates + eg. 
targets = [{"labels":labels, "boxes": boxes}, ...,{"labels":labels, "boxes": boxes}] + Returns: + A list of size batch_size, containing tuples of (index_i, index_j) where: + - index_i is the indices of the selected predictions (in order) + - index_j is the indices of the corresponding selected targets (in order) + For each batch element, it holds: + len(index_i) = len(index_j) = min(num_queries, num_target_boxes) + """ + bs, num_queries = outputs["pred_logits"].shape[:2] + + if sum(len(v["labels"]) for v in targets) == 0: + return [(paddle.to_tensor( + [], dtype=paddle.int64), paddle.to_tensor( + [], dtype=paddle.int64)) for _ in range(bs)] + + # We flatten to compute the cost matrices in a batch + out_prob = F.sigmoid(outputs["pred_logits"].flatten( + start_axis=0, stop_axis=1)) + out_bbox = outputs["pred_boxes"].flatten(start_axis=0, stop_axis=1) + + # Also concat the target labels and boxes + tgt_ids = paddle.concat([v["labels"] for v in targets]) + assert (tgt_ids > -1).all() + tgt_bbox = paddle.concat([v["boxes"] for v in targets]) + + # Compute the classification cost. Contrary to the loss, we don't use the NLL, + # but approximate it in 1 - proba[target class]. + # The 1 is a constant that doesn't change the matching, it can be ommitted. + + # Compute the classification cost. + alpha = self.focal_loss_alpha + gamma = self.focal_loss_gamma + + neg_cost_class = (1 - alpha) * (out_prob**gamma) * (-( + 1 - out_prob + 1e-8).log()) + pos_cost_class = alpha * ((1 - out_prob) + **gamma) * (-(out_prob + 1e-8).log()) + + cost_class = paddle.gather( + pos_cost_class, tgt_ids, axis=1) - paddle.gather( + neg_cost_class, tgt_ids, axis=1) + + # Compute the L1 cost between boxes + image_size_out = paddle.concat( + [v["img_whwh"].unsqueeze(0) for v in targets]) + image_size_out = image_size_out.unsqueeze(1).tile( + [1, num_queries, 1]).flatten( + start_axis=0, stop_axis=1) + image_size_tgt = paddle.concat([v["img_whwh_tgt"] for v in targets]) + + out_bbox_ = out_bbox / image_size_out + tgt_bbox_ = tgt_bbox / image_size_tgt + cost_bbox = F.l1_loss( + out_bbox_.unsqueeze(-2), tgt_bbox_, + reduction='none').sum(-1) # [batch_size * num_queries, num_tgts] + + # Compute the giou cost betwen boxes + cost_giou = -get_bboxes_giou(out_bbox, tgt_bbox) + + # Final cost matrix + C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou + C = C.reshape([bs, num_queries, -1]) + + sizes = [len(v["boxes"]) for v in targets] + + indices = [ + linear_sum_assignment(c[i].numpy()) + for i, c in enumerate(C.split(sizes, -1)) + ] + return [(paddle.to_tensor( + i, dtype="int32"), paddle.to_tensor( + j, dtype="int32")) for i, j in indices] + + +def box_area(boxes): + assert (boxes[:, 2:] >= boxes[:, :2]).all() + wh = boxes[:, 2:] - boxes[:, :2] + return wh[:, 0] * wh[:, 1] + + +def boxes_iou(boxes1, boxes2): + ''' + Compute iou + + Args: + boxes1 (paddle.tensor) shape (N, 4) + boxes2 (paddle.tensor) shape (M, 4) + + Return: + (paddle.tensor) shape (N, M) + ''' + area1 = box_area(boxes1) + area2 = box_area(boxes2) + + lt = paddle.maximum(boxes1.unsqueeze(-2)[:, :, :2], boxes2[:, :2]) + rb = paddle.minimum(boxes1.unsqueeze(-2)[:, :, 2:], boxes2[:, 2:]) + + wh = (rb - lt).astype("float32").clip(min=1e-9) + inter = wh[:, :, 0] * wh[:, :, 1] + + union = area1.unsqueeze(-1) + area2 - inter + 1e-9 + + iou = inter / union + return iou, union + + +def get_bboxes_giou(boxes1, boxes2, eps=1e-9): + """calculate the ious of boxes1 and boxes2 + + Args: + boxes1 (Tensor): shape [N, 4] + boxes2 (Tensor): 
shape [M, 4] + eps (float): epsilon to avoid divide by zero + + Return: + ious (Tensor): ious of boxes1 and boxes2, with the shape [N, M] + """ + assert (boxes1[:, 2:] >= boxes1[:, :2]).all() + assert (boxes2[:, 2:] >= boxes2[:, :2]).all() + + iou, union = boxes_iou(boxes1, boxes2) + + lt = paddle.minimum(boxes1.unsqueeze(-2)[:, :, :2], boxes2[:, :2]) + rb = paddle.maximum(boxes1.unsqueeze(-2)[:, :, 2:], boxes2[:, 2:]) + + wh = (rb - lt).astype("float32").clip(min=eps) + enclose_area = wh[:, :, 0] * wh[:, :, 1] + + giou = iou - (enclose_area - union) / enclose_area + + return giou + + +def sigmoid_focal_loss(inputs, targets, alpha, gamma, reduction="sum"): + + assert reduction in ["sum", "mean" + ], f'do not support this {reduction} reduction?' + + p = F.sigmoid(inputs) + ce_loss = F.binary_cross_entropy_with_logits( + inputs, targets, reduction="none") + p_t = p * targets + (1 - p) * (1 - targets) + loss = ce_loss * ((1 - p_t)**gamma) + + if alpha >= 0: + alpha_t = alpha * targets + (1 - alpha) * (1 - targets) + loss = alpha_t * loss + + if reduction == "mean": + loss = loss.mean() + elif reduction == "sum": + loss = loss.sum() + + return loss diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/ssd_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/ssd_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..2ab94f2b5bbf1f31fe47d186a92ac805cdf6daf3 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/ssd_loss.py @@ -0,0 +1,168 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register +from ..bbox_utils import iou_similarity, bbox2delta + +__all__ = ['SSDLoss'] + + +@register +class SSDLoss(nn.Layer): + """ + SSDLoss + + Args: + overlap_threshold (float32, optional): IoU threshold for negative bboxes + and positive bboxes, 0.5 by default. + neg_pos_ratio (float): The ratio of negative samples / positive samples. + loc_loss_weight (float): The weight of loc_loss. + conf_loss_weight (float): The weight of conf_loss. + prior_box_var (list): Variances corresponding to prior box coord, [0.1, + 0.1, 0.2, 0.2] by default. + """ + + def __init__(self, + overlap_threshold=0.5, + neg_pos_ratio=3.0, + loc_loss_weight=1.0, + conf_loss_weight=1.0, + prior_box_var=[0.1, 0.1, 0.2, 0.2]): + super(SSDLoss, self).__init__() + self.overlap_threshold = overlap_threshold + self.neg_pos_ratio = neg_pos_ratio + self.loc_loss_weight = loc_loss_weight + self.conf_loss_weight = conf_loss_weight + self.prior_box_var = [1. 
/ a for a in prior_box_var] + + def _bipartite_match_for_batch(self, gt_bbox, gt_label, prior_boxes, + bg_index): + """ + Args: + gt_bbox (Tensor): [B, N, 4] + gt_label (Tensor): [B, N, 1] + prior_boxes (Tensor): [A, 4] + bg_index (int): Background class index + """ + batch_size, num_priors = gt_bbox.shape[0], prior_boxes.shape[0] + ious = iou_similarity(gt_bbox.reshape((-1, 4)), prior_boxes).reshape( + (batch_size, -1, num_priors)) + + # For each prior box, get the max IoU of all GTs. + prior_max_iou, prior_argmax_iou = ious.max(axis=1), ious.argmax(axis=1) + # For each GT, get the max IoU of all prior boxes. + gt_max_iou, gt_argmax_iou = ious.max(axis=2), ious.argmax(axis=2) + + # Gather target bbox and label according to 'prior_argmax_iou' index. + batch_ind = paddle.arange(end=batch_size, dtype='int64').unsqueeze(-1) + prior_argmax_iou = paddle.stack( + [batch_ind.tile([1, num_priors]), prior_argmax_iou], axis=-1) + targets_bbox = paddle.gather_nd(gt_bbox, prior_argmax_iou) + targets_label = paddle.gather_nd(gt_label, prior_argmax_iou) + # Assign negative + bg_index_tensor = paddle.full([batch_size, num_priors, 1], bg_index, + 'int64') + targets_label = paddle.where( + prior_max_iou.unsqueeze(-1) < self.overlap_threshold, + bg_index_tensor, targets_label) + + # Ensure each GT can match the max IoU prior box. + batch_ind = (batch_ind * num_priors + gt_argmax_iou).flatten() + targets_bbox = paddle.scatter( + targets_bbox.reshape([-1, 4]), batch_ind, + gt_bbox.reshape([-1, 4])).reshape([batch_size, -1, 4]) + targets_label = paddle.scatter( + targets_label.reshape([-1, 1]), batch_ind, + gt_label.reshape([-1, 1])).reshape([batch_size, -1, 1]) + targets_label[:, :1] = bg_index + + # Encode box + prior_boxes = prior_boxes.unsqueeze(0).tile([batch_size, 1, 1]) + targets_bbox = bbox2delta( + prior_boxes.reshape([-1, 4]), + targets_bbox.reshape([-1, 4]), self.prior_box_var) + targets_bbox = targets_bbox.reshape([batch_size, -1, 4]) + + return targets_bbox, targets_label + + def _mine_hard_example(self, + conf_loss, + targets_label, + bg_index, + mine_neg_ratio=0.01): + pos = (targets_label != bg_index).astype(conf_loss.dtype) + num_pos = pos.sum(axis=1, keepdim=True) + neg = (targets_label == bg_index).astype(conf_loss.dtype) + + conf_loss = conf_loss.detach() * neg + loss_idx = conf_loss.argsort(axis=1, descending=True) + idx_rank = loss_idx.argsort(axis=1) + num_negs = [] + for i in range(conf_loss.shape[0]): + cur_num_pos = num_pos[i] + num_neg = paddle.clip( + cur_num_pos * self.neg_pos_ratio, max=pos.shape[1]) + num_neg = num_neg if num_neg > 0 else paddle.to_tensor( + [pos.shape[1] * mine_neg_ratio]) + num_negs.append(num_neg) + num_negs = paddle.stack(num_negs).expand_as(idx_rank) + neg_mask = (idx_rank < num_negs).astype(conf_loss.dtype) + + return (neg_mask + pos).astype('bool') + + def forward(self, boxes, scores, gt_bbox, gt_label, prior_boxes): + boxes = paddle.concat(boxes, axis=1) + scores = paddle.concat(scores, axis=1) + gt_label = gt_label.unsqueeze(-1).astype('int64') + prior_boxes = paddle.concat(prior_boxes, axis=0) + bg_index = scores.shape[-1] - 1 + + # Match bbox and get targets. + targets_bbox, targets_label = \ + self._bipartite_match_for_batch(gt_bbox, gt_label, prior_boxes, bg_index) + targets_bbox.stop_gradient = True + targets_label.stop_gradient = True + + # Compute regression loss. + # Select positive samples. 
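# (Editor's note, not part of the diff:) `targets_label != bg_index` has shape
# [B, num_priors, 1]; tiling it to [B, num_priors, 4] lets masked_select pull
# out all four coordinates of every positive prior in one call.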
+ bbox_mask = paddle.tile(targets_label != bg_index, [1, 1, 4]) + if bbox_mask.astype(boxes.dtype).sum() > 0: + location = paddle.masked_select(boxes, bbox_mask) + targets_bbox = paddle.masked_select(targets_bbox, bbox_mask) + loc_loss = F.smooth_l1_loss(location, targets_bbox, reduction='sum') + loc_loss = loc_loss * self.loc_loss_weight + else: + loc_loss = paddle.zeros([1]) + + # Compute confidence loss. + conf_loss = F.cross_entropy(scores, targets_label, reduction="none") + # Mining hard examples. + label_mask = self._mine_hard_example( + conf_loss.squeeze(-1), targets_label.squeeze(-1), bg_index) + conf_loss = paddle.masked_select(conf_loss, label_mask.unsqueeze(-1)) + conf_loss = conf_loss.sum() * self.conf_loss_weight + + # Compute overall weighted loss. + normalizer = (targets_label != bg_index).astype('float32').sum().clip( + min=1) + loss = (conf_loss + loc_loss) / normalizer + + return loss diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/supcontrast.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/supcontrast.py new file mode 100644 index 0000000000000000000000000000000000000000..3e59f08124fc0acb974ee57d70d5a489e9cd5312 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/supcontrast.py @@ -0,0 +1,83 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
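# --- Editor's note: the following standalone sketch is not part of the diff. ---
# It isolates the argsort-of-argsort trick used by SSDLoss._mine_hard_example
# above: each prior's loss is ranked, and only negatives whose rank falls below
# neg_pos_ratio * num_pos are kept. The numbers below are made up.
import numpy as np

conf_loss = np.array([[0.2, 0.9, 0.1, 0.7, 0.4]])  # per-prior loss, one image
num_pos, neg_pos_ratio = 1, 3
loss_idx = np.argsort(-conf_loss, axis=1)  # prior indices, hardest first
idx_rank = np.argsort(loss_idx, axis=1)    # rank of each prior's loss
neg_mask = idx_rank < num_pos * neg_pos_ratio
print(neg_mask)  # [[False  True False  True  True]]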
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +import random +from ppdet.core.workspace import register + + +__all__ = ['SupContrast'] + + +@register +class SupContrast(nn.Layer): + __shared__ = [ + 'num_classes' + ] + def __init__(self, num_classes=80, temperature=2.5, sample_num=4096, thresh=0.75): + super(SupContrast, self).__init__() + self.num_classes = num_classes + self.temperature = temperature + self.sample_num = sample_num + self.thresh = thresh + def forward(self, features, labels, scores): + + assert features.shape[0] == labels.shape[0] == scores.shape[0] + positive_mask = (labels < self.num_classes) + positive_features, positive_labels, positive_scores = features[positive_mask], labels[positive_mask], \ + scores[positive_mask] + + negative_mask = (labels == self.num_classes) + negative_features, negative_labels, negative_scores = features[negative_mask], labels[negative_mask], \ + scores[negative_mask] + + N = negative_features.shape[0] + S = self.sample_num - positive_mask.sum() + index = paddle.to_tensor(random.sample(range(N), int(S)), dtype='int32') + + negative_features = paddle.index_select(x=negative_features, index=index, axis=0) + negative_labels = paddle.index_select(x=negative_labels, index=index, axis=0) + negative_scores = paddle.index_select(x=negative_scores, index=index, axis=0) + + features = paddle.concat([positive_features, negative_features], 0) + labels = paddle.concat([positive_labels, negative_labels], 0) + scores = paddle.concat([positive_scores, negative_scores], 0) + + if len(labels.shape) == 1: + labels = labels.reshape([-1, 1]) + label_mask = paddle.equal(labels, labels.T).detach() + similarity = (paddle.matmul(features, features.T) / self.temperature) + + sim_row_max = paddle.max(similarity, axis=1, keepdim=True) + similarity = similarity - sim_row_max + + logits_mask = paddle.ones_like(similarity).detach() + logits_mask.fill_diagonal_(0) + + exp_sim = paddle.exp(similarity) * logits_mask + log_prob = similarity - paddle.log(exp_sim.sum(axis=1, keepdim=True)) + + per_label_log_prob = (log_prob * logits_mask * label_mask).sum(1) / label_mask.sum(1) + keep = scores > self.thresh + per_label_log_prob = per_label_log_prob[keep] + loss = -per_label_log_prob + + return loss.mean() \ No newline at end of file diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/varifocal_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/varifocal_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..42d18a659e824f8e1aacbe2b4bdd1b0c9b6bbf04 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/varifocal_loss.py @@ -0,0 +1,152 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
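# --- Editor's note: the following standalone sketch is not part of the diff. ---
# It condenses the core of the SupContrast loss above: a temperature-scaled
# similarity matrix, a same-label mask, and an InfoNCE-style log-probability
# averaged over same-label pairs. The score thresholding and negative sampling
# of the original are omitted for brevity.
import paddle

def supcon_core(features, labels, temperature=2.5):
    sim = paddle.matmul(features, features.T) / temperature
    sim = sim - sim.max(axis=1, keepdim=True)  # stabilize the exponentials
    label_mask = (labels.reshape([-1, 1]) == labels.reshape([1, -1])).astype('float32')
    logits_mask = 1.0 - paddle.eye(features.shape[0])  # drop i == i pairs
    exp_sim = paddle.exp(sim) * logits_mask
    log_prob = sim - paddle.log(exp_sim.sum(axis=1, keepdim=True))
    per_label = (log_prob * logits_mask * label_mask).sum(1) / label_mask.sum(1)
    return -per_label.mean()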
+ +# The code is based on: +# https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/losses/varifocal_loss.py + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from ppdet.modeling import ops + +__all__ = ['VarifocalLoss'] + + +def varifocal_loss(pred, + target, + alpha=0.75, + gamma=2.0, + iou_weighted=True, + use_sigmoid=True): + """`Varifocal Loss <https://arxiv.org/abs/2008.13367>`_ + + Args: + pred (Tensor): The prediction with shape (N, C), C is the + number of classes + target (Tensor): The learning target of the iou-aware + classification score with shape (N, C), C is the number of classes. + alpha (float, optional): A balance factor for the negative part of + Varifocal Loss, which is different from the alpha of Focal Loss. + Defaults to 0.75. + gamma (float, optional): The gamma for calculating the modulating + factor. Defaults to 2.0. + iou_weighted (bool, optional): Whether to weight the loss of the + positive example with the iou target. Defaults to True. + """ + # pred and target should be of the same size + assert pred.shape == target.shape + if use_sigmoid: + pred_new = F.sigmoid(pred) + else: + pred_new = pred + target = target.cast(pred.dtype) + if iou_weighted: + focal_weight = target * (target > 0.0).cast('float32') + \ + alpha * (pred_new - target).abs().pow(gamma) * \ + (target <= 0.0).cast('float32') + else: + focal_weight = (target > 0.0).cast('float32') + \ + alpha * (pred_new - target).abs().pow(gamma) * \ + (target <= 0.0).cast('float32') + + if use_sigmoid: + loss = F.binary_cross_entropy_with_logits( + pred, target, reduction='none') * focal_weight + else: + loss = F.binary_cross_entropy( + pred, target, reduction='none') * focal_weight + loss = loss.sum(axis=1) + return loss + + +@register +@serializable +class VarifocalLoss(nn.Layer): + def __init__(self, + use_sigmoid=True, + alpha=0.75, + gamma=2.0, + iou_weighted=True, + reduction='mean', + loss_weight=1.0): + """`Varifocal Loss <https://arxiv.org/abs/2008.13367>`_ + + Args: + use_sigmoid (bool, optional): Whether the prediction is + used for sigmoid or softmax. Defaults to True. + alpha (float, optional): A balance factor for the negative part of + Varifocal Loss, which is different from the alpha of Focal + Loss. Defaults to 0.75. + gamma (float, optional): The gamma for calculating the modulating + factor. Defaults to 2.0. + iou_weighted (bool, optional): Whether to weight the loss of the + positive examples with the iou target. Defaults to True. + reduction (str, optional): The method used to reduce the loss into + a scalar. Defaults to 'mean'. Options are "none", "mean" and + "sum". + loss_weight (float, optional): Weight of loss. Defaults to 1.0. + """ + super(VarifocalLoss, self).__init__() + assert alpha >= 0.0 + self.use_sigmoid = use_sigmoid + self.alpha = alpha + self.gamma = gamma + self.iou_weighted = iou_weighted + self.reduction = reduction + self.loss_weight = loss_weight + + def forward(self, pred, target, weight=None, avg_factor=None): + """Forward function. + + Args: + pred (Tensor): The prediction. + target (Tensor): The learning target of the prediction. + weight (Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None.
+ Returns: + Tensor: The calculated loss + """ + loss = self.loss_weight * varifocal_loss( + pred, + target, + alpha=self.alpha, + gamma=self.gamma, + iou_weighted=self.iou_weighted, + use_sigmoid=self.use_sigmoid) + + if weight is not None: + loss = loss * weight + if avg_factor is None: + if self.reduction == 'none': + return loss + elif self.reduction == 'mean': + return loss.mean() + elif self.reduction == 'sum': + return loss.sum() + else: + # if reduction is mean, then average the loss by avg_factor + if self.reduction == 'mean': + loss = loss.sum() / avg_factor + # if reduction is 'none', then do nothing, otherwise raise an error + elif self.reduction != 'none': + raise ValueError( + 'avg_factor can not be used with reduction="sum"') + return loss diff --git a/PaddleDetection-release-2.6/ppdet/modeling/losses/yolo_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/losses/yolo_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..1ba05f2c8eae530e44e20d21375f7cf9b9cd1fb0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/losses/yolo_loss.py @@ -0,0 +1,206 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register + +from ..bbox_utils import decode_yolo, xywh2xyxy, batch_iou_similarity + +__all__ = ['YOLOv3Loss'] + + +def bbox_transform(pbox, anchor, downsample): + pbox = decode_yolo(pbox, anchor, downsample) + pbox = xywh2xyxy(pbox) + return pbox + + +@register +class YOLOv3Loss(nn.Layer): + + __inject__ = ['iou_loss', 'iou_aware_loss'] + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + ignore_thresh=0.7, + label_smooth=False, + downsample=[32, 16, 8], + scale_x_y=1., + iou_loss=None, + iou_aware_loss=None): + """ + YOLOv3Loss layer + + Args: + num_classes (int): number of foreground classes + ignore_thresh (float): threshold to ignore confidence loss + label_smooth (bool): whether to use label smoothing + downsample (list): downsample ratio for each detection block + scale_x_y (float): scale_x_y factor + iou_loss (object): IoULoss instance + iou_aware_loss (object): IouAwareLoss instance + """ + super(YOLOv3Loss, self).__init__() + self.num_classes = num_classes + self.ignore_thresh = ignore_thresh + self.label_smooth = label_smooth + self.downsample = downsample + self.scale_x_y = scale_x_y + self.iou_loss = iou_loss + self.iou_aware_loss = iou_aware_loss + self.distill_pairs = [] + + def obj_loss(self, pbox, gbox, pobj, tobj, anchor, downsample): + # pbox + pbox = decode_yolo(pbox, anchor, downsample) + pbox = xywh2xyxy(pbox) + pbox = paddle.concat(pbox, axis=-1) + b = pbox.shape[0] + pbox = pbox.reshape((b, -1, 4)) + # gbox + gxy = gbox[:, :, 0:2] - gbox[:, :, 2:4] * 0.5 + gwh = gbox[:, :, 0:2] + gbox[:, :, 2:4] * 0.5 + gbox =
paddle.concat([gxy, gwh], axis=-1) + + iou = batch_iou_similarity(pbox, gbox) + iou.stop_gradient = True + iou_max = iou.max(2) # [N, M1] + iou_mask = paddle.cast(iou_max <= self.ignore_thresh, dtype=pbox.dtype) + iou_mask.stop_gradient = True + + pobj = pobj.reshape((b, -1)) + tobj = tobj.reshape((b, -1)) + obj_mask = paddle.cast(tobj > 0, dtype=pbox.dtype) + obj_mask.stop_gradient = True + + loss_obj = F.binary_cross_entropy_with_logits( + pobj, obj_mask, reduction='none') + loss_obj_pos = (loss_obj * tobj) + loss_obj_neg = (loss_obj * (1 - obj_mask) * iou_mask) + return loss_obj_pos + loss_obj_neg + + def cls_loss(self, pcls, tcls): + if self.label_smooth: + delta = min(1. / self.num_classes, 1. / 40) + pos, neg = 1 - delta, delta + # 1 for positive, 0 for negative + tcls = pos * paddle.cast( + tcls > 0., dtype=tcls.dtype) + neg * paddle.cast( + tcls <= 0., dtype=tcls.dtype) + + loss_cls = F.binary_cross_entropy_with_logits( + pcls, tcls, reduction='none') + return loss_cls + + def yolov3_loss(self, p, t, gt_box, anchor, downsample, scale=1., + eps=1e-10): + na = len(anchor) + b, c, h, w = p.shape + if self.iou_aware_loss: + ioup, p = p[:, 0:na, :, :], p[:, na:, :, :] + ioup = ioup.unsqueeze(-1) + p = p.reshape((b, na, -1, h, w)).transpose((0, 1, 3, 4, 2)) + x, y = p[:, :, :, :, 0:1], p[:, :, :, :, 1:2] + w, h = p[:, :, :, :, 2:3], p[:, :, :, :, 3:4] + obj, pcls = p[:, :, :, :, 4:5], p[:, :, :, :, 5:] + self.distill_pairs.append([x, y, w, h, obj, pcls]) + + t = t.transpose((0, 1, 3, 4, 2)) + tx, ty = t[:, :, :, :, 0:1], t[:, :, :, :, 1:2] + tw, th = t[:, :, :, :, 2:3], t[:, :, :, :, 3:4] + tscale = t[:, :, :, :, 4:5] + tobj, tcls = t[:, :, :, :, 5:6], t[:, :, :, :, 6:] + + tscale_obj = tscale * tobj + loss = dict() + + x = scale * F.sigmoid(x) - 0.5 * (scale - 1.) + y = scale * F.sigmoid(y) - 0.5 * (scale - 1.) + + if abs(scale - 1.) 
< eps: + loss_x = F.binary_cross_entropy(x, tx, reduction='none') + loss_y = F.binary_cross_entropy(y, ty, reduction='none') + loss_xy = tscale_obj * (loss_x + loss_y) + else: + loss_x = paddle.abs(x - tx) + loss_y = paddle.abs(y - ty) + loss_xy = tscale_obj * (loss_x + loss_y) + + loss_xy = loss_xy.sum([1, 2, 3, 4]).mean() + + loss_w = paddle.abs(w - tw) + loss_h = paddle.abs(h - th) + loss_wh = tscale_obj * (loss_w + loss_h) + loss_wh = loss_wh.sum([1, 2, 3, 4]).mean() + + loss['loss_xy'] = loss_xy + loss['loss_wh'] = loss_wh + + if self.iou_loss is not None: + # warn: do not modify x, y, w, h in place + box, tbox = [x, y, w, h], [tx, ty, tw, th] + pbox = bbox_transform(box, anchor, downsample) + gbox = bbox_transform(tbox, anchor, downsample) + loss_iou = self.iou_loss(pbox, gbox) + loss_iou = loss_iou * tscale_obj + loss_iou = loss_iou.sum([1, 2, 3, 4]).mean() + loss['loss_iou'] = loss_iou + + if self.iou_aware_loss is not None: + box, tbox = [x, y, w, h], [tx, ty, tw, th] + pbox = bbox_transform(box, anchor, downsample) + gbox = bbox_transform(tbox, anchor, downsample) + loss_iou_aware = self.iou_aware_loss(ioup, pbox, gbox) + loss_iou_aware = loss_iou_aware * tobj + loss_iou_aware = loss_iou_aware.sum([1, 2, 3, 4]).mean() + loss['loss_iou_aware'] = loss_iou_aware + + box = [x, y, w, h] + loss_obj = self.obj_loss(box, gt_box, obj, tobj, anchor, downsample) + loss_obj = loss_obj.sum(-1).mean() + loss['loss_obj'] = loss_obj + loss_cls = self.cls_loss(pcls, tcls) * tobj + loss_cls = loss_cls.sum([1, 2, 3, 4]).mean() + loss['loss_cls'] = loss_cls + return loss + + def forward(self, inputs, targets, anchors): + np = len(inputs) + gt_targets = [targets['target{}'.format(i)] for i in range(np)] + gt_box = targets['gt_bbox'] + yolo_losses = dict() + self.distill_pairs.clear() + for x, t, anchor, downsample in zip(inputs, gt_targets, anchors, + self.downsample): + yolo_loss = self.yolov3_loss(x, t, gt_box, anchor, downsample, + self.scale_x_y) + for k, v in yolo_loss.items(): + if k in yolo_losses: + yolo_losses[k] += v + else: + yolo_losses[k] = v + + loss = 0 + for k, v in yolo_losses.items(): + loss += v + + yolo_losses['loss'] = loss + return yolo_losses diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..258e4c9010832936f098e6febe777ac556f0668f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/__init__.py @@ -0,0 +1,25 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import matching +from . import tracker +from . import motion +from . import visualization +from . 
import utils + +from .matching import * +from .tracker import * +from .motion import * +from .visualization import * +from .utils import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ccb5f10fef21994f3af3ad13af948e97d63a4f3d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6accc8456e6c04a683dd2de42a363eae133564fe Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/visualization.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/visualization.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..38c2da4722fcb76bf5a3760e4b541f3e8e387192 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/__pycache__/visualization.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f6a88c5673a50452415b1f86f7b18bac12297f49 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__init__.py @@ -0,0 +1,21 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import jde_matching +from . import deepsort_matching +from . 
import ocsort_matching + +from .jde_matching import * +from .deepsort_matching import * +from .ocsort_matching import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c8d6bee57da0cf27405379bf6de30fb1818dd73e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/deepsort_matching.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/deepsort_matching.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5a1fab2945f0e458862e9ba2772901ffd08abc62 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/deepsort_matching.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/jde_matching.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/jde_matching.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c6cc2a7c3cadab6b6e4ea4f6e32c64824ad2c50b Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/jde_matching.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/ocsort_matching.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/ocsort_matching.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6e794689a25885b60bb5c4ec033d53c2347b1862 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/__pycache__/ocsort_matching.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/deepsort_matching.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/deepsort_matching.py new file mode 100644 index 0000000000000000000000000000000000000000..3859ccfbd1f384cc24716a94342230c2c8a2387f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/deepsort_matching.py @@ -0,0 +1,379 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/nwojke/deep_sort/tree/master/deep_sort +""" + +import numpy as np +from scipy.optimize import linear_sum_assignment +from ..motion import kalman_filter + +INFTY_COST = 1e+5 + +__all__ = [ + 'iou_1toN', + 'iou_cost', + '_nn_euclidean_distance', + '_nn_cosine_distance', + 'NearestNeighborDistanceMetric', + 'min_cost_matching', + 'matching_cascade', + 'gate_cost_matrix', +] + + +def iou_1toN(bbox, candidates): + """ + Compute intersection over union (IoU) by one box to N candidates.
+ + Args: + bbox (ndarray): A bounding box in format `(top left x, top left y, width, height)`. + candidates (ndarray): A matrix of candidate bounding boxes (one per row) in the + same format as `bbox`. + + Returns: + ious (ndarray): The intersection over union in [0, 1] between the `bbox` + and each candidate. A higher score means a larger fraction of the + `bbox` is occluded by the candidate. + """ + bbox_tl = bbox[:2] + bbox_br = bbox[:2] + bbox[2:] + candidates_tl = candidates[:, :2] + candidates_br = candidates[:, :2] + candidates[:, 2:] + + tl = np.c_[np.maximum(bbox_tl[0], candidates_tl[:, 0])[:, np.newaxis], + np.maximum(bbox_tl[1], candidates_tl[:, 1])[:, np.newaxis]] + br = np.c_[np.minimum(bbox_br[0], candidates_br[:, 0])[:, np.newaxis], + np.minimum(bbox_br[1], candidates_br[:, 1])[:, np.newaxis]] + wh = np.maximum(0., br - tl) + + area_intersection = wh.prod(axis=1) + area_bbox = bbox[2:].prod() + area_candidates = candidates[:, 2:].prod(axis=1) + ious = area_intersection / (area_bbox + area_candidates - area_intersection) + return ious + + +def iou_cost(tracks, detections, track_indices=None, detection_indices=None): + """ + IoU distance metric. + + Args: + tracks (list[Track]): A list of tracks. + detections (list[Detection]): A list of detections. + track_indices (Optional[list[int]]): A list of indices to tracks that + should be matched. Defaults to all `tracks`. + detection_indices (Optional[list[int]]): A list of indices to detections + that should be matched. Defaults to all `detections`. + + Returns: + cost_matrix (ndarray): A cost matrix of shape len(track_indices), + len(detection_indices) where entry (i, j) is + `1 - iou(tracks[track_indices[i]], detections[detection_indices[j]])`. + """ + if track_indices is None: + track_indices = np.arange(len(tracks)) + if detection_indices is None: + detection_indices = np.arange(len(detections)) + + cost_matrix = np.zeros((len(track_indices), len(detection_indices))) + for row, track_idx in enumerate(track_indices): + if tracks[track_idx].time_since_update > 1: + cost_matrix[row, :] = 1e+5 + continue + + bbox = tracks[track_idx].to_tlwh() + candidates = np.asarray([detections[i].tlwh for i in detection_indices]) + cost_matrix[row, :] = 1. - iou_1toN(bbox, candidates) + return cost_matrix + + +def _nn_euclidean_distance(s, q): + """ + Compute pair-wise squared (Euclidean) distance between points in `s` and `q`. + + Args: + s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M. + q (ndarray): Query points: an LxM matrix of L samples of dimensionality M. + + Returns: + distances (ndarray): A vector of length L that contains for each entry in `q` the + smallest Euclidean distance to a sample in `s`. + """ + s, q = np.asarray(s), np.asarray(q) + if len(s) == 0 or len(q) == 0: + return np.zeros((len(s), len(q))) + s2, q2 = np.square(s).sum(axis=1), np.square(q).sum(axis=1) + distances = -2. * np.dot(s, q.T) + s2[:, None] + q2[None, :] + distances = np.clip(distances, 0., float(np.inf)) + + return np.maximum(0.0, distances.min(axis=0)) + + +def _nn_cosine_distance(s, q): + """ + Compute pair-wise cosine distance between points in `s` and `q`. + + Args: + s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M. + q (ndarray): Query points: an LxM matrix of L samples of dimensionality M. + + Returns: + distances (ndarray): A vector of length L that contains for each entry in `q` the + smallest cosine distance to a sample in `s`.
+ """ + s = np.asarray(s) / np.linalg.norm(s, axis=1, keepdims=True) + q = np.asarray(q) / np.linalg.norm(q, axis=1, keepdims=True) + distances = 1. - np.dot(s, q.T) + + return distances.min(axis=0) + + +class NearestNeighborDistanceMetric(object): + """ + A nearest neighbor distance metric that, for each target, returns + the closest distance to any sample that has been observed so far. + + Args: + metric (str): Either "euclidean" or "cosine". + matching_threshold (float): The matching threshold. Samples with larger + distance are considered an invalid match. + budget (Optional[int]): If not None, fix samples per class to at most + this number. Removes the oldest samples when the budget is reached. + + Attributes: + samples (Dict[int -> List[ndarray]]): A dictionary that maps from target + identities to the list of samples that have been observed so far. + """ + + def __init__(self, metric, matching_threshold, budget=None): + if metric == "euclidean": + self._metric = _nn_euclidean_distance + elif metric == "cosine": + self._metric = _nn_cosine_distance + else: + raise ValueError( + "Invalid metric; must be either 'euclidean' or 'cosine'") + self.matching_threshold = matching_threshold + self.budget = budget + self.samples = {} + + def partial_fit(self, features, targets, active_targets): + """ + Update the distance metric with new data. + + Args: + features (ndarray): An NxM matrix of N features of dimensionality M. + targets (ndarray): An integer array of associated target identities. + active_targets (List[int]): A list of targets that are currently + present in the scene. + """ + for feature, target in zip(features, targets): + self.samples.setdefault(target, []).append(feature) + if self.budget is not None: + self.samples[target] = self.samples[target][-self.budget:] + self.samples = {k: self.samples[k] for k in active_targets} + + def distance(self, features, targets): + """ + Compute distance between features and targets. + + Args: + features (ndarray): An NxM matrix of N features of dimensionality M. + targets (list[int]): A list of targets to match the given `features` against. + + Returns: + cost_matrix (ndarray): a cost matrix of shape len(targets), len(features), + where element (i, j) contains the closest squared distance between + `targets[i]` and `features[j]`. + """ + cost_matrix = np.zeros((len(targets), len(features))) + for i, target in enumerate(targets): + cost_matrix[i, :] = self._metric(self.samples[target], features) + return cost_matrix + + +def min_cost_matching(distance_metric, + max_distance, + tracks, + detections, + track_indices=None, + detection_indices=None): + """ + Solve linear assignment problem. + + Args: + distance_metric : + Callable[List[Track], List[Detection], List[int], List[int]) -> ndarray + The distance metric is given a list of tracks and detections as + well as a list of N track indices and M detection indices. The + metric should return the NxM dimensional cost matrix, where element + (i, j) is the association cost between the i-th track in the given + track indices and the j-th detection in the given detection_indices. + max_distance (float): Gating threshold. Associations with cost larger + than this value are disregarded. + tracks (list[Track]): A list of predicted tracks at the current time + step. + detections (list[Detection]): A list of detections at the current time + step. + track_indices (list[int]): List of track indices that maps rows in + `cost_matrix` to tracks in `tracks`. 
+ detection_indices (List[int]): List of detection indices that maps + columns in `cost_matrix` to detections in `detections`. + + Returns: + A tuple (List[(int, int)], List[int], List[int]) with the following + three entries: + * A list of matched track and detection indices. + * A list of unmatched track indices. + * A list of unmatched detection indices. + """ + if track_indices is None: + track_indices = np.arange(len(tracks)) + if detection_indices is None: + detection_indices = np.arange(len(detections)) + + if len(detection_indices) == 0 or len(track_indices) == 0: + return [], track_indices, detection_indices # Nothing to match. + + cost_matrix = distance_metric(tracks, detections, track_indices, + detection_indices) + + cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5 + indices = linear_sum_assignment(cost_matrix) + + matches, unmatched_tracks, unmatched_detections = [], [], [] + for col, detection_idx in enumerate(detection_indices): + if col not in indices[1]: + unmatched_detections.append(detection_idx) + for row, track_idx in enumerate(track_indices): + if row not in indices[0]: + unmatched_tracks.append(track_idx) + for row, col in zip(indices[0], indices[1]): + track_idx = track_indices[row] + detection_idx = detection_indices[col] + if cost_matrix[row, col] > max_distance: + unmatched_tracks.append(track_idx) + unmatched_detections.append(detection_idx) + else: + matches.append((track_idx, detection_idx)) + return matches, unmatched_tracks, unmatched_detections + + +def matching_cascade(distance_metric, + max_distance, + cascade_depth, + tracks, + detections, + track_indices=None, + detection_indices=None): + """ + Run matching cascade. + + Args: + distance_metric : + Callable[List[Track], List[Detection], List[int], List[int]) -> ndarray + The distance metric is given a list of tracks and detections as + well as a list of N track indices and M detection indices. The + metric should return the NxM dimensional cost matrix, where element + (i, j) is the association cost between the i-th track in the given + track indices and the j-th detection in the given detection_indices. + max_distance (float): Gating threshold. Associations with cost larger + than this value are disregarded. + cascade_depth (int): The cascade depth, should be set to the maximum + track age. + tracks (list[Track]): A list of predicted tracks at the current time + step. + detections (list[Detection]): A list of detections at the current time + step. + track_indices (list[int]): List of track indices that maps rows in + `cost_matrix` to tracks in `tracks`. + detection_indices (List[int]): List of detection indices that maps + columns in `cost_matrix` to detections in `detections`. + + Returns: + A tuple (List[(int, int)], List[int], List[int]) with the following + three entries: + * A list of matched track and detection indices. + * A list of unmatched track indices. + * A list of unmatched detection indices.
+ """ + if track_indices is None: + track_indices = list(range(len(tracks))) + if detection_indices is None: + detection_indices = list(range(len(detections))) + + unmatched_detections = detection_indices + matches = [] + for level in range(cascade_depth): + if len(unmatched_detections) == 0: # No detections left + break + + track_indices_l = [ + k for k in track_indices if tracks[k].time_since_update == 1 + level + ] + if len(track_indices_l) == 0: # Nothing to match at this level + continue + + matches_l, _, unmatched_detections = \ + min_cost_matching( + distance_metric, max_distance, tracks, detections, + track_indices_l, unmatched_detections) + matches += matches_l + unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches)) + return matches, unmatched_tracks, unmatched_detections + + +def gate_cost_matrix(kf, + cost_matrix, + tracks, + detections, + track_indices, + detection_indices, + gated_cost=INFTY_COST, + only_position=False): + """ + Invalidate infeasible entries in cost matrix based on the state + distributions obtained by Kalman filtering. + + Args: + kf (object): The Kalman filter. + cost_matrix (ndarray): The NxM dimensional cost matrix, where N is the + number of track indices and M is the number of detection indices, + such that entry (i, j) is the association cost between + `tracks[track_indices[i]]` and `detections[detection_indices[j]]`. + tracks (list[Track]): A list of predicted tracks at the current time + step. + detections (list[Detection]): A list of detections at the current time + step. + track_indices (List[int]): List of track indices that maps rows in + `cost_matrix` to tracks in `tracks`. + detection_indices (List[int]): List of detection indices that maps + columns in `cost_matrix` to detections in `detections`. + gated_cost (Optional[float]): Entries in the cost matrix corresponding + to infeasible associations are set this value. Defaults to a very + large value. + only_position (Optional[bool]): If True, only the x, y position of the + state distribution is considered during gating. Default False. + """ + gating_dim = 2 if only_position else 4 + gating_threshold = kalman_filter.chi2inv95[gating_dim] + measurements = np.asarray( + [detections[i].to_xyah() for i in detection_indices]) + for row, track_idx in enumerate(track_indices): + track = tracks[track_idx] + gating_distance = kf.gating_distance(track.mean, track.covariance, + measurements, only_position) + cost_matrix[row, gating_distance > gating_threshold] = gated_cost + return cost_matrix diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/jde_matching.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/jde_matching.py new file mode 100644 index 0000000000000000000000000000000000000000..ac28f90167f1b98c3193c375bd74536cf109a3ee --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/jde_matching.py @@ -0,0 +1,163 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+""" +This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/matching.py +""" + +try: + import lap +except: + print( + 'Warning: Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' + ) + pass + +import scipy +import numpy as np +from scipy.spatial.distance import cdist +from ..motion import kalman_filter +import warnings +warnings.filterwarnings("ignore") + +__all__ = [ + 'merge_matches', + 'linear_assignment', + 'bbox_ious', + 'iou_distance', + 'embedding_distance', + 'fuse_motion', +] + + +def merge_matches(m1, m2, shape): + O, P, Q = shape + m1 = np.asarray(m1) + m2 = np.asarray(m2) + + M1 = scipy.sparse.coo_matrix( + (np.ones(len(m1)), (m1[:, 0], m1[:, 1])), shape=(O, P)) + M2 = scipy.sparse.coo_matrix( + (np.ones(len(m2)), (m2[:, 0], m2[:, 1])), shape=(P, Q)) + + mask = M1 * M2 + match = mask.nonzero() + match = list(zip(match[0], match[1])) + unmatched_O = tuple(set(range(O)) - set([i for i, j in match])) + unmatched_Q = tuple(set(range(Q)) - set([j for i, j in match])) + + return match, unmatched_O, unmatched_Q + + +def linear_assignment(cost_matrix, thresh): + try: + import lap + except Exception as e: + raise RuntimeError( + 'Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' + ) + if cost_matrix.size == 0: + return np.empty( + (0, 2), dtype=int), tuple(range(cost_matrix.shape[0])), tuple( + range(cost_matrix.shape[1])) + matches, unmatched_a, unmatched_b = [], [], [] + cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh) + for ix, mx in enumerate(x): + if mx >= 0: + matches.append([ix, mx]) + unmatched_a = np.where(x < 0)[0] + unmatched_b = np.where(y < 0)[0] + matches = np.asarray(matches) + return matches, unmatched_a, unmatched_b + + +def bbox_ious(atlbrs, btlbrs): + boxes = np.ascontiguousarray(atlbrs, dtype=np.float32) + query_boxes = np.ascontiguousarray(btlbrs, dtype=np.float32) + N = boxes.shape[0] + K = query_boxes.shape[0] + ious = np.zeros((N, K), dtype=boxes.dtype) + if N * K == 0: + return ious + + for k in range(K): + box_area = ((query_boxes[k, 2] - query_boxes[k, 0] + 1) * + (query_boxes[k, 3] - query_boxes[k, 1] + 1)) + for n in range(N): + iw = (min(boxes[n, 2], query_boxes[k, 2]) - max( + boxes[n, 0], query_boxes[k, 0]) + 1) + if iw > 0: + ih = (min(boxes[n, 3], query_boxes[k, 3]) - max( + boxes[n, 1], query_boxes[k, 1]) + 1) + if ih > 0: + ua = float((boxes[n, 2] - boxes[n, 0] + 1) * (boxes[ + n, 3] - boxes[n, 1] + 1) + box_area - iw * ih) + ious[n, k] = iw * ih / ua + return ious + + +def iou_distance(atracks, btracks): + """ + Compute cost based on IoU between two list[STrack]. + """ + if (len(atracks) > 0 and isinstance(atracks[0], np.ndarray)) or ( + len(btracks) > 0 and isinstance(btracks[0], np.ndarray)): + atlbrs = atracks + btlbrs = btracks + else: + atlbrs = [track.tlbr for track in atracks] + btlbrs = [track.tlbr for track in btracks] + _ious = bbox_ious(atlbrs, btlbrs) + cost_matrix = 1 - _ious + + return cost_matrix + + +def embedding_distance(tracks, detections, metric='euclidean'): + """ + Compute cost based on features between two list[STrack]. 
+ """ + cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float32) + if cost_matrix.size == 0: + return cost_matrix + det_features = np.asarray( + [track.curr_feat for track in detections], dtype=np.float32) + track_features = np.asarray( + [track.smooth_feat for track in tracks], dtype=np.float32) + cost_matrix = np.maximum(0.0, cdist(track_features, det_features, + metric)) # Nomalized features + return cost_matrix + + +def fuse_motion(kf, + cost_matrix, + tracks, + detections, + only_position=False, + lambda_=0.98): + if cost_matrix.size == 0: + return cost_matrix + gating_dim = 2 if only_position else 4 + gating_threshold = kalman_filter.chi2inv95[gating_dim] + measurements = np.asarray([det.to_xyah() for det in detections]) + for row, track in enumerate(tracks): + gating_distance = kf.gating_distance( + track.mean, + track.covariance, + measurements, + only_position, + metric='maha') + cost_matrix[row, gating_distance > gating_threshold] = np.inf + cost_matrix[row] = lambda_ * cost_matrix[row] + (1 - lambda_ + ) * gating_distance + return cost_matrix diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/ocsort_matching.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/ocsort_matching.py new file mode 100644 index 0000000000000000000000000000000000000000..58f79a5c8f2d1547341c5b6818497d4e7043618f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/matching/ocsort_matching.py @@ -0,0 +1,165 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/association.py +""" + +import os +import numpy as np + + +def iou_batch(bboxes1, bboxes2): + bboxes2 = np.expand_dims(bboxes2, 0) + bboxes1 = np.expand_dims(bboxes1, 1) + + xx1 = np.maximum(bboxes1[..., 0], bboxes2[..., 0]) + yy1 = np.maximum(bboxes1[..., 1], bboxes2[..., 1]) + xx2 = np.minimum(bboxes1[..., 2], bboxes2[..., 2]) + yy2 = np.minimum(bboxes1[..., 3], bboxes2[..., 3]) + w = np.maximum(0., xx2 - xx1) + h = np.maximum(0., yy2 - yy1) + area = w * h + iou_matrix = area / ((bboxes1[..., 2] - bboxes1[..., 0]) * + (bboxes1[..., 3] - bboxes1[..., 1]) + + (bboxes2[..., 2] - bboxes2[..., 0]) * + (bboxes2[..., 3] - bboxes2[..., 1]) - area) + return iou_matrix + + +def speed_direction_batch(dets, tracks): + tracks = tracks[..., np.newaxis] + CX1, CY1 = (dets[:, 0] + dets[:, 2]) / 2.0, (dets[:, 1] + dets[:, 3]) / 2.0 + CX2, CY2 = (tracks[:, 0] + tracks[:, 2]) / 2.0, ( + tracks[:, 1] + tracks[:, 3]) / 2.0 + dx = CX1 - CX2 + dy = CY1 - CY2 + norm = np.sqrt(dx**2 + dy**2) + 1e-6 + dx = dx / norm + dy = dy / norm + return dy, dx + + +def linear_assignment(cost_matrix): + try: + import lap + _, x, y = lap.lapjv(cost_matrix, extend_cost=True) + return np.array([[y[i], i] for i in x if i >= 0]) + except ImportError: + from scipy.optimize import linear_sum_assignment + x, y = linear_sum_assignment(cost_matrix) + return np.array(list(zip(x, y))) + + +def associate(detections, trackers, iou_threshold, velocities, previous_obs, + vdc_weight): + if (len(trackers) == 0): + return np.empty( + (0, 2), dtype=int), np.arange(len(detections)), np.empty( + (0, 5), dtype=int) + + Y, X = speed_direction_batch(detections, previous_obs) + inertia_Y, inertia_X = velocities[:, 0], velocities[:, 1] + inertia_Y = np.repeat(inertia_Y[:, np.newaxis], Y.shape[1], axis=1) + inertia_X = np.repeat(inertia_X[:, np.newaxis], X.shape[1], axis=1) + diff_angle_cos = inertia_X * X + inertia_Y * Y + diff_angle_cos = np.clip(diff_angle_cos, a_min=-1, a_max=1) + diff_angle = np.arccos(diff_angle_cos) + diff_angle = (np.pi / 2.0 - np.abs(diff_angle)) / np.pi + + valid_mask = np.ones(previous_obs.shape[0]) + valid_mask[np.where(previous_obs[:, 4] < 0)] = 0 + + iou_matrix = iou_batch(detections, trackers) + scores = np.repeat( + detections[:, -1][:, np.newaxis], trackers.shape[0], axis=1) + # iou_matrix = iou_matrix * scores # a trick sometiems works, we don't encourage this + valid_mask = np.repeat(valid_mask[:, np.newaxis], X.shape[1], axis=1) + + angle_diff_cost = (valid_mask * diff_angle) * vdc_weight + angle_diff_cost = angle_diff_cost.T + angle_diff_cost = angle_diff_cost * scores + + if min(iou_matrix.shape) > 0: + a = (iou_matrix > iou_threshold).astype(np.int32) + if a.sum(1).max() == 1 and a.sum(0).max() == 1: + matched_indices = np.stack(np.where(a), axis=1) + else: + matched_indices = linear_assignment(-(iou_matrix + angle_diff_cost)) + else: + matched_indices = np.empty(shape=(0, 2)) + + unmatched_detections = [] + for d, det in enumerate(detections): + if (d not in matched_indices[:, 0]): + unmatched_detections.append(d) + unmatched_trackers = [] + for t, trk in enumerate(trackers): + if (t not in matched_indices[:, 1]): + unmatched_trackers.append(t) + + # filter out matched with low IOU + matches = [] + for m in matched_indices: + if (iou_matrix[m[0], m[1]] < iou_threshold): + unmatched_detections.append(m[0]) + unmatched_trackers.append(m[1]) + else: + matches.append(m.reshape(1, 2)) + if (len(matches) == 0): + matches = 
np.empty((0, 2), dtype=int) + else: + matches = np.concatenate(matches, axis=0) + + return matches, np.array(unmatched_detections), np.array(unmatched_trackers) + + +def associate_only_iou(detections, trackers, iou_threshold): + if (len(trackers) == 0): + return np.empty( + (0, 2), dtype=int), np.arange(len(detections)), np.empty( + (0, 5), dtype=int) + + iou_matrix = iou_batch(detections, trackers) + + if min(iou_matrix.shape) > 0: + a = (iou_matrix > iou_threshold).astype(np.int32) + if a.sum(1).max() == 1 and a.sum(0).max() == 1: + matched_indices = np.stack(np.where(a), axis=1) + else: + matched_indices = linear_assignment(-iou_matrix) + else: + matched_indices = np.empty(shape=(0, 2)) + + unmatched_detections = [] + for d, det in enumerate(detections): + if (d not in matched_indices[:, 0]): + unmatched_detections.append(d) + unmatched_trackers = [] + for t, trk in enumerate(trackers): + if (t not in matched_indices[:, 1]): + unmatched_trackers.append(t) + + # filter out matched with low IOU + matches = [] + for m in matched_indices: + if (iou_matrix[m[0], m[1]] < iou_threshold): + unmatched_detections.append(m[0]) + unmatched_trackers.append(m[1]) + else: + matches.append(m.reshape(1, 2)) + if (len(matches) == 0): + matches = np.empty((0, 2), dtype=int) + else: + matches = np.concatenate(matches, axis=0) + return matches, np.array(unmatched_detections), np.array(unmatched_trackers) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6d206128e1e797c5f70203d0dec45db4d0aa6728 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__init__.py @@ -0,0 +1,18 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . 
import kalman_filter + +from .kalman_filter import * +from .gmc import * \ No newline at end of file diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..dc20f57a1fa4ce35c3e5bb9bc1b707204ba6f4c1 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/gmc.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/gmc.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..334664779cae8f97db50cfe0525a591fb60a8d3a Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/gmc.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/kalman_filter.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/kalman_filter.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6d49adbbedfc827009141bbfd475c4ff4aad2bf8 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/kalman_filter.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/ocsort_kalman_filter.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/ocsort_kalman_filter.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..962812d14461b54b07038c460f73e9aff0d95c57 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/__pycache__/ocsort_kalman_filter.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/gmc.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/gmc.py new file mode 100644 index 0000000000000000000000000000000000000000..43ec42eea558b9efb9c08b25c9211f1a57f63891 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/gmc.py @@ -0,0 +1,368 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
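Ahead of the camera-motion code, a quick sketch of the inputs `associate` above expects, with invented values: detections carry a trailing score column, `previous_obs` has a fifth column that is negative when no prior observation exists, and `velocities` holds per-track (dy, dx) directions. The thresholds are illustrative only.

```python
import numpy as np
from ppdet.modeling.mot.matching.ocsort_matching import associate

# One tracker and two detections; detections are [x1, y1, x2, y2, score].
dets = np.array([[10., 10., 50., 80., 0.9], [200., 30., 240., 100., 0.6]])
trks = np.array([[12., 12., 52., 82.]])

velocities = np.array([[0., 1.]])                     # per-track (dy, dx)
previous_obs = np.array([[11., 11., 51., 81., 0.9]])  # col 4 < 0 => no prior obs

matches, unmatched_dets, unmatched_trks = associate(
    dets, trks, iou_threshold=0.3, velocities=velocities,
    previous_obs=previous_obs, vdc_weight=0.05)
```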
+""" +This code is based on https://github.com/WWangYuHsiang/SMILEtrack/blob/main/BoT-SORT/tracker/gmc.py +""" + +import cv2 +import matplotlib.pyplot as plt +import numpy as np +import copy +import time +from ppdet.core.workspace import register, serializable + + +@register +@serializable +class GMC: + def __init__(self, method='sparseOptFlow', downscale=2, verbose=None): + super(GMC, self).__init__() + + self.method = method + self.downscale = max(1, int(downscale)) + + if self.method == 'orb': + self.detector = cv2.FastFeatureDetector_create(20) + self.extractor = cv2.ORB_create() + self.matcher = cv2.BFMatcher(cv2.NORM_HAMMING) + + elif self.method == 'sift': + self.detector = cv2.SIFT_create( + nOctaveLayers=3, contrastThreshold=0.02, edgeThreshold=20) + self.extractor = cv2.SIFT_create( + nOctaveLayers=3, contrastThreshold=0.02, edgeThreshold=20) + self.matcher = cv2.BFMatcher(cv2.NORM_L2) + + elif self.method == 'ecc': + number_of_iterations = 5000 + termination_eps = 1e-6 + self.warp_mode = cv2.MOTION_EUCLIDEAN + self.criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, + number_of_iterations, termination_eps) + + elif self.method == 'sparseOptFlow': + self.feature_params = dict( + maxCorners=1000, + qualityLevel=0.01, + minDistance=1, + blockSize=3, + useHarrisDetector=False, + k=0.04) + # self.gmc_file = open('GMC_results.txt', 'w') + + elif self.method == 'file' or self.method == 'files': + seqName = verbose[0] + ablation = verbose[1] + if ablation: + filePath = r'tracker/GMC_files/MOT17_ablation' + else: + filePath = r'tracker/GMC_files/MOTChallenge' + + if '-FRCNN' in seqName: + seqName = seqName[:-6] + elif '-DPM' in seqName: + seqName = seqName[:-4] + elif '-SDP' in seqName: + seqName = seqName[:-4] + + self.gmcFile = open(filePath + "/GMC-" + seqName + ".txt", 'r') + + if self.gmcFile is None: + raise ValueError("Error: Unable to open GMC file in directory:" + + filePath) + elif self.method == 'none' or self.method == 'None': + self.method = 'none' + else: + raise ValueError("Error: Unknown CMC method:" + method) + + self.prevFrame = None + self.prevKeyPoints = None + self.prevDescriptors = None + + self.initializedFirstFrame = False + + def apply(self, raw_frame, detections=None): + if self.method == 'orb' or self.method == 'sift': + return self.applyFeaures(raw_frame, detections) + elif self.method == 'ecc': + return self.applyEcc(raw_frame, detections) + elif self.method == 'sparseOptFlow': + return self.applySparseOptFlow(raw_frame, detections) + elif self.method == 'file': + return self.applyFile(raw_frame, detections) + elif self.method == 'none': + return np.eye(2, 3) + else: + return np.eye(2, 3) + + def applyEcc(self, raw_frame, detections=None): + + # Initialize + height, width, _ = raw_frame.shape + frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY) + H = np.eye(2, 3, dtype=np.float32) + + # Downscale image (TODO: consider using pyramids) + if self.downscale > 1.0: + frame = cv2.GaussianBlur(frame, (3, 3), 1.5) + frame = cv2.resize(frame, (width // self.downscale, + height // self.downscale)) + width = width // self.downscale + height = height // self.downscale + + # Handle first frame + if not self.initializedFirstFrame: + # Initialize data + self.prevFrame = frame.copy() + + # Initialization done + self.initializedFirstFrame = True + + return H + + # Run the ECC algorithm. The results are stored in warp_matrix. 
+ # (cc, H) = cv2.findTransformECC(self.prevFrame, frame, H, self.warp_mode, self.criteria) + try: + (cc, + H) = cv2.findTransformECC(self.prevFrame, frame, H, self.warp_mode, + self.criteria, None, 1) + except: + print('Warning: find transform failed. Set warp as identity') + + return H + + def applyFeaures(self, raw_frame, detections=None): + + # Initialize + height, width, _ = raw_frame.shape + frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY) + H = np.eye(2, 3) + + # Downscale image (TODO: consider using pyramids) + if self.downscale > 1.0: + # frame = cv2.GaussianBlur(frame, (3, 3), 1.5) + frame = cv2.resize(frame, (width // self.downscale, + height // self.downscale)) + width = width // self.downscale + height = height // self.downscale + + # find the keypoints + mask = np.zeros_like(frame) + # mask[int(0.05 * height): int(0.95 * height), int(0.05 * width): int(0.95 * width)] = 255 + mask[int(0.02 * height):int(0.98 * height), int(0.02 * width):int( + 0.98 * width)] = 255 + if detections is not None: + for det in detections: + tlbr = (det[:4] / self.downscale).astype(np.int_) + mask[tlbr[1]:tlbr[3], tlbr[0]:tlbr[2]] = 0 + + keypoints = self.detector.detect(frame, mask) + + # compute the descriptors + keypoints, descriptors = self.extractor.compute(frame, keypoints) + + # Handle first frame + if not self.initializedFirstFrame: + # Initialize data + self.prevFrame = frame.copy() + self.prevKeyPoints = copy.copy(keypoints) + self.prevDescriptors = copy.copy(descriptors) + + # Initialization done + self.initializedFirstFrame = True + + return H + + # Match descriptors. + knnMatches = self.matcher.knnMatch(self.prevDescriptors, descriptors, 2) + + # Filtered matches based on smallest spatial distance + matches = [] + spatialDistances = [] + + maxSpatialDistance = 0.25 * np.array([width, height]) + + # Handle empty matches case + if len(knnMatches) == 0: + # Store to next iteration + self.prevFrame = frame.copy() + self.prevKeyPoints = copy.copy(keypoints) + self.prevDescriptors = copy.copy(descriptors) + + return H + + for m, n in knnMatches: + if m.distance < 0.9 * n.distance: + prevKeyPointLocation = self.prevKeyPoints[m.queryIdx].pt + currKeyPointLocation = keypoints[m.trainIdx].pt + + spatialDistance = ( + prevKeyPointLocation[0] - currKeyPointLocation[0], + prevKeyPointLocation[1] - currKeyPointLocation[1]) + + if (np.abs(spatialDistance[0]) < maxSpatialDistance[0]) and \ + (np.abs(spatialDistance[1]) < maxSpatialDistance[1]): + spatialDistances.append(spatialDistance) + matches.append(m) + + meanSpatialDistances = np.mean(spatialDistances, 0) + stdSpatialDistances = np.std(spatialDistances, 0) + + inliesrs = (spatialDistances - meanSpatialDistances + ) < 2.5 * stdSpatialDistances + + goodMatches = [] + prevPoints = [] + currPoints = [] + for i in range(len(matches)): + if inliesrs[i, 0] and inliesrs[i, 1]: + goodMatches.append(matches[i]) + prevPoints.append(self.prevKeyPoints[matches[i].queryIdx].pt) + currPoints.append(keypoints[matches[i].trainIdx].pt) + + prevPoints = np.array(prevPoints) + currPoints = np.array(currPoints) + + # Draw the keypoint matches on the output image + if 0: + matches_img = np.hstack((self.prevFrame, frame)) + matches_img = cv2.cvtColor(matches_img, cv2.COLOR_GRAY2BGR) + W = np.size(self.prevFrame, 1) + for m in goodMatches: + prev_pt = np.array( + self.prevKeyPoints[m.queryIdx].pt, dtype=np.int_) + curr_pt = np.array(keypoints[m.trainIdx].pt, dtype=np.int_) + curr_pt[0] += W + color = np.random.randint(0, 255, (3, )) + color = (int(color[0]), 
int(color[1]), int(color[2]))
+
+                matches_img = cv2.line(matches_img, prev_pt, curr_pt,
+                                       tuple(color), 1, cv2.LINE_AA)
+                matches_img = cv2.circle(matches_img, prev_pt, 2,
+                                         tuple(color), -1)
+                matches_img = cv2.circle(matches_img, curr_pt, 2,
+                                         tuple(color), -1)
+
+            plt.figure()
+            plt.imshow(matches_img)
+            plt.show()
+
+        # Find rigid matrix (require equal numbers of previous/current points)
+        if (np.size(prevPoints, 0) > 4) and (
+                np.size(prevPoints, 0) == np.size(currPoints, 0)):
+            H, inliers = cv2.estimateAffinePartial2D(prevPoints, currPoints,
+                                                     cv2.RANSAC)
+
+            # Handle downscale
+            if self.downscale > 1.0:
+                H[0, 2] *= self.downscale
+                H[1, 2] *= self.downscale
+        else:
+            print('Warning: not enough matching points')
+
+        # Store to next iteration
+        self.prevFrame = frame.copy()
+        self.prevKeyPoints = copy.copy(keypoints)
+        self.prevDescriptors = copy.copy(descriptors)
+
+        return H
+
+    def applySparseOptFlow(self, raw_frame, detections=None):
+
+        t0 = time.time()
+
+        # Initialize
+        height, width, _ = raw_frame.shape
+        frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY)
+        H = np.eye(2, 3)
+
+        # Downscale image
+        if self.downscale > 1.0:
+            # frame = cv2.GaussianBlur(frame, (3, 3), 1.5)
+            frame = cv2.resize(frame, (width // self.downscale,
+                                       height // self.downscale))
+
+        # find the keypoints
+        keypoints = cv2.goodFeaturesToTrack(
+            frame, mask=None, **self.feature_params)
+
+        # Handle first frame
+        if not self.initializedFirstFrame:
+            # Initialize data
+            self.prevFrame = frame.copy()
+            self.prevKeyPoints = copy.copy(keypoints)
+
+            # Initialization done
+            self.initializedFirstFrame = True
+
+            return H
+
+        if self.prevFrame.shape != frame.shape:
+            self.prevFrame = frame.copy()
+            self.prevKeyPoints = copy.copy(keypoints)
+            return H
+
+        # find correspondences
+        matchedKeypoints, status, err = cv2.calcOpticalFlowPyrLK(
+            self.prevFrame, frame, self.prevKeyPoints, None)
+
+        # leave good correspondences only
+        prevPoints = []
+        currPoints = []
+
+        for i in range(len(status)):
+            if status[i]:
+                prevPoints.append(self.prevKeyPoints[i])
+                currPoints.append(matchedKeypoints[i])
+
+        prevPoints = np.array(prevPoints)
+        currPoints = np.array(currPoints)
+
+        # Find rigid matrix (require equal numbers of previous/current points)
+        if (np.size(prevPoints, 0) > 4) and (
+                np.size(prevPoints, 0) == np.size(currPoints, 0)):
+            H, inliers = cv2.estimateAffinePartial2D(prevPoints, currPoints,
+                                                     cv2.RANSAC)
+
+            # Handle downscale
+            if self.downscale > 1.0:
+                H[0, 2] *= self.downscale
+                H[1, 2] *= self.downscale
+        else:
+            print('Warning: not enough matching points')
+
+        # Store to next iteration
+        self.prevFrame = frame.copy()
+        self.prevKeyPoints = copy.copy(keypoints)
+
+        t1 = time.time()
+
+        # gmc_line = str(1000 * (t1 - t0)) + "\t" + str(H[0, 0]) + "\t" + str(H[0, 1]) + "\t" + str(
+        #     H[0, 2]) + "\t" + str(H[1, 0]) + "\t" + str(H[1, 1]) + "\t" + str(H[1, 2]) + "\n"
+        # self.gmc_file.write(gmc_line)
+
+        return H
+
+    def applyFile(self, raw_frame, detections=None):
+        line = self.gmcFile.readline()
+        tokens = line.split("\t")
+        H = np.eye(2, 3, dtype=np.float_)
+        H[0, 0] = float(tokens[1])
+        H[0, 1] = float(tokens[2])
+        H[0, 2] = float(tokens[3])
+        H[1, 0] = float(tokens[4])
+        H[1, 1] = float(tokens[5])
+        H[1, 2] = float(tokens[6])
+
+        return H
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/kalman_filter.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/kalman_filter.py
new file mode 100644
index 0000000000000000000000000000000000000000..18951aabd6fbdebc69191dea7b07da1dbea8d52c
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/kalman_filter.py
@@ -0,0 +1,316
@@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/nwojke/deep_sort/blob/master/deep_sort/kalman_filter.py +""" + +import numpy as np +import scipy.linalg + +use_numba = True +try: + import numba as nb + + @nb.njit(fastmath=True, cache=True) + def nb_project(mean, covariance, std, _update_mat): + innovation_cov = np.diag(np.square(std)) + mean = np.dot(_update_mat, mean) + covariance = np.dot(np.dot(_update_mat, covariance), _update_mat.T) + return mean, covariance + innovation_cov + + @nb.njit(fastmath=True, cache=True) + def nb_multi_predict(mean, covariance, motion_cov, motion_mat): + mean = np.dot(mean, motion_mat.T) + left = np.dot(motion_mat, covariance) + covariance = np.dot(left, motion_mat.T) + motion_cov + return mean, covariance + + @nb.njit(fastmath=True, cache=True) + def nb_update(mean, covariance, proj_mean, proj_cov, measurement, meas_mat): + kalman_gain = np.linalg.solve(proj_cov, (covariance @meas_mat.T).T).T + innovation = measurement - proj_mean + mean = mean + innovation @kalman_gain.T + covariance = covariance - kalman_gain @proj_cov @kalman_gain.T + return mean, covariance + +except: + use_numba = False + print( + 'Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`' + ) + pass + +__all__ = ['KalmanFilter'] +""" +Table for the 0.95 quantile of the chi-square distribution with N degrees of +freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv +function and used as Mahalanobis gating threshold. +""" + +chi2inv95 = { + 1: 3.8415, + 2: 5.9915, + 3: 7.8147, + 4: 9.4877, + 5: 11.070, + 6: 12.592, + 7: 14.067, + 8: 15.507, + 9: 16.919 +} + + +class KalmanFilter(object): + """ + A simple Kalman filter for tracking bounding boxes in image space. + + The 8-dimensional state space + + x, y, a, h, vx, vy, va, vh + + contains the bounding box center position (x, y), aspect ratio a, height h, + and their respective velocities. + + Object motion follows a constant velocity model. The bounding box location + (x, y, a, h) is taken as direct observation of the state space (linear + observation model). + + """ + + def __init__(self): + ndim, dt = 4, 1. + + # Create Kalman filter model matrices. + self._motion_mat = np.eye(2 * ndim, 2 * ndim, dtype=np.float32) + for i in range(ndim): + self._motion_mat[i, ndim + i] = dt + self._update_mat = np.eye(ndim, 2 * ndim, dtype=np.float32) + + # Motion and observation uncertainty are chosen relative to the current + # state estimate. These weights control the amount of uncertainty in + # the model. This is a bit hacky. + self._std_weight_position = 1. / 20 + self._std_weight_velocity = 1. / 160 + + def initiate(self, measurement): + """ + Create track from unassociated measurement. + + Args: + measurement (ndarray): Bounding box coordinates (x, y, a, h) with + center position (x, y), aspect ratio a, and height h. 
+ + Returns: + The mean vector (8 dimensional) and covariance matrix (8x8 + dimensional) of the new track. Unobserved velocities are + initialized to 0 mean. + """ + mean_pos = measurement + mean_vel = np.zeros_like(mean_pos) + mean = np.r_[mean_pos, mean_vel] + + std = [ + 2 * self._std_weight_position * measurement[3], + 2 * self._std_weight_position * measurement[3], 1e-2, + 2 * self._std_weight_position * measurement[3], + 10 * self._std_weight_velocity * measurement[3], + 10 * self._std_weight_velocity * measurement[3], 1e-5, + 10 * self._std_weight_velocity * measurement[3] + ] + covariance = np.diag(np.square(std)) + return mean, np.float32(covariance) + + def predict(self, mean, covariance): + """ + Run Kalman filter prediction step. + + Args: + mean (ndarray): The 8 dimensional mean vector of the object state + at the previous time step. + covariance (ndarray): The 8x8 dimensional covariance matrix of the + object state at the previous time step. + + Returns: + The mean vector and covariance matrix of the predicted state. + Unobserved velocities are initialized to 0 mean. + """ + std_pos = [ + self._std_weight_position * mean[3], self._std_weight_position * + mean[3], 1e-2, self._std_weight_position * mean[3] + ] + std_vel = [ + self._std_weight_velocity * mean[3], self._std_weight_velocity * + mean[3], 1e-5, self._std_weight_velocity * mean[3] + ] + motion_cov = np.diag(np.square(np.r_[std_pos, std_vel])) + + #mean = np.dot(self._motion_mat, mean) + mean = np.dot(mean, self._motion_mat.T) + covariance = np.linalg.multi_dot( + (self._motion_mat, covariance, self._motion_mat.T)) + motion_cov + + return mean, covariance + + def project(self, mean, covariance): + """ + Project state distribution to measurement space. + + Args + mean (ndarray): The state's mean vector (8 dimensional array). + covariance (ndarray): The state's covariance matrix (8x8 dimensional). + + Returns: + The projected mean and covariance matrix of the given state estimate. + """ + std = np.array( + [ + self._std_weight_position * mean[3], self._std_weight_position * + mean[3], 1e-1, self._std_weight_position * mean[3] + ], + dtype=np.float32) + + if use_numba: + return nb_project(mean, covariance, std, self._update_mat) + + innovation_cov = np.diag(np.square(std)) + + mean = np.dot(self._update_mat, mean) + covariance = np.linalg.multi_dot((self._update_mat, covariance, + self._update_mat.T)) + return mean, covariance + innovation_cov + + def multi_predict(self, mean, covariance): + """ + Run Kalman filter prediction step (Vectorized version). + + Args: + mean (ndarray): The Nx8 dimensional mean matrix of the object states + at the previous time step. + covariance (ndarray): The Nx8x8 dimensional covariance matrics of the + object states at the previous time step. + + Returns: + The mean vector and covariance matrix of the predicted state. + Unobserved velocities are initialized to 0 mean. 
+ """ + std_pos = np.array([ + self._std_weight_position * mean[:, 3], self._std_weight_position * + mean[:, 3], 1e-2 * np.ones_like(mean[:, 3]), + self._std_weight_position * mean[:, 3] + ]) + std_vel = np.array([ + self._std_weight_velocity * mean[:, 3], self._std_weight_velocity * + mean[:, 3], 1e-5 * np.ones_like(mean[:, 3]), + self._std_weight_velocity * mean[:, 3] + ]) + sqr = np.square(np.r_[std_pos, std_vel]).T + + if use_numba: + + means = [] + covariances = [] + for i in range(len(mean)): + a, b = nb_multi_predict(mean[i], covariance[i], + np.diag(sqr[i]), self._motion_mat) + means.append(a) + covariances.append(b) + return np.asarray(means), np.asarray(covariances) + + motion_cov = [] + for i in range(len(mean)): + motion_cov.append(np.diag(sqr[i])) + motion_cov = np.asarray(motion_cov) + + mean = np.dot(mean, self._motion_mat.T) + left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2)) + covariance = np.dot(left, self._motion_mat.T) + motion_cov + + return mean, covariance + + def update(self, mean, covariance, measurement): + """ + Run Kalman filter correction step. + + Args: + mean (ndarray): The predicted state's mean vector (8 dimensional). + covariance (ndarray): The state's covariance matrix (8x8 dimensional). + measurement (ndarray): The 4 dimensional measurement vector + (x, y, a, h), where (x, y) is the center position, a the aspect + ratio, and h the height of the bounding box. + + Returns: + The measurement-corrected state distribution. + """ + projected_mean, projected_cov = self.project(mean, covariance) + + if use_numba: + + return nb_update(mean, covariance, projected_mean, projected_cov, + measurement, self._update_mat) + + kalman_gain = np.linalg.solve(projected_cov, + (covariance @self._update_mat.T).T).T + innovation = measurement - projected_mean + mean = mean + innovation @kalman_gain.T + covariance = covariance - kalman_gain @projected_cov @kalman_gain.T + return mean, covariance + + def gating_distance(self, + mean, + covariance, + measurements, + only_position=False, + metric='maha'): + """ + Compute gating distance between state distribution and measurements. + A suitable distance threshold can be obtained from `chi2inv95`. If + `only_position` is False, the chi-square distribution has 4 degrees of + freedom, otherwise 2. + + Args: + mean (ndarray): Mean vector over the state distribution (8 + dimensional). + covariance (ndarray): Covariance of the state distribution (8x8 + dimensional). + measurements (ndarray): An Nx4 dimensional matrix of N measurements, + each in format (x, y, a, h) where (x, y) is the bounding box center + position, a the aspect ratio, and h the height. + only_position (Optional[bool]): If True, distance computation is + done with respect to the bounding box center position only. + metric (str): Metric type, 'gaussian' or 'maha'. + + Returns + An array of length N, where the i-th element contains the squared + Mahalanobis distance between (mean, covariance) and `measurements[i]`. 
+ """ + mean, covariance = self.project(mean, covariance) + if only_position: + mean, covariance = mean[:2], covariance[:2, :2] + measurements = measurements[:, :2] + + d = measurements - mean + if metric == 'gaussian': + return np.sum(d * d, axis=1) + elif metric == 'maha': + cholesky_factor = np.linalg.cholesky(covariance) + z = scipy.linalg.solve_triangular( + cholesky_factor, + d.T, + lower=True, + check_finite=False, + overwrite_b=True) + squared_maha = np.sum(z * z, axis=0) + return squared_maha + else: + raise ValueError('invalid distance metric') diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/ocsort_kalman_filter.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/ocsort_kalman_filter.py new file mode 100644 index 0000000000000000000000000000000000000000..8cfd9c5970bea7b9c135389348ef9238446359cf --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/motion/ocsort_kalman_filter.py @@ -0,0 +1,93 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/danbochman/SORT/blob/danny_opencv/kalman_filter.py +""" + +import numpy as np +from numpy import dot, zeros, eye +from numpy.linalg import inv + +use_numba = True +try: + import numba as nb + + @nb.njit(fastmath=True, cache=True) + def nb_predict(x, F, P, Q): + x = dot(F, x) + P = dot(dot(F, P), F.T) + Q + return x, P + + @nb.njit(fastmath=True, cache=True) + def nb_update(x, z, H, P, R, _I): + + y = z - np.dot(H, x) + PHT = dot(P, H.T) + + S = dot(H, PHT) + R + K = dot(PHT, inv(S)) + + x = x + dot(K, y) + + I_KH = _I - dot(K, H) + P = dot(dot(I_KH, P), I_KH.T) + dot(dot(K, R), K.T) + return x, P +except: + use_numba = False + print( + 'Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`' + ) + pass + + +class OCSORTKalmanFilter: + def __init__(self, dim_x, dim_z): + self.dim_x = dim_x + self.dim_z = dim_z + self.x = zeros((dim_x, 1)) + self.P = eye(dim_x) + self.Q = eye(dim_x) + self.F = eye(dim_x) + self.H = zeros((dim_z, dim_x)) + self.R = eye(dim_z) + self.M = zeros((dim_z, dim_z)) + + self._I = eye(dim_x) + + def predict(self): + if use_numba: + self.x, self.P = nb_predict(self.x, self.F, self.P, self.Q) + else: + self.x = dot(self.F, self.x) + self.P = dot(dot(self.F, self.P), self.F.T) + self.Q + + def update(self, z): + + if z is None: + return + + if use_numba: + self.x, self.P = nb_update(self.x, z, self.H, self.P, self.R, + self._I) + else: + y = z - np.dot(self.H, self.x) + PHT = dot(self.P, self.H.T) + + S = dot(self.H, PHT) + self.R + K = dot(PHT, inv(S)) + + self.x = self.x + dot(K, y) + + I_KH = self._I - dot(K, self.H) + self.P = dot(dot(I_KH, self.P), I_KH.T) + dot(dot(K, self.R), K.T) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..a3c4229ed8a0e87c8bf6a138ce6945c9242c6fc4 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__init__.py @@ -0,0 +1,30 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import base_jde_tracker +from . import base_sde_tracker + +from .base_jde_tracker import * +from .base_sde_tracker import * + +from . import jde_tracker +from . import deepsort_tracker +from . import ocsort_tracker +from . import center_tracker + +from .jde_tracker import * +from .deepsort_tracker import * +from .ocsort_tracker import * +from .botsort_tracker import * +from .center_tracker import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6ab97ab5af2e61b0915e6534398d07e14f290692 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/base_jde_tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/base_jde_tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1a7d15c06595c2f5807bd0dd06fa5ca28930891c Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/base_jde_tracker.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/base_sde_tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/base_sde_tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3d27b6ea5d3af34cf3f7a2ce291eab8c477ffc52 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/base_sde_tracker.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/botsort_tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/botsort_tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..daacd882fec6d9101b25cfa1b31a7195637b6860 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/botsort_tracker.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/center_tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/center_tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3663870ad3146b60b3bdcd4caf33bb970ee9c0b5 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/center_tracker.cpython-37.pyc differ diff --git 
a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/deepsort_tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/deepsort_tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c2e5be045d6fe33e0ee304cc1adc41c4c19608f3 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/deepsort_tracker.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/jde_tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/jde_tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2b38653b6d4c628fa8382ceeca05edfa766704d1 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/jde_tracker.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/ocsort_tracker.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/ocsort_tracker.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..813938bb5e2804d28146914c036d378ec0293735 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/__pycache__/ocsort_tracker.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/base_jde_tracker.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/base_jde_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..e78fe00a101c94544b1075104a4a207669bd166f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/base_jde_tracker.py @@ -0,0 +1,311 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
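The trackers that follow all drive the constant-velocity `KalmanFilter` above through the same initiate/predict/update cycle; a minimal sketch with invented measurements:

```python
import numpy as np
from ppdet.modeling.mot.motion.kalman_filter import KalmanFilter, chi2inv95

kf = KalmanFilter()

# Measurements are (center x, center y, aspect ratio, height); values invented.
mean, cov = kf.initiate(np.array([320., 240., 0.5, 120.], dtype=np.float32))

mean, cov = kf.predict(mean, cov)      # constant-velocity propagation

z = np.array([324., 243., 0.5, 122.], dtype=np.float32)
mean, cov = kf.update(mean, cov, z)    # measurement correction

# Gate candidate measurements by squared Mahalanobis distance.
d2 = kf.gating_distance(mean, cov, z[None, :])
accept = d2 < chi2inv95[4]             # 4 DoF when only_position=False
```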
+""" +This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py +""" + +import numpy as np +from collections import defaultdict +from collections import deque, OrderedDict +from ..matching import jde_matching as matching +from ppdet.core.workspace import register, serializable +import warnings +warnings.filterwarnings("ignore") + +__all__ = [ + 'TrackState', + 'BaseTrack', + 'STrack', + 'joint_stracks', + 'sub_stracks', + 'remove_duplicate_stracks', +] + + +class TrackState(object): + New = 0 + Tracked = 1 + Lost = 2 + Removed = 3 + + +@register +@serializable +class BaseTrack(object): + _count_dict = defaultdict(int) # support single class and multi classes + + track_id = 0 + is_activated = False + state = TrackState.New + + history = OrderedDict() + features = [] + curr_feat = None + score = 0 + start_frame = 0 + frame_id = 0 + time_since_update = 0 + + # multi-camera + location = (np.inf, np.inf) + + @property + def end_frame(self): + return self.frame_id + + @staticmethod + def next_id(cls_id): + BaseTrack._count_dict[cls_id] += 1 + return BaseTrack._count_dict[cls_id] + + # @even: reset track id + @staticmethod + def init_count(num_classes): + """ + Initiate _count for all object classes + :param num_classes: + """ + for cls_id in range(num_classes): + BaseTrack._count_dict[cls_id] = 0 + + @staticmethod + def reset_track_count(cls_id): + BaseTrack._count_dict[cls_id] = 0 + + def activate(self, *args): + raise NotImplementedError + + def predict(self): + raise NotImplementedError + + def update(self, *args, **kwargs): + raise NotImplementedError + + def mark_lost(self): + self.state = TrackState.Lost + + def mark_removed(self): + self.state = TrackState.Removed + + +@register +@serializable +class STrack(BaseTrack): + def __init__(self, tlwh, score, cls_id, buff_size=30, temp_feat=None): + # wait activate + self._tlwh = np.asarray(tlwh, dtype=np.float32) + self.score = score + self.cls_id = cls_id + self.track_len = 0 + + self.kalman_filter = None + self.mean, self.covariance = None, None + self.is_activated = False + + self.use_reid = True if temp_feat is not None else False + if self.use_reid: + self.smooth_feat = None + self.update_features(temp_feat) + self.features = deque([], maxlen=buff_size) + self.alpha = 0.9 + + def update_features(self, feat): + # L2 normalizing, this function has no use for BYTETracker + feat /= np.linalg.norm(feat) + self.curr_feat = feat + if self.smooth_feat is None: + self.smooth_feat = feat + else: + self.smooth_feat = self.alpha * self.smooth_feat + (1.0 - self.alpha + ) * feat + self.features.append(feat) + self.smooth_feat /= np.linalg.norm(self.smooth_feat) + + def predict(self): + mean_state = self.mean.copy() + if self.state != TrackState.Tracked: + mean_state[7] = 0 + self.mean, self.covariance = self.kalman_filter.predict(mean_state, + self.covariance) + + @staticmethod + def multi_predict(tracks, kalman_filter): + if len(tracks) > 0: + multi_mean = np.asarray([track.mean.copy() for track in tracks]) + multi_covariance = np.asarray( + [track.covariance for track in tracks]) + for i, st in enumerate(tracks): + if st.state != TrackState.Tracked: + multi_mean[i][7] = 0 + multi_mean, multi_covariance = kalman_filter.multi_predict( + multi_mean, multi_covariance) + for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)): + tracks[i].mean = mean + tracks[i].covariance = cov + + @staticmethod + def multi_gmc(stracks, H=np.eye(2, 3)): + if len(stracks) > 0: + multi_mean = 
np.asarray([st.mean.copy() for st in stracks]) + multi_covariance = np.asarray([st.covariance for st in stracks]) + + R = H[:2, :2] + R8x8 = np.kron(np.eye(4, dtype=float), R) + t = H[:2, 2] + + for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)): + mean = R8x8.dot(mean) + mean[:2] += t + cov = R8x8.dot(cov).dot(R8x8.transpose()) + + stracks[i].mean = mean + stracks[i].covariance = cov + + def reset_track_id(self): + self.reset_track_count(self.cls_id) + + def activate(self, kalman_filter, frame_id): + """Start a new track""" + self.kalman_filter = kalman_filter + # update track id for the object class + self.track_id = self.next_id(self.cls_id) + self.mean, self.covariance = self.kalman_filter.initiate( + self.tlwh_to_xyah(self._tlwh)) + + self.track_len = 0 + self.state = TrackState.Tracked # set flag 'tracked' + + if frame_id == 1: # to record the first frame's detection result + self.is_activated = True + + self.frame_id = frame_id + self.start_frame = frame_id + + def re_activate(self, new_track, frame_id, new_id=False): + self.mean, self.covariance = self.kalman_filter.update( + self.mean, self.covariance, self.tlwh_to_xyah(new_track.tlwh)) + if self.use_reid: + self.update_features(new_track.curr_feat) + self.track_len = 0 + self.state = TrackState.Tracked + self.is_activated = True + self.frame_id = frame_id + if new_id: # update track id for the object class + self.track_id = self.next_id(self.cls_id) + + def update(self, new_track, frame_id, update_feature=True): + self.frame_id = frame_id + self.track_len += 1 + + new_tlwh = new_track.tlwh + self.mean, self.covariance = self.kalman_filter.update( + self.mean, self.covariance, self.tlwh_to_xyah(new_tlwh)) + self.state = TrackState.Tracked # set flag 'tracked' + self.is_activated = True # set flag 'activated' + + self.score = new_track.score + if update_feature and self.use_reid: + self.update_features(new_track.curr_feat) + + @property + def tlwh(self): + """Get current position in bounding box format `(top left x, top left y, + width, height)`. + """ + if self.mean is None: + return self._tlwh.copy() + + ret = self.mean[:4].copy() + ret[2] *= ret[3] + ret[:2] -= ret[2:] / 2 + return ret + + @property + def tlbr(self): + """Convert bounding box to format `(min x, min y, max x, max y)`, i.e., + `(top left, bottom right)`. + """ + ret = self.tlwh.copy() + ret[2:] += ret[:2] + return ret + + @staticmethod + def tlwh_to_xyah(tlwh): + """Convert bounding box to format `(center x, center y, aspect ratio, + height)`, where the aspect ratio is `width / height`. 
+ """ + ret = np.asarray(tlwh).copy() + ret[:2] += ret[2:] / 2 + ret[2] /= ret[3] + return ret + + def to_xyah(self): + return self.tlwh_to_xyah(self.tlwh) + + @staticmethod + def tlbr_to_tlwh(tlbr): + ret = np.asarray(tlbr).copy() + ret[2:] -= ret[:2] + return ret + + @staticmethod + def tlwh_to_tlbr(tlwh): + ret = np.asarray(tlwh).copy() + ret[2:] += ret[:2] + return ret + + def __repr__(self): + return 'OT_({}-{})_({}-{})'.format(self.cls_id, self.track_id, + self.start_frame, self.end_frame) + + +def joint_stracks(tlista, tlistb): + exists = {} + res = [] + for t in tlista: + exists[t.track_id] = 1 + res.append(t) + for t in tlistb: + tid = t.track_id + if not exists.get(tid, 0): + exists[tid] = 1 + res.append(t) + return res + + +def sub_stracks(tlista, tlistb): + stracks = {} + for t in tlista: + stracks[t.track_id] = t + for t in tlistb: + tid = t.track_id + if stracks.get(tid, 0): + del stracks[tid] + return list(stracks.values()) + + +def remove_duplicate_stracks(stracksa, stracksb): + pdist = matching.iou_distance(stracksa, stracksb) + pairs = np.where(pdist < 0.15) + dupa, dupb = list(), list() + for p, q in zip(*pairs): + timep = stracksa[p].frame_id - stracksa[p].start_frame + timeq = stracksb[q].frame_id - stracksb[q].start_frame + if timep > timeq: + dupb.append(q) + else: + dupa.append(p) + resa = [t for i, t in enumerate(stracksa) if not i in dupa] + resb = [t for i, t in enumerate(stracksb) if not i in dupb] + return resa, resb diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/base_sde_tracker.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/base_sde_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..accc2016ff74fd5f864797e6ef52227f2f8a7163 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/base_sde_tracker.py @@ -0,0 +1,156 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/nwojke/deep_sort/blob/master/deep_sort/track.py +""" + +import datetime +from ppdet.core.workspace import register, serializable + +__all__ = ['TrackState', 'Track'] + + +class TrackState(object): + """ + Enumeration type for the single target track state. Newly created tracks are + classified as `tentative` until enough evidence has been collected. Then, + the track state is changed to `confirmed`. Tracks that are no longer alive + are classified as `deleted` to mark them for removal from the set of active + tracks. + """ + Tentative = 1 + Confirmed = 2 + Deleted = 3 + + +@register +@serializable +class Track(object): + """ + A single target track with state space `(x, y, a, h)` and associated + velocities, where `(x, y)` is the center of the bounding box, `a` is the + aspect ratio and `h` is the height. + + Args: + mean (ndarray): Mean vector of the initial state distribution. + covariance (ndarray): Covariance matrix of the initial state distribution. + track_id (int): A unique track identifier. 
+        n_init (int): Number of consecutive detections before the track is confirmed.
+            The track state is set to `Deleted` if a miss occurs within the first
+            `n_init` frames.
+        max_age (int): The maximum number of consecutive misses before the track
+            state is set to `Deleted`.
+        cls_id (int): The category id of the tracked box.
+        score (float): The confidence score of the tracked box.
+        feature (Optional[ndarray]): Feature vector of the detection this track
+            originates from. If not None, this feature is added to the `features` cache.
+
+    Attributes:
+        hits (int): Total number of measurement updates.
+        age (int): Total number of frames since first occurrence.
+        time_since_update (int): Total number of frames since last measurement
+            update.
+        state (TrackState): The current track state.
+        features (List[ndarray]): A cache of features. On each measurement update,
+            the associated feature vector is added to this list.
+    """
+
+    def __init__(self,
+                 mean,
+                 covariance,
+                 track_id,
+                 n_init,
+                 max_age,
+                 cls_id,
+                 score,
+                 feature=None):
+        self.mean = mean
+        self.covariance = covariance
+        self.track_id = track_id
+        self.hits = 1
+        self.age = 1
+        self.time_since_update = 0
+        self.cls_id = cls_id
+        self.score = score
+        self.start_time = datetime.datetime.now()
+
+        self.state = TrackState.Tentative
+        self.features = []
+        self.feat = feature
+        if feature is not None:
+            self.features.append(feature)
+
+        self._n_init = n_init
+        self._max_age = max_age
+
+    def to_tlwh(self):
+        """Get position in format `(top left x, top left y, width, height)`."""
+        ret = self.mean[:4].copy()
+        ret[2] *= ret[3]
+        ret[:2] -= ret[2:] / 2
+        return ret
+
+    def to_tlbr(self):
+        """Get position in bounding box format `(min x, min y, max x, max y)`."""
+        ret = self.to_tlwh()
+        ret[2:] = ret[:2] + ret[2:]
+        return ret
+
+    def predict(self, kalman_filter):
+        """
+        Propagate the state distribution to the current time step using a Kalman
+        filter prediction step.
+        """
+        self.mean, self.covariance = kalman_filter.predict(self.mean,
+                                                           self.covariance)
+        self.age += 1
+        self.time_since_update += 1
+
+    def update(self, kalman_filter, detection):
+        """
+        Perform Kalman filter measurement update step and update the associated
+        detection feature cache.
+        """
+        self.mean, self.covariance = kalman_filter.update(self.mean,
+                                                          self.covariance,
+                                                          detection.to_xyah())
+        self.features.append(detection.feature)
+        self.feat = detection.feature
+        self.cls_id = detection.cls_id
+        self.score = detection.score
+
+        self.hits += 1
+        self.time_since_update = 0
+        if self.state == TrackState.Tentative and self.hits >= self._n_init:
+            self.state = TrackState.Confirmed
+
+    def mark_missed(self):
+        """Mark this track as missed (no association at the current time step).
+ """ + if self.state == TrackState.Tentative: + self.state = TrackState.Deleted + elif self.time_since_update > self._max_age: + self.state = TrackState.Deleted + + def is_tentative(self): + """Returns True if this track is tentative (unconfirmed).""" + return self.state == TrackState.Tentative + + def is_confirmed(self): + """Returns True if this track is confirmed.""" + return self.state == TrackState.Confirmed + + def is_deleted(self): + """Returns True if this track is dead and should be deleted.""" + return self.state == TrackState.Deleted diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/botsort_tracker.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/botsort_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..4f412a7b847021ce4c912e51ac8aca88306083dd --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/botsort_tracker.py @@ -0,0 +1,242 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/WWangYuHsiang/SMILEtrack/blob/main/BoT-SORT/tracker/bot_sort.py +""" + +import cv2 +import matplotlib.pyplot as plt +import numpy as np +from collections import deque + +from ..matching import jde_matching as matching +from ..motion import GMC +from .base_jde_tracker import TrackState, STrack +from .base_jde_tracker import joint_stracks, sub_stracks, remove_duplicate_stracks +from ..motion import KalmanFilter + +from ppdet.core.workspace import register, serializable + + +@register +@serializable +class BOTSORTTracker(object): + """ + BOTSORT tracker, support single class + + Args: + track_high_thresh (float): threshold of detection high score + track_low_thresh (float): threshold of remove detection score + new_track_thresh (float): threshold of new track score + match_thresh (float): iou threshold for associate + track_buffer (int): tracking reserved frames,default 30 + min_box_area (float): reserved min box + camera_motion (bool): Whether use camera motion, default False + cmc_method (str): camera motion method,defalut sparseOptFlow + frame_rate (int): fps buffer_size=int(frame_rate / 30.0 * track_buffer) + """ + + def __init__(self, + track_high_thresh=0.3, + track_low_thresh=0.2, + new_track_thresh=0.4, + match_thresh=0.7, + track_buffer=30, + min_box_area=0, + camera_motion=False, + cmc_method='sparseOptFlow', + frame_rate=30): + + self.tracked_stracks = [] # type: list[STrack] + self.lost_stracks = [] # type: list[STrack] + self.removed_stracks = [] # type: list[STrack] + + self.frame_id = 0 + + self.track_high_thresh = track_high_thresh + self.track_low_thresh = track_low_thresh + self.new_track_thresh = new_track_thresh + self.match_thresh = match_thresh + self.buffer_size = int(frame_rate / 30.0 * track_buffer) + self.max_time_lost = self.buffer_size + self.kalman_filter = KalmanFilter() + self.min_box_area = min_box_area + + self.camera_motion = camera_motion + self.gmc = GMC(method=cmc_method) + + def 
update(self, output_results, img=None): + self.frame_id += 1 + activated_starcks = [] + refind_stracks = [] + lost_stracks = [] + removed_stracks = [] + + if len(output_results): + bboxes = output_results[:, 2:6] + scores = output_results[:, 1] + classes = output_results[:, 0] + + # Remove bad detections + lowest_inds = scores > self.track_low_thresh + bboxes = bboxes[lowest_inds] + scores = scores[lowest_inds] + classes = classes[lowest_inds] + + # Find high threshold detections + remain_inds = scores > self.track_high_thresh + dets = bboxes[remain_inds] + scores_keep = scores[remain_inds] + classes_keep = classes[remain_inds] + + else: + bboxes = [] + scores = [] + classes = [] + dets = [] + scores_keep = [] + classes_keep = [] + + if len(dets) > 0: + '''Detections''' + detections = [ + STrack(STrack.tlbr_to_tlwh(tlbr), s, c) + for (tlbr, s, c) in zip(dets, scores_keep, classes_keep) + ] + else: + detections = [] + ''' Add newly detected tracklets to tracked_stracks''' + unconfirmed = [] + tracked_stracks = [] # type: list[STrack] + for track in self.tracked_stracks: + if not track.is_activated: + unconfirmed.append(track) + else: + tracked_stracks.append(track) + ''' Step 2: First association, with high score detection boxes''' + strack_pool = joint_stracks(tracked_stracks, self.lost_stracks) + + # Predict the current location with KF + STrack.multi_predict(strack_pool, self.kalman_filter) + + # Fix camera motion + if self.camera_motion: + warp = self.gmc.apply(img[0], dets) + STrack.multi_gmc(strack_pool, warp) + STrack.multi_gmc(unconfirmed, warp) + + # Associate with high score detection boxes + ious_dists = matching.iou_distance(strack_pool, detections) + matches, u_track, u_detection = matching.linear_assignment( + ious_dists, thresh=self.match_thresh) + + for itracked, idet in matches: + track = strack_pool[itracked] + det = detections[idet] + if track.state == TrackState.Tracked: + track.update(detections[idet], self.frame_id) + activated_starcks.append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + ''' Step 3: Second association, with low score detection boxes''' + if len(scores): + inds_high = scores < self.track_high_thresh + inds_low = scores > self.track_low_thresh + inds_second = np.logical_and(inds_low, inds_high) + dets_second = bboxes[inds_second] + scores_second = scores[inds_second] + classes_second = classes[inds_second] + else: + dets_second = [] + scores_second = [] + classes_second = [] + + # association the untrack to the low score detections + if len(dets_second) > 0: + '''Detections''' + detections_second = [ + STrack(STrack.tlbr_to_tlwh(tlbr), s, c) for (tlbr, s, c) in + zip(dets_second, scores_second, classes_second) + ] + else: + detections_second = [] + + r_tracked_stracks = [ + strack_pool[i] for i in u_track + if strack_pool[i].state == TrackState.Tracked + ] + dists = matching.iou_distance(r_tracked_stracks, detections_second) + matches, u_track, u_detection_second = matching.linear_assignment( + dists, thresh=0.5) + for itracked, idet in matches: + track = r_tracked_stracks[itracked] + det = detections_second[idet] + if track.state == TrackState.Tracked: + track.update(det, self.frame_id) + activated_starcks.append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + + for it in u_track: + track = r_tracked_stracks[it] + if not track.state == TrackState.Lost: + track.mark_lost() + lost_stracks.append(track) + '''Deal with unconfirmed tracks, usually 
tracks with only one beginning frame''' + detections = [detections[i] for i in u_detection] + dists = matching.iou_distance(unconfirmed, detections) + + matches, u_unconfirmed, u_detection = matching.linear_assignment( + dists, thresh=0.7) + for itracked, idet in matches: + unconfirmed[itracked].update(detections[idet], self.frame_id) + activated_starcks.append(unconfirmed[itracked]) + for it in u_unconfirmed: + track = unconfirmed[it] + track.mark_removed() + removed_stracks.append(track) + """ Step 4: Init new stracks""" + for inew in u_detection: + track = detections[inew] + if track.score < self.new_track_thresh: + continue + + track.activate(self.kalman_filter, self.frame_id) + activated_starcks.append(track) + """ Step 5: Update state""" + for track in self.lost_stracks: + if self.frame_id - track.end_frame > self.max_time_lost: + track.mark_removed() + removed_stracks.append(track) + """ Merge """ + self.tracked_stracks = [ + t for t in self.tracked_stracks if t.state == TrackState.Tracked + ] + self.tracked_stracks = joint_stracks(self.tracked_stracks, + activated_starcks) + self.tracked_stracks = joint_stracks(self.tracked_stracks, + refind_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks) + self.lost_stracks.extend(lost_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks) + self.removed_stracks.extend(removed_stracks) + self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks( + self.tracked_stracks, self.lost_stracks) + + # output_stracks = [track for track in self.tracked_stracks if track.is_activated] + output_stracks = [track for track in self.tracked_stracks] + + return output_stracks diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/center_tracker.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/center_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..8005ddc0c3cd9cd214ea190f3d48d250074aaf20 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/center_tracker.py @@ -0,0 +1,149 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
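+# Annotation (not part of the upstream file): the CenterTracker below consumes
+# per-frame lists of detection dicts rather than arrays. A minimal, assumed
+# driver sketch, where every dict carries the keys read by update() and the
+# concrete values are hypothetical ('tracking' is the predicted center offset):
+#
+#     results = [{'bbox': np.array([x0, y0, x1, y1]), 'score': 0.9, 'class': 0,
+#                 'ct': np.array([cx, cy]), 'tracking': np.array([dx, dy])}]
+#     tracker = CenterTracker()
+#     tracker.init_track(results)       # first frame: seed track ids
+#     online = tracker.update(results)  # later frames: associate and refresh ids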
+""" +This code is based on https://github.com/xingyizhou/CenterTrack/blob/master/src/lib/utils/tracker.py +""" + +import copy +import numpy as np +import sklearn + +from ppdet.core.workspace import register, serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['CenterTracker'] + + +@register +@serializable +class CenterTracker(object): + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=1, + min_box_area=0, + vertical_ratio=-1, + track_thresh=0.4, + pre_thresh=0.5, + new_thresh=0.4, + out_thresh=0.4, + hungarian=False): + self.num_classes = num_classes + self.min_box_area = min_box_area + self.vertical_ratio = vertical_ratio + + self.track_thresh = track_thresh + self.pre_thresh = max(track_thresh, pre_thresh) + self.new_thresh = max(track_thresh, new_thresh) + self.out_thresh = max(track_thresh, out_thresh) + self.hungarian = hungarian + + self.reset() + + def init_track(self, results): + print('Initialize tracking!') + for item in results: + if item['score'] > self.new_thresh: + self.id_count += 1 + item['tracking_id'] = self.id_count + if not ('ct' in item): + bbox = item['bbox'] + item['ct'] = [(bbox[0] + bbox[2]) / 2, + (bbox[1] + bbox[3]) / 2] + self.tracks.append(item) + + def reset(self): + self.id_count = 0 + self.tracks = [] + + def update(self, results, public_det=None): + N = len(results) + M = len(self.tracks) + + dets = np.array([det['ct'] + det['tracking'] for det in results], + np.float32) # N x 2 + track_size = np.array([((track['bbox'][2] - track['bbox'][0]) * \ + (track['bbox'][3] - track['bbox'][1])) \ + for track in self.tracks], np.float32) # M + track_cat = np.array([track['class'] for track in self.tracks], + np.int32) # M + item_size = np.array([((item['bbox'][2] - item['bbox'][0]) * \ + (item['bbox'][3] - item['bbox'][1])) \ + for item in results], np.float32) # N + item_cat = np.array([item['class'] for item in results], np.int32) # N + tracks = np.array([pre_det['ct'] for pre_det in self.tracks], + np.float32) # M x 2 + dist = (((tracks.reshape(1, -1, 2) - \ + dets.reshape(-1, 1, 2)) ** 2).sum(axis=2)) # N x M + + invalid = ((dist > track_size.reshape(1, M)) + \ + (dist > item_size.reshape(N, 1)) + \ + (item_cat.reshape(N, 1) != track_cat.reshape(1, M))) > 0 + dist = dist + invalid * 1e18 + + if self.hungarian: + item_score = np.array([item['score'] for item in results], + np.float32) + dist[dist > 1e18] = 1e18 + from sklearn.utils.linear_assignment_ import linear_assignment + matched_indices = linear_assignment(dist) + else: + matched_indices = greedy_assignment(copy.deepcopy(dist)) + + unmatched_dets = [d for d in range(dets.shape[0]) \ + if not (d in matched_indices[:, 0])] + unmatched_tracks = [d for d in range(tracks.shape[0]) \ + if not (d in matched_indices[:, 1])] + + if self.hungarian: + matches = [] + for m in matched_indices: + if dist[m[0], m[1]] > 1e16: + unmatched_dets.append(m[0]) + unmatched_tracks.append(m[1]) + else: + matches.append(m) + matches = np.array(matches).reshape(-1, 2) + else: + matches = matched_indices + + ret = [] + for m in matches: + track = results[m[0]] + track['tracking_id'] = self.tracks[m[1]]['tracking_id'] + ret.append(track) + + # Private detection: create tracks for all un-matched detections + for i in unmatched_dets: + track = results[i] + if track['score'] > self.new_thresh: + self.id_count += 1 + track['tracking_id'] = self.id_count + ret.append(track) + + self.tracks = ret + return ret + + +def greedy_assignment(dist): + matched_indices = [] + if 
dist.shape[1] == 0: + return np.array(matched_indices, np.int32).reshape(-1, 2) + for i in range(dist.shape[0]): + j = dist[i].argmin() + if dist[i][j] < 1e16: + dist[:, j] = 1e18 + matched_indices.append([i, j]) + return np.array(matched_indices, np.int32).reshape(-1, 2) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/deepsort_tracker.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/deepsort_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..9065dfe7ed80c5c03a4ddeb2ae954e55984a206b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/deepsort_tracker.py @@ -0,0 +1,189 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/nwojke/deep_sort/blob/master/deep_sort/tracker.py +""" + +import numpy as np + +from ..motion import KalmanFilter +from ..matching.deepsort_matching import NearestNeighborDistanceMetric +from ..matching.deepsort_matching import iou_cost, min_cost_matching, matching_cascade, gate_cost_matrix +from .base_sde_tracker import Track +from ..utils import Detection + +from ppdet.core.workspace import register, serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['DeepSORTTracker'] + + +@register +@serializable +class DeepSORTTracker(object): + """ + DeepSORT tracker + + Args: + input_size (list): input feature map size to reid model, [h, w] format, + [64, 192] as default. + min_box_area (int): min box area to filter out low quality boxes + vertical_ratio (float): w/h, the vertical ratio of the bbox to filter + bad results, set 1.6 default for pedestrian tracking. If set <=0 + means no need to filter bboxes. + budget (int): If not None, fix samples per class to at most this number. + Removes the oldest samples when the budget is reached. + max_age (int): maximum number of missed misses before a track is deleted + n_init (float): Number of frames that a track remains in initialization + phase. Number of consecutive detections before the track is confirmed. + The track state is set to `Deleted` if a miss occurs within the first + `n_init` frames. + metric_type (str): either "euclidean" or "cosine", the distance metric + used for measurement to track association. + matching_threshold (float): samples with larger distance are + considered an invalid match. 
+ max_iou_distance (float): max iou distance threshold + motion (object): KalmanFilter instance + """ + + def __init__(self, + input_size=[64, 192], + min_box_area=0, + vertical_ratio=-1, + budget=100, + max_age=70, + n_init=3, + metric_type='cosine', + matching_threshold=0.2, + max_iou_distance=0.9, + motion='KalmanFilter'): + self.input_size = input_size + self.min_box_area = min_box_area + self.vertical_ratio = vertical_ratio + self.max_age = max_age + self.n_init = n_init + self.metric = NearestNeighborDistanceMetric(metric_type, + matching_threshold, budget) + self.max_iou_distance = max_iou_distance + if motion == 'KalmanFilter': + self.motion = KalmanFilter() + + self.tracks = [] + self._next_id = 1 + + def predict(self): + """ + Propagate track state distributions one time step forward. + This function should be called once every time step, before `update`. + """ + for track in self.tracks: + track.predict(self.motion) + + def update(self, pred_dets, pred_embs): + """ + Perform measurement update and track management. + Args: + pred_dets (np.array): Detection results of the image, the shape is + [N, 6], means 'cls_id, score, x0, y0, x1, y1'. + pred_embs (np.array): Embedding results of the image, the shape is + [N, 128], usually pred_embs.shape[1] is a multiple of 128. + """ + pred_cls_ids = pred_dets[:, 0:1] + pred_scores = pred_dets[:, 1:2] + pred_xyxys = pred_dets[:, 2:6] + pred_tlwhs = np.concatenate((pred_xyxys[:, 0:2], pred_xyxys[:, 2:4] - pred_xyxys[:, 0:2] + 1), axis=1) + + detections = [ + Detection(tlwh, score, feat, cls_id) + for tlwh, score, feat, cls_id in zip(pred_tlwhs, pred_scores, + pred_embs, pred_cls_ids) + ] + + # Run matching cascade. + matches, unmatched_tracks, unmatched_detections = \ + self._match(detections) + + # Update track set. + for track_idx, detection_idx in matches: + self.tracks[track_idx].update(self.motion, + detections[detection_idx]) + for track_idx in unmatched_tracks: + self.tracks[track_idx].mark_missed() + for detection_idx in unmatched_detections: + self._initiate_track(detections[detection_idx]) + self.tracks = [t for t in self.tracks if not t.is_deleted()] + + # Update distance metric. + active_targets = [t.track_id for t in self.tracks if t.is_confirmed()] + features, targets = [], [] + for track in self.tracks: + if not track.is_confirmed(): + continue + features += track.features + targets += [track.track_id for _ in track.features] + track.features = [] + self.metric.partial_fit( + np.asarray(features), np.asarray(targets), active_targets) + output_stracks = self.tracks + return output_stracks + + def _match(self, detections): + def gated_metric(tracks, dets, track_indices, detection_indices): + features = np.array([dets[i].feature for i in detection_indices]) + targets = np.array([tracks[i].track_id for i in track_indices]) + cost_matrix = self.metric.distance(features, targets) + cost_matrix = gate_cost_matrix(self.motion, cost_matrix, tracks, + dets, track_indices, + detection_indices) + return cost_matrix + + # Split track set into confirmed and unconfirmed tracks. + confirmed_tracks = [ + i for i, t in enumerate(self.tracks) if t.is_confirmed() + ] + unconfirmed_tracks = [ + i for i, t in enumerate(self.tracks) if not t.is_confirmed() + ] + + # Associate confirmed tracks using appearance features. 
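+        # Annotation (not in the upstream comment): matching_cascade tries the
+        # confirmed tracks in order of increasing time_since_update, so
+        # recently seen tracks get first claim on detections; whatever is left
+        # unmatched falls through to the IoU stage below.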
+        matches_a, unmatched_tracks_a, unmatched_detections = \
+            matching_cascade(
+                gated_metric, self.metric.matching_threshold, self.max_age,
+                self.tracks, detections, confirmed_tracks)
+
+        # Associate remaining tracks together with unconfirmed tracks using IOU.
+        iou_track_candidates = unconfirmed_tracks + [
+            k for k in unmatched_tracks_a
+            if self.tracks[k].time_since_update == 1
+        ]
+        unmatched_tracks_a = [
+            k for k in unmatched_tracks_a
+            if self.tracks[k].time_since_update != 1
+        ]
+        matches_b, unmatched_tracks_b, unmatched_detections = \
+            min_cost_matching(
+                iou_cost, self.max_iou_distance, self.tracks,
+                detections, iou_track_candidates, unmatched_detections)
+
+        matches = matches_a + matches_b
+        unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))
+        return matches, unmatched_tracks, unmatched_detections
+
+    def _initiate_track(self, detection):
+        mean, covariance = self.motion.initiate(detection.to_xyah())
+        self.tracks.append(
+            Track(mean, covariance, self._next_id, self.n_init, self.max_age,
+                  detection.cls_id, detection.score, detection.feature))
+        self._next_id += 1
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/jde_tracker.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/jde_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..9571a6b4f9fc2b9797b8d5b08fbfeb9d137b78ec
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/jde_tracker.py
@@ -0,0 +1,353 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py
+"""
+
+import numpy as np
+from collections import defaultdict
+
+from ..matching import jde_matching as matching
+from ..motion import KalmanFilter
+from .base_jde_tracker import TrackState, STrack
+from .base_jde_tracker import joint_stracks, sub_stracks, remove_duplicate_stracks
+
+from ppdet.core.workspace import register, serializable
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+__all__ = ['JDETracker']
+
+
+@register
+@serializable
+class JDETracker(object):
+    __shared__ = ['num_classes']
+    """
+    JDE tracker, supports both single-class and multi-class tracking
+
+    Args:
+        use_byte (bool): Whether to use ByteTracker, default False
+        num_classes (int): the number of classes
+        det_thresh (float): threshold of detection score
+        track_buffer (int): buffer for tracker
+        min_box_area (int): min box area to filter out low quality boxes
+        vertical_ratio (float): w/h, the vertical ratio of the bbox to filter
+            bad results. If set <= 0 means no need to filter bboxes, usually set
+            1.6 for pedestrian tracking.
+ tracked_thresh (float): linear assignment threshold of tracked + stracks and detections + r_tracked_thresh (float): linear assignment threshold of + tracked stracks and unmatched detections + unconfirmed_thresh (float): linear assignment threshold of + unconfirmed stracks and unmatched detections + conf_thres (float): confidence threshold for tracking, also used in + ByteTracker as higher confidence threshold + match_thres (float): linear assignment threshold of tracked + stracks and detections in ByteTracker + low_conf_thres (float): lower confidence threshold for tracking in + ByteTracker + input_size (list): input feature map size to reid model, [h, w] format, + [64, 192] as default. + motion (str): motion model, KalmanFilter as default + metric_type (str): either "euclidean" or "cosine", the distance metric + used for measurement to track association. + """ + + def __init__(self, + use_byte=False, + num_classes=1, + det_thresh=0.3, + track_buffer=30, + min_box_area=0, + vertical_ratio=0, + tracked_thresh=0.7, + r_tracked_thresh=0.5, + unconfirmed_thresh=0.7, + conf_thres=0, + match_thres=0.8, + low_conf_thres=0.2, + input_size=[64, 192], + motion='KalmanFilter', + metric_type='euclidean'): + self.use_byte = use_byte + self.num_classes = num_classes + self.det_thresh = det_thresh if not use_byte else conf_thres + 0.1 + self.track_buffer = track_buffer + self.min_box_area = min_box_area + self.vertical_ratio = vertical_ratio + + self.tracked_thresh = tracked_thresh + self.r_tracked_thresh = r_tracked_thresh + self.unconfirmed_thresh = unconfirmed_thresh + self.conf_thres = conf_thres + self.match_thres = match_thres + self.low_conf_thres = low_conf_thres + + self.input_size = input_size + if motion == 'KalmanFilter': + self.motion = KalmanFilter() + self.metric_type = metric_type + + self.frame_id = 0 + self.tracked_tracks_dict = defaultdict(list) # dict(list[STrack]) + self.lost_tracks_dict = defaultdict(list) # dict(list[STrack]) + self.removed_tracks_dict = defaultdict(list) # dict(list[STrack]) + + self.max_time_lost = 0 + # max_time_lost will be calculated: int(frame_rate / 30.0 * track_buffer) + + def update(self, pred_dets, pred_embs=None): + """ + Processes the image frame and finds bounding box(detections). + Associates the detection with corresponding tracklets and also handles + lost, removed, refound and active tracklets. + + Args: + pred_dets (np.array): Detection results of the image, the shape is + [N, 6], means 'cls_id, score, x0, y0, x1, y1'. + pred_embs (np.array): Embedding results of the image, the shape is + [N, 128] or [N, 512]. + + Return: + output_stracks_dict (dict(list)): The list contains information + regarding the online_tracklets for the received image tensor. 
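+                Each value is the list of online STrack objects for that
+                class; downstream code typically reads track.track_id,
+                track.tlwh and track.score from them.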
+ """ + self.frame_id += 1 + if self.frame_id == 1: + STrack.init_count(self.num_classes) + activated_tracks_dict = defaultdict(list) + refined_tracks_dict = defaultdict(list) + lost_tracks_dict = defaultdict(list) + removed_tracks_dict = defaultdict(list) + output_tracks_dict = defaultdict(list) + + pred_dets_dict = defaultdict(list) + pred_embs_dict = defaultdict(list) + + # unify single and multi classes detection and embedding results + for cls_id in range(self.num_classes): + cls_idx = (pred_dets[:, 0:1] == cls_id).squeeze(-1) + pred_dets_dict[cls_id] = pred_dets[cls_idx] + if pred_embs is not None: + pred_embs_dict[cls_id] = pred_embs[cls_idx] + else: + pred_embs_dict[cls_id] = None + + for cls_id in range(self.num_classes): + """ Step 1: Get detections by class""" + pred_dets_cls = pred_dets_dict[cls_id] + pred_embs_cls = pred_embs_dict[cls_id] + remain_inds = (pred_dets_cls[:, 1:2] > self.conf_thres).squeeze(-1) + if remain_inds.sum() > 0: + pred_dets_cls = pred_dets_cls[remain_inds] + if pred_embs_cls is None: + # in original ByteTrack + detections = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), + tlbrs[1], + cls_id, + 30, + temp_feat=None) for tlbrs in pred_dets_cls + ] + else: + pred_embs_cls = pred_embs_cls[remain_inds] + detections = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], cls_id, + 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls, pred_embs_cls) + ] + else: + detections = [] + ''' Add newly detected tracklets to tracked_stracks''' + unconfirmed_dict = defaultdict(list) + tracked_tracks_dict = defaultdict(list) + for track in self.tracked_tracks_dict[cls_id]: + if not track.is_activated: + # previous tracks which are not active in the current frame are added in unconfirmed list + unconfirmed_dict[cls_id].append(track) + else: + # Active tracks are added to the local list 'tracked_stracks' + tracked_tracks_dict[cls_id].append(track) + """ Step 2: First association, with embedding""" + # building tracking pool for the current frame + track_pool_dict = defaultdict(list) + track_pool_dict[cls_id] = joint_stracks( + tracked_tracks_dict[cls_id], self.lost_tracks_dict[cls_id]) + + # Predict the current location with KalmanFilter + STrack.multi_predict(track_pool_dict[cls_id], self.motion) + + if pred_embs_cls is None: + # in original ByteTrack + dists = matching.iou_distance(track_pool_dict[cls_id], + detections) + matches, u_track, u_detection = matching.linear_assignment( + dists, thresh=self.match_thres) # not self.tracked_thresh + else: + dists = matching.embedding_distance( + track_pool_dict[cls_id], + detections, + metric=self.metric_type) + dists = matching.fuse_motion( + self.motion, dists, track_pool_dict[cls_id], detections) + matches, u_track, u_detection = matching.linear_assignment( + dists, thresh=self.tracked_thresh) + + for i_tracked, idet in matches: + # i_tracked is the id of the track and idet is the detection + track = track_pool_dict[cls_id][i_tracked] + det = detections[idet] + if track.state == TrackState.Tracked: + # If the track is active, add the detection to the track + track.update(detections[idet], self.frame_id) + activated_tracks_dict[cls_id].append(track) + else: + # We have obtained a detection from a track which is not active, + # hence put the track in refind_stracks list + track.re_activate(det, self.frame_id, new_id=False) + refined_tracks_dict[cls_id].append(track) + + # None of the steps below happen if there are no undetected tracks. 
+ """ Step 3: Second association, with IOU""" + if self.use_byte: + inds_low = pred_dets_dict[cls_id][:, 1:2] > self.low_conf_thres + inds_high = pred_dets_dict[cls_id][:, 1:2] < self.conf_thres + inds_second = np.logical_and(inds_low, inds_high).squeeze(-1) + pred_dets_cls_second = pred_dets_dict[cls_id][inds_second] + + # association the untrack to the low score detections + if len(pred_dets_cls_second) > 0: + if pred_embs_dict[cls_id] is None: + # in original ByteTrack + detections_second = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), + tlbrs[1], + cls_id, + 30, + temp_feat=None) + for tlbrs in pred_dets_cls_second + ] + else: + pred_embs_cls_second = pred_embs_dict[cls_id][ + inds_second] + detections_second = [ + STrack( + STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], + cls_id, 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls_second, pred_embs_cls_second) + ] + else: + detections_second = [] + r_tracked_stracks = [ + track_pool_dict[cls_id][i] for i in u_track + if track_pool_dict[cls_id][i].state == TrackState.Tracked + ] + dists = matching.iou_distance(r_tracked_stracks, + detections_second) + matches, u_track, u_detection_second = matching.linear_assignment( + dists, thresh=0.4) # not r_tracked_thresh + else: + detections = [detections[i] for i in u_detection] + r_tracked_stracks = [] + for i in u_track: + if track_pool_dict[cls_id][i].state == TrackState.Tracked: + r_tracked_stracks.append(track_pool_dict[cls_id][i]) + dists = matching.iou_distance(r_tracked_stracks, detections) + + matches, u_track, u_detection = matching.linear_assignment( + dists, thresh=self.r_tracked_thresh) + + for i_tracked, idet in matches: + track = r_tracked_stracks[i_tracked] + det = detections[ + idet] if not self.use_byte else detections_second[idet] + if track.state == TrackState.Tracked: + track.update(det, self.frame_id) + activated_tracks_dict[cls_id].append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refined_tracks_dict[cls_id].append(track) + + for it in u_track: + track = r_tracked_stracks[it] + if not track.state == TrackState.Lost: + track.mark_lost() + lost_tracks_dict[cls_id].append(track) + '''Deal with unconfirmed tracks, usually tracks with only one beginning frame''' + detections = [detections[i] for i in u_detection] + dists = matching.iou_distance(unconfirmed_dict[cls_id], detections) + matches, u_unconfirmed, u_detection = matching.linear_assignment( + dists, thresh=self.unconfirmed_thresh) + for i_tracked, idet in matches: + unconfirmed_dict[cls_id][i_tracked].update(detections[idet], + self.frame_id) + activated_tracks_dict[cls_id].append(unconfirmed_dict[cls_id][ + i_tracked]) + for it in u_unconfirmed: + track = unconfirmed_dict[cls_id][it] + track.mark_removed() + removed_tracks_dict[cls_id].append(track) + """ Step 4: Init new stracks""" + for inew in u_detection: + track = detections[inew] + if track.score < self.det_thresh: + continue + track.activate(self.motion, self.frame_id) + activated_tracks_dict[cls_id].append(track) + """ Step 5: Update state""" + for track in self.lost_tracks_dict[cls_id]: + if self.frame_id - track.end_frame > self.max_time_lost: + track.mark_removed() + removed_tracks_dict[cls_id].append(track) + + self.tracked_tracks_dict[cls_id] = [ + t for t in self.tracked_tracks_dict[cls_id] + if t.state == TrackState.Tracked + ] + self.tracked_tracks_dict[cls_id] = joint_stracks( + self.tracked_tracks_dict[cls_id], activated_tracks_dict[cls_id]) + self.tracked_tracks_dict[cls_id] = joint_stracks( + 
self.tracked_tracks_dict[cls_id], refined_tracks_dict[cls_id]) + self.lost_tracks_dict[cls_id] = sub_stracks( + self.lost_tracks_dict[cls_id], self.tracked_tracks_dict[cls_id]) + self.lost_tracks_dict[cls_id].extend(lost_tracks_dict[cls_id]) + self.lost_tracks_dict[cls_id] = sub_stracks( + self.lost_tracks_dict[cls_id], self.removed_tracks_dict[cls_id]) + self.removed_tracks_dict[cls_id].extend(removed_tracks_dict[cls_id]) + self.tracked_tracks_dict[cls_id], self.lost_tracks_dict[ + cls_id] = remove_duplicate_stracks( + self.tracked_tracks_dict[cls_id], + self.lost_tracks_dict[cls_id]) + + # get scores of lost tracks + output_tracks_dict[cls_id] = [ + track for track in self.tracked_tracks_dict[cls_id] + if track.is_activated + ] + + logger.debug('===========Frame {}=========='.format(self.frame_id)) + logger.debug('Activated: {}'.format( + [track.track_id for track in activated_tracks_dict[cls_id]])) + logger.debug('Refind: {}'.format( + [track.track_id for track in refined_tracks_dict[cls_id]])) + logger.debug('Lost: {}'.format( + [track.track_id for track in lost_tracks_dict[cls_id]])) + logger.debug('Removed: {}'.format( + [track.track_id for track in removed_tracks_dict[cls_id]])) + + return output_tracks_dict diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/ocsort_tracker.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/ocsort_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..49b44e35d830a871f96117146880c134729c01e1 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/tracker/ocsort_tracker.py @@ -0,0 +1,371 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/ocsort.py +""" + +import numpy as np +from ..matching.ocsort_matching import associate, linear_assignment, iou_batch, associate_only_iou +from ..motion.ocsort_kalman_filter import OCSORTKalmanFilter +from ppdet.core.workspace import register, serializable + + +def k_previous_obs(observations, cur_age, k): + if len(observations) == 0: + return [-1, -1, -1, -1, -1] + for i in range(k): + dt = k - i + if cur_age - dt in observations: + return observations[cur_age - dt] + max_age = max(observations.keys()) + return observations[max_age] + + +def convert_bbox_to_z(bbox): + """ + Takes a bounding box in the form [x1,y1,x2,y2] and returns z in the form + [x,y,s,r] where x,y is the centre of the box and s is the scale/area and r is + the aspect ratio + """ + w = bbox[2] - bbox[0] + h = bbox[3] - bbox[1] + x = bbox[0] + w / 2. + y = bbox[1] + h / 2. 
+    s = w * h  # scale is just area
+    r = w / float(h + 1e-6)
+    return np.array([x, y, s, r]).reshape((4, 1))
+
+
+def convert_x_to_bbox(x, score=None):
+    """
+    Takes a bounding box in the centre form [x,y,s,r] and returns it in the form
+    [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right
+    """
+    w = np.sqrt(x[2] * x[3])
+    h = x[2] / w
+    if score is None:
+        return np.array(
+            [x[0] - w / 2., x[1] - h / 2., x[0] + w / 2.,
+             x[1] + h / 2.]).reshape((1, 4))
+    else:
+        score = np.array([score])
+        return np.array([
+            x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2., score
+        ]).reshape((1, 5))
+
+
+def speed_direction(bbox1, bbox2):
+    cx1, cy1 = (bbox1[0] + bbox1[2]) / 2.0, (bbox1[1] + bbox1[3]) / 2.0
+    cx2, cy2 = (bbox2[0] + bbox2[2]) / 2.0, (bbox2[1] + bbox2[3]) / 2.0
+    speed = np.array([cy2 - cy1, cx2 - cx1])
+    norm = np.sqrt((cy2 - cy1)**2 + (cx2 - cx1)**2) + 1e-6
+    return speed / norm
+
+
+class KalmanBoxTracker(object):
+    """
+    This class represents the internal state of individual tracked objects observed as bbox.
+
+    Args:
+        bbox (np.array): bbox in [x1,y1,x2,y2,score] format.
+        delta_t (int): delta_t of previous observation
+    """
+    count = 0
+
+    def __init__(self, bbox, delta_t=3):
+
+        self.kf = OCSORTKalmanFilter(dim_x=7, dim_z=4)
+        self.kf.F = np.array([[1., 0, 0, 0, 1., 0, 0], [0, 1., 0, 0, 0, 1., 0],
+                              [0, 0, 1., 0, 0, 0, 1], [0, 0, 0, 1., 0, 0, 0],
+                              [0, 0, 0, 0, 1., 0, 0], [0, 0, 0, 0, 0, 1., 0],
+                              [0, 0, 0, 0, 0, 0, 1.]])
+        self.kf.H = np.array([[1., 0, 0, 0, 0, 0, 0], [0, 1., 0, 0, 0, 0, 0],
+                              [0, 0, 1., 0, 0, 0, 0], [0, 0, 0, 1., 0, 0, 0]])
+        self.kf.R[2:, 2:] *= 10.
+        self.kf.P[4:, 4:] *= 1000.
+        # give high uncertainty to the unobservable initial velocities
+        self.kf.P *= 10.
+        self.kf.Q[-1, -1] *= 0.01
+        self.kf.Q[4:, 4:] *= 0.01
+
+        self.score = bbox[4]
+        self.kf.x[:4] = convert_bbox_to_z(bbox)
+        self.time_since_update = 0
+        self.id = KalmanBoxTracker.count
+        KalmanBoxTracker.count += 1
+        self.history = []
+        self.hits = 0
+        self.hit_streak = 0
+        self.age = 0
+        """
+        NOTE: [-1,-1,-1,-1,-1] is a compromising placeholder for non-observation status, the same for the return of
+        function k_previous_obs. It is ugly and I do not like it. But to support generate observation array in a
+        fast and unified way, which you would see below k_observations = np.array([k_previous_obs(...]]), let's bear it for now.
+        """
+        self.last_observation = np.array([-1, -1, -1, -1, -1])  # placeholder
+        self.observations = dict()
+        self.history_observations = []
+        self.velocity = None
+        self.delta_t = delta_t
+
+    def update(self, bbox, angle_cost=False):
+        """
+        Updates the state vector with observed bbox.
+        """
+        if bbox is not None:
+            if angle_cost and self.last_observation.sum(
+            ) >= 0:  # a previous observation exists
+                previous_box = None
+                for i in range(self.delta_t):
+                    dt = self.delta_t - i
+                    if self.age - dt in self.observations:
+                        previous_box = self.observations[self.age - dt]
+                        break
+                if previous_box is None:
+                    previous_box = self.last_observation
+                """
+                Estimate the track speed direction with observations \Delta t steps away
+                """
+                self.velocity = speed_direction(previous_box, bbox)
+            """
+            Insert new observations. This is an ugly way to maintain both self.observations
+            and self.history_observations. Bear it for the moment.
+ """ + self.last_observation = bbox + self.observations[self.age] = bbox + self.history_observations.append(bbox) + + self.time_since_update = 0 + self.history = [] + self.hits += 1 + self.hit_streak += 1 + self.kf.update(convert_bbox_to_z(bbox)) + else: + self.kf.update(bbox) + + def predict(self): + """ + Advances the state vector and returns the predicted bounding box estimate. + """ + if ((self.kf.x[6] + self.kf.x[2]) <= 0): + self.kf.x[6] *= 0.0 + + self.kf.predict() + self.age += 1 + if (self.time_since_update > 0): + self.hit_streak = 0 + self.time_since_update += 1 + self.history.append(convert_x_to_bbox(self.kf.x, score=self.score)) + return self.history[-1] + + def get_state(self): + return convert_x_to_bbox(self.kf.x, score=self.score) + + +@register +@serializable +class OCSORTTracker(object): + """ + OCSORT tracker, support single class + + Args: + det_thresh (float): threshold of detection score + max_age (int): maximum number of missed misses before a track is deleted + min_hits (int): minimum hits for associate + iou_threshold (float): iou threshold for associate + delta_t (int): delta_t of previous observation + inertia (float): vdc_weight of angle_diff_cost for associate + vertical_ratio (float): w/h, the vertical ratio of the bbox to filter + bad results. If set <= 0 means no need to filter bboxes,usually set + 1.6 for pedestrian tracking. + min_box_area (int): min box area to filter out low quality boxes + use_byte (bool): Whether use ByteTracker, default False + """ + + def __init__(self, + det_thresh=0.6, + max_age=30, + min_hits=3, + iou_threshold=0.3, + delta_t=3, + inertia=0.2, + vertical_ratio=-1, + min_box_area=0, + use_byte=False, + use_angle_cost=False): + self.det_thresh = det_thresh + self.max_age = max_age + self.min_hits = min_hits + self.iou_threshold = iou_threshold + self.delta_t = delta_t + self.inertia = inertia + self.vertical_ratio = vertical_ratio + self.min_box_area = min_box_area + self.use_byte = use_byte + self.use_angle_cost = use_angle_cost + + self.trackers = [] + self.frame_count = 0 + KalmanBoxTracker.count = 0 + + def update(self, pred_dets, pred_embs=None): + """ + Args: + pred_dets (np.array): Detection results of the image, the shape is + [N, 6], means 'cls_id, score, x0, y0, x1, y1'. + pred_embs (np.array): Embedding results of the image, the shape is + [N, 128] or [N, 512], default as None. + + Return: + tracking boxes (np.array): [M, 6], means 'x0, y0, x1, y1, score, id'. + """ + if pred_dets is None: + return np.empty((0, 6)) + + self.frame_count += 1 + + bboxes = pred_dets[:, 2:] + scores = pred_dets[:, 1:2] + dets = np.concatenate((bboxes, scores), axis=1) + scores = scores.squeeze(-1) + + inds_low = scores > 0.1 + inds_high = scores < self.det_thresh + inds_second = np.logical_and(inds_low, inds_high) + # self.det_thresh > score > 0.1, for second matching + dets_second = dets[inds_second] # detections for second matching + remain_inds = scores > self.det_thresh + dets = dets[remain_inds] + + # get predicted locations from existing trackers. 
+        trks = np.zeros((len(self.trackers), 5))
+        to_del = []
+        ret = []
+        for t, trk in enumerate(trks):
+            pos = self.trackers[t].predict()[0]
+            trk[:] = [pos[0], pos[1], pos[2], pos[3], 0]
+            if np.any(np.isnan(pos)):
+                to_del.append(t)
+        trks = np.ma.compress_rows(np.ma.masked_invalid(trks))
+        for t in reversed(to_del):
+            self.trackers.pop(t)
+
+        if self.use_angle_cost:
+            velocities = np.array([
+                trk.velocity if trk.velocity is not None else np.array((0, 0))
+                for trk in self.trackers
+            ])
+
+            k_observations = np.array([
+                k_previous_obs(trk.observations, trk.age, self.delta_t)
+                for trk in self.trackers
+            ])
+        last_boxes = np.array([trk.last_observation for trk in self.trackers])
+        """
+        First round of association
+        """
+        if self.use_angle_cost:
+            matched, unmatched_dets, unmatched_trks = associate(
+                dets, trks, self.iou_threshold, velocities, k_observations,
+                self.inertia)
+        else:
+            matched, unmatched_dets, unmatched_trks = associate_only_iou(
+                dets, trks, self.iou_threshold)
+
+        for m in matched:
+            self.trackers[m[1]].update(
+                dets[m[0], :], angle_cost=self.use_angle_cost)
+        """
+        Second round of association by OCR (observation-centric recovery)
+        """
+        # BYTE association
+        if self.use_byte and len(dets_second) > 0 and unmatched_trks.shape[
+                0] > 0:
+            u_trks = trks[unmatched_trks]
+            iou_left = iou_batch(
+                dets_second,
+                u_trks)  # iou between low score detections and unmatched tracks
+            iou_left = np.array(iou_left)
+            if iou_left.max() > self.iou_threshold:
+                """
+                NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, you may
+                get a higher performance especially on MOT17/MOT20 datasets. But we keep it
+                uniform here for simplicity
+                """
+                matched_indices = linear_assignment(-iou_left)
+                to_remove_trk_indices = []
+                for m in matched_indices:
+                    det_ind, trk_ind = m[0], unmatched_trks[m[1]]
+                    if iou_left[m[0], m[1]] < self.iou_threshold:
+                        continue
+                    self.trackers[trk_ind].update(
+                        dets_second[det_ind, :], angle_cost=self.use_angle_cost)
+                    to_remove_trk_indices.append(trk_ind)
+                unmatched_trks = np.setdiff1d(unmatched_trks,
+                                              np.array(to_remove_trk_indices))
+
+        if unmatched_dets.shape[0] > 0 and unmatched_trks.shape[0] > 0:
+            left_dets = dets[unmatched_dets]
+            left_trks = last_boxes[unmatched_trks]
+            iou_left = iou_batch(left_dets, left_trks)
+            iou_left = np.array(iou_left)
+            if iou_left.max() > self.iou_threshold:
+                """
+                NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, you may
+                get a higher performance especially on MOT17/MOT20 datasets. But we keep it
+                uniform here for simplicity
+                """
+                rematched_indices = linear_assignment(-iou_left)
+                to_remove_det_indices = []
+                to_remove_trk_indices = []
+                for m in rematched_indices:
+                    det_ind, trk_ind = unmatched_dets[m[0]], unmatched_trks[m[
+                        1]]
+                    if iou_left[m[0], m[1]] < self.iou_threshold:
+                        continue
+                    self.trackers[trk_ind].update(
+                        dets[det_ind, :], angle_cost=self.use_angle_cost)
+                    to_remove_det_indices.append(det_ind)
+                    to_remove_trk_indices.append(trk_ind)
+                unmatched_dets = np.setdiff1d(unmatched_dets,
+                                              np.array(to_remove_det_indices))
+                unmatched_trks = np.setdiff1d(unmatched_trks,
+                                              np.array(to_remove_trk_indices))
+
+        for m in unmatched_trks:
+            self.trackers[m].update(None)
+
+        # create and initialise new trackers for unmatched detections
+        for i in unmatched_dets:
+            trk = KalmanBoxTracker(dets[i, :], delta_t=self.delta_t)
+            self.trackers.append(trk)
+
+        i = len(self.trackers)
+        for trk in reversed(self.trackers):
+            if trk.last_observation.sum() < 0:
+                d = trk.get_state()[0]
+            else:
+                d = trk.last_observation  # tlbr + score
+            if (trk.time_since_update < 1) and (
+                    trk.hit_streak >= self.min_hits or
+                    self.frame_count <= self.min_hits):
+                # +1 as MOT benchmark requires positive
+                ret.append(np.concatenate((d, [trk.id + 1])).reshape(1, -1))
+            i -= 1
+            # remove dead tracklet
+            if (trk.time_since_update > self.max_age):
+                self.trackers.pop(i)
+        if (len(ret) > 0):
+            return np.concatenate(ret)
+        return np.empty((0, 6))
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/utils.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..f19b0d96bac48b6fc437518237d34604d6e19a86
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/utils.py
@@ -0,0 +1,265 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import time
+import numpy as np
+from .visualization import plot_tracking_dict, plot_tracking
+
+__all__ = [
+    'MOTTimer',
+    'Detection',
+    'write_mot_results',
+    'save_vis_results',
+    'load_det_results',
+    'preprocess_reid',
+    'get_crops',
+    'clip_box',
+    'scale_coords',
+]
+
+
+class MOTTimer(object):
+    """
+    This class is used to compute and print the current FPS while evaluating.
+    """
+
+    def __init__(self):
+        self.total_time = 0.
+        self.calls = 0
+        self.start_time = 0.
+        self.diff = 0.
+        self.average_time = 0.
+        self.duration = 0.
+
+    def tic(self):
+        # using time.time instead of time.clock because time.clock
+        # does not normalize for multithreading
+        self.start_time = time.time()
+
+    def toc(self, average=True):
+        self.diff = time.time() - self.start_time
+        self.total_time += self.diff
+        self.calls += 1
+        self.average_time = self.total_time / self.calls
+        if average:
+            self.duration = self.average_time
+        else:
+            self.duration = self.diff
+        return self.duration
+
+    def clear(self):
+        self.total_time = 0.
+        self.calls = 0
+        self.start_time = 0.
+        self.diff = 0.
+        self.average_time = 0.
+        self.duration = 0.
+
+
+class Detection(object):
+    """
+    This class represents a bounding box detection in a single image.
+
+    Args:
+        tlwh (np.ndarray): Bounding box in format `(top left x, top left y,
+            width, height)`.
+        score (float): Bounding box confidence score.
+        feature (np.ndarray): A feature vector that describes the object
+            contained in this image.
+        cls_id (int): Bounding box category id.
+    """
+
+    def __init__(self, tlwh, score, feature, cls_id):
+        self.tlwh = np.asarray(tlwh, dtype=np.float32)
+        self.score = float(score)
+        self.feature = np.asarray(feature, dtype=np.float32)
+        self.cls_id = int(cls_id)
+
+    def to_tlbr(self):
+        """
+        Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
+        `(top left, bottom right)`.
+        """
+        ret = self.tlwh.copy()
+        ret[2:] += ret[:2]
+        return ret
+
+    def to_xyah(self):
+        """
+        Convert bounding box to format `(center x, center y, aspect ratio,
+        height)`, where the aspect ratio is `width / height`.
+        """
+        ret = self.tlwh.copy()
+        ret[:2] += ret[2:] / 2
+        ret[2] /= ret[3]
+        return ret
+
+
+def write_mot_results(filename, results, data_type='mot', num_classes=1):
+    # support single and multi classes
+    if data_type in ['mot', 'mcmot']:
+        save_format = '{frame},{id},{x1},{y1},{w},{h},{score},{cls_id},-1,-1\n'
+    elif data_type == 'kitti':
+        save_format = '{frame} {id} car 0 0 -10 {x1} {y1} {x2} {y2} -10 -10 -10 -1000 -1000 -1000 -10\n'
+    else:
+        raise ValueError(data_type)
+
+    f = open(filename, 'w')
+    for cls_id in range(num_classes):
+        for frame_id, tlwhs, tscores, track_ids in results[cls_id]:
+            if data_type == 'kitti':
+                frame_id -= 1
+            for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
+                if track_id < 0: continue
+                if data_type == 'mot':
+                    cls_id = -1
+
+                x1, y1, w, h = tlwh
+                x2, y2 = x1 + w, y1 + h
+                line = save_format.format(
+                    frame=frame_id,
+                    id=track_id,
+                    x1=x1,
+                    y1=y1,
+                    x2=x2,
+                    y2=y2,
+                    w=w,
+                    h=h,
+                    score=score,
+                    cls_id=cls_id)
+                f.write(line)
+    f.close()
+    print('MOT results saved in {}'.format(filename))
+
+
+def save_vis_results(data,
+                     frame_id,
+                     online_ids,
+                     online_tlwhs,
+                     online_scores,
+                     average_time,
+                     show_image,
+                     save_dir,
+                     num_classes=1,
+                     ids2names=[]):
+    if show_image or save_dir is not None:
+        assert 'ori_image' in data
+        img0 = data['ori_image'].numpy()[0]
+        if online_ids is None:
+            online_im = img0
+        else:
+            if isinstance(online_tlwhs, dict):
+                online_im = plot_tracking_dict(
+                    img0,
+                    num_classes,
+                    online_tlwhs,
+                    online_ids,
+                    online_scores,
+                    frame_id=frame_id,
+                    fps=1. / average_time,
+                    ids2names=ids2names)
+            else:
+                online_im = plot_tracking(
+                    img0,
+                    online_tlwhs,
+                    online_ids,
+                    online_scores,
+                    frame_id=frame_id,
+                    fps=1. / average_time,
+                    ids2names=ids2names)
+        if show_image:
+            cv2.imshow('online_im', online_im)
+        if save_dir is not None:
+            cv2.imwrite(
+                os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)),
+                online_im)
+
+
+def load_det_results(det_file, num_frames):
+    assert os.path.exists(det_file) and os.path.isfile(det_file), \
+        '{} does not exist or is not a file.'.format(det_file)
+    labels = np.loadtxt(det_file, dtype='float32', delimiter=',')
+    assert labels.shape[1] == 7, \
+        "Each line of {} should have 7 items: '[frame_id],[x0],[y0],[w],[h],[score],[class_id]'.".format(det_file)
+    results_list = []
+    for frame_i in range(num_frames):
+        results = {'bbox': [], 'score': [], 'cls_id': []}
+        labels_with_frame = labels[labels[:, 0] == frame_i + 1]
+        # each line of labels_with_frame:
+        # [frame_id],[x0],[y0],[w],[h],[score],[class_id]
+        for l in labels_with_frame:
+            results['bbox'].append(l[1:5])
+            results['score'].append(l[5:6])
+            results['cls_id'].append(l[6:7])
+        results_list.append(results)
+    return results_list
+
+
+def scale_coords(coords, input_shape, im_shape, scale_factor):
+    # Note: ratio has only one value, scale_factor[0] == scale_factor[1]
+    #
+    # This function is only used for JDE YOLOv3 or other detectors with
+    # LetterBoxResize and JDEBBoxPostProcess, where coords output from the
+    # detector have not been scaled back to the origin image.
+
+    ratio = scale_factor[0]
+    pad_w = (input_shape[1] - int(im_shape[1])) / 2
+    pad_h = (input_shape[0] - int(im_shape[0])) / 2
+    coords[:, 0::2] -= pad_w
+    coords[:, 1::2] -= pad_h
+    coords[:, 0:4] /= ratio
+    coords[:, :4] = np.clip(coords[:, :4], a_min=0, a_max=coords[:, :4].max())
+    return coords.round()
+
+
+def clip_box(xyxy, ori_image_shape):
+    H, W = ori_image_shape
+    xyxy[:, 0::2] = np.clip(xyxy[:, 0::2], a_min=0, a_max=W)
+    xyxy[:, 1::2] = np.clip(xyxy[:, 1::2], a_min=0, a_max=H)
+    w = xyxy[:, 2:3] - xyxy[:, 0:1]
+    h = xyxy[:, 3:4] - xyxy[:, 1:2]
+    mask = np.logical_and(h > 0, w > 0)
+    keep_idx = np.nonzero(mask)
+    return xyxy[keep_idx[0]], keep_idx
+
+
+def get_crops(xyxy, ori_img, w, h):
+    crops = []
+    xyxy = xyxy.astype(np.int64)
+    ori_img = ori_img.numpy()
+    ori_img = np.squeeze(ori_img, axis=0).transpose(1, 0, 2)  # [h,w,3]->[w,h,3]
+    for i, bbox in enumerate(xyxy):
+        crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
+        crops.append(crop)
+    crops = preprocess_reid(crops, w, h)
+    return crops
+
+
+def preprocess_reid(imgs,
+                    w=64,
+                    h=192,
+                    mean=[0.485, 0.456, 0.406],
+                    std=[0.229, 0.224, 0.225]):
+    im_batch = []
+    for img in imgs:
+        img = cv2.resize(img, (w, h))
+        img = img[:, :, ::-1].astype('float32').transpose((2, 0, 1)) / 255
+        img_mean = np.array(mean).reshape((3, 1, 1))
+        img_std = np.array(std).reshape((3, 1, 1))
+        img -= img_mean
+        img /= img_std
+        img = np.expand_dims(img, axis=0)
+        im_batch.append(img)
+    im_batch = np.concatenate(im_batch, 0)
+    return im_batch
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/mot/visualization.py b/PaddleDetection-release-2.6/ppdet/modeling/mot/visualization.py
new file mode 100644
index 0000000000000000000000000000000000000000..6d13a28777c9afaa515de28623cb39593f0e71a1
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/mot/visualization.py
@@ -0,0 +1,146 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import numpy as np + + +def get_color(idx): + idx = idx * 3 + color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255) + return color + + +def plot_tracking(image, + tlwhs, + obj_ids, + scores=None, + frame_id=0, + fps=0., + ids2names=[]): + im = np.ascontiguousarray(np.copy(image)) + im_h, im_w = im.shape[:2] + + top_view = np.zeros([im_w, im_w, 3], dtype=np.uint8) + 255 + + text_scale = max(1, image.shape[1] / 1600.) + text_thickness = 2 + line_thickness = max(1, int(image.shape[1] / 500.)) + + radius = max(5, int(im_w / 140.)) + cv2.putText( + im, + 'frame: %d fps: %.2f num: %d' % (frame_id, fps, len(tlwhs)), + (0, int(15 * text_scale)), + cv2.FONT_HERSHEY_PLAIN, + text_scale, (0, 0, 255), + thickness=2) + + for i, tlwh in enumerate(tlwhs): + x1, y1, w, h = tlwh + intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h))) + obj_id = int(obj_ids[i]) + id_text = '{}'.format(int(obj_id)) + if ids2names != []: + assert len( + ids2names) == 1, "plot_tracking only supports single classes." + id_text = '{}_'.format(ids2names[0]) + id_text + _line_thickness = 1 if obj_id <= 0 else line_thickness + color = get_color(abs(obj_id)) + cv2.rectangle( + im, intbox[0:2], intbox[2:4], color=color, thickness=line_thickness) + cv2.putText( + im, + id_text, (intbox[0], intbox[1] - 10), + cv2.FONT_HERSHEY_PLAIN, + text_scale, (0, 0, 255), + thickness=text_thickness) + + if scores is not None: + text = '{:.2f}'.format(float(scores[i])) + cv2.putText( + im, + text, (intbox[0], intbox[1] + 10), + cv2.FONT_HERSHEY_PLAIN, + text_scale, (0, 255, 255), + thickness=text_thickness) + return im + + +def plot_tracking_dict(image, + num_classes, + tlwhs_dict, + obj_ids_dict, + scores_dict, + frame_id=0, + fps=0., + ids2names=[]): + im = np.ascontiguousarray(np.copy(image)) + im_h, im_w = im.shape[:2] + + top_view = np.zeros([im_w, im_w, 3], dtype=np.uint8) + 255 + + text_scale = max(1, image.shape[1] / 1600.) 
+ text_thickness = 2 + line_thickness = max(1, int(image.shape[1] / 500.)) + + radius = max(5, int(im_w / 140.)) + + for cls_id in range(num_classes): + tlwhs = tlwhs_dict[cls_id] + obj_ids = obj_ids_dict[cls_id] + scores = scores_dict[cls_id] + cv2.putText( + im, + 'frame: %d fps: %.2f num: %d' % (frame_id, fps, len(tlwhs)), + (0, int(15 * text_scale)), + cv2.FONT_HERSHEY_PLAIN, + text_scale, (0, 0, 255), + thickness=2) + + for i, tlwh in enumerate(tlwhs): + x1, y1, w, h = tlwh + intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h))) + obj_id = int(obj_ids[i]) + + id_text = '{}'.format(int(obj_id)) + if ids2names != []: + id_text = '{}_{}'.format(ids2names[cls_id], id_text) + else: + id_text = 'class{}_{}'.format(cls_id, id_text) + + _line_thickness = 1 if obj_id <= 0 else line_thickness + color = get_color(abs(obj_id)) + cv2.rectangle( + im, + intbox[0:2], + intbox[2:4], + color=color, + thickness=line_thickness) + cv2.putText( + im, + id_text, (intbox[0], intbox[1] - 10), + cv2.FONT_HERSHEY_PLAIN, + text_scale, (0, 0, 255), + thickness=text_thickness) + + if scores is not None: + text = '{:.2f}'.format(float(scores[i])) + cv2.putText( + im, + text, (intbox[0], intbox[1] + 10), + cv2.FONT_HERSHEY_PLAIN, + text_scale, (0, 255, 255), + thickness=text_thickness) + return im diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..478efec98e324b213ad3f822b551f92265d91e25 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/__init__.py @@ -0,0 +1,39 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import fpn +from . import yolo_fpn +from . import hrfpn +from . import ttf_fpn +from . import centernet_fpn +from . import bifpn +from . import csp_pan +from . import es_pan +from . import lc_pan +from . import custom_pan +from . 
import dilated_encoder
+from . import blazeface_fpn
+from . import channel_mapper
+
+from .fpn import *
+from .yolo_fpn import *
+from .hrfpn import *
+from .ttf_fpn import *
+from .centernet_fpn import *
+from .blazeface_fpn import *
+from .bifpn import *
+from .csp_pan import *
+from .es_pan import *
+from .lc_pan import *
+from .custom_pan import *
+from .dilated_encoder import *
+from .channel_mapper import *
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/__init__.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..8a064f3a2937c24b268d8eb0fd91b64493805db2
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/__init__.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/bifpn.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/bifpn.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..0e53826454e45412e4436f4c0b307f75adaba987
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/bifpn.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/blazeface_fpn.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/blazeface_fpn.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..9e32844eb2cf27de90d5d317df83c63f08b82fba
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/blazeface_fpn.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/centernet_fpn.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/centernet_fpn.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..aaf0a034a2bd5f0cc2d0771f4ed4a6b30d81dedf
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/centernet_fpn.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/channel_mapper.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/channel_mapper.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..d9ce34d491b0fafd0e030d21e379649e4199fcf3
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/channel_mapper.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/csp_pan.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/csp_pan.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..a01c230845eba62a10bb8bfa1e9c3978764235ea
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/csp_pan.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/custom_pan.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/custom_pan.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..db5f344981ada4b48b85ff6f877b563cfe942846
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/custom_pan.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/dilated_encoder.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/dilated_encoder.cpython-37.pyc
new file mode 100644
index 
0000000000000000000000000000000000000000..b5132638baa3157e18288943530b534c174e3e76 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/dilated_encoder.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/es_pan.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/es_pan.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..308d7f75e5232c1634081492d553b19aeabb56f0 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/es_pan.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/fpn.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/fpn.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7c64106315e19c222a9c85ddf59471f342240c81 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/fpn.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/hrfpn.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/hrfpn.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b72274657059e48d4a3cad7ad6fcf44ee2cd52f9 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/hrfpn.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/lc_pan.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/lc_pan.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4680174edd30f379f56ce424f89dab7171e2a88f Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/lc_pan.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/ttf_fpn.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/ttf_fpn.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b6df3c98561eaa6c8045cd9f922363aa33cca72d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/ttf_fpn.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/yolo_fpn.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/yolo_fpn.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..66735176fdc38306add9052c0e24c89ab784e758 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/necks/__pycache__/yolo_fpn.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/bifpn.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/bifpn.py new file mode 100644 index 0000000000000000000000000000000000000000..9e794b8f50b92de6a98cc15ebcc3bca6cfaccf41 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/bifpn.py @@ -0,0 +1,300 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Constant + +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import ConvNormLayer +from ..shape_spec import ShapeSpec + +__all__ = ['BiFPN'] + + +class SeparableConvLayer(nn.Layer): + def __init__(self, + in_channels, + out_channels=None, + kernel_size=3, + norm_type='bn', + norm_groups=32, + act='swish'): + super(SeparableConvLayer, self).__init__() + assert norm_type in ['bn', 'sync_bn', 'gn', None] + assert act in ['swish', 'relu', None] + + self.in_channels = in_channels + if out_channels is None: + self.out_channels = self.in_channels + self.norm_type = norm_type + self.norm_groups = norm_groups + self.depthwise_conv = nn.Conv2D( + in_channels, + in_channels, + kernel_size, + padding=kernel_size // 2, + groups=in_channels, + bias_attr=False) + self.pointwise_conv = nn.Conv2D(in_channels, self.out_channels, 1) + + # norm type + if self.norm_type in ['bn', 'sync_bn']: + self.norm = nn.BatchNorm2D(self.out_channels) + elif self.norm_type == 'gn': + self.norm = nn.GroupNorm( + num_groups=self.norm_groups, num_channels=self.out_channels) + + # activation + if act == 'swish': + self.act = nn.Swish() + elif act == 'relu': + self.act = nn.ReLU() + + def forward(self, x): + if self.act is not None: + x = self.act(x) + out = self.depthwise_conv(x) + out = self.pointwise_conv(out) + if self.norm_type is not None: + out = self.norm(out) + return out + + +class BiFPNCell(nn.Layer): + def __init__(self, + channels=256, + num_levels=5, + eps=1e-5, + use_weighted_fusion=True, + kernel_size=3, + norm_type='bn', + norm_groups=32, + act='swish'): + super(BiFPNCell, self).__init__() + self.channels = channels + self.num_levels = num_levels + self.eps = eps + self.use_weighted_fusion = use_weighted_fusion + + # up + self.conv_up = nn.LayerList([ + SeparableConvLayer( + self.channels, + kernel_size=kernel_size, + norm_type=norm_type, + norm_groups=norm_groups, + act=act) for _ in range(self.num_levels - 1) + ]) + # down + self.conv_down = nn.LayerList([ + SeparableConvLayer( + self.channels, + kernel_size=kernel_size, + norm_type=norm_type, + norm_groups=norm_groups, + act=act) for _ in range(self.num_levels - 1) + ]) + + if self.use_weighted_fusion: + self.up_weights = self.create_parameter( + shape=[self.num_levels - 1, 2], + attr=ParamAttr(initializer=Constant(1.))) + self.down_weights = self.create_parameter( + shape=[self.num_levels - 1, 3], + attr=ParamAttr(initializer=Constant(1.))) + + def _feature_fusion_cell(self, + conv_layer, + lateral_feat, + sampling_feat, + route_feat=None, + weights=None): + if self.use_weighted_fusion: + weights = F.relu(weights) + weights = weights / (weights.sum() + self.eps) + if route_feat is not None: + out_feat = weights[0] * lateral_feat + \ + weights[1] * sampling_feat + \ + weights[2] * route_feat + else: + out_feat = weights[0] * lateral_feat + \ + weights[1] * sampling_feat + else: + if route_feat is not None: + out_feat = lateral_feat + sampling_feat + route_feat + else: + out_feat = lateral_feat + sampling_feat + + out_feat = conv_layer(out_feat) + return out_feat + + def forward(self, feats): + # feats: [P3 - P7] + lateral_feats = [] + + # up + up_feature = feats[-1] + for i, feature in enumerate(feats[::-1]): + if i == 0: + lateral_feats.append(feature) + else: + shape = 
paddle.shape(feature) + up_feature = F.interpolate( + up_feature, size=[shape[2], shape[3]]) + lateral_feature = self._feature_fusion_cell( + self.conv_up[i - 1], + feature, + up_feature, + weights=self.up_weights[i - 1] + if self.use_weighted_fusion else None) + lateral_feats.append(lateral_feature) + up_feature = lateral_feature + + out_feats = [] + # down + down_feature = lateral_feats[-1] + for i, (lateral_feature, + route_feature) in enumerate(zip(lateral_feats[::-1], feats)): + if i == 0: + out_feats.append(lateral_feature) + else: + down_feature = F.max_pool2d(down_feature, 3, 2, 1) + if i == len(feats) - 1: + route_feature = None + weights = self.down_weights[ + i - 1][:2] if self.use_weighted_fusion else None + else: + weights = self.down_weights[ + i - 1] if self.use_weighted_fusion else None + out_feature = self._feature_fusion_cell( + self.conv_down[i - 1], + lateral_feature, + down_feature, + route_feature, + weights=weights) + out_feats.append(out_feature) + down_feature = out_feature + + return out_feats + + +@register +@serializable +class BiFPN(nn.Layer): + """ + Bidirectional Feature Pyramid Network, see https://arxiv.org/abs/1911.09070 + + Args: + in_channels (list[int]): input channels of each level which can be + derived from the output shape of backbone by from_config. + out_channel (int): output channel of each level. + num_extra_levels (int): the number of extra stages added to the last level. + default: 2 + fpn_strides (List): The stride of each level. + num_stacks (int): the number of stacks for BiFPN, default: 1. + use_weighted_fusion (bool): use weighted feature fusion in BiFPN, default: True. + norm_type (string|None): the normalization type in BiFPN module. If + norm_type is None, norm will not be used after conv and if + norm_type is string, bn, gn, sync_bn are available. default: bn. + norm_groups (int): if you use gn, set this param. + act (string|None): the activation function of BiFPN. + """ + + def __init__(self, + in_channels=(512, 1024, 2048), + out_channel=256, + num_extra_levels=2, + fpn_strides=[8, 16, 32, 64, 128], + num_stacks=1, + use_weighted_fusion=True, + norm_type='bn', + norm_groups=32, + act='swish'): + super(BiFPN, self).__init__() + assert num_stacks > 0, "The number of stacks of BiFPN is at least 1." + assert norm_type in ['bn', 'sync_bn', 'gn', None] + assert act in ['swish', 'relu', None] + assert num_extra_levels >= 0, \ + "The `num_extra_levels` must be non negative(>=0)." 
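+        # Note: if `fpn_strides` is shorter than the total number of levels,
+        # it is extended below by doubling the last stride once per extra
+        # level, e.g. [8, 16, 32] with two extra levels -> [8, 16, 32, 64, 128].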
+ + self.in_channels = in_channels + self.out_channel = out_channel + self.num_extra_levels = num_extra_levels + self.num_stacks = num_stacks + self.use_weighted_fusion = use_weighted_fusion + self.norm_type = norm_type + self.norm_groups = norm_groups + self.act = act + self.num_levels = len(self.in_channels) + self.num_extra_levels + if len(fpn_strides) != self.num_levels: + for i in range(self.num_extra_levels): + fpn_strides += [fpn_strides[-1] * 2] + self.fpn_strides = fpn_strides + + self.lateral_convs = nn.LayerList() + for in_c in in_channels: + self.lateral_convs.append( + ConvNormLayer(in_c, self.out_channel, 1, 1)) + if self.num_extra_levels > 0: + self.extra_convs = nn.LayerList() + for i in range(self.num_extra_levels): + if i == 0: + self.extra_convs.append( + ConvNormLayer(self.in_channels[-1], self.out_channel, 3, + 2)) + else: + self.extra_convs.append(nn.MaxPool2D(3, 2, 1)) + + self.bifpn_cells = nn.LayerList() + for i in range(self.num_stacks): + self.bifpn_cells.append( + BiFPNCell( + self.out_channel, + self.num_levels, + use_weighted_fusion=self.use_weighted_fusion, + norm_type=self.norm_type, + norm_groups=self.norm_groups, + act=self.act)) + + @classmethod + def from_config(cls, cfg, input_shape): + return { + 'in_channels': [i.channels for i in input_shape], + 'fpn_strides': [i.stride for i in input_shape] + } + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self.out_channel, stride=s) for s in self.fpn_strides + ] + + def forward(self, feats): + assert len(feats) == len(self.in_channels) + fpn_feats = [] + for conv_layer, feature in zip(self.lateral_convs, feats): + fpn_feats.append(conv_layer(feature)) + if self.num_extra_levels > 0: + feat = feats[-1] + for conv_layer in self.extra_convs: + feat = conv_layer(feat) + fpn_feats.append(feat) + + for bifpn_cell in self.bifpn_cells: + fpn_feats = bifpn_cell(fpn_feats) + return fpn_feats diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/blazeface_fpn.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/blazeface_fpn.py new file mode 100644 index 0000000000000000000000000000000000000000..b903c97b290b82d6c528cce8a4b205decd42e28b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/blazeface_fpn.py @@ -0,0 +1,213 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn.functional as F +from paddle import ParamAttr +import paddle.nn as nn +from paddle.nn.initializer import KaimingNormal +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec + +__all__ = ['BlazeNeck'] + + +def hard_swish(x): + return x * F.relu6(x + 3) / 6. 
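The `hard_swish` helper above is the standard HardSwish activation, x * relu6(x + 3) / 6, which recent Paddle releases also expose as `F.hardswish`. A minimal standalone sanity check (a sketch, assuming Paddle 2.x):

```python
import paddle
import paddle.nn.functional as F


def hard_swish(x):
    return x * F.relu6(x + 3) / 6.


x = paddle.linspace(-6., 6., 25)
# The handwritten helper and the built-in should agree elementwise.
assert paddle.allclose(hard_swish(x), F.hardswish(x))
```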
+ + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels, + out_channels, + kernel_size, + stride, + padding, + num_groups=1, + act='relu', + conv_lr=0.1, + conv_decay=0., + norm_decay=0., + norm_type='bn', + name=None): + super(ConvBNLayer, self).__init__() + self.act = act + self._conv = nn.Conv2D( + in_channels, + out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=num_groups, + weight_attr=ParamAttr( + learning_rate=conv_lr, initializer=KaimingNormal()), + bias_attr=False) + + if norm_type in ['sync_bn', 'bn']: + self._batch_norm = nn.BatchNorm2D(out_channels) + + def forward(self, x): + x = self._conv(x) + x = self._batch_norm(x) + if self.act == "relu": + x = F.relu(x) + elif self.act == "relu6": + x = F.relu6(x) + elif self.act == 'leaky': + x = F.leaky_relu(x) + elif self.act == 'hard_swish': + x = hard_swish(x) + return x + + +class FPN(nn.Layer): + def __init__(self, in_channels, out_channels, name=None): + super(FPN, self).__init__() + self.conv1_fpn = ConvBNLayer( + in_channels, + out_channels // 2, + kernel_size=1, + padding=0, + stride=1, + act='leaky', + name=name + '_output1') + self.conv2_fpn = ConvBNLayer( + in_channels, + out_channels // 2, + kernel_size=1, + padding=0, + stride=1, + act='leaky', + name=name + '_output2') + self.conv3_fpn = ConvBNLayer( + out_channels // 2, + out_channels // 2, + kernel_size=3, + padding=1, + stride=1, + act='leaky', + name=name + '_merge') + + def forward(self, input): + output1 = self.conv1_fpn(input[0]) + output2 = self.conv2_fpn(input[1]) + up2 = F.upsample( + output2, size=paddle.shape(output1)[-2:], mode='nearest') + output1 = paddle.add(output1, up2) + output1 = self.conv3_fpn(output1) + return output1, output2 + + +class SSH(nn.Layer): + def __init__(self, in_channels, out_channels, name=None): + super(SSH, self).__init__() + assert out_channels % 4 == 0 + self.conv0_ssh = ConvBNLayer( + in_channels, + out_channels // 2, + kernel_size=3, + padding=1, + stride=1, + act=None, + name=name + 'ssh_conv3') + self.conv1_ssh = ConvBNLayer( + out_channels // 2, + out_channels // 4, + kernel_size=3, + padding=1, + stride=1, + act='leaky', + name=name + 'ssh_conv5_1') + self.conv2_ssh = ConvBNLayer( + out_channels // 4, + out_channels // 4, + kernel_size=3, + padding=1, + stride=1, + act=None, + name=name + 'ssh_conv5_2') + self.conv3_ssh = ConvBNLayer( + out_channels // 4, + out_channels // 4, + kernel_size=3, + padding=1, + stride=1, + act='leaky', + name=name + 'ssh_conv7_1') + self.conv4_ssh = ConvBNLayer( + out_channels // 4, + out_channels // 4, + kernel_size=3, + padding=1, + stride=1, + act=None, + name=name + 'ssh_conv7_2') + + def forward(self, x): + conv0 = self.conv0_ssh(x) + conv1 = self.conv1_ssh(conv0) + conv2 = self.conv2_ssh(conv1) + conv3 = self.conv3_ssh(conv2) + conv4 = self.conv4_ssh(conv3) + concat = paddle.concat([conv0, conv2, conv4], axis=1) + return F.relu(concat) + + +@register +@serializable +class BlazeNeck(nn.Layer): + def __init__(self, in_channel, neck_type="None", data_format='NCHW'): + super(BlazeNeck, self).__init__() + self.neck_type = neck_type + self.reture_input = False + self._out_channels = in_channel + if self.neck_type == 'None': + self.reture_input = True + if "fpn" in self.neck_type: + self.fpn = FPN(self._out_channels[0], + self._out_channels[1], + name='fpn') + self._out_channels = [ + self._out_channels[0] // 2, self._out_channels[1] // 2 + ] + if "ssh" in self.neck_type: + self.ssh1 = SSH(self._out_channels[0], + self._out_channels[0], + 
name='ssh1') + self.ssh2 = SSH(self._out_channels[1], + self._out_channels[1], + name='ssh2') + self._out_channels = [self._out_channels[0], self._out_channels[1]] + + def forward(self, inputs): + if self.reture_input: + return inputs + output1, output2 = None, None + if "fpn" in self.neck_type: + backout_4, backout_1 = inputs + output1, output2 = self.fpn([backout_4, backout_1]) + if self.neck_type == "only_fpn": + return [output1, output2] + if self.neck_type == "only_ssh": + output1, output2 = inputs + feature1 = self.ssh1(output1) + feature2 = self.ssh2(output2) + return [feature1, feature2] + + @property + def out_shape(self): + return [ + ShapeSpec(channels=c) + for c in [self._out_channels[0], self._out_channels[1]] + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/centernet_fpn.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/centernet_fpn.py new file mode 100644 index 0000000000000000000000000000000000000000..d4dded8886f5dc5b13cf1b498586ad15bc936947 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/centernet_fpn.py @@ -0,0 +1,426 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import math +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn.initializer import Uniform +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import ConvNormLayer +from ppdet.modeling.backbones.hardnet import ConvLayer, HarDBlock +from ..shape_spec import ShapeSpec + +__all__ = ['CenterNetDLAFPN', 'CenterNetHarDNetFPN'] + + +# SGE attention +class BasicConv(nn.Layer): + def __init__(self, + in_planes, + out_planes, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + relu=True, + bn=True, + bias_attr=False): + super(BasicConv, self).__init__() + self.out_channels = out_planes + self.conv = nn.Conv2D( + in_planes, + out_planes, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias_attr=bias_attr) + self.bn = nn.BatchNorm2D( + out_planes, + epsilon=1e-5, + momentum=0.01, + weight_attr=False, + bias_attr=False) if bn else None + self.relu = nn.ReLU() if relu else None + + def forward(self, x): + x = self.conv(x) + if self.bn is not None: + x = self.bn(x) + if self.relu is not None: + x = self.relu(x) + return x + + +class ChannelPool(nn.Layer): + def forward(self, x): + return paddle.concat( + (paddle.max(x, 1).unsqueeze(1), paddle.mean(x, 1).unsqueeze(1)), + axis=1) + + +class SpatialGate(nn.Layer): + def __init__(self): + super(SpatialGate, self).__init__() + kernel_size = 7 + self.compress = ChannelPool() + self.spatial = BasicConv( + 2, + 1, + kernel_size, + stride=1, + padding=(kernel_size - 1) // 2, + relu=False) + + def forward(self, x): + x_compress = self.compress(x) + x_out = self.spatial(x_compress) + scale = F.sigmoid(x_out) # broadcasting + return x * scale + + +def fill_up_weights(up): + weight = 
up.weight.numpy()
+    # Initialize the grouped transposed conv as a bilinear upsampling
+    # kernel, then copy the first channel's kernel to all channels.
+    f = math.ceil(weight.shape[2] / 2)
+    c = (2 * f - 1 - f % 2) / (2. * f)
+    for i in range(weight.shape[2]):
+        for j in range(weight.shape[3]):
+            weight[0, 0, i, j] = \
+                (1 - math.fabs(i / f - c)) * (1 - math.fabs(j / f - c))
+    for c in range(1, weight.shape[0]):
+        weight[c, 0, :, :] = weight[0, 0, :, :]
+    up.weight.set_value(weight)
+
+
+class IDAUp(nn.Layer):
+    def __init__(self, ch_ins, ch_out, up_strides, dcn_v2=True):
+        super(IDAUp, self).__init__()
+        for i in range(1, len(ch_ins)):
+            ch_in = ch_ins[i]
+            up_s = int(up_strides[i])
+            fan_in = ch_in * 3 * 3
+            stdv = 1. / math.sqrt(fan_in)
+            proj = nn.Sequential(
+                ConvNormLayer(
+                    ch_in,
+                    ch_out,
+                    filter_size=3,
+                    stride=1,
+                    use_dcn=dcn_v2,
+                    bias_on=dcn_v2,
+                    norm_decay=None,
+                    dcn_lr_scale=1.,
+                    dcn_regularizer=None,
+                    initializer=Uniform(-stdv, stdv)),
+                nn.ReLU())
+            node = nn.Sequential(
+                ConvNormLayer(
+                    ch_out,
+                    ch_out,
+                    filter_size=3,
+                    stride=1,
+                    use_dcn=dcn_v2,
+                    bias_on=dcn_v2,
+                    norm_decay=None,
+                    dcn_lr_scale=1.,
+                    dcn_regularizer=None,
+                    initializer=Uniform(-stdv, stdv)),
+                nn.ReLU())
+
+            kernel_size = up_s * 2
+            fan_in = ch_out * kernel_size * kernel_size
+            stdv = 1. / math.sqrt(fan_in)
+            up = nn.Conv2DTranspose(
+                ch_out,
+                ch_out,
+                kernel_size=up_s * 2,
+                stride=up_s,
+                padding=up_s // 2,
+                groups=ch_out,
+                weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)),
+                bias_attr=False)
+            fill_up_weights(up)
+            setattr(self, 'proj_' + str(i), proj)
+            setattr(self, 'up_' + str(i), up)
+            setattr(self, 'node_' + str(i), node)
+
+    def forward(self, inputs, start_level, end_level):
+        # Iteratively project, upsample, and fuse adjacent levels in place.
+        for i in range(start_level + 1, end_level):
+            upsample = getattr(self, 'up_' + str(i - start_level))
+            project = getattr(self, 'proj_' + str(i - start_level))
+            inputs[i] = project(inputs[i])
+            inputs[i] = upsample(inputs[i])
+            node = getattr(self, 'node_' + str(i - start_level))
+            inputs[i] = node(paddle.add(inputs[i], inputs[i - 1]))
+        return inputs
+
+
+class DLAUp(nn.Layer):
+    def __init__(self, start_level, channels, scales, ch_in=None, dcn_v2=True):
+        super(DLAUp, self).__init__()
+        self.start_level = start_level
+        if ch_in is None:
+            ch_in = channels
+        self.channels = channels
+        channels = list(channels)
+        scales = np.array(scales, dtype=int)
+        for i in range(len(channels) - 1):
+            j = -i - 2
+            setattr(
+                self,
+                'ida_{}'.format(i),
+                IDAUp(
+                    ch_in[j:],
+                    channels[j],
+                    scales[j:] // scales[j],
+                    dcn_v2=dcn_v2))
+            scales[j + 1:] = scales[j]
+            ch_in[j + 1:] = [channels[j] for _ in channels[j + 1:]]
+
+    def forward(self, inputs):
+        out = [inputs[-1]]  # start with 32
+        for i in range(len(inputs) - self.start_level - 1):
+            ida = getattr(self, 'ida_{}'.format(i))
+            outputs = ida(inputs, len(inputs) - i - 2, len(inputs))
+            out.insert(0, outputs[-1])
+        return out
+
+
+@register
+@serializable
+class CenterNetDLAFPN(nn.Layer):
+    """
+    Args:
+        in_channels (list): number of input feature channels from backbone.
+            [16, 32, 64, 128, 256, 512] by default, means the channels of DLA-34
+        down_ratio (int): the down ratio from images to heatmap, 4 by default
+        last_level (int): the last level of input feature fed into the upsampling block
+        out_channel (int): the channel of the output feature, 0 by default means
+            the channel of the input feature whose down ratio is `down_ratio`
+        first_level (int|None): the first level of input feature fed into the upsampling block.
+            if None, the first level stands for log2(down_ratio)
+        dcn_v2 (bool): whether to use DCNv2, True by default
+        with_sge (bool): whether to use SGE attention, False by default
+    """
+
+    def __init__(self,
+                 in_channels,
+                 down_ratio=4,
+                 last_level=5,
+                 out_channel=0,
+                 first_level=None,
+                 dcn_v2=True,
+                 with_sge=False):
+        super(CenterNetDLAFPN, self).__init__()
+        self.first_level = int(np.log2(
+            down_ratio)) if first_level is None else first_level
+        assert self.first_level >= 0, "first level in CenterNetDLAFPN must be greater than or equal to 0, but received {}".format(
+            self.first_level)
+        self.down_ratio = down_ratio
+        self.last_level = last_level
+        scales = [2**i for i in range(len(in_channels[self.first_level:]))]
+        self.dla_up = DLAUp(
+            self.first_level,
+            in_channels[self.first_level:],
+            scales,
+            dcn_v2=dcn_v2)
+        self.out_channel = out_channel
+        if out_channel == 0:
+            self.out_channel = in_channels[self.first_level]
+        self.ida_up = IDAUp(
+            in_channels[self.first_level:self.last_level],
+            self.out_channel,
+            [2**i for i in range(self.last_level - self.first_level)],
+            dcn_v2=dcn_v2)
+
+        self.with_sge = with_sge
+        if self.with_sge:
+            self.sge_attention = SpatialGate()
+
+    @classmethod
+    def from_config(cls, cfg, input_shape):
+        return {'in_channels': [i.channels for i in input_shape]}
+
+    def forward(self, body_feats):
+        inputs = [body_feats[i] for i in range(len(body_feats))]
+
+        dla_up_feats = self.dla_up(inputs)
+
+        ida_up_feats = []
+        for i in range(self.last_level - self.first_level):
+            ida_up_feats.append(dla_up_feats[i].clone())
+
+        self.ida_up(ida_up_feats, 0, len(ida_up_feats))
+
+        feat = ida_up_feats[-1]
+        if self.with_sge:
+            feat = self.sge_attention(feat)
+        if self.down_ratio != 4:
+            feat = F.interpolate(
+                feat,
+                scale_factor=self.down_ratio // 4,
+                mode="bilinear",
+                align_corners=True)
+        return feat
+
+    @property
+    def out_shape(self):
+        return [ShapeSpec(channels=self.out_channel, stride=self.down_ratio)]
+
+
+class TransitionUp(nn.Layer):
+    def __init__(self, in_channels, out_channels):
+        super().__init__()
+
+    def forward(self, x, skip):
+        h, w = skip.shape[2], skip.shape[3]
+        out = F.interpolate(x, size=(h, w), mode="bilinear", align_corners=True)
+        out = paddle.concat([out, skip], 1)
+        return out
+
+
+@register
+@serializable
+class CenterNetHarDNetFPN(nn.Layer):
+    """
+    Args:
+        in_channels (list): number of input feature channels from backbone.
+            [96, 214, 458, 784] by default, means the channels of HarDNet85
+        num_layers (int): HarDNet layers, 85 by default
+        down_ratio (int): the down ratio from images to heatmap, 4 by default
+        first_level (int|None): the first level of input feature fed into the upsampling block.
+            if None, the first level stands for log2(down_ratio) - 1
+
+        last_level (int): the last level of input feature fed into the upsampling block
+        out_channel (int): the channel of the output feature, 0 by default means
+            the channel of the input feature whose down ratio is `down_ratio`
+    """
+
+    def __init__(self,
+                 in_channels,
+                 num_layers=85,
+                 down_ratio=4,
+                 first_level=None,
+                 last_level=4,
+                 out_channel=0):
+        super(CenterNetHarDNetFPN, self).__init__()
+        self.first_level = int(np.log2(
+            down_ratio)) - 1 if first_level is None else first_level
+        assert self.first_level >= 0, "first level in CenterNetHarDNetFPN must be greater than or equal to 0, but received {}".format(
+            self.first_level)
+        self.down_ratio = down_ratio
+        self.last_level = last_level
+        self.last_pool = nn.AvgPool2D(kernel_size=2, stride=2)
+
+        assert num_layers in [68, 85], "HarDNet-{} is not supported.".format(
+            num_layers)
+        if num_layers == 85:
+            self.last_proj = ConvLayer(784, 256, kernel_size=1)
+            self.last_blk = HarDBlock(768, 80, 1.7, 8)
+            self.skip_nodes = [1, 3, 8, 13]
+            self.SC = [32, 32, 0]
+            gr = [64, 48, 28]
+            layers = [8, 8, 4]
+            ch_list2 = [224 + self.SC[0], 160 + self.SC[1], 96 + self.SC[2]]
+            channels = [96, 214, 458, 784]
+            self.skip_lv = 3
+
+        elif num_layers == 68:
+            self.last_proj = ConvLayer(654, 192, kernel_size=1)
+            self.last_blk = HarDBlock(576, 72, 1.7, 8)
+            self.skip_nodes = [1, 3, 8, 11]
+            self.SC = [32, 32, 0]
+            gr = [48, 32, 20]
+            layers = [8, 8, 4]
+            ch_list2 = [224 + self.SC[0], 96 + self.SC[1], 64 + self.SC[2]]
+            channels = [64, 124, 328, 654]
+            self.skip_lv = 2
+
+        self.transUpBlocks = nn.LayerList([])
+        self.denseBlocksUp = nn.LayerList([])
+        self.conv1x1_up = nn.LayerList([])
+        self.avg9x9 = nn.AvgPool2D(kernel_size=(9, 9), stride=1, padding=(4, 4))
+        prev_ch = self.last_blk.get_out_ch()
+
+        for i in range(3):
+            skip_ch = channels[3 - i]
+            self.transUpBlocks.append(TransitionUp(prev_ch, prev_ch))
+            if i < self.skip_lv:
+                cur_ch = prev_ch + skip_ch
+            else:
+                cur_ch = prev_ch
+            self.conv1x1_up.append(
+                ConvLayer(
+                    cur_ch, ch_list2[i], kernel_size=1))
+            cur_ch = ch_list2[i]
+            cur_ch -= self.SC[i]
+            cur_ch *= 3
+
+            blk = HarDBlock(cur_ch, gr[i], 1.7, layers[i])
+            self.denseBlocksUp.append(blk)
+            prev_ch = blk.get_out_ch()
+
+        prev_ch += self.SC[0] + self.SC[1] + self.SC[2]
+        self.out_channel = prev_ch
+
+    @classmethod
+    def from_config(cls, cfg, input_shape):
+        return {'in_channels': [i.channels for i in input_shape]}
+
+    def forward(self, body_feats):
+        x = body_feats[-1]
+        x_sc = []
+        x = self.last_proj(x)
+        x = self.last_pool(x)
+        x2 = self.avg9x9(x)
+        x3 = x / (x.sum((2, 3), keepdim=True) + 0.1)
+        x = paddle.concat([x, x2, x3], 1)
+        x = self.last_blk(x)
+
+        for i in range(3):
+            skip_x = body_feats[3 - i]
+            x_up = self.transUpBlocks[i](x, skip_x)
+            x_ch = self.conv1x1_up[i](x_up)
+            if self.SC[i] > 0:
+                end = x_ch.shape[1]
+                new_st = end - self.SC[i]
+                x_sc.append(x_ch[:, new_st:, :, :])
+                x_ch = x_ch[:, :new_st, :, :]
+            x2 = self.avg9x9(x_ch)
+            x3 = x_ch / (x_ch.sum((2, 3), keepdim=True) + 0.1)
+            x_new = paddle.concat([x_ch, x2, x3], 1)
+            x = self.denseBlocksUp[i](x_new)
+
+        scs = [x]
+        for i in range(3):
+            if self.SC[i] > 0:
+                scs.insert(
+                    0,
+                    F.interpolate(
+                        x_sc[i],
+                        size=(x.shape[2], x.shape[3]),
+                        mode="bilinear",
+                        align_corners=True))
+        neck_feat = paddle.concat(scs, 1)
+        return neck_feat
+
+    @property
+    def out_shape(self):
+        return [ShapeSpec(channels=self.out_channel, stride=self.down_ratio)]
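A quick worked check of the `first_level` defaults used by the two CenterNet necks above (values are illustrative; `np.log2` mirrors the code):

```python
import numpy as np

for down_ratio in [4, 8]:
    dla_first = int(np.log2(down_ratio))          # CenterNetDLAFPN default
    hardnet_first = int(np.log2(down_ratio)) - 1  # CenterNetHarDNetFPN default
    print(down_ratio, dla_first, hardnet_first)
# down_ratio=4 -> levels 2 and 1; down_ratio=8 -> levels 3 and 2
```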
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/channel_mapper.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/channel_mapper.py
new file mode 100644
index 0000000000000000000000000000000000000000..6eff3f85476815351e2ec25750949c4cba74da84
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/channel_mapper.py
@@ -0,0 +1,122 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+this code is based on mmdet: git@github.com:open-mmlab/mmdetection.git
+"""
+import paddle.nn as nn
+
+from ppdet.core.workspace import register, serializable
+from ..backbones.hrnet import ConvNormLayer
+from ..shape_spec import ShapeSpec
+from ..initializer import xavier_uniform_, constant_
+
+__all__ = ['ChannelMapper']
+
+
+@register
+@serializable
+class ChannelMapper(nn.Layer):
+    """Channel Mapper to reduce/increase channels of backbone features.
+
+    Args:
+        in_channels (List[int]): Number of input channels per scale.
+        out_channels (int): Number of output channels (used at each scale).
+        kernel_size (int, optional): kernel_size for reducing channels (used
+            at each scale). Default: 3.
+        norm_type (str, optional): normalization type used after each conv.
+            Default: 'gn'.
+        norm_groups (int, optional): number of groups when norm_type is 'gn'.
+            Default: 32.
+        act (str, optional): activation function used after each conv.
+            Default: 'relu'.
+        num_outs (int, optional): Number of output feature maps. There
+            would be extra_convs when num_outs is larger than the length
+            of in_channels.
+        init_cfg (dict or list[dict], optional): Initialization config dict,
+            kept for config compatibility; weights are set in init_weights().
+    """
+
+    def __init__(self,
+                 in_channels,
+                 out_channels,
+                 kernel_size=3,
+                 norm_type="gn",
+                 norm_groups=32,
+                 act='relu',
+                 num_outs=None,
+                 init_cfg=dict(
+                     type='Xavier', layer='Conv2d', distribution='uniform')):
+        super(ChannelMapper, self).__init__()
+        assert isinstance(in_channels, list)
+        self.extra_convs = None
+        if num_outs is None:
+            num_outs = len(in_channels)
+        self.num_outs = num_outs
+        self.out_channel = out_channels
+        self.convs = nn.LayerList()
+        for in_channel in in_channels:
+            self.convs.append(
+                ConvNormLayer(
+                    ch_in=in_channel,
+                    ch_out=out_channels,
+                    filter_size=kernel_size,
+                    norm_type=norm_type,
+                    norm_groups=norm_groups,
+                    act=act))
+
+        if num_outs > len(in_channels):
+            self.extra_convs = nn.LayerList()
+            for i in range(len(in_channels), num_outs):
+                if i == len(in_channels):
+                    in_channel = in_channels[-1]
+                else:
+                    in_channel = out_channels
+                self.extra_convs.append(
+                    ConvNormLayer(
+                        ch_in=in_channel,
+                        ch_out=out_channels,
+                        filter_size=3,
+                        stride=2,
+                        norm_type=norm_type,
+                        norm_groups=norm_groups,
+                        act=act))
+        self.init_weights()
+
+    def forward(self, inputs):
+        """Forward function."""
+        assert len(inputs) == len(self.convs)
+        outs = [self.convs[i](inputs[i]) for i in range(len(inputs))]
+        if self.extra_convs:
+            for i in range(len(self.extra_convs)):
+                if i == 0:
+                    outs.append(self.extra_convs[0](inputs[-1]))
+                else:
+                    outs.append(self.extra_convs[i](outs[-1]))
+        return tuple(outs)
+
+    @property
+    def out_shape(self):
+        # Only the channel count is tracked here; spatial strides follow the
+        # backbone levels (plus one stride-2 conv per extra output).
+        return [
+            ShapeSpec(channels=self.out_channel) for _ in range(self.num_outs)
+        ]
+
+    def init_weights(self):
+        """Initialize conv weights with Xavier-uniform; zero any biases."""
+        for p in self.parameters():
+            if p.rank() > 1:
+                xavier_uniform_(p)
+                if hasattr(p, 'bias') and p.bias is not None:
+                    constant_(p.bias)
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/csp_pan.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/csp_pan.py
new file mode 100644
index 0000000000000000000000000000000000000000..5c3539a95c2e8e52a2d868dd53c547fb5673fba2
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/csp_pan.py
@@ -0,0 +1,363 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+ +# The code is based on: +# https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/necks/yolox_pafpn.py + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec + +__all__ = ['CSPPAN'] + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channel=96, + out_channel=96, + kernel_size=3, + stride=1, + groups=1, + act='leaky_relu'): + super(ConvBNLayer, self).__init__() + initializer = nn.initializer.KaimingUniform() + self.conv = nn.Conv2D( + in_channels=in_channel, + out_channels=out_channel, + kernel_size=kernel_size, + groups=groups, + padding=(kernel_size - 1) // 2, + stride=stride, + weight_attr=ParamAttr(initializer=initializer), + bias_attr=False) + self.bn = nn.BatchNorm2D(out_channel) + if act == "hard_swish": + act = 'hardswish' + self.act = act + + def forward(self, x): + x = self.bn(self.conv(x)) + if self.act: + x = getattr(F, self.act)(x) + return x + + +class DPModule(nn.Layer): + """ + Depth-wise and point-wise module. + Args: + in_channel (int): The input channels of this Module. + out_channel (int): The output channels of this Module. + kernel_size (int): The conv2d kernel size of this Module. + stride (int): The conv2d's stride of this Module. + act (str): The activation function of this Module, + Now support `leaky_relu` and `hard_swish`. + """ + + def __init__(self, + in_channel=96, + out_channel=96, + kernel_size=3, + stride=1, + act='leaky_relu', + use_act_in_out=True): + super(DPModule, self).__init__() + initializer = nn.initializer.KaimingUniform() + self.use_act_in_out = use_act_in_out + self.dwconv = nn.Conv2D( + in_channels=in_channel, + out_channels=out_channel, + kernel_size=kernel_size, + groups=out_channel, + padding=(kernel_size - 1) // 2, + stride=stride, + weight_attr=ParamAttr(initializer=initializer), + bias_attr=False) + self.bn1 = nn.BatchNorm2D(out_channel) + self.pwconv = nn.Conv2D( + in_channels=out_channel, + out_channels=out_channel, + kernel_size=1, + groups=1, + padding=0, + weight_attr=ParamAttr(initializer=initializer), + bias_attr=False) + self.bn2 = nn.BatchNorm2D(out_channel) + if act == "hard_swish": + act = 'hardswish' + self.act = act + + def forward(self, x): + x = self.bn1(self.dwconv(x)) + if self.act: + x = getattr(F, self.act)(x) + x = self.bn2(self.pwconv(x)) + if self.use_act_in_out and self.act: + x = getattr(F, self.act)(x) + return x + + +class DarknetBottleneck(nn.Layer): + """The basic bottleneck block used in Darknet. + + Each Block consists of two ConvModules and the input is added to the + final output. Each ConvModule is composed of Conv, BN, and act. + The first convLayer has filter size of 1x1 and the second one has the + filter size of 3x3. + + Args: + in_channels (int): The input channels of this Module. + out_channels (int): The output channels of this Module. + expansion (int): The kernel size of the convolution. Default: 0.5 + add_identity (bool): Whether to add identity to the out. + Default: True + use_depthwise (bool): Whether to use depthwise separable convolution. 
+ Default: False + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size=3, + expansion=0.5, + add_identity=True, + use_depthwise=False, + act="leaky_relu"): + super(DarknetBottleneck, self).__init__() + hidden_channels = int(out_channels * expansion) + conv_func = DPModule if use_depthwise else ConvBNLayer + self.conv1 = ConvBNLayer( + in_channel=in_channels, + out_channel=hidden_channels, + kernel_size=1, + act=act) + self.conv2 = conv_func( + in_channel=hidden_channels, + out_channel=out_channels, + kernel_size=kernel_size, + stride=1, + act=act) + self.add_identity = \ + add_identity and in_channels == out_channels + + def forward(self, x): + identity = x + out = self.conv1(x) + out = self.conv2(out) + + if self.add_identity: + return out + identity + else: + return out + + +class CSPLayer(nn.Layer): + """Cross Stage Partial Layer. + + Args: + in_channels (int): The input channels of the CSP layer. + out_channels (int): The output channels of the CSP layer. + expand_ratio (float): Ratio to adjust the number of channels of the + hidden layer. Default: 0.5 + num_blocks (int): Number of blocks. Default: 1 + add_identity (bool): Whether to add identity in blocks. + Default: True + use_depthwise (bool): Whether to depthwise separable convolution in + blocks. Default: False + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size=3, + expand_ratio=0.5, + num_blocks=1, + add_identity=True, + use_depthwise=False, + act="leaky_relu"): + super().__init__() + mid_channels = int(out_channels * expand_ratio) + self.main_conv = ConvBNLayer(in_channels, mid_channels, 1, act=act) + self.short_conv = ConvBNLayer(in_channels, mid_channels, 1, act=act) + self.final_conv = ConvBNLayer( + 2 * mid_channels, out_channels, 1, act=act) + + self.blocks = nn.Sequential(* [ + DarknetBottleneck( + mid_channels, + mid_channels, + kernel_size, + 1.0, + add_identity, + use_depthwise, + act=act) for _ in range(num_blocks) + ]) + + def forward(self, x): + x_short = self.short_conv(x) + + x_main = self.main_conv(x) + x_main = self.blocks(x_main) + + x_final = paddle.concat((x_main, x_short), axis=1) + return self.final_conv(x_final) + + +class Channel_T(nn.Layer): + def __init__(self, + in_channels=[116, 232, 464], + out_channels=96, + act="leaky_relu"): + super(Channel_T, self).__init__() + self.convs = nn.LayerList() + for i in range(len(in_channels)): + self.convs.append( + ConvBNLayer( + in_channels[i], out_channels, 1, act=act)) + + def forward(self, x): + outs = [self.convs[i](x[i]) for i in range(len(x))] + return outs + + +@register +@serializable +class CSPPAN(nn.Layer): + """Path Aggregation Network with CSP module. + + Args: + in_channels (List[int]): Number of input channels per scale. + out_channels (int): Number of output channels (used at each scale) + kernel_size (int): The conv2d kernel size of this Module. + num_features (int): Number of output features of CSPPAN module. + num_csp_blocks (int): Number of bottlenecks in CSPLayer. Default: 1 + use_depthwise (bool): Whether to depthwise separable convolution in + blocks. 
Default: True + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size=5, + num_features=3, + num_csp_blocks=1, + use_depthwise=True, + act='hard_swish', + spatial_scales=[0.125, 0.0625, 0.03125]): + super(CSPPAN, self).__init__() + self.conv_t = Channel_T(in_channels, out_channels, act=act) + in_channels = [out_channels] * len(spatial_scales) + self.in_channels = in_channels + self.out_channels = out_channels + self.spatial_scales = spatial_scales + self.num_features = num_features + conv_func = DPModule if use_depthwise else ConvBNLayer + + if self.num_features == 4: + self.first_top_conv = conv_func( + in_channels[0], in_channels[0], kernel_size, stride=2, act=act) + self.second_top_conv = conv_func( + in_channels[0], in_channels[0], kernel_size, stride=2, act=act) + self.spatial_scales.append(self.spatial_scales[-1] / 2) + + # build top-down blocks + self.upsample = nn.Upsample(scale_factor=2, mode='nearest') + self.top_down_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1, 0, -1): + self.top_down_blocks.append( + CSPLayer( + in_channels[idx - 1] * 2, + in_channels[idx - 1], + kernel_size=kernel_size, + num_blocks=num_csp_blocks, + add_identity=False, + use_depthwise=use_depthwise, + act=act)) + + # build bottom-up blocks + self.downsamples = nn.LayerList() + self.bottom_up_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1): + self.downsamples.append( + conv_func( + in_channels[idx], + in_channels[idx], + kernel_size=kernel_size, + stride=2, + act=act)) + self.bottom_up_blocks.append( + CSPLayer( + in_channels[idx] * 2, + in_channels[idx + 1], + kernel_size=kernel_size, + num_blocks=num_csp_blocks, + add_identity=False, + use_depthwise=use_depthwise, + act=act)) + + def forward(self, inputs): + """ + Args: + inputs (tuple[Tensor]): input features. + + Returns: + tuple[Tensor]: CSPPAN features. + """ + assert len(inputs) == len(self.in_channels) + inputs = self.conv_t(inputs) + + # top-down path + inner_outs = [inputs[-1]] + for idx in range(len(self.in_channels) - 1, 0, -1): + feat_heigh = inner_outs[0] + feat_low = inputs[idx - 1] + + upsample_feat = self.upsample(feat_heigh) + + inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx]( + paddle.concat([upsample_feat, feat_low], 1)) + inner_outs.insert(0, inner_out) + + # bottom-up path + outs = [inner_outs[0]] + for idx in range(len(self.in_channels) - 1): + feat_low = outs[-1] + feat_height = inner_outs[idx + 1] + downsample_feat = self.downsamples[idx](feat_low) + out = self.bottom_up_blocks[idx](paddle.concat( + [downsample_feat, feat_height], 1)) + outs.append(out) + + top_features = None + if self.num_features == 4: + top_features = self.first_top_conv(inputs[-1]) + top_features = top_features + self.second_top_conv(outs[-1]) + outs.append(top_features) + + return tuple(outs) + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self.out_channels, stride=1. / s) + for s in self.spatial_scales + ] + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/custom_pan.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/custom_pan.py new file mode 100644 index 0000000000000000000000000000000000000000..cf7ec8412f4fef5f2c8778e6329b180a081c91b0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/custom_pan.py @@ -0,0 +1,398 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import math +import copy +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import DropBlock, MultiHeadAttention +from ppdet.modeling.ops import get_act_fn +from ..backbones.cspresnet import ConvBNLayer, BasicBlock +from ..shape_spec import ShapeSpec +from ..initializer import linear_init_ + +__all__ = ['CustomCSPPAN'] + + +def _get_clones(module, N): + return nn.LayerList([copy.deepcopy(module) for _ in range(N)]) + + +class SPP(nn.Layer): + def __init__(self, + ch_in, + ch_out, + k, + pool_size, + act='swish', + data_format='NCHW'): + super(SPP, self).__init__() + self.pool = [] + self.data_format = data_format + for i, size in enumerate(pool_size): + pool = self.add_sublayer( + 'pool{}'.format(i), + nn.MaxPool2D( + kernel_size=size, + stride=1, + padding=size // 2, + data_format=data_format, + ceil_mode=False)) + self.pool.append(pool) + self.conv = ConvBNLayer(ch_in, ch_out, k, padding=k // 2, act=act) + + def forward(self, x): + outs = [x] + for pool in self.pool: + outs.append(pool(x)) + if self.data_format == 'NCHW': + y = paddle.concat(outs, axis=1) + else: + y = paddle.concat(outs, axis=-1) + + y = self.conv(y) + return y + + +class CSPStage(nn.Layer): + def __init__(self, + block_fn, + ch_in, + ch_out, + n, + act='swish', + spp=False, + use_alpha=False): + super(CSPStage, self).__init__() + + ch_mid = int(ch_out // 2) + self.conv1 = ConvBNLayer(ch_in, ch_mid, 1, act=act) + self.conv2 = ConvBNLayer(ch_in, ch_mid, 1, act=act) + self.convs = nn.Sequential() + next_ch_in = ch_mid + for i in range(n): + self.convs.add_sublayer( + str(i), + eval(block_fn)(next_ch_in, + ch_mid, + act=act, + shortcut=False, + use_alpha=use_alpha)) + if i == (n - 1) // 2 and spp: + self.convs.add_sublayer( + 'spp', SPP(ch_mid * 4, ch_mid, 1, [5, 9, 13], act=act)) + next_ch_in = ch_mid + self.conv3 = ConvBNLayer(ch_mid * 2, ch_out, 1, act=act) + + def forward(self, x): + y1 = self.conv1(x) + y2 = self.conv2(x) + y2 = self.convs(y2) + y = paddle.concat([y1, y2], axis=1) + y = self.conv3(y) + return y + + +class TransformerEncoderLayer(nn.Layer): + def __init__(self, + d_model, + nhead, + dim_feedforward=2048, + dropout=0.1, + activation="relu", + attn_dropout=None, + act_dropout=None, + normalize_before=False): + super(TransformerEncoderLayer, self).__init__() + attn_dropout = dropout if attn_dropout is None else attn_dropout + act_dropout = dropout if act_dropout is None else act_dropout + self.normalize_before = normalize_before + + self.self_attn = MultiHeadAttention(d_model, nhead, attn_dropout) + # Implementation of Feedforward model + self.linear1 = nn.Linear(d_model, dim_feedforward) + self.dropout = nn.Dropout(act_dropout, mode="upscale_in_train") + self.linear2 = nn.Linear(dim_feedforward, d_model) + + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.dropout1 = nn.Dropout(dropout, 
mode="upscale_in_train") + self.dropout2 = nn.Dropout(dropout, mode="upscale_in_train") + self.activation = getattr(F, activation) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + + @staticmethod + def with_pos_embed(tensor, pos_embed): + return tensor if pos_embed is None else tensor + pos_embed + + def forward(self, src, src_mask=None, pos_embed=None): + residual = src + if self.normalize_before: + src = self.norm1(src) + q = k = self.with_pos_embed(src, pos_embed) + src = self.self_attn(q, k, value=src, attn_mask=src_mask) + + src = residual + self.dropout1(src) + if not self.normalize_before: + src = self.norm1(src) + + residual = src + if self.normalize_before: + src = self.norm2(src) + src = self.linear2(self.dropout(self.activation(self.linear1(src)))) + src = residual + self.dropout2(src) + if not self.normalize_before: + src = self.norm2(src) + return src + + +class TransformerEncoder(nn.Layer): + def __init__(self, encoder_layer, num_layers, norm=None): + super(TransformerEncoder, self).__init__() + self.layers = _get_clones(encoder_layer, num_layers) + self.num_layers = num_layers + self.norm = norm + + def forward(self, src, src_mask=None, pos_embed=None): + output = src + for layer in self.layers: + output = layer(output, src_mask=src_mask, pos_embed=pos_embed) + + if self.norm is not None: + output = self.norm(output) + + return output + + +@register +@serializable +class CustomCSPPAN(nn.Layer): + __shared__ = [ + 'norm_type', 'data_format', 'width_mult', 'depth_mult', 'trt', + 'eval_size' + ] + + def __init__(self, + in_channels=[256, 512, 1024], + out_channels=[1024, 512, 256], + norm_type='bn', + act='leaky', + stage_fn='CSPStage', + block_fn='BasicBlock', + stage_num=1, + block_num=3, + drop_block=False, + block_size=3, + keep_prob=0.9, + spp=False, + data_format='NCHW', + width_mult=1.0, + depth_mult=1.0, + use_alpha=False, + trt=False, + dim_feedforward=2048, + dropout=0.1, + activation='gelu', + nhead=4, + num_layers=4, + attn_dropout=None, + act_dropout=None, + normalize_before=False, + use_trans=False, + eval_size=None): + + super(CustomCSPPAN, self).__init__() + out_channels = [max(round(c * width_mult), 1) for c in out_channels] + block_num = max(round(block_num * depth_mult), 1) + act = get_act_fn( + act, trt=trt) if act is None or isinstance(act, + (str, dict)) else act + self.num_blocks = len(in_channels) + self.data_format = data_format + self._out_channels = out_channels + + self.hidden_dim = in_channels[-1] + in_channels = in_channels[::-1] + + self.use_trans = use_trans + self.eval_size = eval_size + if use_trans: + if eval_size is not None: + self.pos_embed = self.build_2d_sincos_position_embedding( + eval_size[1] // 32, + eval_size[0] // 32, + embed_dim=self.hidden_dim) + else: + self.pos_embed = None + + encoder_layer = TransformerEncoderLayer( + self.hidden_dim, nhead, dim_feedforward, dropout, activation, + attn_dropout, act_dropout, normalize_before) + encoder_norm = nn.LayerNorm( + self.hidden_dim) if normalize_before else None + self.encoder = TransformerEncoder(encoder_layer, num_layers, + encoder_norm) + + fpn_stages = [] + fpn_routes = [] + for i, (ch_in, ch_out) in enumerate(zip(in_channels, out_channels)): + if i > 0: + ch_in += ch_pre // 2 + + stage = nn.Sequential() + for j in range(stage_num): + stage.add_sublayer( + str(j), + eval(stage_fn)(block_fn, + ch_in if j == 0 else ch_out, + ch_out, + block_num, + act=act, + spp=(spp and i == 0), + use_alpha=use_alpha)) + + if 
drop_block: + stage.add_sublayer('drop', DropBlock(block_size, keep_prob)) + + fpn_stages.append(stage) + + if i < self.num_blocks - 1: + fpn_routes.append( + ConvBNLayer( + ch_in=ch_out, + ch_out=ch_out // 2, + filter_size=1, + stride=1, + padding=0, + act=act)) + + ch_pre = ch_out + + self.fpn_stages = nn.LayerList(fpn_stages) + self.fpn_routes = nn.LayerList(fpn_routes) + + pan_stages = [] + pan_routes = [] + for i in reversed(range(self.num_blocks - 1)): + pan_routes.append( + ConvBNLayer( + ch_in=out_channels[i + 1], + ch_out=out_channels[i + 1], + filter_size=3, + stride=2, + padding=1, + act=act)) + + ch_in = out_channels[i] + out_channels[i + 1] + ch_out = out_channels[i] + stage = nn.Sequential() + for j in range(stage_num): + stage.add_sublayer( + str(j), + eval(stage_fn)(block_fn, + ch_in if j == 0 else ch_out, + ch_out, + block_num, + act=act, + spp=False, + use_alpha=use_alpha)) + if drop_block: + stage.add_sublayer('drop', DropBlock(block_size, keep_prob)) + + pan_stages.append(stage) + + self.pan_stages = nn.LayerList(pan_stages[::-1]) + self.pan_routes = nn.LayerList(pan_routes[::-1]) + + def build_2d_sincos_position_embedding( + self, + w, + h, + embed_dim=1024, + temperature=10000., ): + grid_w = paddle.arange(int(w), dtype=paddle.float32) + grid_h = paddle.arange(int(h), dtype=paddle.float32) + grid_w, grid_h = paddle.meshgrid(grid_w, grid_h) + assert embed_dim % 4 == 0, 'Embed dimension must be divisible by 4 for 2D sin-cos position embedding' + pos_dim = embed_dim // 4 + omega = paddle.arange(pos_dim, dtype=paddle.float32) / pos_dim + omega = 1. / (temperature**omega) + + out_w = grid_w.flatten()[..., None] @omega[None] + out_h = grid_h.flatten()[..., None] @omega[None] + + pos_emb = paddle.concat( + [ + paddle.sin(out_w), paddle.cos(out_w), paddle.sin(out_h), + paddle.cos(out_h) + ], + axis=1)[None, :, :] + + return pos_emb + + def forward(self, blocks, for_mot=False): + if self.use_trans: + last_feat = blocks[-1] + n, c, h, w = last_feat.shape + + # flatten [B, C, H, W] to [B, HxW, C] + src_flatten = last_feat.flatten(2).transpose([0, 2, 1]) + if self.eval_size is not None and not self.training: + pos_embed = self.pos_embed + else: + pos_embed = self.build_2d_sincos_position_embedding( + w=w, h=h, embed_dim=self.hidden_dim) + + memory = self.encoder(src_flatten, pos_embed=pos_embed) + last_feat_encode = memory.transpose([0, 2, 1]).reshape([n, c, h, w]) + blocks[-1] = last_feat_encode + + blocks = blocks[::-1] + fpn_feats = [] + + for i, block in enumerate(blocks): + if i > 0: + block = paddle.concat([route, block], axis=1) + route = self.fpn_stages[i](block) + fpn_feats.append(route) + + if i < self.num_blocks - 1: + route = self.fpn_routes[i](route) + route = F.interpolate( + route, scale_factor=2., data_format=self.data_format) + + pan_feats = [fpn_feats[-1], ] + route = fpn_feats[-1] + for i in reversed(range(self.num_blocks - 1)): + block = fpn_feats[i] + route = self.pan_routes[i](route) + block = paddle.concat([route, block], axis=1) + route = self.pan_stages[i](block) + pan_feats.append(route) + + return pan_feats[::-1] + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/dilated_encoder.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/dilated_encoder.py new file mode 100644 index 
0000000000000000000000000000000000000000..0bbc7fd1bc895933c4a6175dfa74d3b3d95071b3 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/dilated_encoder.py @@ -0,0 +1,150 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from paddle.nn.initializer import KaimingUniform, Constant, Normal +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec + +__all__ = ['DilatedEncoder'] + + +class Bottleneck(nn.Layer): + def __init__(self, in_channels, mid_channels, dilation): + super(Bottleneck, self).__init__() + self.conv1 = nn.Sequential(* [ + nn.Conv2D( + in_channels, + mid_channels, + 1, + padding=0, + weight_attr=ParamAttr(initializer=Normal( + mean=0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(0.0))), + nn.BatchNorm2D( + mid_channels, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))), + nn.ReLU(), + ]) + self.conv2 = nn.Sequential(* [ + nn.Conv2D( + mid_channels, + mid_channels, + 3, + padding=dilation, + dilation=dilation, + weight_attr=ParamAttr(initializer=Normal( + mean=0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(0.0))), + nn.BatchNorm2D( + mid_channels, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))), + nn.ReLU(), + ]) + self.conv3 = nn.Sequential(* [ + nn.Conv2D( + mid_channels, + in_channels, + 1, + padding=0, + weight_attr=ParamAttr(initializer=Normal( + mean=0, std=0.01)), + bias_attr=ParamAttr(initializer=Constant(0.0))), + nn.BatchNorm2D( + in_channels, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))), + nn.ReLU(), + ]) + + def forward(self, x): + identity = x + y = self.conv3(self.conv2(self.conv1(x))) + return y + identity + + +@register +class DilatedEncoder(nn.Layer): + """ + DilatedEncoder used in YOLOF + """ + + def __init__(self, + in_channels=[2048], + out_channels=[512], + block_mid_channels=128, + num_residual_blocks=4, + block_dilations=[2, 4, 6, 8]): + super(DilatedEncoder, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + assert len(self.in_channels) == 1, "YOLOF only has one level feature." + assert len(self.out_channels) == 1, "YOLOF only has one level feature." 
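+        # Structure: a 1x1 lateral conv projects the single C5 input to
+        # `out_channels`, a 3x3 conv refines it, then stacked residual
+        # Bottlenecks whose middle 3x3 convs use growing dilations
+        # (2, 4, 6, 8 by default) enlarge the receptive field at full
+        # resolution.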
+ + self.block_mid_channels = block_mid_channels + self.num_residual_blocks = num_residual_blocks + self.block_dilations = block_dilations + + out_ch = self.out_channels[0] + self.lateral_conv = nn.Conv2D( + self.in_channels[0], + out_ch, + 1, + weight_attr=ParamAttr(initializer=KaimingUniform( + negative_slope=1, nonlinearity='leaky_relu')), + bias_attr=ParamAttr(initializer=Constant(value=0.0))) + self.lateral_norm = nn.BatchNorm2D( + out_ch, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + + self.fpn_conv = nn.Conv2D( + out_ch, + out_ch, + 3, + padding=1, + weight_attr=ParamAttr(initializer=KaimingUniform( + negative_slope=1, nonlinearity='leaky_relu'))) + self.fpn_norm = nn.BatchNorm2D( + out_ch, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + + encoder_blocks = [] + for i in range(self.num_residual_blocks): + encoder_blocks.append( + Bottleneck( + out_ch, + self.block_mid_channels, + dilation=block_dilations[i])) + self.dilated_encoder_blocks = nn.Sequential(*encoder_blocks) + + def forward(self, inputs, for_mot=False): + out = self.lateral_norm(self.lateral_conv(inputs[0])) + out = self.fpn_norm(self.fpn_conv(out)) + out = self.dilated_encoder_blocks(out) + return [out] + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self.out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/es_pan.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/es_pan.py new file mode 100644 index 0000000000000000000000000000000000000000..bc2487733de7e1a6dfe53c60653fcdf635205b88 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/es_pan.py @@ -0,0 +1,212 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
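In the `Bottleneck` blocks above, `padding=dilation` keeps the spatial size of a 3x3 conv while the effective kernel extent grows to `2*dilation + 1`. A standalone sketch of that arithmetic, with assumed sizes:

```python
import paddle
import paddle.nn as nn

x = paddle.rand([1, 128, 32, 32])  # hypothetical mid-channel feature map
for d in [2, 4, 6, 8]:
    # Same padding rule as Bottleneck.conv2: padding=dilation for 3x3 kernels.
    conv = nn.Conv2D(128, 128, 3, padding=d, dilation=d)
    print(d, tuple(conv(x).shape))  # spatial size stays (32, 32) for each d
```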
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from ppdet.core.workspace import register, serializable + +from ..shape_spec import ShapeSpec +from ..backbones.esnet import SEModule +from .csp_pan import ConvBNLayer, Channel_T, DPModule + +__all__ = ['ESPAN'] + + +class ES_Block(nn.Layer): + def __init__(self, + in_channels, + mid_channels, + out_channels, + kernel_size=5, + stride=1, + act='leaky_relu'): + super(ES_Block, self).__init__() + self._residual = ConvBNLayer( + in_channel=in_channels, + out_channel=out_channels, + kernel_size=1, + stride=1, + groups=1, + act=act) + self._conv_pw = ConvBNLayer( + in_channel=in_channels, + out_channel=mid_channels // 2, + kernel_size=1, + stride=1, + groups=1, + act=act) + self._conv_dw = ConvBNLayer( + in_channel=mid_channels // 2, + out_channel=mid_channels // 2, + kernel_size=kernel_size, + stride=stride, + groups=mid_channels // 2, + act=None) + self._se = SEModule(mid_channels) + + self._conv_linear = ConvBNLayer( + in_channel=mid_channels, + out_channel=out_channels, + kernel_size=1, + stride=1, + groups=1, + act=act) + + self._out_conv = ConvBNLayer( + in_channel=out_channels * 2, + out_channel=out_channels, + kernel_size=1, + stride=1, + groups=1, + act=act) + + def forward(self, inputs): + x1 = self._residual(inputs) + x2 = self._conv_pw(inputs) + x3 = self._conv_dw(x2) + x3 = paddle.concat([x2, x3], axis=1) + x3 = self._se(x3) + x3 = self._conv_linear(x3) + out = paddle.concat([x1, x3], axis=1) + out = self._out_conv(out) + return out + + +@register +@serializable +class ESPAN(nn.Layer): + """Path Aggregation Network with ES module. + + Args: + in_channels (List[int]): Number of input channels per scale. + out_channels (int): Number of output channels (used at each scale) + kernel_size (int): The conv2d kernel size of this Module. + num_features (int): Number of output features of CSPPAN module. + num_csp_blocks (int): Number of bottlenecks in CSPLayer. Default: 1 + use_depthwise (bool): Whether to depthwise separable convolution in + blocks. 
Default: True + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size=5, + num_features=3, + use_depthwise=True, + act='hard_swish', + spatial_scales=[0.125, 0.0625, 0.03125]): + super(ESPAN, self).__init__() + self.conv_t = Channel_T(in_channels, out_channels, act=act) + in_channels = [out_channels] * len(spatial_scales) + self.in_channels = in_channels + self.out_channels = out_channels + self.spatial_scales = spatial_scales + self.num_features = num_features + conv_func = DPModule if use_depthwise else ConvBNLayer + + if self.num_features == 4: + self.first_top_conv = conv_func( + in_channels[0], in_channels[0], kernel_size, stride=2, act=act) + self.second_top_conv = conv_func( + in_channels[0], in_channels[0], kernel_size, stride=2, act=act) + self.spatial_scales.append(self.spatial_scales[-1] / 2) + + # build top-down blocks + self.upsample = nn.Upsample(scale_factor=2, mode='nearest') + self.top_down_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1, 0, -1): + self.top_down_blocks.append( + ES_Block( + in_channels[idx - 1] * 2, + in_channels[idx - 1], + in_channels[idx - 1], + kernel_size=kernel_size, + stride=1, + act=act)) + + # build bottom-up blocks + self.downsamples = nn.LayerList() + self.bottom_up_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1): + self.downsamples.append( + conv_func( + in_channels[idx], + in_channels[idx], + kernel_size=kernel_size, + stride=2, + act=act)) + self.bottom_up_blocks.append( + ES_Block( + in_channels[idx] * 2, + in_channels[idx + 1], + in_channels[idx + 1], + kernel_size=kernel_size, + stride=1, + act=act)) + + def forward(self, inputs): + """ + Args: + inputs (tuple[Tensor]): input features. + + Returns: + tuple[Tensor]: CSPPAN features. + """ + assert len(inputs) == len(self.in_channels) + inputs = self.conv_t(inputs) + + # top-down path + inner_outs = [inputs[-1]] + for idx in range(len(self.in_channels) - 1, 0, -1): + feat_heigh = inner_outs[0] + feat_low = inputs[idx - 1] + + upsample_feat = self.upsample(feat_heigh) + + inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx]( + paddle.concat([upsample_feat, feat_low], 1)) + inner_outs.insert(0, inner_out) + + # bottom-up path + outs = [inner_outs[0]] + for idx in range(len(self.in_channels) - 1): + feat_low = outs[-1] + feat_height = inner_outs[idx + 1] + downsample_feat = self.downsamples[idx](feat_low) + out = self.bottom_up_blocks[idx](paddle.concat( + [downsample_feat, feat_height], 1)) + outs.append(out) + + top_features = None + if self.num_features == 4: + top_features = self.first_top_conv(inputs[-1]) + top_features = top_features + self.second_top_conv(outs[-1]) + outs.append(top_features) + + return tuple(outs) + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self.out_channels, stride=1. / s) + for s in self.spatial_scales + ] + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/fpn.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/fpn.py new file mode 100644 index 0000000000000000000000000000000000000000..d08ca415c7acce7d4495dae5834d53961ea9df57 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/fpn.py @@ -0,0 +1,231 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
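To make the ESPAN data flow concrete, a minimal forward sketch; the channel counts follow the `Channel_T` defaults (ESNet-style 116/232/464) and the spatial sizes assume a 320x320 input at strides 8/16/32, so all values here are illustrative:

```python
import paddle
from ppdet.modeling.necks.es_pan import ESPAN

feats = [paddle.rand([1, c, s, s])
         for c, s in zip([116, 232, 464], [40, 20, 10])]
neck = ESPAN(in_channels=[116, 232, 464], out_channels=96)
outs = neck(feats)
# Channel_T first maps every level to 96 channels; the top-down and
# bottom-up ES_Block passes then keep the strides fixed, giving
# [1, 96, 40, 40], [1, 96, 20, 20] and [1, 96, 10, 10].
```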
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import XavierUniform + +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import ConvNormLayer +from ..shape_spec import ShapeSpec + +__all__ = ['FPN'] + + +@register +@serializable +class FPN(nn.Layer): + """ + Feature Pyramid Network, see https://arxiv.org/abs/1612.03144 + + Args: + in_channels (list[int]): input channels of each level which can be + derived from the output shape of backbone by from_config + out_channel (int): output channel of each level + spatial_scales (list[float]): the spatial scales between input feature + maps and original input image which can be derived from the output + shape of backbone by from_config + has_extra_convs (bool): whether to add extra conv to the last level. + default False + extra_stage (int): the number of extra stages added to the last level. + default 1 + use_c5 (bool): Whether to use c5 as the input of extra stage, + otherwise p5 is used. default True + norm_type (string|None): The normalization type in FPN module. If + norm_type is None, norm will not be used after conv and if + norm_type is string, bn, gn, sync_bn are available. default None + norm_decay (float): weight decay for normalization layer weights. + default 0. + freeze_norm (bool): whether to freeze normalization layer. + default False + relu_before_extra_convs (bool): whether to add relu before extra convs. + default False + + """ + + def __init__(self, + in_channels, + out_channel, + spatial_scales=[0.25, 0.125, 0.0625, 0.03125], + has_extra_convs=False, + extra_stage=1, + use_c5=True, + norm_type=None, + norm_decay=0., + freeze_norm=False, + relu_before_extra_convs=True): + super(FPN, self).__init__() + self.out_channel = out_channel + for s in range(extra_stage): + spatial_scales = spatial_scales + [spatial_scales[-1] / 2.] 
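Each pass through the extra-stage loop above appends one half-resolution scale; with the constructor defaults this is the familiar P6-on-top-of-P5 arrangement:

```python
spatial_scales = [0.25, 0.125, 0.0625, 0.03125]   # res2..res5, strides 4..32
extra_stage = 1                                   # one extra level (P6)
for s in range(extra_stage):
    spatial_scales = spatial_scales + [spatial_scales[-1] / 2.]
# -> [0.25, 0.125, 0.0625, 0.03125, 0.015625], i.e. P6 at stride 64
```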
+ self.spatial_scales = spatial_scales + self.has_extra_convs = has_extra_convs + self.extra_stage = extra_stage + self.use_c5 = use_c5 + self.relu_before_extra_convs = relu_before_extra_convs + self.norm_type = norm_type + self.norm_decay = norm_decay + self.freeze_norm = freeze_norm + + self.lateral_convs = [] + self.fpn_convs = [] + fan = out_channel * 3 * 3 + + # stage index 0,1,2,3 stands for res2,res3,res4,res5 on ResNet Backbone + # 0 <= st_stage < ed_stage <= 3 + st_stage = 4 - len(in_channels) + ed_stage = st_stage + len(in_channels) - 1 + for i in range(st_stage, ed_stage + 1): + if i == 3: + lateral_name = 'fpn_inner_res5_sum' + else: + lateral_name = 'fpn_inner_res{}_sum_lateral'.format(i + 2) + in_c = in_channels[i - st_stage] + if self.norm_type is not None: + lateral = self.add_sublayer( + lateral_name, + ConvNormLayer( + ch_in=in_c, + ch_out=out_channel, + filter_size=1, + stride=1, + norm_type=self.norm_type, + norm_decay=self.norm_decay, + freeze_norm=self.freeze_norm, + initializer=XavierUniform(fan_out=in_c))) + else: + lateral = self.add_sublayer( + lateral_name, + nn.Conv2D( + in_channels=in_c, + out_channels=out_channel, + kernel_size=1, + weight_attr=ParamAttr( + initializer=XavierUniform(fan_out=in_c)))) + self.lateral_convs.append(lateral) + + fpn_name = 'fpn_res{}_sum'.format(i + 2) + if self.norm_type is not None: + fpn_conv = self.add_sublayer( + fpn_name, + ConvNormLayer( + ch_in=out_channel, + ch_out=out_channel, + filter_size=3, + stride=1, + norm_type=self.norm_type, + norm_decay=self.norm_decay, + freeze_norm=self.freeze_norm, + initializer=XavierUniform(fan_out=fan))) + else: + fpn_conv = self.add_sublayer( + fpn_name, + nn.Conv2D( + in_channels=out_channel, + out_channels=out_channel, + kernel_size=3, + padding=1, + weight_attr=ParamAttr( + initializer=XavierUniform(fan_out=fan)))) + self.fpn_convs.append(fpn_conv) + + # add extra conv levels for RetinaNet(use_c5)/FCOS(use_p5) + if self.has_extra_convs: + for i in range(self.extra_stage): + lvl = ed_stage + 1 + i + if i == 0 and self.use_c5: + in_c = in_channels[-1] + else: + in_c = out_channel + extra_fpn_name = 'fpn_{}'.format(lvl + 2) + if self.norm_type is not None: + extra_fpn_conv = self.add_sublayer( + extra_fpn_name, + ConvNormLayer( + ch_in=in_c, + ch_out=out_channel, + filter_size=3, + stride=2, + norm_type=self.norm_type, + norm_decay=self.norm_decay, + freeze_norm=self.freeze_norm, + initializer=XavierUniform(fan_out=fan))) + else: + extra_fpn_conv = self.add_sublayer( + extra_fpn_name, + nn.Conv2D( + in_channels=in_c, + out_channels=out_channel, + kernel_size=3, + stride=2, + padding=1, + weight_attr=ParamAttr( + initializer=XavierUniform(fan_out=fan)))) + self.fpn_convs.append(extra_fpn_conv) + + @classmethod + def from_config(cls, cfg, input_shape): + return { + 'in_channels': [i.channels for i in input_shape], + 'spatial_scales': [1.0 / i.stride for i in input_shape], + } + + def forward(self, body_feats): + laterals = [] + num_levels = len(body_feats) + for i in range(num_levels): + laterals.append(self.lateral_convs[i](body_feats[i])) + + for i in range(1, num_levels): + lvl = num_levels - i + upsample = F.interpolate( + laterals[lvl], + scale_factor=2., + mode='nearest', ) + laterals[lvl - 1] += upsample + + fpn_output = [] + for lvl in range(num_levels): + fpn_output.append(self.fpn_convs[lvl](laterals[lvl])) + + if self.extra_stage > 0: + # use max pool to get more levels on top of outputs (Faster R-CNN, Mask R-CNN) + if not self.has_extra_convs: + assert self.extra_stage == 1, 
'extra_stage should be 1 if FPN has not extra convs' + fpn_output.append(F.max_pool2d(fpn_output[-1], 1, stride=2)) + # add extra conv levels for RetinaNet(use_c5)/FCOS(use_p5) + else: + if self.use_c5: + extra_source = body_feats[-1] + else: + extra_source = fpn_output[-1] + fpn_output.append(self.fpn_convs[num_levels](extra_source)) + + for i in range(1, self.extra_stage): + if self.relu_before_extra_convs: + fpn_output.append(self.fpn_convs[num_levels + i](F.relu( + fpn_output[-1]))) + else: + fpn_output.append(self.fpn_convs[num_levels + i]( + fpn_output[-1])) + return fpn_output + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self.out_channel, stride=1. / s) + for s in self.spatial_scales + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/hrfpn.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/hrfpn.py new file mode 100644 index 0000000000000000000000000000000000000000..5c45c9974b3bd213747cc7b6f0f5f670f38c61bf --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/hrfpn.py @@ -0,0 +1,129 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn.functional as F +import paddle.nn as nn +from ppdet.core.workspace import register +from ..shape_spec import ShapeSpec + +__all__ = ['HRFPN'] + + +@register +class HRFPN(nn.Layer): + """ + Args: + in_channels (list): number of input feature channels from backbone + out_channel (int): number of output feature channels + share_conv (bool): whether to share conv for different layers' reduction + extra_stage (int): add extra stage for returning HRFPN fpn_feats + spatial_scales (list): feature map scaling factor + """ + + def __init__(self, + in_channels=[18, 36, 72, 144], + out_channel=256, + share_conv=False, + extra_stage=1, + spatial_scales=[1. / 4, 1. / 8, 1. / 16, 1. / 32], + use_bias=False): + super(HRFPN, self).__init__() + in_channel = sum(in_channels) + self.in_channel = in_channel + self.out_channel = out_channel + self.share_conv = share_conv + for i in range(extra_stage): + spatial_scales = spatial_scales + [spatial_scales[-1] / 2.] 
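Returning briefly to the FPN class completed above, a minimal forward sketch; the inputs mimic a ResNet-50 backbone and every size here is illustrative:

```python
import paddle
from ppdet.modeling.necks.fpn import FPN

# C2..C5 at strides 4/8/16/32 for a 160x160 input (illustrative).
body_feats = [paddle.rand([1, c, 160 // s, 160 // s])
              for c, s in zip([256, 512, 1024, 2048], [4, 8, 16, 32])]
fpn = FPN(in_channels=[256, 512, 1024, 2048], out_channel=256)
outs = fpn(body_feats)
# Five 256-channel maps: P2-P5 from the lateral/top-down path plus a
# max-pooled P6, since extra_stage=1 and has_extra_convs=False by default.
```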
+ self.spatial_scales = spatial_scales + self.num_out = len(self.spatial_scales) + self.use_bias = use_bias + bias_attr = False if use_bias is False else None + + self.reduction = nn.Conv2D( + in_channels=in_channel, + out_channels=out_channel, + kernel_size=1, + bias_attr=bias_attr) + + if share_conv: + self.fpn_conv = nn.Conv2D( + in_channels=out_channel, + out_channels=out_channel, + kernel_size=3, + padding=1, + bias_attr=bias_attr) + else: + self.fpn_conv = [] + for i in range(self.num_out): + conv_name = "fpn_conv_" + str(i) + conv = self.add_sublayer( + conv_name, + nn.Conv2D( + in_channels=out_channel, + out_channels=out_channel, + kernel_size=3, + padding=1, + bias_attr=bias_attr)) + self.fpn_conv.append(conv) + + def forward(self, body_feats): + num_backbone_stages = len(body_feats) + + outs = [] + outs.append(body_feats[0]) + + # resize + for i in range(1, num_backbone_stages): + resized = F.interpolate( + body_feats[i], scale_factor=2**i, mode='bilinear') + outs.append(resized) + + # concat + out = paddle.concat(outs, axis=1) + assert out.shape[ + 1] == self.in_channel, 'in_channel should be {}, be received {}'.format( + out.shape[1], self.in_channel) + + # reduction + out = self.reduction(out) + + # conv + outs = [out] + for i in range(1, self.num_out): + outs.append(F.avg_pool2d(out, kernel_size=2**i, stride=2**i)) + outputs = [] + + for i in range(self.num_out): + conv_func = self.fpn_conv if self.share_conv else self.fpn_conv[i] + conv = conv_func(outs[i]) + outputs.append(conv) + + fpn_feats = [outputs[k] for k in range(self.num_out)] + return fpn_feats + + @classmethod + def from_config(cls, cfg, input_shape): + return { + 'in_channels': [i.channels for i in input_shape], + 'spatial_scales': [1.0 / i.stride for i in input_shape], + } + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self.out_channel, stride=1. / s) + for s in self.spatial_scales + ] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/lc_pan.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/lc_pan.py new file mode 100644 index 0000000000000000000000000000000000000000..0c59c8a38b10abfb442e56a23b5fabe86bbb9a3a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/lc_pan.py @@ -0,0 +1,168 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from ppdet.core.workspace import register, serializable + +from ..shape_spec import ShapeSpec +from ..backbones.lcnet import DepthwiseSeparable +from .csp_pan import ConvBNLayer, Channel_T, DPModule + +__all__ = ['LCPAN'] + + +@register +@serializable +class LCPAN(nn.Layer): + """Path Aggregation Network with LCNet module. + Args: + in_channels (List[int]): Number of input channels per scale. + out_channels (int): Number of output channels (used at each scale) + kernel_size (int): The conv2d kernel size of this Module. 
+ num_features (int): Number of output features of CSPPAN module. + num_csp_blocks (int): Number of bottlenecks in CSPLayer. Default: 1 + use_depthwise (bool): Whether to depthwise separable convolution in + blocks. Default: True + """ + + def __init__(self, + in_channels, + out_channels, + kernel_size=5, + num_features=3, + use_depthwise=True, + act='hard_swish', + spatial_scales=[0.125, 0.0625, 0.03125]): + super(LCPAN, self).__init__() + self.conv_t = Channel_T(in_channels, out_channels, act=act) + in_channels = [out_channels] * len(spatial_scales) + self.in_channels = in_channels + self.out_channels = out_channels + self.spatial_scales = spatial_scales + self.num_features = num_features + conv_func = DPModule if use_depthwise else ConvBNLayer + + NET_CONFIG = { + #k, in_c, out_c, stride, use_se + "block1": [ + [kernel_size, out_channels * 2, out_channels * 2, 1, False], + [kernel_size, out_channels * 2, out_channels, 1, False], + ], + "block2": [ + [kernel_size, out_channels * 2, out_channels * 2, 1, False], + [kernel_size, out_channels * 2, out_channels, 1, False], + ] + } + + if self.num_features == 4: + self.first_top_conv = conv_func( + in_channels[0], in_channels[0], kernel_size, stride=2, act=act) + self.second_top_conv = conv_func( + in_channels[0], in_channels[0], kernel_size, stride=2, act=act) + self.spatial_scales.append(self.spatial_scales[-1] / 2) + + # build top-down blocks + self.upsample = nn.Upsample(scale_factor=2, mode='nearest') + self.top_down_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1, 0, -1): + self.top_down_blocks.append( + nn.Sequential(* [ + DepthwiseSeparable( + num_channels=in_c, + num_filters=out_c, + dw_size=k, + stride=s, + use_se=se) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG[ + "block1"]) + ])) + + # build bottom-up blocks + self.downsamples = nn.LayerList() + self.bottom_up_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1): + self.downsamples.append( + conv_func( + in_channels[idx], + in_channels[idx], + kernel_size=kernel_size, + stride=2, + act=act)) + self.bottom_up_blocks.append( + nn.Sequential(* [ + DepthwiseSeparable( + num_channels=in_c, + num_filters=out_c, + dw_size=k, + stride=s, + use_se=se) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG[ + "block2"]) + ])) + + def forward(self, inputs): + """ + Args: + inputs (tuple[Tensor]): input features. + Returns: + tuple[Tensor]: CSPPAN features. + """ + assert len(inputs) == len(self.in_channels) + inputs = self.conv_t(inputs) + + # top-down path + inner_outs = [inputs[-1]] + for idx in range(len(self.in_channels) - 1, 0, -1): + feat_heigh = inner_outs[0] + feat_low = inputs[idx - 1] + + upsample_feat = self.upsample(feat_heigh) + + inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx]( + paddle.concat([upsample_feat, feat_low], 1)) + inner_outs.insert(0, inner_out) + + # bottom-up path + outs = [inner_outs[0]] + for idx in range(len(self.in_channels) - 1): + feat_low = outs[-1] + feat_height = inner_outs[idx + 1] + downsample_feat = self.downsamples[idx](feat_low) + out = self.bottom_up_blocks[idx](paddle.concat( + [downsample_feat, feat_height], 1)) + outs.append(out) + + top_features = None + if self.num_features == 4: + top_features = self.first_top_conv(inputs[-1]) + top_features = top_features + self.second_top_conv(outs[-1]) + outs.append(top_features) + + return tuple(outs) + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=self.out_channels, stride=1. 
/ s) + for s in self.spatial_scales + ] + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/ttf_fpn.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/ttf_fpn.py new file mode 100644 index 0000000000000000000000000000000000000000..60cc69f8081198fa7436894a2ed4b7d3944eeb10 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/ttf_fpn.py @@ -0,0 +1,242 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Constant, Uniform, Normal, XavierUniform +from ppdet.core.workspace import register, serializable +from paddle.regularizer import L2Decay +from ppdet.modeling.layers import DeformableConvV2, ConvNormLayer, LiteConv +import math +from ppdet.modeling.ops import batch_norm +from ..shape_spec import ShapeSpec + +__all__ = ['TTFFPN'] + + +class Upsample(nn.Layer): + def __init__(self, ch_in, ch_out, norm_type='bn'): + super(Upsample, self).__init__() + fan_in = ch_in * 3 * 3 + stdv = 1. / math.sqrt(fan_in) + self.dcn = DeformableConvV2( + ch_in, + ch_out, + kernel_size=3, + weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)), + bias_attr=ParamAttr( + initializer=Constant(0), + regularizer=L2Decay(0.), + learning_rate=2.), + lr_scale=2., + regularizer=L2Decay(0.)) + + self.bn = batch_norm( + ch_out, norm_type=norm_type, initializer=Constant(1.)) + + def forward(self, feat): + dcn = self.dcn(feat) + bn = self.bn(dcn) + relu = F.relu(bn) + out = F.interpolate(relu, scale_factor=2., mode='bilinear') + return out + + +class DeConv(nn.Layer): + def __init__(self, ch_in, ch_out, norm_type='bn'): + super(DeConv, self).__init__() + self.deconv = nn.Sequential() + conv1 = ConvNormLayer( + ch_in=ch_in, + ch_out=ch_out, + stride=1, + filter_size=1, + norm_type=norm_type, + initializer=XavierUniform()) + conv2 = nn.Conv2DTranspose( + in_channels=ch_out, + out_channels=ch_out, + kernel_size=4, + padding=1, + stride=2, + groups=ch_out, + weight_attr=ParamAttr(initializer=XavierUniform()), + bias_attr=False) + bn = batch_norm(ch_out, norm_type=norm_type, norm_decay=0.) 
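The grouped `Conv2DTranspose` above is configured for an exact 2x upsample; the standard transposed-convolution output-size arithmetic confirms it:

```python
# H_out = (H_in - 1) * stride - 2 * padding + kernel_size
#       = (H_in - 1) * 2 - 2 * 1 + 4
#       = 2 * H_in
```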
+ conv3 = ConvNormLayer( + ch_in=ch_out, + ch_out=ch_out, + stride=1, + filter_size=1, + norm_type=norm_type, + initializer=XavierUniform()) + + self.deconv.add_sublayer('conv1', conv1) + self.deconv.add_sublayer('relu6_1', nn.ReLU6()) + self.deconv.add_sublayer('conv2', conv2) + self.deconv.add_sublayer('bn', bn) + self.deconv.add_sublayer('relu6_2', nn.ReLU6()) + self.deconv.add_sublayer('conv3', conv3) + self.deconv.add_sublayer('relu6_3', nn.ReLU6()) + + def forward(self, inputs): + return self.deconv(inputs) + + +class LiteUpsample(nn.Layer): + def __init__(self, ch_in, ch_out, norm_type='bn'): + super(LiteUpsample, self).__init__() + self.deconv = DeConv(ch_in, ch_out, norm_type=norm_type) + self.conv = LiteConv(ch_in, ch_out, norm_type=norm_type) + + def forward(self, inputs): + deconv_up = self.deconv(inputs) + conv = self.conv(inputs) + interp_up = F.interpolate(conv, scale_factor=2., mode='bilinear') + return deconv_up + interp_up + + +class ShortCut(nn.Layer): + def __init__(self, + layer_num, + ch_in, + ch_out, + norm_type='bn', + lite_neck=False, + name=None): + super(ShortCut, self).__init__() + shortcut_conv = nn.Sequential() + for i in range(layer_num): + fan_out = 3 * 3 * ch_out + std = math.sqrt(2. / fan_out) + in_channels = ch_in if i == 0 else ch_out + shortcut_name = name + '.conv.{}'.format(i) + if lite_neck: + shortcut_conv.add_sublayer( + shortcut_name, + LiteConv( + in_channels=in_channels, + out_channels=ch_out, + with_act=i < layer_num - 1, + norm_type=norm_type)) + else: + shortcut_conv.add_sublayer( + shortcut_name, + nn.Conv2D( + in_channels=in_channels, + out_channels=ch_out, + kernel_size=3, + padding=1, + weight_attr=ParamAttr(initializer=Normal(0, std)), + bias_attr=ParamAttr( + learning_rate=2., regularizer=L2Decay(0.)))) + if i < layer_num - 1: + shortcut_conv.add_sublayer(shortcut_name + '.act', + nn.ReLU()) + self.shortcut = self.add_sublayer('shortcut', shortcut_conv) + + def forward(self, feat): + out = self.shortcut(feat) + return out + + +@register +@serializable +class TTFFPN(nn.Layer): + """ + Args: + in_channels (list): number of input feature channels from backbone. + [128,256,512,1024] by default, means the channels of DarkNet53 + backbone return_idx [1,2,3,4]. + planes (list): the number of output feature channels of FPN. + [256, 128, 64] by default + shortcut_num (list): the number of convolution layers in each shortcut. + [3,2,1] by default, means DarkNet53 backbone return_idx_1 has 3 convs + in its shortcut, return_idx_2 has 2 convs and return_idx_3 has 1 conv. + norm_type (string): norm type, 'sync_bn', 'bn', 'gn' are optional. + bn by default + lite_neck (bool): whether to use lite conv in TTFNet FPN, + False by default + fusion_method (string): the method to fusion upsample and lateral layer. + 'add' and 'concat' are optional, add by default + """ + + __shared__ = ['norm_type'] + + def __init__(self, + in_channels, + planes=[256, 128, 64], + shortcut_num=[3, 2, 1], + norm_type='bn', + lite_neck=False, + fusion_method='add'): + super(TTFFPN, self).__init__() + self.planes = planes + self.shortcut_num = shortcut_num[::-1] + self.shortcut_len = len(shortcut_num) + self.ch_in = in_channels[::-1] + self.fusion_method = fusion_method + + self.upsample_list = [] + self.shortcut_list = [] + self.upper_list = [] + for i, out_c in enumerate(self.planes): + in_c = self.ch_in[i] if i == 0 else self.upper_list[-1] + upsample_module = LiteUpsample if lite_neck else Upsample + upsample = self.add_sublayer( + 'upsample.' 
+ str(i), + upsample_module( + in_c, out_c, norm_type=norm_type)) + self.upsample_list.append(upsample) + if i < self.shortcut_len: + shortcut = self.add_sublayer( + 'shortcut.' + str(i), + ShortCut( + self.shortcut_num[i], + self.ch_in[i + 1], + out_c, + norm_type=norm_type, + lite_neck=lite_neck, + name='shortcut.' + str(i))) + self.shortcut_list.append(shortcut) + if self.fusion_method == 'add': + upper_c = out_c + elif self.fusion_method == 'concat': + upper_c = out_c * 2 + else: + raise ValueError('Illegal fusion method. Expected add or\ + concat, but received {}'.format(self.fusion_method)) + self.upper_list.append(upper_c) + + def forward(self, inputs): + feat = inputs[-1] + for i, out_c in enumerate(self.planes): + feat = self.upsample_list[i](feat) + if i < self.shortcut_len: + shortcut = self.shortcut_list[i](inputs[-i - 2]) + if self.fusion_method == 'add': + feat = feat + shortcut + else: + feat = paddle.concat([feat, shortcut], axis=1) + return feat + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=self.upper_list[-1], )] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/necks/yolo_fpn.py b/PaddleDetection-release-2.6/ppdet/modeling/necks/yolo_fpn.py new file mode 100644 index 0000000000000000000000000000000000000000..79f4cead360f872233f48be739e2357d4c9e1121 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/necks/yolo_fpn.py @@ -0,0 +1,1099 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import DropBlock +from ppdet.modeling.ops import get_act_fn +from ..backbones.darknet import ConvBNLayer +from ..shape_spec import ShapeSpec +from ..backbones.csp_darknet import BaseConv, DWConv, CSPLayer + +__all__ = ['YOLOv3FPN', 'PPYOLOFPN', 'PPYOLOTinyFPN', 'PPYOLOPAN', 'YOLOCSPPAN'] + + +def add_coord(x, data_format): + b = paddle.shape(x)[0] + if data_format == 'NCHW': + h, w = x.shape[2], x.shape[3] + else: + h, w = x.shape[1], x.shape[2] + + gx = paddle.cast(paddle.arange(w) / ((w - 1.) * 2.0) - 1., x.dtype) + gy = paddle.cast(paddle.arange(h) / ((h - 1.) 
* 2.0) - 1., x.dtype) + + if data_format == 'NCHW': + gx = gx.reshape([1, 1, 1, w]).expand([b, 1, h, w]) + gy = gy.reshape([1, 1, h, 1]).expand([b, 1, h, w]) + else: + gx = gx.reshape([1, 1, w, 1]).expand([b, h, w, 1]) + gy = gy.reshape([1, h, 1, 1]).expand([b, h, w, 1]) + + gx.stop_gradient = True + gy.stop_gradient = True + return gx, gy + + +class YoloDetBlock(nn.Layer): + def __init__(self, + ch_in, + channel, + norm_type, + freeze_norm=False, + name='', + data_format='NCHW'): + """ + YOLODetBlock layer for yolov3, see https://arxiv.org/abs/1804.02767 + + Args: + ch_in (int): input channel + channel (int): base channel + norm_type (str): batch norm type + freeze_norm (bool): whether to freeze norm, default False + name (str): layer name + data_format (str): data format, NCHW or NHWC + """ + super(YoloDetBlock, self).__init__() + self.ch_in = ch_in + self.channel = channel + assert channel % 2 == 0, \ + "channel {} cannot be divided by 2".format(channel) + conv_def = [ + ['conv0', ch_in, channel, 1, '.0.0'], + ['conv1', channel, channel * 2, 3, '.0.1'], + ['conv2', channel * 2, channel, 1, '.1.0'], + ['conv3', channel, channel * 2, 3, '.1.1'], + ['route', channel * 2, channel, 1, '.2'], + ] + + self.conv_module = nn.Sequential() + for idx, (conv_name, ch_in, ch_out, filter_size, + post_name) in enumerate(conv_def): + self.conv_module.add_sublayer( + conv_name, + ConvBNLayer( + ch_in=ch_in, + ch_out=ch_out, + filter_size=filter_size, + padding=(filter_size - 1) // 2, + norm_type=norm_type, + freeze_norm=freeze_norm, + data_format=data_format, + name=name + post_name)) + + self.tip = ConvBNLayer( + ch_in=channel, + ch_out=channel * 2, + filter_size=3, + padding=1, + norm_type=norm_type, + freeze_norm=freeze_norm, + data_format=data_format, + name=name + '.tip') + + def forward(self, inputs): + route = self.conv_module(inputs) + tip = self.tip(route) + return route, tip + + +class SPP(nn.Layer): + def __init__(self, + ch_in, + ch_out, + k, + pool_size, + norm_type='bn', + freeze_norm=False, + name='', + act='leaky', + data_format='NCHW'): + """ + SPP layer, which consist of four pooling layer follwed by conv layer + + Args: + ch_in (int): input channel of conv layer + ch_out (int): output channel of conv layer + k (int): kernel size of conv layer + norm_type (str): batch norm type + freeze_norm (bool): whether to freeze norm, default False + name (str): layer name + act (str): activation function + data_format (str): data format, NCHW or NHWC + """ + super(SPP, self).__init__() + self.pool = [] + self.data_format = data_format + for size in pool_size: + pool = self.add_sublayer( + '{}.pool1'.format(name), + nn.MaxPool2D( + kernel_size=size, + stride=1, + padding=size // 2, + data_format=data_format, + ceil_mode=False)) + self.pool.append(pool) + self.conv = ConvBNLayer( + ch_in, + ch_out, + k, + padding=k // 2, + norm_type=norm_type, + freeze_norm=freeze_norm, + name=name, + act=act, + data_format=data_format) + + def forward(self, x): + outs = [x] + for pool in self.pool: + outs.append(pool(x)) + if self.data_format == "NCHW": + y = paddle.concat(outs, axis=1) + else: + y = paddle.concat(outs, axis=-1) + + y = self.conv(y) + return y + + +class CoordConv(nn.Layer): + def __init__(self, + ch_in, + ch_out, + filter_size, + padding, + norm_type, + freeze_norm=False, + name='', + data_format='NCHW'): + """ + CoordConv layer, see https://arxiv.org/abs/1807.03247 + + Args: + ch_in (int): input channel + ch_out (int): output channel + filter_size (int): filter size, default 3 + padding (int): 
padding size, default 0 + norm_type (str): batch norm type, default bn + name (str): layer name + data_format (str): data format, NCHW or NHWC + + """ + super(CoordConv, self).__init__() + self.conv = ConvBNLayer( + ch_in + 2, + ch_out, + filter_size=filter_size, + padding=padding, + norm_type=norm_type, + freeze_norm=freeze_norm, + data_format=data_format, + name=name) + self.data_format = data_format + + def forward(self, x): + gx, gy = add_coord(x, self.data_format) + if self.data_format == 'NCHW': + y = paddle.concat([x, gx, gy], axis=1) + else: + y = paddle.concat([x, gx, gy], axis=-1) + y = self.conv(y) + return y + + +class PPYOLODetBlock(nn.Layer): + def __init__(self, cfg, name, data_format='NCHW'): + """ + PPYOLODetBlock layer + + Args: + cfg (list): layer configs for this block + name (str): block name + data_format (str): data format, NCHW or NHWC + """ + super(PPYOLODetBlock, self).__init__() + self.conv_module = nn.Sequential() + for idx, (conv_name, layer, args, kwargs) in enumerate(cfg[:-1]): + kwargs.update( + name='{}.{}'.format(name, conv_name), data_format=data_format) + self.conv_module.add_sublayer(conv_name, layer(*args, **kwargs)) + + conv_name, layer, args, kwargs = cfg[-1] + kwargs.update( + name='{}.{}'.format(name, conv_name), data_format=data_format) + self.tip = layer(*args, **kwargs) + + def forward(self, inputs): + route = self.conv_module(inputs) + tip = self.tip(route) + return route, tip + + +class PPYOLOTinyDetBlock(nn.Layer): + def __init__(self, + ch_in, + ch_out, + name, + drop_block=False, + block_size=3, + keep_prob=0.9, + data_format='NCHW'): + """ + PPYOLO Tiny DetBlock layer + Args: + ch_in (list): input channel number + ch_out (list): output channel number + name (str): block name + drop_block: whether user DropBlock + block_size: drop block size + keep_prob: probability to keep block in DropBlock + data_format (str): data format, NCHW or NHWC + """ + super(PPYOLOTinyDetBlock, self).__init__() + self.drop_block_ = drop_block + self.conv_module = nn.Sequential() + + cfgs = [ + # name, in channels, out channels, filter_size, + # stride, padding, groups + ['.0', ch_in, ch_out, 1, 1, 0, 1], + ['.1', ch_out, ch_out, 5, 1, 2, ch_out], + ['.2', ch_out, ch_out, 1, 1, 0, 1], + ['.route', ch_out, ch_out, 5, 1, 2, ch_out], + ] + for cfg in cfgs: + conv_name, conv_ch_in, conv_ch_out, filter_size, stride, padding, \ + groups = cfg + self.conv_module.add_sublayer( + name + conv_name, + ConvBNLayer( + ch_in=conv_ch_in, + ch_out=conv_ch_out, + filter_size=filter_size, + stride=stride, + padding=padding, + groups=groups, + name=name + conv_name)) + + self.tip = ConvBNLayer( + ch_in=ch_out, + ch_out=ch_out, + filter_size=1, + stride=1, + padding=0, + groups=1, + name=name + conv_name) + + if self.drop_block_: + self.drop_block = DropBlock( + block_size=block_size, + keep_prob=keep_prob, + data_format=data_format, + name=name + '.dropblock') + + def forward(self, inputs): + if self.drop_block_: + inputs = self.drop_block(inputs) + route = self.conv_module(inputs) + tip = self.tip(route) + return route, tip + + +class PPYOLODetBlockCSP(nn.Layer): + def __init__(self, + cfg, + ch_in, + ch_out, + act, + norm_type, + name, + data_format='NCHW'): + """ + PPYOLODetBlockCSP layer + + Args: + cfg (list): layer configs for this block + ch_in (int): input channel + ch_out (int): output channel + act (str): default mish + name (str): block name + data_format (str): data format, NCHW or NHWC + """ + super(PPYOLODetBlockCSP, self).__init__() + self.data_format = data_format + 
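The layers built just below follow the CSP pattern; schematically:

```python
# x -> conv1 (1x1) -> conv_module (stack from cfg) --+
# x -> conv2 (1x1) ---------------------------------+-> concat -> conv3 (1x1)
#
# forward() returns the fused map twice, as (route, tip).
```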
self.conv1 = ConvBNLayer( + ch_in, + ch_out, + 1, + padding=0, + act=act, + norm_type=norm_type, + name=name + '.left', + data_format=data_format) + self.conv2 = ConvBNLayer( + ch_in, + ch_out, + 1, + padding=0, + act=act, + norm_type=norm_type, + name=name + '.right', + data_format=data_format) + self.conv3 = ConvBNLayer( + ch_out * 2, + ch_out * 2, + 1, + padding=0, + act=act, + norm_type=norm_type, + name=name, + data_format=data_format) + self.conv_module = nn.Sequential() + for idx, (layer_name, layer, args, kwargs) in enumerate(cfg): + kwargs.update(name=name + layer_name, data_format=data_format) + self.conv_module.add_sublayer(layer_name, layer(*args, **kwargs)) + + def forward(self, inputs): + conv_left = self.conv1(inputs) + conv_right = self.conv2(inputs) + conv_left = self.conv_module(conv_left) + if self.data_format == 'NCHW': + conv = paddle.concat([conv_left, conv_right], axis=1) + else: + conv = paddle.concat([conv_left, conv_right], axis=-1) + + conv = self.conv3(conv) + return conv, conv + + +@register +@serializable +class YOLOv3FPN(nn.Layer): + __shared__ = ['norm_type', 'data_format'] + + def __init__(self, + in_channels=[256, 512, 1024], + norm_type='bn', + freeze_norm=False, + data_format='NCHW'): + """ + YOLOv3FPN layer + + Args: + in_channels (list): input channels for fpn + norm_type (str): batch norm type, default bn + data_format (str): data format, NCHW or NHWC + + """ + super(YOLOv3FPN, self).__init__() + assert len(in_channels) > 0, "in_channels length should > 0" + self.in_channels = in_channels + self.num_blocks = len(in_channels) + + self._out_channels = [] + self.yolo_blocks = [] + self.routes = [] + self.data_format = data_format + for i in range(self.num_blocks): + name = 'yolo_block.{}'.format(i) + in_channel = in_channels[-i - 1] + if i > 0: + in_channel += 512 // (2**i) + yolo_block = self.add_sublayer( + name, + YoloDetBlock( + in_channel, + channel=512 // (2**i), + norm_type=norm_type, + freeze_norm=freeze_norm, + data_format=data_format, + name=name)) + self.yolo_blocks.append(yolo_block) + # tip layer output channel doubled + self._out_channels.append(1024 // (2**i)) + + if i < self.num_blocks - 1: + name = 'yolo_transition.{}'.format(i) + route = self.add_sublayer( + name, + ConvBNLayer( + ch_in=512 // (2**i), + ch_out=256 // (2**i), + filter_size=1, + stride=1, + padding=0, + norm_type=norm_type, + freeze_norm=freeze_norm, + data_format=data_format, + name=name)) + self.routes.append(route) + + def forward(self, blocks, for_mot=False): + assert len(blocks) == self.num_blocks + blocks = blocks[::-1] + yolo_feats = [] + + # add embedding features output for multi-object tracking model + if for_mot: + emb_feats = [] + + for i, block in enumerate(blocks): + if i > 0: + if self.data_format == 'NCHW': + block = paddle.concat([route, block], axis=1) + else: + block = paddle.concat([route, block], axis=-1) + route, tip = self.yolo_blocks[i](block) + yolo_feats.append(tip) + + if for_mot: + # add embedding features output + emb_feats.append(route) + + if i < self.num_blocks - 1: + route = self.routes[i](route) + route = F.interpolate( + route, scale_factor=2., data_format=self.data_format) + + if for_mot: + return {'yolo_feats': yolo_feats, 'emb_feats': emb_feats} + else: + return yolo_feats + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] + + +@register +@serializable +class 
PPYOLOFPN(nn.Layer): + __shared__ = ['norm_type', 'data_format'] + + def __init__(self, + in_channels=[512, 1024, 2048], + norm_type='bn', + freeze_norm=False, + data_format='NCHW', + coord_conv=False, + conv_block_num=2, + drop_block=False, + block_size=3, + keep_prob=0.9, + spp=False): + """ + PPYOLOFPN layer + + Args: + in_channels (list): input channels for fpn + norm_type (str): batch norm type, default bn + data_format (str): data format, NCHW or NHWC + coord_conv (bool): whether use CoordConv or not + conv_block_num (int): conv block num of each pan block + drop_block (bool): whether use DropBlock or not + block_size (int): block size of DropBlock + keep_prob (float): keep probability of DropBlock + spp (bool): whether use spp or not + + """ + super(PPYOLOFPN, self).__init__() + assert len(in_channels) > 0, "in_channels length should > 0" + self.in_channels = in_channels + self.num_blocks = len(in_channels) + # parse kwargs + self.coord_conv = coord_conv + self.drop_block = drop_block + self.block_size = block_size + self.keep_prob = keep_prob + self.spp = spp + self.conv_block_num = conv_block_num + self.data_format = data_format + if self.coord_conv: + ConvLayer = CoordConv + else: + ConvLayer = ConvBNLayer + + if self.drop_block: + dropblock_cfg = [[ + 'dropblock', DropBlock, [self.block_size, self.keep_prob], + dict() + ]] + else: + dropblock_cfg = [] + + self._out_channels = [] + self.yolo_blocks = [] + self.routes = [] + for i, ch_in in enumerate(self.in_channels[::-1]): + if i > 0: + ch_in += 512 // (2**i) + channel = 64 * (2**self.num_blocks) // (2**i) + base_cfg = [] + c_in, c_out = ch_in, channel + for j in range(self.conv_block_num): + base_cfg += [ + [ + 'conv{}'.format(2 * j), ConvLayer, [c_in, c_out, 1], + dict( + padding=0, + norm_type=norm_type, + freeze_norm=freeze_norm) + ], + [ + 'conv{}'.format(2 * j + 1), ConvBNLayer, + [c_out, c_out * 2, 3], dict( + padding=1, + norm_type=norm_type, + freeze_norm=freeze_norm) + ], + ] + c_in, c_out = c_out * 2, c_out + + base_cfg += [[ + 'route', ConvLayer, [c_in, c_out, 1], dict( + padding=0, norm_type=norm_type, freeze_norm=freeze_norm) + ], [ + 'tip', ConvLayer, [c_out, c_out * 2, 3], dict( + padding=1, norm_type=norm_type, freeze_norm=freeze_norm) + ]] + + if self.conv_block_num == 2: + if i == 0: + if self.spp: + spp_cfg = [[ + 'spp', SPP, [channel * 4, channel, 1], dict( + pool_size=[5, 9, 13], + norm_type=norm_type, + freeze_norm=freeze_norm) + ]] + else: + spp_cfg = [] + cfg = base_cfg[0:3] + spp_cfg + base_cfg[ + 3:4] + dropblock_cfg + base_cfg[4:6] + else: + cfg = base_cfg[0:2] + dropblock_cfg + base_cfg[2:6] + elif self.conv_block_num == 0: + if self.spp and i == 0: + spp_cfg = [[ + 'spp', SPP, [c_in * 4, c_in, 1], dict( + pool_size=[5, 9, 13], + norm_type=norm_type, + freeze_norm=freeze_norm) + ]] + else: + spp_cfg = [] + cfg = spp_cfg + dropblock_cfg + base_cfg + name = 'yolo_block.{}'.format(i) + yolo_block = self.add_sublayer(name, PPYOLODetBlock(cfg, name)) + self.yolo_blocks.append(yolo_block) + self._out_channels.append(channel * 2) + if i < self.num_blocks - 1: + name = 'yolo_transition.{}'.format(i) + route = self.add_sublayer( + name, + ConvBNLayer( + ch_in=channel, + ch_out=256 // (2**i), + filter_size=1, + stride=1, + padding=0, + norm_type=norm_type, + freeze_norm=freeze_norm, + data_format=data_format, + name=name)) + self.routes.append(route) + + def forward(self, blocks, for_mot=False): + assert len(blocks) == self.num_blocks + blocks = blocks[::-1] + yolo_feats = [] + + # add embedding features 
output for multi-object tracking model + if for_mot: + emb_feats = [] + + for i, block in enumerate(blocks): + if i > 0: + if self.data_format == 'NCHW': + block = paddle.concat([route, block], axis=1) + else: + block = paddle.concat([route, block], axis=-1) + route, tip = self.yolo_blocks[i](block) + yolo_feats.append(tip) + + if for_mot: + # add embedding features output + emb_feats.append(route) + + if i < self.num_blocks - 1: + route = self.routes[i](route) + route = F.interpolate( + route, scale_factor=2., data_format=self.data_format) + + if for_mot: + return {'yolo_feats': yolo_feats, 'emb_feats': emb_feats} + else: + return yolo_feats + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] + + +@register +@serializable +class PPYOLOTinyFPN(nn.Layer): + __shared__ = ['norm_type', 'data_format'] + + def __init__(self, + in_channels=[80, 56, 34], + detection_block_channels=[160, 128, 96], + norm_type='bn', + data_format='NCHW', + **kwargs): + """ + PPYOLO Tiny FPN layer + Args: + in_channels (list): input channels for fpn + detection_block_channels (list): channels in fpn + norm_type (str): batch norm type, default bn + data_format (str): data format, NCHW or NHWC + kwargs: extra key-value pairs, such as parameter of DropBlock and spp + """ + super(PPYOLOTinyFPN, self).__init__() + assert len(in_channels) > 0, "in_channels length should > 0" + self.in_channels = in_channels[::-1] + assert len(detection_block_channels + ) > 0, "detection_block_channelslength should > 0" + self.detection_block_channels = detection_block_channels + self.data_format = data_format + self.num_blocks = len(in_channels) + # parse kwargs + self.drop_block = kwargs.get('drop_block', False) + self.block_size = kwargs.get('block_size', 3) + self.keep_prob = kwargs.get('keep_prob', 0.9) + + self.spp_ = kwargs.get('spp', False) + if self.spp_: + self.spp = SPP(self.in_channels[0] * 4, + self.in_channels[0], + k=1, + pool_size=[5, 9, 13], + norm_type=norm_type, + name='spp') + + self._out_channels = [] + self.yolo_blocks = [] + self.routes = [] + for i, ( + ch_in, ch_out + ) in enumerate(zip(self.in_channels, self.detection_block_channels)): + name = 'yolo_block.{}'.format(i) + if i > 0: + ch_in += self.detection_block_channels[i - 1] + yolo_block = self.add_sublayer( + name, + PPYOLOTinyDetBlock( + ch_in, + ch_out, + name, + drop_block=self.drop_block, + block_size=self.block_size, + keep_prob=self.keep_prob)) + self.yolo_blocks.append(yolo_block) + self._out_channels.append(ch_out) + + if i < self.num_blocks - 1: + name = 'yolo_transition.{}'.format(i) + route = self.add_sublayer( + name, + ConvBNLayer( + ch_in=ch_out, + ch_out=ch_out, + filter_size=1, + stride=1, + padding=0, + norm_type=norm_type, + data_format=data_format, + name=name)) + self.routes.append(route) + + def forward(self, blocks, for_mot=False): + assert len(blocks) == self.num_blocks + blocks = blocks[::-1] + yolo_feats = [] + + # add embedding features output for multi-object tracking model + if for_mot: + emb_feats = [] + + for i, block in enumerate(blocks): + if i == 0 and self.spp_: + block = self.spp(block) + + if i > 0: + if self.data_format == 'NCHW': + block = paddle.concat([route, block], axis=1) + else: + block = paddle.concat([route, block], axis=-1) + route, tip = self.yolo_blocks[i](block) + yolo_feats.append(tip) + + if for_mot: + # add embedding features output + 
emb_feats.append(route) + + if i < self.num_blocks - 1: + route = self.routes[i](route) + route = F.interpolate( + route, scale_factor=2., data_format=self.data_format) + + if for_mot: + return {'yolo_feats': yolo_feats, 'emb_feats': emb_feats} + else: + return yolo_feats + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] + + +@register +@serializable +class PPYOLOPAN(nn.Layer): + __shared__ = ['norm_type', 'data_format'] + + def __init__(self, + in_channels=[512, 1024, 2048], + norm_type='bn', + data_format='NCHW', + act='mish', + conv_block_num=3, + drop_block=False, + block_size=3, + keep_prob=0.9, + spp=False): + """ + PPYOLOPAN layer with SPP, DropBlock and CSP connection. + + Args: + in_channels (list): input channels for fpn + norm_type (str): batch norm type, default bn + data_format (str): data format, NCHW or NHWC + act (str): activation function, default mish + conv_block_num (int): conv block num of each pan block + drop_block (bool): whether use DropBlock or not + block_size (int): block size of DropBlock + keep_prob (float): keep probability of DropBlock + spp (bool): whether use spp or not + + """ + super(PPYOLOPAN, self).__init__() + assert len(in_channels) > 0, "in_channels length should > 0" + self.in_channels = in_channels + self.num_blocks = len(in_channels) + # parse kwargs + self.drop_block = drop_block + self.block_size = block_size + self.keep_prob = keep_prob + self.spp = spp + self.conv_block_num = conv_block_num + self.data_format = data_format + if self.drop_block: + dropblock_cfg = [[ + 'dropblock', DropBlock, [self.block_size, self.keep_prob], + dict() + ]] + else: + dropblock_cfg = [] + + # fpn + self.fpn_blocks = [] + self.fpn_routes = [] + fpn_channels = [] + for i, ch_in in enumerate(self.in_channels[::-1]): + if i > 0: + ch_in += 512 // (2**(i - 1)) + channel = 512 // (2**i) + base_cfg = [] + for j in range(self.conv_block_num): + base_cfg += [ + # name, layer, args + [ + '{}.0'.format(j), ConvBNLayer, [channel, channel, 1], + dict( + padding=0, act=act, norm_type=norm_type) + ], + [ + '{}.1'.format(j), ConvBNLayer, [channel, channel, 3], + dict( + padding=1, act=act, norm_type=norm_type) + ] + ] + + if i == 0 and self.spp: + base_cfg[3] = [ + 'spp', SPP, [channel * 4, channel, 1], dict( + pool_size=[5, 9, 13], act=act, norm_type=norm_type) + ] + + cfg = base_cfg[:4] + dropblock_cfg + base_cfg[4:] + name = 'fpn.{}'.format(i) + fpn_block = self.add_sublayer( + name, + PPYOLODetBlockCSP(cfg, ch_in, channel, act, norm_type, name, + data_format)) + self.fpn_blocks.append(fpn_block) + fpn_channels.append(channel * 2) + if i < self.num_blocks - 1: + name = 'fpn_transition.{}'.format(i) + route = self.add_sublayer( + name, + ConvBNLayer( + ch_in=channel * 2, + ch_out=channel, + filter_size=1, + stride=1, + padding=0, + act=act, + norm_type=norm_type, + data_format=data_format, + name=name)) + self.fpn_routes.append(route) + # pan + self.pan_blocks = [] + self.pan_routes = [] + self._out_channels = [512 // (2**(self.num_blocks - 2)), ] + for i in reversed(range(self.num_blocks - 1)): + name = 'pan_transition.{}'.format(i) + route = self.add_sublayer( + name, + ConvBNLayer( + ch_in=fpn_channels[i + 1], + ch_out=fpn_channels[i + 1], + filter_size=3, + stride=2, + padding=1, + act=act, + norm_type=norm_type, + data_format=data_format, + name=name)) + self.pan_routes = [route, ] + self.pan_routes + 
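The prepend just above is deliberate: the PAN stage iterates `reversed(range(num_blocks - 1))`, so routes and blocks are built deepest-first and prepended to keep index `i` aligned with FPN level `i`; `_out_channels` is reversed at the end for the same reason. A toy illustration:

```python
levels = []
for i in reversed(range(3)):            # build deepest-first: 2, 1, 0
    levels = ['pan_{}'.format(i)] + levels
# levels == ['pan_0', 'pan_1', 'pan_2'] -> index i matches FPN level i
```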
base_cfg = [] + ch_in = fpn_channels[i] + fpn_channels[i + 1] + channel = 512 // (2**i) + for j in range(self.conv_block_num): + base_cfg += [ + # name, layer, args + [ + '{}.0'.format(j), ConvBNLayer, [channel, channel, 1], + dict( + padding=0, act=act, norm_type=norm_type) + ], + [ + '{}.1'.format(j), ConvBNLayer, [channel, channel, 3], + dict( + padding=1, act=act, norm_type=norm_type) + ] + ] + + cfg = base_cfg[:4] + dropblock_cfg + base_cfg[4:] + name = 'pan.{}'.format(i) + pan_block = self.add_sublayer( + name, + PPYOLODetBlockCSP(cfg, ch_in, channel, act, norm_type, name, + data_format)) + + self.pan_blocks = [pan_block, ] + self.pan_blocks + self._out_channels.append(channel * 2) + + self._out_channels = self._out_channels[::-1] + + def forward(self, blocks, for_mot=False): + assert len(blocks) == self.num_blocks + blocks = blocks[::-1] + fpn_feats = [] + + # add embedding features output for multi-object tracking model + if for_mot: + emb_feats = [] + + for i, block in enumerate(blocks): + if i > 0: + if self.data_format == 'NCHW': + block = paddle.concat([route, block], axis=1) + else: + block = paddle.concat([route, block], axis=-1) + route, tip = self.fpn_blocks[i](block) + fpn_feats.append(tip) + + if for_mot: + # add embedding features output + emb_feats.append(route) + + if i < self.num_blocks - 1: + route = self.fpn_routes[i](route) + route = F.interpolate( + route, scale_factor=2., data_format=self.data_format) + + pan_feats = [fpn_feats[-1], ] + route = fpn_feats[self.num_blocks - 1] + for i in reversed(range(self.num_blocks - 1)): + block = fpn_feats[i] + route = self.pan_routes[i](route) + if self.data_format == 'NCHW': + block = paddle.concat([route, block], axis=1) + else: + block = paddle.concat([route, block], axis=-1) + + route, tip = self.pan_blocks[i](block) + pan_feats.append(tip) + + if for_mot: + return {'yolo_feats': pan_feats[::-1], 'emb_feats': emb_feats} + else: + return pan_feats[::-1] + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] + + +@register +@serializable +class YOLOCSPPAN(nn.Layer): + """ + YOLO CSP-PAN, used in YOLOv5 and YOLOX. 
+ """ + __shared__ = ['depth_mult', 'data_format', 'act', 'trt'] + + def __init__(self, + depth_mult=1.0, + in_channels=[256, 512, 1024], + depthwise=False, + data_format='NCHW', + act='silu', + trt=False): + super(YOLOCSPPAN, self).__init__() + self.in_channels = in_channels + self._out_channels = in_channels + Conv = DWConv if depthwise else BaseConv + + self.data_format = data_format + act = get_act_fn( + act, trt=trt) if act is None or isinstance(act, + (str, dict)) else act + self.upsample = nn.Upsample(scale_factor=2, mode="nearest") + + # top-down fpn + self.lateral_convs = nn.LayerList() + self.fpn_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1, 0, -1): + self.lateral_convs.append( + BaseConv( + int(in_channels[idx]), + int(in_channels[idx - 1]), + 1, + 1, + act=act)) + self.fpn_blocks.append( + CSPLayer( + int(in_channels[idx - 1] * 2), + int(in_channels[idx - 1]), + round(3 * depth_mult), + shortcut=False, + depthwise=depthwise, + act=act)) + + # bottom-up pan + self.downsample_convs = nn.LayerList() + self.pan_blocks = nn.LayerList() + for idx in range(len(in_channels) - 1): + self.downsample_convs.append( + Conv( + int(in_channels[idx]), + int(in_channels[idx]), + 3, + stride=2, + act=act)) + self.pan_blocks.append( + CSPLayer( + int(in_channels[idx] * 2), + int(in_channels[idx + 1]), + round(3 * depth_mult), + shortcut=False, + depthwise=depthwise, + act=act)) + + def forward(self, feats, for_mot=False): + assert len(feats) == len(self.in_channels) + + # top-down fpn + inner_outs = [feats[-1]] + for idx in range(len(self.in_channels) - 1, 0, -1): + feat_heigh = inner_outs[0] + feat_low = feats[idx - 1] + feat_heigh = self.lateral_convs[len(self.in_channels) - 1 - idx]( + feat_heigh) + inner_outs[0] = feat_heigh + + upsample_feat = F.interpolate( + feat_heigh, + scale_factor=2., + mode="nearest", + data_format=self.data_format) + inner_out = self.fpn_blocks[len(self.in_channels) - 1 - idx]( + paddle.concat( + [upsample_feat, feat_low], axis=1)) + inner_outs.insert(0, inner_out) + + # bottom-up pan + outs = [inner_outs[0]] + for idx in range(len(self.in_channels) - 1): + feat_low = outs[-1] + feat_height = inner_outs[idx + 1] + downsample_feat = self.downsample_convs[idx](feat_low) + out = self.pan_blocks[idx](paddle.concat( + [downsample_feat, feat_height], axis=1)) + outs.append(out) + + return outs + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self._out_channels] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/ops.py b/PaddleDetection-release-2.6/ppdet/modeling/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..d9a1192d7fb93ef855d06cf8fbebd688e21a7317 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/ops.py @@ -0,0 +1,1114 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
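Before the ops module, a minimal forward sketch for the `YOLOCSPPAN` defined above; the YOLOX-style channel list and the sizes are illustrative:

```python
import paddle
from ppdet.modeling.necks.yolo_fpn import YOLOCSPPAN

feats = [paddle.rand([1, c, s, s])
         for c, s in zip([256, 512, 1024], [32, 16, 8])]
neck = YOLOCSPPAN(in_channels=[256, 512, 1024])
outs = neck(feats)
# Per-level channel counts are preserved (256/512/1024) at the same strides;
# only the lateral 1x1 convs reduce channels inside the top-down pass.
```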
+
+import paddle
+import paddle.nn.functional as F
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.regularizer import L2Decay
+try:
+    import paddle._legacy_C_ops as C_ops
+except ImportError:
+    import paddle._C_ops as C_ops
+
+from paddle import in_dynamic_mode
+from paddle.common_ops_import import Variable, LayerHelper, check_variable_and_dtype, check_type, check_dtype
+
+__all__ = [
+    'prior_box', 'generate_proposals', 'box_coder', 'multiclass_nms',
+    'distribute_fpn_proposals', 'matrix_nms', 'batch_norm', 'mish', 'silu',
+    'swish', 'identity', 'anchor_generator'
+]
+
+
+def identity(x):
+    return x
+
+
+def mish(x):
+    # fall back to the explicit formulation when F.mish is unavailable
+    return F.mish(x) if hasattr(F, 'mish') else x * F.tanh(F.softplus(x))
+
+
+def silu(x):
+    return F.silu(x)
+
+
+def swish(x):
+    return x * F.sigmoid(x)
+
+
+TRT_ACT_SPEC = {'swish': swish, 'silu': swish}
+
+ACT_SPEC = {'mish': mish, 'silu': silu}
+
+
+def get_act_fn(act=None, trt=False):
+    assert act is None or isinstance(act, (
+        str, dict)), 'name of activation should be str, dict or None'
+    if not act:
+        return identity
+
+    if isinstance(act, dict):
+        # copy before popping 'name' so the caller's config dict is not mutated
+        kwargs = dict(act)
+        name = kwargs.pop('name')
+    else:
+        name = act
+        kwargs = dict()
+
+    if trt and name in TRT_ACT_SPEC:
+        fn = TRT_ACT_SPEC[name]
+    elif name in ACT_SPEC:
+        fn = ACT_SPEC[name]
+    else:
+        fn = getattr(F, name)
+
+    return lambda x: fn(x, **kwargs)
+
+
+def batch_norm(ch,
+               norm_type='bn',
+               norm_decay=0.,
+               freeze_norm=False,
+               initializer=None,
+               data_format='NCHW'):
+
+    norm_lr = 0. if freeze_norm else 1.
+    weight_attr = ParamAttr(
+        initializer=initializer,
+        learning_rate=norm_lr,
+        regularizer=L2Decay(norm_decay),
+        trainable=False if freeze_norm else True)
+    bias_attr = ParamAttr(
+        learning_rate=norm_lr,
+        regularizer=L2Decay(norm_decay),
+        trainable=False if freeze_norm else True)
+
+    if norm_type in ['sync_bn', 'bn']:
+        norm_layer = nn.BatchNorm2D(
+            ch,
+            weight_attr=weight_attr,
+            bias_attr=bias_attr,
+            data_format=data_format)
+    else:
+        raise ValueError("norm_type should be 'bn' or 'sync_bn', "
+                         "but got {}".format(norm_type))
+
+    norm_params = norm_layer.parameters()
+    if freeze_norm:
+        for param in norm_params:
+            param.stop_gradient = True
+
+    return norm_layer
+
+
+@paddle.jit.not_to_static
+def anchor_generator(input,
+                     anchor_sizes=None,
+                     aspect_ratios=None,
+                     variance=[0.1, 0.1, 0.2, 0.2],
+                     stride=None,
+                     offset=0.5):
+    """
+    **Anchor generator operator**
+    Generate anchors for the Faster R-CNN algorithm. Each position of the
+    input produces N anchors, where
+    N = size(anchor_sizes) * size(aspect_ratios). Anchors are generated by
+    iterating over aspect_ratios first, then anchor_sizes.
+    Args:
+        input(Variable): 4-D Tensor with shape [N,C,H,W]. The input feature map.
+        anchor_sizes(float32|list|tuple, optional): The sizes of generated
+            anchors, given in absolute pixels, e.g. [64., 128., 256., 512.].
+            An anchor size of 64 means the area of this anchor equals 64**2.
+            None by default.
+        aspect_ratios(float32|list|tuple, optional): The height / width ratios
+            of generated anchors, e.g. [0.5, 1.0, 2.0]. None by default.
+        variance(list|tuple, optional): The variances to be used in box
+            regression deltas. The data type is float32, [0.1, 0.1, 0.2, 0.2] by
+            default.
+        stride(list|tuple, optional): The anchor strides across width and height.
+            The data type is float32, e.g. [16.0, 16.0]. None by default.
+        offset(float32, optional): Anchor center offset. 0.5 by default.
+    Returns:
+        Tuple:
+        Anchors(Variable): The output anchors with a layout of [H, W, num_anchors, 4].
+            H is the height of the input, W is the width of the input, and
+            num_anchors is the box count of each position.
+            Each anchor is in (xmin, ymin, xmax, ymax) format and unnormalized.
+        Variances(Variable): The expanded variances of the anchors
+            with a layout of [H, W, num_priors, 4].
+            H is the height of the input, W is the width of the input, and
+            num_anchors is the box count of each position.
+            Each variance is in (xcenter, ycenter, w, h) format.
+    Examples:
+        .. code-block:: python
+            import paddle.fluid as fluid
+            conv1 = fluid.data(name='conv1', shape=[None, 48, 16, 16], dtype='float32')
+            anchor, var = fluid.layers.anchor_generator(
+                input=conv1,
+                anchor_sizes=[64, 128, 256, 512],
+                aspect_ratios=[0.5, 1.0, 2.0],
+                variance=[0.1, 0.1, 0.2, 0.2],
+                stride=[16.0, 16.0],
+                offset=0.5)
+    """
+
+    def _is_list_or_tuple_(data):
+        return (isinstance(data, list) or isinstance(data, tuple))
+
+    if not _is_list_or_tuple_(anchor_sizes):
+        anchor_sizes = [anchor_sizes]
+    if not _is_list_or_tuple_(aspect_ratios):
+        aspect_ratios = [aspect_ratios]
+    if not (_is_list_or_tuple_(stride) and len(stride) == 2):
+        raise ValueError('stride should be a list or tuple '
+                         'with length 2, (stride_width, stride_height).')
+
+    anchor_sizes = list(map(float, anchor_sizes))
+    aspect_ratios = list(map(float, aspect_ratios))
+    stride = list(map(float, stride))
+
+    if in_dynamic_mode():
+        attrs = ('anchor_sizes', anchor_sizes, 'aspect_ratios', aspect_ratios,
+                 'variances', variance, 'stride', stride, 'offset', offset)
+        anchor, var = C_ops.anchor_generator(input, *attrs)
+        return anchor, var
+
+    helper = LayerHelper("anchor_generator", **locals())
+    dtype = helper.input_dtype()
+    attrs = {
+        'anchor_sizes': anchor_sizes,
+        'aspect_ratios': aspect_ratios,
+        'variances': variance,
+        'stride': stride,
+        'offset': offset
+    }
+
+    anchor = helper.create_variable_for_type_inference(dtype)
+    var = helper.create_variable_for_type_inference(dtype)
+    helper.append_op(
+        type="anchor_generator",
+        inputs={"Input": input},
+        outputs={"Anchors": anchor,
+                 "Variances": var},
+        attrs=attrs, )
+    anchor.stop_gradient = True
+    var.stop_gradient = True
+    return anchor, var
+
+
+@paddle.jit.not_to_static
+def distribute_fpn_proposals(fpn_rois,
+                             min_level,
+                             max_level,
+                             refer_level,
+                             refer_scale,
+                             pixel_offset=False,
+                             rois_num=None,
+                             name=None):
+    r"""
+
+    **This op only takes LoDTensor as input.** In Feature Pyramid Networks
+    (FPN) models, all proposals need to be distributed to different FPN
+    levels according to the scale of each proposal, the referring scale and
+    the referring level. Besides, to restore the order of proposals, we
+    return an array which indicates the original index of the rois in the
+    current proposals. The FPN level of each roi is computed as follows:
+
+    .. math::
+
+        roi\_scale &= \sqrt{BBoxArea(fpn\_roi)}
+
+        level &= floor(\log(\frac{roi\_scale}{refer\_scale}) + refer\_level)
+
+    where BBoxArea is a function to compute the area of each roi.
+
+    Args:
+
+        fpn_rois(Variable): 2-D Tensor with shape [N, 4] and data type is
+            float32 or float64. The input fpn_rois.
+        min_level(int32): The lowest level of the FPN layer where the proposals
+            come from.
+        max_level(int32): The highest level of the FPN layer where the proposals
+            come from.
+        refer_level(int32): The referring level of the FPN layer with specified scale.
+        refer_scale(int32): The referring scale of the FPN layer with specified level.
+        pixel_offset(bool, optional): Whether to apply a 1-pixel offset when
+            computing box widths and heights. False by default.
+        rois_num(Tensor): 1-D Tensor contains the number of RoIs in each image.
+            The shape is [B] and data type is int32.
B is the number of images. + If it is not None then return a list of 1-D Tensor. Each element + is the output RoIs' number of each image on the corresponding level + and the shape is [B]. None by default. + name(str, optional): For detailed information, please refer + to :ref:`api_guide_Name`. Usually name is no need to set and + None by default. + + Returns: + Tuple: + + multi_rois(List) : A list of 2-D LoDTensor with shape [M, 4] + and data type of float32 and float64. The length is + max_level-min_level+1. The proposals in each FPN level. + + restore_ind(Variable): A 2-D Tensor with shape [N, 1], N is + the number of total rois. The data type is int32. It is + used to restore the order of fpn_rois. + + rois_num_per_level(List): A list of 1-D Tensor and each Tensor is + the RoIs' number in each image on the corresponding level. The shape + is [B] and data type of int32. B is the number of images + + + Examples: + .. code-block:: python + + import paddle + from ppdet.modeling import ops + paddle.enable_static() + fpn_rois = paddle.static.data( + name='data', shape=[None, 4], dtype='float32', lod_level=1) + multi_rois, restore_ind = ops.distribute_fpn_proposals( + fpn_rois=fpn_rois, + min_level=2, + max_level=5, + refer_level=4, + refer_scale=224) + """ + num_lvl = max_level - min_level + 1 + + if in_dynamic_mode(): + assert rois_num is not None, "rois_num should not be None in dygraph mode." + attrs = ('min_level', min_level, 'max_level', max_level, 'refer_level', + refer_level, 'refer_scale', refer_scale, 'pixel_offset', + pixel_offset) + multi_rois, restore_ind, rois_num_per_level = C_ops.distribute_fpn_proposals( + fpn_rois, rois_num, num_lvl, num_lvl, *attrs) + + return multi_rois, restore_ind, rois_num_per_level + + else: + check_variable_and_dtype(fpn_rois, 'fpn_rois', ['float32', 'float64'], + 'distribute_fpn_proposals') + helper = LayerHelper('distribute_fpn_proposals', **locals()) + dtype = helper.input_dtype('fpn_rois') + multi_rois = [ + helper.create_variable_for_type_inference(dtype) + for i in range(num_lvl) + ] + + restore_ind = helper.create_variable_for_type_inference(dtype='int32') + + inputs = {'FpnRois': fpn_rois} + outputs = { + 'MultiFpnRois': multi_rois, + 'RestoreIndex': restore_ind, + } + + if rois_num is not None: + inputs['RoisNum'] = rois_num + rois_num_per_level = [ + helper.create_variable_for_type_inference(dtype='int32') + for i in range(num_lvl) + ] + outputs['MultiLevelRoIsNum'] = rois_num_per_level + else: + rois_num_per_level = None + + helper.append_op( + type='distribute_fpn_proposals', + inputs=inputs, + outputs=outputs, + attrs={ + 'min_level': min_level, + 'max_level': max_level, + 'refer_level': refer_level, + 'refer_scale': refer_scale, + 'pixel_offset': pixel_offset + }) + return multi_rois, restore_ind, rois_num_per_level + + +@paddle.jit.not_to_static +def prior_box(input, + image, + min_sizes, + max_sizes=None, + aspect_ratios=[1.], + variance=[0.1, 0.1, 0.2, 0.2], + flip=False, + clip=False, + steps=[0.0, 0.0], + offset=0.5, + min_max_aspect_ratios_order=False, + name=None): + """ + + This op generates prior boxes for SSD(Single Shot MultiBox Detector) algorithm. + Each position of the input produce N prior boxes, N is determined by + the count of min_sizes, max_sizes and aspect_ratios, The size of the + box is in range(min_size, max_size) interval, which is generated in + sequence according to the aspect_ratios. + + Parameters: + input(Tensor): 4-D tensor(NCHW), the data type should be float32 or float64. 
+        image(Tensor): 4-D tensor(NCHW), the input image data of PriorBoxOp,
+            the data type should be float32 or float64.
+        min_sizes(list|tuple|float): the min sizes of generated prior boxes.
+        max_sizes(list|tuple|None): the max sizes of generated prior boxes.
+            Default: None.
+        aspect_ratios(list|tuple|float): the aspect ratios of generated
+            prior boxes. Default: [1.].
+        variance(list|tuple): the variances to be encoded in prior boxes.
+            Default: [0.1, 0.1, 0.2, 0.2].
+        flip(bool): Whether to flip aspect ratios. Default: False.
+        clip(bool): Whether to clip out-of-boundary boxes. Default: False.
+        steps(list|tuple): Prior box steps across width and height. If
+            steps[0] equals 0.0 or steps[1] equals 0.0, the prior box steps
+            across the height or width of the input are calculated
+            automatically. Default: [0., 0.]
+        offset(float): Prior box center offset. Default: 0.5
+        min_max_aspect_ratios_order(bool): If set to True, the output prior boxes
+            are in the order of [min, max, aspect_ratios], which is consistent
+            with Caffe. Please note, this order affects the weight order of the
+            following convolution layer, but does not affect the final
+            detection results. Default: False.
+        name(str, optional): The default value is None. Normally there is no need
+            for the user to set this property. For more information, please
+            refer to :ref:`api_guide_Name`
+
+    Returns:
+        Tuple: A tuple with two Variable (boxes, variances)
+
+        boxes(Tensor): the output prior boxes of PriorBox.
+            4-D tensor, the layout is [H, W, num_priors, 4].
+            H is the height of the input, W is the width of the input,
+            num_priors is the total box count of each position of the input.
+
+        variances(Tensor): the expanded variances of PriorBox.
+            4-D tensor, the layout is [H, W, num_priors, 4].
+            H is the height of the input, W is the width of the input,
+            num_priors is the total box count of each position of the input.
+
+    Examples:
+        .. code-block:: python
+
+            import paddle
+            from ppdet.modeling import ops
+
+            paddle.enable_static()
+            input = paddle.static.data(name="input", shape=[None,3,6,9])
+            image = paddle.static.data(name="image", shape=[None,3,9,12])
+            box, var = ops.prior_box(
+                input=input,
+                image=image,
+                min_sizes=[100.],
+                clip=True,
+                flip=True)
+    """
+    helper = LayerHelper("prior_box", **locals())
+    dtype = helper.input_dtype()
+    check_variable_and_dtype(
+        input, 'input', ['uint8', 'int8', 'float32', 'float64'], 'prior_box')
+
+    def _is_list_or_tuple_(data):
+        return (isinstance(data, list) or isinstance(data, tuple))
+
+    if not _is_list_or_tuple_(min_sizes):
+        min_sizes = [min_sizes]
+    if not _is_list_or_tuple_(aspect_ratios):
+        aspect_ratios = [aspect_ratios]
+    if not (_is_list_or_tuple_(steps) and len(steps) == 2):
+        raise ValueError('steps should be a list or tuple '
+                         'with length 2, (step_width, step_height).')
+
+    min_sizes = list(map(float, min_sizes))
+    aspect_ratios = list(map(float, aspect_ratios))
+    steps = list(map(float, steps))
+
+    cur_max_sizes = None
+    if max_sizes is not None and len(max_sizes) > 0 and max_sizes[0] > 0:
+        if not _is_list_or_tuple_(max_sizes):
+            max_sizes = [max_sizes]
+        cur_max_sizes = max_sizes
+
+    if in_dynamic_mode():
+        attrs = ('min_sizes', min_sizes, 'aspect_ratios', aspect_ratios,
+                 'variances', variance, 'flip', flip, 'clip', clip, 'step_w',
+                 steps[0], 'step_h', steps[1], 'offset', offset,
+                 'min_max_aspect_ratios_order', min_max_aspect_ratios_order)
+        if cur_max_sizes is not None:
+            attrs += ('max_sizes', cur_max_sizes)
+        box, var = C_ops.prior_box(input, image, *attrs)
+        return box, var
+    else:
+        attrs = {
+            'min_sizes': min_sizes,
+            'aspect_ratios': aspect_ratios,
+            'variances': variance,
+            'flip': flip,
+            'clip': clip,
+            'step_w': steps[0],
+            'step_h': steps[1],
+            'offset': offset,
+            'min_max_aspect_ratios_order': min_max_aspect_ratios_order
+        }
+
+        if cur_max_sizes is not None:
+            attrs['max_sizes'] = cur_max_sizes
+
+        box = helper.create_variable_for_type_inference(dtype)
+        var = helper.create_variable_for_type_inference(dtype)
+        helper.append_op(
+            type="prior_box",
+            inputs={"Input": input,
+                    "Image": image},
+            outputs={"Boxes": box,
+                     "Variances": var},
+            attrs=attrs, )
+        box.stop_gradient = True
+        var.stop_gradient = True
+        return box, var
+
+
+@paddle.jit.not_to_static
+def multiclass_nms(bboxes,
+                   scores,
+                   score_threshold,
+                   nms_top_k,
+                   keep_top_k,
+                   nms_threshold=0.3,
+                   normalized=True,
+                   nms_eta=1.,
+                   background_label=-1,
+                   return_index=False,
+                   return_rois_num=True,
+                   rois_num=None,
+                   name=None):
+    """
+    This operator performs multi-class non maximum suppression (NMS) on
+    boxes and scores.
+    In the NMS step, this operator greedily selects the subset of detection
+    bounding boxes whose scores are larger than score_threshold, if that
+    threshold is provided, and then keeps the nms_top_k highest-scoring
+    boxes if nms_top_k is larger than -1. It then prunes away boxes that
+    have a high IoU (intersection over union) overlap with already selected
+    boxes, using adaptive-threshold NMS controlled by nms_threshold and
+    nms_eta.
+    After the NMS step, at most keep_top_k bboxes are kept per image if
+    keep_top_k is larger than -1.
+    Args:
+        bboxes (Tensor): Two types of bboxes are supported:
+                         1. (Tensor) A 3-D Tensor with shape
+                         [N, M, K] (K is 4, 8, 16, 24 or 32) represents the
+                         predicted locations of M bounding boxes,
+                         N is the batch size. Each bounding box has four
+                         coordinate values and the layout is
+                         [xmin, ymin, xmax, ymax], when the box size equals 4.
+                         2. (LoDTensor) A 3-D Tensor with shape [M, C, 4].
+                         M is the number of bounding boxes, C is the
+                         class number.
+        scores (Tensor): Two types of scores are supported:
+                         1. (Tensor) A 3-D Tensor with shape [N, C, M]
+                         represents the predicted confidence predictions.
+                         N is the batch size, C is the class number, M is
+                         the number of bounding boxes. For each category
+                         there are M scores corresponding to the M bounding
+                         boxes. Please note, M is equal to the 2nd dimension
+                         of BBoxes.
+                         2. (LoDTensor) A 2-D LoDTensor with shape [M, C].
+                         M is the number of bboxes, C is the class number.
+                         In this case, input BBoxes should be the second
+                         case with shape [M, C, 4].
+        background_label (int): The index of the background label; the
+                         background label will be ignored. If set to -1, all
+                         categories will be considered. Default: -1
+        score_threshold (float): Threshold to filter out bounding boxes with
+                         low confidence scores. If not provided,
+                         consider all boxes.
+        nms_top_k (int): Maximum number of detections to be kept according to
+                         the confidences after filtering detections based
+                         on score_threshold.
+        nms_threshold (float): The threshold to be used in NMS. Default: 0.3
+        nms_eta (float): The adaptive decay factor for the NMS threshold.
+                         Default: 1.0
+        keep_top_k (int): Number of total bboxes to be kept per image after
+                         the NMS step. -1 means keeping all bboxes after the
+                         NMS step.
+        normalized (bool): Whether detections are normalized. Default: True
+        return_index(bool): Whether to return the selected index.
+                         Default: False
+        rois_num(Tensor): 1-D Tensor contains the number of RoIs in each
+                         image. The shape is [B] and data type is int32. B is
+                         the number of images. If it is not None then a list
+                         of 1-D Tensor is returned. Each element is the
+                         output RoIs' number of each image on the
+                         corresponding level and the shape is [B].
+                         None by default.
+        name(str): Name of the multiclass nms op. Default: None.
+    Returns:
+        A tuple with two Variables: (Out, Index) if return_index is True,
+        otherwise, a tuple with one Variable (Out) is returned.
+        Out: A 2-D LoDTensor with shape [No, 6] represents the detections.
+        Each row has 6 values: [label, confidence, xmin, ymin, xmax, ymax],
+        or a 2-D LoDTensor with shape [No, 10] represents the detections.
+        Each row has 10 values: [label, confidence, x1, y1, x2, y2, x3, y3,
+        x4, y4]. No is the total number of detections.
+        If no results are detected for any image, all elements in LoD are
+        0, and the output tensor is empty (None).
+        Index: Only returned when return_index is True. A 2-D LoDTensor with
+        shape [No, 1] represents the selected indices, which are absolute
+        values across batches. No is the same number as in Out. If the index
+        is used to gather other attributes such as age, first reshape the
+        input (N, M, 1) to (N * M, 1), where N is the batch size and M is
+        the number of boxes.
+    Examples:
+        ..
code-block:: python + + import paddle + from ppdet.modeling import ops + boxes = paddle.static.data(name='bboxes', shape=[81, 4], + dtype='float32', lod_level=1) + scores = paddle.static.data(name='scores', shape=[81], + dtype='float32', lod_level=1) + out, index = ops.multiclass_nms(bboxes=boxes, + scores=scores, + background_label=0, + score_threshold=0.5, + nms_top_k=400, + nms_threshold=0.3, + keep_top_k=200, + normalized=False, + return_index=True) + """ + helper = LayerHelper('multiclass_nms3', **locals()) + + if in_dynamic_mode(): + attrs = ('background_label', background_label, 'score_threshold', + score_threshold, 'nms_top_k', nms_top_k, 'nms_threshold', + nms_threshold, 'keep_top_k', keep_top_k, 'nms_eta', nms_eta, + 'normalized', normalized) + output, index, nms_rois_num = C_ops.multiclass_nms3(bboxes, scores, + rois_num, *attrs) + if not return_index: + index = None + return output, nms_rois_num, index + + else: + output = helper.create_variable_for_type_inference(dtype=bboxes.dtype) + index = helper.create_variable_for_type_inference(dtype='int32') + + inputs = {'BBoxes': bboxes, 'Scores': scores} + outputs = {'Out': output, 'Index': index} + + if rois_num is not None: + inputs['RoisNum'] = rois_num + + if return_rois_num: + nms_rois_num = helper.create_variable_for_type_inference( + dtype='int32') + outputs['NmsRoisNum'] = nms_rois_num + + helper.append_op( + type="multiclass_nms3", + inputs=inputs, + attrs={ + 'background_label': background_label, + 'score_threshold': score_threshold, + 'nms_top_k': nms_top_k, + 'nms_threshold': nms_threshold, + 'keep_top_k': keep_top_k, + 'nms_eta': nms_eta, + 'normalized': normalized + }, + outputs=outputs) + output.stop_gradient = True + index.stop_gradient = True + if not return_index: + index = None + if not return_rois_num: + nms_rois_num = None + + return output, nms_rois_num, index + + +@paddle.jit.not_to_static +def matrix_nms(bboxes, + scores, + score_threshold, + post_threshold, + nms_top_k, + keep_top_k, + use_gaussian=False, + gaussian_sigma=2., + background_label=0, + normalized=True, + return_index=False, + return_rois_num=True, + name=None): + """ + **Matrix NMS** + This operator does matrix non maximum suppression (NMS). + First selects a subset of candidate bounding boxes that have higher scores + than score_threshold (if provided), then the top k candidate is selected if + nms_top_k is larger than -1. Score of the remaining candidate are then + decayed according to the Matrix NMS scheme. + Aftern NMS step, at most keep_top_k number of total bboxes are to be kept + per image if keep_top_k is larger than -1. + Args: + bboxes (Tensor): A 3-D Tensor with shape [N, M, 4] represents the + predicted locations of M bounding bboxes, + N is the batch size. Each bounding box has four + coordinate values and the layout is + [xmin, ymin, xmax, ymax], when box size equals to 4. + The data type is float32 or float64. + scores (Tensor): A 3-D Tensor with shape [N, C, M] + represents the predicted confidence predictions. + N is the batch size, C is the class number, M is + number of bounding boxes. For each category there + are total M scores which corresponding M bounding + boxes. Please note, M is equal to the 2nd dimension + of BBoxes. The data type is float32 or float64. + score_threshold (float): Threshold to filter out bounding boxes with + low confidence score. + post_threshold (float): Threshold to filter out bounding boxes with + low confidence score AFTER decaying. 
+ nms_top_k (int): Maximum number of detections to be kept according to + the confidences after the filtering detections based + on score_threshold. + keep_top_k (int): Number of total bboxes to be kept per image after NMS + step. -1 means keeping all bboxes after NMS step. + use_gaussian (bool): Use Gaussian as the decay function. Default: False + gaussian_sigma (float): Sigma for Gaussian decay function. Default: 2.0 + background_label (int): The index of background label, the background + label will be ignored. If set to -1, then all + categories will be considered. Default: 0 + normalized (bool): Whether detections are normalized. Default: True + return_index(bool): Whether return selected index. Default: False + return_rois_num(bool): whether return rois_num. Default: True + name(str): Name of the matrix nms op. Default: None. + Returns: + A tuple with three Tensor: (Out, Index, RoisNum) if return_index is True, + otherwise, a tuple with two Tensor (Out, RoisNum) is returned. + Out (Tensor): A 2-D Tensor with shape [No, 6] containing the + detection results. + Each row has 6 values: [label, confidence, xmin, ymin, xmax, ymax] + (After version 1.3, when no boxes detected, the lod is changed + from {0} to {1}) + Index (Tensor): A 2-D Tensor with shape [No, 1] containing the + selected indices, which are absolute values cross batches. + rois_num (Tensor): A 1-D Tensor with shape [N] containing + the number of detected boxes in each image. + Examples: + .. code-block:: python + import paddle + from ppdet.modeling import ops + boxes = paddle.static.data(name='bboxes', shape=[None,81, 4], + dtype='float32', lod_level=1) + scores = paddle.static.data(name='scores', shape=[None,81], + dtype='float32', lod_level=1) + out = ops.matrix_nms(bboxes=boxes, scores=scores, background_label=0, + score_threshold=0.5, post_threshold=0.1, + nms_top_k=400, keep_top_k=200, normalized=False) + """ + check_variable_and_dtype(bboxes, 'BBoxes', ['float32', 'float64'], + 'matrix_nms') + check_variable_and_dtype(scores, 'Scores', ['float32', 'float64'], + 'matrix_nms') + check_type(score_threshold, 'score_threshold', float, 'matrix_nms') + check_type(post_threshold, 'post_threshold', float, 'matrix_nms') + check_type(nms_top_k, 'nums_top_k', int, 'matrix_nms') + check_type(keep_top_k, 'keep_top_k', int, 'matrix_nms') + check_type(normalized, 'normalized', bool, 'matrix_nms') + check_type(use_gaussian, 'use_gaussian', bool, 'matrix_nms') + check_type(gaussian_sigma, 'gaussian_sigma', float, 'matrix_nms') + check_type(background_label, 'background_label', int, 'matrix_nms') + + if in_dynamic_mode(): + attrs = ('background_label', background_label, 'score_threshold', + score_threshold, 'post_threshold', post_threshold, 'nms_top_k', + nms_top_k, 'gaussian_sigma', gaussian_sigma, 'use_gaussian', + use_gaussian, 'keep_top_k', keep_top_k, 'normalized', + normalized) + out, index, rois_num = C_ops.matrix_nms(bboxes, scores, *attrs) + if not return_index: + index = None + if not return_rois_num: + rois_num = None + return out, rois_num, index + else: + helper = LayerHelper('matrix_nms', **locals()) + output = helper.create_variable_for_type_inference(dtype=bboxes.dtype) + index = helper.create_variable_for_type_inference(dtype='int32') + outputs = {'Out': output, 'Index': index} + if return_rois_num: + rois_num = helper.create_variable_for_type_inference(dtype='int32') + outputs['RoisNum'] = rois_num + + helper.append_op( + type="matrix_nms", + inputs={'BBoxes': bboxes, + 'Scores': scores}, + attrs={ + 
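# static-graph attrs: the keys below must match the attr names of the
+                # matrix_nms op and mirror the flat dygraph attr tuple built
+                # in the dynamic branch above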
+                'background_label': background_label,
+                'score_threshold': score_threshold,
+                'post_threshold': post_threshold,
+                'nms_top_k': nms_top_k,
+                'gaussian_sigma': gaussian_sigma,
+                'use_gaussian': use_gaussian,
+                'keep_top_k': keep_top_k,
+                'normalized': normalized
+            },
+            outputs=outputs)
+        output.stop_gradient = True
+
+        if not return_index:
+            index = None
+        if not return_rois_num:
+            rois_num = None
+        return output, rois_num, index
+
+
+@paddle.jit.not_to_static
+def box_coder(prior_box,
+              prior_box_var,
+              target_box,
+              code_type="encode_center_size",
+              box_normalized=True,
+              axis=0,
+              name=None):
+    r"""
+    **Box Coder Layer**
+    Encode/Decode the target bounding box with the priorbox information.
+
+    The Encoding schema is described below:
+
+    .. math::
+
+        ox &= (tx - px) / pw / pxv
+
+        oy &= (ty - py) / ph / pyv
+
+        ow &= \log(|tw / pw|) / pwv
+
+        oh &= \log(|th / ph|) / phv
+
+    The Decoding schema is described below:
+
+    .. math::
+
+        ox &= pw \cdot pxv \cdot tx + px
+
+        oy &= ph \cdot pyv \cdot ty + py
+
+        ow &= \exp(pwv \cdot tw) \cdot pw
+
+        oh &= \exp(phv \cdot th) \cdot ph
+
+    where `tx`, `ty`, `tw`, `th` denote the target box's center coordinates,
+    width and height respectively. Similarly, `px`, `py`, `pw`, `ph` denote
+    the priorbox's (anchor) center coordinates, width and height. `pxv`,
+    `pyv`, `pwv`, `phv` denote the variance of the priorbox and `ox`, `oy`,
+    `ow`, `oh` denote the encoded/decoded coordinates, width and height.
+    When decoding, the center-size form (ox, oy, ow, oh) is converted back
+    to the corner form [ox - ow/2, oy - oh/2, ox + ow/2, oy + oh/2] before
+    output.
+    During Box Decoding, two modes of broadcast are supported. Say the
+    target box has shape [N, M, 4], and the shape of the prior box can be
+    [N, 4] or [M, 4]. Then the prior box will broadcast to the target box
+    along the assigned axis.
+
+    Args:
+        prior_box(Tensor): Box list prior_box is a 2-D Tensor with shape
+            [M, 4] which holds M boxes and whose data type is float32 or
+            float64. Each box is represented as [xmin, ymin, xmax, ymax];
+            [xmin, ymin] is the left top coordinate of the anchor box. If the
+            input is an image feature map, they are close to the origin of
+            the coordinate system. [xmax, ymax] is the right bottom
+            coordinate of the anchor box.
+        prior_box_var(List|Tensor|None): prior_box_var supports three types
+            of input. One is a Tensor with shape [M, 4] which holds M groups
+            and whose data type is float32 or float64. The second is a list
+            consisting of 4 elements shared by all boxes, with data type
+            float32 or float64. The other is None, which is not involved in
+            the calculation.
+        target_box(Tensor): This input can be a 2-D LoDTensor with shape
+            [N, 4] when code_type is 'encode_center_size'. This input can
+            also be a 3-D Tensor with shape [N, M, 4] when code_type is
+            'decode_center_size'. Each box is represented as
+            [xmin, ymin, xmax, ymax]. The data type is float32 or float64.
+        code_type(str): The code type used with the target box. It can be
+            `encode_center_size` or `decode_center_size`. `encode_center_size`
+            by default.
+        box_normalized(bool): Whether to treat the priorbox as a normalized
+            box. Set to True by default.
+        axis(int): Which axis in PriorBox to broadcast for box decode,
+            for example, if axis is 0 and TargetBox has shape [N, M, 4] and
+            PriorBox has shape [M, 4], then PriorBox will broadcast to
+            [N, M, 4] for decoding. It is only valid when the code type is
+            `decode_center_size`. Set to 0 by default.
+        name(str, optional): For detailed information, please refer
+            to :ref:`api_guide_Name`. Usually name does not need to be set
+            and is None by default.
+ + Returns: + Tensor: + output_box(Tensor): When code_type is 'encode_center_size', the + output tensor of box_coder_op with shape [N, M, 4] representing the + result of N target boxes encoded with M Prior boxes and variances. + When code_type is 'decode_center_size', N represents the batch size + and M represents the number of decoded boxes. + + Examples: + + .. code-block:: python + + import paddle + from ppdet.modeling import ops + paddle.enable_static() + # For encode + prior_box_encode = paddle.static.data(name='prior_box_encode', + shape=[512, 4], + dtype='float32') + target_box_encode = paddle.static.data(name='target_box_encode', + shape=[81, 4], + dtype='float32') + output_encode = ops.box_coder(prior_box=prior_box_encode, + prior_box_var=[0.1,0.1,0.2,0.2], + target_box=target_box_encode, + code_type="encode_center_size") + # For decode + prior_box_decode = paddle.static.data(name='prior_box_decode', + shape=[512, 4], + dtype='float32') + target_box_decode = paddle.static.data(name='target_box_decode', + shape=[512, 81, 4], + dtype='float32') + output_decode = ops.box_coder(prior_box=prior_box_decode, + prior_box_var=[0.1,0.1,0.2,0.2], + target_box=target_box_decode, + code_type="decode_center_size", + box_normalized=False, + axis=1) + """ + check_variable_and_dtype(prior_box, 'prior_box', ['float32', 'float64'], + 'box_coder') + check_variable_and_dtype(target_box, 'target_box', ['float32', 'float64'], + 'box_coder') + + if in_dynamic_mode(): + if isinstance(prior_box_var, Variable): + output_box = C_ops.box_coder( + prior_box, prior_box_var, target_box, "code_type", code_type, + "box_normalized", box_normalized, "axis", axis) + + elif isinstance(prior_box_var, list): + output_box = C_ops.box_coder( + prior_box, None, target_box, "code_type", code_type, + "box_normalized", box_normalized, "axis", axis, "variance", + prior_box_var) + else: + raise TypeError( + "Input variance of box_coder must be Variable or list") + return output_box + else: + helper = LayerHelper("box_coder", **locals()) + + output_box = helper.create_variable_for_type_inference( + dtype=prior_box.dtype) + + inputs = {"PriorBox": prior_box, "TargetBox": target_box} + attrs = { + "code_type": code_type, + "box_normalized": box_normalized, + "axis": axis + } + if isinstance(prior_box_var, Variable): + inputs['PriorBoxVar'] = prior_box_var + elif isinstance(prior_box_var, list): + attrs['variance'] = prior_box_var + else: + raise TypeError( + "Input variance of box_coder must be Variable or list") + helper.append_op( + type="box_coder", + inputs=inputs, + attrs=attrs, + outputs={"OutputBox": output_box}) + return output_box + + +@paddle.jit.not_to_static +def generate_proposals(scores, + bbox_deltas, + im_shape, + anchors, + variances, + pre_nms_top_n=6000, + post_nms_top_n=1000, + nms_thresh=0.5, + min_size=0.1, + eta=1.0, + pixel_offset=False, + return_rois_num=False, + name=None): + """ + **Generate proposal Faster-RCNN** + This operation proposes RoIs according to each box with their + probability to be a foreground object and + the box can be calculated by anchors. Bbox_deltais and scores + to be an object are the output of RPN. Final proposals + could be used to train detection net. + For generating proposals, this operation performs following steps: + 1. Transposes and resizes scores and bbox_deltas in size of + (H*W*A, 1) and (H*W*A, 4) + 2. Calculate box locations as proposals candidates. + 3. Clip boxes to image + 4. Remove predicted boxes with small area. + 5. Apply NMS to get final proposals as output. 
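+    Note that in dygraph mode rois_num must be returned (the implementation
+    asserts return_rois_num is True, as shown below), since the per-image
+    proposal counts are needed to split the concatenated RoIs downstream.
+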
+ Args: + scores(Tensor): A 4-D Tensor with shape [N, A, H, W] represents + the probability for each box to be an object. + N is batch size, A is number of anchors, H and W are height and + width of the feature map. The data type must be float32. + bbox_deltas(Tensor): A 4-D Tensor with shape [N, 4*A, H, W] + represents the difference between predicted box location and + anchor location. The data type must be float32. + im_shape(Tensor): A 2-D Tensor with shape [N, 2] represents H, W, the + origin image size or input size. The data type can be float32 or + float64. + anchors(Tensor): A 4-D Tensor represents the anchors with a layout + of [H, W, A, 4]. H and W are height and width of the feature map, + num_anchors is the box count of each position. Each anchor is + in (xmin, ymin, xmax, ymax) format an unnormalized. The data type must be float32. + variances(Tensor): A 4-D Tensor. The expanded variances of anchors with a layout of + [H, W, num_priors, 4]. Each variance is in + (xcenter, ycenter, w, h) format. The data type must be float32. + pre_nms_top_n(float): Number of total bboxes to be kept per + image before NMS. The data type must be float32. `6000` by default. + post_nms_top_n(float): Number of total bboxes to be kept per + image after NMS. The data type must be float32. `1000` by default. + nms_thresh(float): Threshold in NMS. The data type must be float32. `0.5` by default. + min_size(float): Remove predicted boxes with either height or + width < min_size. The data type must be float32. `0.1` by default. + eta(float): Apply in adaptive NMS, if adaptive `threshold > 0.5`, + `adaptive_threshold = adaptive_threshold * eta` in each iteration. + return_rois_num(bool): When setting True, it will return a 1D Tensor with shape [N, ] that includes Rois's + num of each image in one batch. The N is the image's num. For example, the tensor has values [4,5] that represents + the first image has 4 Rois, the second image has 5 Rois. It only used in rcnn model. + 'False' by default. + name(str, optional): For detailed information, please refer + to :ref:`api_guide_Name`. Usually name is no need to set and + None by default. + + Returns: + tuple: + A tuple with format ``(rpn_rois, rpn_roi_probs)``. + - **rpn_rois**: The generated RoIs. 2-D Tensor with shape ``[N, 4]`` while ``N`` is the number of RoIs. The data type is the same as ``scores``. + - **rpn_roi_probs**: The scores of generated RoIs. 2-D Tensor with shape ``[N, 1]`` while ``N`` is the number of RoIs. The data type is the same as ``scores``. + + Examples: + .. code-block:: python + + import paddle + from ppdet.modeling import ops + paddle.enable_static() + scores = paddle.static.data(name='scores', shape=[None, 4, 5, 5], dtype='float32') + bbox_deltas = paddle.static.data(name='bbox_deltas', shape=[None, 16, 5, 5], dtype='float32') + im_shape = paddle.static.data(name='im_shape', shape=[None, 2], dtype='float32') + anchors = paddle.static.data(name='anchors', shape=[None, 5, 4, 4], dtype='float32') + variances = paddle.static.data(name='variances', shape=[None, 5, 10, 4], dtype='float32') + rois, roi_probs = ops.generate_proposals(scores, bbox_deltas, + im_shape, anchors, variances) + """ + if in_dynamic_mode(): + assert return_rois_num, "return_rois_num should be True in dygraph mode." 
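+        # attrs are passed as a flat (name, value, name, value, ...) tuple,
+        # the calling convention expected by the legacy C ops imported at
+        # the top of this module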
+ attrs = ('pre_nms_topN', pre_nms_top_n, 'post_nms_topN', post_nms_top_n, + 'nms_thresh', nms_thresh, 'min_size', min_size, 'eta', eta, + 'pixel_offset', pixel_offset) + rpn_rois, rpn_roi_probs, rpn_rois_num = C_ops.generate_proposals_v2( + scores, bbox_deltas, im_shape, anchors, variances, *attrs) + if not return_rois_num: + rpn_rois_num = None + return rpn_rois, rpn_roi_probs, rpn_rois_num + + else: + helper = LayerHelper('generate_proposals_v2', **locals()) + + check_variable_and_dtype(scores, 'scores', ['float32'], + 'generate_proposals_v2') + check_variable_and_dtype(bbox_deltas, 'bbox_deltas', ['float32'], + 'generate_proposals_v2') + check_variable_and_dtype(im_shape, 'im_shape', ['float32', 'float64'], + 'generate_proposals_v2') + check_variable_and_dtype(anchors, 'anchors', ['float32'], + 'generate_proposals_v2') + check_variable_and_dtype(variances, 'variances', ['float32'], + 'generate_proposals_v2') + + rpn_rois = helper.create_variable_for_type_inference( + dtype=bbox_deltas.dtype) + rpn_roi_probs = helper.create_variable_for_type_inference( + dtype=scores.dtype) + outputs = { + 'RpnRois': rpn_rois, + 'RpnRoiProbs': rpn_roi_probs, + } + if return_rois_num: + rpn_rois_num = helper.create_variable_for_type_inference( + dtype='int32') + rpn_rois_num.stop_gradient = True + outputs['RpnRoisNum'] = rpn_rois_num + + helper.append_op( + type="generate_proposals_v2", + inputs={ + 'Scores': scores, + 'BboxDeltas': bbox_deltas, + 'ImShape': im_shape, + 'Anchors': anchors, + 'Variances': variances + }, + attrs={ + 'pre_nms_topN': pre_nms_top_n, + 'post_nms_topN': post_nms_top_n, + 'nms_thresh': nms_thresh, + 'min_size': min_size, + 'eta': eta, + 'pixel_offset': pixel_offset + }, + outputs=outputs) + rpn_rois.stop_gradient = True + rpn_roi_probs.stop_gradient = True + if not return_rois_num: + rpn_rois_num = None + + return rpn_rois, rpn_roi_probs, rpn_rois_num + + +def sigmoid_cross_entropy_with_logits(input, + label, + ignore_index=-100, + normalize=False): + output = F.binary_cross_entropy_with_logits(input, label, reduction='none') + mask_tensor = paddle.cast(label != ignore_index, 'float32') + output = paddle.multiply(output, mask_tensor) + if normalize: + sum_valid_mask = paddle.sum(mask_tensor) + output = output / sum_valid_mask + return output + + +def smooth_l1(input, label, inside_weight=None, outside_weight=None, + sigma=None): + input_new = paddle.multiply(input, inside_weight) + label_new = paddle.multiply(label, inside_weight) + delta = 1 / (sigma * sigma) + out = F.smooth_l1_loss(input_new, label_new, reduction='none', delta=delta) + out = paddle.multiply(out, outside_weight) + out = out / delta + out = paddle.reshape(out, shape=[out.shape[0], -1]) + out = paddle.sum(out, axis=1) + return out + + +def channel_shuffle(x, groups): + batch_size, num_channels, height, width = x.shape[0:4] + assert num_channels % groups == 0, 'num_channels should be divisible by groups' + channels_per_group = num_channels // groups + x = paddle.reshape( + x=x, shape=[batch_size, groups, channels_per_group, height, width]) + x = paddle.transpose(x=x, perm=[0, 2, 1, 3, 4]) + x = paddle.reshape(x=x, shape=[batch_size, num_channels, height, width]) + return x + + +def get_static_shape(tensor): + shape = paddle.shape(tensor) + shape.stop_gradient = True + return shape diff --git a/PaddleDetection-release-2.6/ppdet/modeling/post_process.py b/PaddleDetection-release-2.6/ppdet/modeling/post_process.py new file mode 100644 index 
0000000000000000000000000000000000000000..933d012de1818d2ced2d43a3039d1e63f2740f9e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/post_process.py @@ -0,0 +1,688 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register +from ppdet.modeling.bbox_utils import nonempty_bbox +from .transformers import bbox_cxcywh_to_xyxy +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +__all__ = [ + 'BBoxPostProcess', 'MaskPostProcess', 'JDEBBoxPostProcess', + 'CenterNetPostProcess', 'DETRBBoxPostProcess', 'SparsePostProcess' +] + + +@register +class BBoxPostProcess(object): + __shared__ = ['num_classes', 'export_onnx', 'export_eb'] + __inject__ = ['decode', 'nms'] + + def __init__(self, + num_classes=80, + decode=None, + nms=None, + export_onnx=False, + export_eb=False): + super(BBoxPostProcess, self).__init__() + self.num_classes = num_classes + self.decode = decode + self.nms = nms + self.export_onnx = export_onnx + self.export_eb = export_eb + + def __call__(self, head_out, rois, im_shape, scale_factor): + """ + Decode the bbox and do NMS if needed. + + Args: + head_out (tuple): bbox_pred and cls_prob of bbox_head output. + rois (tuple): roi and rois_num of rpn_head output. + im_shape (Tensor): The shape of the input image. + scale_factor (Tensor): The scale factor of the input image. + export_onnx (bool): whether export model to onnx + Returns: + bbox_pred (Tensor): The output prediction with shape [N, 6], including + labels, scores and bboxes. The size of bboxes are corresponding + to the input image, the bboxes may be used in other branch. + bbox_num (Tensor): The number of prediction boxes of each batch with + shape [1], and is N. + """ + if self.nms is not None: + bboxes, score = self.decode(head_out, rois, im_shape, scale_factor) + bbox_pred, bbox_num, before_nms_indexes = self.nms(bboxes, score, self.num_classes) + + else: + bbox_pred, bbox_num = self.decode(head_out, rois, im_shape, + scale_factor) + + if self.export_onnx: + # add fake box after postprocess when exporting onnx + fake_bboxes = paddle.to_tensor( + np.array( + [[0., 0.0, 0.0, 0.0, 1.0, 1.0]], dtype='float32')) + + bbox_pred = paddle.concat([bbox_pred, fake_bboxes]) + bbox_num = bbox_num + 1 + + if self.nms is not None: + return bbox_pred, bbox_num, before_nms_indexes + else: + return bbox_pred, bbox_num + + def get_pred(self, bboxes, bbox_num, im_shape, scale_factor): + """ + Rescale, clip and filter the bbox from the output of NMS to + get final prediction. + + Notes: + Currently only support bs = 1. + + Args: + bboxes (Tensor): The output bboxes with shape [N, 6] after decode + and NMS, including labels, scores and bboxes. + bbox_num (Tensor): The number of prediction boxes of each batch with + shape [1], and is N. + im_shape (Tensor): The shape of the input image. 
+ scale_factor (Tensor): The scale factor of the input image. + Returns: + pred_result (Tensor): The final prediction results with shape [N, 6] + including labels, scores and bboxes. + """ + if self.export_eb: + # enable rcnn models for edgeboard hw to skip the following postprocess. + return bboxes, bboxes, bbox_num + + if not self.export_onnx: + bboxes_list = [] + bbox_num_list = [] + id_start = 0 + fake_bboxes = paddle.to_tensor( + np.array( + [[0., 0.0, 0.0, 0.0, 1.0, 1.0]], dtype='float32')) + fake_bbox_num = paddle.to_tensor(np.array([1], dtype='int32')) + + # add fake bbox when output is empty for each batch + for i in range(bbox_num.shape[0]): + if bbox_num[i] == 0: + bboxes_i = fake_bboxes + bbox_num_i = fake_bbox_num + else: + bboxes_i = bboxes[id_start:id_start + bbox_num[i], :] + bbox_num_i = bbox_num[i] + id_start += bbox_num[i] + bboxes_list.append(bboxes_i) + bbox_num_list.append(bbox_num_i) + bboxes = paddle.concat(bboxes_list) + bbox_num = paddle.concat(bbox_num_list) + + origin_shape = paddle.floor(im_shape / scale_factor + 0.5) + + if not self.export_onnx: + origin_shape_list = [] + scale_factor_list = [] + # scale_factor: scale_y, scale_x + for i in range(bbox_num.shape[0]): + expand_shape = paddle.expand(origin_shape[i:i + 1, :], + [bbox_num[i], 2]) + scale_y, scale_x = scale_factor[i][0], scale_factor[i][1] + scale = paddle.concat([scale_x, scale_y, scale_x, scale_y]) + expand_scale = paddle.expand(scale, [bbox_num[i], 4]) + origin_shape_list.append(expand_shape) + scale_factor_list.append(expand_scale) + + self.origin_shape_list = paddle.concat(origin_shape_list) + scale_factor_list = paddle.concat(scale_factor_list) + + else: + # simplify the computation for bs=1 when exporting onnx + scale_y, scale_x = scale_factor[0][0], scale_factor[0][1] + scale = paddle.concat( + [scale_x, scale_y, scale_x, scale_y]).unsqueeze(0) + self.origin_shape_list = paddle.expand(origin_shape, + [bbox_num[0], 2]) + scale_factor_list = paddle.expand(scale, [bbox_num[0], 4]) + + # bboxes: [N, 6], label, score, bbox + pred_label = bboxes[:, 0:1] + pred_score = bboxes[:, 1:2] + pred_bbox = bboxes[:, 2:] + # rescale bbox to original image + scaled_bbox = pred_bbox / scale_factor_list + origin_h = self.origin_shape_list[:, 0] + origin_w = self.origin_shape_list[:, 1] + zeros = paddle.zeros_like(origin_h) + # clip bbox to [0, original_size] + x1 = paddle.maximum(paddle.minimum(scaled_bbox[:, 0], origin_w), zeros) + y1 = paddle.maximum(paddle.minimum(scaled_bbox[:, 1], origin_h), zeros) + x2 = paddle.maximum(paddle.minimum(scaled_bbox[:, 2], origin_w), zeros) + y2 = paddle.maximum(paddle.minimum(scaled_bbox[:, 3], origin_h), zeros) + pred_bbox = paddle.stack([x1, y1, x2, y2], axis=-1) + # filter empty bbox + keep_mask = nonempty_bbox(pred_bbox, return_mask=True) + keep_mask = paddle.unsqueeze(keep_mask, [1]) + pred_label = paddle.where(keep_mask, pred_label, + paddle.ones_like(pred_label) * -1) + pred_result = paddle.concat([pred_label, pred_score, pred_bbox], axis=1) + return bboxes, pred_result, bbox_num + + def get_origin_shape(self, ): + return self.origin_shape_list + + +@register +class MaskPostProcess(object): + __shared__ = ['export_onnx', 'assign_on_cpu'] + """ + refer to: + https://github.com/facebookresearch/detectron2/layers/mask_ops.py + + Get Mask output according to the output from model + """ + + def __init__(self, + binary_thresh=0.5, + export_onnx=False, + assign_on_cpu=False): + super(MaskPostProcess, self).__init__() + self.binary_thresh = binary_thresh + self.export_onnx = 
export_onnx + self.assign_on_cpu = assign_on_cpu + + def __call__(self, mask_out, bboxes, bbox_num, origin_shape): + """ + Decode the mask_out and paste the mask to the origin image. + + Args: + mask_out (Tensor): mask_head output with shape [N, 28, 28]. + bbox_pred (Tensor): The output bboxes with shape [N, 6] after decode + and NMS, including labels, scores and bboxes. + bbox_num (Tensor): The number of prediction boxes of each batch with + shape [1], and is N. + origin_shape (Tensor): The origin shape of the input image, the tensor + shape is [N, 2], and each row is [h, w]. + Returns: + pred_result (Tensor): The final prediction mask results with shape + [N, h, w] in binary mask style. + """ + num_mask = mask_out.shape[0] + origin_shape = paddle.cast(origin_shape, 'int32') + device = paddle.device.get_device() + + if self.export_onnx: + h, w = origin_shape[0][0], origin_shape[0][1] + mask_onnx = paste_mask(mask_out[:, None, :, :], bboxes[:, 2:], h, w, + self.assign_on_cpu) + mask_onnx = mask_onnx >= self.binary_thresh + pred_result = paddle.cast(mask_onnx, 'int32') + + else: + max_h = paddle.max(origin_shape[:, 0]) + max_w = paddle.max(origin_shape[:, 1]) + pred_result = paddle.zeros( + [num_mask, max_h, max_w], dtype='int32') - 1 + + id_start = 0 + for i in range(paddle.shape(bbox_num)[0]): + bboxes_i = bboxes[id_start:id_start + bbox_num[i], :] + mask_out_i = mask_out[id_start:id_start + bbox_num[i], :, :] + im_h = origin_shape[i, 0] + im_w = origin_shape[i, 1] + pred_mask = paste_mask(mask_out_i[:, None, :, :], + bboxes_i[:, 2:], im_h, im_w, + self.assign_on_cpu) + pred_mask = paddle.cast(pred_mask >= self.binary_thresh, + 'int32') + pred_result[id_start:id_start + bbox_num[i], :im_h, : + im_w] = pred_mask + id_start += bbox_num[i] + if self.assign_on_cpu: + paddle.set_device(device) + + return pred_result + + +@register +class JDEBBoxPostProcess(nn.Layer): + __shared__ = ['num_classes'] + __inject__ = ['decode', 'nms'] + + def __init__(self, num_classes=1, decode=None, nms=None, return_idx=True): + super(JDEBBoxPostProcess, self).__init__() + self.num_classes = num_classes + self.decode = decode + self.nms = nms + self.return_idx = return_idx + + self.fake_bbox_pred = paddle.to_tensor( + np.array( + [[-1, 0.0, 0.0, 0.0, 0.0, 0.0]], dtype='float32')) + self.fake_bbox_num = paddle.to_tensor(np.array([1], dtype='int32')) + self.fake_nms_keep_idx = paddle.to_tensor( + np.array( + [[0]], dtype='int32')) + + self.fake_yolo_boxes_out = paddle.to_tensor( + np.array( + [[[0.0, 0.0, 0.0, 0.0]]], dtype='float32')) + self.fake_yolo_scores_out = paddle.to_tensor( + np.array( + [[[0.0]]], dtype='float32')) + self.fake_boxes_idx = paddle.to_tensor(np.array([[0]], dtype='int64')) + + def forward(self, head_out, anchors): + """ + Decode the bbox and do NMS for JDE model. + + Args: + head_out (list): Bbox_pred and cls_prob of bbox_head output. + anchors (list): Anchors of JDE model. + + Returns: + boxes_idx (Tensor): The index of kept bboxes after decode 'JDEBox'. + bbox_pred (Tensor): The output is the prediction with shape [N, 6] + including labels, scores and bboxes. + bbox_num (Tensor): The number of prediction of each batch with shape [N]. + nms_keep_idx (Tensor): The index of kept bboxes after NMS. 
+ """ + boxes_idx, yolo_boxes_scores = self.decode(head_out, anchors) + + if len(boxes_idx) == 0: + boxes_idx = self.fake_boxes_idx + yolo_boxes_out = self.fake_yolo_boxes_out + yolo_scores_out = self.fake_yolo_scores_out + else: + yolo_boxes = paddle.gather_nd(yolo_boxes_scores, boxes_idx) + # TODO: only support bs=1 now + yolo_boxes_out = paddle.reshape( + yolo_boxes[:, :4], shape=[1, len(boxes_idx), 4]) + yolo_scores_out = paddle.reshape( + yolo_boxes[:, 4:5], shape=[1, 1, len(boxes_idx)]) + boxes_idx = boxes_idx[:, 1:] + + if self.return_idx: + bbox_pred, bbox_num, nms_keep_idx = self.nms( + yolo_boxes_out, yolo_scores_out, self.num_classes) + if bbox_pred.shape[0] == 0: + bbox_pred = self.fake_bbox_pred + bbox_num = self.fake_bbox_num + nms_keep_idx = self.fake_nms_keep_idx + return boxes_idx, bbox_pred, bbox_num, nms_keep_idx + else: + bbox_pred, bbox_num, _ = self.nms(yolo_boxes_out, yolo_scores_out, + self.num_classes) + if bbox_pred.shape[0] == 0: + bbox_pred = self.fake_bbox_pred + bbox_num = self.fake_bbox_num + return _, bbox_pred, bbox_num, _ + + +@register +class CenterNetPostProcess(object): + """ + Postprocess the model outputs to get final prediction: + 1. Do NMS for heatmap to get top `max_per_img` bboxes. + 2. Decode bboxes using center offset and box size. + 3. Rescale decoded bboxes reference to the origin image shape. + Args: + max_per_img(int): the maximum number of predicted objects in a image, + 500 by default. + down_ratio(int): the down ratio from images to heatmap, 4 by default. + regress_ltrb (bool): whether to regress left/top/right/bottom or + width/height for a box, true by default. + """ + __shared__ = ['down_ratio'] + + def __init__(self, max_per_img=500, down_ratio=4, regress_ltrb=True): + super(CenterNetPostProcess, self).__init__() + self.max_per_img = max_per_img + self.down_ratio = down_ratio + self.regress_ltrb = regress_ltrb + # _simple_nms() _topk() are same as TTFBox in ppdet/modeling/layers.py + + def _simple_nms(self, heat, kernel=3): + """ Use maxpool to filter the max score, get local peaks. """ + pad = (kernel - 1) // 2 + hmax = F.max_pool2d(heat, kernel, stride=1, padding=pad) + keep = paddle.cast(hmax == heat, 'float32') + return heat * keep + + def _topk(self, scores): + """ Select top k scores and decode to get xy coordinates. 
""" + k = self.max_per_img + shape_fm = paddle.shape(scores) + shape_fm.stop_gradient = True + cat, height, width = shape_fm[1], shape_fm[2], shape_fm[3] + # batch size is 1 + scores_r = paddle.reshape(scores, [cat, -1]) + topk_scores, topk_inds = paddle.topk(scores_r, k) + topk_ys = topk_inds // width + topk_xs = topk_inds % width + + topk_score_r = paddle.reshape(topk_scores, [-1]) + topk_score, topk_ind = paddle.topk(topk_score_r, k) + k_t = paddle.full(paddle.shape(topk_ind), k, dtype='int64') + topk_clses = paddle.cast(paddle.floor_divide(topk_ind, k_t), 'float32') + + topk_inds = paddle.reshape(topk_inds, [-1]) + topk_ys = paddle.reshape(topk_ys, [-1, 1]) + topk_xs = paddle.reshape(topk_xs, [-1, 1]) + topk_inds = paddle.gather(topk_inds, topk_ind) + topk_ys = paddle.gather(topk_ys, topk_ind) + topk_xs = paddle.gather(topk_xs, topk_ind) + return topk_score, topk_inds, topk_clses, topk_ys, topk_xs + + def __call__(self, hm, wh, reg, im_shape, scale_factor): + # 1.get clses and scores, note that hm had been done sigmoid + heat = self._simple_nms(hm) + scores, inds, topk_clses, ys, xs = self._topk(heat) + clses = topk_clses.unsqueeze(1) + scores = scores.unsqueeze(1) + + # 2.get bboxes, note only support batch_size=1 now + reg_t = paddle.transpose(reg, [0, 2, 3, 1]) + reg = paddle.reshape(reg_t, [-1, reg_t.shape[-1]]) + reg = paddle.gather(reg, inds) + xs = paddle.cast(xs, 'float32') + ys = paddle.cast(ys, 'float32') + xs = xs + reg[:, 0:1] + ys = ys + reg[:, 1:2] + wh_t = paddle.transpose(wh, [0, 2, 3, 1]) + wh = paddle.reshape(wh_t, [-1, wh_t.shape[-1]]) + wh = paddle.gather(wh, inds) + if self.regress_ltrb: + x1 = xs - wh[:, 0:1] + y1 = ys - wh[:, 1:2] + x2 = xs + wh[:, 2:3] + y2 = ys + wh[:, 3:4] + else: + x1 = xs - wh[:, 0:1] / 2 + y1 = ys - wh[:, 1:2] / 2 + x2 = xs + wh[:, 0:1] / 2 + y2 = ys + wh[:, 1:2] / 2 + n, c, feat_h, feat_w = paddle.shape(hm) + padw = (feat_w * self.down_ratio - im_shape[0, 1]) / 2 + padh = (feat_h * self.down_ratio - im_shape[0, 0]) / 2 + x1 = x1 * self.down_ratio + y1 = y1 * self.down_ratio + x2 = x2 * self.down_ratio + y2 = y2 * self.down_ratio + x1 = x1 - padw + y1 = y1 - padh + x2 = x2 - padw + y2 = y2 - padh + bboxes = paddle.concat([x1, y1, x2, y2], axis=1) + scale_y = scale_factor[:, 0:1] + scale_x = scale_factor[:, 1:2] + scale_expand = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], axis=1) + boxes_shape = bboxes.shape[:] + scale_expand = paddle.expand(scale_expand, shape=boxes_shape) + bboxes = paddle.divide(bboxes, scale_expand) + + results = paddle.concat([clses, scores, bboxes], axis=1) + return results, paddle.shape(results)[0:1], inds, topk_clses, ys, xs + + +@register +class DETRBBoxPostProcess(object): + __shared__ = ['num_classes', 'use_focal_loss'] + __inject__ = [] + + def __init__(self, + num_classes=80, + num_top_queries=100, + use_focal_loss=False): + super(DETRBBoxPostProcess, self).__init__() + self.num_classes = num_classes + self.num_top_queries = num_top_queries + self.use_focal_loss = use_focal_loss + + def __call__(self, head_out, im_shape, scale_factor): + """ + Decode the bbox. + + Args: + head_out (tuple): bbox_pred, cls_logit and masks of bbox_head output. + im_shape (Tensor): The shape of the input image. + scale_factor (Tensor): The scale factor of the input image. + Returns: + bbox_pred (Tensor): The output prediction with shape [N, 6], including + labels, scores and bboxes. The size of bboxes are corresponding + to the input image, the bboxes may be used in other branch. 
+ bbox_num (Tensor): The number of prediction boxes of each batch with + shape [bs], and is N. + """ + bboxes, logits, masks = head_out + + bbox_pred = bbox_cxcywh_to_xyxy(bboxes) + origin_shape = paddle.floor(im_shape / scale_factor + 0.5) + img_h, img_w = paddle.split(origin_shape, 2, axis=-1) + origin_shape = paddle.concat( + [img_w, img_h, img_w, img_h], axis=-1).reshape([-1, 1, 4]) + bbox_pred *= origin_shape + + scores = F.sigmoid(logits) if self.use_focal_loss else F.softmax( + logits)[:, :, :-1] + + if not self.use_focal_loss: + scores, labels = scores.max(-1), scores.argmax(-1) + if scores.shape[1] > self.num_top_queries: + scores, index = paddle.topk( + scores, self.num_top_queries, axis=-1) + batch_ind = paddle.arange( + end=scores.shape[0]).unsqueeze(-1).tile( + [1, self.num_top_queries]) + index = paddle.stack([batch_ind, index], axis=-1) + labels = paddle.gather_nd(labels, index) + bbox_pred = paddle.gather_nd(bbox_pred, index) + else: + scores, index = paddle.topk( + scores.flatten(1), self.num_top_queries, axis=-1) + labels = index % self.num_classes + index = index // self.num_classes + batch_ind = paddle.arange(end=scores.shape[0]).unsqueeze(-1).tile( + [1, self.num_top_queries]) + index = paddle.stack([batch_ind, index], axis=-1) + bbox_pred = paddle.gather_nd(bbox_pred, index) + + bbox_pred = paddle.concat( + [ + labels.unsqueeze(-1).astype('float32'), scores.unsqueeze(-1), + bbox_pred + ], + axis=-1) + bbox_num = paddle.to_tensor( + bbox_pred.shape[1], dtype='int32').tile([bbox_pred.shape[0]]) + bbox_pred = bbox_pred.reshape([-1, 6]) + return bbox_pred, bbox_num + + +@register +class SparsePostProcess(object): + __shared__ = ['num_classes', 'assign_on_cpu'] + + def __init__(self, + num_proposals, + num_classes=80, + binary_thresh=0.5, + assign_on_cpu=False): + super(SparsePostProcess, self).__init__() + self.num_classes = num_classes + self.num_proposals = num_proposals + self.binary_thresh = binary_thresh + self.assign_on_cpu = assign_on_cpu + + def __call__(self, scores, bboxes, scale_factor, ori_shape, masks=None): + assert len(scores) == len(bboxes) == \ + len(ori_shape) == len(scale_factor) + device = paddle.device.get_device() + batch_size = len(ori_shape) + + scores = F.sigmoid(scores) + has_mask = masks is not None + if has_mask: + masks = F.sigmoid(masks) + masks = masks.reshape([batch_size, -1, *masks.shape[1:]]) + + bbox_pred = [] + mask_pred = [] if has_mask else None + bbox_num = paddle.zeros([batch_size], dtype='int32') + for i in range(batch_size): + score = scores[i] + bbox = bboxes[i] + score, indices = score.flatten(0, 1).topk( + self.num_proposals, sorted=False) + label = indices % self.num_classes + if has_mask: + mask = masks[i] + mask = mask.flatten(0, 1)[indices] + + H, W = ori_shape[i][0], ori_shape[i][1] + bbox = bbox[paddle.cast(indices / self.num_classes, indices.dtype)] + bbox /= scale_factor[i] + bbox[:, 0::2] = paddle.clip(bbox[:, 0::2], 0, W) + bbox[:, 1::2] = paddle.clip(bbox[:, 1::2], 0, H) + + keep = ((bbox[:, 2] - bbox[:, 0]).numpy() > 1.) & \ + ((bbox[:, 3] - bbox[:, 1]).numpy() > 1.) 
+ if keep.sum() == 0: + bbox = paddle.zeros([1, 6], dtype='float32') + if has_mask: + mask = paddle.zeros([1, H, W], dtype='uint8') + else: + label = paddle.to_tensor(label.numpy()[keep]).astype( + 'float32').unsqueeze(-1) + score = paddle.to_tensor(score.numpy()[keep]).astype( + 'float32').unsqueeze(-1) + bbox = paddle.to_tensor(bbox.numpy()[keep]).astype('float32') + if has_mask: + mask = paddle.to_tensor(mask.numpy()[keep]).astype( + 'float32').unsqueeze(1) + mask = paste_mask(mask, bbox, H, W, self.assign_on_cpu) + mask = paddle.cast(mask >= self.binary_thresh, 'uint8') + bbox = paddle.concat([label, score, bbox], axis=-1) + + bbox_num[i] = bbox.shape[0] + bbox_pred.append(bbox) + if has_mask: + mask_pred.append(mask) + + bbox_pred = paddle.concat(bbox_pred) + mask_pred = paddle.concat(mask_pred) if has_mask else None + + if self.assign_on_cpu: + paddle.set_device(device) + + if has_mask: + return bbox_pred, bbox_num, mask_pred + else: + return bbox_pred, bbox_num + + +def paste_mask(masks, boxes, im_h, im_w, assign_on_cpu=False): + """ + Paste the mask prediction to the original image. + """ + x0_int, y0_int = 0, 0 + x1_int, y1_int = im_w, im_h + x0, y0, x1, y1 = paddle.split(boxes, 4, axis=1) + N = masks.shape[0] + img_y = paddle.arange(y0_int, y1_int) + 0.5 + img_x = paddle.arange(x0_int, x1_int) + 0.5 + + img_y = (img_y - y0) / (y1 - y0) * 2 - 1 + img_x = (img_x - x0) / (x1 - x0) * 2 - 1 + # img_x, img_y have shapes (N, w), (N, h) + + if assign_on_cpu: + paddle.set_device('cpu') + gx = img_x[:, None, :].expand( + [N, paddle.shape(img_y)[1], paddle.shape(img_x)[1]]) + gy = img_y[:, :, None].expand( + [N, paddle.shape(img_y)[1], paddle.shape(img_x)[1]]) + grid = paddle.stack([gx, gy], axis=3) + img_masks = F.grid_sample(masks, grid, align_corners=False) + return img_masks[:, 0] + + +def multiclass_nms(bboxs, num_classes, match_threshold=0.6, match_metric='iou'): + final_boxes = [] + for c in range(num_classes): + idxs = bboxs[:, 0] == c + if np.count_nonzero(idxs) == 0: continue + r = nms(bboxs[idxs, 1:], match_threshold, match_metric) + final_boxes.append(np.concatenate([np.full((r.shape[0], 1), c), r], 1)) + return final_boxes + + +def nms(dets, match_threshold=0.6, match_metric='iou'): + """ Apply NMS to avoid detecting too many overlapping bounding boxes. + Args: + dets: shape [N, 5], [score, x1, y1, x2, y2] + match_metric: 'iou' or 'ios' + match_threshold: overlap thresh for match metric. 
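+        'ios' measures intersection over the smaller of the two box areas;
+        boxes are treated as inclusive pixel coordinates, hence the +1 when
+        computing widths, heights and areas.
+    Returns:
+        The kept rows of dets, in the same [score, x1, y1, x2, y2] layout
+        and in their original input order.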
+ """ + if dets.shape[0] == 0: + return dets[[], :] + scores = dets[:, 0] + x1 = dets[:, 1] + y1 = dets[:, 2] + x2 = dets[:, 3] + y2 = dets[:, 4] + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + order = scores.argsort()[::-1] + + ndets = dets.shape[0] + suppressed = np.zeros((ndets), dtype=np.int32) + + for _i in range(ndets): + i = order[_i] + if suppressed[i] == 1: + continue + ix1 = x1[i] + iy1 = y1[i] + ix2 = x2[i] + iy2 = y2[i] + iarea = areas[i] + for _j in range(_i + 1, ndets): + j = order[_j] + if suppressed[j] == 1: + continue + xx1 = max(ix1, x1[j]) + yy1 = max(iy1, y1[j]) + xx2 = min(ix2, x2[j]) + yy2 = min(iy2, y2[j]) + w = max(0.0, xx2 - xx1 + 1) + h = max(0.0, yy2 - yy1 + 1) + inter = w * h + if match_metric == 'iou': + union = iarea + areas[j] - inter + match_value = inter / union + elif match_metric == 'ios': + smaller = min(iarea, areas[j]) + match_value = inter / smaller + else: + raise ValueError() + if match_value >= match_threshold: + suppressed[j] = 1 + keep = np.where(suppressed == 0)[0] + dets = dets[keep, :] + return dets diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f3ad19999ee3c606e0d64c47f9e33732260b1d0b --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__init__.py @@ -0,0 +1,5 @@ +from . import rpn_head +from . import embedding_rpn_head + +from .rpn_head import * +from .embedding_rpn_head import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5c47b88afaeb1fa38d1ccd6e6327048a8c2eba18 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/anchor_generator.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/anchor_generator.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..63fc1230e9ca9b8699eab0ebc8980fb108ab7d6c Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/anchor_generator.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/embedding_rpn_head.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/embedding_rpn_head.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..37e9e259c58254732506587498271dd4c2959d7a Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/embedding_rpn_head.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/proposal_generator.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/proposal_generator.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8cfd361d86e10ba0376743d5b8ba64f5528054a6 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/proposal_generator.cpython-37.pyc differ diff --git 
a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/rpn_head.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/rpn_head.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..181f4663c19d4acabc6eff04058b9b92cae23193 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/rpn_head.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/target.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/target.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d0e91cde999e2c578b4645242a0b39c2e2ada0a4 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/target.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/target_layer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/target_layer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..686f5385ffc35ed8aeacc28d2979bff1829013d6 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/__pycache__/target_layer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/anchor_generator.py b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/anchor_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..9a8e24ea3637110f45734e44b8b8e29b947dad52 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/anchor_generator.py @@ -0,0 +1,266 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The code is based on +# https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/anchor_generator.py + +import math + +import paddle +import paddle.nn as nn +import numpy as np + +from ppdet.core.workspace import register + +__all__ = ['AnchorGenerator', 'RetinaAnchorGenerator', 'S2ANetAnchorGenerator'] + + +@register +class AnchorGenerator(nn.Layer): + """ + Generate anchors according to the feature maps + + Args: + anchor_sizes (list[float] | list[list[float]]): The anchor sizes at + each feature point. list[float] means all feature levels share the + same sizes. list[list[float]] means the anchor sizes for + each level. The sizes stand for the scale of input size. + aspect_ratios (list[float] | list[list[float]]): The aspect ratios at + each feature point. list[float] means all feature levels share the + same ratios. list[list[float]] means the aspect ratios for + each level. + strides (list[float]): The strides of feature maps which generate + anchors + offset (float): The offset of the coordinate of anchors, default 0. 
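+        variance (list[float]): Variances of the anchors, one value per box
+            coordinate; conventionally consumed by the box coder when
+            decoding the predicted deltas. default [1.0, 1.0, 1.0, 1.0]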
+ + """ + + def __init__(self, + anchor_sizes=[32, 64, 128, 256, 512], + aspect_ratios=[0.5, 1.0, 2.0], + strides=[16.0], + variance=[1.0, 1.0, 1.0, 1.0], + offset=0.): + super(AnchorGenerator, self).__init__() + self.anchor_sizes = anchor_sizes + self.aspect_ratios = aspect_ratios + self.strides = strides + self.variance = variance + self.cell_anchors = self._calculate_anchors(len(strides)) + self.offset = offset + + def _broadcast_params(self, params, num_features): + if not isinstance(params[0], (list, tuple)): # list[float] + return [params] * num_features + if len(params) == 1: + return list(params) * num_features + return params + + def generate_cell_anchors(self, sizes, aspect_ratios): + anchors = [] + for size in sizes: + area = size**2.0 + for aspect_ratio in aspect_ratios: + w = math.sqrt(area / aspect_ratio) + h = aspect_ratio * w + x0, y0, x1, y1 = -w / 2.0, -h / 2.0, w / 2.0, h / 2.0 + anchors.append([x0, y0, x1, y1]) + return paddle.to_tensor(anchors, dtype='float32') + + def _calculate_anchors(self, num_features): + sizes = self._broadcast_params(self.anchor_sizes, num_features) + aspect_ratios = self._broadcast_params(self.aspect_ratios, num_features) + cell_anchors = [ + self.generate_cell_anchors(s, a) + for s, a in zip(sizes, aspect_ratios) + ] + [ + self.register_buffer( + t.name, t, persistable=False) for t in cell_anchors + ] + return cell_anchors + + def _create_grid_offsets(self, size, stride, offset): + grid_height, grid_width = size[0], size[1] + shifts_x = paddle.arange( + offset * stride, grid_width * stride, step=stride, dtype='float32') + shifts_y = paddle.arange( + offset * stride, grid_height * stride, step=stride, dtype='float32') + shift_y, shift_x = paddle.meshgrid(shifts_y, shifts_x) + shift_x = paddle.reshape(shift_x, [-1]) + shift_y = paddle.reshape(shift_y, [-1]) + return shift_x, shift_y + + def _grid_anchors(self, grid_sizes): + anchors = [] + for size, stride, base_anchors in zip(grid_sizes, self.strides, + self.cell_anchors): + shift_x, shift_y = self._create_grid_offsets(size, stride, + self.offset) + shifts = paddle.stack((shift_x, shift_y, shift_x, shift_y), axis=1) + shifts = paddle.reshape(shifts, [-1, 1, 4]) + base_anchors = paddle.reshape(base_anchors, [1, -1, 4]) + + anchors.append(paddle.reshape(shifts + base_anchors, [-1, 4])) + + return anchors + + def forward(self, input): + grid_sizes = [paddle.shape(feature_map)[-2:] for feature_map in input] + anchors_over_all_feature_maps = self._grid_anchors(grid_sizes) + return anchors_over_all_feature_maps + + @property + def num_anchors(self): + """ + Returns: + int: number of anchors at every pixel + location, on that feature map. + For example, if at every pixel we use anchors of 3 aspect + ratios and 5 sizes, the number of anchors is 15. + For FPN models, `num_anchors` on every feature map is the same. 
+ """ + return len(self.cell_anchors[0]) + + +@register +class RetinaAnchorGenerator(AnchorGenerator): + def __init__(self, + octave_base_scale=4, + scales_per_octave=3, + aspect_ratios=[0.5, 1.0, 2.0], + strides=[8.0, 16.0, 32.0, 64.0, 128.0], + variance=[1.0, 1.0, 1.0, 1.0], + offset=0.0): + anchor_sizes = [] + for s in strides: + anchor_sizes.append([ + s * octave_base_scale * 2**(i/scales_per_octave) \ + for i in range(scales_per_octave)]) + super(RetinaAnchorGenerator, self).__init__( + anchor_sizes=anchor_sizes, + aspect_ratios=aspect_ratios, + strides=strides, + variance=variance, + offset=offset) + + +@register +class S2ANetAnchorGenerator(nn.Layer): + """ + AnchorGenerator by paddle + """ + + def __init__(self, base_size, scales, ratios, scale_major=True, ctr=None): + super(S2ANetAnchorGenerator, self).__init__() + self.base_size = base_size + self.scales = paddle.to_tensor(scales) + self.ratios = paddle.to_tensor(ratios) + self.scale_major = scale_major + self.ctr = ctr + self.base_anchors = self.gen_base_anchors() + + @property + def num_base_anchors(self): + return self.base_anchors.shape[0] + + def gen_base_anchors(self): + w = self.base_size + h = self.base_size + if self.ctr is None: + x_ctr = 0.5 * (w - 1) + y_ctr = 0.5 * (h - 1) + else: + x_ctr, y_ctr = self.ctr + + h_ratios = paddle.sqrt(self.ratios) + w_ratios = 1 / h_ratios + if self.scale_major: + ws = (w * w_ratios[:] * self.scales[:]).reshape([-1]) + hs = (h * h_ratios[:] * self.scales[:]).reshape([-1]) + else: + ws = (w * self.scales[:] * w_ratios[:]).reshape([-1]) + hs = (h * self.scales[:] * h_ratios[:]).reshape([-1]) + + base_anchors = paddle.stack( + [ + x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1), + x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1) + ], + axis=-1) + base_anchors = paddle.round(base_anchors) + return base_anchors + + def _meshgrid(self, x, y, row_major=True): + yy, xx = paddle.meshgrid(y, x) + yy = yy.reshape([-1]) + xx = xx.reshape([-1]) + if row_major: + return xx, yy + else: + return yy, xx + + def forward(self, featmap_size, stride=16): + # featmap_size*stride project it to original area + + feat_h = featmap_size[0] + feat_w = featmap_size[1] + shift_x = paddle.arange(0, feat_w, 1, 'int32') * stride + shift_y = paddle.arange(0, feat_h, 1, 'int32') * stride + shift_xx, shift_yy = self._meshgrid(shift_x, shift_y) + shifts = paddle.stack([shift_xx, shift_yy, shift_xx, shift_yy], axis=-1) + + all_anchors = self.base_anchors[:, :] + shifts[:, :] + all_anchors = all_anchors.cast(paddle.float32).reshape( + [feat_h * feat_w, 4]) + all_anchors = self.rect2rbox(all_anchors) + return all_anchors + + def valid_flags(self, featmap_size, valid_size): + feat_h, feat_w = featmap_size + valid_h, valid_w = valid_size + assert valid_h <= feat_h and valid_w <= feat_w + valid_x = paddle.zeros([feat_w], dtype='int32') + valid_y = paddle.zeros([feat_h], dtype='int32') + valid_x[:valid_w] = 1 + valid_y[:valid_h] = 1 + valid_xx, valid_yy = self._meshgrid(valid_x, valid_y) + valid = valid_xx & valid_yy + valid = paddle.reshape(valid, [-1, 1]) + valid = paddle.expand(valid, [-1, self.num_base_anchors]).reshape([-1]) + return valid + + def rect2rbox(self, bboxes): + """ + :param bboxes: shape (L, 4) (xmin, ymin, xmax, ymax) + :return: dbboxes: shape (L, 5) (x_ctr, y_ctr, w, h, angle) + """ + x1, y1, x2, y2 = paddle.split(bboxes, 4, axis=-1) + + x_ctr = (x1 + x2) / 2.0 + y_ctr = (y1 + y2) / 2.0 + edges1 = paddle.abs(x2 - x1) + edges2 = paddle.abs(y2 - y1) + + rbox_w = paddle.maximum(edges1, edges2) + rbox_h = 
paddle.minimum(edges1, edges2) + + # set angle + inds = edges1 < edges2 + inds = paddle.cast(inds, paddle.float32) + rboxes_angle = inds * np.pi / 2.0 + + rboxes = paddle.concat( + (x_ctr, y_ctr, rbox_w, rbox_h, rboxes_angle), axis=-1) + return rboxes diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/embedding_rpn_head.py b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/embedding_rpn_head.py new file mode 100644 index 0000000000000000000000000000000000000000..29174984b1d998a1a8a1cb6d872f0f9d89d89408 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/embedding_rpn_head.py @@ -0,0 +1,63 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This code is referenced from: https://github.com/open-mmlab/mmdetection + +import paddle +from paddle import nn + +from ppdet.core.workspace import register + +__all__ = ['EmbeddingRPNHead'] + + +@register +class EmbeddingRPNHead(nn.Layer): + __shared__ = ['proposal_embedding_dim'] + + def __init__(self, num_proposals, proposal_embedding_dim=256): + super(EmbeddingRPNHead, self).__init__() + + self.num_proposals = num_proposals + self.proposal_embedding_dim = proposal_embedding_dim + + self._init_layers() + self._init_weights() + + def _init_layers(self): + self.init_proposal_bboxes = nn.Embedding(self.num_proposals, 4) + self.init_proposal_features = nn.Embedding(self.num_proposals, + self.proposal_embedding_dim) + + def _init_weights(self): + init_bboxes = paddle.empty_like(self.init_proposal_bboxes.weight) + init_bboxes[:, :2] = 0.5 + init_bboxes[:, 2:] = 1.0 + self.init_proposal_bboxes.weight.set_value(init_bboxes) + + @staticmethod + def bbox_cxcywh_to_xyxy(x): + cxcy, wh = paddle.split(x, 2, axis=-1) + return paddle.concat([cxcy - 0.5 * wh, cxcy + 0.5 * wh], axis=-1) + + def forward(self, img_whwh): + proposal_bboxes = self.init_proposal_bboxes.weight.clone() + proposal_bboxes = self.bbox_cxcywh_to_xyxy(proposal_bboxes) + proposal_bboxes = proposal_bboxes.unsqueeze(0) * img_whwh.unsqueeze(1) + + proposal_features = self.init_proposal_features.weight.clone() + proposal_features = proposal_features.unsqueeze(0).tile( + [img_whwh.shape[0], 1, 1]) + + return proposal_bboxes, proposal_features diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/proposal_generator.py b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/proposal_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..b87a72ced5f0ddcd9515332a17b52d5210e9398a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/proposal_generator.py @@ -0,0 +1,83 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+
+from ppdet.core.workspace import register, serializable
+from .. import ops
+
+
+@register
+@serializable
+class ProposalGenerator(object):
+    """
+    Proposal generation module
+
+    For more details, please refer to the document of generate_proposals
+    in ppdet/modeling/ops.py
+
+    Args:
+        pre_nms_top_n (int): Number of total bboxes to be kept per
+            image before NMS. default 6000
+        post_nms_top_n (int): Number of total bboxes to be kept per
+            image after NMS. default 1000
+        nms_thresh (float): Threshold in NMS. default 0.5
+        min_size (float): Remove predicted boxes with either height or
+            width < min_size. default 0.1
+        eta (float): Used in adaptive NMS: if the adaptive `threshold > 0.5`,
+            `adaptive_threshold = adaptive_threshold * eta` in each iteration.
+            default 1.
+        topk_after_collect (bool): whether to take the top-k proposals after
+            collecting them across the whole batch. If true, the per-image
+            box filter after NMS is skipped during proposal generation.
+            default false
+    """

+    def __init__(self,
+                 pre_nms_top_n=12000,
+                 post_nms_top_n=2000,
+                 nms_thresh=.5,
+                 min_size=.1,
+                 eta=1.,
+                 topk_after_collect=False):
+        super(ProposalGenerator, self).__init__()
+        self.pre_nms_top_n = pre_nms_top_n
+        self.post_nms_top_n = post_nms_top_n
+        self.nms_thresh = nms_thresh
+        self.min_size = min_size
+        self.eta = eta
+        self.topk_after_collect = topk_after_collect
+
+    def __call__(self, scores, bbox_deltas, anchors, im_shape):
+
+        top_n = self.pre_nms_top_n if self.topk_after_collect else self.post_nms_top_n
+        variances = paddle.ones_like(anchors)
+        if hasattr(paddle.vision.ops, "generate_proposals"):
+            generate_proposals = getattr(paddle.vision.ops,
+                                         "generate_proposals")
+        else:
+            generate_proposals = ops.generate_proposals
+        rpn_rois, rpn_rois_prob, rpn_rois_num = generate_proposals(
+            scores,
+            bbox_deltas,
+            im_shape,
+            anchors,
+            variances,
+            pre_nms_top_n=self.pre_nms_top_n,
+            post_nms_top_n=top_n,
+            nms_thresh=self.nms_thresh,
+            min_size=self.min_size,
+            eta=self.eta,
+            return_rois_num=True)
+
+        return rpn_rois, rpn_rois_prob, rpn_rois_num, self.post_nms_top_n
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/rpn_head.py b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/rpn_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..8a431eeac208a052ed8de5dfb7278948cfbcf042
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/rpn_head.py
@@ -0,0 +1,313 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
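+
+# Illustrative construction sketch (names below mirror this module; the head
+# is normally built from a YAML config rather than called like this):
+#
+#     head = RPNHead(
+#         anchor_generator={'anchor_sizes': [32, 64, 128, 256, 512],
+#                           'aspect_ratios': [0.5, 1.0, 2.0],
+#                           'strides': [16.0]},
+#         train_proposal={'pre_nms_top_n': 12000, 'post_nms_top_n': 2000},
+#         test_proposal={'pre_nms_top_n': 6000, 'post_nms_top_n': 1000},
+#         in_channel=1024)
+#     # `feats` is a list of backbone feature maps; `inputs` is the batch
+#     # dict ('im_shape' is required, gt boxes only when training).
+#     rois, rois_num, loss = head(feats, inputs)  # loss is None in eval mode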
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal + +from ppdet.core.workspace import register +from .anchor_generator import AnchorGenerator +from .target_layer import RPNTargetAssign +from .proposal_generator import ProposalGenerator +from ..cls_utils import _get_class_default_kwargs + + +class RPNFeat(nn.Layer): + """ + Feature extraction in RPN head + + Args: + in_channel (int): Input channel + out_channel (int): Output channel + """ + + def __init__(self, in_channel=1024, out_channel=1024): + super(RPNFeat, self).__init__() + # rpn feat is shared with each level + self.rpn_conv = nn.Conv2D( + in_channels=in_channel, + out_channels=out_channel, + kernel_size=3, + padding=1, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0., std=0.01))) + self.rpn_conv.skip_quant = True + + def forward(self, feats): + rpn_feats = [] + for feat in feats: + rpn_feats.append(F.relu(self.rpn_conv(feat))) + return rpn_feats + + +@register +class RPNHead(nn.Layer): + """ + Region Proposal Network + + Args: + anchor_generator (dict): configure of anchor generation + rpn_target_assign (dict): configure of rpn targets assignment + train_proposal (dict): configure of proposals generation + at the stage of training + test_proposal (dict): configure of proposals generation + at the stage of prediction + in_channel (int): channel of input feature maps which can be + derived by from_config + """ + __shared__ = ['export_onnx'] + __inject__ = ['loss_rpn_bbox'] + + def __init__(self, + anchor_generator=_get_class_default_kwargs(AnchorGenerator), + rpn_target_assign=_get_class_default_kwargs(RPNTargetAssign), + train_proposal=_get_class_default_kwargs(ProposalGenerator, + 12000, 2000), + test_proposal=_get_class_default_kwargs(ProposalGenerator), + in_channel=1024, + export_onnx=False, + loss_rpn_bbox=None): + super(RPNHead, self).__init__() + self.anchor_generator = anchor_generator + self.rpn_target_assign = rpn_target_assign + self.train_proposal = train_proposal + self.test_proposal = test_proposal + self.export_onnx = export_onnx + if isinstance(anchor_generator, dict): + self.anchor_generator = AnchorGenerator(**anchor_generator) + if isinstance(rpn_target_assign, dict): + self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign) + if isinstance(train_proposal, dict): + self.train_proposal = ProposalGenerator(**train_proposal) + if isinstance(test_proposal, dict): + self.test_proposal = ProposalGenerator(**test_proposal) + self.loss_rpn_bbox = loss_rpn_bbox + + num_anchors = self.anchor_generator.num_anchors + self.rpn_feat = RPNFeat(in_channel, in_channel) + # rpn head is shared with each level + # rpn roi classification scores + self.rpn_rois_score = nn.Conv2D( + in_channels=in_channel, + out_channels=num_anchors, + kernel_size=1, + padding=0, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0., std=0.01))) + self.rpn_rois_score.skip_quant = True + + # rpn roi bbox regression deltas + self.rpn_rois_delta = nn.Conv2D( + in_channels=in_channel, + out_channels=4 * num_anchors, + kernel_size=1, + padding=0, + weight_attr=paddle.ParamAttr(initializer=Normal( + mean=0., std=0.01))) + self.rpn_rois_delta.skip_quant = True + + @classmethod + def from_config(cls, cfg, input_shape): + # FPN share same rpn head + if isinstance(input_shape, (list, tuple)): + input_shape = input_shape[0] + return {'in_channel': input_shape.channels} + + def forward(self, feats, inputs): + rpn_feats = self.rpn_feat(feats) + scores = [] + deltas = [] + + 
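+        # Run the shared 1x1 score/delta convs on every feature level,
+        # collecting one objectness map and one 4*A-channel delta map
+        # (A = anchors per location) per level.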
for rpn_feat in rpn_feats: + rrs = self.rpn_rois_score(rpn_feat) + rrd = self.rpn_rois_delta(rpn_feat) + scores.append(rrs) + deltas.append(rrd) + + anchors = self.anchor_generator(rpn_feats) + + rois, rois_num = self._gen_proposal(scores, deltas, anchors, inputs) + if self.training: + loss = self.get_loss(scores, deltas, anchors, inputs) + return rois, rois_num, loss + else: + return rois, rois_num, None + + def _gen_proposal(self, scores, bbox_deltas, anchors, inputs): + """ + scores (list[Tensor]): Multi-level scores prediction + bbox_deltas (list[Tensor]): Multi-level deltas prediction + anchors (list[Tensor]): Multi-level anchors + inputs (dict): ground truth info + """ + prop_gen = self.train_proposal if self.training else self.test_proposal + im_shape = inputs['im_shape'] + + # Collect multi-level proposals for each batch + # Get 'topk' of them as final output + + if self.export_onnx: + # bs = 1 when exporting onnx + onnx_rpn_rois_list = [] + onnx_rpn_prob_list = [] + onnx_rpn_rois_num_list = [] + + for rpn_score, rpn_delta, anchor in zip(scores, bbox_deltas, + anchors): + onnx_rpn_rois, onnx_rpn_rois_prob, onnx_rpn_rois_num, onnx_post_nms_top_n = prop_gen( + scores=rpn_score[0:1], + bbox_deltas=rpn_delta[0:1], + anchors=anchor, + im_shape=im_shape[0:1]) + onnx_rpn_rois_list.append(onnx_rpn_rois) + onnx_rpn_prob_list.append(onnx_rpn_rois_prob) + onnx_rpn_rois_num_list.append(onnx_rpn_rois_num) + + onnx_rpn_rois = paddle.concat(onnx_rpn_rois_list) + onnx_rpn_prob = paddle.concat(onnx_rpn_prob_list).flatten() + + onnx_top_n = paddle.to_tensor(onnx_post_nms_top_n).cast('int32') + onnx_num_rois = paddle.shape(onnx_rpn_prob)[0].cast('int32') + k = paddle.minimum(onnx_top_n, onnx_num_rois) + onnx_topk_prob, onnx_topk_inds = paddle.topk(onnx_rpn_prob, k) + onnx_topk_rois = paddle.gather(onnx_rpn_rois, onnx_topk_inds) + # TODO(wangguanzhong): Now bs_rois_collect in export_onnx is moved outside conditional branch + # due to problems in dy2static of paddle. Will fix it when updating paddle framework. + # bs_rois_collect = [onnx_topk_rois] + # bs_rois_num_collect = paddle.shape(onnx_topk_rois)[0] + + else: + bs_rois_collect = [] + bs_rois_num_collect = [] + + batch_size = paddle.slice(paddle.shape(im_shape), [0], [0], [1]) + + # Generate proposals for each level and each batch. + # Discard batch-computing to avoid sorting bbox cross different batches. 
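+            # i.e. proposals are generated image by image so that per-image
+            # top-k selection never mixes scores across the batch.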
+ for i in range(batch_size): + rpn_rois_list = [] + rpn_prob_list = [] + rpn_rois_num_list = [] + + for rpn_score, rpn_delta, anchor in zip(scores, bbox_deltas, + anchors): + rpn_rois, rpn_rois_prob, rpn_rois_num, post_nms_top_n = prop_gen( + scores=rpn_score[i:i + 1], + bbox_deltas=rpn_delta[i:i + 1], + anchors=anchor, + im_shape=im_shape[i:i + 1]) + rpn_rois_list.append(rpn_rois) + rpn_prob_list.append(rpn_rois_prob) + rpn_rois_num_list.append(rpn_rois_num) + + if len(scores) > 1: + rpn_rois = paddle.concat(rpn_rois_list) + rpn_prob = paddle.concat(rpn_prob_list).flatten() + + num_rois = paddle.shape(rpn_prob)[0].cast('int32') + if num_rois > post_nms_top_n: + topk_prob, topk_inds = paddle.topk(rpn_prob, + post_nms_top_n) + topk_rois = paddle.gather(rpn_rois, topk_inds) + else: + topk_rois = rpn_rois + topk_prob = rpn_prob + else: + topk_rois = rpn_rois_list[0] + topk_prob = rpn_prob_list[0].flatten() + + bs_rois_collect.append(topk_rois) + bs_rois_num_collect.append(paddle.shape(topk_rois)[0]) + + bs_rois_num_collect = paddle.concat(bs_rois_num_collect) + + if self.export_onnx: + output_rois = [onnx_topk_rois] + output_rois_num = paddle.shape(onnx_topk_rois)[0] + else: + output_rois = bs_rois_collect + output_rois_num = bs_rois_num_collect + + return output_rois, output_rois_num + + def get_loss(self, pred_scores, pred_deltas, anchors, inputs): + """ + pred_scores (list[Tensor]): Multi-level scores prediction + pred_deltas (list[Tensor]): Multi-level deltas prediction + anchors (list[Tensor]): Multi-level anchors + inputs (dict): ground truth info, including im, gt_bbox, gt_score + """ + anchors = [paddle.reshape(a, shape=(-1, 4)) for a in anchors] + anchors = paddle.concat(anchors) + + scores = [ + paddle.reshape( + paddle.transpose( + v, perm=[0, 2, 3, 1]), + shape=(v.shape[0], -1, 1)) for v in pred_scores + ] + scores = paddle.concat(scores, axis=1) + + deltas = [ + paddle.reshape( + paddle.transpose( + v, perm=[0, 2, 3, 1]), + shape=(v.shape[0], -1, 4)) for v in pred_deltas + ] + deltas = paddle.concat(deltas, axis=1) + + score_tgt, bbox_tgt, loc_tgt, norm = self.rpn_target_assign(inputs, + anchors) + + scores = paddle.reshape(x=scores, shape=(-1, )) + deltas = paddle.reshape(x=deltas, shape=(-1, 4)) + + score_tgt = paddle.concat(score_tgt) + score_tgt.stop_gradient = True + + pos_mask = score_tgt == 1 + pos_ind = paddle.nonzero(pos_mask) + + valid_mask = score_tgt >= 0 + valid_ind = paddle.nonzero(valid_mask) + + # cls loss + if valid_ind.shape[0] == 0: + loss_rpn_cls = paddle.zeros([1], dtype='float32') + else: + score_pred = paddle.gather(scores, valid_ind) + score_label = paddle.gather(score_tgt, valid_ind).cast('float32') + score_label.stop_gradient = True + loss_rpn_cls = F.binary_cross_entropy_with_logits( + logit=score_pred, label=score_label, reduction="sum") + + # reg loss + if pos_ind.shape[0] == 0: + loss_rpn_reg = paddle.zeros([1], dtype='float32') + else: + loc_pred = paddle.gather(deltas, pos_ind) + loc_tgt = paddle.concat(loc_tgt) + loc_tgt = paddle.gather(loc_tgt, pos_ind) + loc_tgt.stop_gradient = True + + if self.loss_rpn_bbox is None: + loss_rpn_reg = paddle.abs(loc_pred - loc_tgt).sum() + else: + loss_rpn_reg = self.loss_rpn_bbox(loc_pred, loc_tgt).sum() + + return { + 'loss_rpn_cls': loss_rpn_cls / norm, + 'loss_rpn_reg': loss_rpn_reg / norm + } diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/target.py b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/target.py new file mode 100644 index 
0000000000000000000000000000000000000000..f95f906a27fa93eb6543f522a579f0242c919e52 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/target.py @@ -0,0 +1,678 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import paddle +from ..bbox_utils import bbox2delta, bbox_overlaps + + +def rpn_anchor_target(anchors, + gt_boxes, + rpn_batch_size_per_im, + rpn_positive_overlap, + rpn_negative_overlap, + rpn_fg_fraction, + use_random=True, + batch_size=1, + ignore_thresh=-1, + is_crowd=None, + weights=[1., 1., 1., 1.], + assign_on_cpu=False): + tgt_labels = [] + tgt_bboxes = [] + tgt_deltas = [] + for i in range(batch_size): + gt_bbox = gt_boxes[i] + is_crowd_i = is_crowd[i] if is_crowd else None + # Step1: match anchor and gt_bbox + matches, match_labels = label_box( + anchors, gt_bbox, rpn_positive_overlap, rpn_negative_overlap, True, + ignore_thresh, is_crowd_i, assign_on_cpu) + # Step2: sample anchor + fg_inds, bg_inds = subsample_labels(match_labels, rpn_batch_size_per_im, + rpn_fg_fraction, 0, use_random) + # Fill with the ignore label (-1), then set positive and negative labels + labels = paddle.full(match_labels.shape, -1, dtype='int32') + if bg_inds.shape[0] > 0: + labels = paddle.scatter(labels, bg_inds, paddle.zeros_like(bg_inds)) + if fg_inds.shape[0] > 0: + labels = paddle.scatter(labels, fg_inds, paddle.ones_like(fg_inds)) + # Step3: make output + if gt_bbox.shape[0] == 0: + matched_gt_boxes = paddle.zeros([matches.shape[0], 4]) + tgt_delta = paddle.zeros([matches.shape[0], 4]) + else: + matched_gt_boxes = paddle.gather(gt_bbox, matches) + tgt_delta = bbox2delta(anchors, matched_gt_boxes, weights) + matched_gt_boxes.stop_gradient = True + tgt_delta.stop_gradient = True + labels.stop_gradient = True + tgt_labels.append(labels) + tgt_bboxes.append(matched_gt_boxes) + tgt_deltas.append(tgt_delta) + + return tgt_labels, tgt_bboxes, tgt_deltas + + +def label_box(anchors, + gt_boxes, + positive_overlap, + negative_overlap, + allow_low_quality, + ignore_thresh, + is_crowd=None, + assign_on_cpu=False): + if assign_on_cpu: + device = paddle.device.get_device() + paddle.set_device("cpu") + iou = bbox_overlaps(gt_boxes, anchors) + paddle.set_device(device) + + else: + iou = bbox_overlaps(gt_boxes, anchors) + n_gt = gt_boxes.shape[0] + if n_gt == 0 or is_crowd is None: + n_gt_crowd = 0 + else: + n_gt_crowd = paddle.nonzero(is_crowd).shape[0] + if iou.shape[0] == 0 or n_gt_crowd == n_gt: + # No truth, assign everything to background + default_matches = paddle.full((iou.shape[1], ), 0, dtype='int64') + default_match_labels = paddle.full((iou.shape[1], ), 0, dtype='int32') + return default_matches, default_match_labels + # if ignore_thresh > 0, remove anchor if it is closed to + # one of the crowded ground-truth + if n_gt_crowd > 0: + N_a = anchors.shape[0] + ones = paddle.ones([N_a]) + mask = is_crowd * ones + + if ignore_thresh > 0: + crowd_iou = iou * mask + valid = 
(paddle.sum((crowd_iou > ignore_thresh).cast('int32'), + axis=0) > 0).cast('float32') + iou = iou * (1 - valid) - valid + + # ignore the iou between anchor and crowded ground-truth + iou = iou * (1 - mask) - mask + + matched_vals, matches = paddle.topk(iou, k=1, axis=0) + match_labels = paddle.full(matches.shape, -1, dtype='int32') + # set ignored anchor with iou = -1 + neg_cond = paddle.logical_and(matched_vals > -1, + matched_vals < negative_overlap) + match_labels = paddle.where(neg_cond, + paddle.zeros_like(match_labels), match_labels) + match_labels = paddle.where(matched_vals >= positive_overlap, + paddle.ones_like(match_labels), match_labels) + if allow_low_quality: + highest_quality_foreach_gt = iou.max(axis=1, keepdim=True) + pred_inds_with_highest_quality = paddle.logical_and( + iou > 0, iou == highest_quality_foreach_gt).cast('int32').sum( + 0, keepdim=True) + match_labels = paddle.where(pred_inds_with_highest_quality > 0, + paddle.ones_like(match_labels), + match_labels) + + matches = matches.flatten() + match_labels = match_labels.flatten() + + return matches, match_labels + + +def subsample_labels(labels, + num_samples, + fg_fraction, + bg_label=0, + use_random=True): + positive = paddle.nonzero( + paddle.logical_and(labels != -1, labels != bg_label)) + negative = paddle.nonzero(labels == bg_label) + + fg_num = int(num_samples * fg_fraction) + fg_num = min(positive.numel(), fg_num) + bg_num = num_samples - fg_num + bg_num = min(negative.numel(), bg_num) + if fg_num == 0 and bg_num == 0: + fg_inds = paddle.zeros([0], dtype='int32') + bg_inds = paddle.zeros([0], dtype='int32') + return fg_inds, bg_inds + + # randomly select positive and negative examples + + negative = negative.cast('int32').flatten() + bg_perm = paddle.randperm(negative.numel(), dtype='int32') + bg_perm = paddle.slice(bg_perm, axes=[0], starts=[0], ends=[bg_num]) + if use_random: + bg_inds = paddle.gather(negative, bg_perm) + else: + bg_inds = paddle.slice(negative, axes=[0], starts=[0], ends=[bg_num]) + if fg_num == 0: + fg_inds = paddle.zeros([0], dtype='int32') + return fg_inds, bg_inds + + positive = positive.cast('int32').flatten() + fg_perm = paddle.randperm(positive.numel(), dtype='int32') + fg_perm = paddle.slice(fg_perm, axes=[0], starts=[0], ends=[fg_num]) + if use_random: + fg_inds = paddle.gather(positive, fg_perm) + else: + fg_inds = paddle.slice(positive, axes=[0], starts=[0], ends=[fg_num]) + + return fg_inds, bg_inds + + +def generate_proposal_target(rpn_rois, + gt_classes, + gt_boxes, + batch_size_per_im, + fg_fraction, + fg_thresh, + bg_thresh, + num_classes, + ignore_thresh=-1., + is_crowd=None, + use_random=True, + is_cascade=False, + cascade_iou=0.5, + assign_on_cpu=False, + add_gt_as_proposals=True): + + rois_with_gt = [] + tgt_labels = [] + tgt_bboxes = [] + tgt_gt_inds = [] + new_rois_num = [] + + # In cascade rcnn, the threshold for foreground and background + # is used from cascade_iou + fg_thresh = cascade_iou if is_cascade else fg_thresh + bg_thresh = cascade_iou if is_cascade else bg_thresh + for i, rpn_roi in enumerate(rpn_rois): + gt_bbox = gt_boxes[i] + is_crowd_i = is_crowd[i] if is_crowd else None + gt_class = paddle.squeeze(gt_classes[i], axis=-1) + + # Concat RoIs and gt boxes except cascade rcnn or none gt + if add_gt_as_proposals and gt_bbox.shape[0] > 0: + bbox = paddle.concat([rpn_roi, gt_bbox]) + else: + bbox = rpn_roi + + # Step1: label bbox + matches, match_labels = label_box(bbox, gt_bbox, fg_thresh, bg_thresh, + False, ignore_thresh, is_crowd_i, + assign_on_cpu) + # 
Step2: sample bbox + sampled_inds, sampled_gt_classes = sample_bbox( + matches, match_labels, gt_class, batch_size_per_im, fg_fraction, + num_classes, use_random, is_cascade) + + # Step3: make output + rois_per_image = bbox if is_cascade else paddle.gather(bbox, + sampled_inds) + sampled_gt_ind = matches if is_cascade else paddle.gather(matches, + sampled_inds) + if gt_bbox.shape[0] > 0: + sampled_bbox = paddle.gather(gt_bbox, sampled_gt_ind) + else: + num = rois_per_image.shape[0] + sampled_bbox = paddle.zeros([num, 4], dtype='float32') + + rois_per_image.stop_gradient = True + sampled_gt_ind.stop_gradient = True + sampled_bbox.stop_gradient = True + tgt_labels.append(sampled_gt_classes) + tgt_bboxes.append(sampled_bbox) + rois_with_gt.append(rois_per_image) + tgt_gt_inds.append(sampled_gt_ind) + new_rois_num.append(paddle.shape(sampled_inds)[0]) + new_rois_num = paddle.concat(new_rois_num) + return rois_with_gt, tgt_labels, tgt_bboxes, tgt_gt_inds, new_rois_num + + +def sample_bbox(matches, + match_labels, + gt_classes, + batch_size_per_im, + fg_fraction, + num_classes, + use_random=True, + is_cascade=False): + + n_gt = gt_classes.shape[0] + if n_gt == 0: + # No truth, assign everything to background + gt_classes = paddle.ones(matches.shape, dtype='int32') * num_classes + #return matches, match_labels + num_classes + else: + gt_classes = paddle.gather(gt_classes, matches) + gt_classes = paddle.where(match_labels == 0, + paddle.ones_like(gt_classes) * num_classes, + gt_classes) + gt_classes = paddle.where(match_labels == -1, + paddle.ones_like(gt_classes) * -1, gt_classes) + if is_cascade: + index = paddle.arange(matches.shape[0]) + return index, gt_classes + rois_per_image = int(batch_size_per_im) + + fg_inds, bg_inds = subsample_labels(gt_classes, rois_per_image, fg_fraction, + num_classes, use_random) + if fg_inds.shape[0] == 0 and bg_inds.shape[0] == 0: + # fake output labeled with -1 when all boxes are neither + # foreground nor background + sampled_inds = paddle.zeros([1], dtype='int32') + else: + sampled_inds = paddle.concat([fg_inds, bg_inds]) + sampled_gt_classes = paddle.gather(gt_classes, sampled_inds) + return sampled_inds, sampled_gt_classes + + +def polygons_to_mask(polygons, height, width): + """ + Convert the polygons to mask format + + Args: + polygons (list[ndarray]): each array has shape (Nx2,) + height (int): mask height + width (int): mask width + Returns: + ndarray: a bool mask of shape (height, width) + """ + import pycocotools.mask as mask_util + assert len(polygons) > 0, "COCOAPI does not support empty polygons" + rles = mask_util.frPyObjects(polygons, height, width) + rle = mask_util.merge(rles) + return mask_util.decode(rle).astype(np.bool_) + + +def rasterize_polygons_within_box(poly, box, resolution): + w, h = box[2] - box[0], box[3] - box[1] + polygons = [np.asarray(p, dtype=np.float64) for p in poly] + for p in polygons: + p[0::2] = p[0::2] - box[0] + p[1::2] = p[1::2] - box[1] + + ratio_h = resolution / max(h, 0.1) + ratio_w = resolution / max(w, 0.1) + + if ratio_h == ratio_w: + for p in polygons: + p *= ratio_h + else: + for p in polygons: + p[0::2] *= ratio_w + p[1::2] *= ratio_h + + # 3. 
Rasterize the polygons with coco api + mask = polygons_to_mask(polygons, resolution, resolution) + mask = paddle.to_tensor(mask, dtype='int32') + return mask + + +def generate_mask_target(gt_segms, rois, labels_int32, sampled_gt_inds, + num_classes, resolution): + mask_rois = [] + mask_rois_num = [] + tgt_masks = [] + tgt_classes = [] + mask_index = [] + tgt_weights = [] + for k in range(len(rois)): + labels_per_im = labels_int32[k] + # select rois labeled with foreground + fg_inds = paddle.nonzero( + paddle.logical_and(labels_per_im != -1, labels_per_im != + num_classes)) + has_fg = True + # generate fake roi if foreground is empty + if fg_inds.numel() == 0: + has_fg = False + fg_inds = paddle.ones([1, 1], dtype='int64') + inds_per_im = sampled_gt_inds[k] + inds_per_im = paddle.gather(inds_per_im, fg_inds) + + rois_per_im = rois[k] + fg_rois = paddle.gather(rois_per_im, fg_inds) + # Copy the foreground roi to cpu + # to generate mask target with ground-truth + boxes = fg_rois.numpy() + gt_segms_per_im = gt_segms[k] + + new_segm = [] + inds_per_im = inds_per_im.numpy() + if len(gt_segms_per_im) > 0: + for i in inds_per_im: + new_segm.append(gt_segms_per_im[i]) + fg_inds_new = fg_inds.reshape([-1]).numpy() + results = [] + if len(gt_segms_per_im) > 0: + for j in range(fg_inds_new.shape[0]): + results.append( + rasterize_polygons_within_box(new_segm[j], boxes[j], + resolution)) + else: + results.append(paddle.ones([resolution, resolution], dtype='int32')) + + fg_classes = paddle.gather(labels_per_im, fg_inds) + weight = paddle.ones([fg_rois.shape[0]], dtype='float32') + if not has_fg: + # now all sampled classes are background + # which will cause error in loss calculation, + # make fake classes with weight of 0. + fg_classes = paddle.zeros([1], dtype='int32') + weight = weight - 1 + tgt_mask = paddle.stack(results) + tgt_mask.stop_gradient = True + fg_rois.stop_gradient = True + + mask_index.append(fg_inds) + mask_rois.append(fg_rois) + mask_rois_num.append(paddle.shape(fg_rois)[0]) + tgt_classes.append(fg_classes) + tgt_masks.append(tgt_mask) + tgt_weights.append(weight) + + mask_index = paddle.concat(mask_index) + mask_rois_num = paddle.concat(mask_rois_num) + tgt_classes = paddle.concat(tgt_classes, axis=0) + tgt_masks = paddle.concat(tgt_masks, axis=0) + tgt_weights = paddle.concat(tgt_weights, axis=0) + + return mask_rois, mask_rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights + + +def libra_sample_pos(max_overlaps, max_classes, pos_inds, num_expected): + if len(pos_inds) <= num_expected: + return pos_inds + else: + unique_gt_inds = np.unique(max_classes[pos_inds]) + num_gts = len(unique_gt_inds) + num_per_gt = int(round(num_expected / float(num_gts)) + 1) + + sampled_inds = [] + for i in unique_gt_inds: + inds = np.nonzero(max_classes == i)[0] + before_len = len(inds) + inds = list(set(inds) & set(pos_inds)) + after_len = len(inds) + if len(inds) > num_per_gt: + inds = np.random.choice(inds, size=num_per_gt, replace=False) + sampled_inds.extend(list(inds)) # combine as a new sampler + if len(sampled_inds) < num_expected: + num_extra = num_expected - len(sampled_inds) + extra_inds = np.array(list(set(pos_inds) - set(sampled_inds))) + assert len(sampled_inds) + len(extra_inds) == len(pos_inds), \ + "sum of sampled_inds({}) and extra_inds({}) length must be equal with pos_inds({})!".format( + len(sampled_inds), len(extra_inds), len(pos_inds)) + if len(extra_inds) > num_extra: + extra_inds = np.random.choice( + extra_inds, size=num_extra, replace=False) + 
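+            # Pad with randomly drawn leftover positives so the sample set
+            # reaches num_expected in total.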
sampled_inds.extend(extra_inds.tolist()) + elif len(sampled_inds) > num_expected: + sampled_inds = np.random.choice( + sampled_inds, size=num_expected, replace=False) + return paddle.to_tensor(sampled_inds) + + +def libra_sample_via_interval(max_overlaps, full_set, num_expected, floor_thr, + num_bins, bg_thresh): + max_iou = max_overlaps.max() + iou_interval = (max_iou - floor_thr) / num_bins + per_num_expected = int(num_expected / num_bins) + + sampled_inds = [] + for i in range(num_bins): + start_iou = floor_thr + i * iou_interval + end_iou = floor_thr + (i + 1) * iou_interval + + tmp_set = set( + np.where( + np.logical_and(max_overlaps >= start_iou, max_overlaps < + end_iou))[0]) + tmp_inds = list(tmp_set & full_set) + + if len(tmp_inds) > per_num_expected: + tmp_sampled_set = np.random.choice( + tmp_inds, size=per_num_expected, replace=False) + else: + tmp_sampled_set = np.array(tmp_inds, dtype=np.int32) + sampled_inds.append(tmp_sampled_set) + + sampled_inds = np.concatenate(sampled_inds) + if len(sampled_inds) < num_expected: + num_extra = num_expected - len(sampled_inds) + extra_inds = np.array(list(full_set - set(sampled_inds))) + assert len(sampled_inds) + len(extra_inds) == len(full_set), \ + "sum of sampled_inds({}) and extra_inds({}) length must be equal with full_set({})!".format( + len(sampled_inds), len(extra_inds), len(full_set)) + + if len(extra_inds) > num_extra: + extra_inds = np.random.choice(extra_inds, num_extra, replace=False) + sampled_inds = np.concatenate([sampled_inds, extra_inds]) + + return sampled_inds + + +def libra_sample_neg(max_overlaps, + max_classes, + neg_inds, + num_expected, + floor_thr=-1, + floor_fraction=0, + num_bins=3, + bg_thresh=0.5): + if len(neg_inds) <= num_expected: + return neg_inds + else: + # balance sampling for negative samples + neg_set = set(neg_inds.tolist()) + if floor_thr > 0: + floor_set = set( + np.where( + np.logical_and(max_overlaps >= 0, max_overlaps < floor_thr)) + [0]) + iou_sampling_set = set(np.where(max_overlaps >= floor_thr)[0]) + elif floor_thr == 0: + floor_set = set(np.where(max_overlaps == 0)[0]) + iou_sampling_set = set(np.where(max_overlaps > floor_thr)[0]) + else: + floor_set = set() + iou_sampling_set = set(np.where(max_overlaps > floor_thr)[0]) + floor_thr = 0 + + floor_neg_inds = list(floor_set & neg_set) + iou_sampling_neg_inds = list(iou_sampling_set & neg_set) + + num_expected_iou_sampling = int(num_expected * (1 - floor_fraction)) + if len(iou_sampling_neg_inds) > num_expected_iou_sampling: + if num_bins >= 2: + iou_sampled_inds = libra_sample_via_interval( + max_overlaps, + set(iou_sampling_neg_inds), num_expected_iou_sampling, + floor_thr, num_bins, bg_thresh) + else: + iou_sampled_inds = np.random.choice( + iou_sampling_neg_inds, + size=num_expected_iou_sampling, + replace=False) + else: + iou_sampled_inds = np.array(iou_sampling_neg_inds, dtype=np.int32) + num_expected_floor = num_expected - len(iou_sampled_inds) + if len(floor_neg_inds) > num_expected_floor: + sampled_floor_inds = np.random.choice( + floor_neg_inds, size=num_expected_floor, replace=False) + else: + sampled_floor_inds = np.array(floor_neg_inds, dtype=np.int32) + sampled_inds = np.concatenate((sampled_floor_inds, iou_sampled_inds)) + if len(sampled_inds) < num_expected: + num_extra = num_expected - len(sampled_inds) + extra_inds = np.array(list(neg_set - set(sampled_inds))) + if len(extra_inds) > num_extra: + extra_inds = np.random.choice( + extra_inds, size=num_extra, replace=False) + sampled_inds = np.concatenate((sampled_inds, 
extra_inds)) + return paddle.to_tensor(sampled_inds) + + +def libra_label_box(anchors, gt_boxes, gt_classes, positive_overlap, + negative_overlap, num_classes): + # TODO: use paddle API to speed up + gt_classes = gt_classes.numpy() + gt_overlaps = np.zeros((anchors.shape[0], num_classes)) + matches = np.zeros((anchors.shape[0]), dtype=np.int32) + if len(gt_boxes) > 0: + proposal_to_gt_overlaps = bbox_overlaps(anchors, gt_boxes).numpy() + overlaps_argmax = proposal_to_gt_overlaps.argmax(axis=1) + overlaps_max = proposal_to_gt_overlaps.max(axis=1) + # Boxes which with non-zero overlap with gt boxes + overlapped_boxes_ind = np.where(overlaps_max > 0)[0] + overlapped_boxes_gt_classes = gt_classes[overlaps_argmax[ + overlapped_boxes_ind]] + + for idx in range(len(overlapped_boxes_ind)): + gt_overlaps[overlapped_boxes_ind[idx], overlapped_boxes_gt_classes[ + idx]] = overlaps_max[overlapped_boxes_ind[idx]] + matches[overlapped_boxes_ind[idx]] = overlaps_argmax[ + overlapped_boxes_ind[idx]] + + gt_overlaps = paddle.to_tensor(gt_overlaps) + matches = paddle.to_tensor(matches) + + matched_vals = paddle.max(gt_overlaps, axis=1) + match_labels = paddle.full(matches.shape, -1, dtype='int32') + match_labels = paddle.where(matched_vals < negative_overlap, + paddle.zeros_like(match_labels), match_labels) + match_labels = paddle.where(matched_vals >= positive_overlap, + paddle.ones_like(match_labels), match_labels) + + return matches, match_labels, matched_vals + + +def libra_sample_bbox(matches, + match_labels, + matched_vals, + gt_classes, + batch_size_per_im, + num_classes, + fg_fraction, + fg_thresh, + bg_thresh, + num_bins, + use_random=True, + is_cascade_rcnn=False): + rois_per_image = int(batch_size_per_im) + fg_rois_per_im = int(np.round(fg_fraction * rois_per_image)) + bg_rois_per_im = rois_per_image - fg_rois_per_im + + if is_cascade_rcnn: + fg_inds = paddle.nonzero(matched_vals >= fg_thresh) + bg_inds = paddle.nonzero(matched_vals < bg_thresh) + else: + matched_vals_np = matched_vals.numpy() + match_labels_np = match_labels.numpy() + + # sample fg + fg_inds = paddle.nonzero(matched_vals >= fg_thresh).flatten() + fg_nums = int(np.minimum(fg_rois_per_im, fg_inds.shape[0])) + if (fg_inds.shape[0] > fg_nums) and use_random: + fg_inds = libra_sample_pos(matched_vals_np, match_labels_np, + fg_inds.numpy(), fg_rois_per_im) + fg_inds = fg_inds[:fg_nums] + + # sample bg + bg_inds = paddle.nonzero(matched_vals < bg_thresh).flatten() + bg_nums = int(np.minimum(rois_per_image - fg_nums, bg_inds.shape[0])) + if (bg_inds.shape[0] > bg_nums) and use_random: + bg_inds = libra_sample_neg( + matched_vals_np, + match_labels_np, + bg_inds.numpy(), + bg_rois_per_im, + num_bins=num_bins, + bg_thresh=bg_thresh) + bg_inds = bg_inds[:bg_nums] + + sampled_inds = paddle.concat([fg_inds, bg_inds]) + + gt_classes = paddle.gather(gt_classes, matches) + gt_classes = paddle.where(match_labels == 0, + paddle.ones_like(gt_classes) * num_classes, + gt_classes) + gt_classes = paddle.where(match_labels == -1, + paddle.ones_like(gt_classes) * -1, gt_classes) + sampled_gt_classes = paddle.gather(gt_classes, sampled_inds) + + return sampled_inds, sampled_gt_classes + + +def libra_generate_proposal_target(rpn_rois, + gt_classes, + gt_boxes, + batch_size_per_im, + fg_fraction, + fg_thresh, + bg_thresh, + num_classes, + use_random=True, + is_cascade_rcnn=False, + max_overlaps=None, + num_bins=3): + + rois_with_gt = [] + tgt_labels = [] + tgt_bboxes = [] + sampled_max_overlaps = [] + tgt_gt_inds = [] + new_rois_num = [] + + for i, rpn_roi 
in enumerate(rpn_rois): + max_overlap = max_overlaps[i] if is_cascade_rcnn else None + gt_bbox = gt_boxes[i] + gt_class = paddle.squeeze(gt_classes[i], axis=-1) + if is_cascade_rcnn: + rpn_roi = filter_roi(rpn_roi, max_overlap) + bbox = paddle.concat([rpn_roi, gt_bbox]) + + # Step1: label bbox + matches, match_labels, matched_vals = libra_label_box( + bbox, gt_bbox, gt_class, fg_thresh, bg_thresh, num_classes) + + # Step2: sample bbox + sampled_inds, sampled_gt_classes = libra_sample_bbox( + matches, match_labels, matched_vals, gt_class, batch_size_per_im, + num_classes, fg_fraction, fg_thresh, bg_thresh, num_bins, + use_random, is_cascade_rcnn) + + # Step3: make output + rois_per_image = paddle.gather(bbox, sampled_inds) + sampled_gt_ind = paddle.gather(matches, sampled_inds) + sampled_bbox = paddle.gather(gt_bbox, sampled_gt_ind) + sampled_overlap = paddle.gather(matched_vals, sampled_inds) + + rois_per_image.stop_gradient = True + sampled_gt_ind.stop_gradient = True + sampled_bbox.stop_gradient = True + sampled_overlap.stop_gradient = True + + tgt_labels.append(sampled_gt_classes) + tgt_bboxes.append(sampled_bbox) + rois_with_gt.append(rois_per_image) + sampled_max_overlaps.append(sampled_overlap) + tgt_gt_inds.append(sampled_gt_ind) + new_rois_num.append(paddle.shape(sampled_inds)[0]) + new_rois_num = paddle.concat(new_rois_num) + # rois_with_gt, tgt_labels, tgt_bboxes, tgt_gt_inds, new_rois_num + return rois_with_gt, tgt_labels, tgt_bboxes, tgt_gt_inds, new_rois_num diff --git a/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/target_layer.py b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/target_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..c010c819de1b059a396019685f431ac822be8868 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/proposal_generator/target_layer.py @@ -0,0 +1,481 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import sys +import paddle +from ppdet.core.workspace import register, serializable + +from .target import rpn_anchor_target, generate_proposal_target, generate_mask_target, libra_generate_proposal_target +import numpy as np + + +@register +@serializable +class RPNTargetAssign(object): + __shared__ = ['assign_on_cpu'] + """ + RPN targets assignment module + + The assignment consists of three steps: + 1. Match anchor and ground-truth box, label the anchor with foreground + or background sample + 2. Sample anchors to keep the properly ratio between foreground and + background + 3. Generate the targets for classification and regression branch + + + Args: + batch_size_per_im (int): Total number of RPN samples per image. + default 256 + fg_fraction (float): Fraction of anchors that is labeled + foreground, default 0.5 + positive_overlap (float): Minimum overlap required between an anchor + and ground-truth box for the (anchor, gt box) pair to be + a foreground sample. 
default 0.7 + negative_overlap (float): Maximum overlap allowed between an anchor + and ground-truth box for the (anchor, gt box) pair to be + a background sample. default 0.3 + ignore_thresh(float): Threshold for ignoring the is_crowd ground-truth + if the value is larger than zero. + use_random (bool): Use random sampling to choose foreground and + background boxes, default true. + assign_on_cpu (bool): In case the number of gt box is too large, + compute IoU on CPU, default false. + """ + + def __init__(self, + batch_size_per_im=256, + fg_fraction=0.5, + positive_overlap=0.7, + negative_overlap=0.3, + ignore_thresh=-1., + use_random=True, + assign_on_cpu=False): + super(RPNTargetAssign, self).__init__() + self.batch_size_per_im = batch_size_per_im + self.fg_fraction = fg_fraction + self.positive_overlap = positive_overlap + self.negative_overlap = negative_overlap + self.ignore_thresh = ignore_thresh + self.use_random = use_random + self.assign_on_cpu = assign_on_cpu + + def __call__(self, inputs, anchors): + """ + inputs: ground-truth instances. + anchor_box (Tensor): [num_anchors, 4], num_anchors are all anchors in all feature maps. + """ + gt_boxes = inputs['gt_bbox'] + is_crowd = inputs.get('is_crowd', None) + batch_size = len(gt_boxes) + tgt_labels, tgt_bboxes, tgt_deltas = rpn_anchor_target( + anchors, + gt_boxes, + self.batch_size_per_im, + self.positive_overlap, + self.negative_overlap, + self.fg_fraction, + self.use_random, + batch_size, + self.ignore_thresh, + is_crowd, + assign_on_cpu=self.assign_on_cpu) + norm = self.batch_size_per_im * batch_size + + return tgt_labels, tgt_bboxes, tgt_deltas, norm + + +@register +class BBoxAssigner(object): + __shared__ = ['num_classes', 'assign_on_cpu'] + """ + RCNN targets assignment module + + The assignment consists of three steps: + 1. Match RoIs and ground-truth box, label the RoIs with foreground + or background sample + 2. Sample anchors to keep the properly ratio between foreground and + background + 3. Generate the targets for classification and regression branch + + Args: + batch_size_per_im (int): Total number of RoIs per image. + default 512 + fg_fraction (float): Fraction of RoIs that is labeled + foreground, default 0.25 + fg_thresh (float): Minimum overlap required between a RoI + and ground-truth box for the (roi, gt box) pair to be + a foreground sample. default 0.5 + bg_thresh (float): Maximum overlap allowed between a RoI + and ground-truth box for the (roi, gt box) pair to be + a background sample. default 0.5 + ignore_thresh(float): Threshold for ignoring the is_crowd ground-truth + if the value is larger than zero. + use_random (bool): Use random sampling to choose foreground and + background boxes, default true + cascade_iou (list[iou]): The list of overlap to select foreground and + background of each stage, which is only used In Cascade RCNN. + num_classes (int): The number of class. + assign_on_cpu (bool): In case the number of gt box is too large, + compute IoU on CPU, default false. 
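+
+    Note: when `is_cascade` is true in `__call__`, `cascade_iou[stage]` is
+        used in place of both `fg_thresh` and `bg_thresh`.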
+ """ + + def __init__(self, + batch_size_per_im=512, + fg_fraction=.25, + fg_thresh=.5, + bg_thresh=.5, + ignore_thresh=-1., + use_random=True, + cascade_iou=[0.5, 0.6, 0.7], + num_classes=80, + assign_on_cpu=False): + super(BBoxAssigner, self).__init__() + self.batch_size_per_im = batch_size_per_im + self.fg_fraction = fg_fraction + self.fg_thresh = fg_thresh + self.bg_thresh = bg_thresh + self.ignore_thresh = ignore_thresh + self.use_random = use_random + self.cascade_iou = cascade_iou + self.num_classes = num_classes + self.assign_on_cpu = assign_on_cpu + + def __call__(self, + rpn_rois, + rpn_rois_num, + inputs, + stage=0, + is_cascade=False, + add_gt_as_proposals=True): + gt_classes = inputs['gt_class'] + gt_boxes = inputs['gt_bbox'] + is_crowd = inputs.get('is_crowd', None) + # rois, tgt_labels, tgt_bboxes, tgt_gt_inds + # new_rois_num + outs = generate_proposal_target( + rpn_rois, gt_classes, gt_boxes, self.batch_size_per_im, + self.fg_fraction, self.fg_thresh, self.bg_thresh, self.num_classes, + self.ignore_thresh, is_crowd, self.use_random, is_cascade, + self.cascade_iou[stage], self.assign_on_cpu, add_gt_as_proposals) + rois = outs[0] + rois_num = outs[-1] + # tgt_labels, tgt_bboxes, tgt_gt_inds + targets = outs[1:4] + return rois, rois_num, targets + + +@register +class BBoxLibraAssigner(object): + __shared__ = ['num_classes'] + """ + Libra-RCNN targets assignment module + + The assignment consists of three steps: + 1. Match RoIs and ground-truth box, label the RoIs with foreground + or background sample + 2. Sample anchors to keep the properly ratio between foreground and + background + 3. Generate the targets for classification and regression branch + + Args: + batch_size_per_im (int): Total number of RoIs per image. + default 512 + fg_fraction (float): Fraction of RoIs that is labeled + foreground, default 0.25 + fg_thresh (float): Minimum overlap required between a RoI + and ground-truth box for the (roi, gt box) pair to be + a foreground sample. default 0.5 + bg_thresh (float): Maximum overlap allowed between a RoI + and ground-truth box for the (roi, gt box) pair to be + a background sample. default 0.5 + use_random (bool): Use random sampling to choose foreground and + background boxes, default true + cascade_iou (list[iou]): The list of overlap to select foreground and + background of each stage, which is only used In Cascade RCNN. + num_classes (int): The number of class. + num_bins (int): The number of libra_sample. 
+ """ + + def __init__(self, + batch_size_per_im=512, + fg_fraction=.25, + fg_thresh=.5, + bg_thresh=.5, + use_random=True, + cascade_iou=[0.5, 0.6, 0.7], + num_classes=80, + num_bins=3): + super(BBoxLibraAssigner, self).__init__() + self.batch_size_per_im = batch_size_per_im + self.fg_fraction = fg_fraction + self.fg_thresh = fg_thresh + self.bg_thresh = bg_thresh + self.use_random = use_random + self.cascade_iou = cascade_iou + self.num_classes = num_classes + self.num_bins = num_bins + + def __call__(self, + rpn_rois, + rpn_rois_num, + inputs, + stage=0, + is_cascade=False): + gt_classes = inputs['gt_class'] + gt_boxes = inputs['gt_bbox'] + # rois, tgt_labels, tgt_bboxes, tgt_gt_inds + outs = libra_generate_proposal_target( + rpn_rois, gt_classes, gt_boxes, self.batch_size_per_im, + self.fg_fraction, self.fg_thresh, self.bg_thresh, self.num_classes, + self.use_random, is_cascade, self.cascade_iou[stage], self.num_bins) + rois = outs[0] + rois_num = outs[-1] + # tgt_labels, tgt_bboxes, tgt_gt_inds + targets = outs[1:4] + return rois, rois_num, targets + + +@register +@serializable +class MaskAssigner(object): + __shared__ = ['num_classes', 'mask_resolution'] + """ + Mask targets assignment module + + The assignment consists of three steps: + 1. Select RoIs labels with foreground. + 2. Encode the RoIs and corresponding gt polygons to generate + mask target + + Args: + num_classes (int): The number of class + mask_resolution (int): The resolution of mask target, default 14 + """ + + def __init__(self, num_classes=80, mask_resolution=14): + super(MaskAssigner, self).__init__() + self.num_classes = num_classes + self.mask_resolution = mask_resolution + + def __call__(self, rois, tgt_labels, tgt_gt_inds, inputs): + gt_segms = inputs['gt_poly'] + + outs = generate_mask_target(gt_segms, rois, tgt_labels, tgt_gt_inds, + self.num_classes, self.mask_resolution) + + # mask_rois, mask_rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights + return outs + + +@register +class RBoxAssigner(object): + """ + assigner of rbox + Args: + pos_iou_thr (float): threshold of pos samples + neg_iou_thr (float): threshold of neg samples + min_iou_thr (float): the min threshold of samples + ignore_iof_thr (int): the ignored threshold + """ + + def __init__(self, + pos_iou_thr=0.5, + neg_iou_thr=0.4, + min_iou_thr=0.0, + ignore_iof_thr=-2): + super(RBoxAssigner, self).__init__() + + self.pos_iou_thr = pos_iou_thr + self.neg_iou_thr = neg_iou_thr + self.min_iou_thr = min_iou_thr + self.ignore_iof_thr = ignore_iof_thr + + def anchor_valid(self, anchors): + """ + + Args: + anchor: M x 4 + + Returns: + + """ + if anchors.ndim == 3: + anchors = anchors.reshape(-1, anchors.shape[-1]) + assert anchors.ndim == 2 + anchor_num = anchors.shape[0] + anchor_valid = np.ones((anchor_num), np.int32) + anchor_inds = np.arange(anchor_num) + return anchor_inds + + def rbox2delta(self, + proposals, + gt, + means=[0, 0, 0, 0, 0], + stds=[1, 1, 1, 1, 1]): + """ + Args: + proposals: tensor [N, 5] + gt: gt [N, 5] + means: means [5] + stds: stds [5] + Returns: + + """ + proposals = proposals.astype(np.float64) + + PI = np.pi + + gt_widths = gt[..., 2] + gt_heights = gt[..., 3] + gt_angle = gt[..., 4] + + proposals_widths = proposals[..., 2] + proposals_heights = proposals[..., 3] + proposals_angle = proposals[..., 4] + + coord = gt[..., 0:2] - proposals[..., 0:2] + dx = (np.cos(proposals[..., 4]) * coord[..., 0] + + np.sin(proposals[..., 4]) * coord[..., 1]) / proposals_widths + dy = (-np.sin(proposals[..., 4]) * coord[..., 0] + + 
np.cos(proposals[..., 4]) * coord[..., 1]) / proposals_heights
+        dw = np.log(gt_widths / proposals_widths)
+        dh = np.log(gt_heights / proposals_heights)
+        da = (gt_angle - proposals_angle)
+
+        da = (da + PI / 4) % PI - PI / 4
+        da /= PI
+
+        deltas = np.stack([dx, dy, dw, dh, da], axis=-1)
+        means = np.array(means, dtype=deltas.dtype)
+        stds = np.array(stds, dtype=deltas.dtype)
+        deltas = (deltas - means) / stds
+        deltas = deltas.astype(np.float32)
+        return deltas
+
+    def assign_anchor(self,
+                      anchors,
+                      gt_bboxes,
+                      gt_labels,
+                      pos_iou_thr,
+                      neg_iou_thr,
+                      min_iou_thr=0.0,
+                      ignore_iof_thr=-2):
+        assert anchors.shape[1] == 4 or anchors.shape[1] == 5
+        assert gt_bboxes.shape[1] == 4 or gt_bboxes.shape[1] == 5
+        anchors_xc_yc = anchors
+        gt_bboxes_xc_yc = gt_bboxes
+
+        # calc rbox iou
+        anchors_xc_yc = anchors_xc_yc.astype(np.float32)
+        gt_bboxes_xc_yc = gt_bboxes_xc_yc.astype(np.float32)
+        anchors_xc_yc = paddle.to_tensor(anchors_xc_yc)
+        gt_bboxes_xc_yc = paddle.to_tensor(gt_bboxes_xc_yc)
+
+        try:
+            from ext_op import rbox_iou
+        except Exception as e:
+            print("import custom_ops error, try installing ext_op " \
+                  "following ppdet/ext_op/README.md", e)
+            sys.stdout.flush()
+            sys.exit(-1)
+
+        iou = rbox_iou(gt_bboxes_xc_yc, anchors_xc_yc)
+        iou = iou.numpy()
+        iou = iou.T
+
+        # index of the best-matching anchor for every gt bbox
+        gt_bbox_anchor_inds = iou.argmax(axis=0)
+        gt_bbox_anchor_iou = iou[gt_bbox_anchor_inds, np.arange(iou.shape[1])]
+        gt_bbox_anchor_iou_inds = np.where(iou == gt_bbox_anchor_iou)[0]
+
+        # index of the best-matching gt bbox for every anchor
+        anchor_gt_bbox_inds = iou.argmax(axis=1)
+        anchor_gt_bbox_iou = iou[np.arange(iou.shape[0]), anchor_gt_bbox_inds]
+
+        # (1) initialize all labels to the ignore value (ignore_iof_thr, -2 by default)
+        labels = np.ones((iou.shape[0], ), dtype=np.int32) * ignore_iof_thr
+
+        # (2) assign ignore
+        labels[anchor_gt_bbox_iou < min_iou_thr] = ignore_iof_thr
+
+        # (3) assign neg_ids -1
+        assign_neg_ids1 = anchor_gt_bbox_iou >= min_iou_thr
+        assign_neg_ids2 = anchor_gt_bbox_iou < neg_iou_thr
+        assign_neg_ids = np.logical_and(assign_neg_ids1, assign_neg_ids2)
+        labels[assign_neg_ids] = -1
+
+        # (4) for each gt, assign its best-matching anchors as pos_ids >= 0
+        anchor_gt_bbox_iou_inds = anchor_gt_bbox_inds[gt_bbox_anchor_iou_inds]
+        # gt_bbox_anchor_iou_inds = np.logical_and(gt_bbox_anchor_iou_inds, anchor_gt_bbox_iou >= min_iou_thr)
+        labels[gt_bbox_anchor_iou_inds] = gt_labels[anchor_gt_bbox_iou_inds]
+
+        # (5) assign >= pos_iou_thr as pos_ids
+        iou_pos_iou_thr_ids = anchor_gt_bbox_iou >= pos_iou_thr
+        iou_pos_iou_thr_ids_box_inds = anchor_gt_bbox_inds[iou_pos_iou_thr_ids]
+        labels[iou_pos_iou_thr_ids] = gt_labels[iou_pos_iou_thr_ids_box_inds]
+        return anchor_gt_bbox_inds, anchor_gt_bbox_iou, labels
+
+    def __call__(self, anchors, gt_bboxes, gt_labels, is_crowd):
+
+        assert anchors.ndim == 2
+        assert anchors.shape[1] == 5
+        assert gt_bboxes.ndim == 2
+        assert gt_bboxes.shape[1] == 5
+
+        pos_iou_thr = self.pos_iou_thr
+        neg_iou_thr = self.neg_iou_thr
+        min_iou_thr = self.min_iou_thr
+        ignore_iof_thr = self.ignore_iof_thr
+
+        anchor_num = anchors.shape[0]
+
+        gt_bboxes = gt_bboxes
+        is_crowd_slice = is_crowd
+        not_crowd_inds = np.where(is_crowd_slice == 0)
+
+        # Step1: match anchor and gt_bbox
+        anchor_gt_bbox_inds, anchor_gt_bbox_iou, labels = self.assign_anchor(
+            anchors, gt_bboxes,
+            gt_labels.reshape(-1), pos_iou_thr, neg_iou_thr, min_iou_thr,
+            ignore_iof_thr)
+
+        # Step2: sample anchor
+        pos_inds = np.where(labels >= 0)[0]
+        neg_inds = np.where(labels == -1)[0]
+
+        # Step3: make output
+        anchors_num = anchors.shape[0]
+        bbox_targets
= np.zeros_like(anchors) + bbox_weights = np.zeros_like(anchors) + bbox_gt_bboxes = np.zeros_like(anchors) + pos_labels = np.zeros(anchors_num, dtype=np.int32) + pos_labels_weights = np.zeros(anchors_num, dtype=np.float32) + + pos_sampled_anchors = anchors[pos_inds] + pos_sampled_gt_boxes = gt_bboxes[anchor_gt_bbox_inds[pos_inds]] + if len(pos_inds) > 0: + pos_bbox_targets = self.rbox2delta(pos_sampled_anchors, + pos_sampled_gt_boxes) + bbox_targets[pos_inds, :] = pos_bbox_targets + bbox_gt_bboxes[pos_inds, :] = pos_sampled_gt_boxes + bbox_weights[pos_inds, :] = 1.0 + + pos_labels[pos_inds] = labels[pos_inds] + pos_labels_weights[pos_inds] = 1.0 + + if len(neg_inds) > 0: + pos_labels_weights[neg_inds] = 1.0 + return (pos_labels, pos_labels_weights, bbox_targets, bbox_weights, + bbox_gt_bboxes, pos_inds, neg_inds) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/rbox_utils.py b/PaddleDetection-release-2.6/ppdet/modeling/rbox_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a5f19a2949d9f46b05ff94e5534807dabc46600d --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/rbox_utils.py @@ -0,0 +1,295 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import paddle +import numpy as np +import cv2 + + +def norm_angle(angle, range=[-np.pi / 4, np.pi]): + return (angle - range[0]) % range[1] + range[0] + + +# rbox function implemented using numpy +def poly2rbox_le135_np(poly): + """convert poly to rbox [-pi / 4, 3 * pi / 4] + + Args: + poly: [x1, y1, x2, y2, x3, y3, x4, y4] + + Returns: + rbox: [cx, cy, w, h, angle] + """ + poly = np.array(poly[:8], dtype=np.float32) + + pt1 = (poly[0], poly[1]) + pt2 = (poly[2], poly[3]) + pt3 = (poly[4], poly[5]) + pt4 = (poly[6], poly[7]) + + edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[1]) * + (pt1[1] - pt2[1])) + edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[1]) * + (pt2[1] - pt3[1])) + + width = max(edge1, edge2) + height = min(edge1, edge2) + + rbox_angle = 0 + if edge1 > edge2: + rbox_angle = np.arctan2(float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0])) + elif edge2 >= edge1: + rbox_angle = np.arctan2(float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0])) + + rbox_angle = norm_angle(rbox_angle) + + x_ctr = float(pt1[0] + pt3[0]) / 2 + y_ctr = float(pt1[1] + pt3[1]) / 2 + return [x_ctr, y_ctr, width, height, rbox_angle] + + +def poly2rbox_oc_np(poly): + """convert poly to rbox (0, pi / 2] + + Args: + poly: [x1, y1, x2, y2, x3, y3, x4, y4] + + Returns: + rbox: [cx, cy, w, h, angle] + """ + points = np.array(poly, dtype=np.float32).reshape((-1, 2)) + (cx, cy), (w, h), angle = cv2.minAreaRect(points) + # using the new OpenCV Rotated BBox definition since 4.5.1 + # if angle < 0, opencv is older than 4.5.1, angle is in [-90, 0) + if angle < 0: + angle += 90 + w, h = h, w + + # convert angle to [0, 90) + if angle == -0.0: + angle = 0.0 + if angle == 90.0: + angle = 0.0 + w, h = h, w + + angle = angle / 180 * 
np.pi + return [cx, cy, w, h, angle] + + +def poly2rbox_np(polys, rbox_type='oc'): + """ + polys: [x0,y0,x1,y1,x2,y2,x3,y3] + to + rboxes: [x_ctr,y_ctr,w,h,angle] + """ + assert rbox_type in ['oc', 'le135'], 'only oc or le135 is supported now' + poly2rbox_fn = poly2rbox_oc_np if rbox_type == 'oc' else poly2rbox_le135_np + rboxes = [] + for poly in polys: + x, y, w, h, angle = poly2rbox_fn(poly) + rbox = np.array([x, y, w, h, angle], dtype=np.float32) + rboxes.append(rbox) + + return np.array(rboxes) + + +def cal_line_length(point1, point2): + return math.sqrt( + math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2)) + + +def get_best_begin_point_single(coordinate): + x1, y1, x2, y2, x3, y3, x4, y4 = coordinate + xmin = min(x1, x2, x3, x4) + ymin = min(y1, y2, y3, y4) + xmax = max(x1, x2, x3, x4) + ymax = max(y1, y2, y3, y4) + combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], + [[x4, y4], [x1, y1], [x2, y2], [x3, y3]], + [[x3, y3], [x4, y4], [x1, y1], [x2, y2]], + [[x2, y2], [x3, y3], [x4, y4], [x1, y1]]] + dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]] + force = 100000000.0 + force_flag = 0 + for i in range(4): + temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \ + + cal_line_length(combinate[i][1], dst_coordinate[1]) \ + + cal_line_length(combinate[i][2], dst_coordinate[2]) \ + + cal_line_length(combinate[i][3], dst_coordinate[3]) + if temp_force < force: + force = temp_force + force_flag = i + if force_flag != 0: + pass + return np.array(combinate[force_flag]).reshape(8) + + +def rbox2poly_np(rboxes): + """ + rboxes:[x_ctr,y_ctr,w,h,angle] + to + poly:[x0,y0,x1,y1,x2,y2,x3,y3] + """ + polys = [] + for i in range(len(rboxes)): + x_ctr, y_ctr, width, height, angle = rboxes[i][:5] + tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2 + rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]]) + R = np.array([[np.cos(angle), -np.sin(angle)], + [np.sin(angle), np.cos(angle)]]) + poly = R.dot(rect) + x0, x1, x2, x3 = poly[0, :4] + x_ctr + y0, y1, y2, y3 = poly[1, :4] + y_ctr + poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32) + poly = get_best_begin_point_single(poly) + polys.append(poly) + polys = np.array(polys) + return polys + + +# rbox function implemented using paddle +def box2corners(box): + """convert box coordinate to corners + Args: + box (Tensor): (B, N, 5) with (x, y, w, h, alpha) angle is in [0, 90) + Returns: + corners (Tensor): (B, N, 4, 2) with (x1, y1, x2, y2, x3, y3, x4, y4) + """ + B = box.shape[0] + x, y, w, h, alpha = paddle.split(box, 5, axis=-1) + x4 = paddle.to_tensor( + [0.5, 0.5, -0.5, -0.5], dtype=paddle.float32).reshape( + (1, 1, 4)) # (1,1,4) + x4 = x4 * w # (B, N, 4) + y4 = paddle.to_tensor( + [-0.5, 0.5, 0.5, -0.5], dtype=paddle.float32).reshape((1, 1, 4)) + y4 = y4 * h # (B, N, 4) + corners = paddle.stack([x4, y4], axis=-1) # (B, N, 4, 2) + sin = paddle.sin(alpha) + cos = paddle.cos(alpha) + row1 = paddle.concat([cos, sin], axis=-1) + row2 = paddle.concat([-sin, cos], axis=-1) # (B, N, 2) + rot_T = paddle.stack([row1, row2], axis=-2) # (B, N, 2, 2) + rotated = paddle.bmm(corners.reshape([-1, 4, 2]), rot_T.reshape([-1, 2, 2])) + rotated = rotated.reshape([B, -1, 4, 2]) # (B*N, 4, 2) -> (B, N, 4, 2) + rotated[..., 0] += x + rotated[..., 1] += y + return rotated + + +def paddle_gather(x, dim, index): + index_shape = index.shape + index_flatten = index.flatten() + if dim < 0: + dim = len(x.shape) + dim + nd_index = [] + for k in range(len(x.shape)): + 
if k == dim:
+            nd_index.append(index_flatten)
+        else:
+            reshape_shape = [1] * len(x.shape)
+            reshape_shape[k] = x.shape[k]
+            x_arange = paddle.arange(x.shape[k], dtype=index.dtype)
+            x_arange = x_arange.reshape(reshape_shape)
+            dim_index = paddle.expand(x_arange, index_shape).flatten()
+            nd_index.append(dim_index)
+    ind2 = paddle.transpose(paddle.stack(nd_index), [1, 0]).astype("int64")
+    paddle_out = paddle.gather_nd(x, ind2).reshape(index_shape)
+    return paddle_out
+
+
+def check_points_in_polys(points, polys):
+    """Check whether each point lies inside the given quadrilaterals
+    Args:
+        points (tensor): (1, L, 2) anchor points
+        polys (tensor): [B, N, 4, 2] gt_polys
+    Returns:
+        is_in_polys (tensor): (B, N, L)
+    """
+    # [1, L, 2] -> [1, 1, L, 2]
+    points = points.unsqueeze(0)
+    # [B, N, 4, 2] -> four [B, N, 1, 2] corners
+    a, b, c, d = polys.split(4, axis=2)
+    ab = b - a
+    ad = d - a
+    # [B, N, L, 2]
+    ap = points - a
+    # [B, N, 1]
+    norm_ab = paddle.sum(ab * ab, axis=-1)
+    # [B, N, 1]
+    norm_ad = paddle.sum(ad * ad, axis=-1)
+    # [B, N, L] dot product
+    ap_dot_ab = paddle.sum(ap * ab, axis=-1)
+    # [B, N, L] dot product
+    ap_dot_ad = paddle.sum(ap * ad, axis=-1)
+    # [B, N, L] a point is inside iff its projections on AB and AD fall within the edges
+    is_in_polys = (ap_dot_ab >= 0) & (ap_dot_ab <= norm_ab) & (
+        ap_dot_ad >= 0) & (ap_dot_ad <= norm_ad)
+    return is_in_polys
+
+
+def check_points_in_rotated_boxes(points, boxes):
+    """Check whether each point lies inside the given rotated boxes
+
+    Args:
+        points (tensor): (1, L, 2) anchor points
+        boxes (tensor): [B, N, 5] gt_bboxes
+
+    Returns:
+        is_in_box (tensor): (B, N, L)
+
+    """
+    # [B, N, 5] -> [B, N, 4, 2]
+    corners = box2corners(boxes)
+    # [1, L, 2] -> [1, 1, L, 2]
+    points = points.unsqueeze(0)
+    # [B, N, 4, 2] -> four [B, N, 1, 2] corners
+    a, b, c, d = corners.split(4, axis=2)
+    ab = b - a
+    ad = d - a
+    # [B, N, L, 2]
+    ap = points - a
+    # [B, N, 1]
+    norm_ab = paddle.sum(ab * ab, axis=-1)
+    # [B, N, 1]
+    norm_ad = paddle.sum(ad * ad, axis=-1)
+    # [B, N, L] dot product
+    ap_dot_ab = paddle.sum(ap * ab, axis=-1)
+    # [B, N, L] dot product
+    ap_dot_ad = paddle.sum(ap * ad, axis=-1)
+    # [B, N, L] a point is inside iff its projections on AB and AD fall within the edges
+    is_in_box = (ap_dot_ab >= 0) & (ap_dot_ab <= norm_ab) & (ap_dot_ad >= 0) & (
+        ap_dot_ad <= norm_ad)
+    return is_in_box
+
+
+def rotated_iou_similarity(box1, box2, eps=1e-9, func=''):
+    """Calculate iou of box1 and box2
+
+    Args:
+        box1 (Tensor): box with the shape [N, M1, 5]
+        box2 (Tensor): box with the shape [N, M2, 5]
+        eps (float): unused, kept for interface compatibility
+        func (str): unused, kept for interface compatibility
+
+    Return:
+        iou (Tensor): iou between box1 and box2 with the shape [N, M1, M2]
+    """
+    from ext_op import rbox_iou
+    rotated_ious = []
+    for b1, b2 in zip(box1, box2):
+        rotated_ious.append(rbox_iou(b1, b2))
+
+    return paddle.stack(rotated_ious, axis=0)
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/reid/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..3c176d705bbb87d300428b6732013fed058d4339
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/reid/__init__.py
@@ -0,0 +1,27 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
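As a quick sanity check of the rotated-box conversions defined above, the following sketch round-trips an invented axis-aligned rectangle through poly2rbox_np and rbox2poly_np (it assumes paddle and OpenCV are installed, since the module imports both, and that the module is importable as ppdet.modeling.rbox_utils):

```python
import numpy as np
from ppdet.modeling.rbox_utils import poly2rbox_np, rbox2poly_np

# One invented quadrilateral: an axis-aligned 40x20 rectangle centered at (30, 20).
polys = np.array([[10., 10., 50., 10., 50., 30., 10., 30.]])
rboxes = poly2rbox_np(polys, rbox_type='le135')  # -> [[30., 20., 40., 20., 0.]]
back = rbox2poly_np(rboxes)                      # corners again, best-start-point ordered
print(rboxes, back)
```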
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import jde_embedding_head +from . import fairmot_embedding_head +from . import resnet +from . import pyramidal_embedding +from . import pplcnet_embedding +from . import resnet_embedding + +from .fairmot_embedding_head import * +from .jde_embedding_head import * +from .resnet import * +from .pyramidal_embedding import * +from .pplcnet_embedding import * +from .resnet_embedding import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f1bb7d15de03b7dce290ca86de2279dafa9696e1 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/fairmot_embedding_head.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/fairmot_embedding_head.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7be242b2b2be5108067c6eef8c458ef490dfa869 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/fairmot_embedding_head.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/jde_embedding_head.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/jde_embedding_head.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7207ac9d743d0963fb93795ac53f91ffd67e887d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/jde_embedding_head.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/pplcnet_embedding.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/pplcnet_embedding.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ba00cddf4b1102e7a17111976ff3ba537abc52a2 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/pplcnet_embedding.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/pyramidal_embedding.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/pyramidal_embedding.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c74f1899c18a6c2997c287fa841b8be82c3e941d Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/pyramidal_embedding.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/resnet.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/resnet.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..88443bf0226e00307d964ded5bf2bccdd71ca637 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/resnet.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/resnet_embedding.cpython-37.pyc 
b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/resnet_embedding.cpython-37.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..388cd2e9dda9aa7a098ecb3079285bb8569453b0
Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/reid/__pycache__/resnet_embedding.cpython-37.pyc differ
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/fairmot_embedding_head.py b/PaddleDetection-release-2.6/ppdet/modeling/reid/fairmot_embedding_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..98ca257fd55db020d0e2d483684b5391c28bce49
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/reid/fairmot_embedding_head.py
@@ -0,0 +1,224 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import math
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.initializer import KaimingUniform, Uniform
+from ppdet.core.workspace import register
+from ppdet.modeling.heads.centernet_head import ConvLayer
+
+__all__ = ['FairMOTEmbeddingHead']
+
+
+@register
+class FairMOTEmbeddingHead(nn.Layer):
+    __shared__ = ['num_classes']
+    """
+    Args:
+        in_channels (int): the channel number of input to FairMOTEmbeddingHead.
+        ch_head (int): the channel of features before being fed into the embedding, 256 by default.
+        ch_emb (int): the channel of the embedding feature, 128 by default.
+        num_identities_dict (dict): the number of identities of each category,
+            supporting single-class and multi-class settings, {0: 14455} by default.
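The emb_scale computed below follows the JDE/FairMOT recipe: embeddings are L2-normalized and scaled by sqrt(2) * ln(nID - 1) before the identity classifier. A minimal sketch of that forward path (batch size and random inputs are invented for illustration):

```python
import math
import paddle
import paddle.nn.functional as F

n_id, ch_emb = 14455, 128
emb_scale = math.sqrt(2) * math.log(n_id - 1)   # ~13.5 for 14455 identities

feat = paddle.randn([8, ch_emb])                # 8 sampled object embeddings (invented)
classifier = paddle.nn.Linear(ch_emb, n_id)
logits = classifier(emb_scale * F.normalize(feat))
loss = paddle.nn.CrossEntropyLoss()(logits, paddle.randint(0, n_id, [8]))
```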
+ """ + + def __init__(self, + in_channels, + ch_head=256, + ch_emb=128, + num_classes=1, + num_identities_dict={0: 14455}): + super(FairMOTEmbeddingHead, self).__init__() + assert num_classes >= 1 + self.num_classes = num_classes + self.ch_emb = ch_emb + self.num_identities_dict = num_identities_dict + self.reid = nn.Sequential( + ConvLayer( + in_channels, ch_head, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + ch_head, ch_emb, kernel_size=1, stride=1, padding=0, bias=True)) + param_attr = paddle.ParamAttr(initializer=KaimingUniform()) + bound = 1 / math.sqrt(ch_emb) + bias_attr = paddle.ParamAttr(initializer=Uniform(-bound, bound)) + self.reid_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='sum') + + if num_classes == 1: + nID = self.num_identities_dict[0] # single class + self.classifier = nn.Linear( + ch_emb, nID, weight_attr=param_attr, bias_attr=bias_attr) + # When num_identities(nID) is 1, emb_scale is set as 1 + self.emb_scale = math.sqrt(2) * math.log(nID - 1) if nID > 1 else 1 + else: + self.classifiers = dict() + self.emb_scale_dict = dict() + for cls_id, nID in self.num_identities_dict.items(): + self.classifiers[str(cls_id)] = nn.Linear( + ch_emb, nID, weight_attr=param_attr, bias_attr=bias_attr) + # When num_identities(nID) is 1, emb_scale is set as 1 + self.emb_scale_dict[str(cls_id)] = math.sqrt(2) * math.log( + nID - 1) if nID > 1 else 1 + + @classmethod + def from_config(cls, cfg, input_shape): + if isinstance(input_shape, (list, tuple)): + input_shape = input_shape[0] + return {'in_channels': input_shape.channels} + + def process_by_class(self, bboxes, embedding, bbox_inds, topk_clses): + pred_dets, pred_embs = [], [] + for cls_id in range(self.num_classes): + inds_masks = topk_clses == cls_id + inds_masks = paddle.cast(inds_masks, 'float32') + + pos_num = inds_masks.sum().numpy() + if pos_num == 0: + continue + + cls_inds_mask = inds_masks > 0 + + bbox_mask = paddle.nonzero(cls_inds_mask) + cls_bboxes = paddle.gather_nd(bboxes, bbox_mask) + pred_dets.append(cls_bboxes) + + cls_inds = paddle.masked_select(bbox_inds, cls_inds_mask) + cls_inds = cls_inds.unsqueeze(-1) + cls_embedding = paddle.gather_nd(embedding, cls_inds) + pred_embs.append(cls_embedding) + + return paddle.concat(pred_dets), paddle.concat(pred_embs) + + def forward(self, + neck_feat, + inputs, + bboxes=None, + bbox_inds=None, + topk_clses=None): + reid_feat = self.reid(neck_feat) + if self.training: + if self.num_classes == 1: + loss = self.get_loss(reid_feat, inputs) + else: + loss = self.get_mc_loss(reid_feat, inputs) + return loss + else: + assert bboxes is not None and bbox_inds is not None + reid_feat = F.normalize(reid_feat) + embedding = paddle.transpose(reid_feat, [0, 2, 3, 1]) + embedding = paddle.reshape(embedding, [-1, self.ch_emb]) + # embedding shape: [bs * h * w, ch_emb] + + if self.num_classes == 1: + pred_dets = bboxes + pred_embs = paddle.gather(embedding, bbox_inds) + else: + pred_dets, pred_embs = self.process_by_class( + bboxes, embedding, bbox_inds, topk_clses) + return pred_dets, pred_embs + + def get_loss(self, feat, inputs): + index = inputs['index'] + mask = inputs['index_mask'] + target = inputs['reid'] + target = paddle.masked_select(target, mask > 0) + target = paddle.unsqueeze(target, 1) + + feat = paddle.transpose(feat, perm=[0, 2, 3, 1]) + feat_n, feat_h, feat_w, feat_c = feat.shape + feat = paddle.reshape(feat, shape=[feat_n, -1, feat_c]) + index = paddle.unsqueeze(index, 2) + batch_inds = list() + for i in range(feat_n): + batch_ind = paddle.full( 
+ shape=[1, index.shape[1], 1], fill_value=i, dtype='int64') + batch_inds.append(batch_ind) + batch_inds = paddle.concat(batch_inds, axis=0) + index = paddle.concat(x=[batch_inds, index], axis=2) + feat = paddle.gather_nd(feat, index=index) + + mask = paddle.unsqueeze(mask, axis=2) + mask = paddle.expand_as(mask, feat) + mask.stop_gradient = True + feat = paddle.masked_select(feat, mask > 0) + feat = paddle.reshape(feat, shape=[-1, feat_c]) + feat = F.normalize(feat) + feat = self.emb_scale * feat + logit = self.classifier(feat) + target.stop_gradient = True + loss = self.reid_loss(logit, target) + valid = (target != self.reid_loss.ignore_index) + valid.stop_gradient = True + count = paddle.sum((paddle.cast(valid, dtype=np.int32))) + count.stop_gradient = True + if count > 0: + loss = loss / count + + return loss + + def get_mc_loss(self, feat, inputs): + # feat.shape = [bs, ch_emb, h, w] + assert 'cls_id_map' in inputs and 'cls_tr_ids' in inputs + index = inputs['index'] + mask = inputs['index_mask'] + cls_id_map = inputs['cls_id_map'] # [bs, h, w] + cls_tr_ids = inputs['cls_tr_ids'] # [bs, num_classes, h, w] + + feat = paddle.transpose(feat, perm=[0, 2, 3, 1]) + feat_n, feat_h, feat_w, feat_c = feat.shape + feat = paddle.reshape(feat, shape=[feat_n, -1, feat_c]) + + index = paddle.unsqueeze(index, 2) + batch_inds = list() + for i in range(feat_n): + batch_ind = paddle.full( + shape=[1, index.shape[1], 1], fill_value=i, dtype='int64') + batch_inds.append(batch_ind) + batch_inds = paddle.concat(batch_inds, axis=0) + index = paddle.concat(x=[batch_inds, index], axis=2) + feat = paddle.gather_nd(feat, index=index) + + mask = paddle.unsqueeze(mask, axis=2) + mask = paddle.expand_as(mask, feat) + mask.stop_gradient = True + feat = paddle.masked_select(feat, mask > 0) + feat = paddle.reshape(feat, shape=[-1, feat_c]) + + reid_losses = 0 + for cls_id, id_num in self.num_identities_dict.items(): + # target + cur_cls_tr_ids = paddle.reshape( + cls_tr_ids[:, cls_id, :, :], shape=[feat_n, -1]) # [bs, h*w] + cls_id_target = paddle.gather_nd(cur_cls_tr_ids, index=index) + mask = inputs['index_mask'] + cls_id_target = paddle.masked_select(cls_id_target, mask > 0) + cls_id_target.stop_gradient = True + + # feat + cls_id_feat = self.emb_scale_dict[str(cls_id)] * F.normalize(feat) + cls_id_pred = self.classifiers[str(cls_id)](cls_id_feat) + + loss = self.reid_loss(cls_id_pred, cls_id_target) + valid = (cls_id_target != self.reid_loss.ignore_index) + valid.stop_gradient = True + count = paddle.sum((paddle.cast(valid, dtype=np.int32))) + count.stop_gradient = True + if count > 0: + loss = loss / count + reid_losses += loss + + return reid_losses diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/jde_embedding_head.py b/PaddleDetection-release-2.6/ppdet/modeling/reid/jde_embedding_head.py new file mode 100644 index 0000000000000000000000000000000000000000..1d1e60f3cf5f13f08e2f46c7832d931236d688cd --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/reid/jde_embedding_head.py @@ -0,0 +1,211 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
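The index construction in get_loss/get_mc_loss above pairs a batch id with each flattened spatial position so paddle.gather_nd can pull per-object feature vectors. A standalone sketch of the same pattern, with invented shapes and positions:

```python
import paddle

feat = paddle.randn([2, 100, 64])                     # [batch, h*w, channels], invented
index = paddle.to_tensor([[3, 17, 42], [5, 5, 99]])   # per-image flattened positions
index = index.unsqueeze(2)                            # [2, 3, 1]

batch_inds = paddle.concat([
    paddle.full([1, index.shape[1], 1], i, dtype='int64') for i in range(2)
], axis=0)                                            # [2, 3, 1] of batch ids
nd_index = paddle.concat([batch_inds, index], axis=2) # [2, 3, 2] (batch, pos) pairs
picked = paddle.gather_nd(feat, nd_index)             # [2, 3, 64]
```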
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle import ParamAttr
+from paddle.regularizer import L2Decay
+from ppdet.core.workspace import register
+from paddle.nn.initializer import Normal, Constant
+
+__all__ = ['JDEEmbeddingHead']
+
+
+class LossParam(nn.Layer):
+    def __init__(self, init_value=0., use_uncertainty=True):
+        super(LossParam, self).__init__()
+        self.loss_param = self.create_parameter(
+            shape=[1],
+            attr=ParamAttr(initializer=Constant(value=init_value)),
+            dtype="float32")
+
+    def forward(self, inputs):
+        out = paddle.exp(-self.loss_param) * inputs + self.loss_param
+        return out * 0.5
+
+
+@register
+class JDEEmbeddingHead(nn.Layer):
+    __shared__ = ['num_classes']
+    __inject__ = ['emb_loss', 'jde_loss']
+    """
+    JDEEmbeddingHead
+    Args:
+        num_classes (int): Number of classes. Only single-class tracking is supported.
+        num_identities (int): Number of identities.
+        anchor_levels (int): Number of anchor levels, same as FPN levels.
+        anchor_scales (int): Number of anchor scales on each FPN level.
+        embedding_dim (int): Embedding dimension. Default: 512.
+        emb_loss (object): Instance of 'JDEEmbeddingLoss'
+        jde_loss (object): Instance of 'JDELoss'
+    """
+
+    def __init__(
+            self,
+            num_classes=1,
+            num_identities=14455,  # dataset.num_identities_dict[0]
+            anchor_levels=3,
+            anchor_scales=4,
+            embedding_dim=512,
+            emb_loss='JDEEmbeddingLoss',
+            jde_loss='JDELoss'):
+        super(JDEEmbeddingHead, self).__init__()
+        self.num_classes = num_classes
+        self.num_identities = num_identities
+        self.anchor_levels = anchor_levels
+        self.anchor_scales = anchor_scales
+        self.embedding_dim = embedding_dim
+        self.emb_loss = emb_loss
+        self.jde_loss = jde_loss
+
+        self.emb_scale = math.sqrt(2) * math.log(
+            self.num_identities - 1) if self.num_identities > 1 else 1
+
+        self.identify_outputs = []
+        self.loss_params_cls = []
+        self.loss_params_reg = []
+        self.loss_params_ide = []
+        for i in range(self.anchor_levels):
+            name = 'identify_output.{}'.format(i)
+            identify_output = self.add_sublayer(
+                name,
+                nn.Conv2D(
+                    in_channels=64 * (2**self.anchor_levels) // (2**i),
+                    out_channels=self.embedding_dim,
+                    kernel_size=3,
+                    stride=1,
+                    padding=1,
+                    bias_attr=ParamAttr(regularizer=L2Decay(0.))))
+            self.identify_outputs.append(identify_output)
+
+            loss_p_cls = self.add_sublayer('cls.{}'.format(i), LossParam(-4.15))
+            self.loss_params_cls.append(loss_p_cls)
+            loss_p_reg = self.add_sublayer('reg.{}'.format(i), LossParam(-4.85))
+            self.loss_params_reg.append(loss_p_reg)
+            loss_p_ide = self.add_sublayer('ide.{}'.format(i), LossParam(-2.3))
+            self.loss_params_ide.append(loss_p_ide)
+
+        self.classifier = self.add_sublayer(
+            'classifier',
+            nn.Linear(
+                self.embedding_dim,
+                self.num_identities,
+                weight_attr=ParamAttr(
+                    learning_rate=1., initializer=Normal(
+                        mean=0.0, std=0.01)),
+                bias_attr=ParamAttr(
+                    learning_rate=2., regularizer=L2Decay(0.))))
+
+    def forward(self,
+                identify_feats,
+                targets,
+                loss_confs=None,
+                loss_boxes=None,
bboxes=None,
+                boxes_idx=None,
+                nms_keep_idx=None):
+        assert self.num_classes == 1, 'JDE only supports single-class MOT.'
+        assert len(identify_feats) == self.anchor_levels
+        ide_outs = []
+        for feat, ide_head in zip(identify_feats, self.identify_outputs):
+            ide_outs.append(ide_head(feat))
+
+        if self.training:
+            assert len(loss_confs) == len(loss_boxes) == self.anchor_levels
+            loss_ides = self.emb_loss(ide_outs, targets, self.emb_scale,
+                                      self.classifier)
+            jde_losses = self.jde_loss(
+                loss_confs, loss_boxes, loss_ides, self.loss_params_cls,
+                self.loss_params_reg, self.loss_params_ide, targets)
+            return jde_losses
+        else:
+            assert bboxes is not None
+            assert boxes_idx is not None
+            assert nms_keep_idx is not None
+
+            emb_outs = self.get_emb_outs(ide_outs)
+            emb_valid = paddle.gather_nd(emb_outs, boxes_idx)
+            pred_embs = paddle.gather_nd(emb_valid, nms_keep_idx)
+
+            input_shape = targets['image'].shape[2:]
+            # input_shape: [h, w], before data transforms, set in model config
+            im_shape = targets['im_shape'][0].numpy()
+            # im_shape: [new_h, new_w], after data transforms
+            scale_factor = targets['scale_factor'][0].numpy()
+            bboxes[:, 2:] = self.scale_coords(bboxes[:, 2:], input_shape,
+                                              im_shape, scale_factor)
+            # cls_ids, scores, tlwhs
+            pred_dets = bboxes
+            return pred_dets, pred_embs
+
+    def scale_coords(self, coords, input_shape, im_shape, scale_factor):
+        ratio = scale_factor[0]
+        pad_w = (input_shape[1] - int(im_shape[1])) / 2
+        pad_h = (input_shape[0] - int(im_shape[0])) / 2
+        coords = paddle.cast(coords, 'float32')
+        coords[:, 0::2] -= pad_w
+        coords[:, 1::2] -= pad_h
+        coords[:, 0:4] /= ratio
+        coords[:, :4] = paddle.clip(
+            coords[:, :4], min=0, max=coords[:, :4].max())
+        return coords.round()
+
+    def get_emb_and_gt_outs(self, ide_outs, targets):
+        emb_and_gts = []
+        for i, p_ide in enumerate(ide_outs):
+            t_conf = targets['tconf{}'.format(i)]
+            t_ide = targets['tide{}'.format(i)]
+
+            p_ide = p_ide.transpose((0, 2, 3, 1))
+            p_ide_flatten = paddle.reshape(p_ide, [-1, self.embedding_dim])
+
+            mask = t_conf > 0
+            mask = paddle.cast(mask, dtype="int64")
+            emb_mask = mask.max(1).flatten()
+            emb_mask_inds = paddle.nonzero(emb_mask > 0).flatten()
+            if len(emb_mask_inds) > 0:
+                t_ide_flatten = paddle.reshape(t_ide.max(1), [-1, 1])
+                tids = paddle.gather(t_ide_flatten, emb_mask_inds)
+
+                embedding = paddle.gather(p_ide_flatten, emb_mask_inds)
+                embedding = self.emb_scale * F.normalize(embedding)
+                emb_and_gt = paddle.concat([embedding, tids], axis=1)
+                emb_and_gts.append(emb_and_gt)
+
+        if len(emb_and_gts) > 0:
+            return paddle.concat(emb_and_gts, axis=0)
+        else:
+            return paddle.zeros((1, self.embedding_dim + 1))
+
+    def get_emb_outs(self, ide_outs):
+        emb_outs = []
+        for i, p_ide in enumerate(ide_outs):
+            p_ide = p_ide.transpose((0, 2, 3, 1))
+
+            p_ide_repeat = paddle.tile(p_ide, [self.anchor_scales, 1, 1, 1])
+            embedding = F.normalize(p_ide_repeat, axis=-1)
+            emb = paddle.reshape(embedding, [-1, self.embedding_dim])
+            emb_outs.append(emb)
+
+        if len(emb_outs) > 0:
+            return paddle.concat(emb_outs, axis=0)
+        else:
+            return paddle.zeros((1, self.embedding_dim))
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/pplcnet_embedding.py b/PaddleDetection-release-2.6/ppdet/modeling/reid/pplcnet_embedding.py
new file mode 100644
index 0000000000000000000000000000000000000000..d360f89149d807069345e6255d86190d517376b3
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/reid/pplcnet_embedding.py
@@ -0,0 +1,281 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
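scale_coords above inverts the letterbox preprocessing: subtract the symmetric padding, then divide by the resize ratio to return to original-image coordinates. The same arithmetic in plain NumPy, with padding and ratio values invented for illustration:

```python
import numpy as np

input_hw = (608, 1088)   # network input after letterboxing (invented)
im_hw = (540, 1088)      # resized image before padding (invented)
ratio = 0.85             # resize scale factor (invented)

pad_w = (input_hw[1] - im_hw[1]) / 2   # 0.0
pad_h = (input_hw[0] - im_hw[0]) / 2   # 34.0

boxes = np.array([[100., 60., 300., 200.]])  # x1, y1, x2, y2 in input space
boxes[:, 0::2] -= pad_w
boxes[:, 1::2] -= pad_h
boxes[:, :4] /= ratio    # back to original image coordinates
```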
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal, Constant +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D, BatchNorm2D, Conv2D, Linear +from paddle.regularizer import L2Decay +from paddle.nn.initializer import KaimingNormal, XavierNormal +from ppdet.core.workspace import register + +__all__ = ['PPLCNetEmbedding'] + + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. +# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(nn.Layer): + def __init__(self, + num_channels, + filter_size, + num_filters, + stride, + num_groups=1): + super().__init__() + + self.conv = Conv2D( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm2D( + num_filters, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(nn.Layer): + def __init__(self, + num_channels, + num_filters, + stride, + dw_size=3, + use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer( + num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer( + num_channels=num_channels, + filter_size=1, + num_filters=num_filters, + stride=1) + + def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(nn.Layer): + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D( + in_channels=channel, + out_channels=channel // reduction, + kernel_size=1, 
+ stride=1, + padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D( + in_channels=channel // reduction, + out_channels=channel, + kernel_size=1, + stride=1, + padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(nn.Layer): + """ + PP-LCNet, see https://arxiv.org/abs/2109.15099. + This code is different from PPLCNet in ppdet/modeling/backbones/lcnet.py + or in PaddleClas, because the output is the flatten feature of last_conv. + + Args: + scale (float): Scale ratio of channels. + class_expand (int): Number of channels of conv feature. + """ + + def __init__(self, scale=1.0, class_expand=1280): + super(PPLCNet, self).__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer( + num_channels=3, + filter_size=3, + num_filters=make_divisible(16 * scale), + stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable( + num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) + for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + self.last_conv = Conv2D( + in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + self.hardswish = nn.Hardswish() + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.flatten(x) + return x + + +class FC(nn.Layer): + def __init__(self, input_ch, output_ch): + super(FC, self).__init__() + weight_attr = ParamAttr(initializer=XavierNormal()) + self.fc = paddle.nn.Linear(input_ch, output_ch, weight_attr=weight_attr) + + def forward(self, x): + out = self.fc(x) + return out + + +@register +class PPLCNetEmbedding(nn.Layer): + """ + PPLCNet Embedding + + Args: + input_ch (int): Number of channels of input conv feature. + output_ch (int): Number of channels of output conv feature. 
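A hedged smoke test of the embedding model defined here (the input crop size is an assumption; since the backbone ends in adaptive average pooling, any size that survives the 32x downsampling works):

```python
import paddle
from ppdet.modeling.reid.pplcnet_embedding import PPLCNetEmbedding

# With the default scale=2.5, the last conv widens to 1280 channels,
# matching input_ch of the FC neck.
model = PPLCNetEmbedding(scale=2.5, input_ch=1280, output_ch=512)
x = paddle.randn([4, 3, 192, 64])   # invented ReID crop size
emb = model(x)                      # shape [4, 512]
```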
+ """ + def __init__(self, scale=2.5, input_ch=1280, output_ch=512): + super(PPLCNetEmbedding, self).__init__() + self.backbone = PPLCNet(scale=scale) + self.neck = FC(input_ch, output_ch) + + def forward(self, x): + feat = self.backbone(x) + feat_out = self.neck(feat) + return feat_out diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/pyramidal_embedding.py b/PaddleDetection-release-2.6/ppdet/modeling/reid/pyramidal_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..6b2a76d60ca09f647abfb3b5edad1f3c60fb8373 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/reid/pyramidal_embedding.py @@ -0,0 +1,146 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal, Constant +from paddle import ParamAttr +from .resnet import ResNet50, ResNet101 +from ppdet.core.workspace import register + +__all__ = ['PCBPyramid'] + + +@register +class PCBPyramid(nn.Layer): + """ + PCB (Part-based Convolutional Baseline), see https://arxiv.org/abs/1711.09349, + Pyramidal Person Re-IDentification, see https://arxiv.org/abs/1810.12193 + + Args: + input_ch (int): Number of channels of the input feature. + num_stripes (int): Number of sub-parts. + used_levels (tuple): Whether the level is used, 1 means used. + num_classes (int): Number of classes for identities, default 751 in + Market-1501 dataset. + last_conv_stride (int): Stride of the last conv. + last_conv_dilation (int): Dilation of the last conv. + num_conv_out_channels (int): Number of channels of conv feature. 
+ """ + + def __init__(self, + input_ch=2048, + model_name='ResNet101', + num_stripes=6, + used_levels=(1, 1, 1, 1, 1, 1), + num_classes=751, + last_conv_stride=1, + last_conv_dilation=1, + num_conv_out_channels=128): + super(PCBPyramid, self).__init__() + self.num_stripes = num_stripes + self.used_levels = used_levels + self.num_classes = num_classes + + self.num_in_each_level = [i for i in range(self.num_stripes, 0, -1)] + self.num_branches = sum(self.num_in_each_level) + + assert model_name in ['ResNet50', 'ResNet101'], "Unsupported ReID arch: {}".format(model_name) + self.base = eval(model_name)( + lr_mult=0.1, + last_conv_stride=last_conv_stride, + last_conv_dilation=last_conv_dilation) + self.dropout_layer = nn.Dropout(p=0.2) + self.pyramid_conv_list0, self.pyramid_fc_list0 = self.basic_branch( + num_conv_out_channels, input_ch) + + def basic_branch(self, num_conv_out_channels, input_ch): + # the level indexes are defined from fine to coarse, + # the branch will contain one more part than that of its previous level + # the sliding step is set to 1 + pyramid_conv_list = nn.LayerList() + pyramid_fc_list = nn.LayerList() + + idx_levels = 0 + for idx_branches in range(self.num_branches): + if idx_branches >= sum(self.num_in_each_level[0:idx_levels + 1]): + idx_levels += 1 + + pyramid_conv_list.append( + nn.Sequential( + nn.Conv2D(input_ch, num_conv_out_channels, 1), + nn.BatchNorm2D(num_conv_out_channels), nn.ReLU())) + + idx_levels = 0 + for idx_branches in range(self.num_branches): + if idx_branches >= sum(self.num_in_each_level[0:idx_levels + 1]): + idx_levels += 1 + + fc = nn.Linear( + in_features=num_conv_out_channels, + out_features=self.num_classes, + weight_attr=ParamAttr(initializer=Normal( + mean=0., std=0.001)), + bias_attr=ParamAttr(initializer=Constant(value=0.))) + pyramid_fc_list.append(fc) + return pyramid_conv_list, pyramid_fc_list + + def pyramid_forward(self, feat): + each_stripe_size = int(feat.shape[2] / self.num_stripes) + + feat_list, logits_list = [], [] + idx_levels = 0 + used_branches = 0 + for idx_branches in range(self.num_branches): + if idx_branches >= sum(self.num_in_each_level[0:idx_levels + 1]): + idx_levels += 1 + idx_in_each_level = idx_branches - sum(self.num_in_each_level[ + 0:idx_levels]) + stripe_size_in_each_level = each_stripe_size * (idx_levels + 1) + start = idx_in_each_level * each_stripe_size + end = start + stripe_size_in_each_level + + k = feat.shape[-1] + local_feat_avgpool = F.avg_pool2d( + feat[:, :, start:end, :], + kernel_size=(stripe_size_in_each_level, k)) + local_feat_maxpool = F.max_pool2d( + feat[:, :, start:end, :], + kernel_size=(stripe_size_in_each_level, k)) + local_feat = local_feat_avgpool + local_feat_maxpool + + local_feat = self.pyramid_conv_list0[used_branches](local_feat) + local_feat = paddle.reshape( + local_feat, shape=[local_feat.shape[0], -1]) + feat_list.append(local_feat) + + local_logits = self.pyramid_fc_list0[used_branches]( + self.dropout_layer(local_feat)) + logits_list.append(local_logits) + + used_branches += 1 + + return feat_list, logits_list + + def forward(self, x): + feat = self.base(x) + assert feat.shape[2] % self.num_stripes == 0 + feat_list, logits_list = self.pyramid_forward(feat) + feat_out = paddle.concat(feat_list, axis=-1) + return feat_out diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/resnet.py b/PaddleDetection-release-2.6/ppdet/modeling/reid/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..2e2a85558d69cecb307df1f1098ec0bdd70a93e2 --- /dev/null 
+++ b/PaddleDetection-release-2.6/ppdet/modeling/reid/resnet.py @@ -0,0 +1,312 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import math +import paddle +from paddle import ParamAttr +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Normal + +__all__ = ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"] + + +class ConvBNLayer(nn.Layer): + def __init__(self, + num_channels, + num_filters, + filter_size, + stride=1, + dilation=1, + groups=1, + act=None, + lr_mult=1.0, + name=None, + data_format="NCHW"): + super(ConvBNLayer, self).__init__() + conv_stdv = filter_size * filter_size * num_filters + self._conv = nn.Conv2D( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + dilation=dilation, + groups=groups, + weight_attr=ParamAttr( + learning_rate=lr_mult, + initializer=Normal(0, math.sqrt(2. / conv_stdv))), + bias_attr=False, + data_format=data_format) + + self._batch_norm = nn.BatchNorm2D(num_filters) + self.act = act + + def forward(self, inputs): + y = self._conv(inputs) + y = self._batch_norm(y) + if self.act: + y = getattr(F, self.act)(y) + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels, + num_filters, + stride, + shortcut=True, + name=None, + lr_mult=1.0, + dilation=1, + data_format="NCHW"): + super(BottleneckBlock, self).__init__() + self.conv0 = ConvBNLayer( + num_channels=num_channels, + num_filters=num_filters, + filter_size=1, + dilation=dilation, + act="relu", + lr_mult=lr_mult, + name=name + "_branch2a", + data_format=data_format) + self.conv1 = ConvBNLayer( + num_channels=num_filters, + num_filters=num_filters, + filter_size=3, + dilation=dilation, + stride=stride, + act="relu", + lr_mult=lr_mult, + name=name + "_branch2b", + data_format=data_format) + self.conv2 = ConvBNLayer( + num_channels=num_filters, + num_filters=num_filters * 4, + filter_size=1, + dilation=dilation, + act=None, + lr_mult=lr_mult, + name=name + "_branch2c", + data_format=data_format) + if not shortcut: + self.short = ConvBNLayer( + num_channels=num_channels, + num_filters=num_filters * 4, + filter_size=1, + dilation=dilation, + stride=stride, + lr_mult=lr_mult, + name=name + "_branch1", + data_format=data_format) + self.shortcut = shortcut + self._num_channels_out = num_filters * 4 + + def forward(self, inputs): + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels, + num_filters, + stride, + shortcut=True, + name=None, + data_format="NCHW"): + super(BasicBlock, self).__init__() + self.stride = stride + self.conv0 = 
ConvBNLayer( + num_channels=num_channels, + num_filters=num_filters, + filter_size=3, + stride=stride, + act="relu", + name=name + "_branch2a", + data_format=data_format) + self.conv1 = ConvBNLayer( + num_channels=num_filters, + num_filters=num_filters, + filter_size=3, + act=None, + name=name + "_branch2b", + data_format=data_format) + if not shortcut: + self.short = ConvBNLayer( + num_channels=num_channels, + num_filters=num_filters, + filter_size=1, + stride=stride, + name=name + "_branch1", + data_format=data_format) + self.shortcut = shortcut + + def forward(self, inputs): + y = self.conv0(inputs) + conv1 = self.conv1(y) + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = paddle.add(x=short, y=conv1) + y = F.relu(y) + return y + + +class ResNet(nn.Layer): + def __init__(self, + layers=50, + lr_mult=1.0, + last_conv_stride=2, + last_conv_dilation=1): + super(ResNet, self).__init__() + self.layers = layers + self.data_format = "NCHW" + self.input_image_channel = 3 + supported_layers = [18, 34, 50, 101, 152] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + num_channels = [64, 256, 512, + 1024] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + self.conv = ConvBNLayer( + num_channels=self.input_image_channel, + num_filters=64, + filter_size=7, + stride=2, + act="relu", + lr_mult=lr_mult, + name="conv1", + data_format=self.data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=self.data_format) + self.block_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + if i != 0 or block == 0: + stride = 1 + elif block == len(depth) - 1: + stride = last_conv_stride + else: + stride = 2 + bottleneck_block = self.add_sublayer( + conv_name, + BottleneckBlock( + num_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + num_filters=num_filters[block], + stride=stride, + shortcut=shortcut, + name=conv_name, + lr_mult=lr_mult, + dilation=last_conv_dilation + if block == len(depth) - 1 else 1, + data_format=self.data_format)) + self.block_list.append(bottleneck_block) + shortcut = True + else: + for block in range(len(depth)): + shortcut = False + for i in range(depth[block]): + conv_name = "res" + str(block + 2) + chr(97 + i) + basic_block = self.add_sublayer( + conv_name, + BasicBlock( + num_channels=num_channels[block] + if i == 0 else num_filters[block], + num_filters=num_filters[block], + stride=2 if i == 0 and block != 0 else 1, + shortcut=shortcut, + name=conv_name, + data_format=self.data_format)) + self.block_list.append(basic_block) + shortcut = True + + def forward(self, inputs): + y = self.conv(inputs) + y = self.pool2d_max(y) + for block in self.block_list: + y = block(y) + return y + + +def ResNet18(**args): + model = ResNet(layers=18, **args) + return model + + +def ResNet34(**args): + model = ResNet(layers=34, **args) + return model + + +def ResNet50(pretrained=None, **args): + model = ResNet(layers=50, **args) + if pretrained is not 
None:
+        if not (os.path.isdir(pretrained) or
+                os.path.exists(pretrained + '.pdparams')):
+            raise ValueError("Model pretrain path {} does not "
+                             "exist.".format(pretrained))
+        param_state_dict = paddle.load(pretrained + '.pdparams')
+        model.set_dict(param_state_dict)
+    return model
+
+
+def ResNet101(pretrained=None, **args):
+    model = ResNet(layers=101, **args)
+    if pretrained is not None:
+        if not (os.path.isdir(pretrained) or
+                os.path.exists(pretrained + '.pdparams')):
+            raise ValueError("Model pretrain path {} does not "
+                             "exist.".format(pretrained))
+        param_state_dict = paddle.load(pretrained + '.pdparams')
+        model.set_dict(param_state_dict)
+    return model
+
+
+def ResNet152(**args):
+    model = ResNet(layers=152, **args)
+    return model
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/reid/resnet_embedding.py b/PaddleDetection-release-2.6/ppdet/modeling/reid/resnet_embedding.py
new file mode 100644
index 0000000000000000000000000000000000000000..28c11eb40d6360adbef434679a386d1784addb36
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/reid/resnet_embedding.py
@@ -0,0 +1,41 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import paddle
+import paddle.nn.functional as F
+from paddle import nn
+from .resnet import ResNet50, ResNet101
+from ppdet.core.workspace import register
+
+__all__ = ['ResNetEmbedding']
+
+
+@register
+class ResNetEmbedding(nn.Layer):
+    in_planes = 2048
+    def __init__(self, model_name='ResNet50', last_stride=1):
+        super(ResNetEmbedding, self).__init__()
+        assert model_name in ['ResNet50', 'ResNet101'], "Unsupported ReID arch: {}".format(model_name)
+        self.base = eval(model_name)(last_conv_stride=last_stride)
+        self.gap = nn.AdaptiveAvgPool2D(output_size=1)
+        self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+        self.bn = nn.BatchNorm1D(self.in_planes, bias_attr=False)
+
+    def forward(self, x):
+        base_out = self.base(x)
+        global_feat = self.gap(base_out)
+        global_feat = self.flatten(global_feat)
+        global_feat = self.bn(global_feat)
+        return global_feat
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/shape_spec.py b/PaddleDetection-release-2.6/ppdet/modeling/shape_spec.py
new file mode 100644
index 0000000000000000000000000000000000000000..81601fd64cbaedcee57389cfa71ad8c04e97274c
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/shape_spec.py
@@ -0,0 +1,25 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. + +# The code is based on: +# https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/shape_spec.py + +from collections import namedtuple + + +class ShapeSpec( + namedtuple("_ShapeSpec", ["channels", "height", "width", "stride"])): + def __new__(cls, channels=None, height=None, width=None, stride=None): + return super(ShapeSpec, cls).__new__(cls, channels, height, width, + stride) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/ssod/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e7588577e943fcac4bbe1f6ea8e1dd17c4ca8362 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__init__.py @@ -0,0 +1,19 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import utils +from . import losses + +from .utils import * +from .losses import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9630a21e91be6e19bdc3b64e09c61de7a7f33d7e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/losses.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/losses.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..93689e29b62975cc3ea1a2c5cdc69696b3b7a312 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/losses.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d288d4b6bffff4a8b8dcffc4486f57d997dcada8 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/ssod/__pycache__/utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/ssod/losses.py b/PaddleDetection-release-2.6/ppdet/modeling/ssod/losses.py new file mode 100644 index 0000000000000000000000000000000000000000..e4c5038d4b4c6657f8351ccaa3238d639b53d3f9 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/ssod/losses.py @@ -0,0 +1,236 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ppdet.modeling.losses.iou_loss import GIoULoss +from .utils import QFLv2 + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = [ + 'SSODFCOSLoss', + 'SSODPPYOLOELoss', +] + + +@register +class SSODFCOSLoss(nn.Layer): + def __init__(self, loss_weight=1.0): + super(SSODFCOSLoss, self).__init__() + self.loss_weight = loss_weight + + def forward(self, student_head_outs, teacher_head_outs, train_cfg): + # for semi-det distill + student_logits, student_deltas, student_quality = student_head_outs + teacher_logits, teacher_deltas, teacher_quality = teacher_head_outs + nc = student_logits[0].shape[1] + + student_logits = paddle.concat( + [ + _.transpose([0, 2, 3, 1]).reshape([-1, nc]) + for _ in student_logits + ], + axis=0) + teacher_logits = paddle.concat( + [ + _.transpose([0, 2, 3, 1]).reshape([-1, nc]) + for _ in teacher_logits + ], + axis=0) + + student_deltas = paddle.concat( + [ + _.transpose([0, 2, 3, 1]).reshape([-1, 4]) + for _ in student_deltas + ], + axis=0) + teacher_deltas = paddle.concat( + [ + _.transpose([0, 2, 3, 1]).reshape([-1, 4]) + for _ in teacher_deltas + ], + axis=0) + + student_quality = paddle.concat( + [ + _.transpose([0, 2, 3, 1]).reshape([-1, 1]) + for _ in student_quality + ], + axis=0) + teacher_quality = paddle.concat( + [ + _.transpose([0, 2, 3, 1]).reshape([-1, 1]) + for _ in teacher_quality + ], + axis=0) + + ratio = train_cfg.get('ratio', 0.01) + with paddle.no_grad(): + # Region Selection + count_num = int(teacher_logits.shape[0] * ratio) + teacher_probs = F.sigmoid(teacher_logits) + max_vals = paddle.max(teacher_probs, 1) + sorted_vals, sorted_inds = paddle.topk(max_vals, + teacher_logits.shape[0]) + mask = paddle.zeros_like(max_vals) + mask[sorted_inds[:count_num]] = 1. 
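+            # Region selection: only the top `ratio` fraction of anchors,
+            # ranked by the teacher's maximum class probability, get weight 1.
+            # Positions left at weight 0 still contribute to QFLv2 through its
+            # zero-label background term, and fg_num below (the sum of the
+            # selected teacher scores) normalizes the classification loss.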
+ fg_num = sorted_vals[:count_num].sum() + b_mask = mask > 0 + + # distill_loss_cls + loss_logits = QFLv2( + F.sigmoid(student_logits), + teacher_probs, + weight=mask, + reduction="sum") / fg_num + + # distill_loss_box + inputs = paddle.concat( + (-student_deltas[b_mask][..., :2], student_deltas[b_mask][..., 2:]), + axis=-1) + targets = paddle.concat( + (-teacher_deltas[b_mask][..., :2], teacher_deltas[b_mask][..., 2:]), + axis=-1) + iou_loss = GIoULoss(reduction='mean') + loss_deltas = iou_loss(inputs, targets) + + # distill_loss_quality + loss_quality = F.binary_cross_entropy( + F.sigmoid(student_quality[b_mask]), + F.sigmoid(teacher_quality[b_mask]), + reduction='mean') + + return { + "distill_loss_cls": loss_logits, + "distill_loss_box": loss_deltas, + "distill_loss_quality": loss_quality, + "fg_sum": fg_num, + } + + +@register +class SSODPPYOLOELoss(nn.Layer): + def __init__(self, loss_weight=1.0): + super(SSODPPYOLOELoss, self).__init__() + self.loss_weight = loss_weight + + def forward(self, student_head_outs, teacher_head_outs, train_cfg): + # for semi-det distill + # student_probs: already sigmoid + student_probs, student_deltas, student_dfl = student_head_outs + teacher_probs, teacher_deltas, teacher_dfl = teacher_head_outs + bs, l, nc = student_probs.shape[:] # bs, l, num_classes + bs, l, _, reg_ch = student_dfl.shape[:] # bs, l, 4, reg_ch + student_probs = student_probs.reshape([-1, nc]) + teacher_probs = teacher_probs.reshape([-1, nc]) + student_deltas = student_deltas.reshape([-1, 4]) + teacher_deltas = teacher_deltas.reshape([-1, 4]) + student_dfl = student_dfl.reshape([-1, 4, reg_ch]) + teacher_dfl = teacher_dfl.reshape([-1, 4, reg_ch]) + + ratio = train_cfg.get('ratio', 0.01) + + # for contrast loss + curr_iter = train_cfg['curr_iter'] + st_iter = train_cfg['st_iter'] + if curr_iter == st_iter + 1: + # start semi-det training + self.queue_ptr = 0 + self.queue_size = int(bs * l * ratio) + self.queue_feats = paddle.zeros([self.queue_size, nc]) + self.queue_probs = paddle.zeros([self.queue_size, nc]) + contrast_loss_cfg = train_cfg['contrast_loss'] + temperature = contrast_loss_cfg.get('temperature', 0.2) + alpha = contrast_loss_cfg.get('alpha', 0.9) + smooth_iter = contrast_loss_cfg.get('smooth_iter', 100) + st_iter + + with paddle.no_grad(): + # Region Selection + count_num = int(teacher_probs.shape[0] * ratio) + max_vals = paddle.max(teacher_probs, 1) + sorted_vals, sorted_inds = paddle.topk(max_vals, + teacher_probs.shape[0]) + mask = paddle.zeros_like(max_vals) + mask[sorted_inds[:count_num]] = 1. + fg_num = sorted_vals[:count_num].sum() + b_mask = mask > 0. 
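+            # Same top-`ratio` region selection as in SSODFCOSLoss above;
+            # b_mask gates the GIoU, DFL and contrastive terms, while the
+            # QFLv2 classification term keeps all anchors and weights them
+            # with `mask`.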
+ + # for contrast loss + probs = teacher_probs[b_mask].detach() + if curr_iter > smooth_iter: # memory-smoothing + A = paddle.exp( + paddle.mm(teacher_probs[b_mask], self.queue_probs.t()) / + temperature) + A = A / A.sum(1, keepdim=True) + probs = alpha * probs + (1 - alpha) * paddle.mm( + A, self.queue_probs) + n = student_probs[b_mask].shape[0] + # update memory bank + self.queue_feats[self.queue_ptr:self.queue_ptr + + n, :] = teacher_probs[b_mask].detach() + self.queue_probs[self.queue_ptr:self.queue_ptr + + n, :] = teacher_probs[b_mask].detach() + self.queue_ptr = (self.queue_ptr + n) % self.queue_size + + # embedding similarity + sim = paddle.exp( + paddle.mm(student_probs[b_mask], teacher_probs[b_mask].t()) / 0.2) + sim_probs = sim / sim.sum(1, keepdim=True) + # pseudo-label graph with self-loop + Q = paddle.mm(probs, probs.t()) + Q.fill_diagonal_(1) + pos_mask = (Q >= 0.5).astype('float32') + Q = Q * pos_mask + Q = Q / Q.sum(1, keepdim=True) + # contrastive loss + loss_contrast = -(paddle.log(sim_probs + 1e-7) * Q).sum(1) + loss_contrast = loss_contrast.mean() + + # distill_loss_cls + loss_cls = QFLv2( + student_probs, teacher_probs, weight=mask, reduction="sum") / fg_num + + # distill_loss_iou + inputs = paddle.concat( + (-student_deltas[b_mask][..., :2], student_deltas[b_mask][..., 2:]), + -1) + targets = paddle.concat( + (-teacher_deltas[b_mask][..., :2], teacher_deltas[b_mask][..., 2:]), + -1) + iou_loss = GIoULoss(reduction='mean') + loss_iou = iou_loss(inputs, targets) + + # distill_loss_dfl + loss_dfl = F.cross_entropy( + student_dfl[b_mask].reshape([-1, reg_ch]), + teacher_dfl[b_mask].reshape([-1, reg_ch]), + soft_label=True, + reduction='mean') + + return { + "distill_loss_cls": loss_cls, + "distill_loss_iou": loss_iou, + "distill_loss_dfl": loss_dfl, + "distill_loss_contrast": loss_contrast, + "fg_sum": fg_num, + } diff --git a/PaddleDetection-release-2.6/ppdet/modeling/ssod/utils.py b/PaddleDetection-release-2.6/ppdet/modeling/ssod/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..09753abfeddd4a017cb64ec8560ad0da1e585708 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/ssod/utils.py @@ -0,0 +1,82 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
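+
+# QFLv2 below is a Quality Focal Loss variant used by the semi-supervised
+# distillation losses in losses.py. A minimal, illustrative calling sketch
+# (the shapes and the 1% selection ratio here are made up for the example,
+# not taken from any shipped config):
+#
+#     import paddle
+#     import paddle.nn.functional as F
+#     from ppdet.modeling.ssod.utils import QFLv2
+#
+#     student_logits = paddle.randn([1000, 80])
+#     teacher_probs = F.sigmoid(paddle.randn([1000, 80]))
+#     max_vals = paddle.max(teacher_probs, axis=1)
+#     _, inds = paddle.topk(max_vals, k=1000)
+#     weight = paddle.zeros_like(max_vals)
+#     weight[inds[:10]] = 1.  # top 1% of anchors form the distilled region
+#     loss = QFLv2(F.sigmoid(student_logits), teacher_probs,
+#                  weight=weight, reduction='sum') / 10.
+#
+# Note that `weight` is effectively required despite its `None` default:
+# `pos = weight > 0` would fail if it were omitted.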
+ +import paddle +import paddle.nn.functional as F + + +def align_weak_strong_shape(data_weak, data_strong): + max_shape_x = max(data_strong['image'].shape[2], + data_weak['image'].shape[2]) + max_shape_y = max(data_strong['image'].shape[3], + data_weak['image'].shape[3]) + + scale_x_s = max_shape_x / data_strong['image'].shape[2] + scale_y_s = max_shape_y / data_strong['image'].shape[3] + scale_x_w = max_shape_x / data_weak['image'].shape[2] + scale_y_w = max_shape_y / data_weak['image'].shape[3] + target_size = [max_shape_x, max_shape_y] + + if scale_x_s != 1 or scale_y_s != 1: + data_strong['image'] = F.interpolate( + data_strong['image'], + size=target_size, + mode='bilinear', + align_corners=False) + if 'gt_bbox' in data_strong: + gt_bboxes = data_strong['gt_bbox'].numpy() + for i in range(len(gt_bboxes)): + if len(gt_bboxes[i]) > 0: + gt_bboxes[i][:, 0::2] = gt_bboxes[i][:, 0::2] * scale_x_s + gt_bboxes[i][:, 1::2] = gt_bboxes[i][:, 1::2] * scale_y_s + data_strong['gt_bbox'] = paddle.to_tensor(gt_bboxes) + + if scale_x_w != 1 or scale_y_w != 1: + data_weak['image'] = F.interpolate( + data_weak['image'], + size=target_size, + mode='bilinear', + align_corners=False) + if 'gt_bbox' in data_weak: + gt_bboxes = data_weak['gt_bbox'].numpy() + for i in range(len(gt_bboxes)): + if len(gt_bboxes[i]) > 0: + gt_bboxes[i][:, 0::2] = gt_bboxes[i][:, 0::2] * scale_x_w + gt_bboxes[i][:, 1::2] = gt_bboxes[i][:, 1::2] * scale_y_w + data_weak['gt_bbox'] = paddle.to_tensor(gt_bboxes) + return data_weak, data_strong + + +def QFLv2(pred_sigmoid, + teacher_sigmoid, + weight=None, + beta=2.0, + reduction='mean'): + pt = pred_sigmoid + zerolabel = paddle.zeros_like(pt) + loss = F.binary_cross_entropy( + pred_sigmoid, zerolabel, reduction='none') * pt.pow(beta) + pos = weight > 0 + + pt = teacher_sigmoid[pos] - pred_sigmoid[pos] + loss[pos] = F.binary_cross_entropy( + pred_sigmoid[pos], teacher_sigmoid[pos], + reduction='none') * pt.pow(beta) + + valid = weight >= 0 + if reduction == "mean": + loss = loss[valid].mean() + elif reduction == "sum": + loss = loss[valid].sum() + return loss diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/tests/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..847ddc47ac89114f2012bc6b9990a69abfe39fb3 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/tests/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/imgs/coco2017_val2017_000000000139.jpg b/PaddleDetection-release-2.6/ppdet/modeling/tests/imgs/coco2017_val2017_000000000139.jpg new file mode 100644 index 0000000000000000000000000000000000000000..19023f718333c56c70776c79201dc03d742c1ed3 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/tests/imgs/coco2017_val2017_000000000139.jpg differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/imgs/coco2017_val2017_000000000724.jpg b/PaddleDetection-release-2.6/ppdet/modeling/tests/imgs/coco2017_val2017_000000000724.jpg new file mode 100644 index 0000000000000000000000000000000000000000..2a17e0c6ee400dcba762c4d56dea03d7e124b9c5 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/tests/imgs/coco2017_val2017_000000000724.jpg differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/test_architectures.py b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_architectures.py new file mode 100644 index 0000000000000000000000000000000000000000..5de79b2cedb3fffac0ce853406560821a9142363 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_architectures.py @@ -0,0 +1,69 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
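+
+# Each test class below only swaps in a different config file; building
+# Trainer(cfg, mode='test') assembles the full architecture without loading
+# any dataset, so these act as cheap model-construction smoke tests.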
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import unittest +import ppdet + + +class TestFasterRCNN(unittest.TestCase): + def setUp(self): + self.set_config() + + def set_config(self): + self.cfg_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml' + + def test_trainer(self): + # Trainer __init__ will build model and DataLoader + # 'train' and 'eval' mode include dataset loading + # use 'test' mode to simplify tests + cfg = ppdet.core.workspace.load_config(self.cfg_file) + trainer = ppdet.engine.Trainer(cfg, mode='test') + + +class TestMaskRCNN(TestFasterRCNN): + def set_config(self): + self.cfg_file = 'configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml' + + +class TestCascadeRCNN(TestFasterRCNN): + def set_config(self): + self.cfg_file = 'configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml' + + +class TestYolov3(TestFasterRCNN): + def set_config(self): + self.cfg_file = 'configs/yolov3/yolov3_darknet53_270e_coco.yml' + + +class TestSSD(TestFasterRCNN): + def set_config(self): + self.cfg_file = 'configs/ssd/ssd_vgg16_300_240e_voc.yml' + + +class TestGFL(TestFasterRCNN): + def set_config(self): + self.cfg_file = 'configs/gfl/gfl_r50_fpn_1x_coco.yml' + + +class TestPicoDet(TestFasterRCNN): + def set_config(self): + self.cfg_file = 'configs/picodet/picodet_s_320_coco_lcnet.yml' + + +if __name__ == '__main__': + unittest.main() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/test_base.py b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_base.py new file mode 100644 index 0000000000000000000000000000000000000000..451aa78e32ce0682f55a2ab0f9d1ea03e939e481 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_base.py @@ -0,0 +1,70 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
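+
+# LayerTest provides paired static_graph()/dynamic_graph() context managers
+# seeded with the same RNG state, so the same op can be run in both modes
+# and its outputs compared elementwise (see test_ops.py for usage).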
+ +from __future__ import print_function +import unittest + +import contextlib + +import paddle +from paddle.static import Program + + +class LayerTest(unittest.TestCase): + @classmethod + def setUpClass(cls): + cls.seed = 111 + + @classmethod + def tearDownClass(cls): + pass + + def _get_place(self, force_to_use_cpu=False): + # this option for ops that only have cpu kernel + if force_to_use_cpu: + return 'cpu' + else: + return paddle.device.get_device() + + @contextlib.contextmanager + def static_graph(self): + paddle.enable_static() + scope = paddle.static.Scope() + program = Program() + with paddle.static.scope_guard(scope): + with paddle.static.program_guard(program): + paddle.seed(self.seed) + paddle.framework.random._manual_program_seed(self.seed) + yield + + def get_static_graph_result(self, + feed, + fetch_list, + with_lod=False, + force_to_use_cpu=False): + exe = paddle.static.Executor(self._get_place(force_to_use_cpu)) + exe.run(paddle.static.default_startup_program()) + return exe.run(paddle.static.default_main_program(), + feed=feed, + fetch_list=fetch_list, + return_numpy=(not with_lod)) + + @contextlib.contextmanager + def dynamic_graph(self, force_to_use_cpu=False): + paddle.disable_static() + place = self._get_place(force_to_use_cpu=force_to_use_cpu) + paddle.device.set_device(place) + paddle.seed(self.seed) + paddle.framework.random._manual_program_seed(self.seed) + yield diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/test_mstest.py b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_mstest.py new file mode 100644 index 0000000000000000000000000000000000000000..a5b75110afd326f6bff10acdc85eb4b0461ba910 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_mstest.py @@ -0,0 +1,62 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
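+
+# Both tests below pull pretrained weights from paddledet.bj.bcebos.com at
+# runtime, so they need network access; test_eval_mstest additionally
+# expects the COCO eval dataset configured in the yml to be available.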
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import unittest +from ppdet.core.workspace import load_config +from ppdet.engine import Trainer + + +class TestMultiScaleInference(unittest.TestCase): + def setUp(self): + self.set_config() + + def set_config(self): + self.mstest_cfg_file = 'configs/faster_rcnn/faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml' + + # test evaluation with multi scale test + def test_eval_mstest(self): + cfg = load_config(self.mstest_cfg_file) + trainer = Trainer(cfg, mode='eval') + + cfg.weights = 'https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_1x_coco.pdparams' + trainer.load_weights(cfg.weights) + + trainer.evaluate() + + # test inference with multi scale test + def test_infer_mstest(self): + cfg = load_config(self.mstest_cfg_file) + trainer = Trainer(cfg, mode='test') + + cfg.weights = 'https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_1x_coco.pdparams' + trainer.load_weights(cfg.weights) + tests_img_root = os.path.join(os.path.dirname(__file__), 'imgs') + + # input images to predict + imgs = [ + 'coco2017_val2017_000000000139.jpg', + 'coco2017_val2017_000000000724.jpg' + ] + imgs = [os.path.join(tests_img_root, img) for img in imgs] + trainer.predict( + imgs, draw_threshold=0.5, output_dir='output', save_results=False) + + +if __name__ == '__main__': + unittest.main() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/test_ops.py b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..4827be5a7bc1fe33d5d736a0fb546bd993c7e059 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_ops.py @@ -0,0 +1,456 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import print_function +import os, sys +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +import unittest +import numpy as np + +import paddle + +import ppdet.modeling.ops as ops +from ppdet.modeling.tests.test_base import LayerTest + + +def make_rois(h, w, rois_num, output_size): + rois = np.zeros((0, 4)).astype('float32') + for roi_num in rois_num: + roi = np.zeros((roi_num, 4)).astype('float32') + roi[:, 0] = np.random.randint(0, h - output_size[0], size=roi_num) + roi[:, 1] = np.random.randint(0, w - output_size[1], size=roi_num) + roi[:, 2] = np.random.randint(roi[:, 0] + output_size[0], h) + roi[:, 3] = np.random.randint(roi[:, 1] + output_size[1], w) + rois = np.vstack((rois, roi)) + return rois + + +def softmax(x): + # clip to shiftx, otherwise, when calc loss with + # log(exp(shiftx)), may get log(0)=INF + shiftx = (x - np.max(x)).clip(-64.) 
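+    # exp(-64) is about 1.6e-28, far above float32's smallest normal value,
+    # so the clip keeps log(exp(shiftx)) finite while barely perturbing the
+    # resulting distribution.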
+ exps = np.exp(shiftx) + return exps / np.sum(exps) + + +class TestROIAlign(LayerTest): + def test_roi_align(self): + b, c, h, w = 2, 12, 20, 20 + inputs_np = np.random.rand(b, c, h, w).astype('float32') + rois_num = [4, 6] + output_size = (7, 7) + rois_np = make_rois(h, w, rois_num, output_size) + rois_num_np = np.array(rois_num).astype('int32') + with self.static_graph(): + inputs = paddle.static.data( + name='inputs', shape=[b, c, h, w], dtype='float32') + rois = paddle.static.data( + name='rois', shape=[10, 4], dtype='float32') + rois_num = paddle.static.data( + name='rois_num', shape=[None], dtype='int32') + + output = paddle.vision.ops.roi_align( + x=inputs, + boxes=rois, + boxes_num=rois_num, + output_size=output_size) + output_np, = self.get_static_graph_result( + feed={ + 'inputs': inputs_np, + 'rois': rois_np, + 'rois_num': rois_num_np + }, + fetch_list=output, + with_lod=False) + + with self.dynamic_graph(): + inputs_dy = paddle.to_tensor(inputs_np) + rois_dy = paddle.to_tensor(rois_np) + rois_num_dy = paddle.to_tensor(rois_num_np) + + output_dy = paddle.vision.ops.roi_align( + x=inputs_dy, + boxes=rois_dy, + boxes_num=rois_num_dy, + output_size=output_size) + output_dy_np = output_dy.numpy() + + self.assertTrue(np.array_equal(output_np, output_dy_np)) + + def test_roi_align_error(self): + with self.static_graph(): + inputs = paddle.static.data( + name='inputs', shape=[2, 12, 20, 20], dtype='float32') + rois = paddle.static.data( + name='data_error', shape=[10, 4], dtype='int32', lod_level=1) + self.assertRaises( + TypeError, + paddle.vision.ops.roi_align, + input=inputs, + rois=rois, + output_size=(7, 7)) + + paddle.disable_static() + + +class TestROIPool(LayerTest): + def test_roi_pool(self): + b, c, h, w = 2, 12, 20, 20 + inputs_np = np.random.rand(b, c, h, w).astype('float32') + rois_num = [4, 6] + output_size = (7, 7) + rois_np = make_rois(h, w, rois_num, output_size) + rois_num_np = np.array(rois_num).astype('int32') + with self.static_graph(): + inputs = paddle.static.data( + name='inputs', shape=[b, c, h, w], dtype='float32') + rois = paddle.static.data( + name='rois', shape=[10, 4], dtype='float32') + rois_num = paddle.static.data( + name='rois_num', shape=[None], dtype='int32') + + output = paddle.vision.ops.roi_pool( + x=inputs, + boxes=rois, + boxes_num=rois_num, + output_size=output_size) + output_np, = self.get_static_graph_result( + feed={ + 'inputs': inputs_np, + 'rois': rois_np, + 'rois_num': rois_num_np + }, + fetch_list=[output], + with_lod=False) + + with self.dynamic_graph(): + inputs_dy = paddle.to_tensor(inputs_np) + rois_dy = paddle.to_tensor(rois_np) + rois_num_dy = paddle.to_tensor(rois_num_np) + + output_dy = paddle.vision.ops.roi_pool( + x=inputs_dy, + boxes=rois_dy, + boxes_num=rois_num_dy, + output_size=output_size) + output_dy_np = output_dy.numpy() + + self.assertTrue(np.array_equal(output_np, output_dy_np)) + + def test_roi_pool_error(self): + with self.static_graph(): + inputs = paddle.static.data( + name='inputs', shape=[2, 12, 20, 20], dtype='float32') + rois = paddle.static.data( + name='data_error', shape=[10, 4], dtype='int32', lod_level=1) + self.assertRaises( + TypeError, + paddle.vision.ops.roi_pool, + input=inputs, + rois=rois, + output_size=(7, 7)) + + paddle.disable_static() + + +class TestPriorBox(LayerTest): + def test_prior_box(self): + input_np = np.random.rand(2, 10, 32, 32).astype('float32') + image_np = np.random.rand(2, 10, 40, 40).astype('float32') + min_sizes = [2, 4] + with self.static_graph(): + input = 
paddle.static.data( + name='input', shape=[2, 10, 32, 32], dtype='float32') + image = paddle.static.data( + name='image', shape=[2, 10, 40, 40], dtype='float32') + + box, var = ops.prior_box( + input=input, + image=image, + min_sizes=min_sizes, + clip=True, + flip=True) + box_np, var_np = self.get_static_graph_result( + feed={ + 'input': input_np, + 'image': image_np, + }, + fetch_list=[box, var], + with_lod=False) + + with self.dynamic_graph(): + inputs_dy = paddle.to_tensor(input_np) + image_dy = paddle.to_tensor(image_np) + + box_dy, var_dy = ops.prior_box( + input=inputs_dy, + image=image_dy, + min_sizes=min_sizes, + clip=True, + flip=True) + box_dy_np = box_dy.numpy() + var_dy_np = var_dy.numpy() + + self.assertTrue(np.array_equal(box_np, box_dy_np)) + self.assertTrue(np.array_equal(var_np, var_dy_np)) + + def test_prior_box_error(self): + with self.static_graph(): + input = paddle.static.data( + name='input', shape=[2, 10, 32, 32], dtype='int32') + image = paddle.static.data( + name='image', shape=[2, 10, 40, 40], dtype='int32') + self.assertRaises( + TypeError, + ops.prior_box, + input=input, + image=image, + min_sizes=[2, 4], + clip=True, + flip=True) + + paddle.disable_static() + + +class TestMulticlassNms(LayerTest): + def test_multiclass_nms(self): + boxes_np = np.random.rand(10, 81, 4).astype('float32') + scores_np = np.random.rand(10, 81).astype('float32') + rois_num_np = np.array([2, 8]).astype('int32') + with self.static_graph(): + boxes = paddle.static.data( + name='bboxes', + shape=[None, 81, 4], + dtype='float32', + lod_level=1) + scores = paddle.static.data( + name='scores', shape=[None, 81], dtype='float32', lod_level=1) + rois_num = paddle.static.data( + name='rois_num', shape=[None], dtype='int32') + + output = ops.multiclass_nms( + bboxes=boxes, + scores=scores, + background_label=0, + score_threshold=0.5, + nms_top_k=400, + nms_threshold=0.3, + keep_top_k=200, + normalized=False, + return_index=True, + rois_num=rois_num) + out_np, index_np, nms_rois_num_np = self.get_static_graph_result( + feed={ + 'bboxes': boxes_np, + 'scores': scores_np, + 'rois_num': rois_num_np + }, + fetch_list=output, + with_lod=True) + out_np = np.array(out_np) + index_np = np.array(index_np) + nms_rois_num_np = np.array(nms_rois_num_np) + + with self.dynamic_graph(): + boxes_dy = paddle.to_tensor(boxes_np) + scores_dy = paddle.to_tensor(scores_np) + rois_num_dy = paddle.to_tensor(rois_num_np) + + out_dy, index_dy, nms_rois_num_dy = ops.multiclass_nms( + bboxes=boxes_dy, + scores=scores_dy, + background_label=0, + score_threshold=0.5, + nms_top_k=400, + nms_threshold=0.3, + keep_top_k=200, + normalized=False, + return_index=True, + rois_num=rois_num_dy) + out_dy_np = out_dy.numpy() + index_dy_np = index_dy.numpy() + nms_rois_num_dy_np = nms_rois_num_dy.numpy() + + self.assertTrue(np.array_equal(out_np, out_dy_np)) + self.assertTrue(np.array_equal(index_np, index_dy_np)) + self.assertTrue(np.array_equal(nms_rois_num_np, nms_rois_num_dy_np)) + + def test_multiclass_nms_error(self): + with self.static_graph(): + boxes = paddle.static.data( + name='bboxes', shape=[81, 4], dtype='float32', lod_level=1) + scores = paddle.static.data( + name='scores', shape=[81], dtype='float32', lod_level=1) + rois_num = paddle.static.data( + name='rois_num', shape=[40, 41], dtype='int32') + self.assertRaises( + TypeError, + ops.multiclass_nms, + boxes=boxes, + scores=scores, + background_label=0, + score_threshold=0.5, + nms_top_k=400, + nms_threshold=0.3, + keep_top_k=200, + normalized=False, + 
return_index=True, + rois_num=rois_num) + + +class TestMatrixNMS(LayerTest): + def test_matrix_nms(self): + N, M, C = 7, 1200, 21 + BOX_SIZE = 4 + nms_top_k = 400 + keep_top_k = 200 + score_threshold = 0.01 + post_threshold = 0. + + scores_np = np.random.random((N * M, C)).astype('float32') + scores_np = np.apply_along_axis(softmax, 1, scores_np) + scores_np = np.reshape(scores_np, (N, M, C)) + scores_np = np.transpose(scores_np, (0, 2, 1)) + + boxes_np = np.random.random((N, M, BOX_SIZE)).astype('float32') + boxes_np[:, :, 0:2] = boxes_np[:, :, 0:2] * 0.5 + boxes_np[:, :, 2:4] = boxes_np[:, :, 2:4] * 0.5 + 0.5 + + with self.static_graph(): + boxes = paddle.static.data( + name='boxes', shape=[N, M, BOX_SIZE], dtype='float32') + scores = paddle.static.data( + name='scores', shape=[N, C, M], dtype='float32') + out, index, _ = ops.matrix_nms( + bboxes=boxes, + scores=scores, + score_threshold=score_threshold, + post_threshold=post_threshold, + nms_top_k=nms_top_k, + keep_top_k=keep_top_k, + return_index=True) + out_np, index_np = self.get_static_graph_result( + feed={'boxes': boxes_np, + 'scores': scores_np}, + fetch_list=[out, index], + with_lod=True) + + with self.dynamic_graph(): + boxes_dy = paddle.to_tensor(boxes_np) + scores_dy = paddle.to_tensor(scores_np) + + out_dy, index_dy, _ = ops.matrix_nms( + bboxes=boxes_dy, + scores=scores_dy, + score_threshold=score_threshold, + post_threshold=post_threshold, + nms_top_k=nms_top_k, + keep_top_k=keep_top_k, + return_index=True) + out_dy_np = out_dy.numpy() + index_dy_np = index_dy.numpy() + + self.assertTrue(np.array_equal(out_np, out_dy_np)) + self.assertTrue(np.array_equal(index_np, index_dy_np)) + + def test_matrix_nms_error(self): + with self.static_graph(): + bboxes = paddle.static.data( + name='bboxes', shape=[7, 1200, 4], dtype='float32') + scores = paddle.static.data( + name='data_error', shape=[7, 21, 1200], dtype='int32') + self.assertRaises( + TypeError, + ops.matrix_nms, + bboxes=bboxes, + scores=scores, + score_threshold=0.01, + post_threshold=0., + nms_top_k=400, + keep_top_k=200, + return_index=True) + + paddle.disable_static() + + +class TestBoxCoder(LayerTest): + def test_box_coder(self): + + prior_box_np = np.random.random((81, 4)).astype('float32') + prior_box_var_np = np.random.random((81, 4)).astype('float32') + target_box_np = np.random.random((20, 81, 4)).astype('float32') + + # static + with self.static_graph(): + prior_box = paddle.static.data( + name='prior_box', shape=[81, 4], dtype='float32') + prior_box_var = paddle.static.data( + name='prior_box_var', shape=[81, 4], dtype='float32') + target_box = paddle.static.data( + name='target_box', shape=[20, 81, 4], dtype='float32') + + boxes = ops.box_coder( + prior_box=prior_box, + prior_box_var=prior_box_var, + target_box=target_box, + code_type="decode_center_size", + box_normalized=False) + + boxes_np, = self.get_static_graph_result( + feed={ + 'prior_box': prior_box_np, + 'prior_box_var': prior_box_var_np, + 'target_box': target_box_np, + }, + fetch_list=[boxes], + with_lod=False) + + # dygraph + with self.dynamic_graph(): + prior_box_dy = paddle.to_tensor(prior_box_np) + prior_box_var_dy = paddle.to_tensor(prior_box_var_np) + target_box_dy = paddle.to_tensor(target_box_np) + + boxes_dy = ops.box_coder( + prior_box=prior_box_dy, + prior_box_var=prior_box_var_dy, + target_box=target_box_dy, + code_type="decode_center_size", + box_normalized=False) + + boxes_dy_np = boxes_dy.numpy() + + self.assertTrue(np.array_equal(boxes_np, boxes_dy_np)) + + def 
test_box_coder_error(self): + with self.static_graph(): + prior_box = paddle.static.data( + name='prior_box', shape=[81, 4], dtype='int32') + prior_box_var = paddle.static.data( + name='prior_box_var', shape=[81, 4], dtype='float32') + target_box = paddle.static.data( + name='target_box', shape=[20, 81, 4], dtype='float32') + + self.assertRaises(TypeError, ops.box_coder, prior_box, + prior_box_var, target_box) + + paddle.disable_static() + + +if __name__ == '__main__': + unittest.main() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/tests/test_yolov3_loss.py b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_yolov3_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..0ad1c8e803ef2c6faedf3aab3049c26949fed003 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/tests/test_yolov3_loss.py @@ -0,0 +1,403 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import division + +import unittest + +import paddle +import paddle.nn.functional as F +# add python path of PaddleDetection to sys.path +import os +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.modeling.losses import YOLOv3Loss +from ppdet.data.transform.op_helper import jaccard_overlap +from ppdet.modeling.bbox_utils import iou_similarity +import numpy as np +np.random.seed(0) + + +def _split_output(output, an_num, num_classes): + """ + Split output feature map to x, y, w, h, objectness, classification + along channel dimension + """ + x = paddle.strided_slice( + output, + axes=[1], + starts=[0], + ends=[output.shape[1]], + strides=[5 + num_classes]) + y = paddle.strided_slice( + output, + axes=[1], + starts=[1], + ends=[output.shape[1]], + strides=[5 + num_classes]) + w = paddle.strided_slice( + output, + axes=[1], + starts=[2], + ends=[output.shape[1]], + strides=[5 + num_classes]) + h = paddle.strided_slice( + output, + axes=[1], + starts=[3], + ends=[output.shape[1]], + strides=[5 + num_classes]) + obj = paddle.strided_slice( + output, + axes=[1], + starts=[4], + ends=[output.shape[1]], + strides=[5 + num_classes]) + clss = [] + stride = output.shape[1] // an_num + for m in range(an_num): + clss.append( + paddle.slice( + output, + axes=[1], + starts=[stride * m + 5], + ends=[stride * m + 5 + num_classes])) + cls = paddle.transpose(paddle.stack(clss, axis=1), perm=[0, 1, 3, 4, 2]) + return (x, y, w, h, obj, cls) + + +def _split_target(target): + """ + split target to x, y, w, h, objectness, classification + along dimension 2 + target is in shape [N, an_num, 6 + class_num, H, W] + """ + tx = target[:, :, 0, :, :] + ty = target[:, :, 1, :, :] + tw = target[:, :, 2, :, :] + th = target[:, :, 3, :, :] + tscale = target[:, :, 4, :, :] + tobj = target[:, :, 5, :, :] + tcls = paddle.transpose(target[:, :, 6:, :, :], perm=[0, 1, 3, 4, 2]) + tcls.stop_gradient = True + return (tx, ty, tw, th, tscale, tobj, tcls) 
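+
+# Layout reminder for the helpers above and below: the raw head output is
+# [N, an_num * (5 + num_classes), H, W], sliced per anchor into x, y, w, h,
+# objectness and class scores, while the assembled target is
+# [N, an_num, 6 + num_classes, H, W], where channel 4 holds the 2 - w*h
+# size-rebalancing scale and channel 5 holds gt_score.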
+
+
+def _calc_obj_loss(output, obj, tobj, gt_box, batch_size, anchors, num_classes,
+                   downsample, ignore_thresh, scale_x_y):
+    # If a predicted bbox overlaps any gt_bbox with IoU over ignore_thresh,
+    # its objectness loss will be ignored; the process is as follows:
+    # 1. get pred bbox, which is same with YOLOv3 infer mode, use yolo_box here
+    # NOTE: img_size is set as 1.0 to get normalized pred bbox
+    bbox, prob = paddle.vision.ops.yolo_box(
+        x=output,
+        img_size=paddle.ones(
+            shape=[batch_size, 2], dtype="int32"),
+        anchors=anchors,
+        class_num=num_classes,
+        conf_thresh=0.,
+        downsample_ratio=downsample,
+        clip_bbox=False,
+        scale_x_y=scale_x_y)
+    # 2. split pred bbox and gt bbox by sample, calculate IoU between pred bbox
+    #    and gt bbox in each sample
+    if batch_size > 1:
+        preds = paddle.split(bbox, batch_size, axis=0)
+        gts = paddle.split(gt_box, batch_size, axis=0)
+    else:
+        preds = [bbox]
+        gts = [gt_box]
+        probs = [prob]
+    ious = []
+    for pred, gt in zip(preds, gts):
+
+        def box_xywh2xyxy(box):
+            x = box[:, 0]
+            y = box[:, 1]
+            w = box[:, 2]
+            h = box[:, 3]
+            return paddle.stack(
+                [
+                    x - w / 2.,
+                    y - h / 2.,
+                    x + w / 2.,
+                    y + h / 2.,
+                ], axis=1)
+
+        pred = paddle.squeeze(pred, axis=[0])
+        gt = box_xywh2xyxy(paddle.squeeze(gt, axis=[0]))
+        ious.append(iou_similarity(pred, gt))
+    iou = paddle.stack(ious, axis=0)
+    # 3. Get iou_mask by IoU between gt bbox and prediction bbox,
+    #    Get obj_mask by tobj(holds gt_score), calculate objectness loss
+    max_iou = paddle.max(iou, axis=-1)
+    iou_mask = paddle.cast(max_iou <= ignore_thresh, dtype="float32")
+    output_shape = paddle.shape(output)
+    an_num = len(anchors) // 2
+    iou_mask = paddle.reshape(iou_mask, (-1, an_num, output_shape[2],
+                                         output_shape[3]))
+    iou_mask.stop_gradient = True
+    # NOTE: tobj holds gt_score, obj_mask holds object existence mask
+    obj_mask = paddle.cast(tobj > 0., dtype="float32")
+    obj_mask.stop_gradient = True
+    # For positive objectness grids, objectness loss should be calculated
+    # For negative objectness grids, objectness loss is calculated only where iou_mask == 1.0
+    obj_sigmoid = F.sigmoid(obj)
+    loss_obj = F.binary_cross_entropy(obj_sigmoid, obj_mask, reduction='none')
+    loss_obj_pos = paddle.sum(loss_obj * tobj, axis=[1, 2, 3])
+    loss_obj_neg = paddle.sum(loss_obj * (1.0 - obj_mask) * iou_mask,
+                              axis=[1, 2, 3])
+    return loss_obj_pos, loss_obj_neg
+
+
+def fine_grained_loss(output,
+                      target,
+                      gt_box,
+                      batch_size,
+                      num_classes,
+                      anchors,
+                      ignore_thresh,
+                      downsample,
+                      scale_x_y=1.,
+                      eps=1e-10):
+    an_num = len(anchors) // 2
+    x, y, w, h, obj, cls = _split_output(output, an_num, num_classes)
+    tx, ty, tw, th, tscale, tobj, tcls = _split_target(target)
+
+    tscale_tobj = tscale * tobj
+
+    scale_x_y = scale_x_y
+
+    if (abs(scale_x_y - 1.0) < eps):
+        x = F.sigmoid(x)
+        y = F.sigmoid(y)
+        loss_x = F.binary_cross_entropy(x, tx, reduction='none') * tscale_tobj
+        loss_x = paddle.sum(loss_x, axis=[1, 2, 3])
+        loss_y = F.binary_cross_entropy(y, ty, reduction='none') * tscale_tobj
+        loss_y = paddle.sum(loss_y, axis=[1, 2, 3])
+    else:
+        dx = scale_x_y * F.sigmoid(x) - 0.5 * (scale_x_y - 1.0)
+        dy = scale_x_y * F.sigmoid(y) - 0.5 * (scale_x_y - 1.0)
+        loss_x = paddle.abs(dx - tx) * tscale_tobj
+        loss_x = paddle.sum(loss_x, axis=[1, 2, 3])
+        loss_y = paddle.abs(dy - ty) * tscale_tobj
+        loss_y = paddle.sum(loss_y, axis=[1, 2, 3])
+
+    # NOTE: the (w, h) loss is refined to an L1 loss here
+    loss_w = paddle.abs(w - tw) * tscale_tobj
+    loss_w = paddle.sum(loss_w, axis=[1, 2, 3])
+    loss_h = paddle.abs(h - th) * tscale_tobj
+    loss_h = paddle.sum(loss_h, axis=[1, 2, 3])
+
+    loss_obj_pos, loss_obj_neg = _calc_obj_loss(
+        output, obj, tobj, gt_box, batch_size, anchors, num_classes, downsample,
+        ignore_thresh, scale_x_y)
+
+    cls = F.sigmoid(cls)
+    loss_cls = F.binary_cross_entropy(cls, tcls, reduction='none')
+    tobj = paddle.unsqueeze(tobj, axis=-1)
+
+    loss_cls = paddle.multiply(loss_cls, tobj)
+    loss_cls = paddle.sum(loss_cls, axis=[1, 2, 3, 4])
+
+    loss_xys = paddle.mean(loss_x + loss_y)
+    loss_whs = paddle.mean(loss_w + loss_h)
+    loss_objs = paddle.mean(loss_obj_pos + loss_obj_neg)
+    loss_clss = paddle.mean(loss_cls)
+
+    losses_all = {
+        "loss_xy": paddle.sum(loss_xys),
+        "loss_wh": paddle.sum(loss_whs),
+        "loss_loc": paddle.sum(loss_xys) + paddle.sum(loss_whs),
+        "loss_obj": paddle.sum(loss_objs),
+        "loss_cls": paddle.sum(loss_clss),
+    }
+    return losses_all, x, y, tx, ty
+
+
+def gt2yolotarget(gt_bbox, gt_class, gt_score, anchors, mask, num_classes, size,
+                  stride):
+    grid_h, grid_w = size
+    h, w = grid_h * stride, grid_w * stride
+    an_hw = np.array(anchors) / np.array([[w, h]])
+    target = np.zeros(
+        (len(mask), 6 + num_classes, grid_h, grid_w), dtype=np.float32)
+    for b in range(gt_bbox.shape[0]):
+        gx, gy, gw, gh = gt_bbox[b, :]
+        cls = gt_class[b]
+        score = gt_score[b]
+        if gw <= 0. or gh <= 0. or score <= 0.:
+            continue
+
+        # find best match anchor index
+        best_iou = 0.
+        best_idx = -1
+        for an_idx in range(an_hw.shape[0]):
+            iou = jaccard_overlap([0., 0., gw, gh],
+                                  [0., 0., an_hw[an_idx, 0], an_hw[an_idx, 1]])
+            if iou > best_iou:
+                best_iou = iou
+                best_idx = an_idx
+
+        gi = int(gx * grid_w)
+        gj = int(gy * grid_h)
+
+        # the gt box should be regressed in this layer if the best matching
+        # anchor index is in this layer's anchor mask
+        if best_idx in mask:
+            best_n = mask.index(best_idx)
+
+            # x, y, w, h, scale
+            target[best_n, 0, gj, gi] = gx * grid_w - gi
+            target[best_n, 1, gj, gi] = gy * grid_h - gj
+            target[best_n, 2, gj, gi] = np.log(gw * w / anchors[best_idx][0])
+            target[best_n, 3, gj, gi] = np.log(gh * h / anchors[best_idx][1])
+            target[best_n, 4, gj, gi] = 2.0 - gw * gh
+
+            # objectness records gt_score
+            # if target[best_n, 5, gj, gi] > 0:
+            #     print('find 1 duplicate')
+            target[best_n, 5, gj, gi] = score
+
+            # classification
+            target[best_n, 6 + cls, gj, gi] = 1.
+ + return target + + +class TestYolov3LossOp(unittest.TestCase): + def setUp(self): + self.initTestCase() + x = np.random.uniform(0, 1, self.x_shape).astype('float64') + gtbox = np.random.random(size=self.gtbox_shape).astype('float64') + gtlabel = np.random.randint(0, self.class_num, self.gtbox_shape[:2]) + gtmask = np.random.randint(0, 2, self.gtbox_shape[:2]) + gtbox = gtbox * gtmask[:, :, np.newaxis] + gtlabel = gtlabel * gtmask + + gtscore = np.ones(self.gtbox_shape[:2]).astype('float64') + if self.gtscore: + gtscore = np.random.random(self.gtbox_shape[:2]).astype('float64') + + target = [] + for box, label, score in zip(gtbox, gtlabel, gtscore): + target.append( + gt2yolotarget(box, label, score, self.anchors, self.anchor_mask, + self.class_num, (self.h, self.w + ), self.downsample_ratio)) + + self.target = np.array(target).astype('float64') + + self.mask_anchors = [] + for i in self.anchor_mask: + self.mask_anchors.extend(self.anchors[i]) + self.x = x + self.gtbox = gtbox + self.gtlabel = gtlabel + self.gtscore = gtscore + + def initTestCase(self): + self.b = 8 + self.h = 19 + self.w = 19 + self.anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], + [59, 119], [116, 90], [156, 198], [373, 326]] + self.anchor_mask = [6, 7, 8] + self.na = len(self.anchor_mask) + self.class_num = 80 + self.ignore_thresh = 0.7 + self.downsample_ratio = 32 + self.x_shape = (self.b, len(self.anchor_mask) * (5 + self.class_num), + self.h, self.w) + self.gtbox_shape = (self.b, 40, 4) + self.gtscore = True + self.use_label_smooth = False + self.scale_x_y = 1. + + def test_loss(self): + x, gtbox, gtlabel, gtscore, target = self.x, self.gtbox, self.gtlabel, self.gtscore, self.target + yolo_loss = YOLOv3Loss( + ignore_thresh=self.ignore_thresh, + label_smooth=self.use_label_smooth, + num_classes=self.class_num, + downsample=self.downsample_ratio, + scale_x_y=self.scale_x_y) + x = paddle.to_tensor(x.astype(np.float32)) + gtbox = paddle.to_tensor(gtbox.astype(np.float32)) + gtlabel = paddle.to_tensor(gtlabel.astype(np.float32)) + gtscore = paddle.to_tensor(gtscore.astype(np.float32)) + t = paddle.to_tensor(target.astype(np.float32)) + anchor = [self.anchors[i] for i in self.anchor_mask] + (yolo_loss1, px, py, tx, ty) = fine_grained_loss( + output=x, + target=t, + gt_box=gtbox, + batch_size=self.b, + num_classes=self.class_num, + anchors=self.mask_anchors, + ignore_thresh=self.ignore_thresh, + downsample=self.downsample_ratio, + scale_x_y=self.scale_x_y) + yolo_loss2 = yolo_loss.yolov3_loss( + x, t, gtbox, anchor, self.downsample_ratio, self.scale_x_y) + for k in yolo_loss2: + self.assertAlmostEqual( + float(yolo_loss1[k]), float(yolo_loss2[k]), delta=1e-2, msg=k) + + +class TestYolov3LossNoGTScore(TestYolov3LossOp): + def initTestCase(self): + self.b = 1 + self.h = 76 + self.w = 76 + self.anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], + [59, 119], [116, 90], [156, 198], [373, 326]] + self.anchor_mask = [0, 1, 2] + self.na = len(self.anchor_mask) + self.class_num = 80 + self.ignore_thresh = 0.7 + self.downsample_ratio = 8 + self.x_shape = (self.b, len(self.anchor_mask) * (5 + self.class_num), + self.h, self.w) + self.gtbox_shape = (self.b, 40, 4) + self.gtscore = False + self.use_label_smooth = False + self.scale_x_y = 1. 
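+
+# TestYolov3LossWithScaleXY below exercises the scale_x_y != 1 branch of
+# fine_grained_loss, where the decoded center becomes
+# scale_x_y * sigmoid(p) - 0.5 * (scale_x_y - 1) and the x/y terms switch
+# from BCE to an L1 penalty. The whole file can be run directly, e.g.
+# `python ppdet/modeling/tests/test_yolov3_loss.py`, since it appends the
+# repo root to sys.path itself.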
+ + +class TestYolov3LossWithScaleXY(TestYolov3LossOp): + def initTestCase(self): + self.b = 5 + self.h = 38 + self.w = 38 + self.anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], + [59, 119], [116, 90], [156, 198], [373, 326]] + self.anchor_mask = [3, 4, 5] + self.na = len(self.anchor_mask) + self.class_num = 80 + self.ignore_thresh = 0.7 + self.downsample_ratio = 16 + self.x_shape = (self.b, len(self.anchor_mask) * (5 + self.class_num), + self.h, self.w) + self.gtbox_shape = (self.b, 40, 4) + self.gtscore = True + self.use_label_smooth = False + self.scale_x_y = 1.2 + + +if __name__ == "__main__": + unittest.main() diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__init__.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e55cb0c1de9d62154a93cd8d6a101ef8fe51d356 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__init__.py @@ -0,0 +1,28 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import detr_transformer +from . import utils +from . import matchers +from . import position_encoding +from . import deformable_transformer +from . 
import dino_transformer + +from .detr_transformer import * +from .utils import * +from .matchers import * +from .position_encoding import * +from .deformable_transformer import * +from .dino_transformer import * +from .petr_transformer import * diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ec0a12a911ed9be7492c77753a51641b2035fba8 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/deformable_transformer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/deformable_transformer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c9320709ace5e33ea86cc97fd9d68151161e5cf1 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/deformable_transformer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/detr_transformer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/detr_transformer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..17b81c7db507b58cfdf5e5ddc8146c0f22638925 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/detr_transformer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/dino_transformer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/dino_transformer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2e244520529828f73dbd0d2f0ca60466cb7cd6c0 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/dino_transformer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/matchers.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/matchers.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..eb9982129d88ae0316fbaa06968d20ededdc2c5e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/matchers.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/petr_transformer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/petr_transformer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d82e7123c48f388ef4c21893506d058eca853abf Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/petr_transformer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/position_encoding.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/position_encoding.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3d896f92a5cfb99850b1393aa656c955e46d15bf Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/position_encoding.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/utils.cpython-37.pyc 
b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/utils.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7b4fad0834f68140f98fbe9570d9f2368270ef22 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/modeling/transformers/__pycache__/utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/deformable_transformer.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/deformable_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..79aefad2972c7d3159a67307c4d1c8c214dccf75 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/deformable_transformer.py @@ -0,0 +1,536 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Modified from Deformable-DETR (https://github.com/fundamentalvision/Deformable-DETR) +# Copyright (c) 2020 SenseTime. All Rights Reserved. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr + +from ppdet.core.workspace import register +from ..layers import MultiHeadAttention +from .position_encoding import PositionEmbedding +from .utils import _get_clones, get_valid_ratio +from ..initializer import linear_init_, constant_, xavier_uniform_, normal_ + +__all__ = ['DeformableTransformer'] + + +class MSDeformableAttention(nn.Layer): + def __init__(self, + embed_dim=256, + num_heads=8, + num_levels=4, + num_points=4, + lr_mult=0.1): + """ + Multi-Scale Deformable Attention Module + """ + super(MSDeformableAttention, self).__init__() + self.embed_dim = embed_dim + self.num_heads = num_heads + self.num_levels = num_levels + self.num_points = num_points + self.total_points = num_heads * num_levels * num_points + + self.head_dim = embed_dim // num_heads + assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads" + + self.sampling_offsets = nn.Linear( + embed_dim, + self.total_points * 2, + weight_attr=ParamAttr(learning_rate=lr_mult), + bias_attr=ParamAttr(learning_rate=lr_mult)) + + self.attention_weights = nn.Linear(embed_dim, self.total_points) + self.value_proj = nn.Linear(embed_dim, embed_dim) + self.output_proj = nn.Linear(embed_dim, embed_dim) + try: + # use cuda op + from deformable_detr_ops import ms_deformable_attn + except: + # use paddle func + from .utils import deformable_attention_core_func as ms_deformable_attn + self.ms_deformable_attn_core = ms_deformable_attn + + self._reset_parameters() + + def _reset_parameters(self): + # sampling_offsets + constant_(self.sampling_offsets.weight) + thetas = paddle.arange( + self.num_heads, + dtype=paddle.float32) * (2.0 * math.pi / self.num_heads) + grid_init = paddle.stack([thetas.cos(), thetas.sin()], -1) + grid_init = grid_init / grid_init.abs().max(-1, keepdim=True) + grid_init = 
grid_init.reshape([self.num_heads, 1, 1, 2]).tile( + [1, self.num_levels, self.num_points, 1]) + scaling = paddle.arange( + 1, self.num_points + 1, + dtype=paddle.float32).reshape([1, 1, -1, 1]) + grid_init *= scaling + self.sampling_offsets.bias.set_value(grid_init.flatten()) + # attention_weights + constant_(self.attention_weights.weight) + constant_(self.attention_weights.bias) + # proj + xavier_uniform_(self.value_proj.weight) + constant_(self.value_proj.bias) + xavier_uniform_(self.output_proj.weight) + constant_(self.output_proj.bias) + + def forward(self, + query, + reference_points, + value, + value_spatial_shapes, + value_level_start_index, + value_mask=None): + """ + Args: + query (Tensor): [bs, query_length, C] + reference_points (Tensor): [bs, query_length, n_levels, 2], range in [0, 1], top-left (0,0), + bottom-right (1, 1), including padding area + value (Tensor): [bs, value_length, C] + value_spatial_shapes (Tensor): [n_levels, 2], [(H_0, W_0), (H_1, W_1), ..., (H_{L-1}, W_{L-1})] + value_level_start_index (Tensor(int64)): [n_levels], [0, H_0*W_0, H_0*W_0+H_1*W_1, ...] + value_mask (Tensor): [bs, value_length], True for non-padding elements, False for padding elements + + Returns: + output (Tensor): [bs, Length_{query}, C] + """ + bs, Len_q = query.shape[:2] + Len_v = value.shape[1] + assert int(value_spatial_shapes.prod(1).sum()) == Len_v + + value = self.value_proj(value) + if value_mask is not None: + value_mask = value_mask.astype(value.dtype).unsqueeze(-1) + value *= value_mask + value = value.reshape([bs, Len_v, self.num_heads, self.head_dim]) + + sampling_offsets = self.sampling_offsets(query).reshape( + [bs, Len_q, self.num_heads, self.num_levels, self.num_points, 2]) + attention_weights = self.attention_weights(query).reshape( + [bs, Len_q, self.num_heads, self.num_levels * self.num_points]) + attention_weights = F.softmax(attention_weights).reshape( + [bs, Len_q, self.num_heads, self.num_levels, self.num_points]) + + if reference_points.shape[-1] == 2: + offset_normalizer = value_spatial_shapes.flip([1]).reshape( + [1, 1, 1, self.num_levels, 1, 2]) + sampling_locations = reference_points.reshape([ + bs, Len_q, 1, self.num_levels, 1, 2 + ]) + sampling_offsets / offset_normalizer + elif reference_points.shape[-1] == 4: + sampling_locations = ( + reference_points[:, :, None, :, None, :2] + sampling_offsets / + self.num_points * reference_points[:, :, None, :, None, 2:] * + 0.5) + else: + raise ValueError( + "Last dim of reference_points must be 2 or 4, but get {} instead.". 
+ format(reference_points.shape[-1])) + + output = self.ms_deformable_attn_core( + value, value_spatial_shapes, value_level_start_index, + sampling_locations, attention_weights) + output = self.output_proj(output) + + return output + + +class DeformableTransformerEncoderLayer(nn.Layer): + def __init__(self, + d_model=256, + n_head=8, + dim_feedforward=1024, + dropout=0.1, + activation="relu", + n_levels=4, + n_points=4, + weight_attr=None, + bias_attr=None): + super(DeformableTransformerEncoderLayer, self).__init__() + # self attention + self.self_attn = MSDeformableAttention(d_model, n_head, n_levels, + n_points) + self.dropout1 = nn.Dropout(dropout) + self.norm1 = nn.LayerNorm(d_model) + # ffn + self.linear1 = nn.Linear(d_model, dim_feedforward, weight_attr, + bias_attr) + self.activation = getattr(F, activation) + self.dropout2 = nn.Dropout(dropout) + self.linear2 = nn.Linear(dim_feedforward, d_model, weight_attr, + bias_attr) + self.dropout3 = nn.Dropout(dropout) + self.norm2 = nn.LayerNorm(d_model) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + xavier_uniform_(self.linear1.weight) + xavier_uniform_(self.linear2.weight) + + def with_pos_embed(self, tensor, pos): + return tensor if pos is None else tensor + pos + + def forward_ffn(self, src): + src2 = self.linear2(self.dropout2(self.activation(self.linear1(src)))) + src = src + self.dropout3(src2) + src = self.norm2(src) + return src + + def forward(self, + src, + reference_points, + spatial_shapes, + level_start_index, + src_mask=None, + pos_embed=None): + # self attention + src2 = self.self_attn( + self.with_pos_embed(src, pos_embed), reference_points, src, + spatial_shapes, level_start_index, src_mask) + src = src + self.dropout1(src2) + src = self.norm1(src) + # ffn + src = self.forward_ffn(src) + + return src + + +class DeformableTransformerEncoder(nn.Layer): + def __init__(self, encoder_layer, num_layers): + super(DeformableTransformerEncoder, self).__init__() + self.layers = _get_clones(encoder_layer, num_layers) + self.num_layers = num_layers + + @staticmethod + def get_reference_points(spatial_shapes, valid_ratios, offset=0.5): + valid_ratios = valid_ratios.unsqueeze(1) + reference_points = [] + for i, (H, W) in enumerate(spatial_shapes): + ref_y, ref_x = paddle.meshgrid( + paddle.arange(end=H) + offset, paddle.arange(end=W) + offset) + ref_y = ref_y.flatten().unsqueeze(0) / (valid_ratios[:, :, i, 1] * + H) + ref_x = ref_x.flatten().unsqueeze(0) / (valid_ratios[:, :, i, 0] * + W) + reference_points.append(paddle.stack((ref_x, ref_y), axis=-1)) + reference_points = paddle.concat(reference_points, 1).unsqueeze(2) + reference_points = reference_points * valid_ratios + return reference_points + + def forward(self, + src, + spatial_shapes, + level_start_index, + src_mask=None, + pos_embed=None, + valid_ratios=None): + output = src + if valid_ratios is None: + valid_ratios = paddle.ones( + [src.shape[0], spatial_shapes.shape[0], 2]) + reference_points = self.get_reference_points(spatial_shapes, + valid_ratios) + for layer in self.layers: + output = layer(output, reference_points, spatial_shapes, + level_start_index, src_mask, pos_embed) + + return output + + +class DeformableTransformerDecoderLayer(nn.Layer): + def __init__(self, + d_model=256, + n_head=8, + dim_feedforward=1024, + dropout=0.1, + activation="relu", + n_levels=4, + n_points=4, + weight_attr=None, + bias_attr=None): + super(DeformableTransformerDecoderLayer, self).__init__() + + # self attention + 
self.self_attn = MultiHeadAttention(d_model, n_head, dropout=dropout) + self.dropout1 = nn.Dropout(dropout) + self.norm1 = nn.LayerNorm(d_model) + + # cross attention + self.cross_attn = MSDeformableAttention(d_model, n_head, n_levels, + n_points) + self.dropout2 = nn.Dropout(dropout) + self.norm2 = nn.LayerNorm(d_model) + + # ffn + self.linear1 = nn.Linear(d_model, dim_feedforward, weight_attr, + bias_attr) + self.activation = getattr(F, activation) + self.dropout3 = nn.Dropout(dropout) + self.linear2 = nn.Linear(dim_feedforward, d_model, weight_attr, + bias_attr) + self.dropout4 = nn.Dropout(dropout) + self.norm3 = nn.LayerNorm(d_model) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + xavier_uniform_(self.linear1.weight) + xavier_uniform_(self.linear2.weight) + + def with_pos_embed(self, tensor, pos): + return tensor if pos is None else tensor + pos + + def forward_ffn(self, tgt): + tgt2 = self.linear2(self.dropout3(self.activation(self.linear1(tgt)))) + tgt = tgt + self.dropout4(tgt2) + tgt = self.norm3(tgt) + return tgt + + def forward(self, + tgt, + reference_points, + memory, + memory_spatial_shapes, + memory_level_start_index, + memory_mask=None, + query_pos_embed=None): + # self attention + q = k = self.with_pos_embed(tgt, query_pos_embed) + tgt2 = self.self_attn(q, k, value=tgt) + tgt = tgt + self.dropout1(tgt2) + tgt = self.norm1(tgt) + + # cross attention + tgt2 = self.cross_attn( + self.with_pos_embed(tgt, query_pos_embed), reference_points, memory, + memory_spatial_shapes, memory_level_start_index, memory_mask) + tgt = tgt + self.dropout2(tgt2) + tgt = self.norm2(tgt) + + # ffn + tgt = self.forward_ffn(tgt) + + return tgt + + +class DeformableTransformerDecoder(nn.Layer): + def __init__(self, decoder_layer, num_layers, return_intermediate=False): + super(DeformableTransformerDecoder, self).__init__() + self.layers = _get_clones(decoder_layer, num_layers) + self.num_layers = num_layers + self.return_intermediate = return_intermediate + + def forward(self, + tgt, + reference_points, + memory, + memory_spatial_shapes, + memory_level_start_index, + memory_mask=None, + query_pos_embed=None): + output = tgt + intermediate = [] + for lid, layer in enumerate(self.layers): + output = layer(output, reference_points, memory, + memory_spatial_shapes, memory_level_start_index, + memory_mask, query_pos_embed) + + if self.return_intermediate: + intermediate.append(output) + + if self.return_intermediate: + return paddle.stack(intermediate) + + return output.unsqueeze(0) + + +@register +class DeformableTransformer(nn.Layer): + __shared__ = ['hidden_dim'] + + def __init__(self, + num_queries=300, + position_embed_type='sine', + return_intermediate_dec=True, + backbone_num_channels=[512, 1024, 2048], + num_feature_levels=4, + num_encoder_points=4, + num_decoder_points=4, + hidden_dim=256, + nhead=8, + num_encoder_layers=6, + num_decoder_layers=6, + dim_feedforward=1024, + dropout=0.1, + activation="relu", + lr_mult=0.1, + weight_attr=None, + bias_attr=None): + super(DeformableTransformer, self).__init__() + assert position_embed_type in ['sine', 'learned'], \ + f'ValueError: position_embed_type not supported {position_embed_type}!' 
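+        # When num_feature_levels exceeds the number of backbone feature maps,
+        # the extra levels are synthesized from the last map by the stride-2
+        # 3x3 convs appended to input_proj below.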
+ assert len(backbone_num_channels) <= num_feature_levels + + self.hidden_dim = hidden_dim + self.nhead = nhead + self.num_feature_levels = num_feature_levels + + encoder_layer = DeformableTransformerEncoderLayer( + hidden_dim, nhead, dim_feedforward, dropout, activation, + num_feature_levels, num_encoder_points, weight_attr, bias_attr) + self.encoder = DeformableTransformerEncoder(encoder_layer, + num_encoder_layers) + + decoder_layer = DeformableTransformerDecoderLayer( + hidden_dim, nhead, dim_feedforward, dropout, activation, + num_feature_levels, num_decoder_points, weight_attr, bias_attr) + self.decoder = DeformableTransformerDecoder( + decoder_layer, num_decoder_layers, return_intermediate_dec) + + self.level_embed = nn.Embedding(num_feature_levels, hidden_dim) + self.tgt_embed = nn.Embedding(num_queries, hidden_dim) + self.query_pos_embed = nn.Embedding(num_queries, hidden_dim) + + self.reference_points = nn.Linear( + hidden_dim, + 2, + weight_attr=ParamAttr(learning_rate=lr_mult), + bias_attr=ParamAttr(learning_rate=lr_mult)) + + self.input_proj = nn.LayerList() + for in_channels in backbone_num_channels: + self.input_proj.append( + nn.Sequential( + nn.Conv2D( + in_channels, + hidden_dim, + kernel_size=1, + weight_attr=weight_attr, + bias_attr=bias_attr), + nn.GroupNorm(32, hidden_dim))) + in_channels = backbone_num_channels[-1] + for _ in range(num_feature_levels - len(backbone_num_channels)): + self.input_proj.append( + nn.Sequential( + nn.Conv2D( + in_channels, + hidden_dim, + kernel_size=3, + stride=2, + padding=1, + weight_attr=weight_attr, + bias_attr=bias_attr), + nn.GroupNorm(32, hidden_dim))) + in_channels = hidden_dim + + self.position_embedding = PositionEmbedding( + hidden_dim // 2, + normalize=True if position_embed_type == 'sine' else False, + embed_type=position_embed_type, + offset=-0.5) + + self._reset_parameters() + + def _reset_parameters(self): + normal_(self.level_embed.weight) + normal_(self.tgt_embed.weight) + normal_(self.query_pos_embed.weight) + xavier_uniform_(self.reference_points.weight) + constant_(self.reference_points.bias) + for l in self.input_proj: + xavier_uniform_(l[0].weight) + constant_(l[0].bias) + + @classmethod + def from_config(cls, cfg, input_shape): + return {'backbone_num_channels': [i.channels for i in input_shape], } + + def forward(self, src_feats, src_mask=None, *args, **kwargs): + srcs = [] + for i in range(len(src_feats)): + srcs.append(self.input_proj[i](src_feats[i])) + if self.num_feature_levels > len(srcs): + len_srcs = len(srcs) + for i in range(len_srcs, self.num_feature_levels): + if i == len_srcs: + srcs.append(self.input_proj[i](src_feats[-1])) + else: + srcs.append(self.input_proj[i](srcs[-1])) + src_flatten = [] + mask_flatten = [] + lvl_pos_embed_flatten = [] + spatial_shapes = [] + valid_ratios = [] + for level, src in enumerate(srcs): + bs, _, h, w = paddle.shape(src) + spatial_shapes.append(paddle.concat([h, w])) + src = src.flatten(2).transpose([0, 2, 1]) + src_flatten.append(src) + if src_mask is not None: + mask = F.interpolate(src_mask.unsqueeze(0), size=(h, w))[0] + else: + mask = paddle.ones([bs, h, w]) + valid_ratios.append(get_valid_ratio(mask)) + pos_embed = self.position_embedding(mask).flatten(1, 2) + lvl_pos_embed = pos_embed + self.level_embed.weight[level] + lvl_pos_embed_flatten.append(lvl_pos_embed) + mask = mask.flatten(1) + mask_flatten.append(mask) + src_flatten = paddle.concat(src_flatten, 1) + mask_flatten = None if src_mask is None else paddle.concat(mask_flatten, + 1) + lvl_pos_embed_flatten 
= paddle.concat(lvl_pos_embed_flatten, 1)
+        # [l, 2]
+        spatial_shapes = paddle.to_tensor(
+            paddle.stack(spatial_shapes).astype('int64'))
+        # [l], start index of each level
+        level_start_index = paddle.concat([
+            paddle.zeros(
+                [1], dtype='int64'), spatial_shapes.prod(1).cumsum(0)[:-1]
+        ])
+        # [b, l, 2]
+        valid_ratios = paddle.stack(valid_ratios, 1)
+
+        # encoder
+        memory = self.encoder(src_flatten, spatial_shapes, level_start_index,
+                              mask_flatten, lvl_pos_embed_flatten, valid_ratios)
+
+        # prepare input for decoder
+        bs, _, c = memory.shape
+        query_embed = self.query_pos_embed.weight.unsqueeze(0).tile([bs, 1, 1])
+        tgt = self.tgt_embed.weight.unsqueeze(0).tile([bs, 1, 1])
+        reference_points = F.sigmoid(self.reference_points(query_embed))
+        reference_points_input = reference_points.unsqueeze(
+            2) * valid_ratios.unsqueeze(1)
+
+        # decoder
+        hs = self.decoder(tgt, reference_points_input, memory, spatial_shapes,
+                          level_start_index, mask_flatten, query_embed)
+
+        return (hs, memory, reference_points)
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/detr_transformer.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/detr_transformer.py
new file mode 100644
index 0000000000000000000000000000000000000000..ccbdb0a3d2ace0a92595ceda0ee74cd2b5b93502
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/detr_transformer.py
@@ -0,0 +1,355 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Modified from DETR (https://github.com/facebookresearch/detr)
+# Copyright (c) Facebook, Inc. and its affiliates.
All Rights Reserved + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import register +from ..layers import MultiHeadAttention, _convert_attention_mask +from .position_encoding import PositionEmbedding +from .utils import _get_clones +from ..initializer import linear_init_, conv_init_, xavier_uniform_, normal_ + +__all__ = ['DETRTransformer'] + + +class TransformerEncoderLayer(nn.Layer): + def __init__(self, + d_model, + nhead, + dim_feedforward=2048, + dropout=0.1, + activation="relu", + attn_dropout=None, + act_dropout=None, + normalize_before=False): + super(TransformerEncoderLayer, self).__init__() + attn_dropout = dropout if attn_dropout is None else attn_dropout + act_dropout = dropout if act_dropout is None else act_dropout + self.normalize_before = normalize_before + + self.self_attn = MultiHeadAttention(d_model, nhead, attn_dropout) + # Implementation of Feedforward model + self.linear1 = nn.Linear(d_model, dim_feedforward) + self.dropout = nn.Dropout(act_dropout, mode="upscale_in_train") + self.linear2 = nn.Linear(dim_feedforward, d_model) + + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.dropout1 = nn.Dropout(dropout, mode="upscale_in_train") + self.dropout2 = nn.Dropout(dropout, mode="upscale_in_train") + self.activation = getattr(F, activation) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + + @staticmethod + def with_pos_embed(tensor, pos_embed): + return tensor if pos_embed is None else tensor + pos_embed + + def forward(self, src, src_mask=None, pos_embed=None): + residual = src + if self.normalize_before: + src = self.norm1(src) + q = k = self.with_pos_embed(src, pos_embed) + src = self.self_attn(q, k, value=src, attn_mask=src_mask) + + src = residual + self.dropout1(src) + if not self.normalize_before: + src = self.norm1(src) + + residual = src + if self.normalize_before: + src = self.norm2(src) + src = self.linear2(self.dropout(self.activation(self.linear1(src)))) + src = residual + self.dropout2(src) + if not self.normalize_before: + src = self.norm2(src) + return src + + +class TransformerEncoder(nn.Layer): + def __init__(self, encoder_layer, num_layers, norm=None): + super(TransformerEncoder, self).__init__() + self.layers = _get_clones(encoder_layer, num_layers) + self.num_layers = num_layers + self.norm = norm + + def forward(self, src, src_mask=None, pos_embed=None): + output = src + for layer in self.layers: + output = layer(output, src_mask=src_mask, pos_embed=pos_embed) + + if self.norm is not None: + output = self.norm(output) + + return output + + +class TransformerDecoderLayer(nn.Layer): + def __init__(self, + d_model, + nhead, + dim_feedforward=2048, + dropout=0.1, + activation="relu", + attn_dropout=None, + act_dropout=None, + normalize_before=False): + super(TransformerDecoderLayer, self).__init__() + attn_dropout = dropout if attn_dropout is None else attn_dropout + act_dropout = dropout if act_dropout is None else act_dropout + self.normalize_before = normalize_before + + self.self_attn = MultiHeadAttention(d_model, nhead, attn_dropout) + self.cross_attn = MultiHeadAttention(d_model, nhead, attn_dropout) + # Implementation of Feedforward model + self.linear1 = nn.Linear(d_model, dim_feedforward) + self.dropout = nn.Dropout(act_dropout, mode="upscale_in_train") + self.linear2 = 
nn.Linear(dim_feedforward, d_model) + + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.norm3 = nn.LayerNorm(d_model) + self.dropout1 = nn.Dropout(dropout, mode="upscale_in_train") + self.dropout2 = nn.Dropout(dropout, mode="upscale_in_train") + self.dropout3 = nn.Dropout(dropout, mode="upscale_in_train") + self.activation = getattr(F, activation) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + + @staticmethod + def with_pos_embed(tensor, pos_embed): + return tensor if pos_embed is None else tensor + pos_embed + + def forward(self, + tgt, + memory, + tgt_mask=None, + memory_mask=None, + pos_embed=None, + query_pos_embed=None): + tgt_mask = _convert_attention_mask(tgt_mask, tgt.dtype) + + residual = tgt + if self.normalize_before: + tgt = self.norm1(tgt) + q = k = self.with_pos_embed(tgt, query_pos_embed) + tgt = self.self_attn(q, k, value=tgt, attn_mask=tgt_mask) + tgt = residual + self.dropout1(tgt) + if not self.normalize_before: + tgt = self.norm1(tgt) + + residual = tgt + if self.normalize_before: + tgt = self.norm2(tgt) + q = self.with_pos_embed(tgt, query_pos_embed) + k = self.with_pos_embed(memory, pos_embed) + tgt = self.cross_attn(q, k, value=memory, attn_mask=memory_mask) + tgt = residual + self.dropout2(tgt) + if not self.normalize_before: + tgt = self.norm2(tgt) + + residual = tgt + if self.normalize_before: + tgt = self.norm3(tgt) + tgt = self.linear2(self.dropout(self.activation(self.linear1(tgt)))) + tgt = residual + self.dropout3(tgt) + if not self.normalize_before: + tgt = self.norm3(tgt) + return tgt + + +class TransformerDecoder(nn.Layer): + def __init__(self, + decoder_layer, + num_layers, + norm=None, + return_intermediate=False): + super(TransformerDecoder, self).__init__() + self.layers = _get_clones(decoder_layer, num_layers) + self.num_layers = num_layers + self.norm = norm + self.return_intermediate = return_intermediate + + def forward(self, + tgt, + memory, + tgt_mask=None, + memory_mask=None, + pos_embed=None, + query_pos_embed=None): + tgt_mask = _convert_attention_mask(tgt_mask, tgt.dtype) + + output = tgt + intermediate = [] + for layer in self.layers: + output = layer( + output, + memory, + tgt_mask=tgt_mask, + memory_mask=memory_mask, + pos_embed=pos_embed, + query_pos_embed=query_pos_embed) + if self.return_intermediate: + intermediate.append(self.norm(output)) + + if self.norm is not None: + output = self.norm(output) + + if self.return_intermediate: + return paddle.stack(intermediate) + + return output.unsqueeze(0) + + +@register +class DETRTransformer(nn.Layer): + __shared__ = ['hidden_dim'] + + def __init__(self, + num_queries=100, + position_embed_type='sine', + return_intermediate_dec=True, + backbone_num_channels=2048, + hidden_dim=256, + nhead=8, + num_encoder_layers=6, + num_decoder_layers=6, + dim_feedforward=2048, + dropout=0.1, + activation="relu", + attn_dropout=None, + act_dropout=None, + normalize_before=False): + super(DETRTransformer, self).__init__() + assert position_embed_type in ['sine', 'learned'],\ + f'ValueError: position_embed_type not supported {position_embed_type}!' 
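+        # normalize_before selects pre-LN transformer blocks; only in that
+        # case does the encoder output need the final LayerNorm
+        # (encoder_norm) created below.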
+ self.hidden_dim = hidden_dim + self.nhead = nhead + + encoder_layer = TransformerEncoderLayer( + hidden_dim, nhead, dim_feedforward, dropout, activation, + attn_dropout, act_dropout, normalize_before) + encoder_norm = nn.LayerNorm(hidden_dim) if normalize_before else None + self.encoder = TransformerEncoder(encoder_layer, num_encoder_layers, + encoder_norm) + + decoder_layer = TransformerDecoderLayer( + hidden_dim, nhead, dim_feedforward, dropout, activation, + attn_dropout, act_dropout, normalize_before) + decoder_norm = nn.LayerNorm(hidden_dim) + self.decoder = TransformerDecoder( + decoder_layer, + num_decoder_layers, + decoder_norm, + return_intermediate=return_intermediate_dec) + + self.input_proj = nn.Conv2D( + backbone_num_channels, hidden_dim, kernel_size=1) + self.query_pos_embed = nn.Embedding(num_queries, hidden_dim) + self.position_embedding = PositionEmbedding( + hidden_dim // 2, + normalize=True if position_embed_type == 'sine' else False, + embed_type=position_embed_type) + + self._reset_parameters() + + def _reset_parameters(self): + for p in self.parameters(): + if p.dim() > 1: + xavier_uniform_(p) + conv_init_(self.input_proj) + normal_(self.query_pos_embed.weight) + + @classmethod + def from_config(cls, cfg, input_shape): + return { + 'backbone_num_channels': [i.channels for i in input_shape][-1], + } + + def _convert_attention_mask(self, mask): + return (mask - 1.0) * 1e9 + + def forward(self, src, src_mask=None, *args, **kwargs): + r""" + Applies a Transformer model on the inputs. + + Parameters: + src (List(Tensor)): Backbone feature maps with shape [[bs, c, h, w]]. + src_mask (Tensor, optional): A tensor used in multi-head attention + to prevents attention to some unwanted positions, usually the + paddings or the subsequent positions. It is a tensor with shape + [bs, H, W]`. When the data type is bool, the unwanted positions + have `False` values and the others have `True` values. When the + data type is int, the unwanted positions have 0 values and the + others have 1 values. When the data type is float, the unwanted + positions have `-INF` values and the others have 0 values. It + can be None when nothing wanted or needed to be prevented + attention to. Default None. 
+ + Returns: + output (Tensor): [num_levels, batch_size, num_queries, hidden_dim] + memory (Tensor): [batch_size, hidden_dim, h, w] + """ + # use last level feature map + src_proj = self.input_proj(src[-1]) + bs, c, h, w = paddle.shape(src_proj) + # flatten [B, C, H, W] to [B, HxW, C] + src_flatten = src_proj.flatten(2).transpose([0, 2, 1]) + if src_mask is not None: + src_mask = F.interpolate(src_mask.unsqueeze(0), size=(h, w))[0] + else: + src_mask = paddle.ones([bs, h, w]) + pos_embed = self.position_embedding(src_mask).flatten(1, 2) + + if self.training: + src_mask = self._convert_attention_mask(src_mask) + src_mask = src_mask.reshape([bs, 1, 1, h * w]) + else: + src_mask = None + + memory = self.encoder( + src_flatten, src_mask=src_mask, pos_embed=pos_embed) + + query_pos_embed = self.query_pos_embed.weight.unsqueeze(0).tile( + [bs, 1, 1]) + tgt = paddle.zeros_like(query_pos_embed) + output = self.decoder( + tgt, + memory, + memory_mask=src_mask, + pos_embed=pos_embed, + query_pos_embed=query_pos_embed) + + if self.training: + src_mask = src_mask.reshape([bs, 1, 1, h, w]) + else: + src_mask = None + + return (output, memory.transpose([0, 2, 1]).reshape([bs, c, h, w]), + src_proj, src_mask) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/dino_transformer.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/dino_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..040e1807a5d7e81a836fdde30222dd3a9be56c9e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/dino_transformer.py @@ -0,0 +1,647 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Modified from Deformable-DETR (https://github.com/fundamentalvision/Deformable-DETR) +# Copyright (c) 2020 SenseTime. All Rights Reserved. +# Modified from detrex (https://github.com/IDEA-Research/detrex) +# Copyright 2022 The IDEA Authors. All rights reserved. 
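+#
+# DINO builds on Deformable DETR with contrastive denoising training groups,
+# query selection from the top-k encoder output proposals, and per-layer box
+# refinement in the decoder (see DINOTransformer.forward and
+# _get_decoder_input below).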
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay + +from ppdet.core.workspace import register +from ..layers import MultiHeadAttention +from .position_encoding import PositionEmbedding +from ..heads.detr_head import MLP +from .deformable_transformer import MSDeformableAttention +from ..initializer import (linear_init_, constant_, xavier_uniform_, normal_, + bias_init_with_prob) +from .utils import (_get_clones, get_valid_ratio, + get_contrastive_denoising_training_group, + get_sine_pos_embed, inverse_sigmoid) + +__all__ = ['DINOTransformer'] + + +class DINOTransformerEncoderLayer(nn.Layer): + def __init__(self, + d_model=256, + n_head=8, + dim_feedforward=1024, + dropout=0., + activation="relu", + n_levels=4, + n_points=4, + weight_attr=None, + bias_attr=None): + super(DINOTransformerEncoderLayer, self).__init__() + # self attention + self.self_attn = MSDeformableAttention(d_model, n_head, n_levels, + n_points, 1.0) + self.dropout1 = nn.Dropout(dropout) + self.norm1 = nn.LayerNorm( + d_model, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + # ffn + self.linear1 = nn.Linear(d_model, dim_feedforward, weight_attr, + bias_attr) + self.activation = getattr(F, activation) + self.dropout2 = nn.Dropout(dropout) + self.linear2 = nn.Linear(dim_feedforward, d_model, weight_attr, + bias_attr) + self.dropout3 = nn.Dropout(dropout) + self.norm2 = nn.LayerNorm( + d_model, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + xavier_uniform_(self.linear1.weight) + xavier_uniform_(self.linear2.weight) + + def with_pos_embed(self, tensor, pos): + return tensor if pos is None else tensor + pos + + def forward_ffn(self, src): + src2 = self.linear2(self.dropout2(self.activation(self.linear1(src)))) + src = src + self.dropout3(src2) + src = self.norm2(src) + return src + + def forward(self, + src, + reference_points, + spatial_shapes, + level_start_index, + src_mask=None, + query_pos_embed=None): + # self attention + src2 = self.self_attn( + self.with_pos_embed(src, query_pos_embed), reference_points, src, + spatial_shapes, level_start_index, src_mask) + src = src + self.dropout1(src2) + src = self.norm1(src) + # ffn + src = self.forward_ffn(src) + + return src + + +class DINOTransformerEncoder(nn.Layer): + def __init__(self, encoder_layer, num_layers): + super(DINOTransformerEncoder, self).__init__() + self.layers = _get_clones(encoder_layer, num_layers) + self.num_layers = num_layers + + @staticmethod + def get_reference_points(spatial_shapes, valid_ratios, offset=0.5): + valid_ratios = valid_ratios.unsqueeze(1) + reference_points = [] + for i, (H, W) in enumerate(spatial_shapes): + ref_y, ref_x = paddle.meshgrid( + paddle.arange(end=H) + offset, paddle.arange(end=W) + offset) + ref_y = ref_y.flatten().unsqueeze(0) / (valid_ratios[:, :, i, 1] * + H) + ref_x = ref_x.flatten().unsqueeze(0) / (valid_ratios[:, :, i, 0] * + W) + reference_points.append(paddle.stack((ref_x, ref_y), axis=-1)) + reference_points = paddle.concat(reference_points, 1).unsqueeze(2) + reference_points = reference_points * valid_ratios + return reference_points + + def forward(self, + feat, + 
spatial_shapes, + level_start_index, + feat_mask=None, + query_pos_embed=None, + valid_ratios=None): + if valid_ratios is None: + valid_ratios = paddle.ones( + [feat.shape[0], spatial_shapes.shape[0], 2]) + reference_points = self.get_reference_points(spatial_shapes, + valid_ratios) + for layer in self.layers: + feat = layer(feat, reference_points, spatial_shapes, + level_start_index, feat_mask, query_pos_embed) + + return feat + + +class DINOTransformerDecoderLayer(nn.Layer): + def __init__(self, + d_model=256, + n_head=8, + dim_feedforward=1024, + dropout=0., + activation="relu", + n_levels=4, + n_points=4, + weight_attr=None, + bias_attr=None): + super(DINOTransformerDecoderLayer, self).__init__() + + # self attention + self.self_attn = MultiHeadAttention(d_model, n_head, dropout=dropout) + self.dropout1 = nn.Dropout(dropout) + self.norm1 = nn.LayerNorm( + d_model, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + + # cross attention + self.cross_attn = MSDeformableAttention(d_model, n_head, n_levels, + n_points, 1.0) + self.dropout2 = nn.Dropout(dropout) + self.norm2 = nn.LayerNorm( + d_model, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + + # ffn + self.linear1 = nn.Linear(d_model, dim_feedforward, weight_attr, + bias_attr) + self.activation = getattr(F, activation) + self.dropout3 = nn.Dropout(dropout) + self.linear2 = nn.Linear(dim_feedforward, d_model, weight_attr, + bias_attr) + self.dropout4 = nn.Dropout(dropout) + self.norm3 = nn.LayerNorm( + d_model, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + xavier_uniform_(self.linear1.weight) + xavier_uniform_(self.linear2.weight) + + def with_pos_embed(self, tensor, pos): + return tensor if pos is None else tensor + pos + + def forward_ffn(self, tgt): + return self.linear2(self.dropout3(self.activation(self.linear1(tgt)))) + + def forward(self, + tgt, + reference_points, + memory, + memory_spatial_shapes, + memory_level_start_index, + attn_mask=None, + memory_mask=None, + query_pos_embed=None): + # self attention + q = k = self.with_pos_embed(tgt, query_pos_embed) + if attn_mask is not None: + attn_mask = attn_mask.astype('bool') + tgt2 = self.self_attn(q, k, value=tgt, attn_mask=attn_mask) + tgt = tgt + self.dropout1(tgt2) + tgt = self.norm1(tgt) + + # cross attention + tgt2 = self.cross_attn( + self.with_pos_embed(tgt, query_pos_embed), reference_points, memory, + memory_spatial_shapes, memory_level_start_index, memory_mask) + tgt = tgt + self.dropout2(tgt2) + tgt = self.norm2(tgt) + + # ffn + tgt2 = self.forward_ffn(tgt) + tgt = tgt + self.dropout4(tgt2) + tgt = self.norm3(tgt) + + return tgt + + +class DINOTransformerDecoder(nn.Layer): + def __init__(self, + hidden_dim, + decoder_layer, + num_layers, + return_intermediate=True): + super(DINOTransformerDecoder, self).__init__() + self.layers = _get_clones(decoder_layer, num_layers) + self.hidden_dim = hidden_dim + self.num_layers = num_layers + self.return_intermediate = return_intermediate + self.norm = nn.LayerNorm( + hidden_dim, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + + def forward(self, + tgt, + ref_points_unact, + memory, + memory_spatial_shapes, + memory_level_start_index, + bbox_head, + query_pos_head, + valid_ratios=None, + attn_mask=None, 
+ memory_mask=None): + if valid_ratios is None: + valid_ratios = paddle.ones( + [memory.shape[0], memory_spatial_shapes.shape[0], 2]) + + output = tgt + intermediate = [] + inter_ref_bboxes_unact = [] + for i, layer in enumerate(self.layers): + reference_points_input = F.sigmoid(ref_points_unact).unsqueeze( + 2) * valid_ratios.tile([1, 1, 2]).unsqueeze(1) + query_pos_embed = get_sine_pos_embed( + reference_points_input[..., 0, :], self.hidden_dim // 2) + query_pos_embed = query_pos_head(query_pos_embed) + + output = layer(output, reference_points_input, memory, + memory_spatial_shapes, memory_level_start_index, + attn_mask, memory_mask, query_pos_embed) + + inter_ref_bbox_unact = bbox_head[i](output) + ref_points_unact + + if self.return_intermediate: + intermediate.append(self.norm(output)) + inter_ref_bboxes_unact.append(inter_ref_bbox_unact) + + ref_points_unact = inter_ref_bbox_unact.detach() + + if self.return_intermediate: + return paddle.stack(intermediate), paddle.stack( + inter_ref_bboxes_unact) + + return output, ref_points_unact + + +@register +class DINOTransformer(nn.Layer): + __shared__ = ['num_classes', 'hidden_dim'] + + def __init__(self, + num_classes=80, + hidden_dim=256, + num_queries=900, + position_embed_type='sine', + return_intermediate_dec=True, + backbone_feat_channels=[512, 1024, 2048], + num_levels=4, + num_encoder_points=4, + num_decoder_points=4, + nhead=8, + num_encoder_layers=6, + num_decoder_layers=6, + dim_feedforward=1024, + dropout=0., + activation="relu", + pe_temperature=10000, + pe_offset=-0.5, + num_denoising=100, + label_noise_ratio=0.5, + box_noise_scale=1.0, + learnt_init_query=True, + eps=1e-2): + super(DINOTransformer, self).__init__() + assert position_embed_type in ['sine', 'learned'], \ + f'ValueError: position_embed_type not supported {position_embed_type}!' 
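+        # eps keeps the encoder anchor proposals strictly inside (0, 1);
+        # anchors outside (eps, 1 - eps) are masked out before the inverse
+        # sigmoid in _get_encoder_output_anchors.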
+ assert len(backbone_feat_channels) <= num_levels + + self.hidden_dim = hidden_dim + self.nhead = nhead + self.num_levels = num_levels + self.num_classes = num_classes + self.num_queries = num_queries + self.eps = eps + self.num_decoder_layers = num_decoder_layers + + # backbone feature projection + self._build_input_proj_layer(backbone_feat_channels) + + # Transformer module + encoder_layer = DINOTransformerEncoderLayer( + hidden_dim, nhead, dim_feedforward, dropout, activation, num_levels, + num_encoder_points) + self.encoder = DINOTransformerEncoder(encoder_layer, num_encoder_layers) + decoder_layer = DINOTransformerDecoderLayer( + hidden_dim, nhead, dim_feedforward, dropout, activation, num_levels, + num_decoder_points) + self.decoder = DINOTransformerDecoder(hidden_dim, decoder_layer, + num_decoder_layers, + return_intermediate_dec) + + # denoising part + self.denoising_class_embed = nn.Embedding( + num_classes, + hidden_dim, + weight_attr=ParamAttr(initializer=nn.initializer.Normal())) + self.num_denoising = num_denoising + self.label_noise_ratio = label_noise_ratio + self.box_noise_scale = box_noise_scale + + # position embedding + self.position_embedding = PositionEmbedding( + hidden_dim // 2, + temperature=pe_temperature, + normalize=True if position_embed_type == 'sine' else False, + embed_type=position_embed_type, + offset=pe_offset) + self.level_embed = nn.Embedding(num_levels, hidden_dim) + # decoder embedding + self.learnt_init_query = learnt_init_query + if learnt_init_query: + self.tgt_embed = nn.Embedding(num_queries, hidden_dim) + self.query_pos_head = MLP(2 * hidden_dim, + hidden_dim, + hidden_dim, + num_layers=2) + + # encoder head + self.enc_output = nn.Sequential( + nn.Linear(hidden_dim, hidden_dim), + nn.LayerNorm( + hidden_dim, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0)))) + self.enc_score_head = nn.Linear(hidden_dim, num_classes) + self.enc_bbox_head = MLP(hidden_dim, hidden_dim, 4, num_layers=3) + # decoder head + self.dec_score_head = nn.LayerList([ + nn.Linear(hidden_dim, num_classes) + for _ in range(num_decoder_layers) + ]) + self.dec_bbox_head = nn.LayerList([ + MLP(hidden_dim, hidden_dim, 4, num_layers=3) + for _ in range(num_decoder_layers) + ]) + + self._reset_parameters() + + def _reset_parameters(self): + # class and bbox head init + bias_cls = bias_init_with_prob(0.01) + linear_init_(self.enc_score_head) + constant_(self.enc_score_head.bias, bias_cls) + constant_(self.enc_bbox_head.layers[-1].weight) + constant_(self.enc_bbox_head.layers[-1].bias) + for cls_, reg_ in zip(self.dec_score_head, self.dec_bbox_head): + linear_init_(cls_) + constant_(cls_.bias, bias_cls) + constant_(reg_.layers[-1].weight) + constant_(reg_.layers[-1].bias) + + linear_init_(self.enc_output[0]) + xavier_uniform_(self.enc_output[0].weight) + normal_(self.level_embed.weight) + if self.learnt_init_query: + xavier_uniform_(self.tgt_embed.weight) + xavier_uniform_(self.query_pos_head.layers[0].weight) + xavier_uniform_(self.query_pos_head.layers[1].weight) + for l in self.input_proj: + xavier_uniform_(l[0].weight) + constant_(l[0].bias) + + @classmethod + def from_config(cls, cfg, input_shape): + return {'backbone_feat_channels': [i.channels for i in input_shape], } + + def _build_input_proj_layer(self, backbone_feat_channels): + self.input_proj = nn.LayerList() + for in_channels in backbone_feat_channels: + self.input_proj.append( + nn.Sequential( + ('conv', nn.Conv2D( + in_channels, self.hidden_dim, kernel_size=1)), + 
('norm', nn.GroupNorm(
+                        32,
+                        self.hidden_dim,
+                        weight_attr=ParamAttr(regularizer=L2Decay(0.0)),
+                        bias_attr=ParamAttr(regularizer=L2Decay(0.0))))))
+        in_channels = backbone_feat_channels[-1]
+        for _ in range(self.num_levels - len(backbone_feat_channels)):
+            self.input_proj.append(
+                nn.Sequential(
+                    ('conv', nn.Conv2D(
+                        in_channels,
+                        self.hidden_dim,
+                        kernel_size=3,
+                        stride=2,
+                        padding=1)), ('norm', nn.GroupNorm(
+                            32,
+                            self.hidden_dim,
+                            weight_attr=ParamAttr(regularizer=L2Decay(0.0)),
+                            bias_attr=ParamAttr(regularizer=L2Decay(0.0))))))
+            in_channels = self.hidden_dim
+
+    def _get_encoder_input(self, feats, pad_mask=None):
+        # get projection features
+        proj_feats = [self.input_proj[i](feat) for i, feat in enumerate(feats)]
+        if self.num_levels > len(proj_feats):
+            len_srcs = len(proj_feats)
+            for i in range(len_srcs, self.num_levels):
+                if i == len_srcs:
+                    proj_feats.append(self.input_proj[i](feats[-1]))
+                else:
+                    proj_feats.append(self.input_proj[i](proj_feats[-1]))
+
+        # get encoder inputs
+        feat_flatten = []
+        mask_flatten = []
+        lvl_pos_embed_flatten = []
+        spatial_shapes = []
+        valid_ratios = []
+        for i, feat in enumerate(proj_feats):
+            bs, _, h, w = paddle.shape(feat)
+            spatial_shapes.append(paddle.concat([h, w]))
+            # [b,c,h,w] -> [b,h*w,c]
+            feat_flatten.append(feat.flatten(2).transpose([0, 2, 1]))
+            if pad_mask is not None:
+                mask = F.interpolate(pad_mask.unsqueeze(0), size=(h, w))[0]
+            else:
+                mask = paddle.ones([bs, h, w])
+            valid_ratios.append(get_valid_ratio(mask))
+            # [b, h*w, c]
+            pos_embed = self.position_embedding(mask).flatten(1, 2)
+            lvl_pos_embed = pos_embed + self.level_embed.weight[i]
+            lvl_pos_embed_flatten.append(lvl_pos_embed)
+            if pad_mask is not None:
+                # [b, h*w]
+                mask_flatten.append(mask.flatten(1))
+
+        # [b, l, c]
+        feat_flatten = paddle.concat(feat_flatten, 1)
+        # [b, l]
+        mask_flatten = None if pad_mask is None else paddle.concat(mask_flatten,
+                                                                   1)
+        # [b, l, c]
+        lvl_pos_embed_flatten = paddle.concat(lvl_pos_embed_flatten, 1)
+        # [num_levels, 2]
+        spatial_shapes = paddle.to_tensor(
+            paddle.stack(spatial_shapes).astype('int64'))
+        # [l], start index of each level
+        level_start_index = paddle.concat([
+            paddle.zeros(
+                [1], dtype='int64'), spatial_shapes.prod(1).cumsum(0)[:-1]
+        ])
+        # [b, num_levels, 2]
+        valid_ratios = paddle.stack(valid_ratios, 1)
+        return (feat_flatten, spatial_shapes, level_start_index, mask_flatten,
+                lvl_pos_embed_flatten, valid_ratios)
+
+    def forward(self, feats, pad_mask=None, gt_meta=None):
+        # input projection and embedding
+        (feat_flatten, spatial_shapes, level_start_index, mask_flatten,
+         lvl_pos_embed_flatten,
+         valid_ratios) = self._get_encoder_input(feats, pad_mask)
+
+        # encoder
+        memory = self.encoder(feat_flatten, spatial_shapes, level_start_index,
+                              mask_flatten, lvl_pos_embed_flatten, valid_ratios)
+
+        # prepare denoising training
+        if self.training:
+            denoising_class, denoising_bbox_unact, attn_mask, dn_meta = \
+                get_contrastive_denoising_training_group(gt_meta,
+                                            self.num_classes,
+                                            self.num_queries,
+                                            self.denoising_class_embed.weight,
+                                            self.num_denoising,
+                                            self.label_noise_ratio,
+                                            self.box_noise_scale)
+        else:
+            denoising_class, denoising_bbox_unact, attn_mask, dn_meta = None, None, None, None
+
+        target, init_ref_points_unact, enc_topk_bboxes, enc_topk_logits = \
+            self._get_decoder_input(
+            memory, spatial_shapes, mask_flatten, denoising_class,
+            denoising_bbox_unact)
+
+        # decoder
+        inter_feats, inter_ref_bboxes_unact = self.decoder(
+            target, init_ref_points_unact, memory, spatial_shapes,
level_start_index, self.dec_bbox_head, self.query_pos_head, + valid_ratios, attn_mask, mask_flatten) + out_bboxes = [] + out_logits = [] + for i in range(self.num_decoder_layers): + out_logits.append(self.dec_score_head[i](inter_feats[i])) + if i == 0: + out_bboxes.append( + F.sigmoid(self.dec_bbox_head[i](inter_feats[i]) + + init_ref_points_unact)) + else: + out_bboxes.append( + F.sigmoid(self.dec_bbox_head[i](inter_feats[i]) + + inter_ref_bboxes_unact[i - 1])) + + out_bboxes = paddle.stack(out_bboxes) + out_logits = paddle.stack(out_logits) + + return (out_bboxes, out_logits, enc_topk_bboxes, enc_topk_logits, + dn_meta) + + def _get_encoder_output_anchors(self, + memory, + spatial_shapes, + memory_mask=None, + grid_size=0.05): + output_anchors = [] + idx = 0 + for lvl, (h, w) in enumerate(spatial_shapes): + if memory_mask is not None: + mask_ = memory_mask[:, idx:idx + h * w].reshape([-1, h, w]) + valid_H = paddle.sum(mask_[:, :, 0], 1) + valid_W = paddle.sum(mask_[:, 0, :], 1) + else: + valid_H, valid_W = h, w + + grid_y, grid_x = paddle.meshgrid( + paddle.arange( + end=h, dtype=memory.dtype), + paddle.arange( + end=w, dtype=memory.dtype)) + grid_xy = paddle.stack([grid_x, grid_y], -1) + + valid_WH = paddle.stack([valid_W, valid_H], -1).reshape( + [-1, 1, 1, 2]).astype(grid_xy.dtype) + grid_xy = (grid_xy.unsqueeze(0) + 0.5) / valid_WH + wh = paddle.ones_like(grid_xy) * grid_size * (2.0**lvl) + output_anchors.append( + paddle.concat([grid_xy, wh], -1).reshape([-1, h * w, 4])) + idx += h * w + + output_anchors = paddle.concat(output_anchors, 1) + valid_mask = ((output_anchors > self.eps) * + (output_anchors < 1 - self.eps)).all(-1, keepdim=True) + output_anchors = paddle.log(output_anchors / (1 - output_anchors)) + if memory_mask is not None: + valid_mask = (valid_mask * (memory_mask.unsqueeze(-1) > 0)) > 0 + output_anchors = paddle.where(valid_mask, output_anchors, + paddle.to_tensor(float("inf"))) + + memory = paddle.where(valid_mask, memory, paddle.to_tensor(0.)) + output_memory = self.enc_output(memory) + return output_memory, output_anchors + + def _get_decoder_input(self, + memory, + spatial_shapes, + memory_mask=None, + denoising_class=None, + denoising_bbox_unact=None): + bs, _, _ = memory.shape + # prepare input for decoder + output_memory, output_anchors = self._get_encoder_output_anchors( + memory, spatial_shapes, memory_mask) + enc_outputs_class = self.enc_score_head(output_memory) + enc_outputs_coord_unact = self.enc_bbox_head( + output_memory) + output_anchors + + _, topk_ind = paddle.topk( + enc_outputs_class.max(-1), self.num_queries, axis=1) + # extract region proposal boxes + batch_ind = paddle.arange(end=bs, dtype=topk_ind.dtype) + batch_ind = batch_ind.unsqueeze(-1).tile([1, self.num_queries]) + topk_ind = paddle.stack([batch_ind, topk_ind], axis=-1) + reference_points_unact = paddle.gather_nd(enc_outputs_coord_unact, + topk_ind) # unsigmoided. 
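+        # The (batch_idx, topk_idx) pairs gather, per image, the num_queries
+        # proposals with the highest max class logit; the boxes stay in
+        # unsigmoided (logit) space so denoising boxes can be concatenated
+        # before the final sigmoid in the decoder.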
+        enc_topk_bboxes = F.sigmoid(reference_points_unact)
+        if denoising_bbox_unact is not None:
+            reference_points_unact = paddle.concat(
+                [denoising_bbox_unact, reference_points_unact], 1)
+        enc_topk_logits = paddle.gather_nd(enc_outputs_class, topk_ind)
+
+        # extract region features
+        if self.learnt_init_query:
+            target = self.tgt_embed.weight.unsqueeze(0).tile([bs, 1, 1])
+        else:
+            target = paddle.gather_nd(output_memory, topk_ind).detach()
+        if denoising_class is not None:
+            target = paddle.concat([denoising_class, target], 1)
+
+        return target, reference_points_unact.detach(
+        ), enc_topk_bboxes, enc_topk_logits
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/README.md b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..290926d56a3ae23ccd1b36861d047c6e6bf13187
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/README.md
@@ -0,0 +1,85 @@
+# Building the custom multi-scale deformable attention OP
+This custom OP is implemented following the PaddlePaddle guide on [custom external operators](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/custom_op/new_cpp_op_cn.html).
+
+## 1. Requirements
+- Paddle >= 2.3.2
+- gcc 8.2
+
+## 2. Installation
+Build and install the OP from this directory:
+```
+cd PaddleDetection/ppdet/modeling/transformers/ext_op/
+python setup_ms_deformable_attn_op.py install
+```
+
+Once compiled, the OP is ready to use. The following is a usage example of `ms_deformable_attn`:
+```
+import paddle
+
+# import the custom op
+from deformable_detr_ops import ms_deformable_attn
+
+# build fake input tensors
+bs, n_heads, c = 2, 8, 8
+query_length, n_levels, n_points = 2, 2, 2
+spatial_shapes = paddle.to_tensor([(6, 4), (3, 2)], dtype=paddle.int64)
+level_start_index = paddle.concat((paddle.to_tensor(
+    [0], dtype=paddle.int64), spatial_shapes.prod(1).cumsum(0)[:-1]))
+value_length = sum([(H * W).item() for H, W in spatial_shapes])
+
+def get_test_tensors(channels):
+    value = paddle.rand(
+        [bs, value_length, n_heads, channels], dtype=paddle.float32) * 0.01
+    sampling_locations = paddle.rand(
+        [bs, query_length, n_heads, n_levels, n_points, 2],
+        dtype=paddle.float32)
+    attention_weights = paddle.rand(
+        [bs, query_length, n_heads, n_levels, n_points],
+        dtype=paddle.float32) + 1e-5
+    attention_weights /= attention_weights.sum(-1, keepdim=True).sum(
+        -2, keepdim=True)
+    return [value, sampling_locations, attention_weights]
+
+value, sampling_locations, attention_weights = get_test_tensors(c)
+
+output = ms_deformable_attn(value,
+                            spatial_shapes,
+                            level_start_index,
+                            sampling_locations,
+                            attention_weights)
+```
+
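+If the compiled OP cannot be imported, the transformer modules fall back to a
+pure-Paddle implementation automatically: `deformable_transformer.py` tries
+`from deformable_detr_ops import ms_deformable_attn` and, on failure, uses
+`deformable_attention_core_func` from `ppdet/modeling/transformers/utils.py`
+instead. A minimal sketch of the same check, useful for confirming which path
+is active (the print strings here are illustrative, not part of the codebase):
+```
+try:
+    from deformable_detr_ops import ms_deformable_attn
+    print("using the compiled CUDA op")
+except ImportError:
+    # pure-Paddle fallback used by MSDeformableAttention
+    from ppdet.modeling.transformers.utils import \
+        deformable_attention_core_func as ms_deformable_attn
+    print("falling back to the pure-Paddle implementation")
+```
+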
+## 3. Unit tests
+Unit tests can be run to verify that the custom operator behaves correctly, for example:
+```
+python test_ms_deformable_attn_op.py
+```
+On success, the output looks like this:
+```
+*True check_forward_equal_with_paddle_float: max_abs_err 6.98e-10 max_rel_err 2.03e-07
+*tensor1 True check_gradient_numerical(D=30)
+*tensor2 True check_gradient_numerical(D=30)
+*tensor3 True check_gradient_numerical(D=30)
+*tensor1 True check_gradient_numerical(D=32)
+*tensor2 True check_gradient_numerical(D=32)
+*tensor3 True check_gradient_numerical(D=32)
+*tensor1 True check_gradient_numerical(D=64)
+*tensor2 True check_gradient_numerical(D=64)
+*tensor3 True check_gradient_numerical(D=64)
+*tensor1 True check_gradient_numerical(D=71)
+*tensor2 True check_gradient_numerical(D=71)
+*tensor3 True check_gradient_numerical(D=71)
+*tensor1 True check_gradient_numerical(D=128)
+*tensor2 True check_gradient_numerical(D=128)
+*tensor3 True check_gradient_numerical(D=128)
+*tensor1 True check_gradient_numerical(D=1024)
+*tensor2 True check_gradient_numerical(D=1024)
+*tensor3 True check_gradient_numerical(D=1024)
+*tensor1 True check_gradient_numerical(D=1025)
+*tensor2 True check_gradient_numerical(D=1025)
+*tensor3 True check_gradient_numerical(D=1025)
+*tensor1 True check_gradient_numerical(D=2048)
+*tensor2 True check_gradient_numerical(D=2048)
+*tensor3 True check_gradient_numerical(D=2048)
+*tensor1 True check_gradient_numerical(D=3096)
+*tensor2 True check_gradient_numerical(D=3096)
+*tensor3 True check_gradient_numerical(D=3096)
+```
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/ms_deformable_attn_op.cc b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/ms_deformable_attn_op.cc
new file mode 100644
index 0000000000000000000000000000000000000000..d1758adbcd995189085ed1661be889cb7cf7a25c
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/ms_deformable_attn_op.cc
@@ -0,0 +1,65 @@
+/* Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
*/ + +#include "paddle/extension.h" + +#include + +// declare GPU implementation +std::vector +MSDeformableAttnCUDAForward(const paddle::Tensor &value, + const paddle::Tensor &value_spatial_shapes, + const paddle::Tensor &value_level_start_index, + const paddle::Tensor &sampling_locations, + const paddle::Tensor &attention_weights); + +std::vector MSDeformableAttnCUDABackward( + const paddle::Tensor &value, const paddle::Tensor &value_spatial_shapes, + const paddle::Tensor &value_level_start_index, + const paddle::Tensor &sampling_locations, + const paddle::Tensor &attention_weights, const paddle::Tensor &grad_out); + +//// CPU not implemented + +std::vector> +MSDeformableAttnInferShape(std::vector value_shape, + std::vector value_spatial_shapes_shape, + std::vector value_level_start_index_shape, + std::vector sampling_locations_shape, + std::vector attention_weights_shape) { + return {{value_shape[0], sampling_locations_shape[1], + value_shape[2] * value_shape[3]}}; +} + +std::vector +MSDeformableAttnInferDtype(paddle::DataType value_dtype, + paddle::DataType value_spatial_shapes_dtype, + paddle::DataType value_level_start_index_dtype, + paddle::DataType sampling_locations_dtype, + paddle::DataType attention_weights_dtype) { + return {value_dtype}; +} + +PD_BUILD_OP(ms_deformable_attn) + .Inputs({"Value", "SpatialShapes", "LevelIndex", "SamplingLocations", + "AttentionWeights"}) + .Outputs({"Out"}) + .SetKernelFn(PD_KERNEL(MSDeformableAttnCUDAForward)) + .SetInferShapeFn(PD_INFER_SHAPE(MSDeformableAttnInferShape)) + .SetInferDtypeFn(PD_INFER_DTYPE(MSDeformableAttnInferDtype)); + +PD_BUILD_GRAD_OP(ms_deformable_attn) + .Inputs({"Value", "SpatialShapes", "LevelIndex", "SamplingLocations", + "AttentionWeights", paddle::Grad("Out")}) + .Outputs({paddle::Grad("Value"), paddle::Grad("SpatialShapes"), + paddle::Grad("LevelIndex"), paddle::Grad("SamplingLocations"), + paddle::Grad("AttentionWeights")}) + .SetKernelFn(PD_KERNEL(MSDeformableAttnCUDABackward)); diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/ms_deformable_attn_op.cu b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/ms_deformable_attn_op.cu new file mode 100644 index 0000000000000000000000000000000000000000..d5a8d16181bb53b9e5e5b3167adb283fba4db763 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/ms_deformable_attn_op.cu @@ -0,0 +1,1073 @@ +/* Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#include "paddle/extension.h" + +#define CUDA_KERNEL_LOOP(i, n) \ + for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \ + i += blockDim.x * gridDim.x) + +const int CUDA_NUM_THREADS = 1024; +inline int GET_BLOCKS(const int N, const int num_threads) { + return (N + num_threads - 1) / num_threads; +} + +// forward bilinear +template +__device__ data_t deformable_attn_bilinear_forward( + const data_t *&bottom_data, const int &height, const int &width, + const int &nheads, const int &channels, const data_t &h, const data_t &w, + const int &m, const int &c) { + const int h_low = floor(h); + const int w_low = floor(w); + const int h_high = h_low + 1; + const int w_high = w_low + 1; + + const data_t lh = h - h_low; + const data_t lw = w - w_low; + const data_t hh = 1 - lh, hw = 1 - lw; + + const int w_stride = nheads * channels; + const int h_stride = width * w_stride; + const int h_low_ptr_offset = h_low * h_stride; + const int h_high_ptr_offset = h_low_ptr_offset + h_stride; + const int w_low_ptr_offset = w_low * w_stride; + const int w_high_ptr_offset = w_low_ptr_offset + w_stride; + const int base_ptr = m * channels + c; + + data_t v1 = 0; + if (h_low >= 0 && w_low >= 0) { + const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr; + v1 = bottom_data[ptr1]; + } + data_t v2 = 0; + if (h_low >= 0 && w_high <= width - 1) { + const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr; + v2 = bottom_data[ptr2]; + } + data_t v3 = 0; + if (h_high <= height - 1 && w_low >= 0) { + const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr; + v3 = bottom_data[ptr3]; + } + data_t v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) { + const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr; + v4 = bottom_data[ptr4]; + } + + const data_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + + const data_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + return val; +} + +// forward kernel +template +__global__ void deformable_attn_cuda_kernel_forward( + const int n, const data_t *data_value, const int64_t *data_spatial_shapes, + const int64_t *data_level_start_index, const data_t *data_sampling_loc, + const data_t *data_attn_weight, const int batch_size, + const int value_length, const int num_heads, const int channels, + const int num_levels, const int query_length, const int num_points, + data_t *output_data_ptr) { + CUDA_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + const int q_col = _temp % query_length; + _temp /= query_length; + const int b_col = _temp; + + data_t *data_ptr = output_data_ptr + index; + int data_weight_ptr = sampling_index * num_levels * num_points; + int data_loc_w_ptr = data_weight_ptr << 1; + const int qid_stride = num_heads * channels; + const int data_value_ptr_init_offset = b_col * value_length * qid_stride; + data_t col = 0; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const data_t *data_value_ptr = data_value + (data_value_ptr_init_offset + + level_start_id * qid_stride); + for (int p_col = 0; p_col < num_points; ++p_col) { + const data_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const data_t loc_h = 
+// forward kernel
+template <typename data_t>
+__global__ void deformable_attn_cuda_kernel_forward(
+    const int n, const data_t *data_value, const int64_t *data_spatial_shapes,
+    const int64_t *data_level_start_index, const data_t *data_sampling_loc,
+    const data_t *data_attn_weight, const int batch_size,
+    const int value_length, const int num_heads, const int channels,
+    const int num_levels, const int query_length, const int num_points,
+    data_t *output_data_ptr) {
+  CUDA_KERNEL_LOOP(index, n) {
+    int _temp = index;
+    const int c_col = _temp % channels;
+    _temp /= channels;
+    const int sampling_index = _temp;
+    const int m_col = _temp % num_heads;
+    _temp /= num_heads;
+    const int q_col = _temp % query_length;
+    _temp /= query_length;
+    const int b_col = _temp;
+
+    data_t *data_ptr = output_data_ptr + index;
+    int data_weight_ptr = sampling_index * num_levels * num_points;
+    int data_loc_w_ptr = data_weight_ptr << 1;
+    const int qid_stride = num_heads * channels;
+    const int data_value_ptr_init_offset = b_col * value_length * qid_stride;
+    data_t col = 0;
+
+    for (int l_col = 0; l_col < num_levels; ++l_col) {
+      const int level_start_id = data_level_start_index[l_col];
+      const int spatial_h_ptr = l_col << 1;
+      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
+      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
+      const data_t *data_value_ptr = data_value + (data_value_ptr_init_offset +
+                                                   level_start_id * qid_stride);
+      for (int p_col = 0; p_col < num_points; ++p_col) {
+        const data_t loc_w = data_sampling_loc[data_loc_w_ptr];
+        const data_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
+        const data_t weight = data_attn_weight[data_weight_ptr];
+
+        const data_t h_im = loc_h * spatial_h - 0.5;
+        const data_t w_im = loc_w * spatial_w - 0.5;
+
+        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
+          col += deformable_attn_bilinear_forward(
+                     data_value_ptr, spatial_h, spatial_w, num_heads, channels,
+                     h_im, w_im, m_col, c_col) *
+                 weight;
+        }
+
+        data_weight_ptr += 1;
+        data_loc_w_ptr += 2;
+      }
+    }
+    *data_ptr = col;
+  }
+}
+
+#define CHECK_INPUT_GPU(x) PD_CHECK(x.is_gpu(), #x " must be a GPU Tensor.")
+// forward
+std::vector<paddle::Tensor>
+MSDeformableAttnCUDAForward(const paddle::Tensor &value,
+                            const paddle::Tensor &value_spatial_shapes,
+                            const paddle::Tensor &value_level_start_index,
+                            const paddle::Tensor &sampling_locations,
+                            const paddle::Tensor &attention_weights) {
+
+  CHECK_INPUT_GPU(value);
+  CHECK_INPUT_GPU(value_spatial_shapes);
+  CHECK_INPUT_GPU(value_level_start_index);
+  CHECK_INPUT_GPU(sampling_locations);
+  CHECK_INPUT_GPU(attention_weights);
+
+  const int batch_size = value.shape()[0];
+  const int value_length = value.shape()[1];
+  const int num_heads = value.shape()[2];
+  const int channels = value.shape()[3];
+
+  const int num_levels = value_spatial_shapes.shape()[0];
+  const int query_length = sampling_locations.shape()[1];
+  const int num_points = sampling_locations.shape()[4];
+
+  auto output = paddle::full({batch_size, query_length, num_heads * channels},
+                             0, value.dtype(), paddle::GPUPlace());
+
+  const int num_kernels = batch_size * query_length * num_heads * channels;
+  deformable_attn_cuda_kernel_forward<float>
+      <<<GET_BLOCKS(num_kernels, CUDA_NUM_THREADS), CUDA_NUM_THREADS, 0,
+         value.stream()>>>(num_kernels, value.data<float>(),
+                           value_spatial_shapes.data<int64_t>(),
+                           value_level_start_index.data<int64_t>(),
+                           sampling_locations.data<float>(),
+                           attention_weights.data<float>(), batch_size,
+                           value_length, num_heads, channels, num_levels,
+                           query_length, num_points, output.data<float>());
+  return {output};
+}
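+
+// Note on coordinates: normalized locations are mapped to pixel space as
+// loc * spatial_extent - 0.5, so (0, 0) and (1, 1) address the corners of
+// the feature map; samples falling strictly outside it contribute nothing.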
+
+// backward bilinear
+template <typename data_t>
+__device__ void deformable_attn_bilinear_backward(
+    const data_t *&bottom_data, const int &height, const int &width,
+    const int &nheads, const int &channels, const data_t &h, const data_t &w,
+    const int &m, const int &c, const data_t &top_grad,
+    const data_t &attn_weight, data_t *&grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  const int h_low = floor(h);
+  const int w_low = floor(w);
+  const int h_high = h_low + 1;
+  const int w_high = w_low + 1;
+
+  const data_t lh = h - h_low;
+  const data_t lw = w - w_low;
+  const data_t hh = 1 - lh, hw = 1 - lw;
+
+  const int w_stride = nheads * channels;
+  const int h_stride = width * w_stride;
+  const int h_low_ptr_offset = h_low * h_stride;
+  const int h_high_ptr_offset = h_low_ptr_offset + h_stride;
+  const int w_low_ptr_offset = w_low * w_stride;
+  const int w_high_ptr_offset = w_low_ptr_offset + w_stride;
+  const int base_ptr = m * channels + c;
+
+  const data_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw;
+  const data_t top_grad_value = top_grad * attn_weight;
+  data_t grad_h_weight = 0, grad_w_weight = 0;
+
+  data_t v1 = 0;
+  if (h_low >= 0 && w_low >= 0) {
+    const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr;
+    v1 = bottom_data[ptr1];
+    grad_h_weight -= hw * v1;
+    grad_w_weight -= hh * v1;
+    atomicAdd(grad_value + ptr1, w1 * top_grad_value);
+  }
+  data_t v2 = 0;
+  if (h_low >= 0 && w_high <= width - 1) {
+    const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr;
+    v2 = bottom_data[ptr2];
+    grad_h_weight -= lw * v2;
+    grad_w_weight += hh * v2;
+    atomicAdd(grad_value + ptr2, w2 * top_grad_value);
+  }
+  data_t v3 = 0;
+  if (h_high <= height - 1 && w_low >= 0) {
+    const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr;
+    v3 = bottom_data[ptr3];
+    grad_h_weight += hw * v3;
+    grad_w_weight -= lh * v3;
+    atomicAdd(grad_value + ptr3, w3 * top_grad_value);
+  }
+  data_t v4 = 0;
+  if (h_high <= height - 1 && w_high <= width - 1) {
+    const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr;
+    v4 = bottom_data[ptr4];
+    grad_h_weight += lw * v4;
+    grad_w_weight += lh * v4;
+    atomicAdd(grad_value + ptr4, w4 * top_grad_value);
+  }
+
+  const data_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
+  *grad_attn_weight = top_grad * val;
+  *grad_sampling_loc = width * grad_w_weight * top_grad_value;
+  *(grad_sampling_loc + 1) = height * grad_h_weight * top_grad_value;
+}
+
+template <typename data_t>
+__device__ void deformable_attn_bilinear_backward_gm(
+    const data_t *&bottom_data, const int &height, const int &width,
+    const int &nheads, const int &channels, const data_t &h, const data_t &w,
+    const int &m, const int &c, const data_t &top_grad,
+    const data_t &attn_weight, data_t *&grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  const int h_low = floor(h);
+  const int w_low = floor(w);
+  const int h_high = h_low + 1;
+  const int w_high = w_low + 1;
+
+  const data_t lh = h - h_low;
+  const data_t lw = w - w_low;
+  const data_t hh = 1 - lh, hw = 1 - lw;
+
+  const int w_stride = nheads * channels;
+  const int h_stride = width * w_stride;
+  const int h_low_ptr_offset = h_low * h_stride;
+  const int h_high_ptr_offset = h_low_ptr_offset + h_stride;
+  const int w_low_ptr_offset = w_low * w_stride;
+  const int w_high_ptr_offset = w_low_ptr_offset + w_stride;
+  const int base_ptr = m * channels + c;
+
+  const data_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw;
+  const data_t top_grad_value = top_grad * attn_weight;
+  data_t grad_h_weight = 0, grad_w_weight = 0;
+
+  data_t v1 = 0;
+  if (h_low >= 0 && w_low >= 0) {
+    const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr;
+    v1 = bottom_data[ptr1];
+    grad_h_weight -= hw * v1;
+    grad_w_weight -= hh * v1;
+    atomicAdd(grad_value + ptr1, w1 * top_grad_value);
+  }
+  data_t v2 = 0;
+  if (h_low >= 0 && w_high <= width - 1) {
+    const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr;
+    v2 = bottom_data[ptr2];
+    grad_h_weight -= lw * v2;
+    grad_w_weight += hh * v2;
+    atomicAdd(grad_value + ptr2, w2 * top_grad_value);
+  }
+  data_t v3 = 0;
+  if (h_high <= height - 1 && w_low >= 0) {
+    const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr;
+    v3 = bottom_data[ptr3];
+    grad_h_weight += hw * v3;
+    grad_w_weight -= lh * v3;
+    atomicAdd(grad_value + ptr3, w3 * top_grad_value);
+  }
+  data_t v4 = 0;
+  if (h_high <= height - 1 && w_high <= width - 1) {
+    const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr;
+    v4 = bottom_data[ptr4];
+    grad_h_weight += lw * v4;
+    grad_w_weight += lh * v4;
+    atomicAdd(grad_value + ptr4, w4 * top_grad_value);
+  }
+
+  const data_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
+  atomicAdd(grad_attn_weight, top_grad * val);
+  atomicAdd(grad_sampling_loc, width * grad_w_weight * top_grad_value);
+  atomicAdd(grad_sampling_loc + 1, height * grad_h_weight * top_grad_value);
+}
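+
+// The two samplers above differ only in how the location/weight gradients
+// are written back: deformable_attn_bilinear_backward stores them into
+// per-thread scratch slots that the shared-memory kernels below reduce,
+// while the *_gm variant adds them directly to global memory with atomicAdd.
+// grad_value is always accumulated atomically because samples overlap.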
+
+// backward kernels
+// channels > 1024
+template <typename data_t>
+__global__ void deformable_attn_cuda_kernel_backward_shm_reduce_v2_multi_blocks(
+    const int n, const data_t *grad_col, const data_t *data_value,
+    const int64_t *data_spatial_shapes, const int64_t *data_level_start_index,
+    const data_t *data_sampling_loc, const data_t *data_attn_weight,
+    const int batch_size, const int value_length, const int num_heads,
+    const int channels, const int num_levels, const int query_length,
+    const int num_points, data_t *grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  CUDA_KERNEL_LOOP(index, n) {
+    extern __shared__ int _s[];
+    data_t *cache_grad_sampling_loc = (data_t *)_s;
+    data_t *cache_grad_attn_weight = cache_grad_sampling_loc + 2 * blockDim.x;
+    unsigned int tid = threadIdx.x;
+    int _temp = index;
+    const int c_col = _temp % channels;
+    _temp /= channels;
+    const int sampling_index = _temp;
+    const int m_col = _temp % num_heads;
+    _temp /= num_heads;
+    const int q_col = _temp % query_length;
+    _temp /= query_length;
+    const int b_col = _temp;
+
+    const data_t top_grad = grad_col[index];
+
+    int data_weight_ptr = sampling_index * num_levels * num_points;
+    int data_loc_w_ptr = data_weight_ptr << 1;
+    const int grad_sampling_ptr = data_weight_ptr;
+    grad_sampling_loc += grad_sampling_ptr << 1;
+    grad_attn_weight += grad_sampling_ptr;
+    const int grad_weight_stride = 1;
+    const int grad_loc_stride = 2;
+    const int qid_stride = num_heads * channels;
+    const int data_value_ptr_init_offset = b_col * value_length * qid_stride;
+
+    for (int l_col = 0; l_col < num_levels; ++l_col) {
+      const int level_start_id = data_level_start_index[l_col];
+      const int spatial_h_ptr = l_col << 1;
+      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
+      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
+      const int value_ptr_offset =
+          data_value_ptr_init_offset + level_start_id * qid_stride;
+      const data_t *data_value_ptr = data_value + value_ptr_offset;
+      data_t *grad_value_ptr = grad_value + value_ptr_offset;
+
+      for (int p_col = 0; p_col < num_points; ++p_col) {
+        const data_t loc_w = data_sampling_loc[data_loc_w_ptr];
+        const data_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
+        const data_t weight = data_attn_weight[data_weight_ptr];
+
+        const data_t h_im = loc_h * spatial_h - 0.5;
+        const data_t w_im = loc_w * spatial_w - 0.5;
+        *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0;
+        *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0;
+        *(cache_grad_attn_weight + threadIdx.x) = 0;
+        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
+          deformable_attn_bilinear_backward(
+              data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im,
+              w_im, m_col, c_col, top_grad, weight, grad_value_ptr,
+              cache_grad_sampling_loc + (threadIdx.x << 1),
+              cache_grad_attn_weight + threadIdx.x);
+        }
+
+        __syncthreads();
+
+        for (unsigned int s = blockDim.x / 2, spre = blockDim.x; s > 0;
+             s >>= 1, spre >>= 1) {
+          if (tid < s) {
+            const unsigned int xid1 = tid << 1;
+            const unsigned int xid2 = (tid + s) << 1;
+            cache_grad_attn_weight[tid] += cache_grad_attn_weight[tid + s];
+            cache_grad_sampling_loc[xid1] += cache_grad_sampling_loc[xid2];
+            cache_grad_sampling_loc[xid1 + 1] +=
+                cache_grad_sampling_loc[xid2 + 1];
+            if (tid + (s << 1) < spre) {
+              cache_grad_attn_weight[tid] +=
+                  cache_grad_attn_weight[tid + (s << 1)];
+              cache_grad_sampling_loc[xid1] +=
+                  cache_grad_sampling_loc[xid2 + (s << 1)];
+              cache_grad_sampling_loc[xid1 + 1] +=
+                  cache_grad_sampling_loc[xid2 + 1 + (s << 1)];
+            }
+          }
+          __syncthreads();
+        }
+
+        if (tid == 0) {
+          atomicAdd(grad_sampling_loc, cache_grad_sampling_loc[0]);
+          atomicAdd(grad_sampling_loc + 1, cache_grad_sampling_loc[1]);
+          atomicAdd(grad_attn_weight, cache_grad_attn_weight[0]);
+        }
+        __syncthreads();
+
+        data_weight_ptr += 1;
+        data_loc_w_ptr += 2;
+        grad_attn_weight += grad_weight_stride;
+        grad_sampling_loc += grad_loc_stride;
+      }
+    }
+  }
+}
+
+template <typename data_t>
+__global__ void deformable_attn_cuda_kernel_backward_gm(
+    const int n, const data_t *grad_col, const data_t *data_value,
+    const int64_t *data_spatial_shapes, const int64_t *data_level_start_index,
+    const data_t *data_sampling_loc, const data_t *data_attn_weight,
+    const int batch_size, const int value_length, const int num_heads,
+    const int channels, const int num_levels, const int query_length,
+    const int num_points, data_t *grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  CUDA_KERNEL_LOOP(index, n) {
+    int _temp = index;
+    const int c_col = _temp % channels;
+    _temp /= channels;
+    const int sampling_index = _temp;
+    const int m_col = _temp % num_heads;
+    _temp /= num_heads;
+    const int q_col = _temp % query_length;
+    _temp /= query_length;
+    const int b_col = _temp;
+
+    const data_t top_grad = grad_col[index];
+
+    int data_weight_ptr = sampling_index * num_levels * num_points;
+    int data_loc_w_ptr = data_weight_ptr << 1;
+    const int grad_sampling_ptr = data_weight_ptr;
+    grad_sampling_loc += grad_sampling_ptr << 1;
+    grad_attn_weight += grad_sampling_ptr;
+    const int grad_weight_stride = 1;
+    const int grad_loc_stride = 2;
+    const int qid_stride = num_heads * channels;
+    const int data_value_ptr_init_offset = b_col * value_length * qid_stride;
+
+    for (int l_col = 0; l_col < num_levels; ++l_col) {
+      const int level_start_id = data_level_start_index[l_col];
+      const int spatial_h_ptr = l_col << 1;
+      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
+      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
+      const int value_ptr_offset =
+          data_value_ptr_init_offset + level_start_id * qid_stride;
+      const data_t *data_value_ptr = data_value + value_ptr_offset;
+      data_t *grad_value_ptr = grad_value + value_ptr_offset;
+
+      for (int p_col = 0; p_col < num_points; ++p_col) {
+        const data_t loc_w = data_sampling_loc[data_loc_w_ptr];
+        const data_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
+        const data_t weight = data_attn_weight[data_weight_ptr];
+
+        const data_t h_im = loc_h * spatial_h - 0.5;
+        const data_t w_im = loc_w * spatial_w - 0.5;
+        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
+          deformable_attn_bilinear_backward_gm(
+              data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im,
+              w_im, m_col, c_col, top_grad, weight, grad_value_ptr,
+              grad_sampling_loc, grad_attn_weight);
+        }
+        data_weight_ptr += 1;
+        data_loc_w_ptr += 2;
+        grad_attn_weight += grad_weight_stride;
+        grad_sampling_loc += grad_loc_stride;
+      }
+    }
+  }
+}
+
+// channels <= 1024
+template <typename data_t, unsigned int blockSize>
+__global__ void
+deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v1(
+    const int n, const data_t *grad_col, const data_t *data_value,
+    const int64_t *data_spatial_shapes, const int64_t *data_level_start_index,
+    const data_t *data_sampling_loc, const data_t *data_attn_weight,
+    const int batch_size, const int value_length, const int num_heads,
+    const int channels, const int num_levels, const int query_length,
+    const int num_points, data_t *grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  CUDA_KERNEL_LOOP(index, n) {
+    __shared__ data_t cache_grad_sampling_loc[blockSize * 2];
+    __shared__ data_t cache_grad_attn_weight[blockSize];
+    unsigned int tid = threadIdx.x;
+    int _temp = index;
+    const int c_col = _temp % channels;
+    _temp /= channels;
+    const int sampling_index = _temp;
+    const int m_col = _temp % num_heads;
+    _temp /= num_heads;
+    const int q_col = _temp % query_length;
+    _temp /= query_length;
+    const int b_col = _temp;
+
+    const data_t top_grad = grad_col[index];
+
+    int data_weight_ptr = sampling_index * num_levels * num_points;
+    int data_loc_w_ptr = data_weight_ptr << 1;
+    const int grad_sampling_ptr = data_weight_ptr;
+    grad_sampling_loc += grad_sampling_ptr << 1;
+    grad_attn_weight += grad_sampling_ptr;
+    const int grad_weight_stride = 1;
+    const int grad_loc_stride = 2;
+    const int qid_stride = num_heads * channels;
+    const int data_value_ptr_init_offset = b_col * value_length * qid_stride;
+
+    for (int l_col = 0; l_col < num_levels; ++l_col) {
+      const int level_start_id = data_level_start_index[l_col];
+      const int spatial_h_ptr = l_col << 1;
+      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
+      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
+      const int value_ptr_offset =
+          data_value_ptr_init_offset + level_start_id * qid_stride;
+      const data_t *data_value_ptr = data_value + value_ptr_offset;
+      data_t *grad_value_ptr = grad_value + value_ptr_offset;
+
+      for (int p_col = 0; p_col < num_points; ++p_col) {
+        const data_t loc_w = data_sampling_loc[data_loc_w_ptr];
+        const data_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
+        const data_t weight = data_attn_weight[data_weight_ptr];
+
+        const data_t h_im = loc_h * spatial_h - 0.5;
+        const data_t w_im = loc_w * spatial_w - 0.5;
+        *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0;
+        *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0;
+        *(cache_grad_attn_weight + threadIdx.x) = 0;
+        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
+          deformable_attn_bilinear_backward(
+              data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im,
+              w_im, m_col, c_col, top_grad, weight, grad_value_ptr,
+              cache_grad_sampling_loc + (threadIdx.x << 1),
+              cache_grad_attn_weight + threadIdx.x);
+        }
+
+        __syncthreads();
+        if (tid == 0) {
+          data_t _grad_w = cache_grad_sampling_loc[0],
+                 _grad_h = cache_grad_sampling_loc[1],
+                 _grad_a = cache_grad_attn_weight[0];
+          int sid = 2;
+          for (unsigned int tid = 1; tid < blockSize; ++tid) {
+            _grad_w += cache_grad_sampling_loc[sid];
+            _grad_h += cache_grad_sampling_loc[sid + 1];
+            _grad_a += cache_grad_attn_weight[tid];
+            sid += 2;
+          }
+
+          *grad_sampling_loc = _grad_w;
+          *(grad_sampling_loc + 1) = _grad_h;
+          *grad_attn_weight = _grad_a;
+        }
+        __syncthreads();
+
+        data_weight_ptr += 1;
+        data_loc_w_ptr += 2;
+        grad_attn_weight += grad_weight_stride;
+        grad_sampling_loc += grad_loc_stride;
+      }
+    }
+  }
+}
+
+template <typename data_t, unsigned int blockSize>
+__global__ void
+deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v2(
+    const int n, const data_t *grad_col, const data_t *data_value,
+    const int64_t *data_spatial_shapes, const int64_t *data_level_start_index,
+    const data_t *data_sampling_loc, const data_t *data_attn_weight,
+    const int batch_size, const int value_length, const int num_heads,
+    const int channels, const int num_levels, const int query_length,
+    const int num_points, data_t *grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  CUDA_KERNEL_LOOP(index, n) {
+    __shared__ data_t cache_grad_sampling_loc[blockSize * 2];
+    __shared__ data_t cache_grad_attn_weight[blockSize];
+    unsigned int tid = threadIdx.x;
+    int _temp = index;
+    const int c_col = _temp % channels;
+    _temp /= channels;
+    const int sampling_index = _temp;
+    const int m_col = _temp % num_heads;
+    _temp /= num_heads;
+    const int q_col = _temp % query_length;
+    _temp /= query_length;
+    const int b_col = _temp;
+
+    const data_t top_grad = grad_col[index];
+
+    int data_weight_ptr = sampling_index * num_levels * num_points;
+    int data_loc_w_ptr = data_weight_ptr << 1;
+    const int grad_sampling_ptr = data_weight_ptr;
+    grad_sampling_loc += grad_sampling_ptr << 1;
+    grad_attn_weight += grad_sampling_ptr;
+    const int grad_weight_stride = 1;
+    const int grad_loc_stride = 2;
+    const int qid_stride = num_heads * channels;
+    const int data_value_ptr_init_offset = b_col * value_length * qid_stride;
+
+    for (int l_col = 0; l_col < num_levels; ++l_col) {
+      const int level_start_id = data_level_start_index[l_col];
+      const int spatial_h_ptr = l_col << 1;
+      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
+      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
+      const int value_ptr_offset =
+          data_value_ptr_init_offset + level_start_id * qid_stride;
+      const data_t *data_value_ptr = data_value + value_ptr_offset;
+      data_t *grad_value_ptr = grad_value + value_ptr_offset;
+
+      for (int p_col = 0; p_col < num_points; ++p_col) {
+        const data_t loc_w = data_sampling_loc[data_loc_w_ptr];
+        const data_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
+        const data_t weight = data_attn_weight[data_weight_ptr];
+
+        const data_t h_im = loc_h * spatial_h - 0.5;
+        const data_t w_im = loc_w * spatial_w - 0.5;
+        *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0;
+        *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0;
+        *(cache_grad_attn_weight + threadIdx.x) = 0;
+        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
+          deformable_attn_bilinear_backward(
+              data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im,
+              w_im, m_col, c_col, top_grad, weight, grad_value_ptr,
+              cache_grad_sampling_loc + (threadIdx.x << 1),
+              cache_grad_attn_weight + threadIdx.x);
+        }
+
+        __syncthreads();
+
+        for (unsigned int s = blockSize / 2; s > 0; s >>= 1) {
+          if (tid < s) {
+            const unsigned int xid1 = tid << 1;
+            const unsigned int xid2 = (tid + s) << 1;
+            cache_grad_attn_weight[tid] += cache_grad_attn_weight[tid + s];
+            cache_grad_sampling_loc[xid1] += cache_grad_sampling_loc[xid2];
+            cache_grad_sampling_loc[xid1 + 1] +=
+                cache_grad_sampling_loc[xid2 + 1];
+          }
+          __syncthreads();
+        }
+
+        if (tid == 0) {
+          *grad_sampling_loc = cache_grad_sampling_loc[0];
+          *(grad_sampling_loc + 1) = cache_grad_sampling_loc[1];
+          *grad_attn_weight = cache_grad_attn_weight[0];
+        }
+        __syncthreads();
+
+        data_weight_ptr += 1;
+        data_loc_w_ptr += 2;
+        grad_attn_weight += grad_weight_stride;
+        grad_sampling_loc += grad_loc_stride;
+      }
+    }
+  }
+}
+
+template <typename data_t>
+__global__ void deformable_attn_cuda_kernel_backward_shm_reduce_v1(
+    const int n, const data_t *grad_col, const data_t *data_value,
+    const int64_t *data_spatial_shapes, const int64_t *data_level_start_index,
+    const data_t *data_sampling_loc, const data_t *data_attn_weight,
+    const int batch_size, const int value_length, const int num_heads,
+    const int channels, const int num_levels, const int query_length,
+    const int num_points, data_t *grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  CUDA_KERNEL_LOOP(index, n) {
+    extern __shared__ int _s[];
+    data_t *cache_grad_sampling_loc = (data_t *)_s;
+    data_t *cache_grad_attn_weight = cache_grad_sampling_loc + 2 * blockDim.x;
+    unsigned int tid = threadIdx.x;
+    int _temp = index;
+    const int c_col = _temp % channels;
+    _temp /= channels;
+    const int sampling_index = _temp;
+    const int m_col = _temp % num_heads;
+    _temp /= num_heads;
+    const int q_col = _temp % query_length;
+    _temp /= query_length;
+    const int b_col = _temp;
+
+    const data_t top_grad = grad_col[index];
+
+    int data_weight_ptr = sampling_index * num_levels * num_points;
+    int data_loc_w_ptr = data_weight_ptr << 1;
+    const int grad_sampling_ptr = data_weight_ptr;
+    grad_sampling_loc += grad_sampling_ptr << 1;
+    grad_attn_weight += grad_sampling_ptr;
+    const int grad_weight_stride = 1;
+    const int grad_loc_stride = 2;
+    const int qid_stride = num_heads * channels;
+    const int data_value_ptr_init_offset = b_col * value_length * qid_stride;
+
+    for (int l_col = 0; l_col < num_levels; ++l_col) {
+      const int level_start_id = data_level_start_index[l_col];
+      const int spatial_h_ptr = l_col << 1;
+      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
+      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
+      const int value_ptr_offset =
+          data_value_ptr_init_offset + level_start_id * qid_stride;
+      const data_t *data_value_ptr = data_value + value_ptr_offset;
+      data_t *grad_value_ptr = grad_value + value_ptr_offset;
+
+      for (int p_col = 0; p_col < num_points; ++p_col) {
+        const data_t loc_w = data_sampling_loc[data_loc_w_ptr];
+        const data_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
+        const data_t weight = data_attn_weight[data_weight_ptr];
+
+        const data_t h_im = loc_h * spatial_h - 0.5;
+        const data_t w_im = loc_w * spatial_w - 0.5;
+        *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0;
+        *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0;
+        *(cache_grad_attn_weight + threadIdx.x) = 0;
+        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
+          deformable_attn_bilinear_backward(
+              data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im,
+              w_im, m_col, c_col, top_grad, weight, grad_value_ptr,
+              cache_grad_sampling_loc + (threadIdx.x << 1),
+              cache_grad_attn_weight + threadIdx.x);
+        }
+
+        __syncthreads();
+        if (tid == 0) {
+          data_t _grad_w = cache_grad_sampling_loc[0],
+                 _grad_h = cache_grad_sampling_loc[1],
+                 _grad_a = cache_grad_attn_weight[0];
+          int sid = 2;
+          for (unsigned int tid = 1; tid < blockDim.x; ++tid) {
+            _grad_w += cache_grad_sampling_loc[sid];
+            _grad_h += cache_grad_sampling_loc[sid + 1];
+            _grad_a += cache_grad_attn_weight[tid];
+            sid += 2;
+          }
+
+          *grad_sampling_loc = _grad_w;
+          *(grad_sampling_loc + 1) = _grad_h;
+          *grad_attn_weight = _grad_a;
+        }
+        __syncthreads();
+
+        data_weight_ptr += 1;
+        data_loc_w_ptr += 2;
+        grad_attn_weight += grad_weight_stride;
+        grad_sampling_loc += grad_loc_stride;
+      }
+    }
+  }
+}
+
+template <typename data_t>
+__global__ void deformable_attn_cuda_kernel_backward_shm_reduce_v2(
+    const int n, const data_t *grad_col, const data_t *data_value,
+    const int64_t *data_spatial_shapes, const int64_t *data_level_start_index,
+    const data_t *data_sampling_loc, const data_t *data_attn_weight,
+    const int batch_size, const int value_length, const int num_heads,
+    const int channels, const int num_levels, const int query_length,
+    const int num_points, data_t *grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  CUDA_KERNEL_LOOP(index, n) {
+    extern __shared__ int _s[];
+    data_t *cache_grad_sampling_loc = (data_t *)_s;
+    data_t *cache_grad_attn_weight = cache_grad_sampling_loc + 2 * blockDim.x;
+    unsigned int tid = threadIdx.x;
+    int _temp = index;
+    const int c_col = _temp % channels;
+    _temp /= channels;
+    const int sampling_index = _temp;
+    const int m_col = _temp % num_heads;
+    _temp /= num_heads;
+    const int q_col = _temp % query_length;
+    _temp /= query_length;
+    const int b_col = _temp;
+
+    const data_t top_grad = grad_col[index];
+
+    int data_weight_ptr = sampling_index * num_levels * num_points;
+    int data_loc_w_ptr = data_weight_ptr << 1;
+    const int grad_sampling_ptr = data_weight_ptr;
+    grad_sampling_loc += grad_sampling_ptr << 1;
+    grad_attn_weight += grad_sampling_ptr;
+    const int grad_weight_stride = 1;
+    const int grad_loc_stride = 2;
+    const int qid_stride = num_heads * channels;
+    const int data_value_ptr_init_offset = b_col * value_length * qid_stride;
+
+    for (int l_col = 0; l_col < num_levels; ++l_col) {
+      const int level_start_id = data_level_start_index[l_col];
+      const int spatial_h_ptr = l_col << 1;
+      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
+      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
+      const int value_ptr_offset =
+          data_value_ptr_init_offset + level_start_id * qid_stride;
+      const data_t *data_value_ptr = data_value + value_ptr_offset;
+      data_t *grad_value_ptr = grad_value + value_ptr_offset;
+
+      for (int p_col = 0; p_col < num_points; ++p_col) {
+        const data_t loc_w = data_sampling_loc[data_loc_w_ptr];
+        const data_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
+        const data_t weight = data_attn_weight[data_weight_ptr];
+
+        const data_t h_im = loc_h * spatial_h - 0.5;
+        const data_t w_im = loc_w * spatial_w - 0.5;
+        *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0;
+        *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0;
+        *(cache_grad_attn_weight + threadIdx.x) = 0;
+        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
+          deformable_attn_bilinear_backward(
+              data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im,
+              w_im, m_col, c_col, top_grad, weight, grad_value_ptr,
+              cache_grad_sampling_loc + (threadIdx.x << 1),
+              cache_grad_attn_weight + threadIdx.x);
+        }
+
+        __syncthreads();
+
+        for (unsigned int s = blockDim.x / 2, spre = blockDim.x; s > 0;
+             s >>= 1, spre >>= 1) {
+          if (tid < s) {
+            const unsigned int xid1 = tid << 1;
+            const unsigned int xid2 = (tid + s) << 1;
+            cache_grad_attn_weight[tid] += cache_grad_attn_weight[tid + s];
+            cache_grad_sampling_loc[xid1] += cache_grad_sampling_loc[xid2];
+            cache_grad_sampling_loc[xid1 + 1] +=
+                cache_grad_sampling_loc[xid2 + 1];
+            if (tid + (s << 1) < spre) {
+              cache_grad_attn_weight[tid] +=
+                  cache_grad_attn_weight[tid + (s << 1)];
+              cache_grad_sampling_loc[xid1] +=
+                  cache_grad_sampling_loc[xid2 + (s << 1)];
+              cache_grad_sampling_loc[xid1 + 1] +=
+                  cache_grad_sampling_loc[xid2 + 1 + (s << 1)];
+            }
+          }
+          __syncthreads();
+        }
+
+        if (tid == 0) {
+          *grad_sampling_loc = cache_grad_sampling_loc[0];
+          *(grad_sampling_loc + 1) = cache_grad_sampling_loc[1];
+          *grad_attn_weight = cache_grad_attn_weight[0];
+        }
+        __syncthreads();
+
+        data_weight_ptr += 1;
+        data_loc_w_ptr += 2;
+        grad_attn_weight += grad_weight_stride;
+        grad_sampling_loc += grad_loc_stride;
+      }
+    }
+  }
+}
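+
+// The launcher below picks a kernel from the channel count: power-of-two
+// channel counts <= 1024 get compile-time block sizes (the *_v1 kernels do a
+// serial reduction by thread 0, the *_v2 kernels a tree reduction), other
+// counts <= 1024 use the dynamic shared-memory kernels, and counts > 1024
+// use the multi-block shared-memory kernel when divisible by 1024, falling
+// back to plain global-memory atomics otherwise.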
+
+// backward branch
+template <typename data_t>
+void deformable_attn_cuda_backward(
+    cudaStream_t stream, const data_t *grad_out, const data_t *data_value,
+    const int64_t *data_spatial_shapes, const int64_t *data_level_start_index,
+    const data_t *data_sampling_loc, const data_t *data_attn_weight,
+    const int batch_size, const int value_length, const int num_heads,
+    const int channels, const int num_levels, const int query_length,
+    const int num_points, data_t *grad_value, data_t *grad_sampling_loc,
+    data_t *grad_attn_weight) {
+  const int num_threads =
+      (channels > CUDA_NUM_THREADS) ? CUDA_NUM_THREADS : channels;
+  const int num_kernels = batch_size * query_length * num_heads * channels;
+  const int num_actual_kernels =
+      batch_size * query_length * num_heads * channels;
+  if (channels > 1024) {
+    if ((channels & 1023) == 0) {
+      deformable_attn_cuda_kernel_backward_shm_reduce_v2_multi_blocks<data_t>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads,
+             num_threads * 3 * sizeof(data_t), stream>>>(
+              num_kernels, grad_out, data_value, data_spatial_shapes,
+              data_level_start_index, data_sampling_loc, data_attn_weight,
+              batch_size, value_length, num_heads, channels, num_levels,
+              query_length, num_points, grad_value, grad_sampling_loc,
+              grad_attn_weight);
+    } else {
+      deformable_attn_cuda_kernel_backward_gm<data_t>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+    }
+  } else {
+    switch (channels) {
+    case 1:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v1<
+          data_t, 1>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 2:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v1<
+          data_t, 2>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 4:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v1<
+          data_t, 4>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 8:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v1<
+          data_t, 8>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 16:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v1<
+          data_t, 16>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 32:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v1<
+          data_t, 32>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 64:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v2<
+          data_t, 64>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 128:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v2<
+          data_t, 128>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 256:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v2<
+          data_t, 256>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 512:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v2<
+          data_t, 512>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    case 1024:
+      deformable_attn_cuda_kernel_backward_shm_blocksize_aware_reduce_v2<
+          data_t, 1024>
+          <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,
+             stream>>>(num_kernels, grad_out, data_value, data_spatial_shapes,
+                       data_level_start_index, data_sampling_loc,
+                       data_attn_weight, batch_size, value_length, num_heads,
+                       channels, num_levels, query_length, num_points,
+                       grad_value, grad_sampling_loc, grad_attn_weight);
+      break;
+    default:
+      if (channels < 64) {
+        deformable_attn_cuda_kernel_backward_shm_reduce_v1<data_t>
+            <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads,
+               num_threads * 3 * sizeof(data_t), stream>>>(
+                num_kernels, grad_out, data_value, data_spatial_shapes,
+                data_level_start_index, data_sampling_loc, data_attn_weight,
+                batch_size, value_length, num_heads, channels, num_levels,
+                query_length, num_points, grad_value, grad_sampling_loc,
+                grad_attn_weight);
+      } else {
+        deformable_attn_cuda_kernel_backward_shm_reduce_v2<data_t>
+            <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads,
+               num_threads * 3 * sizeof(data_t), stream>>>(
+                num_kernels, grad_out, data_value, data_spatial_shapes,
+                data_level_start_index, data_sampling_loc, data_attn_weight,
+                batch_size, value_length, num_heads, channels, num_levels,
+                query_length, num_points, grad_value, grad_sampling_loc,
+                grad_attn_weight);
+      }
+    }
+  }
+}
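+
+// grad_spatial_shapes and grad_level_start_index below are zero-filled
+// placeholders: the grad op registration in ms_deformable_attn_op.cc declares
+// one output per input, but the integer level metadata has no meaningful
+// gradient, so nothing is ever written into them.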
+
+// backward
+std::vector<paddle::Tensor> MSDeformableAttnCUDABackward(
+    const paddle::Tensor &value, const paddle::Tensor &value_spatial_shapes,
+    const paddle::Tensor &value_level_start_index,
+    const paddle::Tensor &sampling_locations,
+    const paddle::Tensor &attention_weights, const paddle::Tensor &grad_out) {
+
+  CHECK_INPUT_GPU(value);
+  CHECK_INPUT_GPU(value_spatial_shapes);
+  CHECK_INPUT_GPU(value_level_start_index);
+  CHECK_INPUT_GPU(sampling_locations);
+  CHECK_INPUT_GPU(attention_weights);
+  CHECK_INPUT_GPU(grad_out);
+
+  const int batch_size = value.shape()[0];
+  const int value_length = value.shape()[1];
+  const int num_heads = value.shape()[2];
+  const int channels = value.shape()[3];
+
+  const int num_levels = value_spatial_shapes.shape()[0];
+  const int query_length = sampling_locations.shape()[1];
+  const int num_points = sampling_locations.shape()[4];
+
+  auto grad_value =
+      paddle::full(value.shape(), 0, value.dtype(), paddle::GPUPlace());
+  auto grad_spatial_shapes =
+      paddle::full(value.shape(), 0, value.dtype(), paddle::GPUPlace());
+  auto grad_level_start_index =
+      paddle::full(value.shape(), 0, value.dtype(), paddle::GPUPlace());
+  auto grad_sampling_locations =
+      paddle::full(sampling_locations.shape(), 0, sampling_locations.dtype(),
+                   paddle::GPUPlace());
+  auto grad_attention_weights =
+      paddle::full(attention_weights.shape(), 0, attention_weights.dtype(),
+                   paddle::GPUPlace());
+
+  deformable_attn_cuda_backward<float>(
+      value.stream(), grad_out.data<float>(), value.data<float>(),
+      value_spatial_shapes.data<int64_t>(),
+      value_level_start_index.data<int64_t>(),
+      sampling_locations.data<float>(), attention_weights.data<float>(),
+      batch_size, value_length, num_heads, channels, num_levels, query_length,
+      num_points, grad_value.data<float>(),
+      grad_sampling_locations.data<float>(),
+      grad_attention_weights.data<float>());
+
+  return {grad_value, grad_spatial_shapes, grad_level_start_index,
+          grad_sampling_locations, grad_attention_weights};
+}
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/setup_ms_deformable_attn_op.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/setup_ms_deformable_attn_op.py
new file mode 100644
index 0000000000000000000000000000000000000000..7c3c386677e5d5eb5ccc91315e958cb9efc21c3e
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/setup_ms_deformable_attn_op.py
@@ -0,0 +1,7 @@
+from paddle.utils.cpp_extension import CUDAExtension, setup
+
+if __name__ == "__main__":
+    setup(
+        name='deformable_detr_ops',
+        ext_modules=CUDAExtension(
+            sources=['ms_deformable_attn_op.cc', 'ms_deformable_attn_op.cu']))
diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/test_ms_deformable_attn_op.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/test_ms_deformable_attn_op.py
new file mode 100644
index 0000000000000000000000000000000000000000..94a05737cbcd6deef55b10f73d39dbd46beeebf7
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/ext_op/test_ms_deformable_attn_op.py
@@ -0,0 +1,140 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
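+
+# Correctness check of the compiled CUDA op against the pure-Paddle reference
+# (deformable_attention_core_func). Build the op first, e.g.:
+#     python setup_ms_deformable_attn_op.py install
+# then run this script; the optional positional argument selects the GPU:
+#     python test_ms_deformable_attn_op.py 0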
+ +from __future__ import absolute_import +from __future__ import print_function +from __future__ import division + +import os +import sys +import random +import numpy as np +import paddle +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 5))) +if parent_path not in sys.path: + sys.path.append(parent_path) + +from ppdet.modeling.transformers.utils import deformable_attention_core_func +ms_deform_attn_core_paddle = deformable_attention_core_func + +try: + gpu_index = int(sys.argv[1]) +except: + gpu_index = 0 +print(f'Use gpu {gpu_index} to test...') +paddle.set_device(f'gpu:{gpu_index}') + +try: + from deformable_detr_ops import ms_deformable_attn +except Exception as e: + print('import deformable_detr_ops error', e) + sys.exit(-1) + +paddle.seed(1) +random.seed(1) +np.random.seed(1) + +bs, n_heads, c = 2, 8, 8 +query_length, n_levels, n_points = 2, 2, 2 +spatial_shapes = paddle.to_tensor([(6, 4), (3, 2)], dtype=paddle.int64) +level_start_index = paddle.concat((paddle.to_tensor( + [0], dtype=paddle.int64), spatial_shapes.prod(1).cumsum(0)[:-1])) +value_length = sum([(H * W).item() for H, W in spatial_shapes]) + + +def get_test_tensors(channels): + value = paddle.rand( + [bs, value_length, n_heads, channels], dtype=paddle.float32) * 0.01 + sampling_locations = paddle.rand( + [bs, query_length, n_heads, n_levels, n_points, 2], + dtype=paddle.float32) + attention_weights = paddle.rand( + [bs, query_length, n_heads, n_levels, n_points], + dtype=paddle.float32) + 1e-5 + attention_weights /= attention_weights.sum(-1, keepdim=True).sum( + -2, keepdim=True) + + return [value, sampling_locations, attention_weights] + + +@paddle.no_grad() +def check_forward_equal_with_paddle_float(): + value, sampling_locations, attention_weights = get_test_tensors(c) + + output_paddle = ms_deform_attn_core_paddle( + value, spatial_shapes, level_start_index, sampling_locations, + attention_weights).detach().cpu() + output_cuda = ms_deformable_attn(value, spatial_shapes, level_start_index, + sampling_locations, + attention_weights).detach().cpu() + fwdok = paddle.allclose( + output_cuda, output_paddle, rtol=1e-2, atol=1e-3).item() + max_abs_err = (output_cuda - output_paddle).abs().max().item() + max_rel_err = ( + (output_cuda - output_paddle).abs() / output_paddle.abs()).max().item() + + print( + f'*{fwdok} check_forward_equal_with_paddle_float: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}' + ) + + +def check_gradient_numerical(channels=4): + value_paddle, sampling_locations_paddle, attention_weights_paddle = get_test_tensors( + channels) + value_paddle.stop_gradient = False + sampling_locations_paddle.stop_gradient = False + attention_weights_paddle.stop_gradient = False + + value_cuda = value_paddle.detach().clone() + sampling_locations_cuda = sampling_locations_paddle.detach().clone() + attention_weights_cuda = attention_weights_paddle.detach().clone() + value_cuda.stop_gradient = False + sampling_locations_cuda.stop_gradient = False + attention_weights_cuda.stop_gradient = False + + output_paddle = ms_deform_attn_core_paddle( + value_paddle, spatial_shapes, level_start_index, + sampling_locations_paddle, attention_weights_paddle) + output_paddle.sum().backward() + + output_cuda = ms_deformable_attn(value_cuda, spatial_shapes, + level_start_index, sampling_locations_cuda, + attention_weights_cuda) + output_cuda.sum().backward() + + res = paddle.allclose( + value_paddle.grad, value_cuda.grad, rtol=1e-2, atol=1e-3).item() + 
print(f'*tensor1 {res} check_gradient_numerical(D={channels})') + + res = paddle.allclose( + sampling_locations_paddle.grad, + sampling_locations_cuda.grad, + rtol=1e-2, + atol=1e-3).item() + print(f'*tensor2 {res} check_gradient_numerical(D={channels})') + + res = paddle.allclose( + attention_weights_paddle.grad, + attention_weights_cuda.grad, + rtol=1e-2, + atol=1e-3).item() + print(f'*tensor3 {res} check_gradient_numerical(D={channels})') + + +if __name__ == '__main__': + check_forward_equal_with_paddle_float() + + for channels in [30, 32, 64, 71, 128, 1024, 1025, 2048, 3096]: + check_gradient_numerical(channels) diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/matchers.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/matchers.py new file mode 100644 index 0000000000000000000000000000000000000000..794d8632803c3546123cb595249c3343c6d48fba --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/matchers.py @@ -0,0 +1,126 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Modified from DETR (https://github.com/facebookresearch/detr) +# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from scipy.optimize import linear_sum_assignment + +from ppdet.core.workspace import register, serializable +from ..losses.iou_loss import GIoULoss +from .utils import bbox_cxcywh_to_xyxy + +__all__ = ['HungarianMatcher'] + + +@register +@serializable +class HungarianMatcher(nn.Layer): + __shared__ = ['use_focal_loss'] + + def __init__(self, + matcher_coeff={'class': 1, + 'bbox': 5, + 'giou': 2}, + use_focal_loss=False, + alpha=0.25, + gamma=2.0): + r""" + Args: + matcher_coeff (dict): The coefficient of hungarian matcher cost. 
+ """ + super(HungarianMatcher, self).__init__() + self.matcher_coeff = matcher_coeff + self.use_focal_loss = use_focal_loss + self.alpha = alpha + self.gamma = gamma + + self.giou_loss = GIoULoss() + + def forward(self, boxes, logits, gt_bbox, gt_class): + r""" + Args: + boxes (Tensor): [b, query, 4] + logits (Tensor): [b, query, num_classes] + gt_bbox (List(Tensor)): list[[n, 4]] + gt_class (List(Tensor)): list[[n, 1]] + + Returns: + A list of size batch_size, containing tuples of (index_i, index_j) where: + - index_i is the indices of the selected predictions (in order) + - index_j is the indices of the corresponding selected targets (in order) + For each batch element, it holds: + len(index_i) = len(index_j) = min(num_queries, num_target_boxes) + """ + bs, num_queries = boxes.shape[:2] + + num_gts = sum(len(a) for a in gt_class) + if num_gts == 0: + return [(paddle.to_tensor( + [], dtype=paddle.int64), paddle.to_tensor( + [], dtype=paddle.int64)) for _ in range(bs)] + + # We flatten to compute the cost matrices in a batch + # [batch_size * num_queries, num_classes] + out_prob = F.sigmoid(logits.flatten( + 0, 1)) if self.use_focal_loss else F.softmax(logits.flatten(0, 1)) + # [batch_size * num_queries, 4] + out_bbox = boxes.flatten(0, 1) + + # Also concat the target labels and boxes + tgt_ids = paddle.concat(gt_class).flatten() + tgt_bbox = paddle.concat(gt_bbox) + + # Compute the classification cost + if self.use_focal_loss: + neg_cost_class = (1 - self.alpha) * (out_prob**self.gamma) * (-( + 1 - out_prob + 1e-8).log()) + pos_cost_class = self.alpha * ( + (1 - out_prob)**self.gamma) * (-(out_prob + 1e-8).log()) + cost_class = paddle.gather( + pos_cost_class, tgt_ids, axis=1) - paddle.gather( + neg_cost_class, tgt_ids, axis=1) + else: + cost_class = -paddle.gather(out_prob, tgt_ids, axis=1) + + # Compute the L1 cost between boxes + cost_bbox = ( + out_bbox.unsqueeze(1) - tgt_bbox.unsqueeze(0)).abs().sum(-1) + + # Compute the giou cost betwen boxes + cost_giou = self.giou_loss( + bbox_cxcywh_to_xyxy(out_bbox.unsqueeze(1)), + bbox_cxcywh_to_xyxy(tgt_bbox.unsqueeze(0))).squeeze(-1) + + # Final cost matrix + C = self.matcher_coeff['class'] * cost_class + self.matcher_coeff['bbox'] * cost_bbox + \ + self.matcher_coeff['giou'] * cost_giou + C = C.reshape([bs, num_queries, -1]) + C = [a.squeeze(0) for a in C.chunk(bs)] + + sizes = [a.shape[0] for a in gt_bbox] + indices = [ + linear_sum_assignment(c.split(sizes, -1)[i].numpy()) + for i, c in enumerate(C) + ] + return [(paddle.to_tensor( + i, dtype=paddle.int64), paddle.to_tensor( + j, dtype=paddle.int64)) for i, j in indices] diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/petr_transformer.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/petr_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..7859b0df028bf5da7a615c427de4fb0850bfca2e --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/petr_transformer.py @@ -0,0 +1,1198 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +""" +this code is base on https://github.com/hikvision-research/opera/blob/main/opera/models/utils/transformer.py +""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr + +from ppdet.core.workspace import register +from ..layers import MultiHeadAttention, _convert_attention_mask +from .utils import _get_clones +from ..initializer import linear_init_, normal_, constant_, xavier_uniform_ + +__all__ = [ + 'PETRTransformer', 'MultiScaleDeformablePoseAttention', + 'PETR_TransformerDecoderLayer', 'PETR_TransformerDecoder', + 'PETR_DeformableDetrTransformerDecoder', + 'PETR_DeformableTransformerDecoder', 'TransformerEncoderLayer', + 'TransformerEncoder', 'MSDeformableAttention' +] + + +def masked_fill(x, mask, value): + y = paddle.full(x.shape, value, x.dtype) + return paddle.where(mask, y, x) + + +def inverse_sigmoid(x, eps=1e-5): + """Inverse function of sigmoid. + + Args: + x (Tensor): The tensor to do the + inverse. + eps (float): EPS avoid numerical + overflow. Defaults 1e-5. + Returns: + Tensor: The x has passed the inverse + function of sigmoid, has same + shape with input. + """ + x = x.clip(min=0, max=1) + x1 = x.clip(min=eps) + x2 = (1 - x).clip(min=eps) + return paddle.log(x1 / x2) + + +@register +class TransformerEncoderLayer(nn.Layer): + __inject__ = ['attn'] + + def __init__(self, + d_model, + attn=None, + nhead=8, + dim_feedforward=2048, + dropout=0.1, + activation="relu", + attn_dropout=None, + act_dropout=None, + normalize_before=False): + super(TransformerEncoderLayer, self).__init__() + attn_dropout = dropout if attn_dropout is None else attn_dropout + act_dropout = dropout if act_dropout is None else act_dropout + self.normalize_before = normalize_before + self.embed_dims = d_model + + if attn is None: + self.self_attn = MultiHeadAttention(d_model, nhead, attn_dropout) + else: + self.self_attn = attn + # Implementation of Feedforward model + self.linear1 = nn.Linear(d_model, dim_feedforward) + self.dropout = nn.Dropout(act_dropout, mode="upscale_in_train") + self.linear2 = nn.Linear(dim_feedforward, d_model) + + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.dropout1 = nn.Dropout(dropout, mode="upscale_in_train") + self.dropout2 = nn.Dropout(dropout, mode="upscale_in_train") + self.activation = getattr(F, activation) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + + @staticmethod + def with_pos_embed(tensor, pos_embed): + return tensor if pos_embed is None else tensor + pos_embed + + def forward(self, src, src_mask=None, pos_embed=None, **kwargs): + residual = src + if self.normalize_before: + src = self.norm1(src) + q = k = self.with_pos_embed(src, pos_embed) + src = self.self_attn(q, k, value=src, attn_mask=src_mask, **kwargs) + + src = residual + self.dropout1(src) + if not self.normalize_before: + src = self.norm1(src) + + residual = src + if self.normalize_before: + src = self.norm2(src) + src = self.linear2(self.dropout(self.activation(self.linear1(src)))) + src = residual + self.dropout2(src) + if not self.normalize_before: + src = self.norm2(src) + return src + + +@register +class TransformerEncoder(nn.Layer): + __inject__ = ['encoder_layer'] + + def __init__(self, 
encoder_layer, num_layers, norm=None): + super(TransformerEncoder, self).__init__() + self.layers = _get_clones(encoder_layer, num_layers) + self.num_layers = num_layers + self.norm = norm + self.embed_dims = encoder_layer.embed_dims + + def forward(self, src, src_mask=None, pos_embed=None, **kwargs): + output = src + for layer in self.layers: + output = layer( + output, src_mask=src_mask, pos_embed=pos_embed, **kwargs) + + if self.norm is not None: + output = self.norm(output) + + return output + + +@register +class MSDeformableAttention(nn.Layer): + def __init__(self, + embed_dim=256, + num_heads=8, + num_levels=4, + num_points=4, + lr_mult=0.1): + """ + Multi-Scale Deformable Attention Module + """ + super(MSDeformableAttention, self).__init__() + self.embed_dim = embed_dim + self.num_heads = num_heads + self.num_levels = num_levels + self.num_points = num_points + self.total_points = num_heads * num_levels * num_points + + self.head_dim = embed_dim // num_heads + assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads" + + self.sampling_offsets = nn.Linear( + embed_dim, + self.total_points * 2, + weight_attr=ParamAttr(learning_rate=lr_mult), + bias_attr=ParamAttr(learning_rate=lr_mult)) + + self.attention_weights = nn.Linear(embed_dim, self.total_points) + self.value_proj = nn.Linear(embed_dim, embed_dim) + self.output_proj = nn.Linear(embed_dim, embed_dim) + try: + # use cuda op + print("use deformable_detr_ops in ms_deformable_attn") + from deformable_detr_ops import ms_deformable_attn + except: + # use paddle func + from .utils import deformable_attention_core_func as ms_deformable_attn + self.ms_deformable_attn_core = ms_deformable_attn + + self._reset_parameters() + + def _reset_parameters(self): + # sampling_offsets + constant_(self.sampling_offsets.weight) + thetas = paddle.arange( + self.num_heads, + dtype=paddle.float32) * (2.0 * math.pi / self.num_heads) + grid_init = paddle.stack([thetas.cos(), thetas.sin()], -1) + grid_init = grid_init / grid_init.abs().max(-1, keepdim=True) + grid_init = grid_init.reshape([self.num_heads, 1, 1, 2]).tile( + [1, self.num_levels, self.num_points, 1]) + scaling = paddle.arange( + 1, self.num_points + 1, + dtype=paddle.float32).reshape([1, 1, -1, 1]) + grid_init *= scaling + self.sampling_offsets.bias.set_value(grid_init.flatten()) + # attention_weights + constant_(self.attention_weights.weight) + constant_(self.attention_weights.bias) + # proj + xavier_uniform_(self.value_proj.weight) + constant_(self.value_proj.bias) + xavier_uniform_(self.output_proj.weight) + constant_(self.output_proj.bias) + + def forward(self, + query, + key, + value, + reference_points, + value_spatial_shapes, + value_level_start_index, + attn_mask=None, + **kwargs): + """ + Args: + query (Tensor): [bs, query_length, C] + reference_points (Tensor): [bs, query_length, n_levels, 2], range in [0, 1], top-left (0,0), + bottom-right (1, 1), including padding area + value (Tensor): [bs, value_length, C] + value_spatial_shapes (Tensor): [n_levels, 2], [(H_0, W_0), (H_1, W_1), ..., (H_{L-1}, W_{L-1})] + value_level_start_index (Tensor(int64)): [n_levels], [0, H_0*W_0, H_0*W_0+H_1*W_1, ...] 
+ attn_mask (Tensor): [bs, value_length], True for non-padding elements, False for padding elements + + Returns: + output (Tensor): [bs, Length_{query}, C] + """ + bs, Len_q = query.shape[:2] + Len_v = value.shape[1] + assert int(value_spatial_shapes.prod(1).sum()) == Len_v + + value = self.value_proj(value) + if attn_mask is not None: + attn_mask = attn_mask.astype(value.dtype).unsqueeze(-1) + value *= attn_mask + value = value.reshape([bs, Len_v, self.num_heads, self.head_dim]) + + sampling_offsets = self.sampling_offsets(query).reshape( + [bs, Len_q, self.num_heads, self.num_levels, self.num_points, 2]) + attention_weights = self.attention_weights(query).reshape( + [bs, Len_q, self.num_heads, self.num_levels * self.num_points]) + attention_weights = F.softmax(attention_weights).reshape( + [bs, Len_q, self.num_heads, self.num_levels, self.num_points]) + + if reference_points.shape[-1] == 2: + offset_normalizer = value_spatial_shapes.flip([1]).reshape( + [1, 1, 1, self.num_levels, 1, 2]) + sampling_locations = reference_points.reshape([ + bs, Len_q, 1, self.num_levels, 1, 2 + ]) + sampling_offsets / offset_normalizer + elif reference_points.shape[-1] == 4: + sampling_locations = ( + reference_points[:, :, None, :, None, :2] + sampling_offsets / + self.num_points * reference_points[:, :, None, :, None, 2:] * + 0.5) + else: + raise ValueError( + "Last dim of reference_points must be 2 or 4, but get {} instead.". + format(reference_points.shape[-1])) + + output = self.ms_deformable_attn_core( + value, value_spatial_shapes, value_level_start_index, + sampling_locations, attention_weights) + output = self.output_proj(output) + + return output + + +@register +class MultiScaleDeformablePoseAttention(nn.Layer): + """An attention module used in PETR. `End-to-End Multi-Person + Pose Estimation with Transformers`. + + Args: + embed_dims (int): The embedding dimension of Attention. + Default: 256. + num_heads (int): Parallel attention heads. Default: 8. + num_levels (int): The number of feature map used in + Attention. Default: 4. + num_points (int): The number of sampling points for + each query in each head. Default: 17. + im2col_step (int): The step used in image_to_column. + Default: 64. + dropout (float): A Dropout layer on `inp_residual`. + Default: 0.1. + init_cfg (obj:`mmcv.ConfigDict`): The Config for initialization. + Default: None. 
+ """ + + def __init__(self, + embed_dims=256, + num_heads=8, + num_levels=4, + num_points=17, + im2col_step=64, + dropout=0.1, + norm_cfg=None, + init_cfg=None, + batch_first=False, + lr_mult=0.1): + super().__init__() + if embed_dims % num_heads != 0: + raise ValueError(f'embed_dims must be divisible by num_heads, ' + f'but got {embed_dims} and {num_heads}') + dim_per_head = embed_dims // num_heads + self.norm_cfg = norm_cfg + self.init_cfg = init_cfg + self.dropout = nn.Dropout(dropout) + self.batch_first = batch_first + + # you'd better set dim_per_head to a power of 2 + # which is more efficient in the CUDA implementation + def _is_power_of_2(n): + if (not isinstance(n, int)) or (n < 0): + raise ValueError( + 'invalid input for _is_power_of_2: {} (type: {})'.format( + n, type(n))) + return (n & (n - 1) == 0) and n != 0 + + if not _is_power_of_2(dim_per_head): + warnings.warn("You'd better set embed_dims in " + 'MultiScaleDeformAttention to make ' + 'the dimension of each attention head a power of 2 ' + 'which is more efficient in our CUDA implementation.') + + self.im2col_step = im2col_step + self.embed_dims = embed_dims + self.num_levels = num_levels + self.num_heads = num_heads + self.num_points = num_points + self.sampling_offsets = nn.Linear( + embed_dims, + num_heads * num_levels * num_points * 2, + weight_attr=ParamAttr(learning_rate=lr_mult), + bias_attr=ParamAttr(learning_rate=lr_mult)) + self.attention_weights = nn.Linear(embed_dims, + num_heads * num_levels * num_points) + self.value_proj = nn.Linear(embed_dims, embed_dims) + self.output_proj = nn.Linear(embed_dims, embed_dims) + + try: + # use cuda op + from deformable_detr_ops import ms_deformable_attn + except: + # use paddle func + from .utils import deformable_attention_core_func as ms_deformable_attn + self.ms_deformable_attn_core = ms_deformable_attn + + self.init_weights() + + def init_weights(self): + """Default initialization for Parameters of Module.""" + constant_(self.sampling_offsets.weight) + constant_(self.sampling_offsets.bias) + constant_(self.attention_weights.weight) + constant_(self.attention_weights.bias) + xavier_uniform_(self.value_proj.weight) + constant_(self.value_proj.bias) + xavier_uniform_(self.output_proj.weight) + constant_(self.output_proj.bias) + + def forward(self, + query, + key, + value, + residual=None, + attn_mask=None, + reference_points=None, + value_spatial_shapes=None, + value_level_start_index=None, + **kwargs): + """Forward Function of MultiScaleDeformAttention. + + Args: + query (Tensor): Query of Transformer with shape + (num_query, bs, embed_dims). + key (Tensor): The key tensor with shape (num_key, bs, embed_dims). + value (Tensor): The value tensor with shape + (num_key, bs, embed_dims). + residual (Tensor): The tensor used for addition, with the + same shape as `x`. Default None. If None, `x` will be used. + reference_points (Tensor): The normalized reference points with + shape (bs, num_query, num_levels, K*2), all elements is range + in [0, 1], top-left (0,0), bottom-right (1, 1), including + padding area. + attn_mask (Tensor): ByteTensor for `query`, with + shape [bs, num_key]. + value_spatial_shapes (Tensor): Spatial shape of features in + different level. With shape (num_levels, 2), + last dimension represent (h, w). + value_level_start_index (Tensor): The start index of each level. + A tensor has shape (num_levels) and can be represented + as [0, h_0*w_0, h_0*w_0+h_1*w_1, ...]. + + Returns: + Tensor: forwarded results with shape [num_query, bs, embed_dims]. 
+ """ + + if key is None: + key = query + if value is None: + value = key + + bs, num_query, _ = query.shape + bs, num_key, _ = value.shape + assert (value_spatial_shapes[:, 0].numpy() * + value_spatial_shapes[:, 1].numpy()).sum() == num_key + + value = self.value_proj(value) + if attn_mask is not None: + # value = value.masked_fill(attn_mask[..., None], 0.0) + value *= attn_mask.unsqueeze(-1) + value = value.reshape([bs, num_key, self.num_heads, -1]) + sampling_offsets = self.sampling_offsets(query).reshape([ + bs, num_query, self.num_heads, self.num_levels, self.num_points, 2 + ]) + attention_weights = self.attention_weights(query).reshape( + [bs, num_query, self.num_heads, self.num_levels * self.num_points]) + attention_weights = F.softmax(attention_weights, axis=-1) + + attention_weights = attention_weights.reshape( + [bs, num_query, self.num_heads, self.num_levels, self.num_points]) + if reference_points.shape[-1] == self.num_points * 2: + reference_points_reshape = reference_points.reshape( + (bs, num_query, self.num_levels, -1, 2)).unsqueeze(2) + x1 = reference_points[:, :, :, 0::2].min(axis=-1, keepdim=True) + y1 = reference_points[:, :, :, 1::2].min(axis=-1, keepdim=True) + x2 = reference_points[:, :, :, 0::2].max(axis=-1, keepdim=True) + y2 = reference_points[:, :, :, 1::2].max(axis=-1, keepdim=True) + w = paddle.clip(x2 - x1, min=1e-4) + h = paddle.clip(y2 - y1, min=1e-4) + wh = paddle.concat([w, h], axis=-1)[:, :, None, :, None, :] + + sampling_locations = reference_points_reshape \ + + sampling_offsets * wh * 0.5 + else: + raise ValueError( + f'Last dim of reference_points must be' + f' 2K, but get {reference_points.shape[-1]} instead.') + + output = self.ms_deformable_attn_core( + value, value_spatial_shapes, value_level_start_index, + sampling_locations, attention_weights) + + output = self.output_proj(output) + return output + + +@register +class PETR_TransformerDecoderLayer(nn.Layer): + __inject__ = ['self_attn', 'cross_attn'] + + def __init__(self, + d_model, + nhead=8, + self_attn=None, + cross_attn=None, + dim_feedforward=2048, + dropout=0.1, + activation="relu", + attn_dropout=None, + act_dropout=None, + normalize_before=False): + super(PETR_TransformerDecoderLayer, self).__init__() + attn_dropout = dropout if attn_dropout is None else attn_dropout + act_dropout = dropout if act_dropout is None else act_dropout + self.normalize_before = normalize_before + + if self_attn is None: + self.self_attn = MultiHeadAttention(d_model, nhead, attn_dropout) + else: + self.self_attn = self_attn + if cross_attn is None: + self.cross_attn = MultiHeadAttention(d_model, nhead, attn_dropout) + else: + self.cross_attn = cross_attn + # Implementation of Feedforward model + self.linear1 = nn.Linear(d_model, dim_feedforward) + self.dropout = nn.Dropout(act_dropout, mode="upscale_in_train") + self.linear2 = nn.Linear(dim_feedforward, d_model) + + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.norm3 = nn.LayerNorm(d_model) + self.dropout1 = nn.Dropout(dropout, mode="upscale_in_train") + self.dropout2 = nn.Dropout(dropout, mode="upscale_in_train") + self.dropout3 = nn.Dropout(dropout, mode="upscale_in_train") + self.activation = getattr(F, activation) + self._reset_parameters() + + def _reset_parameters(self): + linear_init_(self.linear1) + linear_init_(self.linear2) + + @staticmethod + def with_pos_embed(tensor, pos_embed): + return tensor if pos_embed is None else tensor + pos_embed + + def forward(self, + tgt, + memory, + tgt_mask=None, + memory_mask=None, + 
                pos_embed=None,
+                query_pos_embed=None,
+                **kwargs):
+        tgt_mask = _convert_attention_mask(tgt_mask, tgt.dtype)
+
+        residual = tgt
+        if self.normalize_before:
+            tgt = self.norm1(tgt)
+        q = k = self.with_pos_embed(tgt, query_pos_embed)
+        tgt = self.self_attn(q, k, value=tgt, attn_mask=tgt_mask)
+        tgt = residual + self.dropout1(tgt)
+        if not self.normalize_before:
+            tgt = self.norm1(tgt)
+
+        residual = tgt
+        if self.normalize_before:
+            tgt = self.norm2(tgt)
+        q = self.with_pos_embed(tgt, query_pos_embed)
+        key_tmp = tgt
+        # k = self.with_pos_embed(memory, pos_embed)
+        tgt = self.cross_attn(
+            q, key=key_tmp, value=memory, attn_mask=memory_mask, **kwargs)
+        tgt = residual + self.dropout2(tgt)
+        if not self.normalize_before:
+            tgt = self.norm2(tgt)
+
+        residual = tgt
+        if self.normalize_before:
+            tgt = self.norm3(tgt)
+        tgt = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
+        tgt = residual + self.dropout3(tgt)
+        if not self.normalize_before:
+            tgt = self.norm3(tgt)
+        return tgt
+
+
+@register
+class PETR_TransformerDecoder(nn.Layer):
+    """Implements the decoder in PETR transformer.
+
+    Args:
+        decoder_layer (nn.Layer): The layer to be cloned for each
+            decoder step.
+        num_layers (int): Number of decoder layers.
+        norm (nn.Layer|None): Optional normalization applied to the
+            output. Default: None.
+        return_intermediate (bool): Whether to return intermediate outputs.
+        num_keypoints (int): Number of keypoints per instance. Default: 17.
+    """
+    __inject__ = ['decoder_layer']
+
+    def __init__(self,
+                 decoder_layer,
+                 num_layers,
+                 norm=None,
+                 return_intermediate=False,
+                 num_keypoints=17,
+                 **kwargs):
+        super(PETR_TransformerDecoder, self).__init__()
+        self.layers = _get_clones(decoder_layer, num_layers)
+        self.num_layers = num_layers
+        self.norm = norm
+        self.return_intermediate = return_intermediate
+        self.num_keypoints = num_keypoints
+
+    def forward(self,
+                query,
+                *args,
+                reference_points=None,
+                valid_ratios=None,
+                kpt_branches=None,
+                **kwargs):
+        """Forward function for `TransformerDecoder`.
+
+        Args:
+            query (Tensor): Input query with shape (num_query, bs, embed_dims).
+            reference_points (Tensor): The reference points of offset,
+                has shape (bs, num_query, K*2).
+            valid_ratios (Tensor): The ratios of valid points on the feature
+                map, has shape (bs, num_levels, 2).
+            kpt_branches (obj:`nn.LayerList`): Used for refining the
+                regression results. Only passed when `with_box_refine`
+                is True; otherwise `None` is passed.
+
+        Returns:
+            tuple (Tensor): Results with shape [1, num_query, bs, embed_dims] when
+                return_intermediate is `False`, otherwise it has shape
+                [num_layers, num_query, bs, embed_dims] and
+                [num_layers, bs, num_query, K*2].
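+
+        Example of the per-layer refinement step (a sketch of the update
+        applied when `kpt_branches` is given; the zero offset is chosen
+        only to make the result easy to verify):
+
+        .. code-block:: python
+
+            import paddle
+            import paddle.nn.functional as F
+            from ppdet.modeling.transformers.utils import inverse_sigmoid
+
+            ref = paddle.full([1, 3, 34], 0.5)  # (bs, num_query, K*2)
+            offset = paddle.zeros([1, 3, 34])   # stand-in for kpt_branches[lid](output)
+            new_ref = F.sigmoid(offset + inverse_sigmoid(ref))
+            # a zero offset leaves the reference points (nearly) unchanged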
+ """ + output = query + intermediate = [] + intermediate_reference_points = [] + for lid, layer in enumerate(self.layers): + if reference_points.shape[-1] == self.num_keypoints * 2: + reference_points_input = \ + reference_points[:, :, None] * \ + valid_ratios.tile((1, 1, self.num_keypoints))[:, None] + else: + assert reference_points.shape[-1] == 2 + reference_points_input = reference_points[:, :, None] * \ + valid_ratios[:, None] + output = layer( + output, + *args, + reference_points=reference_points_input, + **kwargs) + + if kpt_branches is not None: + tmp = kpt_branches[lid](output) + if reference_points.shape[-1] == self.num_keypoints * 2: + new_reference_points = tmp + inverse_sigmoid( + reference_points) + new_reference_points = F.sigmoid(new_reference_points) + else: + raise NotImplementedError + reference_points = new_reference_points.detach() + + if self.return_intermediate: + intermediate.append(output) + intermediate_reference_points.append(reference_points) + + if self.return_intermediate: + return paddle.stack(intermediate), paddle.stack( + intermediate_reference_points) + + return output, reference_points + + +@register +class PETR_DeformableTransformerDecoder(nn.Layer): + __inject__ = ['decoder_layer'] + + def __init__(self, decoder_layer, num_layers, return_intermediate=False): + super(PETR_DeformableTransformerDecoder, self).__init__() + self.layers = _get_clones(decoder_layer, num_layers) + self.num_layers = num_layers + self.return_intermediate = return_intermediate + + def forward(self, + tgt, + reference_points, + memory, + memory_spatial_shapes, + memory_mask=None, + query_pos_embed=None): + output = tgt + intermediate = [] + for lid, layer in enumerate(self.layers): + output = layer(output, reference_points, memory, + memory_spatial_shapes, memory_mask, query_pos_embed) + + if self.return_intermediate: + intermediate.append(output) + + if self.return_intermediate: + return paddle.stack(intermediate) + + return output.unsqueeze(0) + + +@register +class PETR_DeformableDetrTransformerDecoder(PETR_DeformableTransformerDecoder): + """Implements the decoder in DETR transformer. + + Args: + return_intermediate (bool): Whether to return intermediate outputs. + coder_norm_cfg (dict): Config of last normalization layer. Default: + `LN`. + """ + + def __init__(self, *args, return_intermediate=False, **kwargs): + + super(PETR_DeformableDetrTransformerDecoder, self).__init__(*args, + **kwargs) + self.return_intermediate = return_intermediate + + def forward(self, + query, + *args, + reference_points=None, + valid_ratios=None, + reg_branches=None, + **kwargs): + """Forward function for `TransformerDecoder`. + + Args: + query (Tensor): Input query with shape + `(num_query, bs, embed_dims)`. + reference_points (Tensor): The reference + points of offset. has shape + (bs, num_query, 4) when as_two_stage, + otherwise has shape ((bs, num_query, 2). + valid_ratios (Tensor): The radios of valid + points on the feature map, has shape + (bs, num_levels, 2) + reg_branch: (obj:`nn.LayerList`): Used for + refining the regression results. Only would + be passed when with_box_refine is True, + otherwise would be passed a `None`. + + Returns: + Tensor: Results with shape [1, num_query, bs, embed_dims] when + return_intermediate is `False`, otherwise it has shape + [num_layers, num_query, bs, embed_dims]. 
+ """ + output = query + intermediate = [] + intermediate_reference_points = [] + for lid, layer in enumerate(self.layers): + if reference_points.shape[-1] == 4: + reference_points_input = reference_points[:, :, None] * \ + paddle.concat([valid_ratios, valid_ratios], -1)[:, None] + else: + assert reference_points.shape[-1] == 2 + reference_points_input = reference_points[:, :, None] * \ + valid_ratios[:, None] + output = layer( + output, + *args, + reference_points=reference_points_input, + **kwargs) + + if reg_branches is not None: + tmp = reg_branches[lid](output) + if reference_points.shape[-1] == 4: + new_reference_points = tmp + inverse_sigmoid( + reference_points) + new_reference_points = F.sigmoid(new_reference_points) + else: + assert reference_points.shape[-1] == 2 + new_reference_points = tmp + new_reference_points[..., :2] = tmp[ + ..., :2] + inverse_sigmoid(reference_points) + new_reference_points = F.sigmoid(new_reference_points) + reference_points = new_reference_points.detach() + + if self.return_intermediate: + intermediate.append(output) + intermediate_reference_points.append(reference_points) + + if self.return_intermediate: + return paddle.stack(intermediate), paddle.stack( + intermediate_reference_points) + + return output, reference_points + + +@register +class PETRTransformer(nn.Layer): + """Implements the PETR transformer. + + Args: + as_two_stage (bool): Generate query from encoder features. + Default: False. + num_feature_levels (int): Number of feature maps from FPN: + Default: 4. + two_stage_num_proposals (int): Number of proposals when set + `as_two_stage` as True. Default: 300. + """ + __inject__ = ["encoder", "decoder", "hm_encoder", "refine_decoder"] + + def __init__(self, + encoder="", + decoder="", + hm_encoder="", + refine_decoder="", + as_two_stage=True, + num_feature_levels=4, + two_stage_num_proposals=300, + num_keypoints=17, + **kwargs): + super(PETRTransformer, self).__init__(**kwargs) + self.as_two_stage = as_two_stage + self.num_feature_levels = num_feature_levels + self.two_stage_num_proposals = two_stage_num_proposals + self.num_keypoints = num_keypoints + self.encoder = encoder + self.decoder = decoder + self.embed_dims = self.encoder.embed_dims + self.hm_encoder = hm_encoder + self.refine_decoder = refine_decoder + self.init_layers() + self.init_weights() + + def init_layers(self): + """Initialize layers of the DeformableDetrTransformer.""" + #paddle.create_parameter + self.level_embeds = paddle.create_parameter( + (self.num_feature_levels, self.embed_dims), dtype="float32") + + if self.as_two_stage: + self.enc_output = nn.Linear(self.embed_dims, self.embed_dims) + self.enc_output_norm = nn.LayerNorm(self.embed_dims) + self.refine_query_embedding = nn.Embedding(self.num_keypoints, + self.embed_dims * 2) + else: + self.reference_points = nn.Linear(self.embed_dims, + 2 * self.num_keypoints) + + def init_weights(self): + """Initialize the transformer weights.""" + for p in self.parameters(): + if p.rank() > 1: + xavier_uniform_(p) + if hasattr(p, 'bias') and p.bias is not None: + constant_(p.bais) + for m in self.sublayers(): + if isinstance(m, MSDeformableAttention): + m._reset_parameters() + for m in self.sublayers(): + if isinstance(m, MultiScaleDeformablePoseAttention): + m.init_weights() + if not self.as_two_stage: + xavier_uniform_(self.reference_points.weight) + constant_(self.reference_points.bias) + normal_(self.level_embeds) + normal_(self.refine_query_embedding.weight) + + def gen_encoder_output_proposals(self, memory, 
+                                     memory_padding_mask,
+                                     spatial_shapes):
+        """Generate proposals from encoded memory.
+
+        Args:
+            memory (Tensor): The output of encoder, has shape
+                (bs, num_key, embed_dim). num_key equals the number of points
+                on feature maps from all levels.
+            memory_padding_mask (Tensor): Padding mask for memory,
+                has shape (bs, num_key).
+            spatial_shapes (Tensor): The shape of all feature maps,
+                has shape (num_level, 2).
+
+        Returns:
+            tuple: A tuple of feature map and bbox prediction.
+
+                - output_memory (Tensor): The input of decoder, has shape
+                  (bs, num_key, embed_dim). num_key equals the number of
+                  points on feature maps from all levels.
+                - output_proposals (Tensor): The normalized proposal
+                  after an inverse sigmoid, has shape (bs, num_keys, 4).
+        """
+
+        N, S, C = memory.shape
+        proposals = []
+        _cur = 0
+        for lvl, (H, W) in enumerate(spatial_shapes):
+            mask_flatten_ = memory_padding_mask[:, _cur:(_cur + H * W)].reshape(
+                [N, H, W, 1])
+            valid_H = paddle.sum(mask_flatten_[:, :, 0, 0], 1)
+            valid_W = paddle.sum(mask_flatten_[:, 0, :, 0], 1)
+
+            grid_y, grid_x = paddle.meshgrid(
+                paddle.linspace(
+                    0, H - 1, H, dtype="float32"),
+                paddle.linspace(
+                    0, W - 1, W, dtype="float32"))
+            grid = paddle.concat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)],
+                                 -1)
+
+            scale = paddle.concat(
+                [valid_W.unsqueeze(-1),
+                 valid_H.unsqueeze(-1)], 1).reshape([N, 1, 1, 2])
+            grid = (grid.unsqueeze(0).expand((N, -1, -1, -1)) + 0.5) / scale
+            proposal = grid.reshape([N, -1, 2])
+            proposals.append(proposal)
+            _cur += (H * W)
+        output_proposals = paddle.concat(proposals, 1)
+        output_proposals_valid = ((output_proposals > 0.01) &
+                                  (output_proposals < 0.99)).all(
+                                      -1, keepdim=True).astype("bool")
+        output_proposals = paddle.log(output_proposals / (1 - output_proposals))
+        output_proposals = masked_fill(
+            output_proposals, ~memory_padding_mask.astype("bool").unsqueeze(-1),
+            float('inf'))
+        output_proposals = masked_fill(output_proposals,
+                                       ~output_proposals_valid, float('inf'))
+
+        output_memory = memory
+        output_memory = masked_fill(
+            output_memory, ~memory_padding_mask.astype("bool").unsqueeze(-1),
+            float(0))
+        output_memory = masked_fill(output_memory, ~output_proposals_valid,
+                                    float(0))
+        output_memory = self.enc_output_norm(self.enc_output(output_memory))
+        return output_memory, output_proposals
+
+    @staticmethod
+    def get_reference_points(spatial_shapes, valid_ratios):
+        """Get the reference points used in decoder.
+
+        Args:
+            spatial_shapes (Tensor): The shape of all feature maps,
+                has shape (num_level, 2).
+            valid_ratios (Tensor): The ratios of valid points on the
+                feature map, has shape (bs, num_levels, 2).
+
+        Returns:
+            Tensor: reference points used in decoder, has
+                shape (bs, num_keys, num_levels, 2).
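+
+        Example (a minimal sketch with a single 2x2 level and fully
+        valid feature maps; the values are assumptions for illustration):
+
+        .. code-block:: python
+
+            import paddle
+            shapes = paddle.to_tensor([[2, 2]], dtype='int64')
+            ratios = paddle.ones([1, 1, 2])  # (bs, num_levels, 2)
+            pts = PETRTransformer.get_reference_points(shapes, ratios)
+            # pts: (1, 4, 1, 2) with cell centers at 0.25 / 0.75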
+ """ + reference_points_list = [] + for lvl, (H, W) in enumerate(spatial_shapes): + ref_y, ref_x = paddle.meshgrid( + paddle.linspace( + 0.5, H - 0.5, H, dtype="float32"), + paddle.linspace( + 0.5, W - 0.5, W, dtype="float32")) + ref_y = ref_y.reshape( + (-1, ))[None] / (valid_ratios[:, None, lvl, 1] * H) + ref_x = ref_x.reshape( + (-1, ))[None] / (valid_ratios[:, None, lvl, 0] * W) + ref = paddle.stack((ref_x, ref_y), -1) + reference_points_list.append(ref) + reference_points = paddle.concat(reference_points_list, 1) + reference_points = reference_points[:, :, None] * valid_ratios[:, None] + return reference_points + + def get_valid_ratio(self, mask): + """Get the valid radios of feature maps of all level.""" + _, H, W = mask.shape + valid_H = paddle.sum(mask[:, :, 0].astype('float'), 1) + valid_W = paddle.sum(mask[:, 0, :].astype('float'), 1) + valid_ratio_h = valid_H.astype('float') / H + valid_ratio_w = valid_W.astype('float') / W + valid_ratio = paddle.stack([valid_ratio_w, valid_ratio_h], -1) + return valid_ratio + + def get_proposal_pos_embed(self, + proposals, + num_pos_feats=128, + temperature=10000): + """Get the position embedding of proposal.""" + scale = 2 * math.pi + dim_t = paddle.arange(num_pos_feats, dtype="float32") + dim_t = temperature**(2 * (dim_t // 2) / num_pos_feats) + # N, L, 4 + proposals = F.sigmoid(proposals) * scale + # N, L, 4, 128 + pos = proposals[:, :, :, None] / dim_t + # N, L, 4, 64, 2 + pos = paddle.stack( + (pos[:, :, :, 0::2].sin(), pos[:, :, :, 1::2].cos()), + axis=4).flatten(2) + return pos + + def forward(self, + mlvl_feats, + mlvl_masks, + query_embed, + mlvl_pos_embeds, + kpt_branches=None, + cls_branches=None): + """Forward function for `Transformer`. + + Args: + mlvl_feats (list(Tensor)): Input queries from different level. + Each element has shape [bs, embed_dims, h, w]. + mlvl_masks (list(Tensor)): The key_padding_mask from different + level used for encoder and decoder, each element has shape + [bs, h, w]. + query_embed (Tensor): The query embedding for decoder, + with shape [num_query, c]. + mlvl_pos_embeds (list(Tensor)): The positional encoding + of feats from different level, has the shape + [bs, embed_dims, h, w]. + kpt_branches (obj:`nn.LayerList`): Keypoint Regression heads for + feature maps from each decoder layer. Only would be passed when + `with_box_refine` is Ture. Default to None. + cls_branches (obj:`nn.LayerList`): Classification heads for + feature maps from each decoder layer. Only would be passed when + `as_two_stage` is Ture. Default to None. + + Returns: + tuple[Tensor]: results of decoder containing the following tensor. + + - inter_states: Outputs from decoder. If + `return_intermediate_dec` is True output has shape \ + (num_dec_layers, bs, num_query, embed_dims), else has \ + shape (1, bs, num_query, embed_dims). + - init_reference_out: The initial value of reference \ + points, has shape (bs, num_queries, 4). + - inter_references_out: The internal value of reference \ + points in decoder, has shape \ + (num_dec_layers, bs,num_query, embed_dims) + - enc_outputs_class: The classification score of proposals \ + generated from encoder's feature maps, has shape \ + (batch, h*w, num_classes). \ + Only would be returned when `as_two_stage` is True, \ + otherwise None. + - enc_outputs_kpt_unact: The regression results generated from \ + encoder's feature maps., has shape (batch, h*w, K*2). + Only would be returned when `as_two_stage` is True, \ + otherwise None. 
+ """ + assert self.as_two_stage or query_embed is not None + + feat_flatten = [] + mask_flatten = [] + lvl_pos_embed_flatten = [] + spatial_shapes = [] + for lvl, (feat, mask, pos_embed + ) in enumerate(zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): + bs, c, h, w = feat.shape + spatial_shape = (h, w) + spatial_shapes.append(spatial_shape) + feat = feat.flatten(2).transpose((0, 2, 1)) + mask = mask.flatten(1) + pos_embed = pos_embed.flatten(2).transpose((0, 2, 1)) + lvl_pos_embed = pos_embed + self.level_embeds[lvl].reshape( + [1, 1, -1]) + lvl_pos_embed_flatten.append(lvl_pos_embed) + feat_flatten.append(feat) + mask_flatten.append(mask) + feat_flatten = paddle.concat(feat_flatten, 1) + mask_flatten = paddle.concat(mask_flatten, 1) + lvl_pos_embed_flatten = paddle.concat(lvl_pos_embed_flatten, 1) + spatial_shapes_cumsum = paddle.to_tensor( + np.array(spatial_shapes).prod(1).cumsum(0)) + spatial_shapes = paddle.to_tensor(spatial_shapes, dtype="int64") + level_start_index = paddle.concat((paddle.zeros( + (1, ), dtype=spatial_shapes.dtype), spatial_shapes_cumsum[:-1])) + valid_ratios = paddle.stack( + [self.get_valid_ratio(m) for m in mlvl_masks], 1) + + reference_points = \ + self.get_reference_points(spatial_shapes, + valid_ratios) + + memory = self.encoder( + src=feat_flatten, + pos_embed=lvl_pos_embed_flatten, + src_mask=mask_flatten, + value_spatial_shapes=spatial_shapes, + reference_points=reference_points, + value_level_start_index=level_start_index, + valid_ratios=valid_ratios) + + bs, _, c = memory.shape + + hm_proto = None + if self.training: + hm_memory = paddle.slice( + memory, + starts=level_start_index[0], + ends=level_start_index[1], + axes=[1]) + hm_pos_embed = paddle.slice( + lvl_pos_embed_flatten, + starts=level_start_index[0], + ends=level_start_index[1], + axes=[1]) + hm_mask = paddle.slice( + mask_flatten, + starts=level_start_index[0], + ends=level_start_index[1], + axes=[1]) + hm_reference_points = paddle.slice( + reference_points, + starts=level_start_index[0], + ends=level_start_index[1], + axes=[1])[:, :, :1, :] + + # official code make a mistake of pos_embed to pose_embed, which disable pos_embed + hm_memory = self.hm_encoder( + src=hm_memory, + pose_embed=hm_pos_embed, + src_mask=hm_mask, + value_spatial_shapes=spatial_shapes[[0]], + reference_points=hm_reference_points, + value_level_start_index=level_start_index[0], + valid_ratios=valid_ratios[:, :1, :]) + hm_memory = hm_memory.reshape((bs, spatial_shapes[0, 0], + spatial_shapes[0, 1], -1)) + hm_proto = (hm_memory, mlvl_masks[0]) + + if self.as_two_stage: + output_memory, output_proposals = \ + self.gen_encoder_output_proposals( + memory, mask_flatten, spatial_shapes) + enc_outputs_class = cls_branches[self.decoder.num_layers]( + output_memory) + enc_outputs_kpt_unact = \ + kpt_branches[self.decoder.num_layers](output_memory) + enc_outputs_kpt_unact[..., 0::2] += output_proposals[..., 0:1] + enc_outputs_kpt_unact[..., 1::2] += output_proposals[..., 1:2] + + topk = self.two_stage_num_proposals + topk_proposals = paddle.topk( + enc_outputs_class[..., 0], topk, axis=1)[1].unsqueeze(-1) + + #paddle.take_along_axis 对应torch.gather + topk_kpts_unact = paddle.take_along_axis(enc_outputs_kpt_unact, + topk_proposals, 1) + topk_kpts_unact = topk_kpts_unact.detach() + + reference_points = F.sigmoid(topk_kpts_unact) + init_reference_out = reference_points + # learnable query and query_pos + query_pos, query = paddle.split( + query_embed, query_embed.shape[1] // c, axis=1) + query_pos = query_pos.unsqueeze(0).expand((bs, -1, 
-1)) + query = query.unsqueeze(0).expand((bs, -1, -1)) + else: + query_pos, query = paddle.split( + query_embed, query_embed.shape[1] // c, axis=1) + query_pos = query_pos.unsqueeze(0).expand((bs, -1, -1)) + query = query.unsqueeze(0).expand((bs, -1, -1)) + reference_points = F.sigmoid(self.reference_points(query_pos)) + init_reference_out = reference_points + + # decoder + inter_states, inter_references = self.decoder( + query=query, + memory=memory, + query_pos_embed=query_pos, + memory_mask=mask_flatten, + reference_points=reference_points, + value_spatial_shapes=spatial_shapes, + value_level_start_index=level_start_index, + valid_ratios=valid_ratios, + kpt_branches=kpt_branches) + + inter_references_out = inter_references + if self.as_two_stage: + return inter_states, init_reference_out, \ + inter_references_out, enc_outputs_class, \ + enc_outputs_kpt_unact, hm_proto, memory + return inter_states, init_reference_out, \ + inter_references_out, None, None, None, None, None, hm_proto + + def forward_refine(self, + mlvl_masks, + memory, + reference_points_pose, + img_inds, + kpt_branches=None, + **kwargs): + mask_flatten = [] + spatial_shapes = [] + for lvl, mask in enumerate(mlvl_masks): + bs, h, w = mask.shape + spatial_shape = (h, w) + spatial_shapes.append(spatial_shape) + mask = mask.flatten(1) + mask_flatten.append(mask) + mask_flatten = paddle.concat(mask_flatten, 1) + spatial_shapes_cumsum = paddle.to_tensor( + np.array( + spatial_shapes, dtype='int64').prod(1).cumsum(0)) + spatial_shapes = paddle.to_tensor(spatial_shapes, dtype="int64") + level_start_index = paddle.concat((paddle.zeros( + (1, ), dtype=spatial_shapes.dtype), spatial_shapes_cumsum[:-1])) + valid_ratios = paddle.stack( + [self.get_valid_ratio(m) for m in mlvl_masks], 1) + + # pose refinement (17 queries corresponding to 17 keypoints) + # learnable query and query_pos + refine_query_embedding = self.refine_query_embedding.weight + query_pos, query = paddle.split(refine_query_embedding, 2, axis=1) + pos_num = reference_points_pose.shape[0] + query_pos = query_pos.unsqueeze(0).expand((pos_num, -1, -1)) + query = query.unsqueeze(0).expand((pos_num, -1, -1)) + reference_points = reference_points_pose.reshape( + (pos_num, reference_points_pose.shape[1] // 2, 2)) + pos_memory = memory[img_inds] + mask_flatten = mask_flatten[img_inds] + valid_ratios = valid_ratios[img_inds] + if img_inds.size == 1: + pos_memory = pos_memory.unsqueeze(0) + mask_flatten = mask_flatten.unsqueeze(0) + valid_ratios = valid_ratios.unsqueeze(0) + inter_states, inter_references = self.refine_decoder( + query=query, + memory=pos_memory, + query_pos_embed=query_pos, + memory_mask=mask_flatten, + reference_points=reference_points, + value_spatial_shapes=spatial_shapes, + value_level_start_index=level_start_index, + valid_ratios=valid_ratios, + reg_branches=kpt_branches, + **kwargs) + # [num_decoder, num_query, bs, embed_dim] + + init_reference_out = reference_points + return inter_states, init_reference_out, inter_references diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/position_encoding.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/position_encoding.py new file mode 100644 index 0000000000000000000000000000000000000000..a2c3260974251295f93b22cd145d5a170d63b2ad --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/position_encoding.py @@ -0,0 +1,100 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Modified from DETR (https://github.com/facebookresearch/detr) +# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import paddle +import paddle.nn as nn + +from ppdet.core.workspace import register, serializable + + +@register +@serializable +class PositionEmbedding(nn.Layer): + def __init__(self, + num_pos_feats=128, + temperature=10000, + normalize=True, + scale=2 * math.pi, + embed_type='sine', + num_embeddings=50, + offset=0., + eps=1e-6): + super(PositionEmbedding, self).__init__() + assert embed_type in ['sine', 'learned'] + + self.embed_type = embed_type + self.offset = offset + self.eps = eps + if self.embed_type == 'sine': + self.num_pos_feats = num_pos_feats + self.temperature = temperature + self.normalize = normalize + self.scale = scale + elif self.embed_type == 'learned': + self.row_embed = nn.Embedding(num_embeddings, num_pos_feats) + self.col_embed = nn.Embedding(num_embeddings, num_pos_feats) + else: + raise ValueError(f"{self.embed_type} is not supported.") + + def forward(self, mask): + """ + Args: + mask (Tensor): [B, H, W] + Returns: + pos (Tensor): [B, H, W, C] + """ + if self.embed_type == 'sine': + y_embed = mask.cumsum(1) + x_embed = mask.cumsum(2) + if self.normalize: + y_embed = (y_embed + self.offset) / ( + y_embed[:, -1:, :] + self.eps) * self.scale + x_embed = (x_embed + self.offset) / ( + x_embed[:, :, -1:] + self.eps) * self.scale + + dim_t = 2 * (paddle.arange(self.num_pos_feats) // + 2).astype('float32') + dim_t = self.temperature**(dim_t / self.num_pos_feats) + + pos_x = x_embed.unsqueeze(-1) / dim_t + pos_y = y_embed.unsqueeze(-1) / dim_t + pos_x = paddle.stack( + (pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), + axis=4).flatten(3) + pos_y = paddle.stack( + (pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), + axis=4).flatten(3) + return paddle.concat((pos_y, pos_x), axis=3) + elif self.embed_type == 'learned': + h, w = mask.shape[-2:] + i = paddle.arange(w) + j = paddle.arange(h) + x_emb = self.col_embed(i) + y_emb = self.row_embed(j) + return paddle.concat( + [ + x_emb.unsqueeze(0).tile([h, 1, 1]), + y_emb.unsqueeze(1).tile([1, w, 1]), + ], + axis=-1).unsqueeze(0) + else: + raise ValueError(f"not supported {self.embed_type}") diff --git a/PaddleDetection-release-2.6/ppdet/modeling/transformers/utils.py b/PaddleDetection-release-2.6/ppdet/modeling/transformers/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d8f869fbc27a807af88f2d4de262774a9f8638ce --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/modeling/transformers/utils.py @@ -0,0 +1,265 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Modified from DETR (https://github.com/facebookresearch/detr) +# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +# Modified from detrex (https://github.com/IDEA-Research/detrex) +# Copyright 2022 The IDEA Authors. All rights reserved. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import copy +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ..bbox_utils import bbox_overlaps + +__all__ = [ + '_get_clones', 'bbox_overlaps', 'bbox_cxcywh_to_xyxy', + 'bbox_xyxy_to_cxcywh', 'sigmoid_focal_loss', 'inverse_sigmoid', + 'deformable_attention_core_func' +] + + +def _get_clones(module, N): + return nn.LayerList([copy.deepcopy(module) for _ in range(N)]) + + +def bbox_cxcywh_to_xyxy(x): + cxcy, wh = paddle.split(x, 2, axis=-1) + return paddle.concat([cxcy - 0.5 * wh, cxcy + 0.5 * wh], axis=-1) + + +def bbox_xyxy_to_cxcywh(x): + x1, y1, x2, y2 = x.split(4, axis=-1) + return paddle.concat( + [(x1 + x2) / 2, (y1 + y2) / 2, (x2 - x1), (y2 - y1)], axis=-1) + + +def sigmoid_focal_loss(logit, label, normalizer=1.0, alpha=0.25, gamma=2.0): + prob = F.sigmoid(logit) + ce_loss = F.binary_cross_entropy_with_logits(logit, label, reduction="none") + p_t = prob * label + (1 - prob) * (1 - label) + loss = ce_loss * ((1 - p_t)**gamma) + + if alpha >= 0: + alpha_t = alpha * label + (1 - alpha) * (1 - label) + loss = alpha_t * loss + return loss.mean(1).sum() / normalizer + + +def inverse_sigmoid(x, eps=1e-6): + x = x.clip(min=0., max=1.) 
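+    # eps keeps the argument of the log strictly positive (and the
+    # denominator non-zero) at the clipped boundaries x == 0 and x == 1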
+ return paddle.log(x / (1 - x + eps) + eps) + + +def deformable_attention_core_func(value, value_spatial_shapes, + value_level_start_index, sampling_locations, + attention_weights): + """ + Args: + value (Tensor): [bs, value_length, n_head, c] + value_spatial_shapes (Tensor): [n_levels, 2] + value_level_start_index (Tensor): [n_levels] + sampling_locations (Tensor): [bs, query_length, n_head, n_levels, n_points, 2] + attention_weights (Tensor): [bs, query_length, n_head, n_levels, n_points] + + Returns: + output (Tensor): [bs, Length_{query}, C] + """ + bs, _, n_head, c = value.shape + _, Len_q, _, n_levels, n_points, _ = sampling_locations.shape + + value_list = value.split( + value_spatial_shapes.prod(1).split(n_levels), axis=1) + sampling_grids = 2 * sampling_locations - 1 + sampling_value_list = [] + for level, (h, w) in enumerate(value_spatial_shapes): + # N_, H_*W_, M_, D_ -> N_, H_*W_, M_*D_ -> N_, M_*D_, H_*W_ -> N_*M_, D_, H_, W_ + value_l_ = value_list[level].flatten(2).transpose( + [0, 2, 1]).reshape([bs * n_head, c, h, w]) + # N_, Lq_, M_, P_, 2 -> N_, M_, Lq_, P_, 2 -> N_*M_, Lq_, P_, 2 + sampling_grid_l_ = sampling_grids[:, :, :, level].transpose( + [0, 2, 1, 3, 4]).flatten(0, 1) + # N_*M_, D_, Lq_, P_ + sampling_value_l_ = F.grid_sample( + value_l_, + sampling_grid_l_, + mode='bilinear', + padding_mode='zeros', + align_corners=False) + sampling_value_list.append(sampling_value_l_) + # (N_, Lq_, M_, L_, P_) -> (N_, M_, Lq_, L_, P_) -> (N_*M_, 1, Lq_, L_*P_) + attention_weights = attention_weights.transpose([0, 2, 1, 3, 4]).reshape( + [bs * n_head, 1, Len_q, n_levels * n_points]) + output = (paddle.stack( + sampling_value_list, axis=-2).flatten(-2) * + attention_weights).sum(-1).reshape([bs, n_head * c, Len_q]) + + return output.transpose([0, 2, 1]) + + +def get_valid_ratio(mask): + _, H, W = paddle.shape(mask) + valid_ratio_h = paddle.sum(mask[:, :, 0], 1) / H + valid_ratio_w = paddle.sum(mask[:, 0, :], 1) / W + # [b, 2] + return paddle.stack([valid_ratio_w, valid_ratio_h], -1) + + +def get_contrastive_denoising_training_group(targets, + num_classes, + num_queries, + class_embed, + num_denoising=100, + label_noise_ratio=0.5, + box_noise_scale=1.0): + if num_denoising <= 0: + return None, None, None, None + num_gts = [len(t) for t in targets["gt_class"]] + max_gt_num = max(num_gts) + if max_gt_num == 0: + return None, None, None, None + + num_group = num_denoising // max_gt_num + num_group = 1 if num_group == 0 else num_group + # pad gt to max_num of a batch + bs = len(targets["gt_class"]) + input_query_class = paddle.full( + [bs, max_gt_num], num_classes, dtype='int32') + input_query_bbox = paddle.zeros([bs, max_gt_num, 4]) + pad_gt_mask = paddle.zeros([bs, max_gt_num]) + for i in range(bs): + num_gt = num_gts[i] + if num_gt > 0: + input_query_class[i, :num_gt] = targets["gt_class"][i].squeeze(-1) + input_query_bbox[i, :num_gt] = targets["gt_bbox"][i] + pad_gt_mask[i, :num_gt] = 1 + # each group has positive and negative queries. 
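+    # the tile below duplicates the padded GT set 2 * num_group times:
+    # each group carries one positive and one negative (noised) copy per GT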
+ input_query_class = input_query_class.tile([1, 2 * num_group]) + input_query_bbox = input_query_bbox.tile([1, 2 * num_group, 1]) + pad_gt_mask = pad_gt_mask.tile([1, 2 * num_group]) + # positive and negative mask + negative_gt_mask = paddle.zeros([bs, max_gt_num * 2, 1]) + negative_gt_mask[:, max_gt_num:] = 1 + negative_gt_mask = negative_gt_mask.tile([1, num_group, 1]) + positive_gt_mask = 1 - negative_gt_mask + # contrastive denoising training positive index + positive_gt_mask = positive_gt_mask.squeeze(-1) * pad_gt_mask + dn_positive_idx = paddle.nonzero(positive_gt_mask)[:, 1] + dn_positive_idx = paddle.split(dn_positive_idx, + [n * num_group for n in num_gts]) + # total denoising queries + num_denoising = int(max_gt_num * 2 * num_group) + + if label_noise_ratio > 0: + input_query_class = input_query_class.flatten() + pad_gt_mask = pad_gt_mask.flatten() + # half of bbox prob + mask = paddle.rand(input_query_class.shape) < (label_noise_ratio * 0.5) + chosen_idx = paddle.nonzero(mask * pad_gt_mask).squeeze(-1) + # randomly put a new one here + new_label = paddle.randint_like( + chosen_idx, 0, num_classes, dtype=input_query_class.dtype) + input_query_class.scatter_(chosen_idx, new_label) + input_query_class.reshape_([bs, num_denoising]) + pad_gt_mask.reshape_([bs, num_denoising]) + + if box_noise_scale > 0: + known_bbox = bbox_cxcywh_to_xyxy(input_query_bbox) + + diff = paddle.tile(input_query_bbox[..., 2:] * 0.5, + [1, 1, 2]) * box_noise_scale + + rand_sign = paddle.randint_like(input_query_bbox, 0, 2) * 2.0 - 1.0 + rand_part = paddle.rand(input_query_bbox.shape) + rand_part = (rand_part + 1.0) * negative_gt_mask + rand_part * ( + 1 - negative_gt_mask) + rand_part *= rand_sign + known_bbox += rand_part * diff + known_bbox.clip_(min=0.0, max=1.0) + input_query_bbox = bbox_xyxy_to_cxcywh(known_bbox) + input_query_bbox = inverse_sigmoid(input_query_bbox) + + class_embed = paddle.concat( + [class_embed, paddle.zeros([1, class_embed.shape[-1]])]) + input_query_class = paddle.gather( + class_embed, input_query_class.flatten(), + axis=0).reshape([bs, num_denoising, -1]) + + tgt_size = num_denoising + num_queries + attn_mask = paddle.ones([tgt_size, tgt_size]) < 0 + # match query cannot see the reconstruct + attn_mask[num_denoising:, :num_denoising] = True + # reconstruct cannot see each other + for i in range(num_group): + if i == 0: + attn_mask[max_gt_num * 2 * i:max_gt_num * 2 * (i + 1), max_gt_num * + 2 * (i + 1):num_denoising] = True + if i == num_group - 1: + attn_mask[max_gt_num * 2 * i:max_gt_num * 2 * (i + 1), :max_gt_num * + i * 2] = True + else: + attn_mask[max_gt_num * 2 * i:max_gt_num * 2 * (i + 1), max_gt_num * + 2 * (i + 1):num_denoising] = True + attn_mask[max_gt_num * 2 * i:max_gt_num * 2 * (i + 1), :max_gt_num * + 2 * i] = True + attn_mask = ~attn_mask + dn_meta = { + "dn_positive_idx": dn_positive_idx, + "dn_num_group": num_group, + "dn_num_split": [num_denoising, num_queries] + } + + return input_query_class, input_query_bbox, attn_mask, dn_meta + + +def get_sine_pos_embed(pos_tensor, + num_pos_feats=128, + temperature=10000, + exchange_xy=True): + """generate sine position embedding from a position tensor + + Args: + pos_tensor (torch.Tensor): Shape as `(None, n)`. + num_pos_feats (int): projected shape for each float in the tensor. Default: 128 + temperature (int): The temperature used for scaling + the position embedding. Default: 10000. + exchange_xy (bool, optional): exchange pos x and pos y. 
\ + For example, input tensor is `[x, y]`, the results will # noqa + be `[pos(y), pos(x)]`. Defaults: True. + + Returns: + torch.Tensor: Returned position embedding # noqa + with shape `(None, n * num_pos_feats)`. + """ + scale = 2. * math.pi + dim_t = 2. * paddle.floor_divide( + paddle.arange(num_pos_feats), paddle.to_tensor(2)) + dim_t = scale / temperature**(dim_t / num_pos_feats) + + def sine_func(x): + x *= dim_t + return paddle.stack( + (x[:, :, 0::2].sin(), x[:, :, 1::2].cos()), axis=3).flatten(2) + + pos_res = [sine_func(x) for x in pos_tensor.split(pos_tensor.shape[-1], -1)] + if exchange_xy: + pos_res[0], pos_res[1] = pos_res[1], pos_res[0] + pos_res = paddle.concat(pos_res, axis=2) + return pos_res diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/__init__.py b/PaddleDetection-release-2.6/ppdet/optimizer/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..aa690dc85029300c4b23fa2a0a27c1ef551c2ef6 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/optimizer/__init__.py @@ -0,0 +1,19 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import optimizer +from . import ema + +from .optimizer import * +from .ema import * diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..247cd755d64b8f83fd5a92634c5eb1ab84c4f14e Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/adamw.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/adamw.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..88eb651ba05151bab2cd22fcea783aca0b1d0733 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/adamw.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/ema.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/ema.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d10350992fc1daca136d5ea5d080ad3318d2b9b3 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/ema.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/optimizer.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/optimizer.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5f50e068344b00678d75657134f5309aa604ee78 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/optimizer.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/utils.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/utils.cpython-37.pyc new file mode 100644 index 
0000000000000000000000000000000000000000..44e05ed303ef425d44740836abec005e83afa73c Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/optimizer/__pycache__/utils.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/adamw.py b/PaddleDetection-release-2.6/ppdet/optimizer/adamw.py new file mode 100644 index 0000000000000000000000000000000000000000..6ecf676d63224c9e76df25893c83cf5b610e957a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/optimizer/adamw.py @@ -0,0 +1,272 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from paddle.optimizer import AdamW +from functools import partial +import re + +IS_PADDLE_LATER_2_4 = ( + int(paddle.version.major) >= 2 and + int(paddle.version.minor) >= 4) or int(paddle.version.major) == 0 + + +def layerwise_lr_decay(decay_rate, name_dict, n_layers, param): + """ + Args: + decay_rate (float): + The layer-wise decay ratio. + name_dict (dict): + The keys of name_dict is dynamic name of model while the value + of name_dict is static name. + Use model.named_parameters() to get name_dict. + n_layers (int): + Total number of layers in the transformer encoder. + """ + ratio = 1.0 + static_name = name_dict[param.name] + if 'blocks.' in static_name or 'layers.' in static_name: + idx_1 = static_name.find('blocks.') + idx_2 = static_name.find('layers.') + assert any([x >= 0 for x in [idx_1, idx_2]]), '' + idx = idx_1 if idx_1 >= 0 else idx_2 + # idx = re.findall('[blocks|layers]\.(\d+)\.', static_name)[0] + + layer = int(static_name[idx:].split('.')[1]) + ratio = decay_rate**(n_layers - layer) + + elif 'cls_token' in static_name or 'patch_embed' in static_name: + ratio = decay_rate**(n_layers + 1) + + if IS_PADDLE_LATER_2_4: + return ratio + else: + param.optimize_attr['learning_rate'] *= ratio + + +class AdamWDL(AdamW): + r""" + The AdamWDL optimizer is implemented based on the AdamW Optimization with dynamic lr setting. + Generally it's used for transformer model. + + We use "layerwise_lr_decay" as default dynamic lr setting method of AdamWDL. + “Layer-wise decay” means exponentially decaying the learning rates of individual + layers in a top-down manner. For example, suppose the 24-th layer uses a learning + rate l, and the Layer-wise decay rate is α, then the learning rate of layer m + is lα^(24-m). See more details on: https://arxiv.org/abs/1906.08237. + + .. math:: + & t = t + 1 + + & moment\_1\_out = {\beta}_1 * moment\_1 + (1 - {\beta}_1) * grad + + & moment\_2\_out = {\beta}_2 * moment\_2 + (1 - {\beta}_2) * grad * grad + + & learning\_rate = learning\_rate * \frac{\sqrt{1 - {\beta}_2^t}}{1 - {\beta}_1^t} + + & param\_out = param - learning\_rate * (\frac{moment\_1}{\sqrt{moment\_2} + \epsilon} + \lambda * param) + + Args: + learning_rate (float|LRScheduler, optional): The learning rate used to update ``Parameter``. 
+            It can be a float value or a LRScheduler. The default value is 0.001.
+        beta1 (float, optional): The exponential decay rate for the 1st moment estimates.
+            It should be a float number or a Tensor with shape [1] and data type as float32.
+            The default value is 0.9.
+        beta2 (float, optional): The exponential decay rate for the 2nd moment estimates.
+            It should be a float number or a Tensor with shape [1] and data type as float32.
+            The default value is 0.999.
+        epsilon (float, optional): A small float value for numerical stability.
+            It should be a float number or a Tensor with shape [1] and data type as float32.
+            The default value is 1e-08.
+        parameters (list|tuple, optional): List/Tuple of ``Tensor`` to update to minimize ``loss``.
+            This parameter is required in dygraph mode.
+            The default value is None in static mode, at this time all parameters will be updated.
+        weight_decay (float, optional): The weight decay coefficient, it can be float or Tensor. The default value is 0.01.
+        apply_decay_param_fun (function|None, optional): If it is not None,
+            only tensors that make apply_decay_param_fun(Tensor.name)==True
+            will be updated. It only works when we want to specify tensors.
+            Default: None.
+        grad_clip (GradientClipBase, optional): Gradient clipping strategy, it's an instance of
+            some derived class of ``GradientClipBase`` . There are three clipping strategies
+            ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` ,
+            :ref:`api_fluid_clip_GradientClipByValue` ). Default None, meaning there is no gradient clipping.
+        lazy_mode (bool, optional): The official Adam algorithm has two moving-average accumulators.
+            The accumulators are updated at every step. Every element of the two moving-average
+            is updated in both dense mode and sparse mode. If the size of parameter is very large,
+            then the update may be very slow. The lazy mode only updates the element that has
+            gradient in the current mini-batch, so it will be much faster. But this mode has
+            different semantics with the original Adam algorithm and may lead to different results.
+            The default value is False.
+        multi_precision (bool, optional): Whether to use multi-precision during weight updating. Default is false.
+        layerwise_decay (float, optional): The layer-wise decay ratio. Defaults to 1.0.
+        n_layers (int, optional): The total number of encoder layers. Defaults to 12.
+        set_param_lr_fun (function|None, optional): If it's not None, set_param_lr_fun() will set the parameter
+            learning rate before it executes the Adam Operator. Defaults to :ref:`layerwise_lr_decay`.
+        name_dict (dict, optional): The keys of name_dict is the dynamic name of the model while the value
+            of name_dict is the static name. Use model.named_parameters() to get name_dict.
+        name (str, optional): Normally there is no need for user to set this property.
+            For more information, please refer to :ref:`api_guide_Name`.
+            The default value is None.
+
+    Examples:
+        .. code-block:: python
+
+            import paddle
+            from paddlenlp.ops.optimizer import AdamWDL
+            def simple_lr_setting(decay_rate, name_dict, n_layers, param):
+                ratio = 1.0
+                static_name = name_dict[param.name]
+                if "weight" in static_name:
+                    ratio = decay_rate**0.5
+                param.optimize_attr["learning_rate"] *= ratio
+
+            linear = paddle.nn.Linear(10, 10)
+
+            name_dict = dict()
+            for n, p in linear.named_parameters():
+                name_dict[p.name] = n
+
+            inp = paddle.rand([10, 10], dtype="float32")
+            out = linear(inp)
+            loss = paddle.mean(out)
+
+            adamwdl = AdamWDL(
+                learning_rate=1e-4,
+                parameters=linear.parameters(),
+                set_param_lr_fun=simple_lr_setting,
+                layerwise_decay=0.8,
+                name_dict=name_dict)
+
+            loss.backward()
+            adamwdl.step()
+            adamwdl.clear_grad()
+    """
+
+    def __init__(self,
+                 learning_rate=0.001,
+                 beta1=0.9,
+                 beta2=0.999,
+                 epsilon=1e-8,
+                 parameters=None,
+                 weight_decay=0.01,
+                 apply_decay_param_fun=None,
+                 grad_clip=None,
+                 lazy_mode=False,
+                 multi_precision=False,
+                 layerwise_decay=1.0,
+                 n_layers=12,
+                 set_param_lr_func=None,
+                 name_dict=None,
+                 name=None):
+        if not isinstance(layerwise_decay, float):
+            raise TypeError("coeff should be float or Tensor.")
+        self.layerwise_decay = layerwise_decay
+        self.n_layers = n_layers
+        self.set_param_lr_func = partial(
+            set_param_lr_func, layerwise_decay, name_dict,
+            n_layers) if set_param_lr_func is not None else set_param_lr_func
+
+        if IS_PADDLE_LATER_2_4:
+            super(AdamWDL, self).__init__(
+                learning_rate=learning_rate,
+                parameters=parameters,
+                beta1=beta1,
+                beta2=beta2,
+                epsilon=epsilon,
+                grad_clip=grad_clip,
+                name=name,
+                apply_decay_param_fun=apply_decay_param_fun,
+                weight_decay=weight_decay,
+                lazy_mode=lazy_mode,
+                multi_precision=multi_precision,
+                lr_ratio=self.set_param_lr_func)
+        else:
+            super(AdamWDL, self).__init__(
+                learning_rate=learning_rate,
+                parameters=parameters,
+                beta1=beta1,
+                beta2=beta2,
+                epsilon=epsilon,
+                grad_clip=grad_clip,
+                name=name,
+                apply_decay_param_fun=apply_decay_param_fun,
+                weight_decay=weight_decay,
+                lazy_mode=lazy_mode,
+                multi_precision=multi_precision)
+
+
+def _append_optimize_op(self, block, param_and_grad):
+    if self.set_param_lr_func is None:
+        return super(AdamWDL, self)._append_optimize_op(block, param_and_grad)
+
+    self._append_decoupled_weight_decay(block, param_and_grad)
+    prev_lr = param_and_grad[0].optimize_attr["learning_rate"]
+    self.set_param_lr_func(param_and_grad[0])
+    # execute the Adam op with the temporarily scaled learning rate
+    res = super(AdamW, self)._append_optimize_op(block, param_and_grad)
+    param_and_grad[0].optimize_attr["learning_rate"] = prev_lr
+    return res
+
+
+if not IS_PADDLE_LATER_2_4:
+    AdamWDL._append_optimize_op = _append_optimize_op
+
+
+def build_adamwdl(model,
+                  lr=1e-4,
+                  weight_decay=0.05,
+                  betas=(0.9, 0.999),
+                  layer_decay=0.65,
+                  num_layers=None,
+                  filter_bias_and_bn=True,
+                  skip_decay_names=None,
+                  set_param_lr_func='layerwise_lr_decay'):
+
+    # decay_dict stays None unless a skip list is provided; initializing it
+    # here avoids a NameError in the `decay_dict is not None` check below
+    decay_dict = None
+    if skip_decay_names and filter_bias_and_bn:
+        decay_dict = {
+            param.name: not (len(param.shape) == 1 or name.endswith('.bias') or
+                             any([_n in name for _n in skip_decay_names]))
+            for name, param in model.named_parameters()
+        }
+        parameters = [p for p in model.parameters()]
+
+    else:
+        parameters = model.parameters()
+
+    opt_args = dict(
+        parameters=parameters, learning_rate=lr, weight_decay=weight_decay)
+
+    if decay_dict is not None:
+        opt_args['apply_decay_param_fun'] = lambda n: decay_dict[n]
+
+    if isinstance(set_param_lr_func, str):
+        set_param_lr_func = eval(set_param_lr_func)
+    opt_args['set_param_lr_func'] = set_param_lr_func
+    opt_args['beta1'] = betas[0]
+    opt_args['beta2'] = betas[1]
+
+    opt_args['layerwise_decay'] = layer_decay
+    name_dict = {p.name: n for n, p in model.named_parameters()}
+
+    opt_args['name_dict'] = name_dict
+    opt_args['n_layers'] = num_layers
+
+    optimizer = AdamWDL(**opt_args)
+
+    return optimizer
diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/ema.py b/PaddleDetection-release-2.6/ppdet/optimizer/ema.py
new file mode 100644
index 0000000000000000000000000000000000000000..9cd9dca637998f4701bfe77fe317240fee26fe71
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/optimizer/ema.py
@@ -0,0 +1,193 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+import paddle
+import weakref
+from copy import deepcopy
+
+from .utils import get_bn_running_state_names
+
+__all__ = ['ModelEMA', 'SimpleModelEMA']
+
+
+class ModelEMA(object):
+    """
+    Exponential Weighted Average for Deep Neural Networks
+    Args:
+        model (nn.Layer): The detector model whose weights are tracked.
+        decay (float): The decay used for updating the EMA parameters.
+            EMA parameters are updated with the formula:
+            `ema_param = decay * ema_param + (1 - decay) * cur_param`.
+            Default is 0.9998.
+        ema_decay_type (str): type in ['threshold', 'normal', 'exponential'],
+            'threshold' as default.
+        cycle_epoch (int): The epoch interval at which to reset ema_param and
+            step. Default is -1, which means no reset. Resetting adds a
+            periodic regularization effect to EMA; the interval is chosen
+            empirically and is useful when the total number of training
+            epochs is large.
+        ema_black_list (set|list|tuple, optional): The custom EMA black list.
+            Blacklist of weight names that will not participate in EMA
+            calculation. Default: None.
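+
+    Examples (an illustrative training-loop sketch; the Linear layer is a
+    stand-in for a real detector):
+        .. code-block:: python
+
+            import paddle.nn as nn
+            model = nn.Linear(4, 4)
+            ema = ModelEMA(model, decay=0.9998)
+            # after every optimizer step:
+            ema.update()
+            # before evaluation / checkpointing:
+            ema_state = ema.apply()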
+ """ + + def __init__(self, + model, + decay=0.9998, + ema_decay_type='threshold', + cycle_epoch=-1, + ema_black_list=None, + ema_filter_no_grad=False): + self.step = 0 + self.epoch = 0 + self.decay = decay + self.ema_decay_type = ema_decay_type + self.cycle_epoch = cycle_epoch + self.ema_black_list = self._match_ema_black_list( + model.state_dict().keys(), ema_black_list) + self.state_dict = dict() + for k, v in model.state_dict().items(): + if k in self.ema_black_list: + self.state_dict[k] = v + else: + self.state_dict[k] = paddle.zeros_like(v) + + bn_states_names = get_bn_running_state_names(model) + if ema_filter_no_grad: + for n, p in model.named_parameters(): + if p.stop_gradient == True and n not in bn_states_names: + self.ema_black_list.append(n) + + self._model_state = { + k: weakref.ref(p) + for k, p in model.state_dict().items() + } + + def reset(self): + self.step = 0 + self.epoch = 0 + for k, v in self.state_dict.items(): + if k in self.ema_black_list: + self.state_dict[k] = v + else: + self.state_dict[k] = paddle.zeros_like(v) + + def resume(self, state_dict, step=0): + for k, v in state_dict.items(): + if k in self.state_dict: + if self.state_dict[k].dtype == v.dtype: + self.state_dict[k] = v + else: + self.state_dict[k] = v.astype(self.state_dict[k].dtype) + self.step = step + + def update(self, model=None): + if self.ema_decay_type == 'threshold': + decay = min(self.decay, (1 + self.step) / (10 + self.step)) + elif self.ema_decay_type == 'exponential': + decay = self.decay * (1 - math.exp(-(self.step + 1) / 2000)) + else: + decay = self.decay + self._decay = decay + + if model is not None: + model_dict = model.state_dict() + else: + model_dict = {k: p() for k, p in self._model_state.items()} + assert all( + [v is not None for _, v in model_dict.items()]), 'python gc.' + + for k, v in self.state_dict.items(): + if k not in self.ema_black_list: + v = decay * v + (1 - decay) * model_dict[k] + v.stop_gradient = True + self.state_dict[k] = v + self.step += 1 + + def apply(self): + if self.step == 0: + return self.state_dict + state_dict = dict() + for k, v in self.state_dict.items(): + if k in self.ema_black_list: + v.stop_gradient = True + state_dict[k] = v + else: + if self.ema_decay_type != 'exponential': + v = v / (1 - self._decay**self.step) + v.stop_gradient = True + state_dict[k] = v + self.epoch += 1 + if self.cycle_epoch > 0 and self.epoch == self.cycle_epoch: + self.reset() + + return state_dict + + def _match_ema_black_list(self, weight_name, ema_black_list=None): + out_list = set() + if ema_black_list: + for name in weight_name: + for key in ema_black_list: + if key in name: + out_list.add(name) + return out_list + + +class SimpleModelEMA(object): + """ + Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models + Keep a moving average of everything in the model state_dict (parameters and buffers). + This is intended to allow functionality like + https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage + A smoothed version of the weights is necessary for some training schemes to perform well. + This class is sensitive where it is initialized in the sequence of model init, + GPU assignment and distributed training wrappers. + """ + + def __init__(self, model=None, decay=0.9996): + """ + Args: + model (nn.Module): model to apply EMA. + decay (float): ema decay reate. 
+ """ + self.model = deepcopy(model) + self.decay = decay + + def update(self, model, decay=None): + if decay is None: + decay = self.decay + + with paddle.no_grad(): + state = {} + msd = model.state_dict() + for k, v in self.model.state_dict().items(): + if paddle.is_floating_point(v): + v *= decay + v += (1.0 - decay) * msd[k].detach() + state[k] = v + self.model.set_state_dict(state) + + def resume(self, state_dict, step=0): + state = {} + msd = state_dict + for k, v in self.model.state_dict().items(): + if paddle.is_floating_point(v): + v = msd[k].detach() + state[k] = v + self.model.set_state_dict(state) + self.step = step diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/optimizer.py b/PaddleDetection-release-2.6/ppdet/optimizer/optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..2d0714078eec14dadd57f5689ae6a41039562202 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/optimizer/optimizer.py @@ -0,0 +1,355 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import sys +import math +import paddle +import paddle.nn as nn + +import paddle.optimizer as optimizer +import paddle.regularizer as regularizer + +from ppdet.core.workspace import register, serializable +import copy + +from .adamw import AdamWDL, build_adamwdl + +__all__ = ['LearningRate', 'OptimizerBuilder'] + +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +@serializable +class CosineDecay(object): + """ + Cosine learning rate decay + + Args: + max_epochs (int): max epochs for the training process. + if you commbine cosine decay with warmup, it is recommended that + the max_iters is much larger than the warmup iter + use_warmup (bool): whether to use warmup. Default: True. + min_lr_ratio (float): minimum learning rate ratio. Default: 0. + last_plateau_epochs (int): use minimum learning rate in + the last few epochs. Default: 0. 
+ """ + + def __init__(self, + max_epochs=1000, + use_warmup=True, + min_lr_ratio=0., + last_plateau_epochs=0): + self.max_epochs = max_epochs + self.use_warmup = use_warmup + self.min_lr_ratio = min_lr_ratio + self.last_plateau_epochs = last_plateau_epochs + + def __call__(self, + base_lr=None, + boundary=None, + value=None, + step_per_epoch=None): + assert base_lr is not None, "either base LR or values should be provided" + + max_iters = self.max_epochs * int(step_per_epoch) + last_plateau_iters = self.last_plateau_epochs * int(step_per_epoch) + min_lr = base_lr * self.min_lr_ratio + if boundary is not None and value is not None and self.use_warmup: + # use warmup + warmup_iters = len(boundary) + for i in range(int(boundary[-1]), max_iters): + boundary.append(i) + if i < max_iters - last_plateau_iters: + decayed_lr = min_lr + (base_lr - min_lr) * 0.5 * (math.cos( + (i - warmup_iters) * math.pi / + (max_iters - warmup_iters - last_plateau_iters)) + 1) + value.append(decayed_lr) + else: + value.append(min_lr) + return optimizer.lr.PiecewiseDecay(boundary, value) + elif last_plateau_iters > 0: + # not use warmup, but set `last_plateau_epochs` > 0 + boundary = [] + value = [] + for i in range(max_iters): + if i < max_iters - last_plateau_iters: + decayed_lr = min_lr + (base_lr - min_lr) * 0.5 * (math.cos( + i * math.pi / (max_iters - last_plateau_iters)) + 1) + value.append(decayed_lr) + else: + value.append(min_lr) + if i > 0: + boundary.append(i) + return optimizer.lr.PiecewiseDecay(boundary, value) + + return optimizer.lr.CosineAnnealingDecay( + base_lr, T_max=max_iters, eta_min=min_lr) + + +@serializable +class PiecewiseDecay(object): + """ + Multi step learning rate decay + + Args: + gamma (float | list): decay factor + milestones (list): steps at which to decay learning rate + """ + + def __init__(self, + gamma=[0.1, 0.01], + milestones=[8, 11], + values=None, + use_warmup=True): + super(PiecewiseDecay, self).__init__() + if type(gamma) is not list: + self.gamma = [] + for i in range(len(milestones)): + self.gamma.append(gamma / 10**i) + else: + self.gamma = gamma + self.milestones = milestones + self.values = values + self.use_warmup = use_warmup + + def __call__(self, + base_lr=None, + boundary=None, + value=None, + step_per_epoch=None): + if boundary is not None and self.use_warmup: + boundary.extend([int(step_per_epoch) * i for i in self.milestones]) + else: + # do not use LinearWarmup + boundary = [int(step_per_epoch) * i for i in self.milestones] + value = [base_lr] # during step[0, boundary[0]] is base_lr + + # self.values is setted directly in config + if self.values is not None: + assert len(self.milestones) + 1 == len(self.values) + return optimizer.lr.PiecewiseDecay(boundary, self.values) + + # value is computed by self.gamma + value = value if value is not None else [base_lr] + for i in self.gamma: + value.append(base_lr * i) + + return optimizer.lr.PiecewiseDecay(boundary, value) + + +@serializable +class LinearWarmup(object): + """ + Warm up learning rate linearly + + Args: + steps (int): warm up steps + start_factor (float): initial learning rate factor + epochs (int|None): use epochs as warm up steps, the priority + of `epochs` is higher than `steps`. Default: None. + """ + + def __init__(self, steps=500, start_factor=1. 
/ 3, epochs=None): + super(LinearWarmup, self).__init__() + self.steps = steps + self.start_factor = start_factor + self.epochs = epochs + + def __call__(self, base_lr, step_per_epoch): + boundary = [] + value = [] + warmup_steps = self.epochs * step_per_epoch \ + if self.epochs is not None else self.steps + warmup_steps = max(warmup_steps, 1) + for i in range(warmup_steps + 1): + if warmup_steps > 0: + alpha = i / warmup_steps + factor = self.start_factor * (1 - alpha) + alpha + lr = base_lr * factor + value.append(lr) + if i > 0: + boundary.append(i) + return boundary, value + + +@serializable +class ExpWarmup(object): + """ + Warm up learning rate in exponential mode + Args: + steps (int): warm up steps. + epochs (int|None): use epochs as warm up steps, the priority + of `epochs` is higher than `steps`. Default: None. + power (int): Exponential coefficient. Default: 2. + """ + + def __init__(self, steps=1000, epochs=None, power=2): + super(ExpWarmup, self).__init__() + self.steps = steps + self.epochs = epochs + self.power = power + + def __call__(self, base_lr, step_per_epoch): + boundary = [] + value = [] + warmup_steps = self.epochs * step_per_epoch if self.epochs is not None else self.steps + warmup_steps = max(warmup_steps, 1) + for i in range(warmup_steps + 1): + factor = (i / float(warmup_steps))**self.power + value.append(base_lr * factor) + if i > 0: + boundary.append(i) + return boundary, value + + +@register +class LearningRate(object): + """ + Learning Rate configuration + + Args: + base_lr (float): base learning rate + schedulers (list): learning rate schedulers + """ + __category__ = 'optim' + + def __init__(self, + base_lr=0.01, + schedulers=[PiecewiseDecay(), LinearWarmup()]): + super(LearningRate, self).__init__() + self.base_lr = base_lr + self.schedulers = [] + + schedulers = copy.deepcopy(schedulers) + for sched in schedulers: + if isinstance(sched, dict): + # support dict sched instantiate + module = sys.modules[__name__] + type = sched.pop("name") + scheduler = getattr(module, type)(**sched) + self.schedulers.append(scheduler) + else: + self.schedulers.append(sched) + + def __call__(self, step_per_epoch): + assert len(self.schedulers) >= 1 + if not self.schedulers[0].use_warmup: + return self.schedulers[0](base_lr=self.base_lr, + step_per_epoch=step_per_epoch) + + # TODO: split warmup & decay + # warmup + boundary, value = self.schedulers[1](self.base_lr, step_per_epoch) + # decay + decay_lr = self.schedulers[0](self.base_lr, boundary, value, + step_per_epoch) + return decay_lr + + +@register +class OptimizerBuilder(): + """ + Build optimizer handles + Args: + regularizer (object): an `Regularizer` instance + optimizer (object): an `Optimizer` instance + """ + __category__ = 'optim' + + def __init__(self, + clip_grad_by_norm=None, + clip_grad_by_value=None, + regularizer={'type': 'L2', + 'factor': .0001}, + optimizer={'type': 'Momentum', + 'momentum': .9}): + self.clip_grad_by_norm = clip_grad_by_norm + self.clip_grad_by_value = clip_grad_by_value + self.regularizer = regularizer + self.optimizer = optimizer + + def __call__(self, learning_rate, model=None): + if self.clip_grad_by_norm is not None: + grad_clip = nn.ClipGradByGlobalNorm( + clip_norm=self.clip_grad_by_norm) + elif self.clip_grad_by_value is not None: + var = abs(self.clip_grad_by_value) + grad_clip = nn.ClipGradByValue(min=-var, max=var) + else: + grad_clip = None + if self.regularizer and self.regularizer != 'None': + reg_type = self.regularizer['type'] + 'Decay' + reg_factor = 
self.regularizer['factor'] + regularization = getattr(regularizer, reg_type)(reg_factor) + else: + regularization = None + + optim_args = self.optimizer.copy() + optim_type = optim_args['type'] + del optim_args['type'] + + if optim_type == 'AdamWDL': + return build_adamwdl(model, lr=learning_rate, **optim_args) + + if optim_type != 'AdamW': + optim_args['weight_decay'] = regularization + + op = getattr(optimizer, optim_type) + + if 'param_groups' in optim_args: + assert isinstance(optim_args['param_groups'], list), '' + + param_groups = optim_args.pop('param_groups') + + params, visited = [], [] + for group in param_groups: + assert isinstance(group, + dict) and 'params' in group and isinstance( + group['params'], list), '' + _params = { + n: p + for n, p in model.named_parameters() + if any([k in n + for k in group['params']]) and p.trainable is True + } + _group = group.copy() + _group.update({'params': list(_params.values())}) + + params.append(_group) + visited.extend(list(_params.keys())) + + ext_params = [ + p for n, p in model.named_parameters() + if n not in visited and p.trainable is True + ] + + if len(ext_params) < len(model.parameters()): + params.append({'params': ext_params}) + + elif len(ext_params) > len(model.parameters()): + raise RuntimeError + + else: + _params = model.parameters() + params = [param for param in _params if param.trainable is True] + + return op(learning_rate=learning_rate, + parameters=params, + grad_clip=grad_clip, + **optim_args) diff --git a/PaddleDetection-release-2.6/ppdet/optimizer/utils.py b/PaddleDetection-release-2.6/ppdet/optimizer/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ce2de49bf5973ee0b69a9ecc62028cca67f4d1e0 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/optimizer/utils.py @@ -0,0 +1,37 @@ +# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn + +from typing import List + + +def get_bn_running_state_names(model: nn.Layer) -> List[str]: + """Get all bn state full names including running mean and variance + """ + names = [] + for n, m in model.named_sublayers(): + if isinstance(m, (nn.BatchNorm2D, nn.SyncBatchNorm)): + assert hasattr(m, '_mean'), f'assert {m} has _mean' + assert hasattr(m, '_variance'), f'assert {m} has _variance' + running_mean = f'{n}._mean' + running_var = f'{n}._variance' + names.extend([running_mean, running_var]) + + return names diff --git a/PaddleDetection-release-2.6/ppdet/slim/__init__.py b/PaddleDetection-release-2.6/ppdet/slim/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..712919002ff49d9ff503fa8caaed85c954a02104 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/slim/__init__.py @@ -0,0 +1,110 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import distill_loss +from . import distill_model +from . import ofa +from . import prune +from . import quant +from . import unstructured_prune + +from .distill_loss import * +from .distill_model import * +from .ofa import * +from .prune import * +from .quant import * +from .unstructured_prune import * + +import yaml +from ppdet.core.workspace import load_config +from ppdet.utils.checkpoint import load_pretrain_weight + + +def build_slim_model(cfg, slim_cfg, mode='train'): + with open(slim_cfg) as f: + slim_load_cfg = yaml.load(f, Loader=yaml.Loader) + + if mode != 'train' and slim_load_cfg['slim'] == 'Distill': + return cfg + + if slim_load_cfg['slim'] == 'Distill': + if "slim_method" in slim_load_cfg and slim_load_cfg[ + 'slim_method'] == "FGD": + model = FGDDistillModel(cfg, slim_cfg) + elif "slim_method" in slim_load_cfg and slim_load_cfg[ + 'slim_method'] == "LD": + model = LDDistillModel(cfg, slim_cfg) + elif "slim_method" in slim_load_cfg and slim_load_cfg[ + 'slim_method'] == "CWD": + model = CWDDistillModel(cfg, slim_cfg) + elif "slim_method" in slim_load_cfg and slim_load_cfg[ + 'slim_method'] == "PPYOLOEDistill": + model = PPYOLOEDistillModel(cfg, slim_cfg) + else: + # common distillation model + model = DistillModel(cfg, slim_cfg) + cfg['model'] = model + cfg['slim_type'] = cfg.slim + elif slim_load_cfg['slim'] == 'OFA': + load_config(slim_cfg) + model = create(cfg.architecture) + load_pretrain_weight(model, cfg.weights) + slim = create(cfg.slim) + cfg['slim'] = slim + cfg['model'] = slim(model, model.state_dict()) + cfg['slim_type'] = cfg.slim + elif slim_load_cfg['slim'] == 'DistillPrune': + if mode == 'train': + model = DistillModel(cfg, slim_cfg) + pruner = create(cfg.pruner) + pruner(model.student_model) + else: + model = create(cfg.architecture) + weights = cfg.weights + load_config(slim_cfg) + pruner = create(cfg.pruner) + model = pruner(model) + load_pretrain_weight(model, weights) + cfg['model'] = model + cfg['slim_type'] = cfg.slim + elif slim_load_cfg['slim'] == 'PTQ': + model = create(cfg.architecture) + load_config(slim_cfg) + load_pretrain_weight(model, cfg.weights) + slim = create(cfg.slim) + cfg['slim_type'] = cfg.slim + cfg['slim'] = slim + cfg['model'] = slim(model) + elif slim_load_cfg['slim'] == 'UnstructuredPruner': + load_config(slim_cfg) + slim = create(cfg.slim) + cfg['slim_type'] = cfg.slim + cfg['slim'] = slim + cfg['unstructured_prune'] = True + else: + load_config(slim_cfg) + model = create(cfg.architecture) + if mode == 'train': + load_pretrain_weight(model, cfg.pretrain_weights) + slim = create(cfg.slim) + cfg['slim_type'] = cfg.slim + # TODO: fix quant export model in framework. 
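+        # Illustrative call site for build_slim_model (the config paths below
+        # are placeholders, not files added by this patch): a base detector
+        # config plus a slim YAML whose `slim` key selects the strategy, after
+        # which cfg['model'] / cfg['slim'] are rewired accordingly, e.g.
+        #     cfg = load_config('configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml')
+        #     cfg = build_slim_model(cfg, 'configs/slim/quant/yolov3_qat.yml', mode='train')
+        #     model, slim = cfg['model'], cfg['slim']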
+ if mode == 'test' and 'QAT' in slim_load_cfg['slim']: + slim.quant_config['activation_preprocess_type'] = None + cfg['model'] = slim(model) + cfg['slim'] = slim + if mode != 'train': + load_pretrain_weight(cfg['model'], cfg.weights) + + return cfg diff --git a/PaddleDetection-release-2.6/ppdet/slim/__pycache__/__init__.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/__init__.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fab83df0cdcfe61df27b05c8f887a55bd902cf98 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/__init__.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/slim/__pycache__/distill_loss.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/distill_loss.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..09d17d8484548cbef2023e13218cb85392322c80 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/distill_loss.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/slim/__pycache__/distill_model.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/distill_model.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f84a6632af3082033fdea05de21c87e343758191 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/distill_model.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/slim/__pycache__/ofa.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/ofa.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..df8d1dc163b99e961685d813db5509e61db6ff06 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/ofa.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/slim/__pycache__/prune.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/prune.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7fc5682e4fa1520631f07ff8de7b2476bde02c05 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/prune.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/slim/__pycache__/quant.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/quant.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4e16f333c7ea2bfc663e93aa130f507184cfa72b Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/quant.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/slim/__pycache__/unstructured_prune.cpython-37.pyc b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/unstructured_prune.cpython-37.pyc new file mode 100644 index 0000000000000000000000000000000000000000..114a3a71d2a834a7d07742a523a44045f59e1a78 Binary files /dev/null and b/PaddleDetection-release-2.6/ppdet/slim/__pycache__/unstructured_prune.cpython-37.pyc differ diff --git a/PaddleDetection-release-2.6/ppdet/slim/distill_loss.py b/PaddleDetection-release-2.6/ppdet/slim/distill_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..d325a5b2ac93983256bf8c07b165354f0b4ffd98 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/slim/distill_loss.py @@ -0,0 +1,919 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr + +from ppdet.core.workspace import register +from ppdet.modeling import ops +from ppdet.modeling.losses.iou_loss import GIoULoss +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = [ + 'DistillYOLOv3Loss', + 'KnowledgeDistillationKLDivLoss', + 'DistillPPYOLOELoss', + 'FGDFeatureLoss', + 'CWDFeatureLoss', + 'PKDFeatureLoss', + 'MGDFeatureLoss', +] + + +def parameter_init(mode="kaiming", value=0.): + if mode == "kaiming": + weight_attr = paddle.nn.initializer.KaimingUniform() + elif mode == "constant": + weight_attr = paddle.nn.initializer.Constant(value=value) + else: + weight_attr = paddle.nn.initializer.KaimingUniform() + + weight_init = ParamAttr(initializer=weight_attr) + return weight_init + + +def feature_norm(feat): + # Normalize the feature maps to have zero mean and unit variances. + assert len(feat.shape) == 4 + N, C, H, W = feat.shape + feat = feat.transpose([1, 0, 2, 3]).reshape([C, -1]) + mean = feat.mean(axis=-1, keepdim=True) + std = feat.std(axis=-1, keepdim=True) + feat = (feat - mean) / (std + 1e-6) + return feat.reshape([C, N, H, W]).transpose([1, 0, 2, 3]) + + +@register +class DistillYOLOv3Loss(nn.Layer): + def __init__(self, weight=1000): + super(DistillYOLOv3Loss, self).__init__() + self.loss_weight = weight + + def obj_weighted_reg(self, sx, sy, sw, sh, tx, ty, tw, th, tobj): + loss_x = ops.sigmoid_cross_entropy_with_logits(sx, F.sigmoid(tx)) + loss_y = ops.sigmoid_cross_entropy_with_logits(sy, F.sigmoid(ty)) + loss_w = paddle.abs(sw - tw) + loss_h = paddle.abs(sh - th) + loss = paddle.add_n([loss_x, loss_y, loss_w, loss_h]) + weighted_loss = paddle.mean(loss * F.sigmoid(tobj)) + return weighted_loss + + def obj_weighted_cls(self, scls, tcls, tobj): + loss = ops.sigmoid_cross_entropy_with_logits(scls, F.sigmoid(tcls)) + weighted_loss = paddle.mean(paddle.multiply(loss, F.sigmoid(tobj))) + return weighted_loss + + def obj_loss(self, sobj, tobj): + obj_mask = paddle.cast(tobj > 0., dtype="float32") + obj_mask.stop_gradient = True + loss = paddle.mean( + ops.sigmoid_cross_entropy_with_logits(sobj, obj_mask)) + return loss + + def forward(self, teacher_model, student_model): + teacher_distill_pairs = teacher_model.yolo_head.loss.distill_pairs + student_distill_pairs = student_model.yolo_head.loss.distill_pairs + distill_reg_loss, distill_cls_loss, distill_obj_loss = [], [], [] + for s_pair, t_pair in zip(student_distill_pairs, teacher_distill_pairs): + distill_reg_loss.append( + self.obj_weighted_reg(s_pair[0], s_pair[1], s_pair[2], s_pair[ + 3], t_pair[0], t_pair[1], t_pair[2], t_pair[3], t_pair[4])) + distill_cls_loss.append( + self.obj_weighted_cls(s_pair[5], t_pair[5], t_pair[4])) + distill_obj_loss.append(self.obj_loss(s_pair[4], t_pair[4])) + distill_reg_loss = paddle.add_n(distill_reg_loss) + distill_cls_loss = paddle.add_n(distill_cls_loss) + distill_obj_loss = 
paddle.add_n(distill_obj_loss)
+        loss = (distill_reg_loss + distill_cls_loss + distill_obj_loss
+                ) * self.loss_weight
+        return loss
+
+
+@register
+class KnowledgeDistillationKLDivLoss(nn.Layer):
+    """Loss function for knowledge distilling using KL divergence.
+
+    Args:
+        reduction (str): Options are `'none'`, `'mean'` and `'sum'`.
+        loss_weight (float): Loss weight of current loss.
+        T (int): Temperature for distillation.
+    """
+
+    def __init__(self, reduction='mean', loss_weight=1.0, T=10):
+        super(KnowledgeDistillationKLDivLoss, self).__init__()
+        assert reduction in ('none', 'mean', 'sum')
+        assert T >= 1
+        self.reduction = reduction
+        self.loss_weight = loss_weight
+        self.T = T
+
+    def knowledge_distillation_kl_div_loss(self,
+                                           pred,
+                                           soft_label,
+                                           T,
+                                           detach_target=True):
+        r"""Loss function for knowledge distilling using KL divergence.
+
+        Args:
+            pred (Tensor): Predicted logits with shape (N, n + 1).
+            soft_label (Tensor): Target logits with shape (N, n + 1).
+            T (int): Temperature for distillation.
+            detach_target (bool): Whether to detach soft_label from the
+                computation graph. Default: True.
+        """
+        assert pred.shape == soft_label.shape
+        target = F.softmax(soft_label / T, axis=1)
+        if detach_target:
+            target = target.detach()
+
+        kd_loss = F.kl_div(
+            F.log_softmax(
+                pred / T, axis=1), target, reduction='none').mean(1) * (T * T)
+
+        return kd_loss
+
+    def forward(self,
+                pred,
+                soft_label,
+                weight=None,
+                avg_factor=None,
+                reduction_override=None):
+        """Forward function.
+
+        Args:
+            pred (Tensor): Predicted logits with shape (N, n + 1).
+            soft_label (Tensor): Target logits with shape (N, n + 1).
+            weight (Tensor, optional): The weight of loss for each
+                prediction. Defaults to None.
+            avg_factor (int, optional): Average factor that is used to average
+                the loss. Defaults to None.
+            reduction_override (str, optional): The reduction method used to
+                override the original reduction method of the loss.
+                Defaults to None.
+ """ + assert reduction_override in (None, 'none', 'mean', 'sum') + + reduction = (reduction_override + if reduction_override else self.reduction) + + loss_kd_out = self.knowledge_distillation_kl_div_loss( + pred, soft_label, T=self.T) + + if weight is not None: + loss_kd_out = weight * loss_kd_out + + if avg_factor is None: + if reduction == 'none': + loss = loss_kd_out + elif reduction == 'mean': + loss = loss_kd_out.mean() + elif reduction == 'sum': + loss = loss_kd_out.sum() + else: + # if reduction is mean, then average the loss by avg_factor + if reduction == 'mean': + loss = loss_kd_out.sum() / avg_factor + # if reduction is 'none', then do nothing, otherwise raise an error + elif reduction != 'none': + raise ValueError( + 'avg_factor can not be used with reduction="sum"') + + loss_kd = self.loss_weight * loss + return loss_kd + + +@register +class DistillPPYOLOELoss(nn.Layer): + def __init__( + self, + loss_weight={'logits': 4.0, + 'feat': 1.0}, + logits_distill=True, + logits_loss_weight={'class': 1.0, + 'iou': 2.5, + 'dfl': 0.5}, + logits_ld_distill=False, + logits_ld_params={'weight': 20000, + 'T': 10}, + feat_distill=True, + feat_distiller='fgd', + feat_distill_place='neck_feats', + teacher_width_mult=1.0, # L + student_width_mult=0.75, # M + feat_out_channels=[768, 384, 192]): + super(DistillPPYOLOELoss, self).__init__() + self.loss_weight_logits = loss_weight['logits'] + self.loss_weight_feat = loss_weight['feat'] + self.logits_distill = logits_distill + self.logits_ld_distill = logits_ld_distill + self.feat_distill = feat_distill + + if logits_distill and self.loss_weight_logits > 0: + self.bbox_loss_weight = logits_loss_weight['iou'] + self.dfl_loss_weight = logits_loss_weight['dfl'] + self.qfl_loss_weight = logits_loss_weight['class'] + self.loss_bbox = GIoULoss() + + if logits_ld_distill: + self.loss_kd = KnowledgeDistillationKLDivLoss( + loss_weight=logits_ld_params['weight'], T=logits_ld_params['T']) + + if feat_distill and self.loss_weight_feat > 0: + assert feat_distiller in ['cwd', 'fgd', 'pkd', 'mgd', 'mimic'] + assert feat_distill_place in ['backbone_feats', 'neck_feats'] + self.feat_distill_place = feat_distill_place + self.t_channel_list = [ + int(c * teacher_width_mult) for c in feat_out_channels + ] + self.s_channel_list = [ + int(c * student_width_mult) for c in feat_out_channels + ] + self.distill_feat_loss_modules = [] + for i in range(len(feat_out_channels)): + if feat_distiller == 'cwd': + feat_loss_module = CWDFeatureLoss( + student_channels=self.s_channel_list[i], + teacher_channels=self.t_channel_list[i], + normalize=True) + elif feat_distiller == 'fgd': + feat_loss_module = FGDFeatureLoss( + student_channels=self.s_channel_list[i], + teacher_channels=self.t_channel_list[i], + normalize=True, + alpha_fgd=0.00001, + beta_fgd=0.000005, + gamma_fgd=0.00001, + lambda_fgd=0.00000005) + elif feat_distiller == 'pkd': + feat_loss_module = PKDFeatureLoss( + student_channels=self.s_channel_list[i], + teacher_channels=self.t_channel_list[i], + normalize=True, + resize_stu=True) + elif feat_distiller == 'mgd': + feat_loss_module = MGDFeatureLoss( + student_channels=self.s_channel_list[i], + teacher_channels=self.t_channel_list[i], + normalize=True, + loss_func='ssim') + elif feat_distiller == 'mimic': + feat_loss_module = MimicFeatureLoss( + student_channels=self.s_channel_list[i], + teacher_channels=self.t_channel_list[i], + normalize=True) + else: + raise ValueError + self.distill_feat_loss_modules.append(feat_loss_module) + + def quality_focal_loss(self, + 
pred_logits, + soft_target_logits, + beta=2.0, + use_sigmoid=False, + num_total_pos=None): + if use_sigmoid: + func = F.binary_cross_entropy_with_logits + soft_target = F.sigmoid(soft_target_logits) + pred_sigmoid = F.sigmoid(pred_logits) + preds = pred_logits + else: + func = F.binary_cross_entropy + soft_target = soft_target_logits + pred_sigmoid = pred_logits + preds = pred_sigmoid + + scale_factor = pred_sigmoid - soft_target + loss = func( + preds, soft_target, reduction='none') * scale_factor.abs().pow(beta) + loss = loss.sum(1) + + if num_total_pos is not None: + loss = loss.sum() / num_total_pos + else: + loss = loss.mean() + return loss + + def bbox_loss(self, s_bbox, t_bbox, weight_targets=None): + # [x,y,w,h] + if weight_targets is not None: + loss = paddle.sum(self.loss_bbox(s_bbox, t_bbox) * weight_targets) + avg_factor = weight_targets.sum() + loss = loss / avg_factor + else: + loss = paddle.mean(self.loss_bbox(s_bbox, t_bbox)) + return loss + + def distribution_focal_loss(self, + pred_corners, + target_corners, + weight_targets=None): + target_corners_label = F.softmax(target_corners, axis=-1) + loss_dfl = F.cross_entropy( + pred_corners, + target_corners_label, + soft_label=True, + reduction='none') + loss_dfl = loss_dfl.sum(1) + + if weight_targets is not None: + loss_dfl = loss_dfl * (weight_targets.expand([-1, 4]).reshape([-1])) + loss_dfl = loss_dfl.sum(-1) / weight_targets.sum() + else: + loss_dfl = loss_dfl.mean(-1) + return loss_dfl / 4.0 # 4 direction + + def main_kd(self, mask_positive, pred_scores, soft_cls, num_classes): + num_pos = mask_positive.sum() + if num_pos > 0: + cls_mask = mask_positive.unsqueeze(-1).tile([1, 1, num_classes]) + pred_scores_pos = paddle.masked_select( + pred_scores, cls_mask).reshape([-1, num_classes]) + soft_cls_pos = paddle.masked_select( + soft_cls, cls_mask).reshape([-1, num_classes]) + loss_kd = self.loss_kd( + pred_scores_pos, soft_cls_pos, avg_factor=num_pos) + else: + loss_kd = paddle.zeros([1]) + return loss_kd + + def forward(self, teacher_model, student_model): + teacher_distill_pairs = teacher_model.yolo_head.distill_pairs + student_distill_pairs = student_model.yolo_head.distill_pairs + if self.logits_distill and self.loss_weight_logits > 0: + distill_bbox_loss, distill_dfl_loss, distill_cls_loss = [], [], [] + + distill_cls_loss.append( + self.quality_focal_loss( + student_distill_pairs['pred_cls_scores'].reshape( + (-1, student_distill_pairs['pred_cls_scores'].shape[-1] + )), + teacher_distill_pairs['pred_cls_scores'].detach().reshape( + (-1, teacher_distill_pairs['pred_cls_scores'].shape[-1] + )), + num_total_pos=student_distill_pairs['pos_num'], + use_sigmoid=False)) + + distill_bbox_loss.append( + self.bbox_loss(student_distill_pairs['pred_bboxes_pos'], + teacher_distill_pairs['pred_bboxes_pos'].detach(), + weight_targets=student_distill_pairs['bbox_weight'] + ) if 'pred_bboxes_pos' in student_distill_pairs and \ + 'pred_bboxes_pos' in teacher_distill_pairs and \ + 'bbox_weight' in student_distill_pairs + else paddle.zeros([1])) + + distill_dfl_loss.append( + self.distribution_focal_loss( + student_distill_pairs['pred_dist_pos'].reshape((-1, student_distill_pairs['pred_dist_pos'].shape[-1])), + teacher_distill_pairs['pred_dist_pos'].detach().reshape((-1, teacher_distill_pairs['pred_dist_pos'].shape[-1])), \ + weight_targets=student_distill_pairs['bbox_weight'] + ) if 'pred_dist_pos' in student_distill_pairs and \ + 'pred_dist_pos' in teacher_distill_pairs and \ + 'bbox_weight' in student_distill_pairs + else 
paddle.zeros([1])) + + distill_cls_loss = paddle.add_n(distill_cls_loss) + distill_bbox_loss = paddle.add_n(distill_bbox_loss) + distill_dfl_loss = paddle.add_n(distill_dfl_loss) + logits_loss = distill_bbox_loss * self.bbox_loss_weight + distill_cls_loss * self.qfl_loss_weight + distill_dfl_loss * self.dfl_loss_weight + + if self.logits_ld_distill: + loss_kd = self.main_kd( + student_distill_pairs['mask_positive_select'], + student_distill_pairs['pred_cls_scores'], + teacher_distill_pairs['pred_cls_scores'], + student_model.yolo_head.num_classes, ) + logits_loss += loss_kd + else: + logits_loss = paddle.zeros([1]) + + if self.feat_distill and self.loss_weight_feat > 0: + feat_loss_list = [] + inputs = student_model.inputs + assert 'gt_bbox' in inputs + assert self.feat_distill_place in student_distill_pairs + assert self.feat_distill_place in teacher_distill_pairs + stu_feats = student_distill_pairs[self.feat_distill_place] + tea_feats = teacher_distill_pairs[self.feat_distill_place] + for i, loss_module in enumerate(self.distill_feat_loss_modules): + feat_loss_list.append( + loss_module(stu_feats[i], tea_feats[i], inputs)) + feat_loss = paddle.add_n(feat_loss_list) + else: + feat_loss = paddle.zeros([1]) + + student_model.yolo_head.distill_pairs.clear() + teacher_model.yolo_head.distill_pairs.clear() + return logits_loss * self.loss_weight_logits, feat_loss * self.loss_weight_feat + + +@register +class CWDFeatureLoss(nn.Layer): + def __init__(self, + student_channels, + teacher_channels, + normalize=False, + tau=1.0, + weight=1.0): + super(CWDFeatureLoss, self).__init__() + self.normalize = normalize + self.tau = tau + self.loss_weight = weight + + if student_channels != teacher_channels: + self.align = nn.Conv2D( + student_channels, + teacher_channels, + kernel_size=1, + stride=1, + padding=0) + else: + self.align = None + + def distill_softmax(self, x, tau): + _, _, w, h = paddle.shape(x) + x = paddle.reshape(x, [-1, w * h]) + x /= tau + return F.softmax(x, axis=1) + + def forward(self, preds_s, preds_t, inputs=None): + assert preds_s.shape[-2:] == preds_t.shape[-2:] + N, C, H, W = preds_s.shape + eps = 1e-5 + if self.align is not None: + preds_s = self.align(preds_s) + + if self.normalize: + preds_s = feature_norm(preds_s) + preds_t = feature_norm(preds_t) + + softmax_pred_s = self.distill_softmax(preds_s, self.tau) + softmax_pred_t = self.distill_softmax(preds_t, self.tau) + + loss = paddle.sum(-softmax_pred_t * paddle.log(eps + softmax_pred_s) + + softmax_pred_t * paddle.log(eps + softmax_pred_t)) + return self.loss_weight * loss / (C * N) + + +@register +class FGDFeatureLoss(nn.Layer): + """ + Focal and Global Knowledge Distillation for Detectors + The code is reference from https://github.com/yzd-v/FGD/blob/master/mmdet/distillation/losses/fgd.py + + Args: + student_channels (int): The number of channels in the student's FPN feature map. Default to 256. + teacher_channels (int): The number of channels in the teacher's FPN feature map. Default to 256. + normalize (bool): Whether to normalize the feature maps. + temp (float, optional): The temperature coefficient. Defaults to 0.5. + alpha_fgd (float, optional): The weight of fg_loss. Defaults to 0.001 + beta_fgd (float, optional): The weight of bg_loss. Defaults to 0.0005 + gamma_fgd (float, optional): The weight of mask_loss. Defaults to 0.001 + lambda_fgd (float, optional): The weight of relation_loss. 
Defaults to 0.000005 + """ + + def __init__(self, + student_channels, + teacher_channels, + normalize=False, + loss_weight=1.0, + temp=0.5, + alpha_fgd=0.001, + beta_fgd=0.0005, + gamma_fgd=0.001, + lambda_fgd=0.000005): + super(FGDFeatureLoss, self).__init__() + self.normalize = normalize + self.loss_weight = loss_weight + self.temp = temp + self.alpha_fgd = alpha_fgd + self.beta_fgd = beta_fgd + self.gamma_fgd = gamma_fgd + self.lambda_fgd = lambda_fgd + kaiming_init = parameter_init("kaiming") + zeros_init = parameter_init("constant", 0.0) + + if student_channels != teacher_channels: + self.align = nn.Conv2D( + student_channels, + teacher_channels, + kernel_size=1, + stride=1, + padding=0, + weight_attr=kaiming_init) + student_channels = teacher_channels + else: + self.align = None + + self.conv_mask_s = nn.Conv2D( + student_channels, 1, kernel_size=1, weight_attr=kaiming_init) + self.conv_mask_t = nn.Conv2D( + teacher_channels, 1, kernel_size=1, weight_attr=kaiming_init) + + self.stu_conv_block = nn.Sequential( + nn.Conv2D( + student_channels, + student_channels // 2, + kernel_size=1, + weight_attr=zeros_init), + nn.LayerNorm([student_channels // 2, 1, 1]), + nn.ReLU(), + nn.Conv2D( + student_channels // 2, + student_channels, + kernel_size=1, + weight_attr=zeros_init)) + self.tea_conv_block = nn.Sequential( + nn.Conv2D( + teacher_channels, + teacher_channels // 2, + kernel_size=1, + weight_attr=zeros_init), + nn.LayerNorm([teacher_channels // 2, 1, 1]), + nn.ReLU(), + nn.Conv2D( + teacher_channels // 2, + teacher_channels, + kernel_size=1, + weight_attr=zeros_init)) + + def spatial_channel_attention(self, x, t=0.5): + shape = paddle.shape(x) + N, C, H, W = shape + _f = paddle.abs(x) + spatial_map = paddle.reshape( + paddle.mean( + _f, axis=1, keepdim=True) / t, [N, -1]) + spatial_map = F.softmax(spatial_map, axis=1, dtype="float32") * H * W + spatial_att = paddle.reshape(spatial_map, [N, H, W]) + + channel_map = paddle.mean( + paddle.mean( + _f, axis=2, keepdim=False), axis=2, keepdim=False) + channel_att = F.softmax(channel_map / t, axis=1, dtype="float32") * C + return [spatial_att, channel_att] + + def spatial_pool(self, x, mode="teacher"): + batch, channel, width, height = x.shape + x_copy = x + x_copy = paddle.reshape(x_copy, [batch, channel, height * width]) + x_copy = x_copy.unsqueeze(1) + if mode.lower() == "student": + context_mask = self.conv_mask_s(x) + else: + context_mask = self.conv_mask_t(x) + + context_mask = paddle.reshape(context_mask, [batch, 1, height * width]) + context_mask = F.softmax(context_mask, axis=2) + context_mask = context_mask.unsqueeze(-1) + context = paddle.matmul(x_copy, context_mask) + context = paddle.reshape(context, [batch, channel, 1, 1]) + return context + + def mask_loss(self, stu_channel_att, tea_channel_att, stu_spatial_att, + tea_spatial_att): + def _func(a, b): + return paddle.sum(paddle.abs(a - b)) / len(a) + + mask_loss = _func(stu_channel_att, tea_channel_att) + _func( + stu_spatial_att, tea_spatial_att) + return mask_loss + + def feature_loss(self, stu_feature, tea_feature, mask_fg, mask_bg, + tea_channel_att, tea_spatial_att): + mask_fg = mask_fg.unsqueeze(axis=1) + mask_bg = mask_bg.unsqueeze(axis=1) + tea_channel_att = tea_channel_att.unsqueeze(axis=-1).unsqueeze(axis=-1) + tea_spatial_att = tea_spatial_att.unsqueeze(axis=1) + + fea_t = paddle.multiply(tea_feature, paddle.sqrt(tea_spatial_att)) + fea_t = paddle.multiply(fea_t, paddle.sqrt(tea_channel_att)) + fg_fea_t = paddle.multiply(fea_t, paddle.sqrt(mask_fg)) + bg_fea_t = 
paddle.multiply(fea_t, paddle.sqrt(mask_bg))
+
+        fea_s = paddle.multiply(stu_feature, paddle.sqrt(tea_spatial_att))
+        fea_s = paddle.multiply(fea_s, paddle.sqrt(tea_channel_att))
+        fg_fea_s = paddle.multiply(fea_s, paddle.sqrt(mask_fg))
+        bg_fea_s = paddle.multiply(fea_s, paddle.sqrt(mask_bg))
+
+        fg_loss = F.mse_loss(fg_fea_s, fg_fea_t, reduction="sum") / len(mask_fg)
+        bg_loss = F.mse_loss(bg_fea_s, bg_fea_t, reduction="sum") / len(mask_bg)
+        return fg_loss, bg_loss
+
+    def relation_loss(self, stu_feature, tea_feature):
+        context_s = self.spatial_pool(stu_feature, "student")
+        context_t = self.spatial_pool(tea_feature, "teacher")
+        out_s = stu_feature + self.stu_conv_block(context_s)
+        out_t = tea_feature + self.tea_conv_block(context_t)
+        rela_loss = F.mse_loss(out_s, out_t, reduction="sum") / len(out_s)
+        return rela_loss
+
+    def mask_value(self, mask, xl, xr, yl, yr, value):
+        mask[xl:xr, yl:yr] = paddle.maximum(mask[xl:xr, yl:yr], value)
+        return mask
+
+    def forward(self, stu_feature, tea_feature, inputs):
+        assert stu_feature.shape[-2:] == tea_feature.shape[-2:]
+        assert "gt_bbox" in inputs.keys() and "im_shape" in inputs.keys()
+        gt_bboxes = inputs['gt_bbox']
+        ins_shape = [
+            inputs['im_shape'][i] for i in range(inputs['im_shape'].shape[0])
+        ]
+        index_gt = []
+        for i in range(len(gt_bboxes)):
+            if gt_bboxes[i].size > 2:
+                index_gt.append(i)
+        # only distill features for images that carry labeled gt boxes
+        if len(index_gt) != len(gt_bboxes):
+            index_gt_t = paddle.to_tensor(index_gt)
+            stu_feature = paddle.index_select(stu_feature, index_gt_t)
+            tea_feature = paddle.index_select(tea_feature, index_gt_t)
+
+            ins_shape = [ins_shape[c] for c in index_gt]
+            gt_bboxes = [gt_bboxes[c] for c in index_gt]
+            assert len(gt_bboxes) == tea_feature.shape[0]
+
+        if self.align is not None:
+            stu_feature = self.align(stu_feature)
+
+        if self.normalize:
+            stu_feature = feature_norm(stu_feature)
+            tea_feature = feature_norm(tea_feature)
+
+        tea_spatial_att, tea_channel_att = self.spatial_channel_attention(
+            tea_feature, self.temp)
+        stu_spatial_att, stu_channel_att = self.spatial_channel_attention(
+            stu_feature, self.temp)
+
+        mask_fg = paddle.zeros(tea_spatial_att.shape)
+        mask_bg = paddle.ones_like(tea_spatial_att)
+        one_tmp = paddle.ones([*tea_spatial_att.shape[1:]])
+        zero_tmp = paddle.zeros([*tea_spatial_att.shape[1:]])
+        mask_fg.stop_gradient = True
+        mask_bg.stop_gradient = True
+        one_tmp.stop_gradient = True
+        zero_tmp.stop_gradient = True
+
+        wmin, wmax, hmin, hmax = [], [], [], []
+
+        if len(gt_bboxes) == 0:
+            loss = self.relation_loss(stu_feature, tea_feature)
+            return self.lambda_fgd * loss
+
+        N, _, H, W = stu_feature.shape
+        for i in range(N):
+            tmp_box = paddle.ones_like(gt_bboxes[i])
+            tmp_box.stop_gradient = True
+            tmp_box[:, 0] = gt_bboxes[i][:, 0] / ins_shape[i][1] * W
+            tmp_box[:, 2] = gt_bboxes[i][:, 2] / ins_shape[i][1] * W
+            tmp_box[:, 1] = gt_bboxes[i][:, 1] / ins_shape[i][0] * H
+            tmp_box[:, 3] = gt_bboxes[i][:, 3] / ins_shape[i][0] * H
+
+            zero = paddle.zeros_like(tmp_box[:, 0], dtype="int32")
+            ones = paddle.ones_like(tmp_box[:, 2], dtype="int32")
+            zero.stop_gradient = True
+            ones.stop_gradient = True
+            wmin.append(
+                paddle.cast(paddle.floor(tmp_box[:, 0]), "int32").maximum(zero))
+            wmax.append(paddle.cast(paddle.ceil(tmp_box[:, 2]), "int32"))
+            hmin.append(
+                paddle.cast(paddle.floor(tmp_box[:, 1]), "int32").maximum(zero))
+            hmax.append(paddle.cast(paddle.ceil(tmp_box[:, 3]), "int32"))
+
+            area_recip = 1.0 / (
+                hmax[i].reshape([1, -1]) + 1 - hmin[i].reshape([1, -1])) / (
wmax[i].reshape([1, -1]) + 1 - wmin[i].reshape([1, -1])) + + for j in range(len(gt_bboxes[i])): + if gt_bboxes[i][j].sum() > 0: + mask_fg[i] = self.mask_value( + mask_fg[i], hmin[i][j], hmax[i][j] + 1, wmin[i][j], + wmax[i][j] + 1, area_recip[0][j]) + + mask_bg[i] = paddle.where(mask_fg[i] > zero_tmp, zero_tmp, one_tmp) + + if paddle.sum(mask_bg[i]): + mask_bg[i] /= paddle.sum(mask_bg[i]) + + fg_loss, bg_loss = self.feature_loss(stu_feature, tea_feature, mask_fg, + mask_bg, tea_channel_att, + tea_spatial_att) + mask_loss = self.mask_loss(stu_channel_att, tea_channel_att, + stu_spatial_att, tea_spatial_att) + rela_loss = self.relation_loss(stu_feature, tea_feature) + loss = self.alpha_fgd * fg_loss + self.beta_fgd * bg_loss \ + + self.gamma_fgd * mask_loss + self.lambda_fgd * rela_loss + return loss * self.loss_weight + + +@register +class PKDFeatureLoss(nn.Layer): + """ + PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient. + + Args: + loss_weight (float): Weight of loss. Defaults to 1.0. + resize_stu (bool): If True, we'll down/up sample the features of the + student model to the spatial size of those of the teacher model if + their spatial sizes are different. And vice versa. Defaults to + True. + """ + + def __init__(self, + student_channels=256, + teacher_channels=256, + normalize=True, + loss_weight=1.0, + resize_stu=True): + super(PKDFeatureLoss, self).__init__() + self.normalize = normalize + self.loss_weight = loss_weight + self.resize_stu = resize_stu + + def forward(self, stu_feature, tea_feature, inputs=None): + size_s, size_t = stu_feature.shape[2:], tea_feature.shape[2:] + if size_s[0] != size_t[0]: + if self.resize_stu: + stu_feature = F.interpolate( + stu_feature, size_t, mode='bilinear') + else: + tea_feature = F.interpolate( + tea_feature, size_s, mode='bilinear') + assert stu_feature.shape == tea_feature.shape + + if self.normalize: + stu_feature = feature_norm(stu_feature) + tea_feature = feature_norm(tea_feature) + + loss = F.mse_loss(stu_feature, tea_feature) / 2 + return loss * self.loss_weight + + +@register +class MimicFeatureLoss(nn.Layer): + def __init__(self, + student_channels=256, + teacher_channels=256, + normalize=True, + loss_weight=1.0): + super(MimicFeatureLoss, self).__init__() + self.normalize = normalize + self.loss_weight = loss_weight + self.mse_loss = nn.MSELoss() + + if student_channels != teacher_channels: + self.align = nn.Conv2D( + student_channels, + teacher_channels, + kernel_size=1, + stride=1, + padding=0) + else: + self.align = None + + def forward(self, stu_feature, tea_feature, inputs=None): + if self.align is not None: + stu_feature = self.align(stu_feature) + + if self.normalize: + stu_feature = feature_norm(stu_feature) + tea_feature = feature_norm(tea_feature) + + loss = self.mse_loss(stu_feature, tea_feature) + return loss * self.loss_weight + + +@register +class MGDFeatureLoss(nn.Layer): + def __init__(self, + student_channels=256, + teacher_channels=256, + normalize=True, + loss_weight=1.0, + loss_func='mse'): + super(MGDFeatureLoss, self).__init__() + self.normalize = normalize + self.loss_weight = loss_weight + assert loss_func in ['mse', 'ssim'] + self.loss_func = loss_func + self.mse_loss = nn.MSELoss(reduction='sum') + self.ssim_loss = SSIM(11) + + kaiming_init = parameter_init("kaiming") + if student_channels != teacher_channels: + self.align = nn.Conv2D( + student_channels, + teacher_channels, + kernel_size=1, + stride=1, + padding=0, + weight_attr=kaiming_init, + bias_attr=False) + 
else: + self.align = None + + self.generation = nn.Sequential( + nn.Conv2D( + teacher_channels, teacher_channels, kernel_size=3, padding=1), + nn.ReLU(), + nn.Conv2D( + teacher_channels, teacher_channels, kernel_size=3, padding=1)) + + def forward(self, stu_feature, tea_feature, inputs=None): + N = stu_feature.shape[0] + if self.align is not None: + stu_feature = self.align(stu_feature) + stu_feature = self.generation(stu_feature) + + if self.normalize: + stu_feature = feature_norm(stu_feature) + tea_feature = feature_norm(tea_feature) + + if self.loss_func == 'mse': + loss = self.mse_loss(stu_feature, tea_feature) / N + elif self.loss_func == 'ssim': + ssim_loss = self.ssim_loss(stu_feature, tea_feature) + loss = paddle.clip((1 - ssim_loss) / 2, 0, 1) + else: + raise ValueError + return loss * self.loss_weight + + +class SSIM(nn.Layer): + def __init__(self, window_size=11, size_average=True): + super(SSIM, self).__init__() + self.window_size = window_size + self.size_average = size_average + self.channel = 1 + self.window = self.create_window(window_size, self.channel) + + def gaussian(self, window_size, sigma): + gauss = paddle.to_tensor([ + math.exp(-(x - window_size // 2)**2 / float(2 * sigma**2)) + for x in range(window_size) + ]) + return gauss / gauss.sum() + + def create_window(self, window_size, channel): + _1D_window = self.gaussian(window_size, 1.5).unsqueeze(1) + _2D_window = _1D_window.mm(_1D_window.t()).unsqueeze(0).unsqueeze(0) + window = _2D_window.expand([channel, 1, window_size, window_size]) + return window + + def _ssim(self, img1, img2, window, window_size, channel, + size_average=True): + mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel) + mu2 = F.conv2d(img2, window, padding=window_size // 2, groups=channel) + mu1_sq = mu1.pow(2) + mu2_sq = mu2.pow(2) + mu1_mu2 = mu1 * mu2 + + sigma1_sq = F.conv2d( + img1 * img1, window, padding=window_size // 2, + groups=channel) - mu1_sq + sigma2_sq = F.conv2d( + img2 * img2, window, padding=window_size // 2, + groups=channel) - mu2_sq + sigma12 = F.conv2d( + img1 * img2, window, padding=window_size // 2, + groups=channel) - mu1_mu2 + + C1 = 0.01**2 + C2 = 0.03**2 + ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ( + 1e-12 + (mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2)) + + if size_average: + return ssim_map.mean() + else: + return ssim_map.mean([1, 2, 3]) + + def forward(self, img1, img2): + channel = img1.shape[1] + if channel == self.channel and self.window.dtype == img1.dtype: + window = self.window + else: + window = self.create_window(self.window_size, channel) + self.window = window + self.channel = channel + + return self._ssim(img1, img2, window, self.window_size, channel, + self.size_average) diff --git a/PaddleDetection-release-2.6/ppdet/slim/distill_model.py b/PaddleDetection-release-2.6/ppdet/slim/distill_model.py new file mode 100644 index 0000000000000000000000000000000000000000..96e1366381308aab6ee5e3c46d5b4d378da0783c --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/slim/distill_model.py @@ -0,0 +1,353 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn + +from ppdet.core.workspace import register, create, load_config +from ppdet.utils.checkpoint import load_pretrain_weight +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = [ + 'DistillModel', + 'FGDDistillModel', + 'CWDDistillModel', + 'LDDistillModel', + 'PPYOLOEDistillModel', +] + + +@register +class DistillModel(nn.Layer): + """ + Build common distill model. + Args: + cfg: The student config. + slim_cfg: The teacher and distill config. + """ + + def __init__(self, cfg, slim_cfg): + super(DistillModel, self).__init__() + self.arch = cfg.architecture + + self.stu_cfg = cfg + self.student_model = create(self.stu_cfg.architecture) + if 'pretrain_weights' in self.stu_cfg and self.stu_cfg.pretrain_weights: + stu_pretrain = self.stu_cfg.pretrain_weights + else: + stu_pretrain = None + + slim_cfg = load_config(slim_cfg) + self.tea_cfg = slim_cfg + self.teacher_model = create(self.tea_cfg.architecture) + if 'pretrain_weights' in self.tea_cfg and self.tea_cfg.pretrain_weights: + tea_pretrain = self.tea_cfg.pretrain_weights + else: + tea_pretrain = None + self.distill_cfg = slim_cfg + + # load pretrain weights + self.is_inherit = False + if stu_pretrain: + if self.is_inherit and tea_pretrain: + load_pretrain_weight(self.student_model, tea_pretrain) + logger.debug( + "Inheriting! loading teacher weights to student model!") + load_pretrain_weight(self.student_model, stu_pretrain) + logger.info("Student model has loaded pretrain weights!") + if tea_pretrain: + load_pretrain_weight(self.teacher_model, tea_pretrain) + logger.info("Teacher model has loaded pretrain weights!") + + self.teacher_model.eval() + for param in self.teacher_model.parameters(): + param.trainable = False + + self.distill_loss = self.build_loss(self.distill_cfg) + + def build_loss(self, distill_cfg): + if 'distill_loss' in distill_cfg and distill_cfg.distill_loss: + return create(distill_cfg.distill_loss) + else: + return None + + def parameters(self): + return self.student_model.parameters() + + def forward(self, inputs): + if self.training: + student_loss = self.student_model(inputs) + with paddle.no_grad(): + teacher_loss = self.teacher_model(inputs) + + loss = self.distill_loss(self.teacher_model, self.student_model) + student_loss['distill_loss'] = loss + student_loss['teacher_loss'] = teacher_loss['loss'] + student_loss['loss'] += student_loss['distill_loss'] + return student_loss + else: + return self.student_model(inputs) + + +@register +class FGDDistillModel(DistillModel): + """ + Build FGD distill model. + Args: + cfg: The student config. + slim_cfg: The teacher and distill config. 
+ """ + + def __init__(self, cfg, slim_cfg): + super(FGDDistillModel, self).__init__(cfg=cfg, slim_cfg=slim_cfg) + assert self.arch in ['RetinaNet', 'PicoDet' + ], 'Unsupported arch: {}'.format(self.arch) + self.is_inherit = True + + def build_loss(self, distill_cfg): + assert 'distill_loss_name' in distill_cfg and distill_cfg.distill_loss_name + assert 'distill_loss' in distill_cfg and distill_cfg.distill_loss + loss_func = dict() + name_list = distill_cfg.distill_loss_name + for name in name_list: + loss_func[name] = create(distill_cfg.distill_loss) + return loss_func + + def forward(self, inputs): + if self.training: + s_body_feats = self.student_model.backbone(inputs) + s_neck_feats = self.student_model.neck(s_body_feats) + with paddle.no_grad(): + t_body_feats = self.teacher_model.backbone(inputs) + t_neck_feats = self.teacher_model.neck(t_body_feats) + + loss_dict = {} + for idx, k in enumerate(self.distill_loss): + loss_dict[k] = self.distill_loss[k](s_neck_feats[idx], + t_neck_feats[idx], inputs) + if self.arch == "RetinaNet": + loss = self.student_model.head(s_neck_feats, inputs) + elif self.arch == "PicoDet": + head_outs = self.student_model.head( + s_neck_feats, self.student_model.export_post_process) + loss_gfl = self.student_model.head.get_loss(head_outs, inputs) + total_loss = paddle.add_n(list(loss_gfl.values())) + loss = {} + loss.update(loss_gfl) + loss.update({'loss': total_loss}) + else: + raise ValueError(f"Unsupported model {self.arch}") + + for k in loss_dict: + loss['loss'] += loss_dict[k] + loss[k] = loss_dict[k] + return loss + else: + body_feats = self.student_model.backbone(inputs) + neck_feats = self.student_model.neck(body_feats) + head_outs = self.student_model.head(neck_feats) + if self.arch == "RetinaNet": + bbox, bbox_num = self.student_model.head.post_process( + head_outs, inputs['im_shape'], inputs['scale_factor']) + return {'bbox': bbox, 'bbox_num': bbox_num} + elif self.arch == "PicoDet": + head_outs = self.student_model.head( + neck_feats, self.student_model.export_post_process) + scale_factor = inputs['scale_factor'] + bboxes, bbox_num = self.student_model.head.post_process( + head_outs, + scale_factor, + export_nms=self.student_model.export_nms) + return {'bbox': bboxes, 'bbox_num': bbox_num} + else: + raise ValueError(f"Unsupported model {self.arch}") + + +@register +class CWDDistillModel(DistillModel): + """ + Build CWD distill model. + Args: + cfg: The student config. + slim_cfg: The teacher and distill config. 
+ """ + + def __init__(self, cfg, slim_cfg): + super(CWDDistillModel, self).__init__(cfg=cfg, slim_cfg=slim_cfg) + assert self.arch in ['GFL', 'RetinaNet'], 'Unsupported arch: {}'.format( + self.arch) + + def build_loss(self, distill_cfg): + assert 'distill_loss_name' in distill_cfg and distill_cfg.distill_loss_name + assert 'distill_loss' in distill_cfg and distill_cfg.distill_loss + loss_func = dict() + name_list = distill_cfg.distill_loss_name + for name in name_list: + loss_func[name] = create(distill_cfg.distill_loss) + return loss_func + + def get_loss_retinanet(self, stu_fea_list, tea_fea_list, inputs): + loss = self.student_model.head(stu_fea_list, inputs) + loss_dict = {} + for idx, k in enumerate(self.distill_loss): + loss_dict[k] = self.distill_loss[k](stu_fea_list[idx], + tea_fea_list[idx]) + + loss['loss'] += loss_dict[k] + loss[k] = loss_dict[k] + return loss + + def get_loss_gfl(self, stu_fea_list, tea_fea_list, inputs): + loss = {} + head_outs = self.student_model.head(stu_fea_list) + loss_gfl = self.student_model.head.get_loss(head_outs, inputs) + loss.update(loss_gfl) + total_loss = paddle.add_n(list(loss.values())) + loss.update({'loss': total_loss}) + + feat_loss = {} + loss_dict = {} + s_cls_feat, t_cls_feat = [], [] + for s_neck_f, t_neck_f in zip(stu_fea_list, tea_fea_list): + conv_cls_feat, _ = self.student_model.head.conv_feat(s_neck_f) + cls_score = self.student_model.head.gfl_head_cls(conv_cls_feat) + t_conv_cls_feat, _ = self.teacher_model.head.conv_feat(t_neck_f) + t_cls_score = self.teacher_model.head.gfl_head_cls(t_conv_cls_feat) + s_cls_feat.append(cls_score) + t_cls_feat.append(t_cls_score) + + for idx, k in enumerate(self.distill_loss): + loss_dict[k] = self.distill_loss[k](s_cls_feat[idx], + t_cls_feat[idx]) + feat_loss[f"neck_f_{idx}"] = self.distill_loss[k](stu_fea_list[idx], + tea_fea_list[idx]) + + for k in feat_loss: + loss['loss'] += feat_loss[k] + loss[k] = feat_loss[k] + + for k in loss_dict: + loss['loss'] += loss_dict[k] + loss[k] = loss_dict[k] + return loss + + def forward(self, inputs): + if self.training: + s_body_feats = self.student_model.backbone(inputs) + s_neck_feats = self.student_model.neck(s_body_feats) + with paddle.no_grad(): + t_body_feats = self.teacher_model.backbone(inputs) + t_neck_feats = self.teacher_model.neck(t_body_feats) + + if self.arch == "RetinaNet": + loss = self.get_loss_retinanet(s_neck_feats, t_neck_feats, + inputs) + elif self.arch == "GFL": + loss = self.get_loss_gfl(s_neck_feats, t_neck_feats, inputs) + else: + raise ValueError(f"unsupported arch {self.arch}") + return loss + else: + body_feats = self.student_model.backbone(inputs) + neck_feats = self.student_model.neck(body_feats) + head_outs = self.student_model.head(neck_feats) + if self.arch == "RetinaNet": + bbox, bbox_num = self.student_model.head.post_process( + head_outs, inputs['im_shape'], inputs['scale_factor']) + return {'bbox': bbox, 'bbox_num': bbox_num} + elif self.arch == "GFL": + bbox_pred, bbox_num = head_outs + output = {'bbox': bbox_pred, 'bbox_num': bbox_num} + return output + else: + raise ValueError(f"unsupported arch {self.arch}") + + +@register +class LDDistillModel(DistillModel): + """ + Build LD distill model. + Args: + cfg: The student config. + slim_cfg: The teacher and distill config. 
+ """ + + def __init__(self, cfg, slim_cfg): + super(LDDistillModel, self).__init__(cfg=cfg, slim_cfg=slim_cfg) + assert self.arch in ['GFL'], 'Unsupported arch: {}'.format(self.arch) + + def forward(self, inputs): + if self.training: + s_body_feats = self.student_model.backbone(inputs) + s_neck_feats = self.student_model.neck(s_body_feats) + s_head_outs = self.student_model.head(s_neck_feats) + with paddle.no_grad(): + t_body_feats = self.teacher_model.backbone(inputs) + t_neck_feats = self.teacher_model.neck(t_body_feats) + t_head_outs = self.teacher_model.head(t_neck_feats) + + soft_label_list = t_head_outs[0] + soft_targets_list = t_head_outs[1] + student_loss = self.student_model.head.get_loss( + s_head_outs, inputs, soft_label_list, soft_targets_list) + total_loss = paddle.add_n(list(student_loss.values())) + student_loss['loss'] = total_loss + return student_loss + else: + return self.student_model(inputs) + + +@register +class PPYOLOEDistillModel(DistillModel): + """ + Build PPYOLOE distill model, only used in PPYOLOE + Args: + cfg: The student config. + slim_cfg: The teacher and distill config. + """ + + def __init__(self, cfg, slim_cfg): + super(PPYOLOEDistillModel, self).__init__(cfg=cfg, slim_cfg=slim_cfg) + assert self.arch in ['PPYOLOE'], 'Unsupported arch: {}'.format( + self.arch) + + def forward(self, inputs, alpha=0.125): + if self.training: + with paddle.no_grad(): + teacher_loss = self.teacher_model(inputs) + if hasattr(self.teacher_model.yolo_head, "assigned_labels"): + self.student_model.yolo_head.assigned_labels, self.student_model.yolo_head.assigned_bboxes, self.student_model.yolo_head.assigned_scores, self.student_model.yolo_head.mask_positive = \ + self.teacher_model.yolo_head.assigned_labels, self.teacher_model.yolo_head.assigned_bboxes, self.teacher_model.yolo_head.assigned_scores, self.teacher_model.yolo_head.mask_positive + delattr(self.teacher_model.yolo_head, "assigned_labels") + delattr(self.teacher_model.yolo_head, "assigned_bboxes") + delattr(self.teacher_model.yolo_head, "assigned_scores") + delattr(self.teacher_model.yolo_head, "mask_positive") + student_loss = self.student_model(inputs) + + logits_loss, feat_loss = self.distill_loss(self.teacher_model, + self.student_model) + det_total_loss = student_loss['loss'] + total_loss = alpha * (det_total_loss + logits_loss + feat_loss) + student_loss['loss'] = total_loss + student_loss['det_loss'] = det_total_loss + student_loss['logits_loss'] = logits_loss + student_loss['feat_loss'] = feat_loss + return student_loss + else: + return self.student_model(inputs) diff --git a/PaddleDetection-release-2.6/ppdet/slim/ofa.py b/PaddleDetection-release-2.6/ppdet/slim/ofa.py new file mode 100644 index 0000000000000000000000000000000000000000..b75edacdf2b65ad08cc538a0ca0334c03d53838a --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/slim/ofa.py @@ -0,0 +1,89 @@ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ppdet.core.workspace import load_config, merge_config, create +from ppdet.utils.checkpoint import load_weight, load_pretrain_weight +from ppdet.utils.logger import setup_logger +from ppdet.core.workspace import register, serializable + +from paddle.utils import try_import + +logger = setup_logger(__name__) + + +@register +@serializable +class OFA(object): + def __init__(self, ofa_config): + super(OFA, self).__init__() + self.ofa_config = ofa_config + + def 
__call__(self, model, param_state_dict): + + paddleslim = try_import('paddleslim') + from paddleslim.nas.ofa import OFA, RunConfig, utils + from paddleslim.nas.ofa.convert_super import Convert, supernet + task = self.ofa_config['task'] + expand_ratio = self.ofa_config['expand_ratio'] + + skip_neck = self.ofa_config['skip_neck'] + skip_head = self.ofa_config['skip_head'] + + run_config = self.ofa_config['RunConfig'] + if 'skip_layers' in run_config: + skip_layers = run_config['skip_layers'] + else: + skip_layers = [] + + # supernet config + sp_config = supernet(expand_ratio=expand_ratio) + # convert to supernet + model = Convert(sp_config).convert(model) + + skip_names = [] + if skip_neck: + skip_names.append('neck.') + if skip_head: + skip_names.append('head.') + + for name, sublayer in model.named_sublayers(): + for n in skip_names: + if n in name: + skip_layers.append(name) + + run_config['skip_layers'] = skip_layers + run_config = RunConfig(**run_config) + + # build ofa model + ofa_model = OFA(model, run_config=run_config) + + ofa_model.set_epoch(0) + ofa_model.set_task(task) + + input_spec = [{ + "image": paddle.ones( + shape=[1, 3, 640, 640], dtype='float32'), + "im_shape": paddle.full( + [1, 2], 640, dtype='float32'), + "scale_factor": paddle.ones( + shape=[1, 2], dtype='float32') + }] + + ofa_model._clear_search_space(input_spec=input_spec) + ofa_model._build_ss = True + check_ss = ofa_model._sample_config('expand_ratio', phase=None) + # tokenize the search space + ofa_model.tokenize() + # check token map, search cands and search space + logger.info('Token map is {}'.format(ofa_model.token_map)) + logger.info('Search candidates is {}'.format(ofa_model.search_cands)) + logger.info('The length of search_space is {}, search_space is {}'. + format(len(ofa_model._ofa_layers), ofa_model._ofa_layers)) + # set model state dict into ofa model + utils.set_state_dict(ofa_model.model, param_state_dict) + return ofa_model diff --git a/PaddleDetection-release-2.6/ppdet/slim/prune.py b/PaddleDetection-release-2.6/ppdet/slim/prune.py new file mode 100644 index 0000000000000000000000000000000000000000..28ffb7588d1e596e5883072b3bd2b5e6ba80ed7f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/slim/prune.py @@ -0,0 +1,151 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
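The OFA wrapper above is driven entirely by its `ofa_config` dict: `task` and `expand_ratio` define the supernet search space, `skip_neck`/`skip_head` keep the detector's neck and head out of the search, and `RunConfig` is forwarded to PaddleSlim's `RunConfig`. A minimal sketch of such a config follows; the values are illustrative assumptions, not defaults shipped by PaddleDetection, and the full `RunConfig` schema is documented by PaddleSlim.

```python
# Hypothetical OFA config of the shape consumed by OFA.__call__ above;
# only the keys the wrapper actually reads are shown.
ofa_config = {
    'task': 'expand_ratio',            # search task passed to ofa_model.set_task()
    'expand_ratio': [0.5, 0.75, 1.0],  # candidate channel expand ratios
    'skip_neck': True,                 # exclude all 'neck.' sublayers from the search
    'skip_head': True,                 # exclude all 'head.' sublayers from the search
    'RunConfig': {
        'skip_layers': [],             # extra layer names to exclude, if any
    },
}
```

With such a config, the wrapper is invoked as `ofa_model = OFA(ofa_config)(model, param_state_dict)`, returning the tokenized supernet ready for search.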
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +from paddle.utils import try_import + +from ppdet.core.workspace import register, serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +def print_prune_params(model): + model_dict = model.state_dict() + for key in model_dict.keys(): + weight_name = model_dict[key].name + logger.info('Parameter name: {}, shape: {}'.format( + weight_name, model_dict[key].shape)) + + +@register +@serializable +class Pruner(object): + def __init__(self, + criterion, + pruned_params, + pruned_ratios, + print_params=False): + super(Pruner, self).__init__() + assert criterion in ['l1_norm', 'fpgm'], \ + "unsupported prune criterion: {}".format(criterion) + self.criterion = criterion + self.pruned_params = pruned_params + self.pruned_ratios = pruned_ratios + self.print_params = print_params + + def __call__(self, model): + # FIXME: adapt to network graph when Training and inference are + # inconsistent, now only supports prune inference network graph. + model.eval() + paddleslim = try_import('paddleslim') + from paddleslim.analysis import dygraph_flops as flops + input_spec = [{ + "image": paddle.ones( + shape=[1, 3, 640, 640], dtype='float32'), + "im_shape": paddle.full( + [1, 2], 640, dtype='float32'), + "scale_factor": paddle.ones( + shape=[1, 2], dtype='float32') + }] + if self.print_params: + print_prune_params(model) + + ori_flops = flops(model, input_spec) / (1000**3) + logger.info("FLOPs before pruning: {}GFLOPs".format(ori_flops)) + if self.criterion == 'fpgm': + pruner = paddleslim.dygraph.FPGMFilterPruner(model, input_spec) + elif self.criterion == 'l1_norm': + pruner = paddleslim.dygraph.L1NormFilterPruner(model, input_spec) + + logger.info("pruned params: {}".format(self.pruned_params)) + pruned_ratios = [float(n) for n in self.pruned_ratios] + ratios = {} + for i, param in enumerate(self.pruned_params): + ratios[param] = pruned_ratios[i] + pruner.prune_vars(ratios, [0]) + pruned_flops = flops(model, input_spec) / (1000**3) + logger.info("FLOPs after pruning: {}GFLOPs; pruned ratio: {}".format( + pruned_flops, (ori_flops - pruned_flops) / ori_flops)) + + return model + + +@register +@serializable +class PrunerQAT(object): + def __init__(self, criterion, pruned_params, pruned_ratios, + print_prune_params, quant_config, print_qat_model): + super(PrunerQAT, self).__init__() + assert criterion in ['l1_norm', 'fpgm'], \ + "unsupported prune criterion: {}".format(criterion) + # Pruner hyperparameter + self.criterion = criterion + self.pruned_params = pruned_params + self.pruned_ratios = pruned_ratios + self.print_prune_params = print_prune_params + # QAT hyperparameter + self.quant_config = quant_config + self.print_qat_model = print_qat_model + + def __call__(self, model): + # FIXME: adapt to network graph when Training and inference are + # inconsistent, now only supports prune inference network graph. 
+ model.eval() + paddleslim = try_import('paddleslim') + from paddleslim.analysis import dygraph_flops as flops + input_spec = [{ + "image": paddle.ones( + shape=[1, 3, 640, 640], dtype='float32'), + "im_shape": paddle.full( + [1, 2], 640, dtype='float32'), + "scale_factor": paddle.ones( + shape=[1, 2], dtype='float32') + }] + if self.print_prune_params: + print_prune_params(model) + + ori_flops = flops(model, input_spec) / 1000 + logger.info("FLOPs before pruning: {}GFLOPs".format(ori_flops)) + if self.criterion == 'fpgm': + pruner = paddleslim.dygraph.FPGMFilterPruner(model, input_spec) + elif self.criterion == 'l1_norm': + pruner = paddleslim.dygraph.L1NormFilterPruner(model, input_spec) + + logger.info("pruned params: {}".format(self.pruned_params)) + pruned_ratios = [float(n) for n in self.pruned_ratios] + ratios = {} + for i, param in enumerate(self.pruned_params): + ratios[param] = pruned_ratios[i] + pruner.prune_vars(ratios, [0]) + pruned_flops = flops(model, input_spec) / 1000 + logger.info("FLOPs after pruning: {}GFLOPs; pruned ratio: {}".format( + pruned_flops, (ori_flops - pruned_flops) / ori_flops)) + + self.quanter = paddleslim.dygraph.quant.QAT(config=self.quant_config) + + self.quanter.quantize(model) + + if self.print_qat_model: + logger.info("Quantized model:") + logger.info(model) + + return model + + def save_quantized_model(self, layer, path, input_spec=None, **config): + self.quanter.save_quantized_model( + model=layer, path=path, input_spec=input_spec, **config) diff --git a/PaddleDetection-release-2.6/ppdet/slim/quant.py b/PaddleDetection-release-2.6/ppdet/slim/quant.py new file mode 100644 index 0000000000000000000000000000000000000000..44508198c46b77485d61e2b4e4d2804c62f96622 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/slim/quant.py @@ -0,0 +1,89 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from paddle.utils import try_import + +from ppdet.core.workspace import register, serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +@register +@serializable +class QAT(object): + def __init__(self, quant_config, print_model): + super(QAT, self).__init__() + self.quant_config = quant_config + self.print_model = print_model + + def __call__(self, model): + paddleslim = try_import('paddleslim') + self.quanter = paddleslim.dygraph.quant.QAT(config=self.quant_config) + if self.print_model: + logger.info("Model before quant:") + logger.info(model) + + # For PP-YOLOE, convert model to deploy firstly. 
+ for layer in model.sublayers(): + if hasattr(layer, 'convert_to_deploy'): + layer.convert_to_deploy() + + self.quanter.quantize(model) + + if self.print_model: + logger.info("Quantized model:") + logger.info(model) + + return model + + def save_quantized_model(self, layer, path, input_spec=None, **config): + self.quanter.save_quantized_model( + model=layer, path=path, input_spec=input_spec, **config) + + +@register +@serializable +class PTQ(object): + def __init__(self, + ptq_config, + quant_batch_num=10, + output_dir='output_inference', + fuse=True, + fuse_list=None): + super(PTQ, self).__init__() + self.ptq_config = ptq_config + self.quant_batch_num = quant_batch_num + self.output_dir = output_dir + self.fuse = fuse + self.fuse_list = fuse_list + + def __call__(self, model): + paddleslim = try_import('paddleslim') + self.ptq = paddleslim.PTQ(**self.ptq_config) + model.eval() + quant_model = self.ptq.quantize( + model, fuse=self.fuse, fuse_list=self.fuse_list) + + return quant_model + + def save_quantized_model(self, + quant_model, + quantize_model_path, + input_spec=None): + self.ptq.save_quantized_model(quant_model, quantize_model_path, + input_spec) diff --git a/PaddleDetection-release-2.6/ppdet/slim/unstructured_prune.py b/PaddleDetection-release-2.6/ppdet/slim/unstructured_prune.py new file mode 100644 index 0000000000000000000000000000000000000000..1dc876a8cb069700408a5c3f4b341be78e7dd6a3 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/slim/unstructured_prune.py @@ -0,0 +1,66 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from paddle.utils import try_import + +from ppdet.core.workspace import register, serializable +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + + +@register +@serializable +class UnstructuredPruner(object): + def __init__(self, + stable_epochs, + pruning_epochs, + tunning_epochs, + pruning_steps, + ratio, + initial_ratio, + prune_params_type=None): + self.stable_epochs = stable_epochs + self.pruning_epochs = pruning_epochs + self.tunning_epochs = tunning_epochs + self.ratio = ratio + self.prune_params_type = prune_params_type + self.initial_ratio = initial_ratio + self.pruning_steps = pruning_steps + + def __call__(self, model, steps_per_epoch, skip_params_func=None): + paddleslim = try_import('paddleslim') + from paddleslim import GMPUnstructuredPruner + configs = { + 'pruning_strategy': 'gmp', + 'stable_iterations': self.stable_epochs * steps_per_epoch, + 'pruning_iterations': self.pruning_epochs * steps_per_epoch, + 'tunning_iterations': self.tunning_epochs * steps_per_epoch, + 'resume_iteration': 0, + 'pruning_steps': self.pruning_steps, + 'initial_ratio': self.initial_ratio, + } + + pruner = GMPUnstructuredPruner( + model, + ratio=self.ratio, + skip_params_func=skip_params_func, + prune_params_type=self.prune_params_type, + local_sparsity=True, + configs=configs) + + return pruner diff --git a/PaddleDetection-release-2.6/ppdet/utils/__init__.py b/PaddleDetection-release-2.6/ppdet/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d0c32e26092f6ea25771279418582a24ea449ab2 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
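The `UnstructuredPruner` shown above only stores an epoch-level schedule; the conversion into the iteration counts that PaddleSlim's `GMPUnstructuredPruner` expects happens inside `__call__`. A worked example with hypothetical numbers:

```python
# Worked example (assumed values) of the epoch-to-iteration conversion
# performed in UnstructuredPruner.__call__ above.
steps_per_epoch = 500      # e.g. len(train_loader); an assumed value

stable_epochs, pruning_epochs, tunning_epochs = 1, 10, 4
pruning_steps, initial_ratio, ratio = 100, 0.15, 0.75

configs = {
    'pruning_strategy': 'gmp',
    'stable_iterations': stable_epochs * steps_per_epoch,    # 500: dense warm-up
    'pruning_iterations': pruning_epochs * steps_per_epoch,  # 5000: sparsity ramp
    'tunning_iterations': tunning_epochs * steps_per_epoch,  # 2000: fixed-mask fine-tune
    'resume_iteration': 0,
    'pruning_steps': pruning_steps,   # sparsity is raised 100 times during the ramp
    'initial_ratio': initial_ratio,   # ramp starts at 15% sparsity
}
# The mask is then tightened every 5000 / 100 = 50 iterations until the
# target ratio of 75% zeroed weights is reached.
```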
diff --git a/PaddleDetection-release-2.6/ppdet/utils/cam_utils.py b/PaddleDetection-release-2.6/ppdet/utils/cam_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..d2f7a4732be1a8ba3157d0d28304b0bd1b71da02
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/utils/cam_utils.py
@@ -0,0 +1,343 @@
+import numpy as np
+import cv2
+import os
+import sys
+import glob
+from ppdet.utils.logger import setup_logger
+import copy
+logger = setup_logger('ppdet_cam')
+
+import paddle
+from ppdet.engine import Trainer
+
+
+def get_test_images(infer_dir, infer_img):
+    """
+    Get image path list in TEST mode
+    """
+    assert infer_img is not None or infer_dir is not None, \
+        "--infer_img or --infer_dir should be set"
+    assert infer_img is None or os.path.isfile(infer_img), \
+        "{} is not a file".format(infer_img)
+    assert infer_dir is None or os.path.isdir(infer_dir), \
+        "{} is not a directory".format(infer_dir)
+
+    # infer_img has a higher priority
+    if infer_img and os.path.isfile(infer_img):
+        return [infer_img]
+
+    images = set()
+    infer_dir = os.path.abspath(infer_dir)
+    assert os.path.isdir(infer_dir), \
+        "infer_dir {} is not a directory".format(infer_dir)
+    exts = ['jpg', 'jpeg', 'png', 'bmp']
+    exts += [ext.upper() for ext in exts]
+    for ext in exts:
+        images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
+    images = list(images)
+
+    assert len(images) > 0, "no image found in {}".format(infer_dir)
+    logger.info("Found {} inference images in total.".format(len(images)))
+
+    return images
+
+
+def compute_ious(boxes1, boxes2):
+    """Compute the pairwise IoU matrix for two given sets of boxes.
+
+    Args:
+        boxes1 (numpy.ndarray, shape (N, 4)): bounding boxes in (xmin, ymin, xmax, ymax) format
+        boxes2 (numpy.ndarray, shape (M, 4)): bounding boxes in (xmin, ymin, xmax, ymax) format
+    Returns:
+        the pairwise IoU matrix with shape (N, M), where the value at row i,
+        column j holds the IoU between the i-th box of boxes1 and the j-th
+        box of boxes2.
+ """ + lu = np.maximum( + boxes1[:, None, :2], boxes2[:, :2] + ) # lu with shape N,M,2 ; boxes1[:,None,:2] with shape (N,1,2) boxes2 with shape(M,2) + rd = np.minimum(boxes1[:, None, 2:], boxes2[:, 2:]) # rd same to lu + intersection_wh = np.maximum(0.0, rd - lu) + intersection_area = intersection_wh[:, :, + 0] * intersection_wh[:, :, + 1] # with shape (N,M) + boxes1_wh = np.maximum(0.0, boxes1[:, 2:] - boxes1[:, :2]) + boxes1_area = boxes1_wh[:, 0] * boxes1_wh[:, 1] # with shape (N,) + boxes2_wh = np.maximum(0.0, boxes2[:, 2:] - boxes2[:, :2]) + boxes2_area = boxes2_wh[:, 0] * boxes2_wh[:, 1] # with shape (M,) + union_area = np.maximum( + boxes1_area[:, None] + boxes2_area - intersection_area, + 1e-8) # with shape (N,M) + ious = np.clip(intersection_area / union_area, 0.0, 1.0) + return ious + + +def grad_cam(feat, grad): + """ + + Args: + feat: CxHxW + grad: CxHxW + + Returns: + cam: HxW + """ + exp = (feat * grad.mean((1, 2), keepdims=True)).mean(axis=0) + exp = np.maximum(-exp, 0) + return exp + + +def resize_cam(explanation, resize_shape) -> np.ndarray: + """ + + Args: + explanation: (width, height) + resize_shape: (width, height) + + Returns: + + """ + assert len(explanation.shape) == 2, f"{explanation.shape}. " \ + f"Currently support 2D explanation results for visualization. " \ + "Reduce higher dimensions to 2D for visualization." + + explanation = (explanation - explanation.min()) / ( + explanation.max() - explanation.min()) + + explanation = cv2.resize(explanation, resize_shape) + explanation = np.uint8(255 * explanation) + explanation = cv2.applyColorMap(explanation, cv2.COLORMAP_JET) + explanation = cv2.cvtColor(explanation, cv2.COLOR_BGR2RGB) + + return explanation + + +class BBoxCAM: + def __init__(self, FLAGS, cfg): + self.FLAGS = FLAGS + self.cfg = cfg + # build model + self.trainer = self.build_trainer(cfg) + # num_class + self.num_class = cfg.num_classes + # set hook for extraction of featuremaps and grads + self.set_hook(cfg) + self.nms_idx_need_divid_numclass_arch = ['FasterRCNN', 'MaskRCNN', 'CascadeRCNN'] + """ + In these networks, the bbox array shape before nms contain num_class, + the nms_keep_idx of the bbox need to divide the num_class; + """ + + # cam image output_dir + try: + os.makedirs(FLAGS.cam_out) + except: + print('Path already exists.') + pass + + def build_trainer(self, cfg): + # build trainer + trainer = Trainer(cfg, mode='test') + # load weights + trainer.load_weights(cfg.weights) + + # set for get extra_data before nms + trainer.model.use_extra_data=True + # set for record the bbox index before nms + if cfg.architecture in ['FasterRCNN', 'MaskRCNN']: + trainer.model.bbox_post_process.nms.return_index = True + elif cfg.architecture in ['YOLOv3', 'PPYOLOE', 'PPYOLOEWithAuxHead']: + if trainer.model.post_process is not None: + # anchor based YOLOs: YOLOv3,PP-YOLO + trainer.model.post_process.nms.return_index = True + else: + # anchor free YOLOs: PP-YOLOE, PP-YOLOE+ + trainer.model.yolo_head.nms.return_index = True + elif cfg.architecture=='BlazeFace' or cfg.architecture=='SSD': + trainer.model.post_process.nms.return_index = True + elif cfg.architecture=='RetinaNet': + trainer.model.head.nms.return_index = True + else: + print( + cfg.architecture+' is not supported for cam temporarily!' 
+            )
+            sys.exit()
+        # TODO: unify the head/post_process naming across models
+
+        return trainer
+
+    def set_hook(self, cfg):
+        # set hook for extraction of featuremaps and grads
+        self.target_feats = {}
+        self.target_layer_name = cfg.target_feature_layer_name
+        # such as trainer.model.backbone, trainer.model.bbox_head.roi_extractor
+
+        def hook(layer, input, output):
+            self.target_feats[layer._layer_name_for_hook] = output
+
+        try:
+            exec('self.trainer.' + self.target_layer_name +
+                 '._layer_name_for_hook = self.target_layer_name')
+            # self.trainer.target_layer_name._layer_name_for_hook = self.target_layer_name
+            exec('self.trainer.' + self.target_layer_name +
+                 '.register_forward_post_hook(hook)')
+            # self.trainer.target_layer_name.register_forward_post_hook(hook)
+        except:
+            print("Error! "
+                  "The target_layer_name--" + self.target_layer_name +
+                  " is not in the model! "
+                  "Please check the spelling and "
+                  "the network's architecture!")
+            sys.exit()
+
+    def get_bboxes(self):
+        # get inference images
+        images = get_test_images(self.FLAGS.infer_dir, self.FLAGS.infer_img)
+
+        # inference
+        result = self.trainer.predict(
+            images,
+            draw_threshold=self.FLAGS.draw_threshold,
+            output_dir=self.FLAGS.output_dir,
+            save_results=self.FLAGS.save_results,
+            visualize=False)[0]
+        return result
+
+    def get_bboxes_cams(self):
+        # Get the bbox predictions (after-nms results) for the input
+        inference_result = self.get_bboxes()
+
+        # read input image
+        # TODO: support processing a folder of multiple images
+        from PIL import Image
+        img = np.array(Image.open(self.cfg.infer_img))
+
+        # data for calculating bbox grad_cam
+        extra_data = inference_result['extra_data']
+        """
+        Example for a Faster_RCNN based architecture:
+        extra_data: {'scores': tensor with shape [num_of_bboxes_before_nms, num_classes], for example: [1000, 80]
+                     'nms_keep_idx': tensor with shape [num_of_bboxes_after_nms, 1], for example: [300, 1]
+                    }
+        Example for a YOLOv3 based architecture:
+        extra_data: {'scores': tensor with shape [1, num_classes, num_of_yolo_bboxes_before_nms], # for example: [1, 80, 8400]
+                     'nms_keep_idx': tensor with shape [num_of_yolo_bboxes_after_nms, 1], # for example: [300, 1]
+                    }
+        """
+
+        # array index of the predicted bbox before nms
+        if self.cfg.architecture in self.nms_idx_need_divid_numclass_arch:
+            # some networks' bbox array shape before nms may be like
+            # [num_of_bboxes_before_nms, num_classes, 4], so we need to divide
+            # by num_classes to get the before-nms index; currently this only
+            # covers the rcnn architectures (fasterrcnn, maskrcnn, cascadercnn)
+            before_nms_indexes = extra_data['nms_keep_idx'].cpu().numpy(
+            ) // self.num_class  # num_class
+        else:
+            before_nms_indexes = extra_data['nms_keep_idx'].cpu().numpy()
+
+        # Calculate and visualize the heatmap of each predicted bbox
+        for index, target_bbox in enumerate(inference_result['bbox']):
+            # target_bbox: [cls, score, x1, y1, x2, y2]
+            # filter bboxes with low predicted scores
+            if target_bbox[1] < self.FLAGS.draw_threshold:
+                continue
+
+            target_bbox_before_nms = int(before_nms_indexes[index])
+
+            if len(extra_data['scores'].shape) == 2:
+                score_out = extra_data['scores'][target_bbox_before_nms]
+            else:
+                score_out = extra_data['scores'][0, :, target_bbox_before_nms]
+            """
+            There are two kinds of array shape for the bbox score output:
+            1) [num_of_bboxes_before_nms, num_classes], for example: [1000, 80]
+            2) [num_of_image, num_classes, num_of_yolo_bboxes_before_nms], for example: [1, 80, 1000]
+            """
+
+            # construct one_hot label and do backward to get the gradients
+            predicted_label = paddle.argmax(score_out)
+            label_onehot = paddle.nn.functional.one_hot(
+                predicted_label, num_classes=len(score_out))
+            label_onehot = label_onehot.squeeze()
+            target = paddle.sum(score_out * label_onehot)
+            target.backward(retain_graph=True)
+
+            if 'backbone' in self.target_layer_name or \
+               'neck' in self.target_layer_name:  # backbone/neck level feature
+                if isinstance(self.target_feats[self.target_layer_name], list):
+                    # when the featuremap contains multiple scales,
+                    # take the featuremap of the last scale
+                    # TODO: fuse the cam results from multiscale featuremaps
+                    if self.target_feats[self.target_layer_name][
+                            -1].shape[-1] == 1:
+                        # if the last level featuremap is of 1x1 size,
+                        # we take the second to last one
+                        cam_grad = self.target_feats[self.target_layer_name][
+                            -2].grad.squeeze().cpu().numpy()
+                        cam_feat = self.target_feats[self.target_layer_name][
+                            -2].squeeze().cpu().numpy()
+                    else:
+                        cam_grad = self.target_feats[self.target_layer_name][
+                            -1].grad.squeeze().cpu().numpy()
+                        cam_feat = self.target_feats[self.target_layer_name][
+                            -1].squeeze().cpu().numpy()
+                else:
+                    cam_grad = self.target_feats[
+                        self.target_layer_name].grad.squeeze().cpu().numpy()
+                    cam_feat = self.target_feats[
+                        self.target_layer_name].squeeze().cpu().numpy()
+            else:  # roi level feature
+                cam_grad = self.target_feats[
+                    self.target_layer_name].grad.squeeze().cpu().numpy()[
+                        target_bbox_before_nms]
+                cam_feat = self.target_feats[
+                    self.target_layer_name].squeeze().cpu().numpy()[
+                        target_bbox_before_nms]
+
+            # grad_cam:
+            exp = grad_cam(cam_feat, cam_grad)
+
+            if 'backbone' in self.target_layer_name or \
+               'neck' in self.target_layer_name:
+                # when using a backbone/neck featuremap, we first compute the
+                # cam on the whole image and then zero out the area outside
+                # the predicted bbox
+                # reshape the cam image to the input image size
+                resized_exp = resize_cam(exp, (img.shape[1], img.shape[0]))
+                mask = np.zeros((img.shape[0], img.shape[1], 3))
+                mask[int(target_bbox[3]):int(target_bbox[5]), int(target_bbox[2]):
+                     int(target_bbox[4]), :] = 1
+                resized_exp = resized_exp * mask
+                # overlay the bbox cam on the input image
+                overlay_vis = np.uint8(resized_exp * 0.4 + img * 0.6)
+            elif 'roi' in self.target_layer_name:
+                # get the bbox part of the image
+                bbox_img = copy.deepcopy(img[int(target_bbox[3]):int(target_bbox[5]),
+                                             int(target_bbox[2]):int(target_bbox[4]), :])
+                # reshape the cam image to the bbox size
+                resized_exp = resize_cam(exp, (bbox_img.shape[1], bbox_img.shape[0]))
+                # overlay the bbox cam on the bbox image
+                bbox_overlay_vis = np.uint8(resized_exp * 0.4 + bbox_img * 0.6)
+                # put the bbox_cam image back into the original image
+                overlay_vis = copy.deepcopy(img)
+                overlay_vis[int(target_bbox[3]):int(target_bbox[5]),
+                            int(target_bbox[2]):int(target_bbox[4]), :] = bbox_overlay_vis
+            else:
+                print(
+                    'cam is only supported for backbone/neck features and roi features; others are not supported yet!'
+ ) + sys.exit() + + # put the bbox rectangle on image + cv2.rectangle( + overlay_vis, (int(target_bbox[2]), int(target_bbox[3])), + (int(target_bbox[4]), int(target_bbox[5])), (0, 0, 255), 2) + + # save visualization result + cam_image = Image.fromarray(overlay_vis) + cam_image.save(self.FLAGS.cam_out + '/' + str(index) + '.jpg') + + # clear gradients after each bbox grad_cam + target.clear_gradient() + for n, v in self.trainer.model.named_sublayers(): + v.clear_gradients() diff --git a/PaddleDetection-release-2.6/ppdet/utils/check.py b/PaddleDetection-release-2.6/ppdet/utils/check.py new file mode 100644 index 0000000000000000000000000000000000000000..7690ade9eab0a7d859459a0be74d344446be6938 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/check.py @@ -0,0 +1,156 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import sys + +import paddle +import six +import paddle.version as paddle_version + +from .logger import setup_logger +logger = setup_logger(__name__) + +__all__ = [ + 'check_gpu', 'check_npu', 'check_xpu', 'check_mlu', 'check_version', + 'check_config' +] + + +def check_mlu(use_mlu): + """ + Log error and exit when set use_mlu=true in paddlepaddle + cpu/gpu/xpu/npu version. + """ + err = "Config use_mlu cannot be set as true while you are " \ + "using paddlepaddle cpu/gpu/xpu/npu version ! \nPlease try: \n" \ + "\t1. Install paddlepaddle-mlu to run model on MLU \n" \ + "\t2. Set use_mlu as false in config file to run " \ + "model on CPU/GPU/XPU/NPU" + + try: + if use_mlu and not paddle.is_compiled_with_mlu(): + logger.error(err) + sys.exit(1) + except Exception as e: + pass + + +def check_npu(use_npu): + """ + Log error and exit when set use_npu=true in paddlepaddle + version without paddle-custom-npu installed. + """ + err = "Config use_npu cannot be set as true while you are " \ + "using paddlepaddle version without paddle-custom-npu " \ + "installed! \nPlease try: \n" \ + "\t1. Install paddle-custom-npu to run model on NPU \n" \ + "\t2. Set use_npu as false in config file to run " \ + "model on other devices supported." + + try: + if use_npu and not 'npu' in paddle.device.get_all_custom_device_type(): + logger.error(err) + sys.exit(1) + except Exception as e: + pass + + +def check_xpu(use_xpu): + """ + Log error and exit when set use_xpu=true in paddlepaddle + cpu/gpu/npu version. + """ + err = "Config use_xpu cannot be set as true while you are " \ + "using paddlepaddle cpu/gpu/npu version ! \nPlease try: \n" \ + "\t1. Install paddlepaddle-xpu to run model on XPU \n" \ + "\t2. Set use_xpu as false in config file to run " \ + "model on CPU/GPU/NPU" + + try: + if use_xpu and not paddle.is_compiled_with_xpu(): + logger.error(err) + sys.exit(1) + except Exception as e: + pass + + +def check_gpu(use_gpu): + """ + Log error and exit when set use_gpu=true in paddlepaddle + cpu version. 
+ """ + err = "Config use_gpu cannot be set as true while you are " \ + "using paddlepaddle cpu version ! \nPlease try: \n" \ + "\t1. Install paddlepaddle-gpu to run model on GPU \n" \ + "\t2. Set use_gpu as false in config file to run " \ + "model on CPU" + + try: + if use_gpu and not paddle.is_compiled_with_cuda(): + logger.error(err) + sys.exit(1) + except Exception as e: + pass + + +def check_version(version='2.2'): + """ + Log error and exit when the installed version of paddlepaddle is + not satisfied. + """ + err = "PaddlePaddle version {} or higher is required, " \ + "or a suitable develop version is satisfied as well. \n" \ + "Please make sure the version is good with your code.".format(version) + + version_installed = [ + paddle_version.major, paddle_version.minor, paddle_version.patch, + paddle_version.rc + ] + + if version_installed == ['0', '0', '0', '0']: + return + + version_split = version.split('.') + + length = min(len(version_installed), len(version_split)) + for i in six.moves.range(length): + if version_installed[i] > version_split[i]: + return + if version_installed[i] < version_split[i]: + raise Exception(err) + + +def check_config(cfg): + """ + Check the correctness of the configuration file. Log error and exit + when Config is not compliant. + """ + err = "'{}' not specified in config file. Please set it in config file." + check_list = ['architecture', 'num_classes'] + try: + for var in check_list: + if not var in cfg: + logger.error(err.format(var)) + sys.exit(1) + except Exception as e: + pass + + if 'log_iter' not in cfg: + cfg.log_iter = 20 + + return cfg diff --git a/PaddleDetection-release-2.6/ppdet/utils/checkpoint.py b/PaddleDetection-release-2.6/ppdet/utils/checkpoint.py new file mode 100644 index 0000000000000000000000000000000000000000..f57ef0227c676cdf54cb337cca1c6f49b3a3542f --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/checkpoint.py @@ -0,0 +1,278 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import errno +import os +import time +import numpy as np +import paddle +import paddle.nn as nn +from .download import get_weights_path + +from .logger import setup_logger +logger = setup_logger(__name__) + + +def is_url(path): + """ + Whether path is URL. + Args: + path (string): URL string or not. 
+ """ + return path.startswith('http://') \ + or path.startswith('https://') \ + or path.startswith('ppdet://') + + +def _get_unique_endpoints(trainer_endpoints): + # Sorting is to avoid different environmental variables for each card + trainer_endpoints.sort() + ips = set() + unique_endpoints = set() + for endpoint in trainer_endpoints: + ip = endpoint.split(":")[0] + if ip in ips: + continue + ips.add(ip) + unique_endpoints.add(endpoint) + logger.info("unique_endpoints {}".format(unique_endpoints)) + return unique_endpoints + + +def _strip_postfix(path): + path, ext = os.path.splitext(path) + assert ext in ['', '.pdparams', '.pdopt', '.pdmodel'], \ + "Unknown postfix {} from weights".format(ext) + return path + + +def load_weight(model, weight, optimizer=None, ema=None, exchange=True): + if is_url(weight): + weight = get_weights_path(weight) + + path = _strip_postfix(weight) + pdparam_path = path + '.pdparams' + if not os.path.exists(pdparam_path): + raise ValueError("Model pretrain path {} does not " + "exists.".format(pdparam_path)) + + if ema is not None and os.path.exists(path + '.pdema'): + if exchange: + # Exchange model and ema_model to load + logger.info('Exchange model and ema_model to load:') + ema_state_dict = paddle.load(pdparam_path) + logger.info('Loading ema_model weights from {}'.format(path + + '.pdparams')) + param_state_dict = paddle.load(path + '.pdema') + logger.info('Loading model weights from {}'.format(path + '.pdema')) + else: + ema_state_dict = paddle.load(path + '.pdema') + logger.info('Loading ema_model weights from {}'.format(path + + '.pdema')) + param_state_dict = paddle.load(pdparam_path) + logger.info('Loading model weights from {}'.format(path + + '.pdparams')) + else: + ema_state_dict = None + param_state_dict = paddle.load(pdparam_path) + + model_dict = model.state_dict() + model_weight = {} + incorrect_keys = 0 + + for key, value in model_dict.items(): + if key in param_state_dict.keys(): + if isinstance(param_state_dict[key], np.ndarray): + param_state_dict[key] = paddle.to_tensor(param_state_dict[key]) + if value.dtype == param_state_dict[key].dtype: + model_weight[key] = param_state_dict[key] + else: + model_weight[key] = param_state_dict[key].astype(value.dtype) + else: + logger.info('Unmatched key: {}'.format(key)) + incorrect_keys += 1 + + assert incorrect_keys == 0, "Load weight {} incorrectly, \ + {} keys unmatched, please check again.".format(weight, + incorrect_keys) + logger.info('Finish resuming model weights: {}'.format(pdparam_path)) + + model.set_dict(model_weight) + + last_epoch = 0 + if optimizer is not None and os.path.exists(path + '.pdopt'): + optim_state_dict = paddle.load(path + '.pdopt') + # to solve resume bug, will it be fixed in paddle 2.0 + for key in optimizer.state_dict().keys(): + if not key in optim_state_dict.keys(): + optim_state_dict[key] = optimizer.state_dict()[key] + if 'last_epoch' in optim_state_dict: + last_epoch = optim_state_dict.pop('last_epoch') + optimizer.set_state_dict(optim_state_dict) + + if ema_state_dict is not None: + ema.resume(ema_state_dict, + optim_state_dict['LR_Scheduler']['last_epoch']) + elif ema_state_dict is not None: + ema.resume(ema_state_dict) + return last_epoch + + +def match_state_dict(model_state_dict, weight_state_dict): + """ + Match between the model state dict and pretrained weight state dict. + Return the matched state dict. + + The method supposes that all the names in pretrained weight state dict are + subclass of the names in models`, if the prefix 'backbone.' 
in pretrained weight + keys is stripped. And we could get the candidates for each model key. Then we + select the name with the longest matched size as the final match result. For + example, the model state dict has the name of + 'backbone.res2.res2a.branch2a.conv.weight' and the pretrained weight as + name of 'res2.res2a.branch2a.conv.weight' and 'branch2a.conv.weight'. We + match the 'res2.res2a.branch2a.conv.weight' to the model key. + """ + + model_keys = sorted(model_state_dict.keys()) + weight_keys = sorted(weight_state_dict.keys()) + + def match(a, b): + if b.startswith('backbone.res5'): + # In Faster RCNN, res5 pretrained weights have prefix of backbone, + # however, the corresponding model weights have difficult prefix, + # bbox_head. + b = b[9:] + return a == b or a.endswith("." + b) + + match_matrix = np.zeros([len(model_keys), len(weight_keys)]) + for i, m_k in enumerate(model_keys): + for j, w_k in enumerate(weight_keys): + if match(m_k, w_k): + match_matrix[i, j] = len(w_k) + max_id = match_matrix.argmax(1) + max_len = match_matrix.max(1) + max_id[max_len == 0] = -1 + + load_id = set(max_id) + load_id.discard(-1) + not_load_weight_name = [] + for idx in range(len(weight_keys)): + if idx not in load_id: + not_load_weight_name.append(weight_keys[idx]) + + if len(not_load_weight_name) > 0: + logger.info('{} in pretrained weight is not used in the model, ' + 'and its will not be loaded'.format(not_load_weight_name)) + matched_keys = {} + result_state_dict = {} + for model_id, weight_id in enumerate(max_id): + if weight_id == -1: + continue + model_key = model_keys[model_id] + weight_key = weight_keys[weight_id] + weight_value = weight_state_dict[weight_key] + model_value_shape = list(model_state_dict[model_key].shape) + + if list(weight_value.shape) != model_value_shape: + logger.info( + 'The shape {} in pretrained weight {} is unmatched with ' + 'the shape {} in model {}. And the weight {} will not be ' + 'loaded'.format(weight_value.shape, weight_key, + model_value_shape, model_key, weight_key)) + continue + + assert model_key not in result_state_dict + result_state_dict[model_key] = weight_value + if weight_key in matched_keys: + raise ValueError('Ambiguity weight {} loaded, it matches at least ' + '{} and {} in the model'.format( + weight_key, model_key, matched_keys[ + weight_key])) + matched_keys[weight_key] = model_key + return result_state_dict + + +def load_pretrain_weight(model, pretrain_weight): + if is_url(pretrain_weight): + pretrain_weight = get_weights_path(pretrain_weight) + + path = _strip_postfix(pretrain_weight) + if not (os.path.isdir(path) or os.path.isfile(path) or + os.path.exists(path + '.pdparams')): + raise ValueError("Model pretrain path `{}` does not exists. " + "If you don't want to load pretrain model, " + "please delete `pretrain_weights` field in " + "config file.".format(path)) + + model_dict = model.state_dict() + + weights_path = path + '.pdparams' + param_state_dict = paddle.load(weights_path) + param_state_dict = match_state_dict(model_dict, param_state_dict) + + for k, v in param_state_dict.items(): + if isinstance(v, np.ndarray): + v = paddle.to_tensor(v) + if model_dict[k].dtype != v.dtype: + param_state_dict[k] = v.astype(model_dict[k].dtype) + + model.set_dict(param_state_dict) + logger.info('Finish loading model weights: {}'.format(weights_path)) + + +def save_model(model, + optimizer, + save_dir, + save_name, + last_epoch, + ema_model=None): + """ + save model into disk. + + Args: + model (dict): the model state_dict to save parameters. 
+        optimizer (paddle.optimizer.Optimizer): the Optimizer instance used to
+            save optimizer states.
+        save_dir (str): the directory to save into.
+        save_name (str): the checkpoint file name to save as.
+        last_epoch (int): the epoch index.
+        ema_model (dict|None): the ema_model state_dict used to save parameters.
+    """
+    if paddle.distributed.get_rank() != 0:
+        return
+    assert isinstance(model, dict), ("model is not an instance of dict, "
+                                     "please call model.state_dict() to get.")
+    if not os.path.exists(save_dir):
+        os.makedirs(save_dir)
+    save_path = os.path.join(save_dir, save_name)
+    # save model
+    if ema_model is None:
+        paddle.save(model, save_path + ".pdparams")
+    else:
+        assert isinstance(ema_model,
+                          dict), ("ema_model is not an instance of dict, "
+                                  "please call model.state_dict() to get.")
+        # Exchange model and ema_model to save
+        paddle.save(ema_model, save_path + ".pdparams")
+        paddle.save(model, save_path + ".pdema")
+    # save optimizer
+    state_dict = optimizer.state_dict()
+    state_dict['last_epoch'] = last_epoch
+    paddle.save(state_dict, save_path + ".pdopt")
+    logger.info("Save checkpoint: {}".format(save_dir))
diff --git a/PaddleDetection-release-2.6/ppdet/utils/cli.py b/PaddleDetection-release-2.6/ppdet/utils/cli.py
new file mode 100644
index 0000000000000000000000000000000000000000..2c5acc0e591af4bbd07a1d22e1237656ac47da65
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/utils/cli.py
@@ -0,0 +1,158 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from argparse import ArgumentParser, RawDescriptionHelpFormatter
+
+import yaml
+import re
+from ppdet.core.workspace import get_registered_modules, dump_value
+
+__all__ = ['ColorTTY', 'ArgsParser']
+
+
+class ColorTTY(object):
+    def __init__(self):
+        super(ColorTTY, self).__init__()
+        self.colors = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan']
+
+    def __getattr__(self, attr):
+        if attr in self.colors:
+            color = self.colors.index(attr) + 31
+
+            def color_message(message):
+                # \033 is the ANSI escape character (ESC)
+                return "\033[{}m{}\033[0m".format(color, message)
+
+            setattr(self, attr, color_message)
+            return color_message
+
+    def bold(self, message):
+        return self.with_code('01', message)
+
+    def with_code(self, code, message):
+        return "\033[{}m{}\033[0m".format(code, message)
+
+
+class ArgsParser(ArgumentParser):
+    def __init__(self):
+        super(ArgsParser, self).__init__(
+            formatter_class=RawDescriptionHelpFormatter)
+        self.add_argument("-c", "--config", help="configuration file to use")
+        self.add_argument(
+            "-o", "--opt", nargs='*', help="set configuration options")
+
+    def parse_args(self, argv=None):
+        args = super(ArgsParser, self).parse_args(argv)
+        assert args.config is not None, \
+            "Please specify --config=configure_file_path."
+        args.opt = self._parse_opt(args.opt)
+        return args
+
+    def _parse_opt(self, opts):
+        config = {}
+        if not opts:
+            return config
+        for s in opts:
+            s = s.strip()
+            k, v = s.split('=', 1)
+            if '.' not in k:
+                config[k] = yaml.load(v, Loader=yaml.Loader)
+            else:
+                keys = k.split('.')
+                if keys[0] not in config:
+                    config[keys[0]] = {}
+                cur = config[keys[0]]
+                for idx, key in enumerate(keys[1:]):
+                    if idx == len(keys) - 2:
+                        cur[key] = yaml.load(v, Loader=yaml.Loader)
+                    else:
+                        cur[key] = {}
+                        cur = cur[key]
+        return config
+
+
+def merge_args(config, args, exclude_args=['config', 'opt', 'slim_config']):
+    for k, v in vars(args).items():
+        if k not in exclude_args:
+            config[k] = v
+    return config
+
+
+def print_total_cfg(config):
+    modules = get_registered_modules()
+    color_tty = ColorTTY()
+    green = '___{}___'.format(color_tty.colors.index('green') + 31)
+
+    styled = {}
+    for key in config.keys():
+        if not config[key]:  # empty schema
+            continue
+
+        if key not in modules and not hasattr(config[key], '__dict__'):
+            styled[key] = config[key]
+            continue
+        elif key in modules:
+            module = modules[key]
+        else:
+            type_name = type(config[key]).__name__
+            if type_name in modules:
+                module = modules[type_name].copy()
+                module.update({
+                    k: v
+                    for k, v in config[key].__dict__.items()
+                    if k in module.schema
+                })
+                key += " ({})".format(type_name)
+        default = module.find_default_keys()
+        missing = module.find_missing_keys()
+        mismatch = module.find_mismatch_keys()
+        extra = module.find_extra_keys()
+        dep_missing = []
+        for dep in module.inject:
+            if isinstance(module[dep], str) and module[dep] != '':
+                if module[dep] not in modules:  # not a valid module
+                    dep_missing.append(dep)
+                else:
+                    dep_mod = modules[module[dep]]
+                    # empty dict but mandatory
+                    if not dep_mod and dep_mod.mandatory():
+                        dep_missing.append(dep)
+        override = list(
+            set(module.keys()) - set(default) - set(extra) - set(dep_missing))
+        replacement = {}
+        for name in set(override + default + extra + mismatch + missing):
+            new_name = name
+            if name in missing:
+                value = "<missing>"
+            else:
+                value = module[name]
+
+            if name in extra:
+                value = dump_value(value) + " <extraneous>"
+            elif name in mismatch:
+                value = dump_value(value) + " <type mismatch>"
+            elif name in dep_missing:
+                value = dump_value(value) + " <module config missing>"
+            elif name in override and value != '':
+                mark = green
+                new_name = mark + name
+            replacement[new_name] = value
+        styled[key] = replacement
+    buffer = yaml.dump(styled, default_flow_style=False, default_style='')
+    # \033 below is the ANSI escape character (ESC) used for terminal colors
+    buffer = re.sub(r"<missing>", r"\033[31m<missing>\033[0m", buffer)
+    buffer = re.sub(r"<extraneous>", r"\033[33m<extraneous>\033[0m", buffer)
+    buffer = re.sub(r"<type mismatch>", r"\033[31m<type mismatch>\033[0m",
+                    buffer)
+    buffer = re.sub(r"<module config missing>",
+                    r"\033[31m<module config missing>\033[0m", buffer)
+    buffer = re.sub(r"___(\d+)___(.*?):", r"\033[\1m\2\033[0m:", buffer)
+    print(buffer)
diff --git a/PaddleDetection-release-2.6/ppdet/utils/colormap.py b/PaddleDetection-release-2.6/ppdet/utils/colormap.py
new file mode 100644
index 0000000000000000000000000000000000000000..67c68dc1c67e7de5e658d424a3bce9040e73f48f
--- /dev/null
+++ b/PaddleDetection-release-2.6/ppdet/utils/colormap.py
@@ -0,0 +1,58 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
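To make the `-o` override syntax handled by `_parse_opt` in cli.py above concrete, here is a standalone restatement (slightly simplified, same dotted-key behavior) with an example run:

```python
import yaml

def parse_opt(opts):
    # Standalone restatement of ArgsParser._parse_opt above: dotted keys
    # become nested dicts, and values are parsed through YAML.
    config = {}
    for s in opts or []:
        k, v = s.strip().split('=', 1)
        if '.' not in k:
            config[k] = yaml.load(v, Loader=yaml.Loader)
        else:
            keys = k.split('.')
            cur = config.setdefault(keys[0], {})
            for idx, key in enumerate(keys[1:]):
                if idx == len(keys) - 2:
                    cur[key] = yaml.load(v, Loader=yaml.Loader)
                else:
                    cur = cur.setdefault(key, {})
    return config

print(parse_opt(['use_gpu=true', 'TrainReader.batch_size=4']))
# -> {'use_gpu': True, 'TrainReader': {'batch_size': 4}}
```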
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import numpy as np + + +def colormap(rgb=False): + """ + Get colormap + + The code of this function is copied from https://github.com/facebookresearch/Detectron/blob/main/detectron/utils/colormap.py + """ + color_list = np.array([ + 0.000, 0.447, 0.741, 0.850, 0.325, 0.098, 0.929, 0.694, 0.125, 0.494, + 0.184, 0.556, 0.466, 0.674, 0.188, 0.301, 0.745, 0.933, 0.635, 0.078, + 0.184, 0.300, 0.300, 0.300, 0.600, 0.600, 0.600, 1.000, 0.000, 0.000, + 1.000, 0.500, 0.000, 0.749, 0.749, 0.000, 0.000, 1.000, 0.000, 0.000, + 0.000, 1.000, 0.667, 0.000, 1.000, 0.333, 0.333, 0.000, 0.333, 0.667, + 0.000, 0.333, 1.000, 0.000, 0.667, 0.333, 0.000, 0.667, 0.667, 0.000, + 0.667, 1.000, 0.000, 1.000, 0.333, 0.000, 1.000, 0.667, 0.000, 1.000, + 1.000, 0.000, 0.000, 0.333, 0.500, 0.000, 0.667, 0.500, 0.000, 1.000, + 0.500, 0.333, 0.000, 0.500, 0.333, 0.333, 0.500, 0.333, 0.667, 0.500, + 0.333, 1.000, 0.500, 0.667, 0.000, 0.500, 0.667, 0.333, 0.500, 0.667, + 0.667, 0.500, 0.667, 1.000, 0.500, 1.000, 0.000, 0.500, 1.000, 0.333, + 0.500, 1.000, 0.667, 0.500, 1.000, 1.000, 0.500, 0.000, 0.333, 1.000, + 0.000, 0.667, 1.000, 0.000, 1.000, 1.000, 0.333, 0.000, 1.000, 0.333, + 0.333, 1.000, 0.333, 0.667, 1.000, 0.333, 1.000, 1.000, 0.667, 0.000, + 1.000, 0.667, 0.333, 1.000, 0.667, 0.667, 1.000, 0.667, 1.000, 1.000, + 1.000, 0.000, 1.000, 1.000, 0.333, 1.000, 1.000, 0.667, 1.000, 0.167, + 0.000, 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, + 0.000, 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, + 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, + 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, 0.000, + 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, 0.833, + 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.143, 0.143, 0.143, 0.286, + 0.286, 0.286, 0.429, 0.429, 0.429, 0.571, 0.571, 0.571, 0.714, 0.714, + 0.714, 0.857, 0.857, 0.857, 1.000, 1.000, 1.000 + ]).astype(np.float32) + color_list = color_list.reshape((-1, 3)) * 255 + if not rgb: + color_list = color_list[:, ::-1] + return color_list.astype('int32') diff --git a/PaddleDetection-release-2.6/ppdet/utils/download.py b/PaddleDetection-release-2.6/ppdet/utils/download.py new file mode 100644 index 0000000000000000000000000000000000000000..8fb95afa36602ce9c6964ff05190216d01ffb235 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/download.py @@ -0,0 +1,559 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
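A quick usage sketch of the `colormap()` helper above (the class id is an assumed value): the table yields a stable, distinct color per class when drawing detection results.

```python
# Hypothetical usage of colormap() above with OpenCV-style BGR colors.
colors = colormap(rgb=False)                    # (N, 3) int32 array
class_id = 7                                    # assumed class index
color = tuple(map(int, colors[class_id % len(colors)]))
# e.g. cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
```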
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import os.path as osp +import sys +import yaml +import time +import shutil +import requests +import tqdm +import hashlib +import base64 +import binascii +import tarfile +import zipfile +import errno + +from paddle.utils.download import _get_unique_endpoints +from ppdet.core.workspace import BASE_KEY +from .logger import setup_logger +from .voc_utils import create_list + +logger = setup_logger(__name__) + +__all__ = [ + 'get_weights_path', 'get_dataset_path', 'get_config_path', + 'download_dataset', 'create_voc_list' +] + +WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights") +DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset") +CONFIGS_HOME = osp.expanduser("~/.cache/paddle/configs") + +# dict of {dataset_name: (download_info, sub_dirs)} +# download info: [(url, md5sum)] +DATASETS = { + 'coco': ([ + ( + 'http://images.cocodataset.org/zips/train2017.zip', + 'cced6f7f71b7629ddf16f17bbcfab6b2', ), + ( + 'http://images.cocodataset.org/zips/val2017.zip', + '442b8da7639aecaf257c1dceb8ba8c80', ), + ( + 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip', + 'f4bbac642086de4f52a3fdda2de5fa2c', ), + ], ["annotations", "train2017", "val2017"]), + 'voc': ([ + ( + 'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar', + '6cd6e144f989b92b3379bac3b3de84fd', ), + ( + 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar', + 'c52e279531787c972589f7e41ab4ae64', ), + ( + 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar', + 'b6e924de25625d8de591ea690078ad9f', ), + ( + 'https://paddledet.bj.bcebos.com/data/label_list.txt', + '5ae5d62183cfb6f6d3ac109359d06a1b', ), + ], ["VOCdevkit/VOC2012", "VOCdevkit/VOC2007"]), + 'wider_face': ([ + ( + 'https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip', + '3fedf70df600953d25982bcd13d91ba2', ), + ( + 'https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip', + 'dfa7d7e790efa35df3788964cf0bbaea', ), + ( + 'https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip', + 'a4a898d6193db4b9ef3260a68bad0dc7', ), + ], ["WIDER_train", "WIDER_val", "wider_face_split"]), + 'fruit': ([( + 'https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit.tar', + 'baa8806617a54ccf3685fa7153388ae6', ), ], + ['Annotations', 'JPEGImages']), + 'roadsign_voc': ([( + 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar', + '8d629c0f880dd8b48de9aeff44bf1f3e', ), ], ['annotations', 'images']), + 'roadsign_coco': ([( + 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_coco.tar', + '49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']), + 'spine_coco': ([( + 'https://paddledet.bj.bcebos.com/data/spine.tar', + '8a3a353c2c54a2284ad7d2780b65f6a6', ), ], ['annotations', 'images']), + 'coco_ce': ([( + 'https://paddledet.bj.bcebos.com/data/coco_ce.tar', + 'eadd1b79bc2f069f2744b1dd4e0c0329', ), ], []) +} + +DOWNLOAD_DATASETS_LIST = DATASETS.keys() + +DOWNLOAD_RETRY_LIMIT = 3 + +PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX = 'https://paddledet.bj.bcebos.com/' + + +# When running unit tests, there could be multiple processes that +# trying to create DATA_HOME directory simultaneously, so we cannot +# use a if condition to check for the existence of the directory; +# instead, we use the filesystem as the synchronization mechanism by +# catching returned errors. 
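Given the registry and URL prefix above, fetching pretrained weights reduces to a single call. A hedged usage sketch (the weight file name below is illustrative, not a guaranteed asset; the call downloads on a cache miss):

```python
# get_weights_path() (defined below) expands the ppdet:// scheme with
# PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX, downloads the file if needed, and
# caches it under WEIGHTS_HOME (~/.cache/paddle/weights).
from ppdet.utils.download import get_weights_path

path = get_weights_path('ppdet://models/yolov3_mobilenet_v1_270e_coco.pdparams')
print(path)  # e.g. ~/.cache/paddle/weights/yolov3_mobilenet_v1_270e_coco.pdparams
```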
+def must_mkdirs(path): + try: + os.makedirs(path) + except OSError as exc: + if exc.errno != errno.EEXIST: + raise + pass + + +def parse_url(url): + url = url.replace("ppdet://", PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX) + return url + + +def get_weights_path(url): + """Get weights path from WEIGHTS_HOME, if not exists, + download it from url. + """ + url = parse_url(url) + path, _ = get_path(url, WEIGHTS_HOME) + return path + + +def get_config_path(url): + """Get weights path from CONFIGS_HOME, if not exists, + download it from url. + """ + url = parse_url(url) + path = map_path(url, CONFIGS_HOME, path_depth=2) + if os.path.isfile(path): + return path + + # config file not found, try download + # 1. clear configs directory + if osp.isdir(CONFIGS_HOME): + shutil.rmtree(CONFIGS_HOME) + + # 2. get url + try: + from ppdet import __version__ as version + except ImportError: + version = None + + cfg_url = "ppdet://configs/{}/configs.tar".format(version) \ + if version else "ppdet://configs/configs.tar" + cfg_url = parse_url(cfg_url) + + # 3. download and decompress + cfg_fullname = _download_dist(cfg_url, osp.dirname(CONFIGS_HOME)) + _decompress_dist(cfg_fullname) + + # 4. check config file existing + if os.path.isfile(path): + return path + else: + logger.error("Get config {} failed after download, please contact us on " \ + "https://github.com/PaddlePaddle/PaddleDetection/issues".format(path)) + sys.exit(1) + + +def get_dataset_path(path, annotation, image_dir): + """ + If path exists, return path. + Otherwise, get dataset path from DATASET_HOME, if not exists, + download it. + """ + if _dataset_exists(path, annotation, image_dir): + return path + + data_name = os.path.split(path.strip().lower())[-1] + if data_name not in DOWNLOAD_DATASETS_LIST: + raise ValueError( + "Dataset {} is not valid for reason above, please check again.". + format(osp.realpath(path))) + else: + logger.warning( + "Dataset {} is not valid for reason above, try searching {} or " + "downloading dataset...".format(osp.realpath(path), DATASET_HOME)) + + for name, dataset in DATASETS.items(): + if data_name == name: + logger.debug("Parse dataset_dir {} as dataset " + "{}".format(path, name)) + data_dir = osp.join(DATASET_HOME, name) + + if name == "spine_coco": + if _dataset_exists(data_dir, annotation, image_dir): + return data_dir + + # For voc, only check dir VOCdevkit/VOC2012, VOCdevkit/VOC2007 + if name in ['voc', 'fruit', 'roadsign_voc']: + exists = True + for sub_dir in dataset[1]: + check_dir = osp.join(data_dir, sub_dir) + if osp.exists(check_dir): + logger.info("Found {}".format(check_dir)) + else: + exists = False + if exists: + return data_dir + + # voc exist is checked above, voc is not exist here + check_exist = name != 'voc' and name != 'fruit' and name != 'roadsign_voc' + for url, md5sum in dataset[0]: + get_path(url, data_dir, md5sum, check_exist) + + # voc should create list after download + if name == 'voc': + create_voc_list(data_dir) + return data_dir + + raise ValueError("Dataset automaticly downloading Error.") + + +def create_voc_list(data_dir, devkit_subdir='VOCdevkit'): + logger.debug("Create voc file list...") + devkit_dir = osp.join(data_dir, devkit_subdir) + years = ['2007', '2012'] + + # NOTE: since using auto download VOC + # dataset, VOC default label list should be used, + # do not generate label_list.txt here. 
For default + # label, see ../data/source/voc.py + create_list(devkit_dir, years, data_dir) + logger.debug("Create voc file list finished") + + +def map_path(url, root_dir, path_depth=1): + # parse path after download to decompress under root_dir + assert path_depth > 0, "path_depth should be a positive integer" + dirname = url + for _ in range(path_depth): + dirname = osp.dirname(dirname) + fpath = osp.relpath(url, dirname) + + zip_formats = ['.zip', '.tar', '.gz'] + for zip_format in zip_formats: + fpath = fpath.replace(zip_format, '') + return osp.join(root_dir, fpath) + + +def get_path(url, root_dir, md5sum=None, check_exist=True): + """ Download from given url to root_dir. + if file or directory specified by url is exists under + root_dir, return the path directly, otherwise download + from url and decompress it, return the path. + + url (str): download url + root_dir (str): root dir for downloading, it should be + WEIGHTS_HOME or DATASET_HOME + md5sum (str): md5 sum of download package + """ + # parse path after download to decompress under root_dir + fullpath = map_path(url, root_dir) + + # For same zip file, decompressed directory name different + # from zip file name, rename by following map + decompress_name_map = { + "VOCtrainval_11-May-2012": "VOCdevkit/VOC2012", + "VOCtrainval_06-Nov-2007": "VOCdevkit/VOC2007", + "VOCtest_06-Nov-2007": "VOCdevkit/VOC2007", + "annotations_trainval": "annotations" + } + for k, v in decompress_name_map.items(): + if fullpath.find(k) >= 0: + fullpath = osp.join(osp.split(fullpath)[0], v) + + if osp.exists(fullpath) and check_exist: + if not osp.isfile(fullpath) or \ + _check_exist_file_md5(fullpath, md5sum, url): + logger.debug("Found {}".format(fullpath)) + return fullpath, True + else: + os.remove(fullpath) + + fullname = _download_dist(url, root_dir, md5sum) + + # new weights format which postfix is 'pdparams' not + # need to decompress + if osp.splitext(fullname)[-1] not in ['.pdparams', '.yml']: + _decompress_dist(fullname) + + return fullpath, False + + +def download_dataset(path, dataset=None): + if dataset not in DATASETS.keys(): + logger.error("Unknown dataset {}, it should be " + "{}".format(dataset, DATASETS.keys())) + return + dataset_info = DATASETS[dataset][0] + for info in dataset_info: + get_path(info[0], path, info[1], False) + logger.debug("Download dataset {} finished.".format(dataset)) + + +def _dataset_exists(path, annotation, image_dir): + """ + Check if user define dataset exists + """ + if not osp.exists(path): + logger.warning("Config dataset_dir {} is not exits, " + "dataset config is not valid".format(path)) + return False + + if annotation: + annotation_path = osp.join(path, annotation) + if not osp.isfile(annotation_path): + logger.warning("Config annotation {} is not a " + "file, dataset config is not " + "valid".format(annotation_path)) + return False + if image_dir: + image_path = osp.join(path, image_dir) + if not osp.isdir(image_path): + logger.warning("Config image_dir {} is not a " + "directory, dataset config is not " + "valid".format(image_path)) + return False + return True + + +def _download(url, path, md5sum=None): + """ + Download from url, save to path. 
+ + url (str): download url + path (str): download to given path + """ + must_mkdirs(path) + + fname = osp.split(url)[-1] + fullname = osp.join(path, fname) + retry_cnt = 0 + + while not (osp.exists(fullname) and _check_exist_file_md5(fullname, md5sum, + url)): + if retry_cnt < DOWNLOAD_RETRY_LIMIT: + retry_cnt += 1 + else: + raise RuntimeError("Download from {} failed. " + "Retry limit reached".format(url)) + + logger.info("Downloading {} from {}".format(fname, url)) + + # NOTE: windows path join may incur \, which is invalid in url + if sys.platform == "win32": + url = url.replace('\\', '/') + + req = requests.get(url, stream=True) + if req.status_code != 200: + raise RuntimeError("Downloading from {} failed with code " + "{}!".format(url, req.status_code)) + + # For protecting download interupted, download to + # tmp_fullname firstly, move tmp_fullname to fullname + # after download finished + tmp_fullname = fullname + "_tmp" + total_size = req.headers.get('content-length') + with open(tmp_fullname, 'wb') as f: + if total_size: + for chunk in tqdm.tqdm( + req.iter_content(chunk_size=1024), + total=(int(total_size) + 1023) // 1024, + unit='KB'): + f.write(chunk) + else: + for chunk in req.iter_content(chunk_size=1024): + if chunk: + f.write(chunk) + shutil.move(tmp_fullname, fullname) + return fullname + + +def _download_dist(url, path, md5sum=None): + env = os.environ + if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env: + # Mainly used to solve the problem of downloading data from + # different machines in the case of multiple machines. + # Different nodes will download data, and the same node + # will only download data once. + # Reference https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/utils/download.py#L108 + rank_id_curr_node = int(os.environ.get("PADDLE_RANK_IN_NODE", 0)) + num_trainers = int(env['PADDLE_TRAINERS_NUM']) + if num_trainers <= 1: + return _download(url, path, md5sum) + else: + fname = osp.split(url)[-1] + fullname = osp.join(path, fname) + lock_path = fullname + '.download.lock' + + must_mkdirs(path) + + if not osp.exists(fullname): + with open(lock_path, 'w'): # touch + os.utime(lock_path, None) + if rank_id_curr_node == 0: + _download(url, path, md5sum) + os.remove(lock_path) + else: + while os.path.exists(lock_path): + time.sleep(0.5) + return fullname + else: + return _download(url, path, md5sum) + + +def _check_exist_file_md5(filename, md5sum, url): + # if md5sum is None, and file to check is weights file, + # read md5um from url and check, else check md5sum directly + return _md5check_from_url(filename, url) if md5sum is None \ + and filename.endswith('pdparams') \ + else _md5check(filename, md5sum) + + +def _md5check_from_url(filename, url): + # For weights in bcebos URLs, MD5 value is contained + # in request header as 'content_md5' + req = requests.get(url, stream=True) + content_md5 = req.headers.get('content-md5') + req.close() + if not content_md5 or _md5check( + filename, + binascii.hexlify(base64.b64decode(content_md5.strip('"'))).decode( + )): + return True + else: + return False + + +def _md5check(fullname, md5sum=None): + if md5sum is None: + return True + + logger.debug("File {} md5 checking...".format(fullname)) + md5 = hashlib.md5() + with open(fullname, 'rb') as f: + for chunk in iter(lambda: f.read(4096), b""): + md5.update(chunk) + calc_md5sum = md5.hexdigest() + + if calc_md5sum != md5sum: + logger.warning("File {} md5 check failed, {}(calc) != " + "{}(base)".format(fullname, calc_md5sum, md5sum)) + return False + 
return True + + +def _decompress(fname): + """ + Decompress for zip and tar file + """ + logger.info("Decompressing {}...".format(fname)) + + # For protecting decompressing interupted, + # decompress to fpath_tmp directory firstly, if decompress + # successed, move decompress files to fpath and delete + # fpath_tmp and remove download compress file. + fpath = osp.split(fname)[0] + fpath_tmp = osp.join(fpath, 'tmp') + if osp.isdir(fpath_tmp): + shutil.rmtree(fpath_tmp) + os.makedirs(fpath_tmp) + + if fname.find('tar') >= 0: + with tarfile.open(fname) as tf: + tf.extractall(path=fpath_tmp) + elif fname.find('zip') >= 0: + with zipfile.ZipFile(fname) as zf: + zf.extractall(path=fpath_tmp) + elif fname.find('.txt') >= 0: + return + else: + raise TypeError("Unsupport compress file type {}".format(fname)) + + for f in os.listdir(fpath_tmp): + src_dir = osp.join(fpath_tmp, f) + dst_dir = osp.join(fpath, f) + _move_and_merge_tree(src_dir, dst_dir) + + shutil.rmtree(fpath_tmp) + os.remove(fname) + + +def _decompress_dist(fname): + env = os.environ + if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env: + trainer_id = int(env['PADDLE_TRAINER_ID']) + num_trainers = int(env['PADDLE_TRAINERS_NUM']) + if num_trainers <= 1: + _decompress(fname) + else: + lock_path = fname + '.decompress.lock' + from paddle.distributed import ParallelEnv + unique_endpoints = _get_unique_endpoints(ParallelEnv() + .trainer_endpoints[:]) + # NOTE(dkp): _decompress_dist always performed after + # _download_dist, in _download_dist sub-trainers is waiting + # for download lock file release with sleeping, if decompress + # prograss is very fast and finished with in the sleeping gap + # time, e.g in tiny dataset such as coco_ce, spine_coco, main + # trainer may finish decompress and release lock file, so we + # only craete lock file in main trainer and all sub-trainer + # wait 1s for main trainer to create lock file, for 1s is + # twice as sleeping gap, this waiting time can keep all + # trainer pipeline in order + # **change this if you have more elegent methods** + if ParallelEnv().current_endpoint in unique_endpoints: + with open(lock_path, 'w'): # touch + os.utime(lock_path, None) + _decompress(fname) + os.remove(lock_path) + else: + time.sleep(1) + while os.path.exists(lock_path): + time.sleep(0.5) + else: + _decompress(fname) + + +def _move_and_merge_tree(src, dst): + """ + Move src directory to dst, if dst is already exists, + merge src to dst + """ + if not osp.exists(dst): + shutil.move(src, dst) + elif osp.isfile(src): + shutil.move(src, dst) + else: + for fp in os.listdir(src): + src_fp = osp.join(src, fp) + dst_fp = osp.join(dst, fp) + if osp.isdir(src_fp): + if osp.isdir(dst_fp): + _move_and_merge_tree(src_fp, dst_fp) + else: + shutil.move(src_fp, dst_fp) + elif osp.isfile(src_fp) and \ + not osp.isfile(dst_fp): + shutil.move(src_fp, dst_fp) diff --git a/PaddleDetection-release-2.6/ppdet/utils/fuse_utils.py b/PaddleDetection-release-2.6/ppdet/utils/fuse_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..647fa995da615fcb2bcdca13f4296f73e3204628 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/fuse_utils.py @@ -0,0 +1,179 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import paddle +import paddle.nn as nn + +__all__ = ['fuse_conv_bn'] + + +def fuse_conv_bn(model): + is_train = False + if model.training: + model.eval() + is_train = True + fuse_list = [] + tmp_pair = [None, None] + for name, layer in model.named_sublayers(): + if isinstance(layer, nn.Conv2D): + tmp_pair[0] = name + if isinstance(layer, nn.BatchNorm2D): + tmp_pair[1] = name + + if tmp_pair[0] and tmp_pair[1] and len(tmp_pair) == 2: + fuse_list.append(tmp_pair) + tmp_pair = [None, None] + model = fuse_layers(model, fuse_list) + if is_train: + model.train() + return model + + +def find_parent_layer_and_sub_name(model, name): + """ + Given the model and the name of a layer, find the parent layer and + the sub_name of the layer. + For example, if name is 'block_1/convbn_1/conv_1', the parent layer is + 'block_1/convbn_1' and the sub_name is `conv_1`. + Args: + model(paddle.nn.Layer): the model to be quantized. + name(string): the name of a layer + + Returns: + parent_layer, subname + """ + assert isinstance(model, nn.Layer), \ + "The model must be the instance of paddle.nn.Layer." + assert len(name) > 0, "The input (name) should not be empty." + + last_idx = 0 + idx = 0 + parent_layer = model + while idx < len(name): + if name[idx] == '.': + sub_name = name[last_idx:idx] + if hasattr(parent_layer, sub_name): + parent_layer = getattr(parent_layer, sub_name) + last_idx = idx + 1 + idx += 1 + sub_name = name[last_idx:idx] + return parent_layer, sub_name + + +class Identity(nn.Layer): + '''a layer to replace bn or relu layers''' + + def __init__(self, *args, **kwargs): + super(Identity, self).__init__() + + def forward(self, input): + return input + + +def fuse_layers(model, layers_to_fuse, inplace=False): + ''' + fuse layers in layers_to_fuse + + Args: + model(nn.Layer): The model to be fused. + layers_to_fuse(list): The layers' names to be fused. For + example,"fuse_list = [["conv1", "bn1"], ["conv2", "bn2"]]". + A TypeError would be raised if "fuse" was set as + True but "fuse_list" was None. + Default: None. + inplace(bool): Whether apply fusing to the input model. + Default: False. + + Return + fused_model(paddle.nn.Layer): The fused model. 
+ ''' + if not inplace: + model = copy.deepcopy(model) + for layers_list in layers_to_fuse: + layer_list = [] + for layer_name in layers_list: + parent_layer, sub_name = find_parent_layer_and_sub_name(model, + layer_name) + layer_list.append(getattr(parent_layer, sub_name)) + new_layers = _fuse_func(layer_list) + for i, item in enumerate(layers_list): + parent_layer, sub_name = find_parent_layer_and_sub_name(model, item) + setattr(parent_layer, sub_name, new_layers[i]) + return model + + +def _fuse_func(layer_list): + '''choose the fuser method and fuse layers''' + types = tuple(type(m) for m in layer_list) + fusion_method = types_to_fusion_method.get(types, None) + new_layers = [None] * len(layer_list) + fused_layer = fusion_method(*layer_list) + for handle_id, pre_hook_fn in layer_list[0]._forward_pre_hooks.items(): + fused_layer.register_forward_pre_hook(pre_hook_fn) + del layer_list[0]._forward_pre_hooks[handle_id] + for handle_id, hook_fn in layer_list[-1]._forward_post_hooks.items(): + fused_layer.register_forward_post_hook(hook_fn) + del layer_list[-1]._forward_post_hooks[handle_id] + new_layers[0] = fused_layer + for i in range(1, len(layer_list)): + identity = Identity() + identity.training = layer_list[0].training + new_layers[i] = identity + return new_layers + + +def _fuse_conv_bn(conv, bn): + '''fuse conv and bn for train or eval''' + assert(conv.training == bn.training),\ + "Conv and BN both must be in the same mode (train or eval)." + if conv.training: + assert bn._num_features == conv._out_channels, 'Output channel of Conv2d must match num_features of BatchNorm2d' + raise NotImplementedError + else: + return _fuse_conv_bn_eval(conv, bn) + + +def _fuse_conv_bn_eval(conv, bn): + '''fuse conv and bn for eval''' + assert (not (conv.training or bn.training)), "Fusion only for eval!" + fused_conv = copy.deepcopy(conv) + + fused_weight, fused_bias = _fuse_conv_bn_weights( + fused_conv.weight, fused_conv.bias, bn._mean, bn._variance, bn._epsilon, + bn.weight, bn.bias) + fused_conv.weight.set_value(fused_weight) + if fused_conv.bias is None: + fused_conv.bias = paddle.create_parameter( + shape=[fused_conv._out_channels], is_bias=True, dtype=bn.bias.dtype) + fused_conv.bias.set_value(fused_bias) + return fused_conv + + +def _fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b): + '''fuse weights and bias of conv and bn''' + if conv_b is None: + conv_b = paddle.zeros_like(bn_rm) + if bn_w is None: + bn_w = paddle.ones_like(bn_rm) + if bn_b is None: + bn_b = paddle.zeros_like(bn_rm) + bn_var_rsqrt = paddle.rsqrt(bn_rv + bn_eps) + conv_w = conv_w * \ + (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1)) + conv_b = (conv_b - bn_rm) * bn_var_rsqrt * bn_w + bn_b + return conv_w, conv_b + + +types_to_fusion_method = {(nn.Conv2D, nn.BatchNorm2D): _fuse_conv_bn, } diff --git a/PaddleDetection-release-2.6/ppdet/utils/logger.py b/PaddleDetection-release-2.6/ppdet/utils/logger.py new file mode 100644 index 0000000000000000000000000000000000000000..51e296205273f0cc57fc4007758342cddf5210fa --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/logger.py @@ -0,0 +1,70 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import os +import sys + +import paddle.distributed as dist + +__all__ = ['setup_logger'] + +logger_initialized = [] + + +def setup_logger(name="ppdet", output=None): + """ + Initialize logger and set its verbosity level to INFO. + Args: + output (str): a file name or a directory to save log. If None, will not save log file. + If ends with ".txt" or ".log", assumed to be a file name. + Otherwise, logs will be saved to `output/log.txt`. + name (str): the root module name of this logger + + Returns: + logging.Logger: a logger + """ + logger = logging.getLogger(name) + if name in logger_initialized: + return logger + + logger.setLevel(logging.INFO) + logger.propagate = False + + formatter = logging.Formatter( + "[%(asctime)s] %(name)s %(levelname)s: %(message)s", + datefmt="%m/%d %H:%M:%S") + # stdout logging: master only + local_rank = dist.get_rank() + if local_rank == 0: + ch = logging.StreamHandler(stream=sys.stdout) + ch.setLevel(logging.DEBUG) + ch.setFormatter(formatter) + logger.addHandler(ch) + + # file logging: all workers + if output is not None: + if output.endswith(".txt") or output.endswith(".log"): + filename = output + else: + filename = os.path.join(output, "log.txt") + if local_rank > 0: + filename = filename + ".rank{}".format(local_rank) + os.makedirs(os.path.dirname(filename)) + fh = logging.FileHandler(filename, mode='a') + fh.setLevel(logging.DEBUG) + fh.setFormatter(logging.Formatter()) + logger.addHandler(fh) + logger_initialized.append(name) + return logger diff --git a/PaddleDetection-release-2.6/ppdet/utils/profiler.py b/PaddleDetection-release-2.6/ppdet/utils/profiler.py new file mode 100644 index 0000000000000000000000000000000000000000..cae3773fade36cd1d55421dc8d8b212d8f5413d7 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/profiler.py @@ -0,0 +1,111 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import paddle + +# A global variable to record the number of calling times for profiler +# functions. It is used to specify the tracing range of training steps. +_profiler_step_id = 0 + +# A global variable to avoid parsing from string every time. +_profiler_options = None + + +class ProfilerOptions(object): + ''' + Use a string to initialize a ProfilerOptions. + The string should be in the format: "key1=value1;key2=value;key3=value3". 
+ For example: + "profile_path=model.profile" + "batch_range=[50, 60]; profile_path=model.profile" + "batch_range=[50, 60]; tracer_option=OpDetail; profile_path=model.profile" + + ProfilerOptions supports following key-value pair: + batch_range - a integer list, e.g. [100, 110]. + state - a string, the optional values are 'CPU', 'GPU' or 'All'. + sorted_key - a string, the optional values are 'calls', 'total', + 'max', 'min' or 'ave. + tracer_option - a string, the optional values are 'Default', 'OpDetail', + 'AllOpDetail'. + profile_path - a string, the path to save the serialized profile data, + which can be used to generate a timeline. + exit_on_finished - a boolean. + ''' + + def __init__(self, options_str): + assert isinstance(options_str, str) + + self._options = { + 'batch_range': [10, 20], + 'state': 'All', + 'sorted_key': 'total', + 'tracer_option': 'Default', + 'profile_path': '/tmp/profile', + 'exit_on_finished': True + } + self._parse_from_string(options_str) + + def _parse_from_string(self, options_str): + for kv in options_str.replace(' ', '').split(';'): + key, value = kv.split('=') + if key == 'batch_range': + value_list = value.replace('[', '').replace(']', '').split(',') + value_list = list(map(int, value_list)) + if len(value_list) >= 2 and value_list[0] >= 0 and value_list[ + 1] > value_list[0]: + self._options[key] = value_list + elif key == 'exit_on_finished': + self._options[key] = value.lower() in ("yes", "true", "t", "1") + elif key in [ + 'state', 'sorted_key', 'tracer_option', 'profile_path' + ]: + self._options[key] = value + + def __getitem__(self, name): + if self._options.get(name, None) is None: + raise ValueError( + "ProfilerOptions does not have an option named %s." % name) + return self._options[name] + + +def add_profiler_step(options_str=None): + ''' + Enable the operator-level timing using PaddlePaddle's profiler. + The profiler uses a independent variable to count the profiler steps. + One call of this function is treated as a profiler step. + + Args: + profiler_options - a string to initialize the ProfilerOptions. + Default is None, and the profiler is disabled. + ''' + if options_str is None: + return + + global _profiler_step_id + global _profiler_options + + if _profiler_options is None: + _profiler_options = ProfilerOptions(options_str) + + if _profiler_step_id == _profiler_options['batch_range'][0]: + paddle.utils.profiler.start_profiler(_profiler_options['state'], + _profiler_options['tracer_option']) + elif _profiler_step_id == _profiler_options['batch_range'][1]: + paddle.utils.profiler.stop_profiler(_profiler_options['sorted_key'], + _profiler_options['profile_path']) + if _profiler_options['exit_on_finished']: + sys.exit(0) + + _profiler_step_id += 1 diff --git a/PaddleDetection-release-2.6/ppdet/utils/stats.py b/PaddleDetection-release-2.6/ppdet/utils/stats.py new file mode 100644 index 0000000000000000000000000000000000000000..4cd36d91cf80418720c24915522f1cf4587fe7bd --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/stats.py @@ -0,0 +1,94 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import collections +import numpy as np + +__all__ = ['SmoothedValue', 'TrainingStats'] + + +class SmoothedValue(object): + """Track a series of values and provide access to smoothed values over a + window or the global series average. + """ + + def __init__(self, window_size=20, fmt=None): + if fmt is None: + fmt = "{median:.4f} ({avg:.4f})" + self.deque = collections.deque(maxlen=window_size) + self.fmt = fmt + self.total = 0. + self.count = 0 + + def update(self, value, n=1): + self.deque.append(value) + self.count += n + self.total += value * n + + @property + def median(self): + return np.median(self.deque) + + @property + def avg(self): + return np.mean(self.deque) + + @property + def max(self): + return np.max(self.deque) + + @property + def value(self): + return self.deque[-1] + + @property + def global_avg(self): + return self.total / self.count + + def __str__(self): + return self.fmt.format( + median=self.median, avg=self.avg, max=self.max, value=self.value) + + +class TrainingStats(object): + def __init__(self, window_size, delimiter=' '): + self.meters = None + self.window_size = window_size + self.delimiter = delimiter + + def update(self, stats): + if self.meters is None: + self.meters = { + k: SmoothedValue(self.window_size) + for k in stats.keys() + } + for k, v in self.meters.items(): + v.update(stats[k].numpy()) + + def get(self, extras=None): + stats = collections.OrderedDict() + if extras: + for k, v in extras.items(): + stats[k] = v + for k, v in self.meters.items(): + stats[k] = format(v.median, '.6f') + + return stats + + def log(self, extras=None): + d = self.get(extras) + strs = [] + for k, v in d.items(): + strs.append("{}: {}".format(k, str(v))) + return self.delimiter.join(strs) diff --git a/PaddleDetection-release-2.6/ppdet/utils/visualizer.py b/PaddleDetection-release-2.6/ppdet/utils/visualizer.py new file mode 100644 index 0000000000000000000000000000000000000000..f7193306c93e0917ee400df3f76f28a3f436df08 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/visualizer.py @@ -0,0 +1,457 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
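Before the visualizer code, a small worked example of the `SmoothedValue` meter defined in `stats.py` above: the windowed statistics only see the last `window_size` updates, while `global_avg` tracks the whole run.

```python
from ppdet.utils.stats import SmoothedValue

loss = SmoothedValue(window_size=3)
for v in (10.0, 2.0, 4.0, 6.0):
    loss.update(v)

# the window now holds [2.0, 4.0, 6.0]; the total spans all four updates
print(loss.median, loss.avg)  # 4.0 4.0
print(loss.global_avg)        # 5.5
print(str(loss))              # "4.0000 (4.0000)" via the default fmt
```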
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import numpy as np +from PIL import Image, ImageDraw +import cv2 +import math + +from .colormap import colormap +from ppdet.utils.logger import setup_logger +logger = setup_logger(__name__) + +__all__ = ['visualize_results'] + + +def visualize_results(image, + bbox_res, + mask_res, + segm_res, + keypoint_res, + pose3d_res, + im_id, + catid2name, + threshold=0.5): + """ + Visualize bbox and mask results + """ + if bbox_res is not None: + image = draw_bbox(image, im_id, catid2name, bbox_res, threshold) + if mask_res is not None: + image = draw_mask(image, im_id, mask_res, threshold) + if segm_res is not None: + image = draw_segm(image, im_id, catid2name, segm_res, threshold) + if keypoint_res is not None: + image = draw_pose(image, keypoint_res, threshold) + if pose3d_res is not None: + pose3d = np.array(pose3d_res[0]['pose3d']) * 1000 + image = draw_pose3d(image, pose3d, visual_thread=threshold) + return image + + +def draw_mask(image, im_id, segms, threshold, alpha=0.7): + """ + Draw mask on image + """ + mask_color_id = 0 + w_ratio = .4 + color_list = colormap(rgb=True) + img_array = np.array(image).astype('float32') + for dt in np.array(segms): + if im_id != dt['image_id']: + continue + segm, score = dt['segmentation'], dt['score'] + if score < threshold: + continue + import pycocotools.mask as mask_util + mask = mask_util.decode(segm) * 255 + color_mask = color_list[mask_color_id % len(color_list), 0:3] + mask_color_id += 1 + for c in range(3): + color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255 + idx = np.nonzero(mask) + img_array[idx[0], idx[1], :] *= 1.0 - alpha + img_array[idx[0], idx[1], :] += alpha * color_mask + return Image.fromarray(img_array.astype('uint8')) + + +def draw_bbox(image, im_id, catid2name, bboxes, threshold): + """ + Draw bbox on image + """ + draw = ImageDraw.Draw(image) + + catid2color = {} + color_list = colormap(rgb=True)[:40] + for dt in np.array(bboxes): + if im_id != dt['image_id']: + continue + catid, bbox, score = dt['category_id'], dt['bbox'], dt['score'] + if score < threshold: + continue + + if catid not in catid2color: + idx = np.random.randint(len(color_list)) + catid2color[catid] = color_list[idx] + color = tuple(catid2color[catid]) + + # draw bbox + if len(bbox) == 4: + # draw bbox + xmin, ymin, w, h = bbox + xmax = xmin + w + ymax = ymin + h + draw.line( + [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), + (xmin, ymin)], + width=2, + fill=color) + elif len(bbox) == 8: + x1, y1, x2, y2, x3, y3, x4, y4 = bbox + draw.line( + [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)], + width=2, + fill=color) + xmin = min(x1, x2, x3, x4) + ymin = min(y1, y2, y3, y4) + else: + logger.error('the shape of bbox must be [M, 4] or [M, 8]!') + + # draw label + text = "{} {:.2f}".format(catid2name[catid], score) + tw, th = draw.textsize(text) + draw.rectangle( + [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color) + draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255)) + + return image + + +def save_result(save_path, results, catid2name, threshold): + """ + save result as txt + """ + img_id = int(results["im_id"]) + with open(save_path, 'w') as f: + if "bbox_res" in results: + for dt in results["bbox_res"]: + catid, bbox, score = dt['category_id'], dt['bbox'], dt['score'] + if score < threshold: + continue + # each bbox result as a line + # for rbox: classname score x1 y1 x2 y2 
x3 y3 x4 y4 + # for bbox: classname score x1 y1 w h + bbox_pred = '{} {} '.format(catid2name[catid], + score) + ' '.join( + [str(e) for e in bbox]) + f.write(bbox_pred + '\n') + elif "keypoint_res" in results: + for dt in results["keypoint_res"]: + kpts = dt['keypoints'] + scores = dt['score'] + keypoint_pred = [img_id, scores, kpts] + print(keypoint_pred, file=f) + else: + print("No valid results found, skip txt save") + + +def draw_segm(image, + im_id, + catid2name, + segms, + threshold, + alpha=0.7, + draw_box=True): + """ + Draw segmentation on image + """ + mask_color_id = 0 + w_ratio = .4 + color_list = colormap(rgb=True) + img_array = np.array(image).astype('float32') + for dt in np.array(segms): + if im_id != dt['image_id']: + continue + segm, score, catid = dt['segmentation'], dt['score'], dt['category_id'] + if score < threshold: + continue + import pycocotools.mask as mask_util + mask = mask_util.decode(segm) * 255 + color_mask = color_list[mask_color_id % len(color_list), 0:3] + mask_color_id += 1 + for c in range(3): + color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255 + idx = np.nonzero(mask) + img_array[idx[0], idx[1], :] *= 1.0 - alpha + img_array[idx[0], idx[1], :] += alpha * color_mask + + if not draw_box: + center_y, center_x = ndimage.measurements.center_of_mass(mask) + label_text = "{}".format(catid2name[catid]) + vis_pos = (max(int(center_x) - 10, 0), int(center_y)) + cv2.putText(img_array, label_text, vis_pos, + cv2.FONT_HERSHEY_COMPLEX, 0.3, (255, 255, 255)) + else: + mask = mask_util.decode(segm) * 255 + sum_x = np.sum(mask, axis=0) + x = np.where(sum_x > 0.5)[0] + sum_y = np.sum(mask, axis=1) + y = np.where(sum_y > 0.5)[0] + x0, x1, y0, y1 = x[0], x[-1], y[0], y[-1] + cv2.rectangle(img_array, (x0, y0), (x1, y1), + tuple(color_mask.astype('int32').tolist()), 1) + bbox_text = '%s %.2f' % (catid2name[catid], score) + t_size = cv2.getTextSize(bbox_text, 0, 0.3, thickness=1)[0] + cv2.rectangle(img_array, (x0, y0), (x0 + t_size[0], + y0 - t_size[1] - 3), + tuple(color_mask.astype('int32').tolist()), -1) + cv2.putText( + img_array, + bbox_text, (x0, y0 - 2), + cv2.FONT_HERSHEY_SIMPLEX, + 0.3, (0, 0, 0), + 1, + lineType=cv2.LINE_AA) + + return Image.fromarray(img_array.astype('uint8')) + + +def draw_pose(image, + results, + visual_thread=0.6, + save_name='pose.jpg', + save_dir='output', + returnimg=False, + ids=None): + try: + import matplotlib.pyplot as plt + import matplotlib + plt.switch_backend('agg') + except Exception as e: + logger.error('Matplotlib not found, please install matplotlib.' 
+ 'for example: `pip install matplotlib`.') + raise e + + skeletons = np.array([item['keypoints'] for item in results]) + kpt_nums = 17 + if len(skeletons) > 0: + kpt_nums = int(skeletons.shape[1] / 3) + skeletons = skeletons.reshape(-1, kpt_nums, 3) + if kpt_nums == 17: #plot coco keypoint + EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8), + (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14), + (13, 15), (14, 16), (11, 12)] + else: #plot mpii keypoint + EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7), (7, 8), + (8, 9), (10, 11), (11, 12), (13, 14), (14, 15), (8, 12), + (8, 13)] + NUM_EDGES = len(EDGES) + + colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \ + [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \ + [170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]] + cmap = matplotlib.cm.get_cmap('hsv') + plt.figure() + + img = np.array(image).astype('float32') + + color_set = results['colors'] if 'colors' in results else None + + if 'bbox' in results and ids is None: + bboxs = results['bbox'] + for j, rect in enumerate(bboxs): + xmin, ymin, xmax, ymax = rect + color = colors[0] if color_set is None else colors[color_set[j] % + len(colors)] + cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1) + + canvas = img.copy() + for i in range(kpt_nums): + for j in range(len(skeletons)): + if skeletons[j][i, 2] < visual_thread: + continue + if ids is None: + color = colors[i] if color_set is None else colors[color_set[j] + % + len(colors)] + else: + color = get_color(ids[j]) + + cv2.circle( + canvas, + tuple(skeletons[j][i, 0:2].astype('int32')), + 2, + color, + thickness=-1) + + to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0) + fig = matplotlib.pyplot.gcf() + + stickwidth = 2 + + for i in range(NUM_EDGES): + for j in range(len(skeletons)): + edge = EDGES[i] + if skeletons[j][edge[0], 2] < visual_thread or skeletons[j][edge[ + 1], 2] < visual_thread: + continue + + cur_canvas = canvas.copy() + X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]] + Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]] + mX = np.mean(X) + mY = np.mean(Y) + length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5 + angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1])) + polygon = cv2.ellipse2Poly((int(mY), int(mX)), + (int(length / 2), stickwidth), + int(angle), 0, 360, 1) + if ids is None: + color = colors[i] if color_set is None else colors[color_set[j] + % + len(colors)] + else: + color = get_color(ids[j]) + cv2.fillConvexPoly(cur_canvas, polygon, color) + canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0) + image = Image.fromarray(canvas.astype('uint8')) + plt.close() + return image + + +def draw_pose3d(image, + pose3d, + pose2d=None, + visual_thread=0.6, + save_name='pose3d.jpg', + returnimg=True): + try: + import matplotlib.pyplot as plt + import matplotlib + plt.switch_backend('agg') + except Exception as e: + logger.error('Matplotlib not found, please install matplotlib.' 
+ 'for example: `pip install matplotlib`.') + raise e + + if pose3d.shape[0] == 24: + joints_connectivity_dict = [ + [0, 1, 0], [1, 2, 0], [5, 4, 1], [4, 3, 1], [2, 3, 0], [2, 14, 1], + [3, 14, 1], [14, 16, 1], [15, 16, 1], [15, 12, 1], [6, 7, 0], + [7, 8, 0], [11, 10, 1], [10, 9, 1], [8, 12, 0], [9, 12, 1], + [12, 19, 1], [19, 18, 1], [19, 20, 0], [19, 21, 1], [22, 20, 0], + [23, 21, 1] + ] + elif pose3d.shape[0] == 14: + joints_connectivity_dict = [ + [0, 1, 0], [1, 2, 0], [5, 4, 1], [4, 3, 1], [2, 3, 0], [2, 12, 0], + [3, 12, 1], [6, 7, 0], [7, 8, 0], [11, 10, 1], [10, 9, 1], + [8, 12, 0], [9, 12, 1], [12, 13, 1] + ] + else: + print( + "not defined joints number :{}, cannot visualize because unknown of joint connectivity". + format(pose.shape[0])) + return + + def draw3Dpose(pose3d, + ax, + lcolor="#3498db", + rcolor="#e74c3c", + add_labels=False): + # pose3d = orthographic_projection(pose3d, cam) + for i in joints_connectivity_dict: + x, y, z = [ + np.array([pose3d[i[0], j], pose3d[i[1], j]]) for j in range(3) + ] + ax.plot(-x, -z, -y, lw=2, c=lcolor if i[2] else rcolor) + + RADIUS = 1000 + center_xy = 2 if pose3d.shape[0] == 14 else 14 + x, y, z = pose3d[center_xy, 0], pose3d[center_xy, 1], pose3d[center_xy, + 2] + ax.set_xlim3d([-RADIUS + x, RADIUS + x]) + ax.set_ylim3d([-RADIUS + y, RADIUS + y]) + ax.set_zlim3d([-RADIUS + z, RADIUS + z]) + + ax.set_xlabel("x") + ax.set_ylabel("y") + ax.set_zlabel("z") + + def draw2Dpose(pose2d, + ax, + lcolor="#3498db", + rcolor="#e74c3c", + add_labels=False): + for i in joints_connectivity_dict: + if pose2d[i[0], 2] and pose2d[i[1], 2]: + x, y = [ + np.array([pose2d[i[0], j], pose2d[i[1], j]]) + for j in range(2) + ] + ax.plot(x, y, 0, lw=2, c=lcolor if i[2] else rcolor) + + def draw_img_pose(pose3d, + pose2d=None, + frame=None, + figsize=(12, 12), + savepath=None): + fig = plt.figure(figsize=figsize, dpi=80) + # fig.clear() + fig.tight_layout() + + ax = fig.add_subplot(221) + if frame is not None: + ax.imshow(frame, interpolation='nearest') + if pose2d is not None: + draw2Dpose(pose2d, ax) + + ax = fig.add_subplot(222, projection='3d') + ax.view_init(45, 45) + draw3Dpose(pose3d, ax) + ax = fig.add_subplot(223, projection='3d') + ax.view_init(0, 0) + draw3Dpose(pose3d, ax) + ax = fig.add_subplot(224, projection='3d') + ax.view_init(0, 90) + draw3Dpose(pose3d, ax) + + if savepath is not None: + plt.savefig(savepath) + plt.close() + else: + return fig + + def fig2data(fig): + """ + fig = plt.figure() + image = fig2data(fig) + @brief Convert a Matplotlib figure to a 4D numpy array with RGBA channels and return it + @param fig a matplotlib figure + @return a numpy 3D array of RGBA values + """ + # draw the renderer + fig.canvas.draw() + + # Get the RGBA buffer from the figure + w, h = fig.canvas.get_width_height() + buf = np.fromstring(fig.canvas.tostring_argb(), dtype=np.uint8) + buf.shape = (w, h, 4) + + # canvas.tostring_argb give pixmap in ARGB mode. 
Roll the ALPHA channel to have it in RGBA mode + buf = np.roll(buf, 3, axis=2) + image = Image.frombytes("RGBA", (w, h), buf.tostring()) + return image.convert("RGB") + + fig = draw_img_pose(pose3d, pose2d, frame=image) + data = fig2data(fig) + if returnimg is False: + data.save(save_name) + else: + return data diff --git a/PaddleDetection-release-2.6/ppdet/utils/voc_utils.py b/PaddleDetection-release-2.6/ppdet/utils/voc_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..cd6d9f90ea85e355562d9bb8bd30319deb0f7901 --- /dev/null +++ b/PaddleDetection-release-2.6/ppdet/utils/voc_utils.py @@ -0,0 +1,86 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import os.path as osp +import re +import random + +__all__ = ['create_list'] + + +def create_list(devkit_dir, years, output_dir): + """ + create following list: + 1. trainval.txt + 2. test.txt + """ + trainval_list = [] + test_list = [] + for year in years: + trainval, test = _walk_voc_dir(devkit_dir, year, output_dir) + trainval_list.extend(trainval) + test_list.extend(test) + + random.shuffle(trainval_list) + with open(osp.join(output_dir, 'trainval.txt'), 'w') as ftrainval: + for item in trainval_list: + ftrainval.write(item[0] + ' ' + item[1] + '\n') + + with open(osp.join(output_dir, 'test.txt'), 'w') as fval: + ct = 0 + for item in test_list: + ct += 1 + fval.write(item[0] + ' ' + item[1] + '\n') + + +def _get_voc_dir(devkit_dir, year, type): + return osp.join(devkit_dir, 'VOC' + year, type) + + +def _walk_voc_dir(devkit_dir, year, output_dir): + filelist_dir = _get_voc_dir(devkit_dir, year, 'ImageSets/Main') + annotation_dir = _get_voc_dir(devkit_dir, year, 'Annotations') + img_dir = _get_voc_dir(devkit_dir, year, 'JPEGImages') + trainval_list = [] + test_list = [] + added = set() + + for _, _, files in os.walk(filelist_dir): + for fname in files: + img_ann_list = [] + if re.match(r'[a-z]+_trainval\.txt', fname): + img_ann_list = trainval_list + elif re.match(r'[a-z]+_test\.txt', fname): + img_ann_list = test_list + else: + continue + fpath = osp.join(filelist_dir, fname) + for line in open(fpath): + name_prefix = line.strip().split()[0] + if name_prefix in added: + continue + added.add(name_prefix) + ann_path = osp.join( + osp.relpath(annotation_dir, output_dir), + name_prefix + '.xml') + img_path = osp.join( + osp.relpath(img_dir, output_dir), name_prefix + '.jpg') + img_ann_list.append((img_path, ann_path)) + + return trainval_list, test_list diff --git a/PaddleDetection-release-2.6/requirements.txt b/PaddleDetection-release-2.6/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..f6297b6f7849b4f8c33be4d4386bcddce4b26665 --- /dev/null +++ b/PaddleDetection-release-2.6/requirements.txt @@ -0,0 +1,20 @@ +numpy < 1.24 +tqdm +typeguard +visualdl>=2.2.0 +opencv-python <= 4.6.0 +PyYAML +shapely 
+scipy +terminaltables +Cython +pycocotools +setuptools + +# for MOT evaluation and inference +lap +motmetrics +sklearn==0.0 + +# for vehicleplate in deploy/pipeline/ppvehicle +pyclipper diff --git a/PaddleDetection-release-2.6/scripts/build_wheel.sh b/PaddleDetection-release-2.6/scripts/build_wheel.sh new file mode 100644 index 0000000000000000000000000000000000000000..8b6ea06836b30328e2b54e916c4f6ba70834264c --- /dev/null +++ b/PaddleDetection-release-2.6/scripts/build_wheel.sh @@ -0,0 +1,156 @@ +#!/usr/bin/env bash + +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +#================================================= +# Utils +#================================================= + + +# directory config +DIST_DIR="dist" +BUILD_DIR="build" +EGG_DIR="paddledet.egg-info" + +CFG_DIR="configs" +TEST_DIR=".tests" +DATA_DIR="dataset" + +# command line log config +RED='\033[0;31m' +BLUE='\033[0;34m' +GREEN='\033[1;32m' +BOLD='\033[1m' +NONE='\033[0m' + +function python_version_check() { + PY_MAIN_VERSION=`python -V 2>&1 | awk '{print $2}' | awk -F '.' '{print $1}'` + PY_SUB_VERSION=`python -V 2>&1 | awk '{print $2}' | awk -F '.' '{print $2}'` + echo -e "find python version ${PY_MAIN_VERSION}.${PY_SUB_VERSION}" + if [ $PY_MAIN_VERSION -ne "3" -o $PY_SUB_VERSION -lt "5" ]; then + echo -e "${RED}FAIL:${NONE} please use Python >= 3.5 !" + exit 1 + fi +} + +function init() { + echo -e "${BLUE}[init]${NONE} removing building directory..." + rm -rf $DIST_DIR $BUILD_DIR $EGG_DIR $TEST_DIR + if [ `pip list | grep paddledet | wc -l` -gt 0 ]; then + echo -e "${BLUE}[init]${NONE} uninstalling paddledet..." + pip uninstall -y paddledet + fi + echo -e "${BLUE}[init]${NONE} ${GREEN}init success\n" +} + +function build_and_install() { + echo -e "${BLUE}[build]${NONE} building paddledet wheel..." + python setup.py sdist bdist_wheel + if [ $? -ne 0 ]; then + echo -e "${RED}[FAIL]${NONE} build paddledet wheel failed !" + exit 1 + fi + echo -e "${BLUE}[build]${NONE} ${GREEN}build paddldet wheel success\n" + + echo -e "${BLUE}[install]${NONE} installing paddledet..." + cd $DIST_DIR + find . -name "paddledet*.whl" | xargs pip install + if [ $? -ne 0 ]; then + cd .. + echo -e "${RED}[FAIL]${NONE} install paddledet wheel failed !" + exit 1 + fi + echo -e "${BLUE}[install]${NONE} ${GREEN}paddledet install success\n" + cd .. +} + +function unittest() { + if [ -d $TEST_DIR ]; then + rm -rf $TEST_DIR + fi; + + echo -e "${BLUE}[unittest]${NONE} run unittests..." + + # NOTE: perform unittests under TEST_DIR to + # make sure installed paddledet is used + mkdir $TEST_DIR + cp -r $CFG_DIR $TEST_DIR + cp -r $DATA_DIR $TEST_DIR + cd $TEST_DIR + + if [ $? != 0 ]; then + exit 1 + fi + find "../ppdet" -wholename '*tests/test_*' -type f -print0 | \ + xargs -0 -I{} -n1 -t bash -c 'python -u -s {}' + + # clean TEST_DIR + cd .. 
+    rm -rf $TEST_DIR
+    echo -e "${BLUE}[unittest]${NONE} ${GREEN}unittests success\n${NONE}"
+}
+
+function cleanup() {
+    if [ -d $TEST_DIR ]; then
+        rm -rf $TEST_DIR
+    fi
+
+    rm -rf $BUILD_DIR $EGG_DIR
+    pip uninstall -y paddledet
+}
+
+function abort() {
+    echo -e "${RED}[FAIL]${NONE} build wheel and unittest failed !
+              please check your code" 1>&2
+
+    cur_dir=`basename "$PWD"`
+    if [ "$cur_dir" == "$TEST_DIR" -o "$cur_dir" == "$DIST_DIR" ]; then
+        cd ..
+    fi
+
+    rm -rf $BUILD_DIR $EGG_DIR $DIST_DIR $TEST_DIR
+    pip uninstall -y paddledet
+}
+
+python_version_check
+
+trap 'abort' 0
+set -e
+
+init
+build_and_install
+unittest
+cleanup
+
+# get Paddle version
+PADDLE_VERSION=`python -c "import paddle; print(paddle.version.full_version)"`
+PADDLE_COMMIT=`python -c "import paddle; print(paddle.version.commit)"`
+PADDLE_COMMIT=`git rev-parse --short $PADDLE_COMMIT`
+
+# get PaddleDetection branch
+PPDET_BRANCH=`git rev-parse --abbrev-ref HEAD`
+PPDET_COMMIT=`git rev-parse --short HEAD`
+
+# get Python version
+PYTHON_VERSION=`python -c "import platform; print(platform.python_version())"`
+
+echo -e "\n${GREEN}paddledet wheel compiled and checked successfully !${NONE}
+        ${BLUE}Python version:${NONE} $PYTHON_VERSION
+        ${BLUE}Paddle version:${NONE} $PADDLE_VERSION ($PADDLE_COMMIT)
+        ${BLUE}PaddleDetection branch:${NONE} $PPDET_BRANCH ($PPDET_COMMIT)\n"
+
+echo -e "${GREEN}wheel saved under${NONE} ${RED}${BOLD}./dist"
+
+trap : 0
diff --git a/PaddleDetection-release-2.6/setup.py b/PaddleDetection-release-2.6/setup.py
new file mode 100644
index 0000000000000000000000000000000000000000..bc057d393857177d717e51136a900926b39cf7bb
--- /dev/null
+++ b/PaddleDetection-release-2.6/setup.py
@@ -0,0 +1,133 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
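One detail of the setup script below worth calling out: `package_model_zoo()` turns the `configs/` tree into the MODEL_ZOO manifest shipped inside the wheel. A standalone sketch of the same name derivation, with illustrative paths and output:

```python
import glob
import os.path as osp

cfg_dir = 'configs'
model_names = [
    osp.relpath(cfg, cfg_dir).replace('.yml', '')
    for cfg in glob.glob(osp.join(cfg_dir, '*/*.yml'))
    # skip the dataset base configs, as package_model_zoo() does
    if osp.split(osp.split(cfg)[0])[1] != 'datasets'
]
print(model_names[:3])  # e.g. ['yolov3/yolov3_darknet53_270e_coco', ...]
```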
+
+import os
+import os.path as osp
+import glob
+import shutil
+import subprocess
+from setuptools import find_packages, setup
+
+# ============== version definition ==============
+
+PPDET_VERSION = "2.6.0"
+
+
+def parse_version():
+    return PPDET_VERSION.replace('-', '')
+
+
+def git_commit():
+    try:
+        cmd = ['git', 'rev-parse', 'HEAD']
+        git_commit = subprocess.Popen(
+            cmd,
+            stdout=subprocess.PIPE, ).communicate()[0].strip()
+        git_commit = git_commit.decode()
+    except Exception:
+        git_commit = 'Unknown'
+
+    return str(git_commit)
+
+
+def write_version_py(filename='ppdet/version.py'):
+    ver_str = """# THIS FILE IS GENERATED FROM PADDLEPADDLE SETUP.PY
+#
+full_version = '%(version)s'
+commit = '%(commit)s'
+"""
+
+    _git_commit = git_commit()
+    with open(filename, 'w') as f:
+        f.write(ver_str % {'version': PPDET_VERSION, 'commit': _git_commit})
+
+
+write_version_py()
+
+# ============== version definition ==============
+
+
+def readme():
+    with open('README.md', encoding='utf-8') as f:
+        content = f.read()
+    return content
+
+
+def parse_requirements(fname):
+    with open(fname, encoding="utf-8-sig") as f:
+        requirements = f.readlines()
+    return requirements
+
+
+def package_model_zoo():
+    cur_dir = osp.dirname(osp.realpath(__file__))
+    cfg_dir = osp.join(cur_dir, "configs")
+    cfgs = glob.glob(osp.join(cfg_dir, '*/*.yml'))
+
+    valid_cfgs = []
+    for cfg in cfgs:
+        # exclude dataset base config
+        if osp.split(osp.split(cfg)[0])[1] not in ['datasets']:
+            valid_cfgs.append(cfg)
+    model_names = [
+        osp.relpath(cfg, cfg_dir).replace(".yml", "") for cfg in valid_cfgs
+    ]
+
+    model_zoo_file = osp.join(cur_dir, 'ppdet', 'model_zoo', 'MODEL_ZOO')
+    with open(model_zoo_file, 'w') as wf:
+        for model_name in model_names:
+            wf.write("{}\n".format(model_name))
+
+    return [model_zoo_file]
+
+
+packages = [
+    'ppdet',
+    'ppdet.core',
+    'ppdet.data',
+    'ppdet.engine',
+    'ppdet.metrics',
+    'ppdet.modeling',
+    'ppdet.model_zoo',
+    'ppdet.slim',
+    'ppdet.utils',
+]
+
+if __name__ == "__main__":
+    setup(
+        name='paddledet',
+        packages=find_packages(exclude=("configs", "tools", "deploy")),
+        package_data={'ppdet.model_zoo': package_model_zoo()},
+        author='PaddlePaddle',
+        version=parse_version(),
+        install_requires=parse_requirements('./requirements.txt'),
+        description='Object detection and instance segmentation toolkit based on PaddlePaddle',
+        long_description=readme(),
+        long_description_content_type='text/markdown',
+        url='https://github.com/PaddlePaddle/PaddleDetection',
+        download_url='https://github.com/PaddlePaddle/PaddleDetection.git',
+        keywords=['ppdet paddle ppyolo'],
+        classifiers=[
+            'Intended Audience :: Developers',
+            'License :: OSI Approved :: Apache Software License',
+            'Operating System :: OS Independent',
+            'Natural Language :: Chinese (Simplified)',
+            'Programming Language :: Python :: 3',
+            'Programming Language :: Python :: 3.5',
+            'Programming Language :: Python :: 3.6',
+            'Programming Language :: Python :: 3.7',
+            'Programming Language :: Python :: 3.8', 'Topic :: Utilities'
+        ],
+        license='Apache License 2.0',
+        ext_modules=[])
diff --git a/PaddleDetection-release-2.6/test_tipc/README.md b/PaddleDetection-release-2.6/test_tipc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..42b1b7458b85eadef47a1c92533aff90d97ebd85
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/README.md
@@ -0,0 +1,113 @@
+
+# Training and Inference Pipeline Certification (TIPC)
+
+## 1. Introduction
diff --git a/PaddleDetection-release-2.6/test_tipc/README.md b/PaddleDetection-release-2.6/test_tipc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..42b1b7458b85eadef47a1c92533aff90d97ebd85
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/README.md
@@ -0,0 +1,113 @@
+
+# PaddlePaddle Training and Inference Pipeline Certification (TIPC)
+
+## 1. Introduction
+
+Beyond basic model training and prediction, PaddlePaddle also provides high-performance inference and deployment tools covering multiple devices and platforms.
+This document collects the Training and Inference Pipeline Certification (TIPC) information and test tools for all models in PaddleDetection,
+so that users can look up how far each model's train/inference/deployment pipeline has been verified, and run the whole chain with a single command.
+
+<!-- TIPC overview diagram (image not included in this dump) -->
+
+## 2. Summary
+
+A filled-in cell means the feature can be tested with this tool in one command; an empty cell means support is still in progress.
+
+**Field descriptions:**
+- Basic train & infer: model training plus Paddle Inference Python prediction.
+- More training modes: multi-machine multi-GPU training and mixed precision.
+- Model compression: pruning, offline/online quantization, and distillation.
+- Other deployment: Paddle Inference C++ prediction, Paddle Serving deployment, Paddle-Lite deployment, etc.
+
+For finer-grained support status of inference acceleration features such as MKL-DNN and TensorRT, see the [more tutorials](#more) of each test tool.
+
+| Algorithm paper | Model name | Task | Basic<br>train & infer | More<br>training modes | Model compression | Other deployment |
+| :--- | :--- | :----: | :--------: | :---- | :---- | :---- |
+| [PPYOLO](https://arxiv.org/abs/2007.12099) | [ppyolo_mbv3_large_coco](../configs/ppyolo/ppyolo_mbv3_large_coco.yml) | Object detection | Supported | Mixed precision | FPGM pruning <br> PACT quantization <br> offline quantization | Paddle Inference: C++ |
+| [PPYOLOv2](https://arxiv.org/abs/2104.10419) | [ppyolov2_r50vd_dcn_365e_coco](../configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | Object detection | Supported | Multi-machine multi-GPU <br> mixed precision | | Paddle Inference: C++ |
+| [PP-PicoDet](https://arxiv.org/abs/2111.00902) | [picodet_s_320_coco_lcnet](../configs/picodet/picodet_s_320_coco_lcnet.yml) | Object detection | Supported | Mixed precision | | Paddle Inference: C++ |
+
+A more detailed summary is available under [more models](docs/more_models.md).
+
+## 3. Overview of the Test Tools
+### Directory structure
+
+```shell
+test_tipc/
+├── configs/                        # config files
+│   ├── ppyolo                      # ppyolo configs
+│   │   ├──ppyolo_mbv3_large_coco.txt
+│   │   ├──ppyolo_r50vd_dcn_1x_coco.txt
+│   │   ├──ppyolov2_r50vd_dcn_365e_coco.txt
+│   ├── yolov3                      # yolov3 configs
+│   │   ├──yolov3_darknet53_270e_coco.txt
+│   ├── ...
+├── docs/                           # related documentation
+│   ├── ...
+├── results/                        # pre-saved reference predictions, used for accuracy comparison against actual predictions
+│   ├── xxx.txt
+│   ├── ...
+├── compare_results.py              # checks whether the accuracy gap between the predictions in the logs and the pre-saved results stays within tolerance
+├── prepare.sh                      # downloads the data and models required to run test_*.sh
+├── README.md                       # this guide
+├── test_inference_cpp.sh           # main driver for C++ inference tests
+├── test_lite.sh                    # main driver for Paddle-Lite deployment tests
+├── test_serving.sh                 # main driver for serving deployment tests
+├── test_train_inference_python.sh  # main driver for Python training/inference tests
+└── utils_func.sh                   # helper functions used by the test_*.sh scripts
+```
+
+### Test workflow
+With this tool you can check which features each model supports and whether its prediction results stay aligned. The workflow is summarized as follows:
+
+<!-- test workflow diagram (image not included in this dump) -->
+
+1. Run `prepare.sh` to download the data and models required by the tests;
+2. Run the test script `test_*.sh` for the feature under test; it produces logs from which you can tell whether each configuration ran successfully;
+3. Use `compare_results.py` to compare the predictions in the logs with the reference results stored under `results/`, and check that the prediction accuracy is as expected (within tolerance).
+
+Testing a single feature takes only two commands. **To test another model or feature, simply swap in the corresponding config file.** The command format is:
+```shell
+# step 1: prepare data
+# format: bash + script + arg 1: config file + arg 2: mode
+bash test_tipc/prepare.sh configs/[model_name]/[params_file_name] [Mode]
+
+# step 2: run the test
+# format: bash + script + arg 1: config file + arg 2: mode
+bash test_tipc/test_train_inference_python.sh configs/[model_name]/[params_file_name] [Mode]
+```
+
+For example, to test the `lite_train_lite_infer` mode of the basic training/inference feature, run:
+```shell
+# prepare data
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_lite_infer'
+# run the test
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_lite_infer'
+```
+See the [basic training/inference guide](docs/test_train_inference_python.md) for more about these commands.
+
+### Config file naming convention
+Under the `configs` directory, **configs are grouped into one subdirectory per model**; each subdirectory holds every config file that model's tests need. File names follow these rules:
+
+1. The basic training/inference config is named simply `train_infer_python.txt`, meaning **single-machine training without mixed precision plus Python inference, on Linux**. Its full name would be `train_linux_gpu_normal_normal_infer_python_linux_gpu_cpu.txt`; since this config is used so frequently, the name is shortened.
+
+2. Other configs that include training are named `train_<training hardware: linux_gpu/linux_dcu/...>_<multi-machine: fleet/normal>_<mixed precision: amp/normal>_<inference mode: infer/lite/serving/js>_<language: cpp/python/java>_<inference hardware: linux_gpu/mac/jetson/opencl_arm_gpu/...>.txt`. For example, the Linux GPU multi-machine multi-GPU + mixed precision pipeline maps to `train_linux_gpu_fleet_amp_infer_python_linux_gpu_cpu.txt`, and basic training/inference on Linux DCU maps to `train_linux_dcu_normal_normal_infer_python_linux_dcu.txt`.
+
+3. Inference-only configs (serving, lite, etc.) are named `model_<training hardware>_<fleet/normal>_<amp/normal>_<infer/lite/serving/js>_<cpp/python/java>_<inference hardware>.txt`. Compared with rule 2, only the first field changes from `train` to `model`, because the model is downloaded directly during the test; the "training hardware" field records the environment in which the tested model was trained.
+
+**With this convention, the subdirectory name plus the config file name directly identify the scenario and feature a config tests** (see the lookup sketch below).
+
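+As a quick illustration of the convention, here is a minimal shell sketch for locating configs by scenario (the glob patterns are examples derived from the naming rules above, not an official tool):
+
+```shell
+# all multi-machine (fleet) training pipelines
+find test_tipc/configs -name 'train_*_fleet_*'
+
+# all mixed-precision (amp) training pipelines
+find test_tipc/configs -name 'train_*_amp_*'
+
+# all inference-only serving pipelines
+find test_tipc/configs -name 'model_*_serving_*'
+```
+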
+<a name="more"></a>
+
+## 4. Start Testing
+
+The feature tests involve training-side options such as mixed precision, pruning, and quantization, as well as inference-side options such as MKL-DNN and TensorRT. Follow the links below for more details and usage tutorials:
+- [test_train_inference_python usage](docs/test_train_inference_python.md): tests basic Python-based model training, evaluation, and inference, including pruning, quantization, and distillation.
+- [test_train_fleet_inference_python usage](./docs/test_train_fleet_inference_python.md): tests basic Python-based multi-machine multi-GPU training and inference.
+- [test_inference_cpp usage](docs/test_inference_cpp.md): tests C++-based model inference.
+- [test_serving usage](docs/test_serving.md): tests Paddle Serving deployment, covering both Python and C++.
+- test_lite_arm_cpu_cpp usage (under development): tests Paddle-Lite-based C++ inference deployment on ARM CPU.
+- [test_paddle2onnx usage](docs/test_paddle2onnx.md): tests Paddle2ONNX model conversion and verifies its correctness.
+- [test_ptq_inference_python usage](docs/test_ptq_inference_python.md): tests Python-based post-training (offline) quantization.
diff --git a/PaddleDetection-release-2.6/test_tipc/benchmark_train.sh b/PaddleDetection-release-2.6/test_tipc/benchmark_train.sh
new file mode 100644
index 0000000000000000000000000000000000000000..bb2324f00c5ecd8eb818f80d969253686f93571f
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/benchmark_train.sh
@@ -0,0 +1,293 @@
+#!/bin/bash
+source test_tipc/utils_func.sh
+
+# set env
+python=python
+export model_branch=`git symbolic-ref HEAD 2>/dev/null | cut -d"/" -f 3`
+export model_commit=$(git log|head -n1|awk '{print $2}')
+export str_tmp=$(echo `pip list|grep paddlepaddle-gpu|awk -F ' ' '{print $2}'`)
+export frame_version=${str_tmp%%.post*}
+export frame_commit=$(echo `${python} -c "import paddle;print(paddle.version.commit)"`)
+
+# run benchmark sh
+# Usage:
+# bash test_tipc/benchmark_train.sh config.txt params
+# or
+# bash test_tipc/benchmark_train.sh config.txt
+
+function func_parser_params(){
+    strs=$1
+    IFS="="
+    array=(${strs})
+    tmp=${array[1]}
+    echo ${tmp}
+}
+
+function set_dynamic_epoch(){
+    string=$1
+    num=$2
+    _str=${string:1:6}
+    IFS="C"
+    arr=(${_str})
+    M=${arr[0]}
+    P=${arr[1]}
+    ep=`expr $num \* $P`
+    echo $ep
+}
+
+function func_sed_params(){
+    filename=$1
+    line=$2
+    param_value=$3
+    params=`sed -n "${line}p" $filename`
+    IFS=":"
+    array=(${params})
+    key=${array[0]}
+    new_params="${key}:${param_value}"
+    IFS=";"
+    cmd="sed -i '${line}s/.*/${new_params}/' '${filename}'"
+    eval $cmd
+}
+
+function set_gpu_id(){
+    string=$1
+    _str=${string:1:6}
+    IFS="C"
+    arr=(${_str})
+    M=${arr[0]}
+    P=${arr[1]}
+    gn=`expr $P - 1`
+    gpu_num=`expr $gn / $M`
+    seq=`seq -s "," 0 $gpu_num`
+    echo $seq
+}
+
+function get_repo_name(){
+    IFS=";"
+    cur_dir=$(pwd)
+    IFS="/"
+    arr=(${cur_dir})
+    echo ${arr[-1]}
+}
+
+FILENAME=$1
+# copy FILENAME as new
+new_filename="./test_tipc/benchmark_train.txt"
+cmd=`yes|cp $FILENAME $new_filename`
+FILENAME=$new_filename
+# MODE must be one of ['benchmark_train']
+MODE=$2
+PARAMS=$3
+# bash test_tipc/benchmark_train.sh test_tipc/configs/det_mv3_db_v2_0/train_benchmark.txt benchmark_train dynamic_bs8_null_DP_N1C1
+IFS=$'\n'
+# parser params from train_benchmark.txt
+dataline=`cat $FILENAME`
+# parser params
+IFS=$'\n'
+lines=(${dataline})
+model_name=$(func_parser_value "${lines[1]}")
+
+# get the line number of the train_benchmark_params section
+line_num=`grep -n -w "train_benchmark_params" $FILENAME | cut -d ":" -f 1`
+# for train log parser
+batch_size=$(func_parser_value "${lines[line_num]}")
+line_num=`expr $line_num + 1`
+fp_items=$(func_parser_value "${lines[line_num]}")
+line_num=`expr $line_num + 1`
+epoch=$(func_parser_value "${lines[line_num]}")
+line_num=`expr $line_num + 1`
+repeat=$(func_parser_value "${lines[line_num]}")
+
+line_num=`expr $line_num + 1`
+profile_option_key=$(func_parser_key "${lines[line_num]}")
+profile_option_params=$(func_parser_value "${lines[line_num]}")
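+# (annotation) config lines are "key:value" pairs, e.g. "fp_items:fp32|fp16";
+# func_parser_key/func_parser_value come from the sourced utils_func.sh and
+# return the text before and after the ":". "|"-joined values are split into
+# sweep lists further below.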
+profile_option="${profile_option_key}:${profile_option_params}" + +line_num=`expr $line_num + 1` +flags_value=$(func_parser_value "${lines[line_num]}") +if [ ${flags_value} != "null" ];then + # set flags + IFS=";" + flags_list=(${flags_value}) + for _flag in ${flags_list[*]}; do + cmd="export ${_flag}" + eval $cmd + done +fi + +# set log_name +repo_name=$(get_repo_name ) +SAVE_LOG=${BENCHMARK_LOG_DIR:-$(pwd)} # */benchmark_log +mkdir -p "${SAVE_LOG}/benchmark_log/" +status_log="${SAVE_LOG}/benchmark_log/results.log" + +# The number of lines in which train params can be replaced. +line_python=3 +line_gpuid=4 +line_precision=6 +line_epoch=7 +line_batchsize=9 +line_profile=13 +line_eval_py=24 +line_export_py=30 + +func_sed_params "$FILENAME" "${line_eval_py}" "null" +func_sed_params "$FILENAME" "${line_export_py}" "null" +func_sed_params "$FILENAME" "${line_python}" "${python}" + +# if params +if [ ! -n "$PARAMS" ] ;then + # PARAMS input is not a word. + IFS="|" + batch_size_list=(${batch_size}) + fp_items_list=(${fp_items}) + device_num="N1C4" + device_num_list=($device_num) + run_mode="DP" +elif [[ ${PARAMS} = "dynamicTostatic" ]] ;then + IFS="|" + model_type=$PARAMS + batch_size_list=(${batch_size}) + fp_items_list=(${fp_items}) + device_num="N1C4" + device_num_list=($device_num) + run_mode="DP" +else + # parser params from input: modeltype_bs${bs_item}_${fp_item}_${run_mode}_${device_num} + IFS="_" + params_list=(${PARAMS}) + model_type=${params_list[0]} + batch_size=${params_list[1]} + batch_size=`echo ${batch_size} | tr -cd "[0-9]" ` + precision=${params_list[2]} + run_mode=${params_list[3]} + device_num=${params_list[4]} + IFS=";" + + if [ ${precision} = "null" ];then + precision="fp32" + fi + + fp_items_list=($precision) + batch_size_list=($batch_size) + device_num_list=($device_num) +fi + +# for log name +to_static="" +# parse "to_static" options and modify trainer into "to_static_trainer" +if [[ ${model_type} = "dynamicTostatic" ]];then + to_static="d2sT_" + sed -i 's/trainer:norm_train/trainer:to_static_train/g' $FILENAME +fi + + + +if [[ ${model_name} =~ "higherhrnet" ]] || [[ ${model_name} =~ "hrnet" ]] || [[ ${model_name} =~ "tinypose" ]] || [[ ${model_name} =~ "ppyoloe_r_crn_s_3x_spine_coco" ]] ;then + echo "${model_name} run on full coco dataset" + epoch=$(set_dynamic_epoch $device_num $epoch) +else + epoch=1 + repeat=$(set_dynamic_epoch $device_num $repeat) + eval "sed -i '10c\ repeat: ${repeat}' configs/datasets/coco_detection.yml" + eval "sed -i '10c\ repeat: ${repeat}' configs/datasets/coco_instance.yml" + eval "sed -i '10c\ repeat: ${repeat}' configs/datasets/mot.yml" +fi + +IFS="|" +for batch_size in ${batch_size_list[*]}; do + for precision in ${fp_items_list[*]}; do + for device_num in ${device_num_list[*]}; do + # sed batchsize and precision + func_sed_params "$FILENAME" "${line_precision}" "$precision" + func_sed_params "$FILENAME" "${line_batchsize}" "$MODE=$batch_size" + func_sed_params "$FILENAME" "${line_epoch}" "$MODE=$epoch" + gpu_id=$(set_gpu_id $device_num) + + if [ ${#gpu_id} -le 1 ];then + log_path="$SAVE_LOG/profiling_log" + mkdir -p $log_path + log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_mode}_${device_num}_${to_static}profiling" + func_sed_params "$FILENAME" "${line_gpuid}" "0" # sed used gpu_id + # set profile_option params + tmp=`sed -i "${line_profile}s/.*/${profile_option}/" "${FILENAME}"` + + # run test_train_inference_python.sh + cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} benchmark_train > 
${log_path}/${log_name} 2>&1 " + echo $cmd + eval $cmd + eval "cat ${log_path}/${log_name}" + + # without profile + log_path="$SAVE_LOG/train_log" + speed_log_path="$SAVE_LOG/index" + mkdir -p $log_path + mkdir -p $speed_log_path + log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_mode}_${device_num}_${to_static}log" + speed_log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_mode}_${device_num}_${to_static}speed" + func_sed_params "$FILENAME" "${line_profile}" "null" # sed profile_id as null + cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} benchmark_train > ${log_path}/${log_name} 2>&1 " + echo $cmd + job_bt=`date '+%Y%m%d%H%M%S'` + eval $cmd + job_et=`date '+%Y%m%d%H%M%S'` + export model_run_time=$((${job_et}-${job_bt})) + eval "cat ${log_path}/${log_name}" + + # parser log + _model_name="${model_name}_bs${batch_size}_${precision}_${run_mode}" + cmd="${python} ${BENCHMARK_ROOT}/scripts/analysis.py --filename ${log_path}/${log_name} \ + --speed_log_file '${speed_log_path}/${speed_log_name}' \ + --model_name ${_model_name} \ + --base_batch_size ${batch_size} \ + --run_mode ${run_mode} \ + --fp_item ${precision} \ + --keyword ips: \ + --skip_steps 2 \ + --device_num ${device_num} \ + --speed_unit images/s \ + --convergence_key loss: " + echo $cmd + eval $cmd + last_status=${PIPESTATUS[0]} + status_check $last_status "${cmd}" "${status_log}" "${model_name}" + else + IFS=";" + unset_env=`unset CUDA_VISIBLE_DEVICES` + log_path="$SAVE_LOG/train_log" + speed_log_path="$SAVE_LOG/index" + mkdir -p $log_path + mkdir -p $speed_log_path + log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_mode}_${device_num}_${to_static}log" + speed_log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_mode}_${device_num}_${to_static}speed" + func_sed_params "$FILENAME" "${line_gpuid}" "$gpu_id" # sed used gpu_id + func_sed_params "$FILENAME" "${line_profile}" "null" # sed --profile_option as null + cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} benchmark_train > ${log_path}/${log_name} 2>&1 " + echo $cmd + job_bt=`date '+%Y%m%d%H%M%S'` + eval $cmd + job_et=`date '+%Y%m%d%H%M%S'` + export model_run_time=$((${job_et}-${job_bt})) + eval "cat ${log_path}/${log_name}" + # parser log + _model_name="${model_name}_bs${batch_size}_${precision}_${run_mode}" + + cmd="${python} ${BENCHMARK_ROOT}/scripts/analysis.py --filename ${log_path}/${log_name} \ + --speed_log_file '${speed_log_path}/${speed_log_name}' \ + --model_name ${_model_name} \ + --base_batch_size ${batch_size} \ + --run_mode ${run_mode} \ + --fp_item ${precision} \ + --keyword ips: \ + --skip_steps 2 \ + --device_num ${device_num} \ + --speed_unit images/s \ + --convergence_key loss: " + echo $cmd + eval $cmd + last_status=${PIPESTATUS[0]} + status_check $last_status "${cmd}" "${status_log}" "${model_name}" + fi + done + done +done diff --git a/PaddleDetection-release-2.6/test_tipc/compare_results.py b/PaddleDetection-release-2.6/test_tipc/compare_results.py new file mode 100644 index 0000000000000000000000000000000000000000..e28410ed6cb26aab7557025c06b2541a7d27c2c1 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/compare_results.py @@ -0,0 +1,140 @@ +import numpy as np +import os +import subprocess +import json +import argparse +import glob + + +def init_args(): + parser = argparse.ArgumentParser() + # params for testing assert allclose + parser.add_argument("--atol", type=float, default=1e-3) + parser.add_argument("--rtol", type=float, 
default=1e-3) + parser.add_argument("--gt_file", type=str, default="") + parser.add_argument("--log_file", type=str, default="") + parser.add_argument("--precision", type=str, default="fp32") + return parser + + +def parse_args(): + parser = init_args() + return parser.parse_args() + + +def run_shell_command(cmd): + p = subprocess.Popen( + cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True) + out, err = p.communicate() + + if p.returncode == 0: + return out.decode('utf-8') + else: + return None + + +def parser_results_from_log_by_name(log_path, names_list): + if not os.path.exists(log_path): + raise ValueError("The log file {} does not exists!".format(log_path)) + + if names_list is None or len(names_list) < 1: + return [] + + parser_results = {} + for name in names_list: + cmd = "grep {} {}".format(name, log_path) + outs = run_shell_command(cmd) + outs = outs.split("\n")[0] + result = outs.split("{}".format(name))[-1] + try: + result = json.loads(result) + except: + result = np.array([int(r) for r in result.split()]).reshape(-1, 4) + parser_results[name] = result + return parser_results + + +def load_gt_from_file(gt_file): + if not os.path.exists(gt_file): + raise ValueError("The log file {} does not exists!".format(gt_file)) + with open(gt_file, 'r') as f: + data = f.readlines() + f.close() + parser_gt = {} + for line in data: + image_name, result = line.strip("\n").split("\t") + image_name = image_name.split('/')[-1] + try: + result = json.loads(result) + except: + result = np.array([int(r) for r in result.split()]).reshape(-1, 4) + parser_gt[image_name] = result + return parser_gt + + +def load_gt_from_txts(gt_file): + gt_list = glob.glob(gt_file) + gt_collection = {} + for gt_f in gt_list: + gt_dict = load_gt_from_file(gt_f) + basename = os.path.basename(gt_f) + if "fp32" in basename: + gt_collection["fp32"] = [gt_dict, gt_f] + elif "fp16" in basename: + gt_collection["fp16"] = [gt_dict, gt_f] + elif "int8" in basename: + gt_collection["int8"] = [gt_dict, gt_f] + else: + continue + return gt_collection + + +def collect_predict_from_logs(log_path, key_list): + log_list = glob.glob(log_path) + pred_collection = {} + for log_f in log_list: + pred_dict = parser_results_from_log_by_name(log_f, key_list) + key = os.path.basename(log_f) + pred_collection[key] = pred_dict + + return pred_collection + + +def testing_assert_allclose(dict_x, dict_y, atol=1e-7, rtol=1e-7): + for k in dict_x: + np.testing.assert_allclose( + np.array(dict_x[k]), np.array(dict_y[k]), atol=atol, rtol=rtol) + + +if __name__ == "__main__": + # Usage: + # python3.7 tests/compare_results.py --gt_file=./tests/results/*.txt --log_file=./tests/output/infer_*.log + + args = parse_args() + + gt_collection = load_gt_from_txts(args.gt_file) + key_list = gt_collection["fp32"][0].keys() + + pred_collection = collect_predict_from_logs(args.log_file, key_list) + for filename in pred_collection.keys(): + if "fp32" in filename: + gt_dict, gt_filename = gt_collection["fp32"] + elif "fp16" in filename: + gt_dict, gt_filename = gt_collection["fp16"] + elif "int8" in filename: + gt_dict, gt_filename = gt_collection["int8"] + else: + continue + pred_dict = pred_collection[filename] + + try: + testing_assert_allclose( + gt_dict, pred_dict, atol=args.atol, rtol=args.rtol) + print( + "Assert allclose passed! The results of {} and {} are consistent!". + format(filename, gt_filename)) + except Exception as E: + print(E) + raise ValueError( + "The results of {} and the results of {} are inconsistent!". 
+ format(filename, gt_filename)) diff --git a/PaddleDetection-release-2.6/test_tipc/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..f57a90113222cfd0cd32d1198be3cf5e2d40c729 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:cascade_mask_rcnn_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export_onnx:null +kl_quant_export:tools/post_quant.py -c configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..31e743e1003bfdea00178cef56517504e45b510b --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:cascade_rcnn_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null 
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export_onnx:null +kl_quant_export:tools/post_quant.py -c configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/deformable_detr/deformable_detr_r50_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/deformable_detr/deformable_detr_r50_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..f4b23d2d7ddc141d8bd1f6fc7c7721469ca89ee3 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/deformable_detr/deformable_detr_r50_1x_coco_train_infer_python.txt @@ -0,0 +1,58 @@ +===========================train_params=========================== +model_name:deformable_detr_r50_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=50 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/deformable_detr_r50_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml -o +pact_train:tools/train.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null 
+null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/deformable_detr_r50_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/deformable_detr/deformable_detr_r50_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================train_benchmark_params========================== +batch_size:2 +fp_items:fp32|fp16 +epoch:1 +repeat:1 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/detr/detr_r50_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/detr/detr_r50_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..bebbad550e282e6973b3063e10c588da3c3c779b --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/detr/detr_r50_1x_coco_train_infer_python.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:detr_r50_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=50 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/detr_r50_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/detr/detr_r50_1x_coco.yml -o +pact_train:tools/train.py -c configs/detr/detr_r50_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/detr/detr_r50_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/detr/detr_r50_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/detr_r50_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/detr/detr_r50_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/detr/detr_r50_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/detr/detr_r50_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/detr/detr_r50_1x_coco.yml --slim_config _template_kl_quant -o +## 
+infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/face_detection/blazeface_1000e_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/face_detection/blazeface_1000e_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..79884cc3a1bb5392254299e4dad015ea4e4196e8 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/face_detection/blazeface_1000e_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:blazeface_1000e +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=1000 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/wider_face/WIDER_val/images/0--Parade/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/face_detection/blazeface_1000e.yml -o +pact_train:tools/train.py -c configs/face_detection/blazeface_1000e.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/face_detection/blazeface_1000e.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/face_detection/blazeface_1000e.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams +norm_export:tools/export_model.py -c configs/face_detection/blazeface_1000e.yml -o +pact_export:tools/export_model.py -c configs/face_detection/blazeface_1000e.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/face_detection/blazeface_1000e.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/face_detection/blazeface_1000e.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/wider_face/WIDER_val/images/0--Parade/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_r50_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_r50_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..f71de978486b8c1eb4265f952756eee05a7985d9 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_r50_1x_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== 
+model_name:faster_rcnn_r50_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml -o +pact_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export_onnx:null +kl_quant_export:tools/post_quant.py -c configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..bb15bdd8eda66e2afd5eef73f6b2772b7452d805 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:faster_rcnn_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o 
+distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export_onnx:null +kl_quant_export:tools/post_quant.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================train_benchmark_params========================== +batch_size:2|8 +fp_items:fp32|fp16 +epoch:1 +repeat:3 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..07f799c0b1819459e708e574fd12ab13c8398eec --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:faster_rcnn_swin_tiny_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/faster_rcnn_swin_tiny_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/faster_rcnn_swin_tiny_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml -o 
+pact_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export_onnx:null +kl_quant_export:tools/post_quant.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/fcos/fcos_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/fcos/fcos_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..4242696483200ec30016ca1b4f4598209cfc7653 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/fcos/fcos_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:fcos_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/fcos_r50_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================train_benchmark_params========================== +batch_size:2|8 +fp_items:fp32|fp16 +epoch:1 +repeat:3 
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344_2.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/gfl/gfl_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/gfl/gfl_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..ed377f90a1c3ec56cdb40a85290a16f1699f290b --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/gfl/gfl_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,58 @@ +===========================train_params=========================== +model_name:gfl_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/gfl/gfl_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================train_benchmark_params========================== +batch_size:2|8 +fp_items:fp32|fp16 +epoch:1 +repeat:3 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/dark_hrnet_w32_256x192_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/dark_hrnet_w32_256x192_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..404d6468d8b697019c2f265a0ae270bce2b7614c --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/dark_hrnet_w32_256x192_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:dark_hrnet_w32_256x192 +python:python3.7 +gpu_list:0|0,1 +use_gpu:True 
+auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=210 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=64 +pretrain_weights:https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml -o +pact_train:tools/train.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams +norm_export:tools/export_model.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml -o +pact_export:tools/export_model.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/keypoint_infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +random_infer_input:[{float32,[3,256,192]}] \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/higherhrnet_hrnet_w32_512_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/higherhrnet_hrnet_w32_512_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..0bd1f61d5573507fdaf03c0e9f6f1f8142070684 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/higherhrnet_hrnet_w32_512_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:higherhrnet_hrnet_w32_512 +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=20 +pretrain_weights:https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o +pact_train:tools/train.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## 
+===========================eval_params=========================== +eval:tools/eval.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams +norm_export:tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o +pact_export:tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/keypoint_infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================train_benchmark_params========================== +batch_size:20|24 +fp_items:fp32|fp16 +epoch:1 +repeat:1 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +random_infer_input:[{float32,[3,512,512]}] \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/hrnet_w32_256x192_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/hrnet_w32_256x192_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..ce486db0eb691f38ffb0ba6da097a66d244a5846 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/hrnet_w32_256x192_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:hrnet_w32_256x192 +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=210 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=64 +pretrain_weights:https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o +pact_train:tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams +norm_export:tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o +pact_export:tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml --slim_config _template_pact -o 
+fpgm_export:tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/keypoint_infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================train_benchmark_params==========================
+batch_size:64|160
+fp_items:fp32|fp16
+epoch:1
+repeat:1
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+random_infer_input:[{float32,[3,256,192]}]
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96.yml b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96.yml
new file mode 100644
index 0000000000000000000000000000000000000000..338d9793c47550fbae8f79c62efca9f62edef2d9
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96.yml
@@ -0,0 +1,147 @@
+use_gpu: true
+log_iter: 5
+save_dir: output
+snapshot_epoch: 10
+weights: output/tinypose_128x96/model_final
+epoch: 420
+num_joints: &num_joints 17
+pixel_std: &pixel_std 200
+metric: KeyPointTopDownCOCOEval
+num_classes: 1
+train_height: &train_height 128
+train_width: &train_width 96
+trainsize: &trainsize [*train_width, *train_height]
+hmsize: &hmsize [24, 32]
+flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
+
+
+#####model
+architecture: TopDownHRNet
+
+TopDownHRNet:
+  backbone: LiteHRNet
+  post_process: HRNetPostProcess
+  flip_perm: *flip_perm
+  num_joints: *num_joints
+  width: &width 40
+  loss: KeyPointMSELoss
+  use_dark: true
+
+LiteHRNet:
+  network_type: wider_naive
+  freeze_at: -1
+  freeze_norm: false
+  return_idx: [0]
+
+KeyPointMSELoss:
+  use_target_weight: true
+  loss_scale: 1.0
+
+#####optimizer
+LearningRate:
+  base_lr: 0.008
+  schedulers:
+  - !PiecewiseDecay
+    milestones: [380, 410]
+    gamma: 0.1
+  - !LinearWarmup
+    start_factor: 0.001
+    steps: 500
+
+OptimizerBuilder:
+  optimizer:
+    type: Adam
+  regularizer:
+    factor: 0.0
+    type: L2
+
+
+#####data
+TrainDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: train2017
+    anno_path: annotations/person_keypoints_train2017.json
+    dataset_dir: dataset/coco
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+
+
+EvalDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: val2017
+    anno_path: annotations/person_keypoints_val2017.json
+    dataset_dir: dataset/coco
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+    image_thre: 0.5
+
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/coco/keypoint_imagelist.txt
+
+worker_num: 2
+global_mean: &global_mean [0.485, 0.456, 0.406]
+global_std: &global_std [0.229, 0.224, 0.225]
+TrainReader:
+  sample_transforms:
+    - RandomFlipHalfBodyTransform:
+        scale: 0.25
+        rot: 30
+        num_joints_half_body: 8
+        prob_half_body: 0.3
+        pixel_std: *pixel_std
+        trainsize: *trainsize
+        upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+        flip_pairs: *flip_perm
+    - AugmentationbyInformantionDropping:
+        prob_cutout: 0.5
+        offset_factor: 0.05
+        num_patch: 1
+        trainsize: *trainsize
+    - TopDownAffine:
+        trainsize: *trainsize
+        use_udp: true
+    - ToHeatmapsTopDown_DARK:
+        hmsize: *hmsize
+        sigma: 1
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 512
+  shuffle: true
+  drop_last: false
+
+EvalReader:
+  sample_transforms:
+    - TopDownAffine:
+        trainsize: *trainsize
+        use_udp: true
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 16
+
+TestReader:
+  inputs_def:
+    image_shape: [3, *train_height, *train_width]
+  sample_transforms:
+    - Decode: {}
+    - TopDownEvalAffine:
+        trainsize: *trainsize
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 1
+  fuse_normalize: false
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8f9468223586657b9107e1ef54508bdfa71d40ed
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:tinypose_128x96_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:null
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir_keypoint:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+--model_dir:./output_inference/picodet_s_320_pedestrian
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..02e7bf96fda6c28997b3e76d5d464e26d2fc4869
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:tinypose_128x96_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+--model:null
+--op:tinypose_128x96
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/hrnet_demo.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d766221b483506ebc714869926764188bc16afef
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:tinypose_128x96_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/hrnet_demo.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..edfc125421ddb83570722aa8b469747367fd900f
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:tinypose_128x96_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/tinypose_128x96_qat.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/quant/tinypose_qat.yml -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:quant
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir_keypoint:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+--model_dir:./output_inference/picodet_s_320_pedestrian
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a2692f1b12fbfb5f59862a83e175a7db3b4b3d4d
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:tinypose_128x96_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/tinypose_128x96_qat.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/quant/tinypose_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+--model:null
+--op:tinypose_128x96
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/hrnet_demo.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..223f7911e37623429ac093160f7d12fd7e8e2ed9
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:tinypose_128x96_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/tinypose_128x96_qat.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/quant/tinypose_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/hrnet_demo.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b1ecda104bf0facd4acbeb0c8a93e7915bc97e0a
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:tinypose_128x96
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:norm
+infer_quant:False
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir_keypoint:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+--model_dir:./output_inference/picodet_s_320_pedestrian
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c746157b204e3d6e953feb0166e01a377d8703cb
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
@@ -0,0 +1,30 @@
+===========================paddle2onnx_params===========================
+model_name:tinypose_128x96
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_param:null
+kl_quant_export:tools/post_quant.py -c configs/keypoint/tiny_pose/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+cmd:paddle2onnx
+--model_dir:null
+--model_filename:model.pdmodel
+--params_filename:model.pdiparams
+--save_file:model.onnx
+--opset_version:11
+--enable_onnx_checker:True
+paddle2onnx_param1:null
+infer_py:./deploy/third_engine/onnx/infer.py
+--infer_cfg:null
+--onnx_file:null
+--image_file:./demo/hrnet_demo.jpg
+infer_param1:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..dcedbb0448404d5c991b18781121d30c6541a5b1
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:tinypose_128x96
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+--model:null
+--op:tinypose_128x96
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/hrnet_demo.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..be6831617df324e827af619da173d0cfc1a0a23d
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:tinypose_128x96
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/hrnet_demo.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3040ed53757ad4721ab956076027dbb992977583
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:tinypose_128x96
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=420
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=512
+pretrain_weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml -o
+##
+infer_mode:norm|kl_quant
+infer_quant:False|True
+inference:./deploy/python/keypoint_infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================disable_train_benchmark==========================
+batch_size:512
+fp_items:fp32|fp16
+epoch:1
+repeat:1
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+random_infer_input:[{float32,[3,128,96]}]
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3b11a590d7b49751e7540d0f0d459f04032a1c1c
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:tinypose_128x96
+python:python3.7
+gpu_list:192.168.0.1,192.168.0.2;0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=420
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=512
+pretrain_weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/keypoint_infer.py
+--device:cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..7be6cfed13372e76eebd0b1b90e07886460a3945
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:tinypose_128x96
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:amp
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=420
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=512
+pretrain_weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/keypoint_infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_pact_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b87d583895dfe38ee3f0d025c2fbb614cf941f68
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_pact_infer_python.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:tinypose_128x96
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=420
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=512
+pretrain_weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:pact_train
+norm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_train:tools/train.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml -o
+pact_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml -o
+##
+infer_mode:pact
+infer_quant:True
+inference:./deploy/python/keypoint_infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_ptq_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5b5dfd604ab2cc29bc20a5eb523ccbcf67a24110
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/keypoint/tinypose_128x96_train_ptq_infer_python.txt
@@ -0,0 +1,20 @@
+===========================ptq_params===========================
+model_name:tinypose_128x96
+python:python3.7
+filename:
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/keypoint/tinypose_128x96.yml --slim_config configs/slim/post_quant/tinypose_128x96_ptq.yml -o
+export_param1:null
+##
+inference:./deploy/python/keypoint_infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_1x_coco_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..957d442687d56d4f165e6eeb96f168764d1af10d
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_1x_coco_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:mask_rcnn_r50_1x_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml -o
+pact_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml -o
+pact_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_onnx:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
+===========================train_benchmark_params==========================
+batch_size:2|4
+fp_items:fp32|fp16
+epoch:1
+repeat:2
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x800x1344.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..9f0f8e6a92546a46ff67a5f6dfe732a8ed5d2f97
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:null
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..19faa50e7088f392dfd53b9f9cdef32c209265ea
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+--model:null
+--op:mask_rcnn_r50_fpn_1x_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a68d556fef975354018b86afddff6e7a840caaf4
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..51c1df3ed8d9cfadc16f2a56d6d45223f912e0c3
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:quant
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f93974ea41d498686f1062ec3b8eb1bc50a6aaa6
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+--model:null
+--op:mask_rcnn_r50_fpn_1x_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..93dfab050e99d970b3333913adc7a50399a33681
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1d8f41b34692dbbfced58af55377b214b54e96e3
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:norm
+infer_quant:False
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..780d9b67b5c472b10a655ce2b7e1dc06b6d7fdd0
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
@@ -0,0 +1,30 @@
+===========================paddle2onnx_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_onnx:True
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+cmd:paddle2onnx
+--model_dir:null
+--model_filename:model.pdmodel
+--params_filename:model.pdiparams
+--save_file:model.onnx
+--opset_version:16
+--enable_dev_version:False
+paddle2onnx_param1:null
+infer_py:./deploy/third_engine/onnx/infer.py
+--infer_cfg:null
+--onnx_file:null
+--image_file:./demo/000000014439.jpg
+infer_param1:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b0d186371157e45fd936eb0aea8238509370a1a0
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+--model:null
+--op:mask_rcnn_r50_fpn_1x_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..59106a4c4797271bd922c00f0f592a1896950ce6
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..db6d2b00a3bee700e5a3ac9116dae360ece44d67
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_onnx:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+##
+infer_mode:norm|kl_quant
+infer_quant:False|True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
+===========================train_benchmark_params==========================
+batch_size:2|4
+fp_items:fp32|fp16
+epoch:1
+repeat:2
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x800x1344.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1e824592738abcb30e03d8f810b1c0d50a9f32b5
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+gpu_list:192.168.0.1,192.168.0.2;0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_onnx:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..796fd008ca16878d8f7dad9062e39dd86be57442
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:amp
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_pact_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a040bae4b5f1683ea251e33ea017b1b397c6d712
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_pact_infer_python.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=1
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:pact_train
+norm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml -o
+fpgm_train:tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml -o
+pact_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml -o
+##
+infer_mode:pact
+infer_quant:True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_ptq_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..96c2631b1a48777ddcb37e359d252256903842ac
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco_train_ptq_infer_python.txt
@@ -0,0 +1,20 @@
+===========================ptq_params===========================
+model_name:mask_rcnn_r50_fpn_1x_coco
+python:python3.7
+filename:
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+kl_quant_export:tools/post_quant.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml -o
+export_param1:null
+##
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_dla34_30e_1088x608_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_dla34_30e_1088x608_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..6f6e63e4a9fb17dc0d673576af4578dbaccb85e1
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_dla34_30e_1088x608_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:fairmot_dla34_30e_1088x608
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=30
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=6
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/mot/test.mp4
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o
+pact_train:tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+norm_export:tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o
+pact_export:tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/pptracking/python/mot_jde_infer.py
+--device:gpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--video_file:./dataset/mot/test.mp4
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
+===========================train_benchmark_params==========================
+batch_size:6|20
+fp_items:fp32|fp16
+epoch:1
+repeat:1
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x608x1088.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml b/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml
new file mode 100644
index 0000000000000000000000000000000000000000..b010c2dc15bb06faf222afd86c91624dea41c26a
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml
@@ -0,0 +1,43 @@
+_BASE_: [
+  '../../../configs/datasets/mot.yml',
+  '../../../configs/runtime.yml',
+  '../../../configs/mot/fairmot/_base_/optimizer_30e_momentum.yml',
+  '../../../configs/mot/fairmot/_base_/fairmot_hrnetv2_w18_dlafpn.yml',
+  '../../../configs/mot/fairmot/_base_/fairmot_reader_576x320.yml',
+]
+
+norm_type: sync_bn
+use_ema: true
+ema_decay: 0.9998
+
+# add crowdhuman
+TrainDataset:
+  !MOTDataSet
+    dataset_dir: dataset/mot
+    image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train']
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
+
+worker_num: 4
+TrainReader:
+  inputs_def:
+    image_shape: [3, 320, 576]
+  sample_transforms:
+    - Decode: {}
+    - RGBReverse: {}
+    - AugmentHSV: {}
+    - LetterBoxResize: {target_size: [320, 576]}
+    - MOTRandomAffine: {reject_outside: False}
+    - RandomFlip: {}
+    - BboxXYXY2XYWH: {}
+    - NormalizeBox: {}
+    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
+    - RGBReverse: {}
+    - Permute: {}
+  batch_transforms:
+    - Gt2FairMOTTarget: {}
+  batch_size: 4
+  shuffle: True
+  drop_last: True
+  use_shared_memory: True
+
+weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320/model_final
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ccc1570fb92ad387f67546d8743196e1f37aab66
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_train_infer_python.txt
@@ -0,0 +1,53 @@
+===========================train_params===========================
+model_name:fairmot_hrnetv2_w18_dlafpn_30e_576x320
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=30
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=6
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/mot/test.mp4
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml -o
+pact_train:tools/train.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval_mot.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams
+norm_export:tools/export_model.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml -o
+pact_export:tools/export_model.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c test_tipc/configs/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/pptracking/python/mot_jde_infer.py
+--device:gpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--video_file:./dataset/mot/test.mp4
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x320x576.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mot/jde_darknet53_30e_1088x608_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mot/jde_darknet53_30e_1088x608_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..423c85676bb7946020b4279c20cf69522e8d0678
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mot/jde_darknet53_30e_1088x608_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:jde_darknet53_30e_1088x608
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=30
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=4
+pretrain_weights:https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/mot/test.mp4
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o
+pact_train:tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
+norm_export:tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o
+pact_export:tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/pptracking/python/mot_jde_infer.py
+--device:gpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--video_file:./dataset/mot/test.mp4
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
+===========================train_benchmark_params==========================
+batch_size:4|12
+fp_items:fp32|fp16
+epoch:1
+repeat:1
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x608x1088.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/mot/ocsort_ppyoloe_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/mot/ocsort_ppyoloe_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0053fe17b7f1a1f43f8a1c315039accdfea20068
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/mot/ocsort_ppyoloe_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:ocsort_ppyoloe
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=30
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=4
+pretrain_weights:null
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/mot/test.mp4
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/mot/ocsort/ocsort_ppyoloe.yml -o
+pact_train:tools/train.py -c configs/mot/ocsort/ocsort_ppyoloe.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/mot/ocsort/ocsort_ppyoloe.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval_mot.py -c configs/mot/ocsort/ocsort_ppyoloe.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+norm_export:tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o
+pact_export:tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/mot/ocsort/ocsort_ppyoloe.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/pptracking/python/mot_sde_infer.py --tracker_config deploy/pptracking/python/tracker_config.yml
+--device:gpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--video_file:./dataset/mot/test.mp4
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
+===========================disable_train_benchmark==========================
+batch_size:4|12
+fp_items:fp32|fp16
+epoch:1
+repeat:1
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x640x640.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e26a3268c8867956a40dc919899c9796e8ad6b27
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:picodet_lcnet_1_5x_416_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:null
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f19917586226213d7cac30b082b71dfe68d040f7
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:picodet_lcnet_1_5x_416_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+--model:null
+--op:picodet_lcnet_1_5x_416_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5a2a42a36fc374a62deaf58cce6c400907af40d7
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:picodet_lcnet_1_5x_416_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5a735875698e5b605e34f64bb29bdc9d1f375b40
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:picodet_lcnet_1_5x_416_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/picodet_lcnet_1_5x_416_coco_qat.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:quant
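In the PACT variants, `infer_mode:quant` together with `infer_quant:True` marks the exported model as quantization-aware-trained, and `quant_export` records the export command the harness runs. Assembled by hand from this file's own `quant_export` and `weights` entries (the `-o weights=...` override is standard PaddleDetection usage, but treat the exact invocation as a sketch, since the harness normally fills it in):

    python3.7 tools/export_model.py \
        -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml \
        --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml \
        -o weights=https://bj.bcebos.com/v1/paddledet/data/tipc/models/picodet_lcnet_1_5x_416_coco_qat.pdparams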
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8fca5df2006f2c2c7fcf96372c0a8b4f8d264f65
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:picodet_lcnet_1_5x_416_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/picodet_lcnet_1_5x_416_coco_qat.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+--model:null
+--op:picodet_lcnet_1_5x_416_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..785ebbefc93f84f608c278b583aa9956ac34771b
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:picodet_lcnet_1_5x_416_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/picodet_lcnet_1_5x_416_coco_qat.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2efe3a95243a1ae633c12d36c430bbcc7b8f2bad
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:norm
+infer_quant:False
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0c3d41d5bd4646711e9c224f8799560016bd9f30
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
@@ -0,0 +1,30 @@
+===========================paddle2onnx_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_param:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/yolov3_darknet53_270e_coco.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+cmd:paddle2onnx
+--model_dir:null
+--model_filename:model.pdmodel
+--params_filename:model.pdiparams
+--save_file:model.onnx
+--opset_version:11
+--enable_onnx_checker:True
+paddle2onnx_param1:null
+infer_py:./deploy/third_engine/onnx/infer.py
+--infer_cfg:null
+--onnx_file:null
+--image_file:./demo/000000014439.jpg
+infer_param1:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1dbd287b7b2f36a319fc34a3fdfe26e779ab5e8e
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+--model:null
+--op:picodet_lcnet_1_5x_416_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c99ec0bf710e543d4baaa1e198edafea8c3c9e2b
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..159e872dd3b5b9d99d13efed1cb5fd1b6d032441
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=80
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+##
+infer_mode:norm|kl_quant
+infer_quant:False|True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================train_benchmark_params==========================
+batch_size:32
+fp_items:fp32|fp16
+epoch:1
+repeat:30
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x416x416_2.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..774fcca1d4d4a42a18d442a9dfadd354cd5fa227
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+gpu_list:192.168.0.1,192.168.0.2;0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=80
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:cpu
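The `_fleet_` configs differ from the single-machine ones mainly in `gpu_list:192.168.0.1,192.168.0.2;0,1`, which reads as two trainer nodes (the comma-separated IPs before the semicolon) each using GPUs 0 and 1. With Paddle's standard distributed launcher that plausibly corresponds to an invocation like the sketch below; the IPs are the placeholder values from the config itself, not real hosts:

    python3.7 -m paddle.distributed.launch \
        --ips=192.168.0.1,192.168.0.2 --gpus=0,1 \
        tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o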
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..19ff2c7b8ed44ff94e00722c03ec507fe7aa8b65
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:amp
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=80
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_pact_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0421fa2e4c502fe11cdf7a0b33a7c011f77386a0
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_pact_infer_python.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=80
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:pact_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config _template_kl_quant -o
+##
+infer_mode:pact
+infer_quant:True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_ptq_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..91198f1298c01050dce664f5dd411bfdc74aa886
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_lcnet_1_5x_416_coco_train_ptq_infer_python.txt
@@ -0,0 +1,20 @@
+===========================ptq_params===========================
+model_name:picodet_lcnet_1_5x_416_coco
+python:python3.7
+filename:
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_lcnet_1_5x_416_coco.pdparams
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/more_config/picodet_lcnet_1_5x_416_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+export_param1:null
+##
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3a286ba9448dec47c48104cd376ece50cba31c6f
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:picodet_s_320_coco_lcnet
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+norm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+quant_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:norm
+infer_quant:False
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..44d0554d85670ec89006e3ef05ec915b01e0cab4
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
@@ -0,0 +1,30 @@
+===========================paddle2onnx_params===========================
+model_name:picodet_s_320_coco_lcnet
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+norm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+quant_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_param:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/yolov3_darknet53_270e_coco.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+cmd:paddle2onnx
+--model_dir:null
+--model_filename:model.pdmodel
+--params_filename:model.pdiparams
+--save_file:model.onnx
+--opset_version:11
+--enable_onnx_checker:True
+paddle2onnx_param1:null
+infer_py:./deploy/third_engine/onnx/infer.py
+--infer_cfg:null
+--onnx_file:null
+--image_file:./demo/000000014439.jpg
+infer_param1:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b69999e3797fbdeb7024934f1f3c830c11000d31
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:picodet_s_320_coco_lcnet
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+norm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_kl_quant --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f51595a9e77bf1a69b245c70dfc2f3d5b6c010bd
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:picodet_s_320_coco_lcnet
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=64
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+pact_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+norm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+pact_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================train_benchmark_params==========================
+batch_size:64
+fp_items:fp32|fp16
+epoch:1
+repeat:25
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x320x320_2.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..9c959a6a5e5f5e3ecc6e343678f88c8d315a775e
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_s_320_coco_lcnet
+python:python3.7
+gpu_list:192.168.0.1,192.168.0.2;0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=128
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+pact_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+norm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+pact_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..cf61a5ab1c524f411c0e24c2b96b25d4da3b810a
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_s_320_coco_lcnet
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:amp
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=128
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+pact_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
+norm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o
+pact_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_pact_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..790357644519626b9a591d808b0b2b59b38275af
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_pact_infer_python.txt
@@ -0,0 +1,51 @@
+===========================train_params=========================== +model_name:picodet_s_320_coco_lcnet +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=128 +pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:pact_train +norm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o +pact_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_train:tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams +norm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml -o +pact_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_export:tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config configs/slim/post_quant/picodet_s_ptq.yml -o +## +infer_mode:pact +infer_quant:True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_ptq_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..7d5899bf4703ae326e83787e7ee00f31168d387d --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_lcnet_train_ptq_infer_python.txt @@ -0,0 +1,20 @@ +===========================ptq_params=========================== +model_name:picodet_s_320_coco_lcnet +python:python3.7 +filename: +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams +kl_quant_export:tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +export_param1:null +## +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt 
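
The picodet_s_320_coco_lcnet configs above all share one plain-text layout: a single "key:value" pair per line (split on the first colon only), "|" between alternative values, and "##" or "====...====" rows as section separators. The TIPC test scripts parse these files in shell; purely as an illustrative sketch (parse_tipc_config is not a repo function), the same format can be read in Python like this:

# Illustrative only: read a TIPC config into (key, [alternatives]) pairs.
def parse_tipc_config(path):
    entries = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.rstrip("\n")
            # "====section====" banners and "##" rows only separate sections
            if not line or line.startswith("=") or line.startswith("##"):
                continue
            key, _, value = line.partition(":")  # split on the first colon only
            entries.append((key, value.split("|")))
    return entries

Splitting on the first colon is what keeps values such as "cpu:op.ppdet.local_service_conf.device_type=0" intact.
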
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..6919e9dee1d3ac234bb68073eab773f7698fe4ba
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_kl_quant -o
+##
+opencv_dir:default
+infer_mode:norm
+infer_quant:False
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1b8c2d90021e84e5d4f04bd26a0afb8a5470277c
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
@@ -0,0 +1,30 @@
+===========================paddle2onnx_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export_param:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/picodet_s_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+cmd:paddle2onnx
+--model_dir:null
+--model_filename:model.pdmodel
+--params_filename:model.pdiparams
+--save_file:model.onnx
+--opset_version:11
+--enable_onnx_checker:True
+paddle2onnx_param1:null
+infer_py:./deploy/third_engine/onnx/infer.py
+--infer_cfg:null
+--onnx_file:null
+--image_file:./demo/000000014439.jpg
+infer_param1:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f3cf5d80645c734103bd2aa5e82ae69ba8d231e4
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config configs/slim/post_quant/picodet_s_ptq.yml --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0317fcf6f649073dadb6fa6f8cc8aea288e64c25
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=128
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+##
+infer_mode:norm|kl_quant
+infer_quant:False|True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================train_benchmark_params==========================
+batch_size:64
+fp_items:fp32|fp16
+epoch:1
+repeat:50
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x320x320_2.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c6045b5274eb9a5fe602ebdd1ab803ae8651a731
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+gpu_list:192.168.0.1,192.168.0.2;0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=128
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4c3cf24d810f54a3ecc20ef6d24a73711bbcda4a
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:amp
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=128
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_pact -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_kl_quant -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
\ No newline at end of file
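
Keys with "|"-separated values in these files enumerate test variants rather than a single setting: gpu_list:0|0,1 covers single- and two-card training, and --device:gpu|cpu with --batch_size:1|2 covers four inference combinations. A hypothetical sketch (the actual expansion is done by the bash test driver, not this code) of how such values span a run matrix:

# Hypothetical expansion of '|' alternatives into a test matrix.
import itertools

gpu_lists = "0|0,1".split("|")      # training: one card, then two cards
devices = "gpu|cpu".split("|")      # inference device variants
batch_sizes = "1|2".split("|")      # inference batch-size variants

for gpus, device, bs in itertools.product(gpu_lists, devices, batch_sizes):
    print(f"gpu_list={gpus} -> infer --device={device} --batch_size={bs}")
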
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_pact_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5bc5bce12b157492a0fe2ab87f5525b8a7d4b293
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_pact_infer_python.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=128
+pretrain_weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:pact_train
+norm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_train:tools/train.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+norm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml -o
+pact_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config _template_fpgm -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config configs/slim/post_quant/picodet_s_ptq.yml -o
+##
+infer_mode:pact
+infer_quant:True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_ptq_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..24c9c209d27427b8dbb4502cd221411df19b85dd
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/picodet/picodet_s_320_coco_train_ptq_infer_python.txt
@@ -0,0 +1,20 @@
+===========================ptq_params===========================
+model_name:picodet_s_320_coco
+python:python3.7
+filename:
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams
+kl_quant_export:tools/post_quant.py -c configs/picodet/legacy_model/picodet_s_320_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o
+export_param1:null
+##
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/pipeline/pphuman_linux_gpu_normal_normal_pipeline_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/pipeline/pphuman_linux_gpu_normal_normal_pipeline_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ea8ad6b77850c1647ad2965506b4ff95ccaef884
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/pipeline/pphuman_linux_gpu_normal_normal_pipeline_python_linux_gpu_cpu.txt
@@ -0,0 +1,13 @@
+===========================pipeline_infer_python_params===========================
+model_name:pphuman
+python:python3.7
+filename:null
+##
+infer_mode:norm
+input_list:image|video
+use_gpu:True
+inference:deploy/pipeline/pipeline.py --config=deploy/pipeline/config/infer_cfg_pphuman.yml
+--device:cpu|gpu
+--image_file:./demo/000000014439.jpg
+--video_file:./dataset/mot/test.mp4
+
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/pipeline/ppvehicle_linux_gpu_normal_normal_pipeline_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/pipeline/ppvehicle_linux_gpu_normal_normal_pipeline_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0914bb237bbdcb02d9d86494560c9f6f317c11cd
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/pipeline/ppvehicle_linux_gpu_normal_normal_pipeline_python_linux_gpu_cpu.txt
@@ -0,0 +1,13 @@
+===========================pipeline_infer_python_params===========================
+model_name:ppvehicle
+python:python3.7
+filename:null
+##
+infer_mode:norm
+input_list:image|video
+use_gpu:True
+inference:deploy/pipeline/pipeline.py --config=deploy/pipeline/config/infer_cfg_ppvehicle.yml
+--device:cpu|gpu
+--image_file:./demo/000000014439.jpg
+--video_file:./dataset/mot/test.mp4
+
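
The two pipeline configs above drive deploy/pipeline/pipeline.py once per entry in input_list:image|video. A sketch of the equivalent manual invocations, assuming the commands are run from the PaddleDetection root with the demo image and test video in place:

# Sketch: run the PP-Human pipeline on the image and video inputs
# named in the config above (paths taken verbatim from that config).
import subprocess

base = [
    "python3.7", "deploy/pipeline/pipeline.py",
    "--config=deploy/pipeline/config/infer_cfg_pphuman.yml",
    "--device=gpu",
]
subprocess.run(base + ["--image_file=./demo/000000014439.jpg"], check=True)
subprocess.run(base + ["--video_file=./dataset/mot/test.mp4"], check=True)
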
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_FPGM_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_FPGM_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c9f3f5406f61a3f4a7fa42fb8cc999dae1d15e48
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_FPGM_train_infer_python.txt
@@ -0,0 +1,53 @@
+===========================train_params===========================
+model_name:ppyolo_mbv3_large_coco_FPGM
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:fpgm_train
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+infer_mode:fpgm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x320x320.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..af553e6ddff00e270c33ae4e6fcf8d2b49a2cc87
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:ppyolo_mbv3_large_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+opencv_dir:default
+infer_mode:null
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..213f652fc2a6dfb8350afb61d932fbe0af43e3b4
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:ppyolo_mbv3_large_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+--model:null
+--op:ppyolo_mbv3_large_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..99060a392484270ab7527daa947b5e364b770718
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:ppyolo_mbv3_large_coco_KL
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml --export_serving_model True -o
+##
+infer_mode:null
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3300ff5a8d50130e30727a34223e9e4e2a1fce2a
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:ppyolo_mbv3_large_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+opencv_dir:default
+infer_mode:quant
+infer_quant:True
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..08cb4697a9a07a5c1e915914cb741f5c2f15e55a
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:ppyolo_mbv3_large_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+--model:null
+--op:ppyolo_mbv3_large_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
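
The serving configs in this group all encode the same three-step flow: export with --export_serving_model True, start the web service, then send one image through the HTTP client. A rough sketch of steps two and three for the Python-serving variant (the fixed sleep and the exact --opt wiring are assumptions made for illustration, not verified repo behavior):

# Sketch: launch the pipeline web service, then probe it once.
import subprocess
import time

service = subprocess.Popen([
    "python3.7", "deploy/serving/python/web_service.py",
    "--config=deploy/serving/python/config.yml",
    "--opt", "op.ppdet.local_service_conf.device_type=1",  # 1 = GPU per the config above
])
time.sleep(10)  # crude wait for the service to come up
try:
    subprocess.run([
        "python3.7", "deploy/serving/python/pipeline_http_client.py",
        "--image_file=./demo/000000014439.jpg",
    ], check=True)
finally:
    service.terminate()
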
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..fc0d7b5e958be83483a218f9e98825a0cb33c9bc
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:ppyolo_mbv3_large_coco_PACT
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml --export_serving_model True -o
+##
+infer_mode:quant
+infer_quant:True
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5b8fadffa17df5884c7fe28d6b6f99ce1b226ac7
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,29 @@
+===========================cpp_infer_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+opencv_dir:default
+infer_mode:norm
+infer_quant:False
+inference:./deploy/cpp/build/main
+--device:gpu|cpu
+--use_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..eaacc58b4948feade787851c3b58b223cd532a30
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
@@ -0,0 +1,30 @@
+===========================paddle2onnx_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export_param:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+cmd:paddle2onnx
+--model_dir:null
+--model_filename:model.pdmodel
+--params_filename:model.pdiparams
+--save_file:model.onnx
+--opset_version:11
+--enable_onnx_checker:True
+paddle2onnx_param1:null
+infer_py:./deploy/third_engine/onnx/infer.py
+--infer_cfg:null
+--onnx_file:null
+--image_file:./demo/000000014439.jpg
+infer_param1:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f93cc9043037bbc6fc4d840554eb42527c77dab4
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,26 @@
+===========================serving_infer_cpp_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+--model:null
+--op:ppyolo_mbv3_large_coco
+--port:9997
+--gpu_ids:null|0
+null:null
+http_client:deploy/serving/cpp/serving_client.py
+--serving_client:null
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2fb74136a1c40267311b10ee40dd0687ac7bce8e
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
@@ -0,0 +1,24 @@
+===========================serving_infer_python_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+filename:null
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --export_serving_model True -o
+quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml --export_serving_model True -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml --export_serving_model True -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml --export_serving_model True -o
+##
+infer_mode:norm
+infer_quant:False
+web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml
+--model_dir:null
+--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1
+null:null
+http_client:deploy/serving/python/pipeline_http_client.py
+--image_file:./demo/000000014439.jpg
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..7900d82fbb042f1457ceedf63d1187b2be77a295
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_infer_python.txt
@@ -0,0 +1,60 @@
+===========================train_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+infer_mode:norm|kl_quant
+infer_quant:False|True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================disable_train_benchmark==========================
+batch_size:24
+fp_items:fp32|fp16
+epoch:1
+repeat:10
+--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
+flags:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x320x320.npy
\ No newline at end of file
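
The fleet config that follows sets gpu_list:192.168.0.1,192.168.0.2;0,1 instead of plain card ids: the part before ";" lists the participating machines and the part after lists the GPU ids used on each of them. A small sketch of that split (illustrative only; the actual distributed launch is handled by the test scripts):

# Sketch: split a multi-machine gpu_list into node IPs and card ids.
gpu_list = "192.168.0.1,192.168.0.2;0,1"
if ";" in gpu_list:
    ips, cards = gpu_list.split(";", 1)
    nodes = ips.split(",")       # machines taking part in training
    gpu_ids = cards.split(",")   # GPU ids used on every node
else:
    nodes, gpu_ids = ["127.0.0.1"], gpu_list.split(",")
print(nodes, gpu_ids)  # ['192.168.0.1', '192.168.0.2'] ['0', '1']
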
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..cad3ab3ec3b3619a3d1aabcc6fdc131dfad5a340
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,53 @@
+===========================train_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+gpu_list:192.168.0.1,192.168.0.2;0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x320x320.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e89918d47c6417993b92dbbac35c34356c308f00
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:amp
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+infer_mode:norm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:1600
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_pact_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d0e17fe6aa098f106bf3f85500d2bd85a45485d7
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_pact_infer_python.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:pact_train
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/ppyolo_mbv3_large_qat.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml -o
+pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+##
+infer_mode:pact
+infer_quant:True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+--trt_max_shape:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_ptq_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..38190e4bfce327b73585e64e8c8e9532fae44ecf
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_mbv3_large_coco_train_ptq_infer_python.txt
@@ -0,0 +1,20 @@
+===========================ptq_params===========================
+model_name:ppyolo_mbv3_large_coco
+python:python3.7
+filename:
+##
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --slim_config configs/slim/post_quant/ppyolo_mbv3_large_ptq.yml -o
+export_param1:null
+##
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--run_benchmark:False
+null:null
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_FPGM_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_FPGM_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1699b22d91baca0c97ec6f7b95a37dfb005aa29b
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_FPGM_train_infer_python.txt
@@ -0,0 +1,53 @@
+===========================train_params===========================
+model_name:ppyolo_r50vd_dcn_1x_coco_FPGM
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:fpgm_train
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o
+##
+infer_mode:fpgm
+infer_quant:False
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x608x608.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_KL_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_KL_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8c6095920d394846ecc364460eeb0e0e350195bb
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_KL_train_infer_python.txt
@@ -0,0 +1,53 @@
+===========================train_params===========================
+model_name:ppyolo_r50vd_dcn_1x_coco_KL
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:null
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o
+fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o
+distill_export:null
+export1:null
+export2:null
+kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o
+##
+infer_mode:kl_quant
+infer_quant:True
+inference:./deploy/python/infer.py
+--device:gpu|cpu
+--enable_mkldnn:False
+--cpu_threads:4
+--batch_size:1|2
+--use_tensorrt:null
+--run_mode:paddle
+--model_dir:
+--image_dir:./dataset/coco/test2017/
+--save_log_path:null
+--run_benchmark:False
+null:null
+===========================infer_benchmark_params===========================
+numpy_infer_input:3x608x608.npy
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_PACT_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_PACT_train_infer_python.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ecdb0f3345354358cfebd0a60001c1484e44e0b1
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_PACT_train_infer_python.txt
@@ -0,0 +1,53 @@
+===========================train_params===========================
+model_name:ppyolo_r50vd_dcn_1x_coco_PACT
+python:python3.7
+gpu_list:0|0,1
+use_gpu:True
+auto_cast:null
+epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405
+save_dir:null
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
+trained_model_name:model_final.pdparams
+train_infer_img_dir:./dataset/coco/test2017/
+filename:null
+##
+trainer:pact_train
+norm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+pact_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o
+fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o
+null:null
+##
+===========================infer_params===========================
+--output_dir:./output_inference
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +## +infer_mode:pact +infer_quant:True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x608x608.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..31922da1c879483ec9c0d6dfb4f015a5e6a4a6a1 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +## +opencv_dir:default +infer_mode:norm +infer_quant:False +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..76e39ce8d670b0acfcae20d66ebfa2a41a8c88b4 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,30 @@ +===========================paddle2onnx_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +filename:null +## 
+--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_export:null +export1:null +export_param:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +## +infer_mode:norm +infer_quant:False +cmd:paddle2onnx +--model_dir:null +--model_filename:model.pdmodel +--params_filename:model.pdiparams +--save_file:model.onnx +--opset_version:11 +--enable_onnx_checker:True +paddle2onnx_param1:null +infer_py:./deploy/third_engine/onnx/infer.py +--infer_cfg:null +--onnx_file:null +--image_file:./demo/000000014439.jpg +infer_param1:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..b09d63c939c5d59f4c81179df9e6b8deca42e441 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml --export_serving_model True -o +## +infer_mode:norm +infer_quant:False +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..dc5642ee16cec3eb5fdbc141313277c12cacccbb --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_infer_python.txt @@ -0,0 +1,60 @@
+===========================train_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +## +infer_mode:norm|kl_quant +infer_quant:False|True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================disable_train_benchmark========================== +batch_size:24 +fp_items:fp32|fp16 +epoch:1 +repeat:10 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x608x608.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..2c1b13be0f3f5015fbed3b0fea1d64cd1cd1622a --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24 
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x608x608.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..2aac9ba9f4b05ec401e44f9587cd1c684a60dd7f --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:amp +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +null:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## 
+===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_pact_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..fd67e2db38fbfa3c87d1c892d03b7b1cde9d227f --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_pact_infer_python.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=405 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=24 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:pact_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/quant/ppyolo_r50vd_qat_pact.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c
configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +## +infer_mode:pact +infer_quant:True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_ptq_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..d15b9617ffd459afd19b5e619dfbd2c36d0abc6f --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco_train_ptq_infer_python.txt @@ -0,0 +1,20 @@ +===========================ptq_params=========================== +model_name:ppyolo_r50vd_dcn_1x_coco +python:python3.7 +filename: +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml --slim_config configs/slim/post_quant/ppyolo_r50vd_dcn_ptq.yml -o +export_param1:null +## +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..b92b1cca879d6cd597307852d466fd7f1a1825d6 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_kl_quant -o +## +opencv_dir:default +infer_mode:norm +infer_quant:False +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index
0000000000000000000000000000000000000000..01d8e904f2418a61f101395dc891fdc027a2a68c --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,30 @@ +===========================paddle2onnx_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export_param:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +cmd:paddle2onnx +--model_dir:null +--model_filename:model.pdmodel +--params_filename:model.pdiparams +--save_file:model.onnx +--opset_version:11 +--enable_onnx_checker:True +paddle2onnx_param1:null +infer_py:./deploy/third_engine/onnx/infer.py +--infer_cfg:null +--onnx_file:null +--image_file:./demo/000000014439.jpg +infer_param1:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..cf1fef2a8acc01434af814152999fc929c3d94ad --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_kl_quant --export_serving_model True -o +## +infer_mode:norm +infer_quant:False +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..e4fb006961b8e3a0bec7797b60d926848059b5f6 ---
/dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=650 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=32 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm|kl_quant +infer_quant:False|True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================disable_train_benchmark========================== +batch_size:32 +fp_items:fp32|fp16 +epoch:1 +repeat:12 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x320x320.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..c7c965396bfb06d30d91a907566d54d90d9a0c38 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=650 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=32 
+pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x320x320.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..dcd3252fc5bd294abdb9bd455a75d8339de9cd07 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:amp +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=650 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=32 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +null:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference 
+weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_pact_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..e055806a18dc3f34bc334510ad808f743ee90353 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_pact_infer_python.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=650 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=32 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:pact_train +norm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:pact +infer_quant:True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:null \ No newline at end of file diff --git 
a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_ptq_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..d01c05fbbd536647ca8b6ff087779dd86771dbc9 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolo_tiny_650e_coco_train_ptq_infer_python.txt @@ -0,0 +1,20 @@ +===========================ptq_params=========================== +model_name:ppyolo_tiny_650e_coco +python:python3.7 +filename: +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +export_param1:null +## +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..e9b4e05a4a6a308982c871504bb57a70472fda52 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_kl_quant -o +## +opencv_dir:default +infer_mode:norm +infer_quant:False +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..83aeafc04b6dd4c4cd52dfc2ba690cb631839746 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,30 @@ +===========================paddle2onnx_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +filename:null +## 
+--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export_param:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +cmd:paddle2onnx +--model_dir:null +--model_filename:model.pdmodel +--params_filename:model.pdiparams +--save_file:model.onnx +--opset_version:11 +--enable_onnx_checker:True +paddle2onnx_param1:null +infer_py:./deploy/third_engine/onnx/infer.py +--infer_cfg:null +--onnx_file:null +--image_file:./demo/000000014439.jpg +infer_param1:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..1a346009f63c60d1fbf2f1a8989fc407b42eb6f3 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_kl_quant --export_serving_model True -o +## +infer_mode:norm +infer_quant:False +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..89b0e6e2bad5d1578ddd3d3a0ee8919334b8e39c --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True
+auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=365 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=12 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm|kl_quant +infer_quant:False|True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================train_benchmark_params========================== +batch_size:12 +fp_items:fp32|fp16 +epoch:1 +repeat:5 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..5bf96efd48bb593bcb86aa9660fc841925e8ac75 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=365 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=12 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train 
+norm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..b6423b95a2c67b14165070e60386f9a66ba1fba1 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:amp +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=365 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=12 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +null:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_export:tools/export_model.py -c 
configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_pact_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..6150c3da6727b1a973dc16c58b2f46dd3658cd8b --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_pact_infer_python.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=365 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=12 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:pact_train +norm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_train:tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_export:tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:pact +infer_quant:True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:null \ No newline at end of file diff --git 
a/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_ptq_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..5a369cef7334903c1d7ea19668a12833272266e8 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco_train_ptq_infer_python.txt @@ -0,0 +1,20 @@ +===========================ptq_params=========================== +model_name:ppyolov2_r50vd_dcn_365e_coco +python:python3.7 +filename: +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +kl_quant_export:tools/post_quant.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +export_param1:null +## +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe+/ppyoloe_plus_crn_s_80e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe+/ppyoloe_plus_crn_s_80e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..62ee907afcefe8845636fdfd469a2736019ae22c --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe+/ppyoloe_plus_crn_s_80e_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:ppyoloe_plus_crn_s_80e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o +pact_train:tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml -o +## +infer_mode:norm|kl_quant +infer_quant:False|True 
+inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================disable_train_benchmark========================== +batch_size:8 +fp_items:fp32|fp16 +epoch:1 +repeat:12 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..ab954b2df207ce1b2e260fb7bd9b34663e371975 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:ppyoloe_crn_s_300e_coco_KL +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_kl_quant -o +## +opencv_dir:default +infer_mode:null +infer_quant:True +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..ca4f8c586911c219716cd9ab01da842572a16ce3 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,26 @@ +===========================serving_infer_cpp_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm --export_serving_model True -o 
+distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml --export_serving_model True -o +## +infer_mode:null +infer_quant:True +--model:null +--op:ppyoloe_crn_s_300e_coco +--port:9997 +--gpu_ids:null|0 +null:null +http_client:deploy/serving/cpp/serving_client.py +--serving_client:null +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..360182fdf429da845e1ef70a520ebaefca1ed1e5 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_KL_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml --export_serving_model True -o +## +infer_mode:null +infer_quant:True +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..ab7f395c333d4e4ca902ac9bf6946dbf644ee510 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:ppyoloe_crn_s_300e_coco_PACT +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/ppyoloe_crn_s_300e_coco_qat.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/quant/ppyoloe_l_qat.yml -o +fpgm_export:tools/export_model.py -c 
configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_kl_quant -o +## +opencv_dir:default +infer_mode:quant +infer_quant:True +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..44910e77b0f94f9c6a4a2f66e77e887eb26e8bbd --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,26 @@ +===========================serving_infer_cpp_params=========================== +model_name:ppyoloe_crn_s_300e_coco_PACT +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/ppyoloe_crn_s_300e_coco_qat.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/quant/ppyoloe_l_qat.yml --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml --export_serving_model True -o +## +infer_mode:null +infer_quant:True +--model:null +--op:ppyoloe_crn_s_300e_coco +--port:9997 +--gpu_ids:null|0 +null:null +http_client:deploy/serving/cpp/serving_client.py +--serving_client:null +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..9eaa3067571a46ff611d7fdfcfcde0c45c540cb6 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:ppyoloe_crn_s_300e_coco_PACT +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://bj.bcebos.com/v1/paddledet/data/tipc/models/ppyoloe_crn_s_300e_coco_qat.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/quant/ppyoloe_l_qat.yml --export_serving_model True -o 
+fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml --export_serving_model True -o +## +infer_mode:null +infer_quant:True +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..ed91dbb7ed6872c5373161e86caa87ed69dd9793 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_kl_quant -o +## +opencv_dir:default +infer_mode:norm +infer_quant:False +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..81c733ac03b453a7c0dc219261479d3be7db5aeb --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,30 @@ +===========================paddle2onnx_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config 
_template_fpgm -o +distill_export:null +export1:null +export_param:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml -o +## +infer_mode:norm +infer_quant:False +cmd:paddle2onnx +--model_dir:null +--model_filename:model.pdmodel +--params_filename:model.pdiparams +--save_file:model.onnx +--opset_version:11 +--enable_onnx_checker:True +paddle2onnx_param1:null +infer_py:./deploy/third_engine/onnx/infer.py +--infer_cfg:null +--onnx_file:null +--image_file:./demo/000000014439.jpg +infer_param1:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..24aaa668133d25277e06a27c597a3b7f55286573 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,26 @@ +===========================serving_infer_cpp_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml --export_serving_model True -o +## +infer_mode:norm +infer_quant:False +--model:null +--op:ppyoloe_crn_s_300e_coco +--port:9997 +--gpu_ids:null|0 +null:null +http_client:deploy/serving/cpp/serving_client.py +--serving_client:null +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..8e0283e535cb1f793eaa748873b94a3303a18eed --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact --export_serving_model True -o +fpgm_export:tools/export_model.py -c 
configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml --export_serving_model True -o +## +infer_mode:norm +infer_quant:False +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..1d9f5569c7e9122e218d87153c2bf2be251e8711 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm|kl_quant +infer_quant:False|True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================train_benchmark_params========================== +batch_size:8 +fp_items:fp32|fp16 +epoch:1 +repeat:12 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null 
+===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..37f9bc4a0709ba93065db81f5081dd8f0fc62d8f --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..f11713ba9acc3cd3870501c68ced5abcd78559a6 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== 
+model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:amp +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=32 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +null:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_pact_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..dde5cb2f775bebb836414961a892e4e130760b7e --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_pact_infer_python.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=32 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:pact_train +norm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_train:tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c 
configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o +pact_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_export:tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml -o +## +infer_mode:pact +infer_quant:True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_ptq_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..4bde5e37f54a121adea8f66e67c5602e6b43c0b0 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_ptq_infer_python.txt @@ -0,0 +1,20 @@ +===========================ptq_params=========================== +model_name:ppyoloe_crn_s_300e_coco +python:python3.7 +filename: +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +kl_quant_export:tools/post_quant.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml -o +export_param1:null +## +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/retinanet/retinanet_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/retinanet/retinanet_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..802b895b7ae6ca97331fc2d7b43c9ee55a2458b8 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/retinanet/retinanet_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:retinanet_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:null +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml 
--slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:null +norm_export:tools/export_model.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml b/PaddleDetection-release-2.6/test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..8935ce70926ef341823d4513c94c60fa4b3115fb --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../../../../configs/datasets/spine_coco.yml', + '../../../../configs/runtime.yml', + '../../../../configs/rotate/fcosr/_base_/optimizer_3x.yml', + '../../../../configs/rotate/fcosr/_base_/fcosr_reader.yml', + '../../../../configs/rotate/fcosr/_base_/fcosr_x50.yml' +] + +weights: output/fcosr_x50_3x_dota/model_final diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..30e08fce21807b8881cb10b8355f78d03f33595b --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:fcosr_x50_3x_spine_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/spine_coco/test/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml -o +pact_train:tools/train.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml -o +null:null +## 
+===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams +norm_export:tools/export_model.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml -o +pact_export:tools/export_model.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c test_tipc/configs/rotate/fcosr/fcosr_x50_3x_spine_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/spine_coco/test/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x1024x1024.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml b/PaddleDetection-release-2.6/test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..f6ac3c42ad71a7abe0d118e18fc1fe31f5473f7c --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + '../../../../configs/datasets/spine_coco.yml', + '../../../../configs/runtime.yml', + '../../../../configs/rotate/ppyoloe_r/_base_/optimizer_3x.yml', + '../../../../configs/rotate/ppyoloe_r/_base_/ppyoloe_r_reader.yml', + '../../../../configs/rotate/ppyoloe_r/_base_/ppyoloe_r_crn.yml' +] + +log_iter: 50 +snapshot_epoch: 1 +weights: output/ppyoloe_r_crn_s_3x_dota/model_final + +depth_mult: 0.33 +width_mult: 0.50 diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..92d2d76f89dd27bf004d62a25a4093e13f0b7f78 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:ppyoloe_r_crn_s_3x_spine_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/spine_coco/test/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml -o +pact_train:tools/train.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## 
+===========================eval_params=========================== +eval:tools/eval.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams +norm_export:tools/export_model.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml -o +pact_export:tools/export_model.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c test_tipc/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_spine_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/spine_coco/test/ +--save_log_path:null +--run_benchmark:False +null:null +===========================disable_train_benchmark========================== +batch_size:2 +fp_items:fp32 +epoch:5 +repeat:12 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x1024x1024.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_1x_spine_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_1x_spine_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..4e8678ed4558b619a12b95112c1a90b514a096d3 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_1x_spine_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:s2anet_1x_spine +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/spine_coco/test/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o +pact_train:tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams +norm_export:tools/export_model.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o +pact_export:tools/export_model.py -c configs/rotate/s2anet/s2anet_1x_spine.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/rotate/s2anet/s2anet_1x_spine.yml 
--slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/rotate/s2anet/s2anet_1x_spine.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/spine_coco/test/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x1024x1024.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml new file mode 100644 index 0000000000000000000000000000000000000000..07a91225ebd9f7663ed4e301ad15b95bd2003ad2 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml @@ -0,0 +1,26 @@ +_BASE_: [ + '../../../configs/datasets/spine_coco.yml', + '../../../configs/runtime.yml', + '../../../configs/rotate/s2anet/_base_/s2anet_optimizer_2x.yml', + '../../../configs/rotate/s2anet/_base_/s2anet.yml', + '../../../configs/rotate/s2anet/_base_/s2anet_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams + +weights: output/s2anet_alignconv_2x_dota/model_final + +S2ANetHead: + anchor_strides: [8, 16, 32, 64, 128] + anchor_scales: [4] + anchor_ratios: [1.0] + anchor_assign: RBoxAssigner + stacked_convs: 2 + feat_in: 256 + feat_out: 256 + num_classes: 9 + align_conv_type: 'AlignConv' # AlignConv Conv + align_conv_size: 3 + use_sigmoid_cls: True + reg_loss_weight: [1.0, 1.0, 1.0, 1.0, 1.1] + cls_loss_weight: [1.1, 1.05] + reg_loss_type: 'l1' diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_alignconv_2x_spine_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_alignconv_2x_spine_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..941e37bc6d169039a18d78ba47645e37aad1cf0d --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_alignconv_2x_spine_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:s2anet_alignconv_2x_spine +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=24 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/spine_coco/test/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml -o +pact_train:tools/train.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml -o +null:null +## +===========================infer_params=========================== 
+--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams +norm_export:tools/export_model.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml -o +pact_export:tools/export_model.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c test_tipc/configs/rotate/s2anet_alignconv_2x_spine.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/spine_coco/test/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x1024x1024.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_conv_2x_spine.yml b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_conv_2x_spine.yml new file mode 100644 index 0000000000000000000000000000000000000000..23610b08ab9782f741634597eff28575d3e5dafa --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_conv_2x_spine.yml @@ -0,0 +1,31 @@ +_BASE_: [ + '../../../configs/datasets/spine_coco.yml', + '../../../configs/runtime.yml', + '../../../configs/rotate/s2anet/_base_/s2anet_optimizer_2x.yml', + '../../../configs/rotate/s2anet/_base_/s2anet.yml', + '../../../configs/rotate/s2anet/_base_/s2anet_reader.yml', +] +weights: output/s2anet_conv_1x_dota/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +ResNet: + depth: 50 + variant: b + norm_type: bn + return_idx: [1,2,3] + num_stages: 4 + +S2ANetHead: + anchor_strides: [8, 16, 32, 64, 128] + anchor_scales: [4] + anchor_ratios: [1.0] + anchor_assign: RBoxAssigner + stacked_convs: 2 + feat_in: 256 + feat_out: 256 + num_classes: 9 + align_conv_type: 'Conv' # AlignConv Conv + align_conv_size: 3 + use_sigmoid_cls: True + reg_loss_weight: [1.0, 1.0, 1.0, 1.0, 1.1] + cls_loss_weight: [1.1, 1.05] diff --git a/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_conv_2x_spine_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_conv_2x_spine_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..35497303f02e8f4f54baf46c11d8162800b0f75c --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/rotate/s2anet_conv_2x_spine_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:s2anet_conv_2x_spine +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=24 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=1 +pretrain_weights:https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/spine_coco/test/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml -o +pact_train:tools/train.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml 
--slim_config _template_pact -o +fpgm_train:tools/train.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams +norm_export:tools/export_model.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml -o +pact_export:tools/export_model.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c test_tipc/configs/rotate/s2anet_conv_2x_spine.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/spine_coco/test/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x1024x1024.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..ae1081f11936e75e7fb975ff90ed45c24acb4835 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:ppyoloe_plus_sod_crn_l_80e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml -o +pact_train:tools/train.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_coco.pdparams +norm_export:tools/export_model.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml -o +pact_export:tools/export_model.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c 
configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/smalldet/ppyoloe_plus_sod_crn_l_80e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml -o +## +infer_mode:norm|kl_quant +infer_quant:False|True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================disable_train_benchmark========================== +batch_size:8 +fp_items:fp32|fp16 +epoch:1 +repeat:12 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/solov2/solov2_r50_enhance_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/solov2/solov2_r50_enhance_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..b0ba7cfd4ae6873bd92d59d9b2c22772b6b513d0 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/solov2/solov2_r50_enhance_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:solov2_r50_enhance_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/solov2_r50_enhance_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/solov2/solov2_r50_enhance_coco.yml -o +pact_train:tools/train.py -c configs/solov2/solov2_r50_enhance_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/solov2/solov2_r50_enhance_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/solov2/solov2_r50_enhance_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/solov2_r50_enhance_coco.pdparams +norm_export:tools/export_model.py -c configs/solov2/solov2_r50_enhance_coco.yml -o +pact_export:tools/export_model.py -c configs/solov2/solov2_r50_enhance_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/solov2/solov2_r50_enhance_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/solov2/solov2_r50_enhance_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null 
+===========================infer_benchmark_params=========================== +numpy_infer_input:3x512x864.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/solov2/solov2_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/solov2/solov2_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..8d81e6f5a8c863c91d357016ecb402bb88592593 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/solov2/solov2_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:solov2_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================train_benchmark_params========================== +batch_size:2|4 +fp_items:fp32|fp16 +epoch:1 +repeat:2 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..e6fb799ea9eda560f33ac108b0ce2c6a1658f431 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco_train_infer_python.txt @@ -0,0 +1,51 @@ 
+===========================train_params=========================== +model_name:sparse_rcnn_r50_fpn_3x_pro100_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=36 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://paddledet.bj.bcebos.com/models/sparse_rcnn_r50_fpn_3x_pro100_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml -o +pact_train:tools/train.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/sparse_rcnn_r50_fpn_3x_pro100_coco.pdparams +norm_export:tools/export_model.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml -o +pact_export:tools/export_model.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/sparse_rcnn/sparse_rcnn_r50_fpn_3x_pro100_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ssd/ssdlite_mobilenet_v1_300_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ssd/ssdlite_mobilenet_v1_300_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..da3eec79f52a214d948fd7a6c7d315cccd04ad55 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ssd/ssdlite_mobilenet_v1_300_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:ssdlite_mobilenet_v1_300_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=1700 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=64 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ssdlite_mobilenet_v1_300_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml -o +pact_train:tools/train.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml 
--slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ssdlite_mobilenet_v1_300_coco.pdparams +norm_export:tools/export_model.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml -o +pact_export:tools/export_model.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ssd/ssdlite_mobilenet_v1_300_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x300x300.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/tood/tood_r50_fpn_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/tood/tood_r50_fpn_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..909d805067938366dc15882f2a94967436594a13 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/tood/tood_r50_fpn_1x_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:tood_r50_fpn_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=4 +pretrain_weights:https://paddledet.bj.bcebos.com/models/tood_r50_fpn_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/tood/tood_r50_fpn_1x_coco.yml -o +pact_train:tools/train.py -c configs/tood/tood_r50_fpn_1x_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_train:tools/train.py -c configs/tood/tood_r50_fpn_1x_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/tood/tood_r50_fpn_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/tood_r50_fpn_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/tood/tood_r50_fpn_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/tood/tood_r50_fpn_1x_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/tood/tood_r50_fpn_1x_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c 
configs/tood/tood_r50_fpn_1x_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x800x1344.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/ttfnet/ttfnet_darknet53_1x_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/ttfnet/ttfnet_darknet53_1x_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..fbb75cba36c87f5b4d5d29f5afdcfab9f803325b --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/ttfnet/ttfnet_darknet53_1x_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:ttfnet_darknet53_1x_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=12 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=12 +pretrain_weights:https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml -o +pact_train:tools/train.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams +norm_export:tools/export_model.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml -o +pact_export:tools/export_model.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/ttfnet/ttfnet_darknet53_1x_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x512x512.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..f68826117413a4db828bd6cf9e5ff2ff11687221 --- /dev/null +++ 
b/PaddleDetection-release-2.6/test_tipc/configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:ppyoloe_vit_base_csppan_cae_36e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=1|lite_train_whole_infer=2|whole_train_whole_infer=2 +pretrain_weights:https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_base_cae_pretrained.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml -o +pact_train:tools/train.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_base_cae_pretrained.pdparams +norm_export:tools/export_model.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml -o +pact_export:tools/export_model.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/vitdet/ppyoloe_vit_base_csppan_cae_36e_coco.yml --slim_config configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml -o +## +infer_mode:norm|kl_quant +infer_quant:False|True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 +===========================disable_train_benchmark========================== +batch_size:2 +fp_items:fp32|fp16 +epoch:1 +repeat:2 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_FPGM_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_FPGM_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..e56c1540becf395de5dda756b77736f09d806142 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_FPGM_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:yolov3_darknet53_270e_coco_FPGM +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=270 +save_dir:null 
+TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:fpgm_train +norm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:fpgm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x608x608.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..84933390d61eb23ad4618350d84a0123e4307741 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:yolov3_darknet53_270e_coco_PACT +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +quant_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +opencv_dir:default 
+infer_mode:quant +infer_quant:True +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..0e2a14106a3ea909d9966a5e13f5fd81bd23b8cd --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,26 @@ +===========================serving_infer_cpp_params=========================== +model_name:yolov3_darknet53_270e_coco_PACT +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml --export_serving_model True -o +## +infer_mode:quant +infer_quant:True +--model:null +--op:yolov3_darknet53_270e_coco +--port:9997 +--gpu_ids:null|0 +null:null +http_client:deploy/serving/cpp/serving_client.py +--serving_client:null +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..5eb5db71f0c9ad05ff3b948a88d4e1e909794129 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_PACT_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:yolov3_darknet53_270e_coco_PACT +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml --export_serving_model True -o +distill_export:null +export1:null +export2:null 
+kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml --export_serving_model True -o +## +infer_mode:quant +infer_quant:True +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..195cfd4f2c54499a4988f2744162b9a27e1944ed --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,29 @@ +===========================cpp_infer_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +quant_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +opencv_dir:default +infer_mode:norm +infer_quant:False +inference:./deploy/cpp/build/main +--device:gpu|cpu +--use_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..ad693d77cfe9c9ed835481abcaef8a9972b04b15 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,30 @@ +===========================paddle2onnx_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +quant_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config 
configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export_param:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +cmd:paddle2onnx +--model_dir:null +--model_filename:model.pdmodel +--params_filename:model.pdiparams +--save_file:model.onnx +--opset_version:11 +--enable_onnx_checker:True +paddle2onnx_param1:null +infer_py:./deploy/third_engine/onnx/infer.py +--infer_cfg:null +--onnx_file:null +--image_file:./demo/000000014439.jpg +infer_param1:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..121deb17319824adc1420c727280315b9319d52d --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,26 @@ +===========================serving_infer_cpp_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml --export_serving_model True -o +## +infer_mode:norm +infer_quant:False +--model:null +--op:yolov3_darknet53_270e_coco +--port:9997 +--gpu_ids:null|0 +null:null +http_client:deploy/serving/cpp/serving_client.py +--serving_client:null +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..70852a112ebd0b2c6c529c8366ee73b98b80c560 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,24 @@ +===========================serving_infer_python_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +filename:null +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --export_serving_model True -o +quant_export:tools/export_model.py -c 
configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml --export_serving_model True -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml --export_serving_model True -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml --export_serving_model True -o +## +infer_mode:norm +infer_quant:False +web_service:deploy/serving/python/web_service.py --config=deploy/serving/python/config.yml +--model_dir:null +--opt:cpu:op.ppdet.local_service_conf.device_type=0|gpu:op.ppdet.local_service_conf.device_type=1 +null:null +http_client:deploy/serving/python/pipeline_http_client.py +--image_file:./demo/000000014439.jpg +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..7c0b3aa5b8e6dc34f4980fd9bea8d3b59ef82bf6 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt @@ -0,0 +1,62 @@ +===========================train_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=270 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o worker_num=4 +pact_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: 
+--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================train_benchmark_params========================== +batch_size:8 +fp_items:fp32|fp16 +epoch:1 +repeat:3 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x608x608.npy +===========================to_static_train_benchmark_params=========================== +to_static_train:--to_static \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..d79ac7dee17fe9261c69df787da1e3cde766f33d --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=270 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null \ No newline at end of file diff --git 
a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..7987def921146c15ef04ba8d781465d479e3a9c1 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:amp +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=270 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +null:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:1600 \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_pact_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_pact_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..2d3e8e2b625cf4c343771422fa0bee121e4f96ea --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_pact_infer_python.txt @@ -0,0 +1,51 @@ +===========================train_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=270 
+save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:pact_train +norm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_train:tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o +pact_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/quant/yolov3_mobilenet_v3_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:pact +infer_quant:True +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +--trt_max_shape:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_ptq_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_ptq_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..49383b726fd1e8fa118cce5fb068b5feabe63f1d --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_ptq_infer_python.txt @@ -0,0 +1,20 @@ +===========================ptq_params=========================== +model_name:yolov3_darknet53_270e_coco +python:python3.7 +filename: +## +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +export_param1:null +## +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--run_benchmark:False +null:null \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_mobilenet_v1_270e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_mobilenet_v1_270e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..61eac65af1924444fcc8532a56d67ac017d23131 --- /dev/null +++ 
b/PaddleDetection-release-2.6/test_tipc/configs/yolov3/yolov3_mobilenet_v1_270e_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:yolov3_mobilenet_v1_270e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=270 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml -o +pact_train:tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --slim_config _template_pact -o +fpgm_train:tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --slim_config _template_fpgm -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml -o +pact_export:tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --slim_config _template_pact -o +fpgm_export:tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --slim_config _template_fpgm -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --slim_config _template_kl_quant -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x608x608.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/configs/yolox/yolox_s_300e_coco_train_infer_python.txt b/PaddleDetection-release-2.6/test_tipc/configs/yolox/yolox_s_300e_coco_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..fa1bf1d693ede67d5e997a0bea9a8592f3df0313 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/configs/yolox/yolox_s_300e_coco_train_infer_python.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:yolox_s_300e_coco +python:python3.7 +gpu_list:0|0,1 +use_gpu:True +auto_cast:null +epoch:lite_train_lite_infer=1|lite_train_whole_infer=1|whole_train_whole_infer=300 +save_dir:null +TrainReader.batch_size:lite_train_lite_infer=2|lite_train_whole_infer=2|whole_train_whole_infer=8 +pretrain_weights:https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams +trained_model_name:model_final.pdparams +train_infer_img_dir:./dataset/coco/test2017/ +filename:null +## +trainer:norm_train +norm_train:tools/train.py -c configs/yolox/yolox_s_300e_coco.yml -o +pact_train:tools/train.py -c configs/yolox/yolox_s_300e_coco.yml --slim_config 
configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_train:tools/train.py -c configs/yolox/yolox_s_300e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c configs/yolox/yolox_s_300e_coco.yml -o +null:null +## +===========================infer_params=========================== +--output_dir:./output_inference +weights:https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams +norm_export:tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o +pact_export:tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml --slim_config configs/slim/quant/yolov3_darknet_qat.yml -o +fpgm_export:tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml --slim_config configs/slim/prune/yolov3_darknet_prune_fpgm.yml -o +distill_export:null +export1:null +export2:null +kl_quant_export:tools/post_quant.py -c configs/yolox/yolox_s_300e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o +## +infer_mode:norm +infer_quant:False +inference:./deploy/python/infer.py +--device:gpu|cpu +--enable_mkldnn:False +--cpu_threads:4 +--batch_size:1|2 +--use_tensorrt:null +--run_mode:paddle +--model_dir: +--image_dir:./dataset/coco/test2017/ +--save_log_path:null +--run_benchmark:False +null:null +===========================infer_benchmark_params=========================== +numpy_infer_input:3x640x640.npy \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/docs/benchmark_train.md b/PaddleDetection-release-2.6/test_tipc/docs/benchmark_train.md new file mode 100644 index 0000000000000000000000000000000000000000..82c3b07abe9deb621c7a36c459785f68db1b984e --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/docs/benchmark_train.md @@ -0,0 +1,50 @@
+# TIPC Linux Benchmark Test Guide
+
+This document describes the benchmark tests. The main program of the benchmark test is `benchmark_train.sh`; it is used to verify and monitor the training performance of the models.
+
+## 1. Test Workflow
+### 1.1 Prepare the Data and Install the Environment
+Run `test_tipc/prepare.sh` to prepare the training data and install the environment:
+
+```shell
+# Usage: bash test_tipc/prepare.sh train_benchmark.txt mode
+bash test_tipc/prepare.sh test_tipc/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco_train_infer_python.txt benchmark_train
+```
+
+### 1.2 Functional Test
+Run `test_tipc/benchmark_train.sh` to train the model and parse the logs:
+
+```shell
+# Usage: bash test_tipc/benchmark_train.sh train_benchmark.txt mode
+bash test_tipc/benchmark_train.sh test_tipc/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco_train_infer_python.txt benchmark_train
+```
+
+`test_tipc/benchmark_train.sh` also accepts a third argument to run only a single training configuration, as follows:
+```shell
+# Usage: bash test_tipc/benchmark_train.sh train_benchmark.txt mode config_tag
+bash test_tipc/benchmark_train.sh test_tipc/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco_train_infer_python.txt benchmark_train dynamic_bs2_fp32_DP_N1C1
+```
+`dynamic_bs2_fp32_DP_N1C1` is the configuration tag passed to `test_tipc/benchmark_train.sh`, in the format
+`${modeltype}_${batch_size}_${fp_item}_${run_mode}_${device_num}`
+It encodes the model type, the batch size, the training precision (fp32, fp16, etc.), the distributed run mode, and the machine topology used for distributed training, e.g. a single machine with a single GPU (N1C1).
+
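+For illustration, a minimal shell sketch (not part of the TIPC scripts; the variable names are purely illustrative) that splits such a tag back into the fields described above:
+
+```shell
+# Hypothetical helper: decompose a benchmark configuration tag into its fields.
+tag="dynamic_bs2_fp32_DP_N1C1"
+IFS='_' read -r modeltype batch_size fp_item run_mode device_num <<< "${tag}"
+echo "modeltype=${modeltype} batch_size=${batch_size} fp_item=${fp_item} run_mode=${run_mode} device_num=${device_num}"
+# -> modeltype=dynamic batch_size=bs2 fp_item=fp32 run_mode=DP device_num=N1C1
+```
+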
+## 2. Log Output
+
+After the run, the training logs and the parsed logs of the model are saved. With the parameter file `test_tipc/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco_train_infer_python.txt`, the parsed training-log result is:
+
+```
+{"model_branch": "tipc_fuse_benchmark", "model_commit": "4cce901d231f7954468045cf96302505bd6be495", "model_name": "faster_rcnn_r50_fpn_1x_coco_bs2_fp32_SingleP_DP", "batch_size": 2, "fp_item": "fp32", "run_process_type": "SingleP", "run_mode": "DP", "convergence_value": "0.556966", "convergence_key": "loss:", "ips": 4.857, "speed_unit": "images/s", "device_num": "N1C1", "model_run_time": "590", "frame_commit": "6b0c57cf65945e97d87a8fba89c0a2fc18dd8544", "frame_version": "0.0.0"}
+```
+
+The training logs and the parsed results are saved under the benchmark_log directory, organized as follows:
+```
+train_log/
+├── index
+│   └── PaddleDetection_faster_rcnn_r50_fpn_1x_coco_bs2_fp32_SingleP_DP_N1C1_speed
+├── profiling_log
+│   └── PaddleDetection_faster_rcnn_r50_fpn_1x_coco_bs2_fp32_SingleP_DP_N1C1_profiling
+└── train_log
+    ├── PaddleDetection_faster_rcnn_r50_fpn_1x_coco_bs2_fp32_SingleP_DP_N1C1_log
+    └── PaddleDetection_faster_rcnn_r50_fpn_1x_coco_bs2_fp32_MultiP_DP_N1C4_log
+```
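+
+The key metrics can be pulled out of such a record with a few lines of Python. A minimal sketch, assuming the `index` file holds exactly the single JSON line shown above (the path is only an example):
+
+```shell
+# Read one parsed-log record and print its throughput and convergence metric.
+python3.7 -c "
+import json
+path = 'train_log/index/PaddleDetection_faster_rcnn_r50_fpn_1x_coco_bs2_fp32_SingleP_DP_N1C1_speed'
+rec = json.loads(open(path).read())
+print(rec['model_name'], rec['ips'], rec['speed_unit'], rec['convergence_key'], rec['convergence_value'])
+"
+```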
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/guide.png b/PaddleDetection-release-2.6/test_tipc/docs/guide.png new file mode 100644 index 0000000000000000000000000000000000000000..319ac819daff38ed77e84cdff2b122e8bc4a8e5f Binary files /dev/null and b/PaddleDetection-release-2.6/test_tipc/docs/guide.png differ diff --git a/PaddleDetection-release-2.6/test_tipc/docs/install.md b/PaddleDetection-release-2.6/test_tipc/docs/install.md new file mode 100644 index 0000000000000000000000000000000000000000..eaac6908dbb9f5c6f394644da438c0a4f8ca60f3 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/docs/install.md @@ -0,0 +1,149 @@
+## 1. Environment Preparation
+
+This tutorial covers setting up the runtime environment for the basic functional tests under the test_tipc directory.
+
+Recommended environment:
+- CUDA 10.1/10.2
+- cuDNN 7.6/8.1
+- TensorRT 6.1.0.5 / 7.1 / 7.2
+
+The environment can be set up either with a docker image or locally with Python. The docker image is recommended, since it avoids unnecessary environment configuration.
+
+## 2. Docker Image Installation
+
+Installation via docker image is recommended. Create the container with the following commands; the current directory is mapped to `/paddle` inside the image:
+```
+# Start the docker image
+nvidia-docker run --name paddle -it -v $PWD:/paddle paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc82-dev /bin/bash
+cd /paddle
+```
+
+```
+# Build and install Paddle from source
+git clone https://github.com/PaddlePaddle/Paddle.git
+cd Paddle
+mkdir build && cd build
+cmake .. \
+    -DWITH_MKL=ON \
+    -DWITH_MKLDNN=ON \
+    -DWITH_GPU=ON \
+    -DWITH_DISTRIBUTE=ON \
+    -DCMAKE_BUILD_TYPE=Release \
+    -DCUDA_ARCH_NAME=Auto \
+    -DPY_VERSION=3.7 \
+    -DON_INFER=ON \
+    -DWITH_TENSORRT=ON \
+    -DTENSORRT_ROOT=/usr/local/TensorRT6-cuda10.1-cudnn7
+make -j 20
+pip3.7 install python/dist/paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
+cd ../../
+```
+or
+```
+# Download and install Paddle-2.2
+wget https://paddle-inference-lib.bj.bcebos.com/2.2.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.2.0.post101-cp37-cp37m-linux_x86_64.whl
+pip3.7 install paddlepaddle_gpu-2.2.0.post101-cp37-cp37m-linux_x86_64.whl
+# Download the C++ inference library for C++ inference
+wget https://paddle-inference-lib.bj.bcebos.com/2.2.0/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddle_inference.tgz
+tar -xvf paddle_inference.tgz
+export PADDLE_DIR=/paddle/paddle_inference
+```
+
+## 3. Building a Python Environment Locally
+
+If you have already set up the environment with docker, skip this section. Outside docker, the configuration is more flexible; recommended combinations are:
+- CUDA10.1 + cuDNN7.6 + TensorRT 6
+- CUDA10.2 + cuDNN8.1 + TensorRT 7
+- CUDA11.1 + cuDNN8.1 + TensorRT 7
+
+The following walks through the setup using CUDA10.2 + cuDNN8.1 + TensorRT 7 as an example.
+
+### 3.1 Installing cuDNN
+
+If the current environment already satisfies the cuDNN version requirement, skip this step.
+
+Taking cuDNN 8.1 as an example: download the cuDNN 8.1 release from the [Nvidia website](https://developer.nvidia.com/rdp/cudnn-archive), picking the three deb files matching your system version:
+- cuDNN Runtime Library, e.g. libcudnn8_8.1.0.77-1+cuda10.2_amd64.deb
+- cuDNN Developer Library, e.g. libcudnn8-dev_8.1.0.77-1+cuda10.2_amd64.deb
+- cuDNN Code Samples, e.g. libcudnn8-samples_8.1.0.77-1+cuda10.2_amd64.deb
+
+The deb installation is described in the [official documentation](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-deb); the steps are:
+```
+# x.x.x is the downloaded version number
+# $HOME is the working directory
+sudo dpkg -i libcudnn8_x.x.x-1+cudax.x_arm64.deb
+sudo dpkg -i libcudnn8-dev_8.x.x.x-1+cudax.x_arm64.deb
+sudo dpkg -i libcudnn8-samples_8.x.x.x-1+cudax.x_arm64.deb
+
+# Verify the installation
+cp -r /usr/src/cudnn_samples_v8/ $HOME
+cd $HOME/cudnn_samples_v8/mnistCUDNN
+
+# Build
+make clean && make
+./mnistCUDNN
+```
+If mnistCUDNN runs through and reports success, the installation is correct. If it fails with a freeimage-related error, install the freeimage library as prompted:
+```
+sudo apt-get install libfreeimage-dev
+sudo apt-get install libfreeimage
+```
+
+### 3.2 Installing TensorRT
+
+First, download TensorRT from the [Nvidia TensorRT page](https://developer.nvidia.com/tensorrt-getting-started); version 7.1.3.4 is used here. Make sure to pick the TensorRT matching your system and CUDA version, and prefer the TAR package.
+
+Taking Ubuntu16.04 + CUDA10.2 as an example, download and extract the package, then follow the installation steps of the [official documentation](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-713/install-guide/index.html#installing-tar):
+```
+# In the commands below, '${version}' is the downloaded TensorRT version, e.g. 7.1.3.4
+# Set the environment variable to the lib directory of the extracted TensorRT
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path/to/TensorRT-${version}/lib>
+
+# Install TensorRT
+cd TensorRT-${version}/python
+pip3.7 install tensorrt-*-cp3x-none-linux_x86_64.whl
+
+# Install graphsurgeon (the wheel ships inside the TAR package)
+cd TensorRT-${version}/graphsurgeon
+pip3.7 install graphsurgeon-*-py2.py3-none-any.whl
+```
+
+### 3.3 Installing PaddlePaddle
+
+Download a Paddle wheel built with TensorRT support; the TensorRT version of the wheel must match the locally installed TensorRT. From the download [link](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python), choose the linux-cuda10.2-trt7-gcc8.2 Python3.7 build of Paddle:
+```
+# As the download link shows, this is the paddle2.1.1-cuda10.2-cudnn8.1 build
+wget https://paddle-wheel.bj.bcebos.com/with-trt/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddlepaddle_gpu-2.1.1-cp37-cp37m-linux_x86_64.whl
+pip3.7 install -U paddlepaddle_gpu-2.1.1-cp37-cp37m-linux_x86_64.whl
+```
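+
+As a quick sanity check of the installation (a minimal sketch; `paddle.utils.run_check()` is PaddlePaddle's built-in installation check):
+
+```shell
+# Verify that the freshly installed wheel can run a small program on this machine.
+python3.7 -c "import paddle; print(paddle.__version__); paddle.utils.run_check()"
+```
+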
+## 4. 安装PaddleDetection依赖
+```
+# 安装AutoLog
+git clone https://github.com/LDOUBLEV/AutoLog
+cd AutoLog
+pip3.7 install -r requirements.txt
+python3.7 setup.py bdist_wheel
+pip3.7 install ./dist/auto_log-1.0.0-py3-none-any.whl
+
+# 下载PaddleDetection代码
+cd ../
+git clone https://github.com/PaddlePaddle/PaddleDetection
+```
+
+安装PaddleDetection依赖:
+```
+cd PaddleDetection
+pip3.7 install -r ./requirements.txt
+```
+
+## FAQ
+Q. You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found. Ignore this if TensorRT is not needed.
+
+A. 这一问题通常是当前安装的paddle版本带TRT,但本地环境找不到TensorRT的预测库。需要下载TensorRT库,解压后设置环境变量LD_LIBRARY_PATH,如:
+```
+export LD_LIBRARY_PATH=/usr/local/python3.7.0/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/paddle/package/TensorRT-6.0.1.5/lib
+```
+也可能是下载的TensorRT版本和当前paddle中编译的TRT版本不匹配,需要下载版本相符的TensorRT重新安装。
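+
+如怀疑是TRT版本不匹配,且所装Paddle版本提供了相应接口,可尝试对比Paddle编译期与运行期的TensorRT版本(示意命令,该接口在个别版本中可能不可用):
+
+```
+python3.7 -c "import paddle.inference as pi; print(pi.get_trt_compile_version(), pi.get_trt_runtime_version())"
+```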
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/more_models.md b/PaddleDetection-release-2.6/test_tipc/docs/more_models.md
new file mode 100644
index 0000000000000000000000000000000000000000..60cada7027b457e78a37e44d79a15fae6e65595c
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/docs/more_models.md
@@ -0,0 +1,73 @@
+## 汇总信息
+
+已填写的部分表示可以使用本工具进行一键测试,未填写的表示正在支持中。
+
+**字段说明:**
+- 基础训练预测:包括模型训练、Paddle Inference Python预测。
+- 更多训练方式:包括多机多卡、混合精度。
+- 模型压缩:包括裁剪、离线/在线量化、蒸馏。
+- 其他预测部署:包括Paddle Inference C++预测、Paddle Serving部署、Paddle-Lite部署等。
+
+| 算法论文 | 模型名称 | 模型类型 | 基础<br>训练预测 | 更多<br>训练方式 | 模型压缩 | 其他预测部署 |
+| :--- | :---- | :----: | :--------: | :---- | :---- | :---- |
+| [YOLOv3](https://arxiv.org/abs/1804.02767) | [yolov3_darknet53_270e_coco](../../configs/yolov3/yolov3_darknet53_270e_coco.yml) | 目标检测 | 支持 | 混合精度 | FPGM裁剪<br>PACT量化<br>离线量化 | Paddle Inference: C++ |
+| YOLOv3 | [yolov3_mobilenet_v1_270e_coco](../../configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| YOLOv3 | [yolov3_mobilenet_v3_large_270e_coco](../../configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| YOLOv3 | [yolov3_r34_270e_coco](../../configs/yolov3/yolov3_r34_270e_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| YOLOv3 | [yolov3_r50vd_dcn_270e_coco](../../configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [PPYOLO](https://arxiv.org/abs/2007.12099) | [ppyolo_mbv3_large_coco](../../configs/ppyolo/ppyolo_mbv3_large_coco.yml) | 目标检测 | 支持 | 混合精度 | FPGM裁剪<br>PACT量化<br>离线量化 | Paddle Inference: C++ |
+| PPYOLO | [ppyolo_r50vd_dcn_1x_coco](../../configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | FPGM裁剪<br>PACT量化<br>离线量化 | Paddle Inference: C++ |
+| PPYOLO | [ppyolo_mbv3_small_coco](../../configs/ppyolo/ppyolo_mbv3_small_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PPYOLO | [ppyolo_r18vd_coco](../../configs/ppyolo/ppyolo_r18vd_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PPYOLO-tiny | [ppyolo_tiny_650e_coco](../../configs/ppyolo/ppyolo_tiny_650e_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [PPYOLOv2](https://arxiv.org/abs/2104.10419) | [ppyolov2_r50vd_dcn_365e_coco](../../configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | 目标检测 | 支持 | 多机多卡<br>混合精度 | | Paddle Inference: C++ |
+| PPYOLOv2 | [ppyolov2_r50vd_dcn_365e_coco](../../configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PPYOLOv2 | [ppyolov2_r101vd_dcn_365e_coco](../../configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [PP-PicoDet](https://arxiv.org/abs/2111.00902) | picodet_s_320_coco | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PP-PicoDet | picodet_m_416_coco | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PP-PicoDet | picodet_l_640_coco | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PP-PicoDet | picodet_lcnet_1_5x_416_coco | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PP-PicoDet | picodet_mobilenetv3_large_1x_416_coco | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PP-PicoDet | picodet_r18_640_coco | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PP-PicoDet | picodet_shufflenetv2_1x_416_coco | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [SSD](https://arxiv.org/abs/1512.02325) | [ssdlite_mobilenet_v1_300_coco](../../configs/ssd/ssdlite_mobilenet_v1_300_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [Faster R-CNN](https://arxiv.org/abs/1506.01497) | [faster_rcnn_r50_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r34_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_r34_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r34_vd_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_r34_vd_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r50_1x_coco](../../configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r50_vd_1x_coco](../../configs/faster_rcnn/faster_rcnn_r50_vd_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r50_vd_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_r50_vd_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r101_1x_coco](../../configs/faster_rcnn/faster_rcnn_r101_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r101_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_r101_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_r101_vd_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_r101_vd_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_x101_vd_64x4d_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_x101_vd_64x4d_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Faster R-CNN | [faster_rcnn_swin_tiny_fpn_1x_coco](../../configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [Cascade Faster R-CNN](https://arxiv.org/abs/1712.00726) | [cascade_rcnn_r50_fpn_1x_coco](../../configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Cascade Faster R-CNN | [cascade_rcnn_r50_vd_fpn_ssld_1x_coco](../../configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [FCOS](https://arxiv.org/abs/1904.01355) | [fcos_r50_fpn_1x_coco](../../configs/fcos/fcos_r50_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| FCOS | [fcos_dcn_r50_fpn_1x_coco](../../configs/fcos/fcos_dcn_r50_fpn_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [TTFNet](https://arxiv.org/abs/1909.00700) | [ttfnet_darknet53_1x_coco](../../configs/ttfnet/ttfnet_darknet53_1x_coco.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [S2ANet](https://arxiv.org/abs/2008.09397) | [s2anet_conv_2x_dota](../../configs/dota/s2anet_conv_2x_dota.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| S2ANet | [s2anet_1x_spine](../../configs/dota/s2anet_1x_spine.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| S2ANet | [s2anet_alignconv_2x_dota](../../configs/dota/s2anet_alignconv_2x_dota.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [BlazeFace](https://arxiv.org/abs/1907.05047) | [blazeface_1000e](../../configs/face_detection/blazeface_1000e.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| BlazeFace | [blazeface_fpn_ssh_1000e](../../configs/face_detection/blazeface_fpn_ssh_1000e.yml) | 目标检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [Mask R-CNN](https://arxiv.org/abs/1703.06870) | [mask_rcnn_r50_fpn_1x_coco](../../configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Mask R-CNN | [mask_rcnn_r50_1x_coco](../../configs/mask_rcnn/mask_rcnn_r50_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Mask R-CNN | [mask_rcnn_r50_vd_fpn_1x_coco](../../configs/mask_rcnn/mask_rcnn_r50_vd_fpn_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Mask R-CNN | [mask_rcnn_r101_fpn_1x_coco](../../configs/mask_rcnn/mask_rcnn_r101_fpn_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Mask R-CNN | [mask_rcnn_r101_vd_fpn_1x_coco](../../configs/mask_rcnn/mask_rcnn_r101_vd_fpn_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Mask R-CNN | [mask_rcnn_x101_vd_64x4d_fpn_1x_coco](../../configs/mask_rcnn/mask_rcnn_x101_vd_64x4d_fpn_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [Cascade Mask R-CNN](https://arxiv.org/abs/1906.09756) | [cascade_mask_rcnn_r50_fpn_1x_coco](../../configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| Cascade Mask R-CNN | [cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco](../../configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [SOLOv2](https://arxiv.org/abs/2003.10152) | [solov2_r50_fpn_1x_coco](../../configs/solov2/solov2_r50_fpn_1x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| SOLOv2 | [solov2_r50_enhance_coco](../../configs/solov2/solov2_r50_enhance_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| SOLOv2 | [solov2_r101_vd_fpn_3x_coco](../../configs/solov2/solov2_r101_vd_fpn_3x_coco.yml) | 实例分割 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| PP-TinyPose | [tinypose_128x96](../../configs/keypoint/tiny_pose/tinypose_128x96.yml) | 关键点检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [HRNet](https://arxiv.org/abs/1902.09212) | [hrnet_w32_256x192](../../configs/keypoint/hrnet/hrnet_w32_256x192.yml) | 关键点检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| HRNet | [dark_hrnet_w32_256x192](../../configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) | 关键点检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| HRNet | [dark_hrnet_w48_256x192](../../configs/keypoint/hrnet/dark_hrnet_w48_256x192.yml) | 关键点检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [HigherHRNet](https://arxiv.org/abs/1908.10357) | [higherhrnet_hrnet_w32_512](../../configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml) | 关键点检测 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [FairMot](https://arxiv.org/abs/2004.01888) | [fairmot_dla34_30e_576x320](../../configs/mot/fairmot/fairmot_dla34_30e_576x320.yml) | 目标跟踪 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| FairMot | [fairmot_hrnetv2_w18_dlafpn_30e_576x320](../../configs/mot/fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) | 目标跟踪 | 支持 | 混合精度 | | Paddle Inference: C++ |
+| [JDE](https://arxiv.org/abs/1909.12605) | [jde_darknet53_30e_576x320](../../configs/mot/jde/jde_darknet53_30e_576x320.yml) | 目标跟踪 | 支持 | 混合精度 | | Paddle Inference: C++ |
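+
+表中已支持“基础训练预测”的模型,均可按各测试文档介绍的方式一键测试。例如,以yolov3_darknet53_270e_coco为例(参数文件路径仅为示意,以test_tipc/configs目录下实际文件为准):
+
+```shell
+# 准备少量数据并跑通 训练->推理 流程
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_lite_infer'
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_lite_infer'
+```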
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/test.png b/PaddleDetection-release-2.6/test_tipc/docs/test.png
new file mode 100644
index 0000000000000000000000000000000000000000..f99f23d7050eb61879cf317c0d7728ef14531b08
Binary files /dev/null and b/PaddleDetection-release-2.6/test_tipc/docs/test.png differ
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/test_inference_cpp.md b/PaddleDetection-release-2.6/test_tipc/docs/test_inference_cpp.md
new file mode 100644
index 0000000000000000000000000000000000000000..01847af3b694ecc4dfd99d3c686cb31542448adc
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/docs/test_inference_cpp.md
@@ -0,0 +1,99 @@
+# C++预测功能测试
+
+C++预测功能测试的主程序为`test_inference_cpp.sh`,可以测试基于C++预测库的模型推理功能。
+
+## 1. 测试结论汇总
+
+基于训练是否使用量化,进行本测试的模型可以分为`正常模型`和`量化模型`,这两类模型对应的C++预测功能汇总如下:
+
+| 模型类型 | device | batchsize | tensorrt | mkldnn | cpu多线程 |
+| ---- | ---- | ---- | :----: | :----: | :----: |
+| 正常模型 | GPU | 1/8 | fp32/fp16 | - | - |
+| 正常模型 | CPU | 1/8 | - | fp32 | 支持 |
+| 量化模型 | GPU | 1/8 | int8 | - | - |
+| 量化模型 | CPU | 1/8 | - | int8 | 支持 |
+
+## 2. 测试流程
+运行环境配置请参考[文档](./install.md)的内容配置TIPC的运行环境。
+```
+# 请设置paddle_inference环境变量,如:
+export PADDLE_INFER_DIR=/path/to/paddle_inference
+# 若不设置paddle_inference环境变量,也可通过指定参数的方式使脚本自动下载paddle_inference.tgz,如:
+bash test_tipc/test_inference_cpp.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt 'https://paddle-inference-lib.bj.bcebos.com/2.3.0/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddle_inference.tgz'
+
+# 若未使用docker镜像: paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc82-dev
+# 请设置TensorRT环境变量,如:
+export TENSORRT_ROOT=/usr/local/TensorRT6-cuda10.1-cudnn7
+```
+
+### 2.1 功能测试
+先运行`prepare.sh`准备数据和模型,然后运行`test_inference_cpp.sh`进行测试,最终在`test_tipc/output`目录下生成`cpp_infer_*.log`后缀的日志文件。
+
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt "cpp_infer"
+# 用法1:
+bash test_tipc/test_inference_cpp.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
+# 用法2: 指定下载paddle_inference.tgz链接,第二个传入参数为下载链接
+bash test_tipc/test_inference_cpp.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt 'https://paddle-inference-lib.bj.bcebos.com/2.3.0/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddle_inference.tgz'
+# 用法3: 同时指定下载paddle_inference.tgz链接和指定GPU卡预测,第三个传入参数为GPU卡号
+bash test_tipc/test_inference_cpp.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt 'https://paddle-inference-lib.bj.bcebos.com/2.3.0/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddle_inference.tgz' '1'
+```
+
+运行预测指令后,在`test_tipc/output`文件夹下会自动保存运行日志,包括以下文件:
+
+```shell
+test_tipc/output/
+|- results_cpp.log    # 运行指令状态的日志
+|- cpp_infer_cpu_usemkldnn_False_threads_1_precision_fluid_batchsize_1.log  # CPU上不开启Mkldnn,线程数设置为1,测试batch_size=1条件下的预测运行日志
+|- cpp_infer_cpu_usemkldnn_False_threads_6_precision_fluid_batchsize_1.log  # CPU上不开启Mkldnn,线程数设置为6,测试batch_size=1条件下的预测运行日志
+|- cpp_infer_gpu_precision_fluid_batchsize_1.log  # GPU上不开启TensorRT,测试batch_size=1的fp32精度预测日志
+|- cpp_infer_gpu_precision_trt_fp16_batchsize_1.log  # GPU上开启TensorRT,测试batch_size=1的fp16精度预测日志
+......
+```
+其中results_cpp.log中包含了每条指令的运行状态,如果运行成功会输出:
+
+```
+Run successfully with command - python3.7 tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams filename=yolov3_darknet53_270e_coco --output_dir=./output_inference !
+Run successfully with command - ./deploy/cpp/build/main --device=gpu --run_mode=fluid --model_dir=./output_inference/yolov3_darknet53_270e_coco --batch_size=8 --image_dir=./dataset/coco/test2017/ --run_benchmark=True > ./test_tipc/output/cpp_infer_gpu_precision_fluid_batchsize_8.log 2>&1 !
+......
+```
+如果运行失败,会输出:
+```
+Run failed with command - python3.7 tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams filename=yolov3_darknet53_270e_coco --output_dir=./output_inference !
+Run failed with command - ./deploy/cpp/build/main --device=gpu --run_mode=fluid --model_dir=./output_inference/yolov3_darknet53_270e_coco --batch_size=8 --image_dir=./dataset/coco/test2017/ --run_benchmark=True > ./test_tipc/output/cpp_infer_gpu_precision_fluid_batchsize_8.log 2>&1 !
+......
+```
+可以很方便地根据results_cpp.log中的内容判定哪一个指令运行错误。
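+
+例如,可以用一条grep命令快速筛出失败的指令(示意):
+
+```shell
+grep "Run failed" ./test_tipc/output/results_cpp.log
+```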
+
+
+### 2.2 精度测试
+
+使用compare_results.py脚本比较模型预测的结果是否符合预期,主要步骤包括:
+- 提取日志中的预测坐标;
+- 从本地文件中提取保存好的坐标结果;
+- 比较上述两个结果是否符合精度预期,误差大于设置阈值时会报错。
+
+#### 使用方式
+运行命令:
+```shell
+python3.7 test_tipc/compare_results.py --gt_file=./test_tipc/results/cpp_*.txt --log_file=./test_tipc/output/cpp_*.log --atol=1e-3 --rtol=1e-3
+```
+
+参数介绍:
+- gt_file: 指向事先保存好的预测结果路径,以*.txt结尾,脚本会自动索引该格式的文件;文件默认保存在test_tipc/results/文件夹下
+- log_file: 指向运行test_tipc/test_inference_cpp.sh脚本infer模式保存的预测日志;日志中打印有预测结果(如检测框坐标、类别等),同样支持以cpp_infer_*.log格式传入
+- atol: 设置的绝对误差
+- rtol: 设置的相对误差
+
+#### 运行结果
+
+正常运行效果如下图:
+
+
+出现不一致结果时的运行输出:
+
+
+
+## 3. 更多教程
+
+本文档为功能测试用,更详细的C++预测使用教程请参考:[C++预测](../../deploy/cpp/README.md)
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/test_paddle2onnx.md b/PaddleDetection-release-2.6/test_tipc/docs/test_paddle2onnx.md
new file mode 100644
index 0000000000000000000000000000000000000000..373bdb2cb5fb20e93806599b75672e3210e13297
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/docs/test_paddle2onnx.md
@@ -0,0 +1,47 @@
+# Paddle2ONNX预测功能测试
+
+Paddle2ONNX预测功能测试的主程序为`test_paddle2onnx.sh`,可以测试Paddle2ONNX的模型转换功能,并验证其正确性。
+
+## 1. 测试结论汇总
+
+基于训练是否使用量化,进行本测试的模型可以分为`正常模型`和`量化模型`,这两类模型对应的Paddle2ONNX预测功能汇总如下:
+
+| 模型类型 | device |
+| ---- | ---- |
+| 正常模型 | GPU |
+| 正常模型 | CPU |
+| 量化模型 | GPU |
+| 量化模型 | CPU |
+
+## 2. 测试流程
+### 2.1 功能测试
+先运行`prepare.sh`准备数据和模型,然后运行`test_paddle2onnx.sh`进行测试,最终在`test_tipc/output`目录下生成`paddle2onnx_infer_*.log`后缀的日志文件。
+
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt "paddle2onnx_infer"
+
+# 用法:
+bash test_tipc/test_paddle2onnx.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
+```
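+
+转换完成后,也可以在查看自动化测试结果之前手动确认ONNX模型可被正常加载(示意命令,模型路径以实际导出目录为准):
+
+```shell
+python3.7 -c "import onnx; onnx.checker.check_model(onnx.load('./output_inference/yolov3_darknet53_270e_coco/model.onnx'))"
+```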
+
+#### 运行结果
+
+各测试的运行情况会打印在 `test_tipc/output/results_paddle2onnx.log` 中:
+运行成功时会输出:
+
+```
+Run successfully with command - yolov3_darknet53_270e_coco - paddle2onnx --model_dir=./output_inference/yolov3_darknet53_270e_coco --model_filename=model.pdmodel --params_filename=model.pdiparams --save_file=./output_inference/yolov3_darknet53_270e_coco/model.onnx --opset_version=11 --enable_onnx_checker=True !
+Run successfully with command - yolov3_darknet53_270e_coco - python3.7 ./deploy/third_engine/onnx/infer.py --infer_cfg=./output_inference/yolov3_darknet53_270e_coco/infer_cfg.yml --onnx_file=./output_inference/yolov3_darknet53_270e_coco/model.onnx --image_file=./demo/000000014439.jpg > ./test_tipc/output/paddle2onnx_infer_cpu.log 2>&1 !
+```
+
+运行失败时会输出:
+
+```
+Run failed with command - yolov3_darknet53_270e_coco - paddle2onnx --model_dir=./output_inference/yolov3_darknet53_270e_coco --model_filename=model.pdmodel --params_filename=model.pdiparams --save_file=./output_inference/yolov3_darknet53_270e_coco/model.onnx --opset_version=11 --enable_onnx_checker=True !
+...
+```
+
+
+## 3. 更多教程
+
+本文档为功能测试用,更详细的Paddle2ONNX预测使用教程请参考:[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/test_ptq_inference_python.md b/PaddleDetection-release-2.6/test_tipc/docs/test_ptq_inference_python.md
new file mode 100644
index 0000000000000000000000000000000000000000..7b1c04c5b01b5d67ba285e88b1b8c9e3361c2b82
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/docs/test_ptq_inference_python.md
@@ -0,0 +1,44 @@
+# Linux GPU/CPU 离线量化功能测试
+
+Linux GPU/CPU 离线量化功能测试的主程序为`test_ptq_inference_python.sh`,可以测试基于Python的离线量化功能。
+
+## 1. 测试结论汇总
+
+| 模型类型 | device | batchsize | tensorrt | mkldnn | cpu多线程 |
+| ---- | ---- | --------- | :----: | :----: | :----: |
+| 量化模型 | GPU | 1/2 | int8 | - | - |
+| 量化模型 | CPU | 1/2 | - | int8 | 支持 |
+
+## 2. 测试流程
+### 2.1 功能测试
+先运行`prepare.sh`准备数据和模型,然后运行`test_ptq_inference_python.sh`进行测试,最终在`test_tipc/output`目录下生成`python_infer_*.log`后缀的日志文件。
+
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_ptq_infer_python.txt "whole_infer"
+
+# 用法:
+bash test_tipc/test_ptq_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_ptq_infer_python.txt
+```
+
+#### 运行结果
+
+各测试的运行情况会打印在 `test_tipc/output/results_ptq_python.log` 中:
+运行成功时会输出:
+
+```
+Run successfully with command - yolov3_darknet53_270e_coco - python3.7 tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams filename=yolov3_darknet53_270e_coco --output_dir=./output_inference !
+Run successfully with command - yolov3_darknet53_270e_coco - python3.7 ./deploy/python/infer.py --device=gpu --run_mode=paddle --model_dir=./output_inference/yolov3_darknet53_270e_coco --batch_size=2 --image_dir=./dataset/coco/test2017/ --run_benchmark=False > ./test_tipc/output/yolov3_darknet53_270e_coco/whole_infer/python_infer_gpu_mode_paddle_batchsize_2.log 2>&1 ! +... +``` + +运行失败时会输出: + +``` +Run failed with command - yolov3_darknet53_270e_coco - python3.7 tools/post_quant.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --slim_config configs/slim/post_quant/yolov3_darknet53_ptq.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams filename=yolov3_darknet53_270e_coco --output_dir=./output_inference! +... +``` + + +## 3. 更多教程 + +本文档为功能测试用,更详细的离线量化功能使用教程请参考:[Paddle 离线量化官网教程](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static) diff --git a/PaddleDetection-release-2.6/test_tipc/docs/test_serving.md b/PaddleDetection-release-2.6/test_tipc/docs/test_serving.md new file mode 100644 index 0000000000000000000000000000000000000000..097df74dc159cca8a5b96f60de5b5011e92a36fe --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/docs/test_serving.md @@ -0,0 +1,91 @@ +# PaddleServing预测功能测试 + +PaddleServing预测功能测试的主程序为`test_serving_infer_python.sh`和`test_serving_infer_cpp.sh`,可以测试基于PaddleServing的部署功能。 + +## 1. 测试结论汇总 + +基于训练是否使用量化,进行本测试的模型可以分为`正常模型`和`量化模型`,这两类模型对应的Serving预测功能汇总如下: + +| 模型类型 |device | batchsize | tensorrt | mkldnn | cpu多线程 | +| ---- | ---- |-----------| :----: | :----: | :----: | +| 正常模型 | GPU | 1/2 | fp32/fp16 | - | - | +| 正常模型 | CPU | 1/2 | - | fp32 | 支持 | +| 量化模型 | GPU | 1/2 | int8 | - | - | +| 量化模型 | CPU | 1/2 | - | int8 | 支持 | + +## 2. 测试流程 +运行环境配置请参考[文档](./install.md)的内容配置TIPC的运行环境。 + +### 2.1 功能测试 +**python serving** +先运行`prepare.sh`准备数据和模型,然后运行`test_serving_infer_python.sh`进行测试,最终在```test_tipc/output```目录下生成`serving_infer_python*.log`后缀的日志文件。 + +```shell +bash test_tipc/prepare.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt "serving_infer" + +# 用法1: +bash test_tipc/test_serving_infer_python.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt +# 用法2: 指定GPU卡预测,第二个传入参数为GPU卡号 +bash test_tipc/test_serving_infer_python.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt "1" +``` +**cpp serving** +先运行`prepare.sh`准备数据和模型,然后运行`test_serving_infer_cpp.sh`进行测试,最终在```test_tipc/output```目录下生成`serving_infer_cpp*.log`后缀的日志文件。 + +```shell +bash test_tipc/prepare.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt "serving_infer" + +# 用法: +bash test_tipc/test_serving_infer_cpp.sh test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_model_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt +``` + +#### 运行结果 + +各测试的运行情况会打印在 `test_tipc/output/results_serving.log` 中: +运行成功时会输出: + +``` +Run successfully with command - python3.7 pipeline_http_client.py --image_dir=../../doc/imgs > ../../tests/output/server_infer_cpu_usemkldnn_True_threads_1_batchsize_1.log 2>&1 ! +Run successfully with command - xxxxx +... +``` + +运行失败时会输出: + +``` +Run failed with command - python3.7 pipeline_http_client.py --image_dir=../../doc/imgs > ../../tests/output/server_infer_cpu_usemkldnn_True_threads_1_batchsize_1.log 2>&1 ! 
+Run failed with command - python3.7 pipeline_http_client.py --image_dir=../../doc/imgs > ../../tests/output/server_infer_cpu_usemkldnn_True_threads_6_batchsize_1.log 2>&1 !
+Run failed with command - xxxxx
+...
+```
+
+详细的预测结果会存在 test_tipc/output/ 文件夹下,例如`server_infer_gpu_usetrt_True_precision_fp32_batchsize_1.log`中会返回检测框的坐标:
+
+```
+{'err_no': 0, 'err_msg': '', 'key': ['dt_boxes'], 'value': ['[[[ 78. 642.]\n [409. 640.]\n [409. 657.]\n
+[ 78. 659.]]\n\n [[ 75. 614.]\n [211. 614.]\n [211. 635.]\n [ 75. 635.]]\n\n
+[[103. 554.]\n [135. 554.]\n [135. 575.]\n [103. 575.]]\n\n [[ 75. 531.]\n
+[347. 531.]\n [347. 549.]\n [ 75. 549.]]\n\n [[ 76. 503.]\n [309. 498.]\n
+[309. 521.]\n [ 76. 526.]]\n\n [[163. 462.]\n [317. 462.]\n [317. 493.]\n
+[163. 493.]]\n\n [[324. 431.]\n [414. 431.]\n [414. 452.]\n [324. 452.]]\n\n
+[[ 76. 412.]\n [208. 408.]\n [209. 424.]\n [ 76. 428.]]\n\n [[307. 409.]\n
+[428. 409.]\n [428. 426.]\n [307. 426.]]\n\n [[ 74. 385.]\n [217. 382.]\n
+[217. 400.]\n [ 74. 403.]]\n\n [[308. 381.]\n [427. 380.]\n [427. 400.]\n
+[308. 401.]]\n\n [[ 74. 363.]\n [195. 362.]\n [195. 378.]\n [ 74. 379.]]\n\n
+[[303. 359.]\n [423. 357.]\n [423. 375.]\n [303. 377.]]\n\n [[ 70. 336.]\n
+[239. 334.]\n [239. 354.]\n [ 70. 356.]]\n\n [[ 70. 312.]\n [204. 310.]\n
+[204. 327.]\n [ 70. 330.]]\n\n [[303. 308.]\n [419. 306.]\n [419. 326.]\n
+[303. 328.]]\n\n [[113. 272.]\n [246. 270.]\n [247. 299.]\n [113. 301.]]\n\n
+ [[361. 269.]\n [384. 269.]\n [384. 296.]\n [361. 296.]]\n\n [[ 70. 250.]\n
+ [243. 246.]\n [243. 265.]\n [ 70. 269.]]\n\n [[ 65. 221.]\n [187. 220.]\n
+[187. 240.]\n [ 65. 241.]]\n\n [[337. 216.]\n [382. 216.]\n [382. 240.]\n
+[337. 240.]]\n\n [[ 65. 196.]\n [247. 193.]\n [247. 213.]\n [ 65. 216.]]\n\n
+[[296. 197.]\n [423. 191.]\n [424. 209.]\n [296. 215.]]\n\n [[ 65. 167.]\n [244. 167.]\n
+[244. 186.]\n [ 65. 186.]]\n\n [[ 67. 139.]\n [290. 139.]\n [290. 159.]\n [ 67. 159.]]\n\n
+[[ 68. 113.]\n [410. 113.]\n [410. 128.]\n [ 68. 129.]]\n\n [[277. 87.]\n [416. 87.]\n
+[416. 108.]\n [277. 108.]]\n\n [[ 79. 28.]\n [132. 28.]\n [132. 62.]\n [ 79. 62.]]\n\n
+[[163. 17.]\n [410. 14.]\n [410. 50.]\n [163. 53.]]]']}
+```
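+
+上述返回内容是Python字典的文本形式(单引号)。如需在脚本中检查服务是否返回成功(err_no为0),可用类似下面的命令(仅为示意,假设已把该行结果保存为response.txt):
+
+```shell
+python3.7 -c "
+import ast
+# 日志中的返回值不是严格JSON,用literal_eval解析
+resp = ast.literal_eval(open('response.txt').read())
+print(resp['err_no'], resp['key'])
+"
+```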
+
+## 3. 更多教程
+
+本文档为功能测试用,更详细的Serving预测使用教程请参考:[PaddleDetection 服务化部署](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/deploy/serving)
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/test_train_fleet_inference_python.md b/PaddleDetection-release-2.6/test_tipc/docs/test_train_fleet_inference_python.md
new file mode 100644
index 0000000000000000000000000000000000000000..98b0bbd3cb3cf7eba96162f653155185f7b32bc1
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/docs/test_train_fleet_inference_python.md
@@ -0,0 +1,76 @@
+# Linux GPU/CPU 多机多卡训练推理测试
+
+Linux GPU/CPU 多机多卡训练推理测试的主程序为`test_train_fleet_inference_python.sh`,可以测试基于Python的模型训练、评估、推理等基本功能。
+
+## 1. 测试结论汇总
+
+- 训练相关:
+
+| 算法名称 | 模型名称 | 多机多卡 |
+|:--------:| :----: | :----: |
+| PP-YOLOE | ppyoloe_crn_s_300e_coco | 分布式训练 |
+
+
+- 推理相关:
+
+| 算法名称 | 模型名称 | device_CPU | device_GPU | batchsize |
+|:--------:|:------------------------:| :----: | :----: |:---------:|
+| PP-YOLOE | ppyoloe_crn_s_300e_coco | 支持 | 支持 | 1, 2 |
+
+
+## 2. 测试流程
+
+运行环境配置请参考[文档](./install.md)的内容配置TIPC的运行环境。
+
+### 2.1 功能测试
+
+#### 2.1.1 修改配置文件
+
+首先,修改配置文件中的`ip`设置:假设两台机器的`ip`地址分别为`192.168.0.1`和`192.168.0.2`,则对应的配置文件`gpu_list`字段需要修改为`gpu_list:192.168.0.1,192.168.0.2;0,1`;`ip`地址可通过`ifconfig`命令查看。
+
+
+#### 2.1.2 准备数据
+
+运行`prepare.sh`准备数据和模型,以配置文件`test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt`为例,数据准备命令如下所示。
+
+```shell
+bash test_tipc/prepare.sh test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt lite_train_lite_infer
+```
+
+**注意:** 由于是多机训练,需要在所有节点上均运行上述命令以准备数据。
+
+#### 2.1.3 修改起始端口并开始测试
+
+在多机的各节点上使用下面的命令设置分布式的起始端口(否则后面运行时会由于找不到可用端口而hang住),一般建议设置在`10000~20000`之间。
+
+```shell
+export FLAGS_START_PORT=17000
+```
+
+以配置文件`test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt`为例,测试方法如下所示。
+
+```shell
+bash test_tipc/test_train_inference_python.sh test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt lite_train_lite_infer
+```
+
+**注意:** 由于是多机训练,需要在所有节点上均运行上述命令进行测试。
+
+
+#### 2.1.4 输出结果
+
+输出结果如下,表示命令运行成功。
+
+```bash
+ Run successfully with command - python3.7 -m paddle.distributed.launch --ips=192.168.0.1,192.168.0.2 --gpus=0,1 tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o log_iter=1 use_gpu=True save_dir=./test_tipc/output/ppyoloe_crn_s_300e_coco/norm_train_gpus_0,1_autocast_null_nodes_2 epoch=1 pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams TrainReader.batch_size=2 filename=ppyoloe_crn_s_300e_coco !
+
+ ......
+ Run successfully with command - python3.7 ./deploy/python/infer.py --device=cpu --enable_mkldnn=False --cpu_threads=4 --model_dir=./test_tipc/output/ppyoloe_crn_s_300e_coco/norm_train_gpus_0,1_autocast_null_nodes_2/ppyoloe_crn_s_300e_coco --batch_size=2 --image_dir=./dataset/coco/test2017/ --run_benchmark=False --trt_max_shape=1600 > ./test_tipc/output/ppyoloe_crn_s_300e_coco/python_infer_cpu_usemkldnn_False_threads_4_precision_fluid_batchsize_2.log 2>&1 !
+```
+
+**注意:** 分布式训练时,仅在`trainer_id=0`所在的节点中保存模型,因此其他节点在运行模型导出与推理时会报错,属正常现象。
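+
+由于数据准备与测试命令都要在每个节点上执行一次,也可以用一个简单的循环远程触发所有节点(仅为示意:假设各节点可ssh免密登录,仓库路径/path/to/PaddleDetection与IP列表以实际环境为准):
+
+```shell
+for ip in 192.168.0.1 192.168.0.2; do
+    # 在每个节点上设置起始端口并启动同一测试
+    ssh ${ip} "cd /path/to/PaddleDetection; export FLAGS_START_PORT=17000; bash test_tipc/test_train_inference_python.sh test_tipc/configs/ppyoloe/ppyoloe_crn_s_300e_coco_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt lite_train_lite_infer" &
+done
+wait
+```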
diff --git a/PaddleDetection-release-2.6/test_tipc/docs/test_train_inference_python.md b/PaddleDetection-release-2.6/test_tipc/docs/test_train_inference_python.md
new file mode 100644
index 0000000000000000000000000000000000000000..10459b84346352e1f13c846919574f908b6be2da
--- /dev/null
+++ b/PaddleDetection-release-2.6/test_tipc/docs/test_train_inference_python.md
@@ -0,0 +1,152 @@
+# Linux端基础训练预测功能测试
+
+Linux端基础训练预测功能测试的主程序为`test_train_inference_python.sh`,可以测试基于Python的模型训练、评估、推理等基本功能,包括裁剪、量化、蒸馏。
+
+- Mac端基础训练预测功能测试参考[链接](./)
+- Windows端基础训练预测功能测试参考[链接](./)
+
+## 1. 测试结论汇总
+
+- 训练相关:
+
+| 算法名称 | 模型名称 | 单机单卡 | 单机多卡 | 多机多卡 | 模型压缩(单机多卡) |
+| :---- | :---- | :---- | :---- | :---- | :---- |
+| PPYOLO | ppyolo_mbv3_large_coco | 正常训练<br>混合精度 | 正常训练<br>混合精度 | 正常训练<br>混合精度 | 正常训练:FPGM裁剪、PACT量化<br>离线量化(无需训练) |
+| PPYOLO | ppyolo_r50vd_dcn_1x_coco | 正常训练<br>混合精度 | 正常训练<br>混合精度 | 正常训练<br>混合精度 | 正常训练:FPGM裁剪、PACT量化<br>离线量化(无需训练) |
+
+
+- 预测相关:基于训练是否使用量化,可以将训练产出的模型分为`正常模型`和`量化模型`,这两类模型对应的预测功能汇总如下:
+
+| 模型类型 | device | batchsize | tensorrt | mkldnn | cpu多线程 |
+| ---- | ---- | ---- | :----: | :----: | :----: |
+| 正常模型 | GPU | 1/8 | fp32/fp16 | - | - |
+| 正常模型 | CPU | 1/8 | - | fp32/fp16 | 支持 |
+| 量化模型 | GPU | 1/8 | int8 | - | - |
+| 量化模型 | CPU | 1/8 | - | int8 | 支持 |
+
+
+## 2. 测试流程
+
+运行环境配置请参考[文档](./install.md)的内容配置TIPC的运行环境。
+
+### 2.1 安装依赖
+- 安装PaddlePaddle >= 2.2
+- 安装PaddleDetection依赖
+    ```
+    pip install -r ./requirements.txt
+    pip install -r ./test_tipc/requirements.txt
+    ```
+- 安装autolog(规范化日志输出工具)
+    ```
+    git clone https://github.com/LDOUBLEV/AutoLog
+    cd AutoLog
+    pip install -r ./requirements.txt
+    python setup.py bdist_wheel
+    pip install ./dist/auto_log-1.0.0-py3-none-any.whl
+    ```
+- 安装PaddleSlim (可选)
+    ```
+    # 如果要测试量化、裁剪等功能,需要安装PaddleSlim
+    pip install paddleslim
+    ```
+
+
+### 2.2 功能测试
+先运行`prepare.sh`准备数据和模型,然后运行`test_train_inference_python.sh`进行测试,最终在`test_tipc/output`目录下生成`python_infer_*.log`格式的日志文件。下面以yolov3_darknet53_270e_coco为例。
+
+
+`test_train_inference_python.sh`包含5种运行模式,每种模式的运行数据不同,分别用于测试速度和精度:
+
+- 模式1:lite_train_lite_infer,使用少量数据训练,用于快速验证训练到预测的走通流程,不验证精度和速度;
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_lite_infer'
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_lite_infer'
+```
+
+- 模式2:lite_train_whole_infer,使用少量数据训练,一定量数据预测,用于验证训练后的模型执行预测时预测速度是否合理;
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_whole_infer'
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'lite_train_whole_infer'
+```
+
+- 模式3:whole_infer,不训练,全量数据预测,走通开源模型评估、动转静,检查inference model预测时间和精度;
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'whole_infer'
+# 用法1:
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'whole_infer'
+# 用法2: 指定GPU卡预测,第三个传入参数为GPU卡号
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'whole_infer' '1'
+```
+
+- 模式4:whole_train_whole_infer,CE: 全量数据训练,全量数据预测,验证模型训练精度、预测精度与预测速度;
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'whole_train_whole_infer'
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'whole_train_whole_infer'
+```
+
+- 模式5:klquant_whole_infer,测试离线量化;
+```shell
+bash test_tipc/prepare.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'klquant_whole_infer'
+bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt 'klquant_whole_infer'
+```
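+
+如需一次性跑完上述全部模式,也可以用一个简单的循环(仅为示意:whole_train_whole_infer耗时较长,请按需取舍):
+
+```shell
+CONFIG=./test_tipc/configs/yolov3/yolov3_darknet53_270e_coco_train_infer_python.txt
+for mode in lite_train_lite_infer lite_train_whole_infer whole_infer whole_train_whole_infer klquant_whole_infer; do
+    bash test_tipc/prepare.sh ${CONFIG} ${mode}
+    bash test_tipc/test_train_inference_python.sh ${CONFIG} ${mode}
+done
+```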
+
+运行相应指令后,在`test_tipc/output`文件夹下会自动保存运行日志。如'lite_train_lite_infer'模式下,会运行训练+推理的链条,因此,在`test_tipc/output`文件夹有以下文件:
+```
+test_tipc/output/
+|- results_python.log    # 运行指令状态的日志
+|- norm_train_gpus_0_autocast_null/  # GPU 0号卡上正常训练的训练日志和模型保存文件夹
+|- pact_train_gpus_0_autocast_null/  # GPU 0号卡上量化训练的训练日志和模型保存文件夹
+......
+|- python_infer_cpu_usemkldnn_True_threads_1_precision_fluid_batchsize_1.log # CPU上开启Mkldnn线程数设置为1,测试batch_size=1条件下的预测运行日志 +|- python_infer_gpu_precision_trt_fp16_batchsize_1.log # GPU上开启TensorRT,测试batch_size=1的半精度预测日志 +...... +``` + +其中`results_python.log`中包含了每条指令的运行状态,如果运行成功会输出: +``` +Run successfully with command - python3.7 tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o use_gpu=True save_dir=./test_tipc/output/norm_train_gpus_0_autocast_null epoch=1 pretrain_weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams TrainReader.batch_size=2 filename=yolov3_darknet53_270e_coco ! +Run successfully with command - python3.7 tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o weights=./test_tipc/output/norm_train_gpus_0_autocast_null/yolov3_darknet53_270e_coco/model_final.pdparams use_gpu=True ! +...... +``` +如果运行失败,会输出: +``` +Run failed with command - python3.7 tools/train.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o use_gpu=True save_dir=./test_tipc/output/norm_train_gpus_0_autocast_null epoch=1 pretrain_weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams TrainReader.batch_size=2 filename=yolov3_darknet53_270e_coco ! +Run failed with command - python3.7 tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o weights=./test_tipc/output/norm_train_gpus_0_autocast_null/yolov3_darknet53_270e_coco/model_final.pdparams use_gpu=True ! +...... +``` +可以很方便的根据`results_python.log`中的内容判定哪一个指令运行错误。 + + +### 2.3 精度测试 + +使用compare_results.py脚本比较模型预测的结果是否符合预期,主要步骤包括: +- 提取日志中的预测坐标; +- 从本地文件中提取保存好的坐标结果; +- 比较上述两个结果是否符合精度预期,误差大于设置阈值时会报错。 + +#### 使用方式 +运行命令: +```shell +python3.7 test_tipc/compare_results.py --gt_file=./test_tipc/results/python_*.txt --log_file=./test_tipc/output/python_*.log --atol=1e-3 --rtol=1e-3 +``` + +参数介绍: +- gt_file: 指向事先保存好的预测结果路径,支持*.txt 结尾,会自动索引*.txt格式的文件,文件默认保存在test_tipc/result/ 文件夹下 +- log_file: 指向运行test_tipc/test_train_inference_python.sh 脚本的infer模式保存的预测日志,预测日志中打印的有预测结果,比如:文本框,预测文本,类别等等,同样支持python_infer_*.log格式传入 +- atol: 设置的绝对误差 +- rtol: 设置的相对误差 + +#### 运行结果 + +正常运行效果如下图: + + +出现不一致结果时的运行输出: + + + +## 3. 更多教程 +本文档为功能测试用,更丰富的训练预测使用教程请参考: +[模型训练](../../docs/tutorials/GETTING_STARTED_cn.md) +[PaddleDetection预测部署](../../deploy/README.md) diff --git a/PaddleDetection-release-2.6/test_tipc/prepare.sh b/PaddleDetection-release-2.6/test_tipc/prepare.sh new file mode 100644 index 0000000000000000000000000000000000000000..5d3d890f8808fe7218408b451cb1d38e3aa3bec6 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/prepare.sh @@ -0,0 +1,178 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +# MODE be one of ['lite_train_lite_infer' 'lite_train_whole_infer' +# 'whole_train_whole_infer', 'whole_infer', 'klquant_whole_infer', +# 'cpp_infer', 'serving_infer', 'lite_infer', 'paddle2onnx_infer'] +MODE=$2 + +# parse params +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) + +# The training params +model_name=$(func_parser_value "${lines[1]}") +python=$(func_parser_value "${lines[2]}") + +if [ ${MODE} = "whole_train_whole_infer" ];then + mv ./dataset/coco/download_coco.py . 
&& rm -rf ./dataset/coco/* && mv ./download_coco.py ./dataset/coco/ + # prepare whole training data + eval "${python} ./dataset/coco/download_coco.py" +elif [ ${MODE} = "cpp_infer" ];then + # download coco lite data + wget -nc -P ./dataset/coco/ https://paddledet.bj.bcebos.com/data/tipc/coco_tipc.tar --no-check-certificate + cd ./dataset/coco/ && tar -xvf coco_tipc.tar && mv -n coco_tipc/* . + rm -rf coco_tipc/ && cd ../../ + # download wider_face lite data + wget -nc -P ./dataset/wider_face/ https://paddledet.bj.bcebos.com/data/tipc/wider_tipc.tar --no-check-certificate + cd ./dataset/wider_face/ && tar -xvf wider_tipc.tar && mv -n wider_tipc/* . + rm -rf wider_tipc/ && cd ../../ + # download spine lite data + wget -nc -P ./dataset/spine_coco/ https://paddledet.bj.bcebos.com/data/tipc/spine_tipc.tar --no-check-certificate + cd ./dataset/spine_coco/ && tar -xvf spine_tipc.tar && mv -n spine_tipc/* . + rm -rf spine_tipc/ && cd ../../ + if [[ ${model_name} =~ "s2anet" ]]; then + cd ./ppdet/ext_op && eval "${python} setup.py install" + cd ../../ + elif [[ ${model_name} =~ "tinypose" ]]; then + wget -nc -P ./output_inference/ https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar --no-check-certificate + cd ./output_inference/ && tar -xvf picodet_s_320_pedestrian.tar + cd ../ + fi + # download KL model + if [[ ${model_name} = "picodet_lcnet_1_5x_416_coco_KL" ]]; then + wget -nc -P ./output_inference/picodet_lcnet_1_5x_416_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/picodet_lcnet_1_5x_416_coco_ptq.tar --no-check-certificate + cd ./output_inference/picodet_lcnet_1_5x_416_coco_KL/ && tar -xvf picodet_lcnet_1_5x_416_coco_ptq.tar && mv -n picodet_lcnet_1_5x_416_coco_ptq/* . + cd ../../ + elif [[ ${model_name} = "ppyoloe_crn_s_300e_coco_KL" ]]; then + wget -nc -P ./output_inference/ppyoloe_crn_s_300e_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/ppyoloe_crn_s_300e_coco_ptq.tar --no-check-certificate + cd ./output_inference/ppyoloe_crn_s_300e_coco_KL/ && tar -xvf ppyoloe_crn_s_300e_coco_ptq.tar && mv -n ppyoloe_crn_s_300e_coco_ptq/* . + cd ../../ + elif [[ ${model_name} = "ppyolo_mbv3_large_coco_KL" ]]; then + wget -nc -P ./output_inference/ppyolo_mbv3_large_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/ppyolo_mbv3_large_ptq.tar --no-check-certificate + cd ./output_inference/ppyolo_mbv3_large_coco_KL/ && tar -xvf ppyolo_mbv3_large_ptq.tar && mv -n ppyolo_mbv3_large_ptq/* . + cd ../../ + elif [[ ${model_name} = "mask_rcnn_r50_fpn_1x_coco_KL" ]]; then + wget -nc -P ./output_inference/mask_rcnn_r50_fpn_1x_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/mask_rcnn_r50_fpn_1x_coco_ptq.tar --no-check-certificate + cd ./output_inference/mask_rcnn_r50_fpn_1x_coco_KL/ && tar -xvf mask_rcnn_r50_fpn_1x_coco_ptq.tar && mv -n mask_rcnn_r50_fpn_1x_coco_ptq/* . + cd ../../ + elif [[ ${model_name} = "tinypose_128x96_KL" ]]; then + wget -nc -P ./output_inference/tinypose_128x96_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/tinypose_128x96_ptq.tar --no-check-certificate + cd ./output_inference/tinypose_128x96_KL/ && tar -xvf tinypose_128x96_ptq.tar && mv -n tinypose_128x96_ptq/* . + cd ../../ + fi + # download mot lite data + wget -nc -P ./dataset/mot/ https://paddledet.bj.bcebos.com/data/tipc/mot_tipc.tar --no-check-certificate + cd ./dataset/mot/ && tar -xvf mot_tipc.tar && mv -n mot_tipc/* . 
+ rm -rf mot_tipc/ && cd ../../ + + opencv_dir=$(func_parser_value "${lines[15]}") + # prepare opencv + cd ./deploy/cpp + if [ ${opencv_dir} = "default" ] || [ ${opencv_dir} = "null" ]; then + if [ -d "deps/opencv-3.4.16_gcc8.2_ffmpeg/" ]; then + echo "################### Opencv already exists, skip downloading. ###################" + else + mkdir -p $(pwd)/deps && cd $(pwd)/deps + wget -c https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz --no-check-certificate + tar -xvf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz && cd ../ + echo "################### Finish downloading opencv. ###################" + fi + fi + cd ../../ +elif [ ${MODE} = "benchmark_train" ];then + pip install -U pip + pip install Cython + pip install -r requirements.txt + if [[ ${model_name} =~ "higherhrnet" ]] || [[ ${model_name} =~ "hrnet" ]] || [[ ${model_name} =~ "tinypose" ]];then + wget -nc -P ./dataset/ https://bj.bcebos.com/v1/paddledet/data/coco.tar --no-check-certificate + cd ./dataset/ && tar -xf coco.tar + ls ./coco/ + cd ../ + elif [[ ${model_name} =~ "ppyoloe_r_crn_s_3x_spine_coco" ]];then + wget -nc -P ./dataset/spine_coco/ https://paddledet.bj.bcebos.com/data/tipc/spine_coco_tipc.tar --no-check-certificate + cd ./dataset/spine_coco/ && tar -xvf spine_coco_tipc.tar && mv -n spine_coco_tipc/* . + rm -rf spine_coco_tipc/ && cd ../../ + cd ./ppdet/ext_op && eval "${python} setup.py install" + cd ../../ + else + # prepare lite benchmark coco data + wget -nc -P ./dataset/coco/ https://paddledet.bj.bcebos.com/data/coco_benchmark.tar --no-check-certificate + cd ./dataset/coco/ && tar -xf coco_benchmark.tar + mv -u coco_benchmark/* ./ + ls ./ + cd ../../ + # prepare lite benchmark mot data + wget -nc -P ./dataset/mot/ https://paddledet.bj.bcebos.com/data/mot_benchmark.tar --no-check-certificate + cd ./dataset/mot/ && tar -xf mot_benchmark.tar + mv -u mot_benchmark/* ./ + ls ./ + cd ../../ + fi +elif [ ${MODE} = "paddle2onnx_infer" ];then + # install paddle2onnx + ${python} -m pip install paddle2onnx + ${python} -m pip install onnx onnxruntime +elif [ ${MODE} = "serving_infer" ];then + unset https_proxy http_proxy + # download coco lite data + wget -nc -P ./dataset/coco/ https://paddledet.bj.bcebos.com/data/tipc/coco_tipc.tar --no-check-certificate + cd ./dataset/coco/ && tar -xvf coco_tipc.tar && mv -n coco_tipc/* . + rm -rf coco_tipc/ && cd ../../ + # download KL model + if [[ ${model_name} = "picodet_lcnet_1_5x_416_coco_KL" ]]; then + wget -nc -P ./output_inference/picodet_lcnet_1_5x_416_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/picodet_lcnet_1_5x_416_coco_ptq.tar --no-check-certificate + cd ./output_inference/picodet_lcnet_1_5x_416_coco_KL/ && tar -xvf picodet_lcnet_1_5x_416_coco_ptq.tar && mv -n picodet_lcnet_1_5x_416_coco_ptq/* . + cd ../../ + eval "${python} -m paddle_serving_client.convert --dirname output_inference/picodet_lcnet_1_5x_416_coco_KL/ --model_filename model.pdmodel --params_filename model.pdiparams --serving_server output_inference/picodet_lcnet_1_5x_416_coco_KL/serving_server --serving_client output_inference/picodet_lcnet_1_5x_416_coco_KL/serving_client" + elif [[ ${model_name} = "ppyoloe_crn_s_300e_coco_KL" ]]; then + wget -nc -P ./output_inference/ppyoloe_crn_s_300e_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/ppyoloe_crn_s_300e_coco_ptq.tar --no-check-certificate + cd ./output_inference/ppyoloe_crn_s_300e_coco_KL/ && tar -xvf ppyoloe_crn_s_300e_coco_ptq.tar && mv -n ppyoloe_crn_s_300e_coco_ptq/* . 
+ cd ../../ + eval "${python} -m paddle_serving_client.convert --dirname output_inference/ppyoloe_crn_s_300e_coco_KL/ --model_filename model.pdmodel --params_filename model.pdiparams --serving_server output_inference/ppyoloe_crn_s_300e_coco_KL/serving_server --serving_client output_inference/ppyoloe_crn_s_300e_coco_KL/serving_client" + elif [[ ${model_name} = "ppyolo_mbv3_large_coco_KL" ]]; then + wget -nc -P ./output_inference/ppyolo_mbv3_large_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/ppyolo_mbv3_large_ptq.tar --no-check-certificate + cd ./output_inference/ppyolo_mbv3_large_coco_KL/ && tar -xvf ppyolo_mbv3_large_ptq.tar && mv -n ppyolo_mbv3_large_ptq/* . + cd ../../ + eval "${python} -m paddle_serving_client.convert --dirname output_inference/ppyolo_mbv3_large_coco_KL/ --model_filename model.pdmodel --params_filename model.pdiparams --serving_server output_inference/ppyolo_mbv3_large_coco_KL/serving_server --serving_client output_inference/ppyolo_mbv3_large_coco_KL/serving_client" + elif [[ ${model_name} = "mask_rcnn_r50_fpn_1x_coco_KL" ]]; then + wget -nc -P ./output_inference/mask_rcnn_r50_fpn_1x_coco_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/mask_rcnn_r50_fpn_1x_coco_ptq.tar --no-check-certificate + cd ./output_inference/mask_rcnn_r50_fpn_1x_coco_KL/ && tar -xvf mask_rcnn_r50_fpn_1x_coco_ptq.tar && mv -n mask_rcnn_r50_fpn_1x_coco_ptq/* . + cd ../../ + eval "${python} -m paddle_serving_client.convert --dirname output_inference/mask_rcnn_r50_fpn_1x_coco_KL/ --model_filename model.pdmodel --params_filename model.pdiparams --serving_server output_inference/mask_rcnn_r50_fpn_1x_coco_KL/serving_server --serving_client output_inference/mask_rcnn_r50_fpn_1x_coco_KL/serving_client" + elif [[ ${model_name} = "tinypose_128x96_KL" ]]; then + wget -nc -P ./output_inference/tinypose_128x96_KL/ https://bj.bcebos.com/v1/paddledet/data/tipc/models/tinypose_128x96_ptq.tar --no-check-certificate + cd ./output_inference/tinypose_128x96_KL/ && tar -xvf tinypose_128x96_ptq.tar && mv -n tinypose_128x96_ptq/* . + cd ../../ + eval "${python} -m paddle_serving_client.convert --dirname output_inference/tinypose_128x96_KL/ --model_filename model.pdmodel --params_filename model.pdiparams --serving_server output_inference/tinypose_128x96_KL/serving_server --serving_client output_inference/tinypose_128x96_KL/serving_client" + fi +else + # download coco lite data + wget -nc -P ./dataset/coco/ https://paddledet.bj.bcebos.com/data/tipc/coco_tipc.tar --no-check-certificate + cd ./dataset/coco/ && tar -xvf coco_tipc.tar && mv -n coco_tipc/* . + rm -rf coco_tipc/ && cd ../../ + # download wider_face lite data + wget -nc -P ./dataset/wider_face/ https://paddledet.bj.bcebos.com/data/tipc/wider_tipc.tar --no-check-certificate + cd ./dataset/wider_face/ && tar -xvf wider_tipc.tar && mv -n wider_tipc/* . + rm -rf wider_tipc/ && cd ../../ + # download spine_coco lite data + wget -nc -P ./dataset/spine_coco/ https://paddledet.bj.bcebos.com/data/tipc/spine_coco_tipc.tar --no-check-certificate + cd ./dataset/spine_coco/ && tar -xvf spine_coco_tipc.tar && mv -n spine_coco_tipc/* . 
+ rm -rf spine_coco_tipc/ && cd ../../ + if [[ ${model_name} =~ "s2anet" ]]; then + cd ./ppdet/ext_op && eval "${python} setup.py install" + cd ../../ + elif [[ ${model_name} =~ "ppyoloe_r_crn_s_3x_spine_coco" ]]; then + cd ./ppdet/ext_op && eval "${python} setup.py install" + cd ../../ + elif [[ ${model_name} =~ "fcosr_x50_3x_spine_coco" ]]; then + cd ./ppdet/ext_op && eval "${python} setup.py install" + cd ../../ + fi + # download mot lite data + wget -nc -P ./dataset/mot/ https://paddledet.bj.bcebos.com/data/tipc/mot_tipc.tar --no-check-certificate + cd ./dataset/mot/ && tar -xvf mot_tipc.tar && mv -n mot_tipc/* . + rm -rf mot_tipc/ && cd ../../ +fi diff --git a/PaddleDetection-release-2.6/test_tipc/requirements.txt b/PaddleDetection-release-2.6/test_tipc/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..1a1aa828406da3c5e884799339564e5c1672edec --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/requirements.txt @@ -0,0 +1,7 @@ +pynvml +psutil +GPUtil +paddleslim +onnx +onnxruntime +paddle2onnx diff --git a/PaddleDetection-release-2.6/test_tipc/test_inference_cpp.sh b/PaddleDetection-release-2.6/test_tipc/test_inference_cpp.sh new file mode 100644 index 0000000000000000000000000000000000000000..270ee70397f006ab2460f5e29503586abcd2fa64 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_inference_cpp.sh @@ -0,0 +1,221 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +MODE="cpp_infer" + +# parser model_name +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) +model_name=$(func_parser_value "${lines[1]}") +echo "ppdet cpp_infer: ${model_name}" +python=$(func_parser_value "${lines[2]}") +filename_key=$(func_parser_key "${lines[3]}") +filename_value=$(func_parser_value "${lines[3]}") + +# export params +save_export_key=$(func_parser_key "${lines[5]}") +save_export_value=$(func_parser_value "${lines[5]}") +export_weight_key=$(func_parser_key "${lines[6]}") +export_weight_value=$(func_parser_value "${lines[6]}") +norm_export=$(func_parser_value "${lines[7]}") +pact_export=$(func_parser_value "${lines[8]}") +fpgm_export=$(func_parser_value "${lines[9]}") +distill_export=$(func_parser_value "${lines[10]}") +export_key1=$(func_parser_key "${lines[11]}") +export_value1=$(func_parser_value "${lines[11]}") +export_key2=$(func_parser_key "${lines[12]}") +export_value2=$(func_parser_value "${lines[12]}") +kl_quant_export=$(func_parser_value "${lines[13]}") + +# parser cpp inference model +opencv_dir=$(func_parser_value "${lines[15]}") +cpp_infer_mode_list=$(func_parser_value "${lines[16]}") +cpp_infer_is_quant_list=$(func_parser_value "${lines[17]}") +# parser cpp inference +inference_cmd=$(func_parser_value "${lines[18]}") +cpp_use_gpu_key=$(func_parser_key "${lines[19]}") +cpp_use_gpu_list=$(func_parser_value "${lines[19]}") +cpp_use_mkldnn_key=$(func_parser_key "${lines[20]}") +cpp_use_mkldnn_list=$(func_parser_value "${lines[20]}") +cpp_cpu_threads_key=$(func_parser_key "${lines[21]}") +cpp_cpu_threads_list=$(func_parser_value "${lines[21]}") +cpp_batch_size_key=$(func_parser_key "${lines[22]}") +cpp_batch_size_list=$(func_parser_value "${lines[22]}") +cpp_use_trt_key=$(func_parser_key "${lines[23]}") +cpp_use_trt_list=$(func_parser_value "${lines[23]}") +cpp_precision_key=$(func_parser_key "${lines[24]}") +cpp_precision_list=$(func_parser_value "${lines[24]}") +cpp_infer_model_key=$(func_parser_key "${lines[25]}") +cpp_image_dir_key=$(func_parser_key "${lines[26]}") +cpp_infer_img_dir=$(func_parser_value "${lines[26]}") 
+cpp_benchmark_key=$(func_parser_key "${lines[27]}") +cpp_benchmark_value=$(func_parser_value "${lines[27]}") +cpp_infer_key1=$(func_parser_key "${lines[28]}") +cpp_infer_value1=$(func_parser_value "${lines[28]}") + +LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +mkdir -p ${LOG_PATH} +status_log="${LOG_PATH}/results_cpp.log" + +function func_cpp_inference(){ + IFS='|' + _script=$1 + _model_dir=$2 + _log_path=$3 + _img_dir=$4 + _flag_quant=$5 + # inference + for use_gpu in ${cpp_use_gpu_list[*]}; do + if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then + for use_mkldnn in ${cpp_use_mkldnn_list[*]}; do + if [ ${use_mkldnn} = "False" ] && [ ${_flag_quant} = "True" ]; then + continue + fi + for threads in ${cpp_cpu_threads_list[*]}; do + for batch_size in ${cpp_batch_size_list[*]}; do + _save_log_path="${_log_path}/cpp_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_mode_paddle_batchsize_${batch_size}.log" + set_infer_data=$(func_set_params "${cpp_image_dir_key}" "${_img_dir}") + set_benchmark=$(func_set_params "${cpp_benchmark_key}" "${cpp_benchmark_value}") + set_batchsize=$(func_set_params "${cpp_batch_size_key}" "${batch_size}") + set_cpu_threads=$(func_set_params "${cpp_cpu_threads_key}" "${threads}") + set_model_dir=$(func_set_params "${cpp_infer_model_key}" "${_model_dir}") + set_infer_params1=$(func_set_params "${cpp_infer_key1}" "${cpp_infer_value1}") + command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${cpp_use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 " + eval $command + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" + done + done + done + elif [ ${use_gpu} = "True" ] || [ ${use_gpu} = "gpu" ]; then + for precision in ${cpp_precision_list[*]}; do + if [[ ${precision} != "paddle" ]]; then + if [[ ${_flag_quant} = "False" ]] && [[ ${precision} = "trt_int8" ]]; then + continue + fi + if [[ ${_flag_quant} = "True" ]] && [[ ${precision} != "trt_int8" ]]; then + continue + fi + fi + for batch_size in ${cpp_batch_size_list[*]}; do + _save_log_path="${_log_path}/cpp_infer_gpu_mode_${precision}_batchsize_${batch_size}.log" + set_infer_data=$(func_set_params "${cpp_image_dir_key}" "${_img_dir}") + set_benchmark=$(func_set_params "${cpp_benchmark_key}" "${cpp_benchmark_value}") + set_batchsize=$(func_set_params "${cpp_batch_size_key}" "${batch_size}") + set_precision=$(func_set_params "${cpp_precision_key}" "${precision}") + set_model_dir=$(func_set_params "${cpp_infer_model_key}" "${_model_dir}") + set_infer_params1=$(func_set_params "${cpp_infer_key1}" "${cpp_infer_value1}") + command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${set_precision} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 " + eval $command + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" + done + done + else + echo "Does not support hardware other than CPU and GPU Currently!" 
+ fi + done +} + +cd ./deploy/cpp +# set OPENCV_DIR +if [ ${opencv_dir} = "default" ] || [ ${opencv_dir} = "null" ]; then + OPENCV_DIR=$(pwd)/deps/opencv-3.4.16_gcc8.2_ffmpeg +else + OPENCV_DIR=${opencv_dir} +fi + +# build program +# TODO: set PADDLE_INFER_DIR and TENSORRT_ROOT +if [ -z $PADDLE_INFER_DIR ]; then + Paddle_Infer_Link=$2 + if [ "" = "$Paddle_Infer_Link" ];then + wget -nc https://paddle-inference-lib.bj.bcebos.com/2.2.2/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddle_inference.tgz --no-check-certificate + tar zxf paddle_inference.tgz + PADDLE_INFER_DIR=$(pwd)/paddle_inference + else + wget -nc $Paddle_Infer_Link --no-check-certificate + tar zxf paddle_inference.tgz + PADDLE_INFER_DIR=$(pwd)/paddle_inference + if [ ! -d "paddle_inference" ]; then + PADDLE_INFER_DIR=$(pwd)/paddle_inference_install_dir + fi + fi +fi +if [ -z $TENSORRT_ROOT ]; then + TENSORRT_ROOT=/usr/local/TensorRT6-cuda10.1-cudnn7 +fi +CUDA_LIB=$(dirname `find /usr -name libcudart.so`) +CUDNN_LIB=$(dirname `find /usr -name libcudnn.so`) +TENSORRT_LIB_DIR="${TENSORRT_ROOT}/lib" +TENSORRT_INC_DIR="${TENSORRT_ROOT}/include" + +rm -rf build +mkdir -p build +cd ./build +cmake .. \ + -DWITH_GPU=ON \ + -DWITH_MKL=ON \ + -DWITH_TENSORRT=OFF \ + -DPADDLE_LIB_NAME=libpaddle_inference \ + -DPADDLE_DIR=${PADDLE_INFER_DIR} \ + -DCUDA_LIB=${CUDA_LIB} \ + -DCUDNN_LIB=${CUDNN_LIB} \ + -DTENSORRT_LIB_DIR=${TENSORRT_LIB_DIR} \ + -DTENSORRT_INC_DIR=${TENSORRT_INC_DIR} \ + -DOPENCV_DIR=${OPENCV_DIR} \ + -DWITH_KEYPOINT=ON \ + -DWITH_MOT=ON + +make -j8 +cd ../../../ +echo "################### build finished! ###################" + + +# set cuda device +GPUID=$3 +if [ ${#GPUID} -le 0 ];then + env=" " +else + env="export CUDA_VISIBLE_DEVICES=${GPUID}" +fi +eval $env + +# run cpp infer +Count=0 +IFS="|" +infer_quant_flag=(${cpp_infer_is_quant_list}) +for infer_mode in ${cpp_infer_mode_list[*]}; do + if [ ${infer_mode} != "null" ]; then + # run export + case ${infer_mode} in + norm) run_export=${norm_export} ;; + quant) run_export=${pact_export} ;; + fpgm) run_export=${fpgm_export} ;; + distill) run_export=${distill_export} ;; + kl_quant) run_export=${kl_quant_export} ;; + *) echo "Undefined infer_mode!"; exit 1; + esac + set_export_weight=$(func_set_params "${export_weight_key}" "${export_weight_value}") + set_save_export_dir=$(func_set_params "${save_export_key}" "${save_export_value}") + set_filename=$(func_set_params "${filename_key}" "${model_name}") + export_log_path="${LOG_PATH}/export.log" + export_cmd="${python} ${run_export} ${set_export_weight} ${set_filename} ${set_save_export_dir} " + echo $export_cmd + eval "${export_cmd} > ${export_log_path} 2>&1" + status_export=$? 
+ cat ${export_log_path} + status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path}" + fi + + #run inference + save_export_model_dir="${save_export_value}/${model_name}" + is_quant=${infer_quant_flag[Count]} + func_cpp_inference "${inference_cmd}" "${save_export_model_dir}" "${LOG_PATH}" "${cpp_infer_img_dir}" ${is_quant} + Count=$(($Count + 1)) +done +eval "unset CUDA_VISIBLE_DEVICES" diff --git a/PaddleDetection-release-2.6/test_tipc/test_lite.sh b/PaddleDetection-release-2.6/test_tipc/test_lite.sh new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/PaddleDetection-release-2.6/test_tipc/test_paddle2onnx.sh b/PaddleDetection-release-2.6/test_tipc/test_paddle2onnx.sh new file mode 100644 index 0000000000000000000000000000000000000000..df4e7a0dc8c0a4f6af844a9fa2edd95f6070073e --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_paddle2onnx.sh @@ -0,0 +1,130 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +MODE="paddle2onnx_infer" + +# parser model_name +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) +model_name=$(func_parser_value "${lines[1]}") +echo "ppdet onnx_infer: ${model_name}" +python=$(func_parser_value "${lines[2]}") +filename_key=$(func_parser_key "${lines[3]}") +filename_value=$(func_parser_value "${lines[3]}") + +# export params +save_export_key=$(func_parser_key "${lines[5]}") +save_export_value=$(func_parser_value "${lines[5]}") +export_weight_key=$(func_parser_key "${lines[6]}") +export_weight_value=$(func_parser_value "${lines[6]}") +norm_export=$(func_parser_value "${lines[7]}") +pact_export=$(func_parser_value "${lines[8]}") +fpgm_export=$(func_parser_value "${lines[9]}") +distill_export=$(func_parser_value "${lines[10]}") +export_key1=$(func_parser_key "${lines[11]}") +export_value1=$(func_parser_value "${lines[11]}") +export_param_key=$(func_parser_key "${lines[12]}") +export_param_value=$(func_parser_value "${lines[12]}") +kl_quant_export=$(func_parser_value "${lines[13]}") + +# parser paddle2onnx params +infer_mode_list=$(func_parser_value "${lines[15]}") +infer_is_quant_list=$(func_parser_value "${lines[16]}") + +padlle2onnx_cmd=$(func_parser_value "${lines[17]}") +model_dir_key=$(func_parser_key "${lines[18]}") +model_filename_key=$(func_parser_key "${lines[19]}") +model_filename_value=$(func_parser_value "${lines[19]}") +params_filename_key=$(func_parser_key "${lines[20]}") +params_filename_value=$(func_parser_value "${lines[20]}") +save_file_key=$(func_parser_key "${lines[21]}") +save_file_value=$(func_parser_value "${lines[21]}") +opset_version_key=$(func_parser_key "${lines[22]}") +opset_version_value=$(func_parser_value "${lines[22]}") +enable_onnx_checker_key=$(func_parser_key "${lines[23]}") +enable_onnx_checker_value=$(func_parser_value "${lines[23]}") +paddle2onnx_params1_key=$(func_parser_key "${lines[24]}") +paddle2onnx_params1_value=$(func_parser_value "${lines[24]}") + +# parser onnx inference +inference_py=$(func_parser_value "${lines[25]}") +infer_cfg_key=$(func_parser_key "${lines[26]}") +onnx_file_key=$(func_parser_key "${lines[27]}") +infer_image_key=$(func_parser_key "${lines[28]}") +infer_image_value=$(func_parser_value "${lines[28]}") +infer_param1_key=$(func_parser_key "${lines[29]}") +infer_param1_value=$(func_parser_value "${lines[29]}") + +LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +mkdir -p ${LOG_PATH} +status_log="${LOG_PATH}/results_paddle2onnx.log" + +function 
func_paddle2onnx_inference(){ + IFS='|' + _python=$1 + _log_path=$2 + _export_model_dir=$3 + + # paddle2onnx + echo "################### run paddle2onnx ###################" + set_dirname=$(func_set_params "${model_dir_key}" "${_export_model_dir}") + set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") + set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") + set_save_model=$(func_set_params "${save_file_key}" "${_export_model_dir}/${save_file_value}") + set_opset_version=$(func_set_params "${opset_version_key}" "${opset_version_value}") + set_enable_onnx_checker=$(func_set_params "${enable_onnx_checker_key}" "${enable_onnx_checker_value}") + set_paddle2onnx_params1=$(func_set_params "${paddle2onnx_params1_key}" "${paddle2onnx_params1_value}") + trans_log_path="${_log_path}/trans_model.log" + trans_model_cmd="${padlle2onnx_cmd} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_save_model} ${set_opset_version} ${set_enable_onnx_checker} ${set_paddle2onnx_params1}" + eval "${trans_model_cmd} > ${trans_log_path} 2>&1" + last_status=${PIPESTATUS[0]} + cat ${trans_log_path} + status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" "${trans_log_path}" + + # python inference + echo "################### run onnx infer ###################" + set_infer_cfg=$(func_set_params "${infer_cfg_key}" "${_export_model_dir}/infer_cfg.yml") + set_onnx_file=$(func_set_params "${onnx_file_key}" "${_export_model_dir}/${save_file_value}") + set_infer_image_file=$(func_set_params "${infer_image_key}" "${infer_image_value}") + set_infer_param1=$(func_set_params "${infer_param1_key}" "${infer_param1_value}") + _save_log_path="${_log_path}/paddle2onnx_infer_cpu.log" + infer_model_cmd="${python} ${inference_py} ${set_infer_cfg} ${set_onnx_file} ${set_infer_image_file} ${set_infer_param1}" + eval "${infer_model_cmd} > ${_save_log_path} 2>&1" + last_status=${PIPESTATUS[0]} + cat ${_save_log_path} + status_check $last_status "${infer_model_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" +} + +export Count=0 +IFS="|" +for infer_mode in ${infer_mode_list[*]}; do + if [ ${infer_mode} != "null" ]; then + # run export + case ${infer_mode} in + norm) run_export=${norm_export} ;; + quant) run_export=${pact_export} ;; + fpgm) run_export=${fpgm_export} ;; + distill) run_export=${distill_export} ;; + kl_quant) run_export=${kl_quant_export} ;; + *) echo "Undefined infer_mode!"; exit 1; + esac + set_export_weight=$(func_set_params "${export_weight_key}" "${export_weight_value}") + set_save_export_dir=$(func_set_params "${save_export_key}" "${save_export_value}") + set_filename=$(func_set_params "${filename_key}" "${model_name}") + set_export_param=$(func_set_params "${export_param_key}" "${export_param_value}") + export_log_path="${LOG_PATH}/export.log" + export_cmd="${python} ${run_export} ${set_export_weight} ${set_filename} ${set_export_param} ${set_save_export_dir} " + echo $export_cmd + eval "${export_cmd} > ${export_log_path} 2>&1" + status_export=$? 
+ cat ${export_log_path} + status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path}" + fi + + #run inference + export_model_dir="${save_export_value}/${model_name}" + func_paddle2onnx_inference "${python}" "${LOG_PATH}" "${export_model_dir}" + Count=$(($Count + 1)) +done diff --git a/PaddleDetection-release-2.6/test_tipc/test_pipeline_infer_python.sh b/PaddleDetection-release-2.6/test_tipc/test_pipeline_infer_python.sh new file mode 100644 index 0000000000000000000000000000000000000000..2e6ef90f5e72d3bf92f56102a10382a8e3678ca1 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_pipeline_infer_python.sh @@ -0,0 +1,94 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +MODE="pipeline_infer" + +# parser model_name +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) +model_name=$(func_parser_value "${lines[1]}") +echo "ppdet pipeline_python_infer: ${model_name}" +python=$(func_parser_value "${lines[2]}") +filename_key=$(func_parser_key "${lines[3]}") +filename_value=$(func_parser_value "${lines[3]}") + +# parser infer params +infer_mode_list=$(func_parser_value "${lines[5]}") +input_key=$(func_parser_key "${lines[6]}") +input_list=$(func_parser_value "${lines[6]}") +use_gpu=$(func_parser_value "${lines[7]}") +inference_py=$(func_parser_value "${lines[8]}") +use_device_key=$(func_parser_key "${lines[9]}") +use_device_list=$(func_parser_value "${lines[9]}") +image_dir_key=$(func_parser_key "${lines[10]}") +infer_img_dir=$(func_parser_value "${lines[10]}") +video_dir_key=$(func_parser_key "${lines[11]}") +infer_video_dir=$(func_parser_value "${lines[11]}") + + +LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +mkdir -p ${LOG_PATH} +status_log="${LOG_PATH}/results_serving_python.log" + + +function func_pipeline_inference(){ + IFS='|' + _python=$1 + _log_path=$2 + _pipeline_script=$3 + _infer_dir=$4 + _input_type=$5 + _device_cmd=$6 + _device_type=$7 + # inference + + pipeline_log_path="${_log_path}/python_pipeline_${_input_type}_${_device_type}.log" + output_path="--output_dir=${LOG_PATH}/" + mot_flag="-o MOT.enable=True" + if [ ${_input_type} = "video" ]; then + pipeline_cmd="${_python} ${_pipeline_script} ${_infer_dir} ${_device_cmd} ${output_path} ${mot_flag} > ${pipeline_log_path} 2>&1 &" + else + pipeline_cmd="${_python} ${_pipeline_script} ${_infer_dir} ${_device_cmd} ${output_path} > ${pipeline_log_path} 2>&1 &" + fi + # run + eval $pipeline_cmd + last_status=${PIPESTATUS[0]} + eval "cat ${pipeline_log_path}" + status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${pipeline_log_path}" + +} + + +#run infer +Count=0 +IFS="|" + +for input in ${input_list[*]}; do + for device_type in ${use_device_list[*]};do + # set cuda device + if [ ${use_gpu} = "False" ] || [ ${device_type} = "cpu" ]; then + device_cmd=$(func_set_params "${use_device_key}" "${device_type}") + elif [ ${use_gpu} = "True" ] && [ ${device_type} = "gpu" ]; then + device_cmd=$(func_set_params "${use_device_key}" "${device_type}") + env="export CUDA_VISIBLE_DEVICES=0" + eval $env + else + echo "Does not support hardware other than CPU and GPU Currently!" 
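+            # unsupported device types fall through here and reuse the previous device_cmd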
+ fi + if [ ${input} != "null" ]; then + case ${input} in + image) set_infer_file=$(func_set_params "${image_dir_key}" "${infer_img_dir}") ;; + video) set_infer_file=$(func_set_params "${video_dir_key}" "${infer_video_dir}") ;; + *) echo "Undefined input mode!"; exit 1; + esac + + fi + #run inference + func_pipeline_inference "${python}" "${LOG_PATH}" "${inference_py}" ${set_infer_file} ${input} ${device_cmd} ${device_type} + Count=$(($Count + 1)) + eval "unset CUDA_VISIBLE_DEVICES" + done +done + diff --git a/PaddleDetection-release-2.6/test_tipc/test_ptq_inference_python.sh b/PaddleDetection-release-2.6/test_tipc/test_ptq_inference_python.sh new file mode 100644 index 0000000000000000000000000000000000000000..6371d2ab3e15f0cc94babb2ff57c6e94026c7fba --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_ptq_inference_python.sh @@ -0,0 +1,115 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +MODE="whole_infer" + +# parser model_name +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) +model_name=$(func_parser_value "${lines[1]}") +echo "ppdet ptq: ${model_name}" +python=$(func_parser_value "${lines[2]}") +filename_key=$(func_parser_key "${lines[3]}") + +# parser export params +save_export_key=$(func_parser_key "${lines[5]}") +save_export_value=$(func_parser_value "${lines[5]}") +export_weight_key=$(func_parser_key "${lines[6]}") +export_weight_value=$(func_parser_value "${lines[6]}") +kl_quant_export=$(func_parser_value "${lines[7]}") +export_param1_key=$(func_parser_key "${lines[8]}") +export_param1_value=$(func_parser_value "${lines[8]}") + +# parser infer params +inference_py=$(func_parser_value "${lines[10]}") +device_key=$(func_parser_key "${lines[11]}") +device_list=$(func_parser_value "${lines[11]}") +use_mkldnn_key=$(func_parser_key "${lines[12]}") +use_mkldnn_list=$(func_parser_value "${lines[12]}") +cpu_threads_key=$(func_parser_key "${lines[13]}") +cpu_threads_list=$(func_parser_value "${lines[13]}") +batch_size_key=$(func_parser_key "${lines[14]}") +batch_size_list=$(func_parser_value "${lines[14]}") +run_mode_key=$(func_parser_key "${lines[15]}") +run_mode_list=$(func_parser_value "${lines[15]}") +model_dir_key=$(func_parser_key "${lines[16]}") +image_dir_key=$(func_parser_key "${lines[17]}") +image_dir_value=$(func_parser_value "${lines[17]}") +run_benchmark_key=$(func_parser_key "${lines[18]}") +run_benchmark_value=$(func_parser_value "${lines[18]}") +infer_param1_key=$(func_parser_key "${lines[19]}") +infer_param1_value=$(func_parser_value "${lines[19]}") + + +LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +mkdir -p ${LOG_PATH} +status_log="${LOG_PATH}/results_ptq_python.log" + +function func_ptq_inference(){ + IFS='|' + _python=$1 + _log_path=$2 + _script=$3 + _set_model_dir=$4 + + set_image_dir=$(func_set_params "${image_dir_key}" "${image_dir_value}") + set_run_benchmark=$(func_set_params "${run_benchmark_key}" "${run_benchmark_value}") + set_infer_param1=$(func_set_params "${infer_param1_key}" "${infer_param1_value}") + # inference + for device in ${device_list[*]}; do + set_device=$(func_set_params "${device_key}" "${device}") + if [ ${device} = "cpu" ]; then + for use_mkldnn in ${use_mkldnn_list[*]}; do + set_use_mkldnn=$(func_set_params "${use_mkldnn_key}" "${use_mkldnn}") + for threads in ${cpu_threads_list[*]}; do + set_cpu_threads=$(func_set_params "${cpu_threads_key}" "${threads}") + for batch_size in ${batch_size_list[*]}; do + 
_save_log_path="${_log_path}/python_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_mode_paddle_batchsize_${batch_size}.log" + set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}") + command="${_python} ${_script} ${set_device} ${set_use_mkldnn} ${set_cpu_threads} ${_set_model_dir} ${set_batchsize} ${set_image_dir} ${set_run_benchmark} ${set_infer_param1} > ${_save_log_path} 2>&1 " + eval $command + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" + done + done + done + elif [ ${device} = "gpu" ]; then + for run_mode in ${run_mode_list[*]}; do + if [[ ${run_mode} = "paddle" ]] || [[ ${run_mode} = "trt_int8" ]]; then + for batch_size in ${batch_size_list[*]}; do + _save_log_path="${_log_path}/python_infer_gpu_mode_${run_mode}_batchsize_${batch_size}.log" + set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}") + set_run_mode=$(func_set_params "${run_mode_key}" "${run_mode}") + command="${_python} ${_script} ${set_device} ${set_run_mode} ${_set_model_dir} ${set_batchsize} ${set_image_dir} ${set_run_benchmark} ${set_infer_param1} > ${_save_log_path} 2>&1 " + eval $command + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" + done + fi + done + else + echo "Does not support hardware other than CPU and GPU Currently!" + fi + done +} + +IFS="|" +# run ptq +set_export_weight=$(func_set_params "${export_weight_key}" "${export_weight_value}") +set_save_export_dir=$(func_set_params "${save_export_key}" "${save_export_value}") +set_filename=$(func_set_params "${filename_key}" "${model_name}") +export_log_path="${LOG_PATH}/export.log" +ptq_cmd="${python} ${kl_quant_export} ${set_export_weight} ${set_filename} ${set_save_export_dir}" +echo $ptq_cmd +eval "${ptq_cmd} > ${export_log_path} 2>&1" +status_export=$? 
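+# show the kl_quant export log and record whether PTQ export succeeded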
+cat ${export_log_path} +status_check $status_export "${ptq_cmd}" "${status_log}" "${model_name}" "${export_log_path}" + +#run inference +set_export_model_dir=$(func_set_params "${model_dir_key}" "${save_export_value}/${model_name}") +func_ptq_inference "${python}" "${LOG_PATH}" "${inference_py}" "${set_export_model_dir}" diff --git a/PaddleDetection-release-2.6/test_tipc/test_serving_infer_cpp.sh b/PaddleDetection-release-2.6/test_tipc/test_serving_infer_cpp.sh new file mode 100644 index 0000000000000000000000000000000000000000..4be299c16d3cabf39683d906cdd4b34bc8171351 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_serving_infer_cpp.sh @@ -0,0 +1,131 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +MODE="serving_infer" + +# parser model_name +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) +model_name=$(func_parser_value "${lines[1]}") +echo "ppdet serving_cpp_infer: ${model_name}" +python=$(func_parser_value "${lines[2]}") +filename_key=$(func_parser_key "${lines[3]}") +filename_value=$(func_parser_value "${lines[3]}") + +# parser export params +save_export_key=$(func_parser_key "${lines[5]}") +save_export_value=$(func_parser_value "${lines[5]}") +export_weight_key=$(func_parser_key "${lines[6]}") +export_weight_value=$(func_parser_value "${lines[6]}") +norm_export=$(func_parser_value "${lines[7]}") +pact_export=$(func_parser_value "${lines[8]}") +fpgm_export=$(func_parser_value "${lines[9]}") +distill_export=$(func_parser_value "${lines[10]}") +export_key1=$(func_parser_key "${lines[11]}") +export_value1=$(func_parser_value "${lines[11]}") +export_key2=$(func_parser_key "${lines[12]}") +export_value2=$(func_parser_value "${lines[12]}") +kl_quant_export=$(func_parser_value "${lines[13]}") + +# parser serving params +infer_mode_list=$(func_parser_value "${lines[15]}") +infer_is_quant_list=$(func_parser_value "${lines[16]}") + +model_key=$(func_parser_key "${lines[17]}") +op_key=$(func_parser_key "${lines[18]}") +op_value=$(func_parser_value "${lines[18]}") +port_key=$(func_parser_key "${lines[19]}") +port_value=$(func_parser_value "${lines[19]}") +gpu_ids_key=$(func_parser_key "${lines[20]}") +gpu_ids_value=$(func_parser_value "${lines[20]}") +web_service_key1=$(func_parser_key "${lines[21]}") +web_service_value1=$(func_parser_value "${lines[21]}") +http_client_py=$(func_parser_value "${lines[22]}") +serving_client_key=$(func_parser_key "${lines[23]}") +infer_image_key=$(func_parser_key "${lines[24]}") +infer_image_value=$(func_parser_value "${lines[24]}") +http_client_key1=$(func_parser_key "${lines[25]}") +http_client_value1=$(func_parser_value "${lines[25]}") + +LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +mkdir -p ${LOG_PATH} +status_log="${LOG_PATH}/results_serving_cpp.log" + +function func_serving_inference(){ + IFS='|' + _python=$1 + _log_path=$2 + _set_server_model_dir=$3 + _set_client_model_dir=$4 + _set_image_file=$5 + + set_op=$(func_set_params "${op_key}" "${op_value}") + set_port=$(func_set_params "${port_key}" "${port_value}") + set_web_service_params1=$(func_set_params "${web_service_key1}" "${web_service_value1}") + set_http_client_params1=$(func_set_params "${http_client_key1}" "${http_client_value1}") + # inference + for gpu_ids in ${gpu_ids_value[*]}; do + if [ ${gpu_ids} = "null" ];then + server_log_path="${_log_path}/cpp_server_cpu.log" + client_log_path="${_log_path}/cpp_client_cpu.log" + else + server_log_path="${_log_path}/cpp_server_gpu.log" + client_log_path="${_log_path}/cpp_client_gpu.log" + fi + 
set_gpu_ids=$(func_set_params "${gpu_ids_key}" "${gpu_ids}") + # run web service + web_service_cmd="${_python} -m paddle_serving_server.serve ${_set_server_model_dir} ${set_op} ${set_port} ${set_gpu_ids} ${set_web_service_params1} > ${server_log_path} 2>&1 &" + eval $web_service_cmd + last_status=${PIPESTATUS[0]} + cat ${server_log_path} + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}" + sleep 5s + # run http client + http_client_cmd="${_python} ${http_client_py} ${_set_client_model_dir} ${_set_image_file} ${set_http_client_params1} > ${client_log_path} 2>&1" + eval $http_client_cmd + last_status=${PIPESTATUS[0]} + cat ${client_log_path} + status_check $last_status "${http_client_cmd}" "${status_log}" "${model_name}" "${client_log_path}" + ps ux | grep -i ${port_value} | awk '{print $2}' | xargs kill -s 9 + sleep 2s + done +} + +# run serving infer +Count=0 +IFS="|" +infer_quant_flag=(${infer_is_quant_list}) +for infer_mode in ${infer_mode_list[*]}; do + if [ ${infer_mode} != "null" ]; then + # run export + case ${infer_mode} in + norm) run_export=${norm_export} ;; + quant) run_export=${pact_export} ;; + fpgm) run_export=${fpgm_export} ;; + distill) run_export=${distill_export} ;; + kl_quant) run_export=${kl_quant_export} ;; + *) echo "Undefined infer_mode!"; exit 1; + esac + set_export_weight=$(func_set_params "${export_weight_key}" "${export_weight_value}") + set_save_export_dir=$(func_set_params "${save_export_key}" "${save_export_value}") + set_filename=$(func_set_params "${filename_key}" "${model_name}") + export_log_path="${LOG_PATH}/export.log" + export_cmd="${python} ${run_export} ${set_export_weight} ${set_filename} ${set_save_export_dir} " + echo $export_cmd + eval "${export_cmd} > ${export_log_path} 2>&1" + status_export=$? 
+ cat ${export_log_path} + status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path}" + fi + + #run inference + set_server_model_dir=$(func_set_params "${model_key}" "${save_export_value}/${model_name}/serving_server") + set_client_model_dir=$(func_set_params "${serving_client_key}" "${save_export_value}/${model_name}/serving_client") + set_infer_image_file=$(func_set_params "${infer_image_key}" "${infer_image_value}") + is_quant=${infer_quant_flag[Count]} + func_serving_inference "${python}" "${LOG_PATH}" "${set_server_model_dir}" "${set_client_model_dir}" ${set_infer_image_file} + Count=$(($Count + 1)) +done +eval "unset CUDA_VISIBLE_DEVICES" diff --git a/PaddleDetection-release-2.6/test_tipc/test_serving_infer_python.sh b/PaddleDetection-release-2.6/test_tipc/test_serving_infer_python.sh new file mode 100644 index 0000000000000000000000000000000000000000..fd7cc07b1449793e4ab52b2cd3e2bc7d78cb7433 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_serving_infer_python.sh @@ -0,0 +1,130 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +MODE="serving_infer" + +# parser model_name +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) +model_name=$(func_parser_value "${lines[1]}") +echo "ppdet serving_python_infer: ${model_name}" +python=$(func_parser_value "${lines[2]}") +filename_key=$(func_parser_key "${lines[3]}") +filename_value=$(func_parser_value "${lines[3]}") + +# parser export params +save_export_key=$(func_parser_key "${lines[5]}") +save_export_value=$(func_parser_value "${lines[5]}") +export_weight_key=$(func_parser_key "${lines[6]}") +export_weight_value=$(func_parser_value "${lines[6]}") +norm_export=$(func_parser_value "${lines[7]}") +pact_export=$(func_parser_value "${lines[8]}") +fpgm_export=$(func_parser_value "${lines[9]}") +distill_export=$(func_parser_value "${lines[10]}") +export_key1=$(func_parser_key "${lines[11]}") +export_value1=$(func_parser_value "${lines[11]}") +export_key2=$(func_parser_key "${lines[12]}") +export_value2=$(func_parser_value "${lines[12]}") +kl_quant_export=$(func_parser_value "${lines[13]}") + +# parser serving params +infer_mode_list=$(func_parser_value "${lines[15]}") +infer_is_quant_list=$(func_parser_value "${lines[16]}") + +web_service_py=$(func_parser_value "${lines[17]}") +model_dir_key=$(func_parser_key "${lines[18]}") +opt_key=$(func_parser_key "${lines[19]}") +opt_use_gpu_list=$(func_parser_value "${lines[19]}") +web_service_key1=$(func_parser_key "${lines[20]}") +web_service_value1=$(func_parser_value "${lines[20]}") +http_client_py=$(func_parser_value "${lines[21]}") +infer_image_key=$(func_parser_key "${lines[22]}") +infer_image_value=$(func_parser_value "${lines[22]}") +http_client_key1=$(func_parser_key "${lines[23]}") +http_client_value1=$(func_parser_value "${lines[23]}") + +LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +mkdir -p ${LOG_PATH} +status_log="${LOG_PATH}/results_serving_python.log" + +function func_serving_inference(){ + IFS='|' + _python=$1 + _log_path=$2 + _service_script=$3 + _client_script=$4 + _set_model_dir=$5 + _set_image_file=$6 + set_web_service_params1=$(func_set_params "${web_service_key1}" "${web_service_value1}") + set_http_client_params1=$(func_set_params "${http_client_key1}" "${http_client_value1}") + # inference + for opt in ${opt_use_gpu_list[*]}; do + device_type=$(func_parser_key "${opt}") + server_log_path="${_log_path}/python_server_${device_type}.log" + client_log_path="${_log_path}/python_client_${device_type}.log" + 
opt_value=$(func_parser_value "${opt}") + _set_opt=$(func_set_params "${opt_key}" "${opt_value}") + # run web service + web_service_cmd="${_python} ${_service_script} ${_set_model_dir} ${_set_opt} ${set_web_service_params1} > ${server_log_path} 2>&1 &" + eval $web_service_cmd + last_status=${PIPESTATUS[0]} + cat ${server_log_path} + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}" + sleep 5s + # run http client + http_client_cmd="${_python} ${_client_script} ${_set_image_file} ${set_http_client_params1} > ${client_log_path} 2>&1" + eval $http_client_cmd + last_status=${PIPESTATUS[0]} + cat ${client_log_path} + status_check $last_status "${http_client_cmd}" "${status_log}" "${model_name}" "${client_log_path}" + ps ux | grep -E 'web_service' | awk '{print $2}' | xargs kill -s 9 + sleep 2s + done +} + +# set cuda device +GPUID=$3 +if [ ${#GPUID} -le 0 ];then + env="export CUDA_VISIBLE_DEVICES=0" +else + env="export CUDA_VISIBLE_DEVICES=${GPUID}" +fi +eval $env + +# run serving infer +Count=0 +IFS="|" +infer_quant_flag=(${infer_is_quant_list}) +for infer_mode in ${infer_mode_list[*]}; do + if [ ${infer_mode} != "null" ]; then + # run export + case ${infer_mode} in + norm) run_export=${norm_export} ;; + quant) run_export=${pact_export} ;; + fpgm) run_export=${fpgm_export} ;; + distill) run_export=${distill_export} ;; + kl_quant) run_export=${kl_quant_export} ;; + *) echo "Undefined infer_mode!"; exit 1; + esac + set_export_weight=$(func_set_params "${export_weight_key}" "${export_weight_value}") + set_save_export_dir=$(func_set_params "${save_export_key}" "${save_export_value}") + set_filename=$(func_set_params "${filename_key}" "${model_name}") + export_log_path="${LOG_PATH}/export.log" + export_cmd="${python} ${run_export} ${set_export_weight} ${set_filename} ${set_save_export_dir} " + echo $export_cmd + eval "${export_cmd} > ${export_log_path} 2>&1" + status_export=$? 
+ cat ${export_log_path} + status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path}" + fi + + #run inference + set_export_model_dir=$(func_set_params "${model_dir_key}" "${save_export_value}/${model_name}") + set_infer_image_file=$(func_set_params "${infer_image_key}" "${infer_image_value}") + is_quant=${infer_quant_flag[Count]} + func_serving_inference "${python}" "${LOG_PATH}" "${web_service_py}" "${http_client_py}" "${set_export_model_dir}" ${set_infer_image_file} + Count=$(($Count + 1)) +done +eval "unset CUDA_VISIBLE_DEVICES" diff --git a/PaddleDetection-release-2.6/test_tipc/test_train_inference_python.sh b/PaddleDetection-release-2.6/test_tipc/test_train_inference_python.sh new file mode 100644 index 0000000000000000000000000000000000000000..d5c09ccf09d56a5320abab9611311d930d9b3318 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_train_inference_python.sh @@ -0,0 +1,365 @@ +#!/bin/bash +source test_tipc/utils_func.sh + +FILENAME=$1 +# MODE be one of ['lite_train_lite_infer' 'lite_train_whole_infer' +# 'whole_train_whole_infer', 'whole_infer', 'klquant_whole_infer'] +MODE=$2 + +# parse params +dataline=$(cat ${FILENAME}) +IFS=$'\n' +lines=(${dataline}) + +# The training params +model_name=$(func_parser_value "${lines[1]}") +echo "ppdet python_infer: ${model_name}" +python=$(func_parser_value "${lines[2]}") +gpu_list=$(func_parser_value "${lines[3]}") +train_use_gpu_key=$(func_parser_key "${lines[4]}") +train_use_gpu_value=$(func_parser_value "${lines[4]}") +autocast_list=$(func_parser_value "${lines[5]}") +autocast_key=$(func_parser_key "${lines[5]}") +epoch_key=$(func_parser_key "${lines[6]}") +epoch_num=$(func_parser_params "${lines[6]}") +save_model_key=$(func_parser_key "${lines[7]}") +train_batch_key=$(func_parser_key "${lines[8]}") +train_batch_value=$(func_parser_params "${lines[8]}") +pretrain_model_key=$(func_parser_key "${lines[9]}") +pretrain_model_value=$(func_parser_value "${lines[9]}") +train_model_name=$(func_parser_value "${lines[10]}") +train_infer_img_dir=$(func_parser_value "${lines[11]}") +train_param_key1=$(func_parser_key "${lines[12]}") +train_param_value1=$(func_parser_value "${lines[12]}") + +trainer_list=$(func_parser_value "${lines[14]}") +norm_key=$(func_parser_key "${lines[15]}") +norm_trainer=$(func_parser_value "${lines[15]}") +pact_key=$(func_parser_key "${lines[16]}") +pact_trainer=$(func_parser_value "${lines[16]}") +fpgm_key=$(func_parser_key "${lines[17]}") +fpgm_trainer=$(func_parser_value "${lines[17]}") +distill_key=$(func_parser_key "${lines[18]}") +distill_trainer=$(func_parser_value "${lines[18]}") +trainer_key1=$(func_parser_key "${lines[19]}") +trainer_value1=$(func_parser_value "${lines[19]}") +trainer_key2=$(func_parser_key "${lines[20]}") +trainer_value2=$(func_parser_value "${lines[20]}") + +# eval params +eval_py=$(func_parser_value "${lines[23]}") +eval_key1=$(func_parser_key "${lines[24]}") +eval_value1=$(func_parser_value "${lines[24]}") + +# export params +save_export_key=$(func_parser_key "${lines[27]}") +save_export_value=$(func_parser_value "${lines[27]}") +export_weight_key=$(func_parser_key "${lines[28]}") +export_weight_value=$(func_parser_value "${lines[28]}") +norm_export=$(func_parser_value "${lines[29]}") +pact_export=$(func_parser_value "${lines[30]}") +fpgm_export=$(func_parser_value "${lines[31]}") +distill_export=$(func_parser_value "${lines[32]}") +export_key1=$(func_parser_key "${lines[33]}") +export_value1=$(func_parser_value "${lines[33]}") 
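+# onnx export flag: a config line such as "export_onnx:True" parses to key "export_onnx" and value "True"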
+export_onnx_key=$(func_parser_key "${lines[34]}") +export_value2=$(func_parser_value "${lines[34]}") +kl_quant_export=$(func_parser_value "${lines[35]}") + +# parser inference model +infer_mode_list=$(func_parser_value "${lines[37]}") +infer_is_quant_list=$(func_parser_value "${lines[38]}") +# parser inference +inference_py=$(func_parser_value "${lines[39]}") +use_gpu_key=$(func_parser_key "${lines[40]}") +use_gpu_list=$(func_parser_value "${lines[40]}") +use_mkldnn_key=$(func_parser_key "${lines[41]}") +use_mkldnn_list=$(func_parser_value "${lines[41]}") +cpu_threads_key=$(func_parser_key "${lines[42]}") +cpu_threads_list=$(func_parser_value "${lines[42]}") +batch_size_key=$(func_parser_key "${lines[43]}") +batch_size_list=$(func_parser_value "${lines[43]}") +use_trt_key=$(func_parser_key "${lines[44]}") +use_trt_list=$(func_parser_value "${lines[44]}") +precision_key=$(func_parser_key "${lines[45]}") +precision_list=$(func_parser_value "${lines[45]}") +infer_model_key=$(func_parser_key "${lines[46]}") +image_dir_key=$(func_parser_key "${lines[47]}") +infer_img_dir=$(func_parser_value "${lines[47]}") +save_log_key=$(func_parser_key "${lines[48]}") +benchmark_key=$(func_parser_key "${lines[49]}") +benchmark_value=$(func_parser_value "${lines[49]}") +infer_key1=$(func_parser_key "${lines[50]}") +infer_value1=$(func_parser_value "${lines[50]}") + +LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +mkdir -p ${LOG_PATH} +status_log="${LOG_PATH}/results_python.log" + +line_num=`grep -n -w "to_static_train_benchmark_params" $FILENAME | cut -d ":" -f 1` +to_static_key=$(func_parser_key "${lines[line_num]}") +to_static_trainer=$(func_parser_value "${lines[line_num]}") + +function func_inference(){ + IFS='|' + _python=$1 + _script=$2 + _model_dir=$3 + _log_path=$4 + _img_dir=$5 + _flag_quant=$6 + _gpu=$7 + # inference + for use_gpu in ${use_gpu_list[*]}; do + if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then + for use_mkldnn in ${use_mkldnn_list[*]}; do + if [ ${use_mkldnn} = "False" ] && [ ${_flag_quant} = "True" ]; then + continue + fi + for threads in ${cpu_threads_list[*]}; do + for batch_size in ${batch_size_list[*]}; do + _save_log_path="${_log_path}/python_infer_cpu_gpus_${gpu}_usemkldnn_${use_mkldnn}_threads_${threads}_mode_paddle_batchsize_${batch_size}.log" + set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}") + set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}") + set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}") + set_cpu_threads=$(func_set_params "${cpu_threads_key}" "${threads}") + set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}") + set_infer_params1=$(func_set_params "${infer_key1}" "${infer_value1}") + command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 " + eval $command + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" + done + done + done + elif [ ${use_gpu} = "True" ] || [ ${use_gpu} = "gpu" ]; then + for precision in ${precision_list[*]}; do + if [[ ${precision} != "paddle" ]]; then + if [[ ${_flag_quant} = "False" ]] && [[ ${precision} = "trt_int8" ]]; then + continue + fi + if [[ ${_flag_quant} = "True" ]] && [[ ${precision} != "trt_int8" ]]; then + continue + fi + fi + for batch_size in ${batch_size_list[*]}; do + 
_save_log_path="${_log_path}/python_infer_gpu_gpus_${gpu}_mode_${precision}_batchsize_${batch_size}.log" + set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}") + set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}") + set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}") + set_precision=$(func_set_params "${precision_key}" "${precision}") + set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}") + set_infer_params1=$(func_set_params "${infer_key1}" "${infer_value1}") + command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${set_precision} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 " + eval $command + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" + done + done + else + echo "Does not support hardware other than CPU and GPU Currently!" + fi + done +} + +if [ ${MODE} = "whole_infer" ] || [ ${MODE} = "klquant_whole_infer" ]; then + # set CUDA_VISIBLE_DEVICES + GPUID=$3 + if [ ${#GPUID} -le 0 ];then + env=" " + else + env="export CUDA_VISIBLE_DEVICES=${GPUID}" + fi + eval $env + + Count=0 + gpu=0 + IFS="|" + infer_quant_flag=(${infer_is_quant_list}) + for infer_mode in ${infer_mode_list[*]}; do + if [ ${infer_mode} = "null" ]; then + continue + fi + if [ ${MODE} = "klquant_whole_infer" ] && [ ${infer_mode} != "kl_quant" ]; then + continue + fi + if [ ${MODE} = "whole_infer" ] && [ ${infer_mode} = "kl_quant" ]; then + continue + fi + # run export + case ${infer_mode} in + norm) run_export=${norm_export} ;; + pact) run_export=${pact_export} ;; + fpgm) run_export=${fpgm_export} ;; + distill) run_export=${distill_export} ;; + kl_quant) run_export=${kl_quant_export} ;; + *) echo "Undefined infer_mode!"; exit 1; + esac + set_export_weight=$(func_set_params "${export_weight_key}" "${export_weight_value}") + set_save_export_dir=$(func_set_params "${save_export_key}" "${save_export_value}") + set_filename=$(func_set_params "filename" "${model_name}") + export_cmd="${python} ${run_export} ${set_export_weight} ${set_filename} ${set_save_export_dir} " + echo $export_cmd + eval $export_cmd + status_check $? 
"${export_cmd}" "${status_log}" "${model_name}" + + #run inference + save_export_model_dir="${save_export_value}/${model_name}" + is_quant=${infer_quant_flag[Count]} + func_inference "${python}" "${inference_py}" "${save_export_model_dir}" "${LOG_PATH}" "${infer_img_dir}" ${is_quant} "{gpu}" + Count=$((${Count} + 1)) + done +else + IFS="|" + Count=0 + for gpu in ${gpu_list[*]}; do + use_gpu=${train_use_gpu_value} + Count=$((${Count} + 1)) + ips="" + if [ ${gpu} = "-1" ];then + env="" + use_gpu=False + elif [ ${#gpu} -le 1 ];then + env="export CUDA_VISIBLE_DEVICES=${gpu}" + eval ${env} + elif [ ${#gpu} -le 15 ];then + IFS="," + array=(${gpu}) + env="export CUDA_VISIBLE_DEVICES=${array[0]}" + IFS="|" + else + IFS=";" + array=(${gpu}) + ips=${array[0]} + gpu=${array[1]} + IFS="|" + env=" " + fi + for autocast in ${autocast_list[*]}; do + for trainer in ${trainer_list[*]}; do + flag_quant=False + set_to_static="" + if [ ${trainer} = "${norm_key}" ]; then + run_train=${norm_trainer} + run_export=${norm_export} + elif [ ${trainer} = "${pact_key}" ]; then + run_train=${pact_trainer} + run_export=${pact_export} + flag_quant=True + elif [ ${trainer} = "${fpgm_key}" ]; then + run_train=${fpgm_trainer} + run_export=${fpgm_export} + elif [ ${trainer} = "${distill_key}" ]; then + run_train=${distill_trainer} + run_export=${distill_export} + elif [ ${trainer} = "${trainer_key1}" ]; then + run_train=${trainer_value1} + run_export=${export_value1} + elif [ ${trainer} = "${trainer_key2}" ]; then + run_train=${trainer_value2} + run_export=${export_value2} + elif [ ${trainer} = "${to_static_key}" ]; then + run_train=${norm_trainer} + run_export=${norm_export} + set_to_static=${to_static_trainer} + else + continue + fi + + if [ ${run_train} = "null" ]; then + continue + fi + + set_epoch=$(func_set_params "${epoch_key}" "${epoch_num}") + set_pretrain=$(func_set_params "${pretrain_model_key}" "${pretrain_model_value}") + set_batchsize=$(func_set_params "${train_batch_key}" "${train_batch_value}") + set_filename=$(func_set_params "filename" "${model_name}") + set_use_gpu=$(func_set_params "${train_use_gpu_key}" "${use_gpu}") + set_train_params1=$(func_set_params "${train_param_key1}" "${train_param_value1}") + save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}" + if [ ${autocast} = "amp" ] || [ ${autocast} = "fp16" ]; then + set_autocast="--amp" + set_amp_level="amp_level=O2" + else + set_autocast=" " + set_amp_level=" " + fi + if [ ${MODE} = "benchmark_train" ]; then + set_shuffle="TrainReader.shuffle=False" + set_enable_ce="--enable_ce=True" + else + set_shuffle=" " + set_enable_ce=" " + fi + + set_save_model=$(func_set_params "${save_model_key}" "${save_log}") + nodes="1" + if [ ${#gpu} -le 2 ];then # train with cpu or single gpu + cmd="${python} ${run_train} LearningRate.base_lr=0.0001 log_iter=1 ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_batchsize} ${set_filename} ${set_shuffle} ${set_amp_level} ${set_enable_ce} ${set_autocast} ${set_to_static} ${set_train_params1}" + elif [ ${#ips} -le 15 ];then # train with multi-gpu + cmd="${python} -m paddle.distributed.launch --gpus=${gpu} ${run_train} log_iter=1 ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_batchsize} ${set_filename} ${set_shuffle} ${set_amp_level} ${set_enable_ce} ${set_autocast} ${set_to_static} ${set_train_params1}" + else # train with multi-machine + IFS="," + ips_array=(${ips}) + nodes=${#ips_array[@]} + save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}" 
+ IFS="|" + set_save_model=$(func_set_params "${save_model_key}" "${save_log}") + cmd="${python} -m paddle.distributed.launch --ips=${ips} --gpus=${gpu} ${run_train} log_iter=1 ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_batchsize} ${set_filename} ${set_shuffle} ${set_amp_level} ${set_enable_ce} ${set_autocast} ${set_to_static} ${set_train_params1}" + fi + # run train + train_log_path="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}.log" + eval "${cmd} > ${train_log_path} 2>&1" + last_status=$? + cat ${train_log_path} + status_check $last_status "${cmd}" "${status_log}" "${model_name}" "${train_log_path}" + + set_eval_trained_weight=$(func_set_params "${export_weight_key}" "${save_log}/${model_name}/${train_model_name}") + # run eval + if [ ${eval_py} != "null" ]; then + set_eval_params1=$(func_set_params "${eval_key1}" "${eval_value1}") + eval_log_path="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}_eval.log" + eval_cmd="${python} ${eval_py} ${set_eval_trained_weight} ${set_use_gpu} ${set_eval_params1}" + eval "${eval_cmd} > ${eval_log_path} 2>&1" + last_status=$? + cat ${eval_log_path} + status_check $last_status "${eval_cmd}" "${status_log}" "${model_name}" "${eval_log_path}" + fi + # run export model + if [ ${run_export} != "null" ]; then + save_export_model_dir="${save_log}/${model_name}" + set_export_weight=$(func_set_params "${export_weight_key}" "${save_log}/${model_name}/${train_model_name}") + set_save_export_dir=$(func_set_params "${save_export_key}" "${save_log}") + if [ ${export_onnx_key} = "export_onnx" ]; then + # run export onnx model for rcnn + export_log_path_onnx=${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}_onnx_export.log + export_cmd="${python} ${run_export} ${set_export_weight} ${set_filename} export_onnx=True ${set_save_export_dir} >${export_log_path_onnx} 2>&1" + eval $export_cmd + status_check $? "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path_onnx}" + # copy model for inference benchmark + eval "cp ${save_export_model_dir}/* ${save_log}/" + fi + # run export model + export_log_path="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}_export.log" + export_cmd="${python} ${run_export} ${set_export_weight} ${set_filename} ${set_save_export_dir} " + eval "${export_cmd} > ${export_log_path} 2>&1" + last_status=$? 
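+            # echo the training log so failures are visible in the console output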
+ cat ${export_log_path} + status_check $last_status "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path}" + + #run inference + if [ ${export_onnx_key} != "export_onnx" ]; then + # copy model for inference benchmark + eval "cp ${save_export_model_dir}/* ${save_log}/" + fi + eval $env + func_inference "${python}" "${inference_py}" "${save_export_model_dir}" "${LOG_PATH}" "${train_infer_img_dir}" "${flag_quant}" "{gpu}" + + eval "unset CUDA_VISIBLE_DEVICES" + fi + done # done with: for trainer in ${trainer_list[*]}; do + done # done with: for autocast in ${autocast_list[*]}; do + done # done with: for gpu in ${gpu_list[*]}; do +fi # end if [ ${MODE} = "infer" ]; then diff --git a/PaddleDetection-release-2.6/test_tipc/test_train_inference_python_npu.sh b/PaddleDetection-release-2.6/test_tipc/test_train_inference_python_npu.sh new file mode 100644 index 0000000000000000000000000000000000000000..5b51ac7ac368e96076490ee3399e9974d3279042 --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/test_train_inference_python_npu.sh @@ -0,0 +1,71 @@ +#!/bin/bash +source test_tipc/utils_func.sh +function readlinkf() { + perl -MCwd -e 'print Cwd::abs_path shift' "$1"; +} +function func_parser_config() { + strs=$1 + IFS=" " + array=(${strs}) + tmp=${array[2]} + echo ${tmp} +} +function func_parser_dir() { + strs=$1 + IFS="/" + array=(${strs}) + len=${#array[*]} + dir="" + count=1 + for arr in ${array[*]}; do + if [ ${len} = "${count}" ]; then + continue; + else + dir="${dir}/${arr}" + count=$((${count} + 1)) + fi + done + echo "${dir}" +} +BASEDIR=$(dirname "$0") +REPO_ROOT_PATH=$(readlinkf ${BASEDIR}/../) +FILENAME=$1 + # change gpu to npu in tipc txt configs + sed -i "s/use_gpu:True/use_npu:True/g" $FILENAME + sed -i "s/--device:gpu|cpu/--device:npu|cpu/g" $FILENAME + sed -i "s/trainer:pact_train/trainer:norm_train/g" $FILENAME + sed -i "s/trainer:fpgm_train/trainer:norm_train/g" $FILENAME + sed -i "s/--slim_config _template_pact/ /g" $FILENAME + sed -i "s/--slim_config _template_fpgm/ /g" $FILENAME + sed -i "s/--slim_config _template_kl_quant/ /g" $FILENAME + sed -i 's/\"gpu\"/\"npu\"/g' test_tipc/test_train_inference_python.sh + + # parser params +dataline=`cat $FILENAME` +IFS=$'\n' +lines=(${dataline}) +# replace training config file +grep -n '.yml' $FILENAME | cut -d ":" -f 1 \ +| while read line_num ; do + train_cmd=$(func_parser_value "${lines[line_num-1]}") + trainer_config=$(func_parser_config ${train_cmd}) + echo ${trainer_config} + sed -i 's/use_gpu/use_npu/g' "$REPO_ROOT_PATH/$trainer_config" + # fine use_gpu in those included yaml + sub_datalinee=`cat $REPO_ROOT_PATH/$trainer_config` + IFS=$'\n' + sub_lines=(${sub_datalinee}) + grep -n '.yml' "$REPO_ROOT_PATH/$trainer_config" | cut -d ":" -f 1 \ + | while read sub_line_num; do + sub_config=${sub_lines[sub_line_num-1]} + dst=${#sub_config}-5 + sub_path=$(func_parser_dir "${trainer_config}") + sub_config_path="${REPO_ROOT_PATH}${sub_path}/${sub_config:3:${dst}}" + echo ${sub_config_path} + sed -i 's/use_gpu/use_npu/g' "$sub_config_path" + done +done +# pass parameters to test_train_inference_python.sh +cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} $2" +echo $cmd +eval $cmd diff --git a/PaddleDetection-release-2.6/test_tipc/test_train_inference_python_xpu.sh b/PaddleDetection-release-2.6/test_tipc/test_train_inference_python_xpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..b020377f1e7712fde129e7362b8807f94a4c2e35 --- /dev/null +++ 
b/PaddleDetection-release-2.6/test_tipc/test_train_inference_python_xpu.sh @@ -0,0 +1,79 @@ +#!/bin/bash + source test_tipc/utils_func.sh + + function readlinkf() { + perl -MCwd -e 'print Cwd::abs_path shift' "$1"; + } + + function func_parser_config() { + strs=$1 + IFS=" " + array=(${strs}) + tmp=${array[2]} + echo ${tmp} + } + + function func_parser_dir() { + strs=$1 + IFS="/" + array=(${strs}) + len=${#array[*]} + dir="" + count=1 + for arr in ${array[*]}; do + if [ ${len} = "${count}" ]; then + continue; + else + dir="${dir}/${arr}" + count=$((${count} + 1)) + fi + done + echo "${dir}" + } + + BASEDIR=$(dirname "$0") + REPO_ROOT_PATH=$(readlinkf ${BASEDIR}/../) + + FILENAME=$1 + + # change gpu to xpu in tipc txt configs + sed -i "s/use_gpu:True/use_xpu:True/g" $FILENAME + sed -i "s/--device:gpu|cpu/--device:xpu|cpu/g" $FILENAME + sed -i "s/trainer:pact_train/trainer:norm_train/g" $FILENAME + sed -i "s/trainer:fpgm_train/trainer:norm_train/g" $FILENAME + sed -i "s/--slim_config _template_pact/ /g" $FILENAME + sed -i "s/--slim_config _template_fpgm/ /g" $FILENAME + sed -i "s/--slim_config _template_kl_quant/ /g" $FILENAME + sed -i 's/\"gpu\"/\"xpu\"/g' test_tipc/test_train_inference_python.sh + + # parser params + dataline=`cat $FILENAME` + IFS=$'\n' + lines=(${dataline}) + + # replace training config file + grep -n '.yml' $FILENAME | cut -d ":" -f 1 \ + | while read line_num ; do + train_cmd=$(func_parser_value "${lines[line_num-1]}") + trainer_config=$(func_parser_config ${train_cmd}) + echo ${trainer_config} + sed -i 's/use_gpu/use_xpu/g' "$REPO_ROOT_PATH/$trainer_config" + # fine use_gpu in those included yaml + sub_datalinee=`cat $REPO_ROOT_PATH/$trainer_config` + IFS=$'\n' + sub_lines=(${sub_datalinee}) + grep -n '.yml' "$REPO_ROOT_PATH/$trainer_config" | cut -d ":" -f 1 \ + | while read sub_line_num; do + sub_config=${sub_lines[sub_line_num-1]} + dst=${#sub_config}-5 + sub_path=$(func_parser_dir "${trainer_config}") + sub_config_path="${REPO_ROOT_PATH}${sub_path}/${sub_config:3:${dst}}" + echo ${sub_config_path} + sed -i 's/use_gpu/use_xpu/g' "$sub_config_path" + done + done + + # pass parameters to test_train_inference_python.sh + cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} $2" + echo $cmd + eval $cmd \ No newline at end of file diff --git a/PaddleDetection-release-2.6/test_tipc/utils_func.sh b/PaddleDetection-release-2.6/test_tipc/utils_func.sh new file mode 100644 index 0000000000000000000000000000000000000000..4f52f34ccb404b86794ec826df78a2756a746b5f --- /dev/null +++ b/PaddleDetection-release-2.6/test_tipc/utils_func.sh @@ -0,0 +1,60 @@ +#!/bin/bash + +function func_parser_key(){ + strs=$1 + echo ${strs%%:*} +} + +function func_parser_value(){ + strs=$1 + echo ${strs#*:} +} + +function func_set_params(){ + key=$1 + value=$2 + if [ ${key}x = "null"x ];then + echo " " + elif [[ ${value} = "null" ]] || [[ ${value} = " " ]] || [ ${#value} -le 0 ];then + echo " " + else + echo "${key}=${value}" + fi +} + +function func_parser_params(){ + strs=$1 + IFS=":" + array=(${strs}) + key=${array[0]} + tmp=${array[1]} + IFS="|" + res="" + for _params in ${tmp[*]}; do + IFS="=" + array=(${_params}) + mode=${array[0]} + value=${array[1]} + if [[ ${mode} = ${MODE} ]]; then + IFS="|" + #echo $(func_set_params "${mode}" "${value}") + echo $value + break + fi + IFS="|" + done + echo ${res} +} + +function status_check(){ + last_status=$1 # the exit code + run_command=$2 + run_log=$3 + model_name=$4 + log_path=$5 + if [ $last_status -eq 0 ]; then + echo -e "\033[33m Run 
successfully with command - ${model_name} - ${run_command} - ${log_path} \033[0m" | tee -a ${run_log} + else + echo -e "\033[33m Run failed with command - ${model_name} - ${run_command} - ${log_path} \033[0m" | tee -a ${run_log} + fi +} diff --git a/PaddleDetection-release-2.6/tools/anchor_cluster.py b/PaddleDetection-release-2.6/tools/anchor_cluster.py new file mode 100644 index 0000000000000000000000000000000000000000..e892d403090e6569e16d9548c00841368b427793 --- /dev/null +++ b/PaddleDetection-release-2.6/tools/anchor_cluster.py @@ -0,0 +1,249 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from ppdet.utils.logger import setup_logger +logger = setup_logger('ppdet.anchor_cluster') + +from scipy.cluster.vq import kmeans +import numpy as np +from tqdm import tqdm + +from ppdet.utils.cli import ArgsParser +from ppdet.utils.check import check_gpu, check_version, check_config +from ppdet.core.workspace import load_config, merge_config + + +class BaseAnchorCluster(object): + def __init__(self, n, cache_path, cache, verbose=True): + """ + Base Anchor Cluster + + Args: + n (int): number of clusters + cache_path (str): cache directory path + cache (bool): whether using cache + verbose (bool): whether print results + """ + super(BaseAnchorCluster, self).__init__() + self.n = n + self.cache_path = cache_path + self.cache = cache + self.verbose = verbose + + def print_result(self, centers): + raise NotImplementedError('%s.print_result is not available' % + self.__class__.__name__) + + def get_whs(self): + whs_cache_path = os.path.join(self.cache_path, 'whs.npy') + shapes_cache_path = os.path.join(self.cache_path, 'shapes.npy') + if self.cache and os.path.exists(whs_cache_path) and os.path.exists( + shapes_cache_path): + self.whs = np.load(whs_cache_path) + self.shapes = np.load(shapes_cache_path) + return self.whs, self.shapes + whs = np.zeros((0, 2)) + shapes = np.zeros((0, 2)) + self.dataset.parse_dataset() + roidbs = self.dataset.roidbs + for rec in tqdm(roidbs): + h, w = rec['h'], rec['w'] + bbox = rec['gt_bbox'] + wh = bbox[:, 2:4] - bbox[:, 0:2] + 1 + wh = wh / np.array([[w, h]]) + shape = np.ones_like(wh) * np.array([[w, h]]) + whs = np.vstack((whs, wh)) + shapes = np.vstack((shapes, shape)) + + if self.cache: + os.makedirs(self.cache_path, exist_ok=True) + np.save(whs_cache_path, whs) + np.save(shapes_cache_path, shapes) + + self.whs = whs + self.shapes = shapes + return self.whs, self.shapes + + def calc_anchors(self): + raise NotImplementedError('%s.calc_anchors is not available' % + self.__class__.__name__) + + def __call__(self): + self.get_whs() + centers = self.calc_anchors() + if self.verbose: + self.print_result(centers) + return 
centers
+
+
+class YOLOv2AnchorCluster(BaseAnchorCluster):
+    def __init__(self,
+                 n,
+                 dataset,
+                 size,
+                 cache_path,
+                 cache,
+                 iters=1000,
+                 verbose=True):
+        """
+        YOLOv2 Anchor Cluster
+
+        The code is based on https://github.com/AlexeyAB/darknet/blob/master/scripts/gen_anchors.py
+
+        Args:
+            n (int): number of clusters
+            dataset (DataSet): DataSet instance, VOC or COCO
+            size (list): [w, h]
+            cache_path (str): cache directory path
+            cache (bool): whether using cache
+            iters (int): kmeans algorithm iters
+            verbose (bool): whether print results
+        """
+        super(YOLOv2AnchorCluster, self).__init__(
+            n, cache_path, cache, verbose=verbose)
+        self.dataset = dataset
+        self.size = size
+        self.iters = iters
+
+    def print_result(self, centers):
+        logger.info('%d anchor cluster result: [w, h]' % self.n)
+        for w, h in centers:
+            logger.info('[%d, %d]' % (round(w), round(h)))
+
+    def metric(self, whs, centers):
+        # IoU between every box and every center, with boxes treated as
+        # concentric: intersection of the (w, h) pairs over their union
+        wh1 = whs[:, None]
+        wh2 = centers[None]
+        inter = np.minimum(wh1, wh2).prod(2)
+        return inter / (wh1.prod(2) + wh2.prod(2) - inter)
+
+    def kmeans_expectation(self, whs, centers, assignments):
+        # E step: assign every box to the center with the highest IoU
+        dist = self.metric(whs, centers)
+        new_assignments = dist.argmax(1)
+        converged = (new_assignments == assignments).all()
+        return converged, new_assignments
+
+    def kmeans_maximizations(self, whs, centers, assignments):
+        # M step: move each center to the mean of the boxes assigned to it
+        new_centers = np.zeros_like(centers)
+        for i in range(centers.shape[0]):
+            mask = (assignments == i)
+            if mask.sum():
+                new_centers[i, :] = whs[mask].mean(0)
+        return new_centers
+
+    def calc_anchors(self):
+        self.whs = self.whs * np.array([self.size])
+        # randomly select k centers
+        whs, n, iters = self.whs, self.n, self.iters
+        logger.info('Running kmeans for %d anchors on %d points...' %
+                    (n, len(whs)))
+        idx = np.random.choice(whs.shape[0], size=n, replace=False)
+        centers = whs[idx]
+        # a single cluster is simply the mean of all boxes
+        if n == 1:
+            return self.kmeans_maximizations(whs, centers,
+                                             np.zeros(whs.shape[0:1]))
+        # start unassigned (-1) so the first E step can never report
+        # spurious convergence against the initial assignments
+        assignments = np.full(whs.shape[0:1], -1)
+        # kmeans
+        pbar = tqdm(range(iters), desc='Cluster anchors with k-means algorithm')
+        for _ in pbar:
+            # E step
+            converged, assignments = self.kmeans_expectation(whs, centers,
+                                                             assignments)
+            if converged:
+                logger.info('kmeans algorithm has converged')
+                break
+            # M step
+            centers = self.kmeans_maximizations(whs, centers, assignments)
+            ious = self.metric(whs, centers)
+            pbar.desc = 'avg_iou: %.4f' % (ious.max(1).mean())
+
+        centers = sorted(centers, key=lambda x: x[0] * x[1])
+        return centers
+
+
+def main():
+    parser = ArgsParser()
+    parser.add_argument(
+        '--n', '-n', default=9, type=int, help='num of clusters')
+    parser.add_argument(
+        '--iters',
+        '-i',
+        default=1000,
+        type=int,
+        help='num of iterations for kmeans')
+    parser.add_argument(
+        '--verbose', '-v', default=True, type=bool, help='whether to print results')
+    parser.add_argument(
+        '--size',
+        '-s',
+        default=None,
+        type=str,
+        help='image size: w,h, using comma as delimiter')
+    parser.add_argument(
+        '--method',
+        '-m',
+        default='v2',
+        type=str,
+        help='cluster method, only v2 is supported now')
+    parser.add_argument(
+        '--cache_path', default='cache', type=str, help='cache path')
+    parser.add_argument(
+        '--cache', action='store_true', help='whether to use cache')
+    FLAGS = parser.parse_args()
+
+    cfg = load_config(FLAGS.config)
+    merge_config(FLAGS.opt)
+    check_config(cfg)
+    # check whether use_gpu=True is set with a CPU-only paddlepaddle build
+    if 'use_gpu' not in cfg:
+        cfg.use_gpu = False
+    check_gpu(cfg.use_gpu)
+    # check if the paddlepaddle version is satisfied
+    check_version('develop')
+
+    # get dataset
+    dataset = 
cfg['TrainDataset'] + if FLAGS.size: + if ',' in FLAGS.size: + size = list(map(int, FLAGS.size.split(','))) + assert len(size) == 2, "the format of size is incorrect" + else: + size = int(FLAGS.size) + size = [size, size] + elif 'inputs_def' in cfg['TestReader'] and 'image_shape' in cfg[ + 'TestReader']['inputs_def']: + size = cfg['TestReader']['inputs_def']['image_shape'][1:] + else: + raise ValueError('size is not specified') + + if FLAGS.method == 'v2': + cluster = YOLOv2AnchorCluster(FLAGS.n, dataset, size, FLAGS.cache_path, + FLAGS.cache, FLAGS.iters, FLAGS.verbose) + else: + raise ValueError('cluster method: %s is not supported' % FLAGS.method) + + anchors = cluster() + + +if __name__ == "__main__": + main() diff --git a/PaddleDetection-release-2.6/tools/box_distribution.py b/PaddleDetection-release-2.6/tools/box_distribution.py new file mode 100644 index 0000000000000000000000000000000000000000..f7979ecc125f4f35f7256b6219707d04218210af --- /dev/null +++ b/PaddleDetection-release-2.6/tools/box_distribution.py @@ -0,0 +1,141 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import matplotlib.pyplot as plt +import json +import numpy as np +import argparse +from pycocotools.coco import COCO +from tqdm import tqdm + + +def median(data): + data.sort() + mid = len(data) // 2 + median = (data[mid] + data[~mid]) / 2 + return median + + +def draw_distribution(width, height, out_path): + w_bins = int((max(width) - min(width)) // 10) + h_bins = int((max(height) - min(height)) // 10) + plt.figure() + plt.subplot(221) + plt.hist(width, bins=w_bins, color='green') + plt.xlabel('Width rate *1000') + plt.ylabel('number') + plt.title('Distribution of Width') + plt.subplot(222) + plt.hist(height, bins=h_bins, color='blue') + plt.xlabel('Height rate *1000') + plt.title('Distribution of Height') + plt.savefig(out_path) + print(f'Distribution saved as {out_path}') + plt.show() + + +def get_ratio_infos(jsonfile, out_img, eval_size, small_stride): + coco = COCO(annotation_file=jsonfile) + allannjson = json.load(open(jsonfile, 'r')) + be_im_id = allannjson['annotations'][0]['image_id'] + be_im_w = [] + be_im_h = [] + ratio_w = [] + ratio_h = [] + im_wid,im_hei=[],[] + for ann in tqdm(allannjson['annotations']): + if ann['iscrowd']: + continue + x0, y0, w, h = ann['bbox'][:] + if be_im_id == ann['image_id']: + be_im_w.append(w) + be_im_h.append(h) + else: + im_w = coco.imgs[be_im_id]['width'] + im_h = coco.imgs[be_im_id]['height'] + im_wid.append(im_w) + im_hei.append(im_h) + im_m_w = np.mean(be_im_w) + im_m_h = np.mean(be_im_h) + dis_w = im_m_w / im_w + dis_h = im_m_h / im_h + ratio_w.append(dis_w) + ratio_h.append(dis_h) + be_im_id = ann['image_id'] + be_im_w = [w] + be_im_h = [h] + + + im_w = coco.imgs[be_im_id]['width'] + im_h = coco.imgs[be_im_id]['height'] + im_wid.append(im_w) + im_hei.append(im_h) + all_im_m_w = np.mean(im_wid) + all_im_m_h = np.mean(im_hei) + + + im_m_w = np.mean(be_im_w) + im_m_h = np.mean(be_im_h) + dis_w = im_m_w / 
im_w
+    dis_h = im_m_h / im_h
+    ratio_w.append(dis_w)
+    ratio_h.append(dis_h)
+    mid_w = median(ratio_w)
+    mid_h = median(ratio_h)
+
+    # fold larger ratios onto coarser strides, then take the 95th percentile
+    # to suggest an upper bound for reg_range
+    reg_ratio = []
+    ratio_all = ratio_h + ratio_w
+    for r in ratio_all:
+        if r < 0.2:
+            reg_ratio.append(r)
+        elif r < 0.4:
+            reg_ratio.append(r / 2)
+        else:
+            reg_ratio.append(r / 4)
+    reg_ratio = sorted(reg_ratio)
+    max_ratio = reg_ratio[int(0.95 * len(reg_ratio))]
+    reg_max = round(max_ratio * eval_size / small_stride)
+
+    ratio_w = [i * 1000 for i in ratio_w]
+    ratio_h = [i * 1000 for i in ratio_h]
+    print(f'Suggested reg_range[1] is {reg_max + 1}')
+    print(f'Mean of all img_w is {all_im_m_w}')
+    print(f'Mean of all img_h is {all_im_m_h}')
+    print(f'Median of ratio_w is {mid_w}')
+    print(f'Median of ratio_h is {mid_h}')
+    print('all_img with box: ', len(ratio_h))
+    print('all_ann: ', len(allannjson['annotations']))
+    draw_distribution(ratio_w, ratio_h, out_img)
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--json_path', type=str, default=None, help="Dataset json path.")
+    parser.add_argument(
+        '--eval_size', type=int, default=640, help="eval size.")
+    parser.add_argument(
+        '--small_stride', type=int, default=8, help="smallest stride.")
+    parser.add_argument(
+        '--out_img',
+        type=str,
+        default='box_distribution.jpg',
+        help="Name of the distribution img.")
+    args = parser.parse_args()
+
+    get_ratio_infos(args.json_path, args.out_img, args.eval_size,
+                    args.small_stride)
+
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/PaddleDetection-release-2.6/tools/cam_ppdet.py b/PaddleDetection-release-2.6/tools/cam_ppdet.py
new file mode 100644
index 0000000000000000000000000000000000000000..c8922b9353f90c5f40472ac8e1b7621bf903e620
--- /dev/null
+++ b/PaddleDetection-release-2.6/tools/cam_ppdet.py
@@ -0,0 +1,115 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+# add python path of PaddleDetection to sys.path
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
+sys.path.insert(0, parent_path)
+
+# ignore warning log
+import warnings
+warnings.filterwarnings('ignore')
+from ppdet.utils.cli import ArgsParser, merge_args
+from ppdet.core.workspace import load_config, merge_config
+from ppdet.utils.check import check_gpu, check_npu, check_xpu, check_version, check_config
+from ppdet.utils.cam_utils import BBoxCAM
+import paddle
+
+
+def parse_args():
+    parser = ArgsParser()
+    parser.add_argument(
+        "--infer_img",
+        type=str,
+        default='demo/000000014439.jpg',  # hxw: 404x640
+        help="Image path, has higher priority over --infer_dir")
+    parser.add_argument(
+        "--weights",
+        type=str,
+        default='output/faster_rcnn_r50_vd_fpn_2x_coco_paddlejob/best_model.pdparams')
+    parser.add_argument(
+        "--cam_out", type=str, default='cam_faster_rcnn')
+    parser.add_argument(
+        "--use_gpu", type=bool, default=True)
+    parser.add_argument(
+        "--infer_dir",
+        type=str,
+        default=None,
+        help="Directory for images to perform inference on.")
+    parser.add_argument(
+        "--output_dir",
+        type=str,
+        default="output",
+        help="Directory for storing the output visualization files.")
+    parser.add_argument(
+        "--draw_threshold",
+        type=float,
+        default=0.8,
+        help="Threshold to reserve the result for visualization.")
+    parser.add_argument(
+        "--save_results",
+        type=bool,
+        default=False,
+        help="Whether to save inference results to output_dir.")
+    parser.add_argument(
+        "--target_feature_layer_name",
+        type=str,
+        default='model.backbone',  # 
feature map used for Grad-CAM, e.g. model.backbone or model.bbox_head.roi_extractor + help="Feature map layer to compute Grad-CAM on, e.g. model.backbone or model.bbox_head.roi_extractor.") + args = parser.parse_args() + + return args + +def run(FLAGS, cfg): + assert cfg.architecture in ['FasterRCNN', 'MaskRCNN', 'YOLOv3', 'PPYOLOE', + 'PPYOLOEWithAuxHead', 'BlazeFace', 'SSD', 'RetinaNet'], \ + 'CAM is only supported for FasterRCNN-based and YOLOv3-based architectures for now, ' \ + 'other architectures are not supported yet!' + + bbox_cam = BBoxCAM(FLAGS, cfg) + bbox_cam.get_bboxes_cams() + + print('finish') + + + +def main(): + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + merge_args(cfg, FLAGS) + merge_config(FLAGS.opt) + + # disable npu in config by default + if 'use_npu' not in cfg: + cfg.use_npu = False + + # disable xpu in config by default + if 'use_xpu' not in cfg: + cfg.use_xpu = False + + if cfg.use_gpu: + place = paddle.set_device('gpu') + elif cfg.use_npu: + place = paddle.set_device('npu') + elif cfg.use_xpu: + place = paddle.set_device('xpu') + else: + place = paddle.set_device('cpu') + + check_config(cfg) + check_gpu(cfg.use_gpu) + check_npu(cfg.use_npu) + check_xpu(cfg.use_xpu) + check_version() + + run(FLAGS, cfg) + + +if __name__ == '__main__': + main() diff --git a/PaddleDetection-release-2.6/tools/eval.py b/PaddleDetection-release-2.6/tools/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..384a497906bc436a9bd20a5d0cb3165dc4af4f3b --- /dev/null +++ b/PaddleDetection-release-2.6/tools/eval.py @@ -0,0 +1,203 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
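Editorial aside on the Grad-CAM tool above: BBoxCAM's internals are not part of this diff, but the classic Grad-CAM computation it is named after can be sketched in a few lines of numpy. This is an illustrative sketch of the general technique, not PaddleDetection's actual implementation; feature_map and gradients are stand-ins for values captured from the layer named by --target_feature_layer_name, and the shapes are made up.

import numpy as np

def grad_cam(feature_map, gradients):
    # Classic Grad-CAM: weight each channel of the feature map by the
    # spatially averaged gradient of the target score w.r.t. that channel,
    # keep only positive evidence (ReLU), then normalize to [0, 1].
    # feature_map, gradients: (C, H, W) captured from the target layer.
    weights = gradients.mean(axis=(1, 2))             # (C,)
    cam = np.tensordot(weights, feature_map, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# toy usage with random tensors standing in for captured activations
fm = np.random.rand(256, 25, 40).astype(np.float32)
grads = np.random.randn(256, 25, 40).astype(np.float32)
heatmap = grad_cam(fm, grads)  # (25, 40); resize to the input image to overlay

The resulting heatmap is what the --cam_out images visualize, restricted per detected box in the bbox-level variant.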
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +# ignore warning log +import warnings +warnings.filterwarnings('ignore') + +import paddle + +from ppdet.core.workspace import load_config, merge_config +from ppdet.utils.check import check_gpu, check_npu, check_xpu, check_mlu, check_version, check_config +from ppdet.utils.cli import ArgsParser, merge_args +from ppdet.engine import Trainer, init_parallel_env +from ppdet.metrics.coco_utils import json_eval_results +from ppdet.slim import build_slim_model + +from ppdet.utils.logger import setup_logger +logger = setup_logger('eval') + + +def parse_args(): + parser = ArgsParser() + parser.add_argument( + "--output_eval", + default=None, + type=str, + help="Evaluation directory, default is current directory.") + + parser.add_argument( + '--json_eval', + action='store_true', + default=False, + help='Whether to re-evaluate with existing bbox.json or mask.json') + + parser.add_argument( + "--slim_config", + default=None, + type=str, + help="Configuration file of slim method.") + + # TODO: bias should be unified + parser.add_argument( + "--bias", + action="store_true", + help="Whether to add bias while getting w and h.") + + parser.add_argument( + "--classwise", + action="store_true", + help="Whether to compute per-category AP and draw P-R curves.") + + parser.add_argument( + '--save_prediction_only', + action='store_true', + default=False, + help='Whether to save the evaluation results only') + + parser.add_argument( + "--amp", + action='store_true', + default=False, + help="Enable auto mixed precision eval.") + + # for smalldet slice_infer + parser.add_argument( + "--slice_infer", + action='store_true', + help="Whether to slice the image and merge the inference results for small object detection." + ) + parser.add_argument( + '--slice_size', + nargs='+', + type=int, + default=[640, 640], + help="Height and width of the sliced image.") + parser.add_argument( + "--overlap_ratio", + nargs='+', + type=float, + default=[0.25, 0.25], + help="Overlap ratio (height and width) of the sliced image.") + parser.add_argument( + "--combine_method", + type=str, + default='nms', + help="Combine method of the sliced images' detection results, choose in ['nms', 'nmm', 'concat']." + ) + parser.add_argument( + "--match_threshold", + type=float, + default=0.6, + help="Combine method matching threshold.") + parser.add_argument( + "--match_metric", + type=str, + default='ios', + help="Combine method matching metric, choose in ['iou', 'ios'].") + args = parser.parse_args() + return args + + +def run(FLAGS, cfg): + if FLAGS.json_eval: + logger.info( + "In json_eval mode, PaddleDetection will evaluate json files in " + "output_eval directly.
Existing proposal.json, bbox.json and mask.json " "there will be picked up by default.") + json_eval_results( + cfg.metric, + json_directory=FLAGS.output_eval, + dataset=cfg['EvalDataset']) + return + + # init parallel environment if nranks > 1 + init_parallel_env() + + # build trainer + trainer = Trainer(cfg, mode='eval') + + # load weights + trainer.load_weights(cfg.weights) + + # evaluation + if FLAGS.slice_infer: + trainer.evaluate_slice( + slice_size=FLAGS.slice_size, + overlap_ratio=FLAGS.overlap_ratio, + combine_method=FLAGS.combine_method, + match_threshold=FLAGS.match_threshold, + match_metric=FLAGS.match_metric) + else: + trainer.evaluate() + + +def main(): + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + merge_args(cfg, FLAGS) + merge_config(FLAGS.opt) + + # disable npu in config by default + if 'use_npu' not in cfg: + cfg.use_npu = False + + # disable xpu in config by default + if 'use_xpu' not in cfg: + cfg.use_xpu = False + + if 'use_gpu' not in cfg: + cfg.use_gpu = False + + # disable mlu in config by default + if 'use_mlu' not in cfg: + cfg.use_mlu = False + + if cfg.use_gpu: + place = paddle.set_device('gpu') + elif cfg.use_npu: + place = paddle.set_device('npu') + elif cfg.use_xpu: + place = paddle.set_device('xpu') + elif cfg.use_mlu: + place = paddle.set_device('mlu') + else: + place = paddle.set_device('cpu') + + if FLAGS.slim_config: + cfg = build_slim_model(cfg, FLAGS.slim_config, mode='eval') + + check_config(cfg) + check_gpu(cfg.use_gpu) + check_npu(cfg.use_npu) + check_xpu(cfg.use_xpu) + check_mlu(cfg.use_mlu) + check_version() + + run(FLAGS, cfg) + + +if __name__ == '__main__': + main() diff --git a/PaddleDetection-release-2.6/tools/eval_mot.py b/PaddleDetection-release-2.6/tools/eval_mot.py new file mode 100644 index 0000000000000000000000000000000000000000..b88d0c4a1dec6b368f9753a0b4fa2b319210c87c --- /dev/null +++ b/PaddleDetection-release-2.6/tools/eval_mot.py @@ -0,0 +1,144 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
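Editorial aside on the --match_metric choices in eval.py above: 'iou' is the usual intersection-over-union, while 'ios' is intersection over the smaller box, which matches a small per-slice detection against a larger merged one far more aggressively. A minimal sketch of the two metrics (box_match_metric is a hypothetical helper, not part of this commit; boxes are [x1, y1, x2, y2]):

def box_match_metric(a, b, metric='ios'):
    # Compare two boxes with IoU or IoS (intersection over the smaller box).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    if metric == 'iou':
        return inter / (area_a + area_b - inter + 1e-9)
    return inter / (min(area_a, area_b) + 1e-9)  # 'ios'

# A small box fully inside a big one: IoS is ~1.0 while IoU stays low, so
# 'ios' merges duplicated slice detections that 'iou' would keep apart.
print(box_match_metric([0, 0, 10, 10], [0, 0, 40, 40], 'iou'))  # 0.0625
print(box_match_metric([0, 0, 10, 10], [0, 0, 40, 40], 'ios'))  # ~1.0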
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +# ignore warning log +import warnings +warnings.filterwarnings('ignore') + +import paddle + +from ppdet.core.workspace import load_config, merge_config +from ppdet.utils.check import check_gpu, check_npu, check_xpu, check_mlu, check_version, check_config +from ppdet.utils.cli import ArgsParser +from ppdet.engine import Tracker + + +def parse_args(): + parser = ArgsParser() + parser.add_argument( + "--det_results_dir", + type=str, + default='', + help="Directory name for detection results.") + parser.add_argument( + '--output_dir', + type=str, + default='output', + help='Directory name for output tracking results.') + parser.add_argument( + '--save_images', + action='store_true', + help='Save tracking results (image).') + parser.add_argument( + '--save_videos', + action='store_true', + help='Save tracking results (video).') + parser.add_argument( + '--show_image', + action='store_true', + help='Show tracking results (image).') + parser.add_argument( + '--scaled', + type=bool, + default=False, + help="Whether the coords output by the detector are already scaled: " + "False for JDE YOLOv3, True for general detectors.") + args = parser.parse_args() + return args + + +def run(FLAGS, cfg): + dataset_dir = cfg['EvalMOTDataset'].dataset_dir + data_root = cfg['EvalMOTDataset'].data_root + data_root = '{}/{}'.format(dataset_dir, data_root) + seqs = os.listdir(data_root) + seqs.sort() + + # build Tracker + tracker = Tracker(cfg, mode='eval') + + # load weights + if cfg.architecture in ['DeepSORT', 'ByteTrack']: + tracker.load_weights_sde(cfg.det_weights, cfg.reid_weights) + else: + tracker.load_weights_jde(cfg.weights) + + # inference + tracker.mot_evaluate( + data_root=data_root, + seqs=seqs, + data_type=cfg.metric.lower(), + model_type=cfg.architecture, + output_dir=FLAGS.output_dir, + save_images=FLAGS.save_images, + save_videos=FLAGS.save_videos, + show_image=FLAGS.show_image, + scaled=FLAGS.scaled, + det_results_dir=FLAGS.det_results_dir) + + +def main(): + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + merge_config(FLAGS.opt) + + # disable npu in config by default + if 'use_npu' not in cfg: + cfg.use_npu = False + + # disable xpu in config by default + if 'use_xpu' not in cfg: + cfg.use_xpu = False + + if 'use_gpu' not in cfg: + cfg.use_gpu = False + + # disable mlu in config by default + if 'use_mlu' not in cfg: + cfg.use_mlu = False + + if cfg.use_gpu: + place = paddle.set_device('gpu') + elif cfg.use_npu: + place = paddle.set_device('npu') + elif cfg.use_xpu: + place = paddle.set_device('xpu') + elif cfg.use_mlu: + place = paddle.set_device('mlu') + else: + place = paddle.set_device('cpu') + + check_config(cfg) + check_gpu(cfg.use_gpu) + check_npu(cfg.use_npu) + check_xpu(cfg.use_xpu) + check_mlu(cfg.use_mlu) + check_version() + + run(FLAGS, cfg) + + +if __name__ == '__main__': + main() diff --git a/PaddleDetection-release-2.6/tools/export_model.py b/PaddleDetection-release-2.6/tools/export_model.py new file mode 100644 index 0000000000000000000000000000000000000000..20cfcfaa57289f09ed216a912f0a4bccbceacce0 --- /dev/null +++ b/PaddleDetection-release-2.6/tools/export_model.py @@ -0,0 +1,110 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +# ignore warning log +import warnings +warnings.filterwarnings('ignore') + +import paddle +from ppdet.core.workspace import load_config, merge_config +from ppdet.utils.check import check_gpu, check_version, check_config +from ppdet.utils.cli import ArgsParser +from ppdet.engine import Trainer +from ppdet.slim import build_slim_model + +from ppdet.utils.logger import setup_logger +logger = setup_logger('export_model') + + +def parse_args(): + parser = ArgsParser() + parser.add_argument( + "--output_dir", + type=str, + default="output_inference", + help="Directory for storing the output model files.") + parser.add_argument( + "--export_serving_model", + type=bool, + default=False, + help="Whether to export serving model or not.") + parser.add_argument( + "--slim_config", + default=None, + type=str, + help="Configuration file of slim method.") + args = parser.parse_args() + return args + + +def run(FLAGS, cfg): + # build detector + trainer = Trainer(cfg, mode='test') + + # load weights + if cfg.architecture in ['DeepSORT', 'ByteTrack']: + trainer.load_weights_sde(cfg.det_weights, cfg.reid_weights) + else: + trainer.load_weights(cfg.weights) + + # export model + trainer.export(FLAGS.output_dir) + + if FLAGS.export_serving_model: + from paddle_serving_client.io import inference_model_to_serving + model_name = os.path.splitext(os.path.split(cfg.filename)[-1])[0] + + inference_model_to_serving( + dirname="{}/{}".format(FLAGS.output_dir, model_name), + serving_server="{}/{}/serving_server".format(FLAGS.output_dir, + model_name), + serving_client="{}/{}/serving_client".format(FLAGS.output_dir, + model_name), + model_filename="model.pdmodel", + params_filename="model.pdiparams") + + +def main(): + paddle.set_device("cpu") + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + merge_config(FLAGS.opt) + + if FLAGS.slim_config: + cfg = build_slim_model(cfg, FLAGS.slim_config, mode='test') + + # FIXME: Temporarily solve the priority problem of FLAGS.opt + merge_config(FLAGS.opt) + check_config(cfg) + if 'use_gpu' not in cfg: + cfg.use_gpu = False + check_gpu(cfg.use_gpu) + check_version() + + run(FLAGS, cfg) + + +if __name__ == '__main__': + main() diff --git a/PaddleDetection-release-2.6/tools/gen_semi_coco.py b/PaddleDetection-release-2.6/tools/gen_semi_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..acacb5861279a6128971ccb49cc51a030a43e381 --- /dev/null +++ b/PaddleDetection-release-2.6/tools/gen_semi_coco.py @@ -0,0 +1,102 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import json +import argparse +import numpy as np + + +def save_json(path, images, annotations, categories): + new_json = { + 'images': images, + 'annotations': annotations, + 'categories': categories, + } + with open(path, 'w') as f: + json.dump(new_json, f) + print('{} saved, with {} images and {} annotations.'.format( + path, len(images), len(annotations))) + + +def gen_semi_data(data_dir, + json_file, + percent=10.0, + seed=1, + seed_offset=0, + txt_file=None): + json_name = json_file.split('/')[-1].split('.')[0] + json_file = os.path.join(data_dir, json_file) + anno = json.load(open(json_file, 'r')) + categories = anno['categories'] + all_images = anno['images'] + all_anns = anno['annotations'] + print( + 'Totally {} images and {} annotations, about {} gts per image.'.format( + len(all_images), len(all_anns), len(all_anns) / len(all_images))) + + if txt_file: + print('Using percent {} and seed {}.'.format(percent, seed)) + txt_file = os.path.join(data_dir, txt_file) + sup_idx = json.load(open(txt_file, 'r'))[str(percent)][str(seed)] + # max(sup_idx) = 117262 # 10%, sup_idx is not image_id + else: + np.random.seed(seed + seed_offset) + sup_len = int(percent / 100.0 * len(all_images)) + sup_idx = np.random.choice( + range(len(all_images)), size=sup_len, replace=False) + labeled_images, labeled_anns = [], [] + labeled_im_ids = [] + unlabeled_images, unlabeled_anns = [], [] + + for i in range(len(all_images)): + if i in sup_idx: + labeled_im_ids.append(all_images[i]['id']) + labeled_images.append(all_images[i]) + else: + unlabeled_images.append(all_images[i]) + + for an in all_anns: + im_id = an['image_id'] + if im_id in labeled_im_ids: + labeled_anns.append(an) + else: + continue + + save_path = '{}/{}'.format(data_dir, 'semi_annotations') + if not os.path.exists(save_path): + os.mkdir(save_path) + + sup_name = '{}.{}@{}.json'.format(json_name, seed, int(percent)) + sup_path = os.path.join(save_path, sup_name) + save_json(sup_path, labeled_images, labeled_anns, categories) + + unsup_name = '{}.{}@{}-unlabeled.json'.format(json_name, seed, int(percent)) + unsup_path = os.path.join(save_path, unsup_name) + save_json(unsup_path, unlabeled_images, unlabeled_anns, categories) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument('--data_dir', type=str, default='./dataset/coco') + parser.add_argument( + '--json_file', type=str, default='annotations/instances_train2017.json') + parser.add_argument('--percent', type=float, default=10.0) + parser.add_argument('--seed', type=int, default=1) + parser.add_argument('--seed_offset', type=int, default=0) + parser.add_argument('--txt_file', type=str, default='COCO_supervision.txt') + args = parser.parse_args() + print(args) + gen_semi_data(args.data_dir, args.json_file, args.percent, args.seed, + args.seed_offset, args.txt_file) diff --git a/PaddleDetection-release-2.6/tools/infer.py b/PaddleDetection-release-2.6/tools/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..b8ab436240c32f24aeecbe4910b098ee25fb0190 --- /dev/null +++ 
b/PaddleDetection-release-2.6/tools/infer.py @@ -0,0 +1,236 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +# ignore warning log +import warnings +warnings.filterwarnings('ignore') +import glob +import ast + +import paddle +from ppdet.core.workspace import load_config, merge_config +from ppdet.engine import Trainer +from ppdet.utils.check import check_gpu, check_npu, check_xpu, check_mlu, check_version, check_config +from ppdet.utils.cli import ArgsParser, merge_args +from ppdet.slim import build_slim_model + +from ppdet.utils.logger import setup_logger +logger = setup_logger('infer') + + +def parse_args(): + parser = ArgsParser() + parser.add_argument( + "--infer_dir", + type=str, + default="PICT", + help="Directory for images to perform inference on.") + parser.add_argument( + "--infer_img", + type=str, + default=None, + help="Image path, has higher priority over --infer_dir") + parser.add_argument( + "--output_dir", + type=str, + default="output", + help="Directory for storing the output visualization files.") + parser.add_argument( + "--draw_threshold", + type=float, + default=0.5, + help="Threshold to reserve the result for visualization.") + parser.add_argument( + "--slim_config", + default=None, + type=str, + help="Configuration file of slim method.") + parser.add_argument( + "--use_vdl", + type=bool, + default=False, + help="Whether to record the data to VisualDL.") + parser.add_argument( + '--vdl_log_dir', + type=str, + default="vdl_log_dir/image", + help='VisualDL logging directory for image.') + parser.add_argument( + "--save_results", + type=bool, + default=False, + help="Whether to save inference results to output_dir.") + parser.add_argument( + "--slice_infer", + action='store_true', + help="Whether to slice the image and merge the inference results for small object detection." + ) + parser.add_argument( + '--slice_size', + nargs='+', + type=int, + default=[640, 640], + help="Height and width of the sliced image.") + parser.add_argument( + "--overlap_ratio", + nargs='+', + type=float, + default=[0.25, 0.25], + help="Overlap ratio (height and width) of the sliced image.") + parser.add_argument( + "--combine_method", + type=str, + default='nms', + help="Combine method of the sliced images' detection results, choose in ['nms', 'nmm', 'concat']."
+ ) + parser.add_argument( + "--match_threshold", + type=float, + default=0.6, + help="Combine method matching threshold.") + parser.add_argument( + "--match_metric", + type=str, + default='ios', + help="Combine method matching metric, choose in ['iou', 'ios'].") + parser.add_argument( + "--visualize", + type=ast.literal_eval, + default=True, + help="Whether to save visualize results to output_dir.") + args = parser.parse_args() + return args + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--infer_img or --infer_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + logger.info("Found {} inference images in total.".format(len(images))) + + return images + + +def run(FLAGS, cfg): + # build trainer + trainer = Trainer(cfg, mode='test') + + # load weights + trainer.load_weights(cfg.weights) + + # get inference images + images = get_test_images(FLAGS.infer_dir, FLAGS.infer_img) + + # inference + if FLAGS.slice_infer: + trainer.slice_predict( + images, + slice_size=FLAGS.slice_size, + overlap_ratio=FLAGS.overlap_ratio, + combine_method=FLAGS.combine_method, + match_threshold=FLAGS.match_threshold, + match_metric=FLAGS.match_metric, + draw_threshold=FLAGS.draw_threshold, + output_dir=FLAGS.output_dir, + save_results=FLAGS.save_results, + visualize=FLAGS.visualize) + else: + trainer.predict( + images, + draw_threshold=FLAGS.draw_threshold, + output_dir=FLAGS.output_dir, + save_results=FLAGS.save_results, + visualize=FLAGS.visualize) + + +def main(): + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + merge_args(cfg, FLAGS) + merge_config(FLAGS.opt) + + # disable npu in config by default + if 'use_npu' not in cfg: + cfg.use_npu = False + + # disable xpu in config by default + if 'use_xpu' not in cfg: + cfg.use_xpu = False + + if 'use_gpu' not in cfg: + cfg.use_gpu = False + + # disable mlu in config by default + if 'use_mlu' not in cfg: + cfg.use_mlu = False + + if cfg.use_gpu: + place = paddle.set_device('gpu') + elif cfg.use_npu: + place = paddle.set_device('npu') + elif cfg.use_xpu: + place = paddle.set_device('xpu') + elif cfg.use_mlu: + place = paddle.set_device('mlu') + else: + place = paddle.set_device('cpu') + + if FLAGS.slim_config: + cfg = build_slim_model(cfg, FLAGS.slim_config, mode='test') + + check_config(cfg) + check_gpu(cfg.use_gpu) + check_npu(cfg.use_npu) + check_xpu(cfg.use_xpu) + check_mlu(cfg.use_mlu) + check_version() + + run(FLAGS, cfg) + + +if __name__ == '__main__': + main() diff --git a/PaddleDetection-release-2.6/tools/infer_mot.py b/PaddleDetection-release-2.6/tools/infer_mot.py new file mode 100644 index 0000000000000000000000000000000000000000..547857beaaf7b1cd2246a80d859d4be053ef6404 --- /dev/null +++ b/PaddleDetection-release-2.6/tools/infer_mot.py @@ -0,0 +1,156 @@ +# Copyright (c) 2021 
PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +# ignore warning log +import warnings +warnings.filterwarnings('ignore') + +import paddle +from ppdet.core.workspace import load_config, merge_config +from ppdet.engine import Tracker +from ppdet.utils.check import check_gpu, check_npu, check_xpu, check_mlu, check_version, check_config +from ppdet.utils.cli import ArgsParser + + +def parse_args(): + parser = ArgsParser() + parser.add_argument( + '--video_file', type=str, default=None, help='Video name for tracking.') + parser.add_argument( + '--frame_rate', + type=int, + default=-1, + help='Video frame rate for tracking.') + parser.add_argument( + "--image_dir", + type=str, + default=None, + help="Directory for images to perform inference on.") + parser.add_argument( + "--det_results_dir", + type=str, + default='', + help="Directory name for detection results.") + parser.add_argument( + '--output_dir', + type=str, + default='output', + help='Directory name for output tracking results.') + parser.add_argument( + '--save_images', + action='store_true', + help='Save tracking results (image).') + parser.add_argument( + '--save_videos', + action='store_true', + help='Save tracking results (video).') + parser.add_argument( + '--show_image', + action='store_true', + help='Show tracking results (image).') + parser.add_argument( + '--scaled', + type=bool, + default=False, + help="Whether the coords output by the detector are already scaled: " + "False for JDE YOLOv3, True for general detectors.") + parser.add_argument( + "--draw_threshold", + type=float, + default=0.5, + help="Threshold to reserve the result for visualization.") + args = parser.parse_args() + return args + + +def run(FLAGS, cfg): + # build Tracker + tracker = Tracker(cfg, mode='test') + + # load weights + if cfg.architecture in ['DeepSORT', 'ByteTrack']: + tracker.load_weights_sde(cfg.det_weights, cfg.reid_weights) + else: + tracker.load_weights_jde(cfg.weights) + + # inference + tracker.mot_predict_seq( + video_file=FLAGS.video_file, + frame_rate=FLAGS.frame_rate, + image_dir=FLAGS.image_dir, + data_type=cfg.metric.lower(), + model_type=cfg.architecture, + output_dir=FLAGS.output_dir, + save_images=FLAGS.save_images, + save_videos=FLAGS.save_videos, + show_image=FLAGS.show_image, + scaled=FLAGS.scaled, + det_results_dir=FLAGS.det_results_dir, + draw_threshold=FLAGS.draw_threshold) + + +def main(): + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + merge_config(FLAGS.opt) + + # disable npu in config by default + if 'use_npu' not in cfg: + cfg.use_npu = False + + # disable xpu in config by default + if 'use_xpu' not in cfg: + cfg.use_xpu = False + + if 'use_gpu' not in cfg: + cfg.use_gpu = False + + # disable mlu
in config by default + if 'use_mlu' not in cfg: + cfg.use_mlu = False + + if cfg.use_gpu: + place = paddle.set_device('gpu') + elif cfg.use_npu: + place = paddle.set_device('npu') + elif cfg.use_xpu: + place = paddle.set_device('xpu') + elif cfg.use_mlu: + place = paddle.set_device('mlu') + else: + place = paddle.set_device('cpu') + + check_config(cfg) + check_gpu(cfg.use_gpu) + check_npu(cfg.use_npu) + check_xpu(cfg.use_xpu) + check_mlu(cfg.use_mlu) + check_version() + + run(FLAGS, cfg) + + +if __name__ == '__main__': + main() diff --git a/PaddleDetection-release-2.6/tools/post_quant.py b/PaddleDetection-release-2.6/tools/post_quant.py new file mode 100644 index 0000000000000000000000000000000000000000..60acaccc87f5be7256f492a2dd8efa439e888d57 --- /dev/null +++ b/PaddleDetection-release-2.6/tools/post_quant.py @@ -0,0 +1,98 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +# ignore warning log +import warnings +warnings.filterwarnings('ignore') + +import paddle + +from ppdet.core.workspace import load_config, merge_config +from ppdet.utils.check import check_gpu, check_version, check_config +from ppdet.utils.cli import ArgsParser +from ppdet.engine import Trainer +from ppdet.slim import build_slim_model + +from ppdet.utils.logger import setup_logger +logger = setup_logger('post_quant') + + +def parse_args(): + parser = ArgsParser() + parser.add_argument( + "--output_dir", + type=str, + default="output_inference", + help="Directory for storing the output model files.") + parser.add_argument( + "--slim_config", + default=None, + type=str, + help="Configuration file of slim method.") + args = parser.parse_args() + return args + + +def run(FLAGS, cfg): + # build detector + trainer = Trainer(cfg, mode='eval') + + # load weights + if cfg.architecture in ['DeepSORT']: + if cfg.det_weights != 'None': + trainer.load_weights_sde(cfg.det_weights, cfg.reid_weights) + else: + trainer.load_weights_sde(None, cfg.reid_weights) + else: + trainer.load_weights(cfg.weights) + + # post quant model + trainer.post_quant(FLAGS.output_dir) + + +def main(): + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + # TODO: to be refined in the future + if 'norm_type' in cfg and cfg['norm_type'] == 'sync_bn': + FLAGS.opt['norm_type'] = 'bn' + merge_config(FLAGS.opt) + + if FLAGS.slim_config: + cfg = build_slim_model(cfg, FLAGS.slim_config, mode='test') + + # FIXME: Temporarily solve the priority problem of FLAGS.opt + merge_config(FLAGS.opt) + check_config(cfg) + if 'use_gpu' not in cfg: + cfg.use_gpu = False + check_gpu(cfg.use_gpu) + check_version() + + run(FLAGS, cfg) + + +if __name__ == '__main__': + main() diff --git 
a/PaddleDetection-release-2.6/tools/slice_image.py b/PaddleDetection-release-2.6/tools/slice_image.py new file mode 100644 index 0000000000000000000000000000000000000000..f739d74244b0e4672a5b2ed3430f89b936f0bef5 --- /dev/null +++ b/PaddleDetection-release-2.6/tools/slice_image.py @@ -0,0 +1,56 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +from tqdm import tqdm + + +def slice_data(image_dir, dataset_json_path, output_dir, slice_size, + overlap_ratio): + try: + from sahi.scripts.slice_coco import slice + except Exception as e: + raise RuntimeError( + 'Unable to use sahi to slice images, please install sahi, for example: `pip install sahi`, see https://github.com/obss/sahi' + ) + tqdm.write( + f"slicing for slice_size={slice_size}, overlap_ratio={overlap_ratio}") + slice( + image_dir=image_dir, + dataset_json_path=dataset_json_path, + output_dir=output_dir, + slice_size=slice_size, + overlap_ratio=overlap_ratio, ) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + '--image_dir', type=str, default=None, help="The image folder path.") + parser.add_argument( + '--json_path', type=str, default=None, help="Dataset json path.") + parser.add_argument( + '--output_dir', type=str, default=None, help="Output dir.") + parser.add_argument( + '--slice_size', type=int, default=500, help="Sliced image size.") + parser.add_argument( + '--overlap_ratio', type=float, default=0.25, help="Overlap ratio between adjacent slices.") + args = parser.parse_args() + + slice_data(args.image_dir, args.json_path, args.output_dir, args.slice_size, + args.overlap_ratio) + + +if __name__ == "__main__": + main() diff --git a/PaddleDetection-release-2.6/tools/sniper_params_stats.py b/PaddleDetection-release-2.6/tools/sniper_params_stats.py new file mode 100644 index 0000000000000000000000000000000000000000..358aa63c5fa46812cccafda3cd2b42d65f89f02c --- /dev/null +++ b/PaddleDetection-release-2.6/tools/sniper_params_stats.py @@ -0,0 +1,178 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
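Editorial aside on the slice_size/overlap_ratio parameters of slice_image.py above: as a mental model, consecutive slices start slice_size * (1 - overlap_ratio) pixels apart. The sketch below reproduces that geometry; it is a rough illustration, not SAHI's actual code (slice_windows is a hypothetical helper, and SAHI's own border handling may differ in detail).

def slice_windows(im_w, im_h, slice_size=500, overlap_ratio=0.25):
    # Yield (x1, y1, x2, y2) slice windows covering an im_w x im_h image.
    step = int(slice_size * (1 - overlap_ratio))  # 375 px for the defaults
    xs = list(range(0, max(im_w - slice_size, 0) + 1, step)) or [0]
    ys = list(range(0, max(im_h - slice_size, 0) + 1, step)) or [0]
    # make sure the right/bottom borders are covered
    if xs[-1] + slice_size < im_w:
        xs.append(im_w - slice_size)
    if ys[-1] + slice_size < im_h:
        ys.append(im_h - slice_size)
    return [(x, y, x + slice_size, y + slice_size) for y in ys for x in xs]

print(len(slice_windows(2000, 1500)))  # 5 x 4 = 20 windows for a 2000x1500 image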
+ +import sys +import json +import logging +import numpy as np + +from ppdet.utils.logger import setup_logger +logger = setup_logger('sniper_params_stats') + +def get_default_params(architecture): + """get_default_params""" + if architecture == "FasterRCNN": + anchor_range = np.array([64., 512.]) # for frcnn-fpn + # anchor_range = np.array([16., 373.]) # for yolov3 + # anchor_range = np.array([32., 373.]) # for yolov3 + default_crop_size = 1536 # mod 32 for frcnn-fpn + default_max_bbox_size = 352 + elif architecture == "YOLOv3": + anchor_range = np.array([32., 373.]) # for yolov3 + default_crop_size = 800 # mod 32 for yolov3 + default_max_bbox_size = 352 + else: + raise NotImplementedError + + return anchor_range, default_crop_size, default_max_bbox_size + + +def get_box_ratios(anno_file): + """ + get_size_ratios + :param anno_file: coco anno file + :return: size_ratio: (box_long_size / pic_long_size) + """ + coco_dict = json.load(open(anno_file)) + image_list = coco_dict['images'] + anno_list = coco_dict['annotations'] + + image_id2hw = {} + for im_dict in image_list: + im_id = im_dict['id'] + h, w = im_dict['height'], im_dict['width'] + image_id2hw[im_id] = (h, w) + + box_ratios = [] + for a_dict in anno_list: + im_id = a_dict['image_id'] + im_h, im_w = image_id2hw[im_id] + bbox = a_dict['bbox'] + x1, y1, w, h = bbox + pic_long = max(im_h, im_w) + box_long = max(w, h) + box_ratios.append(box_long / pic_long) + + return np.array(box_ratios) + + +def get_target_size_and_valid_box_ratios(anchor_range, box_ratio_p2, box_ratio_p98): + """get_target_size_and_valid_box_ratios""" + anchor_better_low, anchor_better_high = anchor_range # (60., 512.) + anchor_center = np.sqrt(anchor_better_high * anchor_better_low) + + anchor_log_range = np.log10(anchor_better_high) - np.log10(anchor_better_low) + box_ratio_log_range = np.log10(box_ratio_p98) - np.log10(box_ratio_p2) + logger.info("anchor_log_range:{}, box_ratio_log_range:{}".format(anchor_log_range, box_ratio_log_range)) + + box_cut_num = int(np.ceil(box_ratio_log_range / anchor_log_range)) + box_ratio_log_window = box_ratio_log_range / box_cut_num + logger.info("box_cut_num:{}, box_ratio_log_window:{}".format(box_cut_num, box_ratio_log_window)) + + image_target_sizes = [] + valid_ratios = [] + for i in range(box_cut_num): + # # method1: align center + # box_ratio_log_center = np.log10(p2) + 0.5 * box_ratio_log_window + i * box_ratio_log_window + # box_ratio_center = np.power(10, box_ratio_log_center) + # scale = anchor_center / box_ratio_center + # method2: align left low + box_ratio_low = np.power(10, np.log10(box_ratio_p2) + i * box_ratio_log_window) + image_target_size = anchor_better_low / box_ratio_low + + image_target_sizes.append(int(image_target_size)) + valid_ratio = anchor_range / image_target_size + valid_ratios.append(valid_ratio.tolist()) + + logger.info("Box cut {}".format(i)) + logger.info("box_ratio_low: {}".format(box_ratio_low)) + logger.info("image_target_size: {}".format(image_target_size)) + logger.info("valid_ratio: {}".format(valid_ratio)) + + return image_target_sizes, valid_ratios + + +def get_valid_ranges(valid_ratios): + """ + get_valid_box_ratios_range + :param valid_ratios: + :return: + """ + valid_ranges = [] + if len(valid_ratios) == 1: + valid_ranges.append([-1, -1]) + else: + for i, vratio in enumerate(valid_ratios): + if i == 0: + valid_ranges.append([-1, vratio[1]]) + elif i == len(valid_ratios) - 1: + valid_ranges.append([vratio[0], -1]) + else: + valid_ranges.append(vratio) + return valid_ranges + + +def
get_percentile(a_array, low_percent, high_percent): + """ + get_percentile + :param low_percent: + :param high_percent: + :return: + """ + array_p0 = min(a_array) + array_p100 = max(a_array) + array_plow = np.percentile(a_array, low_percent) + array_phigh = np.percentile(a_array, high_percent) + logger.info( + "array_percentile(0): {},array_percentile low({}): {}, " + "array_percentile high({}): {}, array_percentile 100: {}".format( + array_p0, low_percent, array_plow, high_percent, array_phigh, array_p100)) + return array_plow, array_phigh + + +def sniper_anno_stats(architecture, anno_file): + """ + sniper_anno_stats + :param anno_file: + :return: + """ + + anchor_range, default_crop_size, default_max_bbox_size = get_default_params(architecture) + + box_ratios = get_box_ratios(anno_file) + + box_ratio_p8, box_ratio_p92 = get_percentile(box_ratios, 8, 92) + + image_target_sizes, valid_box_ratios = get_target_size_and_valid_box_ratios(anchor_range, box_ratio_p8, box_ratio_p92) + + valid_ranges = get_valid_ranges(valid_box_ratios) + + crop_size = min(default_crop_size, min([item for item in image_target_sizes])) + crop_size = int(np.ceil(crop_size / 32.) * 32.) + crop_stride = max(min(default_max_bbox_size, crop_size), crop_size - default_max_bbox_size) + logger.info("Result".center(100, '-')) + logger.info("image_target_sizes: {}".format(image_target_sizes)) + logger.info("valid_box_ratio_ranges: {}".format(valid_ranges)) + logger.info("chip_target_size: {}, chip_target_stride: {}".format(crop_size, crop_stride)) + + return { + "image_target_sizes": image_target_sizes, + "valid_box_ratio_ranges": valid_ranges, + "chip_target_size": crop_size, + "chip_target_stride": crop_stride + } + +if __name__=="__main__": + architecture, anno_file = sys.argv[1], sys.argv[2] + sniper_anno_stats(architecture, anno_file) diff --git a/PaddleDetection-release-2.6/tools/train.py b/PaddleDetection-release-2.6/tools/train.py new file mode 100644 index 0000000000000000000000000000000000000000..ec846519e997f0a70183f1972aa3120555a0f446 --- /dev/null +++ b/PaddleDetection-release-2.6/tools/train.py @@ -0,0 +1,202 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
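Editorial aside to make the log output of sniper_params_stats above concrete: here the core of the target-size math from get_target_size_and_valid_box_ratios is replayed on made-up percentiles (box_ratio_p2=0.01 and box_ratio_p98=0.5 are assumed values for illustration, not measured from any dataset).

import numpy as np

anchor_low, anchor_high = 64., 512.      # frcnn-fpn anchor_range
box_ratio_p2, box_ratio_p98 = 0.01, 0.5  # assumed percentile values

anchor_log_range = np.log10(anchor_high) - np.log10(anchor_low)          # ~0.903
box_ratio_log_range = np.log10(box_ratio_p98) - np.log10(box_ratio_p2)   # ~1.699
box_cut_num = int(np.ceil(box_ratio_log_range / anchor_log_range))       # 2 scales
window = box_ratio_log_range / box_cut_num

for i in range(box_cut_num):
    # align the low edge of each ratio window with the low anchor ("method2")
    ratio_low = 10 ** (np.log10(box_ratio_p2) + i * window)
    target_size = anchor_low / ratio_low
    print(i, int(target_size))  # 0 -> 6400, 1 -> ~905

In words: the span of box-size ratios is cut into as many log-scale windows as fit inside the anchor range, and each window gets the image target size that maps its smallest boxes onto the smallest well-handled anchor.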
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +# add python path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +# ignore warning log +import warnings +warnings.filterwarnings('ignore') + +import paddle + +from ppdet.core.workspace import load_config, merge_config + +from ppdet.engine import Trainer, TrainerCot, init_parallel_env, set_random_seed, init_fleet_env +from ppdet.engine.trainer_ssod import Trainer_DenseTeacher + +from ppdet.slim import build_slim_model + +from ppdet.utils.cli import ArgsParser, merge_args +import ppdet.utils.check as check +from ppdet.utils.logger import setup_logger +logger = setup_logger('train') + + +def parse_args(): + parser = ArgsParser() + parser.add_argument( + "--eval", + action='store_true', + default=False, + help="Whether to perform evaluation in train") + parser.add_argument( + "-r", "--resume", default=None, help="weights path for resume") + parser.add_argument( + "--slim_config", + default=None, + type=str, + help="Configuration file of slim method.") + parser.add_argument( + "--enable_ce", + type=bool, + default=False, + help="If set True, enable continuous evaluation job. " + "This flag is only used for internal test.") + parser.add_argument( + "--amp", + action='store_true', + default=False, + help="Enable auto mixed precision training.") + parser.add_argument( + "--fleet", action='store_true', default=False, help="Use fleet or not") + parser.add_argument( + "--use_vdl", + type=bool, + default=False, + help="Whether to record the data to VisualDL.") + parser.add_argument( + '--vdl_log_dir', + type=str, + default="vdl_log_dir/scalar", + help='VisualDL logging directory for scalar.') + parser.add_argument( + "--use_wandb", + type=bool, + default=False, + help="Whether to record the data to wandb.") + parser.add_argument( + '--save_prediction_only', + action='store_true', + default=False, + help='Whether to save the evaluation results only') + parser.add_argument( + '--profiler_options', + type=str, + default=None, + help="The option of profiler, which should be in " + "format \"key1=value1;key2=value2;key3=value3\". " + "Please see ppdet/utils/profiler.py for details.") + parser.add_argument( + '--save_proposals', + action='store_true', + default=False, + help='Whether to save the train proposals') + parser.add_argument( + '--proposals_path', + type=str, + default="sniper/proposals.json", + help='Train proposals directory') + parser.add_argument( + "--to_static", + action='store_true', + default=False, + help="Enable dy2st to train.") + + args = parser.parse_args() + return args + + +def run(FLAGS, cfg): + # init fleet environment + if cfg.fleet: + init_fleet_env(cfg.get('find_unused_parameters', False)) + else: + # init parallel environment if nranks > 1 + init_parallel_env() + + if FLAGS.enable_ce: + set_random_seed(0) + + # build trainer + ssod_method = cfg.get('ssod_method', None) + if ssod_method is not None: + if ssod_method == 'DenseTeacher': + trainer = Trainer_DenseTeacher(cfg, mode='train') + else: + raise ValueError( + "Semi-Supervised Object Detection only supports DenseTeacher now."
+ ) + elif cfg.get('use_cot', False): + trainer = TrainerCot(cfg, mode='train') + else: + trainer = Trainer(cfg, mode='train') + + # load weights + if FLAGS.resume is not None: + trainer.resume_weights(FLAGS.resume) + elif 'pretrain_weights' in cfg and cfg.pretrain_weights: + trainer.load_weights(cfg.pretrain_weights) + + # training + trainer.train(FLAGS.eval) + + +def main(): + FLAGS = parse_args() + cfg = load_config(FLAGS.config) + merge_args(cfg, FLAGS) + merge_config(FLAGS.opt) + + # disable npu in config by default + if 'use_npu' not in cfg: + cfg.use_npu = False + + # disable xpu in config by default + if 'use_xpu' not in cfg: + cfg.use_xpu = False + + if 'use_gpu' not in cfg: + cfg.use_gpu = False + + # disable mlu in config by default + if 'use_mlu' not in cfg: + cfg.use_mlu = False + + if cfg.use_gpu: + place = paddle.set_device('gpu') + elif cfg.use_npu: + place = paddle.set_device('npu') + elif cfg.use_xpu: + place = paddle.set_device('xpu') + elif cfg.use_mlu: + place = paddle.set_device('mlu') + else: + place = paddle.set_device('cpu') + + if FLAGS.slim_config: + cfg = build_slim_model(cfg, FLAGS.slim_config) + + # FIXME: Temporarily solve the priority problem of FLAGS.opt + merge_config(FLAGS.opt) + check.check_config(cfg) + check.check_gpu(cfg.use_gpu) + check.check_npu(cfg.use_npu) + check.check_xpu(cfg.use_xpu) + check.check_mlu(cfg.use_mlu) + check.check_version() + + run(FLAGS, cfg) + + +if __name__ == "__main__": + main() diff --git a/PaddleDetection-release-2.6/tools/x2coco.py b/PaddleDetection-release-2.6/tools/x2coco.py new file mode 100644 index 0000000000000000000000000000000000000000..78e8619b42edfa8343770b1bbf12991d8d4d326a --- /dev/null +++ b/PaddleDetection-release-2.6/tools/x2coco.py @@ -0,0 +1,542 @@ +#!/usr/bin/env python +# coding: utf-8 +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
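Editorial aside: the same device-selection cascade is repeated verbatim in train.py, eval.py, infer.py and the MOT tools above. Were one refactoring, it could be collapsed into a shared helper along these lines (set_place is a hypothetical sketch, not part of this commit; it assumes cfg behaves like a dict, as PaddleDetection's AttrDict config does):

import paddle

def set_place(cfg):
    # Pick the first enabled device in priority order, defaulting to CPU.
    for flag, device in [('use_gpu', 'gpu'), ('use_npu', 'npu'),
                         ('use_xpu', 'xpu'), ('use_mlu', 'mlu')]:
        cfg.setdefault(flag, False)  # disable any device flag absent from the config
        if cfg[flag]:
            return paddle.set_device(device)
    return paddle.set_device('cpu')

The per-tool copies keep each script self-contained, at the cost of the duplication visible throughout this diff.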
+ +import argparse +import glob +import json +import os +import os.path as osp +import shutil +import xml.etree.ElementTree as ET + +import numpy as np +import PIL.ImageDraw +from tqdm import tqdm +import cv2 + +label_to_num = {} +categories_list = [] +labels_list = [] + + +class MyEncoder(json.JSONEncoder): + def default(self, obj): + if isinstance(obj, np.integer): + return int(obj) + elif isinstance(obj, np.floating): + return float(obj) + elif isinstance(obj, np.ndarray): + return obj.tolist() + else: + return super(MyEncoder, self).default(obj) + + +def images_labelme(data, num): + image = {} + image['height'] = data['imageHeight'] + image['width'] = data['imageWidth'] + image['id'] = num + 1 + if '\\' in data['imagePath']: + image['file_name'] = data['imagePath'].split('\\')[-1] + else: + image['file_name'] = data['imagePath'].split('/')[-1] + return image + + +def images_cityscape(data, num, img_file): + image = {} + image['height'] = data['imgHeight'] + image['width'] = data['imgWidth'] + image['id'] = num + 1 + image['file_name'] = img_file + return image + + +def categories(label, labels_list): + category = {} + category['supercategory'] = 'component' + category['id'] = len(labels_list) + 1 + category['name'] = label + return category + + +def annotations_rectangle(points, label, image_num, object_num, label_to_num): + annotation = {} + seg_points = np.asarray(points).copy() + seg_points[1, :] = np.asarray(points)[2, :] + seg_points[2, :] = np.asarray(points)[1, :] + annotation['segmentation'] = [list(seg_points.flatten())] + annotation['iscrowd'] = 0 + annotation['image_id'] = image_num + 1 + annotation['bbox'] = list( + map(float, [ + points[0][0], points[0][1], points[1][0] - points[0][0], points[1][ + 1] - points[0][1] + ])) + annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3] + annotation['category_id'] = label_to_num[label] + annotation['id'] = object_num + 1 + return annotation + + +def annotations_polygon(height, width, points, label, image_num, object_num, + label_to_num): + annotation = {} + annotation['segmentation'] = [list(np.asarray(points).flatten())] + annotation['iscrowd'] = 0 + annotation['image_id'] = image_num + 1 + annotation['bbox'] = list(map(float, get_bbox(height, width, points))) + annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3] + annotation['category_id'] = label_to_num[label] + annotation['id'] = object_num + 1 + return annotation + + +def get_bbox(height, width, points): + polygons = points + mask = np.zeros([height, width], dtype=np.uint8) + mask = PIL.Image.fromarray(mask) + xy = list(map(tuple, polygons)) + PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1) + mask = np.array(mask, dtype=bool) + index = np.argwhere(mask == 1) + rows = index[:, 0] + clos = index[:, 1] + left_top_r = np.min(rows) + left_top_c = np.min(clos) + right_bottom_r = np.max(rows) + right_bottom_c = np.max(clos) + return [ + left_top_c, left_top_r, right_bottom_c - left_top_c, + right_bottom_r - left_top_r + ] + + +def deal_json(ds_type, img_path, json_path): + data_coco = {} + images_list = [] + annotations_list = [] + image_num = -1 + object_num = -1 + for img_file in os.listdir(img_path): + img_label = os.path.splitext(img_file)[0] + if img_file.split('.')[ + -1] not in ['bmp', 'jpg', 'jpeg', 'png', 'JPEG', 'JPG', 'PNG']: + continue + label_file = osp.join(json_path, img_label + '.json') + print('Generating dataset from:', label_file) + image_num = image_num + 1 + with open(label_file) as f: + data = json.load(f) + if ds_type == 
'labelme': + images_list.append(images_labelme(data, image_num)) + elif ds_type == 'cityscape': + images_list.append(images_cityscape(data, image_num, img_file)) + if ds_type == 'labelme': + for shapes in data['shapes']: + object_num = object_num + 1 + label = shapes['label'] + if label not in labels_list: + categories_list.append(categories(label, labels_list)) + labels_list.append(label) + label_to_num[label] = len(labels_list) + p_type = shapes['shape_type'] + if p_type == 'polygon': + points = shapes['points'] + annotations_list.append( + annotations_polygon(data['imageHeight'], data[ + 'imageWidth'], points, label, image_num, + object_num, label_to_num)) + + if p_type == 'rectangle': + (x1, y1), (x2, y2) = shapes['points'] + x1, x2 = sorted([x1, x2]) + y1, y2 = sorted([y1, y2]) + points = [[x1, y1], [x2, y2], [x1, y2], [x2, y1]] + annotations_list.append( + annotations_rectangle(points, label, image_num, + object_num, label_to_num)) + elif ds_type == 'cityscape': + for shapes in data['objects']: + object_num = object_num + 1 + label = shapes['label'] + if label not in labels_list: + categories_list.append(categories(label, labels_list)) + labels_list.append(label) + label_to_num[label] = len(labels_list) + points = shapes['polygon'] + annotations_list.append( + annotations_polygon(data['imgHeight'], data[ + 'imgWidth'], points, label, image_num, object_num, + label_to_num)) + data_coco['images'] = images_list + data_coco['categories'] = categories_list + data_coco['annotations'] = annotations_list + return data_coco + + +def voc_get_label_anno(ann_dir_path, ann_ids_path, labels_path): + with open(labels_path, 'r') as f: + labels_str = f.read().split() + labels_ids = list(range(1, len(labels_str) + 1)) + + with open(ann_ids_path, 'r') as f: + ann_ids = [lin.strip().split(' ')[-1] for lin in f.readlines()] + + ann_paths = [] + for aid in ann_ids: + if aid.endswith('xml'): + ann_path = os.path.join(ann_dir_path, aid) + else: + ann_path = os.path.join(ann_dir_path, aid + '.xml') + ann_paths.append(ann_path) + + return dict(zip(labels_str, labels_ids)), ann_paths + + +def voc_get_image_info(annotation_root, im_id): + filename = annotation_root.findtext('filename') + assert filename is not None + img_name = os.path.basename(filename) + + size = annotation_root.find('size') + width = float(size.findtext('width')) + height = float(size.findtext('height')) + + image_info = { + 'file_name': filename, + 'height': height, + 'width': width, + 'id': im_id + } + return image_info + + +def voc_get_coco_annotation(obj, label2id): + label = obj.findtext('name') + assert label in label2id, "label is not in label2id." + category_id = label2id[label] + bndbox = obj.find('bndbox') + xmin = float(bndbox.findtext('xmin')) + ymin = float(bndbox.findtext('ymin')) + xmax = float(bndbox.findtext('xmax')) + ymax = float(bndbox.findtext('ymax')) + assert xmax > xmin and ymax > ymin, "Box size error." 
+ o_width = xmax - xmin + o_height = ymax - ymin + anno = { + 'area': o_width * o_height, + 'iscrowd': 0, + 'bbox': [xmin, ymin, o_width, o_height], + 'category_id': category_id, + 'ignore': 0, + } + return anno + + +def voc_xmls_to_cocojson(annotation_paths, label2id, output_dir, output_file): + output_json_dict = { + "images": [], + "type": "instances", + "annotations": [], + "categories": [] + } + bnd_id = 1 # bounding box start id + im_id = 0 + print('Start converting!') + for a_path in tqdm(annotation_paths): + # Read annotation xml + ann_tree = ET.parse(a_path) + ann_root = ann_tree.getroot() + + img_info = voc_get_image_info(ann_root, im_id) + output_json_dict['images'].append(img_info) + + for obj in ann_root.findall('object'): + ann = voc_get_coco_annotation(obj=obj, label2id=label2id) + ann.update({'image_id': im_id, 'id': bnd_id}) + output_json_dict['annotations'].append(ann) + bnd_id = bnd_id + 1 + im_id += 1 + + for label, label_id in label2id.items(): + category_info = {'supercategory': 'none', 'id': label_id, 'name': label} + output_json_dict['categories'].append(category_info) + output_file = os.path.join(output_dir, output_file) + with open(output_file, 'w') as f: + output_json = json.dumps(output_json_dict) + f.write(output_json) + + +def widerface_to_cocojson(root_path): + train_gt_txt = os.path.join(root_path, "wider_face_split", "wider_face_train_bbx_gt.txt") + val_gt_txt = os.path.join(root_path, "wider_face_split", "wider_face_val_bbx_gt.txt") + train_img_dir = os.path.join(root_path, "WIDER_train", "images") + val_img_dir = os.path.join(root_path, "WIDER_val", "images") + assert train_gt_txt + assert val_gt_txt + assert train_img_dir + assert val_img_dir + save_path = os.path.join(root_path, "widerface_train.json") + widerface_convert(train_gt_txt, train_img_dir, save_path) + print("Wider Face train dataset converted successfully, the json path: {}".format(save_path)) + save_path = os.path.join(root_path, "widerface_val.json") + widerface_convert(val_gt_txt, val_img_dir, save_path) + print("Wider Face val dataset converted successfully, the json path: {}".format(save_path)) + + +def widerface_convert(gt_txt, img_dir, save_path): + output_json_dict = { + "images": [], + "type": "instances", + "annotations": [], + "categories": [{'supercategory': 'none', 'id': 0, 'name': "human_face"}] + } + bnd_id = 1 # bounding box start id + im_id = 0 + print('Start converting!') + with open(gt_txt) as fd: + lines = fd.readlines() + + i = 0 + while i < len(lines): + image_name = lines[i].strip() + bbox_num = int(lines[i + 1].strip()) + i += 2 + img_info = get_widerface_image_info(img_dir, image_name, im_id) + if img_info: + output_json_dict["images"].append(img_info) + for j in range(i, i + bbox_num): + anno = get_widerface_ann_info(lines[j]) + anno.update({'image_id': im_id, 'id': bnd_id}) + output_json_dict['annotations'].append(anno) + bnd_id += 1 + else: + print("The image does not exist: {}".format(os.path.join(img_dir, image_name))) + bbox_num = 1 if bbox_num == 0 else bbox_num + i += bbox_num + im_id += 1 + with open(save_path, 'w') as f: + output_json = json.dumps(output_json_dict) + f.write(output_json) + + +def get_widerface_image_info(img_root, img_relative_path, img_id): + image_info = {} + save_path = os.path.join(img_root, img_relative_path) + if os.path.exists(save_path): + img = cv2.imread(save_path) + image_info["file_name"] = os.path.join(os.path.basename( + os.path.dirname(img_root)), os.path.basename(img_root), + img_relative_path) + image_info["height"] = img.shape[0] +
image_info["width"] = img.shape[1] + image_info["id"] = img_id + return image_info + + +def get_widerface_ann_info(info): + # WIDER FACE bbox line fields: x, y, w, h, blur, expression, + # illumination, invalid, occlusion, pose. + info = [int(x) for x in info.strip().split()] + anno = { + 'area': info[2] * info[3], + 'iscrowd': 0, + 'bbox': [info[0], info[1], info[2], info[3]], + 'category_id': 0, + 'ignore': 0, + 'blur': info[4], + 'expression': info[5], + 'illumination': info[6], + 'invalid': info[7], + 'occlusion': info[8], + 'pose': info[9] + } + return anno + + +def main(): + parser = argparse.ArgumentParser( + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument( + '--dataset_type', + help='the type of dataset, can be `voc`, `widerface`, `labelme` or `cityscape`') + parser.add_argument('--json_input_dir', help='input annotation directory') + parser.add_argument('--image_input_dir', help='image directory') + parser.add_argument( + '--output_dir', help='output dataset directory', default='./') + parser.add_argument( + '--train_proportion', + help='the proportion of train dataset', + type=float, + default=1.0) + parser.add_argument( + '--val_proportion', + help='the proportion of validation dataset', + type=float, + default=0.0) + parser.add_argument( + '--test_proportion', + help='the proportion of test dataset', + type=float, + default=0.0) + parser.add_argument( + '--voc_anno_dir', + help='In Voc format dataset, path to annotation files directory.', + type=str, + default=None) + parser.add_argument( + '--voc_anno_list', + help='In Voc format dataset, path to annotation files ids list.', + type=str, + default=None) + parser.add_argument( + '--voc_label_list', + help='In Voc format dataset, path to label list. The content of each line is a category.', + type=str, + default=None) + parser.add_argument( + '--voc_out_name', + type=str, + default='voc.json', + help='In Voc format dataset, path to output json file') + parser.add_argument( + '--widerface_root_dir', + help='The root_path for wider face dataset, which contains `wider_face_split`, `WIDER_train` and `WIDER_val`. The json files will be saved in this path.', + type=str, + default=None) + args = parser.parse_args() + try: + assert args.dataset_type in ['voc', 'labelme', 'cityscape', 'widerface'] + except AssertionError as e: + print( + 'Only the voc, labelme, cityscape and widerface datasets are supported!') + os._exit(0) + + if args.dataset_type == 'voc': + assert args.voc_anno_dir and args.voc_anno_list and args.voc_label_list + label2id, ann_paths = voc_get_label_anno( + args.voc_anno_dir, args.voc_anno_list, args.voc_label_list) + voc_xmls_to_cocojson( + annotation_paths=ann_paths, + label2id=label2id, + output_dir=args.output_dir, + output_file=args.voc_out_name) + elif args.dataset_type == "widerface": + assert args.widerface_root_dir + widerface_to_cocojson(args.widerface_root_dir) + else: + try: + assert os.path.exists(args.json_input_dir) + except AssertionError as e: + print('The json folder does not exist!') + os._exit(0) + try: + assert os.path.exists(args.image_input_dir) + except AssertionError as e: + print('The image folder does not exist!') + os._exit(0) + try: + assert abs(args.train_proportion + args.val_proportion \ + + args.test_proportion - 1.0) < 1e-5 + except AssertionError as e: + print( + 'The sum of the proportions of the training, validation and test datasets must be 1!' + ) + os._exit(0) + + # Allocate the dataset.
+ total_num = len(glob.glob(osp.join(args.json_input_dir, '*.json'))) + if args.train_proportion != 0: + train_num = int(total_num * args.train_proportion) + out_dir = args.output_dir + '/train' + if not os.path.exists(out_dir): + os.makedirs(out_dir) + else: + train_num = 0 + if args.val_proportion == 0.0: + val_num = 0 + test_num = total_num - train_num + out_dir = args.output_dir + '/test' + if args.test_proportion != 0.0 and not os.path.exists(out_dir): + os.makedirs(out_dir) + else: + val_num = int(total_num * args.val_proportion) + test_num = total_num - train_num - val_num + val_out_dir = args.output_dir + '/val' + if not os.path.exists(val_out_dir): + os.makedirs(val_out_dir) + test_out_dir = args.output_dir + '/test' + if args.test_proportion != 0.0 and not os.path.exists(test_out_dir): + os.makedirs(test_out_dir) + count = 1 + for img_name in os.listdir(args.image_input_dir): + if count <= train_num: + if osp.exists(args.output_dir + '/train/'): + shutil.copyfile( + osp.join(args.image_input_dir, img_name), + osp.join(args.output_dir + '/train/', img_name)) + else: + if count <= train_num + val_num: + if osp.exists(args.output_dir + '/val/'): + shutil.copyfile( + osp.join(args.image_input_dir, img_name), + osp.join(args.output_dir + '/val/', img_name)) + else: + if osp.exists(args.output_dir + '/test/'): + shutil.copyfile( + osp.join(args.image_input_dir, img_name), + osp.join(args.output_dir + '/test/', img_name)) + count = count + 1 + + # Deal with the json files. + if not os.path.exists(args.output_dir + '/annotations'): + os.makedirs(args.output_dir + '/annotations') + if args.train_proportion != 0: + train_data_coco = deal_json(args.dataset_type, + args.output_dir + '/train', + args.json_input_dir) + train_json_path = osp.join(args.output_dir + '/annotations', + 'instance_train.json') + json.dump( + train_data_coco, + open(train_json_path, 'w'), + indent=4, + cls=MyEncoder) + if args.val_proportion != 0: + val_data_coco = deal_json(args.dataset_type, + args.output_dir + '/val', + args.json_input_dir) + val_json_path = osp.join(args.output_dir + '/annotations', + 'instance_val.json') + json.dump( + val_data_coco, + open(val_json_path, 'w'), + indent=4, + cls=MyEncoder) + if args.test_proportion != 0: + test_data_coco = deal_json(args.dataset_type, + args.output_dir + '/test', + args.json_input_dir) + test_json_path = osp.join(args.output_dir + '/annotations', + 'instance_test.json') + json.dump( + test_data_coco, + open(test_json_path, 'w'), + indent=4, + cls=MyEncoder) + + +if __name__ == '__main__': + main() diff --git a/labels.txt b/labels.txt new file mode 100644 index 0000000000000000000000000000000000000000..9513a5faa0f2f75a3f9aa2470ff541a16dc888da --- /dev/null +++ b/labels.txt @@ -0,0 +1,2 @@ +fall +nofall \ No newline at end of file diff --git a/model_final.pdparams b/model_final.pdparams new file mode 100644 index 0000000000000000000000000000000000000000..fb683fe8916da7663700e08fe960cd95977bce4f --- /dev/null +++ b/model_final.pdparams @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95ab73bfb0983295eab2e36a6373f080e0b951431fde93ef115faff4c6d2b637 +size 246347282 diff --git a/optimizer_270e.yml b/optimizer_270e.yml new file mode 100644 index 0000000000000000000000000000000000000000..c439200758005edd357b9232127a38884062a66d --- /dev/null +++ b/optimizer_270e.yml @@ -0,0 +1,21 @@ +epoch: 400 + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 216 + - 243 + - !LinearWarmup + start_factor: 0. 
+ steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/runtime.yml b/runtime.yml new file mode 100644 index 0000000000000000000000000000000000000000..33a403950d1ddbbb5c41323be41d9ea7a57a93ec --- /dev/null +++ b/runtime.yml @@ -0,0 +1,16 @@ +use_gpu: false +use_xpu: false +use_mlu: false +use_npu: false +log_iter: 20 +save_dir: output +snapshot_epoch: 1 +print_flops: false +print_params: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when exporting the model. + nms: True # Whether NMS is included in the network when exporting the model. + benchmark: False # Used to test model performance; if set `True`, post-process and NMS will not be exported. + fuse_conv_bn: False diff --git a/setup.py b/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..bc057d393857177d717e51136a900926b39cf7bb --- /dev/null +++ b/setup.py @@ -0,0 +1,133 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import os.path as osp +import glob +import shutil +import subprocess +from setuptools import find_packages, setup + +# ============== version definition ============== + +PPDET_VERSION = "2.6.0" + + +def parse_version(): + return PPDET_VERSION.replace('-', '') + + +def git_commit(): + try: + cmd = ['git', 'rev-parse', 'HEAD'] + git_commit = subprocess.Popen( + cmd, + stdout=subprocess.PIPE, ).communicate()[0].strip() + git_commit = git_commit.decode() + except Exception: + git_commit = 'Unknown' + + return str(git_commit) + + +def write_version_py(filename='ppdet/version.py'): + ver_str = """# THIS FILE IS GENERATED FROM PADDLEPADDLE SETUP.PY +# +full_version = '%(version)s' +commit = '%(commit)s' +""" + + _git_commit = git_commit() + with open(filename, 'w') as f: + f.write(ver_str % {'version': PPDET_VERSION, 'commit': _git_commit}) + + +write_version_py() + +# ============== version definition ============== + + +def readme(): + with open('README.md', encoding='utf-8') as f: + content = f.read() + return content + + +def parse_requirements(fname): + with open(fname, encoding="utf-8-sig") as f: + requirements = f.readlines() + return requirements + + +def package_model_zoo(): + cur_dir = osp.dirname(osp.realpath(__file__)) + cfg_dir = osp.join(cur_dir, "configs") + cfgs = glob.glob(osp.join(cfg_dir, '*/*.yml')) + + valid_cfgs = [] + for cfg in cfgs: + # exclude dataset base config + if osp.split(osp.split(cfg)[0])[1] not in ['datasets']: + valid_cfgs.append(cfg) + model_names = [ + osp.relpath(cfg, cfg_dir).replace(".yml", "") for cfg in valid_cfgs + ] + + model_zoo_file = osp.join(cur_dir, 'ppdet', 'model_zoo', 'MODEL_ZOO') + with open(model_zoo_file, 'w') as wf: + for model_name in model_names: + wf.write("{}\n".format(model_name)) + + return [model_zoo_file] + + +packages = [ + 'ppdet', + 'ppdet.core', + 'ppdet.data', + 'ppdet.engine', + 'ppdet.metrics', + 'ppdet.modeling', +
'ppdet.model_zoo', + 'ppdet.slim', + 'ppdet.utils', +] + +if __name__ == "__main__": + setup( + name='paddledet', + packages=find_packages(exclude=("configs", "tools", "deploy")), + package_data={'ppdet.model_zoo': package_model_zoo()}, + author='PaddlePaddle', + version=parse_version(), + install_requires=parse_requirements('./requirements.txt'), + description='Object detection and instance segmentation toolkit based on PaddlePaddle', + long_description=readme(), + long_description_content_type='text/markdown', + url='https://github.com/PaddlePaddle/PaddleDetection', + download_url='https://github.com/PaddlePaddle/PaddleDetection.git', + keywords=['ppdet paddle ppyolo'], + classifiers=[ + 'Intended Audience :: Developers', + 'License :: OSI Approved :: Apache Software License', + 'Operating System :: OS Independent', + 'Natural Language :: Chinese (Simplified)', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.5', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', + 'Programming Language :: Python :: 3.8', 'Topic :: Utilities' + ], + license='Apache License 2.0', + ext_modules=[]) diff --git a/train_list.txt b/train_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..458ec80aa6fcad04f038b211207995553455cb55 --- /dev/null +++ b/train_list.txt @@ -0,0 +1,1008 @@ +JPEGImages/fall_127.jpg Annotations/fall_127.xml +JPEGImages/fall_858.jpg Annotations/fall_858.xml +JPEGImages/fall_197.jpg Annotations/fall_197.xml +JPEGImages/fall_1205.jpg Annotations/fall_1205.xml +JPEGImages/fall_327.jpg Annotations/fall_327.xml +JPEGImages/fall_49.jpg Annotations/fall_49.xml +JPEGImages/fall_931.jpg Annotations/fall_931.xml +JPEGImages/fall_620.jpg Annotations/fall_620.xml +JPEGImages/fall_1228.jpg Annotations/fall_1228.xml +JPEGImages/fall_453.jpg Annotations/fall_453.xml +JPEGImages/fall_1118.jpg Annotations/fall_1118.xml +JPEGImages/fall_477.jpg Annotations/fall_477.xml +JPEGImages/fall_4.jpg Annotations/fall_4.xml +JPEGImages/fall_129.jpg Annotations/fall_129.xml +JPEGImages/fall_823.jpg Annotations/fall_823.xml +JPEGImages/fall_797.jpg Annotations/fall_797.xml +JPEGImages/fall_69.jpg Annotations/fall_69.xml +JPEGImages/fall_178.jpg Annotations/fall_178.xml +JPEGImages/fall_507.jpg Annotations/fall_507.xml +JPEGImages/fall_362.jpg Annotations/fall_362.xml +JPEGImages/fall_14.jpg Annotations/fall_14.xml +JPEGImages/fall_584.jpg Annotations/fall_584.xml +JPEGImages/fall_561.jpg Annotations/fall_561.xml +JPEGImages/fall_713.jpg Annotations/fall_713.xml +JPEGImages/fall_1429.jpg Annotations/fall_1429.xml +JPEGImages/fall_755.jpg Annotations/fall_755.xml +JPEGImages/fall_304.jpg Annotations/fall_304.xml +JPEGImages/fall_164.jpg Annotations/fall_164.xml +JPEGImages/fall_1011.jpg Annotations/fall_1011.xml +JPEGImages/fall_761.jpg Annotations/fall_761.xml +JPEGImages/fall_884.jpg Annotations/fall_884.xml +JPEGImages/fall_442.jpg Annotations/fall_442.xml +JPEGImages/fall_1372.jpg Annotations/fall_1372.xml +JPEGImages/fall_387.jpg Annotations/fall_387.xml +JPEGImages/fall_1264.jpg Annotations/fall_1264.xml +JPEGImages/fall_1182.jpg Annotations/fall_1182.xml +JPEGImages/fall_162.jpg Annotations/fall_162.xml +JPEGImages/fall_272.jpg Annotations/fall_272.xml +JPEGImages/fall_217.jpg Annotations/fall_217.xml +JPEGImages/fall_45.jpg Annotations/fall_45.xml +JPEGImages/fall_229.jpg Annotations/fall_229.xml +JPEGImages/fall_878.jpg Annotations/fall_878.xml +JPEGImages/fall_651.jpg Annotations/fall_651.xml +JPEGImages/fall_1370.jpg 
Annotations/fall_1370.xml +JPEGImages/fall_1145.jpg Annotations/fall_1145.xml +JPEGImages/fall_185.jpg Annotations/fall_185.xml +JPEGImages/fall_261.jpg Annotations/fall_261.xml +JPEGImages/fall_299.jpg Annotations/fall_299.xml +JPEGImages/fall_731.jpg Annotations/fall_731.xml +JPEGImages/fall_479.jpg Annotations/fall_479.xml +JPEGImages/fall_988.jpg Annotations/fall_988.xml +JPEGImages/fall_67.jpg Annotations/fall_67.xml +JPEGImages/fall_354.jpg Annotations/fall_354.xml +JPEGImages/fall_1157.jpg Annotations/fall_1157.xml +JPEGImages/fall_983.jpg Annotations/fall_983.xml +JPEGImages/fall_629.jpg Annotations/fall_629.xml +JPEGImages/fall_519.jpg Annotations/fall_519.xml +JPEGImages/fall_1081.jpg Annotations/fall_1081.xml +JPEGImages/fall_1269.jpg Annotations/fall_1269.xml +JPEGImages/fall_1231.jpg Annotations/fall_1231.xml +JPEGImages/fall_1062.jpg Annotations/fall_1062.xml +JPEGImages/fall_1397.jpg Annotations/fall_1397.xml +JPEGImages/fall_1406.jpg Annotations/fall_1406.xml +JPEGImages/fall_165.jpg Annotations/fall_165.xml +JPEGImages/fall_967.jpg Annotations/fall_967.xml +JPEGImages/fall_1223.jpg Annotations/fall_1223.xml +JPEGImages/fall_130.jpg Annotations/fall_130.xml +JPEGImages/fall_114.jpg Annotations/fall_114.xml +JPEGImages/fall_324.jpg Annotations/fall_324.xml +JPEGImages/fall_377.jpg Annotations/fall_377.xml +JPEGImages/fall_597.jpg Annotations/fall_597.xml +JPEGImages/fall_1402.jpg Annotations/fall_1402.xml +JPEGImages/fall_209.jpg Annotations/fall_209.xml +JPEGImages/fall_342.jpg Annotations/fall_342.xml +JPEGImages/fall_809.jpg Annotations/fall_809.xml +JPEGImages/fall_1276.jpg Annotations/fall_1276.xml +JPEGImages/fall_329.jpg Annotations/fall_329.xml +JPEGImages/fall_228.jpg Annotations/fall_228.xml +JPEGImages/fall_223.jpg Annotations/fall_223.xml +JPEGImages/fall_1295.jpg Annotations/fall_1295.xml +JPEGImages/fall_409.jpg Annotations/fall_409.xml +JPEGImages/fall_555.jpg Annotations/fall_555.xml +JPEGImages/fall_974.jpg Annotations/fall_974.xml +JPEGImages/fall_53.jpg Annotations/fall_53.xml +JPEGImages/fall_194.jpg Annotations/fall_194.xml +JPEGImages/fall_412.jpg Annotations/fall_412.xml +JPEGImages/fall_1140.jpg Annotations/fall_1140.xml +JPEGImages/fall_875.jpg Annotations/fall_875.xml +JPEGImages/fall_1217.jpg Annotations/fall_1217.xml +JPEGImages/fall_796.jpg Annotations/fall_796.xml +JPEGImages/fall_554.jpg Annotations/fall_554.xml +JPEGImages/fall_606.jpg Annotations/fall_606.xml +JPEGImages/fall_1342.jpg Annotations/fall_1342.xml +JPEGImages/fall_882.jpg Annotations/fall_882.xml +JPEGImages/fall_266.jpg Annotations/fall_266.xml +JPEGImages/fall_603.jpg Annotations/fall_603.xml +JPEGImages/fall_738.jpg Annotations/fall_738.xml +JPEGImages/fall_376.jpg Annotations/fall_376.xml +JPEGImages/fall_1386.jpg Annotations/fall_1386.xml +JPEGImages/fall_856.jpg Annotations/fall_856.xml +JPEGImages/fall_516.jpg Annotations/fall_516.xml +JPEGImages/fall_504.jpg Annotations/fall_504.xml +JPEGImages/fall_355.jpg Annotations/fall_355.xml +JPEGImages/fall_1034.jpg Annotations/fall_1034.xml +JPEGImages/fall_501.jpg Annotations/fall_501.xml +JPEGImages/fall_849.jpg Annotations/fall_849.xml +JPEGImages/fall_518.jpg Annotations/fall_518.xml +JPEGImages/fall_1268.jpg Annotations/fall_1268.xml +JPEGImages/fall_1237.jpg Annotations/fall_1237.xml +JPEGImages/fall_487.jpg Annotations/fall_487.xml +JPEGImages/fall_947.jpg Annotations/fall_947.xml +JPEGImages/fall_51.jpg Annotations/fall_51.xml +JPEGImages/fall_1161.jpg Annotations/fall_1161.xml +JPEGImages/fall_44.jpg 
Annotations/fall_44.xml +JPEGImages/fall_958.jpg Annotations/fall_958.xml +JPEGImages/fall_560.jpg Annotations/fall_560.xml +JPEGImages/fall_1434.jpg Annotations/fall_1434.xml +JPEGImages/fall_321.jpg Annotations/fall_321.xml +JPEGImages/fall_1412.jpg Annotations/fall_1412.xml +JPEGImages/fall_1039.jpg Annotations/fall_1039.xml +JPEGImages/fall_160.jpg Annotations/fall_160.xml +JPEGImages/fall_1149.jpg Annotations/fall_1149.xml +JPEGImages/fall_1362.jpg Annotations/fall_1362.xml +JPEGImages/fall_633.jpg Annotations/fall_633.xml +JPEGImages/fall_263.jpg Annotations/fall_263.xml +JPEGImages/fall_1218.jpg Annotations/fall_1218.xml +JPEGImages/fall_1263.jpg Annotations/fall_1263.xml +JPEGImages/fall_830.jpg Annotations/fall_830.xml +JPEGImages/fall_295.jpg Annotations/fall_295.xml +JPEGImages/fall_665.jpg Annotations/fall_665.xml +JPEGImages/fall_98.jpg Annotations/fall_98.xml +JPEGImages/fall_115.jpg Annotations/fall_115.xml +JPEGImages/fall_613.jpg Annotations/fall_613.xml +JPEGImages/fall_773.jpg Annotations/fall_773.xml +JPEGImages/fall_100.jpg Annotations/fall_100.xml +JPEGImages/fall_543.jpg Annotations/fall_543.xml +JPEGImages/fall_746.jpg Annotations/fall_746.xml +JPEGImages/fall_885.jpg Annotations/fall_885.xml +JPEGImages/fall_451.jpg Annotations/fall_451.xml +JPEGImages/fall_843.jpg Annotations/fall_843.xml +JPEGImages/fall_897.jpg Annotations/fall_897.xml +JPEGImages/fall_193.jpg Annotations/fall_193.xml +JPEGImages/fall_438.jpg Annotations/fall_438.xml +JPEGImages/fall_1015.jpg Annotations/fall_1015.xml +JPEGImages/fall_642.jpg Annotations/fall_642.xml +JPEGImages/fall_981.jpg Annotations/fall_981.xml +JPEGImages/fall_58.jpg Annotations/fall_58.xml +JPEGImages/fall_1077.jpg Annotations/fall_1077.xml +JPEGImages/fall_820.jpg Annotations/fall_820.xml +JPEGImages/fall_721.jpg Annotations/fall_721.xml +JPEGImages/fall_1172.jpg Annotations/fall_1172.xml +JPEGImages/fall_964.jpg Annotations/fall_964.xml +JPEGImages/fall_490.jpg Annotations/fall_490.xml +JPEGImages/fall_1302.jpg Annotations/fall_1302.xml +JPEGImages/fall_128.jpg Annotations/fall_128.xml +JPEGImages/fall_1199.jpg Annotations/fall_1199.xml +JPEGImages/fall_710.jpg Annotations/fall_710.xml +JPEGImages/fall_278.jpg Annotations/fall_278.xml +JPEGImages/fall_1086.jpg Annotations/fall_1086.xml +JPEGImages/fall_1437.jpg Annotations/fall_1437.xml +JPEGImages/fall_397.jpg Annotations/fall_397.xml +JPEGImages/fall_432.jpg Annotations/fall_432.xml +JPEGImages/fall_313.jpg Annotations/fall_313.xml +JPEGImages/fall_110.jpg Annotations/fall_110.xml +JPEGImages/fall_918.jpg Annotations/fall_918.xml +JPEGImages/fall_1192.jpg Annotations/fall_1192.xml +JPEGImages/fall_812.jpg Annotations/fall_812.xml +JPEGImages/fall_833.jpg Annotations/fall_833.xml +JPEGImages/fall_225.jpg Annotations/fall_225.xml +JPEGImages/fall_1.jpg Annotations/fall_1.xml +JPEGImages/fall_235.jpg Annotations/fall_235.xml +JPEGImages/fall_212.jpg Annotations/fall_212.xml +JPEGImages/fall_1315.jpg Annotations/fall_1315.xml +JPEGImages/fall_275.jpg Annotations/fall_275.xml +JPEGImages/fall_154.jpg Annotations/fall_154.xml +JPEGImages/fall_1313.jpg Annotations/fall_1313.xml +JPEGImages/fall_357.jpg Annotations/fall_357.xml +JPEGImages/fall_1238.jpg Annotations/fall_1238.xml +JPEGImages/fall_716.jpg Annotations/fall_716.xml +JPEGImages/fall_963.jpg Annotations/fall_963.xml +JPEGImages/fall_1153.jpg Annotations/fall_1153.xml +JPEGImages/fall_326.jpg Annotations/fall_326.xml +JPEGImages/fall_116.jpg Annotations/fall_116.xml +JPEGImages/fall_920.jpg Annotations/fall_920.xml 
+JPEGImages/fall_1420.jpg Annotations/fall_1420.xml +JPEGImages/fall_585.jpg Annotations/fall_585.xml +JPEGImages/fall_334.jpg Annotations/fall_334.xml +JPEGImages/fall_1371.jpg Annotations/fall_1371.xml +JPEGImages/fall_233.jpg Annotations/fall_233.xml +JPEGImages/fall_1094.jpg Annotations/fall_1094.xml +JPEGImages/fall_328.jpg Annotations/fall_328.xml +JPEGImages/fall_877.jpg Annotations/fall_877.xml +JPEGImages/fall_1244.jpg Annotations/fall_1244.xml +JPEGImages/fall_204.jpg Annotations/fall_204.xml +JPEGImages/fall_566.jpg Annotations/fall_566.xml +JPEGImages/fall_1415.jpg Annotations/fall_1415.xml +JPEGImages/fall_1294.jpg Annotations/fall_1294.xml +JPEGImages/fall_901.jpg Annotations/fall_901.xml +JPEGImages/fall_293.jpg Annotations/fall_293.xml +JPEGImages/fall_425.jpg Annotations/fall_425.xml +JPEGImages/fall_315.jpg Annotations/fall_315.xml +JPEGImages/fall_240.jpg Annotations/fall_240.xml +JPEGImages/fall_668.jpg Annotations/fall_668.xml +JPEGImages/fall_677.jpg Annotations/fall_677.xml +JPEGImages/fall_1312.jpg Annotations/fall_1312.xml +JPEGImages/fall_340.jpg Annotations/fall_340.xml +JPEGImages/fall_1049.jpg Annotations/fall_1049.xml +JPEGImages/fall_611.jpg Annotations/fall_611.xml +JPEGImages/fall_1089.jpg Annotations/fall_1089.xml +JPEGImages/fall_605.jpg Annotations/fall_605.xml +JPEGImages/fall_151.jpg Annotations/fall_151.xml +JPEGImages/fall_1242.jpg Annotations/fall_1242.xml +JPEGImages/fall_586.jpg Annotations/fall_586.xml +JPEGImages/fall_307.jpg Annotations/fall_307.xml +JPEGImages/fall_912.jpg Annotations/fall_912.xml +JPEGImages/fall_421.jpg Annotations/fall_421.xml +JPEGImages/fall_630.jpg Annotations/fall_630.xml +JPEGImages/fall_1035.jpg Annotations/fall_1035.xml +JPEGImages/fall_623.jpg Annotations/fall_623.xml +JPEGImages/fall_848.jpg Annotations/fall_848.xml +JPEGImages/fall_31.jpg Annotations/fall_31.xml +JPEGImages/fall_152.jpg Annotations/fall_152.xml +JPEGImages/fall_156.jpg Annotations/fall_156.xml +JPEGImages/fall_621.jpg Annotations/fall_621.xml +JPEGImages/fall_5.jpg Annotations/fall_5.xml +JPEGImages/fall_3.jpg Annotations/fall_3.xml +JPEGImages/fall_1053.jpg Annotations/fall_1053.xml +JPEGImages/fall_343.jpg Annotations/fall_343.xml +JPEGImages/fall_1438.jpg Annotations/fall_1438.xml +JPEGImages/fall_317.jpg Annotations/fall_317.xml +JPEGImages/fall_1090.jpg Annotations/fall_1090.xml +JPEGImages/fall_1298.jpg Annotations/fall_1298.xml +JPEGImages/fall_1433.jpg Annotations/fall_1433.xml +JPEGImages/fall_703.jpg Annotations/fall_703.xml +JPEGImages/fall_1407.jpg Annotations/fall_1407.xml +JPEGImages/fall_457.jpg Annotations/fall_457.xml +JPEGImages/fall_717.jpg Annotations/fall_717.xml +JPEGImages/fall_628.jpg Annotations/fall_628.xml +JPEGImages/fall_1074.jpg Annotations/fall_1074.xml +JPEGImages/fall_1272.jpg Annotations/fall_1272.xml +JPEGImages/fall_118.jpg Annotations/fall_118.xml +JPEGImages/fall_664.jpg Annotations/fall_664.xml +JPEGImages/fall_933.jpg Annotations/fall_933.xml +JPEGImages/fall_1316.jpg Annotations/fall_1316.xml +JPEGImages/fall_813.jpg Annotations/fall_813.xml +JPEGImages/fall_1046.jpg Annotations/fall_1046.xml +JPEGImages/fall_1374.jpg Annotations/fall_1374.xml +JPEGImages/fall_747.jpg Annotations/fall_747.xml +JPEGImages/fall_1151.jpg Annotations/fall_1151.xml +JPEGImages/fall_718.jpg Annotations/fall_718.xml +JPEGImages/fall_145.jpg Annotations/fall_145.xml +JPEGImages/fall_653.jpg Annotations/fall_653.xml +JPEGImages/fall_683.jpg Annotations/fall_683.xml +JPEGImages/fall_985.jpg Annotations/fall_985.xml 
+JPEGImages/fall_1132.jpg Annotations/fall_1132.xml +JPEGImages/fall_1100.jpg Annotations/fall_1100.xml +JPEGImages/fall_1040.jpg Annotations/fall_1040.xml +JPEGImages/fall_762.jpg Annotations/fall_762.xml +JPEGImages/fall_1278.jpg Annotations/fall_1278.xml +JPEGImages/fall_325.jpg Annotations/fall_325.xml +JPEGImages/fall_1180.jpg Annotations/fall_1180.xml +JPEGImages/fall_689.jpg Annotations/fall_689.xml +JPEGImages/fall_1066.jpg Annotations/fall_1066.xml +JPEGImages/fall_1324.jpg Annotations/fall_1324.xml +JPEGImages/fall_1088.jpg Annotations/fall_1088.xml +JPEGImages/fall_415.jpg Annotations/fall_415.xml +JPEGImages/fall_864.jpg Annotations/fall_864.xml +JPEGImages/fall_792.jpg Annotations/fall_792.xml +JPEGImages/fall_258.jpg Annotations/fall_258.xml +JPEGImages/fall_169.jpg Annotations/fall_169.xml +JPEGImages/fall_1087.jpg Annotations/fall_1087.xml +JPEGImages/fall_862.jpg Annotations/fall_862.xml +JPEGImages/fall_794.jpg Annotations/fall_794.xml +JPEGImages/fall_12.jpg Annotations/fall_12.xml +JPEGImages/fall_184.jpg Annotations/fall_184.xml +JPEGImages/fall_1010.jpg Annotations/fall_1010.xml +JPEGImages/fall_310.jpg Annotations/fall_310.xml +JPEGImages/fall_1121.jpg Annotations/fall_1121.xml +JPEGImages/fall_1009.jpg Annotations/fall_1009.xml +JPEGImages/fall_149.jpg Annotations/fall_149.xml +JPEGImages/fall_1416.jpg Annotations/fall_1416.xml +JPEGImages/fall_1343.jpg Annotations/fall_1343.xml +JPEGImages/fall_1107.jpg Annotations/fall_1107.xml +JPEGImages/fall_723.jpg Annotations/fall_723.xml +JPEGImages/fall_1051.jpg Annotations/fall_1051.xml +JPEGImages/fall_536.jpg Annotations/fall_536.xml +JPEGImages/fall_969.jpg Annotations/fall_969.xml +JPEGImages/fall_1126.jpg Annotations/fall_1126.xml +JPEGImages/fall_1254.jpg Annotations/fall_1254.xml +JPEGImages/fall_1317.jpg Annotations/fall_1317.xml +JPEGImages/fall_42.jpg Annotations/fall_42.xml +JPEGImages/fall_1146.jpg Annotations/fall_1146.xml +JPEGImages/fall_619.jpg Annotations/fall_619.xml +JPEGImages/fall_748.jpg Annotations/fall_748.xml +JPEGImages/fall_361.jpg Annotations/fall_361.xml +JPEGImages/fall_871.jpg Annotations/fall_871.xml +JPEGImages/fall_1022.jpg Annotations/fall_1022.xml +JPEGImages/fall_318.jpg Annotations/fall_318.xml +JPEGImages/fall_644.jpg Annotations/fall_644.xml +JPEGImages/fall_93.jpg Annotations/fall_93.xml +JPEGImages/fall_1239.jpg Annotations/fall_1239.xml +JPEGImages/fall_1318.jpg Annotations/fall_1318.xml +JPEGImages/fall_25.jpg Annotations/fall_25.xml +JPEGImages/fall_538.jpg Annotations/fall_538.xml +JPEGImages/fall_19.jpg Annotations/fall_19.xml +JPEGImages/fall_845.jpg Annotations/fall_845.xml +JPEGImages/fall_1232.jpg Annotations/fall_1232.xml +JPEGImages/fall_959.jpg Annotations/fall_959.xml +JPEGImages/fall_645.jpg Annotations/fall_645.xml +JPEGImages/fall_1023.jpg Annotations/fall_1023.xml +JPEGImages/fall_1206.jpg Annotations/fall_1206.xml +JPEGImages/fall_1091.jpg Annotations/fall_1091.xml +JPEGImages/fall_593.jpg Annotations/fall_593.xml +JPEGImages/fall_824.jpg Annotations/fall_824.xml +JPEGImages/fall_811.jpg Annotations/fall_811.xml +JPEGImages/fall_282.jpg Annotations/fall_282.xml +JPEGImages/fall_1159.jpg Annotations/fall_1159.xml +JPEGImages/fall_393.jpg Annotations/fall_393.xml +JPEGImages/fall_82.jpg Annotations/fall_82.xml +JPEGImages/fall_458.jpg Annotations/fall_458.xml +JPEGImages/fall_1209.jpg Annotations/fall_1209.xml +JPEGImages/fall_945.jpg Annotations/fall_945.xml +JPEGImages/fall_719.jpg Annotations/fall_719.xml +JPEGImages/fall_135.jpg Annotations/fall_135.xml 
+JPEGImages/fall_500.jpg Annotations/fall_500.xml +JPEGImages/fall_1399.jpg Annotations/fall_1399.xml +JPEGImages/fall_1422.jpg Annotations/fall_1422.xml +JPEGImages/fall_173.jpg Annotations/fall_173.xml +JPEGImages/fall_1093.jpg Annotations/fall_1093.xml +JPEGImages/fall_440.jpg Annotations/fall_440.xml +JPEGImages/fall_996.jpg Annotations/fall_996.xml +JPEGImages/fall_844.jpg Annotations/fall_844.xml +JPEGImages/fall_1365.jpg Annotations/fall_1365.xml +JPEGImages/fall_1379.jpg Annotations/fall_1379.xml +JPEGImages/fall_1290.jpg Annotations/fall_1290.xml +JPEGImages/fall_353.jpg Annotations/fall_353.xml +JPEGImages/fall_247.jpg Annotations/fall_247.xml +JPEGImages/fall_172.jpg Annotations/fall_172.xml +JPEGImages/fall_163.jpg Annotations/fall_163.xml +JPEGImages/fall_248.jpg Annotations/fall_248.xml +JPEGImages/fall_1078.jpg Annotations/fall_1078.xml +JPEGImages/fall_305.jpg Annotations/fall_305.xml +JPEGImages/fall_48.jpg Annotations/fall_48.xml +JPEGImages/fall_6.jpg Annotations/fall_6.xml +JPEGImages/fall_1067.jpg Annotations/fall_1067.xml +JPEGImages/fall_801.jpg Annotations/fall_801.xml +JPEGImages/fall_932.jpg Annotations/fall_932.xml +JPEGImages/fall_940.jpg Annotations/fall_940.xml +JPEGImages/fall_1096.jpg Annotations/fall_1096.xml +JPEGImages/fall_66.jpg Annotations/fall_66.xml +JPEGImages/fall_672.jpg Annotations/fall_672.xml +JPEGImages/fall_1143.jpg Annotations/fall_1143.xml +JPEGImages/fall_283.jpg Annotations/fall_283.xml +JPEGImages/fall_1408.jpg Annotations/fall_1408.xml +JPEGImages/fall_1173.jpg Annotations/fall_1173.xml +JPEGImages/fall_1341.jpg Annotations/fall_1341.xml +JPEGImages/fall_489.jpg Annotations/fall_489.xml +JPEGImages/fall_1222.jpg Annotations/fall_1222.xml +JPEGImages/fall_222.jpg Annotations/fall_222.xml +JPEGImages/fall_948.jpg Annotations/fall_948.xml +JPEGImages/fall_35.jpg Annotations/fall_35.xml +JPEGImages/fall_1196.jpg Annotations/fall_1196.xml +JPEGImages/fall_142.jpg Annotations/fall_142.xml +JPEGImages/fall_634.jpg Annotations/fall_634.xml +JPEGImages/fall_10.jpg Annotations/fall_10.xml +JPEGImages/fall_759.jpg Annotations/fall_759.xml +JPEGImages/fall_1000.jpg Annotations/fall_1000.xml +JPEGImages/fall_356.jpg Annotations/fall_356.xml +JPEGImages/fall_767.jpg Annotations/fall_767.xml +JPEGImages/fall_389.jpg Annotations/fall_389.xml +JPEGImages/fall_1092.jpg Annotations/fall_1092.xml +JPEGImages/fall_471.jpg Annotations/fall_471.xml +JPEGImages/fall_486.jpg Annotations/fall_486.xml +JPEGImages/fall_888.jpg Annotations/fall_888.xml +JPEGImages/fall_756.jpg Annotations/fall_756.xml +JPEGImages/fall_1333.jpg Annotations/fall_1333.xml +JPEGImages/fall_741.jpg Annotations/fall_741.xml +JPEGImages/fall_319.jpg Annotations/fall_319.xml +JPEGImages/fall_369.jpg Annotations/fall_369.xml +JPEGImages/fall_78.jpg Annotations/fall_78.xml +JPEGImages/fall_157.jpg Annotations/fall_157.xml +JPEGImages/fall_246.jpg Annotations/fall_246.xml +JPEGImages/fall_473.jpg Annotations/fall_473.xml +JPEGImages/fall_109.jpg Annotations/fall_109.xml +JPEGImages/fall_76.jpg Annotations/fall_76.xml +JPEGImages/fall_1367.jpg Annotations/fall_1367.xml +JPEGImages/fall_1045.jpg Annotations/fall_1045.xml +JPEGImages/fall_1419.jpg Annotations/fall_1419.xml +JPEGImages/fall_567.jpg Annotations/fall_567.xml +JPEGImages/fall_1305.jpg Annotations/fall_1305.xml +JPEGImages/fall_1197.jpg Annotations/fall_1197.xml +JPEGImages/fall_1142.jpg Annotations/fall_1142.xml +JPEGImages/fall_1166.jpg Annotations/fall_1166.xml +JPEGImages/fall_977.jpg Annotations/fall_977.xml 
+JPEGImages/fall_881.jpg Annotations/fall_881.xml +JPEGImages/fall_286.jpg Annotations/fall_286.xml +JPEGImages/fall_1155.jpg Annotations/fall_1155.xml +JPEGImages/fall_1112.jpg Annotations/fall_1112.xml +JPEGImages/fall_926.jpg Annotations/fall_926.xml +JPEGImages/fall_465.jpg Annotations/fall_465.xml +JPEGImages/fall_37.jpg Annotations/fall_37.xml +JPEGImages/fall_144.jpg Annotations/fall_144.xml +JPEGImages/fall_1240.jpg Annotations/fall_1240.xml +JPEGImages/fall_906.jpg Annotations/fall_906.xml +JPEGImages/fall_631.jpg Annotations/fall_631.xml +JPEGImages/fall_998.jpg Annotations/fall_998.xml +JPEGImages/fall_869.jpg Annotations/fall_869.xml +JPEGImages/fall_978.jpg Annotations/fall_978.xml +JPEGImages/fall_745.jpg Annotations/fall_745.xml +JPEGImages/fall_842.jpg Annotations/fall_842.xml +JPEGImages/fall_680.jpg Annotations/fall_680.xml +JPEGImages/fall_1123.jpg Annotations/fall_1123.xml +JPEGImages/fall_962.jpg Annotations/fall_962.xml +JPEGImages/fall_599.jpg Annotations/fall_599.xml +JPEGImages/fall_873.jpg Annotations/fall_873.xml +JPEGImages/fall_855.jpg Annotations/fall_855.xml +JPEGImages/fall_1170.jpg Annotations/fall_1170.xml +JPEGImages/fall_294.jpg Annotations/fall_294.xml +JPEGImages/fall_553.jpg Annotations/fall_553.xml +JPEGImages/fall_498.jpg Annotations/fall_498.xml +JPEGImages/fall_1308.jpg Annotations/fall_1308.xml +JPEGImages/fall_95.jpg Annotations/fall_95.xml +JPEGImages/fall_386.jpg Annotations/fall_386.xml +JPEGImages/fall_804.jpg Annotations/fall_804.xml +JPEGImages/fall_1354.jpg Annotations/fall_1354.xml +JPEGImages/fall_1427.jpg Annotations/fall_1427.xml +JPEGImages/fall_1041.jpg Annotations/fall_1041.xml +JPEGImages/fall_1071.jpg Annotations/fall_1071.xml +JPEGImages/fall_92.jpg Annotations/fall_92.xml +JPEGImages/fall_698.jpg Annotations/fall_698.xml +JPEGImages/fall_434.jpg Annotations/fall_434.xml +JPEGImages/fall_133.jpg Annotations/fall_133.xml +JPEGImages/fall_1385.jpg Annotations/fall_1385.xml +JPEGImages/fall_720.jpg Annotations/fall_720.xml +JPEGImages/fall_472.jpg Annotations/fall_472.xml +JPEGImages/fall_1114.jpg Annotations/fall_1114.xml +JPEGImages/fall_544.jpg Annotations/fall_544.xml +JPEGImages/fall_610.jpg Annotations/fall_610.xml +JPEGImages/fall_224.jpg Annotations/fall_224.xml +JPEGImages/fall_447.jpg Annotations/fall_447.xml +JPEGImages/fall_221.jpg Annotations/fall_221.xml +JPEGImages/fall_695.jpg Annotations/fall_695.xml +JPEGImages/fall_899.jpg Annotations/fall_899.xml +JPEGImages/fall_1424.jpg Annotations/fall_1424.xml +JPEGImages/fall_289.jpg Annotations/fall_289.xml +JPEGImages/fall_708.jpg Annotations/fall_708.xml +JPEGImages/fall_638.jpg Annotations/fall_638.xml +JPEGImages/fall_1393.jpg Annotations/fall_1393.xml +JPEGImages/fall_1211.jpg Annotations/fall_1211.xml +JPEGImages/fall_1080.jpg Annotations/fall_1080.xml +JPEGImages/fall_814.jpg Annotations/fall_814.xml +JPEGImages/fall_815.jpg Annotations/fall_815.xml +JPEGImages/fall_21.jpg Annotations/fall_21.xml +JPEGImages/fall_8.jpg Annotations/fall_8.xml +JPEGImages/fall_1200.jpg Annotations/fall_1200.xml +JPEGImages/fall_966.jpg Annotations/fall_966.xml +JPEGImages/fall_1184.jpg Annotations/fall_1184.xml +JPEGImages/fall_296.jpg Annotations/fall_296.xml +JPEGImages/fall_533.jpg Annotations/fall_533.xml +JPEGImages/fall_1144.jpg Annotations/fall_1144.xml +JPEGImages/fall_1042.jpg Annotations/fall_1042.xml +JPEGImages/fall_861.jpg Annotations/fall_861.xml +JPEGImages/fall_1409.jpg Annotations/fall_1409.xml +JPEGImages/fall_523.jpg Annotations/fall_523.xml +JPEGImages/fall_276.jpg 
Annotations/fall_276.xml +JPEGImages/fall_484.jpg Annotations/fall_484.xml +JPEGImages/fall_701.jpg Annotations/fall_701.xml +JPEGImages/fall_1274.jpg Annotations/fall_1274.xml +JPEGImages/fall_333.jpg Annotations/fall_333.xml +JPEGImages/fall_1220.jpg Annotations/fall_1220.xml +JPEGImages/fall_385.jpg Annotations/fall_385.xml +JPEGImages/fall_1262.jpg Annotations/fall_1262.xml +JPEGImages/fall_1141.jpg Annotations/fall_1141.xml +JPEGImages/fall_370.jpg Annotations/fall_370.xml +JPEGImages/fall_807.jpg Annotations/fall_807.xml +JPEGImages/fall_1403.jpg Annotations/fall_1403.xml +JPEGImages/fall_257.jpg Annotations/fall_257.xml +JPEGImages/fall_337.jpg Annotations/fall_337.xml +JPEGImages/fall_411.jpg Annotations/fall_411.xml +JPEGImages/fall_99.jpg Annotations/fall_99.xml +JPEGImages/fall_351.jpg Annotations/fall_351.xml +JPEGImages/fall_919.jpg Annotations/fall_919.xml +JPEGImages/fall_648.jpg Annotations/fall_648.xml +JPEGImages/fall_1128.jpg Annotations/fall_1128.xml +JPEGImages/fall_124.jpg Annotations/fall_124.xml +JPEGImages/fall_378.jpg Annotations/fall_378.xml +JPEGImages/fall_733.jpg Annotations/fall_733.xml +JPEGImages/fall_704.jpg Annotations/fall_704.xml +JPEGImages/fall_892.jpg Annotations/fall_892.xml +JPEGImages/fall_413.jpg Annotations/fall_413.xml +JPEGImages/fall_854.jpg Annotations/fall_854.xml +JPEGImages/fall_676.jpg Annotations/fall_676.xml +JPEGImages/fall_1301.jpg Annotations/fall_1301.xml +JPEGImages/fall_461.jpg Annotations/fall_461.xml +JPEGImages/fall_1130.jpg Annotations/fall_1130.xml +JPEGImages/fall_1330.jpg Annotations/fall_1330.xml +JPEGImages/fall_1101.jpg Annotations/fall_1101.xml +JPEGImages/fall_1098.jpg Annotations/fall_1098.xml +JPEGImages/fall_1085.jpg Annotations/fall_1085.xml +JPEGImages/fall_1208.jpg Annotations/fall_1208.xml +JPEGImages/fall_271.jpg Annotations/fall_271.xml +JPEGImages/fall_1109.jpg Annotations/fall_1109.xml +JPEGImages/fall_75.jpg Annotations/fall_75.xml +JPEGImages/fall_791.jpg Annotations/fall_791.xml +JPEGImages/fall_949.jpg Annotations/fall_949.xml +JPEGImages/fall_384.jpg Annotations/fall_384.xml +JPEGImages/fall_1405.jpg Annotations/fall_1405.xml +JPEGImages/fall_1152.jpg Annotations/fall_1152.xml +JPEGImages/fall_105.jpg Annotations/fall_105.xml +JPEGImages/fall_1271.jpg Annotations/fall_1271.xml +JPEGImages/fall_1414.jpg Annotations/fall_1414.xml +JPEGImages/fall_526.jpg Annotations/fall_526.xml +JPEGImages/fall_1038.jpg Annotations/fall_1038.xml +JPEGImages/fall_831.jpg Annotations/fall_831.xml +JPEGImages/fall_220.jpg Annotations/fall_220.xml +JPEGImages/fall_923.jpg Annotations/fall_923.xml +JPEGImages/fall_1002.jpg Annotations/fall_1002.xml +JPEGImages/fall_1378.jpg Annotations/fall_1378.xml +JPEGImages/fall_306.jpg Annotations/fall_306.xml +JPEGImages/fall_1226.jpg Annotations/fall_1226.xml +JPEGImages/fall_1103.jpg Annotations/fall_1103.xml +JPEGImages/fall_1247.jpg Annotations/fall_1247.xml +JPEGImages/fall_402.jpg Annotations/fall_402.xml +JPEGImages/fall_481.jpg Annotations/fall_481.xml +JPEGImages/fall_573.jpg Annotations/fall_573.xml +JPEGImages/fall_1164.jpg Annotations/fall_1164.xml +JPEGImages/fall_879.jpg Annotations/fall_879.xml +JPEGImages/fall_301.jpg Annotations/fall_301.xml +JPEGImages/fall_1099.jpg Annotations/fall_1099.xml +JPEGImages/fall_961.jpg Annotations/fall_961.xml +JPEGImages/fall_502.jpg Annotations/fall_502.xml +JPEGImages/fall_908.jpg Annotations/fall_908.xml +JPEGImages/fall_269.jpg Annotations/fall_269.xml +JPEGImages/fall_497.jpg Annotations/fall_497.xml +JPEGImages/fall_902.jpg 
Annotations/fall_902.xml +JPEGImages/fall_136.jpg Annotations/fall_136.xml +JPEGImages/fall_455.jpg Annotations/fall_455.xml +JPEGImages/fall_652.jpg Annotations/fall_652.xml +JPEGImages/fall_817.jpg Annotations/fall_817.xml +JPEGImages/fall_1326.jpg Annotations/fall_1326.xml +JPEGImages/fall_650.jpg Annotations/fall_650.xml +JPEGImages/fall_835.jpg Annotations/fall_835.xml +JPEGImages/fall_30.jpg Annotations/fall_30.xml +JPEGImages/fall_640.jpg Annotations/fall_640.xml +JPEGImages/fall_470.jpg Annotations/fall_470.xml +JPEGImages/fall_851.jpg Annotations/fall_851.xml +JPEGImages/fall_308.jpg Annotations/fall_308.xml +JPEGImages/fall_936.jpg Annotations/fall_936.xml +JPEGImages/fall_404.jpg Annotations/fall_404.xml +JPEGImages/fall_1115.jpg Annotations/fall_1115.xml +JPEGImages/fall_344.jpg Annotations/fall_344.xml +JPEGImages/fall_681.jpg Annotations/fall_681.xml +JPEGImages/fall_122.jpg Annotations/fall_122.xml +JPEGImages/fall_834.jpg Annotations/fall_834.xml +JPEGImages/fall_582.jpg Annotations/fall_582.xml +JPEGImages/fall_905.jpg Annotations/fall_905.xml +JPEGImages/fall_250.jpg Annotations/fall_250.xml +JPEGImages/fall_1410.jpg Annotations/fall_1410.xml +JPEGImages/fall_182.jpg Annotations/fall_182.xml +JPEGImages/fall_424.jpg Annotations/fall_424.xml +JPEGImages/fall_506.jpg Annotations/fall_506.xml +JPEGImages/fall_1418.jpg Annotations/fall_1418.xml +JPEGImages/fall_139.jpg Annotations/fall_139.xml +JPEGImages/fall_876.jpg Annotations/fall_876.xml +JPEGImages/fall_707.jpg Annotations/fall_707.xml +JPEGImages/fall_1355.jpg Annotations/fall_1355.xml +JPEGImages/fall_1047.jpg Annotations/fall_1047.xml +JPEGImages/fall_1113.jpg Annotations/fall_1113.xml +JPEGImages/fall_818.jpg Annotations/fall_818.xml +JPEGImages/fall_1375.jpg Annotations/fall_1375.xml +JPEGImages/fall_206.jpg Annotations/fall_206.xml +JPEGImages/fall_414.jpg Annotations/fall_414.xml +JPEGImages/fall_52.jpg Annotations/fall_52.xml +JPEGImages/fall_787.jpg Annotations/fall_787.xml +JPEGImages/fall_1260.jpg Annotations/fall_1260.xml +JPEGImages/fall_339.jpg Annotations/fall_339.xml +JPEGImages/fall_134.jpg Annotations/fall_134.xml +JPEGImages/fall_90.jpg Annotations/fall_90.xml +JPEGImages/fall_604.jpg Annotations/fall_604.xml +JPEGImages/fall_190.jpg Annotations/fall_190.xml +JPEGImages/fall_463.jpg Annotations/fall_463.xml +JPEGImages/fall_366.jpg Annotations/fall_366.xml +JPEGImages/fall_917.jpg Annotations/fall_917.xml +JPEGImages/fall_1058.jpg Annotations/fall_1058.xml +JPEGImages/fall_971.jpg Annotations/fall_971.xml +JPEGImages/fall_179.jpg Annotations/fall_179.xml +JPEGImages/fall_1388.jpg Annotations/fall_1388.xml +JPEGImages/fall_1084.jpg Annotations/fall_1084.xml +JPEGImages/fall_374.jpg Annotations/fall_374.xml +JPEGImages/fall_1381.jpg Annotations/fall_1381.xml +JPEGImages/fall_930.jpg Annotations/fall_930.xml +JPEGImages/fall_956.jpg Annotations/fall_956.xml +JPEGImages/fall_446.jpg Annotations/fall_446.xml +JPEGImages/fall_588.jpg Annotations/fall_588.xml +JPEGImages/fall_1307.jpg Annotations/fall_1307.xml +JPEGImages/fall_927.jpg Annotations/fall_927.xml +JPEGImages/fall_1105.jpg Annotations/fall_1105.xml +JPEGImages/fall_1207.jpg Annotations/fall_1207.xml +JPEGImages/fall_302.jpg Annotations/fall_302.xml +JPEGImages/fall_1248.jpg Annotations/fall_1248.xml +JPEGImages/fall_1043.jpg Annotations/fall_1043.xml +JPEGImages/fall_1213.jpg Annotations/fall_1213.xml +JPEGImages/fall_262.jpg Annotations/fall_262.xml +JPEGImages/fall_694.jpg Annotations/fall_694.xml +JPEGImages/fall_984.jpg 
Annotations/fall_984.xml +JPEGImages/fall_418.jpg Annotations/fall_418.xml +JPEGImages/fall_702.jpg Annotations/fall_702.xml +JPEGImages/fall_774.jpg Annotations/fall_774.xml +JPEGImages/fall_806.jpg Annotations/fall_806.xml +JPEGImages/fall_491.jpg Annotations/fall_491.xml +JPEGImages/fall_775.jpg Annotations/fall_775.xml +JPEGImages/fall_284.jpg Annotations/fall_284.xml +JPEGImages/fall_1439.jpg Annotations/fall_1439.xml +JPEGImages/fall_443.jpg Annotations/fall_443.xml +JPEGImages/fall_285.jpg Annotations/fall_285.xml +JPEGImages/fall_893.jpg Annotations/fall_893.xml +JPEGImages/fall_580.jpg Annotations/fall_580.xml +JPEGImages/fall_527.jpg Annotations/fall_527.xml +JPEGImages/fall_943.jpg Annotations/fall_943.xml +JPEGImages/fall_1277.jpg Annotations/fall_1277.xml +JPEGImages/fall_975.jpg Annotations/fall_975.xml +JPEGImages/fall_348.jpg Annotations/fall_348.xml +JPEGImages/fall_911.jpg Annotations/fall_911.xml +JPEGImages/fall_1070.jpg Annotations/fall_1070.xml +JPEGImages/fall_372.jpg Annotations/fall_372.xml +JPEGImages/fall_663.jpg Annotations/fall_663.xml +JPEGImages/fall_1031.jpg Annotations/fall_1031.xml +JPEGImages/fall_1309.jpg Annotations/fall_1309.xml +JPEGImages/fall_509.jpg Annotations/fall_509.xml +JPEGImages/fall_1108.jpg Annotations/fall_1108.xml +JPEGImages/fall_183.jpg Annotations/fall_183.xml +JPEGImages/fall_207.jpg Annotations/fall_207.xml +JPEGImages/fall_973.jpg Annotations/fall_973.xml +JPEGImages/fall_203.jpg Annotations/fall_203.xml +JPEGImages/fall_1018.jpg Annotations/fall_1018.xml +JPEGImages/fall_1389.jpg Annotations/fall_1389.xml +JPEGImages/fall_106.jpg Annotations/fall_106.xml +JPEGImages/fall_1249.jpg Annotations/fall_1249.xml +JPEGImages/fall_155.jpg Annotations/fall_155.xml +JPEGImages/fall_916.jpg Annotations/fall_916.xml +JPEGImages/fall_91.jpg Annotations/fall_91.xml +JPEGImages/fall_1064.jpg Annotations/fall_1064.xml +JPEGImages/fall_1012.jpg Annotations/fall_1012.xml +JPEGImages/fall_754.jpg Annotations/fall_754.xml +JPEGImages/fall_375.jpg Annotations/fall_375.xml +JPEGImages/fall_416.jpg Annotations/fall_416.xml +JPEGImages/fall_454.jpg Annotations/fall_454.xml +JPEGImages/fall_111.jpg Annotations/fall_111.xml +JPEGImages/fall_1054.jpg Annotations/fall_1054.xml +JPEGImages/fall_622.jpg Annotations/fall_622.xml +JPEGImages/fall_1267.jpg Annotations/fall_1267.xml +JPEGImages/fall_805.jpg Annotations/fall_805.xml +JPEGImages/fall_373.jpg Annotations/fall_373.xml +JPEGImages/fall_1400.jpg Annotations/fall_1400.xml +JPEGImages/fall_1076.jpg Annotations/fall_1076.xml +JPEGImages/fall_727.jpg Annotations/fall_727.xml +JPEGImages/fall_38.jpg Annotations/fall_38.xml +JPEGImages/fall_26.jpg Annotations/fall_26.xml +JPEGImages/fall_688.jpg Annotations/fall_688.xml +JPEGImages/fall_662.jpg Annotations/fall_662.xml +JPEGImages/fall_1134.jpg Annotations/fall_1134.xml +JPEGImages/fall_512.jpg Annotations/fall_512.xml +JPEGImages/fall_474.jpg Annotations/fall_474.xml +JPEGImages/fall_534.jpg Annotations/fall_534.xml +JPEGImages/fall_176.jpg Annotations/fall_176.xml +JPEGImages/fall_264.jpg Annotations/fall_264.xml +JPEGImages/fall_637.jpg Annotations/fall_637.xml +JPEGImages/fall_552.jpg Annotations/fall_552.xml +JPEGImages/fall_1368.jpg Annotations/fall_1368.xml +JPEGImages/fall_1292.jpg Annotations/fall_1292.xml +JPEGImages/fall_87.jpg Annotations/fall_87.xml +JPEGImages/fall_515.jpg Annotations/fall_515.xml +JPEGImages/fall_132.jpg Annotations/fall_132.xml +JPEGImages/fall_569.jpg Annotations/fall_569.xml +JPEGImages/fall_121.jpg Annotations/fall_121.xml 
+JPEGImages/fall_765.jpg Annotations/fall_765.xml +JPEGImages/fall_422.jpg Annotations/fall_422.xml +JPEGImages/fall_493.jpg Annotations/fall_493.xml +JPEGImages/fall_1124.jpg Annotations/fall_1124.xml +JPEGImages/fall_547.jpg Annotations/fall_547.xml +JPEGImages/fall_587.jpg Annotations/fall_587.xml +JPEGImages/fall_1435.jpg Annotations/fall_1435.xml +JPEGImages/fall_1195.jpg Annotations/fall_1195.xml +JPEGImages/fall_1243.jpg Annotations/fall_1243.xml +JPEGImages/fall_1384.jpg Annotations/fall_1384.xml +JPEGImages/fall_865.jpg Annotations/fall_865.xml +JPEGImages/fall_1068.jpg Annotations/fall_1068.xml +JPEGImages/fall_986.jpg Annotations/fall_986.xml +JPEGImages/fall_822.jpg Annotations/fall_822.xml +JPEGImages/fall_450.jpg Annotations/fall_450.xml +JPEGImages/fall_705.jpg Annotations/fall_705.xml +JPEGImages/fall_494.jpg Annotations/fall_494.xml +JPEGImages/fall_188.jpg Annotations/fall_188.xml +JPEGImages/fall_740.jpg Annotations/fall_740.xml +JPEGImages/fall_336.jpg Annotations/fall_336.xml +JPEGImages/fall_379.jpg Annotations/fall_379.xml +JPEGImages/fall_913.jpg Annotations/fall_913.xml +JPEGImages/fall_79.jpg Annotations/fall_79.xml +JPEGImages/fall_1037.jpg Annotations/fall_1037.xml +JPEGImages/fall_408.jpg Annotations/fall_408.xml +JPEGImages/fall_1353.jpg Annotations/fall_1353.xml +JPEGImages/fall_639.jpg Annotations/fall_639.xml +JPEGImages/fall_1156.jpg Annotations/fall_1156.xml +JPEGImages/fall_1261.jpg Annotations/fall_1261.xml +JPEGImages/fall_539.jpg Annotations/fall_539.xml +JPEGImages/fall_724.jpg Annotations/fall_724.xml +JPEGImages/fall_218.jpg Annotations/fall_218.xml +JPEGImages/fall_709.jpg Annotations/fall_709.xml +JPEGImages/fall_65.jpg Annotations/fall_65.xml +JPEGImages/fall_909.jpg Annotations/fall_909.xml +JPEGImages/fall_838.jpg Annotations/fall_838.xml +JPEGImages/fall_1065.jpg Annotations/fall_1065.xml +JPEGImages/fall_259.jpg Annotations/fall_259.xml +JPEGImages/fall_476.jpg Annotations/fall_476.xml +JPEGImages/fall_643.jpg Annotations/fall_643.xml +JPEGImages/fall_753.jpg Annotations/fall_753.xml +JPEGImages/fall_1322.jpg Annotations/fall_1322.xml +JPEGImages/fall_887.jpg Annotations/fall_887.xml +JPEGImages/fall_903.jpg Annotations/fall_903.xml +JPEGImages/fall_94.jpg Annotations/fall_94.xml +JPEGImages/fall_749.jpg Annotations/fall_749.xml +JPEGImages/fall_1325.jpg Annotations/fall_1325.xml +JPEGImages/fall_1060.jpg Annotations/fall_1060.xml +JPEGImages/fall_904.jpg Annotations/fall_904.xml +JPEGImages/fall_789.jpg Annotations/fall_789.xml +JPEGImages/fall_1120.jpg Annotations/fall_1120.xml +JPEGImages/fall_464.jpg Annotations/fall_464.xml +JPEGImages/fall_436.jpg Annotations/fall_436.xml +JPEGImages/fall_273.jpg Annotations/fall_273.xml +JPEGImages/fall_300.jpg Annotations/fall_300.xml +JPEGImages/fall_1136.jpg Annotations/fall_1136.xml +JPEGImages/fall_1122.jpg Annotations/fall_1122.xml +JPEGImages/fall_267.jpg Annotations/fall_267.xml +JPEGImages/fall_1310.jpg Annotations/fall_1310.xml +JPEGImages/fall_670.jpg Annotations/fall_670.xml +JPEGImages/fall_102.jpg Annotations/fall_102.xml +JPEGImages/fall_934.jpg Annotations/fall_934.xml +JPEGImages/fall_419.jpg Annotations/fall_419.xml +JPEGImages/fall_847.jpg Annotations/fall_847.xml +JPEGImages/fall_1233.jpg Annotations/fall_1233.xml +JPEGImages/fall_782.jpg Annotations/fall_782.xml +JPEGImages/fall_972.jpg Annotations/fall_972.xml +JPEGImages/fall_1116.jpg Annotations/fall_1116.xml +JPEGImages/fall_482.jpg Annotations/fall_482.xml +JPEGImages/fall_1360.jpg Annotations/fall_1360.xml 
+JPEGImages/fall_800.jpg Annotations/fall_800.xml +JPEGImages/fall_1176.jpg Annotations/fall_1176.xml +JPEGImages/fall_729.jpg Annotations/fall_729.xml +JPEGImages/fall_241.jpg Annotations/fall_241.xml +JPEGImages/fall_1079.jpg Annotations/fall_1079.xml +JPEGImages/fall_1377.jpg Annotations/fall_1377.xml +JPEGImages/fall_23.jpg Annotations/fall_23.xml +JPEGImages/fall_303.jpg Annotations/fall_303.xml +JPEGImages/fall_232.jpg Annotations/fall_232.xml +JPEGImages/fall_641.jpg Annotations/fall_641.xml +JPEGImages/fall_571.jpg Annotations/fall_571.xml +JPEGImages/fall_627.jpg Annotations/fall_627.xml +JPEGImages/fall_309.jpg Annotations/fall_309.xml +JPEGImages/fall_1117.jpg Annotations/fall_1117.xml +JPEGImages/fall_1204.jpg Annotations/fall_1204.xml +JPEGImages/fall_1336.jpg Annotations/fall_1336.xml +JPEGImages/fall_535.jpg Annotations/fall_535.xml +JPEGImages/fall_488.jpg Annotations/fall_488.xml +JPEGImages/fall_97.jpg Annotations/fall_97.xml +JPEGImages/fall_1095.jpg Annotations/fall_1095.xml +JPEGImages/fall_1119.jpg Annotations/fall_1119.xml +JPEGImages/fall_101.jpg Annotations/fall_101.xml +JPEGImages/fall_548.jpg Annotations/fall_548.xml +JPEGImages/fall_997.jpg Annotations/fall_997.xml +JPEGImages/fall_20.jpg Annotations/fall_20.xml +JPEGImages/fall_508.jpg Annotations/fall_508.xml +JPEGImages/fall_590.jpg Annotations/fall_590.xml +JPEGImages/fall_395.jpg Annotations/fall_395.xml +JPEGImages/fall_550.jpg Annotations/fall_550.xml +JPEGImages/fall_371.jpg Annotations/fall_371.xml +JPEGImages/fall_358.jpg Annotations/fall_358.xml +JPEGImages/fall_976.jpg Annotations/fall_976.xml +JPEGImages/fall_525.jpg Annotations/fall_525.xml +JPEGImages/fall_1286.jpg Annotations/fall_1286.xml +JPEGImages/fall_171.jpg Annotations/fall_171.xml +JPEGImages/fall_137.jpg Annotations/fall_137.xml +JPEGImages/fall_1320.jpg Annotations/fall_1320.xml +JPEGImages/fall_1404.jpg Annotations/fall_1404.xml +JPEGImages/fall_522.jpg Annotations/fall_522.xml +JPEGImages/fall_565.jpg Annotations/fall_565.xml +JPEGImages/fall_937.jpg Annotations/fall_937.xml +JPEGImages/fall_62.jpg Annotations/fall_62.xml +JPEGImages/fall_403.jpg Annotations/fall_403.xml +JPEGImages/fall_924.jpg Annotations/fall_924.xml +JPEGImages/fall_1373.jpg Annotations/fall_1373.xml +JPEGImages/fall_570.jpg Annotations/fall_570.xml +JPEGImages/fall_1423.jpg Annotations/fall_1423.xml +JPEGImages/fall_1154.jpg Annotations/fall_1154.xml +JPEGImages/fall_1072.jpg Annotations/fall_1072.xml +JPEGImages/fall_690.jpg Annotations/fall_690.xml +JPEGImages/fall_577.jpg Annotations/fall_577.xml +JPEGImages/fall_692.jpg Annotations/fall_692.xml +JPEGImages/fall_1258.jpg Annotations/fall_1258.xml +JPEGImages/fall_1179.jpg Annotations/fall_1179.xml +JPEGImages/fall_1352.jpg Annotations/fall_1352.xml +JPEGImages/fall_153.jpg Annotations/fall_153.xml +JPEGImages/fall_131.jpg Annotations/fall_131.xml +JPEGImages/fall_253.jpg Annotations/fall_253.xml +JPEGImages/fall_89.jpg Annotations/fall_89.xml +JPEGImages/fall_420.jpg Annotations/fall_420.xml +JPEGImages/fall_513.jpg Annotations/fall_513.xml +JPEGImages/fall_1284.jpg Annotations/fall_1284.xml +JPEGImages/fall_545.jpg Annotations/fall_545.xml +JPEGImages/fall_332.jpg Annotations/fall_332.xml +JPEGImages/fall_405.jpg Annotations/fall_405.xml +JPEGImages/fall_563.jpg Annotations/fall_563.xml +JPEGImages/fall_825.jpg Annotations/fall_825.xml +JPEGImages/fall_1303.jpg Annotations/fall_1303.xml +JPEGImages/fall_113.jpg Annotations/fall_113.xml +JPEGImages/fall_466.jpg Annotations/fall_466.xml +JPEGImages/fall_63.jpg 
Annotations/fall_63.xml +JPEGImages/fall_234.jpg Annotations/fall_234.xml +JPEGImages/fall_602.jpg Annotations/fall_602.xml +JPEGImages/fall_939.jpg Annotations/fall_939.xml +JPEGImages/fall_428.jpg Annotations/fall_428.xml +JPEGImages/fall_280.jpg Annotations/fall_280.xml +JPEGImages/fall_210.jpg Annotations/fall_210.xml +JPEGImages/fall_921.jpg Annotations/fall_921.xml +JPEGImages/fall_951.jpg Annotations/fall_951.xml +JPEGImages/fall_83.jpg Annotations/fall_83.xml +JPEGImages/fall_1030.jpg Annotations/fall_1030.xml +JPEGImages/fall_32.jpg Annotations/fall_32.xml +JPEGImages/fall_1168.jpg Annotations/fall_1168.xml +JPEGImages/fall_70.jpg Annotations/fall_70.xml +JPEGImages/fall_1382.jpg Annotations/fall_1382.xml +JPEGImages/fall_120.jpg Annotations/fall_120.xml +JPEGImages/fall_1300.jpg Annotations/fall_1300.xml +JPEGImages/fall_71.jpg Annotations/fall_71.xml +JPEGImages/fall_790.jpg Annotations/fall_790.xml +JPEGImages/fall_828.jpg Annotations/fall_828.xml +JPEGImages/fall_167.jpg Annotations/fall_167.xml +JPEGImages/fall_655.jpg Annotations/fall_655.xml +JPEGImages/fall_492.jpg Annotations/fall_492.xml +JPEGImages/fall_950.jpg Annotations/fall_950.xml +JPEGImages/fall_1296.jpg Annotations/fall_1296.xml +JPEGImages/fall_330.jpg Annotations/fall_330.xml +JPEGImages/fall_1007.jpg Annotations/fall_1007.xml +JPEGImages/fall_598.jpg Annotations/fall_598.xml +JPEGImages/fall_776.jpg Annotations/fall_776.xml +JPEGImages/fall_1440.jpg Annotations/fall_1440.xml +JPEGImages/fall_1432.jpg Annotations/fall_1432.xml +JPEGImages/fall_944.jpg Annotations/fall_944.xml +JPEGImages/fall_400.jpg Annotations/fall_400.xml +JPEGImages/fall_798.jpg Annotations/fall_798.xml +JPEGImages/fall_1391.jpg Annotations/fall_1391.xml +JPEGImages/fall_1083.jpg Annotations/fall_1083.xml +JPEGImages/fall_1193.jpg Annotations/fall_1193.xml +JPEGImages/fall_1221.jpg Annotations/fall_1221.xml +JPEGImages/fall_1025.jpg Annotations/fall_1025.xml +JPEGImages/fall_1357.jpg Annotations/fall_1357.xml +JPEGImages/fall_1210.jpg Annotations/fall_1210.xml +JPEGImages/fall_1413.jpg Annotations/fall_1413.xml +JPEGImages/fall_656.jpg Annotations/fall_656.xml +JPEGImages/fall_686.jpg Annotations/fall_686.xml +JPEGImages/fall_314.jpg Annotations/fall_314.xml +JPEGImages/fall_559.jpg Annotations/fall_559.xml +JPEGImages/fall_1293.jpg Annotations/fall_1293.xml +JPEGImages/fall_691.jpg Annotations/fall_691.xml +JPEGImages/fall_734.jpg Annotations/fall_734.xml +JPEGImages/fall_96.jpg Annotations/fall_96.xml +JPEGImages/fall_829.jpg Annotations/fall_829.xml +JPEGImages/fall_673.jpg Annotations/fall_673.xml +JPEGImages/fall_562.jpg Annotations/fall_562.xml +JPEGImages/fall_68.jpg Annotations/fall_68.xml +JPEGImages/fall_1366.jpg Annotations/fall_1366.xml +JPEGImages/fall_752.jpg Annotations/fall_752.xml +JPEGImages/fall_198.jpg Annotations/fall_198.xml +JPEGImages/fall_445.jpg Annotations/fall_445.xml +JPEGImages/fall_148.jpg Annotations/fall_148.xml +JPEGImages/fall_1319.jpg Annotations/fall_1319.xml +JPEGImages/fall_15.jpg Annotations/fall_15.xml +JPEGImages/fall_1251.jpg Annotations/fall_1251.xml +JPEGImages/fall_1055.jpg Annotations/fall_1055.xml +JPEGImages/fall_423.jpg Annotations/fall_423.xml +JPEGImages/fall_660.jpg Annotations/fall_660.xml +JPEGImages/fall_1150.jpg Annotations/fall_1150.xml +JPEGImages/fall_1229.jpg Annotations/fall_1229.xml +JPEGImages/fall_810.jpg Annotations/fall_810.xml +JPEGImages/fall_159.jpg Annotations/fall_159.xml +JPEGImages/fall_1339.jpg Annotations/fall_1339.xml +JPEGImages/fall_360.jpg 
Annotations/fall_360.xml +JPEGImages/fall_743.jpg Annotations/fall_743.xml +JPEGImages/fall_143.jpg Annotations/fall_143.xml +JPEGImages/fall_1063.jpg Annotations/fall_1063.xml +JPEGImages/fall_1033.jpg Annotations/fall_1033.xml +JPEGImages/fall_1338.jpg Annotations/fall_1338.xml +JPEGImages/fall_199.jpg Annotations/fall_199.xml +JPEGImages/fall_1008.jpg Annotations/fall_1008.xml +JPEGImages/fall_979.jpg Annotations/fall_979.xml +JPEGImages/fall_205.jpg Annotations/fall_205.xml +JPEGImages/fall_312.jpg Annotations/fall_312.xml +JPEGImages/fall_467.jpg Annotations/fall_467.xml +JPEGImages/fall_191.jpg Annotations/fall_191.xml +JPEGImages/fall_1056.jpg Annotations/fall_1056.xml +JPEGImages/fall_608.jpg Annotations/fall_608.xml +JPEGImages/fall_616.jpg Annotations/fall_616.xml +JPEGImages/fall_546.jpg Annotations/fall_546.xml +JPEGImages/fall_279.jpg Annotations/fall_279.xml +JPEGImages/fall_751.jpg Annotations/fall_751.xml +JPEGImages/fall_852.jpg Annotations/fall_852.xml +JPEGImages/fall_575.jpg Annotations/fall_575.xml +JPEGImages/fall_260.jpg Annotations/fall_260.xml +JPEGImages/fall_1201.jpg Annotations/fall_1201.xml +JPEGImages/fall_1369.jpg Annotations/fall_1369.xml +JPEGImages/fall_1253.jpg Annotations/fall_1253.xml +JPEGImages/fall_968.jpg Annotations/fall_968.xml +JPEGImages/fall_970.jpg Annotations/fall_970.xml +JPEGImages/fall_187.jpg Annotations/fall_187.xml +JPEGImages/fall_1167.jpg Annotations/fall_1167.xml +JPEGImages/fall_1275.jpg Annotations/fall_1275.xml +JPEGImages/fall_900.jpg Annotations/fall_900.xml +JPEGImages/fall_1224.jpg Annotations/fall_1224.xml +JPEGImages/fall_1075.jpg Annotations/fall_1075.xml +JPEGImages/fall_821.jpg Annotations/fall_821.xml +JPEGImages/fall_249.jpg Annotations/fall_249.xml +JPEGImages/fall_576.jpg Annotations/fall_576.xml +JPEGImages/fall_1185.jpg Annotations/fall_1185.xml +JPEGImages/fall_174.jpg Annotations/fall_174.xml +JPEGImages/fall_928.jpg Annotations/fall_928.xml +JPEGImages/fall_531.jpg Annotations/fall_531.xml +JPEGImages/fall_697.jpg Annotations/fall_697.xml +JPEGImages/fall_396.jpg Annotations/fall_396.xml +JPEGImages/fall_346.jpg Annotations/fall_346.xml +JPEGImages/fall_368.jpg Annotations/fall_368.xml +JPEGImages/fall_760.jpg Annotations/fall_760.xml +JPEGImages/fall_777.jpg Annotations/fall_777.xml +JPEGImages/fall_863.jpg Annotations/fall_863.xml +JPEGImages/fall_1138.jpg Annotations/fall_1138.xml +JPEGImages/fall_706.jpg Annotations/fall_706.xml +JPEGImages/fall_728.jpg Annotations/fall_728.xml +JPEGImages/fall_335.jpg Annotations/fall_335.xml +JPEGImages/fall_126.jpg Annotations/fall_126.xml +JPEGImages/fall_1347.jpg Annotations/fall_1347.xml +JPEGImages/fall_1351.jpg Annotations/fall_1351.xml +JPEGImages/fall_607.jpg Annotations/fall_607.xml +JPEGImages/fall_1398.jpg Annotations/fall_1398.xml +JPEGImages/fall_1340.jpg Annotations/fall_1340.xml +JPEGImages/fall_907.jpg Annotations/fall_907.xml +JPEGImages/fall_18.jpg Annotations/fall_18.xml +JPEGImages/fall_1013.jpg Annotations/fall_1013.xml +JPEGImages/fall_323.jpg Annotations/fall_323.xml +JPEGImages/fall_1216.jpg Annotations/fall_1216.xml +JPEGImages/fall_237.jpg Annotations/fall_237.xml +JPEGImages/fall_85.jpg Annotations/fall_85.xml +JPEGImages/fall_81.jpg Annotations/fall_81.xml +JPEGImages/fall_496.jpg Annotations/fall_496.xml +JPEGImages/fall_1345.jpg Annotations/fall_1345.xml +JPEGImages/fall_1029.jpg Annotations/fall_1029.xml +JPEGImages/fall_298.jpg Annotations/fall_298.xml +JPEGImages/fall_1003.jpg Annotations/fall_1003.xml +JPEGImages/fall_1376.jpg 
Annotations/fall_1376.xml +JPEGImages/fall_542.jpg Annotations/fall_542.xml +JPEGImages/fall_1203.jpg Annotations/fall_1203.xml +JPEGImages/fall_1125.jpg Annotations/fall_1125.xml +JPEGImages/fall_192.jpg Annotations/fall_192.xml +JPEGImages/fall_231.jpg Annotations/fall_231.xml +JPEGImages/fall_462.jpg Annotations/fall_462.xml +JPEGImages/fall_1127.jpg Annotations/fall_1127.xml +JPEGImages/fall_615.jpg Annotations/fall_615.xml +JPEGImages/fall_1358.jpg Annotations/fall_1358.xml +JPEGImages/fall_47.jpg Annotations/fall_47.xml +JPEGImages/fall_456.jpg Annotations/fall_456.xml +JPEGImages/fall_146.jpg Annotations/fall_146.xml +JPEGImages/fall_1069.jpg Annotations/fall_1069.xml +JPEGImages/fall_1057.jpg Annotations/fall_1057.xml +JPEGImages/fall_764.jpg Annotations/fall_764.xml +JPEGImages/fall_112.jpg Annotations/fall_112.xml +JPEGImages/fall_449.jpg Annotations/fall_449.xml +JPEGImages/fall_291.jpg Annotations/fall_291.xml +JPEGImages/fall_1280.jpg Annotations/fall_1280.xml +JPEGImages/fall_1390.jpg Annotations/fall_1390.xml +JPEGImages/fall_784.jpg Annotations/fall_784.xml +JPEGImages/fall_426.jpg Annotations/fall_426.xml +JPEGImages/fall_1036.jpg Annotations/fall_1036.xml +JPEGImages/fall_609.jpg Annotations/fall_609.xml +JPEGImages/fall_202.jpg Annotations/fall_202.xml +JPEGImages/fall_1183.jpg Annotations/fall_1183.xml +JPEGImages/fall_826.jpg Annotations/fall_826.xml +JPEGImages/fall_957.jpg Annotations/fall_957.xml +JPEGImages/fall_388.jpg Annotations/fall_388.xml +JPEGImages/fall_117.jpg Annotations/fall_117.xml +JPEGImages/fall_529.jpg Annotations/fall_529.xml +JPEGImages/fall_661.jpg Annotations/fall_661.xml +JPEGImages/fall_219.jpg Annotations/fall_219.xml +JPEGImages/fall_292.jpg Annotations/fall_292.xml +JPEGImages/fall_696.jpg Annotations/fall_696.xml +JPEGImages/fall_768.jpg Annotations/fall_768.xml +JPEGImages/fall_1135.jpg Annotations/fall_1135.xml +JPEGImages/fall_991.jpg Annotations/fall_991.xml +JPEGImages/fall_459.jpg Annotations/fall_459.xml +JPEGImages/fall_1270.jpg Annotations/fall_1270.xml +JPEGImages/fall_9.jpg Annotations/fall_9.xml +JPEGImages/fall_439.jpg Annotations/fall_439.xml +JPEGImages/fall_714.jpg Annotations/fall_714.xml +JPEGImages/fall_28.jpg Annotations/fall_28.xml +JPEGImages/fall_1133.jpg Annotations/fall_1133.xml +JPEGImages/fall_1334.jpg Annotations/fall_1334.xml +JPEGImages/fall_1287.jpg Annotations/fall_1287.xml +JPEGImages/fall_211.jpg Annotations/fall_211.xml +JPEGImages/fall_778.jpg Annotations/fall_778.xml +JPEGImages/fall_1160.jpg Annotations/fall_1160.xml +JPEGImages/fall_816.jpg Annotations/fall_816.xml +JPEGImages/fall_1111.jpg Annotations/fall_1111.xml diff --git a/val_list.txt b/val_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..daa2142049f4a30b2e25a526cb56a7ec2092cbc4 --- /dev/null +++ b/val_list.txt @@ -0,0 +1,288 @@ +JPEGImages/fall_1102.jpg Annotations/fall_1102.xml +JPEGImages/fall_505.jpg Annotations/fall_505.xml +JPEGImages/fall_27.jpg Annotations/fall_27.xml +JPEGImages/fall_341.jpg Annotations/fall_341.xml +JPEGImages/fall_230.jpg Annotations/fall_230.xml +JPEGImages/fall_1426.jpg Annotations/fall_1426.xml +JPEGImages/fall_1163.jpg Annotations/fall_1163.xml +JPEGImages/fall_1349.jpg Annotations/fall_1349.xml +JPEGImages/fall_147.jpg Annotations/fall_147.xml +JPEGImages/fall_277.jpg Annotations/fall_277.xml +JPEGImages/fall_216.jpg Annotations/fall_216.xml +JPEGImages/fall_417.jpg Annotations/fall_417.xml +JPEGImages/fall_17.jpg Annotations/fall_17.xml +JPEGImages/fall_925.jpg 
Annotations/fall_925.xml +JPEGImages/fall_803.jpg Annotations/fall_803.xml +JPEGImages/fall_591.jpg Annotations/fall_591.xml +JPEGImages/fall_1097.jpg Annotations/fall_1097.xml +JPEGImages/fall_1306.jpg Annotations/fall_1306.xml +JPEGImages/fall_1285.jpg Annotations/fall_1285.xml +JPEGImages/fall_7.jpg Annotations/fall_7.xml +JPEGImages/fall_265.jpg Annotations/fall_265.xml +JPEGImages/fall_726.jpg Annotations/fall_726.xml +JPEGImages/fall_880.jpg Annotations/fall_880.xml +JPEGImages/fall_215.jpg Annotations/fall_215.xml +JPEGImages/fall_60.jpg Annotations/fall_60.xml +JPEGImages/fall_104.jpg Annotations/fall_104.xml +JPEGImages/fall_103.jpg Annotations/fall_103.xml +JPEGImages/fall_74.jpg Annotations/fall_74.xml +JPEGImages/fall_485.jpg Annotations/fall_485.xml +JPEGImages/fall_674.jpg Annotations/fall_674.xml +JPEGImages/fall_1395.jpg Annotations/fall_1395.xml +JPEGImages/fall_13.jpg Annotations/fall_13.xml +JPEGImages/fall_687.jpg Annotations/fall_687.xml +JPEGImages/fall_758.jpg Annotations/fall_758.xml +JPEGImages/fall_953.jpg Annotations/fall_953.xml +JPEGImages/fall_890.jpg Annotations/fall_890.xml +JPEGImages/fall_201.jpg Annotations/fall_201.xml +JPEGImages/fall_189.jpg Annotations/fall_189.xml +JPEGImages/fall_532.jpg Annotations/fall_532.xml +JPEGImages/fall_1441.jpg Annotations/fall_1441.xml +JPEGImages/fall_170.jpg Annotations/fall_170.xml +JPEGImages/fall_392.jpg Annotations/fall_392.xml +JPEGImages/fall_1327.jpg Annotations/fall_1327.xml +JPEGImages/fall_245.jpg Annotations/fall_245.xml +JPEGImages/fall_1158.jpg Annotations/fall_1158.xml +JPEGImages/fall_595.jpg Annotations/fall_595.xml +JPEGImages/fall_495.jpg Annotations/fall_495.xml +JPEGImages/fall_654.jpg Annotations/fall_654.xml +JPEGImages/fall_667.jpg Annotations/fall_667.xml +JPEGImages/fall_1256.jpg Annotations/fall_1256.xml +JPEGImages/fall_795.jpg Annotations/fall_795.xml +JPEGImages/fall_381.jpg Annotations/fall_381.xml +JPEGImages/fall_735.jpg Annotations/fall_735.xml +JPEGImages/fall_390.jpg Annotations/fall_390.xml +JPEGImages/fall_1178.jpg Annotations/fall_1178.xml +JPEGImages/fall_72.jpg Annotations/fall_72.xml +JPEGImages/fall_1359.jpg Annotations/fall_1359.xml +JPEGImages/fall_658.jpg Annotations/fall_658.xml +JPEGImages/fall_1328.jpg Annotations/fall_1328.xml +JPEGImages/fall_1364.jpg Annotations/fall_1364.xml +JPEGImages/fall_581.jpg Annotations/fall_581.xml +JPEGImages/fall_867.jpg Annotations/fall_867.xml +JPEGImages/fall_736.jpg Annotations/fall_736.xml +JPEGImages/fall_540.jpg Annotations/fall_540.xml +JPEGImages/fall_108.jpg Annotations/fall_108.xml +JPEGImages/fall_1255.jpg Annotations/fall_1255.xml +JPEGImages/fall_1202.jpg Annotations/fall_1202.xml +JPEGImages/fall_430.jpg Annotations/fall_430.xml +JPEGImages/fall_980.jpg Annotations/fall_980.xml +JPEGImages/fall_34.jpg Annotations/fall_34.xml +JPEGImages/fall_870.jpg Annotations/fall_870.xml +JPEGImages/fall_1299.jpg Annotations/fall_1299.xml +JPEGImages/fall_556.jpg Annotations/fall_556.xml +JPEGImages/fall_744.jpg Annotations/fall_744.xml +JPEGImages/fall_771.jpg Annotations/fall_771.xml +JPEGImages/fall_666.jpg Annotations/fall_666.xml +JPEGImages/fall_107.jpg Annotations/fall_107.xml +JPEGImages/fall_1279.jpg Annotations/fall_1279.xml +JPEGImages/fall_1017.jpg Annotations/fall_1017.xml +JPEGImages/fall_213.jpg Annotations/fall_213.xml +JPEGImages/fall_444.jpg Annotations/fall_444.xml +JPEGImages/fall_1073.jpg Annotations/fall_1073.xml +JPEGImages/fall_499.jpg Annotations/fall_499.xml +JPEGImages/fall_898.jpg Annotations/fall_898.xml 
+JPEGImages/fall_57.jpg Annotations/fall_57.xml +JPEGImages/fall_537.jpg Annotations/fall_537.xml +JPEGImages/fall_914.jpg Annotations/fall_914.xml +JPEGImages/fall_140.jpg Annotations/fall_140.xml +JPEGImages/fall_827.jpg Annotations/fall_827.xml +JPEGImages/fall_503.jpg Annotations/fall_503.xml +JPEGImages/fall_352.jpg Annotations/fall_352.xml +JPEGImages/fall_410.jpg Annotations/fall_410.xml +JPEGImages/fall_238.jpg Annotations/fall_238.xml +JPEGImages/fall_1110.jpg Annotations/fall_1110.xml +JPEGImages/fall_322.jpg Annotations/fall_322.xml +JPEGImages/fall_725.jpg Annotations/fall_725.xml +JPEGImages/fall_186.jpg Annotations/fall_186.xml +JPEGImages/fall_24.jpg Annotations/fall_24.xml +JPEGImages/fall_678.jpg Annotations/fall_678.xml +JPEGImages/fall_510.jpg Annotations/fall_510.xml +JPEGImages/fall_1380.jpg Annotations/fall_1380.xml +JPEGImages/fall_270.jpg Annotations/fall_270.xml +JPEGImages/fall_1314.jpg Annotations/fall_1314.xml +JPEGImages/fall_1171.jpg Annotations/fall_1171.xml +JPEGImages/fall_739.jpg Annotations/fall_739.xml +JPEGImages/fall_579.jpg Annotations/fall_579.xml +JPEGImages/fall_1061.jpg Annotations/fall_1061.xml +JPEGImages/fall_239.jpg Annotations/fall_239.xml +JPEGImages/fall_1001.jpg Annotations/fall_1001.xml +JPEGImages/fall_1215.jpg Annotations/fall_1215.xml +JPEGImages/fall_618.jpg Annotations/fall_618.xml +JPEGImages/fall_600.jpg Annotations/fall_600.xml +JPEGImages/fall_1230.jpg Annotations/fall_1230.xml +JPEGImages/fall_841.jpg Annotations/fall_841.xml +JPEGImages/fall_254.jpg Annotations/fall_254.xml +JPEGImages/fall_675.jpg Annotations/fall_675.xml +JPEGImages/fall_551.jpg Annotations/fall_551.xml +JPEGImages/fall_1311.jpg Annotations/fall_1311.xml +JPEGImages/fall_722.jpg Annotations/fall_722.xml +JPEGImages/fall_712.jpg Annotations/fall_712.xml +JPEGImages/fall_1289.jpg Annotations/fall_1289.xml +JPEGImages/fall_195.jpg Annotations/fall_195.xml +JPEGImages/fall_64.jpg Annotations/fall_64.xml +JPEGImages/fall_331.jpg Annotations/fall_331.xml +JPEGImages/fall_1194.jpg Annotations/fall_1194.xml +JPEGImages/fall_1234.jpg Annotations/fall_1234.xml +JPEGImages/fall_1169.jpg Annotations/fall_1169.xml +JPEGImages/fall_646.jpg Annotations/fall_646.xml +JPEGImages/fall_1304.jpg Annotations/fall_1304.xml +JPEGImages/fall_624.jpg Annotations/fall_624.xml +JPEGImages/fall_1104.jpg Annotations/fall_1104.xml +JPEGImages/fall_1162.jpg Annotations/fall_1162.xml +JPEGImages/fall_244.jpg Annotations/fall_244.xml +JPEGImages/fall_1273.jpg Annotations/fall_1273.xml +JPEGImages/fall_196.jpg Annotations/fall_196.xml +JPEGImages/fall_990.jpg Annotations/fall_990.xml +JPEGImages/fall_872.jpg Annotations/fall_872.xml +JPEGImages/fall_1050.jpg Annotations/fall_1050.xml +JPEGImages/fall_568.jpg Annotations/fall_568.xml +JPEGImages/fall_1052.jpg Annotations/fall_1052.xml +JPEGImages/fall_431.jpg Annotations/fall_431.xml +JPEGImages/fall_1188.jpg Annotations/fall_1188.xml +JPEGImages/fall_433.jpg Annotations/fall_433.xml +JPEGImages/fall_874.jpg Annotations/fall_874.xml +JPEGImages/fall_46.jpg Annotations/fall_46.xml +JPEGImages/fall_1165.jpg Annotations/fall_1165.xml +JPEGImages/fall_365.jpg Annotations/fall_365.xml +JPEGImages/fall_166.jpg Annotations/fall_166.xml +JPEGImages/fall_394.jpg Annotations/fall_394.xml +JPEGImages/fall_251.jpg Annotations/fall_251.xml +JPEGImages/fall_860.jpg Annotations/fall_860.xml +JPEGImages/fall_316.jpg Annotations/fall_316.xml +JPEGImages/fall_564.jpg Annotations/fall_564.xml +JPEGImages/fall_1014.jpg Annotations/fall_1014.xml 
+JPEGImages/fall_1137.jpg Annotations/fall_1137.xml +JPEGImages/fall_952.jpg Annotations/fall_952.xml +JPEGImages/fall_1175.jpg Annotations/fall_1175.xml +JPEGImages/fall_208.jpg Annotations/fall_208.xml +JPEGImages/fall_1177.jpg Annotations/fall_1177.xml +JPEGImages/fall_168.jpg Annotations/fall_168.xml +JPEGImages/fall_150.jpg Annotations/fall_150.xml +JPEGImages/fall_954.jpg Annotations/fall_954.xml +JPEGImages/fall_475.jpg Annotations/fall_475.xml +JPEGImages/fall_763.jpg Annotations/fall_763.xml +JPEGImages/fall_769.jpg Annotations/fall_769.xml +JPEGImages/fall_84.jpg Annotations/fall_84.xml +JPEGImages/fall_679.jpg Annotations/fall_679.xml +JPEGImages/fall_521.jpg Annotations/fall_521.xml +JPEGImages/fall_583.jpg Annotations/fall_583.xml +JPEGImages/fall_287.jpg Annotations/fall_287.xml +JPEGImages/fall_59.jpg Annotations/fall_59.xml +JPEGImages/fall_383.jpg Annotations/fall_383.xml +JPEGImages/fall_594.jpg Annotations/fall_594.xml +JPEGImages/fall_685.jpg Annotations/fall_685.xml +JPEGImages/fall_853.jpg Annotations/fall_853.xml +JPEGImages/fall_1425.jpg Annotations/fall_1425.xml +JPEGImages/fall_896.jpg Annotations/fall_896.xml +JPEGImages/fall_530.jpg Annotations/fall_530.xml +JPEGImages/fall_601.jpg Annotations/fall_601.xml +JPEGImages/fall_868.jpg Annotations/fall_868.xml +JPEGImages/fall_693.jpg Annotations/fall_693.xml +JPEGImages/fall_175.jpg Annotations/fall_175.xml +JPEGImages/fall_61.jpg Annotations/fall_61.xml +JPEGImages/fall_772.jpg Annotations/fall_772.xml +JPEGImages/fall_1428.jpg Annotations/fall_1428.xml +JPEGImages/fall_345.jpg Annotations/fall_345.xml +JPEGImages/fall_1332.jpg Annotations/fall_1332.xml +JPEGImages/fall_1252.jpg Annotations/fall_1252.xml +JPEGImages/fall_123.jpg Annotations/fall_123.xml +JPEGImages/fall_1181.jpg Annotations/fall_1181.xml +JPEGImages/fall_478.jpg Annotations/fall_478.xml +JPEGImages/fall_29.jpg Annotations/fall_29.xml +JPEGImages/fall_589.jpg Annotations/fall_589.xml +JPEGImages/fall_16.jpg Annotations/fall_16.xml +JPEGImages/fall_429.jpg Annotations/fall_429.xml +JPEGImages/fall_1174.jpg Annotations/fall_1174.xml +JPEGImages/fall_1032.jpg Annotations/fall_1032.xml +JPEGImages/fall_636.jpg Annotations/fall_636.xml +JPEGImages/fall_1131.jpg Annotations/fall_1131.xml +JPEGImages/fall_177.jpg Annotations/fall_177.xml +JPEGImages/fall_982.jpg Annotations/fall_982.xml +JPEGImages/fall_1187.jpg Annotations/fall_1187.xml +JPEGImages/fall_161.jpg Annotations/fall_161.xml +JPEGImages/fall_1282.jpg Annotations/fall_1282.xml +JPEGImages/fall_783.jpg Annotations/fall_783.xml +JPEGImages/fall_1337.jpg Annotations/fall_1337.xml +JPEGImages/fall_669.jpg Annotations/fall_669.xml +JPEGImages/fall_0.jpg Annotations/fall_0.xml +JPEGImages/fall_119.jpg Annotations/fall_119.xml +JPEGImages/fall_480.jpg Annotations/fall_480.xml +JPEGImages/fall_895.jpg Annotations/fall_895.xml +JPEGImages/fall_730.jpg Annotations/fall_730.xml +JPEGImages/fall_659.jpg Annotations/fall_659.xml +JPEGImages/fall_960.jpg Annotations/fall_960.xml +JPEGImages/fall_866.jpg Annotations/fall_866.xml +JPEGImages/fall_364.jpg Annotations/fall_364.xml +JPEGImages/fall_732.jpg Annotations/fall_732.xml +JPEGImages/fall_1356.jpg Annotations/fall_1356.xml +JPEGImages/fall_941.jpg Annotations/fall_941.xml +JPEGImages/fall_1225.jpg Annotations/fall_1225.xml +JPEGImages/fall_73.jpg Annotations/fall_73.xml +JPEGImages/fall_779.jpg Annotations/fall_779.xml +JPEGImages/fall_1020.jpg Annotations/fall_1020.xml +JPEGImages/fall_1147.jpg Annotations/fall_1147.xml +JPEGImages/fall_236.jpg 
Annotations/fall_236.xml +JPEGImages/fall_995.jpg Annotations/fall_995.xml +JPEGImages/fall_468.jpg Annotations/fall_468.xml +JPEGImages/fall_50.jpg Annotations/fall_50.xml +JPEGImages/fall_682.jpg Annotations/fall_682.xml +JPEGImages/fall_1401.jpg Annotations/fall_1401.xml +JPEGImages/fall_671.jpg Annotations/fall_671.xml +JPEGImages/fall_226.jpg Annotations/fall_226.xml +JPEGImages/fall_55.jpg Annotations/fall_55.xml +JPEGImages/fall_832.jpg Annotations/fall_832.xml +JPEGImages/fall_255.jpg Annotations/fall_255.xml +JPEGImages/fall_350.jpg Annotations/fall_350.xml +JPEGImages/fall_1266.jpg Annotations/fall_1266.xml +JPEGImages/fall_1350.jpg Annotations/fall_1350.xml +JPEGImages/fall_1329.jpg Annotations/fall_1329.xml +JPEGImages/fall_274.jpg Annotations/fall_274.xml +JPEGImages/fall_750.jpg Annotations/fall_750.xml +JPEGImages/fall_256.jpg Annotations/fall_256.xml +JPEGImages/fall_359.jpg Annotations/fall_359.xml +JPEGImages/fall_541.jpg Annotations/fall_541.xml +JPEGImages/fall_558.jpg Annotations/fall_558.xml +JPEGImages/fall_614.jpg Annotations/fall_614.xml +JPEGImages/fall_922.jpg Annotations/fall_922.xml +JPEGImages/fall_938.jpg Annotations/fall_938.xml +JPEGImages/fall_894.jpg Annotations/fall_894.xml +JPEGImages/fall_786.jpg Annotations/fall_786.xml +JPEGImages/fall_349.jpg Annotations/fall_349.xml +JPEGImages/fall_1383.jpg Annotations/fall_1383.xml +JPEGImages/fall_138.jpg Annotations/fall_138.xml +JPEGImages/fall_965.jpg Annotations/fall_965.xml +JPEGImages/fall_1005.jpg Annotations/fall_1005.xml +JPEGImages/fall_1189.jpg Annotations/fall_1189.xml +JPEGImages/fall_1048.jpg Annotations/fall_1048.xml +JPEGImages/fall_1214.jpg Annotations/fall_1214.xml +JPEGImages/fall_1106.jpg Annotations/fall_1106.xml +JPEGImages/fall_891.jpg Annotations/fall_891.xml +JPEGImages/fall_407.jpg Annotations/fall_407.xml +JPEGImages/fall_857.jpg Annotations/fall_857.xml +JPEGImages/fall_2.jpg Annotations/fall_2.xml +JPEGImages/fall_742.jpg Annotations/fall_742.xml +JPEGImages/fall_1259.jpg Annotations/fall_1259.xml +JPEGImages/fall_1335.jpg Annotations/fall_1335.xml +JPEGImages/fall_437.jpg Annotations/fall_437.xml +JPEGImages/fall_955.jpg Annotations/fall_955.xml +JPEGImages/fall_1059.jpg Annotations/fall_1059.xml +JPEGImages/fall_1363.jpg Annotations/fall_1363.xml +JPEGImages/fall_1431.jpg Annotations/fall_1431.xml +JPEGImages/fall_77.jpg Annotations/fall_77.xml +JPEGImages/fall_288.jpg Annotations/fall_288.xml +JPEGImages/fall_1019.jpg Annotations/fall_1019.xml +JPEGImages/fall_574.jpg Annotations/fall_574.xml +JPEGImages/fall_1021.jpg Annotations/fall_1021.xml +JPEGImages/fall_88.jpg Annotations/fall_88.xml +JPEGImages/fall_1283.jpg Annotations/fall_1283.xml +JPEGImages/fall_1212.jpg Annotations/fall_1212.xml +JPEGImages/fall_788.jpg Annotations/fall_788.xml +JPEGImages/fall_338.jpg Annotations/fall_338.xml +JPEGImages/fall_382.jpg Annotations/fall_382.xml +JPEGImages/fall_347.jpg Annotations/fall_347.xml +JPEGImages/fall_252.jpg Annotations/fall_252.xml +JPEGImages/fall_1219.jpg Annotations/fall_1219.xml +JPEGImages/fall_1227.jpg Annotations/fall_1227.xml +JPEGImages/fall_839.jpg Annotations/fall_839.xml +JPEGImages/fall_1235.jpg Annotations/fall_1235.xml diff --git a/voc.yml b/voc.yml new file mode 100644 index 0000000000000000000000000000000000000000..e444adf784f3ed060080d6ab81a0fedbae8cd74a --- /dev/null +++ b/voc.yml @@ -0,0 +1,21 @@ +metric: VOC +map_type: 11point +num_classes: 1 + +TrainDataset: + name: VOCDataSet + dataset_dir: ./model/ + anno_path: train_list.txt + label_list: labels.txt 
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
+
+EvalDataset:
+  name: VOCDataSet
+  dataset_dir: ./model/
+  anno_path: val_list.txt
+  label_list: labels.txt
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
+
+TestDataset:
+  name: ImageFolder
+  anno_path: ../model/labels.txt
diff --git a/yolov3_darknet53.yml b/yolov3_darknet53.yml
new file mode 100644
index 0000000000000000000000000000000000000000..8341f740a651edaf272153296b970206322e1cf5
--- /dev/null
+++ b/yolov3_darknet53.yml
@@ -0,0 +1,40 @@
+architecture: YOLOv3
+norm_type: sync_bn
+
+YOLOv3:
+  backbone: DarkNet
+  neck: YOLOv3FPN
+  yolo_head: YOLOv3Head
+  post_process: BBoxPostProcess
+
+DarkNet:
+  depth: 53
+  return_idx: [2, 3, 4]
+
+# use default config
+# YOLOv3FPN:
+
+YOLOv3Head:
+  anchors: [[10, 13], [16, 30], [33, 23],
+            [30, 61], [62, 45], [59, 119],
+            [116, 90], [156, 198], [373, 326]]
+  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+  loss: YOLOv3Loss
+
+YOLOv3Loss:
+  ignore_thresh: 0.7
+  downsample: [32, 16, 8]
+  label_smooth: false
+
+BBoxPostProcess:
+  decode:
+    name: YOLOBox
+    conf_thresh: 0.005
+    downsample_ratio: 32
+    clip_bbox: true
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 100
+    score_threshold: 0.01
+    nms_threshold: 0.45
+    nms_top_k: 1000
diff --git a/yolov3_darknet53_270e_voc.yml b/yolov3_darknet53_270e_voc.yml
new file mode 100644
index 0000000000000000000000000000000000000000..c9d94f87ccbdbf7a58601cef379ccfae02cfa743
--- /dev/null
+++ b/yolov3_darknet53_270e_voc.yml
@@ -0,0 +1,28 @@
+_BASE_: [
+  '../model/voc.yml',
+  '../model/runtime.yml',
+  '../model/optimizer_270e.yml',
+  '../model/yolov3_darknet53.yml',
+  '../model/yolov3_reader.yml',
+]
+
+snapshot_epoch: 5
+weights: ../model/model_final
+pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams
+
+
+# set collate_batch to false: VOC evaluation needs per-sample ground-truth
+# info, so samples must not be collated into a single batch when the batch
+# size is larger than 1.
+EvalReader:
+  collate_batch: false
+
+
+# ### uncomment the block below and run evaluation again to get 56.1 mAP(0.5:0.95) under the COCO metric
+# metric: COCO
+# EvalDataset:
+#   !COCODataSet
+#   image_dir: VOCdevkit/VOC2007/JPEGImages
+#   anno_path: voc_test.json
+#   dataset_dir: dataset/voc
+#   # wget https://bj.bcebos.com/v1/paddledet/data/voc.zip
diff --git a/yolov3_reader.yml b/yolov3_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..5dab6742b120a68ea76599b911567ee753b68253
--- /dev/null
+++ b/yolov3_reader.yml
@@ -0,0 +1,44 @@
+worker_num: 2
+TrainReader:
+  inputs_def:
+    num_max_boxes: 50
+  sample_transforms:
+    - Decode: {}
+    - Mixup: {alpha: 1.5, beta: 1.5}
+    - RandomDistort: {}
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+    - RandomCrop: {}
+    - RandomFlip: {}
+  batch_transforms:
+    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeBox: {}
+    - PadBox: {num_max_boxes: 50}
+    - BboxXYXY2XYWH: {}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
+  batch_size: 8
+  shuffle: true
+  drop_last: true
+  mixup_epoch: 250
+  use_shared_memory: true
+
+EvalReader:
+  inputs_def:
+    num_max_boxes: 50
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 608, 608]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
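
For reference, the train_list.txt and val_list.txt files added above use PaddleDetection's VOC list format: one "<image path> <annotation path>" pair per line, relative to the dataset_dir set in voc.yml. Below is a minimal sketch of how such split files can be regenerated, assuming the JPEGImages/ and Annotations/ layout shown in the lists; the write_split_lists name, the 80/20 ratio, and the fixed seed are illustrative only and are not how the committed split (288 validation entries per the hunk header) was produced.

# Hypothetical helper, not part of this diff: rebuild VOC-style split lists.
import os
import random

def write_split_lists(root, val_ratio=0.2, seed=0):
    # Collect image stems that have a matching annotation XML.
    stems = [os.path.splitext(f)[0]
             for f in os.listdir(os.path.join(root, "JPEGImages"))
             if f.endswith(".jpg")]
    stems = [s for s in stems
             if os.path.isfile(os.path.join(root, "Annotations", s + ".xml"))]
    random.Random(seed).shuffle(stems)
    n_val = int(len(stems) * val_ratio)
    splits = {"val_list.txt": stems[:n_val], "train_list.txt": stems[n_val:]}
    for name, subset in splits.items():
        with open(os.path.join(root, name), "w") as out:
            for s in subset:
                # One "<image> <annotation>" pair per line, relative to root.
                out.write(f"JPEGImages/{s}.jpg Annotations/{s}.xml\n")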
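
Since VOCDataSet reads these lists at run time, a quick consistency pass can catch missing files or annotations without any labeled object before a training run fails mid-epoch. A hypothetical standard-library-only check (check_list is not part of this diff):

# Hypothetical sanity check: report entries with missing files or empty XML.
import os
import xml.etree.ElementTree as ET

def check_list(root, list_name):
    bad = []
    with open(os.path.join(root, list_name)) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            img_rel, ann_rel = line.split()
            img = os.path.join(root, img_rel)
            ann = os.path.join(root, ann_rel)
            if not (os.path.isfile(img) and os.path.isfile(ann)):
                bad.append(line)  # broken path
            elif not ET.parse(ann).getroot().findall("object"):
                bad.append(line)  # annotation has no <object> entries
    return bad

print(check_list("./model", "val_list.txt"))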
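
On the model side, yolov3_darknet53.yml assigns the nine anchors to the three detection scales via anchor_masks; the snippet below just makes that mapping explicit, with values copied from the config:

# Illustrative only: which anchors each YOLOv3 output stride uses.
anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119],
           [116, 90], [156, 198], [373, 326]]
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
downsamples = [32, 16, 8]  # matches YOLOv3Loss.downsample
for ds, mask in zip(downsamples, anchor_masks):
    print(f"stride {ds:>2}: anchors {[anchors[i] for i in mask]}")
# The coarsest grid (stride 32) gets the largest anchors; stride 8 the smallest.

With the split lists and configs in place, training is typically launched through PaddleDetection's entry points, e.g. python tools/train.py -c yolov3_darknet53_270e_voc.yml --eval, with tools/eval.py accepting the same -c flag for standalone evaluation against val_list.txt.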